A Method for Measuring Keywords Similarity by Applying Jaccard’s, N-Gram and Vector Space
Jatsada Singthongchai and Suphakit Niwattanakul
School of Information Technology, Suranaree University of Technology, Nakhon Ratchasima, Thailand
Abstract—Search engines today do not yet fully meet users’ needs. Most of them operate on keyword queries, matching terms that are identical or relevant to the concept being searched. As a result, whether a search succeeds depends on how the user spells the keywords, so a method for measuring the similarity between query keywords and index words is crucial. This research focuses on keyword search. We have designed two hybrid methods for measuring keyword similarity, combining Jaccard’s, N-Gram and Vector Space to make keyword search practical: Jaccard’s, N-Gram, Vector Space, Average (JNVA) and Jaccard’s, N-Gram, Length, Average (JNLA). The methods are evaluated on three criteria (precision, recall, and F-measure). The results show that JNVA and JNLA can successfully predict the similarity between query keywords and index words. These methods can be applied to improve search engine performance, especially for semantic search.
Index Terms—keywords similarity, Jaccard’s, N-Gram, vector space
Cite: Jatsada Singthongchai and Suphakit Niwattanakul, "A Method for Measuring Keywords Similarity by Applying Jaccard’s, N-Gram and Vector Space," Lecture Notes on Information Theory, Vol.1, No.4, pp. 159-164, Dec. 2013. doi: 10.12720/lnit.1.4.159-164
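The abstract names three component measures and an averaging step but does not give their formulas. The following is a minimal sketch of how such a hybrid score could be computed, not the authors' JNVA definition. It assumes character-level comparison, bigrams (n = 2) scored with a Dice-style overlap, character-frequency vectors for the vector space component, and an unweighted mean; all function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n=2):
    """Return the list of character n-grams of a lowercased word."""
    word = word.lower()
    if len(word) < n:
        return [word] if word else []
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def jaccard_similarity(a, b):
    """Jaccard coefficient over the sets of characters in each word."""
    sa, sb = set(a.lower()), set(b.lower())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def ngram_similarity(a, b, n=2):
    """Dice-style overlap of the two sets of character n-grams (assumed form)."""
    ga, gb = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def cosine_similarity(a, b):
    """Cosine of the angle between character-frequency vectors."""
    va, vb = Counter(a.lower()), Counter(b.lower())
    dot = sum(va[c] * vb[c] for c in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def averaged_similarity(a, b, n=2):
    """Unweighted mean of the three component scores (an assumption)."""
    scores = (
        jaccard_similarity(a, b),
        ngram_similarity(a, b, n),
        cosine_similarity(a, b),
    )
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Compare a misspelled query keyword against two index words.
    for index_word in ("similarity", "semantic"):
        print(index_word, round(averaged_similarity("similarty", index_word), 3))
```

Averaging lets a near-miss spelling score high when the measures agree, while a word that merely shares a few letters is pulled down by the n-gram component, which is sensitive to character order.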