Semantic Smoothing for Text Clustering. Presenter: Bei-YI Jiang. Authors: Jamal A. Nasir, Iraklis Varlamis, Asim Karim, George Tsatsaronis. 2013. Knowledge-Based Systems.
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • The Vector Space Model (VSM) assumes independence between vocabulary terms and ignores any conceptual relations that may exist between them.
Objectives • To increase the importance of core words by taking the relations between terms into account and, in parallel, to reduce the contribution of general terms, leading to better text clustering results.
Methodology
• 1. Document representations: 1.1 The Vector Space Model (VSM); 1.2 The Generalized Vector Space Model (GVSM)
• 2. Relatedness measure: 2.1 Omiotis; 2.2 Wikipedia-based relatedness; 2.3 Average of Omiotis and Wikipedia-based relatedness; 2.4 Pointwise mutual information
• 3. Document clustering: 3.1 Clustering algorithms; 3.2 Algorithms complexity; 3.3 Clustering criterion functions
• 4. A GVSM-based semantic kernel: S-VSM
• 5. Top-k S-VSM
Methodology • The Vector Space Model (VSM)
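The slide's formulas did not survive, so the following is only a minimal sketch of the general VSM idea, not the authors' implementation: each document becomes a vector of term weights, and two documents are compared by cosine similarity. The toy documents, the raw term-frequency weighting, and the names `tf_vector` and `cosine` are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text, vocabulary):
    """Raw term-frequency vector of `text` over a fixed vocabulary (assumed weighting)."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: documents 0 and 1 are about the same topic
# but share only the word "repair".
docs = ["car engine repair", "automobile motor repair", "car engine engine"]
vocabulary = sorted({word for doc in docs for word in doc.split()})
vectors = [tf_vector(doc, vocabulary) for doc in docs]

vsm_similarity = cosine(vectors[0], vectors[1])
```

Because VSM treats "car" and "automobile" as unrelated dimensions, the two topically similar documents score low here, which is the limitation the Motivation slide points at.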
Methodology • The Generalized Vector Space Model (GVSM)
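As a hedged illustration of the GVSM idea, document similarity can be computed through a term-to-term relatedness matrix S, so that related but distinct terms contribute to the score. This is a generic sketch of a semantic kernel of the form d1ᵀ S d2 with cosine-style normalization; the matrix values, the vocabulary, and the function names are hypothetical and are not taken from the paper.

```python
import math


def bilinear(u, S, v):
    """Compute u^T S v for list-based vectors and a square matrix S."""
    return sum(u[i] * S[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def semantic_similarity(u, v, S):
    """Normalized semantic similarity u^T S v / sqrt(u^T S u * v^T S v)."""
    denom = math.sqrt(bilinear(u, S, u) * bilinear(v, S, v))
    return bilinear(u, S, v) / denom if denom > 0.0 else 0.0


# Vocabulary order (assumed): automobile, car, engine, motor, repair
d1 = [0.0, 1.0, 1.0, 0.0, 1.0]  # "car engine repair"
d2 = [1.0, 0.0, 0.0, 1.0, 1.0]  # "automobile motor repair"

# Hypothetical relatedness matrix: 1.0 on the diagonal, with made-up
# relatedness of 0.9 for car/automobile and 0.8 for engine/motor.
S = [
    [1.0, 0.9, 0.0, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8, 0.0],
    [0.0, 0.0, 0.8, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

gvsm_similarity = semantic_similarity(d1, d2, S)
plain_similarity = semantic_similarity(d1, d2, identity)
```

With the identity matrix the score falls back to plain cosine similarity; with the relatedness matrix the two documents are recognized as close.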
Methodology • Omiotis • Wikipedia-based relatedness • Average of Omiotis and Wikipedia-based relatedness
Methodology • Pointwise mutual information
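For reference, pointwise mutual information between two terms is commonly defined as PMI(x, y) = log( p(x, y) / (p(x) p(y)) ). The sketch below estimates the probabilities from document-level co-occurrence counts; the toy corpus, the base-2 logarithm, and the choice to return negative infinity for terms that never co-occur are assumptions, and the paper may estimate or normalize PMI differently.

```python
import math


def pmi(term_x, term_y, documents):
    """PMI of two terms, with probabilities estimated as document frequencies."""
    n = len(documents)
    n_x = sum(1 for doc in documents if term_x in doc)
    n_y = sum(1 for doc in documents if term_y in doc)
    n_xy = sum(1 for doc in documents if term_x in doc and term_y in doc)
    if n_xy == 0:
        # Terms never co-occur: the log of zero is treated as -infinity here.
        return float("-inf")
    p_x, p_y, p_xy = n_x / n, n_y / n, n_xy / n
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical corpus, each document reduced to its set of terms.
documents = [
    {"car", "engine"},
    {"car", "engine", "repair"},
    {"stock", "market"},
    {"market", "repair"},
]

pmi_car_engine = pmi("car", "engine", documents)
pmi_repair_market = pmi("repair", "market", documents)
pmi_car_market = pmi("car", "market", documents)
```

A positive value means the terms co-occur more often than independence would predict, zero means they look independent, and a negative value means they tend to avoid each other.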
Experiments • Vector similarity • Evaluation measures
Experiments • Evaluation measures • Purity • Entropy • Error rate
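The slide names the measures without giving formulas, so the following sketch uses the commonly cited definitions of purity and entropy for clusterings. The normalization of entropy by the logarithm of the number of classes is an assumption, and the error rate is omitted because its exact definition in the paper is not shown here. The cluster assignments and labels are made up.

```python
import math
from collections import Counter, defaultdict


def _cluster_label_counts(clusters, labels):
    """Map each cluster id to a Counter of the true labels it contains."""
    counts = defaultdict(Counter)
    for cluster_id, label in zip(clusters, labels):
        counts[cluster_id][label] += 1
    return counts


def purity(clusters, labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    counts = _cluster_label_counts(clusters, labels)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of per-cluster label entropy, normalized to [0, 1]."""
    num_classes = len(set(labels))
    if num_classes < 2:
        return 0.0  # a single class leaves nothing to be uncertain about
    counts = _cluster_label_counts(clusters, labels)
    total = 0.0
    for label_counts in counts.values():
        size = sum(label_counts.values())
        h = -sum((n / size) * math.log(n / size) for n in label_counts.values())
        total += (size / len(labels)) * (h / math.log(num_classes))
    return total


# Hypothetical result: cluster 0 mixes classes "a" and "b", cluster 1 is pure.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]

purity_score = purity(clusters, labels)
entropy_score = entropy(clusters, labels)
```

Higher purity and lower entropy both indicate clusters that align better with the reference classes.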
Conclusions • The evaluation results show that S-VSM outperforms VSM in most of the tested combinations and compares favorably to GVSM. • To further reduce the complexity of S-VSM, an extension was introduced, namely the top-k S-VSM.
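The slides do not describe how the top-k variant works. One plausible reading, sketched below purely as an assumption, is that each term keeps only its k strongest relatedness values, so the relatedness matrix becomes sparse and cheaper to store and multiply. The function name `top_k_rows` and the matrix values are hypothetical.

```python
def top_k_rows(S, k):
    """Keep the diagonal and the k largest off-diagonal entries in each row of S.

    All other entries are set to 0.0. Note that the result need not be symmetric.
    """
    sparse = []
    for i, row in enumerate(S):
        # Rank the other terms by their relatedness to term i, strongest first.
        others = sorted(
            (j for j in range(len(row)) if j != i),
            key=lambda j: row[j],
            reverse=True,
        )
        keep = set(others[:k]) | {i}
        sparse.append([row[j] if j in keep else 0.0 for j in range(len(row))])
    return sparse


# Hypothetical 3-term relatedness matrix.
S_full = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]

S_top1 = top_k_rows(S_full, 1)
```

With k = 1, each term retains only its single most related neighbor, and weak relations such as the 0.2 entry are dropped.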
Comments • Advantages • It offers a very flexible kernel that can be applied within any domain or with any language. • S-VSM performs much better than VSM in the task of text clustering. • It is very efficient in terms of time and space complexity. • Applications • Text clustering • Semantic smoothing kernels