Semantic Smoothing for Text Clustering

Presenter: Bei-Yi Jiang • Authors: Jamal A. Nasir, Iraklis Varlamis, Asim Karim, George Tsatsaronis • Knowledge-Based Systems, 2013

Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

Motivation • The Vector Space Model (VSM) assumes independence between vocabulary terms and ignores any conceptual relations that may exist between them.

Objectives • To increase the importance of core words by taking the relations between terms into account, while reducing the contribution of general terms, leading to better text clustering results.

Methodology
1. Document representations
1.1 The Vector Space Model (VSM)
1.2 The Generalized Vector Space Model (GVSM)
2. Relatedness measures
2.1 Omiotis
2.2 Wikipedia-based relatedness
2.3 Average of Omiotis and Wikipedia-based relatedness
2.4 Pointwise mutual information
3. Document clustering
3.1 Clustering algorithms
3.2 Algorithms complexity
3.3 Clustering criterion functions
4. A GVSM-based semantic kernel: S-VSM
5. Top-k S-VSM

Methodology • The Vector Space Model (VSM)
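The formulas on this slide did not survive, so the following is only a minimal sketch of the standard VSM, assuming plain tf-idf weights (raw term frequency times log inverse document frequency) and cosine similarity. The function names `tfidf_vectors` and `cosine` and the toy documents are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf-idf weight} dict."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw tf * log(N / df).
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical): documents 0 and 1 are about the same topic
# but share only the literal term "road".
docs = [
    ["car", "engine", "road"],
    ["automobile", "motor", "road"],
    ["stock", "market", "bank"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # driven only by the shared term "road"
print(cosine(vecs[0], vecs[2]))  # no shared terms
```

Because every term is its own independent dimension, "car" and "automobile" contribute nothing to the similarity of the first two documents, which is the limitation named in the Motivation slide.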

Methodology • The Generalized Vector Space Model (GVSM)
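The slide's formula is missing, so this is a hedged sketch of one common GVSM-style formulation, in which the dot product between two documents is taken through a term-to-term relatedness matrix instead of assuming orthogonal terms. The names `rel`, `smoothed_dot` and `gvsm_cosine`, the relatedness value 0.9, and the choice of self-relatedness 1.0 are all assumptions made for illustration; the paper's exact definition may differ.

```python
import math


def rel(a, b, relatedness):
    """Symmetric term relatedness lookup; a term is fully related to itself."""
    if a == b:
        return 1.0
    return relatedness.get((a, b), relatedness.get((b, a), 0.0))


def smoothed_dot(u, v, relatedness):
    """Bilinear form: sum over i, j of u_i * rel(t_i, t_j) * v_j."""
    return sum(
        wi * rel(ti, tj, relatedness) * wj
        for ti, wi in u.items()
        for tj, wj in v.items()
    )


def gvsm_cosine(u, v, relatedness):
    """Cosine similarity computed with the relatedness-weighted dot product."""
    denom = math.sqrt(smoothed_dot(u, u, relatedness) * smoothed_dot(v, v, relatedness))
    return smoothed_dot(u, v, relatedness) / denom if denom > 0.0 else 0.0


# Hypothetical weights and relatedness score, for illustration only.
u = {"car": 1.0, "road": 1.0}
v = {"automobile": 1.0, "road": 1.0}
R = {("car", "automobile"): 0.9}

print(gvsm_cosine(u, v, {}))  # no relatedness information: plain cosine
print(gvsm_cosine(u, v, R))   # related terms now raise the similarity
```

With an empty relatedness table the function reduces to the ordinary cosine. Note that the result is only guaranteed to stay within [0, 1] when the relatedness matrix is positive semidefinite, which an arbitrary hand-filled table need not be.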

Methodology • Omiotis • Wikipedia-based relatedness • Average of Omiotis and Wikipedia-based relatedness

Methodology • Pointwise mutual information
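As a rough illustration of pointwise mutual information, PMI(a, b) = log(P(a, b) / (P(a) P(b))), the sketch below estimates the probabilities from document-level co-occurrence counts. The estimation scheme, the function name `pmi_table` and the toy corpus are assumptions; the slide does not show how the paper estimates or normalizes PMI.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Return {(a, b): PMI} for every term pair that co-occurs in some document.

    Probabilities are estimated as document frequencies divided by the
    number of documents; pair keys are in alphabetical order.
    """
    n = len(docs)
    term_df = Counter()
    pair_df = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    return {
        (a, b): math.log((c / n) / ((term_df[a] / n) * (term_df[b] / n)))
        for (a, b), c in pair_df.items()
    }


# Hypothetical toy corpus.
docs = [
    ["car", "engine"],
    ["car", "engine"],
    ["bank", "market"],
    ["bank", "car"],
]
pmi = pmi_table(docs)
print(pmi[("car", "engine")])  # positive: co-occur more often than chance
print(pmi[("bank", "car")])    # negative: co-occur less often than chance
```

Pairs that never co-occur are simply absent from the table, since their PMI would be the logarithm of zero.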

Experiments

Experiments • Vector similarity • Evaluation measures

Experiments • Evaluation measures • Purity • Entropy • Error rate
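The definitions behind these measures are not reproduced on the slide. The sketch below uses the usual textbook forms of purity and entropy, assuming that cluster entropy is normalized by the logarithm of the number of classes so that it lies in [0, 1]. The function name `purity_and_entropy` and the sample labels are hypothetical, and error rate is left out because its exact definition in the paper is not given here.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, class_labels):
    """Compute (purity, entropy) of a clustering against gold class labels.

    Purity: fraction of documents that belong to the majority class of
    their cluster (higher is better).
    Entropy: size-weighted average of per-cluster class entropy, divided
    by log(number of classes) (lower is better).
    """
    n = len(class_labels)
    n_classes = len(set(class_labels))
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)

    purity = 0.0
    entropy = 0.0
    for labels in members.values():
        counts = Counter(labels)
        size = len(labels)
        purity += max(counts.values()) / n
        h = -sum((k / size) * math.log(k / size) for k in counts.values())
        if n_classes > 1:
            h /= math.log(n_classes)  # assumed normalization
        entropy += (size / n) * h
    return purity, entropy


# Hypothetical example: one document of class "b" lands in the "a" cluster.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
purity, entropy = purity_and_entropy(clusters, labels)
print(purity, entropy)
```

A perfect clustering gives purity 1 and entropy 0, so the two measures move in opposite directions as quality improves.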

Conclusions • The evaluation results show that S-VSM outperforms VSM in most of the tested combinations and compares favorably to GVSM. • To further reduce the complexity of S-VSM, the authors introduce an extension of it, the top-k S-VSM.
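The slides do not spell out how the top-k S-VSM is built. A plausible reading, offered here only as an assumption, is that each term keeps just its k most related terms, and a document vector is smoothed by spreading each term's weight onto those neighbors. The names `top_k_neighbors` and `smooth` and the relatedness scores below are hypothetical.

```python
import heapq


def top_k_neighbors(relatedness, k):
    """Keep, for every term, only its k most related terms."""
    return {
        term: dict(heapq.nlargest(k, nbrs.items(), key=lambda kv: kv[1]))
        for term, nbrs in relatedness.items()
    }


def smooth(doc, neighbors):
    """Spread each term's weight onto its retained related terms.

    Assumes self-relatedness of 1, so original weights are preserved.
    """
    out = dict(doc)
    for term, weight in doc.items():
        for other, score in neighbors.get(term, {}).items():
            out[other] = out.get(other, 0.0) + weight * score
    return out


# Hypothetical relatedness scores for a single term.
relatedness = {"car": {"automobile": 0.9, "engine": 0.6, "road": 0.3}}
neighbors = top_k_neighbors(relatedness, k=2)
smoothed = smooth({"car": 2.0}, neighbors)
print(smoothed)  # "road" is dropped because it is not among the top 2
```

Limiting each term to k neighbors keeps the smoothed vectors sparse, which is where the reduction in time and space cost would come from under this reading.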

Comments • Advantages • It offers a very flexible kernel that can be applied within any domain or with any language. • S-VSM performs much better than VSM in the task of text clustering. • It is very efficient in terms of time and space complexity. • Applications • Text clustering • Semantic smoothing kernels
