Enhancing Cluster Labeling Using Wikipedia

Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab {carmel,haggai,naamaz}@il.ibm.com Presented by Miguel Panuera mpanuera@gmail.com • School of Computer Science San Pablo Catholic University AREQUIPA – PERU 2010

CONTENT • Cluster Labeling • Why Wikipedia • Terms extracted: JSD vs Wikipedia • General Framework for cluster labeling • Experiments • Summary

Cluster Labeling • This process tries to select descriptive labels for the clusters.

Why Wikipedia • One of the major knowledge resources for many information retrieval tasks: • Text categorization and clustering. • Computing semantic relatedness between concepts. • Predicting document topics.

Terms extracted: JSD vs Wikipedia While the list of important terms selected by Jensen-Shannon divergence (JSD) fairly represents the content of the categories, these terms can serve as appropriate labels for only a few categories. Wikipedia labels, on the other hand, agree much more closely with human-annotated labels.
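The slides do not show how the JSD term list is computed. The following is a minimal sketch, assuming each term is scored by its contribution to the Jensen-Shannon divergence between the cluster's term distribution and the whole collection's term distribution. The function names, the toy token lists, and the rule of keeping only terms that are more frequent in the cluster than in the collection are assumptions made for illustration, not details taken from the presentation.

```python
import math
from collections import Counter


def distribution(token_lists):
    """Turn a list of tokenized documents into a term probability distribution."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def jsd_term_scores(cluster_docs, collection_docs):
    """Score each cluster term by its contribution to JSD(cluster || collection)."""
    p = distribution(cluster_docs)      # cluster term distribution
    q = distribution(collection_docs)   # collection term distribution
    scores = {}
    for term, pt in p.items():
        qt = q.get(term, 0.0)
        if pt <= qt:
            # Assumption: only terms over-represented in the cluster are useful.
            continue
        m = 0.5 * (pt + qt)             # mixture distribution at this term
        score = 0.5 * pt * math.log(pt / m)
        if qt > 0:
            score += 0.5 * qt * math.log(qt / m)
        scores[term] = score
    return scores


def top_terms(cluster_docs, collection_docs, n=5):
    """Return the n highest-scoring terms, most important first."""
    scores = jsd_term_scores(cluster_docs, collection_docs)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy data (hypothetical): the collection contains the cluster plus other documents.
cluster = [["wikipedia", "label", "label"], ["label", "cluster"]]
collection = cluster + [["news", "sports", "cluster"], ["news", "weather", "wikipedia"]]
print(top_terms(cluster, collection, n=3))
```

Terms ranked this way describe what distinguishes the cluster, which is why they represent its content well even when they are not suitable labels on their own.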

GENERAL FRAMEWORK FOR CLUSTER LABELING

GENERAL FRAMEWORK FOR CLUSTER LABELING Documents are first parsed and tokenized
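As a rough illustration of this first step, the sketch below lowercases a document, splits it into alphanumeric tokens, and drops a few stop words. The stop-word list and the function names are hypothetical; the slides do not say which parser or tokenizer was used.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}


def tokenize(text):
    """Lowercase the text, keep alphanumeric runs, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_counts(text):
    """Count how often each remaining token occurs in the document."""
    return Counter(tokenize(text))


print(tokenize("The clustering of documents is useful."))
```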

GENERAL FRAMEWORK FOR CLUSTER LABELING The clustering algorithm's goal is to create coherent clusters, in which the documents within a cluster share the same topics.

GENERAL FRAMEWORK FOR CLUSTER LABELING We now wish to find a list of terms, ordered by their estimated importance, that represents the content of the cluster’s documents. Such terms consist of single keywords

GENERAL FRAMEWORK FOR CLUSTER LABELING We now wish to extract candidate labels for cluster C.

GENERAL FRAMEWORK FOR CLUSTER LABELING Candidate labels are evaluated by several judges. Each judge evaluates the candidates according to its own evaluation policy.
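The slide does not describe the judges or how their opinions are combined. One plausible reading, sketched below, treats each judge as a scoring function and ranks candidates by a weighted sum of the judges' scores. Both example judges (`make_overlap_judge` and `brevity_judge`), the weights, and the candidate labels are hypothetical and stand in for whatever policies the actual system uses.

```python
def aggregate_judges(candidates, judges, weights=None):
    """Rank candidate labels by the weighted sum of all judges' scores."""
    if weights is None:
        weights = [1.0] * len(judges)   # assumption: equal weight per judge
    totals = {c: 0.0 for c in candidates}
    for judge, weight in zip(judges, weights):
        for c in candidates:
            totals[c] += weight * judge(c)
    return sorted(candidates, key=lambda c: totals[c], reverse=True)


def make_overlap_judge(important_terms):
    """Hypothetical judge: fraction of important terms that appear in the label."""
    terms = set(important_terms)

    def judge(label):
        words = set(label.lower().split())
        return len(words & terms) / len(terms)

    return judge


def brevity_judge(label):
    """Hypothetical judge: shorter labels receive higher scores."""
    return 1.0 / len(label.split())


candidates = ["Cluster labeling", "Wikipedia", "Information retrieval"]
judges = [make_overlap_judge(["cluster", "label", "wikipedia"]), brevity_judge]
print(aggregate_judges(candidates, judges))
```

Keeping each judge as an independent function means a new evaluation policy can be added without changing the aggregation step.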

Experiments • K: the number of labels requested for each cluster. • Match@K: the fraction of clusters for which at least one of the top-K labels is correct.
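The Match@K definition translates directly into a short function. The sketch below assumes a label counts as correct when it exactly matches one of the human-annotated labels after lowercasing; that matching rule, the function name, and the toy data are assumptions made for illustration.

```python
def match_at_k(ranked_labels_per_cluster, correct_labels_per_cluster, k):
    """Fraction of clusters with at least one correct label among the top k."""
    hits = 0
    for ranked, correct in zip(ranked_labels_per_cluster, correct_labels_per_cluster):
        gold = {label.lower() for label in correct}
        # Assumption: exact (case-insensitive) match defines correctness.
        if any(label.lower() in gold for label in ranked[:k]):
            hits += 1
    return hits / len(ranked_labels_per_cluster)


# Toy data (hypothetical): system labels and gold labels for three clusters.
ranked = [["sports", "football"], ["music", "politics"], ["art", "film"]]
gold = [{"Sports"}, {"politics"}, {"science"}]
for k in (1, 2):
    print(k, match_at_k(ranked, gold, k))
```

Because the top-K list only grows with K, Match@K never decreases as K increases.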

Summary • We described a general framework for cluster labeling that extracts candidate labels from the text and from Wikipedia. • Cluster labeling with Wikipedia is extremely successful, as shown by our results.

THANKS
