IxA NLP group http://ixa.si.ehu.es Clustering Word Senses Eneko Agirre, Oier Lopez de Lacalle
Introduction: motivation • The desired granularity of word sense distinctions is controversial • Fine-grained word senses are unnecessary for some applications • MT: channel (tv, strait) → Basque kanal • The Senseval-2 WSD competition also provides coarse-grained senses • The desired sense groupings depend on the application: • MT: same translation (language-pair dependent) • IR: some related senses: metonymic, diathesis, specialization • Dialogue (deeper NLP): in principle, all word senses, in order to draw proper inferences • WSD needs to be tuned, multiple senses returned • Clustering of word senses
Introduction: a sample word • Channel has 7 senses and 4 coarse-grained senses (Senseval 2)
Introduction • Work presented here: test the quality of 4 clustering methods • 2 based on distributional similarity • Confusion matrix of Senseval-2 systems • Translation equivalences • Result: hierarchical clusters • Clustering algorithms: CLUTO toolkit • Evaluation: Senseval-2 coarse-grained senses
Clustering toolkit used • CLUTO (Karypis 2001) • Possible inputs: • context vector for each word sense (corpora) • similarity matrix (built from any source) • Number of clustering parameters • Output: • hierarchical or flat clusters
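CLUTO itself is a standalone toolkit; as an illustration of its similarity-matrix input mode, the following minimal sketch performs the same kind of agglomerative clustering with SciPy over a hand-made (purely hypothetical) sense-by-sense similarity matrix for channel.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    # Hypothetical pairwise similarities between four senses of 'channel'
    senses = ["channel_1", "channel_2", "channel_5", "channel_7"]
    sim = np.array([[1.0, 0.2, 0.6, 0.7],
                    [0.2, 1.0, 0.3, 0.1],
                    [0.6, 0.3, 1.0, 0.5],
                    [0.7, 0.1, 0.5, 1.0]])

    dist = 1.0 - sim                                      # similarity -> distance
    tree = linkage(squareform(dist), method="average")    # hierarchical clusters
    labels = fcluster(tree, t=2, criterion="maxclust")    # cut into 2 flat clusters
    print(dict(zip(senses, labels)))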
Distributional similarity methods • Hypothesis: two word senses are similar if they are used in similar contexts • Clustering directly over the examples • Clustering over similarity among topic signatures
Clustering directly from examples • Take examples from tagged data (Senseval 2) OR retrieve sense-examples from the web • E.g. for examples of the first sense of channel, use examples of the monosemous synonym transmission channel • We use: synonyms, hypernyms, all hyponyms, siblings • 1000 snippets for each monosemous term from Google • Resource freely available (contact us) • Cluster the examples as if they were documents (see the sketch below)
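A minimal sketch of the last step, assuming the snippets for each sense have already been retrieved (the snippets below are invented): concatenate each sense's snippets into one pseudo-document, build tf-idf vectors, and derive a sense-by-sense similarity matrix that can be fed to the clustering step.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical snippets retrieved for two senses of 'channel'
    snippets = {
        "channel_1": ["a transmission channel carries electrical signals",
                      "optic fiber used as a transmission medium"],
        "channel_7": ["the local tv station broadcasts drama programming",
                      "an affiliated television station and its radio programs"],
    }
    docs = [" ".join(texts) for texts in snippets.values()]   # one document per sense
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sim = cosine_similarity(X)    # sense-by-sense similarity, input to clustering
    print(sim)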
2. Clustering over similarity among TS • Retrieve the examples • Build topic signatures: a vector of the words in the context of a word sense, with high weights for distinguishing words, e.g. for sense 1 (channel, transmission_channel: "a path over which electrical signals can pass"): medium(3110.34) optic(2790.34) transmission(2547.13) electronic(1553.85) channel(1352.44) mass(1191.12) fiber(1070.28) public(831.41) fibre(716.95) communication(631.38) technology(368.66) system(363.39) datum(308.50) ... • Build similarity matrix of TS (see the sketch below) • Cluster
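A minimal sketch of the similarity-matrix step, assuming the topic signatures are available as word-to-weight dictionaries; the weights below are made-up excerpts of the kind of values shown above.

    import math

    # Hypothetical topic signatures: word -> weight, one dictionary per sense
    ts = {
        "channel_1": {"medium": 3110.3, "optic": 2790.3, "transmission": 2547.1},
        "channel_7": {"station": 24288.5, "television": 13759.8, "tv": 13226.6},
    }

    def cosine(a, b):
        # Cosine similarity between two sparse weight vectors
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    senses = list(ts)
    sim = [[cosine(ts[a], ts[b]) for b in senses] for a in senses]
    print(sim)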
3. Confusion matrix method • Hypothesis: sense A is similar to sense B if many WSD algorithms tag occurrences of A as B • Implemented using results from all Senseval-2 systems
4. Translation similarity method • Hypothesis: two word senses are similar if they are translated in the same way in a number of languages (Resnik & Yarowsky, 2000) • Similarity matrix kindly provided by Chugur & Gonzalo (2002)
Experiment and results: by method • Best results for distributional similarity: topic signatures from web data

Method                    Purity
Random                    0.748
Confusion Matrices        0.768
Multilingual Similarity   0.799
TS Senseval               0.744 (worst) to 0.806 (best)
TS Web                    0.764 (worst) to 0.840 (best)
Conclusions • Meaningful hierarchical clusters • For all WordNet nominal synsets (soon) • Using Web data and distributional similarity • All data freely available (MEANING) But... • Are the clusters useful for the detection of relations (homonymy, metonymy, metaphor, ...) among word senses? Which clusters? • Are the clusters useful for applications? • WSD (ongoing work) • MT, IR, CLIR, Dialogue • Which clusters?
An example of a Topic signature http://ixa3.si.ehu.es/cgi-bin/signatureak/signaturecgi.cgi Source: web examples using monosemous relatives
1. sense: channel, transmission_channel "a path over which electrical signals can pass": medium(3110.34) optic(2790.34) transmission(2547.13) electronic(1553.85) channel(1352.44) mass(1191.12) fiber(1070.28) public(831.41) fibre(716.95) communication(631.38) technology(368.66) system(363.39) datum(308.50)
5. sense: channel, communication_channel, line "(often plural) a means of communication or access": service(3360.26) postal(2503.25) communication(1868.81) mail(1402.33) communicate(1086.16) us(651.30) channel(479.36) communicating(340.82) united(196.55) protocol(170.02) music(165.93) london(162.61) drama(160.95)
7. sense: channel, television_channel, TV_channel "a television station and its programs": station(24288.54) television(13759.75) tv(13226.62) broadcast(1773.82) local(1115.18) radio(646.33) newspaper(333.57) affiliated(301.73) programming(283.02) pb(257.88) own(233.25) independent(230.88)
Experiment and results: an example • Sample cluster built for channel: • Entropy: 0.286, Purity: 0.714.
1. Clustering directly from examples: Retrieving sense-examples from the web • Examples of word senses are scarce • Alternative: automatically acquire examples from corpora (or the web) • In this paper we follow the monosemous relative method (Leacock et al. 1998) • E.g. for examples of the first sense of channel, use examples of the monosemous synonym transmission channel • We use: synonyms, hypernyms, all hyponyms, siblings (see the sketch below) • 1000 snippets for each monosemous term from Google • Heuristics to extract partial or full meaningful sentences • More details of the method in (Agirre et al. 2001)
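The sketch below illustrates the relative-gathering step with NLTK's WordNet interface (a stand-in for the original setup; the exact relatives returned depend on the WordNet version): it collects synonyms, hypernyms, all hyponyms and siblings of a synset, keeping only the monosemous lemmas.

    from nltk.corpus import wordnet as wn

    def monosemous_relatives(synset):
        # Synonyms come from the synset itself; add hypernyms, the full
        # hyponym closure, and siblings (co-hyponyms of the direct hypernyms).
        related = {synset}
        related.update(synset.hypernyms())
        related.update(synset.closure(lambda s: s.hyponyms()))
        for hyper in synset.hypernyms():
            related.update(hyper.hyponyms())
        lemmas = {l.name() for s in related for l in s.lemmas()}
        # Keep only lemmas with a single nominal sense in WordNet
        return sorted(l for l in lemmas if len(wn.synsets(l, pos=wn.NOUN)) == 1)

    # Multiword lemmas come back with '_' and would be turned into phrase queries
    print(monosemous_relatives(wn.synset('channel.n.01')))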
2. Clustering over similarity among TS Building topic signatures • Given a set of examples for each word sense • ... build a vector for each word sense: each word in the vocabulary is a dimension • Steps: • Get frequencies for each word in context • Use χ² to assign a weight to each word/dimension, in contrast to the other word senses (see the sketch below) • Filtering step • More details of the method in (Agirre et al. 2001)
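A minimal sketch of the weighting step, assuming per-sense word counts have already been extracted from the example contexts (the counts below are invented); each word is scored with the 2x2 chi-square statistic of its distribution in this sense versus the other senses.

    from collections import Counter

    # Hypothetical word counts per sense of 'channel'
    counts = {
        "channel_1": Counter({"transmission": 40, "medium": 30, "station": 2}),
        "channel_7": Counter({"station": 60, "tv": 45, "transmission": 3}),
    }

    def chi2_weight(word, sense):
        a = counts[sense][word]                                    # word, this sense
        b = sum(c[word] for s, c in counts.items() if s != sense)  # word, other senses
        c_ = sum(counts[sense].values()) - a                       # other words, this sense
        d = sum(sum(c.values()) for s, c in counts.items() if s != sense) - b
        n = a + b + c_ + d
        denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
        return n * (a * d - b * c_) ** 2 / denom if denom else 0.0

    signature = sorted(((w, chi2_weight(w, "channel_1")) for w in counts["channel_1"]),
                       key=lambda x: -x[1])
    print(signature)   # highest-weighted words form the topic signature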
3. Confusion matrix method • Hypothesis: sense A is similar to sense B if WSD algorithms tag occurrences of A as B • Implemented using results from all Senseval-2 systems • Algorithm to produce the similarity matrix (see the sketch below): • M = number of systems • N(x) = number of occurrences of word sense x • n(a,b) = number of times sense a is tagged as b • confusion-similarity(a,b) = n(a,b) / (N(a) × M) • Not symmetric
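A minimal sketch with invented tagging results, reading the formula as n(a,b) normalised by N(a) and the number of systems M.

    from collections import Counter

    # Hypothetical data: gold senses of the test occurrences and, per system,
    # the sense each system assigned to every occurrence.
    gold = ["channel_1", "channel_1", "channel_5", "channel_7"]
    system_tags = [
        ["channel_1", "channel_5", "channel_5", "channel_7"],   # system 1
        ["channel_1", "channel_1", "channel_1", "channel_7"],   # system 2
    ]

    M = len(system_tags)
    N = Counter(gold)                              # N(a): occurrences of sense a
    n = Counter()                                  # n(a, b): a tagged as b
    for tags in system_tags:
        for a, b in zip(gold, tags):
            n[(a, b)] += 1

    def confusion_similarity(a, b):
        return n[(a, b)] / (N[a] * M)

    print(confusion_similarity("channel_1", "channel_5"))   # not symmetric in general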
4. Translation similarity method • Hypothesis: two word senses are similar if they are translated in the same way in a number of languages (Resnik & Yarowsky, 2000) • Similarity matrix kindly provided by Chugur & Gonzalo (2002) • Simplified algorithm (see the sketch below): • L = number of languages (4) • n(a,b) = number of languages where a and b share a translation • similarity(a,b) = n(a,b) / L • Actual formula is more elaborate
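A minimal sketch of the simplified formula, with made-up translation sets for two senses of channel in four languages (the real matrix comes from Chugur & Gonzalo, 2002, and uses a more elaborate formula).

    # Hypothetical translations per language for two senses of 'channel'
    translations = {
        "channel_1": {"eu": {"kanal"}, "es": {"canal"},
                      "fr": {"canal"}, "nl": {"kanaal"}},
        "channel_7": {"eu": {"kate", "kanal"}, "es": {"cadena"},
                      "fr": {"chaîne"}, "nl": {"zender"}},
    }
    L = len(next(iter(translations.values())))    # number of languages (4)

    def translation_similarity(a, b):
        # Count the languages in which the two senses share at least one translation
        shared = sum(1 for lang in translations[a]
                     if translations[a][lang] & translations[b][lang])
        return shared / L

    print(translation_similarity("channel_1", "channel_7"))   # 0.25 here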
Previous work on WordNet clustering Use of WordNet structure: • Peters et al. 1998: WordNet hierarchy, try to identify systematic polysemy • Tomuro 2001: WordNet hierarchy (MDL), try to identify systematic polysemy (60% precision against WordNet cousins, increase in inter-tagger agreement) • Our proposal does not look for systematic polysemy; we get individual relations among word senses: • e.g. television channel and transmission channel • Mihalcea & Moldovan 2001: heuristics on WordNet, WSD improvement (polysemy reduction 26%, error 2.1% on SemCor) • Provide complementary information
Previous work (continued) • Resnik & Yarowsky 2000 (also Chugur & Gonzalo 2002): translations across different languages, improving evaluation metrics (very high correlation with Hector sense hierarchies) • We only get 80% purity using the Chugur & Gonzalo matrix. Unfortunately the dictionaries are rather different (Senseval-2 results dropped compared to Senseval-1). Difficult to scale to all words. • Pantel & Lin (2002): induce word senses using soft clustering of word occurrences (overlap with WordNet over 60% precision) • They use syntactic dependencies rather than bag-of-words vectors • Palmer et al. (submitted): criteria for grouping verb senses