IxA NLP group http://ixa.si.ehu.es Clustering Word Senses Eneko Agirre, Oier Lopez de Lacalle
Introduction: motivation • The desired granularity of word sense distinctions is controversial • Fine-grained word senses are unnecessary for some applications • MT: channel (tv, strait) → Basque kanal • The Senseval-2 WSD competition also provides coarse-grained senses • The desired sense groupings depend on the application: • MT: same translation (language-pair dependent) • IR: some related senses: metonymic, diathesis, specialization • Dialogue (deeper NLP): in principle, all word senses, in order to draw proper inferences • WSD needs to be tuned, multiple senses returned • Clustering of word senses
Introduction: a sample word • Channel has 7 senses and 4 coarse-grained senses (Senseval 2)
Introduction • Work presented here: test the quality of 4 clustering methods • 2 based on distributional similarity • Confusion matrix of Senseval-2 systems • Translation equivalences • Result: hierarchical clusters • Clustering algorithms: CLUTO toolkit • Evaluation: Senseval-2 coarse-grained senses
Clustering toolkit used • CLUTO (Karypis 2001) • Possible inputs: • context vector for each word sense (corpora) • similarity matrix (built from any source) • Number of clustering parameters • Output: • hierarchical or flat clusters
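CLUTO itself is a standalone toolkit; as an illustration of its similarity-matrix input mode, the following minimal sketch performs the same kind of agglomerative clustering with SciPy over a hand-made (purely hypothetical) sense-by-sense similarity matrix for channel.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    # Hypothetical pairwise similarities between four senses of 'channel'
    senses = ["channel_1", "channel_2", "channel_5", "channel_7"]
    sim = np.array([[1.0, 0.2, 0.6, 0.7],
                    [0.2, 1.0, 0.3, 0.1],
                    [0.6, 0.3, 1.0, 0.5],
                    [0.7, 0.1, 0.5, 1.0]])

    dist = 1.0 - sim                                      # similarity -> distance
    tree = linkage(squareform(dist), method="average")    # hierarchical clusters
    labels = fcluster(tree, t=2, criterion="maxclust")    # cut into 2 flat clusters
    print(dict(zip(senses, labels)))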
Distributional similarity methods • Hypothesis: two word senses are similar if they are used in similar contexts • Clustering directly over the examples • Clustering over similarity among topic signatures
Clustering directly from examples • Take examples from tagged data (Senseval 2) OR retrieve sense-examples from the web • E.g. for examples of the first sense of channel, use examples of the monosemous synonym transmission channel • We use: synonyms, hypernyms, all hyponyms, siblings • 1000 snippets for each monosemous term from Google • Resource freely available (contact us) • Cluster the examples as if they were documents (see the sketch below)
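A minimal sketch of the last step, assuming the snippets for each sense have already been retrieved (the snippets below are invented): concatenate each sense's snippets into one pseudo-document, build tf-idf vectors, and derive a sense-by-sense similarity matrix that can be fed to the clustering step.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical snippets retrieved for two senses of 'channel'
    snippets = {
        "channel_1": ["a transmission channel carries electrical signals",
                      "optic fiber used as a transmission medium"],
        "channel_7": ["the local tv station broadcasts drama programming",
                      "an affiliated television station and its radio programs"],
    }
    docs = [" ".join(texts) for texts in snippets.values()]   # one document per sense
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sim = cosine_similarity(X)    # sense-by-sense similarity, input to clustering
    print(sim)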
2. Clustering over similarity among TS • Retrieve the examples • Build topic signatures: a vector of the words in the context of a word sense, with high weights for distinguishing words, e.g. for sense 1 (channel, transmission_channel: "a path over which electrical signals can pass"): medium(3110.34) optic(2790.34) transmission(2547.13) electronic(1553.85) channel(1352.44) mass(1191.12) fiber(1070.28) public(831.41) fibre(716.95) communication(631.38) technology(368.66) system(363.39) datum(308.50) ... • Build similarity matrix of TS (see the sketch below) • Cluster
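A minimal sketch of the similarity-matrix step, assuming the topic signatures are available as word-to-weight dictionaries; the weights below are made-up excerpts of the kind of values shown above.

    import math

    # Hypothetical topic signatures: word -> weight, one dictionary per sense
    ts = {
        "channel_1": {"medium": 3110.3, "optic": 2790.3, "transmission": 2547.1},
        "channel_7": {"station": 24288.5, "television": 13759.8, "tv": 13226.6},
    }

    def cosine(a, b):
        # Cosine similarity between two sparse weight vectors
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    senses = list(ts)
    sim = [[cosine(ts[a], ts[b]) for b in senses] for a in senses]
    print(sim)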
3. Confusion matrix method • Hypothesis: sense A is similar to sense B if many WSD algorithms tag occurrences of A as B • Implemented using results from all Senseval-2 systems
4. Translation similarity method • Hypothesis: two word senses are similar if they are translated in the same way in a number of languages (Resnik & Yarowsky, 2000) • Similarity matrix kindly provided by Chugur & Gonzalo (2002)
Experiment and results: by method • Best results for distributional similarity: topic signatures from web data

Method                    Purity
Random                    0.748
Confusion Matrices        0.768
Multilingual Similarity   0.799
TS Senseval               0.744 (worst) to 0.806 (best)
TS Web                    0.764 (worst) to 0.840 (best)
Conclusions • Meaningful hierarchical clusters • For all WordNet nominal synsets (soon) • Using Web data and distributional similarity • All data freely available (MEANING) But... • Are the clusters useful for the detection of relations (homonymy, metonymy, metaphor, ...) among word senses? Which clusters? • Are the clusters useful for applications? • WSD (ongoing work) • MT, IR, CLIR, Dialogue • Which clusters?
An example of a Topic signature http://ixa3.si.ehu.es/cgi-bin/signatureak/signaturecgi.cgi Source: web examples using monosemous relatives
1. sense: channel, transmission_channel "a path over which electrical signals can pass": medium(3110.34) optic(2790.34) transmission(2547.13) electronic(1553.85) channel(1352.44) mass(1191.12) fiber(1070.28) public(831.41) fibre(716.95) communication(631.38) technology(368.66) system(363.39) datum(308.50)
5. sense: channel, communication_channel, line "(often plural) a means of communication or access": service(3360.26) postal(2503.25) communication(1868.81) mail(1402.33) communicate(1086.16) us(651.30) channel(479.36) communicating(340.82) united(196.55) protocol(170.02) music(165.93) london(162.61) drama(160.95)
7. sense: channel, television_channel, TV_channel "a television station and its programs": station(24288.54) television(13759.75) tv(13226.62) broadcast(1773.82) local(1115.18) radio(646.33) newspaper(333.57) affiliated(301.73) programming(283.02) pb(257.88) own(233.25) independent(230.88)
Experiment and results: an example • Sample cluster built for channel: • Entropy: 0.286, Purity: 0.714.
1. Clustering directly from examples: Retrieving sense-examples from the web • Examples of word senses are scarce • Alternative: automatically acquire examples from corpora (or the web) • In this paper we follow the monosemous relative method (Leacock et al. 1998) • E.g. for examples of the first sense of channel, use examples of the monosemous synonym transmission channel • We use: synonyms, hypernyms, all hyponyms, siblings (see the sketch below) • 1000 snippets for each monosemous term from Google • Heuristics to extract partial or full meaningful sentences • More details of the method in (Agirre et al. 2001)
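The sketch below illustrates the relative-gathering step with NLTK's WordNet interface (a stand-in for the original setup; the exact relatives returned depend on the WordNet version): it collects synonyms, hypernyms, all hyponyms and siblings of a synset, keeping only the monosemous lemmas.

    from nltk.corpus import wordnet as wn

    def monosemous_relatives(synset):
        # Synonyms come from the synset itself; add hypernyms, the full
        # hyponym closure, and siblings (co-hyponyms of the direct hypernyms).
        related = {synset}
        related.update(synset.hypernyms())
        related.update(synset.closure(lambda s: s.hyponyms()))
        for hyper in synset.hypernyms():
            related.update(hyper.hyponyms())
        lemmas = {l.name() for s in related for l in s.lemmas()}
        # Keep only lemmas with a single nominal sense in WordNet
        return sorted(l for l in lemmas if len(wn.synsets(l, pos=wn.NOUN)) == 1)

    # Multiword lemmas come back with '_' and would be turned into phrase queries
    print(monosemous_relatives(wn.synset('channel.n.01')))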
2. Clustering over similarity among TS Building topic signatures • Given a set of examples for each word sense • ... build a vector for each word sense: each word in the vocabulary is a dimension • Steps: • Get frequencies for each word in context • Use χ² to assign a weight to each word/dimension, in contrast to the other word senses (see the sketch below) • Filtering step • More details of the method in (Agirre et al. 2001)
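A minimal sketch of the weighting step, assuming per-sense word counts have already been extracted from the example contexts (the counts below are invented); each word is scored with the 2x2 chi-square statistic of its distribution in this sense versus the other senses.

    from collections import Counter

    # Hypothetical word counts per sense of 'channel'
    counts = {
        "channel_1": Counter({"transmission": 40, "medium": 30, "station": 2}),
        "channel_7": Counter({"station": 60, "tv": 45, "transmission": 3}),
    }

    def chi2_weight(word, sense):
        a = counts[sense][word]                                    # word, this sense
        b = sum(c[word] for s, c in counts.items() if s != sense)  # word, other senses
        c_ = sum(counts[sense].values()) - a                       # other words, this sense
        d = sum(sum(c.values()) for s, c in counts.items() if s != sense) - b
        n = a + b + c_ + d
        denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
        return n * (a * d - b * c_) ** 2 / denom if denom else 0.0

    signature = sorted(((w, chi2_weight(w, "channel_1")) for w in counts["channel_1"]),
                       key=lambda x: -x[1])
    print(signature)   # highest-weighted words form the topic signature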
3. Confusion matrix method • Hypothesis: sense A is similar to sense B if WSD algorithms tag occurrences of A as B • Implemented using results from all Senseval-2 systems • Algorithm to produce the similarity matrix (see the sketch below): • M = number of systems • N(x) = number of occurrences of word sense x • n(a,b) = number of times sense a is tagged as b • confusion-similarity(a,b) = n(a,b) / (N(a) × M) • Not symmetric
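A minimal sketch with invented tagging results, reading the formula as n(a,b) normalised by N(a) and the number of systems M.

    from collections import Counter

    # Hypothetical data: gold senses of the test occurrences and, per system,
    # the sense each system assigned to every occurrence.
    gold = ["channel_1", "channel_1", "channel_5", "channel_7"]
    system_tags = [
        ["channel_1", "channel_5", "channel_5", "channel_7"],   # system 1
        ["channel_1", "channel_1", "channel_1", "channel_7"],   # system 2
    ]

    M = len(system_tags)
    N = Counter(gold)                              # N(a): occurrences of sense a
    n = Counter()                                  # n(a, b): a tagged as b
    for tags in system_tags:
        for a, b in zip(gold, tags):
            n[(a, b)] += 1

    def confusion_similarity(a, b):
        return n[(a, b)] / (N[a] * M)

    print(confusion_similarity("channel_1", "channel_5"))   # not symmetric in general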
4. Translation similarity method • Hypothesis: two word senses are similar if they are translated in the same way in a number of languages (Resnik & Yarowsky, 2000) • Similarity matrix kindly provided by Chugur & Gonzalo (2002) • Simplified algorithm (see the sketch below): • L = number of languages (4) • n(a,b) = number of languages where a and b share a translation • similarity(a,b) = n(a,b) / L • Actual formula is more elaborate
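A minimal sketch of the simplified formula, with made-up translation sets for two senses of channel in four languages (the real matrix comes from Chugur & Gonzalo, 2002, and uses a more elaborate formula).

    # Hypothetical translations per language for two senses of 'channel'
    translations = {
        "channel_1": {"eu": {"kanal"}, "es": {"canal"},
                      "fr": {"canal"}, "nl": {"kanaal"}},
        "channel_7": {"eu": {"kate", "kanal"}, "es": {"cadena"},
                      "fr": {"chaîne"}, "nl": {"zender"}},
    }
    L = len(next(iter(translations.values())))    # number of languages (4)

    def translation_similarity(a, b):
        # Count the languages in which the two senses share at least one translation
        shared = sum(1 for lang in translations[a]
                     if translations[a][lang] & translations[b][lang])
        return shared / L

    print(translation_similarity("channel_1", "channel_7"))   # 0.25 here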
Previous work on WordNet clustering Use of WordNet structure: • Peters et al. 1998: WordNet hierarchy, try to identify systematic polysemy • Tomuro 2001: WordNet hierarchy (MDL), try to identify systematic polysemy (60% precision against WordNet cousins, increase in inter-tagger agreement) • Our proposal does not look for systematic polysemy; we get individual relations among word senses: • e.g. television channel and transmission channel • Mihalcea & Moldovan 2001: heuristics on WordNet, WSD improvement (polysemy reduction 26%, error 2.1% on SemCor) • Provide complementary information
Previous work (continued) • Resnik & Yarowsky 2000 (also Chugur & Gonzalo 2002): translations across different languages, improving evaluation metrics (very high correlation with Hector sense hierarchies) • We only get 80% purity using the Chugur & Gonzalo matrix. Unfortunately the dictionaries are rather different (Senseval-2 results dropped compared to Senseval-1). Difficult to scale to all words. • Pantel & Lin (2002): induce word senses using soft clustering of word occurrences (overlap with WordNet over 60% precision) • They use syntactic dependencies rather than bag-of-words vectors • Palmer et al. (submitted): criteria for grouping verb senses