WordNet: An Overview Anubhav Madan anubhavm@comp.nus.edu.sg - WordNet - Anubhav Madan
Today’s Discussion
• WordNet: A Lexical Database
• WordNet::Similarity
• Some More Applications
• Limitations
• Tutorial
WordNet: A Lexical Database
• Started in 1985
• Basic unit: the synset
• Synsets are arranged hierarchically according to their definitions
• Contains compounds, phrasal verbs, collocations, and idiomatic phrases
• {Bad Person} @ {offender, libertine} (@ is the hypernym pointer: an offender IS-A bad person)
• Forms a rich, dense network of relations that can be used to establish text coherence
WordNet: The Facts
• A word or phrase is the basic unit
• Words are organized into synsets: groups of units that share the same sense
• A gloss is the textual definition of a synset
• Synsets are organized into hierarchies
  • hypernym/hyponym: {concept} IS-A {concept}
  • meronym/holonym: {concept} HAS-PART {concept}
• Types: Nouns, Verbs, Adjectives, Adverbs
• 80,000 nouns organized into 60,000 concepts
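The synset, gloss, and IS-A vocabulary above can be pictured with a small data structure. This is only a minimal sketch: the `Synset` class and `hypernym_chain` helper are hypothetical names, not part of WordNet, and the gloss for `edible_fruit` is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    words: list                                     # word forms sharing one sense
    gloss: str                                      # textual definition
    hypernyms: list = field(default_factory=list)   # IS-A parents


def hypernym_chain(synset):
    """Follow the first IS-A pointer upward and return the synsets visited."""
    chain = [synset]
    while chain[-1].hypernyms:
        chain.append(chain[-1].hypernyms[0])
    return chain


# Only the apple gloss comes from the slides; the other gloss is a placeholder.
fruit = Synset(["edible_fruit"], "placeholder gloss")
apple = Synset(
    ["apple"],
    "fruit with red or yellow or green skin and crisp whitish flesh",
    [fruit],
)

print([s.words[0] for s in hypernym_chain(apple)])  # ['apple', 'edible_fruit']
```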
WordNet: Architecture
[Figure: architecture diagram with the components Lexicographers, Lexical Source Files, Grinder, The WordNet Database, X-Windows, and Applications 1 to N]
WordNet: Architecture
• Word/synset pairs are stored in the WordNet DB.
• Record layout: {word / list of word forms, pointer to lexical file, frames (for verbs), list of elements, (optional gloss), adjective cluster}
• Example: {apple, edible_fruit,@ (fruit with red or yellow or green skin and crisp whitish flesh)}
• Indexes: senses are ordered
• Index of Familiarity: how well known the word is
• Index and data files
• Sense index
• The Grinder as a converter: takes the Lexical Source Files written by lexicographers and converts them into a format that WordNet can read and update.
WordNet::Similarity
• An application measuring the “closeness” of concepts in terms of their definitions
• Main categories of measures:
  • Path based
  • Depth based
  • Information content based
  • Gloss based
WordNet: Similarity Measures
• Path Finder and Depth Finder
  • Wup (Wu and Palmer): scales the depth of the lowest common subsumer (LCS) by the sum of the depths of the two concepts
  • Lch (Leacock and Chodorow): negative log of the shortest path, scaled by twice the maximum depth of the taxonomy
  • Path: inverse of the shortest path length
• Information Content Finder
  • Resnik: information content (IC) of the LCS of the two concepts
  • Jcn (Jiang and Conrath): inverse of the difference between the sum of the two concepts' ICs and twice the IC of the LCS
  • Lin: scales the IC of the LCS by the sum of the two concepts' ICs
• Gloss Finder
  • Lesk (Banerjee and Pedersen): finds and scores overlaps between glosses
  • Vector (Patwardhan): builds a co-occurrence matrix and represents each gloss as a vector
• Hso (Hirst and St-Onge): scores a path between words by its length and its changes of direction
LCH
Taxonomy (edge lengths in parentheses; maximum depth D = 5):
Root
  Medium of Exchange (2)
    Money (1)
      Cash (1)
        Coin (1)
    Credit (1)
      Credit Card (1)
Lch Related (Money-Credit) = -log (2 / (2 × 5)) = -log (2/10) = 0.70
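The Lch calculation can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the `PARENT` table encodes the toy taxonomy from the slide, the function names are made up for illustration, and the logarithm is base 10 so that the result matches the 0.70 on the slide (other implementations may use a different base or count nodes instead of edges).

```python
import math

# Toy taxonomy from the slide: child -> (parent, edge length).
PARENT = {
    "Medium of Exchange": ("Root", 2),
    "Money": ("Medium of Exchange", 1),
    "Credit": ("Medium of Exchange", 1),
    "Cash": ("Money", 1),
    "Credit Card": ("Credit", 1),
    "Coin": ("Cash", 1),
}


def ancestor_distances(node):
    """Map node and each of its ancestors to the distance from node."""
    out, d = {node: 0}, 0
    while node in PARENT:
        node, w = PARENT[node]
        d += w
        out[node] = d
    return out


def shortest_path(a, b):
    """Length of the shortest path between a and b through a common ancestor."""
    da, db = ancestor_distances(a), ancestor_distances(b)
    return min(da[n] + db[n] for n in da if n in db)


def lch(a, b, max_depth):
    """Leacock-Chodorow style score: -log10(path / (2 * max_depth))."""
    return -math.log10(shortest_path(a, b) / (2 * max_depth))


print(round(lch("Money", "Credit", 5), 2))  # 0.7
```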
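WUP
Taxonomy (edge lengths in parentheses; maximum depth D = 5):
Root
  Medium of Exchange (2)
    Money (1)
      Cash (1)
        Coin (1)
    Credit (1)
      Credit Card (1)
Wup ConSim (Money-Credit) = 2 × depth(Medium of Exchange) / (depth(Money) + depth(Credit)) = 4/6 = 0.67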
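A Wu and Palmer style score on the same toy taxonomy can be sketched as follows. The table and helper names are illustrative assumptions; depth is measured as the summed edge length from Root, which is what gives 4/6 for Money and Credit.

```python
# Toy taxonomy from the slide: child -> (parent, edge length).
PARENT = {
    "Medium of Exchange": ("Root", 2),
    "Money": ("Medium of Exchange", 1),
    "Credit": ("Medium of Exchange", 1),
    "Cash": ("Money", 1),
    "Credit Card": ("Credit", 1),
    "Coin": ("Cash", 1),
}


def path_to_root(node):
    """Return node followed by its ancestors, ending at Root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node][0]
        path.append(node)
    return path


def depth(node):
    """Summed edge length from Root down to node."""
    d = 0
    while node in PARENT:
        node, w = PARENT[node]
        d += w
    return d


def lcs(a, b):
    """Lowest common subsumer: the deepest ancestor shared by a and b."""
    ancestors_b = set(path_to_root(b))
    return next(n for n in path_to_root(a) if n in ancestors_b)


def wup(a, b):
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


print(round(wup("Money", "Credit"), 2))  # 0.67
```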
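Path
• Inverse of the shortest path length between the two concepts
Taxonomy (edge lengths in parentheses; maximum depth D = 5):
Root
  Medium of Exchange (2)
    Money (1)
      Cash (1)
        Coin (1)
    Credit (1)
      Credit Card (1)
Path (Money-Credit) = 1/3 = 0.33, counting the nodes on the shortest path (Money, Medium of Exchange, Credit) as WordNet::Similarity does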
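The path measure is the simplest of the three: the inverse of the shortest path length. The sketch below assumes the node-counting convention used by WordNet::Similarity (identical concepts have path length 1) and ignores the edge lengths in the diagram; the names are illustrative only.

```python
# Toy taxonomy from the slide: child -> parent (edge lengths ignored here).
PARENT = {
    "Medium of Exchange": "Root",
    "Money": "Medium of Exchange",
    "Credit": "Medium of Exchange",
    "Cash": "Money",
    "Credit Card": "Credit",
    "Coin": "Cash",
}


def path_to_root(node):
    """Return node followed by its ancestors, ending at Root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def path_similarity(a, b):
    """1 / (number of nodes on the shortest path between a and b)."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    lcs = next(n for n in up_a if n in up_b)
    nodes = up_a.index(lcs) + up_b.index(lcs) + 1
    return 1 / nodes


print(round(path_similarity("Money", "Credit"), 2))  # 0.33
```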
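Resnik
Taxonomy (concept probabilities in parentheses):
Medium of Exchange (6/6)
  Money (3/6)
    Cash (2/6)
      Coin (1/6)
  Credit (2/6)
    Credit Card (1/6)
Resnik Sim (Money-Credit) = IC(LCS) = -log (6/6) = 0, because the LCS of Money and Credit is Medium of Exchange. For comparison, IC(Money) = -log (3/6) = 0.30.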
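Under Resnik's definition, the similarity of two concepts is the information content (IC) of their lowest common subsumer. A minimal sketch on the toy taxonomy, using the probabilities from the slide and base-10 logs; the helper names are assumptions for illustration.

```python
import math

# Toy taxonomy from the slide: child -> parent.
PARENT = {
    "Money": "Medium of Exchange",
    "Credit": "Medium of Exchange",
    "Cash": "Money",
    "Credit Card": "Credit",
    "Coin": "Cash",
}
# Concept probabilities as shown on the slide.
P = {
    "Medium of Exchange": 6 / 6, "Money": 3 / 6, "Credit": 2 / 6,
    "Cash": 2 / 6, "Credit Card": 1 / 6, "Coin": 1 / 6,
}


def ic(concept):
    """Information content: -log10 p(concept), written as log10(1/p)."""
    return math.log10(1 / P[concept])


def path_to_top(node):
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lcs(a, b):
    """Lowest common subsumer: the deepest ancestor shared by a and b."""
    ancestors_b = set(path_to_top(b))
    return next(n for n in path_to_top(a) if n in ancestors_b)


def resnik(a, b):
    return ic(lcs(a, b))


print(resnik("Money", "Credit"))          # 0.0
print(round(resnik("Cash", "Coin"), 3))   # 0.477
```

Money and Credit only meet at the top node, whose probability is 1, so the score is 0; Cash and Coin share the more specific concept Cash and score higher.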
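Lin
Taxonomy (concept probabilities in parentheses):
Medium of Exchange (6/6)
  Money (3/6)
    Cash (2/6)
      Coin (1/6)
  Credit (2/6)
    Credit Card (1/6)
Lin Sim (Money-Credit) = 2 × IC(LCS) / (IC(Money) + IC(Credit)) = 2 × 0 / (0.301 + 0.477) = 0, because the LCS is Medium of Exchange with IC = -log (6/6) = 0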
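Lin's measure scales the IC of the lowest common subsumer by the ICs of the two concepts. The sketch below uses the slide's probabilities; the function names and the choice to return 0 when both ICs are 0 are assumptions made for this example.

```python
import math

# Toy taxonomy from the slide: child -> parent.
PARENT = {
    "Money": "Medium of Exchange",
    "Credit": "Medium of Exchange",
    "Cash": "Money",
    "Credit Card": "Credit",
    "Coin": "Cash",
}
# Concept probabilities as shown on the slide.
P = {
    "Medium of Exchange": 6 / 6, "Money": 3 / 6, "Credit": 2 / 6,
    "Cash": 2 / 6, "Credit Card": 1 / 6, "Coin": 1 / 6,
}


def ic(concept):
    return math.log10(1 / P[concept])


def path_to_top(node):
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lcs(a, b):
    ancestors_b = set(path_to_top(b))
    return next(n for n in path_to_top(a) if n in ancestors_b)


def lin(a, b):
    """2 * IC(lcs) / (IC(a) + IC(b)); 0 if the denominator is 0 (assumed convention)."""
    denom = ic(a) + ic(b)
    return 0.0 if denom == 0 else 2 * ic(lcs(a, b)) / denom


print(lin("Money", "Credit"))         # 0.0
print(round(lin("Cash", "Coin"), 2))  # 0.76
```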
JCN
Taxonomy (concept probabilities in parentheses):
Medium of Exchange (6/6)
  Money (3/6)
    Cash (2/6)
      Coin (1/6)
  Credit (2/6)
    Credit Card (1/6)
Jcn Dist (Money-Credit) = IC(Money) + IC(Credit) - 2 × IC(LCS) = -log (3/6) - log (2/6) + 2 × log (6/6) = 0.301 + 0.477 - 0 = 0.778
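The Jiang and Conrath distance, and the similarity obtained by inverting it, can be sketched the same way. Probabilities are those on the slide; the helper names and the handling of a zero distance are illustrative assumptions.

```python
import math

# Toy taxonomy from the slide: child -> parent.
PARENT = {
    "Money": "Medium of Exchange",
    "Credit": "Medium of Exchange",
    "Cash": "Money",
    "Credit Card": "Credit",
    "Coin": "Cash",
}
# Concept probabilities as shown on the slide.
P = {
    "Medium of Exchange": 6 / 6, "Money": 3 / 6, "Credit": 2 / 6,
    "Cash": 2 / 6, "Credit Card": 1 / 6, "Coin": 1 / 6,
}


def ic(concept):
    return math.log10(1 / P[concept])


def path_to_top(node):
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lcs(a, b):
    ancestors_b = set(path_to_top(b))
    return next(n for n in path_to_top(a) if n in ancestors_b)


def jcn_distance(a, b):
    """IC(a) + IC(b) - 2 * IC(lcs(a, b))."""
    return ic(a) + ic(b) - 2 * ic(lcs(a, b))


def jcn_similarity(a, b):
    """Inverse of the distance; infinite for zero distance (assumed convention)."""
    d = jcn_distance(a, b)
    return math.inf if d == 0 else 1 / d


print(round(jcn_distance("Money", "Credit"), 3))    # 0.778
print(round(jcn_similarity("Money", "Credit"), 3))  # 1.285
```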
Lesk
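The idea behind gloss overlap can be shown with a deliberately simplified sketch: count the content words two glosses share. This is not the extended scoring of Banerjee and Pedersen, which also rewards multi-word overlaps and uses glosses of related synsets. The apple gloss is the one quoted earlier in the deck; the second gloss and the stop-word list are made up for illustration.

```python
# Small stop-word list chosen for this example only.
STOP_WORDS = {"a", "an", "and", "of", "or", "the", "with"}


def content_words(gloss):
    """Lower-cased words of a gloss with stop words removed."""
    return {w for w in gloss.lower().split() if w not in STOP_WORDS}


def gloss_overlap(gloss_a, gloss_b):
    """Number of content words shared by the two glosses."""
    return len(content_words(gloss_a) & content_words(gloss_b))


apple_gloss = "fruit with red or yellow or green skin and crisp whitish flesh"
other_gloss = "sweet juicy fruit with green or yellow skin"  # hypothetical gloss

print(gloss_overlap(apple_gloss, other_gloss))  # 4
```

The four shared words are fruit, yellow, green, and skin.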
Vector
HSO
• Classifies the relations in WordNet as having directions.
• IS-A relations are upward; HAS-PART relations are horizontal.
• Relates two words through a path that is neither too long nor changes direction too often.
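One way to turn "not too long, not too many turns" into a number is to subtract the path length and a penalty per change of direction from a constant. The sketch below follows that form; the constants `C = 8` and `K = 1` and the function name are assumptions for illustration, and the rules about which direction patterns are allowed are left out.

```python
C = 8  # assumed upper bound on the score
K = 1  # assumed penalty per change of direction


def hso_weight(directions):
    """Score a path given its link directions, e.g. ["up", "up", "down"]."""
    turns = sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)
    return C - len(directions) - K * turns


print(hso_weight(["up", "up", "down"]))                # 4
print(hso_weight(["up", "horizontal", "down", "up"]))  # 1
```

The first path has three links and one turn, the second has four links and three turns, so the first pair of words is rated as more strongly related.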
Demo
Applications
• Building Semantic Concordances
• Performance and Confidence in a Semantic Annotation Task
• Resnik Similarity Measure in Class-Based Probabilities
• Lch WordNet Similarity Measure in Word Sense Identification
• Text Retrieval using WordNet
Applications
• Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms
• Temporal Indexing through Lexical Chaining
• COLOR-X
• Knowledge Processing on an Extended WordNet
Further Speculation
• Sense Disambiguation
• Information Retrieval
• Semantic Relations and Textual Coherence
• Knowledge Engineering
The Limitations
• The relation IS-NOT or NOT-A-KIND-OF is inexpressible
• The relation IS-USED-AS-A-KIND-OF is also inexpressible
• No explicit distinction between proper and common nouns (it was too difficult to include this information)
• Does not attempt to identify “basic-level” or “generic” categories. For concepts in the middle of the lexical hierarchy, many listed features can distinguish one word from another, and WordNet does not record them.
• Not enough semantic relations in WordNet
Tutorial
• What is WordNet?
• Why is WordNet unique?
• What is the difference between WordNet and WordNet::Similarity?
• What are some of the limiting features?
• Give an example of a human scenario where WordNet would be instrumental.
Tutorial
• What similarity measure would you use if you had only the following information:
  • Path (linkages between words in an ontology)
  • Information content of the words
  • Gloss of the words
  • An ontology with direction
References
• Overview: Pedersen, T., Patwardhan, S., and Michelizzi, J. 2004. WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-04), 38–41, Boston, May 2004.
• Lch: Leacock, C., and Chodorow, M. 1998. Combining local context and WordNet similarity for word sense identification. In Fellbaum, C., ed., WordNet: An electronic lexical database. MIT Press. 265–283.
• Wup: Wu, Z., and Palmer, M. 1994. Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics, 133–138.
• Res: Resnik, P. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 448–453.
• Lin: Lin, D. 1998. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning.
• Jcn: Jiang, J., and Conrath, D. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings on International Conference on Research in Computational Linguistics, 19–33.
• Hso: Hirst, G., and St-Onge, D. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. In Fellbaum, C., ed., WordNet: An electronic lexical database. MIT Press. 305–332.
• Lesk: Banerjee, S., and Pedersen, T. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, 805–810.
• Vector: Patwardhan, S. 2003. Incorporating dictionary and corpus information into a context vector measure of semantic relatedness. Master’s thesis, Univ. of Minnesota, Duluth.
• Links available at: http://www.comp.nus.edu.sg/~anubhavm/reading.htm
Thank You
Anubhav Madan anubhavm@comp.nus.edu.sg