WordNet and Extended WordNet
Sriram Rajaraman
Objective
• Introduce the idea of a semantic lexicon ontology, especially WordNet and eXtended WordNet
University of Texas at Dallas, Erik Jonsson School of Engineering and Computer Science
Focus
• Introduction
• WordNet
• eXtended WordNet
• Summary
Reference
[1] WordNet: http://wordnet.princeton.edu/
[2] eXtended WordNet: http://xwn.hlt.utdallas.edu/
[3] Christiane Fellbaum, "WordNet: An Electronic Lexical Database", MIT Press, 1998.
[4] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller, "Introduction to WordNet: An On-line Lexical Database", core working paper.
[5] Rada Mihalcea, Dan I. Moldovan, "eXtended WordNet: Progress Report", Proceedings of NAACL Workshop on WordNet and Other Lexical Resources, 2001.
[6] Sanda M. Harabagiu, George A. Miller, Dan I. Moldovan, "WordNet 2 - A Morphologically and Semantically Enhanced Resource", SIGLEX 1999.
Introduction
• Traditional Dictionary
• What is available:
  – spelling
  – pronunciation
  – inflected and derivative forms
  – etymology
  – part of speech
  – definitions
  – illustrative uses of alternative senses
  – synonyms and antonyms
  – special usage notes
Tree
Ref: http://www.merriam-webster.com/dictionary/Tree
• Main Entry: tree
• Pronunciation: \ˈtrē\
• Function: noun
• Etymology: Middle English, from Old English trēow; akin to Old Norse trē tree, Greek drys, Sanskrit dāru wood
• Date: before 12th century
• Definition: a woody perennial plant having a single usually elongate main stem generally with few or no branches on its lower part
Drawbacks of a traditional dictionary
• What is missing:
  – It does not say, for example, that trees have roots, that they consist of cells with cellulose walls, or even that they are living organisms
  – The "sense" of the superordinate term, or hypernym (living plant or industrial plant)
  – Coordinate terms (bushes, shrubs, ...)
  – Hyponyms, i.e. types of trees (pine, tropical, deciduous, ...)
  – Information assumed to be known to everyone (trees have bark and leaves, they grow from seeds, they make their own food by photosynthesis); this is probably information for an encyclopedia
How can we improve?
• The missing information is structural: every word points upward to its superordinate (hypernym), but not sideways to its coordinates or downward to its hyponyms
• The restriction comes from alphabetical ordering and from budget and size constraints, all of which an electronic lexical database can overcome
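The structural point above can be sketched in a few lines of Python. This is a minimal illustration, not how WordNet itself is stored: the `HYPERNYM` table and the helper names are assumptions made up for the example, and the toy entries are not taken from any real lexicon. It shows that once the upward pointers are held electronically, the downward (hyponym) and sideways (coordinate) links can be derived by inverting them.

```python
from collections import defaultdict

# Toy data (an assumption for illustration): each term stores only its
# hypernym, the "upward" pointer that a dictionary definition provides.
HYPERNYM = {
    "tree": "woody plant",
    "shrub": "woody plant",
    "bush": "woody plant",
    "pine": "tree",
    "oak": "tree",
}


def build_hyponym_index(hypernym_of):
    """Invert the upward pointers to obtain the downward (hyponym) links."""
    index = defaultdict(set)
    for term, parent in hypernym_of.items():
        index[parent].add(term)
    return index


def hyponyms(term, index):
    """Return the terms directly below `term`, sorted alphabetically."""
    return sorted(index.get(term, set()))


def coordinates(term, hypernym_of, index):
    """Return the terms that share a hypernym with `term`, excluding `term`."""
    parent = hypernym_of.get(term)
    if parent is None:
        return []
    return sorted(index[parent] - {term})


HYPONYM_INDEX = build_hyponym_index(HYPERNYM)
print(hyponyms("tree", HYPONYM_INDEX))
print(coordinates("tree", HYPERNYM, HYPONYM_INDEX))
```

A printed dictionary would need a separate cross-reference list for each direction; here a single inverted index answers both the "downward" and the "sideways" question.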
What is WordNet?
• WordNet is a lexical database for the English language
• WordNet 3.0 has [1]:
  – 117,097 nouns (average noun has 1.23 senses)
  – 11,488 verbs (average verb has 2.16 senses)
  – 22,141 adjectives
  – 4,601 adverbs
• Created and maintained at the Cognitive Science Laboratory of Princeton University
• Accessible online at http://wordnetweb.princeton.edu/perl/webwn (also downloadable)
• Interfaces available in C, .NET, Java, Perl, PHP, Python, SQL, etc. (JWNL, WordNet.Net, RTiA wordNet, pywordne ..)
WordNet Structure
• Words are organized as synsets in WordNet
• There are four disjoint kinds of synsets, containing either
  – Nouns
  – Verbs
  – Adjectives
  – Adverbs
What is a synset?
• Basic unit of WordNet
• A group of synonymous words that refer to a common semantic concept
• Words may belong to more than one synset; the first sense listed is the most frequent one
• Words also include collocations ("eye contact", "mix up")
• Example
Synset example
• "car" as in
  – {car, auto, automobile, machine, motorcar}
  – {car, railcar, railway car, railroad car}
• "Chocolate" as in:
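The two "car" synsets above can be modelled as a small sense inventory. The sketch below is only an illustration under stated assumptions: the identifiers `car.n.01` and `car.n.02` and the helper functions are hypothetical names chosen for this example, and the inventory contains nothing beyond the two synsets shown on the slide (in the real database, words such as "machine" belong to further synsets).

```python
# Toy sense inventory built from the two "car" synsets on the slide.
# The synset identifiers are hypothetical labels for this example.
SYNSETS = [
    ("car.n.01", ("car", "auto", "automobile", "machine", "motorcar")),
    ("car.n.02", ("car", "railcar", "railway car", "railroad car")),
]


def build_word_index(synsets):
    """Map each word to the ids of the synsets it belongs to, in sense order."""
    index = {}
    for synset_id, words in synsets:
        for word in words:
            index.setdefault(word, []).append(synset_id)
    return index


WORD_INDEX = build_word_index(SYNSETS)
SYNSET_WORDS = dict(SYNSETS)


def senses(word):
    """Return the synset ids for `word`; the first one is the first sense."""
    return WORD_INDEX.get(word, [])


def synonyms(word, sense_number=1):
    """Return the other members of the synset for the given 1-based sense."""
    ids = senses(word)
    if not 1 <= sense_number <= len(ids):
        return []
    return [w for w in SYNSET_WORDS[ids[sense_number - 1]] if w != word]


print(senses("car"))
print(synonyms("car", 2))
```

Because sense order is preserved in the list, picking index 0 corresponds to the "first sense is the most frequent" convention mentioned earlier.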
How are synsets related?
• A list of pointers associated with each synset expresses its relationships to other synsets
• WordNet defines 17 relations
  – 10 between synsets
  – 5 between word senses
  – "gloss" (between a synset and a sentence, i.e. a textual definition for each synset)
  – "frame" (between a synset and a verb construction pattern)
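A pointer list per synset can be sketched as a dictionary from relation name to target synsets. Everything in this example is an assumption for illustration: the `Synset` class, the synset ids, the placeholder glosses, and the particular relation names are made up here and do not reproduce WordNet's actual file format or its full set of 17 relations.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A toy synset: member words, a gloss, and typed pointers to other synsets."""
    words: tuple
    gloss: str
    pointers: dict = field(default_factory=dict)  # relation name -> list of ids


# Hypothetical mini-network; the glosses are placeholders, not WordNet text.
NETWORK = {
    "car": Synset(("car", "auto", "automobile"), "toy gloss for car",
                  {"hypernym": ["motor_vehicle"], "meronym": ["car_door"]}),
    "motor_vehicle": Synset(("motor vehicle",), "toy gloss for motor vehicle",
                            {"hypernym": ["vehicle"]}),
    "vehicle": Synset(("vehicle",), "toy gloss for vehicle"),
    "car_door": Synset(("car door",), "toy gloss for car door"),
}


def related(synset_id, relation, network):
    """Return the ids that `synset_id` points to through `relation`."""
    return list(network[synset_id].pointers.get(relation, []))


def hypernym_chain(synset_id, network):
    """Follow the first hypernym pointer upward until none is left."""
    chain = []
    seen = {synset_id}
    current = synset_id
    while True:
        parents = network[current].pointers.get("hypernym", [])
        if not parents or parents[0] in seen:  # stop at the top or on a cycle
            break
        current = parents[0]
        seen.add(current)
        chain.append(current)
    return chain


print(related("car", "meronym", NETWORK))
print(hypernym_chain("car", NETWORK))
```

Keeping the relation name as a dictionary key means a new relation type needs no change to the traversal code, only new entries in the pointer lists.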
WordNet relations
Applications of WordNet
• Information Extraction
• Information Retrieval
• Question Answering
• Word Sense Disambiguation
• Text Inference
• Coreference, coherence and metonymy
• Knowledge acquisition
• Internet search engines
Limitations of WordNet
• Designed as a semantic lexicon, not a knowledge base
• Limited connections between topically related words
• Lack of morphological relationships (a special algorithm handles these instead)
• Lack of selectional restrictions
• And more [6]
eXtended WordNet [2]
• A project at the Human Language Technology Research Institute at The University of Texas at Dallas (http://xwn.hlt.utdallas.edu)
• Provides several important enhancements (over WordNet 2.0) intended to remedy the present limitations of WordNet
• Current version: eXtended WordNet 2.0 (xwn 2.0-1.1)
Objective of eXtended WordNet
• Exploit the rich information available in synset glosses (a gloss is the textual definition of a synset)
• Add semantic and logical enhancements to WordNet
• Increase the connectivity among the synsets by at least one order of magnitude
• Enable access to a broader context for each concept
What does eXtended WordNet do? [5]
• Preprocessing and Parsing
  – Separation of glosses into definition and examples, tokenization, and identification of compound words
• Word Sense Disambiguation
  – All words in a gloss are tagged with the appropriate senses and linked to the corresponding synsets
• Logical Form Transformation
  – Gloss Logical Forms
• Topical Relations
  – Connections are established between words based on context/topic
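The preprocessing step listed above can be approximated with a short sketch. It is not the eXtended WordNet pipeline, whose implementation is not shown in this deck. It assumes a gloss is laid out as a definition followed by example sentences in double quotes, separated by semicolons, and it uses a greedy longest-match lookup against a caller-supplied set of collocations; the function names and the sample example sentence are made up for illustration.

```python
import re


def split_gloss(gloss):
    """Split a gloss into its definition and its quoted example sentences.

    Assumes the layout: definition; "example one"; "example two"
    """
    examples = re.findall(r'"([^"]*)"', gloss)
    definition = re.sub(r'"[^"]*"', "", gloss).strip(" ;")
    return definition, examples


def tokenize_with_compounds(text, compounds, max_len=3):
    """Tokenize `text`, merging known multi-word collocations into one token.

    Greedy longest match: at each position try the longest candidate first
    and fall back to the single word.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    tokens = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in compounds:
                tokens.append(candidate)
                i += n
                break
    return tokens


# The example sentence is invented for this sketch, not taken from WordNet.
definition, examples = split_gloss(
    'a court on which tennis is played; "she booked the tennis court"'
)
print(definition)
print(tokenize_with_compounds(examples[0], {"tennis court", "eye contact"}))
```

The later stages (sense tagging, logical forms, topical relations) need linguistic resources that cannot be reduced to a few lines, so the sketch stops at tokenization.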
eXtended WordNet example
"Tennis court: A court on which tennis is played."
[Figure: graph derived from this gloss. Labels in the figure: def, location-of, tennis court, court, play, object, tennis, {"tennis", "lawn tennis"}]
eXtended WordNet format
• Consists of four XML files, one for each part of speech:
  – Noun
  – Verb
  – Adjective
  – Adverb
• The XML tags contain attributes that specify the relationships
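Reading relationships out of XML attributes might look like the sketch below. The element and attribute names (`glosses`, `gloss`, `wf`, `synset`, `lemma`, `sense`) and the sense numbers are assumptions invented for this example; the actual eXtended WordNet schema should be taken from the project's own documentation. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag names, attribute names, and sense numbers are
# made up for illustration and are not the real eXtended WordNet schema.
SAMPLE = """
<glosses pos="NOUN">
  <gloss synset="tennis_court">
    <text>a court on which tennis is played</text>
    <wf lemma="court" sense="1">court</wf>
    <wf lemma="tennis" sense="1">tennis</wf>
    <wf lemma="play" sense="1">played</wf>
  </gloss>
</glosses>
"""


def sense_links(xml_text):
    """Map each gloss's synset id to the (lemma, sense) pairs tagged in it."""
    root = ET.fromstring(xml_text)
    links = {}
    for gloss in root.iter("gloss"):
        tagged = [
            (wf.get("lemma"), int(wf.get("sense")))
            for wf in gloss.iter("wf")
        ]
        links[gloss.get("synset")] = tagged
    return links


print(sense_links(SAMPLE))
```

Each `(lemma, sense)` pair is in effect a new link from the defined synset to the synset of a word in its gloss, which is how gloss tagging increases connectivity.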
eXtended WordNet: Applications
• Core knowledge base for applications:
  – Question Answering
  – Information Retrieval
  – Information Extraction
  – Summarization
  – Natural Language Generation
  – Inferences
  – Other knowledge-intensive applications
Further Reading
• W3C: RDF/OWL Representation of WordNet
  – http://www.w3.org/TR/wordnet-rdf/
• eXtended WordNet format/algorithm
  – http://xwn.hlt.utdallas.edu/wsd.html
• Current research at Princeton
  – http://wordnet.cs.princeton.edu/projects.html
• Related projects (APIs, web interface, extensions)
  – http://wordnet.princeton.edu/wordnet/related-projects/
Back up