Lexicons, Concept Networks, and Ontologies • Kevin Bloomquist • Dan Pratt
What is a Lexicon? • A general term that covers: • Simple word lists (base word, POS) • Wordnets (related words and other information, depending on the project) • Ontologies (hierarchical forms) • At a bare minimum, a lexicon contains a dictionary in some machine-readable format • Lexicons are the subject of an entire field, Computational Lexicology
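As a rough illustration of the simplest case above (a word list of base word plus POS), here is a minimal Python sketch. The `Entry` type, the tag names, and the three toy entries are assumptions made for this example; they are not taken from any real lexicon.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    base: str  # base (citation) form of the word
    pos: str   # part-of-speech tag; the tag names are illustrative


# Hypothetical toy entries, invented for this sketch.
ENTRIES = [
    Entry("hybrid", "noun"),
    Entry("hybrid", "adjective"),
    Entry("hybridize", "verb"),
]


def build_lexicon(entries):
    """Index entries by base word so one lookup returns every POS for it."""
    lexicon = defaultdict(set)
    for entry in entries:
        lexicon[entry.base].add(entry.pos)
    return dict(lexicon)


lexicon = build_lexicon(ENTRIES)
print(sorted(lexicon["hybrid"]))
```

Richer lexicons (wordnets, ontologies) would add relations between entries on top of a structure like this one.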
General Applications of Lexicons • Word sense disambiguation: SENSEVAL • Use of unsupervised systems • Pattern matching in Information Extraction • Categorize words by what syntactic information they convey • Question Answering • Use of keywords • Use of ontologies in text summarization • Speech recognition/synthesis
How are Lexicons Created? • First created from existing dictionaries that are made machine-readable • Lexicons can be extended with information derived from corpora • Statistical information, etc. • Human input • The intended use strongly influences how the lexicon is organized and what information it conveys
Ontologies • Semantic relations between words • Example below for the word “oxygenate” • Uses information about word roots and definitions to create the graph • The resulting graph contains “definition cycles” • This is just one example; there are many ways to create an ontology
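One way to picture a "definition cycle" is a graph in which each headword points to the headwords used in its definition. The Python sketch below is only an assumption about how such a graph could be built and searched; the three toy definitions are invented for illustration and are not the definitions or the method behind the slide's "oxygenate" example.

```python
def definition_graph(definitions):
    """Link each headword to the other headwords that occur in its definition."""
    return {
        word: [w for w in text.split() if w in definitions and w != word]
        for word, text in definitions.items()
    }


def find_cycle(graph, start):
    """Return one path that leads from start back to start, or None."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in graph.get(node, []):
            if nxt == start:
                return path + [start]
            if nxt not in path:  # do not revisit words already on this path
                stack.append((nxt, path + [nxt]))
    return None


# Hypothetical toy definitions, written only for this sketch.
DEFINITIONS = {
    "oxygenate": "to treat or combine with oxygen",
    "oxygen": "a gas used to oxygenate blood",
    "gas": "a fluid such as air",
}

graph = definition_graph(DEFINITIONS)
cycle = find_cycle(graph, "oxygenate")
print(cycle)
```

Here "oxygenate" is defined using "oxygen", whose definition in turn uses "oxygenate", so the search finds a cycle of length two.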
MindNet • An application that finds relations between arbitrary sets of words • Uses definitions to find different types of relations between words, such as synonym, antonym, goal, part, object, and subject • Attempts to construct logical relations using a lexical database • http://atom.research.microsoft.com/mnex/InputPath.aspx?l=e&d=d • Now part of Microsoft; the next step is applying it to machine translation
ACQUILEX I & II • Overall goal: develop a rich multilingual knowledge base • Want to “support a ‘deep’ knowledge-intensive model of language processing.” • I: Explore creating a multilingual dictionary out of a number of machine-readable dictionaries • Some were monolingual, some bilingual • II: Extend this with statistical information from corpora • Resulted in a large number of academic papers (most of which are highly specific or not readily accessible)
Overall Insights • One of the main problems with building lexicons is that each project develops its own format and chooses what information to include. WordNet is changing this. • Building good ontologies may be the next important step, but there may be other (better or easier) ways
WordNet Online • A field to type in a word • Eight options that can be displayed or hidden • Every definition has related words, and you can view their definitions
Example of Online Hybrid Hybridize
How does WordNet work? • WordNet is a large database containing words and their definitions. • It also contains mappings between words, such as synonyms and antonyms. • It can tell how common or rare a word is in a particular sense.
What is not covered by WordNet? • WordNet does not include closed-class words. • That means no pronouns, articles, conjunctions, prepositions, etc. • The only word types it covers are nouns, verbs, adjectives, and adverbs.
Example of how WordNet stores a word. • Index.sense:
    hybrid%1:05:00:: 01310936 3 0
    hybrid%1:09:00:: 05796358 2 0
    hybrid%1:10:00:: 06210172 1 0
    hybrid%5:00:00:crossbred:00 01973272 1 0
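The index.sense records above can be split into named fields. The Python sketch below assumes the field layout described in WordNet's senseidx documentation (`sense_key synset_offset sense_number tag_cnt`, with the sense key encoded as `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`); the `SenseEntry` type and the function name are made up for this example.

```python
from typing import NamedTuple

# Synset type codes used inside a sense key.
SS_TYPES = {1: "noun", 2: "verb", 3: "adjective", 4: "adverb",
            5: "adjective satellite"}


class SenseEntry(NamedTuple):
    lemma: str
    ss_type: str
    lex_filenum: int
    lex_id: int
    head_word: str      # only set for adjective satellites
    synset_offset: str  # kept as text to preserve leading zeros
    sense_number: int
    tag_cnt: int


def parse_sense_line(line):
    """Parse one index.sense record into a SenseEntry."""
    sense_key, offset, sense_number, tag_cnt = line.split()
    lemma, lex_sense = sense_key.split("%", 1)
    ss_type, lex_filenum, lex_id, head_word, _head_id = lex_sense.split(":")
    return SenseEntry(lemma, SS_TYPES[int(ss_type)], int(lex_filenum),
                      int(lex_id), head_word, offset,
                      int(sense_number), int(tag_cnt))


noun_sense = parse_sense_line("hybrid%1:05:00:: 01310936 3 0")
adj_sense = parse_sense_line("hybrid%5:00:00:crossbred:00 01973272 1 0")
print(noun_sense)
print(adj_sense)
```

The synset offset is what ties a sense to its record in the corresponding data file.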
Example cont. • Index.noun:
    hybrid n 3 4 @ ~ + ; 3 0 06210172 05796358 01310936
• Index.adj:
    hybrid a 1 1 & 1 0 01973272
• Data.adj:
    01973272 00 s 04 crossed 0 hybrid 0 interbred 0 intercrossed 0 001 & 01972954 a 0000 | produced by crossbreeding
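The index.noun and index.adj lines can be read the same way. This Python sketch assumes the layout given in WordNet's wndb documentation (`lemma pos synset_cnt p_cnt [ptr_symbol...] sense_cnt tagsense_cnt synset_offset...`). The function name and the small pointer-name table are assumptions for illustration, and the table covers only the symbols that appear in the example.

```python
# Names for the pointer symbols seen in the example, as the author of this
# sketch understands WordNet's pointer table; not a complete list.
POINTER_NAMES = {
    "@": "hypernym",
    "~": "hyponym",
    "+": "derivationally related form",
    ";": "domain of synset",
    "&": "similar to",
}


def parse_index_line(line):
    """Parse one index.<pos> record into a dictionary of named fields."""
    fields = line.split()
    lemma, pos = fields[0], fields[1]
    synset_cnt, p_cnt = int(fields[2]), int(fields[3])
    ptr_symbols = fields[4:4 + p_cnt]
    rest = fields[4 + p_cnt:]
    sense_cnt, tagsense_cnt = int(rest[0]), int(rest[1])
    offsets = rest[2:2 + synset_cnt]  # one offset per synset containing lemma
    return {
        "lemma": lemma,
        "pos": pos,
        "ptr_symbols": ptr_symbols,
        "sense_cnt": sense_cnt,
        "tagsense_cnt": tagsense_cnt,
        "synset_offsets": offsets,
    }


noun = parse_index_line("hybrid n 3 4 @ ~ + ; 3 0 06210172 05796358 01310936")
adj = parse_index_line("hybrid a 1 1 & 1 0 01973272")
print([POINTER_NAMES[s] for s in noun["ptr_symbols"]])
print(adj["synset_offsets"])
```

Each offset returned here is the first field of a record in the matching data file, which is how the index leads to the definitions.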
Example cont. • Data.noun:
    06210172 10 n 03 loanblend 0 loan-blend 0 hybrid 0 003 @ 06203456 n 0000 ;r 08657546 n 0000 ;c 06868465 n 0000 | a word that is composed of parts from different languages (e.g., `monolingual' has a Greek prefix and a Latin root)
    05796358 09 n 01 hybrid 0 002 @ 05796126 n 0000 + 01417728 v 0103 | a composite of mixed origin; "the vice-presidency is a hybrid of administrative and legislative offices"
    01310936 05 n 03 hybrid 0 crossbreed 0 cross 0 007 @ 00004576 n 0000 + 01417728 v 0302 + 01417728 v 0201 + 01417728 v 0103 ~ 01311349 n 0000 ~ 01311480 n 0000 ~ 01311624 n 0000 | an organism that is the offspring of genetically dissimilar parents or stock; especially offspring produced by breeding plants or animals of different varieties or breeds or species; "a mule is a cross between a horse and a donkey"
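A data-file record packs the synset's words, its pointers to other synsets, and the gloss into one line. The Python sketch below assumes the wndb layout (`synset_offset lex_filenum ss_type w_cnt word lex_id ... p_cnt pointer... | gloss`, where `w_cnt` is hexadecimal and each pointer has four fields). It handles noun and adjective records like the ones shown here and ignores the extra frame fields that verb records carry; the function name is hypothetical.

```python
def parse_data_line(line):
    """Parse one data.noun / data.adj record into a dictionary."""
    head, _, gloss = line.partition("|")
    fields = head.split()
    offset, lex_filenum, ss_type = fields[0], fields[1], fields[2]

    w_cnt = int(fields[3], 16)  # word count is a two-digit hex number
    pos = 4
    words = [fields[pos + 2 * i] for i in range(w_cnt)]  # skip each lex_id
    pos += 2 * w_cnt

    p_cnt = int(fields[pos])    # pointer count is a three-digit decimal
    pos += 1
    pointers = []
    for _ in range(p_cnt):
        symbol, target, target_pos, source_target = fields[pos:pos + 4]
        pointers.append((symbol, target, target_pos, source_target))
        pos += 4

    return {
        "synset_offset": offset,
        "lex_filenum": lex_filenum,
        "ss_type": ss_type,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }


record = parse_data_line(
    "05796358 09 n 01 hybrid 0 002 @ 05796126 n 0000 + 01417728 v 0103 "
    "| a composite of mixed origin"
)
satellite = parse_data_line(
    "01973272 00 s 04 crossed 0 hybrid 0 interbred 0 intercrossed 0 "
    "001 & 01972954 a 0000 | produced by crossbreeding"
)
print(record["words"], record["pointers"])
print(satellite["words"], satellite["gloss"])
```

The `@` pointer in the first record names the hypernym synset by offset, so following such pointers from record to record walks up the noun hierarchy.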
Example of Download Hybrid
References
• ACQUILEX I and II. http://www.cl.cam.ac.uk/Research/NL/acquilex/acqhome.html. Sponsored by the European Commission, centered at the University of Cambridge. Last access: 01/25/06.
• Litkowski, Kenneth C. Computational Lexicons and Dictionaries. http://www.clres.com/online-papers/ell.doc. Part of CL Research. Last access: 01/25/06.
• Dolan et al. MindNet. http://research.microsoft.com/nlp/Projects/MindNet.aspx. Microsoft Research. Last access: 01/25/06.
• WordNet. http://wordnet.princeton.edu/. Princeton University. Last access: 01/25/06.