Explore the creation of lexicons, ontologies, and concept networks such as MindNet for computational applications like word sense disambiguation, text summarization, and speech recognition. Understand the process of creating lexicons and the importance of ontologies in language processing.
Lexicons, Concept Networks, and Ontologies
Kevin Bloomquist, Dan Pratt
What is a Lexicon?
• A general term covering several kinds of resources:
  • Simple word lists (base word, part of speech)
  • Wordnets (related words, plus other information depending on the project)
  • Ontologies (hierarchical forms)
• At a bare minimum, a lexicon contains a dictionary in some machine-readable format
• Lexicons are the subject of an entire field: computational lexicology
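A simple word list, the most basic kind of lexicon above, can be stored one entry per line. The sketch below assumes a hypothetical tab-separated format (`word<TAB>part of speech`) and made-up entries; it loads the list into a dictionary that maps each base word to the set of parts of speech it can take.

```python
from collections import defaultdict

# Hypothetical lexicon text: one "base_word<TAB>part_of_speech" pair per line.
LEXICON_TEXT = """\
hybrid\tnoun
hybrid\tadjective
hybridize\tverb
oxygen\tnoun
oxygenate\tverb
"""


def load_word_list(text: str) -> dict[str, set[str]]:
    """Map each base word to the set of parts of speech listed for it."""
    lexicon = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        word, pos = line.split("\t")
        lexicon[word].add(pos)
    return dict(lexicon)


lexicon = load_word_list(LEXICON_TEXT)
print(sorted(lexicon["hybrid"]))  # ['adjective', 'noun']
```

A real lexicon would add more columns (definitions, related words) as it moves toward a wordnet, but the lookup structure stays the same.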
General Applications of Lexicons
• Word sense disambiguation: SENSEVAL
  • Use of unsupervised systems
• Pattern matching in Information Extraction
  • Categorize words by what syntactic information they convey
• Question Answering
  • Use of keywords
• Use of ontologies in text summarization
• Speech recognition/synthesis
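To illustrate how a lexicon supports word sense disambiguation, the sketch below implements the simplified Lesk algorithm, a well-known dictionary-based method that these slides do not themselves describe: choose the sense whose definition shares the most words with the surrounding context. The glosses are the three WordNet noun senses of "hybrid" from the data.noun example in this deck, with their examples removed; the stopword list and the test sentence are assumptions.

```python
import re

# Glosses of the three noun senses of "hybrid" from WordNet's data.noun entries
# (keyed by synset offset), with the examples removed.
HYBRID_SENSES = {
    "06210172": "a word that is composed of parts from different languages",
    "05796358": "a composite of mixed origin",
    "01310936": ("an organism that is the offspring of genetically dissimilar "
                 "parents or stock; especially offspring produced by breeding "
                 "plants or animals of different varieties or breeds or species"),
}

# A tiny hand-picked stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "is", "by", "from", "to", "in"}


def content_words(text: str) -> set[str]:
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context: str, senses: dict[str, str]) -> str:
    """Return the key of the sense whose gloss shares the most words with the context."""
    context_words = content_words(context)
    return max(senses, key=lambda sense: len(content_words(senses[sense]) & context_words))


sentence = "The mule is a hybrid produced by breeding a horse and a donkey."
print(simplified_lesk(sentence, HYBRID_SENSES))  # 01310936
```

Here the organism sense wins because its gloss shares "produced" and "breeding" with the sentence; the other two glosses share nothing. Ties go to the first sense listed, which is one reason practical systems add sense frequencies as a fallback.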
How are Lexicons Created?
• Initially built from existing dictionaries that have been made machine readable
• Lexicons can then be extended with information derived from corpora
  • Statistical information, etc.
  • Human input
• The intended use strongly influences how a lexicon is organized and what information it conveys
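Deriving statistical information from a corpus can be as simple as counting how often each entry occurs. A minimal sketch, assuming a hypothetical lexicon whose entries are dictionaries (as if converted from a machine-readable dictionary) and a toy corpus string; the function name `add_corpus_frequencies` and the `freq` field are illustrative, not taken from any particular project.

```python
import re
from collections import Counter

# Hypothetical entries, as if converted from a machine-readable dictionary.
base_lexicon = {
    "hybrid": {"pos": ["noun", "adjective"]},
    "crossbreed": {"pos": ["noun", "verb"]},
}

# Toy corpus standing in for a real text collection.
corpus = "A hybrid car. The hybrid was a crossbreed. Hybrid vigor is common."


def add_corpus_frequencies(lexicon: dict[str, dict], text: str) -> dict[str, dict]:
    """Return a new lexicon whose entries gain a 'freq' count taken from the text."""
    # Lowercase word tokens; hyphenated forms such as "loan-blend" stay whole.
    counts = Counter(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    # Counter returns 0 for words that never occur, so every entry gets a count.
    return {word: {**entry, "freq": counts[word]} for word, entry in lexicon.items()}


enriched = add_corpus_frequencies(base_lexicon, corpus)
print(enriched["hybrid"])  # {'pos': ['noun', 'adjective'], 'freq': 3}
```

Returning a new dictionary rather than mutating the original keeps the dictionary-derived data separate from the corpus-derived data, which makes it easier to redo the counts on a different corpus.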
Ontologies
• Capture semantic relations between words
• Example (figure): a graph for the word “oxygenate”
• Uses information about word roots and definitions to build the graph
• The graph contains “definition cycles”: chains of definitions that eventually lead back to a word already in the chain
• This is just one example; there are many ways to create an ontology
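Treating each word as a node with edges to the words used in its definition gives a graph in which definition cycles can be found with a depth-first search. The definitions below are toy, hand-written stand-ins (not taken from a real dictionary or from the figure), and `find_definition_cycle` is a hypothetical helper.

```python
from typing import Optional

# Toy definition graph: each word maps to (hypothetical) words used to define it.
DEFINITIONS = {
    "oxygenate": ["supply", "oxygen"],
    "oxygen": ["gas", "element"],
    "supply": ["provide"],
    "provide": ["supply"],
}


def find_definition_cycle(graph: dict[str, list[str]], start: str) -> Optional[list[str]]:
    """Depth-first search from `start`; return the first definition cycle found, or None."""
    path: list[str] = []        # words on the current chain of definitions
    on_path: set[str] = set()
    finished: set[str] = set()  # words fully explored without finding a cycle

    def visit(word: str) -> Optional[list[str]]:
        if word in on_path:
            # The chain has come back to a word it already passed through.
            return path[path.index(word):] + [word]
        if word in finished:
            return None
        path.append(word)
        on_path.add(word)
        for defining_word in graph.get(word, []):
            cycle = visit(defining_word)
            if cycle is not None:
                return cycle
        path.pop()
        on_path.discard(word)
        finished.add(word)
        return None

    return visit(start)


print(find_definition_cycle(DEFINITIONS, "oxygenate"))  # ['supply', 'provide', 'supply']
```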
MindNet
• An application that finds relations between arbitrary sets of words
• Uses dictionary definitions to extract different types of relations between words, such as synonym, antonym, goal, part, object, and subject
• Attempts to construct logical relations using a lexical database
• http://atom.research.microsoft.com/mnex/InputPath.aspx?l=e&d=d
• Now part of Microsoft; the next step is work on machine translation
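MindNet's own relation extraction is not reproduced here. As a rough sketch of finding relations between arbitrary words, the code below stores hypothetical, hand-written (head, relation, tail) triples using relation names from the slide and runs a breadth-first search for the shortest chain of relations linking two words. The triples and the function name are assumptions for illustration only.

```python
from collections import defaultdict, deque
from typing import Optional

# Hypothetical (head, relation, tail) triples; relation names follow the slide.
TRIPLES = [
    ("car", "Part", "engine"),
    ("drive", "Object", "car"),
    ("drive", "Subject", "driver"),
    ("chauffeur", "Synonym", "driver"),
]


def find_relation_path(triples, source: str, target: str) -> Optional[str]:
    """Breadth-first search for the shortest chain of relations from source to target."""
    # Index each triple in both directions so paths can follow relations backwards.
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((f"--{relation}-->", tail))
        graph[tail].append((f"<--{relation}--", head))

    previous = {source: None}  # word -> (predecessor, edge label) in the BFS tree
    queue = deque([source])
    while queue:
        word = queue.popleft()
        if word == target:
            break
        for label, neighbor in graph[word]:
            if neighbor not in previous:
                previous[neighbor] = (word, label)
                queue.append(neighbor)

    if target not in previous:
        return None
    # Walk back from the target to rebuild the path as readable text.
    parts = [target]
    node = target
    while previous[node] is not None:
        node, label = previous[node]
        parts = [node, label] + parts
    return " ".join(parts)


print(find_relation_path(TRIPLES, "chauffeur", "engine"))
# chauffeur --Synonym--> driver <--Subject-- drive --Object--> car --Part--> engine
```

Breadth-first search is used here only because it returns the shortest chain; a system built over a large lexical database would also need to weight or filter paths.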
ACQUILEX I & II
• Overall goal: develop a rich multilingual knowledge base
  • Aim to “support a ‘deep’ knowledge-intensive model of language processing.”
• ACQUILEX I: explored creating a multilingual dictionary from a number of machine-readable dictionaries
  • Some were monolingual, some bilingual
• ACQUILEX II: extended this work using statistical information from corpora
• The projects produced a large number of academic papers (most of them highly specialized or not readily accessible)
Overall Insights
• One of the main problems in building lexicons is that each project develops its own format and chooses what information to include; WordNet is changing this
• Building good ontologies may be the next important step, but there may be other (better or easier) approaches
WordNet Online
• A field for typing in a word
• Eight display options that can be shown or hidden
• Every definition lists related words, and you can view their definitions
Example of Online Search: “hybrid” and “hybridize” (screenshot)
How does WordNet work?
• WordNet is a large database of words and their definitions
• It also contains mappings between words, such as synonyms and antonyms
• It can indicate how common or rare a word is in a particular sense (through sense frequency counts)
What is not covered by WordNet?
• WordNet does not include closed-class (function) words
  • That means no pronouns, articles, conjunctions, prepositions, etc.
• It covers only four parts of speech: nouns, verbs, adjectives, and adverbs
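Because WordNet covers only these four open-class categories, a common preprocessing step is to map part-of-speech tags to WordNet's one-letter codes (n, v, a, r) and drop everything else before lookup. The sketch below assumes Penn Treebank tags produced by some tagger; the tagged sentence is hand-made and `penn_to_wordnet` is a hypothetical helper.

```python
from typing import Optional


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Map a Penn Treebank tag to a WordNet POS letter, or None for closed-class tags."""
    if tag.startswith("NN"):
        return "n"  # nouns
    if tag.startswith("VB"):
        return "v"  # verbs
    if tag.startswith("JJ"):
        return "a"  # adjectives
    if tag.startswith("RB"):
        return "r"  # adverbs
    return None  # everything else, e.g. DT, PRP, IN, CC


# Hand-tagged example sentence (assumed tagger output).
tagged = [("The", "DT"), ("mule", "NN"), ("is", "VBZ"), ("a", "DT"), ("hybrid", "NN"),
          ("of", "IN"), ("different", "JJ"), ("species", "NNS")]

open_class = [(word, penn_to_wordnet(tag)) for word, tag in tagged if penn_to_wordnet(tag)]
print(open_class)
# [('mule', 'n'), ('is', 'v'), ('hybrid', 'n'), ('different', 'a'), ('species', 'n')]
```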
Example of how WordNet stores a word
• index.sense:
  hybrid%1:05:00:: 01310936 3 0
  hybrid%1:09:00:: 05796358 2 0
  hybrid%1:10:00:: 06210172 1 0
  hybrid%5:00:00:crossbred:00 01973272 1 0
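Each index.sense line above holds a sense key, a synset offset, a sense number, and a tag count, and the sense key itself has the form `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`. A minimal parser sketch following that layout; the `SenseEntry` type and the helper names are illustrative, not part of any WordNet library.

```python
from typing import NamedTuple

# ss_type digits used in WordNet sense keys.
SS_TYPES = {"1": "noun", "2": "verb", "3": "adjective", "4": "adverb", "5": "adjective satellite"}


class SenseEntry(NamedTuple):
    lemma: str
    ss_type: str
    lex_filenum: int
    synset_offset: str  # kept as a string to preserve leading zeros
    sense_number: int
    tag_cnt: int


def parse_index_sense(line: str) -> SenseEntry:
    """Parse one index.sense line: sense_key synset_offset sense_number tag_cnt."""
    sense_key, offset, sense_number, tag_cnt = line.split()
    lemma, lex_sense = sense_key.split("%")
    # lex_sense is ss_type:lex_filenum:lex_id:head_word:head_id
    ss_type, lex_filenum, _lex_id, _head_word, _head_id = lex_sense.split(":")
    return SenseEntry(lemma, SS_TYPES[ss_type], int(lex_filenum), offset,
                      int(sense_number), int(tag_cnt))


INDEX_SENSE_LINES = [
    "hybrid%1:05:00:: 01310936 3 0",
    "hybrid%1:09:00:: 05796358 2 0",
    "hybrid%1:10:00:: 06210172 1 0",
    "hybrid%5:00:00:crossbred:00 01973272 1 0",
]

entries = [parse_index_sense(line) for line in INDEX_SENSE_LINES]
nouns = sorted((e for e in entries if e.ss_type == "noun"), key=lambda e: e.sense_number)
print([e.synset_offset for e in nouns])  # ['06210172', '05796358', '01310936']
```

Every tag count in this example is 0, meaning none of these senses was tagged in WordNet's semantic concordance texts, so these counts give no frequency evidence for ranking the senses.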
Example cont.
• index.noun:
  hybrid n 3 4 @ ~ + ; 3 0 06210172 05796358 01310936
• index.adj:
  hybrid a 1 1 & 1 0 01973272
• data.adj:
  01973272 00 s 04 crossed 0 hybrid 0 interbred 0 intercrossed 0 001 & 01972954 a 0000 | produced by crossbreeding
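An index.noun or index.adj line lists the lemma, part of speech, synset count, pointer count and pointer types, sense and tagged-sense counts, and finally the synset offsets to look up in the matching data file. A parsing sketch; names are illustrative, and only the pointer symbols appearing in this example are labeled.

```python
from typing import NamedTuple

# Pointer symbols appearing in this example (WordNet's wninput(5WN) lists them all).
POINTER_NAMES = {"@": "hypernym", "~": "hyponym", "+": "derivationally related form",
                 "&": "similar to", ";": "domain of synset"}


class IndexEntry(NamedTuple):
    lemma: str
    pos: str
    pointers: list[str]
    sense_cnt: int
    tagsense_cnt: int
    offsets: list[str]


def parse_index_line(line: str) -> IndexEntry:
    """Parse: lemma pos synset_cnt p_cnt [ptr_symbol...] sense_cnt tagsense_cnt offsets..."""
    fields = line.split()
    lemma, pos = fields[0], fields[1]
    synset_cnt, p_cnt = int(fields[2]), int(fields[3])
    pointers = fields[4:4 + p_cnt]
    rest = fields[4 + p_cnt:]
    sense_cnt, tagsense_cnt, offsets = int(rest[0]), int(rest[1]), rest[2:]
    if len(offsets) != synset_cnt:
        raise ValueError(f"expected {synset_cnt} offsets, found {len(offsets)}")
    return IndexEntry(lemma, pos, pointers, sense_cnt, tagsense_cnt, offsets)


noun = parse_index_line("hybrid n 3 4 @ ~ + ; 3 0 06210172 05796358 01310936")
print([POINTER_NAMES[p] for p in noun.pointers])
# ['hypernym', 'hyponym', 'derivationally related form', 'domain of synset']
print(noun.offsets)  # ['06210172', '05796358', '01310936']
```

The offsets are byte positions in the corresponding data file, so a full lookup would seek to each offset in data.noun (or data.adj) and parse the line found there.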
Example cont.
• data.noun:
  06210172 10 n 03 loanblend 0 loan-blend 0 hybrid 0 003 @ 06203456 n 0000 ;r 08657546 n 0000 ;c 06868465 n 0000 | a word that is composed of parts from different languages (e.g., `monolingual' has a Greek prefix and a Latin root)
  05796358 09 n 01 hybrid 0 002 @ 05796126 n 0000 + 01417728 v 0103 | a composite of mixed origin; "the vice-presidency is a hybrid of administrative and legislative offices"
  01310936 05 n 03 hybrid 0 crossbreed 0 cross 0 007 @ 00004576 n 0000 + 01417728 v 0302 + 01417728 v 0201 + 01417728 v 0103 ~ 01311349 n 0000 ~ 01311480 n 0000 ~ 01311624 n 0000 | an organism that is the offspring of genetically dissimilar parents or stock; especially offspring produced by breeding plants or animals of different varieties or breeds or species; "a mule is a cross between a horse and a donkey"
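Each data line holds the synset offset, lexicographer file number, synset type, a two-digit hexadecimal word count followed by (word, lex_id) pairs, a three-digit pointer count followed by four-field pointers, and the gloss after `|`. The sketch below parses noun and adjective lines only; verb lines carry extra frame data after the pointers, which it does not parse. Names are illustrative.

```python
from typing import NamedTuple


class DataEntry(NamedTuple):
    offset: str
    lex_filenum: int
    ss_type: str
    words: list[str]
    pointers: list[tuple[str, str, str, str]]  # (symbol, target offset, pos, source/target)
    gloss: str


def parse_data_line(line: str) -> DataEntry:
    """Parse a noun or adjective line from a WordNet data file (verb frames not handled)."""
    body, _, gloss = line.partition(" | ")
    fields = body.split()
    offset, lex_filenum, ss_type = fields[0], int(fields[1]), fields[2]
    w_cnt = int(fields[3], 16)          # word count is two hexadecimal digits
    words = fields[4:4 + 2 * w_cnt:2]   # (word, lex_id) pairs; keep the words only
    i = 4 + 2 * w_cnt
    p_cnt = int(fields[i])              # pointer count is three decimal digits
    i += 1
    pointers = []
    for _ in range(p_cnt):
        symbol, target, pos, source_target = fields[i:i + 4]
        pointers.append((symbol, target, pos, source_target))
        i += 4
    return DataEntry(offset, lex_filenum, ss_type, words, pointers, gloss.strip())


LOANBLEND_LINE = ("06210172 10 n 03 loanblend 0 loan-blend 0 hybrid 0 003 "
                  "@ 06203456 n 0000 ;r 08657546 n 0000 ;c 06868465 n 0000 | "
                  "a word that is composed of parts from different languages "
                  "(e.g., `monolingual' has a Greek prefix and a Latin root)")

entry = parse_data_line(LOANBLEND_LINE)
print(entry.words)  # ['loanblend', 'loan-blend', 'hybrid']
hypernyms = [target for symbol, target, _pos, _st in entry.pointers if symbol == "@"]
print(hypernyms)  # ['06203456']
```

In a pointer's last field, `0000` marks a semantic relation between whole synsets; otherwise the first two hex digits give the source word number and the last two the target word number, as in the `+ 01417728 v 0103` derivational links above.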
Example of Downloaded Version: “hybrid” (screenshot)
References
• ACQUILEX I and II. http://www.cl.cam.ac.uk/Research/NL/acquilex/acqhome.html
  • Sponsored by the European Commission, centered at the University of Cambridge
  • Last accessed: 01/25/06
• Litkowski, Kenneth C. Computational Lexicons and Dictionaries. http://www.clres.com/online-papers/ell.doc
  • Part of CL Research
  • Last accessed: 01/25/06
• Dolan et al. MindNet. http://research.microsoft.com/nlp/Projects/MindNet.aspx
  • Microsoft Research
  • Last accessed: 01/25/06
• WordNet. http://wordnet.princeton.edu/
  • Princeton University
  • Last accessed: 01/25/06