
Multilingual Lexicon Development and Applications

Explore the creation of lexicons, ontologies, and mindnets for computational applications like word sense disambiguation, text summarization, and speech recognition. Understand the process of creating lexicons and the importance of ontologies in language processing.

fcain

Presentation Transcript


  1. Lexicons, Concept Networks, and Ontologies • Kevin Bloomquist • Dan Pratt

  2. What is a Lexicon? • A general term • Simple word lists (base word, POS) • Wordnets (related words & other info as well depending on project) • Ontologies (hierarchical forms) • At a bare minimum, it contains a dictionary in some machine readable format • Entire field of Computational Lexicology

  3. General Applications of Lexicons • Word sense disambiguation: SENSEVAL • Use of unsupervised systems • Pattern matching in Information Extraction • Categorize words by what syntactic information they convey • Question Answering • Use of keywords • Use of ontologies in text summarization • Speech recognition/synthesis

  4. How are Lexicons Created? • First created from existing dictionaries that are made machine readable • Lexicons can then be extended with information derived from corpora • Statistical information, etc. • Human input • The intended use strongly influences how the lexicon is organized and what information it conveys
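The steps on this slide can be sketched roughly as follows. This is a minimal illustration, assuming a toy machine-readable dictionary and a one-line corpus; the entries and the field names (`pos`, `definition`, `corpus_count`) are hypothetical and not taken from any project named in the slides.

```python
from collections import Counter
import re

# Hypothetical machine-readable dictionary: headword -> (part of speech, definition)
MRD = {
    "hybrid": ("noun", "a composite of mixed origin"),
    "mule": ("noun", "offspring of a horse and a donkey"),
}

def build_lexicon(mrd, corpus_text):
    """Seed a lexicon from dictionary entries, then add corpus frequencies."""
    # Count lowercase alphabetic tokens in the corpus.
    counts = Counter(re.findall(r"[a-z]+", corpus_text.lower()))
    lexicon = {}
    for word, (pos, definition) in mrd.items():
        lexicon[word] = {
            "pos": pos,
            "definition": definition,
            "corpus_count": counts[word],  # derived statistical information
        }
    return lexicon

lexicon = build_lexicon(MRD, "A mule is a hybrid. The mule is sterile.")
print(lexicon["mule"])
```

A real lexicon would store whatever its intended application needs; the frequency count here only stands in for "statistical information".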

  5. Ontologies • Semantic relations between words • Example below for the word “oxygenate” • Uses information about word roots and definitions to create the graph. • This graph creates “definition cycles” • This is just one example… There are many ways to create an ontology
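One way to picture a "definition cycle" is as a loop in a graph whose edges run from a headword to the headwords used in its definition. The sketch below assumes hand-simplified, hypothetical links (already reduced to root forms); it is not the method used to build the graph on the slide.

```python
# Hypothetical links: each headword maps to the headwords its definition
# mentions, already reduced to root forms by some earlier step.
DEFINITION_LINKS = {
    "oxygenate": ["oxygen", "treat"],
    "oxygen": ["gas", "oxygenate"],
    "gas": ["substance"],
    "treat": [],
    "substance": [],
}

def find_definition_cycle(links, start):
    """Return one path leading from `start` back to `start`, or None."""
    stack = [(start, [start])]
    visited = set()
    while stack:
        word, path = stack.pop()
        for nxt in links.get(word, []):
            if nxt == start:
                return path + [start]  # closed the loop
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None

print(find_definition_cycle(DEFINITION_LINKS, "oxygenate"))
```

With these toy links, "oxygenate" is defined via "oxygen", whose definition in turn mentions "oxygenate", so a cycle is found; "gas" has none.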

  6. MindNet • An application that finds relations between arbitrary sets of words • Uses definitions to find different types of relations between words, such as synonym, antonym, goal, part, object, and subject • Attempts to construct logical relations using a lexical database • http://atom.research.microsoft.com/mnex/InputPath.aspx?l=e&d=d • Now part of Microsoft; the next step is work on machine translation
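Finding a relation between two words can be thought of as finding a path through labeled relation triples. The following is only a toy sketch of that idea using breadth-first search; the triples are invented for illustration and this is not MindNet's actual data or algorithm.

```python
from collections import deque

# Hypothetical (word, relation, word) triples; not MindNet data.
TRIPLES = [
    ("car", "part", "engine"),
    ("engine", "goal", "motion"),
    ("car", "synonym", "automobile"),
]

def relation_path(triples, source, target):
    """Breadth-first search for the shortest chain of labeled relations."""
    graph = {}
    for head, rel, tail in triples:
        graph.setdefault(head, []).append((rel, tail))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        word, path = queue.popleft()
        if word == target:
            return path
        for rel, nxt in graph.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(word, rel, nxt)]))
    return None  # no directed path between the two words

print(relation_path(TRIPLES, "car", "motion"))
```

Edges are treated as directed here, so a path from "car" to "motion" exists but not the reverse; a real system would have to decide which relations are symmetric.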

  7. ACQUILEX I & II • Overall Goal: Develop a rich multilingual knowledge base • Want to “support a ‘deep’ knowledge-intensive model of language processing.” • I: Explore creating a multilingual dictionary out of a number of machine readable dictionaries • Some were monolingual, some bilingual • II: Add to this by using statistical information from corpora • Ended up publishing a large number of academic papers (most of which are highly specific or not readily accessible)

  8. Overall Insights • One of the main problems with building lexicons is that each project develops its own format and decides for itself what information to include. WordNet is changing this. • Building good ontologies may be the next important step, but there may be other (better/easier) ways

  9. WordNet

  10. WordNet Online • A field to type in a word • Eight options that can be displayed or hidden • Every definition has related words, and you can view their definitions.

  11. Example of Online Hybrid Hybridize

  12. How does WordNet work? • WordNet is a large database containing words and their definitions. • It also contains mappings between words, such as synonyms and antonyms. • It can tell how common or rare a word is in a particular sense.

  13. What is not covered by WordNet? • WordNet does not include closed-class words. • That means no pronouns, articles, conjunctions, prepositions, etc. • The only parts of speech covered are nouns, verbs, adjectives, and adverbs.

  14. Example of how WordNet stores a word
      • index.sense:
        hybrid%1:05:00:: 01310936 3 0
        hybrid%1:09:00:: 05796358 2 0
        hybrid%1:10:00:: 06210172 1 0
        hybrid%5:00:00:crossbred:00 01973272 1 0
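Each index.sense line holds a sense key, a synset offset, a sense number, and a tag count, and the sense key itself has the form `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`. A minimal parser, offered as a sketch (the `SenseEntry` type and function name are my own, not part of WordNet):

```python
from typing import NamedTuple

# ss_type codes used in WordNet sense keys
SS_TYPES = {1: "noun", 2: "verb", 3: "adjective", 4: "adverb",
            5: "adjective satellite"}

class SenseEntry(NamedTuple):
    lemma: str
    ss_type: str
    lex_filenum: int
    lex_id: int
    head_word: str      # only filled in for adjective satellites
    head_id: str
    synset_offset: str  # kept as a string to preserve leading zeros
    sense_number: int
    tag_cnt: int

def parse_index_sense(line):
    """Parse one line of index.sense into its named fields."""
    sense_key, synset_offset, sense_number, tag_cnt = line.split()
    lemma, rest = sense_key.split("%", 1)
    ss_type, lex_filenum, lex_id, head_word, head_id = rest.split(":")
    return SenseEntry(lemma, SS_TYPES[int(ss_type)], int(lex_filenum),
                      int(lex_id), head_word, head_id,
                      synset_offset, int(sense_number), int(tag_cnt))

entry = parse_index_sense("hybrid%5:00:00:crossbred:00 01973272 1 0")
print(entry.ss_type, entry.head_word, entry.synset_offset)
```

The synset offset is the byte position of the matching record in the corresponding data file, which is why it is worth keeping in its zero-padded form.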

  15. Example cont.
      • index.noun: hybrid n 3 4 @ ~ + ; 3 0 06210172 05796358 01310936
      • index.adj: hybrid a 1 1 & 1 0 01973272
      • data.adj: 01973272 00 s 04 crossed 0 hybrid 0 interbred 0 intercrossed 0 001 & 01972954 a 0000 | produced by crossbreeding
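An index line lists the lemma, part of speech, synset count, pointer count, that many pointer symbols, a sense count, a tagged-sense count, and then the synset offsets. The sketch below reads those fields positionally; the function name and dictionary keys are illustrative choices rather than anything defined by WordNet.

```python
def parse_index_line(line):
    """Parse one line of an index.<pos> file (index.noun, index.adj, ...)."""
    fields = line.split()
    p_cnt = int(fields[3])  # number of pointer symbols that follow
    return {
        "lemma": fields[0],
        "pos": fields[1],
        "synset_cnt": int(fields[2]),
        # Symbols such as @ (hypernym), ~ (hyponym), & (similar to)
        "ptr_symbols": fields[4:4 + p_cnt],
        "sense_cnt": int(fields[4 + p_cnt]),
        "tagsense_cnt": int(fields[5 + p_cnt]),
        "synset_offsets": fields[6 + p_cnt:],
    }

noun_entry = parse_index_line(
    "hybrid n 3 4 @ ~ + ; 3 0 06210172 05796358 01310936")
adj_entry = parse_index_line("hybrid a 1 1 & 1 0 01973272")
print(noun_entry["ptr_symbols"], noun_entry["synset_offsets"])
```

Because the number of pointer symbols varies per line, the positions of the later fields have to be computed from `p_cnt` rather than hard-coded.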

  16. Example cont.
      • data.noun:
        06210172 10 n 03 loanblend 0 loan-blend 0 hybrid 0 003 @ 06203456 n 0000 ;r 08657546 n 0000 ;c 06868465 n 0000 | a word that is composed of parts from different languages (e.g., `monolingual' has a Greek prefix and a Latin root)
        05796358 09 n 01 hybrid 0 002 @ 05796126 n 0000 + 01417728 v 0103 | a composite of mixed origin; "the vice-presidency is a hybrid of administrative and legislative offices"
        01310936 05 n 03 hybrid 0 crossbreed 0 cross 0 007 @ 00004576 n 0000 + 01417728 v 0302 + 01417728 v 0201 + 01417728 v 0103 ~ 01311349 n 0000 ~ 01311480 n 0000 ~ 01311624 n 0000 | an organism that is the offspring of genetically dissimilar parents or stock; especially offspring produced by breeding plants or animals of different varieties or breeds or species; "a mule is a cross between a horse and a donkey"
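A data record starts with the synset offset, lexicographer file number, and synset type, followed by a hexadecimal word count, the words (each with a lex_id), a decimal pointer count, the pointers (four fields each), and finally the gloss after a `|`. A minimal reader for records like the ones above, as a sketch; it ignores the verb-frame fields that appear only in data.verb, and the function name and keys are my own.

```python
def parse_data_line(line):
    """Split one data.<pos> record into offset, words, pointers, and gloss."""
    record, _, gloss = line.partition(" | ")
    fields = record.split()
    w_cnt = int(fields[3], 16)  # word count is two hexadecimal digits
    # Words sit at every other position; the field after each is its lex_id.
    words = [fields[4 + 2 * i] for i in range(w_cnt)]
    p_start = 4 + 2 * w_cnt
    p_cnt = int(fields[p_start])  # pointer count is three decimal digits
    pointers = []
    for i in range(p_cnt):
        base = p_start + 1 + 4 * i
        symbol, target_offset, target_pos, source_target = fields[base:base + 4]
        pointers.append((symbol, target_offset, target_pos, source_target))
    return {
        "synset_offset": fields[0],
        "lex_filenum": fields[1],
        "ss_type": fields[2],
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }

synset = parse_data_line(
    "05796358 09 n 01 hybrid 0 002 @ 05796126 n 0000 + 01417728 v 0103 "
    "| a composite of mixed origin")
print(synset["words"], synset["pointers"])
```

Each pointer's target offset and part of speech identify another record, so following `@` pointers repeatedly walks up the hypernym hierarchy.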

  17. Example of Download Hybrid

  18. References • ACQUILEX I and II. http://www.cl.cam.ac.uk/Research/NL/acquilex/acqhome.html • Sponsored by the European Commission, centered at University of Cambridge • Last access: 01/25/06 • Litkowski, Kenneth C. Computational Lexicons and Dictionaries. http://www.clres.com/online-papers/ell.doc • Part of CL Research • Last access: 01/25/06 • Dolan et al. MindNet. http://research.microsoft.com/nlp/Projects/MindNet.aspx • Microsoft Research • Last access: 01/25/06 • WordNet. http://wordnet.princeton.edu/ • Princeton University • Last access: 01/25/06
