Machine Translation, Digital Libraries, and the Computing Research Laboratory. Indo-US Workshop on Digital Libraries June 23, 2003. The Computing Research Laboratory (CRL). New Mexico State University Las Cruces, New Mexico http://crl.nmsu.edu Stephen Helmreich (505) 646-2141
The Computing Research Laboratory (CRL) New Mexico State University Las Cruces, New Mexico http://crl.nmsu.edu Stephen Helmreich (505) 646-2141 Shelmrei@crl.nmsu.edu
Machine Translation (MT) • Component technologies • Comparable technologies • Composed technologies
MT -- Purposes • Dissemination (high quality): sublanguages, controlled languages • Assimilation (broad coverage) • Communication (speed)
MT -- Types • Direct – string-for-string • Transfer – structure-for-structure • Interlingual – to and from a meaning representation • Statistical – most probable translation given a corpus
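The contrast between the direct, transfer, and statistical types can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions: a hypothetical four-word English-to-Spanish lexicon, one adjective-noun reordering rule standing in for a transfer grammar, and an add-one-smoothed bigram model over a tiny made-up Spanish corpus standing in for "most probable translation given a corpus". A real statistical system would also use a translation model learned from parallel text, and the interlingual type is not shown. None of these names or data come from the slides.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical toy English->Spanish lexicon; illustration only.
LEXICON = {"the": "la", "red": "roja", "white": "blanca", "house": "casa"}
ADJECTIVES = {"red", "white"}

# Hypothetical tiny target-language corpus for the statistical sketch.
CORPUS = ["la casa roja", "la casa blanca", "una casa roja"]


def direct_translate(sentence: str) -> str:
    """Direct MT: look up each word and keep the source word order."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())


def transfer_translate(sentence: str) -> str:
    """Transfer MT (toy): apply one structural rule, ADJ NOUN -> NOUN ADJ,
    then look up each word. Assumes the word after an adjective is its noun."""
    words = sentence.lower().split()
    reordered, i = [], 0
    while i < len(words):
        if words[i] in ADJECTIVES and i + 1 < len(words):
            reordered += [words[i + 1], words[i]]  # swap adjective and noun
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    return " ".join(LEXICON.get(w, w) for w in reordered)


def bigram_log_prob(sentence: str, corpus: list[str]) -> float:
    """Add-one-smoothed bigram log probability of a target sentence."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for line in corpus:
        toks = ["<s>"] + line.split()
        vocab.update(toks)
        contexts.update(toks[:-1])          # tokens that precede another token
        bigrams.update(zip(toks, toks[1:]))
    toks = ["<s>"] + sentence.split()
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + len(vocab)))
        for a, b in zip(toks, toks[1:])
    )


def statistical_translate(sentence: str, corpus: list[str]) -> str:
    """Statistical MT (simplified): among all orderings of the dictionary
    output, return the one the target-language bigram model scores highest.
    Real systems also weigh a translation model; this sketch omits it."""
    words = direct_translate(sentence).split()
    candidates = sorted({" ".join(p) for p in permutations(words)})
    return max(candidates, key=lambda c: bigram_log_prob(c, corpus))


if __name__ == "__main__":
    src = "the red house"
    print(direct_translate(src))               # la roja casa
    print(transfer_translate(src))             # la casa roja
    print(statistical_translate(src, CORPUS))  # la casa roja
```

The direct output keeps English adjective-noun order, the transfer rule fixes it by restructuring, and the statistical sketch reaches the same order only because the corpus contains "casa roja". Enumerating permutations is factorial in sentence length, so this is usable only for very short inputs.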
Component technologies -- I • Character encoding and representation, text editing (Unicode) • Text segmenting (OCR, sandhi?) • Morphological analysis • Lexical annotation (part of speech tagging, proper name identification, others)
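Character encoding matters before any dictionary lookup: the same visible word can be stored as different Unicode code-point sequences. A minimal sketch using Python's standard `unicodedata` module, assuming a hypothetical one-entry lexicon, normalizes both headwords and input to NFC so that composed and decomposed spellings match. The Devanagari pair at the end (QA, U+0958, versus KA plus NUKTA) shows the kind of variation an Indian-language dictionary would need to absorb.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# Hypothetical lexicon; headwords are stored in NFC so lookups are stable.
LEXICON = {nfc("caf\u00e9"): "coffee shop"}


def lookup(word: str):
    """Look up a word after normalizing it the same way as the headwords."""
    return LEXICON.get(nfc(word))


if __name__ == "__main__":
    composed = "caf\u00e9"     # 'é' as one code point (U+00E9)
    decomposed = "cafe\u0301"  # 'e' + COMBINING ACUTE ACCENT (U+0301)
    print(composed == decomposed)            # False: different code points
    print(nfc(composed) == nfc(decomposed))  # True after normalization
    print(lookup(decomposed))                # coffee shop
    # Devanagari QA (U+0958) is canonically equivalent to KA + NUKTA.
    print(nfc("\u0958") == nfc("\u0915\u093c"))  # True
```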
Component technologies -- II • Syntactic analyzers (grammars, parsers) • Bilingual/multilingual dictionaries • Ontologies (WordNet, OntoSem, Cyc): lexical, linguistic, world knowledge • Generation systems
Comparable technologies • Information Retrieval (IR) (URSA) • Information Extraction (IE) (MUC) • Text Summarization (DUC) • Word Sense Disambiguation (SensEval) • Cross-Document Named Entity Identification (Coreference Resolution)
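Information retrieval is the first comparable technology listed. A common baseline ranks documents by TF-IDF cosine similarity to the query; the sketch below implements that textbook baseline only and is not the URSA system named on the slide. The documents and query are made up, and tokenization is plain lowercase whitespace splitting.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[str]):
    """Build sparse TF-IDF vectors (term -> weight) and the IDF table."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query: str, docs: list[str]) -> list[int]:
    """Return document indices ordered from most to least similar."""
    vectors, idf = tf_idf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query.lower().split()).items()}
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "machine translation of hindi text",
        "digital libraries store documents",
        "statistical machine translation uses a corpus",
    ]
    print(rank("statistical translation", docs))  # [2, 0, 1]
```

Terms that appear in every document get an IDF of zero, so they do not affect the ranking; a cross-language variant would translate the query (or the documents) before this step.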
Composed Technologies • All of the above (IR/IE/Summarization) • multi-lingual • multi-modal • with attention to human-computer interaction (HCI)
Composed technologies -- II • Personal Profiler – searches the web to find information about a particular person, translates it if appropriate, and organizes it in temporal order • Quick Ramp-up MT (Expedition) – allows a non-linguist language user and a computer expert to construct a simple MT system
Question-Answering Systems • Advanced Question Answering for Intelligence (AQUAINT) • MOQA – Meaning-Oriented Question Answering • Allows the user to pose structured or natural-language queries, obtains an answer from a variety of sources, and presents the answer appropriately
Summary • Choose an appropriate purpose and type • Look at related technologies: component, comparable, composed • Search for an appropriate research partner