AVENUE: Automatic Machine Translation for low-density languages • Ariadna Font Llitjós • Language Technologies Institute, SCS, Carnegie Mellon University
2 HCI project proposals • Interface to Online Bilingual and Multilingual Dictionaries • Translation Correction Tool interface: design, implementation and user studies
Online Bilingual and Multilingual Dictionaries • Bilingual and multilingual dictionaries for indigenous languages (Mapudungun [Chile], Inupiaq [Alaska], Aymara, Quechua and Aguaruna [Peru]) • For each bilingual/multilingual dictionary, we have (or will have) an Excel database created by the local teams (Mapudungun: built from a spoken corpus transcribed and translated into Spanish)
Online Bilingual and Multilingual Dictionaries (cont.) For each entry, we give the Spanish translation, some additional linguistic information (POS), and a link to the corpus sentence in which the word appears. For example: • Püñpüñkünuukey: se manifiesta en forma de ronchas ('appears as welts') • corpus sentence: nmlch-nmfhp1_x_0031_nmfhp_00 • Mapu: Fey itrofillpüle kuerpu, ta pichike püñpüñkünuukey ta kalül may, peñi. • Sp: Así es en todas partes del cuerpo, pequeñas ronchas se forman en el cuerpo pues, hermano ('That is how it is all over the body; small welts form on the body, you see, brother')
Online Bilingual and Multilingual Dictionaries (cont.) • Currently, users can search for: • Mapudungun words • Spanish words • all the words starting with a given letter • all the words containing a word or a string of characters
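The four search modes listed above can be sketched in a few lines. This is a minimal illustration, not the dictionary's actual implementation: the `Entry` fields, the `search` function and the mode names are hypothetical, and the sketch assumes that prefix and substring searches apply to Mapudungun headwords. The sample data is the example entry from the previous slide.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class Entry:
    """A dictionary entry; field names are hypothetical."""
    mapudungun: str  # headword
    spanish: str     # Spanish translation (gloss)
    corpus_id: str   # ID of the corpus sentence where the word occurs
    pos: str = ""    # part of speech, if known


def _norm(text: str) -> str:
    # NFC-normalize and case-fold so that e.g. "Ü" and "ü" (precomposed or not) compare equal
    return unicodedata.normalize("NFC", text).casefold()


def search(entries: list[Entry], query: str, mode: str) -> list[Entry]:
    """Return the entries matching `query` under one of four (hypothetical) modes."""
    q = _norm(query)
    if mode == "mapudungun":   # exact Mapudungun headword
        return [e for e in entries if _norm(e.mapudungun) == q]
    if mode == "spanish":      # Spanish word occurring in the gloss
        return [e for e in entries if q in re.findall(r"\w+", _norm(e.spanish))]
    if mode == "starts_with":  # headwords starting with a given letter
        return [e for e in entries if _norm(e.mapudungun).startswith(q)]
    if mode == "contains":     # headwords containing a word or substring
        return [e for e in entries if q in _norm(e.mapudungun)]
    raise ValueError(f"unknown search mode: {mode!r}")


if __name__ == "__main__":
    entries = [Entry("Püñpüñkünuukey", "se manifiesta en forma de ronchas",
                     "nmlch-nmfhp1_x_0031_nmfhp_00")]
    print([e.mapudungun for e in search(entries, "ronchas", "spanish")])  # ['Püñpüñkünuukey']
```

Normalizing and case-folding both the query and the stored text keeps queries such as `PÜÑPÜÑKÜNUUKEY`, or a `ü` typed as two code points, from silently missing entries.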
Online Bilingual and Multilingual Dictionaries (cont.) • Primary users: • people in the indigenous communities • researchers in these countries, inside and outside the indigenous communities • Chilean case: • a product of the Ministry of Education • students and teachers, mostly Mapuche, but possibly some Spanish-speaking users as well
Online Bilingual and Multilingual Dictionaries (cont.) • Secondary users: • researchers in linguistics, lexicography and anthropology from all over the world • casual visitors browsing the Web
Online Dictionaries: Tasks for HCII project • Analyze the design of the basic web interface: given a query for a word in either language, it presents the information for that entry in the other language. • How to incorporate an audio file of the word as it was pronounced in the spoken corpus. • How to make it interactive, i.e., let bilingual users comment on entries and possibly add new ones (requires user profile information)
Translation Correction Tool (TCTool) • AVENUE is a project that develops Automatic Machine Translation Systems for low-density languages • Since the translations are automatic, and therefore imperfect, they need to be refined. • Instead of relying on a professional translator, we want an automatic way to refine the output of the MT system (MTS): the TCTool
TCTool • We can use the TCTool to automatically learn refinements of the Transfer rules in our MTS from user input • Challenges: • users are most likely not familiar with computers, so the interface must be user-friendly and intuitive • bilingual informants cannot be assumed to have any linguistic knowledge
Automatic Machine Translation [Figure: approaches to MT, from analysis to generation: interlingua (interpretation), transfer rules, corpus-based methods]
Automatic Learning of a Transfer-based MTS [Figure components: Elicitation corpus (SL sentences, TL sentences), SVS algorithm, tentative Transfer rules, Transfer module, (tentative) TL sentences, Rule Refinement module]
Interactive and Automatic rule refinement • Interactive step (TCTool): given an MTS, translate sentences and present them to users for minimal correction (interface design, MT error classification) • Automatic step: machine learning data structures and algorithms to map user input to refined transfer rules
TCTool: Tasks for HCII project • Analyze the design of the basic web interface: given a translated sentence, it asks the user to minimally correct it, if it is incorrect, and to classify the error(s). • How to explain what a minimal correction is • What is the right error classification for non-expert, non-linguist users? • Can naïve users reliably pinpoint the source of errors? • Design user studies to show the reliability of user input (Spanish – English, English – Spanish, English – Chinese)
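One possible way to make "minimal correction" concrete, both for explaining it in the interface and for checking it in user studies, is to compare the MT output with the user's correction word by word. The sketch below uses Python's standard `difflib.SequenceMatcher`; the functions `word_edits` and `is_minimal`, the 0.5 threshold and the example sentence are all hypothetical and not part of the TCTool.

```python
from difflib import SequenceMatcher


def word_edits(mt_output: str, corrected: str) -> list[tuple[str, str, str]]:
    """Word-level edits (op, old, new) that turn the MT output into the correction."""
    src, tgt = mt_output.split(), corrected.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # 'replace', 'delete' or 'insert'
            edits.append((op, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


def is_minimal(mt_output: str, corrected: str, max_changed_ratio: float = 0.5) -> bool:
    """Hypothetical check: reject corrections that rewrite too much of the sentence."""
    changed = sum(
        max(len(old.split()), len(new.split()))
        for _, old, new in word_edits(mt_output, corrected)
    )
    return changed <= max_changed_ratio * max(len(mt_output.split()), 1)


if __name__ == "__main__":
    mt, fixed = "the house red is big", "the red house is big"
    print(word_edits(mt, fixed))  # [('insert', '', 'red'), ('delete', 'red', '')]
    print(is_minimal(mt, fixed))  # True
```

Note that a single word-order fix shows up as an insert plus a delete, so raw edit counts alone cannot tell error types apart; that part would still depend on the user's error classification.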
AVENUE project members • LTI team: • Researchers: Jaime Carbonell, Lori Levin, Alon Lavie, Ralf Brown • Ph.D. students: Ariadna Font Llitjós, Christian Monson, Erik Peterson, Katharina Probst • AVENUE External Project Coordinator: Rodolfo M. Vega • Chilean team: Eliseo Cañulef, Luis Caniupil Huaiquiñir, Hugo Carrasco, Marcela Collio Calfunao, Rosendo Huisca, Cristian Carrillan Anton, Hector Painequeo, Salvador Cañulef, Flor Caniupil, Claudio Millacura
Questions? For more information: http://www.cs.cmu.edu/~aria/avenue/