
AVENUE Automatic Machine Translation for low-density languages



Presentation Transcript


  1. AVENUE Automatic Machine Translation for low-density languages. Ariadna Font Llitjós, Language Technologies Institute, SCS, Carnegie Mellon University

  2. 2 HCI project proposals • Interface to Online Bilingual and Multilingual Dictionaries • Translation Correction Tool interface: design, implementation and user studies

  3. Online Bilingual and Multilingual Dictionaries • bilingual and multilingual dictionaries for indigenous languages (Mapudungun [Chile], Inupiaq [Alaska], Aymara, Quechua and Aguaruna [Peru]) • For each bilingual/multilingual dictionary, we (will) have an Excel database created by the local teams (Mapudungun: built from a spoken corpus that was transcribed and translated into Spanish)

  4. Online Bilingual and Multilingual Dictionaries (cont.) For each entry, we give the translation in Spanish, some other linguistic information (POS), and a link to the actual sentence where the word appears in the corpus. For example: Püñpüñkünuukey: se manifiesta en forma de ronchas (nmlch-nmfhp1_x_0031_nmfhp_00) Mapu: Fey itrofillpüle kuerpu, ta pichike püñpüñkünuukey ta kalül may, peñi. Sp: Así es en todas partes del cuerpo, pequeñas ronchas se forman en el cuerpo pues, hermano.
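The entry described on this slide can be pictured as a small record. The following is only a sketch: the class name `DictionaryEntry` and all field names are assumptions made for illustration, not the schema of the project's Excel database, and the identifier in the example is assumed to be the reference to the corpus sentence. The slide does not give a POS value for the example, so that field is left empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    """One dictionary entry (hypothetical field names, not the project's schema)."""
    headword: str              # word in the indigenous language
    translation: str           # Spanish translation
    corpus_ref: str            # assumed: identifier of the corpus sentence
    source_sentence: str       # sentence in which the word appears
    sentence_translation: str  # Spanish translation of that sentence
    pos: Optional[str] = None  # part of speech; not given in the slide example


# The example entry from the slide, copied verbatim.
entry = DictionaryEntry(
    headword="Püñpüñkünuukey",
    translation="se manifiesta en forma de ronchas",
    corpus_ref="nmlch-nmfhp1_x_0031_nmfhp_00",
    source_sentence=(
        "Fey itrofillpüle kuerpu, ta pichike püñpüñkünuukey "
        "ta kalül may, peñi."
    ),
    sentence_translation=(
        "Así es en todas partes del cuerpo, pequeñas ronchas "
        "se forman en el cuerpo pues, hermano"
    ),
)
```

Keeping the corpus reference next to the sentence pair would let an interface show the entry first and the supporting sentence on demand.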

  5. Online Bilingual and Multilingual Dictionaries (cont.) • Currently, users can search for: • Mapudungun words • Spanish words • all the words starting with a letter • all the words containing a word or a string of characters
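The four kinds of search listed above can be sketched over an in-memory list of (Mapudungun, Spanish) pairs. This is a minimal illustration, not the dictionary's actual implementation: the function names and the sample list `ENTRIES` are assumptions, and only the first pair is taken from the slides.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (Mapudungun headword, Spanish translation)

# Illustrative sample only; just the first pair comes from the slides.
ENTRIES: List[Entry] = [
    ("Püñpüñkünuukey", "se manifiesta en forma de ronchas"),
    ("peñi", "hermano"),
    ("mapu", "tierra"),
]


def search_mapudungun(entries: List[Entry], word: str) -> List[Entry]:
    """Exact, case-insensitive match on the Mapudungun headword."""
    key = word.casefold()
    return [e for e in entries if e[0].casefold() == key]


def search_spanish(entries: List[Entry], word: str) -> List[Entry]:
    """Entries whose Spanish translation contains the given whole word."""
    key = word.casefold()
    return [e for e in entries if key in e[1].casefold().split()]


def starts_with(entries: List[Entry], letter: str) -> List[Entry]:
    """All headwords starting with a letter (or any prefix)."""
    key = letter.casefold()
    return [e for e in entries if e[0].casefold().startswith(key)]


def contains(entries: List[Entry], text: str) -> List[Entry]:
    """Entries whose headword or translation contains a string."""
    key = text.casefold()
    return [
        e for e in entries
        if key in e[0].casefold() or key in e[1].casefold()
    ]
```

For example, `starts_with(ENTRIES, "p")` returns the first two pairs, and `contains(ENTRIES, "püñ")` returns only the first. A linear scan is enough for a sketch; a real web interface would more likely query an indexed database.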

  6. Online Bilingual and Multilingual Dictionaries (cont.) • Primary users: • people in the indigenous communities • researchers in these countries, inside and outside the indigenous communities Chilean case: • product of the Ministry of Education. • students and teachers, mostly Mapuche, but maybe some Spanish users as well

  7. Online Bilingual and Multilingual Dictionaries (cont.) • Secondary users • Linguistics, Lexicography and Anthropology researchers from all over the world • casual visitors browsing the web

  8. Online Dictionaries: Tasks for HCII project • analyze the design of the basic web interface: given a query for a word in either language, it presents the information for that entry to the user in the other language. • how to incorporate an audio file with the word as it was pronounced in the spoken corpus. • how to make it interactive, i.e. have bilingual users comment on the entries and possibly add new entries (need profile info)

  9. Translation Correction Tool (TCTool) • AVENUE is a project that develops Automatic Machine Translation Systems (MTS) for low-density languages • Since translations are automatic, i.e. not perfect, we need to refine them. • Instead of relying on a professional translator, we want to find an automatic way to refine the output of the MTS -> TCTool

  10. TCTool • We can use the TCTool to automatically learn a refinement of the transfer rules in our MTS from user input • Challenges: • users are most likely not familiar with computers -> user-friendly and intuitive interface • bilingual informants can’t be assumed to have any linguistic knowledge

  11. Automatic Machine Translation [Diagram. Labels: analysis, interpretation, Interlingua, generation, Transfer rules, Corpus-based methods]

  12. Automatic Learning of a Transfer-based MTS [Diagram. Labels: Elicitation corpus, SVS algorithm, tentative Transfer rules, Transfer module, Rule Refinement module, SL sentences, (tentative) TL sentences]

  13. Interactive and Automatic rule refinement • Interactive step (TCTool): given an MTS, translate sentences and present them to the users for minimal correction (interface design, MT error classification) • Automatic step: machine learning DS and algorithms to map user input to refined transfer rules

  14. TCTool: Tasks for HCII project • analyze the design of the basic web interface: given a translated sentence, it asks the user to minimally correct it, if incorrect, and to classify the error(s). • how to explain what minimal correction is • what is the right error classification for non-expert and non-linguist users • can naïve users reliably pinpoint the source of errors? • design user studies to show reliability of user input (Spanish – English, English – Spanish, English – Chinese)
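One way to make "minimal correction" concrete is to compare the MT output with the user's corrected sentence token by token and list only the spans that changed. The sketch below uses Python's standard `difflib.SequenceMatcher` for this; it is an illustration under stated assumptions, not the TCTool's method. The function name `correction_edits`, whitespace tokenization, and the example sentences are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Edit = Tuple[str, List[str], List[str]]  # (operation, MT tokens, corrected tokens)


def correction_edits(mt_output: str, corrected: str) -> List[Edit]:
    """Return the token spans that differ between MT output and correction.

    Tokenization is naive whitespace splitting (an assumption made for
    this sketch). Each edit is tagged 'replace', 'delete' or 'insert'.
    """
    mt_tokens = mt_output.split()
    fixed_tokens = corrected.split()
    matcher = SequenceMatcher(a=mt_tokens, b=fixed_tokens, autojunk=False)
    edits: List[Edit] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the changed spans
            edits.append((tag, mt_tokens[i1:i2], fixed_tokens[j1:j2]))
    return edits


# Hypothetical MT output and user correction.
edits = correction_edits("the houses is big", "the house is big")
# edits == [("replace", ["houses"], ["house"])]
```

A short list of edits suggests the user changed only what was wrong, and each edit is a natural unit to attach an error classification to. An empty list means the user accepted the translation as is.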

  15. AVENUE project members • LTI team, researchers: Jaime Carbonell, Lori Levin, Alon Lavie, Ralf Brown • LTI team, Ph.D. students: Ariadna Font Llitjós, Christian Monson, Erik Peterson, Katharina Probst • AVENUE External Project Coordinator: Rodolfo M Vega • Chilean team: Eliseo Cañulef, Luis Caniupil Huaiquiñir, Hugo Carrasco, Marcela Collio Calfunao, Rosendo Huisca, Cristian Carrillan Anton, Hector Painequeo, Salvador Cañulef, Flor Caniupil, Claudio Millacura

  16. Questions? For more information: http://www.cs.cmu.edu/~aria/avenue/
