
Bridging the Gap: Cutting Edge Technologies Working for Lesser-Resourced Languages

Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Kathrin Probst


Presentation Transcript


  1. Bridging the Gap: Cutting Edge Technologies Working for Lesser-Resourced Languages. Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Kathrin Probst

  2. MT Challenges. [Diagram: the MT pyramid from Source (e.g. Quechua) to Target (e.g. English). Top: Interlingua. Upper level: Semantic Analysis, Sentence Planning. Middle level: Syntactic Parsing, Transfer Rules, Text Generation. Base: Direct (SMT, EBMT).]

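  3. MT Challenges. [Diagram: the MT pyramid from Source (e.g. Quechua) to Target (e.g. English). Top: Interlingua. Upper level: Semantic Analysis, Sentence Planning. Middle level: Syntactic Parsing, Transfer Rules, Text Generation. Base: Direct (SMT, EBMT).] Callout on the upper levels: need human expertise, but high quality.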

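  4. MT Challenges. [Diagram: the MT pyramid from Source (e.g. Quechua) to Target (e.g. English). Top: Interlingua. Upper level: Semantic Analysis, Sentence Planning. Middle level: Syntactic Parsing, Transfer Rules, Text Generation. Base: Direct (SMT, EBMT).] Callout on the upper levels: need human expertise, but high quality. Callout on Direct (SMT, EBMT): need a large bilingual corpus, but fast to develop.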

  5. AVENUE MT Approach. [Diagram: the MT pyramid from Source (e.g. Quechua) to Target (e.g. English). Top: Interlingua. Upper level: Semantic Analysis, Sentence Planning. Middle level: Syntactic Parsing, Transfer Rules, Text Generation. Base: Direct (SMT, EBMT).] AVENUE: Automate Rule Learning.

  6. AVENUE MT Approach. [Diagram: the MT pyramid from Source (e.g. Quechua) to Target (e.g. English). Top: Interlingua. Upper level: Semantic Analysis, Sentence Planning. Middle level: Syntactic Parsing, Transfer Rules, Text Generation. Base: Direct (SMT, EBMT).] AVENUE: Automate Rule Learning • Leverage Linguistic Structure • Utilize Bilingual Speakers

  7. Mapudungun: 900,000 Speakers • Inupiaq: 100s of Speakers • Marcello’s Languages?: 100s of Speakers • Quechua: 6 Million Speakers

  8. Three Sub-Problems • Morphology Induction • Initial Syntax Learning • Syntax Refinement

  9. Morphology Induction: Paradigms Organize Morphology

  10. Paradigm Discovery in 3 Steps • Search for partial paradigms in a network of candidates • Cluster overlapping partial paradigms • Filter the clusters, keeping the largest clusters, those most likely to model true paradigms
      Figure: A Portion of a Spanish paradigm candidate network. Each node lists its candidate suffixes, its stem count, and sample stems:
      e.er.erá.ido.ieron.ió (28): deb, escog, ofrec, reconoc, vend, ...
      e.ido.ieron.ir.irá.ió (28): asist, dirig, exig, ocurr, sufr, ...
      azar.e.ido.ieron.ir.ió (1): sal
      e.er.erá.ieron.ió (32): deb, padec, romp, ...
      e.erá.ido.ieron.ió (28): deb, escog, ...
      e.er.ido.ieron.ió (46): deb, parec, recog, ...
      e.ido.ieron.irá.ió (28): asist, dirig, ...
      e.ido.ieron.ir.ió (39): asist, bat, sal, ...
      e.ido.ieron.ió (86): asist, deb, hund, ...
      e.erá.ieron.ió (32): deb, padec, ...
      er.ido.ieron.ió (58): ascend, ejerc, recog, ...
      ido.ieron.ir.ió (44): interrump, sal, ...
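The candidate network on this slide can be illustrated with a small Python sketch. It is not the authors' implementation: the vocabulary, the function names (`stems_for`, `all_suffixes`, `grow`), and the `min_ratio` stopping rule are all assumptions made for illustration. It shows only how a node pairs a suffix set with the stems that take every suffix in it, and how a greedy upward search (a simplified stand-in for step 1) adds suffixes while enough stems survive. Clustering and filtering are not covered.

```python
"""Toy sketch of a paradigm candidate network (illustrative assumptions only)."""

# Hypothetical miniature vocabulary: two -er verbs and two -ir verbs,
# each inflected with six suffixes that appear on the slide.
ER = ["e", "er", "erá", "ido", "ieron", "ió"]
IR = ["e", "ido", "ieron", "ir", "irá", "ió"]
VOCAB = {stem + suf for stem in ("deb", "vend") for suf in ER} | {
    stem + suf for stem in ("asist", "sufr") for suf in IR
}


def stems_for(suffixes, vocab):
    """Return the stems t such that t + s is a word for every suffix s."""
    suffixes = list(suffixes)
    first = suffixes[0]
    # Any qualifying stem must at least combine with the first suffix.
    candidates = {
        w[: len(w) - len(first)]
        for w in vocab
        if w.endswith(first) and len(w) > len(first)
    }
    return {t for t in candidates if all(t + s in vocab for s in suffixes)}


def all_suffixes(vocab):
    """Every non-empty proper word ending is a candidate suffix."""
    return {w[i:] for w in vocab for i in range(1, len(w))}


def grow(seed, vocab, min_ratio=0.5):
    """Greedy upward search from a one-suffix node.

    Repeatedly add the suffix that keeps the most stems; stop when no suffix
    keeps any stem, or when the best one keeps fewer than min_ratio of the
    current stems (an assumed stopping rule).
    """
    scheme = {seed}
    stems = stems_for(scheme, vocab)
    pool = all_suffixes(vocab)
    while True:
        best, best_stems = None, set()
        for s in sorted(pool - scheme):  # sorted for deterministic ties
            kept = stems_for(scheme | {s}, vocab)
            if len(kept) > len(best_stems):
                best, best_stems = s, kept
        if best is None or len(best_stems) < min_ratio * len(stems):
            return frozenset(scheme), stems
        scheme.add(best)
        stems = best_stems


if __name__ == "__main__":
    for seed in ("er", "ir"):
        suffixes, stems = grow(seed, VOCAB)
        print(".".join(sorted(suffixes)), len(stems), sorted(stems))
```

Adding a suffix to a node can only shrink its stem set, which matches the pattern in the figure: e.ido.ieron.ió has 86 stems, while the larger e.ido.ieron.ir.ió has 39. In the toy vocabulary, calling `grow("e", VOCAB, min_ratio=0.75)` stops at the suffixes shared by both verb classes, a partial paradigm of the kind the clustering step would later merge with its neighbours.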

  11. Morpho Challenge 2007: Unsupervised Morphology Induction Competition • English: 3rd Place Overall; bested the strong baseline Morfessor (Creutz, 2006) • German: 1st Place with Combined ParaMor-Morfessor System

  12. Syntax Induction
