Josef van Genabith & Andy Way TransBooster (2003-2006)

Previous MT Work & GramLab • Josef van Genabith & Andy Way • TransBooster (2003-2006) • LaDEva: Labelled Dependency-Based MT Evaluation (2006-2008) • GramLab (2001-2008)

TransBooster • TransBooster (2003-2006) • Enterprise Ireland funded Basic Research Project • PI: Josef van Genabith, CoI: Andy Way • Students: Bart Mellebeek, Anna Khasin, Karolina Owczarzak

TransBooster • TransBooster Basic Idea: • MT systems are better on short (= simple) sentences than on longer ones. • Capitalise on this! • Divide up long sentences (automatically) into shorter components • Feed those components to MT system • Translate (get better results for shorter components) • Put (better) translations together in target (= get better translation) • A bit like Controlled Language, but automatic and without the restrictions (to particular syntax etc.)!
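
The divide / translate / recombine loop described above can be sketched roughly as follows. This is only a minimal illustration, not the TransBooster implementation: `split_into_components` breaks at commas as a naive stand-in for the parser-based decomposition, `toy_mt` is a hypothetical placeholder for a real MT engine, and the `max_words` threshold and the example sentence are invented for the sketch.

```python
from typing import Callable, List


def split_into_components(sentence: str, max_words: int = 8) -> List[str]:
    """Naive stand-in for parser-based decomposition (an assumption of this
    sketch): break at commas, but only if the sentence exceeds max_words."""
    if len(sentence.split()) <= max_words:
        return [sentence]
    parts = [p.strip() for p in sentence.split(",")]
    return [p for p in parts if p]


def boosted_translate(sentence: str, mt: Callable[[str], str],
                      max_words: int = 8) -> str:
    """Translate each component separately, then join the results in order."""
    components = split_into_components(sentence, max_words)
    return ", ".join(mt(c) for c in components)


def toy_mt(text: str) -> str:
    """Hypothetical MT engine for illustration only: upper-cases its input."""
    return text.upper()


if __name__ == "__main__":
    long_sentence = ("The committee, which met for three hours on Monday, "
                     "approved the new budget")
    print(split_into_components(long_sentence))
    print(boosted_translate(long_sentence, toy_mt))
```

The wrapper only needs a callable that maps a source string to a target string, which is why the same idea can sit in front of different kinds of MT engine.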

TransBooster TransBooster Example

TransBooster • Wrapper technology • Tricks MT system to produce better results …

TransBooster • TransBooster needs • Good parsers • Head and argument/adjunct finding rules • TransBooster with • Rule-Based MT (Systran, Logomedia) • Example-Based MT (DCU system) • Statistical MT (standard Aachen PBSMT) • Multi-engine MT • Improves results! => full details in Bart Mellebeek’s PhD & publications

TransBooster Bart Mellebeek’s PhD dissertation 2007

LaDEva • LaDEva: Labelled Dependency Based Evaluation for MT (2005-2008) • Microsoft Ireland funded Basic Research Project • PIs: Josef van Genabith/Andy Way • Students: Karolina Owczarzak

LaDEva • Basic Idea: • Automatic evaluation methods extremely important for MT • String-based MT evaluation (BLEU etc.) unfairly penalises perfectly valid • - lexical variation/paraphrases • - syntactic variation/paraphrases • Compare: • John resigned yesterday. • Yesterday, John quit. • Use labelled dependencies (instead of surface strings) for automatic evaluation
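
The contrast between string-based and dependency-based matching can be illustrated with the sentence pair above. This is a rough sketch under several assumptions: the bigram F-score is a simple stand-in for string-based metrics (it is not BLEU), the dependency triples are written by hand rather than produced by a parser, the labels `subj` and `adjunct` are illustrative, and the `SYNONYMS` table is a hypothetical substitute for a lexical resource.

```python
import re
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (label, head, dependent)


def bigrams(sentence: str) -> Set[Tuple[str, str]]:
    """Lower-case the sentence, drop punctuation, return its word bigrams."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return set(zip(tokens, tokens[1:]))


def f_score(candidate: set, reference: set) -> float:
    """Harmonic mean of precision and recall over two sets of items."""
    matched = len(candidate & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


def normalise(triples: Set[Triple], synonyms: Dict[str, str]) -> Set[Triple]:
    """Map each lemma to a canonical synonym before matching."""
    return {(lab, synonyms.get(h, h), synonyms.get(d, d)) for lab, h, d in triples}


REFERENCE = "John resigned yesterday."
CANDIDATE = "Yesterday, John quit."

# Hand-written labelled dependencies (assumed, not parser output).
REF_DEPS = {("subj", "resign", "john"), ("adjunct", "resign", "yesterday")}
CAND_DEPS = {("subj", "quit", "john"), ("adjunct", "quit", "yesterday")}
SYNONYMS = {"quit": "resign"}  # hypothetical stand-in for a synonym lookup

string_score = f_score(bigrams(CANDIDATE), bigrams(REFERENCE))
dep_score = f_score(normalise(CAND_DEPS, SYNONYMS), normalise(REF_DEPS, SYNONYMS))

if __name__ == "__main__":
    print("bigram F-score:    ", string_score)
    print("dependency F-score:", dep_score)
```

Word order does not affect the dependency triples, so the reordering is not penalised; the lexical difference (resigned vs. quit) is only bridged once synonyms are normalised.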

LaDEva LaDEva example (syntactic variation): Use WordNet and PBSMT alignments for lexical variation …

LaDEva • LaDEva needs • Very (!) robust dependency parsers that can parse MT output (as opposed to grammatical language) • DCU GramLab treebank-based LFG parsers • Microsoft Parsers • WordNet, PBSMT alignments • Evaluate LaDEva against • BLEU • NIST • GTM • Meteor • in terms of correlation with human judgments

LaDEva Karolina Owczarzak’s PhD thesis 2008

GramLab • GramLab (2001 – 2008) • - Automatic Annotation of Penn-II Treebank with LFG F-Structures (2001-2004) Enterprise Ireland funded Basic Research Project • Team: PI: Josef van Genabith, CoI: Andy Way, Aoife Cahill, Mairead McCarthy, Mick Burke, Ruth O’Donovan • - GramLab: Chinese, Japanese, Arabic, Spanish, French, German, English (2004-2008) Science Foundation Ireland funded Principal Investigatorship • Team: PI: Josef van Genabith, Grzegorz Chrupala, Natalie Schluter, Ines Rehbein, Yuqing Guo, Masanori Oya, Amine Akrout, Dr. Aoife Cahill, Dr. Yaffa Al-Raheb, Dr. Deirdre Hogan, Dr. Sisay Adafre, Dr. Lamia Tounsi, Dr. Mohammed Attia

GramLab • GramLab (2001 – 2008) • Basic Idea: • Handcrafting deep wide coverage grammars is time-consuming, expensive and difficult to scale to unrestricted text. • Acquire grammars automatically from treebanks => shallow grammars • New: acquire deep grammars automatically from treebanks

GramLab • Shallow Grammar: defines a language as a set of strings and associates a syntactic structure with each string • Deep Grammar: shallow grammar + maps strings to information (meaning, dependencies, predicate argument structure – “who did what to whom”) + non-local dependency resolution
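
The difference between the two kinds of output can be pictured with a small data-structure sketch. Everything here is illustrative and written by hand: the example sentence "Who did John see?", the bracketed tree, the attribute names in the f-structure-style dictionary, and the helper functions are assumptions of this sketch, not output of the GramLab parsers.

```python
from typing import List, Tuple

# Shallow analysis: a phrase-structure tree over the string, nothing more.
shallow_tree = (
    "SBARQ",
    ("WHNP", "Who"),
    ("SQ", ("VBD", "did"), ("NP", "John"), ("VP", ("VB", "see"))),
)

# Deep analysis: attribute-value structure with predicate-argument information.
# The fronted "who" is resolved as the object of "see" (non-local dependency):
# FOCUS and OBJ point to the very same dictionary.
who = {"PRED": "who"}
deep_fstructure = {
    "PRED": "see<SUBJ,OBJ>",
    "TENSE": "past",
    "FOCUS": who,
    "SUBJ": {"PRED": "John"},
    "OBJ": who,
}


def leaves(tree) -> List[str]:
    """Read the word string back off a (label, child, ...) tree."""
    _label, *children = tree
    words: List[str] = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def who_did_what_to_whom(f: dict) -> Tuple[str, str, str]:
    """Extract (subject, predicate, object) from the deep structure."""
    verb = f["PRED"].split("<")[0]
    return (f["SUBJ"]["PRED"], verb, f["OBJ"]["PRED"])


if __name__ == "__main__":
    print(leaves(shallow_tree))
    print(who_did_what_to_whom(deep_fstructure))
```

The tree alone records where "Who" sits in the string; only the deep structure states that it is the thing being seen.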

GramLab • Probabilistic Parsing & Probabilistic Generation • Used in MT evaluation (Karolina Owczarzak) and a question answering system (Sisay Adafre) • Outperforms best hand-crafted resources (XLE, RASP) for English • Lots of publications, including 2 Computational Linguistics journal papers and 6 ACL, COLING, EMNLP papers (2004-2008) • Aoife Cahill, Michael Burke, Ruth O'Donovan, Stefan Riezler, Josef van Genabith and Andy Way, Wide-Coverage Deep Statistical Parsing using Automatic Dependency Structure Annotation, Computational Linguistics, 2008 • Ruth O'Donovan, Michael Burke, Aoife Cahill, Josef van Genabith and Andy Way, Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks, Computational Linguistics, 2005 • Transfer-based probabilistic data-driven MT … (Yvette Graham) • LORG industry-strength parsers and generators for IE/IR & QA (Jennifer & Deirdre)
