Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics Thesis Proposal Elisabet Comelles Supervisors: Irene Castellon and Victoria Arranz
Outline • Introduction • State of the Art • Discussion of MT Evaluation Metrics • Hypothesis & Objective • Methodology & Schedule
Introduction • Quick access to multilingual information • Need for quick translation • Sharp increase in the number of MT systems • Need to evaluate those MT systems • Evaluation needs to be quick and reliable
Introduction • Current and most used Evaluation Metrics show problems • New approaches to Evaluation using linguistic information: • Syntactic info • Semantic info • Our scenario: • Comparison between already existing systems • Direction of translation to test: English-Spanish
State of the Art • MT development is closely linked to MT Evaluation • Purpose of the evaluation methods: • Error analysis • System comparison • Chronologically: • Human MT Evaluation • Automatic MT Evaluation
State of the Art: Types of MT Evaluation • Focused on Context: • Context-based Evaluation (FEMTI) • Evaluates suitability of the MT Technology & the MT System for the user’s purpose • Parameters of analysis: functionality, reliability, usability, efficiency, maintainability, portability, cost, etc. • Focused on Quantity & Quality: • Human Evaluation and Automatic Evaluation
State of the Art: Types of MT Evaluation • Human Evaluation: • Several approaches: • Fidelity (ALPAC report) • Intelligibility (ALPAC report) • Comprehensive evaluation of informativeness (ARPA) • Quality panel evaluation • Adequacy and Fluency (Semantics and Syntax) • Preferred Translation • Required Post-Editing
State of the Art: Types of MT Evaluation • Human Evaluation: • Advantage: human evaluators can evaluate the overall quality of the system • Disadvantages: • Time-consuming • Expensive • Subjective
State of the Art: Types of MT Evaluation • Automatic Evaluation: • Approaches: • Based on Lexical Matching • Based on Syntax • Based on Semantics
State of the Art: Types of MT Evaluation • Based on Lexical Matching: • Dominant approach to Automatic MT Evaluation • Searches for lexical similarities between the MT output and reference translations • Types: • Edit Distance Measures (WER) • Precision-oriented Measures (BLEU; see the sketch below) • Recall-oriented Measures (ROUGE) • Measures balancing Precision & Recall (GTM)
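To make the lexical-matching idea concrete, here is a minimal sketch of BLEU's core computation: clipped n-gram precisions combined by a geometric mean and scaled by a brevity penalty. It is a simplified single-reference, sentence-level illustration, not the official multi-reference, corpus-level implementation:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Multiset of contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        clipped = sum(min(c, ref[g]) for g, c in cand.items())  # clip by reference counts
        if clipped == 0:
            return 0.0  # no smoothing: an empty n-gram overlap zeroes the score
        precisions.append(clipped / sum(cand.values()))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(bleu(cand, ref, max_n=2))  # ~0.707 (1-gram precision 5/6, 2-gram 3/5)
```

max_n=2 is used in the example because short sentences often have no 4-gram matches at all, which zeroes the unsmoothed score.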
State of the Art: Types of MT Evaluation • Based on Syntax: • Recently developed • Focused on the syntax of the output sentence • Types: • Constituency Parsing • Dependency Parsing (see the sketch below) • Combination of both analyses (Liu & Gildea 2005)
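The dependency variant of this family can be illustrated as follows: parse both the candidate and the reference, then score the overlap of their (head, relation, dependent) triples. The sketch below assumes the triples come from some dependency parser; the toy triples are hypothetical, and real metrics such as Liu & Gildea's also reward partial matches:

```python
from collections import Counter

def dep_overlap_f1(cand_triples, ref_triples):
    """F1 over (head, relation, dependent) triples of two parses.

    Both arguments are lists of triples, e.g. produced by any
    dependency parser run on the candidate and reference sentences.
    """
    cand, ref = Counter(cand_triples), Counter(ref_triples)
    matched = sum(min(c, ref[t]) for t, c in cand.items())
    if matched == 0:
        return 0.0
    precision = matched / sum(cand.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy parses (hypothetical parser output) for
#   candidate: "the cat sat"   reference: "the cat sits"
cand = [("sat", "nsubj", "cat"), ("cat", "det", "the")]
ref = [("sits", "nsubj", "cat"), ("cat", "det", "the")]
print(dep_overlap_f1(cand, ref))  # 0.5: only the det arc matches
```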
State of the Art: Types of MT Evaluation • Based on Semantics: • Recently developed • Focused on the semantics of the output sentence • Types: • NEs: Quality over Named Entities (NEE) • Semantic Roles: Similarities over Semantic Roles (SR; see the sketch below)
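A rough sketch of the Semantic Roles idea: after semantic role labelling of candidate and reference, compare the token sets filling each role. The per-role Jaccard average below is an illustrative simplification, assuming SRL output with PropBank-style labels (A0, A1, AM-*):

```python
def role_overlap(cand_roles, ref_roles):
    """Average per-role lexical overlap (Jaccard) between two SRL analyses.

    Each argument maps a role label (e.g. "A0", "AM-TMP") to the set of
    tokens filling that role, as produced by a semantic role labeller.
    """
    labels = set(cand_roles) | set(ref_roles)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        c = cand_roles.get(label, set())
        r = ref_roles.get(label, set())
        union = c | r
        scores.append(len(c & r) / len(union) if union else 0.0)
    return sum(scores) / len(scores)

cand = {"A0": {"the", "cat"}, "A1": {"the", "mouse"}}
ref = {"A0": {"the", "cat"}, "AM-TMP": {"yesterday"}}
print(role_overlap(cand, ref))  # (1 + 0 + 0) / 3 = 0.333...
```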
Discussion of MT Evaluation Metrics • Human Evaluation: • Advantages: • Allows evaluating the overall quality • Disadvantages: • Time-consuming • Expensive • Subjective
Discussion of MT Evaluation Metrics • Automatic Evaluation: • Advantages: • Fast • Inexpensive • Objective • Updatable • Disadvantages?
Discussion of MT Evaluation Metrics • Automatic Metrics based on Lexical Matching: • Great advance in MT research in the last decade • Widely accepted & used by the SMT research community • BLEU is the most used Automatic Metric • Criticized by those not developing SMT systems • Usually depend on translation references • Only take into account lexical similarities & disregard syntax • Biased
Discussion of MT Evaluation Metrics • Automatic Metrics based on Syntax: • Good improvement • Work at sentence level • Only focused on syntax • What about meaning? • Automatic Metrics based on Semantics: • Good improvement • Only NEs & Semantic Roles • NEs not too relevant • Need further development • Only focused on meaning; what about syntax?
Discussion of MT Evaluation Metrics • Discussion of Automatic Metrics: • Each metric focuses on a partial aspect of quality • Strongly biased evaluations • Unfair comparison between systems • Overtuning of the system • Need for integration of metrics (see the sketch below) • Parametric vs. Non-parametric • Evaluation of the quality of a metric combination • Human likeness • Human acceptability
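As a concrete illustration of metric integration, the sketch below combines normalized metric scores. With no weights it is a uniform average, i.e. a non-parametric combination; supplying weights tuned against human judgements would make it parametric. The metric names and numbers are hypothetical:

```python
def combine(scores, weights=None):
    """Linear combination of per-metric scores, each normalized to [0, 1].

    Without weights this is a uniform (non-parametric) average; weights
    tuned on human judgements turn it into a parametric combination.
    """
    if weights is None:
        weights = {metric: 1.0 for metric in scores}
    total = sum(weights[metric] for metric in scores)
    return sum(weights[metric] * value for metric, value in scores.items()) / total

scores = {"bleu": 0.31, "dep_f1": 0.58, "role_overlap": 0.45}
print(combine(scores))                                                # uniform: ~0.447
print(combine(scores, {"bleu": 1, "dep_f1": 2, "role_overlap": 1}))   # parametric: 0.48
```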
Hypothesis & Objective • Hypothesis: Adding new linguistic information will improve the performance of Automatic Metrics • Main Objective: Proposing a new Automatic Evaluation Metric based on linguistic information.
Hypothesis & Objective • Secondary Objectives: • Explore linguistic information: • Syntactic info: POS, shallow parsing, chunking, full parsing, dependency parsing, constituency parsing, etc. • Semantic info: Semantic Roles, semantic features, WordNet (see the sketch below), FrameNet, Lexical Semantics, etc. • Look for linguistic resources appropriate to be computationally processed • Look for linguistic resources publicly available • Explore the appropriate way to combine this information
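As one example of a publicly available, computationally processable resource, WordNet can relax strict lexical matching by accepting synonyms, in the spirit of METEOR's synonymy module. A minimal sketch using NLTK's WordNet interface (requires nltk and its wordnet data):

```python
from nltk.corpus import wordnet  # data via: nltk.download("wordnet")

def synonym_match(cand_token, ref_token):
    """True if two tokens are identical or share at least one WordNet synset."""
    if cand_token == ref_token:
        return True
    return bool(set(wordnet.synsets(cand_token)) & set(wordnet.synsets(ref_token)))

print(synonym_match("sofa", "couch"))  # True: both belong to the "sofa" synset
print(synonym_match("sofa", "table"))  # False: no shared synset
```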
Methodology & Schedule • 4 stages: • Stage 1 (years 1 & 2): • Bibliography research and analysis: • Detailed exploration and analysis of Automatic Evaluation Metrics • Detailed exploration, analysis and selection of the adequate linguistic information • Exploration of the feasibility and availability of the linguistic resources needed • Stage 2 (years 1 & 2): • Selection of the evaluation corpus
Methodology & Schedule • Stage 3 (year 3): • Experiments on how to combine this linguistic information and the automatic evaluation metrics • Evaluation of our metric combination based on either human likeness or human acceptability • Stage 4 (year 4): • Analysis & discussion of the results obtained • Summary of the findings and reflection on the results obtained • Proposal of a new evaluation metric