
Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics

Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics. Thesis Project. Elisabet Comelles. Supervisors: Irene Castellon and Victoria Arranz. Outline: Introduction, State of the Art, Discussion of MT Evaluation Metrics, Hypothesis & Objective, Methodology & Schedule.


Presentation Transcript


  1. Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics • Thesis Project • Elisabet Comelles • Supervisors: Irene Castellon and Victoria Arranz

  2. Outline • Introduction • State of the Art • Discussion of MT Evaluation Metrics • Hypothesis & Objective • Methodology & Schedule

  3. Introduction • Quick access to Multilingual Information • Need for quick translation • Rapid growth in the number of MT Systems • Need to evaluate those MT Systems • Evaluation needs to be quick and reliable

  4. Introduction • Current and most used Evaluation Metrics show problems • New approaches to Evaluation using linguistic information: • Syntactic info • Semantic info • Our scenario: • Comparison between already existing systems • Direction of translation to test: English-Spanish

  5. State of the Art • MT is inseparably linked to MT Evaluation • Purpose of the evaluation methods: • Error analysis • System comparison • Chronologically: • Human MT Evaluation • Automatic MT Evaluation

  6. State of the Art: Types of MT Evaluation • Focused on Context: • Context-based Evaluation (FEMTI) • Evaluates suitability of the MT Technology & the MT System for the user’s purpose • Parameters of analysis: functionality, reliability, usability, efficiency, maintainability, portability, cost, etc. • Focused on Quantity & Quality: • Human Evaluation and Automatic Evaluation

  7. State of the Art: Types of MT Evaluation • Human Evaluation: • Several approaches: • Fidelity (ALPAC report) • Intelligibility (ALPAC report) • Comprehensive evaluation of informativeness (ARPA) • Quality panel evaluation • Adequacy and Fluency (Semantics and Syntax) • Preferred Translation • Required Post-Editing

  8. State of the Art: Types of MT Evaluation • Human Evaluation: • Advantage: human evaluators can evaluate the overall quality of the system • Disadvantages: • Time-consuming • Expensive • Subjective

  9. State of the Art: Types of MT Evaluation • Automatic Evaluation: • Approaches: • Based on Lexical Matching • Based on Syntax • Based on Semantics

  10. State of the Art: Types of MT Evaluation • Based on Lexical Matching: • Dominant approach to Automatic MT Evaluation • Looks for lexical similarities between MT output and reference translations • Types: • Edit Distance Measures (WER) • Precision-oriented Measures (BLEU) • Recall-oriented Measures (ROUGE) • Measures balancing Precision & Recall (GTM)
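
The four families of lexical measures listed on this slide can be illustrated with a minimal sketch. It is not the official implementation of WER, BLEU, ROUGE or GTM: it only shows the underlying ideas (word-level edit distance, clipped n-gram precision, n-gram recall, and a balanced combination). The function names, the whitespace tokenisation and the use of a plain harmonic mean are assumptions made for illustration; GTM in particular uses its own matching scheme rather than this simple F-measure.

```python
from collections import Counter


def wer(hyp, ref):
    """Word error rate: word-level edit distance divided by reference length."""
    h, r = hyp.split(), ref.split()
    # prev[j] = edit distance between the reference prefix seen so far and h[:j]
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        curr = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # reference word missing from hypothesis
                          curr[j - 1] + 1,     # extra word in hypothesis
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[len(h)] / max(len(r), 1)


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision_recall(hyp, ref, n=1):
    """Clipped n-gram precision (BLEU-like idea) and recall (ROUGE-like idea)."""
    h, r = ngrams(hyp.split(), n), ngrams(ref.split(), n)
    matches = sum((h & r).values())  # each n-gram counts at most as often as in the reference
    precision = matches / max(sum(h.values()), 1)
    recall = matches / max(sum(r.values()), 1)
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean, used here as a simplified stand-in for a balanced measure."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat", unigram precision is perfect while recall and WER both register the missing word, which shows why a precision-only or recall-only measure gives a partial view.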

  11. State of the Art: Types of MT Evaluation • Based on Syntax: • Recently developed • Focused on the syntax of the output sentence • Types: • Constituency Parsing • Dependency Parsing • Combination of both analyses (Liu & Gildea 2005)
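
As a rough illustration of what a dependency-based comparison adds, the sketch below scores the overlap between (head, relation, dependent) triples of a hypothesis and a reference. It is not the metric of Liu & Gildea (2005); the triples are written by hand rather than produced by a parser, and the relation labels and the function name `dependency_overlap` are hypothetical.

```python
from collections import Counter


def dependency_overlap(hyp_triples, ref_triples):
    """F1 over (head, relation, dependent) triples shared by hypothesis and reference."""
    h, r = Counter(hyp_triples), Counter(ref_triples)
    matches = sum((h & r).values())
    if matches == 0:
        return 0.0
    precision = matches / sum(h.values())
    recall = matches / sum(r.values())
    return 2 * precision * recall / (precision + recall)


# Hand-written analyses (assumed, not parser output) for "Juan vio a María"
ref_parse = [("vio", "nsubj", "Juan"), ("vio", "obj", "María")]
# Same words as the reference but with subject and object swapped
hyp_swapped = [("vio", "nsubj", "María"), ("vio", "obj", "Juan")]
# One correct dependency, one wrong argument
hyp_partial = [("vio", "nsubj", "Juan"), ("vio", "obj", "Pedro")]
```

The swapped hypothesis contains exactly the reference words, so a unigram-matching measure would not penalise it, whereas the triple overlap is zero because no grammatical relation is preserved.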

  12. State of the Art: Types of MT Evaluation • Based on Semantics: • Recently developed • Focused on the semantics of the output • Types: • Named Entities (NEs): Quality over NEs (NEE) • Semantic Roles: Similarities over Semantic Roles (SR)
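
The two semantic comparisons named on this slide can be sketched in the same spirit. The code assumes that named entities and semantic roles have already been annotated by some external tool; the data structures, role labels and function names are hypothetical, and exact string matching is a deliberate simplification rather than how the NEE or SR metrics are actually defined.

```python
def entity_recall(hyp_entities, ref_entities):
    """Share of reference (text, type) named entities that also appear in the hypothesis."""
    ref = set(ref_entities)
    if not ref:
        return 0.0
    return len(ref & set(hyp_entities)) / len(ref)


def role_overlap(hyp_roles, ref_roles):
    """Share of reference semantic roles whose filler is reproduced exactly."""
    if not ref_roles:
        return 0.0
    matched = sum(1 for role, filler in ref_roles.items()
                  if hyp_roles.get(role) == filler)
    return matched / len(ref_roles)


# Assumed annotations for a reference and a hypothesis translation
ref_entities = [("María", "PERSON"), ("Barcelona", "LOCATION")]
hyp_entities = [("María", "PERSON"), ("Barcelona", "ORGANIZATION")]

ref_roles = {"agent": "Juan", "patient": "María", "location": "en Barcelona"}
hyp_roles = {"agent": "María", "patient": "Juan", "location": "en Barcelona"}
```

In this made-up example the hypothesis keeps one of two entities with the right type and only the location role, even though most of the words are shared with the reference.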

  13. Discussion of MT Evaluation Metrics • Human Evaluation: • Advantages: • Allows evaluation of overall quality • Disadvantages: • Time-consuming • Expensive • Subjective

  14. Discussion of MT Evaluation Metrics • Automatic Evaluation: • Advantages: • Fast • Inexpensive • Objective • Updatable • Disadvantages?

  15. Discussion of MT Evaluation Metrics • Automatic Metrics based on Lexical Matching: • Great advance in MT Research in the last decade • Widely accepted & used by the SMT research community • BLEU is the most used Automatic Metric • Criticized by those not developing SMT systems • Usually depend on reference translations • Only take into account lexical similarities & disregard syntax • Biased

  16. Discussion of MT Evaluation Metrics • Automatic Metrics based on Syntax: • Good improvement • Work at sentence level • Only focused on Syntax • What about meaning? • Automatic Metrics based on Semantics: • Good improvement • Only NEs & Semantic Roles • NEs not too relevant • Need further development • Only focused on meaning, what about syntax?

  17. Discussion of MT Evaluation Metrics • Discussion of Automatic Metrics: • Each metric focuses on a partial aspect of quality • Strongly biased evaluations • Unfair comparison between systems • Over-tuning of the system • Need for integration of metrics • Parametric vs. Non-parametric • Evaluation of the quality of a metric combination • Human likeness • Human acceptability
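
The contrast between parametric and non-parametric integration can be made concrete with a small sketch. It assumes that each individual metric already returns a score in [0, 1] where higher is better; the metric names, scores and weights below are invented for illustration and do not come from the presentation.

```python
def combine_uniform(scores):
    """Non-parametric combination: plain average, every metric counts equally."""
    return sum(scores.values()) / len(scores)


def combine_parametric(scores, weights):
    """Parametric combination: weighted average; the weights must be set or tuned."""
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * value for name, value in scores.items()) / total_weight


# Hypothetical scores of one translation under three kinds of metric
scores = {"lexical": 0.8, "syntactic": 0.4, "semantic": 0.6}
# Hypothetical weights; in practice these would be tuned on held-out data
weights = {"lexical": 1.0, "syntactic": 2.0, "semantic": 1.0}
```

The parametric variant can emphasise the dimensions that matter most, but its weights have to be estimated, which reintroduces the risk of over-tuning mentioned on the slide; the uniform variant avoids that at the cost of treating all metrics as equally informative.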

  18. Hypothesis & Objective • Hypothesis: Adding new linguistic information will improve the performance of Automatic Metrics • Main Objective: Proposing a new Automatic Evaluation Metric based on linguistic information.

  19. Hypothesis & Objective • Secondary Objectives: • Explore linguistic information: • Syntactic info: POS, shallow parsing, chunking, full parsing, dependency parsing, constituency parsing, etc. • Semantic info: Semantic Roles, semantic features, WordNet, FrameNet, Lexical Semantics, etc. • Look for linguistic resources suitable for computational processing • Look for publicly available linguistic resources • Explore the appropriate way to combine this information

  20. Methodology & Schedule • 4 stages: • Stage 1 (years 1 & 2): • Literature review and analysis: • Detailed exploration and analysis of Automatic Evaluation Metrics • Detailed exploration, analysis and selection of the appropriate linguistic information • Exploration of the feasibility and availability of the linguistic resources needed • Stage 2 (years 1 & 2): • Selection of the evaluation corpus

  21. Methodology & Schedule • Stage 3 (year 3): • Experiments on how to combine this linguistic information and the automatic evaluation metrics • Evaluation of our metric combination based on either human likeness or human acceptability • Stage 4 (year 4): • Analysis & discussion of the results obtained • Summary of the findings and reflection on the results obtained • Proposal of a new evaluation metric
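
One common way to carry out the Stage 3 evaluation is to check how well metric scores track human judgments. The sketch below assumes that "human acceptability" is measured as the Pearson correlation between metric scores and human assessment scores over the same set of translations; this reading is an assumption, since the slides do not define the term.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # correlation is undefined for constant scores; 0.0 is a convention here
    return cov / (spread_x * spread_y)
```

A metric combination whose scores correlate more strongly with the human scores than any single metric would support the hypothesis that adding linguistic information helps.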
