VERTa: Linguistic Features in MT Evaluation


  1. VERTa: Linguistic Features in MT Evaluation • Joint work with: Elisabet Comelles (UB), Jordi Atserias (FBM), Victoria Arranz (ELDA/ELRA), Irene Castellon (UB)

  2. Outline • Introduction • Methodology • VERTa • Lexical Similarity Metric • Morphological Similarity Metric • Dependency Similarity Module • N-gram Similarity Module • Metrics Combination • Experiments • Conclusions & Future Work

  3. Introduction • MT metrics: • BLEU • Linguistically-motivated metrics • Lexical information (Banerjee & Lavie 2005) • Syntactic information (Liu & Gildea 2005; He et al. 2010) • Semantic information (Giménez & Márquez 2007, 2008a) • Combination of linguistic features: • Machine-learning approach (Leusch & Ney 2009; Albrecht & Hwa 2007) • Non-parametric approach (Giménez 2008b; Specia & Giménez 2010) • Our proposal: VERTa (work in progress) • Linguistically-motivated • Combination of linguistic features

  4. Methodology • Several linguistic phenomena need to be taken into account in MT evaluation. • Lexical Semantics: “I believe the situation” vs. “I think the situation” • Syntax: “...a delegation of Moroccan police...” vs. “...a Moroccan police delegation...”; “...were assassinated by unknown men...” vs. “...unknown men assassinated...” • Word Order: “...Putin on Thursday announced that...” vs. “Putin announced on Thursday...” • Semantics + Morphology: “...carrying out an attack in Moscow...” vs. “...Chechens carry out an attack in Moscow”

  5. Methodology • Linguistic knowledge organised in different levels: • Lexical Information (Lexical Units) • Morphological Information (Lexical Units & POS) • Syntactic Information (Dependency relations) • Sentence Semantics (Semantic Arguments?) • Evaluation of both Adequacy & Fluency • Results for each module can help Error Analysis

  6. Our Proposal: VERTa • [Diagram: modules Lexical, Morphological, Dependency, N-gram, ... • Word matches: W1 -> W1, W4; W2 -> W3, W22; W3 -> W3 • Weighted Combination]

  7. VERTa: Lexical Similarity Module • Aim: identifying lexical similarities • Lexical matches • System of weights (weighted average)

  8. VERTa: Morphological Similarity Module • Aim: Accuracy • Matches of pairs of features (lexical info + POS) • System of weights (weighted average)

  9. VERTa: Dependency Similarity Module • Aim: Capturing relations between constituents regardless of their position in the sentence • HYP: After a meeting on Monday night with the head of Egyptian intelligence chief Omar Suleiman Haniya said... • REF: Haniya said, after a meeting on Monday evening with the head of Egyptian Intelligence General Omar Suleiman...

  10. VERTa: Dependency Similarity Module • Based on the lexical similarity module • Matches of triples: Label(Head, Mod) • System of weights (weighted average)
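
As a rough illustration of matching Label(Head, Mod) triples, the sketch below counts hypothesis triples that also occur in the reference, independently of their position in the sentence. It is not the VERTa implementation: the function name `triple_overlap`, the hand-written triples, and the use of plain precision and recall in place of the (unspecified) system of weights are all assumptions.

```python
from collections import Counter

def triple_overlap(hyp_triples, ref_triples):
    """Return (precision, recall) over exact (label, head, mod) matches.

    Hypothetical helper: exact matching only, no weighting scheme.
    """
    hyp_counts, ref_counts = Counter(hyp_triples), Counter(ref_triples)
    # Multiset intersection: each reference triple can be matched once.
    matched = sum((hyp_counts & ref_counts).values())
    precision = matched / len(hyp_triples) if hyp_triples else 0.0
    recall = matched / len(ref_triples) if ref_triples else 0.0
    return precision, recall

# Hand-written, illustrative triples (not real parser output).
hyp = [
    ("prep_after", "said", "meeting"),
    ("det", "meeting", "a"),
    ("prep_on", "meeting", "night"),
    ("nsubj", "said", "Haniya"),
]
ref = [
    ("nsubj", "said", "Haniya"),
    ("prep_after", "said", "meeting"),
    ("det", "meeting", "a"),
    ("prep_on", "meeting", "evening"),
]

precision, recall = triple_overlap(hyp, ref)
print(precision, recall)
```

Because triples are compared as a multiset, the different clause order in HYP and REF does not affect the result; only the "night" vs. "evening" triple fails to match here.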

  11. VERTa: Dependency Similarity Module (under development) • Extra rules at phrase and sentence level. • Examples: • HYP: ...between the two ministries of interior... • REF: ...between the two interior ministries... • HYP_prep_of(ministries, interior) = REF_amod(ministries, interior) • HYP: After meeting the Moroccan news agency published a joint statement... • REF: A joint statement published (...) by the Moroccan news agency... • HYP_nsubj(published, agency) = REF_agent(published, agency)
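
One way to picture such extra rules is as a table of label pairs that count as equivalent when head and modifier agree. The sketch below encodes only the two equivalences shown on the slide; the names `EQUIVALENT_LABELS` and `triples_equivalent` are hypothetical, and treating the rules as unconditional label equivalences is a simplifying assumption, since the slide does not say under which conditions they apply.

```python
# Label pairs taken from the two slide examples; everything else is assumed.
EQUIVALENT_LABELS = {
    frozenset({"prep_of", "amod"}),   # "ministries of interior" vs. "interior ministries"
    frozenset({"nsubj", "agent"}),    # active subject vs. passive agent
}

def triples_equivalent(hyp, ref, rules=EQUIVALENT_LABELS):
    """Return True if two (label, head, mod) triples match exactly
    or differ only by a label pair listed in `rules`."""
    h_label, h_head, h_mod = hyp
    r_label, r_head, r_mod = ref
    if (h_head, h_mod) != (r_head, r_mod):
        return False
    return h_label == r_label or frozenset({h_label, r_label}) in rules

print(triples_equivalent(("prep_of", "ministries", "interior"),
                         ("amod", "ministries", "interior")))
print(triples_equivalent(("nsubj", "published", "agency"),
                         ("agent", "published", "agency")))
```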

  12. N-gram Similarity Module • Aim: identifying linear order of lexical elements • Based on the lexical similarity module word matches • Matching chunks (length = 2 to sentence length) • HYP: ...the situation in the area... • REF: ...the situation in the region...
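
The chunk matching can be sketched as follows, assuming that two words match when they are identical or listed as a pair in a small synonym set (a stand-in for the lexical similarity module's word matches). The function names and the synonym pair are illustrative assumptions, and the sketch only lists matching chunks; how VERTa turns them into a score is not stated on the slide.

```python
def words_match(h, r, synonyms):
    """Stand-in for the lexical module: identical words or a listed synonym pair."""
    return h == r or (h, r) in synonyms or (r, h) in synonyms

def matching_chunks(hyp, ref, synonyms=frozenset()):
    """List hypothesis chunks of length 2..sentence length that have a
    word-by-word match somewhere in the reference."""
    matches = []
    max_n = min(len(hyp), len(ref))
    for n in range(2, max_n + 1):
        for i in range(len(hyp) - n + 1):
            h_chunk = hyp[i:i + n]
            for j in range(len(ref) - n + 1):
                r_chunk = ref[j:j + n]
                if all(words_match(h, r, synonyms) for h, r in zip(h_chunk, r_chunk)):
                    matches.append(tuple(h_chunk))
                    break  # count each hypothesis chunk at most once
    return matches

hyp = "the situation in the area".split()
ref = "the situation in the region".split()

exact_only = matching_chunks(hyp, ref)
with_synonyms = matching_chunks(hyp, ref, synonyms={("area", "region")})
print(len(exact_only), len(with_synonyms))
```

With exact matching only, every chunk containing "area" fails; once "area" and "region" count as a word match, all chunks up to the full sentence are found.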

  13. Metrics Combination • Each metric receives a specific weight depending on: • The type of evaluation • The language evaluated • Set of weights for the experiments: • Adequacy + English: • Lexical Module: 0.444 • Morphology Module: 0.111 • N-gram Module: 0.111 • Dependency Module: 0.333
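
A minimal sketch of the weighted combination, using the adequacy weights for English listed above. The per-module scores are invented for illustration, and dividing by the sum of the weights (0.999) is an assumption; the slide does not say whether the combination is normalised.

```python
# Weights from the slide (adequacy, English).
WEIGHTS = {
    "lexical": 0.444,
    "morphology": 0.111,
    "ngram": 0.111,
    "dependency": 0.333,
}

def combine(scores, weights=WEIGHTS):
    """Weighted average of per-module scores (normalisation is assumed)."""
    total = sum(weights.values())
    return sum(weights[m] * scores[m] for m in weights) / total

# Hypothetical module scores for one segment, not real results.
example_scores = {
    "lexical": 0.80,
    "morphology": 0.70,
    "ngram": 0.50,
    "dependency": 0.60,
}

combined = combine(example_scores)
print(round(combined, 3))
```

The weights are in the ratio 4 : 1 : 1 : 3, so the lexical and dependency modules together account for roughly 78% of the final score in this setting.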

  14. Experiments • Preliminary tests: • To check the adequacy of the linguistic features used • To reconsider and improve the on-going development of the metric • Aimed at: • Influence of the dependency module • Influence of hyperonyms and hyponyms • Comparing VERTa with other metrics • Data (MetricsMaTr 2010 Shared-Task): • 8 different systems • 4 reference translations • 100 segments/system (28,000 words approx.) • Human judgments based on adequacy

  15. Experiments • Influence of the Dependency Module and Use of Hyperonyms and Hyponyms (segment level) • The dependency module improves the performance of the metric • The use of hyponyms and hyperonyms decreases the performance of the metric • HYP: ...the situation in the area [...] is on its danger mark day today... • REF: ...the situation in the region [...] been as dangerous as it is today...

  16. Experiments • VERTa vs. other well-known metrics

  17. Conclusions & Future Work • The more linguistic information used, the higher the scores are • Use of linguistic information is necessary in MT evaluation • VERTa shows promising results • Preliminary results are helpful to continue with our on-going research: • Reconsidering the linguistic features used + using other linguistic information (NEs, MWs, semantics) • Finishing the dependency module • Tuning of weights • Meta-evaluation: • Analyze each level separately • Evaluate in terms of fluency • Test VERTa with other languages

  18. MANY THANKS!
