
Fightin’ Words

This presentation raises discussion points on assessing system characteristics and choosing suitable metrics for evaluating machine translation systems. It covers the impact of real-world data, metric validity and automation, the advantages and disadvantages of specific metrics, score variations and ranges, correlations between metrics, and overall quality indicators. It also proposes next steps such as a machine translation evaluation (MTE) primer, a book of papers with commentary, reader exercises, workshops, and websites for the classification (taxonomy) and for data sharing.


Presentation Transcript


  1. Fightin’ Words • Points for discussion • Points for review • Looking for a report structure?

  2. System Assessment • What metrics are suitable for assessing what system characteristics? • What system characteristics reflect what user needs? • Is there a radical difference between evaluation focusing on research or development needs and evaluation focusing on end-user needs? • When should real-world data be used? • What is the impact of using it?

  3. Metric Choice • What constitutes a valid metric? • How can you demonstrate that a metric is valid? • What metrics can be automated? • What are the advantages and disadvantages of specific metrics? • For the metric(s) selected, what are the difficulties in applying them?
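As one illustration of a metric that can be automated, the sketch below computes word error rate (WER): the word-level edit distance between a system output and a single reference translation, divided by the reference length. WER is chosen here only as an example (the slides do not name specific metrics), and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the hypothesis words seen so far
    # (initially none) and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr[j] = min(
                prev[j] + 1,         # extra hypothesis word (insertion)
                curr[j - 1] + 1,     # reference word missing (deletion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(ref)] / len(ref)


print(word_error_rate("the cat sat on mat", "the cat sat on the mat"))
```

Lower is better. Because WER counts only exact word matches against one reference, it also illustrates a typical disadvantage the slide asks about: a valid paraphrase of the reference is penalised.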

  4. Metric Choice - II • For a given metric, what variations in scores are typically produced? • What are the statistical error variances? • For a given metric, what are the score ranges for ‘good’ and ‘bad’ systems? • Are there metrics which correlate with one another? • Are there metrics which indicate an overall quality score? • Are there metrics which work better for specific language pairs?
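The questions about score variation and correlation can be explored empirically. The sketch below assumes per-segment scores from two hypothetical metrics on the same test set; it estimates a 95% confidence interval for one metric's mean score with the percentile bootstrap (one common resampling technique, swapped in here for illustration) and computes the Pearson correlation between the two metrics. The function names and the score values are made up for illustration, not results from any real evaluation.

```python
import random
import statistics


def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    # Resample the segments with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = n_resamples - 1 - lo_idx  # symmetric upper percentile
    return means[lo_idx], means[hi_idx]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


# Hypothetical per-segment scores for two metrics on the same six segments.
metric_a = [0.42, 0.55, 0.31, 0.68, 0.47, 0.59]
metric_b = [0.40, 0.60, 0.35, 0.70, 0.44, 0.52]
print(bootstrap_ci(metric_a))
print(round(pearson(metric_a, metric_b), 3))
```

A wide interval suggests that small score differences between systems may not be meaningful on that test set. The same `pearson` function can be applied to system-level scores to ask whether two metrics rank systems alike.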

  5. Where to from here? • MTE Primer? • Book of papers, with commentary fitting them into the right context (including the taxonomy?) • Exercises the reader can do • Showing how to fit new ideas into the taxonomy • Contribution to the language community in general • MT Summit Workshop • Other workshops?

  6. BIG HUG

  7. Web Sites • Classification (taxonomy) • Open to everyone • Working area • Possibly password protected • Rite of initiation? • Data for people, at least until the next workshop • URLs • http://issco-www.unige.ch/projects/isle/mt-eval-whereis.html
