
Evaluating the Waspbench

Evaluating the Waspbench: A Lexicography Tool Incorporating Word Sense Disambiguation. Rob Koeling, Adam Kilgarriff, David Tugwell, Roger Evans (ITRI, University of Brighton). Credits: UK EPSRC grant WASPS, M34971. Lexicographers need NLP. NLP needs lexicography. Word senses: nowhere truer.


Presentation Transcript


  1. Evaluating the Waspbench: A Lexicography Tool Incorporating Word Sense Disambiguation • Rob Koeling, Adam Kilgarriff, David Tugwell, Roger Evans • ITRI, University of Brighton • Credits: UK EPSRC grant WASPS, M34971

  2. Lexicographers need NLP

  3. NLP needs lexicography

  4. Word senses: nowhere truer • Lexicography • the second hardest part

  5. Word senses: nowhere truer • Lexicography • the second hardest part • NLP • Word sense disambiguation (WSD) • SENSEVAL-1 (1998): 77% Hector • SENSEVAL-2 (2001): 64% WordNet

  6. Word senses: nowhere truer • Lexicography • the second hardest part • NLP • Word sense disambiguation (WSD) • SENSEVAL-1 (1998): 77% Hector • SENSEVAL-2 (2001): 64% WordNet • Machine Translation • Main cost is lexicography

  7. Synergy • The WASPBENCH

  8. Inputs and outputs • Inputs • Corpus (processed) • Lexicographic expertise

  9. Inputs and outputs • Outputs • Analysis of meaning/translation repertoire • Implemented: • Word expert • Can disambiguate • A “disambiguating dictionary”

  10. Inputs and outputs • MT needs rules of the form “in context C, S => T” • Major determinant of MT quality • Manual production: expensive • Eng oil => Fr huile or pétrole? • SYSTRAN: 400 rules

  11. Inputs and outputs • MT needs rules of the form “in context C, S => T” • Major determinant of MT quality • Manual production: expensive • Eng oil => Fr huile or pétrole? • SYSTRAN: 400 rules • Waspbench output: thousands of rules
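
A rule of the form "in context C, S => T" can be pictured as a lookup keyed on context clues. The following is only an illustrative sketch, not the Waspbench or SYSTRAN rule format: the `Rule` class, the `translate_word` helper, and the clue words are hypothetical, and the context test is reduced to a simple word-overlap check on whitespace-separated tokens.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """One rule: in context C, source word S translates as target T."""
    context: FrozenSet[str]  # C: clue words (hypothetical examples below)
    source: str              # S: source-language word
    target: str              # T: target-language word


# Hypothetical clue words for the slide's "oil" example.
RULES = [
    Rule(frozenset({"olive", "cooking", "frying"}), "oil", "huile"),
    Rule(frozenset({"crude", "barrel", "drilling"}), "oil", "pétrole"),
]


def translate_word(source: str, sentence: str,
                   rules: Sequence[Rule] = RULES) -> Optional[str]:
    """Return the target of the first rule whose context clues occur
    in the sentence, or None when no rule fires."""
    tokens = set(sentence.lower().split())
    for rule in rules:
        if rule.source == source and rule.context & tokens:
            return rule.target
    return None


print(translate_word("oil", "She fried the fish in olive oil"))
print(translate_word("oil", "The price of crude oil rose"))
```

With only two rules the gaps are obvious (a sentence with no clue word gets no translation), which hints at why producing such rules by hand in large numbers is expensive.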

  12. Evaluation hard

  13. Evaluation hard • Three communities

  14. Evaluation hard • Three communities • No precedents

  15. Evaluation hard • Three communities • No precedents • The art and craft of lexicography

  16. Evaluation hard • Three communities • No precedents • The art and craft of lexicography • MT personpower budgets

  17. Five threads • as WSD: SENSEVAL • for lexicography: MED • expert reports • Quantitative experiments with human subjects: India (within-group consistency); Leeds (comparison with commercial MT)

  18. Method • Human1 creates word experts • Computer uses word experts to disambiguate test instances • MT system translates same test instances • Human2 evaluates computer and MT performance on each instance: good / bad / unsure / preferred / alternative

  19. Words • mid-frequency • 1,500-20,000 instances in BNC • At least two clearly distinct meanings • Checked with ref to translations into Fr/Ger/Dutch • 33 words • 16 nouns, 10 verbs, 7 adjs • around 40 test instances per word

  20. Words

  21. Human subjects • Translation studies students, University of Leeds • Thanks: Tony Hartley • Native/near-native in English and their other language • twelve people, working with: Chinese (4), French (3), German (2), Italian (1), Japanese (2) (no MT system for Japanese) • circa four days’ work: • introduction/training • two days to create word experts • two days to evaluate output

  22. Method • Human1 creates word experts, average 30 mins/word • Computer uses word experts to disambiguate test instances • MT system (Babelfish via Altavista) translates same test instances • Human2 evaluates computer and MT performance on each instance: good / bad / unsure / preferred / alternative
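
The per-instance judgements from Human2 can be turned into percentages by counting labels for each system. This is a minimal sketch under stated assumptions: the function name `label_percentages` and the sample judgements are hypothetical, the five labels are treated simply as mutually exclusive categories, and nothing here reproduces the actual figures from the Results slides.

```python
from collections import Counter
from typing import Dict, List

# The five judgement labels listed on the Method slide.
LABELS = ("good", "bad", "unsure", "preferred", "alternative")


def label_percentages(judgements: List[str]) -> Dict[str, float]:
    """Return the percentage of instances that received each label."""
    counts = Counter(judgements)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(judgements)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counts[label] / total for label in LABELS}


# Made-up judgements for illustration only, one list per system.
word_expert = ["good", "good", "bad", "unsure"]
mt_system = ["good", "bad", "bad", "unsure"]

for name, judgements in (("word expert", word_expert), ("MT", mt_system)):
    print(name, label_percentages(judgements))
```

Keeping one list of judgements per system makes it straightforward to compare the word experts against the MT output on the same test instances.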

  23. Results (%)

  24. Results by POS (%)

  25. Observations • Grad student users, 4-hour training • 30 mins per (not-too-complex) word • ‘fuzzy’ words intrinsically harder • No great inter-subject disparities • (it’s the words that vary, not the people)

  26. Conclusion • WSD can improve MT (using a tool like WASPS)

  27. Future work • multiwords • n>2 • thesaurus • other source languages • new corpora, bigger corpora • the web
