Evaluating the Waspbench: A Lexicography Tool Incorporating Word Sense Disambiguation
Rob Koeling, Adam Kilgarriff, David Tugwell, Roger Evans
ITRI, University of Brighton
Credits: UK EPSRC grant WASPS, M34971

Lexicographers need NLP. NLP needs lexicography. Word senses: nowhere truer.
Word senses: nowhere truer
• Lexicography: the second hardest part
• NLP: word sense disambiguation (WSD)
  • SENSEVAL-1 (1998): 77% (Hector sense inventory)
  • SENSEVAL-2 (2001): 64% (WordNet sense inventory)
• Machine Translation: the main cost is lexicography
Synergy: the WASPBENCH
Inputs and outputs
• Inputs:
  • Corpus (processed)
  • Lexicographic expertise
• Outputs:
  • Analysis of the word's meaning/translation repertoire
  • Implemented as a "word expert" that can disambiguate: a "disambiguating dictionary"
MT needs rules of the form "in context C, S => T"
• A major determinant of MT quality
• Manual production is expensive
  • e.g. English oil => French huile or pétrole?
  • SYSTRAN: 400 such rules
• Waspbench output: thousands of rules (the rule form is sketched below)
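To make the rule form concrete, here is a minimal sketch in Python of "in context C, S => T" rules for oil. The trigger words and the fallback choice are invented for illustration; WASPBENCH induces its rules from corpus evidence rather than from hand-written lists like this.

```python
# Illustrative "in context C, S => T" rules for English "oil".
# Trigger sets and the fallback are invented for illustration only.

RULES = [
    # (source word S, context triggers C, target T)
    ("oil", {"olive", "cooking", "salad", "frying"}, "huile"),
    ("oil", {"crude", "drilling", "barrel", "pipeline"}, "pétrole"),
]

def translate(word, context, default="huile"):
    """Return T for the first rule whose triggers overlap the context."""
    tokens = set(context.lower().split())
    for source, triggers, target in RULES:
        if source == word and triggers & tokens:
            return target
    return default  # no rule fired: fall back to the commonest translation

print(translate("oil", "exports of crude oil through the new pipeline"))
# -> pétrole
```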
Evaluation hard
• Three communities
• No precedents
• The art and craft of lexicography
• MT personpower budgets
Five threads
• As WSD: SENSEVAL
• For lexicography: MED (the Macmillan English Dictionary)
• Expert reports
• Quantitative experiments with human subjects:
  • India: within-group consistency
  • Leeds: comparison with commercial MT
Method
• Human 1 creates word experts
• Computer uses the word experts to disambiguate test instances
• MT system translates the same test instances
• Human 2 evaluates computer and MT performance on each instance:
  good / bad / unsure / preferred / alternative
Words
• Mid-frequency: 1,500-20,000 instances in the BNC
• At least two clearly distinct meanings
  • Checked with reference to translations into French/German/Dutch
• 33 words: 16 nouns, 10 verbs, 7 adjectives
• Around 40 test instances per word
Human subjects
• Translation studies students, University of Leeds (thanks: Tony Hartley)
• Native or near-native in English and their other language
• Twelve people, working with: Chinese (4), French (3), German (2), Italian (1), Japanese (2)
  • (no MT system available for Japanese)
• Circa four days' work:
  • introduction/training
  • two days to create word experts
  • two days to evaluate output
Method
• Human 1 creates word experts: average 30 minutes per word
• Computer uses the word experts to disambiguate test instances
• MT system (Babelfish via AltaVista) translates the same test instances
• Human 2 evaluates computer and MT performance on each instance:
  good / bad / unsure / preferred / alternative (one way to tally these is sketched below)
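One way to turn the per-instance judgements into a per-system score is sketched below. Counting "good" and "preferred" as favourable is an assumption made for illustration, not necessarily the study's actual metric, and the sample judgements are invented.

```python
# Tally Human 2's per-instance judgements into a per-system score.
# Treating "good" and "preferred" as favourable is an assumption
# for illustration; the sample judgements are invented.

from collections import Counter

def favourable_rate(judgements):
    counts = Counter(judgements)
    return (counts["good"] + counts["preferred"]) / len(judgements)

waspbench = ["good", "preferred", "bad", "good", "unsure", "good"]
babelfish = ["good", "bad", "bad", "alternative", "good", "unsure"]

print(f"waspbench: {favourable_rate(waspbench):.0%}")  # 67%
print(f"babelfish: {favourable_rate(babelfish):.0%}")  # 33%
```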
Observations
• Graduate-student users, with 4 hours' training
• 30 minutes per (not-too-complex) word
• 'Fuzzy' words are intrinsically harder
• No great inter-subject disparities: it's the words that vary, not the people (see the sketch below)
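A sketch of the comparison behind that last observation: compute each subject's mean accuracy and each word's mean accuracy, then compare the spreads. All numbers below are invented for illustration.

```python
# Compare variation across subjects with variation across words.
# accuracy[subject][word]; all values are invented for illustration.

from statistics import mean, stdev

accuracy = {
    "subject1": {"bank": 0.90, "oil": 0.80, "seal": 0.55},
    "subject2": {"bank": 0.88, "oil": 0.82, "seal": 0.58},
    "subject3": {"bank": 0.92, "oil": 0.78, "seal": 0.52},
}

subjects = sorted(accuracy)
words = sorted(accuracy[subjects[0]])

per_subject = [mean(accuracy[s][w] for w in words) for s in subjects]
per_word = [mean(accuracy[s][w] for s in subjects) for w in words]

print(f"spread across subjects: {stdev(per_subject):.3f}")  # small
print(f"spread across words:    {stdev(per_word):.3f}")     # large
```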
Conclusion
• WSD can improve MT (using a tool like WASPS)
Future work
• multiwords
• n > 2
• thesaurus
• other source languages
• new corpora, bigger corpora
• the web