Number or Nuance: Factors Affecting Reliable Word Sense Annotation
Susan Windisch Brown, Travis Rood, and Martha Palmer
University of Colorado at Boulder
Annotators in their little nests agree;
And 'tis a shameful sight,
When taggers on one project
Fall out, and chide, and fight.
—[adapted from] Isaac Watts
Automatic word sense disambiguation
• Lexical ambiguity is a significant problem in natural language processing (NLP) applications (Agirre & Edmonds, 2006)
  • Text summarization
  • Question answering
• Word sense disambiguation (WSD) systems might help
  • Several studies show benefits for NLP tasks (Sanderson, 2000; Stokoe, 2003; Carpuat and Wu, 2007; Chan, Ng and Chiang, 2007)
  • But only with high system accuracy (90%+)
Possible factors affecting the reliability of word sense annotation
• Fine-grained senses result in many senses per word, creating a heavy cognitive load that makes accurate and consistent tagging difficult
• Fine-grained senses are not distinct enough for annotators to discriminate between them reliably
Requirements to compare fine-grained and coarse-grained annotation
• Annotation of the same words on the same corpus instances
• Sense inventories differing only in sense granularity
• Previous work: Ng et al., 1999; Edmonds & Cotton, 2001; Navigli et al., 2007
3 experiments
• 40 verbs
• Number of senses: 2-26
• Sense granularity: WordNet (WN) vs. OntoNotes (ON)
• Exp. 1: confirm difference in reliability between fine- and coarse-grained annotation; vary granularity and number of senses
• Exp. 2: hold granularity constant; vary number of senses
• Exp. 3: hold number of senses constant; vary granularity
Experiment 1
• Compare a fine-grained sense inventory to a coarse-grained one
• 70 instances of each verb from the ON corpus
• Annotated with WN senses by multiple pairs of annotators
• Annotated with ON senses by multiple pairs of annotators
• Compare the ON inter-tagger agreement scores (ITAs) to the WN ITAs
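Here ITA is taken as pairwise agreement: the proportion of instances on which the two annotators in a pair chose the same sense. A minimal sketch of that computation (illustrative code, not the authors'; the function name is ours):

```python
def inter_tagger_agreement(tags_a, tags_b):
    """Proportion of instances on which two annotators chose the same sense."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same instances")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)

# Two annotators tagging the same five instances of one verb
# (labels are WN sense numbers); they disagree on one instance.
print(inter_tagger_agreement([1, 3, 3, 7, 13],
                             [1, 3, 8, 7, 13]))  # 0.8
```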
Results
• Coarse-grained ON annotations had higher ITAs than fine-grained WN annotations
• Number of senses: no significant effect (t(79) = -1.28, p = .206)
• Sense nuance: a significant effect (t(79) = 10.39, p < .0001)
• With number of senses held constant, coarse-grained ITA is 16.2 percentage points higher than fine-grained ITA
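The slide does not spell out the statistical design behind these t values; purely as an illustration, a paired comparison of per-verb ITA scores could be run with SciPy (all numbers below are made up):

```python
from scipy import stats

# Made-up per-verb ITA scores under each inventory; the study
# annotated 40 verbs under both WN and ON sense inventories.
wn_ita = [0.62, 0.71, 0.55, 0.80, 0.68]   # fine-grained WordNet
on_ita = [0.85, 0.89, 0.74, 0.93, 0.82]   # coarse-grained OntoNotes

# Paired test: each verb contributes one score per condition
result = stats.ttest_rel(on_ita, wn_ita)
print(result.statistic, result.pvalue)
```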
Experiment 2: Number of senses
• Hold sense granularity constant; vary the number of senses
• Two pairs of annotators, using fine-grained WN senses
• First pair uses the full set of WN senses for a word
• Second pair uses a restricted set, on instances that we know should fit one of those senses (see the grouping below)
OntoNotes grouped sense    WordNet senses in the group
A                          1, 2, 4, 5, 6, 11, 12
B                          3, 7, 8, 13, 14
C                          9, 10
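This grouping is just a many-to-one map from WN senses to ON senses; as a sketch, it could be stored like this (group labels from the table above, code ours):

```python
# OntoNotes grouped sense -> the WordNet senses it clusters (per the table above)
ON_TO_WN = {
    "A": [1, 2, 4, 5, 6, 11, 12],
    "B": [3, 7, 8, 13, 14],
    "C": [9, 10],
}

# Inverted view: WordNet sense -> its OntoNotes group
WN_TO_ON = {wn: on for on, wn_senses in ON_TO_WN.items() for wn in wn_senses}
```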
"Then I just bought plywood, drew the pieces on it and cut them out." Full set of WN senses Restricted set of WN senses • 1. ---------------- • 2. ---------------- • 3. ---------------- • 4. ---------------- • 5. ---------------- • 6. ---------------- • 7. ---------------- • 8. ---------------- • 9. ---------------- • 10. ---------------- • 11. ---------------- • 12. ---------------- • 13. ---------------- • 14. ---------------- • 3. ---------------- • 7. ---------------- • 8. ---------------- • 13. ---------------- • 14. ----------------
Experiment 3
• Hold the number of senses constant; vary sense granularity
• Compare the ITAs for the ON tagging with the restricted-set WN tagging
Conclusion
• Number of senses annotators must choose between: never a significant factor
• Granularity of the senses: a significant factor, with fine-grained senses leading to lower ITAs
• The poor reliability of fine-grained word sense annotation cannot be improved by reducing the cognitive load on annotators
• Annotators cannot reliably discriminate between nuanced sense distinctions
Acknowledgements
We gratefully acknowledge the efforts of all of the annotators and the support of National Science Foundation grants NSF-0415923 (Word Sense Disambiguation), CISE-CRI-0551615 (Towards a Comprehensive Linguistic Annotation), and CISE-CRI-0709167, as well as a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, via a subcontract from BBN, Inc.
Restricted set annotation
• Use the adjudicated ON data to determine the ON sense for each instance.
• Use instances from experiment 1 that were labeled with one selected ON sense (35 instances).
• Each restricted-set annotator saw only the WN senses that were clustered to form the appropriate ON sense.
• Compare to the full-set annotation for those instances.
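Putting those steps together, the restricted-set condition can be sketched as a filter over the adjudicated experiment 1 data (all names and data below are hypothetical):

```python
def build_restricted_task(adjudicated_on, target_on_sense, on_to_wn):
    """Pair each instance adjudicated to the target ON sense with the
    WN senses clustered under that sense."""
    choices = on_to_wn[target_on_sense]
    return [(instance, choices)
            for instance, on_sense in adjudicated_on.items()
            if on_sense == target_on_sense]

# Toy adjudicated data: instance id -> ON sense from experiment 1
adjudicated = {"inst01": "B", "inst02": "A", "inst03": "B"}
on_to_wn = {"A": [1, 2, 4, 5, 6, 11, 12], "B": [3, 7, 8, 13, 14], "C": [9, 10]}

task = build_restricted_task(adjudicated, "B", on_to_wn)
# [('inst01', [3, 7, 8, 13, 14]), ('inst03', [3, 7, 8, 13, 14])]
```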