Semantic Annotation – Week 3

Presentation Transcript


  1. Team: Louise Guthrie, Roberto Basili, Fabio Zanzotto, Hamish Cunningham, Kalina Bontcheva, Jia Cui, Klaus Macherey, David Guthrie, Martin Holub, Marco Cammisa, Cassia Martin, Jerry Liu, Kris Haralambiev, Fred Jelinek Semantic Annotation – Week 3

  2. Our Hypotheses • A transformation of a corpus that replaces words and phrases with coarse semantic categories will help overcome the data-sparseness problem encountered in language modeling • Semantic category information will also help improve machine translation • An initially noun-centric approach will allow bootstrapping to other syntactic categories

  3. An Example • Astronauts aboard the space shuttle Endeavor were forced to dodge a derelict Air Force satellite Friday • Humans aboard space_vehicle dodge satellite timeref.
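
As a concrete illustration of this word-to-category transformation, here is a minimal sketch in Python. It assumes a small hand-built word-to-category lexicon; the entries and category labels below are invented for the example and are not the workshop's actual tag set, and phrase-level grouping (e.g. treating "space shuttle Endeavor" as a single unit) is omitted.

```python
# Minimal sketch of the corpus transformation: replace known head nouns
# with coarse semantic categories. The lexicon below is illustrative only.
CATEGORY_LEXICON = {
    "astronauts": "HUMAN",
    "shuttle": "SPACE_VEHICLE",
    "satellite": "SATELLITE",
    "friday": "TIME_REF",
}

def transform(sentence):
    """Replace each token found in the lexicon with its semantic category."""
    out = []
    for token in sentence.split():
        key = token.lower().strip(".,")
        out.append(CATEGORY_LEXICON.get(key, token))
    return " ".join(out)

if __name__ == "__main__":
    s = ("Astronauts aboard the space shuttle Endeavor were forced "
         "to dodge a derelict Air Force satellite Friday")
    print(transform(s))
```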

  4. Our Progress – Preparing the Data (Pre-Workshop) • Identify a tag set • Create a human-annotated corpus • Create a doubly annotated corpus • Process all data for named entity and noun phrase recognition using GATE tools • Develop algorithms for mapping target categories to Wordnet synsets to support the tag set assessment
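
One way such a mapping from target categories to Wordnet synsets could be sketched is via hypernym paths, as below. This assumes NLTK's WordNet interface; the anchor synsets chosen for each category are illustrative guesses, not the algorithm actually developed in the workshop.

```python
# Sketch: map nouns to coarse target categories by checking whether a
# category's anchor synset appears on one of the noun's hypernym paths.
# Requires NLTK with the WordNet data installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

# Hypothetical anchor synsets for a few target categories.
CATEGORY_ANCHORS = {
    "H": wn.synset("person.n.01"),   # Human
    "A": wn.synset("animal.n.01"),   # Animal
    "P": wn.synset("plant.n.02"),    # Plant (living organism)
    "L": wn.synset("liquid.n.01"),   # Liquid
}

def candidate_categories(noun):
    """Return the target categories whose anchor dominates some sense of the noun."""
    cats = set()
    for synset in wn.synsets(noun, pos=wn.NOUN):
        hypernyms = {h for path in synset.hypernym_paths() for h in path}
        for cat, anchor in CATEGORY_ANCHORS.items():
            if anchor in hypernyms:
                cats.add(cat)
    return cats

print(candidate_categories("astronaut"))   # expected to include 'H'
```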

  5. The Semantic Classes for Annotators • A subset of the classes available in the electronic version of the Longman Dictionary of Contemporary English (LDOCE) • Rationale: • The number of semantic classes is small • The classes are somewhat reliable, since a team of lexicographers used them to code • Noun senses • Adjective preferences • Verb preferences

  6. Semantic Classes (legend: target classes vs. annotated evidence; the class tree is rendered here as a flat list) • Abstract (T) • Concrete (C) • Animate (Q) • Inanimate (I) • PhysQuant (4) • Organic (5) • Plant (P) • Animal (A) • Human (H) • Liquid (L) • Gas (G) • Solid (S) • Non-movable (J) • Movable (N) • Additional codes: B, D, F, M
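
For the sketches that follow it is convenient to hold these classes in a small tree. The parent map below is an assumption reconstructed from the class names on this slide (e.g. Animate and Inanimate under Concrete), not a copy of the original diagram.

```python
# Assumed parent map for the target classes (a reconstruction, not the
# original LDOCE diagram).
PARENT = {
    "T": None,   # Abstract (root)
    "C": None,   # Concrete (root)
    "Q": "C",    # Animate
    "I": "C",    # Inanimate
    "4": "C",    # PhysQuant
    "5": "C",    # Organic
    "P": "Q",    # Plant
    "A": "Q",    # Animal
    "H": "Q",    # Human
    "L": "I",    # Liquid
    "G": "I",    # Gas
    "S": "I",    # Solid
    "J": "S",    # Non-movable solid
    "N": "S",    # Movable solid
}

def ancestors(cat):
    """Path from a category up to its root, e.g. ancestors('H') == ['H', 'Q', 'C']."""
    path = []
    while cat is not None:
        path.append(cat)
        cat = PARENT[cat]
    return path

print(ancestors("H"))
```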

  7. More Categories • U: Collective • K: Male • R: Female • W: Not animate • X: Not concrete or animal • Z: Unmarked • Annotators were also allowed to choose "none of the above" (shown as "?" in the slides that follow)

  8. Our Progress – Data Preparation • Assess the annotation format, define uniform descriptions for irregular phenomena, and normalize them • Determine the distribution of the tag set in the training corpus • Analyze inter-annotator agreement • Determine a reliable set of tags – T • Parse all training data

  9. Doubly Annotated Data • Instances (headwords): 10,960 • 8,950 instances without question marks • 8,446 of those are marked the same • Inter-annotator agreement is 94% (83% including question marks) • Recall: these are noun phrases that are not named entities
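
A minimal sketch of how these agreement figures can be computed from two parallel annotation lists (the toy labels below are made up; with the slide's counts, 8,446 / 8,950 ≈ 94%):

```python
# Exact-match inter-annotator agreement, optionally skipping instances
# that either annotator marked with '?'.
def agreement(labels_a, labels_b, skip_question_marks=True):
    pairs = list(zip(labels_a, labels_b))
    if skip_question_marks:
        pairs = [(a, b) for a, b in pairs if "?" not in (a, b)]
    same = sum(1 for a, b in pairs if a == b)
    return same / len(pairs)

# Toy example (labels are made up):
a = ["H", "A", "?", "H"]
b = ["H", "H", "H", "H"]
print(agreement(a, b))                             # 2/3 on '?'-free instances
print(agreement(a, b, skip_question_marks=False))  # 2/4 over all instances
```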

  10. Distribution of the Doubly Annotated Data

  11. Agreement of doubly marked instances

  12. Inter-annotator agreement for each category

  13. Category distribution among the agreed part (69%)

  14. A few statistics on the human-annotated data • Total annotated: 262,230 instances • 48,175 with ? • 214,055 with a category • Of those: Z 0.5%, W and X 0.5%, 4 and 5 1.6%
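
A quick consistency check of these counts (the per-category percentages are only quoted on the slide, so just the totals are verified here):

```python
# The '?' and category counts should sum to the total number of instances.
total = 262_230
with_question_mark = 48_175
with_category = 214_055
assert with_question_mark + with_category == total

print(round(100 * with_question_mark / total, 1))  # ~18.4% of instances marked '?'
print(round(100 * with_category / total, 1))       # ~81.6% carry a category
```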

  15. Our Progress – Baselines • Determine baselines for automatic tagging of noun phrases • Baselines for tagging observed words in new contexts (new instances of known words) • Baselines for tagging unobserved words • Unseen words – not in the training material, but in the dictionary • Novel words – not in the training material nor in the dictionary/Wordnet
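
The three word classes in this breakdown amount to a small decision rule; a sketch, with hypothetical vocabulary sets standing in for the training material and the dictionary:

```python
# Classify a word as observed / unseen / novel relative to the training
# vocabulary and the dictionary (LDOCE or Wordnet). Sets are placeholders.
def word_status(word, training_vocab, dictionary_vocab):
    if word in training_vocab:
        return "observed"   # a new instance of a known word
    if word in dictionary_vocab:
        return "unseen"     # not in training, but in the dictionary
    return "novel"          # in neither resource

print(word_status("satellite", {"astronaut"}, {"astronaut", "satellite"}))  # unseen
```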

  16. Overlap of the dictionary and head nouns (in the BNC) • 85% of NPs covered • only 33% of the vocabulary (in both LDOCE and Wordnet) appears in the covered NPs

  17. Preparation of the test environment • Selected the blind portion of the human-annotated data for later evaluation • Divided the remaining corpus into training and held-out portions • Random division of files (see the sketch below) • Unambiguous words for training – ambiguous words for testing
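
A minimal sketch of the random file division; the file names are placeholders and the 90/10 proportion is an assumption, since the actual split size is not stated on the slide.

```python
# Randomly divide annotated files into training and held-out portions.
import random

def split_files(files, held_out_fraction=0.1, seed=0):
    files = sorted(files)
    random.Random(seed).shuffle(files)
    cut = int(len(files) * held_out_fraction)
    return files[cut:], files[:cut]      # (training, held_out)

train, held_out = split_files([f"doc{i:03d}.xml" for i in range(100)])
print(len(train), len(held_out))         # 90 10
```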

  18. Baselines using only (target) words

  19. Baselines using only (target) words and preceding adjectives

  20. Baselines using multiple knowledge sources • Experiments in Sheffield • Unambiguous tagger (assigns only the available semantic categories) • Bag-of-words tagger (IR-inspired) • window size: 50 words • nouns and verbs • Frequency-based tagger (assigns the most frequent semantic category)

  21. Baselines using multiple knowledge sources (cont’d) • Frequency-based tagger: 16-18% error rate • Bag-of-words tagger: 17% error rate • Combined architecture: 14.5-15% error rate
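
The sketch below illustrates how the frequency-based and bag-of-words taggers described above could look; it is an illustration under simplifying assumptions (a symmetric window of tokens around the target, simple count-based scoring), not the Sheffield implementation.

```python
# Two baseline taggers, sketched: (1) most-frequent-category per word,
# (2) bag-of-words scoring of categories by co-occurrence in a word window.
from collections import Counter, defaultdict

class FrequencyTagger:
    def __init__(self):
        self.freq = defaultdict(Counter)        # word -> Counter of categories

    def train(self, tagged_instances):          # iterable of (word, category)
        for word, cat in tagged_instances:
            self.freq[word][cat] += 1

    def tag(self, word):
        counts = self.freq.get(word)
        return counts.most_common(1)[0][0] if counts else None

class BagOfWordsTagger:
    def __init__(self, window=50):
        self.window = window
        self.cooc = defaultdict(Counter)        # context word -> Counter of categories

    def train(self, corpus):                    # iterable of (tokens, position, category)
        for tokens, pos, cat in corpus:
            lo, hi = max(0, pos - self.window), pos + self.window + 1
            for ctx in tokens[lo:pos] + tokens[pos + 1:hi]:
                self.cooc[ctx][cat] += 1

    def tag(self, tokens, pos):
        scores = Counter()
        lo, hi = max(0, pos - self.window), pos + self.window + 1
        for ctx in tokens[lo:pos] + tokens[pos + 1:hi]:
            scores.update(self.cooc[ctx])
        return scores.most_common(1)[0][0] if scores else None
```

A combined architecture in this spirit could fall back from the bag-of-words score to the frequency-based tag (or vice versa) when one of them is uninformative.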

  22. Bootstrapping to Unseen Words • Problem: automatically identify the semantic class of words in LDOCE whose behavior was not observed in the training data • Basic idea: use the unambiguous words (unambiguous with respect to our semantic tag set) to learn contexts for tagging unseen words
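
The basic idea can be sketched as follows: collect context statistics only around unambiguous words, then assign an unseen word the category whose context profile best matches its own context. The profile representation and matching score below are illustrative assumptions, not the workshop's method.

```python
# Bootstrapping sketch: learn per-category context profiles from
# unambiguous words, then tag unseen words by context overlap.
from collections import Counter, defaultdict

def learn_context_profiles(sentences, unambiguous_lexicon, window=3):
    """sentences: lists of tokens; unambiguous_lexicon: word -> its single category."""
    profiles = defaultdict(Counter)              # category -> Counter of context words
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            cat = unambiguous_lexicon.get(tok)
            if cat is None:
                continue
            lo, hi = max(0, i - window), i + window + 1
            profiles[cat].update(tokens[lo:i] + tokens[i + 1:hi])
    return profiles

def tag_unseen(tokens, position, profiles):
    """Choose the category whose profile overlaps most with the word's context."""
    context = Counter(tokens[:position] + tokens[position + 1:])
    def overlap(cat):
        return sum(min(context[w], profiles[cat][w]) for w in context)
    return max(profiles, key=overlap) if profiles else None
```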

  23. Bootstrapping: statistics • 6,656 different unambiguous lemmas in the (visible) human-tagged corpus • these contribute 166,249 instances of data • 134,777 of those instances were considered correct by the annotators • Observation: unambiguous words can be used in the corpus in an “unforeseen” way

  24. Bootstrapping baselines • Test Instances (instances of ambiguous words) : 62,853

  25. Metrics for Intrinsic Evaluation • Need to take into account the hierarchical structure of the target semantic categories • Two fuzzy measures, based on: • dominance between categories • edge distance in the category tree/graph • Results w.r.t. inter-annotator agreement are almost identical to exact match
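
A minimal sketch of how the two fuzzy scores could be computed over the class tree assumed in the slide-6 sketch; the exact definitions used in the workshop are not given here, so the scoring below is an assumption.

```python
# Fuzzy scoring sketch: dominance and edge distance over an assumed class tree.
PARENT = {"T": None, "C": None, "Q": "C", "I": "C", "4": "C", "5": "C",
          "P": "Q", "A": "Q", "H": "Q", "L": "I", "G": "I", "S": "I",
          "J": "S", "N": "S"}

def path_to_root(cat):
    path = []
    while cat is not None:
        path.append(cat)
        cat = PARENT[cat]
    return path

def dominance_score(gold, predicted):
    """Credit if one category dominates (is an ancestor of, or equals) the other."""
    return 1.0 if gold in path_to_root(predicted) or predicted in path_to_root(gold) else 0.0

def edge_distance(gold, predicted):
    """Number of tree edges between the two categories (via their lowest common ancestor)."""
    gp, pp = path_to_root(gold), path_to_root(predicted)
    common = next((c for c in gp if c in pp), None)
    if common is None:                    # different roots, e.g. Abstract vs. Concrete
        return len(gp) + len(pp)
    return gp.index(common) + pp.index(common)

print(edge_distance("H", "A"))    # 2 (Human -> Animate -> Animal)
print(dominance_score("Q", "H"))  # 1.0 (Animate dominates Human)
```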

  26. What’s next • Investigate the respective contributions of (independent) features • Incorporate syntactic information • Refine some coarse categories • Using subject codes • Using genus terms • Re-mapping via Wordnet

  27. What’s next (cont’d) • Reduce the number of features/values via external resources: • lexical vs. semantic models of the context • use selectional preferences • Concentrate on complex cases (e.g. unseen words) • Preparation of test data for extrinsic evaluation (MT)
