Semantic Annotation – Week 3
Team: Louise Guthrie, Roberto Basili, Fabio Zanzotto, Hamish Cunningham, Kalina Boncheva, Jia Cui, Klaus Macherey, David Guthrie, Martin Holub, Marco Cammisa, Cassia Martin, Jerry Liu, Kris Haralambiev, Fred Jelinek
Our Hypotheses
• A transformation of a corpus that replaces words and phrases with coarse semantic categories will help overcome the data sparseness problem encountered in language modeling
• Semantic category information will also help improve machine translation
• An initially noun-centric approach will allow bootstrapping to other syntactic categories
An Example
• Astronauts aboard the space shuttle Endeavor were forced to dodge a derelict Air Force satellite Friday
• Humans aboard space_vehicle dodge satellite time_ref.
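The transformation can be pictured as a dictionary lookup over noun phrases. Below is a minimal sketch, assuming a tiny hypothetical lexicon standing in for the LDOCE-derived tag set; the real pipeline tags GATE-recognised noun phrases rather than raw tokens.

```python
# Minimal sketch of the corpus transformation (hypothetical lexicon and phrase
# list; the real system tags GATE-recognised noun phrases, not raw tokens).

PHRASE_CATEGORY = {                       # multi-word phrase -> coarse category
    ("space", "shuttle", "endeavor"): "space_vehicle",
}
WORD_CATEGORY = {                         # single head word -> coarse category
    "astronauts": "Human",
    "satellite": "satellite",
    "friday": "time_ref",
}

def transform(tokens):
    """Greedy longest-match replacement of phrases, then single words."""
    low = [t.lower() for t in tokens]
    out, i = [], 0
    while i < len(tokens):
        for phrase, cat in PHRASE_CATEGORY.items():
            if tuple(low[i:i + len(phrase)]) == phrase:
                out.append(cat)
                i += len(phrase)
                break
        else:
            out.append(WORD_CATEGORY.get(low[i], tokens[i]))
            i += 1
    return out

tokens = ("Astronauts aboard the space shuttle Endeavor were forced "
          "to dodge a derelict Air Force satellite Friday").split()
print(" ".join(transform(tokens)))
# -> Human aboard the space_vehicle were forced to dodge a derelict Air Force
#    satellite time_ref
```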
Our Progress – Preparing the Data (Pre-Workshop)
• Identify a tag set
• Create a human-annotated corpus
• Create a doubly annotated corpus
• Process all data for named entity and noun phrase recognition using GATE tools
• Develop algorithms for mapping target categories to WordNet synsets to support the tag set assessment (see the sketch below)
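For the WordNet mapping step, one plausible sketch is to test whether a hand-picked anchor synset for each target category dominates any sense of a noun. The anchor synsets below are illustrative assumptions, not the project's actual mapping, and the code assumes NLTK with the WordNet data installed.

```python
# Hedged sketch of checking whether a noun falls under a target category by
# walking WordNet hypernyms (requires NLTK and its WordNet corpus; the anchor
# synsets chosen for each category are illustrative, not the project's own).
from nltk.corpus import wordnet as wn

CATEGORY_ANCHORS = {
    "Human":  wn.synset("person.n.01"),
    "Animal": wn.synset("animal.n.01"),
    "Plant":  wn.synset("plant.n.02"),
}

def categories_for(noun):
    """Return target categories whose anchor synset dominates some sense of the noun."""
    cats = set()
    for sense in wn.synsets(noun, pos=wn.NOUN):
        ancestors = set(sense.closure(lambda s: s.hypernyms()))
        for cat, anchor in CATEGORY_ANCHORS.items():
            if anchor == sense or anchor in ancestors:
                cats.add(cat)
    return cats

print(categories_for("astronaut"))   # expected: {'Human'}
print(categories_for("satellite"))   # likely empty for these three anchors
```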
The Semantic Classes for Annotators
• A subset of the classes available in the Longman Dictionary of Contemporary English (LDOCE), electronic version
• Rationale:
  • The number of semantic classes is small
  • The classes are somewhat reliable, since a team of lexicographers used them to code:
    • Noun senses
    • Adjective preferences
    • Verb preferences
Semantic Classes
[Class hierarchy diagram showing target classes and annotated evidence: Abstract (T), Concrete (C), Animate (Q), Inanimate (I), PhysQuant (4), Organic (5), Plant (P), Animal (A), Human (H), Liquid (L), Gas (G), Solid (S), Non-movable (J), Movable (N); additional evidence codes B, D, F, M]
More Categories
• U: Collective
• K: Male
• R: Female
• W: Not animate
• X: Not concrete or animal
• Z: Unmarked
We allowed annotators to choose "none of the above" (shown as ? in the slides that follow)
Our Progress – Data Preparation
• Assess the annotation format; define uniform descriptions for irregular phenomena and normalize them
• Determine the distribution of the tag set in the training corpus
• Analyze inter-annotator agreement
• Determine a reliable set of tags – T
• Parse all training data
Doubly Annotated Data
• Instances (headwords): 10,960
• 8,950 instances without question marks
• 8,446 of those are marked the same
• Inter-annotator agreement is 94% (83% when question-mark instances are included)
• Recall: these are non-named-entity noun phrases
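As a quick sanity check of the figures above (a worked calculation over the slide's numbers, not new data):

```python
# Sanity check of the agreement figures quoted above (slide numbers only).
total          = 10_960   # doubly annotated headword instances
without_qmarks = 8_950    # instances where neither annotator used '?'
same_label     = 8_446    # of those, instances labelled identically

print(f"agreement excluding '?': {same_label / without_qmarks:.0%}")   # ~94%
# Reproducing the 83% figure that includes question marks would also need the
# counts of instances where both annotators chose '?', which are not on the slide.
```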
Inter-annotator Agreement – for Each Category
A Few Statistics on the Human-Annotated Data
• Total annotated: 262,230 instances
• 48,175 with ?
• 214,055 with a category; of those:
  • Z: 0.5%
  • W and X: 0.5%
  • 4 and 5: 1.6%
Our Progress – Baselines
• Determine baselines for automatic tagging of noun phrases
• Baselines for tagging observed words in new contexts (new instances of known words)
• Baselines for tagging unobserved words
  • Unseen words – not in the training material but in the dictionary
  • Novel words – in neither the training material nor the dictionary/WordNet
Overlap of Dictionary and Head Nouns (in the BNC)
• 85% of NPs are covered
• only 33% of the vocabulary (words in both LDOCE and WordNet) appears in the covered NPs
Preparation of the Test Environment
• Selected the blind portion of the human-annotated data for late evaluation
• Divided the remaining corpus into training and held-out portions
  • Random division of files
  • Unambiguous words for training – ambiguous words for testing
Baselines Using Only (Target) Words and Preceding Adjectives
Baselines Using Multiple Knowledge Sources
• Experiments in Sheffield:
  • Unambiguous tagger (assigns the only available semantic category)
  • Bag-of-words tagger (IR-inspired)
    • window size of 50 words
    • nouns and verbs
  • Frequency-based tagger (assigns the most frequent semantic category)
Baselines Using Multiple Knowledge Sources (cont'd)
• Frequency-based tagger: 16–18% error rate
• Bag-of-words tagger: 17% error rate
• Combined architecture: 14.5–15% error rate
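A compact sketch of the two simplest baselines above, run on a toy corpus; the category labels and data structures are hypothetical, and the real taggers worked over a 50-word window on the annotated BNC data.

```python
# Sketch of the frequency-based and bag-of-words baselines (toy data only).
from collections import Counter, defaultdict

# headword -> Counter of semantic categories observed for it in training
category_counts = defaultdict(Counter)
# semantic category -> Counter of context words seen near it in training
context_counts = defaultdict(Counter)

def train(annotated):
    """annotated: iterable of (headword, category, context_words) triples."""
    for head, cat, context in annotated:
        category_counts[head][cat] += 1
        context_counts[cat].update(context)

def frequency_tag(head):
    """Frequency-based tagger: most frequent category seen for this headword."""
    counts = category_counts.get(head)
    return counts.most_common(1)[0][0] if counts else None

def bag_of_words_tag(context):
    """Bag-of-words tagger: category whose training contexts best overlap this one."""
    scores = {cat: sum(ctr[w] for w in context) for cat, ctr in context_counts.items()}
    return max(scores, key=scores.get) if scores else None

train([
    ("satellite", "Movable", ["orbit", "launch", "space"]),
    ("satellite", "Movable", ["dodge", "space", "shuttle"]),
    ("astronaut", "Human",   ["space", "shuttle", "crew"]),
])
print(frequency_tag("satellite"))                     # Movable
print(bag_of_words_tag(["crew", "launch", "space"]))  # category with best overlap
```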
Bootstrapping to Unseen Words
• Problem: automatically identify the semantic class of words in LDOCE whose behavior was not observed in the training data
• Basic idea: use the unambiguous words (unambiguous with respect to our semantic tag set) to learn contexts for tagging unseen words
Bootstrapping: Statistics
• 6,656 different unambiguous lemmas in the (visible) human-tagged corpus
• ...these contribute 166,249 instances of data
• ...134,777 of those instances were considered correct by the annotators!
• Observation: unambiguous words can be used in the corpus in an "unforeseen" way
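The bootstrapping idea can be sketched as follows: contexts around unambiguous words build per-category profiles, and an ambiguous or unseen word is assigned the category whose profile best matches its own context. Everything below is illustrative; the window size, lexicon, and scoring are assumptions, not the workshop's implementation.

```python
# Bootstrapping sketch: unambiguous words supply labelled contexts; an
# ambiguous or unseen word gets the category whose contexts it most resembles.
from collections import Counter, defaultdict

def collect_contexts(corpus, unambiguous_lexicon, window=5):
    """corpus: list of token lists. Returns category -> Counter of nearby words."""
    profiles = defaultdict(Counter)
    for sent in corpus:
        for i, tok in enumerate(sent):
            cat = unambiguous_lexicon.get(tok.lower())
            if cat is not None:
                left = sent[max(0, i - window):i]
                right = sent[i + 1:i + 1 + window]
                profiles[cat].update(w.lower() for w in left + right)
    return profiles

def tag_unseen(context_words, profiles):
    """Pick the category whose profile gives this context the highest overlap score."""
    scores = {cat: sum(prof[w] for w in context_words) for cat, prof in profiles.items()}
    return max(scores, key=scores.get) if scores else None

lexicon  = {"astronauts": "Human", "water": "Liquid"}   # toy unambiguous lexicon
corpus   = [["the", "astronauts", "drank", "recycled", "water", "aboard"]]
profiles = collect_contexts(corpus, lexicon, window=3)
print(tag_unseen(["the", "water", "drank"], profiles))  # -> Human on this toy data
```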
Bootstrapping Baselines
• Test instances (instances of ambiguous words): 62,853
Metrics for Intrinsic Evaluation
• Need to take into account the hierarchical structure of the target semantic categories
• Two fuzzy measures based on:
  • dominance between categories
  • edge distance in the category tree/graph
• Results with respect to inter-annotator agreement are almost identical to exact match
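One way the edge-distance measure could be realised is sketched below, over a hypothetical fragment of the category tree; the workshop's actual scoring function may differ.

```python
# Sketch of a fuzzy, tree-aware score: credit decays with the number of edges
# between the predicted and the gold category in the class hierarchy.
# The parent map is a hypothetical fragment of the tag-set tree.

PARENT = {
    "Concrete": None, "Animate": "Concrete", "Inanimate": "Concrete",
    "Human": "Animate", "Animal": "Animate", "Plant": "Animate",
    "Liquid": "Inanimate", "Solid": "Inanimate", "Gas": "Inanimate",
}

def path_to_root(cat):
    path = []
    while cat is not None:
        path.append(cat)
        cat = PARENT[cat]
    return path

def edge_distance(a, b):
    """Number of edges between two categories via their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = next(c for c in pa if c in pb)
    return pa.index(common) + pb.index(common)

def fuzzy_score(predicted, gold):
    """1 for an exact match, decaying with tree distance otherwise."""
    return 1.0 / (1.0 + edge_distance(predicted, gold))

print(fuzzy_score("Human", "Human"))    # 1.0
print(fuzzy_score("Human", "Animal"))   # 0.33... (distance 2 via Animate)
print(fuzzy_score("Human", "Liquid"))   # 0.2     (distance 4 via Concrete)
```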
What's Next
• Investigate the respective contribution of (independent) features
• Incorporate syntactic information
• Refine some coarse categories:
  • using subject codes
  • using genus terms
  • re-mapping via WordNet
What's Next (cont'd)
• Reduce the number of features/values via external resources:
  • lexical vs. semantic models of the context
  • use selectional preferences
• Concentrate on complex cases (e.g. unseen words)
• Prepare test data for extrinsic evaluation (MT)