A modality lexicon and its use in automatic tagging
Kathryn Baker, Michael Bloodgood, Bonnie Dorr, Nathaniel W. Filardo, Lori Levin, Christine Piatko
May 20, 2010
Presented by Lori Levin, Language Technologies Institute, Carnegie Mellon University
Context
• SCALE 2009
  • Summer Camp in Applied Language Engineering
  • Johns Hopkins University Human Language Technology Center of Excellence
• SIMT
  • Semantically informed MT
  • Can we improve statistical MT with semantic knowledge?
  • Experiments with modality and named entities
Modality Tagger Output
• Trigger: the lexical item that carries a modal meaning.
• Target: the head of the proposition that the trigger scopes over.
• Holder: the experiencer or cognizer of the modality.
Example 1:
• Input: Americans should know that we can not hand over Dr. Khan to them.
• Output: Americans <TrigRequire should> <TargRequire know> that we <TrigAble can> <TrigNegation not> <TargNOTAble hand> over Dr. Khan to them
Example 2:
• Input: He managed to hold general elections in the year 2002, but he can not be ignorant of the fact that the world at large did not accept these elections
• Output: He <TrigSucceed managed> to <TargSucceed hold> general elections in the year 2002, but he <TrigAble can> <TrigNegation not> <TargNOTAble be> ignorant of the fact that the world at large did <TrigNegation not> <TrigBelief accept> these <TargBelief elections>
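The inline tags in the output above can be read back into structured form. The following is a minimal sketch, assuming each tag is written as <TrigLabel word> or <TargLabel word> with whitespace between the label and the word; the regular expression and the function name parse_tags are illustrative and not part of the tagger described in these slides.

```python
import re

# Assumed tag shape: <Trig|Targ + modality label, whitespace, tagged word>.
# This pattern is an illustration, not the authors' own output parser.
TAG_RE = re.compile(r"<(Trig|Targ)(\w+)\s+([^<>]+?)\s*>")


def parse_tags(tagged):
    """Return (role, modality, word) triples in order of appearance."""
    return [(m.group(1), m.group(2), m.group(3)) for m in TAG_RE.finditer(tagged)]


example = (
    "Americans <TrigRequire should> <TargRequire know> that we "
    "<TrigAble can> <TrigNegation not> <TargNOTAble hand> over Dr. Khan to them"
)
tags = parse_tags(example)
for role, modality, word in tags:
    print(role, modality, word)
```

Each triple keeps the role (Trig or Targ), the modality label, and the tagged word, which is enough to pair a trigger with the target that follows it.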
Outline
• A modality annotation scheme
• A modality lexicon
• A string-based modality tagger
• A tree-based modality tagger
• Evaluation of the taggers
• Semantically informed MT
Core Cases of Modality (van der Auwera and Ammann, World Atlas of Language Structures)
Related Concepts: Factivity
• Did the proposition happen or not?
  • John went to New York.
  • John may go to New York.
  • If John goes to New York, he will visit MOMA.
  • John bought a ticket to go to NY.
• FactBank: Saurí and Pustejovsky
Related Concepts: Evidentiality
• Source of information
  • First-hand experience or hearsay
    • They say that John went to NY.
  • Sensory information
    • I heard that John went to NY.
  • Conclusion from evidence
    • I don’t see John, so he must have gone to NY.
Other Related Concepts
• Speaker attitude and sentiment
• Conditionality
• Hypotheticality
• Realis and Irrealis mood
• Tense, aspect, etc.
Modality Annotation and Tagging
• Annotation: Humans add labels to text, following instructions from a coding manual that defines an annotation scheme.
• Tagging: A program automatically assigns labels.
• Goals:
  • Design an annotation scheme that can be followed with high intercoder agreement and low annotation time and cost
  • Train a tagger on human annotated data
  • Build a tagger based on the annotation scheme
The inventory of modalities in the annotation scheme
H = Holder (experiencer or cognizer), P = Proposition
• Belief: with what strength does H believe P?
• Requirement: does H require P?
• Permissive: does H allow P?
• Intention: does H intend P?
• Effort: does H try to do P?
• Ability: can H do P?
• Success: does H succeed in P?
• Want: does H want P?
Joint work with Sergei Nirenburg, Marge McShane, Teruko Mitamura, Owen Rambow, Mona Diab, Eduard Hovy, Bonnie Dorr, Christine Piatko, Michael Bloodgood
The Annotation Scheme
• Identify a modality target P and then choose one of these modalities (choose the first one that applies)
  • H requires [P to be true/false]
  • H permits [P to be true/false]
  • H succeeds in [making P true/false]
  • H does not succeed in [making P true/false]
  • H is trying [to make P true/false]
  • H is not trying [to make P true/false]
  • H intends [to make P true/false]
  • H does not intend [to make P true/false]
  • H is able [to make P true/false]
  • H is not able [to make P true/false]
  • H wants [P to be true/false]
  • H firmly believes [P is true/false]
  • H believes [P may be true/false]
Six Simplifications
• Transparency to negation
• Duality of require and permit
• Ordering for entailment
• Annotators were not asked to nest modalities.
• Default is Firmly Believe
• Annotators were not asked to mark the holder.
Simplifications: Transparency to negation
• Some modalities have negatives in the annotation scheme: not intend, not try, not be able, not succeed
• Believe and want do not have negatives in our annotation scheme, because the members of each pair below are similar in meaning:
  • I don’t want him to go / I want him not to go.
    • Both are coded as H wants P to be false.
  • I don’t believe he will go / I believe he will not go.
    • Both are coded as H believes P to be false.
Simplifications: Duality of require and permit
• Require and permit do not have negations in the annotation scheme because
  • Not require P to be true means Permit P to be false
  • Not permit P to be true means Require P to be false
Simplifications: Ordering for entailment
• John managed to go to NY.
  • What modality is this? Success? Intent? Effort? Desire? Ability?
• Two entailment groupings, ordered with respect to each other:
  • {requires, permits}
  • {succeeds, tries, intends, is able, wants}
• Both apply before “believe”, which is not in an entailment relation with either grouping.
• The annotators are instructed to choose the first modality in the list that applies.
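The "first modality in the list that applies" rule can be written as a priority lookup. This is a minimal sketch: the order follows the list in the annotation scheme, while the lowercase labels, the name choose_modality, and the idea of passing in a set of applicable modalities are illustrative assumptions.

```python
# Order taken from the annotation scheme: the require/permit grouping first,
# then the succeed ... want grouping, then believe.
PRIORITY = ["require", "permit", "succeed", "try", "intend", "able", "want", "believe"]


def choose_modality(applicable):
    """Return the first modality in PRIORITY that is in the applicable set."""
    for modality in PRIORITY:
        if modality in applicable:
            return modality
    # Nothing else applies: fall back to the scheme's default, (firmly) believe.
    return "believe"


# "John managed to go to NY": success entails effort, intent, ability and desire.
managed = {"succeed", "try", "intend", "able", "want"}
print(choose_modality(managed))
```

Because success entails the modalities below it, picking the first applicable label records the strongest one and leaves the rest implied.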
Simplifications: No embedding of modalities
• He might be able to swim
  • Only ability is tagged
• Modals are never considered as targets of other modals in the annotation process
English Modality Lexicon
• Modality trigger words
  • might, should, require, permit, need, try, possible, fail, etc.
• About 150 lemmas
  • plus five forms for each verb where applicable
    • bare infinitive, present tense –s, past tense, past participle, present participle
English Modality Lexicon Example
• need
  • Pos: VB
  • Modality: Require
  • Trigger word: Need
  • Subcategorization codes
    • V3-passive-basic
      • Large helicopters are needed to dispatch urgent relief materials.
    • V3-I3-basic
      • The government will need to work continuously for at least a year.
      • We will need them to work continuously.
    • T1-monotransitive-for-V3-verbs
      • We need a Sir Sayyed again to maintain this sentiment.
    • T1-passive-for-V3-verb
      • He is needed to work continuously.
    • modal-auxiliary-basic
      • He need not go.
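A lexicon entry like the one for "need" maps naturally onto a small record type. The sketch below is an assumed in-memory representation, not the lexicon's actual file format: the class name LexiconEntry, its field names, and the helper build_index are hypothetical, and the five surface forms are filled in from the bare infinitive / -s / past / past participle / present participle rule.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """Hypothetical record for one trigger lemma."""
    lemma: str
    pos: str
    modality: str
    subcat_codes: list = field(default_factory=list)
    forms: list = field(default_factory=list)


need = LexiconEntry(
    lemma="need",
    pos="VB",
    modality="Require",
    subcat_codes=[
        "V3-passive-basic",
        "V3-I3-basic",
        "T1-monotransitive-for-V3-verbs",
        "T1-passive-for-V3-verb",
        "modal-auxiliary-basic",
    ],
    # bare infinitive, present -s, past, past participle, present participle
    forms=["need", "needs", "needed", "needed", "needing"],
)


def build_index(entries):
    """Map each distinct surface form to the entries it can realise."""
    index = {}
    for entry in entries:
        for form in set(entry.forms):
            index.setdefault(form, []).append(entry)
    return index


index = build_index([need])
print(index["needed"][0].modality)
```

Indexing by surface form lets a tagger look up an inflected word directly instead of lemmatizing the input first.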
String-Based English Modality Tagger
• Input
  • Text that has been tagged with parts of speech.
• Mark Triggers
  • Mark spans of words that are exact matches to entries in the modality lexicon and that have the same part of speech.
• Mark Targets
  • Next non-auxiliary verb to the right of a trigger
• Spans of words can be marked multiple times with different triggers and targets.
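The string-based procedure can be sketched in a few lines. This is a simplified illustration under stated assumptions: triggers are single tokens rather than multi-word spans, the two-entry LEXICON and the AUXILIARIES set are toy stand-ins for the real resources, negation is not handled, and the function name tag_string is hypothetical.

```python
# Toy stand-ins for the modality lexicon and an auxiliary-verb list.
LEXICON = {("should", "MD"): "Require", ("can", "MD"): "Able"}
AUXILIARIES = {"be", "is", "are", "was", "were", "been", "being",
               "have", "has", "had", "do", "does", "did"}


def tag_string(tokens):
    """tokens: list of (word, pos) pairs.

    Returns (trigger_index, target_index_or_None, modality) triples.
    """
    results = []
    for i, (word, pos) in enumerate(tokens):
        # Trigger: exact lexicon match on both the word and its part of speech.
        modality = LEXICON.get((word.lower(), pos))
        if modality is None:
            continue
        # Target: the next non-auxiliary verb to the right of the trigger.
        target = None
        for j in range(i + 1, len(tokens)):
            next_word, next_pos = tokens[j]
            if next_pos.startswith("VB") and next_word.lower() not in AUXILIARIES:
                target = j
                break
        results.append((i, target, modality))
    return results


sentence = [("Americans", "NNPS"), ("should", "MD"), ("know", "VB"),
            ("that", "IN"), ("we", "PRP"), ("can", "MD"), ("not", "RB"),
            ("hand", "VB"), ("over", "RP"), ("Dr.", "NNP"), ("Khan", "NNP"),
            ("to", "TO"), ("them", "PRP")]
result = tag_string(sentence)
print(result)
```

Because the target search only looks rightward for a verb, it has no notion of clause boundaries, which is the kind of case a tree-based tagger can treat differently.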
The Structure-Based English Modality Tagger: Modality Tagging
• Used Tsurgeon (Stanford NLP tools) to find trees that match templates and mark modality triggers and targets.
[Figure: a template (a VP containing an MD trigger, "should", and a VB target) shown beside the parse tree of "Americans should know that we can not hand over Dr. Khan to them", in which "should" (MD) is marked as the trigger and "know" (VB) as the target.]
The Structure-Based English Modality Tagger: Tsurgeon Percolation
[Figure: the same parse tree after tagging. The outer VP is relabeled VP-require and contains MD-TrigRequire (should) and VB-TargRequire (know); the embedded VP is relabeled VP-NOTAble and contains MD-TrigAble (can), RB-TrigNegation (not), and VB-TargNOTAble (hand).]
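The relabeling shown in the tree can be imitated in plain Python to make the idea concrete. This sketch does not use Tsurgeon and is not the authors' template set: it assumes a tree encoded as nested lists, a flat VP with the modal and the verb as sisters, a two-word toy lexicon, and a single hard-coded template; all names are hypothetical.

```python
MODALS = {"should": "Require", "can": "Able"}  # toy lexicon subset


def leaf_word(node):
    """Return the word under a preterminal such as ["MD", "should"], else None."""
    if len(node) == 2 and isinstance(node[1], str):
        return node[1]
    return None


def tag_tree(node):
    """Relabel trigger, target and parent VP in place, then recurse."""
    if isinstance(node, str):
        return
    children = [c for c in node[1:] if not isinstance(c, str)]
    if node[0] == "VP":
        md = next((c for c in children
                   if c[0] == "MD" and leaf_word(c) in MODALS), None)
        vb = next((c for c in children if c[0] == "VB"), None)
        if md is not None and vb is not None:
            neg = next((c for c in children
                        if c[0] == "RB" and leaf_word(c) == "not"), None)
            name = MODALS[leaf_word(md)]
            md[0] = "MD-Trig" + name
            if neg is not None:
                neg[0] = "RB-TrigNegation"
                name = "NOT" + name
            vb[0] = "VB-Targ" + name
            node[0] = "VP-" + name  # percolate the modality up to the VP
    for child in children:
        tag_tree(child)


inner_vp = ["VP", ["MD", "can"], ["RB", "not"], ["VB", "hand"], ["RP", "over"],
            ["NP", ["NNP", "Dr."], ["NNP", "Khan"]],
            ["PP", ["TO", "to"], ["PRP", "them"]]]
inner_s = ["S", ["NP", ["PRP", "we"]], inner_vp]
outer_vp = ["VP", ["MD", "should"], ["VB", "know"], ["SBAR", ["IN", "that"], inner_s]]
tree = ["S", ["NP", ["NNPS", "Americans"]], outer_vp]

tag_tree(tree)
print(outer_vp[0], inner_vp[0])
```

Matching on tree structure ties the target to the verb inside the trigger's own VP, so a verb in a different clause is not picked up by accident.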
What was covered
• 15 subcategorization patterns
• 150 lemmas
• Expressions of modality with lexical triggers
What wasn’t covered
• Non-lexical modality
  • Imperatives
  • Other constructions
    • It will be a long time/a cold day in hell before…
• Targets in coordinate structures
• To do next
  • Word sense disambiguation
    • Can, must: deontic or epistemic
    • Manage: manage to do something vs manage a project
  • Transitivity alternations: alternate mappings between grammatical relations and semantic roles
    • The plan succeeded
    • The government succeeded in its plan.
    • The government succeeded ????
Evaluation: agreement between string-based and structure-based taggers
• Calculated Kappa on the basis of 88108 sentences
  • from the English side of the Urdu-English corpus for MTEval 2009
• Example:
  • TargPermit (John is allowed to <TargPermit go> to NY)
    • 585 Matching both taggers
    • 163 Matching just structure-based tagger
    • 194 Matching just string-based tagger
    • 87166 No match either tagger
• Triggers: Kappa = .82
• Targets: Kappa = .76
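The four counts in the TargPermit example form a 2x2 agreement table, from which a kappa statistic can be computed. The sketch below assumes the statistic is Cohen's kappa for two raters with a binary label per sentence; the slides say only "Kappa", so that choice and the function name cohen_kappa are assumptions.

```python
def cohen_kappa(both, only_a, only_b, neither):
    """Cohen's kappa for two binary raters from a 2x2 agreement table."""
    n = both + only_a + only_b + neither
    p_observed = (both + neither) / n
    a_yes = (both + only_a) / n   # how often tagger A assigns the label
    b_yes = (both + only_b) / n   # how often tagger B assigns the label
    # Agreement expected by chance if the two taggers were independent.
    p_expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (p_observed - p_expected) / (1 - p_expected)


# TargPermit counts: both, structure-based only, string-based only, neither.
kappa = cohen_kappa(585, 163, 194, 87166)
print(round(kappa, 3))
```

For these counts the value comes out at roughly 0.76. Raw agreement is above 99% because almost no sentence carries the label, which is why a chance-corrected measure is more informative here.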
Evaluation: Structure-Based Tagger
• Recall: not feasible to look for all expressions of modality that we didn’t tag.
  • No gold-standard annotated corpus.
• Precision:
  • 249 sentences that were tagged with triggers and targets
  • From the English side of the MTEval 2009 training sentences
  • 86.3% correct
  • But ranges from about 82% to about 92% depending on genre
Precision: Errors
• Light verb or noun is correct syntactic target but not the correct semantic target.
  • Earthquake affected areas in Pakistan will be provided the required number of tents and blankets by November 15.
  • The decision should be taken on delayed cases on the basis of merit.
• Wrong word sense
  • In Bayas, Sikhs attacked a train under cover of night and killed everyone.
  • The process of provision of relief goods to needy people should be managed by the Army and the Edhi Trust.
  • Should be allowed to work like this in the future.
  • Like: succeed in something
Precision: Errors
• Wrong subcategorization pattern.
  • The officials should consider themselves as servants of the people.
• Coordinate Structures
  • Many large helicopters are needed to dispatch urgent relief materials to the many affected in far flung areas of the Neelam Valley and only America can help us in this regard.
Recall: what did we miss?
• Special forms of negation
  • There was no place to seek shelter.
  • The buildings should be reconstructed, not with the RCC, but with the wood and steel sheets.
• Constructional and phrasal triggers
  • President Pervaiz Musharraf has said that he will not rest unless the process of rehabilitation is completed.
• Random lexical omissions
  • It is not possible in the middle of winter to re-open the roads.
SIMT: Semantically Informed MT
[Figure: the parse tree of "Americans should know that we can not hand over Dr. Khan to them" after Tsurgeon percolation, with modality labels such as VP-require, MD-TrigRequire, VB-TargRequire, VP-NOTAble, MD-TrigAble, RB-TrigNegation, and VB-TargNOTAble on its nodes.]
Integration of the modality tagger with Syntax-Based SMT
• Joshua Syntax-Based SMT system
  • Callison-Burch
• Tag modalities on the English side of the training data.
• Without modality tags: BLEU 26.4
• With modality tags: BLEU 26.7
Advantages of SIMT
• Good for translation between a less commonly taught language (LCTL) and a common language
  • Modality can be analyzed on the common language and projected via word alignments to the LCTL
• Depth of semantic analysis
• Robustness of statistical approach
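Projection through word alignments can be pictured as copying each tag from an English token to whatever foreign tokens it is aligned with. The sketch below is only an illustration of that idea: the function name project_tags, the index-based data layout, and the example alignment are all hypothetical and do not come from the SIMT system.

```python
def project_tags(english_tags, alignment):
    """Copy modality tags across word alignments.

    english_tags: {english_token_index: tag}
    alignment: iterable of (english_token_index, foreign_token_index) pairs
    Returns {foreign_token_index: [tags]}.
    """
    projected = {}
    for e_idx, f_idx in alignment:
        tag = english_tags.get(e_idx)
        if tag is not None:
            projected.setdefault(f_idx, []).append(tag)
    return projected


# Made-up example: English tokens 1 and 2 carry tags; token 2 aligns to two
# foreign tokens, so both receive the target tag.
english_tags = {1: "TrigRequire", 2: "TargRequire"}
alignment = [(0, 0), (1, 3), (2, 2), (2, 4)]
projected = project_tags(english_tags, alignment)
print(projected)
```

Untagged English tokens contribute nothing, and a one-to-many alignment spreads a tag over several foreign tokens; a real system would need a policy for such cases.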
Summary
• Modality annotation scheme
• Modality lexicon
• Automatic modality tagger
• A method for integrating semantics into SMT
  • Good for translation between LCTLs and common languages
Future work
• Improvements to the tagger
  • Add patterns for constructions without simple lexical triggers.
  • Word sense disambiguation (manage, attack, etc.)
  • Semantic composition of multiple modalities and negation.
  • Tagging of holders
• Applications of the tagger
  • Further experiments with SIMT
  • Integration into tagger for Committed Belief (factivity)