English Proposition Bank: Status Report • Olga Babko-Malaya, Paul Kingsbury, Scott Cotton, Martha Palmer, Mitch Marcus • March 25, 2003
Outline • Overview • Status Report • Mapping of Propbank Framesets to other sense distinctions
Example • He sent merchants around the country a form asking them to check one of three answers. • Arg0: He • REL: sent • Arg2: merchants around the country • Arg1: a form asking them to check one of three answers
Predicate-argument structure • send(Agent: He, Goal: merchants, Theme: form) • He sent [NP1 merchants around the country] [NP2 a form asking them to check one of three answers].
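The example annotation and its predicate-argument structure can be held in a small data structure. The sketch below is illustrative only and is not the PropBank release format: the class `PropInstance`, its fields, and the `ROLE_NAMES` table (which re-keys the numbered arguments of *send* by the thematic roles on the slide) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Hypothetical table: numbered arguments of "send" -> thematic roles
# shown on the predicate-argument structure slide.
ROLE_NAMES = {"Arg0": "Agent", "Arg1": "Theme", "Arg2": "Goal"}


@dataclass
class PropInstance:
    """One annotated predicate occurrence (illustrative, not the PropBank format)."""
    rel: str                                   # the predicate (REL)
    args: dict = field(default_factory=dict)   # argument label -> text span

    def thematic_view(self, role_names):
        """Re-key the numbered arguments by thematic role name."""
        return {role_names[label]: span
                for label, span in self.args.items()
                if label in role_names}


sent = PropInstance(
    rel="sent",
    args={
        "Arg0": "He",
        "Arg2": "merchants around the country",
        "Arg1": "a form asking them to check one of three answers",
    },
)

print(sent.thematic_view(ROLE_NAMES))
```

Keeping the numbered labels as the stored keys, and deriving thematic names through a per-verb table, mirrors the idea that Arg0/Arg1/Arg2 are verb-specific slots rather than universal roles.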
Used At • MITRE, Xerox PARC, Sheffield University, BBN, Syracuse University, IBM, NYU, SRA, CMU, MIT, University of Texas at Dallas, University of Toronto, Columbia University, SPAWAR, and the JHU summer workshop; also used by JK Davis, John Josef Costandi, and Steve Maiorano • Improvements in information extraction (IE) reported in an ACL’03 submission
Annotation procedure • Extraction of all sentences containing a given verb • First pass: automatic tagging (Joseph Rosenzweig), http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon • Second pass: double-blind hand annotation • Third pass: adjudication • The tagging tool highlights inconsistencies
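The hand-off between the double-blind pass and adjudication amounts to comparing the two annotations and queuing every instance where they differ. A minimal sketch, assuming each annotator's output is a dict from a hypothetical instance ID to `{argument label: span}`; the function name and IDs are illustrative, not the actual tagging tool.

```python
def find_disagreements(annotator_a, annotator_b):
    """Return sorted IDs of instances whose argument annotations differ.

    Both inputs map instance ID -> {argument label: span}. An instance
    tagged by only one annotator is also flagged for adjudication.
    """
    flagged = []
    for instance_id in sorted(set(annotator_a) | set(annotator_b)):
        if annotator_a.get(instance_id) != annotator_b.get(instance_id):
            flagged.append(instance_id)
    return flagged


# Hypothetical double-blind output for two sentences.
a = {
    "inst-1": {"Arg0": "He", "Arg1": "a form", "Arg2": "merchants"},
    "inst-2": {"Arg0": "Mary", "Arg1": "the room"},
}
b = {
    "inst-1": {"Arg0": "He", "Arg1": "a form", "Arg2": "merchants"},
    "inst-2": {"Arg0": "Mary", "Arg2": "the room"},
}

print(find_disagreements(a, b))  # ['inst-2']
```

Only the flagged instances need the adjudicator's attention; agreed instances can pass straight through.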
Projected delivery dates • Financial subcorpus • alpha release: December, 2001--DONE! • beta release: July, 2002--DONE! • adjudicated release: summer 2003 • Propbank corpus • beta release: Summer 2003 • adjudicated release: December 2003
English PropBank - Current Status • 3183 frame files, corresponding to 3625 distinct predicates (including phrasal variants): finished! • At least single-annotated: 2915 verbs, 94.5K instances (80% of the Treebank) • At least double-annotated: 2250 verbs, 60K instances (67% of the Treebank) • Adjudicated: 1032 verbs, 25K instances (20% of the Treebank) • Coordinating with NYU on nominalizations, using the Penn tagger and frame files
Word Sense in Propbank • The original plan to ignore word sense proved infeasible for 700+ verbs • Mary left the room • Mary left her daughter-in-law her pearls in her will • Frameset leave.01 "move away from": Arg0: entity leaving; Arg1: place left • Frameset leave.02 "give": Arg0: giver; Arg1: thing given; Arg2: beneficiary • How do these relate to traditional word senses, such as those in WordNet?
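The two framesets for *leave* can be stored as a small lexicon, and the argument labels on an instance can already narrow the sense choice. This is a toy compatibility filter under stated assumptions: the dict layout, `FRAMESETS`, and `compatible_framesets` are hypothetical and do not reproduce the real frame-file format or any tagger described in these slides.

```python
# Framesets for "leave", transcribed from the slide into an illustrative layout.
FRAMESETS = {
    "leave.01": {"gloss": "move away from",
                 "roles": {"Arg0": "entity leaving", "Arg1": "place left"}},
    "leave.02": {"gloss": "give",
                 "roles": {"Arg0": "giver", "Arg1": "thing given",
                           "Arg2": "beneficiary"}},
}


def compatible_framesets(lemma, observed_labels, framesets=FRAMESETS):
    """Return framesets of `lemma` whose role set covers every observed label."""
    return sorted(
        fs_id for fs_id, fs in framesets.items()
        if fs_id.split(".")[0] == lemma
        and set(observed_labels) <= set(fs["roles"])
    )


# "Mary left the room": Arg0 + Arg1 fit both framesets, so labels alone
# do not decide the sense.
print(compatible_framesets("leave", ["Arg0", "Arg1"]))          # ['leave.01', 'leave.02']
# "Mary left her daughter-in-law her pearls": an Arg2 rules out leave.01.
print(compatible_framesets("leave", ["Arg0", "Arg1", "Arg2"]))  # ['leave.02']
```

The first case shows why role labels are only a partial signal: when the label sets overlap, another feature (for example, the kind of entity filling Arg1) has to break the tie.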
Fine-grained WordNet Senses • Senseval-2 WSD bakeoff, using WordNet 1.7 • Verb ‘develop’ • WN1: CREATE, MAKE SOMETHING NEW: They developed a new technique • WN2: CREATE BY MENTAL ACT: They developed a new theory of evolution; develop a better way to introduce crystallography techniques
WN Senses: verb ‘develop’ • [Figure: WordNet senses WN1 through WN14, WN19, and WN20 of ‘develop’]
Sense Groups: verb ‘develop’ • [Figure: WordNet senses WN1 through WN14, WN19, and WN20 of ‘develop’ clustered into sense groups]
Propbank Framesets for verb ‘develop’ • Frameset 1 (sense: create/improve): Arg0: agent; Arg1: thing developed; Example: They developed a new technique • Frameset 2 (sense: come about): Arg1: non-intentional theme; Example: The plot develops slowly
Mapping between Groups and Framesets • [Figure: the sense groups of ‘develop’ (WordNet senses WN1 through WN14, WN19, and WN20) mapped onto Frameset 1 and Frameset 2]
Sense Hierarchy • Framesets: coarse-grained distinctions • Sense Groups (Senseval-2): intermediate level (includes Levin classes); 95% overlap • WordNet: fine-grained distinctions
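Because the levels nest, a fine-grained tag can be collapsed to a coarser one by table lookup, and tags that differ at the WordNet level may coincide at the frameset level. A minimal sketch that skips the intermediate group level. ASSUMPTION: the table maps only WN1 and WN2 of *develop* to Frameset 1 (written here as the hypothetical ID `develop.01`, by analogy with `leave.01`), inferred from the slide examples; it is not the full mapping from the figure.

```python
# ASSUMPTION: partial, illustrative mapping; only WN1 and WN2 of "develop"
# are included, both assigned to Frameset 1 ("create/improve").
WN_TO_FRAMESET = {
    ("develop", "WN1"): "develop.01",
    ("develop", "WN2"): "develop.01",
}


def coarsen(lemma, wn_sense, mapping=WN_TO_FRAMESET):
    """Map a WordNet sense tag to its frameset ID, or None if unmapped."""
    return mapping.get((lemma, wn_sense))


# Two annotators who disagree at the WordNet level ...
tag_a, tag_b = "WN1", "WN2"
print(tag_a == tag_b)                                          # False
# ... agree once both tags are collapsed to framesets.
print(coarsen("develop", tag_a) == coarsen("develop", tag_b))  # True
```

Returning `None` for unmapped senses keeps gaps in the table visible instead of silently assigning a default frameset.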
Sense-Tagging of Propbank • Sense tagging is confined primarily to the financial subcorpus, covers about 90% of the polysemous instances in that subcorpus, and spans 415 verbs • Single-tagged: 12K polysemous instances, tagged with roleset identifiers • Double-tagged: 3K polysemous instances • 94% inter-annotator agreement
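An agreement figure like the one above is computed over the double-tagged instances only. A minimal sketch, assuming the figure is raw observed agreement (the slide does not say which measure was used) and that each annotator's tags are a dict from a hypothetical instance ID to a roleset identifier; the toy data is invented for illustration.

```python
def percent_agreement(tags_a, tags_b):
    """Percentage of shared instances on which two annotators chose the same roleset.

    Both inputs map instance ID -> roleset identifier; only instances
    tagged by both annotators are compared.
    """
    shared = set(tags_a) & set(tags_b)
    if not shared:
        raise ValueError("no instances tagged by both annotators")
    matches = sum(tags_a[i] == tags_b[i] for i in shared)
    return 100.0 * matches / len(shared)


# Hypothetical toy data: 4 shared instances, 3 matches.
a = {"i1": "leave.01", "i2": "leave.02", "i3": "develop.01", "i4": "develop.02"}
b = {"i1": "leave.01", "i2": "leave.01", "i3": "develop.01",
     "i4": "develop.02", "i5": "leave.02"}

print(percent_agreement(a, b))  # 75.0
```

Instance `i5` is single-tagged, so it is excluded from the denominator, which is why agreement is reported on the smaller double-tagged set.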
Training Automatic Taggers • Stochastic tagger (Dan Gildea) • Results (P = precision, R = recall): gold-standard parses: 73.5 P, 71.7 R; automatic parses: 59.0 P, 55.4 R • New results: using argument labels as features for WSD; EM clustering for assigning argument labels
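The tagger results are given as precision and recall without a combined score. Using the standard F1 measure (the harmonic mean of P and R), the reported figures work out to roughly 72.6 on gold-standard parses and 57.1 on automatic parses; the sketch below only recomputes those values from the slide's numbers.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported on the slide.
results = {"gold-standard parses": (73.5, 71.7),
           "automatic parses": (59.0, 55.4)}

for setting, (p, r) in results.items():
    print(f"{setting}: F1 = {f1(p, r):.1f}")
```

The drop of about 15 F1 points when switching to automatic parses makes the cost of parser errors on argument labeling easy to read at a glance.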