Putting Meaning into Your Trees Martha Palmer CIS630 September 13, 2004
Meaning? • Complete representation of real-world knowledge - Natural Language Understanding (NLU)? • We can only build useful representations for small vocabularies • A major impediment to accurate Machine Translation, Information Retrieval and Question Answering
Outline • Introduction • Background: WordNet, Levin classes, VerbNet • Proposition Bank • Captures shallow semantics • Associated lexical frame files • Supports training of an automatic tagger • Mapping PropBank to VerbNet • Mapping PropBank to WordNet • Future directions
Ask Jeeves – a Q/A, IR example. What do you call a successful movie? • Tips on Being a Successful Movie Vampire ... I shall call the police. • Successful Casting Call & Shoot for ``Clash of Empires'' ... thank everyone for their participation in the making of yesterday's movie. • Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague... • VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer. (Desired answer: Blockbuster)
Ask Jeeves – filtering w/ POS tag What do you call a successful movie? • Tips on Being a Successful Movie Vampire ... I shall call the police. • Successful Casting Call & Shoot for ``Clash of Empires'' ... thank everyone for their participation in the making of yesterday's movie. • Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague... • VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.
Filtering out “call the police”: different senses have different syntax and different participants – call(you, movie, what) ≠ call(you, police)
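A minimal sketch of the idea, assuming a toy tuple representation (not from the slides): once each retrieved hit is reduced to a predicate with its participants, the "call the police" sense no longer matches the query.

```python
# Minimal sketch (hypothetical representation): comparing predicate-argument
# structures separates the two uses of "call" that a keyword match conflates.
def same_proposition(p, q):
    """Propositions match only if predicate, arity, and participants agree."""
    return p == q

query = ("call", ("you", "movie", "what"))   # What do you call a successful movie?
hit   = ("call", ("you", "police"))          # I shall call the police.

print(same_proposition(query, hit))          # False: different arity and participants
```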
Machine Translation Lexical Choice- Word Sense Disambiguation • Iraq lost the battle. • Ilakuka centwey ciessta. • [Iraq ] [battle] [lost]. • John lost his computer. • John-i computer-lul ilepelyessta. • [John] [computer] [misplaced].
Cornerstone: an English lexical resource • that provides sets of possible syntactic frames for verbs, • and provides clear, replicable sense distinctions. AskJeeves: Who do you call for a good electronic lexical database for English?
WordNet – Princeton (Miller 1985, Fellbaum 1998) On-line lexical reference (dictionary) • Nouns, verbs, adjectives, and adverbs grouped into synonym sets • Other relations include hypernyms (ISA), antonyms, meronyms • Typical top nodes - 5 out of 25 • (act, action, activity) • (animal, fauna) • (artifact) • (attribute, property) • (body, corpus)
WordNet – call, 28 senses 1. name, call -- (assign a specified, proper name to; "They named their son David"; …) -> LABEL 2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; …) -> TELECOMMUNICATE 3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; …) -> LABEL 4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
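For reference, a small sketch of listing these senses with NLTK's WordNet interface; it assumes nltk and its wordnet data are installed, and the exact sense count and numbering depend on the WordNet version (the slides cite 28).

```python
# Sketch: listing the verb senses of "call" with NLTK's WordNet interface.
# Assumes nltk is installed and the 'wordnet' corpus has been downloaded;
# the sense inventory depends on the WordNet version.
from nltk.corpus import wordnet as wn

senses = wn.synsets("call", pos=wn.VERB)
print(len(senses))                       # number of verb senses of "call"
for i, s in enumerate(senses[:4], start=1):
    print(i, s.name(), "-", s.definition())
```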
WordNet – Princeton (Miller 1985, Fellbaum 1998) • Limitations as a computational lexicon • Contains little syntactic information • Comlex has syntax but no sense distinctions • No explicit lists of participants • Sense distinctions very fine-grained • Definitions often vague • Causes problems when creating training data for supervised Machine Learning – SENSEVAL-2 • Verbs with > 16 senses (including call): Inter-annotator Agreement (ITA) 73%, automatic Word Sense Disambiguation (WSD) 60.2% (Dang & Palmer, SIGLEX02)
WordNet: call, 28 senses [slide diagram: the 28 WordNet senses of call, WN1–WN28]
WordNet: call, 28 senses, Senseval-2 groups (engineering!) [slide diagram: the 28 senses clustered into groups – Loud cry, Bird or animal cry, Request, Label, Call a loan/bond, Challenge, Visit, Phone/radio, Bid]
Grouping improved scores: ITA 82%, MaxEnt WSD 69% • Call: 31% of errors due to confusion between senses within the same group (group 1): • name, call -- (assign a specified, proper name to; They named their son David) • call -- (ascribe a quality to or give a name of a common noun that reflects a quality; He called me a bastard) • call -- (consider or regard as being; I would not call her beautiful) • 75% with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses (Palmer, Dang & Fellbaum, submitted, NLE)
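A minimal sketch of what "training and testing on grouped senses" means operationally: fine-grained tags are mapped to their group before scoring. The group labels echo the Senseval-2 groups above, but the sense-to-group assignments and scoring function here are illustrative assumptions, not the published grouping.

```python
# Sketch: scoring WSD output against grouped senses instead of fine-grained ones.
# The sense-number assignments below are illustrative placeholders.
SENSE_TO_GROUP = {
    "call%1": "label", "call%3": "label",          # name / ascribe a quality
    "call%2": "phone/radio", "call%4": "request",
}

def grouped_accuracy(gold, predicted):
    pairs = [(SENSE_TO_GROUP.get(g, g), SENSE_TO_GROUP.get(p, p))
             for g, p in zip(gold, predicted)]
    return sum(g == p for g, p in pairs) / len(pairs)

print(grouped_accuracy(["call%1", "call%2"], ["call%3", "call%4"]))  # 0.5
```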
Groups • Based on VerbNet, an English lexical resource under development, • which is in turn based on Levin’s English verb classes…
Levin classes (Levin, 1993) • 3100 verbs, 47 top-level classes, 193 second- and third-level classes • Each class has a syntactic signature based on alternations. John broke the jar. / The jar broke. / Jars break easily. John cut the bread. / *The bread cut. / Bread cuts easily. John hit the wall. / *The wall hit. / *Walls hit easily.
Levin classes (Levin, 1993) • Verb class hierarchy: 3100 verbs, 47 top-level classes, 193 second- and third-level classes • Each class has a syntactic signature based on alternations. John broke the jar. / The jar broke. / Jars break easily. – change-of-state John cut the bread. / *The bread cut. / Bread cuts easily. – change-of-state, recognizable action, sharp instrument John hit the wall. / *The wall hit. / *Walls hit easily. – contact, exertion of force
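A small sketch, assuming a toy boolean encoding, of how these alternation "signatures" separate break, cut, and hit; verbs whose signatures coincide would fall into the same class.

```python
# Sketch: the alternation signatures from the slide, encoded as booleans.
# (causative/inchoative: "The jar broke"; middle: "Jars break easily".)
# Illustrative encoding only, not the actual Levin class definitions.
SIGNATURES = {
    "break": {"inchoative": True,  "middle": True},   # change-of-state
    "cut":   {"inchoative": False, "middle": True},   # recognizable action, sharp instrument
    "hit":   {"inchoative": False, "middle": False},  # contact, exertion of force
}

def same_levin_class(v1, v2):
    return SIGNATURES[v1] == SIGNATURES[v2]

print(same_levin_class("break", "cut"))   # False: different syntactic signatures
```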
Limitations to Levin Classes • Coverage of only half of the verbs (types) in the Penn Treebank (1M words, WSJ) • Usually only one or two basic senses are covered for each verb • Confusing sets of alternations • Different classes have almost identical “syntactic signatures” • or worse, contradictory signatures Dang, Kipper & Palmer, ACL98
Multiple class listings • Homonymy or polysemy? • draw a picture, draw water from the well • Conflicting alternations? • Carry verbs disallow the Conative (*she carried at the ball), but the class includes {push, pull, shove, kick, yank, tug}, which are also in the Push/Pull class, which does take the Conative (she kicked at the ball)
Intersective Levin Classes (Dang, Kipper & Palmer, ACL98) [slide diagram: overlapping classes distinguished by “apart” (change of state), “across the room” (change of location), and “at” (no change of location)]
Intersective Levin Classes • More syntactically and semantically coherent • sets of syntactic patterns • explicit semantic components • relations between senses • VERBNET www.cis.upenn.edu/verbnet
VerbNet – Karin Kipper • Class entries: • Capture generalizations about verb behavior • Organized hierarchically • Members have common semantic elements, semantic roles and syntactic frames • Verb entries: • Refer to a set of classes (different senses) • each class member linked to WN synset(s) (not all WN senses are covered) Dang, Kipper & Palmer, IJCAI00, Coling00
Semantic role labels: Grace broke the LCD projector.
break(agent(Grace), patient(LCD-projector))
cause(agent(Grace), change-of-state(LCD-projector)) (broken(LCD-projector))
agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
patient(P) -> affected(P), change(P), …
VerbNet entry for leave – Levin class: future_having-13.3 • WordNet Senses: leave (WN 2,10,13), promise, offer, … • Thematic Roles: Agent[+animate OR +organization] Recipient[+animate OR +organization] Theme[] • Frames with Semantic Roles: "I promised somebody my time" Agent V Recipient Theme; "I left my fortune to Esmerelda" Agent V Theme Prep(to) Recipient; "I offered my services" Agent V Theme
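A sketch of querying this kind of entry with NLTK's VerbNet corpus reader; it assumes nltk and its verbnet data are installed, and class identifiers may differ across VerbNet versions from the future_having-13.3 shown here.

```python
# Sketch: looking up VerbNet classes for "leave" with NLTK's corpus reader.
# Assumes nltk is installed and the 'verbnet' corpus has been downloaded.
from nltk.corpus import verbnet as vn

class_ids = vn.classids("leave")
print(class_ids)                              # class IDs depend on the VerbNet version
if class_ids:
    # thematic roles and syntactic frames of the first class containing "leave"
    print(vn.pprint(vn.vnclass(class_ids[0])))
```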
PropBank: handmade resources vs. real data • VerbNet is based on linguistic theory – how useful is it? • How well does it correspond to syntactic variations found in naturally occurring text?
Proposition Bank: From Sentences to Propositions (Predicates!) • Powell met Zhu Rongji / Powell and Zhu Rongji met / Powell met with Zhu Rongji / Powell and Zhu Rongji had a meeting → Proposition: meet(Powell, Zhu Rongji), and likewise for battle, wrestle, join, debate, consult: meet(Somebody1, Somebody2) . . . • When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane. → meet(Powell, Zhu) discuss([Powell, Zhu], return(X, plane))
Capturing semantic roles* • Jerry broke [PATIENT the laser pointer]. • [PATIENT The windows] were broken by the hurricane. • [PATIENT The vase] broke into pieces when it toppled over. [SUBJ labels in the original slide mark each sentence's grammatical subject]
Capturing semantic roles* • Jerry broke [ARG1 the laser pointer]. • [ARG1 The windows] were broken by the hurricane. • [ARG1 The vase] broke into pieces when it toppled over. *See also FrameNet, http://www.icsi.berkeley.edu/~framenet/
A TreeBanked phrase: A GM-Jaguar pact would give the U.S. car maker an eventual 30% stake in the British company. [slide parse tree: S → NP (a GM-Jaguar pact) + VP (would + VP (give, NP the US car maker, NP an eventual 30% stake in the British company, with a PP-LOC inside))]
The same phrase, PropBanked: A GM-Jaguar pact would give the U.S. car maker an eventual 30% stake in the British company. → give(GM-J pact, US car maker, 30% stake); Arg0 = a GM-Jaguar pact, Arg1 = an eventual 30% stake in the British company, Arg2 = the US car maker
Frames File example: give
Roles: Arg0: giver, Arg1: thing given, Arg2: entity given to
Example (double object): The executives gave the chefs a standing ovation.
Arg0: The executives, REL: gave, Arg2: the chefs, Arg1: a standing ovation
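A sketch of reading this roleset programmatically with NLTK's PropBank reader; it assumes nltk and its propbank data are installed, and that the frames-file XML follows the Arg0/Arg1/Arg2 layout above.

```python
# Sketch: reading the PropBank frames-file roleset for give.01 with NLTK.
# Assumes nltk is installed and the 'propbank' corpus has been downloaded.
from nltk.corpus import propbank

roleset = propbank.roleset("give.01")        # XML element for the give.01 roleset
for role in roleset.findall("roles/role"):
    print("Arg%s: %s" % (role.attrib["n"], role.attrib["descr"]))
# Expected, per the frame file above: Arg0 giver, Arg1 thing given, Arg2 entity given to
```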
Annotation procedure • PTB II – extract all sentences containing a given verb • Create a Frame File for that verb (Paul Kingsbury) • (3100+ lemmas, 4700 framesets, 120K predicates) • First pass: automatic tagging (Joseph Rosenzweig) • Second pass: double-blind hand correction • Inter-annotator agreement 84% • Third pass: Solomonization (adjudication) (Olga Babko-Malaya)
Trends in Argument Numbering • Arg0 = proto-typical agent (Dowty) • Arg1 = proto-typical patient • Arg2 = indirect object / benefactive / instrument / attribute / end state • Arg3 = start point / benefactive / instrument / attribute • Arg4 = end point
Additional tags (arguments or adjuncts?) • Variety of ArgM’s (Arg#>4): • TMP - when? • LOC - where at? • DIR - where to? • MNR - how? • PRP - why? • REC - himself, themselves, each other • PRD - this argument refers to or modifies another • ADV - others
Inflection, etc. • Verbs also marked for tense/aspect • Passive/Active • Perfect/Progressive • Third singular (is, has, does, was) • Present/Past/Future • Infinitives/Participles/Gerunds/Finites • Modals and negations marked as ArgMs
PropBank/FrameNet • Buy: Arg0 = buyer, Arg1 = goods, Arg2 = seller, Arg3 = rate, Arg4 = payment • Sell: Arg0 = seller, Arg1 = goods, Arg2 = buyer, Arg3 = rate, Arg4 = payment • Broader, more neutral, more syntactic – maps readily to VN, TR, FN (Rambow et al., PMLB03)
Outline • Introduction • Background: WordNet, Levin classes, VerbNet • Proposition Bank • Captures shallow semantics • Associated lexical frame files • Supports training of an automatic tagger • Mapping PropBank to VerbNet • Mapping PropBank to WordNet
Approach • Pre-processing: • A heuristic which filters out unwanted constituents with significant confidence • Argument Identification • A binary SVM classifier which identifies arguments • Argument Classification • A multi-class SVM classifier which tags arguments as ARG0-5, ARGA, and ARGM
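A minimal sketch of that two-stage setup using scikit-learn SVMs; the feature extractor is a placeholder stub, and the degree-2 polynomial kernel is an assumption borrowed from the system comparison discussed below, not a fixed part of the approach.

```python
# Minimal sketch of the two-stage SVM approach above.
# extract_features() is a placeholder; the real feature set is listed on the
# next slide (predicate, phrase type, parse tree path, position, voice, ...).
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

def extract_features(constituent):
    # placeholder: a real system computes these from the parse tree
    return {"phrase_type": constituent["label"], "predicate": constituent["pred"]}

def train(constituents, is_arg, roles):
    vec = DictVectorizer()
    X = vec.fit_transform([extract_features(c) for c in constituents])
    # Stage 1: binary argument identification over all (pruned) constituents
    arg_id = SVC(kernel="poly", degree=2).fit(X, is_arg)
    # Stage 2: multi-class role labeling (ARG0-5, ARGA, ARGM) over arguments only
    arg_rows = [i for i, a in enumerate(is_arg) if a]
    arg_cls = SVC(kernel="poly", degree=2).fit(X[arg_rows], [roles[i] for i in arg_rows])
    return vec, arg_id, arg_cls
```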
Automatic Semantic Role Labeling (Gildea & Jurafsky, CL02; Gildea & Palmer, ACL02) Stochastic Model • Basic Features: • Predicate (verb) • Phrase Type (NP or S-BAR) • Parse Tree Path • Position (before/after predicate) • Voice (active/passive) • Head Word of constituent • Subcategorization
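The parse tree path is the least obvious of these features; below is a sketch of computing it from an nltk.Tree, using "^" for upward and "v" for downward steps in place of the arrows in Gildea & Jurafsky. The example tree and tree positions are illustrative.

```python
# Sketch: the "parse tree path" feature, computed from an nltk.Tree.
# const_pos and pred_pos are tree positions of non-leaf nodes (constituent and
# predicate); the path climbs to their lowest common ancestor, then descends.
from nltk import Tree

def tree_path(tree, const_pos, pred_pos):
    # longest common prefix of the two tree positions = lowest common ancestor
    lca = 0
    while (lca < min(len(const_pos), len(pred_pos))
           and const_pos[lca] == pred_pos[lca]):
        lca += 1
    up = [tree[const_pos[:i]].label() for i in range(len(const_pos), lca - 1, -1)]
    down = [tree[pred_pos[:i]].label() for i in range(lca + 1, len(pred_pos) + 1)]
    return "^".join(up) + "v" + "v".join(down)

t = Tree.fromstring("(S (NP (NNP Jerry)) (VP (VBD broke) (NP (DT the) (NN pointer))))")
print(tree_path(t, (0,), (1, 0)))   # NP^SvVPvVBD  (the path from the NP to the verb)
```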
Discussion Part I – Szuting Yi • Comparisons between the Pradhan and Penn (SVM) systems • Both systems are SVM-based • Kernel: Pradhan uses a degree 2 polynomial kernel; Penn uses a degree 3 RBF kernel • Multi-classification: Pradhan uses a one-versus-others approach; Penn uses a pairwise approach • Features: Pradhan includes rich features: NE, head word POS, partial path, verb classes, verb sense, head word of PP, first or last word/POS in the constituent, constituent tree distance, constituent relative features, temporal cue words, dynamic class context (Pradhan et al., 2004)
Discussion Part II – Xue & Palmer, EMNLP04 • Different features for different subtasks • Analysis of the basic features
Discussion Part III (New Features – Bert Xue) • Syntactic frame • uses NPs as “pivots” • varies with position within the frame • lexicalization with the predicate • Predicate combined with: head word, phrase type, head word of PP parent • Position + voice
Word Senses in PropBank • Orders to ignore word sense proved not feasible for 700+ verbs • Mary left the room • Mary left her daughter-in-law her pearls in her will • Frameset leave.01 "move away from": Arg0: entity leaving, Arg1: place left • Frameset leave.02 "give": Arg0: giver, Arg1: thing given, Arg2: beneficiary • How do these relate to traditional word senses in VerbNet and WordNet?
Frames: Multiple Framesets • Out of the 787 most frequent verbs: • 1 frameset – 521 • 2 framesets – 169 • 3+ framesets – 97 (includes light verbs) • 90% ITA • Framesets are not necessarily consistent between different senses of the same verb • Framesets are consistent between different verbs that share similar argument structures (like FrameNet)
Ergative/Unaccusative Verbs • Roles (no Arg0 for unaccusative verbs): Arg1 = logical subject, patient, thing rising; Arg2 = EXT, amount risen; Arg3* = start point; Arg4 = end point • Sales rose 4% to $3.28 billion from $3.16 billion. • The Nasdaq composite index added 1.01 to 456.6 on paltry volume.
Mapping from PropBank to VerbNet • Overlap with PropBank framesets • 50,000 PropBank instances • < 50% VN entries, > 85% VN classes • Results: MATCH – 78.63% (80.90% relaxed) • (VerbNet isn’t just linguistic theory!) • Benefits • Thematic role labels and semantic predicates • Can extend PropBank coverage with VerbNet classes • WordNet sense tags Kingsbury & Kipper, NAACL03, Text Meaning Workshop http://www.cs.rochester.edu/~gildea/VerbNet/
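A sketch of what such a mapping can look like as a simple lookup table: a PropBank frameset's numbered arguments are rewritten as VerbNet thematic roles. The leave.02 entry follows the frameset and the future_having-13.3 entry shown earlier; the table format itself is an illustrative assumption.

```python
# Sketch: a frameset-to-VerbNet mapping table. Contents of the single entry
# follow the slides (leave.02: Arg0 giver, Arg1 thing given, Arg2 beneficiary;
# VerbNet future_having-13.3: Agent, Theme, Recipient); the format is illustrative.
PB_TO_VN = {
    "leave.02": {"class": "future_having-13.3",
                 "roles": {"Arg0": "Agent", "Arg1": "Theme", "Arg2": "Recipient"}},
}

def to_verbnet(frameset, args):
    """Rewrite a PropBank-labeled argument dict with VerbNet thematic roles."""
    entry = PB_TO_VN[frameset]
    return entry["class"], {entry["roles"].get(a, a): v for a, v in args.items()}

print(to_verbnet("leave.02",
                 {"Arg0": "I", "Arg1": "my fortune", "Arg2": "Esmerelda"}))
```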