1 / 26

English and Chinese PropBanks Martha Palmer University of Pennsylvania

English and Chinese PropBanks Martha Palmer University of Pennsylvania. with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic Representation Meeting University of Maryland. What is a PropBank?.

psyche
Download Presentation

English and Chinese PropBanks Martha Palmer University of Pennsylvania

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. English and Chinese PropBanksMartha PalmerUniversity of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic Representation Meeting University of Maryland

  2. What is a PropBank? • A PropBank is a corpus annotated with the predicate-argument structure of the verbs: • English Propbank: www.cis.upenn.edu/~ace3/’04 LDC Kingsbury and Palmer 2002, Palmer, Gildea, Kingsbury, 2005 • Wall Street Journal, 1M words, 120K+ predicate instances • Brown, 14K predicate instances • Chinese Propbank: www.cis.upenn.edu/~chinese/cpb Xue and Palmer 2003, Xue 2004 • Xinhua (250K words – almost done), • Sinorama (250K words – estimated 2007) • Nominalized verbs for English = NomBank/NYU • Chinese NomBank?

  3. Capturing “neutral” semantic roles • Boyan broke [ Arg1the LCD-projector.] break (agent(Boyan), patient(LCD-projector)) • [Arg1 The windows] were broken by the hurricane. • [Arg1 The vase] broke into pieces when it toppled over

  4. Frames File example: give< 4000 Frames for PropBank Roles: Arg0: giver Arg1: thing given Arg2: entity given to Example: double object The executives gave the chefsa standing ovation. Arg0: The executives REL: gave Arg2: the chefs Arg1: a standing ovation

  5. Frames File example: givew/ Thematic Role Labels Roles: Arg0: giver Arg1: thing given Arg2: entity given to Example: double object The executives gave the chefsa standing ovation. Arg0: Agent The executives REL: gave Arg2: Recipient the chefs Arg1: Theme a standing ovation VerbNet – based on Levin classes

  6. PropBank Exercise Ex. • [He]-Arg1Theme[will]-MOD [probably]-MOD be [extradited]-rel[to the U.S]-DIR[for trial under an extradition treaty President Virgilia Barco has revived]-PRP.  • He will probably be extradited to the U.S for trial under [an extradition treaty]-Arg1Theme[President Virgilia Barco]-Arg0Agent has [revived]-rel. 

  7. A Chinese Treebank Sentence 国会/Congress 最近/recently通过/pass 了/ASP银行法/banking law “The Congress passed the banking law recently.” (IP (NP-SBJ (NN 国会/Congress)) (VP (ADVP (ADV 最近/recently)) (VP (VV 通过/pass) (AS 了/ASP) (NP-OBJ (NN 银行法/banking law)))))

  8. 通过(f2) (pass) arg0 argM arg1 国会 最近银行法(law) (congress) (IP (NP-SBJ arg0 (NN 国会)) (VP argM (ADVP (ADV 最近)) (VP f2 (VV 通过) (AS 了) arg1 (NP-OBJ (NN 银行法))))) The Same Sentence, PropBanked

  9. Annotation procedure • PTB II – Extract all sentences of a verb • Create Frame File for that verb Paul Kingsbury (3400+ lemmas, 4700 framesets,120K predicates) • 1st pass: Automatic tagging Joseph Rosenzweig • 2nd pass: Double blind hand correction by verb Inter-annotator agreement 84% (87% Arg#’s) • 3rd pass: Adjudication Olga Babko-Malaya • 4th pass: Train automatic semantic role labellers Dan Gildea, Sameer Pradhan, Nianwen Xue, Szuting Yi, …. CoNLL-04 shared task, 2004, 2005, ….

  10. Propbank Kappa Statistics • Role identification • classifying tree nodes as argument vs. non-argument • Role classification • classifying arguments as Arg1 vs. Arg2 vs ArgM-LOC vs. etc… • Kappa = P(A) - P(E) / 1 - P(E)

  11. Throughput • Framing: approximately 80-100 verbs/week • Annotation: approximately 70 instances/hour • Solomonization: approximately 100 instances per hour • 100K words (last summer) • ~4 months (hardly any new frame files) • 4-6 part-time annotators (100hrs a week), • half-time programmer, • half-time project manager, • half-time adjudicator, frame file creator

  12. Applications • IE – slot filling • Question Answering: • What do lobsters like to eat? • Answer is NOT people! • Machine Translation • Reconciling event descriptions across languages - See Parallel Prop II

  13. Word Senses in PropBank • Orders to ignore word sense not feasible for 700+ verbs • Mary left the room • Mary left her daughter-in-law her pearls in her will Frameset leave.01 "move away from": Arg0: entity leaving Arg1: place left Frameset leave.02 "give": Arg0: giver Arg1: thing given Arg2: beneficiary How do these relate to traditional word senses in WordNet?

  14. Overlap between Senseval2Groups and Framesets – 95% Frameset2 Frameset1 WN1 WN2 WN3 WN4 WN6 WN7 WN8 WN5 WN 9 WN10 WN11 WN12 WN13 WN 14 WN19 WN20 develop

  15. Sense Hierarchy (Palmer, et al, SNLU04 - NAACL04) • PropBank Framesets – ITA >90% coarse grained distinctions 20 Senseval2 verbs w/ > 1 Frameset Maxent WSD system, 73.5% baseline, 90% accuracy • Sense Groups (Senseval-2) - ITA 82% Intermediate level (includes Levin classes) – 69% • WordNet – ITA 71% fine grained distinctions, 60.2% Tagging w/groups, ITA 89%, 200@hr

  16. PropBank II – English/Chinese (100K) We still need relations between events and entities: • Event ID’s with event coreference • Selective sense tagging • Tagging nominalizations w/ WordNet sense • Grouped WN senses - selected verbs and nouns • Nominal Coreference • not names • Clausal Discourse connectives – selected subset Level of representation that reconciles many surface differences between the languages

  17. Event IDs – Parallel Prop II (1) • Aspectual verbs do not receive event IDs: • 今年/this year 中国/China 继续/continue 发挥/play 其/it 在/at 支持/support 外商/foreign business 投资/investment 企业/enterprise 方面/aspect 的/DE 主/main 渠道/channel 作用/role “This year, the Bank of China will continue to play the main role in supporting foreign-invested businesses.”

  18. Event IDs – Parallel Prop II (2) • Nominalized verbs do: • He will probably be extradited to the US for trial. done as part of sense-tagging (all 7 WN senses for “trial” are events.) • 随着/with 中国/China 经济/economy 的/DE 不断/continued 发展/development… “With the continued development of China’s economy…” The same events may be described by verbs in English and nouns in Chinese, or vice versa. Event IDs help to abstract away from POS tag

  19. Event reference – Parallel Prop II • Pronouns (overt or covert) that refer to events: [This] is gonna be a word of mouth kind of thing. 这些/these 成果/achivements 被/BEI 企业/enterprise 用/apply (e15) 到/to 生产/production 上/on 点石成金/spin gold from straw, *pro*-e15 大大/greatly 提高/improve 了/le 中国/China 镍/nickel 工业/industry 的/DE 生产/production 水平/level 。 “These achievements have been applied (e15) to production by enterprises to spin gold from straw, which-e15 greatly improved the production level of China’s nickel industry.” • Prerequisites: • pronoun classification • free trace annotation

  20. Chinese PB II: Sense tagging • Much lower polysemy than English • Avg of 3.5 (Chinese) vs. 16.7 (English) Dang, Chia, Chiou, Palmer, COLING-02 • More than 2 Framesets 62/4865 (250K) Ch vs. 294/3635 (1M) English • Mapping Grouped English senses to Chinese (English tagging - 93 verbs/168 nouns, 5000+ instances) • Selected 12 polysemous English words (7 verbs/5 nouns) • For 9 (6 verbs/3 nouns), grouped English senses map to unique Chinese translation sets (synonyms)

  21. Mapping of Grouped Sense Tagsto Chinese lift, elevate, orient upwards 仰 / yang3 increase 提高 / ti2gao1 Collect, levy 募集 / mu4ji2 筹措 / chou2cuo4 筹... / chou2… invoke, elicit, set off 提 / ti4 raise – translations by group

  22. Discourse connectives: The Penn Discourse TreeBank • WSJ corpus (~1M words, ~2400 texts) http://www.cis.upenn.edu/~pdtb Miltsakaki, Prasad, Joshi and Webber, LREC-04, NAACL-04 Frontiers Prasad, Miltsakaki, Joshi and Webber ACL-04 Discourse Annotation • Chinese: 10 explicit discourse connectives that include subordination conjunctions, coordinate conjunctions, and discourse adverbials. • Argument determination, sense disambiguation [arg1 学校/school 不/not 教/teach 理财/finance management], [conn结果/as a result] [arg2 报章/newspaper 上/on 的/DE 各/all 种/kind 专栏/column 就/then 成为/become 信息/information 的/DE 主要/main 来源/source]。 “The school does not teach finance management. As a result, the different kinds of columns become the main source of information.”

  23. Summary of English PropBanksOlga Babko-Malaya, Ben Snyder *DOD funding

  24. Annotation of free traces • Free traces – traces which are not linked to an antecedent in PropBank • Arbitrary Legislation to lift the debt ceiling is ensnarled in the fight over [*]–ARB cutting capital-gains taxes • Event The department proposed requiring (e4) stronger roofs for light trucks and minivans , [*]-e4 beginning with 1992 models • Imperative All right, [*]-IMP shoot. • 1K instances of free traces in a 100K corpus

  25. Classification of pronouns • 'referring' [John Smith] arrived yesterday. [He] said that... • ‘bound' [Many companies] raised [their] payouts by more than 10% • ‘event‘ [This] is gonna be a word of mouth kind of thing. • ‘generic' I like [books]. [They] make me smile.

  26. Mapping of Grouped Sense Tagsto Chinese • Zhejiang|浙江zhe4jiang1 will|将jiang1 raise|提高ti2gao1the level|水平shui3ping2 of|的de opening up|开放kai1fang4 to|对dui4 the outside world|外wai4. (浙江将提高对外开放的水平。) • I|我wo3 raised|仰yang3 my|我的wo3de head|头tou2 in expectation|期望qi1wang4.(我仰头望去。) • …, raising|筹措chou2cuo4 funds|资金zi1jin1 of|的de 15 billion|150亿yi1ban3wu3shi2yi4 yuan|元yuan2 (…筹措资金150亿元。) • The meeting|会议hui4yi4 passed|通过tong1guo4 the “decisionregarding motions”|议案yi4an4 raised|提ti4 by 32 NPC|人大ren2da4 representatives|代表dai4biao3 (会议通过了32名人大代表所提的议案。)

More Related