
Different Sense Granularities

This presentation uses a statistical machine translation example to motivate the problem of sense granularity, then surveys the sense-tagging issues raised by Senseval-1 and Senseval-2, sense groupings and their impact on inter-tagger agreement, and the effect of sense granularity on automatic word sense disambiguation scores.


Presentation Transcript


  1. Different Sense Granularities. Martha Palmer and Olga Babko-Malaya, September 20, 2004

  2. Statistical Machine Translation results
  • CHINESE TEXT
  • The japanese court before china photo trade huge & lawsuit.
  • A large amount of the proceedings before the court dismissed workers.
  • japan’s court, former chinese servant industrial huge disasters lawsuit.
  • Japanese Court Rejects Former Chinese Slave Workers’ Lawsuit for Huge Compensation.

  3. Outline
  • MT example
  • Sense tagging issues highlighted by Senseval-1
  • Senseval-2
    • Groupings, impact on ITA
    • Automatic WSD, impact on scores

  4. WordNet - Princeton
  • On-line lexical reference (dictionary)
  • Words organized into synonym sets <=> concepts
  • Hypernyms (ISA), antonyms, meronyms (PART)
  • Useful for checking selectional restrictions (doesn’t tell you what they should be)
  • Typical top nodes - 5 out of 25:
    • (act, action, activity)
    • (animal, fauna)
    • (artifact)
    • (attribute, property)
    • (body, corpus)

  5. WordNet – president, 6 senses
  1. president -- (an executive officer of a firm or corporation) --> CORPORATE EXECUTIVE, BUSINESS EXECUTIVE ... LEADER
  2. President of the United States, President, Chief Executive -- (the person who holds the office of head of state of the United States government; "the President likes to jog every morning") --> HEAD OF STATE, CHIEF OF STATE
  3. president -- (the chief executive of a republic) --> HEAD OF STATE, CHIEF OF STATE
  4. president, chairman, chairwoman, chair, chairperson -- (the officer who presides at the meetings of an organization; "address your remarks to the chairperson") --> PRESIDING OFFICER, LEADER
  5. president -- (the head administrative officer of a college or university) --> ACADEMIC ADMINISTRATOR ... LEADER
  6. President of the United States, President, Chief Executive -- (the office of the United States head of state; "a President is elected every four years") --> PRESIDENCY, PRESIDENTSHIP, POSITION
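The same kind of inventory can be browsed programmatically. Below is a minimal sketch using NLTK's WordNet interface; it assumes a current WordNet release, so sense numbering and glosses may differ from the version shown on the slide.

# Enumerate the noun senses of "president" with glosses and immediate
# hypernyms; requires nltk plus its 'wordnet' corpus download.
from nltk.corpus import wordnet as wn

for i, synset in enumerate(wn.synsets("president", pos=wn.NOUN), start=1):
    lemmas = ", ".join(lemma.name() for lemma in synset.lemmas())
    print(f"{i}. {lemmas} -- ({synset.definition()})")
    # One step up the ISA hierarchy, like the slide's --> arrows.
    for hyper in synset.hypernyms() + synset.instance_hypernyms():
        print("   -->", hyper.name())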

  6. Limitations of WordNet
  • Poor inter-annotator agreement (73%)
  • Just sense tags - no representations
  • Very little mapping to syntax
    • No predicate argument structure
    • No selectional restrictions
  • No generalizations about sense distinctions
  • No hierarchical entries

  7. SIGLEX98/SENSEVAL
  • Workshop on Word Sense Disambiguation
  • 54 attendees, 24 systems, 3 languages
  • 34 words (nouns, verbs, adjectives)
  • Both supervised and unsupervised systems
  • Training data, test data
  • Hector senses - very corpus-based (mapping to WordNet)
  • Lexical samples - instances, not running text
  • Replicability over 90%, ITA 85%
  • ACL-SIGLEX98, SIGLEX99, CHUM00
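The ITA (inter-tagger agreement) figure is typically simple observed agreement: the fraction of instances on which two taggers chose the same sense. A minimal sketch of that computation, with hypothetical tag sequences:

def observed_agreement(tags_a, tags_b):
    """Fraction of instances on which two annotators agree exactly."""
    assert len(tags_a) == len(tags_b)
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)

# Hypothetical annotations of three instances of "bother":
print(observed_agreement(["bother.1", "bother.2", "bother.1"],
                         ["bother.1", "bother.2", "bother.2"]))  # ~0.67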

  8. Hector - bother, 10 senses
  • 1. intransitive verb - (make an effort); after negation, usually with to-infinitive; (of a person) to take the trouble or effort needed (to do something). Ex.: "About 70 percent of the shareholders did not bother to vote at all."
  • 1.1 (can't be bothered); idiomatic; be unwilling to make the effort needed (to do something). Ex.: "The calculations needed are so tedious that theorists cannot be bothered to do them."
  • 2. vi; after neg; with 'about' or 'with'; rarely cont - (of a person) to concern oneself (about something or someone). Ex.: "He did not bother about the noise of the typewriter because Danny could not hear it above the sound of the tractor."
  • 2.1 v-passive; with 'about' or 'with' - (of a person) to be concerned about or interested in (something). Ex.: "The only thing I'm bothered about is the well-being of the club."

  9. Mismatches between lexicons: Hector - WordNet, shake

  10. VERBNET

  11. VerbNet/WordNet

  12. Mapping WN-Hector via VerbNet (SIGLEX99, LREC00)

  13. SENSEVAL-2 - ACL'01; Adam Kilgarriff, Phil Edmonds, and Martha Palmer
  • All-words task: Czech, Dutch, English, Estonian
  • Lexical sample task: Basque, Chinese, English, Italian, Japanese, Korean, Spanish, Swedish

  14. English Lexical Sample - Verbs
  • Preparation for Senseval-2:
    • Manual tagging of 29 highly polysemous verbs (call, draw, drift, carry, find, keep, turn, ...)
    • WordNet (pre-release version 1.7)
  • To handle unclear sense distinctions:
    • Detect and eliminate redundant senses - NOT ALLOWED
    • Detect and cluster closely related senses

  15. WordNet – call, 28 senses
  1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") --> LABEL
  2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; "Take two aspirin and call me in the morning") --> TELECOMMUNICATE
  3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") --> LABEL

  16. WordNet – call, 28 senses (cont.)
  4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") --> ORDER
  5. shout, shout out, cry, call, yell, scream, holler, hollo, squall -- (utter a sudden loud cry; "she cried with pain when the doctor inserted the needle"; "I yelled to her from the window but she couldn't hear me") --> UTTER
  6. visit, call in, call -- (pay a brief visit; "The mayor likes to call on some of the prominent citizens") --> MEET

  17. Groupings Methodology
  • Double blind groupings, adjudication
  • Syntactic criteria (VerbNet was useful)
    • Distinct subcategorization frames:
      • call him a bastard
      • call him a taxi
    • Recognizable alternations - regular sense extensions:
      • play an instrument
      • play a song
      • play a melody on an instrument

  18. Groupings Methodology (cont.)
  • Semantic criteria
    • Differences in semantic classes of arguments: abstract/concrete, human/animal, animate/inanimate, different instrument types, ... (see the animacy sketch below)
    • Differences in entailments: change of prior entity or creation of a new entity?
    • Differences in types of events: abstract/concrete/mental/emotional/...
    • Specialized subject domains
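A criterion such as animate vs. inanimate arguments can be approximated automatically from WordNet's hypernym closure. A rough sketch, assuming NLTK and taking each noun's first sense (a simplifying assumption):

# Test whether a noun is plausibly animate by looking for person/animal
# synsets in the hypernym closure of its first WordNet sense.
from nltk.corpus import wordnet as wn

ANIMATE_ROOTS = {wn.synset("person.n.01"), wn.synset("animal.n.01")}

def looks_animate(noun):
    senses = wn.synsets(noun, pos=wn.NOUN)
    if not senses:
        return False
    closure = set(senses[0].closure(lambda s: s.hypernyms()))
    return bool(ANIMATE_ROOTS & closure)

print(looks_animate("chairman"))  # True: a kind of person
print(looks_animate("taxi"))      # False: an artifact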

  19. WordNet - call, 28 senses [diagram: all 28 fine-grained WordNet senses of 'call' laid out individually, before grouping]

  20. WordNet - call, 28 senses, groups [diagram: the 28 senses clustered under nine group labels: Phone/radio, Bird or animal cry, Request, Label, Call a loan/bond, Challenge, Visit, Loud cry, Bid]
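Formally, such a grouping is a many-to-one map from fine-grained WordNet senses to coarse group labels. A sketch of that map for 'call', with Group 1 (Label) taken from the next slide and the Phone/radio membership a tentative reading of the diagram above; the remaining groups are omitted:

# Coarse groups as a many-to-one map over fine WordNet senses of "call".
GROUPS = {
    "Label":       {"WN1", "WN3", "WN19", "WN22"},  # Group 1, next slide
    "Phone/radio": {"WN2", "WN13", "WN28"},         # tentative reading
    # ... remaining seven groups omitted
}

SENSE_TO_GROUP = {s: g for g, members in GROUPS.items() for s in members}

def coarse_tag(fine_tag):
    """Collapse a fine sense tag to its group label; ungrouped senses
    map to themselves."""
    return SENSE_TO_GROUP.get(fine_tag, fine_tag)

print(coarse_tag("WN3"))   # Label
print(coarse_tag("WN13"))  # Phone/radio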

  21. WordNet – call, 28 senses, Group 1
  1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") --> LABEL
  3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") --> LABEL
  19. call -- (consider or regard as being; "I would not call her beautiful") --> SEE
  22. address, call -- (greet, as with a prescribed form, title, or name; "He always addresses me with `Sir'"; "Call me Mister"; "She calls him by first name") --> ADDRESS

  22. Sense Groups: verb ‘develop’ [diagram: WordNet senses WN1-WN14, WN19, and WN20 of 'develop' partitioned into sense groups]

  23. Groups 1 and 2 of Develop

  24. Group 3 of Develop

  25. Group 4 of Develop

  26. Maximum Entropy WSD - Hoa Dang (in progress)
  • Maximum entropy framework:
    • Combines different features with no assumption of independence
    • Estimates the conditional probability that W has sense X in context Y, where Y is a conjunction of linguistic features
    • Feature weights are determined from training data
    • The weights produce a maximum entropy probability distribution

  27. Features used
  • Topical contextual linguistic feature for W:
    • Presence of automatically determined keywords in S
  • Local contextual linguistic features for W:
    • Presence of subject, complements
    • Words in subject and complement positions; particles, prepositions
    • Noun synonyms and hypernyms for subjects and complements
    • Named entity tag (PERSON, LOCATION, ...) for proper nouns
    • Words within a +/- 2 word window
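A hedged sketch of how these cues could be packed into a flat feature dictionary for a maximum entropy model; the input format and feature names are illustrative, not those of the original system:

def extract_features(tokens, target_index, subject=None, complement=None,
                     keywords=()):
    """Contextual features for the target verb at tokens[target_index]."""
    feats = {}
    # Topical feature: automatically determined keywords present in S.
    for kw in keywords:
        if kw in tokens:
            feats[f"kw={kw}"] = 1
    # Local syntactic features: subject/complement presence and head words.
    if subject is not None:
        feats["has_subj"] = 1
        feats[f"subj={subject}"] = 1
    if complement is not None:
        feats["has_comp"] = 1
        feats[f"comp={complement}"] = 1
    # Collocational features: words within a +/- 2 token window.
    for offset in (-2, -1, 1, 2):
        pos = target_index + offset
        if 0 <= pos < len(tokens):
            feats[f"w{offset}={tokens[pos]}"] = 1
    return feats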

  28. Maximum Entropy WSD - Hoa Dang, Senseval-2 verbs (best)
  • Maximum entropy framework, p(sense|context)
  • Contextual linguistic features:
    • Topical feature for W: +2.5%
      • Keywords (determined automatically)
    • Local syntactic features for W: +1.5 to +5%
      • Presence of subject, complements, passive?
      • Words in subject and complement positions; particles, prepositions, etc.
    • Local semantic features for W: +6%
      • Semantic class info from WordNet (synsets, etc.)
      • Named entity tag (PERSON, LOCATION, ...) for proper nouns
      • Words within a +/- 2 word window
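Multinomial logistic regression is the standard realization of such a maximum entropy model. A sketch with scikit-learn standing in for the original toolkit, consuming feature dictionaries like the one sketched above:

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_wsd(feature_dicts, sense_tags):
    """Fit p(sense | context) as multinomial logistic regression."""
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(feature_dicts, sense_tags)
    return model

# model.predict_proba(new_feature_dicts) then yields the conditional
# distribution p(sense | context) that the maxent framework estimates.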

  29. Results - first 5 Senseval-2 verbs

  30. Results – averaged over 28 verbs

  31. Grouping improved sense identification for MxWSD
  • 75% with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses
  • The most commonly confused senses suggest a grouping:
    • (1) name, call -- assign a specified proper name to; "They called their son David"
    • (2) call -- ascribe a quality to or give a name that reflects a quality; "He called me a bastard"
    • (3) call -- consider or regard as being; "I would not call her beautiful"
    • (4) address, call -- greet, as with a prescribed form, title, or name; "Call me Mister"; "She calls him by his first name"
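Note that the 75% figure comes from training and testing on grouped senses. A cheaper, related comparison is to score one set of predictions at both granularities, reusing a fine-to-group map like coarse_tag above:

def accuracy(gold, predicted, collapse=lambda tag: tag):
    """Exact-match accuracy, optionally after collapsing tags to groups."""
    hits = sum(collapse(g) == collapse(p) for g, p in zip(gold, predicted))
    return hits / len(gold)

# fine_acc  = accuracy(gold_tags, system_tags)
# group_acc = accuracy(gold_tags, system_tags, collapse=coarse_tag)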

  32. Criteria to split framesets
  • Semantic classes of arguments, such as animacy vs. inanimacy
  • Serve 01. Act, work
    • Group 1: function ("His freedom served him well")
    • Group 2: work ("He served in Congress")

  33. Criteria to split framesets (cont.)
  • Semantic type of event (abstract vs. concrete)
  • See 01. View
    • Group 1: perceive by sight ("Can you see the bird?")
    • Group 5: determine, check ("See whether it works")

  34. Overlap with PropBank framesets [diagram: the grouped senses of 'call' (Loud cry, Bird or animal cry, Request, Label, Call a loan/bond, Challenge, Visit, Phone/radio, Bid) overlaid with PropBank frameset boundaries]

  35. Overlap between Senseval-2 groups and framesets - 95% [diagram: the sense groups of 'develop' partitioned between Frameset 1 and Frameset 2]

  36. Framesets → Groups → WordNet [diagram: the WordNet senses of 'drop' organized under Framesets 1-3]

  37. Groups 1 and 2 of Develop

  38. Group 3 of Develop

  39. Translations of Develop groups

  40. Translations of Develop groups

  41. An Example of Mapping: verb ‘serve’. Assignment: Do you agree?

  42. Frameset Tagging Results: overall accuracy 90%* (baseline 73.5%). *With gold standard parses.
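A sketch of a most-frequent-sense baseline, on the assumption that this is what the 73.5% denotes (the slide does not say):

from collections import Counter

def mfs_baseline(train_tags_by_verb, test_items):
    """train_tags_by_verb: verb -> list of gold frameset tags in training.
    test_items: (verb, gold_tag) pairs. Returns the accuracy of always
    predicting each verb's most frequent training frameset."""
    mfs = {verb: Counter(tags).most_common(1)[0][0]
           for verb, tags in train_tags_by_verb.items()}
    test_items = list(test_items)
    hits = sum(mfs.get(verb) == gold for verb, gold in test_items)
    return hits / len(test_items)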

  43. Sense Hierarchy
  • PropBank framesets - ITA 94%; coarse-grained distinctions; 20 Senseval-2 verbs with more than one frameset; MaxEnt WSD system: 73.5% baseline, 90% accuracy
  • Sense groups (Senseval-2) - ITA 82% (now 89%); intermediate level (includes Levin classes) - 69%
  • WordNet - ITA 71%; fine-grained distinctions - 60.2%
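One way to exploit such a hierarchy is coarse-to-fine tagging: commit to a frameset first, and refine to a sense group or a WordNet sense only when the finer classifier is confident. A sketch; the classifier interfaces and the threshold are illustrative assumptions:

def hierarchical_tag(instance, frameset_clf, group_clf, fine_clf,
                     threshold=0.8):
    """Each classifier returns a (tag, probability) pair; 'within'
    restricts the finer classifier to subdivisions of the current tag."""
    tag, prob = frameset_clf(instance)       # coarsest: PropBank frameset
    for refine in (group_clf, fine_clf):     # sense group, then WN sense
        finer_tag, finer_prob = refine(instance, within=tag)
        if finer_prob < threshold:
            break                            # back off to the coarser tag
        tag, prob = finer_tag, finer_prob
    return tag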

  44. Summary of WSD
  • Choice of features is more important than choice of machine learning algorithm
  • Importance of syntactic structure (English WSD, but not Chinese)
  • Importance of dependencies
  • Importance of a hierarchical approach to sense distinctions, and quick adaptation to new usages
