Tricks for Statistical Semantic Knowledge Discovery: A Selectionally Restricted Sample

Presentation Transcript

  1. Tricks for Statistical Semantic Knowledge Discovery: A Selectionally Restricted Sample • Marti A. Hearst, UC Berkeley

  2. Goal: Acquire Semantic Information

  3. Something on Finin

  4. Tricks I Like • Unambiguous Cues • Lots o’ Text • Rewrite and Verify

  5. Trick: Lots o’ Text • Idea: words in the same syntactic context are semantically related. • Hindle, ACL’90, “Noun classification from predicate-argument structure.”
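
A minimal sketch of this idea in the spirit of Hindle-style noun clustering: represent each noun by its (verb, relation) contexts from parsed text, weight them by pointwise mutual information, and compare nouns by cosine. The counts below are invented for illustration, and Hindle’s own similarity measure differs in its details.

```python
# Minimal sketch of noun similarity from predicate-argument co-occurrences.
# The (noun, context) counts are invented; in practice they come from a
# parsed corpus, and Hindle's own similarity measure differs in detail.
import math
from collections import defaultdict

# a context is a (verb, grammatical relation) pair, e.g. ("drink", "obj")
counts = {
    ("beer", ("drink", "obj")): 25, ("beer", ("brew", "obj")): 12,
    ("wine", ("drink", "obj")): 30, ("wine", ("pour", "obj")): 8,
    ("car",  ("drive", "obj")): 40, ("car",  ("park", "obj")): 15,
}

total = sum(counts.values())
noun_tot, ctx_tot = defaultdict(int), defaultdict(int)
for (n, c), f in counts.items():
    noun_tot[n] += f
    ctx_tot[c] += f

def pmi_vector(noun):
    """Positive pointwise-mutual-information weights of the noun's contexts."""
    vec = {}
    for (n, c), f in counts.items():
        if n == noun:
            pmi = math.log((f / total) / ((noun_tot[n] / total) * (ctx_tot[c] / total)))
            vec[c] = max(pmi, 0.0)
    return vec

def cosine(u, v):
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

print(cosine(pmi_vector("beer"), pmi_vector("wine")))  # > 0: shared "drink obj" context
print(cosine(pmi_vector("beer"), pmi_vector("car")))   # 0.0: no shared contexts
```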

  6. Trick: Lots o’ Text • Idea: words in the same syntactic context are semantically related. • Nakov & Hearst, ACL/HLT’08 “Solving Relational Similarity Problems Using the Web as a Corpus”

  7. Trick: Lots o’ Text • Idea: bigger is better than smarter! • Banko & Brill ACL’01: “Scaling to Very, Very Large Corpora for Natural Language Disambiguation”

  8. Trick: Lots o’ Text • Idea: apply web-scale n-grams to every problem imaginable. • Lapata & Keller, HLT/NAACL ‘04: “The Web as a Baseline: Evaluating the Performance of Unsupervised Web-Based Models for a Range of NLP Tasks” • Tasks where unsupervised web counts match (=) or beat (>) supervised models: MT candidate selection, noun compound bracketing, article suggestion, adjective ordering, noun compound interpretation
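
A sketch of one of these tasks, noun compound bracketing from raw n-gram counts using the adjacency model (compare the association of w1–w2 against w2–w3). The count table is a stand-in for real web or n-gram frequencies, and real systems normalize counts (e.g. with χ² or probability ratios) rather than comparing them directly.

```python
# Noun compound bracketing with the adjacency model over raw bigram counts.
# NGRAM_COUNTS stands in for a real web-scale n-gram collection; the numbers
# are invented for illustration only.
NGRAM_COUNTS = {
    ("stem", "cell"): 2_000_000,
    ("brain", "stem"): 800_000,
    ("brain", "cell"): 1_200_000,
}

def count(w1, w2):
    return NGRAM_COUNTS.get((w1, w2), 1)   # add-one so unseen bigrams do not zero out

def bracket(w1, w2, w3):
    """Adjacency model: is (w1 w2) or (w2 w3) the tighter unit?"""
    if count(w1, w2) > count(w2, w3):
        return f"[[{w1} {w2}] {w3}]"        # left bracketing
    return f"[{w1} [{w2} {w3}]]"            # right bracketing

print(bracket("brain", "stem", "cell"))     # with these counts: [brain [stem cell]]
```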

  9. Limitation • Sometimes counts alone are too ambiguous. Solution • Bootstrap from unambiguous contexts.

  10. Trick: Use Unambiguous Context • … to build statistics for ambiguous contexts. • Hindle & Rooth, ACL ’91, “Structural Ambiguity and Lexical Relations” • Example: PP attachment: I eat spaghetti with sauce. • Bootstrap from unambiguous contexts: “Spaghetti with sauce is delicious.” / “I eat with a fork.”
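
A toy version of the Hindle & Rooth idea: collect preposition co-occurrence counts only from unambiguous configurations (a verb with no object followed by a PP, or a PP inside a subject NP), then score ambiguous cases with a lexical association ratio. All counts and the smoothing constants are illustrative.

```python
# Toy Hindle & Rooth-style PP attachment: counts are harvested only from
# unambiguous configurations ("I eat with a fork" -> verb; "spaghetti with
# sauce is delicious" -> noun), then ambiguous cases are scored with a
# lexical association ratio. All numbers are invented.
import math

verb_prep  = {("eat", "with"): 60}          # verb seen with preposition, unambiguously
verb_total = {"eat": 500}
noun_prep  = {("spaghetti", "with"): 30}    # noun seen with preposition, unambiguously
noun_total = {"spaghetti": 200}

def lexical_association(verb, noun, prep):
    """log P(prep | verb) / P(prep | noun); positive favours verb attachment."""
    p_v = (verb_prep.get((verb, prep), 0) + 0.5) / (verb_total.get(verb, 0) + 1.0)
    p_n = (noun_prep.get((noun, prep), 0) + 0.5) / (noun_total.get(noun, 0) + 1.0)
    return math.log(p_v / p_n)

score = lexical_association("eat", "spaghetti", "with")
print("verb" if score > 0 else "noun", "attachment,", round(score, 2))
# with these toy counts: noun attachment, as in "spaghetti with sauce"
```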

  11. Trick: Use Unambiguous Context • … to identify semantic relations (lexico-syntactic contexts) • Hearst, COLING ’92, “Automatic Acquisition of Hyponyms from Large Text Corpora” Example: Hyponym Identification
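
A minimal sketch of hyponym extraction with one of the lexico-syntactic patterns (“NP such as NP, NP and NP”). A single word stands in for each noun phrase here; the actual method matches patterns over noun phrases identified by a tagger or chunker.

```python
# One Hearst pattern, "NP such as NP, NP ... and NP": each listed NP is taken
# as a hyponym of the NP before "such as". A single word stands in for each
# noun phrase; real extractors match over tagged or chunked NPs.
import re

PATTERN = re.compile(r"(\w+),? such as ([^.;]+)")

def hearst_pairs(sentence):
    pairs = []
    for hypernym, tail in PATTERN.findall(sentence):
        for hyponym in re.split(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", tail):
            if hyponym.strip():
                pairs.append((hyponym.strip(), hypernym))
    return pairs

print(hearst_pairs("She studied languages such as French, Spanish and Italian."))
# [('French', 'languages'), ('Spanish', 'languages'), ('Italian', 'languages')]
```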

  12. Combine Tricks 1 and 2

  13. Trick: Use Unambiguous Contexts + Lots o’ Text • Combine lexico-syntactic patterns with occurrence counts. • Kozareva, Riloff & Hovy, ACL/HLT’08, “Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs”.
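
A sketch of the doubly-anchored pattern idea behind this line of work: instantiate “&lt;class&gt; such as &lt;seed&gt; and *”, harvest the fillers, and let confident fillers become new seeds. The web_count_fillers function and its numbers are placeholders for real web queries, and the paper’s graph-based ranking is reduced here to simple summed support.

```python
# Sketch of doubly-anchored hyponym harvesting: instantiate the pattern
# "<class> such as <seed> and *", collect fillers, and let confident fillers
# become new seeds. web_count_fillers() is a placeholder for real web queries;
# the paper ranks candidates on a pattern linkage graph rather than by the
# simple summed support used here.
from collections import defaultdict

def web_count_fillers(class_name, seed):
    """Placeholder: {candidate: hit count} for '<class> such as <seed> and *'."""
    fake_hits = {
        ("countries", "Spain"):  {"France": 900, "Italy": 700, "Paris": 5},
        ("countries", "France"): {"Spain": 950, "Germany": 800, "Paris": 8},
    }
    return fake_hits.get((class_name, seed), {})

def harvest(class_name, seeds, rounds=2, threshold=100):
    support = defaultdict(int)              # candidate -> total support
    known, expanded = set(seeds), set()
    for _ in range(rounds):
        for seed in list(known - expanded):
            expanded.add(seed)
            for cand, hits in web_count_fillers(class_name, seed).items():
                support[cand] += hits
                if hits > threshold:        # confident fillers become seeds
                    known.add(cand)
    return sorted(support.items(), key=lambda kv: -kv[1])

print(harvest("countries", ["Spain"]))
# [('Spain', 950), ('France', 900), ('Germany', 800), ('Italy', 700), ('Paris', 13)]
```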

  14. Trick: Use Unambiguous Contexts + Lots o’ Text • Combine (usually) unambiguous surface patterns with occurrence counts. • Nakov & Hearst, HLT/EMNLP’05, “Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution”. • Left dash: cell-cycle analysis → left • Possessive marker: brain’s stem cell → right • Parentheses: growth factor (beta) → left • Punctuation: health care, provider → left • Abbreviation: tum. necr. (TN) factor → right • Concatenation: healthcare reform → left
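
A sketch of how such surface cues can be turned into a bracketing decision: query the frequency of each cue-marked variant and let the cues vote left or right. The count function and its numbers are invented, and only two of the slide’s cue types are shown.

```python
# Surface cues as votes: each attested cue-marked variant of "w1 w2 w3" casts
# a left or right vote for the bracketing. count() is a stand-in for web or
# n-gram frequencies; only the dash and possessive cues are shown.
def count(phrase):
    fake = {
        "cell-cycle analysis": 40, "cell cycle-analysis": 2,
        "brain's stem cell": 120, "brain stem's cell": 3,
    }
    return fake.get(phrase, 0)

def vote(w1, w2, w3):
    votes = {"left": 0, "right": 0}
    votes["left"]  += count(f"{w1}-{w2} {w3}")    # dash joins w1-w2 -> left
    votes["right"] += count(f"{w1} {w2}-{w3}")    # dash joins w2-w3 -> right
    votes["right"] += count(f"{w1}'s {w2} {w3}")  # possessive on w1 -> right
    votes["left"]  += count(f"{w1} {w2}'s {w3}")  # possessive on w2 -> left
    return max(votes, key=votes.get)

print(vote("cell", "cycle", "analysis"))   # left, driven by "cell-cycle analysis"
print(vote("brain", "stem", "cell"))       # right, driven by "brain's stem cell"
```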

  15. Trick: Use Unambiguous Contexts + Lot’s O’ Text • Identify a “protagonist” in each text to learn narrative structure • Chambers & Jurafsky, ACL’08 “Unsupervised Learning of Narrative Event Chains”.
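
A sketch of the statistic behind narrative chains: count how often two (verb, dependency) events share a coreferent protagonist, then rank candidate next events by pointwise mutual information. The counts are invented, and the actual system adds temporal ordering and clustering on top of this.

```python
# Narrative-chain statistic: events are (verb, dependency) pairs, and two
# events co-occur when their arguments corefer (the shared protagonist).
# PMI over such counts ranks likely next events; all numbers are invented.
import math
from collections import Counter

pair_counts = Counter({
    (("arrest", "obj"), ("convict", "obj")): 50,
    (("arrest", "obj"), ("acquit", "obj")): 10,
    (("arrest", "obj"), ("eat", "subj")): 1,
    (("convict", "obj"), ("sentence", "obj")): 40,
    (("acquit", "obj"), ("release", "obj")): 60,
    (("eat", "subj"), ("sleep", "subj")): 100,
})
event_counts = Counter()
for (e1, e2), f in pair_counts.items():
    event_counts[e1] += f
    event_counts[e2] += f
total = sum(pair_counts.values())

def pmi(e1, e2):
    joint = pair_counts.get((e1, e2), 0) + pair_counts.get((e2, e1), 0)
    if joint == 0:
        return float("-inf")
    return math.log((joint / total) /
                    ((event_counts[e1] / total) * (event_counts[e2] / total)))

chain = ("arrest", "obj")
for cand in [("convict", "obj"), ("acquit", "obj"), ("eat", "subj")]:
    print(cand, round(pmi(chain, cand), 2))   # ("convict", "obj") scores highest
```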

  16. Trick 3: Rewrite & Verify

  17. Trick: Rewrite & Verify • Check if alternatives exist in text • Nakov & Hearst, HLT/EMNLP’05, “Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution”. • Example: NP bracketing • Prepositional: stem cells in the brain → right; stem cells from the brain → right; cells from the brain stem → left • Verbal: virus causing human immunodeficiency → left; pain associated with arthritis migraine → left • Copula: office building that is a skyscraper → right
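
A sketch of rewrite-and-verify for NP bracketing: generate prepositional paraphrases whose structure is unambiguous, look up how often each occurs, and let the more frequent paraphrase decide. The count function stands in for web hit counts; the verbal and copula paraphrase types listed above would be added the same way.

```python
# Rewrite-and-verify: paraphrase "w1 w2 w3" into structurally unambiguous
# prepositional variants, check how often each is attested, and let the
# winner decide the bracketing. count() is a toy stand-in for web counts.
def count(phrase):
    fake = {
        "stem cells in the brain": 500,
        "stem cells from the brain": 300,
        "cells from the brain stem": 80,
    }
    return fake.get(phrase, 0)

def bracket_np(w1, w2, w3):
    # right bracketing [w1 [w2 w3]]  <-  "w2 w3 PREP the w1"
    right = sum(count(f"{w2} {w3} {p} the {w1}") for p in ("in", "from", "of"))
    # left bracketing  [[w1 w2] w3]  <-  "w3 PREP the w1 w2"
    left = sum(count(f"{w3} {p} the {w1} {w2}") for p in ("in", "from", "of"))
    return "right" if right > left else "left"

print(bracket_np("brain", "stem", "cells"))   # "right", as in "stem cells in the brain"
```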

  18. Trick: Use Lexical Hierarchies • To improve generation of pseudo-words for WSD • Nakov & Hearst, HLT/NAACL’03, “Category-based Pseudo-Words” • To classify nouns in noun compounds and thus determine the semantic relations between them • Rosario, Hearst, & Fillmore, ACL’02, “Descent of Hierarchy and Selection in Relational Semantics” • To generate new (faceted) category systems • Stoica, Hearst, & Richardson, NAACL/HLT’07. “Automating Creation of Hierarchical Faceted Metadata Structures”
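
A sketch of the hierarchy trick as used in Castanet-style facet construction: map each content term onto its WordNet hypernym path, so that terms sharing upper-level nodes can be grouped under the same facet. This uses NLTK’s WordNet interface and only the first noun sense per term; the actual system handles sense selection, path merging, and compression of sparse levels.

```python
# Castanet-style use of the WordNet hierarchy: look up each content term's
# hypernym path; terms whose paths share upper-level nodes are grouped under
# the same candidate facet. Requires NLTK with the WordNet data installed
# (nltk.download("wordnet")); only the first noun sense is used here.
from nltk.corpus import wordnet as wn

def hypernym_path(term):
    """Lemma names from the WordNet root down to the term's first noun sense."""
    synsets = wn.synsets(term, pos=wn.NOUN)
    if not synsets:
        return []
    return [s.lemmas()[0].name() for s in synsets[0].hypernym_paths()[0]]

for term in ["chicken", "salmon", "oregano"]:
    print(term, "->", " / ".join(hypernym_path(term)))
```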

  19. Example: Recipes (3500 docs)

  20. Castanet Output (shown in Flamenco)

  21. Castanet Output

  22. Castanet Output

  23. Towards New Approaches to Semantic Analysis

  24. Ideas • Inducing Semantic Grammars • Boggess, Agarwal, & Davis, AAAI’91, “Disambiguation of Prepositional Phrases in Automatically Labelled Technical Text”

  25. Ideas • Use Cognitive Linguistics • Hearst, ’90, ’92, “Direction-Based Text Interpretation”. • Talmy’s Force Dynamics + Reddy’s Conduit Metaphor → Path Model • Solves: Was the person in favor of or opposed to the idea?

  26. Using Cognitive Linguistics • Talmy’s Theory of Force Dynamics • Talmy, “Force Dynamics in Language and Thought,” in Parasession on Causatives and Agentivity, Chicago Linguistic Society 1985. • Describes how the interaction of agents with respect to force is lexically and grammatically expressed. • Posits two opposing entities: Agonist and Antagonist. • Each entity expresses an intrinsic force: towards rest or motion. • The balance of the strengths of the entities determines the outcome of the event. • Grammatical expression includes using a clause headed by “despite” to express a weaker antagonist.
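
A toy rendering of the force-dynamic schema as summarized on this slide: each entity carries an intrinsic tendency (toward rest or motion) and a relative strength, and the stronger entity’s tendency determines the outcome. The Entity class and example are illustrative only, not Talmy’s full formalism.

```python
# Toy force-dynamic schema: two opposing entities, each with an intrinsic
# tendency and a strength; the stronger entity's tendency determines the
# outcome of the event. Purely illustrative.
from dataclasses import dataclass

@dataclass
class Entity:
    name: str
    tendency: str      # "motion" or "rest"
    strength: int

def outcome(agonist: Entity, antagonist: Entity) -> str:
    """The Agonist's tendency prevails only if it is the stronger entity."""
    if agonist.strength > antagonist.strength:
        return agonist.tendency
    return "rest" if agonist.tendency == "motion" else "motion"

# "The ball kept rolling despite the stiff grass": "despite" signals a weaker
# Antagonist, so the Agonist's tendency toward motion wins out.
print(outcome(Entity("ball", "motion", 3), Entity("grass", "rest", 1)))   # motion
```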

  27. Using Cognitive Linguistics • Reddy’s Conduit Metaphor • Reddy, “The Conduit Metaphor – A Case of Frame Conflict in Our Language about Language,” in Metaphor and Thought, Ortony (Ed), Cambridge University Press, 1979. • A thought is schematized as an object which is placed by the speaker into a container that is sent along a conduit. • The receiver at the other end is the listener, who removes the objectified thought from the container and thus possesses it. • Inferences that apply to conduits can be applied to communication. • “Your meaning did not come through.” • “I can’t put this thought into words.” • “She is sending you some kind of message with that remark.”

  28. Using Cognitive Linguistics • Combine into the Path Model • Hearst, “Direction-based Text Interpretation as an Information Access Refinement,” in Text-based Intelligent Systems, Jacobs (Ed), Lawrence Erlbaum Associates, 1992. • If an agent favors an entity or event, that agent can be said to desire the existence or “well-being” of that entity, and vice versa. • Thus if an agent favors an entity’s triumph in a force-dynamic interaction, then the agent favors that entity or event. • But: force dynamics does not have the expressive power to represent a sequence of interactions. • Instead of focusing on the relative strength of two interacting entities, the model should represent what happens to a single entity through the course of its encounters with other entities. • Thus the entity can be schematized as if it were moving along a path toward some destination or goal.

  29. Using Cognitive Linguistics • The Path Model • Hearst, “Direction-based Text Interpretation as an Information Access Refinement,” in Text-based Intelligent Systems, Jacobs (Ed), Lawrence Erlbaum Associates, 1992. (Slides 29–31 show the Path Model figures.)
