
The interface between model-theoretic and corpus-based semantics



Presentation Transcript


  1. The interface between model-theoretic and corpus-based semantics Sebastian Pado

  2. Natural language semantics • Model-theoretic semantics: compositional calculation of sentence meaning; formal descriptions of ambiguities; inference • Corpus-based semantics: distributional, graded meaning representation; probabilistic knowledge acquisition from corpora; prediction of linguistic behaviour based on context

  3. Complementary benefits • Corpus-based semantics: good for lexical level (open word classes); high coverage, robustness; approximate • Model-theoretic semantics: good for sentence level (closed word classes); limited coverage; correct • How to divide work between the approaches?

  4. Strategies • More expressive representations for corpus-based models of meaning: Compositionality in vector spaces • Ongoing collaboration with Katrin Erk (Dept. of Linguistics, U. Texas at Austin) • Corpus-based methods for enrichment of formal meaning representations • Core of SFB project proposal

  5. Strategy 1 More expressive representations for corpus-based models of meaning

  6. Compositionality in Vector Spaces • Vector space: Representation of word meaning by context co-occurrences • What is the representation of a phrase? • Centroid of two vectors? • No: Must take mode of combination into account • “a horse draws…” : pull • “draw a horse” : sketch
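Why the centroid fails can be shown in a few lines. The sketch below is only an illustration: the context dimensions and counts are invented, and `centroid` is a hypothetical helper, not code from the talk. Because averaging is symmetric, it assigns the same vector to "a horse draws…" and "draw a horse", so it cannot separate the pull reading from the sketch reading.

```python
def centroid(u, v):
    """Component-wise mean of two equal-length vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


# Toy co-occurrence counts over made-up context dimensions
# (cart, pencil, field, paper); the numbers are purely illustrative.
draw = [4.0, 5.0, 1.0, 3.0]
horse = [6.0, 0.0, 7.0, 0.0]

subject_reading = centroid(horse, draw)  # "a horse draws ..."
object_reading = centroid(draw, horse)   # "draw a horse"

# The centroid ignores which word is the governor and which the dependent.
print(subject_reading == object_reading)
```

Any combination function that is to distinguish the two readings has to take the grammatical relation as an additional input.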

  7. A first step • Structured vector space model [Erk & Pado 2008] • Covers Verb+Object, Verb+Subject combinations • Word meaning consists of lexical vector plus selectional preferences (=expectations) for dependents/governors

  8. A first step • Structured vector space model [Erk & Pado 2008] • Covers Verb+Object, Verb+Subject combinations • Phrase meaning consists of two vectors: • Verb meaning modified by nominal expectations about governor • Noun meaning modified by verbal expectations about dependent
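The two slides above can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the published implementation: the class `Word`, the functions `modify` and `combine`, the two toy dimensions and all numbers are hypothetical, and component-wise multiplication is assumed as the way a vector is "modified" by an expectation vector.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    lexical: list                                   # context co-occurrence vector
    prefs: dict = field(default_factory=dict)       # relation -> expectations about dependents
    inv_prefs: dict = field(default_factory=dict)   # relation -> expectations about governors


def modify(vector, expectation):
    """Assumed combination: component-wise product of vector and expectation."""
    return [a * b for a, b in zip(vector, expectation)]


def combine(verb, noun, relation):
    """Return the phrase meaning as a pair of in-context vectors."""
    # Verb meaning modified by the noun's expectations about its governor.
    verb_in_context = modify(verb.lexical, noun.inv_prefs[relation])
    # Noun meaning modified by the verb's expectations about its dependent.
    noun_in_context = modify(noun.lexical, verb.prefs[relation])
    return verb_in_context, noun_in_context


# Toy vectors over two made-up dimensions: (pull-like, sketch-like).
draw = Word(lexical=[2.0, 2.0],
            prefs={"subj": [2.0, 1.0], "obj": [1.0, 2.0]})
horse = Word(lexical=[1.0, 1.0],
             inv_prefs={"subj": [3.0, 1.0], "obj": [1.0, 3.0]})

draw_with_horse_subj, _ = combine(draw, horse, "subj")  # "a horse draws ..."
draw_with_horse_obj, _ = combine(draw, horse, "obj")    # "draw a horse"
print(draw_with_horse_subj, draw_with_horse_obj)
```

Unlike the centroid, the result depends on the relation: the same pair of words yields a pull-leaning verb vector for the subject relation and a sketch-leaning one for the object relation.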

  9. Current state • Evaluation: Better distinction between contextually appropriate and inappropriate paraphrases (WSD-style task) • Further research questions • Generalisation to longer phrases • More expressive model of expectations • Modelling of phrases involving closed word classes • E.g. Negation
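A paraphrase task of this kind can be sketched as ranking candidate paraphrases by their similarity to the in-context vector of the target word. The slides do not describe the actual evaluation setup, so everything here is an assumption: cosine similarity as the measure, the function names, and the toy vectors over two made-up dimensions (pull-like, sketch-like).

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_paraphrases(target_in_context, candidates):
    """Order candidate names by decreasing similarity to the target vector."""
    return sorted(candidates,
                  key=lambda name: cosine(target_in_context, candidates[name]),
                  reverse=True)


# Hypothetical vectors for two paraphrase candidates of "draw".
candidates = {"pull": [1.0, 0.0], "sketch": [0.0, 1.0]}

# Hypothetical in-context vectors for "draw" in the two constructions.
draw_subj_horse = [6.0, 2.0]  # "a horse draws ..."
draw_obj_horse = [2.0, 6.0]   # "draw a horse"

print(rank_paraphrases(draw_subj_horse, candidates))
print(rank_paraphrases(draw_obj_horse, candidates))
```

A model does well on such a task when the contextually appropriate paraphrase is ranked above the inappropriate one for each context.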

  10. Strategy 2 Corpus-based methods for enrichment of formal meaning representations

  11. Formal models of meaning in context • Lexicon entries cannot provide the full range of readings for words/phrases • Readings often productively negotiated in text • Type/sort conflict • Examples: • Metonymy/Metaphor • Telic adjectives (“fast typist”) • Coercion/Reinterpretation

  12. Example: Coercion • Wegen einer 15-jährigen kam es zu einem Streit, in dessen Verlauf sie verletzt wurde. (A dispute arose because of a 15-year-old girl, in the course of which she was injured.) • […] Sie hatte sich mit einem 21-jährigen unterhalten. (She had been talking to a 21-year-old man.) • Red and blue expressions are coreferring, but the red expression has the wrong type (wegen takes <e,t>; the expression is <e>). • Here, context overtly provides the missing event • Often, this is not the case: the operator must be recovered from general knowledge
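The type conflict in this example can be made concrete with a toy type check. This is a sketch only: `Expr`, `resolve`, and the operator names `talk_with_21_year_old` and `EVENT_INVOLVING` are hypothetical labels invented for illustration, and the types are plain strings copied from the slide rather than a real type system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    text: str
    sem_type: str  # "e" for entities, "<e,t>" as required by "wegen" on the slide


WEGEN_ARG_TYPE = "<e,t>"


def has_type_conflict(argument, required=WEGEN_ARG_TYPE):
    """True if the argument's type differs from the required type."""
    return argument.sem_type != required


def coerce(argument, operator):
    """Wrap the argument in a reinterpretation operator of the required type."""
    return Expr(f"{operator}({argument.text})", WEGEN_ARG_TYPE)


def resolve(argument, context_event=None, default_operator="EVENT_INVOLVING"):
    """Prefer an event found overtly in context; otherwise use a general operator."""
    if not has_type_conflict(argument):
        return argument
    operator = context_event if context_event is not None else default_operator
    return coerce(argument, operator)


girl = Expr("15-year-old", "e")
print(resolve(girl, context_event="talk_with_21_year_old"))  # event given by context
print(resolve(girl))                                         # general fallback operator
```

The fallback branch corresponds to the harder case named on the slide, where no event is available in the context and an operator has to come from general knowledge.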

  13. The role of corpus methods • Acquisition of general reinterpretation operators from corpora • Recovery/prediction of operators for instances with type/sort conflict • Making implicit meaning explicit: can be seen as context-driven semantic specification • Interest primarily empirical
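One very simple way to picture acquisition and prediction of operators is a frequency baseline over annotated conflict cases. The slides do not commit to any method, so this is an assumed stand-in: the functions `train` and `predict`, the trigger labels, and the annotated pairs are all made up for illustration.

```python
from collections import Counter, defaultdict


def train(annotated):
    """Count how often each operator was annotated for each conflict trigger."""
    counts = defaultdict(Counter)
    for trigger, operator in annotated:
        counts[trigger][operator] += 1
    return counts


def predict(counts, trigger):
    """Return the most frequent operator for a trigger, or None if unseen."""
    if trigger not in counts:
        return None
    return counts[trigger].most_common(1)[0][0]


# Made-up annotations: (conflict trigger, operator chosen by an annotator).
annotated = [
    ("begin+book", "read"),
    ("begin+book", "read"),
    ("begin+book", "write"),
    ("fast+typist", "type"),
]

counts = train(annotated)
print(predict(counts, "begin+book"))
print(predict(counts, "fast+typist"))
print(predict(counts, "wegen+person"))  # unseen trigger: no prediction
```

Such a baseline cannot use the local context at all, which is exactly the information the overt-event case relies on, so it would at most serve as a point of comparison for context-sensitive models.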

  14. Project Steps • Creation of multilingual corpus of type/sort conflict cases with human annotations • Informed by formal considerations • Development of CL methods to predict operators for conflict resolution • Ideally, task-based evaluation (to be determined) • Consequences/insights for formal descriptions

  15. Research Questions • When can operators be found overtly in context; when must general operators be recovered? • Influence of local discourse? • CL methods for efficient and accurate prediction of operators • What linguistic levels are helpful? Semantic classes, semantic roles, dependency relations, …? • Focus on more than one language: Can bilingual processing help? • What is the level of generality of acquired operators? • What shape do people’s expectations have? • Do people’s judgments of recovered operators agree? • Can empirical results have an impact on formal descriptions? • E.g. do sort and type conflicts behave differently or similarly? • Relation to work on textual entailment?

  16. Collaborations • D1 (Representation of ambiguities) • Formal descriptions as information source for corpus development • Attempt to transfer empirical results back into theory • B5 (Polysemy in a conceptual system) • Ontological information as knowledge source for CL operator models • Entailment as shared evaluation task • Open for other ideas
