The interface between model-theoretic and corpus-based semantics Sebastian Pado
Natural language semantics • Model-theoretic semantics • Compositional calculation of sentence meaning • Formal descriptions of ambiguities • Inference • Corpus-based semantics • Distributional, graded meaning representation • Probabilistic knowledge acquisition from corpora • Prediction of linguistic behaviour based on context
Complementary benefits • Corpus-based semantics • Good for lexical level (open word classes) • High coverage, robustness • Approximate • Model-theoretic semantics • Good for sentence level (closed word classes) • Limited coverage • Correct • How to divide work between the approaches?
Strategies • More expressive representations for corpus-based models of meaning: Compositionality in vector spaces • Ongoing collaboration with Katrin Erk (Dept. of Linguistics, U. Texas at Austin) • Corpus-based methods for enrichment of formal meaning representations • Core of SFB project proposal
Strategy 1 More expressive representations for corpus-based models of meaning
Compositionality in Vector Spaces • Vector space: Representation of word meaning by context co-occurrences • What is the representation of a phrase? • Centroid of two vectors? • No: Must take mode of combination into account • “a horse draws…” : pull • “draw a horse” : sketch
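The point about centroids can be illustrated with a minimal sketch. The vectors and their three context dimensions are made-up toy values, not corpus counts; the sketch only shows that a centroid is symmetric in its arguments and so cannot distinguish "a horse draws" from "draw a horse".

```python
def centroid(u, v):
    """Component-wise mean of two equal-length vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]

# Hypothetical co-occurrence counts over three toy context
# dimensions (cart, pencil, paper). Values are invented.
draw = [4.0, 6.0, 5.0]
horse = [7.0, 1.0, 0.0]

subject_reading = centroid(horse, draw)  # "a horse draws ..."
object_reading = centroid(draw, horse)   # "draw a horse"

# The centroid ignores which word governs which, so both
# phrases receive the same representation.
same_representation = subject_reading == object_reading
```

Because the two phrases collapse to one vector, any distinction between the "pull" and "sketch" readings has to come from information about the mode of combination.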
A first step • Structured vector space model [Erk & Pado 2008] • Covers Verb+Object, Verb+Subject combinations • Word meaning consists of lexical vector plus selectional preferences (=expectations) for dependents/governors
A first step • Structured vector space model [Erk & Pado 2008] • Covers Verb+Object, Verb+Subject combinations • Phrase meaning consists of two vectors: • Verb meaning modified by nominal expectations about governor • Noun meaning modified by verbal expectations about dependent
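The two-vector phrase representation described above might be sketched as follows. This is only an illustration under stated assumptions: the `Word` class, the relation labels `"subj"` and `"obj"`, the use of component-wise multiplication as the modification step, and all numbers are hypothetical choices made for this sketch, not details given on the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Word:
    lexical: list                                   # lexical vector
    # relation -> expectation vector about this word's dependents
    dependents: dict = field(default_factory=dict)
    # relation -> expectation vector about this word's governors
    governors: dict = field(default_factory=dict)

def modify(u, v):
    """Component-wise product (an assumed modification function)."""
    return [a * b for a, b in zip(u, v)]

def combine(verb, noun, relation):
    """Return the phrase meaning as two vectors: the verb modified by the
    noun's expectations about its governor, and the noun modified by the
    verb's expectations about its dependent."""
    verb_in_context = modify(verb.lexical, noun.governors[relation])
    noun_in_context = modify(noun.lexical, verb.dependents[relation])
    return verb_in_context, noun_in_context

# Toy two-dimensional space: (pull-like contexts, sketch-like contexts).
draw = Word(lexical=[0.5, 0.5],
            dependents={"subj": [0.7, 0.3], "obj": [0.3, 0.7]})
horse = Word(lexical=[0.6, 0.4],
             governors={"subj": [0.9, 0.1], "obj": [0.2, 0.8]})

verb_subj, noun_subj = combine(draw, horse, "subj")  # "a horse draws ..."
verb_obj, noun_obj = combine(draw, horse, "obj")     # "draw a horse"
```

With these toy values the verb vector leans toward the pull dimension in the subject combination and toward the sketch dimension in the object combination, which is the distinction a plain centroid cannot make.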
Current state • Evaluation: Better distinction between contextually appropriate and inappropriate paraphrases (WSD-style task) • Further research questions • Generalisation to longer phrases • More expressive model of expectations • Modelling of phrases involving closed word classes • E.g. Negation
Strategy 2 Corpus-based methods for enrichment of formal meaning representations
Formal models of meaning in context • Lexicon entries cannot provide the full range of readings for words/phrases • Readings often productively negotiated in text • Type/sort conflict • Examples: • Metonymy/Metaphor • Telic adjectives (“fast typist”) • Coercion/Reinterpretation
Example: Coercion • Wegen einer 15-jährigen kam es zu einem Streit, in dessen Verlauf sie verletzt wurde. (“Because of a 15-year-old girl, an argument broke out, in the course of which she was injured.”) • […] Sie hatte sich mit einem 21-jährigen unterhalten. (“[…] She had been talking to a 21-year-old man.”) • Red and blue expressions are coreferring, but red expression has wrong type (wegen takes <e,t>; expression is <e>). • Here, context overtly provides missing event • Often, this is not the case: Operator must be recovered from general knowledge
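The type conflict in this example can be made concrete with a small sketch. It assumes a deliberately simplified encoding: types are plain Python values, expressions are (text, type) pairs, and the clause about the conversation is assigned the type that wegen expects. The function name `find_filler` and this type assignment are illustrative assumptions, not part of the formal analysis on the slide.

```python
E = "e"            # type <e>
ET = ("e", "t")    # type <e,t>, written as a tuple

def find_filler(expected_type, argument, context):
    """Return the argument if its type fits; otherwise return the first
    context expression of the expected type, or None if there is none."""
    text, actual_type = argument
    if actual_type == expected_type:
        return argument
    for candidate in context:
        if candidate[1] == expected_type:
            return candidate
    return None

# The complement of "wegen" has type <e>, but <e,t> is expected.
argument = ("einer 15-jährigen", E)

# Assumption for this sketch: the later clause supplies an expression
# of the expected type.
context = [("Sie hatte sich mit einem 21-jährigen unterhalten", ET)]

overt = find_filler(ET, argument, context)   # found in the context
missing = find_filler(ET, argument, [])      # nothing overt available
```

When `find_filler` returns `None`, the case corresponds to the last bullet: the operator is not overt in the context and would have to be recovered from general knowledge.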
The role of corpus methods • Acquisition of general reinterpretation operators from corpora • Recovery/prediction of operators for instances with type/sort conflict • Making implicit meaning explicit: can be seen as context-driven semantic specification • Interest primarily empirical
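One very simple way to picture acquisition and prediction of operators is a frequency baseline: count which overt operators occur with a given trigger and noun, then predict the most frequent one when no operator is overt. The sketch below assumes this baseline purely for illustration; the functions `acquire` and `predict` and the observation triples are hypothetical and do not describe the project's actual methods or data.

```python
from collections import Counter, defaultdict

def acquire(observations):
    """Count overt operators per (trigger, noun) pair.
    Each observation is a (trigger, noun, operator) triple."""
    table = defaultdict(Counter)
    for trigger, noun, operator in observations:
        table[(trigger, noun)][operator] += 1
    return table

def predict(table, trigger, noun):
    """Return the most frequent operator seen for the pair, or None."""
    counts = table.get((trigger, noun))
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Invented toy observations, standing in for corpus instances in which
# the operator is overtly expressed.
observations = [
    ("begin", "book", "read"),
    ("begin", "book", "read"),
    ("begin", "book", "write"),
    ("finish", "beer", "drink"),
]

table = acquire(observations)
seen = predict(table, "begin", "book")
unseen = predict(table, "begin", "sonata")
```

A baseline of this kind returns nothing for unseen pairs, which is one reason generalisation over semantic classes or other linguistic levels is an open question.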
Project Steps • Creation of multilingual corpus of type/sort conflict cases with human annotations • Informed by formal considerations • Development of CL methods to predict operators for conflict resolution • Ideally, task-based evaluation (to be determined) • Consequences/insights for formal descriptions
Research Questions • When can operators be found overtly in context; when must general operators be recovered? • Influence of local discourse? • CL methods for efficient and accurate prediction of operators • What linguistic levels are helpful? Semantic classes, semantic roles, dependency relations, …? • Focus on more than one language: Can bilingual processing help? • What is the level of generality of acquired operators? • What shape do people’s expectations have? • Do people’s judgments of recovered operators agree? • Can empirical results have impact on formal descriptions? • E.g. do sort and type conflicts behave differently or similarly? • Relation to work on textual entailment?
Collaborations • D1 (Representation of ambiguities) • Formal descriptions as information source for corpus development • Attempt to transfer empirical results back into theory • B5 (Polysemy in a conceptual system) • Ontological information as knowledge source for CL operator models • Entailment as shared evaluation task • Open for other ideas