250 likes | 389 Views
Mono- and bilingual modeling of selectional preferences. Sebastian Padó Institute for Computational Linguistics Heidelberg University (joint work with Katrin Erk , Ulrike Pado, Yves Peirsman ). Some context.
E N D
Mono- and bilingual modeling of selectional preferences Sebastian Padó Institute for Computational Linguistics Heidelberg University (joint work with KatrinErk, Ulrike Pado, Yves Peirsman)
Some context • Computational lexical semantics: modeling the meaning of words and phrases • Distributional approach • Observe the usage of words in corpora • Robustness: Broad coverage, manageable complexity • Flexibility: Corpus choice determines model Knowledge Corpus
Structure Application: Predictions of plausibility judgments Methods: Distributional semantics Phenomena: Semanticrelations in bilingual dictionaries
Plausibility of Verb-Relation-Argument-Triples • Central aspect of language • Selectional preferences [Katz & Fodor 1963, Wilks 1975] • Generalization of lexical similarity • Incremental language processing [McRae & Matsuki 2009] • Disambiguation [Toutanova et al. 2005], Applicability of inference rules [Pantel et al. 2007], SRL [Gildea & Jurafsky 2002]
Modelling Plausibility • Approximating plausibility by frequency • Two lexical variables: Frequency of most triples is zero • Implausibility or sparse data? • Generalization based on an ontology (WordNet) [Resnik 1996] • Generalization based on vector space [Erk, Padó, und Padó 2010] (eat, obj, apple) 100 (eat, obj, hat) 1 (eat, obj, telephone) 0 (eat, obj, caviar) 0 English corpus (eat, obj, apple): highly plausible (eat, obj, hat): somewhat plausible (eat, obj, telephone): ? (eat, obj, caviar): ?
Semantic Spaces cultiver mandarine Fr clémentine voiture rouler • Characterization of word meaning though profile over occurrence contexts [Salton, Wang, and Yang 1974, Landauer & Dumais 1997, Schütze 1998] • Geometrically: Vector in high-dimensional space • High vector similarity implies high semantic similarity • Next neighbors = synonyms
Similarity-based generalization[Pado, Pado & Erk 2010] • Plausibility is average vector space similarity to seen arguments • (v, r, a): verb – relation – argument head word triple • seenargs: set of argument head words seen in the corpus • wt: weight function • Z: normalization constant • sim: semantic (vector space) similarity
Geometrical interpretation apple telephone Seen objects of “eat” orange caviar breakfast Peter Seen subjects of “eat” husband child
Evaluation • Triples with human plausibility ratings [McRae et al. 1996] • Evaluation: Correlation of model predictions with human judgments • Spearman’s ρ = 1: perfect correlation; ρ = 0: no correlation • Result: Vector space model attains almost quality of “deep” model at 98% coverage
From one to many languages… • Vector space model reduces the need for language resources to predict plausibility judgments • No ontologies • Still necessary: Observations of triples, target words • Large, accurately parsed corpus • Problematic for basically all languages except English • Can we extend our strategy to new languages?
Predicting plausibility for new languages Englishcorpus • Transfer with a bilingual lexicon [Koehn and Knight 2002] • Cross-lingual knowledge transfer • Print dictionaries are problematic • Instead: acquire from distributional data Englishmodel • cultiver – grow pomme– apple (cultiver, Obj, pomme) (grow, obj, apple): highly plausible
Bilingual semantic space cultiver/grow mandarine E mandarin car Fr rouler/drive • Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997] • Dimensions are bilingual word pairs, can be bootstrapped • Frequencies observable from comparable corpora • Nearest neighbors: Cross-lingual synonyms ⟷Translations
Nearest neighbors in bilingual space cultiver/grow pear E pomme car Fr rouler/drive • Similar usages / context profiles do not necessarily indicate synonymy • Bilingual case:Peirsman& Pado (2011) • Lexicon extraction for EN/DE and EN/NL
Evaluation against Gold Standard • Evaluation of nearest cross-lingual neighbors against a translators’ dictionary
How to proceed? • Classical reaction: Focus on cross-lingual synonyms • Aggressive filtering of nearest-neighbor lists • Risk: Sparse data issues • Our hypothesis (prelimimary version): • Non-synonymous pairs still provide information about bilingual similarity • Should be exploited for cross-lingual knowledge transfer • Experimental validation: Vary number of synonyms, observe effect on cross-lingual knowledge transfer
Varying the number of neighbors • Nearest neighbors: 50% of synonyms • Further neighbors: quick decline to 10% of synonyms
Experimental setup English corpus • rouler – drive bagnole – jalopy, banger, car English model (bagnole, subj, rouler) Consider plausibilitiesfür: (jalopy, subj, drive) (banger, subj, drive) (car, subj, drive)
Details • Model: • English model: trained on BNC as before • Bilingual lexicon extracted from BNC und StuttgarterNachrichtenkorpus HGC as comparable corpora • Prediction based on nnearest English neighboursforGerman argument • Evaluation: • 90 German (v,r,a) triples with human plausibility ratings [Brockmann & Lapata 2003]
Results – EN-DE • Result: Transfer model significantly better than monolingual model, but only if non-synonymous neighbors are included
Sources of the positive effect • Non-synonyms are in fact informative for plausibility translation • Semantically similar verbs: eat – munch – feast • Similar events, similar arguments [Fillmore et al. 2003, Levin 1993] • Semantically related verbs: peel – cook – eat • Schemas/narrative chains: shared participants [Shank & Abelson 1977, Chambers & Jurafsky 2009]
Our hypothesis with qualifications • Using non-synonymous translation pairs is helpful • if transferred knowledge is lexical • Many infrequently observed datapoints • if knowledge is stable across semantically related/similar word pairs • Counterexample: polarity/sentiment judgments • food – feast – grub • Parallel experiment: best results for single nearest neighbor
Summary • Plausibility can be modeled with fairly shallow methods • Seen head words plus generalization in vector space • Precondition: accurately parsed corpus • If unavailable: Transfer from better-endowed language • Translation through automatically induced lexicons • Transfer of knowledge about certain phenomena can benefit from non-synonymous translations • Corresponding to monolingual results from QA [Harabagiu et al. 2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …