1 / 25

Mono- and bilingual modeling of selectional preferences

Mono- and bilingual modeling of selectional preferences. Sebastian Padó Institute for Computational Linguistics Heidelberg University (joint work with Katrin Erk , Ulrike Pado, Yves Peirsman ). Some context.

melosa
Download Presentation

Mono- and bilingual modeling of selectional preferences

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Mono- and bilingual modeling of selectional preferences Sebastian Padó Institute for Computational Linguistics Heidelberg University (joint work with KatrinErk, Ulrike Pado, Yves Peirsman)

  2. Some context • Computational lexical semantics: modeling the meaning of words and phrases • Distributional approach • Observe the usage of words in corpora • Robustness: Broad coverage, manageable complexity • Flexibility: Corpus choice determines model Knowledge Corpus

  3. Structure Application: Predictions of plausibility judgments Methods: Distributional semantics Phenomena: Semanticrelations in bilingual dictionaries

  4. Plausibility of Verb-Relation-Argument-Triples • Central aspect of language • Selectional preferences [Katz & Fodor 1963, Wilks 1975] • Generalization of lexical similarity • Incremental language processing [McRae & Matsuki 2009] • Disambiguation [Toutanova et al. 2005], Applicability of inference rules [Pantel et al. 2007], SRL [Gildea & Jurafsky 2002]

  5. Modelling Plausibility • Approximating plausibility by frequency • Two lexical variables: Frequency of most triples is zero • Implausibility or sparse data? • Generalization based on an ontology (WordNet) [Resnik 1996] • Generalization based on vector space [Erk, Padó, und Padó 2010] (eat, obj, apple) 100 (eat, obj, hat) 1 (eat, obj, telephone) 0 (eat, obj, caviar) 0 English corpus (eat, obj, apple): highly plausible (eat, obj, hat): somewhat plausible (eat, obj, telephone): ? (eat, obj, caviar): ?

  6. Semantic Spaces cultiver mandarine Fr clémentine voiture rouler • Characterization of word meaning though profile over occurrence contexts [Salton, Wang, and Yang 1974, Landauer & Dumais 1997, Schütze 1998] • Geometrically: Vector in high-dimensional space • High vector similarity implies high semantic similarity • Next neighbors = synonyms

  7. Similarity-based generalization[Pado, Pado & Erk 2010] • Plausibility is average vector space similarity to seen arguments • (v, r, a): verb – relation – argument head word triple • seenargs: set of argument head words seen in the corpus • wt: weight function • Z: normalization constant • sim: semantic (vector space) similarity

  8. Geometrical interpretation apple telephone Seen objects of “eat” orange caviar breakfast Peter Seen subjects of “eat” husband child

  9. Evaluation • Triples with human plausibility ratings [McRae et al. 1996] • Evaluation: Correlation of model predictions with human judgments • Spearman’s ρ = 1: perfect correlation; ρ = 0: no correlation • Result: Vector space model attains almost quality of “deep” model at 98% coverage

  10. From one to many languages… • Vector space model reduces the need for language resources to predict plausibility judgments • No ontologies • Still necessary: Observations of triples, target words • Large, accurately parsed corpus • Problematic for basically all languages except English • Can we extend our strategy to new languages?

  11. Predicting plausibility for new languages Englishcorpus • Transfer with a bilingual lexicon [Koehn and Knight 2002] • Cross-lingual knowledge transfer • Print dictionaries are problematic • Instead: acquire from distributional data Englishmodel • cultiver – grow pomme– apple (cultiver, Obj, pomme) (grow, obj, apple): highly plausible

  12. Bilingual semantic space cultiver/grow mandarine E mandarin car Fr rouler/drive • Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997] • Dimensions are bilingual word pairs, can be bootstrapped • Frequencies observable from comparable corpora • Nearest neighbors: Cross-lingual synonyms ⟷Translations

  13. Nearest neighbors in bilingual space cultiver/grow pear E pomme car Fr rouler/drive • Similar usages / context profiles do not necessarily indicate synonymy • Bilingual case:Peirsman& Pado (2011) • Lexicon extraction for EN/DE and EN/NL

  14. Evaluation against Gold Standard • Evaluation of nearest cross-lingual neighbors against a translators’ dictionary

  15. Analysis of 200 noun pairs (EN-DE)

  16. Similarity by relation

  17. How to proceed? • Classical reaction: Focus on cross-lingual synonyms • Aggressive filtering of nearest-neighbor lists • Risk: Sparse data issues • Our hypothesis (prelimimary version): • Non-synonymous pairs still provide information about bilingual similarity • Should be exploited for cross-lingual knowledge transfer • Experimental validation: Vary number of synonyms, observe effect on cross-lingual knowledge transfer

  18. Varying the number of neighbors • Nearest neighbors: 50% of synonyms • Further neighbors: quick decline to 10% of synonyms

  19. Experimental setup English corpus • rouler – drive bagnole – jalopy, banger, car English model (bagnole, subj, rouler) Consider plausibilitiesfür: (jalopy, subj, drive) (banger, subj, drive) (car, subj, drive)

  20. Details • Model: • English model: trained on BNC as before • Bilingual lexicon extracted from BNC und StuttgarterNachrichtenkorpus HGC as comparable corpora • Prediction based on nnearest English neighboursforGerman argument • Evaluation: • 90 German (v,r,a) triples with human plausibility ratings [Brockmann & Lapata 2003]

  21. Results – EN-DE • Result: Transfer model significantly better than monolingual model, but only if non-synonymous neighbors are included

  22. Results: Details

  23. Sources of the positive effect • Non-synonyms are in fact informative for plausibility translation • Semantically similar verbs: eat – munch – feast • Similar events, similar arguments [Fillmore et al. 2003, Levin 1993] • Semantically related verbs: peel – cook – eat • Schemas/narrative chains: shared participants [Shank & Abelson 1977, Chambers & Jurafsky 2009]

  24. Our hypothesis with qualifications • Using non-synonymous translation pairs is helpful • if transferred knowledge is lexical • Many infrequently observed datapoints • if knowledge is stable across semantically related/similar word pairs • Counterexample: polarity/sentiment judgments • food – feast – grub • Parallel experiment: best results for single nearest neighbor

  25. Summary • Plausibility can be modeled with fairly shallow methods • Seen head words plus generalization in vector space • Precondition: accurately parsed corpus • If unavailable: Transfer from better-endowed language • Translation through automatically induced lexicons • Transfer of knowledge about certain phenomena can benefit from non-synonymous translations • Corresponding to monolingual results from QA [Harabagiu et al. 2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …

More Related