
Interpreting noun compounds using paraphrases

András Dobó (University of Oxford), Stephen G. Pulman (University of Oxford)


Presentation Transcript


  1. Interpreting noun compounds using paraphrases • András Dobó, University of Oxford • Stephen G. Pulman, University of Oxford

  2. Interpreting noun compounds using paraphrases • Motivation • Related work • Method • Results • Summary • Future work

  3. Motivation • English is full of noun compounds, which are sequences of nouns acting as a single noun • Their interpretation is crucial for many NLP tasks • Using dictionaries is infeasible → automated methods are needed

  4. Related work • Statistical approaches • Web queries or large corpora • Two main categories of methods • Inventory-based approaches • Small number of abstract relational categories • Criticized for numerous reasons • Paraphrasing approaches • Verbs and prepositions as paraphrases • water bottle = bottle that is for water → be for

  5. Method • Paraphrasing method • Ranked list of paraphrases for each NC • Uses large corpora to search for paraphrases • Second noun is the head → subject = second noun, object = first noun • Validates paraphrases using web queries • Two main approaches to the search for paraphrases

  6. Subject-paraphrase-object-triples • Counts the frequency of all (subject, paraphrase, object) triples in the corpus • Then for each NC it searches for the triples where subject = second noun and object = first noun • List of suitable paraphrases for each NC • Ranks paraphrases for each NC using a scoring method based on their frequency
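
The triple-based search described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `rank_paraphrases_by_triples` and the toy triples are assumptions made for the example, and the input is assumed to be already-extracted (subject, paraphrase, object) triples.

```python
from collections import Counter

def rank_paraphrases_by_triples(triples, noun_compound):
    """Rank paraphrases for a noun compound by raw triple frequency.

    triples: iterable of (subject, paraphrase, object) tuples (assumed input).
    noun_compound: (first_noun, second_noun); the second noun is the head.
    """
    first_noun, second_noun = noun_compound
    counts = Counter(triples)
    # Keep only triples where subject = second noun and object = first noun.
    candidates = {
        paraphrase: freq
        for (subject, paraphrase, obj), freq in counts.items()
        if subject == second_noun and obj == first_noun
    }
    # Highest frequency first; ties broken alphabetically for determinism.
    return sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy data, invented for illustration only.
toy_triples = [
    ("bottle", "be for", "water"),
    ("bottle", "contain", "water"),
    ("bottle", "contain", "water"),
    ("bottle", "contain", "milk"),
    ("glass", "hold", "water"),
]
ranking = rank_paraphrases_by_triples(toy_triples, ("water", "bottle"))
print(ranking)  # [('contain', 2), ('be for', 1)]
```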

  7. Subject-paraphrase-and-paraphrase-object-pairs • Counts the frequency of all (subject, paraphrase) and (paraphrase, object) pairs in the corpus • Then for each NC it searches for the pairs where subject = second noun and object = first noun • Two lists of paraphrases for each NC • Ranks paraphrases for each NC using a scoring method based on their frequency

  8. Scoring methods • Subject-paraphrase-object-triples version: • Simply the frequency of the relevant (subject, paraphrase, object) triple • Subject-paraphrase-and-paraphrase-object-pairs version: • Using frequencies is not suitable • The product of the pointwise mutual information of the relevant (subject, paraphrase) and (paraphrase, object) pairs
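
A possible reading of the pairs-version score is sketched below. The slide only states that the score is the product of the two PMI values; how the probabilities are estimated is not given, so estimating them from the pair counts themselves is an assumption here, as are the names `pmi_table` and `score_paraphrases` and the toy counts.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """PMI(x, y) = log(p(x, y) / (p(x) * p(y))), estimated from pair counts.

    Assumption: marginals are taken from the same pair list.
    """
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    left, right = Counter(), Counter()
    for (x, y), c in pair_counts.items():
        left[x] += c
        right[y] += c
    return {
        (x, y): math.log((c * total) / (left[x] * right[y]))
        for (x, y), c in pair_counts.items()
    }

def score_paraphrases(sp_pairs, po_pairs, noun_compound):
    """Score = PMI(subject, paraphrase) * PMI(paraphrase, object)."""
    first_noun, second_noun = noun_compound
    sp = pmi_table(sp_pairs)
    po = pmi_table(po_pairs)
    scores = {}
    for (subject, paraphrase), value in sp.items():
        # A paraphrase must occur in both lists to receive a score.
        if subject == second_noun and (paraphrase, first_noun) in po:
            scores[paraphrase] = value * po[(paraphrase, first_noun)]
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy counts, invented for illustration only.
sp_pairs = [("bottle", "contain")] * 4 + [("bottle", "hold")] + [("box", "hold")] * 3
po_pairs = [("contain", "water")] * 3 + [("hold", "water")] * 2 + [("break", "sand")] * 3
pmi_ranking = score_paraphrases(sp_pairs, po_pairs, ("water", "bottle"))
print([name for name, _ in pmi_ranking])  # ['contain', 'hold']
```

One caveat of a plain product: two negative PMI values multiply to a positive score, so an implementation along these lines may need extra handling for that case. The slides do not say how this was treated.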

  9. Used corpora and their preprocessing • Search for paraphrases: • British National Corpus • 100 million words • Grammatical relations from parser • Web 1T 5-gram Corpus • Generated from 1 trillion words of web page text • Grammatical relations from POS patterns • Noun verb determiner noun • Validation of paraphrases: • The Web through Google and Yahoo!
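
To make the "grammatical relations from POS patterns" idea concrete, here is a rough sketch that matches the noun-verb-determiner-noun pattern against POS-tagged n-grams and accumulates triple frequencies. It assumes the n-grams have already been tagged with Penn Treebank-style tags; how tagging was actually done for the Web 1T corpus is not stated on the slide. The names `coarse` and `triples_from_ngrams` and the sample n-grams are hypothetical, and no lemmatization is attempted.

```python
from collections import Counter

PATTERN = ("NOUN", "VERB", "DET", "NOUN")

def coarse(tag):
    """Map a Penn Treebank-style tag to a coarse class (assumed tag set)."""
    if tag.startswith("NN"):
        return "NOUN"
    if tag.startswith("VB"):
        return "VERB"
    if tag == "DT":
        return "DET"
    return "OTHER"

def triples_from_ngrams(tagged_ngrams):
    """tagged_ngrams: iterable of (tokens, frequency), tokens = ((word, tag), ...)."""
    counts = Counter()
    for tokens, freq in tagged_ngrams:
        tags = tuple(coarse(tag) for _word, tag in tokens)
        for start in range(len(tokens) - len(PATTERN) + 1):
            if tags[start:start + len(PATTERN)] == PATTERN:
                words = [w.lower() for w, _tag in tokens[start:start + len(PATTERN)]]
                # (subject, verb, object); the determiner is dropped.
                counts[(words[0], words[1], words[3])] += freq
    return counts

# Invented n-grams and frequencies, for illustration only.
sample = [
    ((("bottles", "NNS"), ("contain", "VBP"), ("the", "DT"), ("water", "NN"), (".", ".")), 120),
    ((("the", "DT"), ("bottle", "NN"), ("holds", "VBZ"), ("some", "DT"), ("water", "NN")), 45),
    ((("a", "DT"), ("red", "JJ"), ("bottle", "NN"), ("of", "IN"), ("water", "NN")), 300),
]
extracted = triples_from_ngrams(sample)
```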

  10. Passive paraphrases • Their surface subject is actually their object • (subject, paraphrase)=(paraphrase2, object) • paraphrase: passive, without preposition • paraphrase2: active version of paraphrase • subject = object • Their frequencies are counted together

  11. Passive paraphrases • (subject, paraphrase, object) = (subject2, paraphrase2, object2) • paraphrase: passive, with the preposition "by" • paraphrase2: active version of paraphrase, without preposition • object2 = subject • subject2 = object • Their frequencies are counted together • Such (paraphrase, object) and (subject2, paraphrase2) pairs are treated the same way
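
The merging of passive "by" paraphrases with their active counterparts could look something like the sketch below. The participle-to-base lookup `PARTICIPLE_TO_BASE` is a hypothetical stand-in for whatever morphological analysis was actually used, and the counts are invented.

```python
from collections import Counter

# Hypothetical lookup; a real system would use a lemmatizer or morphological analyzer.
PARTICIPLE_TO_BASE = {"made": "make", "produced": "produce"}

def normalize_triple(subject, paraphrase, obj):
    """Rewrite (subject, 'be V-ed by', object) as (object, 'V', subject)."""
    words = paraphrase.split()
    is_by_passive = (
        len(words) == 3
        and words[0] == "be"
        and words[2] == "by"
        and words[1] in PARTICIPLE_TO_BASE
    )
    if is_by_passive:
        # subject2 = object, object2 = subject, paraphrase2 = active form.
        return (obj, PARTICIPLE_TO_BASE[words[1]], subject)
    return (subject, paraphrase, obj)

def merge_counts(triple_counts):
    """Count passive triples together with their active equivalents."""
    merged = Counter()
    for (subject, paraphrase, obj), freq in triple_counts.items():
        merged[normalize_triple(subject, paraphrase, obj)] += freq
    return merged

# Invented counts, for illustration only.
merged = merge_counts({
    ("wine", "be made by", "factory"): 5,
    ("factory", "make", "wine"): 7,
})
print(merged)  # Counter({('factory', 'make', 'wine'): 12})
```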

  12. Patientive ambitransitive verbs • Three main groups of verbs: strictly transitive, strictly intransitive, ambitransitive • Strictly intransitive verbs have two subclasses: unergative and unaccusative • Ambitransitive verbs have two subclasses too: agentive and patientive • Patientive ambitransitive verbs in intransitive use behave in the same way as passive verbs → they are treated the same way

  13. Using synonyms, hypernyms, sister words etc. • No paraphrases are found for several NCs • Hypothesis: NCs comprising semantically similar words are interpreted the same way • Using semantically similar words in the search for paraphrases • Synonyms, hypernyms, sister words from WordNet • Semantically similar words that are automatically found with a method proposed by Dekang Lin
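
One way to picture the substitution idea: when searching for paraphrases, each noun of the compound is allowed to match either itself or one of its semantically similar words, and the resulting counts are pooled. In this sketch the `similar_words` dictionary is a hypothetical placeholder for WordNet sister words or Lin-style similar words, and pooling raw counts is an assumption, not something the slides specify.

```python
from collections import Counter

def paraphrases_with_substitutes(triple_counts, noun_compound, similar_words):
    """Pool paraphrase counts over the NC and its similar-word variants.

    triple_counts: {(subject, paraphrase, object): frequency}
    similar_words: {noun: [similar nouns]} (hypothetical resource)
    """
    first_noun, second_noun = noun_compound
    objects = {first_noun, *similar_words.get(first_noun, [])}
    subjects = {second_noun, *similar_words.get(second_noun, [])}
    pooled = Counter()
    for (subject, paraphrase, obj), freq in triple_counts.items():
        if subject in subjects and obj in objects:
            pooled[paraphrase] += freq
    return pooled

# Invented counts and similar-word lists, for illustration only.
counts = {
    ("bottle", "contain", "juice"): 4,
    ("flask", "hold", "water"): 2,
    ("bottle", "break", "window"): 9,
}
similar = {"water": ["juice", "milk"], "bottle": ["flask", "jar"]}
pooled = paraphrases_with_substitutes(counts, ("water", "bottle"), similar)
print(pooled)  # Counter({'contain': 4, 'hold': 2})
```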

  14. Validation of paraphrases • Some paraphrases are incorrect → validation is needed • Hypothesis: If a paraphrase is suitable for an NC, then there should exist at least some web pages containing the NC paraphrased by that paraphrase

  15. Validation of paraphrases • Google and Yahoo! queries • Simple queries: “n2Infl THAT p n1Infl” • Extended queries: • Multiple verb tenses • Wildcard characters (up to 9) • Score for each paraphrase is recalculated
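
The query template “n2Infl THAT p n1Infl” might be expanded into concrete phrase queries along the following lines. Several details are assumptions, since the slide does not spell them out: that THAT ranges over "that" and "which", that wildcards are placed between the paraphrase and the first noun, and that the inflected forms are supplied by the caller. The sketch only builds query strings; it does not call any search engine.

```python
from itertools import product

def build_queries(n1_forms, n2_forms, paraphrase_forms,
                  max_wildcards=1, complementizers=("that", "which")):
    """Build quoted phrase queries of the form 'n2 THAT p [* ...] n1'.

    All argument conventions here are assumptions made for illustration.
    """
    queries = []
    for n2, comp, p, n1 in product(n2_forms, complementizers, paraphrase_forms, n1_forms):
        for k in range(max_wildcards + 1):
            # k wildcard tokens between the paraphrase and the first noun.
            words = [n2, comp, p] + ["*"] * k + [n1]
            queries.append('"' + " ".join(words) + '"')
    return queries

queries = build_queries(
    n1_forms=["water"],
    n2_forms=["bottle", "bottles"],
    paraphrase_forms=["contains", "contain"],
    max_wildcards=1,
)
print(queries[0])    # "bottle that contains water"
print(len(queries))  # 16
```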

  16. Testing and evaluation • Tested on the first 50 NCs of SemEval-2 Task #9 • 3 best paraphrases for each NC • 5 native speakers recruited for evaluation • They scored each paraphrase from 1 to 5 • Their agreement was checked using Krippendorff’s alpha, and it was too low → the (noun compound, paraphrase) pairs with the highest disagreement were omitted
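
For readers unfamiliar with the agreement measure, the sketch below computes Krippendorff's alpha as 1 - D_o / D_e. The slides do not say which distance metric was used for the 1 to 5 scores, so the interval metric (squared difference) is an assumption here, and the function name is made up for the example.

```python
from itertools import permutations

def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric (assumed metric).

    units: list of lists; each inner list holds the scores given to one
    (noun compound, paraphrase) pair by the annotators who rated it.
    """
    # Units with fewer than two ratings contain no pairable values.
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    # Observed disagreement: squared differences within each unit.
    observed = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement: squared differences across all pairable values.
    expected = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - observed / expected

print(krippendorff_alpha_interval([[1, 1, 1], [3, 3, 3], [5, 5, 5]]))  # 1.0
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.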

  17. Best version • Subject-paraphrase-object-triples version • Web 1T 5-gram Corpus • Combination of two basic versions: • No substitute words • Sister words • Scores are recalculated in a way that favors paraphrases returned by the first version • Validation: Google, present simple, up to 1 wildcard
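
The slide does not say how the scores are recalculated when the two basic versions are combined, only that the result favors the no-substitute version. Purely as an illustration of one way to do this, the sketch below down-weights the sister-word scores before summing; the weight `sister_weight` and the function name are hypothetical, not taken from the presentation.

```python
def combine_rankings(base_scores, sister_scores, sister_weight=0.5, top_n=3):
    """Combine two paraphrase score tables, favoring the base version.

    base_scores: scores from the version without substitute words.
    sister_scores: scores from the sister-word version.
    sister_weight: hypothetical down-weighting factor (assumed, < 1).
    """
    combined = dict.fromkeys(set(base_scores) | set(sister_scores), 0.0)
    for paraphrase, score in base_scores.items():
        combined[paraphrase] += score
    for paraphrase, score in sister_scores.items():
        combined[paraphrase] += sister_weight * score
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Invented scores, for illustration only.
best_three = combine_rankings(
    {"contain": 4, "hold": 2},
    {"hold": 2, "store": 6, "carry": 1},
)
print(best_three)  # [('contain', 4.0), ('hold', 3.0), ('store', 3.0)]
```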

  18. Results • Mixed performance • Average scores • Promising results given the difficulty of task

  19. Results • Best-scoring NCs • Worst-scoring NCs (tables not included in the transcript)

  20. Future work • Parsing the Web 1T 5-gram Corpus → much lower error rate in obtaining the grammatical relations • Extended validation part • Employing synonyms, hypernyms, sister words or semantically similar words • Combining the different extensions

  21. Summary • Interpreting noun compounds is crucial for many NLP tasks • We presented a method for noun compound interpretation that searches for paraphrases in large corpora and issues web queries to validate the results • The results are promising, and could be further improved

  22. Acknowledgements • Attendance at this workshop was partly supported by the Hungarian National Office for Research and Technology within the framework of the R&D project MASZEKER (Modell-Alapú Szemantikus Kereső Rendszer, Model Based Semantic Search System).

  23. Thank you!
