IJCNLP2005-paraphrasing

IJCNLP2005-paraphrasing. Weigang LI 2005-10-21. Jeju, The Republic of Korea. Outline. Paraphrasing in Main Conference Paraphrasing in Workshop Harvest in IJCNLP Pities in IJCNLP Conclusions. Paraphrasing in Main Conference.


Presentation Transcript


  1. IJCNLP2005-paraphrasing Weigang LI 2005-10-21

  2. Jeju, The Republic of Korea

  3. Outline • Paraphrasing in Main Conference • Paraphrasing in Workshop • Harvest in IJCNLP • Pities in IJCNLP • Conclusions

  4. Paraphrasing in Main Conference • Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web • Web-Based Unsupervised Learning for Query Formulation in Question Answering • Exploiting Lexical Conceptual Structure for Paraphrase Generation

  5. Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web

  6. Basic Information • Authors: Marius Pasca and Peter Dienes • Affiliation: Google Inc. • Main Idea • If two sentence fragments have common word sequences at both extremities, then the variable word sequences in the middle are potential paraphrases of each other • A significant advantage of this extraction mechanism is that it can acquire paraphrases from sentences whose information content overlaps only partially, as long as the fragments align
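The main idea on this slide can be illustrated with a small sketch. It is not the authors' implementation: the function names, the anchor length of two words, the cap on the middle length, and the example sentences are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def ngrams(tokens, n):
    """Yield (start_index, ngram_tuple) for every n-gram in tokens."""
    for i in range(len(tokens) - n + 1):
        yield i, tuple(tokens[i:i + n])


def extract_paraphrase_pairs(sentences, anchor_len=2, max_mid=4):
    """Pair up middle word sequences that share identical anchors on both sides.

    anchor_len and max_mid are illustrative parameters, not values from the paper.
    """
    # Map (left_anchor, right_anchor) -> set of middle sequences seen between them.
    by_anchors = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, left in ngrams(tokens, anchor_len):
            mid_start = i + anchor_len
            for mid_len in range(1, max_mid + 1):
                right_start = mid_start + mid_len
                right = tuple(tokens[right_start:right_start + anchor_len])
                if len(right) < anchor_len:
                    break  # ran off the end of the sentence
                middle = tuple(tokens[mid_start:right_start])
                by_anchors[(left, right)].add(middle)

    # Any two different middles found between the same anchors are candidates.
    pairs = set()
    for middles in by_anchors.values():
        for a, b in combinations(sorted(middles), 2):
            pairs.add((" ".join(a), " ".join(b)))
    return pairs


# Invented example sentences that overlap only partially.
sentences = [
    "the city was flooded by heavy rain last week",
    "the city was inundated by heavy rain on monday",
]
candidates = extract_paraphrase_pairs(sentences)
print(("flooded", "inundated") in candidates)
```

The two sentences end differently, yet the fragments between "city was" and "by heavy" still align, which is the partial-overlap advantage the slide mentions.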

  7. An example

  8. Pre-processing • Filter out HTML tags; POS tagging • Number of words n: 5 < n < 30 • At least one verb • At least one noun that starts in lowercase • Every word is shorter than 30 characters • Fewer than half of the words are numbers • Result: more than one billion sentences
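The sentence filters listed on this slide could be checked roughly as follows. This is a sketch under assumptions: the input is taken to be already POS-tagged, the Penn Treebank style tags (`VB*`, `NN*`, `CD`) are a choice made here, and `keep_sentence` is a hypothetical name.

```python
def keep_sentence(tagged):
    """Return True if a POS-tagged sentence passes the filters from the slide.

    `tagged` is a list of (word, tag) pairs. Penn Treebank style tags are an
    assumption of this sketch; the slide does not name a tagset.
    """
    words = [w for w, _ in tagged]
    n = len(words)
    if not 5 < n < 30:                      # sentence length in words
        return False
    if not any(tag.startswith("VB") for _, tag in tagged):
        return False                        # needs at least one verb
    if not any(tag.startswith("NN") and w[:1].islower() for w, tag in tagged):
        return False                        # needs a noun starting in lowercase
    if any(len(w) >= 30 for w in words):
        return False                        # every word shorter than 30 characters
    numbers = sum(1 for _, tag in tagged if tag == "CD")
    return numbers < n / 2                  # fewer than half of the words are numbers


good = [("The", "DT"), ("committee", "NN"), ("approved", "VBD"),
        ("the", "DT"), ("new", "JJ"), ("budget", "NN")]
mostly_numbers = [("the", "DT"), ("scores", "NNS"), ("were", "VBD"),
                  ("3", "CD"), ("4", "CD"), ("5", "CD"), ("6", "CD"), ("7", "CD")]
print(keep_sentence(good), keep_sentence(mostly_numbers))
```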

  9. Algorithm

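  10. Problem with this method • An example • “decided to read the government report published last month” • “decided to read the edition published last month” • The differing middles (“government report” and “edition”) would be extracted although they are not paraphrases • How to avoid this problem?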

  11. Alignment Anchors • Ngram-Only • Ngram-Entity • Preceding and following named entities (here, only the nouns are used) • Ngram-Relative • Several lexico-syntactic patterns

  12. Results

  13. Web-Based Unsupervised Learning for Query Formulation in Question Answering

  14. Basic Information • Authors: Yi-Chia Wang, Jian-Cheng Wu, Tyne Liang, and Jason S. Chang • Affiliation: National Chiao Tung University, National Tsing Hua University • Query Formulation

  15. Main Idea • Training data: questions are classified into a set of fine-grained categories of question patterns • Using a word alignment technique: the relationships between the question patterns and n-grams in answer passages are discovered • Finally, the best query transforms are derived by ranking the n-grams which are associated with a specific question pattern

  16. Transforming Question to Query • Search the Web for Relevant Answer Passages • Question Pattern Extraction • Some rules are written manually • Learning Best Transforms • Word Alignment Across Q and AP • Apply SMT word-alignment techniques to qi and ai (bigrams) • Select the top k bigrams, t1, t2, ..., tk, for every question pattern or keyword q • Distance Constraint and Proximity Ranks (between bigrams and answer) • Combining Alignment and Proximity Ranks
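The last step on this slide, combining the alignment rank and the proximity rank of each candidate bigram, might look like the sketch below. The slide does not say how the two ranks are combined; averaging them is an assumption made here, and the scores and function names are invented for illustration.

```python
def rank(scores, reverse):
    """Map each item to its 1-based rank; ties are broken by item name."""
    ordered = sorted(scores, key=lambda k: (-scores[k] if reverse else scores[k], k))
    return {item: i + 1 for i, item in enumerate(ordered)}


def top_transforms(align_prob, avg_distance, k=2):
    """Return the k bigrams with the best combined rank for one question pattern."""
    # Higher alignment probability is better; smaller distance to the answer is better.
    align_rank = rank(align_prob, reverse=True)
    prox_rank = rank(avg_distance, reverse=False)
    # Assumption: the two ranks are combined by simple averaging.
    combined = {b: (align_rank[b] + prox_rank[b]) / 2 for b in align_prob}
    return sorted(combined, key=lambda b: (combined[b], b))[:k]


# Invented scores for a hypothetical "When was X born" question pattern.
align_prob = {"was born": 0.30, "born in": 0.25, "the year": 0.10}
avg_distance = {"was born": 2.0, "born in": 1.0, "the year": 6.0}
best = top_transforms(align_prob, avg_distance, k=2)
print(best)
```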

  17. Runtime Transformation of Questions • Pre-processing • Classify the question according to the rules • Select the top bigrams according to the training results (joined with OR) • Query conjunction
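A minimal sketch of the final query assembly follows, assuming the question keywords are conjoined and the learned bigrams are offered as alternatives. The `AND`/`OR` syntax, the quoting, and the name `build_query` are assumptions; the slide does not specify the search engine's query language.

```python
def build_query(keywords, bigrams):
    """Join keywords with AND and the learned bigrams with OR (assumed syntax)."""
    alternatives = " OR ".join(f'"{b}"' for b in bigrams)
    parts = list(keywords)
    if alternatives:
        parts.append(f"({alternatives})")
    return " AND ".join(parts)


# Hypothetical question: "When was Mozart born?"
query = build_query(["Mozart"], ["was born", "born in"])
print(query)
```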

  18. Experiments • Training corpus • 3806 Q-A pairs • 338 question patterns, 95,926 answer passages • 45 questions as test corpus

  19. Result

  20. Outline • Paraphrasing in Main Conference • Paraphrasing in Workshop • Harvest in IJCNLP • Pities in IJCNLP • Conclusions

  21. Basic information about workshop • Total: 12 published papers • 3 papers from USA (2 of them from MSR, 1 from New York University) • 5 papers from Japan (3 of them from ATR, 1 from Nagaoka U., 1 from Kyoto U.) • 2 papers from UK (The Open University) • 1 paper from Australia (Macquarie U.) • 1 paper from China (HIT)

  22. 3 sessions • Phrase-level • Automatic paraphrase discovery based on context and keywords between NE pairs • Sentence-level • Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation • Discourse-level • Support vector machines for paraphrase identification and corpus construction

  23. Automatic paraphrase discovery based on context and keywords between NE pairs • Author: Satoshi Sekine • Affiliation: New York University • Task: Extract the phrases between two NEs as paraphrases

  24. Overview • NE tagger (140 NE categories, rule-based system) • Gather instances with NEs • For each C1-C2 domain, find topic keywords using TF/ITF; instances with the same keyword are clustered together • Phrases linking individual NEs are treated as paraphrases within the same domain

  25. Experiments • 0.63 million instances with NE pairs • Total: 2,000 NE category pairs, 5184 keywords • 13,976 phrases with keywords

  26. Results

  27. Limitations • Only one keyword is used • No structural information is used • Fewer than 5 chunks are allowed between two NEs, so long-distance relations cannot be handled

  28. Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation • Authors: Yves Lepage and Etienne Denoual • Affiliation: ATR - Spoken language communication research labs • Task: Produce reference sentences for use in machine translation evaluation

  29. Algorithm • Detection: find sentences which share the same translation in the multilingual resource • Generation: produce new sentences by exploiting commutations; limit combinatorics by contiguity constraints

  30. Results • The lower the scores, the better the lexical and syntactic variation
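The detection step can be sketched as a simple grouping of sentences by their translation. This covers only detection, not the commutation-based generation step, and the sentence pairs and function name below are invented for illustration.

```python
from collections import defaultdict


def detect_paraphrase_sets(aligned_pairs):
    """Group source sentences that share exactly the same translation."""
    by_translation = defaultdict(set)
    for source, translation in aligned_pairs:
        by_translation[translation].add(source)
    # Only translations reached from two or more distinct sentences yield paraphrases.
    return [sorted(group) for group in by_translation.values() if len(group) > 1]


# Invented English-French pairs standing in for the multilingual resource.
aligned_pairs = [
    ("I'd like a coffee.", "Un café, s'il vous plaît."),
    ("A coffee, please.", "Un café, s'il vous plaît."),
    ("Where is the station?", "Où est la gare ?"),
]
paraphrase_sets = detect_paraphrase_sets(aligned_pairs)
print(paraphrase_sets)
```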

  31. Support vector machines for paraphrase identification and corpus construction • Authors: Chris Brockett and William B. Dolan • Affiliation: Natural Language Processing Group, Microsoft Research • Task: Paraphrase Identification and Corpus Construction

  32. Background • Paraphrasing  SMT • Constructing large-scale paraphrase corpora is a very hard task! • Annotated datasets • Using SVM to induce larger monolingual paraphrase corpora

  33. Datasets • Randomly select 10,000 sentence pairs • Hand-tagged (1 or 0) • 2968 positive and 7032 negative examples

  34. Features • Total: 264,543 features • After filtering: fewer than 1,000 features • String Similarity Features • Morphological Variants • WordNet Lexical Mappings • Word Association Pairs • Composite Features
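To make the "String Similarity Features" family concrete, here is a small sketch that turns a sentence pair into a few numeric features of the kind a classifier such as an SVM could consume. The slide names only the feature families; the three features below, their names, and the example pair are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute, or match at no cost
        prev = curr
    return prev[-1]


def string_features(s1, s2):
    """Return illustrative string-similarity features for one sentence pair."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    shared = set(t1) & set(t2)
    union = set(t1) | set(t2)
    return {
        "length_diff": abs(len(t1) - len(t2)),
        "word_edit_distance": edit_distance(t1, t2),
        "token_overlap": len(shared) / len(union) if union else 0.0,
    }


features = string_features("the storm hit the coast", "the storm struck the coast")
print(features)
```

In a full pipeline, vectors like this would be built for each hand-tagged pair and passed to the classifier together with the other feature families.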

  35. Results

  36. Outline • Paraphrasing in Main Conference • Paraphrasing in Workshop • Harvest in IJCNLP • Pities in IJCNLP • Conclusions

  37. Outline • Paraphrasing in Main Conference • Paraphrasing in Workshop • Thought on Future Work of Paraphrasing • Harvest in IJCNLP • Pities in IJCNLP • Conclusions

  38. The Biggest Harvest • Got to know many fellow attendees • Old • Young • Men • Women (few)

  39. Other harvest • Beautiful scenery • Beautiful food • Beautiful show • Beautiful people

  40. Outline • Paraphrasing in Main Conference • Paraphrasing in Workshop • Harvest in IJCNLP • Pities in IJCNLP • Conclusions

  41. Pities in IJCNLP • Poor English listening comprehension hinders communication • Koreans' English is even poorer • So many mosquitoes • Few beautiful girls

  42. Outline • Paraphrasing in Main Conference • Paraphrasing in Workshop • Harvest in IJCNLP • Pities in IJCNLP • Conclusions

  43. Conclusions • Get to know many people • Broaden one’s views • Build one’s self-confidence • Grasp the newest research directions • Enjoy taking part in an international conference!

  44. Thanks!
