Data-driven Paraphrasing
Paraphrases
• A pair of word sequences, both in the same language, that have the same meaning in at least some contexts
• I spilled the beans and told Jacky I loved her / I exposed my secret and told Jacky I loved her
• Beijing’s policy toward Taiwan / China’s policy toward Taiwan
Paraphrase levels
• Word level (synonym): house / home
• Phrase (sub-sentential): I spilled the beans / I exposed my secret
• Sentence: Beijing’s policy toward Taiwan remains unchanged / China did not change its policy toward Taiwan
Paraphrase types
• Structural paraphrases (She ate the apple vs. The apple was eaten by her)
• Lexical paraphrases (My horse galloped away vs. My mount galloped away)
• Phrasal paraphrases (I don’t have enough money to buy this yacht vs. I can’t afford this yacht)
• Idiomatic paraphrases (I spilled the beans vs. I exposed the secret)
• Referential paraphrases (Tuesday vs. The day before Wednesday)
Textual entailment
• Can the meaning of a target textual assertion (hypothesis, H) be inferred from a given text (T)?
• Paraphrase is a special case of textual entailment, where each sequence entails the other
• Example (borrowed from Mirkin, MTML 2011), T ⇒ H:
T: Fire bombs were thrown at the Tunisian embassy in Bern
H: The Tunisian embassy in Switzerland was attacked
Applications
• Machine translation
• Question answering
• Information retrieval
• …
Paraphrasing techniques
• Two main dimensions: corpus type and paraphrase level
Corpora types
• Monolingual corpus
• Monolingual parallel corpus
• Monolingual corpus of comparable documents
• Bilingual parallel corpus
• Bilingual corpus of comparable documents
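Corpus-based translation
Bi-lingual texts (the same passage in Arabic and in Hebrew):
Arabic: أنا هنا اليوم لأشارككم رحلة غير عادية -- رحلة غير عادية مجزية، في الواقع -- التي جعلتني ادرب الجرذان لإنقاذ حياة الناس عن طريق الكشف عن الألغام الأرضية والسل. عندما كان طفلا، كنت مولع بشيئين. كان أحدهما القوارض. كان عندي جميع أنواع القوارض، الفئران ، الهامستر، الجرابيع، السناجب. سمها ما شئت، اربيها ، وأبيعها لمحلات بيع الحيوانات الأليفة. (ضحك) كما كان لي شغف بأفريقيا. نشأت في بيئة متعددة الثقافات، كان لدينا طلبة أفارقة في المنزل ، وتعلمت قصصهم، [مثل] خلفيات مختلفة، الاعتماد على الدراية المستوردة، السلع والخدمات، التنوع الثقافي الغزير. كانت رائعة حقا أفريقيا بالنسبة لي…
Hebrew: אני כאן בכדי לחלוק איתכם במסע מדהים -- במסע מדהים ומתגמל, למען האמת אשר הוביל אותי לאמן חולדות להצלת חיים באמצעות גילוי של מוקשים וגילוי של שחפת. כילד, היו שני נושאים שהלהיבו אותי אחד היה אהבה למכרסמים היו לי סוגים שונים של חולדות עכברים, אוגרים גרבילים, סנאים תנקבו בשם של מכרסם, אני גידלתי אותו, ומכרתי אותם לחנויות חיות מחמד. (צחוק) הייתה לי גם משיכה לגבי אפריקה גדלנו בסביבה רב תרבותית, והיו לנו סטודנטים אפריקאים בבית, ואני למדתי מהסיפורים שלהם [כגון] הרקעים השונים שמהם באו, תלות בידע מיובא, טובין, שירותים, רב-תרבותית חיונית. אפריקה באמת ריתקה אותי…
English gloss: I am here today to share with you an extraordinary journey -- an extraordinarily rewarding journey, in fact -- that led me to train rats to save people's lives by detecting landmines and tuberculosis. As a child, I was passionate about two things. One was rodents. I had all kinds of rodents: mice, hamsters, gerbils, squirrels. You name it, I bred it and sold it to pet shops. (Laughter) I also had a passion for Africa. I grew up in a multicultural environment; we had African students in the house, and I learned their stories, [such as] different backgrounds, dependence on imported know-how, goods and services, abundant cultural diversity. Africa was truly fascinating to me…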
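Bannard & Callison-Burch (2005)
• English: what is more, the relevant cost dynamic is completely under control
• German: im übrigen ist die diesbezügliche kostenentwicklung völlig unter kontrolle
• English: we owe it to the taxpayers to keep the costs in check
• German: wir sind es den steuerzahlern schuldig die kosten unter kontrolle zu haben
• Both “under control” and “in check” align with the German phrase “unter kontrolle”, so they are extracted as paraphrases of each other
[Figure labels: English, Spanish, French]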
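Bannard & Callison-Burch (2005) - Pivoting
• Paraphrase score for a phrase pair $(e_1, e_2)$, pivoting over foreign phrases $f$, is given by: $p(e_2 \mid e_1) = \sum_{f} p(f \mid e_1)\, p(e_2 \mid f)$
• With $p(f \mid e_1)$ and $p(e_2 \mid f)$ calculated by maximum likelihood estimation, e.g.: $p(e \mid f) = \frac{\mathrm{count}(e, f)}{\sum_{e'} \mathrm{count}(e', f)}$
• Results: ~70% correct (over 289 tested phrases)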
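The pivoting score can be illustrated with a small sketch. It assumes the score for a candidate e2 sums, over every foreign phrase f aligned with e1, the product p(f|e1) · p(e2|f), with both probabilities estimated as relative frequencies of phrase-pair counts. The function name `pivot_paraphrases`, the toy counts, and the alignment with “im griff” are made up for illustration and are not taken from the paper's data.

```python
from collections import defaultdict


def pivot_paraphrases(counts, e1):
    """Score paraphrases of e1: p(e2|e1) = sum over f of p(f|e1) * p(e2|f).

    `counts` maps (english_phrase, foreign_phrase) pairs to alignment counts.
    """
    e_totals = defaultdict(float)  # total count of each English phrase
    f_totals = defaultdict(float)  # total count of each foreign phrase
    for (e, f), c in counts.items():
        e_totals[e] += c
        f_totals[f] += c

    scores = defaultdict(float)
    for (e, f), c in counts.items():
        if e != e1:
            continue
        p_f_given_e1 = c / e_totals[e1]  # relative-frequency estimate
        for (e2, f2), c2 in counts.items():
            if f2 == f and e2 != e1:
                scores[e2] += p_f_given_e1 * (c2 / f_totals[f])
    return dict(scores)


# Hypothetical phrase-pair counts (not real corpus statistics).
counts = {
    ("under control", "unter kontrolle"): 3,
    ("in check", "unter kontrolle"): 1,
    ("under control", "im griff"): 1,
}
scores = pivot_paraphrases(counts, "under control")
print(scores)
```

The nested loop over `counts` is quadratic and only suitable for toy data; a real phrase table would be indexed by foreign phrase first.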
Bannard & Callison-Burch (2005) - Pivoting
• Performance depends on the pivoting language
• For example: English => Arabic (Madnani and Dorr, 2010) generates many paraphrases of different inflected forms (e.g., caused clouds vs. causing clouds)
Modified Arabic paraphrase definition (our work)
• We include among paraphrases pairs of phrases that express the same meaning, regardless of their inflection for number, gender, and person
• We show that they improve Arabic => English machine translation
• Bar and Dershowitz in CICLING 2014
Paraphrase patterns
• Semantically equivalent patterns; a pattern generally contains two parts: words and slots
• X solves Y
• Y is solved by X
• X finds a solution to Y
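To make the words-and-slots idea concrete, here is a minimal sketch that matches a sentence against one pattern and re-renders it with another. It assumes slots are the literal tokens X and Y and that plain string matching is enough; the helper names `pattern_to_regex` and `rewrite` are hypothetical, and real pattern extraction works over parsed text rather than raw strings.

```python
import re

SLOTS = ("X", "Y")


def pattern_to_regex(pattern):
    """Compile a pattern such as 'X solves Y' into a regex with named groups."""
    parts = []
    for token in pattern.split():
        if token in SLOTS:
            parts.append(f"(?P<{token}>.+?)")  # a slot matches any word span
        else:
            parts.append(re.escape(token))     # a word must match literally
    return re.compile("^" + r"\s+".join(parts) + "$")


def rewrite(sentence, source, target):
    """Fill the slots of `target` with the spans that `source` matched."""
    match = pattern_to_regex(source).match(sentence)
    if match is None:
        return None
    slots = match.groupdict()
    return " ".join(slots.get(token, token) for token in target.split())


print(rewrite("the committee solves the problem", "X solves Y", "Y is solved by X"))
print(rewrite("the committee solves the problem", "X solves Y", "X finds a solution to Y"))
```

The sketch ignores capitalization and agreement, which a real generator would need to handle.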
Zhao et al. (2008) – Pivoting for paraphrase patterns
• Use pivoting on dependency-parsed corpora
• Extract patterns by treating complete paths as variables
[Figure label: NN]
Zhao et al. (2008) – Pivoting for paraphrase patterns
• Align with the pivot target-language pattern (in their original work: Chinese)
Zhao et al. (2008) – Pivoting for paraphrase patterns
• 5 types of extracted patterns
Marton et al. (2009) – distributional similarity
• Large monolingual corpus
• Using cosine similarity of the distributional profile (DP) of the candidate phrases
[Figure labels: English, Spanish, Chinese]
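A rough sketch of comparing distributional profiles follows. As a simplifying assumption, a profile here is a bag of raw counts of the words within a fixed window around each occurrence of the phrase; the weighting scheme actually used by Marton et al. is not reproduced. The function names, the window size, and the three-sentence corpus are all illustrative.

```python
import math
from collections import Counter


def distributional_profile(phrase, sentences, window=2):
    """Count words within `window` tokens of each occurrence of `phrase`."""
    target = phrase.split()
    n = len(target)
    profile = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                profile.update(left + right)
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


# Toy corpus, made up for illustration.
corpus = [
    "my horse galloped away quickly",
    "my mount galloped away quickly",
    "the house was sold",
]
horse = distributional_profile("horse", corpus)
mount = distributional_profile("mount", corpus)
house = distributional_profile("house", corpus)
print(cosine(horse, mount), cosine(horse, house))
```

In this toy corpus "horse" and "mount" share every context word while "horse" and "house" share none, so the first pair scores far higher as paraphrase candidates.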