
Discriminative Dialog Analysis Using a Massive Collection of BBS comments



Presentation Transcript


  1. Discriminative Dialog Analysis Using a Massive Collection of BBS Comments
  [Title-slide diagram: Bulletin Board Systems. "Why BBS?" / "What sort of text?", positioning BBS text against News/Wired and Wikipedia]
  Eiji ARAMAKI (University of Tokyo), Takeshi ABEKAWA (University of Tokyo), Yohei MURAKAMI (NICT), Akiyo NADAMOTO (NICT), Japan

  2. [Example BBS thread, shown with reply arrows between comments]
  • What is the lightest or smallest MP3 player? Is the iPod Shuffle the best choice?
  • Please tell me why my nano sometimes stops even though the battery still remains. (not a reply to the above)
  • How about the iriver N12? Extremely light and small. (reply)
  • It is because the battery display is approximate: even when the battery runs out, the display sometimes shows charge is still left. (reply)
  • The iriver N series has stopped producing.

  3. [Same thread] From the thread a reader can piece together that the "N12" is a "small and light" "MP3 player", but now "has stopped producing", even though these facts are spread across distant comments.
  BUT: NLP suffers from gaps between corresponding comments; a reply is often not adjacent to the comment it answers.

  4. How Often Do Such Gaps Occur? (Gap length, i.e. distance, vs. frequency)
  • No gap (distance = 1) accounts for only 50% of replies; usually the distance is 2~5.
  • Gaps are a widespread phenomenon.
  【QUESTION】 Despite the gaps, how does a human capture REPLY-TO relations?

  5. Linguistics has already given several answers
  • One answer is Relevance Theory [Sperber1986]: human communication is based on relevance.
  • Not enough! How do we calculate relevance? (the linguist's answer leaves the computer scientist's question open)
  【This study's GOAL】 To formalize relevance.

  6. Outline • Background • Method • Task setting / Our Approach • How to formalize two types of relevance • Experiment • Related Works • Conclusion

  7. Task-setting
  • Natural task-setting: to which earlier comment (i-3, i-2, i-1, …) does the i-th comment reply? → a complex task.
  • Our task-setting (INSTEAD: a discriminative task):
    • Input: two comments from the same BBS (P & Q)
    • Output: True (= Q is a reply to P) / False
    • → suitable for machine learning (such as SVM)

  8. Our Approach/Assumption
  • Two types of relevance are available:
  (1) Contents relevance: roughly speaking, sentence similarity.
      "What is the lightest or smallest MP3 player?" ↔ "How about the iriver N12? Extremely light and small."
  (2) Discourse relevance: the discourse function of the comments.
      "Please tell me why my nano sometimes stops …" (WHY-QUESTION) ↔ "It is because the battery display is approximate …" (REASON)

  9. Outline • Background • Method • Task setting / Our Approach • How to formalize two types of relevance • (1) Contents Relevance • (2) Discourse Relevance • Experiment • Related Works • Conclusion

  10. Two Contents-Relevance Measures
  • (1) Word overlap ratio. In the example below, 4 of 12 words are shared, giving 4/12 = 0.33:
      "What is the most light or small mp3 player?" / "How about iriver N12? extremely light and small."
    ! A simple word overlap ratio cannot capture that "iriver N12" is an "mp3 player".
  • (2) WebPMI-based sentence similarity. WebPMI [Bollegala2007] is defined on the next slide.
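
The overlap measure above can be sketched in a few lines of Python. This is one plausible reading (overlap of word types, normalized by the union); the slide does not spell out the exact normalization behind its 4/12 figure.

```python
def word_overlap_ratio(p, q):
    """Word overlap ratio of two comments: shared word types divided by
    the union of word types (an assumed normalization; the slide's exact
    denominator is not specified)."""
    p_words, q_words = set(p.lower().split()), set(q.lower().split())
    union = p_words | q_words
    return len(p_words & q_words) / len(union) if union else 0.0
```

Note the slide's caveat: the measure returns 0 for "mp3 player" vs. "iriver N12" because the two share no surface words, which is exactly what motivates WebPMI.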

  11. Web-PMI: mutual information of two words in Web pages

      WebPMI(p, q) = log( (H(p∩q)/N) / ( (H(p)/N) · (H(q)/N) ) )

  where H(p∩q) is the number of web pages containing both "N12" and "MP3", H(p) the number containing "N12", H(q) the number containing "MP3", and N the total number of pages.
  Content relevance: for each word in P, find the word in Q with the highest WebPMI, and sum up those values.
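
A minimal sketch of the WebPMI formula and the summed content-relevance score. The hit counts would come from a search-engine index in the original setting; here they are plain function arguments, and the `pmi` lookup table in `content_relevance` is a hypothetical precomputed cache.

```python
import math

def web_pmi(hits_p, hits_q, hits_pq, n):
    """WebPMI(p,q) = log( (H(p∩q)/N) / ((H(p)/N)*(H(q)/N)) ), where the
    H(.) values are web page hit counts and N is the total page count.
    Backing off to 0.0 on zero counts is an assumed convention."""
    if hits_p == 0 or hits_q == 0 or hits_pq == 0:
        return 0.0
    return math.log((hits_pq / n) / ((hits_p / n) * (hits_q / n)))

def content_relevance(p_words, q_words, pmi):
    """For each word in P, take the Q word with the highest WebPMI
    (looked up in a hypothetical precomputed dict) and sum the values."""
    total = 0.0
    for w in p_words:
        total += max((pmi.get((w, v), 0.0) for v in q_words), default=0.0)
    return total
```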

  12. Outline • Background • Method • Task setting • How to formalize two types of relevance • (1) Contents Relevance • (2) Discourse Relevance • Experiment • Related Works • Conclusion

  13. Discourse Relevance (CPMI: Corresponding PMI, newly proposed)
  • ALSO a PMI-based measure, BUT it counts co-occurring phrases in P and Q, e.g. "please tell me why my nano sometimes stops …" / "It is because the battery display is approximate …":

      CPMI(p, q) = log( (H(p∩q)/N) / ( (H(p)/N) · (H(q)/N) ) )

  where H(p∩q) is the number of P-Q pairs containing "please tell me why" in P and "It is because" in Q, H(p) the number of P comments containing "please tell me why", H(q) the number of Q comments containing "It is because", and N the total number of pairs.
  ! To calculate CPMI we need a large set of P & Q pairs, that is, a large collection of comment pairs whose REPLY-TO relation is already known.
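
The CPMI counts can be sketched directly over a corpus of known (P, Q) reply pairs. The toy pairs in the test are hypothetical, and plain substring matching stands in for whatever phrase extraction the authors actually used.

```python
import math

def cpmi(pairs, phrase_p, phrase_q):
    """Corresponding PMI over a corpus of (P, Q) reply pairs: how strongly
    a phrase occurring in P co-occurs with a phrase occurring in Q.
    Substring containment is an assumed simplification."""
    n = len(pairs)
    h_p = sum(1 for p, q in pairs if phrase_p in p)
    h_q = sum(1 for p, q in pairs if phrase_q in q)
    h_pq = sum(1 for p, q in pairs if phrase_p in p and phrase_q in q)
    if h_p == 0 or h_q == 0 or h_pq == 0:
        return float("-inf")  # the phrases never co-occur in this corpus
    return math.log((h_pq / n) / ((h_p / n) * (h_q / n)))
```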

  14. Building a collection of P & Q pairs using lexical patterns
  • Sometimes (= 5.1% of comments) the response target is easy to identify from lexical clues (a NAME or a COMMENT-ID):
      comment 100: "It's my first comment! Nice to meet you."
      comment 102: "100> nice to meet you." (known target: 100; for the remaining comments the target is unknown)
  • Of COURSE, 5.1% is a low ratio.
  • OUR SOLUTION: rely on the data scale (17,300,000 comments) → enough pairs for the PMI calculation.
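
The comment-ID clue can be harvested with a simple regular expression. The `100>` anchor format is taken from the slide's example; the paper's actual pattern set (which also covers NAME clues) is richer than this sketch.

```python
import re

# Assumed comment format: (comment_id, text). A leading "100>" names the
# ID of the comment being replied to (anchor convention from the slide).
ANCHOR = re.compile(r"^\s*(\d+)\s*>")

def extract_pairs(comments):
    """Collect (P, Q) reply pairs from comments whose text starts with an
    explicit comment-ID anchor pointing at an existing comment."""
    by_id = {cid: text for cid, text in comments}
    pairs = []
    for cid, text in comments:
        m = ANCHOR.match(text)
        if m:
            target = int(m.group(1))
            if target in by_id and target != cid:
                pairs.append((by_id[target], text))
    return pairs
```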

  15. Outline • Background • Method • Experiment • Related Works • Conclusion

  16. Experiment 1
  • TEST-SET: 140 comment pairs (140 P-Q pairs); half are positives (extracted by the lexical patterns), the other half are random pairs.
  • TASK: output whether Q is a reply to P.
  • METHODS:
    • Human-A, B, C
    • OVERLAP: only the word overlap ratio (IF ratio > Th THEN True ELSE False)
    • WEBPMI: only contents relevance (IF PMI > Th THEN True ELSE False)
    • CPMI: only discourse relevance (IF PMI > Th THEN True ELSE False)
    • SVM: features = VALUE (WEBPMI & CPMI) and LEXICON (words ∈ P, Q)
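
The three non-SVM baselines above all reduce to thresholding a single relevance score. A minimal sketch, where the score function and threshold passed in are placeholders for OVERLAP/WEBPMI/CPMI and their tuned Th:

```python
def threshold_classifier(score_fn, th):
    """Wrap a relevance score as a True/False reply-to classifier:
    predict True iff score(P, Q) exceeds the threshold Th."""
    return lambda p, q: score_fn(p, q) > th

def accuracy(classifier, labeled_pairs):
    """Fraction of labeled (P, Q, gold) pairs classified correctly."""
    return sum(classifier(p, q) == gold
               for p, q, gold in labeled_pairs) / len(labeled_pairs)
```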

  17. Result Summary

      Method    Accuracy  Precision  Recall  Fβ=1
      Human-A   79.2      83.3       75.3    79.1
      Human-B   75.7      78.2       73.9    76.0
      Human-C   70.7      71.6       72.6    72.1
      OVERLAP   61.4      58.7       87.6    70.3
      WEBPMI    61.4      72.0       42.4    53.4
      CPMI      65.7      66.2       69.8    67.9
      SVM       63.8      64.4       79.4    72.1

  • Humans reach 70-79% accuracy (Human-A > Human-B ≒ Human-C).
  • OVERLAP ≒ WEBPMI < SVM < CPMI: discourse relevance is strong.
  • SVM falls below CPMI: perhaps the feature design is not suitable.

  18. Kappa Matrix: agreement between methods

                Human-B  Human-C  OVERLAP  WEBPMI  CPMI
      Human-A   0.56     0.49     0.08     0.20    0.28
      Human-B            0.47     0.09     0.21    0.25
      Human-C                     0.15     0.05    0.25
      OVERLAP                              0.21    0.13
      WEBPMI                                       0.16

      (values around 0.5 := moderate/high agreement; values near 0.1 := slight/low)

  • Human outputs are similar to each other (0.47-0.56).
  • WEBPMI & CPMI have low agreement (0.16) → they succeed on different examples. This supports our assumption, which decomposes relevance into two types: (1) contents and (2) discourse.
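
The pairwise agreement scores in the matrix are Cohen's kappa; a self-contained version for the binary True/False labels used in this task:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary annotators over the same items:
    observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n
    # expected agreement if the annotators labeled independently
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_e == 1.0:
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```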

  19. Examples of phrase pairs with high CPMI values

      CPMI   P                    Q
      8.43   I'd like to go       Wait for you         (event sequence: P says "go" & Q says "wait")
      8.37   Where is it …        It is in/at …
      7.62   Please tell me …     I think it is …
      7.47   How about …          as soon as possible
      7.38   You can …            I try …
      7.12   I think …            Thank you
      6.93   …, isn't it?         Maybe
      6.80   Thank you            You're welcome       (ANSWER and THANKING)
      6.72   I …                  I … too

  These pairs are outside the reach of sentence similarity, motivating discourse clues.

  20. Outline • Background • Method • Experiment • Related Works (if enough time left) • Conclusion

  21. Related Works (1/2): in Linguistics
  • Four conversational maxims [Grice1975]
  • Relevance theory [Sperber1986]
    → But how to calculate the maxim/relevance? We've formalized it!
  • Adjacency pair [Schegloff&Sacks1973]: a sequence of two utterances (such as offer-acceptance)
    → In BBSs, adjacency pairs are not adjacent; this motivates our task.

  22. Related Works (2/2): in NLP
  • Previous dialog and discourse studies, such as DAMSL [Core&Allen1997], RST-DT [Carlson2002], and Discourse GraphBank [Wolf2005], are based on carefully annotated corpora with rich sets of labels/relations.
  • This study uses only one relation (the REPLY-TO relation), BUT requires no human annotation → large scale → enables the calculation of statistical values (PMI).

  23. Outline • Background • Method • Experiment • Related Works • Conclusion

  24. Conclusion
  • (1) NEW TASK: to detect the REPLY-TO relation between comments.
  • (2) Formalization of relevance: to solve the task, we formalized two types of relevance, CONTENTS and DISCOURSE.
  • (3) Automatic corpus building: to calculate DISCOURSE relevance, we also proposed a pattern-based corpus construction method.
  FINALLY: We believe this study will boost larger-scale dialog studies (using the Web).
