Discriminative Dialog Analysis Using a Massive Collection of BBS Comments
Eiji ARAMAKI (University of Tokyo), Takeshi ABEKAWA (University of Tokyo), Yohei MURAKAMI (NICT), Akiyo NADAMOTO (NICT), Japan
Why BBS (Bulletin Board Systems)? Because of scale: |BBS| >> |News Wire|, |Wikipedia|. But what sort of text do BBSs contain?
What sort of text? An example BBS thread (each comment carries an ID and a name):
1. What is the most light or small mp3 player?
2. iPod Shuffle is the best way to do?
3. please tell me why my nano sometimes stops even battery still remains.
4. How about iriver N12? extremely light and small. [reply to 1]
5. It is because battery display approaches approx. Even battery runs out, display sometimes shows it is still left. [reply to 3]
6. iriver N series has stopped producing. [reply to 4]
A human reader combines comments across the thread without effort: "N12" is a "small and light" "MP3 player", but now "has stopped producing". BUT: NLP suffers from the gaps between corresponding comments, because a reply is often several comments away from its target.
How often do such gaps occur? (Gap length (distance) vs. frequency)
• No gap (distance = 1) accounts for only about 50% of replies.
• Usually the distance is 2-5.
• Gaps are a common phenomenon.
【QUESTION】 Despite the gaps, how does a human capture REPLY-TO relations?
Linguistics has already given several answers
• One answer is Relevance Theory [Sperber 1986]: human communication is based on relevance.
• For a computer scientist, this is not enough: how do we calculate relevance?
【This study's GOAL】 To formalize relevance.
Outline
• Background
• Method
  • Task setting / Our approach
  • How to formalize two types of relevance
• Experiment
• Related Works
• Conclusion
Task Setting
• The natural task setting: to which earlier comment (i-1, i-2, i-3, …) does comment i reply? → a complex task.
Our task setting
• INSTEAD: a discriminative task.
• Input: two comments from the same BBS thread (P and Q).
• Output: True (Q is a reply to P) or False.
• → Suitable for machine learning (such as SVM).
Our Approach/Assumption
Two types of relevance are available:
(1) Content Relevance: roughly speaking, sentence similarity.
  Example: "What is the most light or small mp3 player?" ↔ "How about iriver N12? extremely light and small."
(2) Discourse Relevance: the discourse function of the comments.
  Example: "please tell me why my nano sometimes stops …" (WHY-QUESTION) ↔ "It is because battery display approaches …" (REASON)
Outline
• Background
• Method
  • Task setting / Our approach
  • How to formalize two types of relevance
    • (1) Content Relevance
    • (2) Discourse Relevance
• Experiment
• Related Works
• Conclusion
Two Content Relevance Measures
• (1) Word Overlap Ratio. Example: "What is the most light or small mp3 player?" vs. "How about iriver N12? extremely light and small." share 4 of their 12 words, giving an overlap ratio of 4/12 = 0.33 (see the sketch after this list).
! A simple word overlap ratio cannot capture the relation between "mp3 player" and "iriver N12".
• (2) WebPMI-based sentence similarity: WebPMI [Bollegala 2007] is defined on the next slide.
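A minimal sketch of the word overlap ratio in (1), assuming naive lowercase tokenization with punctuation stripped; the slide's 4/12 count comes from its own (unspecified) tokenization, so exact values may differ. The function names here are illustrative, not the authors'.

```python
import re

def tokenize(text: str) -> set:
    """Naive tokenizer: lowercase alphanumeric word types."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_ratio(p: str, q: str) -> float:
    """Fraction of word types shared between two comments."""
    p_tokens, q_tokens = tokenize(p), tokenize(q)
    union = p_tokens | q_tokens
    return len(p_tokens & q_tokens) / len(union) if union else 0.0

p = "What is the most light or small mp3 player?"
q = "How about iriver N12? extremely light and small."
print(overlap_ratio(p, q))  # low score despite clearly related comments
```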
WebPMI: mutual information of two words in web pages:

WebPMI(p, q) = log( (H(p ∩ q) / N) / ((H(p) / N) · (H(q) / N)) )

where H(p ∩ q) is the number of web pages that contain both words (e.g., "N12" and "MP3"), H(p) and H(q) are the numbers of web pages that contain each word alone, and N is the total number of pages.
Content Relevance: for each word in P, search for the word in Q with the highest WebPMI, and sum up these values.
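A sketch of the WebPMI formula above and the content-relevance score built from it. `page_count` is a hypothetical stand-in for hit counts from a web search API, and the value of N is an assumed index size; neither is specified on the slide beyond the formula.

```python
import math

N = 10_000_000_000  # assumed total number of indexed web pages

def page_count(*words) -> int:
    """Placeholder: # of web pages containing all the given words."""
    raise NotImplementedError("plug in a search-engine hit count here")

def web_pmi(p: str, q: str) -> float:
    h_pq = page_count(p, q)
    if h_pq == 0:
        return 0.0
    return math.log((h_pq / N) / ((page_count(p) / N) * (page_count(q) / N)))

def content_relevance(p_words, q_words) -> float:
    # For each word in P, take its best-matching word in Q, then sum.
    return sum(max(web_pmi(pw, qw) for qw in q_words) for pw in p_words)
```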
Outline • Background • Method • Task setting • How to formalize two types of relevance • (1) Contents Relevance • (2) Discourse Relevance • Experiment • Related Works • Conclusion
Discourse Relevance (CPMI: Corresponding PMI, newly proposed)
• ALSO a PMI-based measure.
• BUT: it counts phrases co-occurring across P and Q, e.g., "please tell me why" in P ("please tell me why my nano sometimes stops …") and "It is because" in Q ("It is because battery display approaches …"):

CPMI(p, q) = log( (H(p ∩ q) / N) / ((H(p) / N) · (H(q) / N)) )

where H(p ∩ q) is the number of P-Q pairs that contain phrase p in P and phrase q in Q, H(p) is the number of P that contain p, H(q) is the number of Q that contain q, and N is the total number of pairs.
! To calculate this PMI, we need a large set of P & Q pairs: the statistics behind even one comment pair must be estimated from a large collection of comment pairs.
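A sketch of CPMI counted over a collection of known (P, Q) reply pairs (built on the next slide). Plain substring search stands in for phrase matching here; the authors' exact phrase extraction is not shown on the slide.

```python
import math

def cpmi(p: str, q: str, pairs) -> float:
    """pairs: list of (p_text, q_text) known reply pairs."""
    n = len(pairs)
    h_p = sum(1 for pt, _ in pairs if p in pt)    # phrase p on the P side
    h_q = sum(1 for _, qt in pairs if q in qt)    # phrase q on the Q side
    h_pq = sum(1 for pt, qt in pairs if p in pt and q in qt)
    if not (h_p and h_q and h_pq):
        return float("-inf")                      # unseen combination
    return math.log((h_pq / n) / ((h_p / n) * (h_q / n)))

pairs = [("please tell me why my nano sometimes stops",
          "It is because battery display approaches approx.")]
print(cpmi("please tell me why", "It is because", pairs))  # log(1) = 0 here
```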
Building a collection of P & Q pairs using lexical patterns
• Sometimes (5.1% of comments), the response target can be identified easily from lexical clues (a NAME or a COMMENT-ID):
  Comment 100: "It's my first comment! Nice to meet you."
  Comment 102: "100> nice to meet you." (the anchor "100>" marks the target; for the remaining comments the target is unknown)
• Of COURSE: 5.1% is a low ratio.
• OUR SOLUTION: rely on the data scale (17,300,000 comments) → enough pairs for the PMI calculation.
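A sketch of the lexical-pattern idea: a comment starting with an anchor such as "100>" is taken to reply to comment ID 100. The "100>" style comes from the slide's example; the name-based clues and any other anchor formats the authors use are not shown, so this regex is an illustrative assumption.

```python
import re

# Anchor pattern like "100>" at the start of a comment body (assumed form).
ANCHOR = re.compile(r"^\s*(\d+)\s*>")

def find_reply_target(comment_body: str):
    """Return the referenced comment ID, or None if no anchor exists."""
    m = ANCHOR.match(comment_body)
    return int(m.group(1)) if m else None

print(find_reply_target("100> nice to meet you."))  # -> 100
print(find_reply_target("It's my first comment!"))  # -> None
```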
Outline
• Background
• Method
• Experiment
• Related Works
• Conclusion
Experiment 1
• TEST SET: 140 comment pairs (140 P-Q pairs); half are positive pairs (extracted by the lexical patterns), the other half are random pairs.
• TASK: output whether Q replies to P or not.
• METHODS:
  • Human-A, B, C
  • OVERLAP: overlap ratio only (TRUE if the ratio > threshold, else FALSE)
  • WEBPMI: content relevance only (TRUE if the PMI > threshold, else FALSE)
  • CPMI: discourse relevance only (TRUE if the PMI > threshold, else FALSE)
  • SVM: features = VALUES (the WEBPMI & CPMI scores) + LEXICON (words in P and Q); a sketch follows below.
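A sketch of the decision rules on this slide. The threshold values and the exact feature encoding for the SVM are illustrative assumptions, not the authors' settings.

```python
def threshold_rule(score: float, th: float) -> bool:
    """Used by the OVERLAP, WEBPMI, and CPMI baselines."""
    return score > th

def svm_features(p_words, q_words, webpmi_score, cpmi_score) -> dict:
    feats = {"WEBPMI": webpmi_score, "CPMI": cpmi_score}  # VALUE features
    feats.update({f"P:{w}": 1.0 for w in p_words})        # LEXICON features
    feats.update({f"Q:{w}": 1.0 for w in q_words})
    return feats
```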
Result Summary

Method    Accuracy  Precision  Recall  Fβ=1
Human-A   79.2      83.3       75.3    79.1
Human-B   75.7      78.2       73.9    76.0
Human-C   70.7      71.6       72.6    72.1
OVERLAP   61.4      58.7       87.6    70.3
WEBPMI    61.4      72.0       42.4    53.4
CPMI      65.7      66.2       69.8    67.9
SVM       63.8      64.4       79.4    72.1

• Humans reach 70-79% accuracy (Human-A > Human-B > Human-C).
• By accuracy, OVERLAP ≒ WEBPMI < SVM < CPMI: discourse relevance is strong.
• The SVM underperforms CPMI; its feature design may not be suitable.
Kappa Matrix: agreement between methods

          Human-B  Human-C  OVERLAP  WEBPMI  CPMI
Human-A   0.56     0.49     0.08     0.20    0.28
Human-B            0.47     0.09     0.21    0.25
Human-C                     0.15     0.05    0.25
OVERLAP                              0.21    0.13
WEBPMI                                       0.16

(κ around 0.5 = moderate/high agreement; κ near 0 = slight/low agreement)
• Human outputs are similar to each other (κ = 0.47-0.56).
• WEBPMI & CPMI have low agreement (κ = 0.16) → they succeed on different examples. This supports our assumption of decomposing relevance into two parts: (1) content and (2) discourse.
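For reference, a sketch of Cohen's kappa as used for the pairwise agreement matrix above, assuming each method outputs one boolean per test pair.

```python
def cohen_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two lists of boolean decisions."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)   # chance agreement
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

print(cohen_kappa([True, True, False, False],
                  [True, False, False, True]))  # 0.0: agreement at chance
```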
Several examples of phrase pairs with high CPMI values

CPMI   Phrase in P         Phrase in Q
8.43   I'd like to go      Wait for you
8.37   Where is it …       It is in/at …
7.62   Please tell me …    I think it is …
7.47   How about …         as soon as possible
7.38   You can …           I try …
7.12   I think …           Thank you
6.93   …, isn't it?        Maybe
6.80   Thank you           Your welcome
6.72   I …                 I … too

• "I'd like to go" / "Wait for you" is an event sequence (P says "go" & Q says "wait").
• "I think …" / "Thank you" and "Thank you" / "Your welcome" reflect ANSWER and THANKING sequences.
• Such pairs are outside the reach of sentence similarity, motivating discourse clues.
Outline
• Background
• Method
• Experiment
• Related Works (if enough time is left)
• Conclusion
Related Works (1/2): in Linguistics
• The four conversational maxims [Grice 1975] and Relevance Theory [Sperber 1986]: but how do we calculate a maxim or relevance? We have formalized it.
• Adjacency pairs [Schegloff & Sacks 1973]: sequences of two utterances (such as offer-acceptance). In BBSs, adjacency pairs are not adjacent; this motivates our task.
Related Works (2/2): in NLP
• Previous dialog and discourse studies, such as DAMSL [Core & Allen 1997], RST-DT [Carlson 2002], and Discourse GraphBank [Wolf 2005], are based on carefully annotated corpora with rich sets of labels/relations.
• This study uses only one relation (the REPLY-TO relation), BUT it requires no human annotation → large scale → enables the calculation of statistical values (PMI).
Outline
• Background
• Method
• Experiment
• Related Works
• Conclusion
Conclusion
• (1) NEW TASK: detecting the REPLY-TO relation between comments.
• (2) Formalization of relevance: to solve the task, we formalized two types of relevance: CONTENT and DISCOURSE relevance.
• (3) Automatic corpus building: to calculate DISCOURSE relevance, we also proposed a pattern-based corpus construction method.
FINALLY: We believe this study will boost larger-scale dialog studies (using the WEB).