
Making Conversation Structure Explicit: Identification of Initiation-Response Pairs within Discussion Forums

NAACL 2010. Yi-Chia Wang and Carolyn Rosé, Language Technologies Institute, School of Computer Science, Carnegie Mellon University. 06/03/2010.


Presentation Transcript


  1. Making Conversation Structure Explicit: Identification of Initiation-Response Pairs within Discussion Forums

    NAACL 2010 Yi-Chia Wang and Carolyn Rosé Language Technologies Institute School of Computer Science Carnegie Mellon University 06/03/2010
  2. Discussion Forums and Thread Structure
    - Often thread structure is explicitly represented
    - Sometimes thread structure is implicit
    - Initiation-response pairs are not necessarily adjacent to each other
  3. Outline
    - Related work
    - Identification of initiation-response pairs as a ranking problem
    - Usenet: data preparation
    - Error analysis for the purely lexical approach
    - Variations of Latent Semantic Analysis
    - Experimental results and current directions
  4. Related Work
    Thread recovery:
    - Application of thread recovery to education (Trausan-Matu et al., 2007); no evaluation
    - Basic research in thread recovery (Wang et al., ICWSM 2008; Wang et al., CSCW 2008); investigated the contribution of temporal information and similarity
    Conversation disentanglement (Elsner and Charniak, 2008; Eisenstein and Barzilay, 2008; Wang and Oard, 2009):
    - Identifies subtopic clusters of contributions in a conversation
    - Did not identify the explicit parent-child relationships between contributions
  5. Ranking Problem
    Example initiation (Kevin): "What's your plan for this weekend?"
    - The degree of relatedness between two contributions is conditioned on the relationship between them and the other surrounding posts within the discussion.
    - Ranking is therefore more suitable than classification.
  6. Pairwise Ranking
    Given a pair of posts pi and pj, we represent their relatedness as the quantity xij = sim(pi, pj), where sim is a similarity function.
    - Model input: an ordered pair (xij, xik)
    - Scoring function: score(xij, xik) = xij - xik
    - Output: sign(score(xij, xik)); + means xij is ranked higher than xik
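The scoring rule above can be sketched in a few lines of Python. Here word_overlap_sim is a hypothetical toy stand-in for the similarity function (the representations actually studied, cosine and the LSA variants, appear on later slides):

```python
def word_overlap_sim(a, b):
    """Toy similarity: Jaccard overlap of word types. A stand-in for sim."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_pair(p_i, p_j, p_k, sim=word_overlap_sim):
    """Pairwise ranking decision from the slide: compute x_ij = sim(p_i, p_j)
    and x_ik = sim(p_i, p_k), then output sign(x_ij - x_ik).
    +1 means p_j is ranked above p_k as the parent candidate for p_i."""
    x_ij = sim(p_i, p_j)
    x_ik = sim(p_i, p_k)
    return 1 if x_ij - x_ik > 0 else -1
```

Any similarity function with the same signature can be swapped in via the sim parameter.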
  7. Corpus: Usenet
    - One of the most active newsgroups, alt.politics.usa, from June 2003 through 2008
    - Parent-child relationships between whole posts are explicit in the metadata
    Statistics:
    - 784,708 posts
    - 625,116 posts are explicit responses to others
    - 77,985 discussion threads with 2 or more posts
  8. Setting up the Data Set
    For every post pi, we generated one instance ((pi, pj), (pi, pk)), where:
    - pi is the reply message
    - pj is the correct parent message of pi (positive example)
    - pk is a randomly chosen incorrect parent of pi (negative example): a random post from the same thread as pi that is not the parent of pi
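The instance construction above can be sketched as follows; make_instances, thread_posts, and parent_of are illustrative names, not from the paper:

```python
import random

def make_instances(thread_posts, parent_of, seed=0):
    """Build ((p_i, p_j), (p_i, p_k)) ranking instances for one thread.

    thread_posts: list of post ids in the thread.
    parent_of: dict mapping each reply to its explicit parent
    (available in the Usenet metadata)."""
    rng = random.Random(seed)
    instances = []
    for p_i in thread_posts:
        p_j = parent_of.get(p_i)
        if p_j is None:
            continue  # thread-starting posts have no parent, so no instance
        # Negative candidate: any post in the same thread that is neither
        # p_i itself nor its true parent.
        candidates = [p for p in thread_posts if p != p_i and p != p_j]
        if not candidates:
            continue
        p_k = rng.choice(candidates)
        instances.append(((p_i, p_j), (p_i, p_k)))
    return instances
```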
  9. Dividing the Data Set
    - Learning set (~90% of the data): 90,028 instances, used for constructing the LSA space
    - Testing set (~10% of the data): 10,000 instances, used for evaluation
  10. Baseline
    Higher lexical cohesion is expected between the reply and the correct parent:
    ((pi, pj), (pi, pk)) → (cossim(pi, pj), cossim(pi, pk))
    Accuracy: 66%
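A minimal sketch of this bag-of-words cosine baseline; baseline_accuracy and text_of are illustrative names, and counting a tie as wrong is an assumption:

```python
from collections import Counter
import math

def cossim(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(cnt * b[w] for w, cnt in a.items())  # missing keys count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def baseline_accuracy(instances, text_of):
    """Fraction of instances where the true parent p_j outscores the sampled
    non-parent p_k. text_of maps a post id to its raw text."""
    correct = 0
    for (p_i, p_j), (_, p_k) in instances:
        bi, bj, bk = (Counter(text_of[p].split()) for p in (p_i, p_j, p_k))
        if cossim(bi, bj) > cossim(bi, bk):
            correct += 1
    return correct / len(instances)
```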
  11. Error Analysis of the Baseline
    Is there a way to amplify indirect connections?
  12. Latent Semantic Analysis (LSA) (Landauer et al., 1998)
    - LSA maps a term-by-document matrix to a term-by-concept matrix
    - LSA can group semantically related words together
    - Word meanings are represented in a concept space of dimension k
    - The documents are the positive examples in the learning set
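The term-by-document to term-by-concept mapping is standard LSA via truncated SVD; a generic NumPy sketch (not necessarily the authors' exact preprocessing or term weighting):

```python
import numpy as np

def lsa_term_space(term_doc, k):
    """Project a term-by-document count matrix into a k-dimensional concept
    space: factor term_doc = U S Vt and keep the top-k singular directions.
    Returns a term-by-concept matrix, one k-dim row vector per term."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k] * s[:k]
```

With k well below the number of documents, terms that co-occur in similar documents end up with nearby concept vectors, which is what "grouping semantically related words" amounts to.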
  13. Versions of LSA
    ((pi, pj), (pi, pk)) → (?(pi, pj), ?(pi, pk)), where ? is one of:
    - lsa-avg(pi, pj) (Foltz et al., 1998)
    - lsa-cart(pi, pj), introduced in this work
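Of the two, lsa-avg is the established measure (Foltz et al., 1998) and can be sketched as below; lsa-cart is this paper's own word-pair-level variant and is not reproduced here. term_vecs is assumed to map each word to its LSA concept vector:

```python
import numpy as np

def lsa_avg(words_a, words_b, term_vecs):
    """lsa-avg (Foltz et al., 1998): represent each text as the average of
    its words' LSA vectors, then take the cosine of the two averages."""
    def centroid(words):
        vs = [term_vecs[w] for w in words if w in term_vecs]
        return np.mean(vs, axis=0) if vs else None
    va, vb = centroid(words_a), centroid(words_b)
    if va is None or vb is None:
        return 0.0  # no known words in one of the texts
    d = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / d) if d else 0.0
```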
  14. Experimental Results
    Two independent factors:
    - Text expansion with WordNet (or not)
    - Relatedness representation
    Results:
    - Syn+Hyper, Gloss > NoExp
    - LSA-cart > Cos > LSA-avg
    - (LSA-cart, NoExp) > all the others
  15. Conclusions and Future Work
    - Formalized thread recovery as a ranking problem
    - A detailed error analysis of a simple baseline
    - A novel variation of LSA
    Future work:
    - LSA-cart has higher time complexity; take discourse focus (i.e., salience) into account to select the most informative word pairs
    - Take the discourse function (i.e., conversation act) of contributions into account: proposal, counter-proposal, acceptance, rejection, ...
  16. Questions?
  17. More Details
    Narrow down to the specific text spans that have the initiation-response relation: the text immediately following quoted text tends to have an explicit discourse connection with it.
  18. Taking Advantage of Initiation-Response Relations
    Theoretical aspect:
    - What makes conversation coherent?
    - How are people relating to each other?
    Practical aspect:
    - Influence prediction (Java et al., 2006; Kale et al., 2006)
    - Newsgroup search (Xi et al., 2004): meta-features extracted from the discussion threads
    - Text classification (Wang et al., 2007)
    - Email summarization (Carenini et al., 2007): quotation graph to organize conversations