
Making Conversation Structure Explicit: Identification of Initiation-Response Pairs within Discussion Forums

NAACL 2010. Yi-Chia Wang and Carolyn Rosé, Language Technologies Institute, School of Computer Science, Carnegie Mellon University. 06/03/2010.


Presentation Transcript


  1. Making Conversation Structure Explicit: Identification of Initiation-Response Pairs within Discussion Forums

    NAACL 2010 Yi-Chia Wang and Carolyn Rosé Language Technologies Institute School of Computer Science Carnegie Mellon University 06/03/2010
  2. Discussion Forums and Thread Structure
    - Often thread structure is explicitly represented
    - Sometimes thread structure is implicit
    - Initiation-response pairs are not necessarily adjacent to each other
  3. Outline
    - Related work
    - Identification of initiation-response pairs as a ranking problem
    - Usenet: data preparation
    - Error analysis for the purely lexical approach
    - Variations of Latent Semantic Analysis
    - Experimental results and current directions
  4. Related Work
    Thread recovery:
    - Application of thread recovery to education (Trausan-Matu et al., 2007); no evaluation
    - Basic research in thread recovery (Wang et al., ICWSM 2008; Wang et al., CSCW 2008); investigated the contribution of temporal information and similarity
    Conversation disentanglement (Elsner and Charniak, 2008; Eisenstein and Barzilay, 2008; Wang and Oard, 2009):
    - Identifies subtopic clusters of contributions in a conversation
    - Did not identify the explicit parent-child relationships between contributions
  5. Ranking Problem
    Example initiation (Kevin): "What's your plan for this weekend?"
    - The degree of relatedness between two contributions is conditioned on the relationship between them and the other surrounding posts within the discussion.
    - Ranking is therefore more suitable than classification.
  6. Pairwise Ranking
    Given a pair of posts pi and pj, we represent their relatedness as the quantity xij = sim(pi, pj), where sim is a similarity function.
    - Model input: an ordered pair (xij, xik)
    - Scoring function: score(xij, xik) = xij - xik
    - Output: sign(score(xij, xik)); + means xij is ranked higher than xik
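The scoring rule above can be sketched in a few lines of Python. Here word_overlap_sim is a hypothetical toy stand-in for the similarity function (the representations actually studied, cosine and the LSA variants, appear on later slides):

```python
def word_overlap_sim(a, b):
    """Toy similarity: Jaccard overlap of word types. A stand-in for sim."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_pair(p_i, p_j, p_k, sim=word_overlap_sim):
    """Pairwise ranking decision from the slide: compute x_ij = sim(p_i, p_j)
    and x_ik = sim(p_i, p_k), then output sign(x_ij - x_ik).
    +1 means p_j is ranked above p_k as the parent candidate for p_i."""
    x_ij = sim(p_i, p_j)
    x_ik = sim(p_i, p_k)
    return 1 if x_ij - x_ik > 0 else -1
```

Any similarity function with the same signature can be swapped in via the sim parameter.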
  7. Corpus: Usenet
    - One of the most active newsgroups, alt.politics.usa, from June 2003 through 2008
    - Parent-child relationships between whole posts are explicit in the metadata
    Statistics:
    - 784,708 posts
    - 625,116 posts are explicit responses to others
    - 77,985 discussion threads with 2 or more posts
  8. Setting up the Data Set
    For every post pi, we generated one instance ((pi, pj), (pi, pk)), where:
    - pi is the reply message
    - pj is the correct parent message of pi (positive example)
    - pk is a randomly chosen incorrect parent of pi (negative example): a random post from the same thread as pi that is not the parent of pi
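The instance construction above can be sketched as follows; make_instances, thread_posts, and parent_of are illustrative names, not from the paper:

```python
import random

def make_instances(thread_posts, parent_of, seed=0):
    """Build ((p_i, p_j), (p_i, p_k)) ranking instances for one thread.

    thread_posts: list of post ids in the thread.
    parent_of: dict mapping each reply to its explicit parent
    (available in the Usenet metadata)."""
    rng = random.Random(seed)
    instances = []
    for p_i in thread_posts:
        p_j = parent_of.get(p_i)
        if p_j is None:
            continue  # thread-starting posts have no parent, so no instance
        # Negative candidate: any post in the same thread that is neither
        # p_i itself nor its true parent.
        candidates = [p for p in thread_posts if p != p_i and p != p_j]
        if not candidates:
            continue
        p_k = rng.choice(candidates)
        instances.append(((p_i, p_j), (p_i, p_k)))
    return instances
```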
  9. Dividing the Data Set
    - Learning set (~90% of the data): 90,028 instances, used for constructing the LSA space
    - Testing set (~10% of the data): 10,000 instances, used for evaluation
  10. Baseline
    Higher lexical cohesion is expected between the reply and the correct parent:
    ((pi, pj), (pi, pk)) → (cossim(pi, pj), cossim(pi, pk))
    Accuracy: 66%
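A minimal sketch of this bag-of-words cosine baseline; baseline_accuracy and text_of are illustrative names, and counting a tie as wrong is an assumption:

```python
from collections import Counter
import math

def cossim(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(cnt * b[w] for w, cnt in a.items())  # missing keys count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def baseline_accuracy(instances, text_of):
    """Fraction of instances where the true parent p_j outscores the sampled
    non-parent p_k. text_of maps a post id to its raw text."""
    correct = 0
    for (p_i, p_j), (_, p_k) in instances:
        bi, bj, bk = (Counter(text_of[p].split()) for p in (p_i, p_j, p_k))
        if cossim(bi, bj) > cossim(bi, bk):
            correct += 1
    return correct / len(instances)
```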
  11. Error Analysis of the Baseline
    Is there a way to amplify indirect connections?
  12. Latent Semantic Analysis (LSA) (Landauer et al., 1998)
    - LSA maps a term-by-document matrix to a term-by-concept matrix
    - LSA can group semantically related words together
    - Word meanings are represented in a concept space of dimension k
    - The documents are the positive examples in the learning set
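The term-by-document to term-by-concept mapping is standard LSA via truncated SVD; a generic NumPy sketch (not necessarily the authors' exact preprocessing or term weighting):

```python
import numpy as np

def lsa_term_space(term_doc, k):
    """Project a term-by-document count matrix into a k-dimensional concept
    space: factor term_doc = U S Vt and keep the top-k singular directions.
    Returns a term-by-concept matrix, one k-dim row vector per term."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k] * s[:k]
```

With k well below the number of documents, terms that co-occur in similar documents end up with nearby concept vectors, which is what "grouping semantically related words" amounts to.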
  13. Versions of LSA
    ((pi, pj), (pi, pk)) → (?(pi, pj), ?(pi, pk)), where ? is one of:
    - lsa-avg(pi, pj) (Foltz et al., 1998)
    - lsa-cart(pi, pj), introduced in this work
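Of the two, lsa-avg is the established measure (Foltz et al., 1998) and can be sketched as below; lsa-cart is this paper's own word-pair-level variant and is not reproduced here. term_vecs is assumed to map each word to its LSA concept vector:

```python
import numpy as np

def lsa_avg(words_a, words_b, term_vecs):
    """lsa-avg (Foltz et al., 1998): represent each text as the average of
    its words' LSA vectors, then take the cosine of the two averages."""
    def centroid(words):
        vs = [term_vecs[w] for w in words if w in term_vecs]
        return np.mean(vs, axis=0) if vs else None
    va, vb = centroid(words_a), centroid(words_b)
    if va is None or vb is None:
        return 0.0  # no known words in one of the texts
    d = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / d) if d else 0.0
```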
  14. Experimental Results
    Two independent factors:
    - Text expansion with WordNet (or not)
    - Relatedness representation
    Results:
    - Syn+Hyper, Gloss > NoExp
    - LSA-cart > Cos > LSA-avg
    - (LSA-cart, NoExp) > all the others
  15. Conclusions and Future Work
    - Formalized thread recovery as a ranking problem
    - A detailed error analysis of a simple baseline
    - A novel variation of LSA
    Future work:
    - LSA-cart has higher time complexity; take discourse focus (i.e., salience) into account to select the most informative word pairs
    - Take the discourse function (i.e., conversation act) of contributions into account: proposal, counter-proposal, acceptance, rejection, ...
  16. Questions?
  17. More Details
    Narrow down to the specific text spans that have the initiation-response relation: the text immediately following quoted text tends to have an explicit discourse connection with it.
  18. Taking Advantage of Initiation-Response Relations
    Theoretical aspect:
    - What makes conversation coherent?
    - How are people relating to each other?
    Practical aspect:
    - Influence prediction (Java et al., 2006; Kale et al., 2006)
    - Newsgroup search (Xi et al., 2004): meta-features extracted from the discussion threads
    - Text classification (Wang et al., 2007)
    - Email summarization (Carenini et al., 2007): quotation graph to organize conversations