
Conversation Disentanglement in Sports Discourse

Anthony Wong, 6/01/11. Conversation disentanglement is a clustering task: dividing a transcript into a number of smaller, separate conversations.



  1. Conversation Disentanglement in Sports Discourse Anthony Wong 6/01/11

  2. Importance of Topic • What is conversation disentanglement? • A clustering task: dividing a transcript into a number of smaller, separate conversations • Conversation disentanglement has a couple of practical applications: • Summary generation • User-interface systems such as automatic threading

  3. Basis of my Approach • Michael Elsner and Eugene Charniak (2008) • Uses lexical and non-lexical features to cluster different threads • Time between utterances, same speaker, number of shared words, “content” words
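A minimal sketch of the pairwise features named on this slide (time between utterances, same speaker, shared words, "content" words). This is not the Elsner/Charniak feature extractor: the `Utterance` class, the `pair_features` function, the tokenizer, and the tiny stopword list are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    time: int      # timestamp; the unit (seconds) is an assumption
    speaker: str
    text: str


# Tiny illustrative stopword list (hypothetical); a real system would use a larger one.
STOPWORDS = {"the", "a", "an", "is", "it", "to", "of", "and", "i", "this", "but", "in", "not"}


def tokens(text):
    """Return the set of lowercased words, with simple edge punctuation stripped."""
    cleaned = (w.strip(".,!?:;-'\"").lower() for w in text.split())
    return {w for w in cleaned if w}


def pair_features(a, b):
    """Compute simple features for deciding whether two utterances share a thread."""
    shared = tokens(a.text) & tokens(b.text)
    return {
        "time_gap": abs(b.time - a.time),
        "same_speaker": a.speaker == b.speaker,
        "shared_words": len(shared),
        # "Content" words approximated here as shared words that are not stopwords.
        "shared_content_words": len(shared - STOPWORDS),
    }
```

A classifier trained on features like these would score each pair of utterances as "same thread" or "different thread", and the pairwise decisions would then be clustered into conversations.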

  4. Proposed Project Overview • Follow the methodology in Elsner and Charniak’s paper • Create and annotate a dataset of sports discourse • Use the existing Elsner/Charniak model to provide baseline classification results and see how well their model adapts to a different chat domain • Test different feature combinations to try to raise performance • Possibly: compare results with the Elsner/Charniak paper in some meaningful way

  5. Progress so far • Retrieve and prepare data • Annotate data set • Test existing model as is on my data set • Test out different feature combinations • *Evaluate model performance

  6. Retrieving and preparing data

  7. Retrieving and preparing data

  8. Annotating the data

  9. Annotating the data
     T1 715 KateC : Sam - this is going to be painful, isn't it?
     T1 715 SamHolako : I hope not Kate, but Howard, Nelson and Carter have killed the Raptors in the past
     T2 715 JaredWade : Classic Frisco. The Minnesota bathroom smells worse, I hear.
     T3 715 Anthony(RapsFan) : @Batman: His WP48 is the worst on the team. Andrea is terrible. He scores. That's about it.
     T3 715 Arnold : Holy impossibilities , Batman - that won't happen.
     T4 715 BretLaGree : Raja Bell and Mike Bibby just held a flop-off in the lane. Bell won.
     T5 715 Bobbo : Zach, Go hit up Cinnabun!!! worth the $$...write it off to ESPN anyway
     T5 715 ZachHarper : I don't think it works that way
     T6 715 Aras : Jared!
     T6 715 JaredWade : Aras.
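The annotated lines above follow a regular pattern: a thread label (`T1`), a number, a speaker name, a colon, and the message. A small parsing sketch follows, assuming the second field is a timestamp (the slides do not say what `715` means). The names `LINE_RE`, `parse_line`, and `group_by_thread` are hypothetical helpers, not part of the Elsner code.

```python
import re
from collections import defaultdict

# Pattern: "T<thread> <number> <speaker> : <text>".
# Treating the second field as a timestamp is an assumption.
LINE_RE = re.compile(
    r"^T(?P<thread>\d+)\s+(?P<time>\d+)\s+(?P<speaker>\S+)\s+:\s*(?P<text>.*)$"
)


def parse_line(line):
    """Split one annotated line into (thread, time, speaker, text)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return int(m["thread"]), int(m["time"]), m["speaker"], m["text"]


def group_by_thread(lines):
    """Collect (speaker, text) pairs under their annotated thread label."""
    threads = defaultdict(list)
    for line in lines:
        thread, _time, speaker, text = parse_line(line)
        threads[thread].append((speaker, text))
    return dict(threads)
```

Because the speaker field is matched as a run of non-space characters, names such as `Anthony(RapsFan)` parse correctly, and a colon inside the message (as in `@Batman:`) stays part of the text.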

  10. Annotating the data • The annotated part of this transcript has 399 lines. • 177 unique threads. • The average conversation length is 2.25423728814. • The median conversation length is 2. • The entropy is 7.0155726118 bits. • The median chat has 0.0 interruptions per line. • The average block of 10 contains 6.25706940874 threads. • The line-averaged conversation density is 2.77944862155.
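Some of these statistics can be recomputed from a list of per-line thread labels. The average conversation length is simply lines divided by threads (399 / 177 ≈ 2.254, matching the figure above). The entropy formula below is an assumption: it treats each thread's share of the lines as a probability and computes Shannon entropy in bits, which may differ from what the original statistics script does. `thread_stats` is a hypothetical helper name.

```python
import math
from collections import Counter
from statistics import median


def thread_stats(labels):
    """Summarize an annotation given one thread label per transcript line."""
    sizes = Counter(labels)          # thread label -> number of lines in it
    n = len(labels)
    # Assumed definition: Shannon entropy of the thread-size distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in sizes.values())
    return {
        "lines": n,
        "threads": len(sizes),
        "mean_length": n / len(sizes),
        "median_length": median(sizes.values()),
        "entropy_bits": entropy,
    }
```

Under this definition, an annotation that puts every line in its own thread reaches the maximum entropy of log2(number of lines), so higher entropy indicates a more fragmented clustering.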

  11. Running Elsner model as is
     • T1 715 KateC : Sam - this is going to be painful, isn't it?
     • T2 715 SamHolako : I hope not Kate, but Howard, Nelson and Carter have killed the Raptors in the past
     • T3 715 JaredWade : Classic Frisco. The Minnesota bathroom smells worse, I hear.
     • T4 715 Anthony(RapsFan) : @Batman: His WP48 is the worst on the team. Andrea is terrible. He scores. That's about it.
     • T5 715 Arnold : Holy impossibilities , Batman - that won't happen.
     • T6 715 BretLaGree : Raja Bell and Mike Bibby just held a flop-off in the lane. Bell won.
     • T7 715 Bobbo : Zach, Go hit up Cinnabun!!! worth the $$...write it off to ESPN anyway
     • T8 715 ZachHarper : I don't think it works that way
     • T9 715 Aras : Jared!
     • T9 715 JaredWade : Aras.

  12. Running Elsner model as is • 368 unique threads. • The average conversation length is 1.08423913043. • The median conversation length is 1. • The entropy is 8.48485646504 bits. • The median chat has 0.0 interruptions per line. • The average block of 10 contains 9.52699228792 threads. • The line-averaged conversation density is 1.42355889724.

  13. Editing the model and evaluation • Still in progress • A lot of room for improvement • Many different feature combinations to try • Need to get evaluation code running
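While the evaluation code is not yet running, one simple way to compare the model's output against the hand annotation is a local agreement score: for each line, look at the previous k lines and check whether both labelings make the same "same thread / different thread" decision. The sketch below is a stand-in written under that description, not the Elsner evaluation code; the function name `local_agreement` and the default `k=3` are assumptions.

```python
def local_agreement(gold, predicted, k=3):
    """Fraction of nearby line pairs on which two labelings agree.

    For every line i and each of the k lines before it, both labelings
    decide "same thread" or "different thread"; a pair counts as an
    agreement when the two decisions match.
    """
    if len(gold) != len(predicted):
        raise ValueError("labelings must cover the same lines")
    agree = total = 0
    for i in range(len(gold)):
        for j in range(max(0, i - k), i):
            same_gold = gold[i] == gold[j]
            same_pred = predicted[i] == predicted[j]
            agree += same_gold == same_pred
            total += 1
    # With fewer than two lines there are no pairs to disagree on.
    return agree / total if total else 1.0
```

The score depends only on which lines are grouped together, not on the label names, so the model's `T1`..`T9` numbering does not need to match the annotator's. A model that splits almost every line into its own thread, as in the output above, loses credit on every nearby pair the annotator placed in the same conversation.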

  14. Issues • Documentation for Elsner code is good, but my Python is not • Integration issues between my data and Elsner code • MEGA Model Optimization Package (megam)
