
Connecting the Dots Between News Articles

Connecting the Dots Between News Articles. KDD '10. Advisor: Jia Ling, Koh. Speaker: Yu Cheng, Hsieh. Outline: Introduction, Scoring a chain, Formalizing story coherence, Measuring influence, Finding a good chain, Evaluation, Interaction models.


Presentation Transcript


  1. Connecting the Dots Between News Articles • KDD '10 • Advisor: Jia Ling, Koh • Speaker: Yu Cheng, Hsieh

  2. Outline • Introduction • Scoring a chain • Formalizing story coherence • Measuring influence • Finding a good chain • Evaluation • Interaction models

  3. Introduction • Users constantly struggle to keep up with the large amount of content published every day. • With this much data, it is easy to miss the big picture. • This work investigates methods for automatically connecting the dots between articles.

  4. Connecting the mortgage crisis to healthcare • The chain should be coherent • The user should gain a better understanding of how the story progressed

  5. Scoring a chain

  6. Formalizing story coherence

  7. Formalizing story coherence

  8. Formalizing story coherence • Advantages: - Positions similar documents next to each other - Rewards long stretches of shared words • Disadvantages: - Overlooks the importance of individual words - Ignores missing words - Overlooks weak links
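
The formula slides were not transcribed, so the following is only a minimal sketch of the kind of measure these advantages and disadvantages describe. It assumes coherence is scored by counting the words shared by consecutive articles; the function name and the toy articles are hypothetical.

```python
def shared_word_coherence(chain):
    """Sum, over consecutive article pairs, of the number of words they share.

    Assumption: each article is represented as a set of words.
    """
    return sum(len(a & b) for a, b in zip(chain, chain[1:]))


# Toy articles (hypothetical data), each reduced to a set of words.
chain = [
    {"mortgage", "crisis", "bank"},
    {"bank", "crisis", "bailout"},
    {"bailout", "budget", "healthcare"},
]

print(shared_word_coherence(chain))
```

Every shared word counts equally here and only the total matters, which is how such a measure can overlook word importance and hide a single weak link behind strong ones.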

  9. Formalizing story coherence

  10. Formalizing story coherence

  11. Formalizing story coherence

  12. Formalizing story coherence

  13. Formalizing story coherence • Jitteriness: topics that appear and disappear throughout the chain - Only the longest continuous stretch of each word is considered - This way, going back and forth between two topics provides no utility after the first topic switch
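
The longest-continuous-stretch rule can be sketched as follows. This is an illustration only, assuming each article is a set of words; the function name and toy chains are hypothetical.

```python
def longest_stretch(chain, word):
    """Length of the longest run of consecutive articles containing `word`."""
    best = current = 0
    for article in chain:
        # Extend the current run, or reset it when the word is absent.
        current = current + 1 if word in article else 0
        best = max(best, current)
    return best


# Two toy chains over the same topics "a" and "b" (hypothetical data).
jittery = [{"a"}, {"b"}, {"a"}, {"b"}]   # switches topic at every step
smooth = [{"a"}, {"a"}, {"b"}, {"b"}]    # switches topic once

print(longest_stretch(jittery, "a"), longest_stretch(smooth, "a"))
```

In the jittery chain each word's longest stretch is a single article, so returning to a topic earns nothing extra, whereas the smooth chain is credited with a stretch of two articles per topic.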

  14. Formalizing story coherence

  15. Measuring influence

  16. Measuring influence

  17. Measuring influence

  18. Measuring influence

  19. Measuring influence

  20. Finding a good chain

  21. Finding a good chain • Linear Programming - Chain Restriction - Smoothness - Activation Restriction - Minmax Objective

  22. Linear Programming

  23. Linear Programming

  24. Linear Programming

  25. Linear Programming • Minmax Objective - Minedge is the minimum of all active edge scores
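
To illustrate the minmax objective, the sketch below scores a chain by its weakest edge and picks the chain whose weakest edge is strongest. It is not the linear program from the slides: brute-force enumeration is swapped in for the LP, chronological ordering is ignored, and the edge scores, article names, and function names are all hypothetical.

```python
from itertools import permutations


def min_edge(chain, edge_score):
    """Score of the weakest edge between consecutive articles in the chain."""
    return min(edge_score[(a, b)] for a, b in zip(chain, chain[1:]))


def best_chain(source, target, middle, k, edge_score):
    """Brute force stand-in for the LP: try every ordered choice of k
    intermediate articles and keep the chain with the largest min_edge."""
    candidates = ([source, *mid, target] for mid in permutations(middle, k))
    return max(candidates, key=lambda c: min_edge(c, edge_score))


# Hypothetical edge scores: both chains have the same total score (1.1),
# but the chain through "a" contains a weak link.
edge_score = {
    ("s", "a"): 0.9, ("a", "t"): 0.2,
    ("s", "b"): 0.5, ("b", "t"): 0.6,
}

print(best_chain("s", "t", ["a", "b"], 1, edge_score))
```

Maximizing the minimum edge rather than the sum is what makes the objective sensitive to weak links: the chain through `"b"` wins even though its total is no higher.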

  26. Evaluation • More than half a million real news articles were used. • Major news stories of recent years were considered. • For each story, an initial subset of 500 – 10,000 candidate articles was selected by keyword search. • Named entities and noun phrases were extracted from each article (infrequent named entities and non-informative noun phrases were removed).
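
The removal of infrequent features might look like the sketch below. The slides do not give the frequency cutoff, so the `min_articles` threshold, the function name, and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def filter_rare_features(articles, min_articles=2):
    """Drop features that occur in fewer than `min_articles` articles.

    Assumption: each article is a list of extracted features
    (named entities or noun phrases); the threshold is hypothetical.
    """
    # Count each feature once per article (document frequency).
    document_frequency = Counter(f for feats in articles for f in set(feats))
    return [
        [f for f in feats if document_frequency[f] >= min_articles]
        for feats in articles
    ]


# Hypothetical extracted features for three articles.
articles = [
    ["Obama", "healthcare", "senate"],
    ["Obama", "healthcare"],
    ["Obama", "mortgage"],
]

print(filter_rare_features(articles))
```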

  27. Evaluation • Story-linking techniques compared - Connecting-Dots - Shortest-path - Google News Timeline (GNT) - Event threading (TDT)

  28. Evaluation • Shortest-path: a graph is constructed by connecting each document to its nearest neighbor, based on cosine similarity • Google News Timeline (GNT) - Uses a query string to retrieve articles - A query string is constructed for each story, based on s and t - K equally spaced documents are picked between the dates of the original query articles
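
The shortest-path baseline can be sketched roughly as follows. The slides do not say how documents are vectorized or how edges are weighted, so this sketch assumes raw word counts, undirected unweighted edges, and breadth-first search; all names and the toy documents are hypothetical.

```python
import math
from collections import Counter, deque


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest_neighbor_graph(docs):
    """Link every document to its most similar other document (undirected)."""
    vectors = {name: Counter(text.split()) for name, text in docs.items()}
    graph = {name: set() for name in docs}
    for name, vec in vectors.items():
        others = [o for o in vectors if o != name]
        if not others:
            continue
        nearest = max(others, key=lambda o: cosine(vec, vectors[o]))
        graph[name].add(nearest)
        graph[nearest].add(name)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns None if target is unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy documents.
docs = {
    "d1": "mortgage crisis bank",
    "d2": "crisis bank bailout budget",
    "d3": "bailout budget healthcare",
}
graph = nearest_neighbor_graph(docs)
print(shortest_path(graph, "d1", "d3"))
```

Because each document contributes only one edge, the graph can easily be disconnected on real data, in which case this sketch returns `None`.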

  29. Evaluation

  30. Evaluation • 18 users were presented with a pair of source and target articles • Users' familiarity with those articles was gauged • Users were asked whether they believed they knew a coherent story linking the two articles (on a scale of 1 to 5) • Users were asked to indicate - Relevance - Coherence - Non-redundancy

  31. Evaluation

  32. Evaluation

  33. Interaction Models • Refinement: - Users might be especially interested in a specific part of the chain - A refinement may consist of adding a new article, or replacing an article

  34. Interaction Models

  35. Interaction Models

  36. Evaluation • Refinement - Users were shown two chains obtained from the original chain by (1) our local search and (2) adding an article chosen at random from a subset of candidate articles - Users preferred the local-search chains 72% of the time

  37. Evaluation • User interests - Users were shown two chains, one obtained from the other by increasing the importance of 2-3 words - They were then shown a list of ten words containing (1) the words whose importance we increased and (2) randomly chosen words, and asked which words they would pick in order to obtain the second chain from the first - The goal was to see whether users could identify at least some of the words - Users identified at least one word 63.3% of the time

  38. Conclusion & Future Work • Described the problem of connecting the dots • Explored the desired properties of a good story and formalized them as a linear program • Provided an efficient algorithm for connecting two articles • Future direction: allowing more complex tasks
