1 / 13

Topic Detection and Tracking

Topic Detection and Tracking. Introduction and Overview. 5 R&D Challenges : Story Segmentation Topic Tracking Topic Detection First-Story Detection Link Detection. TDT3 Corpus Characteristics : † Two Types of Sources: Text • Speech Two Languages: English 30,000 stories

qabil
Download Presentation

Topic Detection and Tracking

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Topic Detection and Tracking Introduction and Overview

  2. 5 R&D Challenges: Story Segmentation Topic Tracking Topic Detection First-Story Detection Link Detection TDT3 Corpus Characteristics:† Two Types of Sources: Text• Speech Two Languages: English 30,000 stories Mandarin 10,000 stories 11 Different Sources: _8 English__3 MandarinABC CNN VOAPRI VOA XINNBC MNB ZBNAPW NYT TDT Task Overview* * see www.nist.gov/speech/tests/tdt/tdt2000 for details † see www.ldc.upenn.edu/Projects/TDT3/ for details

  3. The TDT Tasks: Story Segmentation Topic Detection First-Story Detection Link Detection No equivalent task in TREC • Topic Tracking Similar to Ad hoc and Filtering, but… TDT – Similarities with TREC • Query versus Topic • Multiple streams • Multiple languages • Limited look-ahead • No supervised update

  4. Preliminaries A topicis … a seminal event or activity, along with alldirectly related events and activities. A storyis … a topically cohesive segment of news that includes two or more DECLARATIVE independent clauses about a single event.

  5. Example Topic Title: Mountain Hikers Lost • WHAT: 35 or 40 young Mountain Hikers were lost in an avalanche in France around the 20th of January. • WHERE: Orres, France • WHEN: January 1998 • RULES OF INTERPRETATION: 5. Accidents

  6. Topic Contrast – Year 2000 vs 1999

  7. Transcription: text (words) (for Radio and TV only) Story: Non-story: The Segmentation Task: To segment the source stream into its constituent stories, for all audio sources.

  8. on-topic unknown unknown Stories tagged asbeing “on topic” training data test data Untagged stories are notguaranteed to be off-topic The Topic Tracking Task: To detect stories that discuss the target topic,in multiple source streams. • Find all the stories that discuss a given target topic • Training: Given a stream of sample stories, with Nt of them known to discuss a given target topic, • Test: Find all subsequent stories that discuss the target topic.

  9. Two Primary Test Conditionsfor the Topic Tracking Task: • One on-topic training story ü • Four on-topic training stories ü • And two off-topic stories that are algorithmically very close to the topicbut that are certified as being “off-topic”. ?

  10. Unsupervised topic trainingA meta-definition of topic is required -independent of topic specifics. • New topics must be detected as the incoming stories are processed. • Input stories are then associated with one of the topics. a topic! The Topic Detection Task: To detect topics in terms of the (clusters of) storiesthat discuss them.

  11. First Stories Time = Topic 1 = Topic 2 Not First Stories The First-Story Detection Task: To detect the first story that discusses a topic, for all topics. • First-Story Detection is essentially the same as Topic Detection. The difference is only in what the system outputs.

  12. same topic? • The topic discussed is a free variable. • Topic definition and annotation is unnecessary. • The link detection task represents a basic functionality, needed to support all applications (including the TDT applications of topic detection and tracking). • The link detection task is related to the topic tracking task, with Nt = 1. The Link Detection Task To detect whether a pair of stories discuss the same topic.

  13. TDT3 Evaluation Methodology • All TDT3 tasks are cast as statistical detection (yes-no) tasks. • Story Segmentation: Is there a story boundary here? • Topic Tracking: Is this story on the given topic? • Topic Detection: Is this story in the correct topic-clustered set? • First-story Detection: Is this the first story on a topic? • Link Detection: Do these two stories discuss the same topic? • Performance is measured in terms of detection cost, which is a weighted sum of miss and false alarm probabilities:CDet = CMiss • PMiss • Ptarget + CFA • PFA • (1- Ptarget) • Detection Cost is normalized to lie between 0 and 1:(CDet)Norm = CDet/ min{CMiss • Ptarget, CFA • (1- Ptarget)}

More Related