New Event Detection & Tracking


  1. New Event Detection & Tracking • ÖZGÜR BAĞLIOĞLU, SÜLEYMAN KARDAŞ, H. ÇAĞDAŞ ÖCALAN, ERKAN UYAR • Bilkent Information Retrieval Group, Computer Engineering Department, Bilkent University

  2. Outline • Introduction • What is a new event detection and tracking system? • Motivation • Related Work • TDT • Google News • NewsInEssence • Proposed System • Test Collection Preparation (TTracker) • Novelty Detection & Event Tracking • C3M concept • Design Details • Future Work • Named Entities with NED • Conclusion First Event Detection & Event Tracking

  3. Introduction • Event • Time, space • Topic • Seminal event or activity • The differences • “Computer virus detected at British Telecom, March 3, 1993” is an event • “Computer virus outbreaks” is a topic

  4. Introduction • New event detection is the task of detecting stories about previously unseen events in a stream of news stories. • Examples: airplane crashes, earthquakes, government elections, etc. • Properties of a new event • When the event occurred • Who was involved • Where it took place • How it happened • Impact, significance or consequence of the event

  5. Introduction • Information filtering system • uses a long-lived profile of a user’s request to identify relevant material in a stream of arriving documents. • In contrast, new event detection has no knowledge of what events will happen in the news, so it must operate without a pre-specified query. • NEDT usage areas • In categorization systems • For people who need to know the latest news: • government analysts, financial analysts, stock market traders • Identifying new e-mails among previous ones

  6. Related Work • Topic Detection and Tracking (TDT) • Research ongoing since 1997 • Broadcast news, written and spoken news stories in multiple languages • Research areas • Story Segmentation - Detect changes between topically cohesive sections • Topic Tracking - Keep track of stories similar to a set of example stories • Topic Detection - Build clusters of stories that discuss the same topic • First Story Detection - Detect if a story is the first story of a new, unknown topic • Link Detection - Detect whether or not two stories are topically linked

  7. Related Work • Google News • A novel approach to News • Uses 4,500 English news sources worldwide • Groups similar stories together • Displays them according to each reader's personalized interests.

  8. Related Work • NewsInEssence • Since 2001 • Summarizes clusters of related news articles from multiple sources on the Web. • Developed by the CLAIR group at the University of Michigan. • Partially funded by the NSF under the ITR program, grant number ITR-0082884.

  9. Proposed System • Handling of Test Data (Milliyet, TRT, Zaman, Haber 7, CNN Türk) • Distribution of the data among collections • Processing the raw data • Test Collection Preparation (TTracker) • Profiles and their properties • Sample profiles from the collection • Novelty Detection & Event Tracking • C3M Concept • Algorithm details • Future Work • Named entities • System evaluation • Conclusion

  10. Handling of Test Data • Data is collected from five different sources: • CNN Türk (http://www.cnnturk.com), • Haber 7 (http://www.haber7.com), • Milliyet Gazetesi (http://www.milliyet.com.tr), • TRT (http://www.trt.net.tr), • Zaman Gazetesi (http://www.zaman.com.tr). • News stories from 2005 that carry time stamps (date and time) are crawled from these sources.

  11. Handling of Test Data • Each source represents a different point of view: • CNN Türk: international, American style • TRT: governmental, more restrictive • Milliyet Gazetesi: modern perspective • Zaman Gazetesi: conservative • Haber 7: provides variety • Hence, the differing perspectives provide a nice challenge when tracking the news.

  12. Handling of Test Data • Statistics about the sources:

  | News Source | No. of News | % of Total News | Average News Length (no. of words) |
  |---|---|---|---|
  | CNN Türk | 31,919 | 14.2 | 270.57 |
  | Haber 7 | 59,304 | 26.3 | 237.85 |
  | Milliyet Gazetesi | 72,506 | 32.1 | 218.34 |
  | TRT | 19,102 | 8.5 | 120.75 |
  | Zaman Gazetesi | 42,749 | 19.0 | 96.76 |
  | All | 225,580 | 100.0 | 199.56 |

  • After crawling the data, the HTML tags are stripped from the text using the HTMLParser library.
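
The slide names an HTMLParser library without saying which one. As a minimal sketch, assuming Python's standard-library `html.parser` as a stand-in, the tag-stripping step could look like the following. The names `TextExtractor` and `strip_html` are hypothetical and not taken from the presentation.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and skip the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse the whitespace runs left behind by removed tags.
        return " ".join(" ".join(self._chunks).split())


def strip_html(raw_html):
    """Return the plain text of an HTML page (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text()
```

Calling `strip_html` on each crawled page would leave only the story text for indexing; a real crawler would probably also need per-site rules to drop menus and advertisements, which this sketch does not attempt.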

  13. Test Collection Preparation (TTracker) • TTracker is a sub-component that collects the test and training data semi-automatically. • It is based on an information retrieval system. • The system lets annotators define profiles and their tracking news. • The system also provides some statistical information about the profiles. • The success of the system will also be compared with manual tracking.

  14. Test Collection Preparation (TTracker) • Profile contents are as follows: • Topic Title: A one- or two-word definition. • Seminal Event: A definition of at most two or three sentences. • What: What happened during the event. • Who: Who was involved in the event. • When: When the event occurred. • Where: Where the event occurred. • Topic Size: Estimated number of tracking news. • Seed: Seed document of the event. • Event Type: Category of the event.
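
The profile fields listed above map naturally onto a small record type. The sketch below is one possible representation, not TTracker's actual schema: the class name `EventProfile`, the field types, and the `as_query` helper (which joins the descriptive fields so the profile can be used as a query) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EventProfile:
    """One event profile; field names mirror the slide, types are assumed."""
    topic_title: str
    seminal_event: str
    what: str
    who: str
    when: str
    where: str
    topic_size: int      # estimated number of tracking news
    seed_doc_id: str     # identifier of the seed document (format assumed)
    event_type: str

    def as_query(self):
        """Join the non-empty descriptive fields into one free-text query."""
        parts = [self.topic_title, self.seminal_event,
                 self.what, self.who, self.when, self.where]
        return " ".join(p.strip() for p in parts if p.strip())
```

Keeping the seed document as an identifier rather than embedding its text keeps the profile small; the retrieval system can look the document up when it is needed.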

  15. Test Collection Preparation (TTracker) • The tracking news are defined in five stages: • Stage 1: Using the seed document as a query. • Stage 2: Using the event profile as a query. • Stage 3: Using tracking news as queries. • Stage 4: Creative query searching. • Stage 5: Quality control of the profile. • After these stages are completed, the quality of the profiles is also checked by administrators. • [Flowchart: Start → Create → Stage 1 → Stage 2 → Stage 3 → Stage 4 → Stage 5 → Finish]

  16. Test Collection Preparation (TTracker) • At each stage, annotators may label a news story as “tracking”, “non-tracking”, “not-sure”, or “not-evaluated”. • Annotators evaluate: • 200 documents for the 1st stage, • 300 documents for the 2nd stage, • 400 documents for the 3rd stage, • 200 documents for each query of the 4th stage.

  17. Test Collection Preparation (TTracker) • Per-profile annotation statistics:

  | | Retrieved | Tracking | Non-Tracking | Not-Sure | Not-Evaluated | Time Spent |
  |---|---|---|---|---|---|---|
  | Avg. | 546 | 89 | 378 | 1 | 77 | 130 |
  | Min. | 221 | 2 | 14 | 0 | 0 | 20 |
  | Max. | 1129 | 454 | 761 | 37 | 614 | 825 |

  • So far, we have collected nearly 60 completed profiles with the valuable contribution of our friends. • We take extra care to avoid bias in the collection: the number of profiles per person, the event types, and the profile lengths are all kept in balance.

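  18. Test Collection Preparation (TTracker) • Example profiles and their life-time statistics:

  | Profile No. | News Title | No. of Tracking News | Life-Time (days) | Tracking News in n Days: n=100 | n=50 | n=25 | n=10 |
  |---|---|---|---|---|---|---|---|
  | 1 | Sahte Rakı (Counterfeit rakı) | 329 | 304 | 298 | 273 | 244 | 185 |
  | 2 | Papa 2. Jean Paul, hastalığı ve ölümü (Pope John Paul II, his illness and death) | 291 | 288 | 287 | 76 | 52 | 34 |
  | 3 | Suriye’yi Lübnan’dan çıkaran suikast (The assassination that drove Syria out of Lebanon) | 318 | 330 | 221 | 166 | 58 | 6 |
  | 4 | Kırgızistan’da kadifemsi “devrim” (Velvet-like “revolution” in Kyrgyzstan) | 179 | 270 | 166 | 147 | 138 | 110 |
  | 5 | Live 8 konserlerinin G-8 zirvesine etkisi (Effect of the Live 8 concerts on the G-8 summit) | 110 | 241 | 94 | 36 | 1 | 0 |
  | 6 | Fransa’nın AB anayasasını referanduma götürmesi (France putting the EU constitution to a referendum) | 329 | 353 | 99 | 52 | 26 | 14 |
  | 7 | Özbekistan’da kanla bastırılan isyan (Uprising bloodily suppressed in Uzbekistan) | 231 | 241 | 206 | 188 | 172 | 33 |
  | 8 | 2005 Eurovision Şarkı Yarışması (2005 Eurovision Song Contest) | 94 | 279 | 53 | 32 | 16 | 1 |
  | 9 | Formula 1 Türkiye Grand Prix (Formula 1 Turkish Grand Prix) | 141 | 308 | 35 | 15 | 8 | 4 |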
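
The slide does not define how the life-time and n-day columns are computed. The sketch below shows one plausible reading, under two loudly stated assumptions: life-time is the number of days between the seed story and the last tracking story, and "tracking news in n days" counts the stories published within the first n days after the seed. The function name `lifetime_stats` is hypothetical.

```python
from datetime import date


def lifetime_stats(seed_date, tracking_dates, windows=(100, 50, 25, 10)):
    """Return (life_time_in_days, {n: stories within the first n days}).

    Assumed definitions, not confirmed by the slides: offsets are measured
    in whole days from the seed story, and day 0 is the seed date itself.
    """
    offsets = [(d - seed_date).days for d in tracking_dates]
    life_time = max(offsets) if offsets else 0
    within = {n: sum(1 for off in offsets if 0 <= off < n) for n in windows}
    return life_time, within
```

Under this reading a profile whose counts stay high even for n=10 describes a burst of coverage right after the seed story, while a profile whose counts fall quickly as n shrinks is spread over the year.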

  19. Test Collection Preparation (TTracker) • Distribution of news over the year for two sample profiles generated with TTracker: • [Figure: two plots, “2005 Eurovision Şarkı Yarışması” (y axis 0 to 8) and “Sahte Rakı” (y axis 0 to 80); x axis: days of 2005; y axis: news amount]

  20. Test Collection Preparation (TTracker) • The system is built on an information retrieval system and works semi-automatically: • TTracker’s recall value will be compared with the recall value of the manual system (=1). • A t-test will be used to measure the correctness of the system.
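
The slides do not say which form of t-test is planned. As a sketch under that uncertainty, the code below assumes a one-sample t-test that compares the per-profile recall of TTracker against the manual recall of 1.0. The function names and the idea of computing recall per profile are assumptions for illustration.

```python
import math
import statistics


def recall(found_ids, manual_ids):
    """Fraction of manually identified tracking stories that were also found."""
    manual = set(manual_ids)
    if not manual:
        raise ValueError("manual judgement set is empty")
    return len(manual & set(found_ids)) / len(manual)


def one_sample_t(values, mu0):
    """t statistic for H0: mean(values) == mu0, using the sample std. dev."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # divides by n - 1
    return (mean - mu0) / (sd / math.sqrt(n))
```

The resulting statistic has n - 1 degrees of freedom; a strongly negative value would indicate that the semi-automatic procedure misses a significant share of the stories that manual tracking finds.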

  21. Proposed System • Novelty Detection & Event Tracking • Novelty detection is: • the identification of new data that a machine learning system was not aware of during training. • one of the fundamental requirements of a good classification or identification system.

  22. Proposed System • A special case of novelty detection... • [Figure: timeline from time 0 to now, showing old news, the window, first events, and tracking events]

  23. Proposed System • Cover Coefficient-Based Clustering Methodology (C3M) [Can and Ozkarahan, 1990] • Single-pass seed algorithm • Working principles are: • Determining the number of clusters • Determining the cluster seeds • Assigning the other documents to the clusters initiated by the seeds • A two-stage probability experiment is performed
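
The three working principles can be sketched in a few lines. This is a simplified reading of Can and Ozkarahan (1990), not the presenters' implementation: it assumes a binary document-term matrix in which every document has at least one term, estimates the number of clusters as the rounded sum of the diagonal cover coefficients, uses the seed power delta_i * (1 - delta_i) * (number of terms in document i), and omits tie handling and the ragbag cluster. The function names are hypothetical.

```python
def cover(d, i, j):
    """Cover coefficient of document i by document j for a document-term matrix d."""
    alpha_i = 1.0 / sum(d[i])                      # reciprocal of row sum
    total = 0.0
    for k in range(len(d[0])):
        col_sum = sum(row[k] for row in d)
        if col_sum:
            total += d[i][k] * d[j][k] / col_sum   # beta_k is 1 / col_sum
    return alpha_i * total


def c3m_cluster(d):
    """Return (sorted seed indices, {document index: seed index})."""
    n_docs = len(d)
    delta = [cover(d, i, i) for i in range(n_docs)]    # decoupling coefficients
    n_clusters = max(1, round(sum(delta)))             # step 1: number of clusters
    # Step 2: rank documents by seed power (binary-matrix form).
    power = [delta[i] * (1.0 - delta[i]) * sum(d[i]) for i in range(n_docs)]
    seeds = sorted(range(n_docs), key=lambda i: power[i], reverse=True)[:n_clusters]
    # Step 3: each non-seed joins the seed that covers it the most.
    assignment = {}
    for i in range(n_docs):
        if i in seeds:
            assignment[i] = i
        else:
            assignment[i] = max(seeds, key=lambda s: cover(d, i, s))
    return sorted(seeds), assignment
```

For example, with the four documents `[[1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1]]`, documents 0 and 2 become seeds, document 1 joins document 0, and document 3 joins document 2.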

  24. Proposed System • C3M CONCEPT • Example D (document-term) and C (cover coefficient) matrices • $C_{ij} = \alpha_i \sum_{k=1}^{m} d_{ik}\,\beta_k\,d_{jk}$
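
The example matrices from the slide did not survive in the transcript, so the following is a small substitute worked example, not the original one. It assumes the usual C3M definitions, which the slide does not spell out: $\alpha_i$ is the reciprocal of the i-th row sum of D and $\beta_k$ is the reciprocal of the k-th column sum, with two documents and m = 3 terms.

```latex
% Assumed definitions: \alpha_i = 1 / \sum_k d_{ik}, \beta_k = 1 / \sum_i d_{ik}
\[
D = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix},
\quad \alpha_1 = \alpha_2 = \tfrac{1}{2},
\quad (\beta_1, \beta_2, \beta_3) = (1, \tfrac{1}{2}, 1)
\]
\[
C_{11} = \alpha_1 \sum_{k=1}^{3} d_{1k}\,\beta_k\,d_{1k}
       = \tfrac{1}{2}\left(1 + \tfrac{1}{2} + 0\right) = \tfrac{3}{4},
\qquad
C_{12} = \alpha_1 \sum_{k=1}^{3} d_{1k}\,\beta_k\,d_{2k}
       = \tfrac{1}{2}\left(0 + \tfrac{1}{2} + 0\right) = \tfrac{1}{4}
\]
\[
C = \begin{pmatrix} 3/4 & 1/4 \\ 1/4 & 3/4 \end{pmatrix}
\]
```

Each row of C sums to 1, which fits the two-stage probability reading: pick a term of document i at random, then pick a document containing that term at random; $C_{ij}$ is the probability of ending at document j.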

  25. Proposed System • NEDT using the C3M concept: • The threshold value δW (for new event detection) depends on: • Window size • Cii of the incoming event • Cij of the incoming event to the other events in the window • δG depends on: • Cluster centroid similarity (Cij) • Cii of the incoming event
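
The window-based decision can be illustrated with a generic first-story detector. This sketch does not reproduce the presenters' thresholds: the threshold is a fixed parameter rather than a function of window size, Cii and Cij, and a Jaccard overlap on term sets is swapped in for the cover coefficient purely to keep the example short. All names are hypothetical.

```python
from collections import deque


def jaccard(a, b):
    """Stand-in similarity on term sets; the slides use cover coefficients."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def detect(stream, window_size, threshold, similarity=jaccard):
    """Yield (story_id, label, best_match_id) for stories in arrival order.

    A story is labelled "tracking" when its best similarity to a story in
    the window reaches the threshold, and "new" otherwise.
    """
    window = deque(maxlen=window_size)  # keeps only the most recent stories
    for story_id, terms in stream:
        best_id, best_sim = None, 0.0
        for old_id, old_terms in window:
            sim = similarity(terms, old_terms)
            if sim > best_sim:
                best_id, best_sim = old_id, sim
        if best_sim >= threshold:
            yield story_id, "tracking", best_id
        else:
            yield story_id, "new", None
        window.append((story_id, terms))
```

Passing a different `similarity` function, for instance one based on cover coefficients, changes the measure without touching the windowing logic.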

  26. Proposed System • Two thresholds should be found: • In the window • In the collection • A possible in-window threshold was selected; it is complicated and was found intuitively through experimental trials... • Results are as follows:

  27. Proposed System • Experiments will be conducted to improve the thresholds using pattern recognition techniques such as: • Mixture of Gaussians • SVM • Decision trees • Another problem in finding the thresholds: • the dataset is not large enough • only 2 features are available • [Figure legend: blue dots: new events; green dots: tracking events; X axis: Cii; Y axis: Cij]

  28. Future Work • Improving NED => Using Named Entities • Topic-conditioned novelty detection (Yang, ..., 2002) • A new similarity measure with semantic classes (Makkonen, ..., 2002) • Modified similarity metrics (Kumaran and Allan, 2004) • Using names and topics (Kumaran and Allan, 2005)

  29. Future Work • Intuition behind named entities: • Who, Where, When • People, organizations, places, dates and times • How to embed named entities into NED • A new similarity matrix • An additional similarity comparison with the extracted named entities
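
One way the "additional similarity comparison" could work is to blend a term-level score with an entity-level score. The sketch below is only an illustration of that idea: the linear weighting, the set-overlap measure, the who/where/when grouping of already extracted entities, and all names are assumptions, and the entity values in any usage are placeholders.

```python
def overlap(a, b):
    """Set overlap in [0, 1]; two empty sets count as no evidence (0.0)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def combined_similarity(story_a, story_b, entity_weight=0.5,
                        categories=("who", "where", "when")):
    """Blend term overlap with the mean entity overlap over the categories.

    Each story is a dict with a "terms" collection and an "entities" dict
    mapping a category name to a collection of entity strings.
    """
    term_sim = overlap(story_a["terms"], story_b["terms"])
    per_category = [
        overlap(story_a["entities"].get(c, ()), story_b["entities"].get(c, ()))
        for c in categories
    ]
    entity_sim = sum(per_category) / len(per_category)
    return (1.0 - entity_weight) * term_sim + entity_weight * entity_sim
```

With `entity_weight=0` the score reduces to plain term overlap, so the effect of the named entities can be measured by varying a single parameter.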

  30. Future Work • Evaluation of the NED • Judge documents • Select random documents from different categories • Annotators judge documents • Same documents are used by our system • Finally, evaluation is done according to precision and recall considering annotators’ judgements
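
The planned evaluation can be sketched as follows. The slides do not say how disagreeing annotators are reconciled, so the majority vote used here is an assumption, as are the function names and the representation of judgements as lists of booleans (True meaning "new event").

```python
def majority_gold(judgements):
    """Map each document to True when more than half of its annotators
    marked it as a new event (assumed reconciliation rule)."""
    return {doc: sum(votes) * 2 > len(votes) for doc, votes in judgements.items()}


def precision_recall(system_new, gold):
    """Precision and recall of the system's new-event decisions."""
    predicted = set(system_new)
    actual = {doc for doc, is_new in gold.items() if is_new}
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall
```

Because the system and the annotators judge the same randomly selected documents, both measures are computed over an identical document set.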

  31. Future Work • Developing an • effective • real-time • Web application capable of • detecting new events • tracking old ones

  32. Conclusion • We covered: • New event detection and tracking concepts • Test collection preparation • Details of the designed system • Goals: • Conduct leading research on Turkish • Make dreams come true in Information Retrieval • “Rising like a sun in the science world” (Fazli Can)

  33. References • Can F. and Ozkarahan E. A. “Concepts and effectiveness of the cover coefficient based clustering methodology for text databases”. 1990. • Kumaran G. and Allan J. “Text classification and named entities for new event detection”. 2004. • Makkonen J., Ahonen-Myka H., and Salmenkivi M. “Applying semantic classes in event detection and tracking”. 2002. • Yang Y., Zhang J., Carbonell J., and Jin C. “Topic-conditioned novelty detection”. 2002.

  34. Questions? • Thanks for your patience...
