Mention-Anomaly-Based Event Detection and Tracking in Twitter
Adrien Guille & Cécile Favre
ERIC Lab, University of Lyon 2, France
IEEE/ACM ASONAM 2014, Beijing, China

What is Twitter & why study it?
• Twitter: micro-blogging service
• 140-character messages
• Ever-growing number of Twitter users
• Pro: timely source of information
• Con: information overload
• How can we use Twitter for automated event detection and tracking?
A. Guille & C. Favre: Mention-Anomaly-Based Event Detection in Twitter

Related Work
• Idea: spot bursty patterns
• Term-weighting-based approaches
    • Peaky Topics [Shamma11], Trending Score [Benhardus13]
    • Possible ambiguity, lack of context
• Topic-modeling-based approaches
    • On-line LDA [Lau12], ET-LDA [Yuheng12]
    • Lack of scalability
• Clustering-based approaches
    • EDCoW [Weng11], TwEvent [Li12], ET [Parikh13]
    • Noisy event descriptions

Issues & Proposal
• Shortcomings of existing methods
    • Event duration is a fixed parameter
    • Only the textual content of tweets is considered
• We propose a novel approach and method that
    • Dynamically estimates the duration of each event
    • Exploits the social aspect of tweet streams through mentions

Proposed Method

Problem Formulation
• Input
    • Corpus C containing N tweets partitioned into n time-slices
    • Vocabularies V and V@
• Output
    • The k most impactful events
• Event: a bursty topic and a value Mag translating its magnitude of impact
• Bursty topic: a time interval I, a main term t, a set S of weighted related terms

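The output structure described on this slide can be sketched as two small record types. This is a minimal illustration only: the class and field names (`BurstyTopic`, `Event`, `top_k`) are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BurstyTopic:
    """A time interval I, a main term t, and a set S of weighted related terms."""
    interval: Tuple[int, int]  # I = (first slice, last slice), assumed inclusive
    main_term: str             # t
    related_terms: Dict[str, float] = field(default_factory=dict)  # S: term -> weight


@dataclass
class Event:
    """A bursty topic together with its magnitude of impact Mag."""
    topic: BurstyTopic
    magnitude: float


def top_k(events: List[Event], k: int) -> List[Event]:
    """Return the k events with the largest magnitude of impact."""
    return sorted(events, key=lambda e: e.magnitude, reverse=True)[:k]
```

Ranking by `magnitude` alone ignores redundancy between events, which the method handles separately in its second phase.
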
Overview of the Proposed Method
• Two-phase flow
    • 1: Analyse the mention frequency of each word in V@ to detect events (Mag, I, t, Ø)
    • 2: Select related words and generate the final list of the k most impactful events while controlling redundancy
• MABED: Mention-Anomaly-Based Event Detection

Proposed Method: Phase 1

Detecting Events with Mention Anomaly
• Computing the anomaly at a point i for word t
    • Requires computing the expected volume of tweets containing at least one mention and t, at i
    • Normal distribution:
    • Expectation:
    • Anomaly:
• Measuring the magnitude of impact
    • Integrating anomaly:

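The slide's formulas are not reproduced in this transcript, so the sketch below rests on stated assumptions rather than on the authors' exact definitions. It assumes that the expected count at slice i is the number of mention-containing tweets in that slice scaled by the overall share of mention-containing tweets that contain t, that the anomaly is the observed count minus this expectation, and that the magnitude over an interval is the sum of the anomalies across its slices. All function names are hypothetical.

```python
from typing import List


def anomaly_series(mention_counts_t: List[int],
                   mention_counts_all: List[int]) -> List[float]:
    """Per-slice anomaly for a word t (assumed form: observed - expected).

    mention_counts_t[i]   -- tweets in slice i with at least one mention AND t
    mention_counts_all[i] -- tweets in slice i with at least one mention
    """
    total_t = sum(mention_counts_t)
    total_all = sum(mention_counts_all)
    anomalies = []
    for observed, n_i in zip(mention_counts_t, mention_counts_all):
        # Assumption: expectation proportional to the slice's mention volume.
        expected = n_i * total_t / total_all
        anomalies.append(observed - expected)
    return anomalies


def magnitude(anomalies: List[float], a: int, b: int) -> float:
    """Magnitude of impact over the inclusive interval [a, b] (discrete sum)."""
    return sum(anomalies[a:b + 1])


# Toy example: t is mentioned far more often than expected in slice 2.
anoms = anomaly_series([2, 2, 14, 2], [100, 100, 100, 100])
print(anoms, magnitude(anoms, 2, 2))
```

A positive anomaly means t appears in mention-containing tweets more often than its overall rate would suggest; negative values are the baseline periods.
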
Detecting Events with Mention Anomaly
• For each word t in V@
    • Solve a "Maximum Contiguous Subsequence Sum" type of problem:
• Eventually, each event is described by
    • A main word t
    • A period of time I
    • The magnitude of its impact Mag

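A "Maximum Contiguous Subsequence Sum" problem can be solved in linear time with Kadane's algorithm. The sketch below applies it to a per-slice anomaly series to recover an interval I and its summed anomaly. The function name and the inclusive-index convention are assumptions made for illustration, not details taken from the slides.

```python
from typing import List, Tuple


def max_anomaly_interval(anomalies: List[float]) -> Tuple[int, int, float]:
    """Return (a, b, total): the inclusive slice interval with the largest sum."""
    if not anomalies:
        raise ValueError("anomaly series must not be empty")
    best_sum = float("-inf")
    best_start = best_end = 0
    current_sum = 0.0
    current_start = 0
    for i, value in enumerate(anomalies):
        if current_sum <= 0:
            # A non-positive running sum cannot help: restart the interval here.
            current_sum = value
            current_start = i
        else:
            current_sum += value
        if current_sum > best_sum:
            best_sum = current_sum
            best_start, best_end = current_start, i
    return best_start, best_end, best_sum


# Toy anomaly series: the burst spans slices 2 to 5.
print(max_anomaly_interval([-3, -3, 9, 4, -2, 5, -20, 1]))
```

Because the interval is found from the data, the event duration is estimated per word rather than fixed in advance.
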
Detecting Events with Mention Anomaly
• Example

Proposed Method: Phase 2

Selecting Words Describing Events
• Identifying candidate words
    • Set of p words that co-occur the most with t during I
• Selecting the most relevant words
    • Measure the similarity between the frequency of each candidate word and that of the main word [Erdem12]
    • Apply a threshold θ

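The two steps above can be sketched as follows. Note the substitution: the slides cite the coefficient of [Erdem12] for the similarity, whereas this sketch uses plain Pearson correlation as a stand-in, rescaled to [0, 1] before thresholding. The rescaling, the function names, and the input layout (tokenized tweets, per-slice frequency lists) are all assumptions.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def candidate_words(tweets: List[List[str]], main_term: str, p: int) -> List[str]:
    """The p words that co-occur most often with main_term (tweets from I only)."""
    counts: Counter = Counter()
    for tokens in tweets:
        if main_term in tokens:
            counts.update(w for w in set(tokens) if w != main_term)
    return [word for word, _ in counts.most_common(p)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; a stand-in, NOT the [Erdem12] coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no trend information
    return cov / sqrt(var_x * var_y)


def select_related(main_series: List[float],
                   candidate_series: Dict[str, List[float]],
                   theta: float) -> Dict[str, float]:
    """Keep candidates whose rescaled similarity to the main word reaches theta."""
    selected = {}
    for word, series in candidate_series.items():
        weight = (pearson(main_series, series) + 1) / 2  # assumed rescaling
        if weight >= theta:
            selected[word] = weight
    return selected
```

With this layout, `candidate_series` would hold one frequency list per candidate word over the slices of I, aligned with `main_series`.
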
Selecting Words Describing Events
• Example

Generating the List of Top k Events
• Event graph & redundancy graph
• Detecting duplicated events
    • Connectivity of main terms in the event graph
    • Overlap between intervals, threshold σ
• Merging duplicated events
    • Identifying connected components in the redundancy graph

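Duplicate detection and merging can be sketched with a union-find structure over event indices. Two points are assumptions, since the slide does not spell them out: "connectivity of main terms" is read as each main term appearing among the other event's related words, and the interval overlap is measured as the shared length divided by the shorter interval's length. Events are represented as hypothetical `(main_term, (a, b), related_words)` tuples.

```python
from typing import List, Set, Tuple

EventTuple = Tuple[str, Tuple[int, int], Set[str]]


def overlap_coefficient(i1: Tuple[int, int], i2: Tuple[int, int]) -> float:
    """Shared slices divided by the shorter interval's length (assumed measure)."""
    shared = max(0, min(i1[1], i2[1]) - max(i1[0], i2[0]) + 1)
    shorter = min(i1[1] - i1[0] + 1, i2[1] - i2[0] + 1)
    return shared / shorter


def merge_duplicates(events: List[EventTuple], sigma: float) -> List[List[int]]:
    """Group event indices into connected components of the redundancy graph."""
    parent = list(range(len(events)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            term_i, interval_i, related_i = events[i]
            term_j, interval_j, related_j = events[j]
            connected = term_i in related_j and term_j in related_i
            if connected and overlap_coefficient(interval_i, interval_j) > sigma:
                parent[find(i)] = find(j)  # add an edge to the redundancy graph

    groups: dict = {}
    for idx in range(len(events)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())
```

Each returned group would then be merged into a single event, for example by keeping the highest-magnitude member and pooling the related words.
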
Generating the List of Top k Events
• Example

Evaluation

Experimental Setup
• Corpora
    • C(en): 1,437,126 tweets published in November 2009
    • C(fr): 2,086,136 tweets published in March 2012
• Baselines for comparison
    • Trending Score (TS) [Benhardus13] and ET [Parikh13]
    • α-MABED
• Parameter setting
    • (α-)MABED: 30-min time-slices, p = 10, θ = 0.7, σ = 0.5
    • Trending Score, ET: 1-day time-slices

Evaluation Metrics
• Manual annotation
    • Two human annotators judge the significance of the top 40 events detected by each method (κ = 0.72)
• Precision
    • Significant events / All detected events
• Recall
    • Distinct significant events / All detected events
• DERate [Li12]
    • Duplicated events / Significant events

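The three ratios above can be computed directly from the annotators' labels. In this sketch each detected event carries either `None` (judged not significant) or an identifier of the real-world event it describes. That label encoding and the function name are assumptions made for illustration, and duplicated events are counted as significant events beyond the first for each identifier.

```python
from typing import List, Optional, Tuple


def evaluate(labels: List[Optional[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, DERate) as defined on the slide.

    labels[i] is None if detected event i is not significant, otherwise an
    identifier shared by all detected events describing the same real event.
    """
    detected = len(labels)
    if detected == 0:
        raise ValueError("no detected events to evaluate")
    significant = [label for label in labels if label is not None]
    distinct = set(significant)
    precision = len(significant) / detected
    recall = len(distinct) / detected
    duplicated = len(significant) - len(distinct)
    derate = duplicated / len(significant) if significant else 0.0
    return precision, recall, derate


# Four detected events: two describe event "A", one describes "B", one is noise.
print(evaluate(["A", "B", "A", None]))
```

Note that recall here is normalized by the number of detected events, as on the slide, not by a ground-truth list of all real events.
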
Quantitative Evaluation
• Performance of the five methods on the two corpora

Quantitative Evaluation
• Impact of σ on MABED

Qualitative Evaluation
• Improved readability
• Excerpt of the list of events detected in C(en) by MABED

Qualitative Evaluation
• Improved temporal precision & reduced redundancy
• Importance of dynamically estimating event duration
    • Politics-related events tend to be discussed longer [Romero11]

Implementation
Included in the open-source social media data mining tool SONDY [Guille13]
http://mediamining.univ-lyon2.fr/people/guille/mabed.php

Time-oriented Interface

Impact-oriented Interface

Topic-oriented Interface

Conclusion & Future Work
• Proposed a novel approach and method for detecting events in Twitter
• Verified hypothesis
    • Considering mentions helps detect significant events
    • Experimental results on two different datasets demonstrate the accuracy and the robustness of the proposed method
• Future work
    • More features to model discussions between users

References
• [Shamma11] D. A. Shamma, L. Kennedy, and E. F. Churchill, “Peaks and persistence: modeling the shape of microblog conversations,” in CSCW, 2011
• [Benhardus13] J. Benhardus and J. Kalita, “Streaming trend detection in twitter,” IJWBC, vol. 9, no. 1, 2013
• [Lau12] J. H. Lau, N. Collier, and T. Baldwin, “On-line trend analysis with topic models: #twitter trends detection topic model online,” in COLING, 2012
• [Yuheng12] H. Yuheng, J. Ajita, D. S. Dorée, and W. Fei, “What were the tweets about? topical associations between public events and twitter feeds,” in ICWSM, 2012
• [Weng11] J. Weng and B.-S. Lee, “Event detection in twitter,” in ICWSM, 2011
• [Li12] C. Li, A. Sun, and A. Datta, “Twevent: Segment-based event detection from tweets,” in CIKM, 2012
• [Parikh13] R. Parikh and K. Karlapalem, “Et: events from tweets,” in companion WWW, 2013
• [Erdem12] O. Erdem, E. Ceyhan, and Y. Varli, “A new correlation coefficient for bivariate time-series data,” in MAF, 2012
• [Guille13] A. Guille, C. Favre, H. Hacid, and D. Zighed, “Sondy: An open source platform for social dynamics mining and analysis,” in SIGMOD, 2013