Geo-spatial Event Detection in the Twitter Stream

Michael Kaisser, AGT International. Berlin Buzzwords, June 3, 2013.



Presentation Transcript


  1. Geo-spatial Event Detection in the Twitter Stream Michael Kaisser, AGT International Berlin Buzzwords, June 3, 2013

  2. Outline • Introduction & Context • Social Media Analysis in a C2 Center • The “Avalanche” event detection approach • Identify posting “hot spots” • Evaluate post clusters with Machine Learning approach • Evaluation • Future work

  3. Background: Social Data • Social Media continuously creates massive amounts of data • E.g. 500 million tweets each day: ~300 GB raw data • Nature of the data: • time-stamped • textual (many languages, lingos & slang, spelling mistakes are rife, only a few words per tweet) • links to pictures • links to newspaper articles (more text) • sometimes geo-spatial (contains coordinates) • Creating real actionable insights from this isn’t an easy problem • → This talk gives one specific example of how this can be done
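
As a rough back-of-the-envelope check of the volume figures on this slide, the numbers work out to about 600 bytes per tweet and just under 6,000 tweets per second. This assumes decimal gigabytes and an evenly spread daily volume, neither of which the talk states.

```python
# Back-of-the-envelope check of the volume figures quoted on the slide.
TWEETS_PER_DAY = 500_000_000        # from the slide
RAW_BYTES_PER_DAY = 300 * 10**9     # ~300 GB; decimal gigabytes are an assumption
SECONDS_PER_DAY = 24 * 60 * 60

# Average raw size of one tweet record.
bytes_per_tweet = RAW_BYTES_PER_DAY / TWEETS_PER_DAY

# Average arrival rate, assuming tweets are spread evenly over the day.
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY

print(f"{bytes_per_tweet:.0f} bytes per tweet")
print(f"{tweets_per_second:.0f} tweets per second on average")
```

Real traffic is bursty, so peak rates would be higher than this average.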

  4. Use case: Urban Management & Public Safety • Cities today are complex and need to be organized • Administration is responsible for keeping population safe • emergency services • health services • fire fighters • police Command & Control Center

  5. Urban Management & Public Safety • Why is Social Media relevant in this context? ?

  6. Urban Management & Public Safety • Why is Social Media relevant in this context? “There's a plane in the Hudson. I'm on the ferry going to pick up the people. Crazy”

  7. Urban Management & Public Safety • Why is Social Media relevant in this context? “De tering, wat een hel!!! 1,4 miljoen mensen op dat terrein! #loveparade” (Dutch: “Damn, what hell!!! 1.4 million people on that site! #loveparade”)

  8. Urban Management & Public Safety • Why is Social Media relevant in this context? “#Hoboken is on fire. Building above Hoboken Farm Corporation at 300 Washington is all smoked out” → Social Media can help create a situational awareness picture

  9. Context: Social Media in a C2 Center

  10. Avalanche: Event detection in a C2 Center

  11. Avalanche: Event detection in a C2 Center

  12. Avalanche: Event detection in a C2 Center

  13. Avalanche: Event detection in a C2 Center

  14. Avalanche: Event detection in a C2 Center

  15. Avalanche: Event detection in a C2 Center

  16. How is it done? • Two-step approach: • Identify locations with high tweet activity • Collect geo-spatial tweet clusters • Evaluate clusters with a Machine Learning approach • Do these clusters constitute a real-world event that the tweeters are witnessing first-hand? • Work in Progress: • Classify events according to type

  17. Machine Learning – What is the task? = geo-located Social Media post (Tweet)

  18. Machine Learning – What is the task? Good: • Suspicious package in #GrandCentral #NYC #bomb threat possibility not sure?? http://t.co/VwU7SP3X • Suspicious package found in Grand Central Station... the 456 train..the trains are closed !! [pic]: http://t.co/9YPki4k2 • Something happened in the #456 #trainstation in #GrandCentral #NYC http://t.co/GGKvQura • Accident on the #456train in #midtown #NYC http://t.co/fj2mJJmf vs. Bad: • RT @refinery29: This image of Madeleine Albright playing the drums will be the best thing you'll see today: http://t.co/rGwQ5RdG • «@_PrettyPoison Guess ill fill out more job apps today» make punna fill out some 2! • The Glamour & Glitz at the 2012 Emmy' s that we loved! http://t.co/CiTFszfL • @IszwanieSyahira: i'm happy and i hope u feel the same too. weeeee ~.~ • How to prepare yourself for Friday's apocalypse http://cnet.co/lPU We need to automatically determine which of the tweet clusters (tweets issued close to each other in a short time frame) represent real-world events and which are just random chatter.

  19. Architecture • We look for geo-spatial clusters of tweets (e.g. 3 or more tweets in a 200m radius, posted within 30 mins) • These become “event candidates” • Event candidates are evaluated with a Machine Learning scheme. • We currently use C4.5 decision trees.
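
The candidate-collection step on this slide can be sketched as follows. This is a naive illustration rather than the Avalanche implementation: the `Tweet` record, the `event_candidates` helper, and the seed-based grouping (each tweet opens a 30-minute window and a 200 m circle around itself) are all assumptions made for the example. Only the thresholds come from the slide.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass(frozen=True)
class Tweet:
    """Hypothetical minimal record for a geo-located tweet."""
    user: str
    lat: float
    lon: float
    ts: float  # posting time in seconds


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def event_candidates(tweets, radius_m=200.0, window_s=30 * 60, min_tweets=3):
    """Naive O(n^2) grouping: every tweet acts as a seed and collects the
    tweets posted within radius_m of it during the following window_s seconds."""
    candidates = []
    for seed in sorted(tweets, key=lambda t: t.ts):
        members = frozenset(
            t for t in tweets
            if 0 <= t.ts - seed.ts <= window_s
            and haversine_m(seed.lat, seed.lon, t.lat, t.lon) <= radius_m
        )
        if len(members) < min_tweets:
            continue
        # Skip groups that are fully contained in an earlier candidate.
        if any(members <= c for c in candidates):
            continue
        candidates.append(members)
    return candidates


# Made-up sample: three nearby tweets in Manhattan, one in Berlin,
# and one at the first location but two hours later.
sample = [
    Tweet("u1", 40.7527, -73.9772, 0),
    Tweet("u2", 40.7530, -73.9770, 300),
    Tweet("u3", 40.7525, -73.9775, 900),
    Tweet("u4", 52.5200, 13.4050, 600),
    Tweet("u1", 40.7527, -73.9772, 7200),
]
cands = event_candidates(sample)
print(len(cands), [sorted(t.user for t in c) for c in cands])
```

A production system at Twitter scale would need a spatial index or grid bucketing instead of the pairwise comparison used here.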

  20. Machine Learning - Features • Tweet cluster: • Suspicious package in #GrandCentral #NYC #bomb threat possibility not sure?? http://t.co/VwU7SP3X • Suspicious package found in Grand Central Station... the 456 train..the trains are closed !! [pic]: http://t.co/9YPki4k2 • Something happened in the #456 #trainstation in #GrandCentral #NYC http://t.co/GGKvQura • Accident on the #456train in #midtown #NYC http://t.co/fj2mJJmf

  21. Scalable Machine Learning … …with Weka! [Diagram legend: blue = training, green = runtime] In offline ML, we train once, but use the predictive model possibly millions of times a day. → It’s okay if training isn’t fast as lightning. → But during execution every CPU cycle can count.

  22. Scalable Machine Learning … …with Weka! … … which can be optimized further in various ways. See e.g. Nima Asadi, Jimmy Lin, Arjen P. de Vries. Runtime Optimizations for Tree-Based Machine Learning Models. IEEE Transactions on Knowledge and Data Engineering, 2013.
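
To illustrate why a trained tree is cheap at runtime: once training is done, a decision tree reduces to a handful of comparisons per cluster. The sketch below hard-codes a tiny hypothetical tree. The two feature names echo the scores plotted on the evaluation slide, but the tree shape and thresholds are invented for illustration and are not the trained model from the talk.

```python
def is_event(unique_posters_score: float, common_theme_score: float) -> bool:
    """Hypothetical hand-written decision tree; thresholds are made up."""
    # Root split: clusters dominated by a single poster are rejected.
    if unique_posters_score < 0.5:
        return False
    # Second split: the posters must also be talking about the same thing.
    if common_theme_score >= 0.4:
        return True
    return False


# Runtime cost per cluster is at most two float comparisons.
print(is_event(unique_posters_score=1.0, common_theme_score=0.8))
print(is_event(unique_posters_score=0.2, common_theme_score=0.9))
```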

  23. Machine Learning - Evaluation • Evaluation setup: • 1,000 hand-labeled tweet clusters. • 319 good, 681 bad. • 10-fold cross validation.

  24. Machine Learning - Evaluation • Evaluation setup: • 1,000 hand-labeled tweet clusters. 319 good, 681 bad. • 10-fold cross validation.

  25. Machine Learning - Evaluation [Scatter plot: Common Theme score (0 to 1) against Unique Posters score (up to 1); blue: event, red: no event] • Evaluation setup: • 1,000 hand-labeled tweet clusters. 319 good, 681 bad. • 10-fold cross validation.
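
A quick sanity check on this setup, sketched under assumptions: with 319 good and 681 bad clusters, always predicting “bad” would already score 68.1% accuracy, which is the baseline any classifier has to beat. The fold assignment below is stratified, meaning each fold keeps roughly the same good/bad ratio. The slides do not say whether stratification was used, so treat that as an assumption. `stratified_folds` is a hypothetical helper, not part of Weka.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds with a similar class mix in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# Class counts from the slide; the labels stand in for the hand-labeled clusters.
labels = ["good"] * 319 + ["bad"] * 681
folds = stratified_folds(labels)

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(labels.count("good"), labels.count("bad")) / len(labels)

print([len(f) for f in folds])
print(f"majority baseline accuracy: {majority_baseline:.3f}")
```

In each of the ten rounds, one fold is held out for testing and the other nine are used for training.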

  26. (Somewhat simplified) Summary • If there are several tweets … • from roughly the same location • at roughly the same time • from different users • that nevertheless use the same words • … chances are good that we have detected an event.
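
The last two criteria in this summary can be turned into simple numeric features. The talk names a “Unique Posters score” and a “Common Theme score” but does not define them, so the formulas below are assumptions chosen for illustration: the share of distinct users in a cluster, and the share of tweets containing the cluster's most widely used word. The stopword list and the user names are likewise made up.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"the", "and", "are", "not", "for", "that", "this", "with"}


def tokens(text):
    """Lowercased set of words in a tweet, ignoring URLs, short words, stopwords."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return {w for w in re.findall(r"[a-z0-9]+", text)
            if len(w) > 2 and w not in STOPWORDS}


def unique_posters_score(users):
    """Fraction of tweets in the cluster that come from distinct users."""
    return len(set(users)) / len(users)


def common_theme_score(texts):
    """Fraction of tweets that contain the cluster's most widely shared word."""
    counts = Counter()
    for text in texts:
        counts.update(tokens(text))  # a set, so each tweet counts once per word
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / len(texts)


# The "good" cluster from the talk; user names are hypothetical.
cluster_texts = [
    "Suspicious package in #GrandCentral #NYC #bomb threat possibility not sure?? http://t.co/VwU7SP3X",
    "Suspicious package found in Grand Central Station... the 456 train..the trains are closed !! [pic]: http://t.co/9YPki4k2",
    "Something happened in the #456 #trainstation in #GrandCentral #NYC http://t.co/GGKvQura",
    "Accident on the #456train in #midtown #NYC http://t.co/fj2mJJmf",
]
cluster_users = ["alice", "bob", "carol", "dave"]

print(unique_posters_score(cluster_users))
print(common_theme_score(cluster_texts))
```

In this example the most shared word is “nyc”, which appears in three of the four tweets.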

  27. Outlook – work in progress and future work • Derive more coordinates • from shared pictures • from toponyms in posts • use image sharing sites directly • Make use of posts without coordinates • and add them to already existing clusters • Explore real-time TF-IDF • to get rid of the Kardashians & Beliebers • Evaluate system with real-world data • Because recall numbers are currently somewhat misleading
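
One way the real-time TF-IDF idea could look is sketched below. It keeps running document frequencies over the stream, so that words appearing everywhere (celebrity chatter, for instance) receive a low weight while rare words stand out. The `StreamingIdf` class and its smoothed formula are assumptions for illustration; the talk lists this item as future work and gives no details.

```python
import math
from collections import Counter


class StreamingIdf:
    """Hypothetical running IDF table, updated one tweet at a time."""

    def __init__(self):
        self.n_docs = 0
        self.doc_freq = Counter()

    def observe(self, words):
        """Update the statistics with the words of one incoming tweet."""
        self.n_docs += 1
        self.doc_freq.update(set(words))

    def idf(self, word):
        """Smoothed inverse document frequency; unseen words score highest."""
        return math.log((1 + self.n_docs) / (1 + self.doc_freq[word])) + 1.0

    def tfidf(self, words):
        """TF-IDF weight for each word of a tweet or tweet cluster."""
        tf = Counter(words)
        return {w: (c / len(words)) * self.idf(w) for w, c in tf.items()}


# Made-up stream in which one word dominates the background chatter.
stream = [
    ["bieber", "concert"],
    ["bieber", "love"],
    ["bieber", "omg"],
    ["explosion", "downtown"],
]
model = StreamingIdf()
for words in stream:
    model.observe(words)

weights = model.tfidf(["bieber", "explosion"])
print(sorted(weights, key=weights.get, reverse=True))
```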

  28. Machine Learning – Relevance Feedback (work in progress) [Diagram labels: Machine Learning Model; Good / Bad; Documents (e.g. tweets, post clusters); Users (journalists, C2 operators)] • Users implicitly rate documents by how they interact with them • User performs follow-up actions → relevant • User clicks document away → irrelevant • → System learns to present more relevant documents • → System can adapt to changing needs over time
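
The implicit-rating idea can be sketched as a small buffer that converts user interactions into labeled training examples for a later retraining run. The action names (`follow_up`, `dismiss`), the `FeedbackBuffer` class, and the feature dictionary are hypothetical; the talk describes this only as work in progress.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from UI interactions to implicit labels.
RELEVANT_ACTIONS = {"follow_up"}
IRRELEVANT_ACTIONS = {"dismiss"}


@dataclass
class FeedbackBuffer:
    """Collects (features, label) pairs derived from user behavior."""
    examples: list = field(default_factory=list)

    def record(self, features, action):
        """Store a labeled example if the action carries a relevance signal."""
        if action in RELEVANT_ACTIONS:
            label = "good"
        elif action in IRRELEVANT_ACTIONS:
            label = "bad"
        else:
            return None  # e.g. merely viewing a document tells us nothing
        self.examples.append((features, label))
        return label


buffer = FeedbackBuffer()
buffer.record({"unique_posters": 1.0, "common_theme": 0.75}, "follow_up")
buffer.record({"unique_posters": 0.3, "common_theme": 0.1}, "dismiss")
buffer.record({"unique_posters": 0.5, "common_theme": 0.5}, "view")
print(len(buffer.examples), [label for _, label in buffer.examples])
```

Periodically retraining on the buffered examples is what would let the model adapt to changing needs over time.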

  29. Image Analysis of shared pictures (work in progress) [Example image: a tweet reading “OMG!!! http://t.co/maiAgHoh”; explosion detected with Image Analysis] • Problem: • Not all tweets contain useful textual information • Shared text might be hard to analyze • Solution: • ~35% of tweets contain linked images • Images provide a wealth of information that can be analyzed • Objects, events, persons • coordinates

  30. Thank you!
