Finding Event-Specific Influencers in Dynamic Social Networks

Finding Event-Specific Influencers in Dynamic Social Networks. Master's Thesis – Chris Schenk, December 1st, 2010. Outline: problem overview; influencers, reputation, validation and security; summary of analysis methods; Boulder fire data; Twitter data.

Presentation Transcript


  1. Finding Event-Specific Influencers in Dynamic Social Networks
     Master's Thesis – Chris Schenk
     December 1st, 2010

  2. Outline
     • Problem overview
     • Influencers, reputation, validation and security
     • Summary of analysis methods
     • Boulder fire data
     • Twitter Data
       • API, formats, collection and data limitations
       • Statistics
     • Finding event-specific influencers – Rankings
       • Stats
       • Hyperlink-Induced Topic Search (HITS)
       • Context-specific in-degree (original work)
     • Conclusions and Future Work

  3. Problem Overview

  4. Influencers
     • Social dynamics vs. online social dynamics
     • Social network features
       • Search, friends, re-tweets
     • Influencers and sheep
     • What is meant by influence?
     • Understanding the data
       • Sampling and baseline statistics
       • Similarity measures, clustering
       • Semantics, intent (NLP)
       • Baseline activity

  5. Influencers – Network Structure
     • Betweenness/Closeness centrality
     • PageRank/TwitterRank/TunkRank
     • Local/Global hierarchical clustering
     • K-core decomposition
     • K-clique percolation
     • Nearest Neighbor Networks
     • Assortative mixing
     • HITS
     • Activity Network
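
K-core decomposition, one of the structural measures listed above, can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: it assumes the follower graph is treated as undirected and supplied as (u, v) pairs, and the function name `core_numbers` is hypothetical. A node with core number k belongs to a subgraph in which every node has at least k neighbors.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as (u, v) pairs.

    Peeling procedure: repeatedly remove a node of minimum remaining degree.
    A node's core number is the largest such minimum degree seen up to the
    moment it is removed.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Linear scan for the minimum; a bucket queue would make this O(V + E).
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core
```

For example, a triangle with one pendant node attached gives core number 2 to the three triangle nodes and 1 to the pendant.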

  6. Twitter Data Stats – Boulder Fire
     • Tweets
       • First day: September 6th, 2010 10:00am to September 7th, 2010 10:00am, Mountain time
       • First week: September 6th, 2010 10:00am to September 13th, 2010 10:00am, Mountain time
     • Social graph
       • Five one-day snapshots beginning September 7th, 2010 12:40pm, Mountain time
     • Tweet example
       • RT @garytx: Article on Twitter's use during #eqnz, #boulderfire, and #sanbrunofire: http://bit.ly/cwI1fi
       • kate30_CU, 2010-09-13 15:29:24+00:00
     • Keywords: boulder, boulderfire, fourmilefire, fourmilecanyon, 4milefire
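
The tweet selection on this slide amounts to a keyword match plus a time-window test. The sketch below is an assumption about how such a filter could look, not the thesis's collection code: it assumes case-insensitive whole-word matching, which may differ from the matching rules of Twitter's Search and Track APIs, and it uses a fixed UTC-6 offset for Mountain Daylight Time, which holds for the whole window because daylight saving time in 2010 ran until November 7. The names `matches_keywords`, `in_window`, `FIRST_DAY`, and `FIRST_WEEK` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Mountain Daylight Time; valid for September 2010 (DST ended November 7, 2010).
MDT = timezone(timedelta(hours=-6), "MDT")

KEYWORDS = {"boulder", "boulderfire", "fourmilefire", "fourmilecanyon", "4milefire"}

# Collection windows from the slide, in Mountain time.
FIRST_DAY = (datetime(2010, 9, 6, 10, 0, tzinfo=MDT), datetime(2010, 9, 7, 10, 0, tzinfo=MDT))
FIRST_WEEK = (datetime(2010, 9, 6, 10, 0, tzinfo=MDT), datetime(2010, 9, 13, 10, 0, tzinfo=MDT))

WORD = re.compile(r"\w+")


def matches_keywords(text):
    """True if any word is an event keyword; '#' is not a word character, so '#boulderfire' yields 'boulderfire'."""
    return any(word.lower() in KEYWORDS for word in WORD.findall(text))


def in_window(created_at, window):
    """Half-open test start <= created_at < end; created_at must be timezone-aware."""
    start, end = window
    return start <= created_at < end
```

Applied to the example tweet, `matches_keywords` finds `boulderfire`, and its timestamp, 2010-09-13 15:29:24 UTC (09:29 MDT), falls inside the first-week window but not the first-day window.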

  7. Qualitatively Influential Users
     • Sixteen users gathered by Jo White
     • Used as “ground truth” data for ranking comparison
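
The slides do not say which metric was used to compare rankings against the sixteen ground-truth users. One simple option, shown here purely as an assumption, is the overlap of a ranking's top k with the ground-truth set: precision at k (the fraction of the top k that are ground-truth users) and recall at k (the fraction of ground-truth users that appear in the top k). The function name is hypothetical; screen names are lowercased because Twitter treats them case-insensitively.

```python
def overlap_at_k(ranking, ground_truth, k):
    """Return (precision@k, recall@k) of a ranked user list against a ground-truth set.

    ranking: list of user names, best first. ground_truth: collection of user names.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    truth = {u.lower() for u in ground_truth}
    top = {u.lower() for u in ranking[:k]}
    hits = len(top & truth)
    return hits / k, (hits / len(truth) if truth else 0.0)
```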

  8. Twitter API and Data Collection
     • Search+Track+REST
     • Unique users for a given event
     • Profiles: periodic collection
     • Friends/Followers: periodic collection
     • Tweets: one-time collection
     • Limitations
       • Rate limits, multi-threading
       • Improper SQL query
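
Rate limits combined with multi-threaded collection are listed as a limitation. A common way to keep several collector threads under one shared request budget is a thread-safe sliding-window limiter. The sketch below is a generic illustration, not code from the thesis; the class name and the `max_calls`/`window` parameters are assumptions and do not reflect Twitter's actual limits. The clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `window` seconds, shared across threads."""

    def __init__(self, max_calls, window, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent acquisitions, oldest first
        self._lock = threading.Lock()

    def acquire(self):
        """Block until a request slot is free, then claim it."""
        while True:
            with self._lock:
                now = self._clock()
                # Drop timestamps that have slid out of the window.
                while self._calls and now - self._calls[0] >= self.window:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                wait = self.window - (now - self._calls[0])
            # Sleep outside the lock so other threads are not blocked meanwhile.
            self._sleep(wait)
```

Each worker thread would call `acquire()` immediately before issuing an API request.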

  9. Tweet Stats

  10. Tweet Stats (cont.)

  11. Graph Stats • Timezone: Mountain

  12. Location Data – U.S.

  13. Location Data – Denver Metro

  14. Location Data – Boulder, Longmont, Broomfield

  15. User “fishnette” Data – Aggregate Hourly Tweet Counts

  16. User “fishnette” Data – Aggregate Monthly Tweet Counts
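
The per-user aggregates on the two preceding slides can be reproduced from a list of tweet timestamps. This is a minimal sketch under stated assumptions: timestamps are timezone-aware `datetime` objects, "hourly" means hour of day summed over all days, and "monthly" means per calendar month. Passing `zoneinfo.ZoneInfo("America/Denver")` as `tz` would bucket by local Mountain time including daylight saving changes. The function names are hypothetical.

```python
from collections import Counter
from datetime import timezone


def hourly_profile(timestamps, tz=timezone.utc):
    """Tweet counts for each hour of day (index 0-23), summed over all days."""
    counts = Counter(ts.astimezone(tz).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def monthly_counts(timestamps, tz=timezone.utc):
    """Tweet counts per calendar month, keyed 'YYYY-MM' in chronological order."""
    counts = Counter(ts.astimezone(tz).strftime("%Y-%m") for ts in timestamps)
    return dict(sorted(counts.items()))
```

The choice of `tz` matters at month boundaries: a tweet at 03:00 UTC on October 1 belongs to September in Mountain time.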

  17. Hashtag Counts

  18. Addressed Messages

  19. Re-tweets
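
The hashtag, addressed-message, and re-tweet statistics rely on classifying each tweet. A minimal sketch, assuming the plain-text conventions of the time (a leading `@user` marks an addressed message, a leading `RT @user` marks a re-tweet) rather than API metadata; the regular expressions and function names are illustrative assumptions.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
RETWEET = re.compile(r"^\s*RT\s+@(\w+)", re.IGNORECASE)
ADDRESSED = re.compile(r"^\s*@(\w+)")


def classify(text):
    """Label a tweet 're-tweet', 'addressed', or 'other' from its leading text."""
    if RETWEET.match(text):
        return "re-tweet"
    if ADDRESSED.match(text):
        return "addressed"
    return "other"


def hashtag_counts(texts):
    """Case-insensitive hashtag frequencies across a collection of tweet texts."""
    return Counter(tag.lower() for text in texts for tag in HASHTAG.findall(text))
```

The example tweet from the Boulder fire slide is classified as a re-tweet and contributes one count each to `eqnz`, `boulderfire`, and `sanbrunofire`.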

  20. Finding Influencers – Rankings
     • Tweets
       • Number of tweets
       • Username mentions
       • Number of re-tweets
     • Graph
       • In-degree
       • HITS
         • All users (sorted by frequency)
         • Active users
         • Mentions
         • Addressed messages (replies)
       • Context-specific in-degree
         • Global followers count
         • Active edges (pre-existing network)
         • New edges
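
The three tweet-based rankings can be computed with simple counters. This sketch assumes tweets arrive as `(author, text)` pairs, counts every `@name` in a tweet as a mention (including the one in an `RT @name` prefix), credits a re-tweet to the user named after `RT @`, and lowercases screen names because Twitter treats them case-insensitively. The function names are hypothetical.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")
RETWEET = re.compile(r"^\s*RT\s+@(\w+)", re.IGNORECASE)


def rank(counter):
    """(user, count) pairs sorted by count descending, ties broken by name."""
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))


def tweet_rankings(tweets):
    """tweets: iterable of (author, text). Returns rankings by tweets, mentions, re-tweets."""
    authored, mentioned, retweeted = Counter(), Counter(), Counter()
    for author, text in tweets:
        authored[author.lower()] += 1
        for name in MENTION.findall(text):
            mentioned[name.lower()] += 1
        rt = RETWEET.match(text)
        if rt:  # credit the original author of a re-tweet
            retweeted[rt.group(1).lower()] += 1
    return rank(authored), rank(mentioned), rank(retweeted)
```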

  21. Rankings - Number of Tweets

  22. Rankings – Username Mentions

  23. Rankings – Re-tweets

  24. Rankings – In-degree (Followers)

  25. Hyperlink-Induced Topic Search (HITS)
     • Hubs: users that link to many authorities
     • Authorities: users that are linked to by many hubs
     • Process
       • Calculate the principal eigenvectors of two matrices
         • Followers adjacency matrix (authorities)
         • Friends adjacency matrix (hubs)
       • Iterative
       • Rank users by eigenvector value, descending
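
The iterative process on this slide is commonly implemented as power iteration. A minimal sketch, assuming the graph is given as `(follower, followed)` pairs, so `in_links` plays the role of the followers lists and `out_links` the friends lists; the function name, iteration cap, and tolerance are assumptions. Alternating the updates a = A^T h and h = A a with normalization converges, for typical graphs, to the principal eigenvectors of A^T A (authorities) and A A^T (hubs).

```python
import math
from collections import defaultdict


def hits(edges, iterations=100, tol=1e-10):
    """Hub and authority scores by power iteration.

    edges: iterable of (follower, followed) pairs; u -> v means u follows v,
    so u acts as a hub pointing at v as an authority.
    """
    out_links = defaultdict(set)  # friends lists
    in_links = defaultdict(set)   # followers lists
    nodes = set()
    for u, v in edges:
        out_links[u].add(v)
        in_links[v].add(u)
        nodes.update((u, v))
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of everyone pointing at the node (a = A^T h).
        new_auth = {n: sum(hub[u] for u in in_links[n]) for n in nodes}
        # Hub update: sum of the new authority scores of everyone the node points at (h = A a).
        new_hub = {n: sum(new_auth[v] for v in out_links[n]) for n in nodes}
        # L2-normalize both vectors so they converge instead of growing without bound.
        for vec in (new_auth, new_hub):
            norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
            for n in vec:
                vec[n] /= norm
        delta = sum(abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n]) for n in nodes)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth
```

Sorting `auth` by value, descending, gives the authority ranking; building `edges` from active users only, or from mention or addressed-message links, gives the other HITS variants.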

  26. Rankings – HITS – All users

  27. Rankings – HITS – Active Users

  28. Rankings – HITS – Mentions

  29. Rankings – HITS – Addressed Msgs.

  30. Context-specific In-degree Ranking
     • Global followers count
       • Periodically download user profiles
       • Calculate change in followers count for each snapshot
       • Rank based on overall change, descending
     • Active edges (includes pre-existing edges)
       • Periodically download friend/follower lists
       • Calculate change in followers count for each snapshot
       • Rank based on overall change, descending
     • New Edges
       • Periodically download friend/follower lists
       • Calculate change in followers count for each snapshot
       • Do not count edges that existed prior to the start of the event
       • Rank based on overall change, descending
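
The three variants differ only in what is counted per snapshot. The sketch below is one reading of the slide, not the thesis code: profile snapshots are assumed to be `{user: followers_count}` dicts and list snapshots `{user: set_of_follower_ids}` dicts, oldest first, with the first list snapshot taken as the pre-event baseline. Restricting followers to an `active_users` set is an assumed interpretation of "active edges", and all function names are hypothetical. Because per-snapshot differences telescope, the overall change equals the last count minus the first.

```python
def rank_by_change(change):
    """(user, change) pairs sorted by change descending, ties broken by name."""
    return sorted(change.items(), key=lambda kv: (-kv[1], kv[0]))


def global_followers_change(profile_snapshots):
    """Overall change in the profile followers count between first and last snapshot."""
    first, last = profile_snapshots[0], profile_snapshots[-1]
    return {user: last[user] - first[user] for user in first if user in last}


def edge_change(list_snapshots, active_users=None, new_only=False):
    """Overall change in follower-list size per user.

    active_users: if given, only followers in this set are counted (assumed
    reading of "active edges"). new_only: ignore followers already present in
    the baseline snapshot, so only edges created during the event count.
    """
    baseline = list_snapshots[0]
    change = {}
    for user, base_followers in baseline.items():
        series = []
        for snapshot in list_snapshots:
            if user not in snapshot:  # skip snapshots where collection missed the user
                continue
            followers = snapshot[user]
            if active_users is not None:
                followers = followers & active_users
            if new_only:
                followers = followers - base_followers
            series.append(len(followers))
        # Sum of per-snapshot differences; this telescopes to series[-1] - series[0].
        change[user] = sum(b - a for a, b in zip(series, series[1:]))
    return change
```

With `new_only=True`, losing a follower who predates the event does not reduce a user's score, so the two edge-based variants can disagree for users who both gain and lose followers.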

  31. Rankings – Global Followers Count

  32. Rankings – Active Edges

  33. Rankings – New Edges

  34. Limitations and Modifications
     • Ongoing influence
       • Can only measure when a user becomes influential
     • Global popularity masking local influence
       • User “andrewhyde”
     • News and bot activity
       • Extra data needed to ignore these users
     • Large events
       • Data collection limitations
     • How important is a de-follow?
       • Can identify individual user activity
     • Identifying the sheep
       • Can equivalently count friends (out-links) created

  35. Conclusions
     • Notions of influence and interaction are heavily dependent on social network features
       • No agreement on definitions
       • Influence is measured through features that not everyone uses, or that not everyone uses in the same way
       • Composability problem
     • HITS ranking no better than global in-degree
     • Context-specific in-degree ranking performs well
       • Needs to be tested on multiple events of varying sizes

  36. Future Work
     • Understanding “baseline” behavior
       • For users active (using keywords) during an event
       • Calculate all given statistics for a user (Klout.com?)
       • Lots of ways to cut the data
       • Composable factors/measures/attributes
     • Explaining new links created
       • Models for searching, re-tweeting, hashtags, #ff, etc.
     • Incorporating blogs, forums, news websites
       • Real-time vs. not
     • Informing algorithms with other techniques
       • NLP and more automation
       • Qualitative analysis (crowdsourcing?)

  37. Thanks! Questions?

  38. Reputation • Definitions? • Scores • Composability • Explicit reputation • Ratings, votes • Implicit reputation • Client • Server

  39. Validation • Ground truth • Authorities • Armies of grad students • Crowd-sourcing? • More data • Cross-referencing • News websites • Blogs • Public health and safety (or other)

  40. Security • Malicious users • Inflation of reputation • Sybil attacks • Reporting • Audience? • Anonymization
