Finding Event-Specific Influencers in Dynamic Social Networks

Master's Thesis – Chris Schenk, December 1st, 2010

Outline • Problem overview • Influencers, reputation, validation and security • Summary of analysis methods • Boulder fire data • Twitter Data • API, formats, collection and data limitations • Statistics • Finding event-specific influencers – Rankings • Stats • Hyperlink-Induced Topic Search (HITS) • Context-specific in-degree (original work) • Conclusions and Future Work

Problem Overview

Influencers • Social dynamics vs online social dynamics • Social network features • Search, friends, re-tweets • Influencers and sheep • What is meant by influence? • Understanding the data • Sampling and baseline statistics • Similarity measures, clustering • Semantics, intent (NLP) • Baseline activity

Influencers – Network Structure • Betweenness/Closeness centrality • PageRank/TwitterRank/TunkRank • Local/Global hierarchical clustering • K-core decomposition • K-clique percolation • Nearest Neighbor Networks • Assortative mixing • HITS • Activity Network

Twitter Data Stats – Boulder Fire • Tweets • First day – September 6th, 2010 10:00am to September 7th, 2010 10:00am, Mountain time • First week – September 6th, 2010 10:00am to September 13th, 2010 10:00am, Mountain time • Social graph • Five one-day snapshots beginning September 7th, 2010 12:40pm, Mountain time • Tweet example • RT @garytx: Article on Twitter's use during #eqnz, #boulderfire, and #sanbrunofire: http://bit.ly/cwI1fi • kate30_CU - 2010-09-13 15:29:24+00:00 • Keywords: boulder, boulderfire, fourmilefire, fourmilecanyon, 4milefire

Qualitatively Influential Users • Sixteen users gathered by Jo White • Used as “ground truth” data for ranking comparison

Twitter API and Data Collection • Search+Track+REST • Unique users for a given event • Profiles • Periodic collection • Friends/Followers • Periodic collection • Tweets • One-time collection • Limitations • Rate limits, multi-threading • Improper SQL query

Tweet Stats

Tweet Stats (cont.)

Graph Stats • Timezone: Mountain

Location Data – U.S.

Location Data – Denver Metro

Location Data – Boulder, Longmont, Broomfield

User “fishnette” Data - Aggregate Hourly Tweet Counts

User “fishnette” Data – Aggregate Monthly Tweet Counts

Hashtag Counts

Addressed Messages

Re-tweets

Finding Influencers - Rankings • Tweets • Number of tweets • Username mentions • Number of re-tweets • Graph • In-degree • HITS • all users (sorted by frequency) • active users • Mentions • addressed messages (replies) • Context-specific in-degree • Global followers count • Active edges (pre-existing network) • New Edges
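
The tweet-based rankings listed above (number of tweets, username mentions, re-tweets) can be sketched as simple counters over the collected tweets. This is a minimal illustration, not the thesis code: the function name `rank_by_tweet_features`, the `(author, text)` input shape, and the use of the "RT @user" text convention to detect re-tweets are all assumptions. Note that under this convention a re-tweet also counts as a mention of the original author.

```python
import re
from collections import Counter

# Assumed conventions: "@name" marks a mention, "RT @name" marks a re-tweet.
MENTION = re.compile(r"@(\w+)")
RETWEET = re.compile(r"\bRT @(\w+)")

def rank_by_tweet_features(tweets):
    """Count tweets, mentions and re-tweets per user.

    tweets: iterable of (author, text) pairs (hypothetical input shape).
    Usernames are lowercased because Twitter handles are case-insensitive.
    """
    tweet_counts = Counter()
    mention_counts = Counter()
    retweet_counts = Counter()
    for author, text in tweets:
        tweet_counts[author.lower()] += 1
        for name in MENTION.findall(text):
            mention_counts[name.lower()] += 1
        for name in RETWEET.findall(text):
            retweet_counts[name.lower()] += 1
    return tweet_counts, mention_counts, retweet_counts
```

Each returned `Counter` can be turned into a ranking with `most_common()`, which lists users from highest to lowest count.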

Rankings - Number of Tweets

Rankings – Username Mentions

Rankings – Re-tweets

Rankings – In-degree (Followers)

Hyperlink-Induced Topic Search (HITS) • Hubs • Those that link to many authorities • Authorities • Those that are linked to by many hubs • Process • Calculate the principal eigenvector of two matrices • Followers adjacency matrix (authorities) • Friends adjacency matrix (hubs) • Iterative • Rankings by highest value descending in eigenvectors
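
The iterative process above can be sketched with the standard HITS power iteration, which converges toward the principal eigenvectors without building the matrices explicitly. This is a generic textbook version, not necessarily the exact procedure used in the thesis; the function names and the `(follower, followed)` edge format are assumptions made for illustration.

```python
import math

def normalize(scores):
    """Scale scores to unit Euclidean length (leave all-zero scores alone)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}

def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a directed graph.

    edges: iterable of (follower, followed) pairs (assumed format).
    A follow edge points from a potential hub to a potential authority.
    """
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the users who follow you.
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub score: sum of authority scores of the users you follow.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth
```

Sorting either dictionary by value in descending order gives the corresponding ranking.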

Rankings – HITS – All users

Rankings – HITS – Active Users

Rankings – HITS – Mentions

Rankings – HITS – Addressed Msgs.

Context-specific In-degree Ranking • Global followers count • Periodically download user profiles • Calculate change in followers count for each snapshot • Rank based on overall change, descending • Active edges (includes pre-existing edges) • Periodically download friend/follower lists • Calculate change in followers count for each snapshot • Rank based on overall change, descending • New Edges • Periodically download friend/follower lists • Calculate change in followers count for each snapshot • Do not count edges that existed prior to the start of the event • Rank based on overall change, descending
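
Two of the rankings above can be sketched as follows, under stated assumptions. `follower_change` takes profile snapshots as `{user: followers_count}` dictionaries and ranks by the difference between the last and first snapshot. `new_edge_in_degree` takes follower-list snapshots as sets of `(follower, followed)` pairs and counts only edges absent before the event. The function names, the input shapes, and the reading of "overall change" as last-minus-first are hypothetical; the slides do not specify them.

```python
from collections import Counter

def follower_change(profile_snapshots):
    """Rank users by change in global followers count.

    profile_snapshots: list of {user: followers_count} dicts, oldest first
    (assumed shape). Users missing from either end snapshot are skipped.
    """
    first, last = profile_snapshots[0], profile_snapshots[-1]
    change = {user: last[user] - first[user] for user in last if user in first}
    return sorted(change.items(), key=lambda item: item[1], reverse=True)

def new_edge_in_degree(edge_snapshots, pre_event_edges):
    """Rank users by followers gained through edges created during the event.

    edge_snapshots: list of sets of (follower, followed) pairs (assumed shape).
    pre_event_edges: set of edges that existed before the event started.
    """
    seen = set()
    for snapshot in edge_snapshots:
        seen |= snapshot
    # Drop edges that existed prior to the start of the event.
    new_edges = seen - pre_event_edges
    counts = Counter(followed for _, followed in new_edges)
    return counts.most_common()
```

This sketch counts an edge as new if it appears in any snapshot, so it does not penalize followers who later leave; a net-change variant would compare only the first and last snapshots.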

Rankings – Global Followers Count

Rankings – Active Edges

Rankings – New Edges

Limitations and Modifications • On-going influence • Can only measure when a user becomes influential • Global popularity masking local influence • User “andrewhyde” • News and bot activity • Extra data needed to ignore these users • Large events • Data collection limitations • How important is a de-follow? • Can identify individual user activity • Identifying the sheep • Can equivalently count friends (out-links) created

Conclusions • Notions of influence and interaction are heavily dependent on social network features • No agreement on definitions • Influence measured by features not 100% in use • Or features not used in the same way by everyone • Composability problem • HITS ranking no better than global in-degree • Context-specific in-degree ranking good! • Needs to be tested on multiple events of varying sizes

Future Work • Understanding “baseline” behavior • For users active (using keywords) during an event • Calculate all given statistics for a user (Klout.com?) • Lots of ways to cut the data • Composable factors/measures/attributes • Explaining new links created • Models for searching, re-tweeting, hashtags, #ff, etc • Incorporating blogs, forums, news websites • Real-time vs not • Informing algorithms with other techniques • NLP and more automation • Qualitative analysis (crowdsourcing?)

Thanks! Questions?

Reputation • Definitions? • Scores • Composability • Explicit reputation • Ratings, votes • Implicit reputation • Client • Server

Validation • Ground truth • Authorities • Armies of grad students • Crowd-sourcing? • More data • Cross-referencing • News websites • Blogs • Public health and safety (or other)

Security • Malicious users • Inflation of reputation • Sybil attacks • Reporting • Audience? • Anonymization
