Learning from the Crowd: Collaborative Filtering Techniques for Identifying On-the-Ground Twitterers during Mass Disruptions • Kate Starbird, University of Colorado Boulder, ATLAS Institute • Grace Muzny, University of Washington, Computer Science • Leysia Palen, University of Colorado Boulder, Computer Science
Sociologists of disaster: After a disaster event, people will converge on the scene to, among other things, offer help • Spontaneous volunteers
Social Media & Mass Disruption Events • Mass Disruption = Mass Convergence
Opportunities for Digital Convergence • Citizen Reporting • Crowd Work!
Challenges of Digital Convergence • Volume • Noise • Misinformation & Disinformation
Signal Noise?
Signal • Original Information: first-hand info; new info coming into the space for the first time • Derivative Behavior: re-sourced info, reposts, links/URLs, network connections • Starbird, K., Palen, L., Hughes, A.L., & Vieweg, S. (2010). Chatter on The Red: What Hazards Threat Reveals about the Social Life of Microblogged Information. CSCW 2010
Signal • Use Derivative Behavior (follows, RTs, @mentions) to find Original Information • Collaborative Filtering • Crowd Work
Learning from the Crowd: A Collaborative Filter for Identifying Locals • Background • Why Identify Locals? • Empirical Study on Crowd Work during Egypt Protests • Test Machine Learning Solution for Identifying Locals • Event - Occupy Wall Street in NYC • Data Collection & Analysis • Findings • Discussion • Leveraging Crowd Work • From Empirical Work to Computational Solutions
Why Help Identify Locals? • Citizen Reporting: first hand info can contribute to situational awareness • Info not already in the larger information space • Digital volunteers often work to identify and create lists of on-the-ground Twitterers
Why Help Identify Locals? • Crisis events vs. protest events • Tunisia Protests - activists tweeting from the ground were a valuable source of info for journalists (Lohan, 2011) • Egypt Protests - protestors on the ground were actively fostering solidarity from the remote crowd (Starbird and Palen, 2012)
Why Help Identify Locals? • Occupy Wall Street (OWS) Protests: Protestors on the ground wanted to publicize their numbers, foster solidarity with the crowd, and solicit assistance • @jeffrae: We could really use a generator down here at Zuccotii Park. Can anyone help? #occupyWallStreet #takewallst #Sept17 • OWS Protests: Remote supporters aggregated and published lists of those on the ground • @CassProphet: Follow on-scene @AACina @Jeffrae @DhaniBagels @Korgasm_ @brettchamberlin #TakeWallStreet #OurWallStreet #OccupyWallStreet #yeswecamp • @djjohnso: We have 20 livetweeters for this list. Are there others? @djjohnso/occupywallstreetlive #takewallstreet #OurWallStreet #needsoftheoccupiers
Empirical Study of Crowd Work during Political Protests
Learning from the Crowd • Empirical Study of Crowd Work during 2011 Egypt Revolution • Collected #egypt #jan25 tweets • 2,229,129 tweets • 338,895 Twitterers • Identified most-RTed Twitterers • Determined location for sample
Learning from the Crowd • Empirical Study of Crowd Work during 2011 Egypt Revolution • Crowd may work to identify on-the-ground Twitterers • Identified several recommendation and user behavior features that had significant relationships to being “on the ground” • More times retweeted = more likely to be on the ground • More unique retweets = more likely to be on the ground • More followers at beginning of event = less likely to be on the ground • Note: Feature not available in tweet metadata. Identified through qualitative analysis, then calculated and evaluated through quantitative analysis.
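The three features named on this slide could be computed from a collected tweet set roughly as follows. This is a minimal sketch, not the authors' code. The tweet dictionary keys (`id`, `author`, `timestamp`, `followers`, `retweet_of`) are hypothetical. "Unique retweets" is read here as the number of distinct tweets by an author that were retweeted at least once, which is an assumption about the slide's meaning. "Followers at beginning of event" is approximated by the follower count on the author's earliest collected tweet.

```python
from collections import defaultdict


def recommendation_features(tweets):
    """Return per-author crowd-recommendation features.

    Each tweet is a dict with hypothetical keys:
      id, author, timestamp, followers (author's follower count when
      the tweet was sent), retweet_of (id of the original tweet or None).
    """
    by_id = {t["id"]: t for t in tweets}
    times_retweeted = defaultdict(int)   # author -> total retweets received
    retweeted_ids = defaultdict(set)     # author -> ids of their tweets that got retweeted
    first_seen = {}                      # author -> (timestamp, followers) of earliest tweet

    for t in tweets:
        author = t["author"]
        if author not in first_seen or t["timestamp"] < first_seen[author][0]:
            first_seen[author] = (t["timestamp"], t["followers"])
        # Only count retweets whose original tweet is in the collection.
        original = by_id.get(t.get("retweet_of"))
        if original is not None:
            times_retweeted[original["author"]] += 1
            retweeted_ids[original["author"]].add(original["id"])

    return {
        author: {
            "times_retweeted": times_retweeted[author],
            "unique_tweets_retweeted": len(retweeted_ids[author]),
            "followers_at_start": followers,
        }
        for author, (_, followers) in first_seen.items()
    }


# Illustrative input only; these tweets are made up for the example.
sample_tweets = [
    {"id": 1, "author": "alice", "timestamp": 10, "followers": 120, "retweet_of": None},
    {"id": 2, "author": "alice", "timestamp": 20, "followers": 180, "retweet_of": None},
    {"id": 3, "author": "bob", "timestamp": 15, "followers": 5000, "retweet_of": 1},
    {"id": 4, "author": "carol", "timestamp": 25, "followers": 40, "retweet_of": 1},
    {"id": 5, "author": "carol", "timestamp": 30, "followers": 41, "retweet_of": 2},
]
features = recommendation_features(sample_tweets)
print(features["alice"])
```

In this toy input, "alice" is retweeted three times across two distinct tweets and starts the collection window with 120 followers.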
Goal: Test Viability of a Machine Learning Solution to Identify Locals using Crowd Recommendation Behavior • Move from Empirical Work to Computational Solution
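To make the goal concrete, here is a hedged sketch of what a classifier over crowd-recommendation features might look like. The slides so far do not name a model, so plain logistic regression trained by gradient descent is used as a stand-in. The feature choice follows the Egypt findings (times retweeted, distinct tweets retweeted, followers at start). The training rows are invented toy numbers, not data from the study, and every function name is hypothetical.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x):
    """Return 1 (on the ground) if the linear score is positive, else 0."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


# Invented toy rows: (times_retweeted, unique_tweets_retweeted,
# followers_at_start, on_the_ground). Not data from the study.
toy = [
    (120, 40, 300, 1),
    (80, 25, 150, 1),
    (200, 60, 900, 1),
    (5, 2, 20000, 0),
    (10, 3, 50000, 0),
    (2, 1, 800, 0),
]
# Counts are heavy-tailed, so take log(1 + x) before standardizing.
X = standardize([[math.log1p(v) for v in row[:3]] for row in toy])
y = [row[3] for row in toy]

w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

The sketch only shows the shape of the pipeline (features in, local / not-local label out). Evaluating such a model would need a held-out, hand-labeled sample rather than the training rows used here.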
Event: Occupy Wall Street Protests • September 15-21, 2011 • NYC site - Zuccotti Park
Data Collection and Sampling • 270,508 Tweets - Search API, Streaming API • 53,296 Total Twitterers • 23,847 Twitterers sent >= 2 tweets, allowing us to capture profile change • Tweets from Streaming API contain Twitter profile information
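The sampling step above (keep Twitterers with at least two tweets, then compare profile snapshots) can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline. Each tweet is assumed to carry a hypothetical `profile` dict captured at send time, alongside hypothetical `author` and `timestamp` keys. The real Streaming API payload has a different structure.

```python
from collections import defaultdict


def sample_repeat_twitterers(tweets, min_tweets=2):
    """Keep authors with at least `min_tweets` tweets.

    For each kept author, list the profile fields that differ between
    their earliest and latest collected tweet.
    """
    by_author = defaultdict(list)
    for t in tweets:
        by_author[t["author"]].append(t)

    sample = {}
    for author, items in by_author.items():
        if len(items) < min_tweets:
            continue  # a single tweet gives only one profile snapshot
        ordered = sorted(items, key=lambda t: t["timestamp"])
        first, last = ordered[0]["profile"], ordered[-1]["profile"]
        changed = sorted(
            k for k in first.keys() | last.keys() if first.get(k) != last.get(k)
        )
        sample[author] = {"tweet_count": len(ordered), "changed_fields": changed}
    return sample


# Illustrative input only; these tweets are made up for the example.
collected = [
    {"author": "alice", "timestamp": 1,
     "profile": {"location": "Brooklyn", "followers": 120}},
    {"author": "alice", "timestamp": 5,
     "profile": {"location": "Zuccotti Park", "followers": 180}},
    {"author": "bob", "timestamp": 2,
     "profile": {"location": "Seattle", "followers": 5000}},
]
sampled = sample_repeat_twitterers(collected)
print(sampled)
```

Here "bob" drops out because a single tweet offers no second snapshot to compare against, which is the reason the slide gives for the >= 2 tweets threshold.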