Location Mining from Online Social Networks Satyen Abrol Advisors: Dr. Latifur Khan, Dr. Bhavani Thuraisingham
Location Mining in Online Social Networks What is the city level home location of a user?
Outline • Introduction and Problem Statement • Different Approaches • Social Graph Based: Our Approaches • Tweethood: Fuzzy k – Closest Friends with Variable Depth • Tweecalization: Label Propagation • Tweeque: Graph Partitioning for Spatio-Temporal Analysis • Experiments and Results • Future Work
Twitter - Basics • Profile fields: Location, # of Followers, # of Following, # of Tweets • Tweets: Maximum 140 Characters
Privacy and Security • Losing locational privacy forever • Users leave field blank, don’t want strangers to know their locations • http://pleaserobme.com/
Trustworthiness To be able to trust/verify the correctness of location mentioned in user profile • Corporate companies use social media for better advertising and marketing • Iran Elections of 2009 • US State Department used Twitter as a source • Trustworthiness is important in such cases
Marketing and Business • Large corporations such as Walmart, Starbucks, and United Airlines use social media • Great tool for inexpensive advertising • Getting feedback from users
The Problem • Many users leave the location field blank in their Twitter profiles • Others do not provide valid geographic information • “Justin Biebers heart”, “NON YA BISNESS!!”, “looking down on u people” • Some provide incorrect locations that actually exist in the real world • “Nothing” in Arizona, “Little Heaven” in Connecticut • Some provide several locations, making it difficult to identify the home location • “CALi b0Y $TuCCiN V3Ga$” – California boy stuck in Las Vegas, NV • About 35% enter just a country, state, county, etc. and no city level location [1]
[1] B. Hecht, L. Hong, B. Suh, E. H. Chi, “Tweets from justin biebers heart: the dynamics of the location field in user profiles”, In SIGCHI ’11.
Location Prediction in Social Networks • Two Approaches • Content Based [1, 2] • Using Social Graph [3, 4, 5]
[1] Z. Cheng, J. Caverlee, and K. Lee, “You are where you tweet: A content-based approach to geo-locating twitter users”. In CIKM ’10.
[2] B. Hecht, L. Hong, B. Suh, E. H. Chi, “Tweets from justin biebers heart: the dynamics of the location field in user profiles”, In SIGCHI ’11.
[3] S. Abrol, L. Khan and B. Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.
[4] S. Abrol, L. Khan and B. Thuraisingham, “Tweecalization: Efficient and intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania.
[5] S. Abrol, L. Khan, “Agglomerative clustering on fuzzy k-closest friends with variable depth for location mining,” The Second IEEE International Conference on Social Computing (SocialCom2010), Aug 20-22, 2010, Minneapolis, Minnesota.
Content Based Approach • Inaccurate – Location in Text not Location of User • Involves Ambiguity: Paris can mean • Paris Hilton • Paris, the capital of France • Paris, a town in Texas • Slow – Uses NLP/ Machine Learning techniques, searches gazetteers
Using Social Graphs • Based on a Japanese proverb: “When the character of a man is not clear to you, look at his friends.” • Relationship between geospatial proximity and friendship • Uses classical data mining algorithms for more accurate results • Faster and can be used for real world applications
Geospatial Proximity and Friendship • Form $10^{12}$ Twitter user pairs and identify the geographic distance between the users in each pair • The curve follows a power law of the form $a(x+b)^{-c}$, with an exponent of $-0.87$
Graph Construction • Vertices (data points) represent users • An edge represents the ‘similarity’ between two users • Deal with special cases • Spammers – follow random people • Celebrities – followed by random people • For these cases the edge weight gets abbreviated (reduced)
Defining Edge Weight • Consists of two components: • Trustworthiness (TW) • Mutual Friends (MF)
Trustworthiness • Fraction of friends that have the same label as the user himself • Intuition: A person who has stayed at the same place all his life will have most friends from the same location and hence high trustworthiness
[Figure: a user with Location: Seattle/WA/USA surrounded by friends and their locations, several of them also Seattle/WA/USA; Trustworthiness: 0.6]
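The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `trustworthiness`, the choice to skip friends without a location, and the sample friend list are assumptions, not taken from the slides.

```python
def trustworthiness(user_location, friend_locations):
    """Fraction of friends whose location label equals the user's own label.

    Assumption: friends with no location (None) are ignored.
    """
    labeled = [loc for loc in friend_locations if loc is not None]
    if not labeled:
        return 0.0
    same = sum(1 for loc in labeled if loc == user_location)
    return same / len(labeled)


# Hypothetical friend list: 3 of 5 labeled friends share the user's city.
friends = [
    "Seattle/WA/USA", "Seattle/WA/USA", "Seattle/WA/USA",
    "Dallas/TX/USA", "Boston/MA/USA", None,
]
tw = trustworthiness("Seattle/WA/USA", friends)
print(round(tw, 2))
```

With this made-up friend list the score works out to 3/5, matching the 0.6 shown in the slide's example.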
Mutual Friends • Use the number of common friends as the similarity measure • Better accuracy • Low time complexity
Defining Edge Weight • Defined as $\mathrm{Weight}_{ij} = \alpha \times \max\{TW(U_i),\, TW(U_j)\} + (1-\alpha) \times MF_{ij}$ • $0 < \alpha < 1$, typically chosen to be around 0.7
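A minimal sketch of the edge weight formula follows. The slides do not say how the mutual friends term MF is scaled, so the helper `mutual_friend_score` (a Jaccard ratio of the two friend sets) is an assumption made only to keep MF in the range 0 to 1; the function names are hypothetical as well.

```python
def mutual_friend_score(friends_i, friends_j):
    """Assumed normalization of MF: shared friends divided by all friends of either user."""
    union = friends_i | friends_j
    if not union:
        return 0.0
    return len(friends_i & friends_j) / len(union)


def edge_weight(tw_i, tw_j, mf_ij, alpha=0.7):
    """Weight_ij = alpha * max(TW(U_i), TW(U_j)) + (1 - alpha) * MF_ij."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * max(tw_i, tw_j) + (1.0 - alpha) * mf_ij


# Hypothetical users: trustworthiness 0.6 and 0.3, two of four friends shared.
mf = mutual_friend_score({"a", "b", "c"}, {"b", "c", "d"})
w = edge_weight(0.6, 0.3, mf)
print(round(w, 2))
```

Here the weight is 0.7 × 0.6 + 0.3 × 0.5. Taking the maximum of the two trustworthiness values means one reliable endpoint is enough to give the edge a high weight.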
Tweethood: Fuzzy k-Closest Friends with Variable Depth • Choose the k “closest” friends of the user • If the location is not found, look further (at greater depth) for the answer • Each node is defined by a vector of locations with their respective probabilities • Boost and aggregate at each step
Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010
Social Network of John Doe CB1 CB2 CB3 CBn
Choose k closest friends of John Doe CB1 CB2 CB3 CBk
Identify Locations • Of the k closest friends (CB1, CB2, CB3, ..., CBk), only one has a location (Seattle, USA); the other three shown have Location: NULL • Result: LOW ACCURACY
What if we have depth = 2?
[Figure: the friends of CB1, CB2, CB3, ..., CBk and their locations: Dallas/TX/USA (twice), Seattle/WA/USA, Sydney/AU, Richardson/TX/USA; the remaining nodes have Location: NULL]
Location Vector for John Doe’s friends (one vector per friend CB1, CB2, CB3, ..., CBk)
• Dallas/TX/USA 0.4, Seattle/WA/USA 0.2, Richardson/TX/USA 0.2, Sydney/AU 0.2
• Dallas/TX/USA 0.33, New Delhi/Delhi/IN 0.33, Sunnyvale/CA/USA 0.33
• Austin/TX/USA 0.50, Minneapolis/MN/USA 0.50
• Plano/TX/USA 0.25, Boulder/CO/USA 0.25, Salt Lake City/UT/USA 0.25, London/GB 0.25
Location Vector for John Doe • Dallas/TX/USA 0.1825 • Seattle/WA/USA 0.05 • Richardson/TX/USA 0.05 • Sydney/AU 0.05 • New Delhi/Delhi/IN 0.0825 • Sunnyvale/CA/USA 0.0825 • Austin/TX/USA 0.125 • Minneapolis/MN/USA 0.125 • Plano/TX/USA 0.0625 • Boulder/CO/USA 0.0625 • Salt Lake City/UT/USA 0.0625 • London/GB 0.0625
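The numbers in John Doe's vector are consistent with a plain average of the four friend vectors (for example, Dallas: (0.4 + 0.33) / 4 = 0.1825). The sketch below reproduces that aggregation step. Equal friend weights are inferred from the numbers, the "boost" step named on the Tweethood slide is not modeled, and the function name `aggregate` is hypothetical.

```python
from collections import defaultdict


def aggregate(friend_vectors, weights=None):
    """Combine friends' location vectors into one vector for the user.

    Assumption: with no weights given, every friend contributes equally.
    """
    if weights is None:
        weights = [1.0 / len(friend_vectors)] * len(friend_vectors)
    combined = defaultdict(float)
    for vector, weight in zip(friend_vectors, weights):
        for location, probability in vector.items():
            combined[location] += weight * probability
    return dict(combined)


friend_vectors = [
    {"Dallas/TX/USA": 0.4, "Seattle/WA/USA": 0.2,
     "Richardson/TX/USA": 0.2, "Sydney/AU": 0.2},
    {"Dallas/TX/USA": 0.33, "New Delhi/Delhi/IN": 0.33,
     "Sunnyvale/CA/USA": 0.33},
    {"Austin/TX/USA": 0.50, "Minneapolis/MN/USA": 0.50},
    {"Plano/TX/USA": 0.25, "Boulder/CO/USA": 0.25,
     "Salt Lake City/UT/USA": 0.25, "London/GB": 0.25},
]
john_doe = aggregate(friend_vectors)
for location, probability in sorted(john_doe.items(), key=lambda kv: -kv[1]):
    print(f"{location}: {probability:.4f}")
```

Passing explicit `weights` (for instance, the edge weights between the user and each friend) would be one way to let closer friends count more, though the slides do not specify this.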
Agglomerative Clustering (before merging) • Dallas/TX/USA 0.1825 • Seattle/WA/USA 0.05 • Richardson/TX/USA 0.05 • Sydney/AU 0.05 • New Delhi/Delhi/IN 0.0825 • Sunnyvale/CA/USA 0.0825 • Austin/TX/USA 0.125 • Minneapolis/MN/USA 0.125 • Plano/TX/USA 0.0625 • Boulder/CO/USA 0.0625 • Salt Lake City/UT/USA 0.0625 • London/GB 0.0625
Agglomerative Clustering (after merging) • {Dallas, Plano, Richardson}/TX/USA 0.295 • Seattle/WA/USA 0.05 • Sydney/AU 0.05 • New Delhi/Delhi/IN 0.0825 • Sunnyvale/CA/USA 0.0825 • Austin/TX/USA 0.125 • Minneapolis/MN/USA 0.125 • Boulder/CO/USA 0.0625 • Salt Lake City/UT/USA 0.0625 • London/GB 0.0625
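The merge of Dallas, Plano, and Richardson (0.1825 + 0.0625 + 0.05 = 0.295) can be sketched as single-linkage agglomerative clustering on geographic distance. The slides do not state the distance measure or stopping rule, so the 50 km threshold, the approximate coordinates, and the function names below are all assumptions for illustration.

```python
import math

# Approximate (latitude, longitude) pairs, used only for illustration.
COORDS = {
    "Dallas/TX/USA": (32.78, -96.80),
    "Plano/TX/USA": (33.02, -96.70),
    "Richardson/TX/USA": (32.95, -96.73),
    "Austin/TX/USA": (30.27, -97.74),
    "Seattle/WA/USA": (47.61, -122.33),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def merge_nearby(vector, coords, max_km=50.0):
    """Repeatedly merge clusters that contain cities within max_km of each other."""
    clusters = [([loc], prob) for loc, prob in vector.items()]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(haversine_km(coords[a], coords[b]) <= max_km
                       for a in clusters[i][0] for b in clusters[j][0]):
                    # Merged cluster keeps all cities and the summed probability.
                    clusters[i] = (clusters[i][0] + clusters[j][0],
                                   clusters[i][1] + clusters[j][1])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


vector = {
    "Dallas/TX/USA": 0.1825, "Plano/TX/USA": 0.0625,
    "Richardson/TX/USA": 0.05, "Austin/TX/USA": 0.125,
    "Seattle/WA/USA": 0.05,
}
clusters = merge_nearby(vector, COORDS)
best_cities, best_prob = max(clusters, key=lambda c: c[1])
print(best_cities, round(best_prob, 4))
```

Only a subset of the twelve locations is included to keep the example short. Austin stays separate because it lies well beyond the assumed threshold from the Dallas area.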
Tweecalization: Label Propagation • The availability of users with a location is limited • Most users do not have a location • Need a method that can learn from unlabeled data
Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania
Tweecalization: Label Propagation • Ideal scenario for semi-supervised learning: only a few friends with locations (labeled data) [1] • Use both labeled and unlabeled data for training • Points which are close to each other are more likely to share a label
[1] Y. Bengio, O. Delalleau, and N. Le Roux, “Label propagation and quadratic criterion,” In O. Chapelle, B. Schölkopf and A. Zien (Eds.), Semi-supervised learning. MIT Press, 2006.
Label Propagation: An Illustration
[Figure: a central user with unknown location (“?”) connected to friends with a location (“CLAMPED LOCATIONS”) and friends without a location]
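The illustration can be turned into a small iterative label propagation sketch: friends with a known location are clamped, and every other node repeatedly takes the weighted average of its neighbors' label distributions. This is a generic version of the technique, not necessarily the exact Tweecalization procedure; the graph, edge weights, and function names are hypothetical.

```python
def build_graph(edges):
    """Turn (node, node, weight) triples into a symmetric adjacency dict."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def propagate_labels(graph, clamped, labels, iterations=50):
    """Iterative label propagation with clamped (labeled) nodes."""
    dist = {n: {lab: 0.0 for lab in labels} for n in graph}
    for node, lab in clamped.items():
        dist[node][lab] = 1.0
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in clamped:
                updated[node] = dist[node]  # labeled nodes never change
                continue
            total = sum(neighbors.values())
            updated[node] = {
                lab: sum(w * dist[m][lab] for m, w in neighbors.items()) / total
                for lab in labels
            }
        dist = updated
    return dist


# Hypothetical graph: central user "u", three labeled friends, one unlabeled.
edges = [
    ("u", "f1", 0.6), ("u", "f2", 0.5), ("u", "f3", 0.4),
    ("u", "f4", 0.5), ("f4", "f1", 0.7),
]
clamped = {"f1": "Dallas/TX/USA", "f2": "Dallas/TX/USA", "f3": "Seattle/WA/USA"}
labels = ["Dallas/TX/USA", "Seattle/WA/USA"]
result = propagate_labels(build_graph(edges), clamped, labels)
predicted = max(result["u"], key=result["u"].get)
print(predicted)
```

The unlabeled friend `f4` still contributes: it picks up a distribution from its own neighbors and passes it on to the central user, which is the benefit of using unlabeled data.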
What About Temporal Analysis? • None of the existing works do temporal analysis • What about migration/ geographical mobility?
Migration/Geographical Mobility • 4% to 6% of people migrate every year, which means 12 to 17 million people each year
Source: United States Census Bureau - Geographical Mobility/Migration Data - http://www.census.gov/hhes/migration/
Migration/Geographical Mobility • Migration as a function of age • People aged 20-29 have a higher probability of moving • High migration rate: college and jobs • Low migration rate: old age, people settle down
Source: United States Census Bureau - Geographical Mobility/Migration Data - http://www.census.gov/hhes/migration/
Facebook Users and Mobility • Let us look at the cumulative effect • Only 28% to 37% are currently living in their hometown Based on our experiments on 300k Public Facebook Profiles
Twitter Users and Mobility • Linking Twitter users to migration • 33% of all Twitter users are aged 25-34 years [1]
[1] Based on findings by ABI Research. Online. Available: http://www.abiresearch.com
Tweeque: Graph Partitioning • How do we know if “this” is the current location for a user? • How do we perform temporal analysis of friendships? • Propose a technique that indirectly infers the current location
Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.
Observation 1: Social Cliques and Location • Our definition: A social clique is an inclusive group of people that share friendship • Apart from friendship, what is the attribute that links members of a clique? Individual Locations • All members of a clique were or are at a particular geographical location at a particular instant of time like college, school, a company, etc.
Observation 2: Migration and Time • As shown previously, over the course of time people have a tendency to migrate • Based on these two observations we hypothesize: • If we can divide the social graph of a particular user into cliques and check the location-based purity of the cliques, we can accurately separate his current location from previous locations • Migration is our latent time factor
Tweeque: An example • Friends from high school in Dallas • Friends from college in Boston • Relatives/Cousins • Friends from job in Seattle
Tweeque: An example All Friends of the User
Tweeque: An example Social Clique #1 (High School) Social Clique #2 (College) Social Clique #3 (Current Work) Social Clique #4 (Relatives)
Tweeque: An Example • Cliques: Relatives, High School, College, Work
[Figure: locations of the friends in each clique, including Singapore, Sydney/Australia, Ontario/Canada, and US cities such as Dallas/TX/USA, Seattle/WA/USA, Boston/MA/USA, Redmond/WA/USA, Portland/OR/USA, Austin/TX/USA, San Diego/CA/USA, New York/NY/USA]
• Purity (Dallas) = 0.32 • Purity (Boston) = 0.45 • Purity (Dallas) = 0.18 • Purity (Seattle) = 0.69
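A possible way to compute clique purity and pick a current location is sketched below. The slides do not define purity formally, so it is assumed here to be the share of a clique's located members who sit at the clique's most common location; choosing the clique with the highest purity as the current location is likewise an assumption suggested by the example, and the clique data and function names are made up.

```python
from collections import Counter


def clique_purity(locations):
    """Return (dominant location, purity) for one clique.

    Assumption: purity = members at the most common location / located members.
    """
    known = [loc for loc in locations if loc is not None]
    if not known:
        return None, 0.0
    location, count = Counter(known).most_common(1)[0]
    return location, count / len(known)


def purest_clique(cliques):
    """Return (clique name, dominant location, purity) for the purest clique."""
    scored = {name: clique_purity(locs) for name, locs in cliques.items()}
    name = max(scored, key=lambda n: scored[n][1])
    return name, scored[name][0], scored[name][1]


# Hypothetical cliques for one user.
cliques = {
    "high_school": ["Dallas/TX/USA", "Dallas/TX/USA",
                    "Austin/TX/USA", "Boston/MA/USA"],
    "college": ["Boston/MA/USA", "Boston/MA/USA", "Boston/MA/USA",
                "New York/NY/USA", "Dallas/TX/USA"],
    "work": ["Seattle/WA/USA", "Seattle/WA/USA",
             "Seattle/WA/USA", "Redmond/WA/USA"],
    "relatives": ["Singapore", "Dallas/TX/USA", "Dallas/TX/USA",
                  "Ontario/Canada", "Sydney/Australia"],
}
name, location, purity = purest_clique(cliques)
print(name, location, round(purity, 2))
```

The intuition matches the slide: friends from earlier life stages have scattered since, so their cliques are less pure, while the clique formed at the current location is still concentrated there.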