Location Mining from Online Social Networks


  1. Location Mining from Online Social Networks • Satyen Abrol • Advisors: Dr. Latifur Khan, Dr. Bhavani Thuraisingham

  2. Location Mining in Online Social Networks What is the city level home location of a user?

  3. Outline • Introduction and Problem Statement • Different Approaches • Social Graph Based: Our Approaches • Tweethood: Fuzzy k – Closest Friends with Variable Depth • Tweecalization: Label Propagation • Tweeque: Graph Partitioning for Spatio-Temporal Analysis • Experiments and Results • Future Work

  5. Twitter - Basics • Profile fields: Location, # of Followers, # of Following, # of Tweets • Tweets: Maximum 140 Characters

  6. Why is location so important?

  7. Privacy and Security • Losing locational privacy forever • Users leave field blank, don’t want strangers to know their locations • http://pleaserobme.com/

  8. Trustworthiness To be able to trust/verify the correctness of location mentioned in user profile • Corporate companies use social media for better advertising and marketing • Iran Elections of 2009 • US State Department used Twitter as a source • Trustworthiness is important in such cases

  9. Marketing and Business • Large corporations such as Walmart, Starbucks, and United Airlines use social media • Great tool for inexpensive advertising • Getting feedback from users

  10. The Problem • Users leave the location field blank in their Twitter profiles • Users do not provide valid geographic information • “Justin Biebers heart”, “NON YA BISNESS!!”, “looking down on u people” • Users provide incorrect locations which may actually exist in the real world • “Nothing” in Arizona, “Little Heaven” in Connecticut • Users provide several locations, making it difficult to identify the home location • “CALi b0Y $TuCCiN V3Ga$” – California boy stuck in Las Vegas, NV • (~35%) enter just country, state, county, etc. and no city level locations [1]
     [1] B. Hecht, L. Hong, B. Suh, E. H. Chi, “Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles”, In SIGCHI ’11.

  11. Outline • Introduction and Problem Statement • Different Approaches • Social Graph Based: Our Approaches • Tweethood: Fuzzy k – Closest Friends with Variable Depth • Tweecalization: Label Propagation • Tweeque: Graph Partitioning for Spatio-Temporal Analysis • Experiments and Results • Future Work

  12. Location Prediction in Social Networks • Two Approaches • Content Based [1, 2] • Using Social Graph [3, 4, 5]
     [1] Z. Cheng, J. Caverlee, and K. Lee, “You are where you tweet: A content-based approach to geo-locating twitter users”, In CIKM ’10.
     [2] B. Hecht, L. Hong, B. Suh, E. H. Chi, “Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles”, In SIGCHI ’11.
     [3] S. Abrol, L. Khan and B. Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.
     [4] S. Abrol, L. Khan and B. Thuraisingham, “Tweecalization: Efficient and intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania.
     [5] S. Abrol, L. Khan, “Agglomerative clustering on fuzzy k-closest friends with variable depth for location mining,” The Second IEEE International Conference on Social Computing (SocialCom2010), Aug 20-22, 2010, Minneapolis, Minnesota.

  13. Content Based Approach • Inaccurate – Location in Text not Location of User • Involves Ambiguity: Paris can mean • Paris Hilton • Paris, the capital of France • Paris, a town in Texas • Slow – Uses NLP/ Machine Learning techniques, searches gazetteers

  14. Using Social Graphs • Based on Japanese Proverb - “When the character of a man is not clear to you, look at his friends.” • Relationship between geospatial proximity and friendship • Uses classical data mining algorithms for more accurate results • Faster and can be used for real world applications

  15. Geospatial Proximity and Friendship • Form $10^{12}$ Twitter user pairs and identify the geographic distance between each pair • Curve follows a power law of the form $a(x+b)^{-c}$, with an exponent of -0.87

  16. Graph Construction • Vertices (data points) represent users • Edges represent ‘similarity’ between two users • Deal with special cases • Spammers – follow random people • Celebrities – followed by random people • Edge weight gets reduced in these cases

  17. Defining Edge Weight • Consists of two components: • Trustworthiness (TW) • Mutual Friends (MF)

  18. Trustworthiness • Fraction of friends who have the same label as the user • Intuition: A person who has stayed at the same place all his life will have most friends from the same location and hence high trustworthiness • [Figure: example friend network with nodes labeled “Location: Seattle/WA/USA”; Trustworthiness: 0.6]
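A minimal sketch of the trustworthiness measure described on this slide, assuming each friend carries a single location label. The function name, the handling of users with no friends, and the sample data are illustrative assumptions, not taken from the presentation.

```python
def trustworthiness(user_location, friend_locations):
    """Fraction of a user's friends whose location label equals the user's own."""
    if not friend_locations:
        # Assumption: a user with no labeled friends gets a trustworthiness of 0.
        return 0.0
    same = sum(1 for loc in friend_locations if loc == user_location)
    return same / len(friend_locations)


# Hypothetical data: 3 of 5 friends share the user's label.
friends = ["Seattle/WA/USA"] * 3 + ["Dallas/TX/USA", "Boston/MA/USA"]
tw = trustworthiness("Seattle/WA/USA", friends)
print(tw)
```

With three of five friends in the same city, the sketch yields the same value as the slide's example (0.6).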

  19. Mutual Friends • Chose the number of common friends as the similarity measure • Better Accuracy • Low Time Complexity

  20. Defining Edge Weight • Defined as $\mathrm{Weight}_{ij} = \alpha \times \max\{TW(U_i), TW(U_j)\} + (1-\alpha) \times MF_{ij}$ • $0 < \alpha < 1$, typically chosen to be around 0.7
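The edge weight formula can be sketched as follows. The slides do not say how the mutual-friend term MF is scaled, so this sketch assumes it is normalized to [0, 1] as the share of common friends among all friends of either user; `mutual_friend_score` and that normalization are assumptions made for illustration.

```python
def mutual_friend_score(friends_i, friends_j):
    """Assumed normalization: common friends divided by all friends of either user."""
    a, b = set(friends_i), set(friends_j)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_weight(tw_i, tw_j, mf_ij, alpha=0.7):
    """Weight_ij = alpha * max(TW(U_i), TW(U_j)) + (1 - alpha) * MF_ij."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * max(tw_i, tw_j) + (1.0 - alpha) * mf_ij


# Hypothetical users: two common friends out of four distinct friends overall.
mf = mutual_friend_score(["ann", "bob", "cat"], ["bob", "cat", "dan"])
w = edge_weight(tw_i=0.6, tw_j=0.4, mf_ij=mf)
print(mf, w)
```

Taking the maximum of the two trustworthiness values means one well-anchored endpoint is enough to make the edge informative. The default of 0.7 follows the value the slide calls typical.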

  21. Outline • Introduction and Problem Statement • Different Approaches • Social Graph Based: Our Approaches • Tweethood: Fuzzy k – Closest Friends with Variable Depth • Tweecalization: Label Propagation • Tweeque: Graph Partitioning for Spatio-Temporal Analysis • Experiments and Results • Future Work

  22. Tweethood: Fuzzy k-Closest Friends with Variable Depth • Choose k “closest” friends for the user • If location is not found, look further for the answer • Each node is defined by a vector having locations with their respective probabilities • Boost and Aggregate at each step
     Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010.

  23. Find the location of John Doe

  24. Social Network of John Doe [Figure: John Doe connected to friends CB1, CB2, CB3, ..., CBn]

  25. Choose k closest friends of John Doe [Figure: John Doe connected to his k closest friends CB1, CB2, CB3, ..., CBk]

  26. Identify Locations • Only one of the k closest friends (CB1, CB2, CB3, ..., CBk) has a location (Seattle, USA); the others are NULL • LOW ACCURACY

  27. What if we have depth=2? [Figure: depth-2 network around CB1, CB2, CB3, ..., CBk; node locations shown: Seattle/WA/USA, NULL, NULL, Dallas/TX/USA, NULL, Sydney/AU, Dallas/TX/USA, NULL, Richardson/TX/USA, NULL]

  28. Location Vector for John Doe’s friends
     • CB1: Dallas/TX/USA 0.4, Seattle/WA/USA 0.2, Richardson/TX/USA 0.2, Sydney/AU 0.2
     • CB2: Dallas/TX/USA 0.33, New Delhi/Delhi/India 0.33, Sunnyvale/CA/USA 0.33
     • CB3: Austin/TX/USA 0.50, Minneapolis/MN/USA 0.50
     • CBk: Plano/TX/USA 0.25, Boulder/CO/USA 0.25, Salt Lake City/UT/USA 0.25, London/London/GB 0.25

  29. Location Vector for John Doe
     • Dallas/TX/USA 0.1825
     • Seattle/WA/USA 0.05
     • Richardson/TX/USA 0.05
     • Sydney/AU 0.05
     • New Delhi/Delhi/IN 0.0825
     • Sunnyvale/CA/USA 0.0825
     • Austin/TX/USA 0.125
     • Minneapolis/MN/USA 0.125
     • Plano/TX/USA 0.0625
     • Boulder/CO/USA 0.0625
     • Salt Lake City/UT/USA 0.0625
     • London/GB 0.0625
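The numbers on this slide are consistent with an equally weighted average of the four friends' location vectors from slide 28 (for example, Dallas: 0.25 × 0.4 + 0.25 × 0.33 = 0.1825). A minimal sketch of that aggregation step, assuming equal weights and omitting the "boost" step, which the slides do not spell out:

```python
from collections import defaultdict


def aggregate(friend_vectors):
    """Average several location-probability vectors with equal weight."""
    total = defaultdict(float)
    for vec in friend_vectors:
        for loc, p in vec.items():
            total[loc] += p / len(friend_vectors)
    return dict(total)


# Friend vectors as listed on slide 28.
friend_vectors = [
    {"Dallas/TX/USA": 0.4, "Seattle/WA/USA": 0.2,
     "Richardson/TX/USA": 0.2, "Sydney/AU": 0.2},
    {"Dallas/TX/USA": 0.33, "New Delhi/Delhi/IN": 0.33, "Sunnyvale/CA/USA": 0.33},
    {"Austin/TX/USA": 0.50, "Minneapolis/MN/USA": 0.50},
    {"Plano/TX/USA": 0.25, "Boulder/CO/USA": 0.25,
     "Salt Lake City/UT/USA": 0.25, "London/GB": 0.25},
]
john = aggregate(friend_vectors)
for loc, p in sorted(john.items(), key=lambda kv: -kv[1]):
    print(f"{loc}: {p:.4f}")
```

The total falls slightly short of 1 because the second vector uses the rounded value 0.33 rather than one third.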

  30. Agglomerative Clustering (before merging)
     • Dallas/TX/USA 0.1825
     • Seattle/WA/USA 0.05
     • Richardson/TX/USA 0.05
     • Sydney/AU 0.05
     • New Delhi/Delhi/IN 0.0825
     • Sunnyvale/CA/USA 0.0825
     • Austin/TX/USA 0.125
     • Minneapolis/MN/USA 0.125
     • Plano/TX/USA 0.0625
     • Boulder/CO/USA 0.0625
     • Salt Lake City/UT/USA 0.0625
     • London/GB 0.0625

  31. Agglomerative Clustering (after merging nearby cities)
     • {Dallas, Plano, Richardson}/TX/USA 0.295
     • Seattle/WA/USA 0.05
     • Sydney/AU 0.05
     • New Delhi/Delhi/IN 0.0825
     • Sunnyvale/CA/USA 0.0825
     • Austin/TX/USA 0.125
     • Minneapolis/MN/USA 0.125
     • Boulder/CO/USA 0.0625
     • Salt Lake City/UT/USA 0.0625
     • London/GB 0.0625
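The merge of Dallas, Plano, and Richardson into one cluster with probability 0.1825 + 0.0625 + 0.05 = 0.295 can be sketched as below. The slides do not state the merge criterion, so this sketch assumes single-linkage merging of cities within a distance threshold. The 50 km threshold and the approximate coordinates are assumptions made for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def merge_nearby(vector, coords, max_km=50.0):
    """Repeatedly merge clusters that contain cities within max_km of each other."""
    clusters = [([loc], p) for loc, p in vector.items()]

    def close(c1, c2):
        return any(haversine_km(coords[a], coords[b]) <= max_km
                   for a in c1[0] for b in c2[0])

    while True:
        pair = next(((i, j) for i in range(len(clusters))
                     for j in range(i + 1, len(clusters))
                     if close(clusters[i], clusters[j])), None)
        if pair is None:
            return clusters
        i, j = pair
        # Merged cluster keeps all member cities and the summed probability.
        clusters[i] = (clusters[i][0] + clusters[j][0],
                       clusters[i][1] + clusters[j][1])
        del clusters[j]


# Approximate coordinates (assumed) for a subset of the cities on the slide.
coords = {
    "Dallas/TX/USA": (32.78, -96.80),
    "Plano/TX/USA": (33.02, -96.70),
    "Richardson/TX/USA": (32.95, -96.73),
    "Austin/TX/USA": (30.27, -97.74),
}
vector = {"Dallas/TX/USA": 0.1825, "Plano/TX/USA": 0.0625,
          "Richardson/TX/USA": 0.05, "Austin/TX/USA": 0.125}
clusters = merge_nearby(vector, coords)
print(clusters)
```

In this subset the three Dallas-area cities end up in one cluster, while Austin, roughly 290 km away, stays separate.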

  32. Tweethood: Algorithm
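The algorithm figure for this slide did not survive in the transcript. The walkthrough on slides 22 to 29 suggests a recursive lookup, sketched here under stated assumptions: `graph`, `locations`, and `weight` are hypothetical inputs, friends with no information are skipped, friend vectors are averaged with equal weight, and the "boost" step is omitted because the slides do not define it.

```python
def location_vector(user, graph, locations, weight, k=3, depth=2):
    """Return {location: probability} for user, looking up to `depth` hops away.

    graph: user -> list of friends
    locations: user -> known location string, or None
    weight: function (u, v) -> closeness score, higher means closer
    """
    known = locations.get(user)
    if known is not None:
        return {known: 1.0}
    if depth == 0:
        return {}
    # Keep only the k closest friends by edge weight.
    closest = sorted(graph.get(user, []),
                     key=lambda f: weight(user, f), reverse=True)[:k]
    vectors = [location_vector(f, graph, locations, weight, k, depth - 1)
               for f in closest]
    vectors = [v for v in vectors if v]  # drop friends with no information
    result = {}
    for v in vectors:
        for loc, p in v.items():
            result[loc] = result.get(loc, 0.0) + p / len(vectors)
    return result


# Hypothetical network: John's friend "a" has no location, but a's friends do.
graph = {"john": ["a", "b"], "a": ["john", "c", "d"], "b": ["john"]}
locations = {"john": None, "a": None, "b": "Seattle/WA/USA",
             "c": "Dallas/TX/USA", "d": "Dallas/TX/USA"}


def uniform(u, v):
    return 1.0  # placeholder closeness; a real weight would use TW and MF


shallow = location_vector("john", graph, locations, uniform, depth=1)
deep = location_vector("john", graph, locations, uniform, depth=2)
print(shallow, deep)
```

At depth 1 only the single labeled friend contributes. At depth 2 the unlabeled friend contributes the locations of its own friends, which is the effect slide 27 illustrates.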

  33. Outline • Introduction and Problem Statement • Different Approaches • Social Graph Based: Our Approaches • Tweethood: Fuzzy k – Closest Friends with Variable Depth • Tweecalization: Label Propagation • Tweeque: Graph Partitioning for Spatio-Temporal Analysis • Experiments and Results • Future Work

  34. Tweecalization: Label Propagation • The number of users with a known location is limited • Most users do not have a location • Need a method that can learn from unlabeled data
     Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania.

  35. Tweecalization: Label Propagation • Ideal scenario for semi-supervised learning: only a few friends with locations (labeled data) [1] • Use both labeled and unlabeled data for training • Points which are close to each other are more likely to share a label
     [1] Y. Bengio, O. Delalleau, and N. Le Roux, “Label propagation and quadratic criterion,” In O. Chapelle, B. Schölkopf and A. Zien (Eds.), Semi-supervised learning. MIT Press, 2006.

  36. Label Propagation: An Illustration [Figure: a central user with unknown location (“?”) connected to friends with locations (“clamped locations”) and friends without locations]

  37. Tweecalization: Algorithm
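The algorithm figure for this slide is missing from the transcript. As a hedged illustration of the idea on slides 34 to 36, here is a common iterative form of label propagation: each unlabeled node repeatedly takes the weighted average of its neighbors' label distributions, while labeled ("clamped") nodes keep their labels. The graph, the weights, and the iteration count are assumptions, and the authors' exact variant may differ.

```python
def propagate_labels(weights, clamped, labels, iterations=50):
    """weights: node -> {neighbor: weight}; clamped: node -> fixed label."""
    dist = {n: {l: 0.0 for l in labels} for n in weights}
    for node, label in clamped.items():
        dist[node][label] = 1.0
    for _ in range(iterations):
        new = {}
        for n in weights:
            total = sum(weights[n].values())
            if n in clamped or total == 0:
                new[n] = dist[n]  # clamped nodes never change
                continue
            new[n] = {l: sum(w * dist[m][l] for m, w in weights[n].items()) / total
                      for l in labels}
        dist = new
    return dist


# Hypothetical graph: central user "u", three labeled friends, one unlabeled friend.
weights = {
    "u": {"f1": 1.0, "f2": 1.0, "f3": 1.0, "f4": 1.0},
    "f1": {"u": 1.0},
    "f2": {"u": 1.0},
    "f3": {"u": 1.0, "f4": 1.0},
    "f4": {"u": 1.0, "f3": 1.0},
}
clamped = {"f1": "Dallas/TX/USA", "f2": "Dallas/TX/USA", "f3": "Seattle/WA/USA"}
labels = ["Dallas/TX/USA", "Seattle/WA/USA"]
result = propagate_labels(weights, clamped, labels)
predicted = max(result["u"], key=result["u"].get)
print(predicted, result["u"])
```

In this toy graph the unlabeled friend f4 is pulled toward Seattle by f3, which in turn shifts some of the central user's probability mass toward Seattle. That is how unlabeled data takes part in the prediction.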

  38. Outline • Introduction and Problem Statement • Different Approaches • Social Graph Based: Our Approaches • Tweethood: Fuzzy k – Closest Friends with Variable Depth • Tweecalization: Label Propagation • Tweeque: Graph Partitioning for Spatio-Temporal Analysis • Experiments and Results • Future Work

  39. What About Temporal Analysis? • None of the existing works do temporal analysis • What about migration/ geographical mobility?

  40. Migration/Geographical Mobility • 4% to 6% of the US population migrates every year, which means 12 to 17 million people each year
     Source: United States Census Bureau - Geographical Mobility/Migration Data - http://www.census.gov/hhes/migration/

  41. Migration/Geographical Mobility • Migration as a function of age • People aged 20-29 have a higher probability of moving • High migration rate: college and jobs • Low migration rate: old age, people settle down
     Source: United States Census Bureau - Geographical Mobility/Migration Data - http://www.census.gov/hhes/migration/

  42. Facebook Users and Mobility • Let us look at the cumulative effect • Only 28% to 37% are currently living in their hometown • Based on our experiments on 300k public Facebook profiles

  43. Twitter Users and Mobility • Linking Twitter users to migration • 33% of all Twitter users are aged 25-34 years, based on findings by ABI Research [1]
     [1] ABI Research. Online. Available: http://www.abiresearch.com

  44. Tweeque: Graph Partitioning • How do we know if “this” is the current location for a user? • How do we perform temporal analysis of friendships? • Propose a technique that indirectly infers the current location
     Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.

  45. Observation 1: Social Cliques and Location • Our definition: A social clique is an inclusive group of people that share friendship • Apart from friendship, what is the attribute that links members of a clique? Individual locations • All members of a clique were or are at a particular geographical location at a particular instant of time, such as a college, school, or company

  46. Observation 2: Migration and Time • As shown previously, over the course of time people have a tendency to migrate • Based on these two observations we hypothesize: • If we can divide the social graph of a particular user into cliques and check for location-based purity of the cliques, we can accurately separate out his current location from previous locations • Migration is our latent time factor

  47. Tweeque: An example • Friends from high school in Dallas • Friends from college in Boston • Relatives/Cousins • Friends from job in Seattle

  48. Tweeque: An example All Friends of the User

  49. Tweeque: An example • Social Clique #1 (High School) • Social Clique #2 (College) • Social Clique #3 (Current Work) • Social Clique #4 (Relatives)

  50. Tweeque: An Example [Figure: the user’s friends grouped into four cliques (Relatives, High School, College, Work), each friend labeled with a location]
     • Locations shown: Singapore, Seattle/WA/USA, Boston/MA/USA, Dallas/TX/USA, Seattle/WA/USA, Sydney/Australia, Portland/OR/USA, Seattle/WA/USA, Dallas/TX/USA, Dallas/TX/USA, Dallas/TX/USA, Austin/TX/USA, Dallas/TX/USA, Seattle/WA/USA, Boston/MA/USA, San Diego/CA/USA, Ontario/Canada, Redmond/WA/USA, Dallas/TX/USA, New York/NY/USA
     • Purity values shown: Purity (Dallas) = 0.32, Purity (Boston) = 0.45, Purity (Dallas) = 0.18, Purity (Seattle) = 0.69
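A minimal sketch of the location-based purity check, assuming the social graph has already been partitioned into cliques. Purity is taken here as the share of a clique's located members who sit at its most common location. The sample cliques are hypothetical and do not reproduce the slide's values. Choosing the dominant location of the purest clique as the current location is also an assumption: it matches the example, where the work clique in Seattle has the highest purity, but the slides do not state it as a rule.

```python
from collections import Counter


def clique_purity(member_locations):
    """Return (dominant location, purity) for one clique; None entries are ignored."""
    known = [loc for loc in member_locations if loc is not None]
    if not known:
        return None, 0.0
    location, count = Counter(known).most_common(1)[0]
    return location, count / len(known)


# Hypothetical cliques for one user.
cliques = {
    "High School": ["Dallas/TX/USA", "Dallas/TX/USA", "Austin/TX/USA",
                    "Boston/MA/USA", None],
    "College": ["Boston/MA/USA", "Boston/MA/USA", "New York/NY/USA",
                "Dallas/TX/USA", "Seattle/WA/USA"],
    "Work": ["Seattle/WA/USA", "Seattle/WA/USA", "Seattle/WA/USA",
             "Redmond/WA/USA"],
}
purities = {name: clique_purity(locs) for name, locs in cliques.items()}

# Assumed selection rule: the purest clique indicates the current location.
best = max(purities, key=lambda name: purities[name][1])
current_location = purities[best][0]
print(purities, current_location)
```

Older cliques such as high school tend to be less pure because their members have since moved, which is the latent time signal described on slide 46.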
