Multiple Location Profiling for Users and Relationships from Social Network and Content
Rui Li, Shengjie Wang, Kevin Chen-Chuan Chang
University of Illinois at Urbana-Champaign
Users’ locations are important for many information services.
• Local content recommendation (content provider)
• Local friends recommendation (social network)
• And many others
Example user: Carol, lives in Los Angeles.
The community has explored social network and content to profile users’ locations.
[Figure: a social network of users (Jean, Mike, Bob, Carol, Lucy, Gaga) with known locations (San Diego, LA, Austin, NY) and unknown ones (?), alongside tweets: “Terrible LA traffic!”, “Want to go to Honolulu for Spring vacation!”, “See Gaga in Hollywood.”, “Good Morning!”]
Profiling a user’s home location. Location: Los Angeles
Problem 1: They only profile a single home location.
• Carol lives in Los Angeles and studied at the University of Texas at Austin.
• Signals used: locations of a user’s friends and tweeted locational words.
• The resulting profiles are incomplete and inaccurate.
Problem 2: They entirely miss profiling relationships.
• Carol and Bob both work in Los Angeles.
• Carol and Lucy both studied in Austin.
• Carol lives in Los Angeles.
• Relationship locations are useful!
We focus on multiple location profiling for users and relationships.
[Figure: the social network of Jean, Mike, Bob, Carol, Lucy, and Gaga (locations San Diego, LA, Austin, NY, and unknown ones) with the tweets “Terrible LA traffic!”, “Want to go to Honolulu for Spring vacation!”, “See Gaga in Hollywood.”, “Good Morning!”]
Carol in the real world: Location: Los Angeles; Education: University of Texas at Austin
Carol’s location profile: Los Angeles, Austin
Carol follows Lucy: Austin, Austin
Our approach is to build a model to connect known relationships with unknown locations.
[Diagram labels: Known Relationships; Unknown Locations; MLP Model; Generation Model; Inference Algorithm]
There are three challenges in building MLP.
• Challenge 1: How to connect users’ locations with relationships?
  • from users’ locations to following relationships
  • from users’ locations to tweeting relationships
• Challenge 2: How to model that the relationships are mixed?
  • some relationships are not based on locations
  • each relationship is based on a different location
• Challenge 3: How to utilize home locations from labeled users?
Challenge 1.A: We need to connect following relationships with two users’ locations.
Even a user with only one location follows others from different locations.
We define the following probability as the probability of generating a following relationship from one user to another based on their locations.
Observation: We explore the following probability by investigating a corpus.
• It captures our intuition well.
• It fits a power-law distribution.
Solution: We derive a location-based following model for the following probability.
[Equation: the location-based following model]
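The slide's formula is not preserved here, so the following is only a minimal sketch of what a power-law following model could look like. It assumes the probability depends on the distance between the two users' locations and takes the form beta * distance ** alpha. The function names, the parameters `alpha`, `beta`, and `min_distance`, and the log-log least-squares fit are illustrative assumptions, not the authors' stated method.

```python
import math


def following_probability(distance, alpha, beta, min_distance=1.0):
    """Power-law sketch: P(follow | distance) = beta * distance ** alpha.

    Assumption: the probability depends only on the distance between the
    two users' locations. Distances are clamped at min_distance so that
    co-located users do not raise 0 to a negative power.
    """
    d = max(distance, min_distance)
    return beta * d ** alpha


def fit_power_law(distances, probabilities):
    """Fit alpha and beta by least squares on log p = log beta + alpha * log d."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = math.exp(mean_y - alpha * mean_x)
    return alpha, beta
```

In this sketch a negative `alpha` encodes the intuition that nearby users are more likely to follow each other, while distant pairs keep a small but nonzero probability.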
Challenge 1.B: We need to connect tweeting relationships with a user’s location.
A user at one location tweets about different locations.
We define the tweeting probability as the probability of generating a tweeting relationship from a user to a venue based on a location.
Observation: We explore the tweeting probability by investigating a corpus.
• The tweeting probabilities capture our intuition well.
• They can be modeled as a set of multinomial distributions.
Solution: We derive a location-based tweeting model for the tweeting probability.
[Equation: the location-based tweeting model]
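As a rough illustration of "a set of multinomial distributions", the sketch below estimates one distribution over venues per location from (location, venue) pairs. The function name, the input format, and the additive smoothing constant are assumptions made for this example; the slides do not say how the distributions are estimated.

```python
from collections import Counter, defaultdict


def estimate_tweeting_model(observations, smoothing=1.0):
    """Estimate one multinomial over venues for each user location.

    observations: iterable of (user_location, venue) pairs.
    smoothing: additive pseudo-count (an assumption of this sketch) so that
    venues never tweeted from a location keep a small nonzero probability.
    """
    counts = defaultdict(Counter)
    venues = set()
    for location, venue in observations:
        counts[location][venue] += 1
        venues.add(venue)

    model = {}
    for location, venue_counts in counts.items():
        total = sum(venue_counts.values()) + smoothing * len(venues)
        model[location] = {
            venue: (venue_counts[venue] + smoothing) / total for venue in venues
        }
    return model
```

For example, if users in Los Angeles mention "Hollywood" far more often than "Honolulu", the Los Angeles distribution puts most of its mass on "Hollywood" while still allowing the occasional distant venue.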
Challenge 2.A: There are both noisy and location-based relationships.
Noisy relationships are not useful!
Solution: We propose a mixture component for the two types of relationships.
• A relationship is generated based on either a location-based model or a random model.
• A binary model selector μ indicates which model is used.
• The selector is generated via a binomial distribution.
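The mixture idea can be sketched as follows. The code assumes that μ = 1 means "location-based" and μ = 0 means "random", and that `rho` is the probability that μ = 1 (a single binomial trial). These conventions and all names are illustrative assumptions, since the slide does not fix them.

```python
import random


def mixture_probability(p_location, p_random, rho):
    """Probability of a relationship under the two-component mixture."""
    return rho * p_location + (1.0 - rho) * p_random


def selector_posterior(p_location, p_random, rho):
    """Posterior probability that the selector mu is 1 (location-based)."""
    joint_location = rho * p_location
    joint_random = (1.0 - rho) * p_random
    return joint_location / (joint_location + joint_random)


def sample_selector(rho, rng):
    """Draw mu from a single binomial trial with success probability rho."""
    return 1 if rng.random() < rho else 0
```

A relationship that the location-based model explains much better than the random model gets a posterior close to 1, which is how noisy relationships can be down-weighted rather than discarded by hand.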
Challenge 2.B: Location-based relationships are related to multiple locations.
• Carol and Lucy both studied in Austin.
• Carol lives in Los Angeles.
Accurate! Complete!
Solution: We fundamentally model users’ multiple locations in generating relationships.
• A location profile is a multinomial distribution over locations, e.g., Carol {Los Angeles 0.1, Austin 0.1, …}.
• Each relationship is based on one particular location from the user’s profile.
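A minimal sketch of these two ideas: a profile is a dictionary mapping locations to probabilities, each relationship draws one location from it, and the top-ranked locations form the user's multiple-location profile. The function names and the example weights used in the tests are hypothetical.

```python
import random


def sample_assignment(profile, rng):
    """Draw the one location that a single relationship is based on."""
    locations = list(profile)
    weights = [profile[loc] for loc in locations]
    return rng.choices(locations, weights=weights)[0]


def top_locations(profile, k=2):
    """Return the k most probable locations in a user's profile."""
    return sorted(profile, key=profile.get, reverse=True)[:k]
```

Because each relationship gets its own draw, one of Carol's relationships can be assigned to Austin while another is assigned to Los Angeles.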
Challenge 3: We should utilize observed locations from some users’ profiles.
[Figure: the social network of Jean, Mike, Bob, Carol, Lucy, and Gaga, with known locations (San Diego, LA, Austin, NY) and unknown ones (?)]
20% of users provide their home locations in their profiles.
• They are useful for profiling locations!
• We cannot use them directly to generate relationships!
Solution: We utilize observed locations as priors to generate users’ profiles.
• We assume users’ profiles are generated from prior distributions.
• Users’ home locations are likely to be generated, e.g., Bob {San Diego 0.9, Los Angeles 0.05, …}.
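One common way to realize such a prior is with Dirichlet-style pseudo-counts, where an observed home location receives extra weight. The sketch below assumes that form; the slide only says "prior distributions", so the Dirichlet choice, the function name, and the `base_prior` and `home_boost` values are all illustrative assumptions.

```python
def profile_posterior_mean(assignment_counts, candidate_locations,
                           home_location=None, base_prior=0.1, home_boost=5.0):
    """Expected location profile under pseudo-count priors.

    assignment_counts: how many of the user's relationships are currently
    assigned to each location. An observed home location receives an extra
    pseudo-count (home_boost), so it is likely, but not forced, to dominate.
    """
    prior = {loc: base_prior for loc in candidate_locations}
    if home_location is not None:
        prior[home_location] = prior.get(home_location, base_prior) + home_boost

    weights = {loc: prior[loc] + assignment_counts.get(loc, 0) for loc in prior}
    total = sum(weights.values())
    return {loc: weight / total for loc, weight in weights.items()}
```

With no relationship evidence the profile simply reflects the prior, and as more relationships are assigned to another location, that location can gain mass alongside the declared home.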
We evaluate our model on a large Twitter corpus. • We crawled a subset of Twitter. • There are 139K users, 50 million tweets and 2 million following relationships.
Task 1: In profiling users’ home locations, MLP performs accurately and improves on the baselines.
Task 2: In profiling users’ multiple locations, MLP performs accurately and completely.
[Figure panels: “Accurately” and “Completely”: precision and recall at rank 2]
[Case studies: locations in a similar region; locations in different areas]
Task 3: In profiling following relationships, MLP achieves 57% accuracy.
Experiment 1
• We use the home locations provided in users’ profiles as ground truth.
• We compare against two baseline methods proposed in the literature.
Experiment 2
• We manually labeled the locations of 1,000 users and obtained 585 users who clearly have multiple locations.
• We compare against the same baseline methods as in the previous task.
• We measure performance in terms of “precision” and “recall”.
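The slides do not spell out how "precision" and "recall" are computed. A plausible reading, assumed here, is the standard set-based definition applied to the top K predicted locations (K = 2 matches "at Rank 2"). The function name and the example data are hypothetical.

```python
def precision_recall_at_k(ranked_predictions, true_locations, k=2):
    """Set-based precision and recall over the top-k predicted locations."""
    top_k = ranked_predictions[:k]
    hits = sum(1 for loc in top_k if loc in true_locations)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_locations) if true_locations else 0.0
    return precision, recall
```

Under this definition, precision reflects whether the predicted locations are accurate, and recall reflects whether the profile is complete.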
Experiment 3
• We manually labeled location assignments for the relationships of the 585 users whose multiple locations are known to us, and obtained 4,426 relationships.
• We design a meaningful baseline method, which profiles a relationship based on the users’ home locations.
We infer users’ locations and location assignments for relationships as latent variables in the joint probability.
• MLP defines the joint probability of observations, parameters, and latent variables.
• We infer users’ locations and location assignments given the observed relationships and the given parameters.
• We develop our algorithm based on the Gibbs sampling method.
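To give a feel for the inference step, here is a heavily simplified Gibbs sampling sketch, not the authors' algorithm. It handles only one user's tweeting relationships, keeps the tweeting model fixed, omits the noise selector and following relationships, and assumes pseudo-count priors on the profile. All names and the tiny fallback probability are assumptions of this example.

```python
import random


def gibbs_location_assignments(venues, tweeting_model, prior,
                               iterations=200, seed=0):
    """Resample the location assignment of each tweeted venue in turn.

    venues: venues tweeted by one user.
    tweeting_model: {location: {venue: probability}}, treated as fixed.
    prior: {location: pseudo-count}, all values must be positive.
    Returns the final assignments and the implied location profile.
    """
    rng = random.Random(seed)
    locations = list(prior)

    # Start from random assignments and count them per location.
    assignments = [rng.choice(locations) for _ in venues]
    counts = {loc: 0 for loc in locations}
    for loc in assignments:
        counts[loc] += 1

    for _ in range(iterations):
        for i, venue in enumerate(venues):
            # Remove the current assignment before resampling it.
            counts[assignments[i]] -= 1
            weights = [
                (counts[loc] + prior[loc]) * tweeting_model[loc].get(venue, 1e-9)
                for loc in locations
            ]
            assignments[i] = rng.choices(locations, weights=weights)[0]
            counts[assignments[i]] += 1

    total = sum(counts[loc] + prior[loc] for loc in locations)
    profile = {loc: (counts[loc] + prior[loc]) / total for loc in locations}
    return assignments, profile
```

Each resampling weight multiplies how popular a location already is in the user's profile by how well that location explains the venue, which is the usual shape of a collapsed Gibbs update for a mixture with fixed components.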