Mining Social Network Big Data. Wang-Chien Lee, Intelligent Pervasive Data Access (iPDA) Group, Pennsylvania State University (wlee@cse.psu.edu)
Research Dimensions [Figure: Networks, Data, and Mobility as the research dimensions of Intelligent Pervasive Data Access] Industry Day
Research Agenda: developing data management techniques to support complex services in networked and mobile environments • Location-Based Services • Road/Transportation Networks • Sensor Data Management • Peer-to-Peer Data Management • Wireless Data Broadcast and Mobile Access • Social Networks
Big Data Landscape
Social Media
Location-Based Social Networks • Important aspects • Users (social network) • Places (locations) • Who visits where, in the form of check-in and trajectory logs
LBSN Applications & Research Opportunities • LBSN users can track and share their locations and relevant information. • Collective social intelligence can be extracted from user-generated location data and leveraged to enable novel applications. • LBSN applications • Suggesting the best restaurants, finding popular hiking routes, or forming a biking community • Recommendation services for locations, activities, trip planning, friends, etc. • Research opportunities • Techniques for LBSN applications, social network analysis, user profiling, data management and mining, pervasive computing, etc., are urgently needed.
Point-of-Interest Recommendation • POI recommendation • Helps a user explore new POIs • Helps local businesses gain customers • Example: where to have dinner tonight? • Requirements • Interests, e.g., seafood • Geo-proximity, e.g., not too far away • Real-time, i.e., time is money
Collaborative Filtering • Treat POIs as items • The idea is that a user's preferences can be inferred from other users who exhibited similar visiting behavior toward POIs in previous check-in activities • The key issue is finding similar users and similar places/POIs effectively and efficiently.
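A minimal user-based CF sketch of the idea above, which also corresponds to the pure CF approach over a User-POI matrix. It assumes binary check-in data (visited or not) and cosine similarity between users; the slides do not specify the similarity measure, and the function names and toy data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two users' binary check-in vectors, given as sets of POIs."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def cf_score(user, poi, checkins):
    """Similarity-weighted vote of the other users on `poi` (1 if visited, else 0)."""
    num = den = 0.0
    for other, pois in checkins.items():
        if other == user:
            continue
        w = cosine(checkins[user], pois)
        num += w * (1.0 if poi in pois else 0.0)
        den += w
    return num / den if den else 0.0

def recommend(user, checkins, k=3):
    """Top-k POIs the user has not visited yet, ranked by CF score (ties broken by name)."""
    candidates = set().union(*checkins.values()) - checkins[user]
    return sorted(candidates, key=lambda p: (-cf_score(user, p, checkins), p))[:k]

checkins = {
    "alice": {"cafe", "museum"},
    "bob": {"cafe", "museum", "sushi"},
    "carol": {"gym", "park"},
}
print(recommend("alice", checkins, k=1))  # ['sushi']
```

Alice shares two POIs with Bob and none with Carol, so Bob's unvisited POI dominates the ranking.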
Social & Geo Influences • POI recommendation in LBSNs is more than a problem of item recommendation • Social network • People may turn to friends for suggestions • Geographical proximity • Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things" • People tend to visit places near • home or office • favored places
Our Approach [Figure: a check-in DB feeding a POI recommendation system built on user preference, social influence, and geo influence] • Incorporate the following three factors: • User preference • Social influence from friends, who play a role in a user's activities • Geographical influence inherent in users' check-in activities
User Preference [Figure: users with similar preferences] • Recommendation based on user preference • i.e., the pure collaborative filtering (CF) approach • User-POI matrix
Social Influence [Figure: a social network of users user1 to user5] • Recommendation based on social influence • Social-influenced CF approach • The similarity function considers both the strength of the social tie and check-in similarity … • Friend-POI matrix
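One plausible form of the social-influenced similarity, sketched under stated assumptions: tie strength is measured as the Jaccard overlap of the two users' friend sets, check-in similarity as the Jaccard overlap of their visited POIs, and the two are blended with a hypothetical weight `eta`. Only the user's friends vote, which corresponds to using the Friend-POI matrix. The slide does not give the exact formula.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def social_similarity(u, v, friends, checkins, eta=0.5):
    """Blend social-tie strength with check-in similarity; eta is a hypothetical weight."""
    tie = jaccard(friends[u], friends[v])       # tie strength as overlap of friend sets
    visits = jaccard(checkins[u], checkins[v])  # overlap of visited POIs
    return eta * tie + (1 - eta) * visits

def social_cf_score(user, poi, friends, checkins, eta=0.5):
    """Score `poi` for `user` from friends' check-ins only (rows of the Friend-POI matrix)."""
    num = den = 0.0
    for f in friends[user]:
        w = social_similarity(user, f, friends, checkins, eta)
        num += w * (1.0 if poi in checkins[f] else 0.0)
        den += w
    return num / den if den else 0.0

friends = {"u1": {"u2", "u3"}, "u2": {"u1", "u3"}, "u3": {"u1", "u2"}}
checkins = {"u1": {"a"}, "u2": {"a", "b"}, "u3": {"c"}}
print(round(social_cf_score("u1", "b", friends, checkins), 3))  # 0.714
```

Setting `eta=0` reduces the weight to plain check-in similarity restricted to friends; `eta=1` ranks friends purely by tie strength.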
Social Influence Selection Model • User u picks a friend f, who may be herself (i.e., f = u): social influence • User f generates a latent topic z: user preference • Latent topic z generates an item i and a descriptive word w
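The generative story above can be sketched as a sampler. The parameter tables `social`, `theta`, `phi_item`, and `phi_word` are hypothetical inputs; the slide does not say how they are estimated, so this only illustrates the order of the draws.

```python
import random

def draw(dist, rng):
    """Sample a key from {key: probability} (keys sorted so a seeded rng is reproducible)."""
    keys = sorted(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]

def sample_checkin(u, social, theta, phi_item, phi_word, rng):
    """One pass of the generative story; all parameter tables are hypothetical inputs.

    social[u]   : {friend: prob}, may include u itself (f = u)
    theta[f]    : {topic: prob}, friend f's topic preferences
    phi_item[z] : {item: prob}; phi_word[z]: {word: prob}
    """
    f = draw(social[u], rng)     # social influence: whose taste drives this check-in
    z = draw(theta[f], rng)      # user preference: f draws a latent topic
    i = draw(phi_item[z], rng)   # the topic emits an item (POI) ...
    w = draw(phi_word[z], rng)   # ... and a descriptive word
    return f, z, i, w

social = {"u": {"u": 0.5, "f": 0.5}}
theta = {"u": {"food": 1.0}, "f": {"food": 0.2, "sport": 0.8}}
phi_item = {"food": {"sushi": 1.0}, "sport": {"gym": 1.0}}
phi_word = {"food": {"tasty": 1.0}, "sport": {"workout": 1.0}}
print(sample_checkin("u", social, theta, phi_item, phi_word, random.Random(0)))
```

When u picks herself, only her own topic preferences matter; when she picks friend f, f's preferences shape the topic, which is how social influence enters the model.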
Geographical Influence • How likely is it that two of a user's check-in POIs lie within a given distance? • Let p1 and p2 denote two POIs and d(p1, p2) their distance; the probability is denoted Pr[d(p1, p2)] • [Figure: phenomenon of spatial clustering in users' check-ins; Pr[d(p1, p2)] follows a power law]
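A power law has the form Pr[d] = a · d^b (with b < 0 for decaying probability), so log Pr[d] = log a + b · log d is linear in log d. The sketch below builds an empirical distance distribution from pairs of POIs checked in by the same user and fits a and b by least squares in log-log space. Planar coordinates and the bin width are assumptions; real check-ins would use geodesic distance.

```python
import math
from itertools import combinations

def pair_distance_distribution(checkins, bin_width):
    """Empirical Pr[d] over distances between pairs of POIs checked in by the same user.

    checkins: {user: [(x, y), ...]} in planar coordinates (an assumption).
    Returns (bin midpoints, probabilities) for the non-empty distance bins.
    """
    counts, total = {}, 0
    for pois in checkins.values():
        for p, q in combinations(pois, 2):
            k = int(math.dist(p, q) // bin_width)
            counts[k] = counts.get(k, 0) + 1
            total += 1
    keys = sorted(counts)
    return [(k + 0.5) * bin_width for k in keys], [counts[k] / total for k in keys]

def fit_power_law(distances, probs):
    """Fit Pr[d] = a * d**b by least squares on (log d, log Pr); needs >= 2 distinct d."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b

ds = [1.0, 2.0, 4.0, 8.0]
a, b = fit_power_law(ds, [0.5 * d ** -1.5 for d in ds])
print(round(a, 6), round(b, 6))  # 0.5 -1.5
```

On real data one would feed the output of `pair_distance_distribution` into `fit_power_law`; the synthetic call above just checks that an exact power law is recovered.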
Geographical Influence • Exploiting geographical influence for recommendation • [Figure: user i's check-in history Pi = {p1, p2, …} (POIs p1 to p5) and candidate POIs q1, q2, q3] • Which POI is the best candidate to explore? Pr[q1|Pi] = ? Pr[q2|Pi] = ? Pr[q3|Pi] = ?
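One way to answer Pr[q|Pi], sketched under an independence assumption: treat the distances from q to each past check-in as independent, so Pr[q|Pi] is the product of the fitted power-law terms a · d(q, p)^b over p in Pi. The values of a and b, the planar coordinates, and the `min_dist` clamp are hypothetical.

```python
import math

def geo_log_score(q, history, a, b, min_dist=1e-3):
    """log of prod over p in P_i of a * d(q, p)**b.

    Assumes the distances from q to each past check-in are independent, so the
    probability is a product; summing logs avoids floating-point underflow.
    """
    score = 0.0
    for p in history:
        d = max(math.dist(q, p), min_dist)  # clamp so d = 0 does not blow up when b < 0
        score += math.log(a) + b * math.log(d)
    return score

def rank_candidates(candidates, history, a, b):
    """Candidate POIs (name -> (x, y)) ordered from most to least likely to be explored."""
    return sorted(candidates, key=lambda n: -geo_log_score(candidates[n], history, a, b))

history = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # user i's check-ins P_i (planar coordinates)
candidates = {"q1": (0.5, 0.5), "q2": (4.0, 0.0), "q3": (9.0, 9.0)}
print(rank_candidates(candidates, history, a=0.5, b=-1.5))  # ['q1', 'q2', 'q3']
```

Because b is negative, candidates surrounded by the user's past check-ins score highest, matching the spatial-clustering observation.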
Fusion Framework [Figure: candidate POIs q1, q2, q3 ranked separately by the user's own preference (Su), social influence (Ss), and geographical influence (Sg); the three rankings are fused into a final ranking (S)]
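The figure shows three score lists fused into S but not the fusion rule. A simple option, assumed here, is a linear combination S = (1 − α − β)·Su + α·Ss + β·Sg over max-normalized scores, where α and β are hypothetical tuning weights.

```python
def normalize(scores):
    """Scale a {poi: score} map into [0, 1] by its maximum (all-zero maps stay zero)."""
    top = max(scores.values(), default=0.0)
    return {p: s / top for p, s in scores.items()} if top > 0 else dict(scores)

def fuse(su, ss, sg, alpha=0.2, beta=0.2):
    """S = (1 - alpha - beta) * Su + alpha * Ss + beta * Sg over normalized score maps.

    alpha and beta are hypothetical tuning weights; a POI missing from one list
    contributes 0 for that factor. Returns (ranking, fused scores).
    """
    su, ss, sg = normalize(su), normalize(ss), normalize(sg)
    pois = set(su) | set(ss) | set(sg)
    fused = {p: (1 - alpha - beta) * su.get(p, 0.0) + alpha * ss.get(p, 0.0)
                + beta * sg.get(p, 0.0) for p in pois}
    return sorted(fused, key=lambda p: (-fused[p], p)), fused

su = {"q1": 1.0, "q2": 0.5, "q3": 0.0}
ss = {"q1": 0.0, "q2": 1.0, "q3": 0.5}
sg = {"q1": 0.0, "q2": 0.0, "q3": 1.0}
print(fuse(su, ss, sg)[0])                       # ['q1', 'q2', 'q3']
print(fuse(su, ss, sg, alpha=0.1, beta=0.8)[0])  # ['q3', 'q2', 'q1']
```

Normalizing first keeps one factor from dominating merely because its raw scores live on a larger scale; the weights then control each factor's influence, as the two calls show.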
Semantic Annotation of Places • Tags are very useful! Tags can support: location search, recommendation services, data cleaning, … • But tags are missing [Figure: places missing tags vs. places with tags] • The statistics above are summarized from our dataset collected from Whrrl; statistics in our Foursquare dataset are similar.
Problem Description • Given a database of user check-in logs <who, where, when> in which some places are tagged, infer tags for the remaining places • i.e., the places marked with a question mark in the figure above • Automatically labeling places with appropriate tags is very challenging! • Our approach reduces the place semantic annotation problem to a classification problem.
The SAP Framework [Figure: classification process: a place's check-in logs pass through a Feature Extraction Component, whose output feeds binary classifiers for tags t1, t2, …, tm, each producing a decision for its tag] • How to learn the classifier for a tag (or tag type)? • Feature extraction is very important • Features explicitly describing places • Features implicitly correlating similar places (i.e., places with the same or similar tags) • Feature source?
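The one-classifier-per-tag structure can be sketched as one-vs-rest training: for each tag t, places tagged t are positives and all other tagged places negatives. The slides say only "binary classifier"; plain logistic regression trained by gradient descent is a swapped-in choice, and the two-feature toy data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient-descent logistic regression; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Probability that feature vector x carries the classifier's tag."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_sap(features, tags, all_tags):
    """One binary classifier per tag t: places tagged t are positives, all others negatives."""
    places = sorted(features)
    X = [features[p] for p in places]
    return {t: train_logistic(X, [1.0 if t in tags[p] else 0.0 for p in places])
            for t in all_tags}

def annotate(models, x, threshold=0.5):
    """Each classifier decides independently, so a place can receive several tags."""
    return sorted(t for t, w in models.items() if predict_proba(w, x) >= threshold)

# Hypothetical features: [share of evening check-ins, share of morning check-ins]
features = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0], "p4": [0.1, 0.9]}
tags = {"p1": {"bar"}, "p2": {"bar"}, "p3": {"gym"}, "p4": {"gym"}}
models = train_sap(features, tags, ["bar", "gym"])
print(annotate(models, [0.95, 0.05]))  # ['bar']
```

Keeping the classifiers independent mirrors the figure, where each tag gets its own decision rather than a single multi-class label.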
Explicit Patterns (EP) Extraction • What are the explicit patterns associated with individual places?
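The slide poses the question without listing the patterns. As a hypothetical illustration, explicit features for a single place could be computed from its own check-ins: the total number of check-ins, the number of distinct visitors, and normalized check-in distributions over the 24 hours of the day and the 7 days of the week.

```python
from datetime import datetime

def explicit_patterns(checkins):
    """Feature vector for one place from its check-ins [(user_id, datetime), ...].

    Hypothetical feature set: [total check-ins, distinct visitors]
    + 24 hour-of-day shares + 7 day-of-week shares (Monday first).
    """
    hours = [0.0] * 24
    days = [0.0] * 7
    for _, ts in checkins:
        hours[ts.hour] += 1
        days[ts.weekday()] += 1
    n = len(checkins)
    if n:
        hours = [h / n for h in hours]
        days = [d / n for d in days]
    return [float(n), float(len({u for u, _ in checkins}))] + hours + days

log = [("a", datetime(2011, 3, 4, 21, 30)), ("b", datetime(2011, 3, 5, 22, 0)),
       ("a", datetime(2011, 3, 5, 21, 5))]
f = explicit_patterns(log)
print(f[:2], round(f[2 + 21], 3))  # [3.0, 2.0] 0.667
```

A bar would typically show late-evening and weekend peaks in such a vector, which is the kind of signal a per-tag classifier can pick up.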
Implicit Relatedness (IR) Extraction [Figure: check-in log of a user over Days 1 to 8, from 00:00 to 23:59, with visited places labeled Bars, Gym, Health & Beauty, Restaurant, Spa, Shopping, and one unlabeled place "?"] • Are places really correlated? • If so, how do we extract the IR between places? • Places a user checks in at around the same time of day are probably in the same category
Network of Related Places (NRP) • Build an NRP by exploring the regularities in user-place and time-place interactions • [Figure: user-place and time-place graphs; Random Walk with Restart computes the relatedness between places, yielding the Network of Related Places (NRP)]
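A minimal Random Walk with Restart sketch for relatedness between places. It assumes a single graph joining place nodes to user nodes and hour-of-day nodes, with edge weights equal to check-in counts; the slide does not give the construction, and the restart probability 0.15 and iteration count are hypothetical.

```python
from collections import defaultdict

def build_graph(checkins):
    """Graph linking each place to the users who visited it and the hour slots of the visits.

    checkins: iterable of (user, place, hour); edge weights count co-occurrences.
    Node keys are tagged tuples so user, place, and time ids cannot collide.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for user, place, hour in checkins:
        for other in (("user", user), ("time", hour)):
            adj[("place", place)][other] += 1.0
            adj[other][("place", place)] += 1.0
    return adj

def rwr(adj, start, restart=0.15, iters=100):
    """Random walk with restart: r <- (1 - c) * P^T r + c * e_start, P row-normalized."""
    if start not in adj:
        raise KeyError(start)
    r = {node: 0.0 for node in adj}
    r[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        for node, mass in r.items():
            out = sum(adj[node].values())
            for nbr, w in adj[node].items():
                nxt[nbr] += (1 - restart) * mass * w / out
        nxt[start] += restart  # total mass stays 1, so c * 1 returns to the start node
        r = nxt
    return r

def related_places(adj, place, **kw):
    """RWR relatedness of every other place to `place`, rescaled to sum to 1."""
    r = rwr(adj, ("place", place), **kw)
    scores = {n[1]: s for n, s in r.items() if n[0] == "place" and n[1] != place}
    total = sum(scores.values())
    return {p: s / total for p, s in scores.items()} if total else scores

log = [("u1", "bar_a", 22), ("u1", "bar_b", 23), ("u2", "bar_a", 22),
       ("u2", "bar_b", 22), ("u3", "gym", 7), ("u3", "bar_a", 7)]
rel = related_places(build_graph(log), "bar_a")
print(max(rel, key=rel.get))  # bar_b
```

Keeping only edges with high relatedness from each place would give one possible NRP; bar_b ranks above gym here because it shares more visitors and more evening hours with bar_a.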
Label Propagation on NRP • IR features: Tag 1: score1, Tag 2: score2, …, Tag k: scorek • [Figure: label propagation on the NRP; the untagged place "?", whose neighbors are tagged restaurant and shopping, receives the scores Restaurant 0.66 and Shopping 0.34]
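A minimal iterative label-propagation sketch that produces per-tag scores like the IR features above. It assumes the NRP is given as a symmetric weighted adjacency map in which every place appears as a key, and that tagged places stay clamped to their own tag; the toy graph is hypothetical and is not the example in the figure.

```python
def propagate_labels(weights, seeds, iters=50):
    """Iterative label propagation on a weighted place graph.

    weights: {place: {neighbor: relatedness}} (e.g., NRP edges); seeds: {place: tag}.
    Tagged places stay clamped to their tag; each untagged place repeatedly takes
    the relatedness-weighted average of its neighbors' tag distributions.
    """
    tags = sorted(set(seeds.values()))
    dist = {p: {t: (1.0 if seeds.get(p) == t else 0.0) for t in tags} for p in weights}
    for _ in range(iters):
        new = {}
        for p, nbrs in weights.items():
            if p in seeds:
                new[p] = dist[p]
                continue
            total = sum(nbrs.values())
            new[p] = {t: sum(w * dist[n][t] for n, w in nbrs.items()) / total if total else 0.0
                      for t in tags}
        dist = new
    return dist

weights = {"?": {"r1": 1.0, "r2": 1.0, "s1": 1.0},
           "r1": {"?": 1.0}, "r2": {"?": 1.0}, "s1": {"?": 1.0}}
seeds = {"r1": "restaurant", "r2": "restaurant", "s1": "shopping"}
d = propagate_labels(weights, seeds)
print({t: round(s, 2) for t, s in d["?"].items()})  # {'restaurant': 0.67, 'shopping': 0.33}
```

The resulting score vector for each untagged place can then be appended to its explicit-pattern features before classification.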
Conclusion • LBSNs have received a lot of attention from the research community • LBSN data contain rich social and location information • Novel applications can be developed from the rich user-generated data in LBSNs • We have incorporated social and geo influences into the collaborative filtering technique for POI recommendation • To address the semantic annotation problem in LBSNs, we extract explicit patterns (EP) of individual places and implicit relatedness (IR) among places to infer the missing tags through classification • New applications and more research are forthcoming.