Partitioning Social Networks for Time-dependent Queries. Berenice Carrasco, Yi Lu and Joana M. F. da Trindade - University of Illinois - EuroSys11 – Workshop on Social Network Systems
My colleague’s Facebook home page! • What is visible to Joana? • Messages in a two-hop network [Figure: two-hop network with users Joana, Adarsh, Nandana, Naseer and Jona]
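A minimal sketch of what "two-hop network" means for a user such as Joana: her friends plus the friends of those friends. The friendship lists below are hypothetical, since the topology drawn on the slide is not recoverable, and `two_hop_network` is an illustrative helper, not a function from the talk.

```python
def two_hop_network(friends, user):
    """Return the users within two friendship hops of `user`, excluding `user`."""
    one_hop = set(friends.get(user, ()))
    two_hop = set(one_hop)
    for friend in one_hop:
        # Add the friends of each direct friend (the second hop).
        two_hop.update(friends.get(friend, ()))
    two_hop.discard(user)
    return two_hop


# Hypothetical friendship lists, for illustration only.
friends = {
    "Joana": ["Adarsh", "Nandana"],
    "Adarsh": ["Joana", "Naseer"],
    "Nandana": ["Joana", "Jona"],
    "Naseer": ["Adarsh"],
    "Jona": ["Nandana"],
}
reachable = two_hop_network(friends, "Joana")
```

Under these assumed lists, messages from all four other users would be visible on Joana's home page.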
Why is partitioning important? • Different types of queries in social networks • Photo tags, marketplace, news feed • Retrieve small records (personalized content) • Multiple records from different users • Time-dependent • Most common query: home page refresh at Facebook
Existing approaches • Partition based solely on friendship (1-hop network) • Power-law degree distribution • Highly interconnected data • Small fraction of nodes with very large degrees • General approach: horizontal partitioning + replication
Existing approaches • Hash-based horizontal partitioning • Multiple records in different servers • Bad response time • Inefficient network usage • High packet overhead for such small data [Figure: users Joana, Adarsh, Nandana, Naseer and Jona hashed by key (user name) across partitions p1, p2 and p3]
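The effect described above can be sketched as follows: when the partition is chosen by hashing the user name, the users in one two-hop network scatter across servers, so a single home page query may contact several of them. The choice of MD5 as the hash and the helper names are assumptions for illustration; the slide only states that the key is the user name.

```python
import hashlib


def hash_partition(user_name, num_partitions):
    """Map a user name to a partition index with a stable hash (MD5 assumed)."""
    digest = hashlib.md5(user_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partitions_touched(users, num_partitions):
    """Return the set of partitions that hold records for `users`."""
    return {hash_partition(user, num_partitions) for user in users}


# Users named on the slide; three partitions as in the figure (p1, p2, p3).
two_hop_users = ["Joana", "Adarsh", "Nandana", "Naseer", "Jona"]
touched = partitions_touched(two_hop_users, 3)
```

The size of `touched` is the number of servers one query must reach; hashing ignores who interacts with whom, so nothing keeps that number small.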
Existing approaches • Replication • Requires a large amount of extra storage
Existing approaches • Query-based partitioning • Assumes queries do not change with time. Curino et al., “Schism: a workload-driven approach to database replication and partitioning”, 2010
The challenge for social networks • Friendship-based and query-based partitioning do not work well • Underlying network varies over time • Added/deleted friends • Interaction level changes • Only 30% of Facebook user pairs interact consistently from one month to the next
Our approach • Partition not only along the friendship network but also along the time dimension • Interaction: activity network • Weighted links: strong vs. weak • Power-law with much lighter tail • Maximal degree around 100 • This partitioning results in: • Fewer cross-edges • Reduced need for replication • Goal: provide frequent users with high data locality • Faster response to queries
Our algorithm • Differentiate between: 1) the period used for prediction and 2) the current period to partition • Look at the interaction and predict the strength of each relationship • Then look at this strength and determine what data can be accessed together • Identify links from past traces and capture relationships with strong activity • Assign each edge a cost that determines how costly it would be to cut it
Our algorithm • We propose a way to compute weights in the activity prediction graph (APG) • User nodes • Message nodes • Two-hop network
Our algorithm • We propose a way to compute weights in this APG • Message node weights • User node weights • Decay factor • # msg exchanged
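The slide names a decay factor and the number of messages exchanged as inputs to the weights but does not give the formula. As a rough sketch under an assumed form, the weight below sums monthly message counts, each discounted geometrically by its age; the function name, the geometric form and the default `decay=0.5` are all assumptions, not the authors' definition.

```python
def decayed_edge_weight(monthly_counts, decay=0.5):
    """Sum message counts, discounting older months by decay ** age.

    `monthly_counts` is ordered oldest first; the last entry has age 0.
    The geometric form is an assumption, not taken from the slides.
    """
    weight = 0.0
    last = len(monthly_counts) - 1
    for index, count in enumerate(monthly_counts):
        age = last - index  # months before the most recent one
        weight += (decay ** age) * count
    return weight


# 4 messages two months ago, none last month, 2 in the most recent month.
w = decayed_edge_weight([4, 0, 2], decay=0.5)  # 0.25*4 + 0.5*0 + 1*2 = 3.0
```

With this form, a pair that exchanged many messages long ago ends up with a weaker link than a pair with fewer but recent messages, which matches the strong vs. weak distinction in the activity network.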
Our algorithm • Cost of local partitions • Message node weights • User node weights • Edge weights • Messages accessible to user X • Remote message weights [Figure: Partition 1 and Partition 2]
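One way to read the cost of cutting edges is as the total weight of the edges whose endpoints land in different partitions. The sketch below computes only that quantity; it is a simplification that leaves out the node weights listed on the slide, and the node names, weights and `cut_cost` helper are hypothetical.

```python
def cut_cost(edge_weights, assignment):
    """Total weight of edges whose two endpoints sit in different partitions."""
    return sum(
        weight
        for (u, v), weight in edge_weights.items()
        if assignment[u] != assignment[v]
    )


# Hypothetical user-to-message edges with illustrative weights.
edge_weights = {
    ("Joana", "m1"): 3.0,
    ("Adarsh", "m1"): 1.0,
    ("Adarsh", "m2"): 2.0,
}
assignment = {"Joana": 1, "m1": 1, "Adarsh": 2, "m2": 2}
cost = cut_cost(edge_weights, assignment)  # only Adarsh-m1 crosses: 1.0
```

Here message `m1` stays with Joana, its strongest link, so the only remote access is Adarsh reading `m1`.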
Evaluation: Graph Partitioning • Data set: • Facebook New Orleans network • Jan 2005 to Dec 2006 • 8643 users and 69836 wall posts • APG: Jan 2005 to Nov 2006 • Fixed period: Dec 2006, with 13948 wall posts
Evaluation of Data Locality • We mimic real Facebook page downloads for all wall posts in Dec 2006 • Each query requests the 6 most recent wall posts in the user’s two-hop network • We compare our algorithm to two hash-based horizontal partitioning algorithms • Hash_p1 • Hash_p1_p2 • Number of partitions used: up to 20
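The query described above can be sketched as: filter wall posts to those authored inside the user's two-hop network, take the 6 most recent, and record which partitions hold them. The tuple layout `(timestamp, author, post_id)`, the `partition_of` lookup and the sample data are assumptions for illustration, not the evaluation harness used in the talk.

```python
def partitions_for_query(posts, two_hop_users, partition_of, limit=6):
    """Return the partitions holding the `limit` most recent visible posts.

    `posts` is a list of (timestamp, author, post_id) tuples (assumed layout).
    `partition_of` maps post_id to the partition that stores it.
    """
    visible = [post for post in posts if post[1] in two_hop_users]
    recent = sorted(visible, key=lambda post: post[0], reverse=True)[:limit]
    return {partition_of[post_id] for _, _, post_id in recent}


# Hypothetical wall posts and placement.
posts = [
    (1, "Adarsh", "a1"),
    (2, "Nandana", "n1"),
    (3, "Zed", "z1"),      # author outside the two-hop network
    (4, "Adarsh", "a2"),
]
partition_of = {"a1": 0, "n1": 1, "z1": 2, "a2": 0}
accessed = partitions_for_query(posts, {"Adarsh", "Nandana"}, partition_of)
```

The number of distinct partitions in `accessed` is the per-query quantity that the locality comparison is built on.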
Evaluation of Data Locality • Proportion of queries that access only 1 partition
Evaluation of Data Locality • Proportion of queries that access at most 3 partitions
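Both locality measures, the share of queries served by a single partition and the share served by at most 3, reduce to the same computation over per-query partition counts. The sketch below uses made-up counts purely to show the arithmetic; it does not reproduce any result from the talk.

```python
def fraction_within(partitions_per_query, limit):
    """Fraction of queries that accessed at most `limit` partitions."""
    if not partitions_per_query:
        return 0.0
    hits = sum(1 for count in partitions_per_query if count <= limit)
    return hits / len(partitions_per_query)


# Made-up counts: how many partitions each of five queries touched.
counts = [1, 1, 2, 3, 5]
only_one = fraction_within(counts, 1)        # 2 of 5 queries
at_most_three = fraction_within(counts, 3)   # 4 of 5 queries
```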
Conclusion and Future Work • Our algorithm partitions social network data according to interaction levels at different times • Our activity prediction graph significantly improved data locality compared to hashing • Placement of data across different periods
Existing approaches • Hash-based horizontal partitioning
Our approach • Replication with time-dependency
Greedy Algorithm • Use a greedy algorithm to place messages from the non-predicted month (Dec 2006), which fall into three cases: • Initiator and receiver of the message both exist in the APG but have no previous interaction • Exactly one of the initiator and receiver of the message exists in the APG • Neither the initiator nor the receiver exists in the APG
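The slide lists the three cases but not the rule applied in each. The sketch below shows one possible greedy policy under stated assumptions: place the message with the receiver when both users are known, with the known user when only one is, and on the least-loaded partition when neither is. These three rules and the `place_message` helper are assumptions, not the authors' algorithm.

```python
def place_message(initiator, receiver, user_partition, load):
    """Pick a partition for a new message and update the load counters.

    `user_partition` maps users already in the APG to their partition.
    `load` maps each partition to its current message count.
    The rule used in each case is an assumption for illustration.
    """
    initiator_known = initiator in user_partition
    receiver_known = receiver in user_partition
    if initiator_known and receiver_known:
        # Case 1: both in the APG, no prior interaction (assumed: follow receiver).
        target = user_partition[receiver]
    elif initiator_known or receiver_known:
        # Case 2: exactly one in the APG (assumed: follow the known user).
        known = initiator if initiator_known else receiver
        target = user_partition[known]
    else:
        # Case 3: neither in the APG (assumed: least-loaded partition).
        target = min(load, key=load.get)
    load[target] = load.get(target, 0) + 1
    return target


user_partition = {"Joana": 0, "Adarsh": 1}
load = {0: 5, 1: 2, 2: 0}
first = place_message("Joana", "Adarsh", user_partition, load)   # case 1
second = place_message("Joana", "Zed", user_partition, load)     # case 2
third = place_message("Xia", "Yan", user_partition, load)        # case 3
```

Falling back to the least-loaded partition is a common balancing choice when no locality information is available, which is why it is used for the third case here.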