Partitioning Social Networks for Time-dependent Queries

Berenice Carrasco, Yi Lu and Joana M. F. da Trindade - University of Illinois - EuroSys 2011, Workshop on Social Network Systems

My colleague’s Facebook home page! • What is visible to Joana? • Messages in a two-hop network [Figure: two-hop network of users Joana, Adarsh, Nandana, Naseer and Jona]

Why is partitioning important? • Different types of queries in social networks • photo tags, marketplace, news feed • Retrieve small records (personalized content) • Multiple records from different users • Time-dependent • Home page refresh at Facebook is the most common query

Existing approaches • Partition based solely on friendship (1-hop network) • Power-law degree distribution • Highly interconnected data • Small fraction of nodes with very large degrees • General approach: horizontal partitioning + replication

Existing approaches • Hash-based horizontal partitioning • Multiple records in different servers • Bad response time • Inefficient network usage • High packet overhead for such small data [Figure: records keyed by user name (Joana, Adarsh, Nandana, Naseer, Jona) spread across partitions p1, p2 and p3]
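The scattering problem on this slide can be sketched in a few lines. This is a minimal illustration, not the setup used in the talk: the helper names `partition_of` and `partitions_touched`, the choice of MD5 as the hash, and the three-partition layout are all assumptions made for the example.

```python
import hashlib


def partition_of(user: str, num_partitions: int) -> int:
    """Map a user name to a partition with a stable hash (assumed scheme)."""
    digest = hashlib.md5(user.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def partitions_touched(users, num_partitions: int) -> set:
    """Return the set of partitions that hold records for the given users."""
    return {partition_of(u, num_partitions) for u in users}


# Users whose records one home page refresh needs (names from the slide).
two_hop = ["Joana", "Adarsh", "Nandana", "Naseer", "Jona"]
touched = partitions_touched(two_hop, 3)
print(f"{len(touched)} of 3 partitions hold records for this query")
```

Because the hash ignores who interacts with whom, the records for a single page refresh tend to land on several servers, and each server returns only a few small records.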

Existing approaches • Replication • Great amount of extra storage

Existing approaches • Query-based partitioning • Assumes queries do not change with time • Curino et al., “Schism: a workload-driven approach to database replication and partitioning”, 2010

The challenge for Social Networks • Friendship-based and query-based partitioning do not work well • Underlying network varies over time • Added/deleted friends • Interaction level changes • Only 30% of Facebook user pairs interact consistently from one month to the next

Our approach • Partitioning not only the friendship network but also along the time dimension • Interaction: activity network • weighted links: strong vs. weak • power-law with much lighter tail • Maximal degree around 100 • This partitioning results in: • Fewer cross-edges • Reduced need for replication • Goal: Provide frequent users with high data locality • Faster response to queries

Our algorithm • Differentiate between: 1) period used for prediction and 2) current period to partition • Look at the interaction and predict the strength of relationship • Then, look at this strength and determine what data can be accessed together • Identify links from past traces and capture relationships with strong activity • Assign each edge a cost that reflects how expensive it would be to cut it

Our algorithm • We propose a way to compute weights in the activity prediction graph (APG) • User nodes • Message nodes • Two-hop network

Our algorithm • We propose a way to compute weights in the APG • Message node weights • User node weights • Decay factor • Number of messages exchanged
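The slide names a decay factor and the number of messages exchanged, but the weight formula itself did not survive in this transcript. The sketch below only illustrates how the two ingredients could combine. It assumes a geometric decay per month, and both the function name `decayed_weight` and the default `decay=0.5` are hypothetical choices, not values from the talk.

```python
def decayed_weight(monthly_counts, decay=0.5):
    """Sum message counts, discounting each month by decay**age.

    monthly_counts is ordered oldest to newest, so the most recent
    month has age 0 and is counted at full weight (assumed scheme).
    """
    n = len(monthly_counts)
    return sum(count * decay ** (n - 1 - i)
               for i, count in enumerate(monthly_counts))


# A pair that keeps talking outweighs a pair that went quiet,
# even though both exchanged 12 messages in total.
steady = decayed_weight([4, 4, 4])   # 4*0.25 + 4*0.5 + 4*1 = 7.0
faded = decayed_weight([12, 0, 0])   # 12*0.25 = 3.0
print(steady, faded)
```

Under this assumption, recent interaction dominates the weight, which matches the slide's intent of favoring relationships with strong current activity.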

Our algorithm • Cost of local partitions • Message node weights • User node weights • Edge weights • Messages accessible to user X • Remote message weights [Figure: Partition 1 and Partition 2]
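To make the idea of a partitioning cost concrete, here is a simplified sketch that uses the standard weighted edge-cut: the total weight of edges whose endpoints sit in different partitions. It deliberately ignores the node weights listed on the slide, and the function name `cut_cost` and all edge weights are invented for illustration.

```python
def cut_cost(edges, assignment):
    """Total weight of edges whose endpoints are in different partitions."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


# Hypothetical activity weights between users named on the slides.
edges = [
    ("Joana", "Adarsh", 5.0),
    ("Joana", "Nandana", 0.5),
    ("Nandana", "Naseer", 3.0),
]

# Keeping strongly interacting pairs together cuts only the weak edge.
by_activity = {"Joana": 1, "Adarsh": 1, "Nandana": 2, "Naseer": 2}
# Splitting them cuts both strong edges.
mixed = {"Joana": 1, "Adarsh": 2, "Nandana": 1, "Naseer": 2}

print(cut_cost(edges, by_activity), cut_cost(edges, mixed))
```

A partitioner that minimizes this quantity prefers to cut weak links, which is the behavior the edge costs on the earlier slide are meant to encourage.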

Evaluation: Graph Partitioning • Data set: • Facebook New Orleans network • Jan 2005 to Dec 2006 • 8643 users and 69836 wall posts • APG: Jan 2005 to Nov 2006 • Fixed period: Dec 2006, with 13948 wall posts

Evaluation of Data Locality • We mimic real Facebook page downloads for all wall posts in Dec 2006 • Each query requests the 6 most recent wall posts in the user’s two-hop network • We compare our algorithm to two hash-based horizontal partitioning algorithms • Hash_p1 • Hash_p1_p2 • Number of partitions used: up to 20
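The query described above can be sketched as follows. This is an illustration under stated assumptions rather than the evaluation harness from the talk: the data layout (a friend map and `(timestamp, author, partition)` tuples), the decision to include the user's own posts, and all function names are hypothetical.

```python
def two_hop_network(friends, user):
    """Return the user, their friends, and friends of friends."""
    one_hop = friends.get(user, set())
    reach = {user} | set(one_hop)
    for friend in one_hop:
        reach |= friends.get(friend, set())
    return reach


def recent_posts(posts, friends, user, limit=6):
    """Pick the `limit` newest posts authored inside the two-hop network."""
    visible = two_hop_network(friends, user)
    candidates = [p for p in posts if p[1] in visible]
    return sorted(candidates, key=lambda p: p[0], reverse=True)[:limit]


def partitions_accessed(posts, friends, user, limit=6):
    """Set of partitions a single page refresh has to contact."""
    return {p[2] for p in recent_posts(posts, friends, user, limit)}


friends = {
    "Joana": {"Adarsh"},
    "Adarsh": {"Joana", "Nandana"},
    "Nandana": {"Adarsh", "Naseer"},
    "Naseer": {"Nandana"},
}
# (timestamp, author, partition) with made-up values.
posts = [(1, "Adarsh", 0), (2, "Nandana", 1), (3, "Naseer", 2), (4, "Joana", 0)]

print(partitions_accessed(posts, friends, "Joana"))
```

Naseer is three hops from Joana in this toy graph, so his post is not fetched and the refresh contacts two partitions rather than three.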

Evaluation of Data Locality • Proportion of queries that access only 1 partition

Evaluation of Data Locality • Proportion of queries that access at most 3 partitions
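The two locality metrics on these slides reduce to one small computation, sketched below. The function name `locality` and the sample counts are hypothetical; they are not results from the evaluation.

```python
def locality(partitions_per_query, max_partitions):
    """Fraction of queries that touched at most `max_partitions` partitions."""
    if not partitions_per_query:
        return 0.0
    hits = sum(1 for n in partitions_per_query if n <= max_partitions)
    return hits / len(partitions_per_query)


# Number of partitions contacted by each of five made-up queries.
counts = [1, 1, 2, 3, 5]
print(locality(counts, 1))  # queries served by a single partition
print(locality(counts, 3))  # queries served by at most three partitions
```

With `max_partitions=1` this gives the "only 1 partition" metric, and with `max_partitions=3` the "at most 3 partitions" metric.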

Conclusion and Future Work • Our algorithm partitions social network data according to interaction levels at different times • Our activity prediction graph significantly improved data locality compared to hashing • Future work: placement of data across different periods

Backup Slides

Existing approaches • Hash-based horizontal partitioning

Our approach • Replication with time-dependency

Greedy Algorithm • Use a greedy algorithm for messages from the non-predicted month (Dec 2006) • Both the initiator and the receiver of the message exist in the APG but have no previous interaction • Exactly one of the initiator and the receiver of the message exists in the APG • Neither the initiator nor the receiver exists in the APG
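The slide lists the cases the greedy step has to handle but not the placement rule for each, so the sketch below only classifies a message into those cases. The function name, the label strings, and the extra `known_interaction` label (for pairs the APG already covers) are assumptions added for completeness.

```python
def classify_message(initiator, receiver, apg_nodes, apg_edges):
    """Classify a new message by how much the APG knows about its endpoints.

    apg_nodes is a set of user names; apg_edges is a set of frozensets,
    one per pair of users with past interaction (assumed representation).
    """
    has_initiator = initiator in apg_nodes
    has_receiver = receiver in apg_nodes
    if has_initiator and has_receiver:
        if frozenset((initiator, receiver)) in apg_edges:
            return "known_interaction"  # already covered by the APG
        return "both_known_no_interaction"
    if has_initiator or has_receiver:
        return "one_known"
    return "neither_known"


apg_nodes = {"Joana", "Adarsh", "Nandana"}
apg_edges = {frozenset(("Joana", "Adarsh"))}

print(classify_message("Joana", "Nandana", apg_nodes, apg_edges))
print(classify_message("Joana", "Naseer", apg_nodes, apg_edges))
print(classify_message("Naseer", "Jona", apg_nodes, apg_edges))
```

A placement policy would then dispatch on the returned label, for example by co-locating a message with the one endpoint that already has a partition.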
