INDRA - A Distributed In-memory Cache for Online Social Networks
Long Kai, Anjali Sridhar, Sreeram Kannan, Siva Theja Maguluri
Motivation – “big multi-get”
• Memcached
  • In-memory distributed hash table service used at Facebook
  • 400k connections to any Memcached server
  • An estimated 5 GB of memory is required to maintain the TCP connections
  • Replace TCP with UDP
• High communication overhead
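To make the "big multi-get" concern concrete, the sketch below counts how many servers a single multi-get touches when keys are spread by hashing. The key naming scheme, the use of MD5, and the helper names `server_for` and `servers_contacted` are assumptions made for illustration; they are not taken from Memcached or from Indra.

```python
import hashlib


def server_for(key: str, num_servers: int) -> int:
    """Map a key to a server index by hashing (illustrative stand-in for
    client-side key hashing; not the actual Memcached client algorithm)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def servers_contacted(keys, num_servers: int) -> set:
    """Return the set of servers that one multi-get over `keys` must reach."""
    return {server_for(key, num_servers) for key in keys}


# Hypothetical request: fetch the data of 30 friends from a 10-server pool.
friends = [f"user:{i}" for i in range(30)]
contacted = servers_contacted(friends, num_servers=10)
print(f"{len(contacted)} of 10 servers contacted for one multi-get")
```

With hash-based placement, one request for many friends tends to reach most of the pool, which is the connection fan-out that collocating friends' data aims to reduce.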
Related Work - SPAR
• In the consistent storage (SPAR):
  • Scalability: on average 7 copies are stored
  • System flexibility: confined to multi-get applications of small data items
  • Algorithmic flexibility: inefficient dynamic adaptation to new usage patterns
  • Reliability: complicated failure recovery mechanism
  • Load-balancing
J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez. The little engine(s) that could: Scaling online social networks. In ACM SIGCOMM Computer Communication Review, volume 40, pages 375–386. ACM, 2010.
Indra
• Guarantees:
  • The most recent copy is present in the primary.
  • Secondary copies are eventually consistent.
• Design principles:
  • Reliance on the consistent storage for failure recovery.
  • Idempotent operations only.
[Figure: Client, Indra server (memory cache), consistent storage]
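A minimal sketch of how a cache server can lean on the consistent storage for recovery is given below. The class names, the write-through ordering, and the `crash` helper are hypothetical and exist only to illustrate the two design principles; the slides do not describe Indra's actual interfaces.

```python
class ConsistentStorage:
    """Hypothetical stand-in for the durable, consistent backing store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class CacheServer:
    """Illustrative in-memory cache that sits in front of the storage."""

    def __init__(self, storage):
        self.storage = storage
        self.cache = {}

    def put(self, key, value):
        # Assumed write-through: a plain overwrite is idempotent, so
        # repeating the same put leaves storage and cache unchanged.
        self.storage.put(key, value)
        self.cache[key] = value

    def get(self, key):
        # On a miss (for example after a crash), refill from storage.
        if key not in self.cache:
            value = self.storage.get(key)
            if value is None:
                return None
            self.cache[key] = value
        return self.cache[key]

    def crash(self):
        # Simulate losing all in-memory state.
        self.cache.clear()


storage = ConsistentStorage()
server = CacheServer(storage)
server.put("user:1", "profile-v1")
server.put("user:1", "profile-v1")  # retried request, same end state
server.crash()
recovered = server.get("user:1")    # served again via the storage
```

Because every write is a plain overwrite and the storage holds the durable copy, a retried or replayed request cannot corrupt state, and the cache needs no recovery protocol of its own.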
Advantages
• Modularity: data reliability is decoupled from the partition and replication algorithm module.
• Flexibility: caching and eviction at individual servers are used to adapt dynamically to new usage patterns.
Algorithm
• Indra objectives
  • To place friends’ data together
  • To replicate popular data items
• Based on access log
• Weighted graph; weights denote joint access frequency
[Figure: example access graph with nodes a to f, illustrating placement and replication]
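One way to obtain such a weighted graph from an access log is sketched below. It assumes each log entry lists the data items fetched together by one request; that log format and the function name `build_access_graph` are assumptions, since the slides do not specify them.

```python
from collections import Counter
from itertools import combinations


def build_access_graph(access_log):
    """Count how often each pair of items appears in the same request.

    Returns a Counter mapping an (item_u, item_v) pair, with item_u < item_v,
    to its joint access frequency.
    """
    weights = Counter()
    for request in access_log:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for u, v in combinations(sorted(set(request)), 2):
            weights[(u, v)] += 1
    return weights


# Hypothetical log: each inner list is the set of items read by one request.
log = [["a", "b", "c"], ["a", "b"], ["c", "d"]]
graph = build_access_graph(log)
for (u, v), weight in sorted(graph.items()):
    print(u, v, weight)
```

Pairs with a high weight are the candidates for collocation, while items that appear in many pairs are candidates for replication.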
Mathematical Model
• Balance the load among servers
• Minimize replicas
• Collocate each user with friends
• Maximize (Collocation Gain - Replication Cost - Server Load Cost) among all placement plans and replication plans
• The problem incorporates several NP-hard problems!
Problem Decomposition
• Key idea: separate the problem into two simpler problems!
• Partitioning problem: maximize (Collocation Gain - Server Load Cost) among all placement plans
• Replication problem: maximize (Collocation Gain - Replication Cost) among all replication plans
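The decomposition can be written symbolically as follows. The symbols are introduced here for illustration only and do not appear on the slides: $x$ is a placement plan, $y$ a replication plan, $G$ the collocation gain, $R$ the replication cost, and $L$ the server load cost. Solving the partitioning problem first and feeding its result into the replication problem is also an assumption about how the two subproblems are chained.

```latex
% Joint problem (Mathematical Model slide)
\max_{x,\,y} \; G(x, y) - R(y) - L(x)

% Partitioning problem: choose the placement plan
x^{*} = \arg\max_{x} \; G(x) - L(x)

% Replication problem: choose the replication plan (assumed to use x^{*})
y^{*} = \arg\max_{y} \; G(x^{*}, y) - R(y)
```

Each subproblem keeps the collocation gain but drops one cost term, so the pair $(x^{*}, y^{*})$ is in general a heuristic solution to the joint problem rather than its exact optimum.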
Online Algorithm
• Current state of system
• User arrives
• Compare the two possible placements
  • Assigned to server 1
  • Replicated at server 2
• User arrives
• Placement trades off collocation gain and load balancing cost
[Figure: access graph with nodes a to f and arriving users g and h]
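The placement step can be sketched as a greedy rule: score each server by the collocation gain it offers the arriving user minus a load penalty, then pick the best. The linear load penalty, the `load_weight` parameter, and the function name `place_user` are assumptions made for this sketch, and the replication step is not covered.

```python
def place_user(user, neighbors, placement, loads, load_weight=1.0):
    """Greedily assign an arriving user to one server.

    neighbors:   dict mapping a neighbor to its joint access weight
    placement:   dict mapping already placed users to a server index
    loads:       list with the number of users currently on each server
    load_weight: assumed knob that scales the load balancing penalty
    """
    best_server, best_score = None, float("-inf")
    for server in range(len(loads)):
        # Collocation gain: total weight of neighbors already on this server.
        gain = sum(w for n, w in neighbors.items() if placement.get(n) == server)
        score = gain - load_weight * loads[server]
        if score > best_score:
            best_server, best_score = server, score
    placement[user] = best_server
    loads[best_server] += 1
    return best_server


# Hypothetical state: a and b live on server 0, c lives on server 1.
placement = {"a": 0, "b": 0, "c": 1}
loads = [2, 1]
chosen = place_user("g", {"a": 3, "b": 2, "c": 1}, placement, loads)
print("g placed on server", chosen)
```

In this example server 0 scores 5 - 2 = 3 and server 1 scores 1 - 1 = 0, so the user joins its friends on server 0. A larger `load_weight` shifts the decision toward the less loaded server.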
Evaluation
• Data Set
  • Random walk on the Facebook data set from the Max Planck Institute for Software Systems
  • 6373 vertices and 183,734 edges; average 28.83 neighbors
• Metrics
  • Number of connections
  • Number of TCP packets
• Experimental Setup
  • 10 / 15 servers
  • Consistent storage interface
  • Offline algorithm
  • 1000 read requests
  • Tcpdump and Wireshark for bandwidth analysis
Bandwidth: Indra vs. Random
• Replication factor: 1.2 with 10 servers; 1.8 with 15 servers
Contributions
• Proposed in-memory distributed cache
  • Takes advantage of data access relationships
  • Retrieves small data items in online social networks
  • Uses a dynamic partition and replication algorithm
• Results show
  • A factor of 4 decrease in the number of TCP packets
  • Replication can be traded off against the number of connections
Thanks!