INDRA - A Distributed In-memory Cache for Online Social Networks

Long Kai, Anjali Sridhar, Sreeram Kannan, Siva Theja Maguluri

Motivation – “big multi-get”
• Memcached
• In-memory distributed hash table service used at Facebook
• 400k connections to any Memcached server
• An estimated 5 GB of memory is required to maintain the TCP connections
• Replace TCP with UDP
• High communication overhead

Related Work - SPAR
• In the consistent storage (SPAR):
• Scalability: on average 7 copies are stored
• System flexibility: confined to multi-get applications of small data items
• Algorithmic flexibility: inefficient dynamic adaptation to new usage patterns
• Reliability: complicated failure recovery mechanism
• Load balancing
J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez. The little engine(s) that could: Scaling online social networks. In ACM SIGCOMM Computer Communication Review, volume 40, pages 375–386. ACM, 2010.

Indra
Guarantees:
• Most recent copy is present in the primary.
• Secondary copies are eventually consistent.
Design principles:
• Reliance on the consistent storage for failure recovery.
• Idempotent operations only.
[Diagram: Client, Indra server (memory cache), consistent storage]
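
The two design principles can be illustrated with a toy read-through cache. This is a minimal sketch, not Indra's implementation: the `CacheServer` class, its method names, and the write path (write to storage, then to memory) are assumptions made for illustration, and a plain dict stands in for the consistent storage.

```python
class CacheServer:
    """Toy in-memory cache in front of a consistent store (a dict here)."""

    def __init__(self, storage):
        self.storage = storage  # stand-in for the consistent storage
        self.memory = {}        # volatile cache contents

    def get(self, key):
        # On a miss (including after a crash), reload from consistent storage.
        if key not in self.memory:
            self.memory[key] = self.storage[key]
        return self.memory[key]

    def put(self, key, value):
        # Overwrite-style write: applying it twice gives the same state,
        # so a retried request is harmless (idempotent).
        self.storage[key] = value
        self.memory[key] = value

    def crash(self):
        # Losing memory loses no data; storage still holds every item.
        self.memory.clear()


storage = {"user:1": "profile-v1"}
server = CacheServer(storage)
server.put("user:1", "profile-v2")
server.put("user:1", "profile-v2")  # retry of the same write
server.crash()
recovered = server.get("user:1")
```

Because recovery only re-reads from storage, the cache needs no recovery protocol of its own, which is the decoupling the design principles aim for.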

Advantages
• Modularity: data reliability is decoupled from the partition and replication algorithm module.
• Flexibility: caching and eviction at individual servers allow dynamic adaptation to new usage patterns.

Algorithm
• Indra objectives
• To place friends’ data together
• To replicate popular data items
• Based on access log
• Weighted graph; weights denote joint access frequency
[Figure: example graph with nodes a to f, illustrating placement and replication]
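
One way to derive such a weighted graph from an access log is sketched below. The log format (one list of user ids per multi-get request) and the function name `build_coaccess_graph` are assumptions for illustration; the slides do not specify either.

```python
from collections import Counter
from itertools import combinations


def build_coaccess_graph(access_log):
    """Return {(u, v): weight}; weight counts requests that read both u and v."""
    weights = Counter()
    for request in access_log:
        # Deduplicate and sort so each unordered pair maps to a single key.
        for u, v in combinations(sorted(set(request)), 2):
            weights[(u, v)] += 1
    return weights


# Hypothetical log: each entry lists the users read by one multi-get.
log = [["a", "b", "c"], ["a", "b"], ["c", "d"]]
graph = build_coaccess_graph(log)
```

Here the edge between `a` and `b` gets weight 2 because the two users appear together in two requests.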

Mathematical Model
• Balance the load among servers
• Minimize replicas
• Collocate user with friends
Maximize: Collocation Gain - Replication Cost - Server Load Cost
Among all:
• Placement plans
• Replication plans
Problem incorporates several NP-hard problems!

Problem Decomposition
Key idea: separate the problem into two simpler problems!
Partitioning Problem
Maximize: Collocation Gain - Server Load Cost
Among all placement plans
Replication Problem
Maximize: Collocation Gain - Replication Cost
Among all replication plans
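
To make the three terms concrete, the sketch below scores a given plan. The slides name the terms but not their formulas, so every definition here is an assumption: collocation gain is the total weight of edges whose endpoints share a server, replication cost is the number of extra copies, server load cost is the gap between the most and least loaded server, and `alpha` and `beta` are hypothetical trade-off weights.

```python
from collections import Counter


def objective(weights, placement, num_servers, alpha=1.0, beta=1.0):
    """Score a plan; placement maps each user to the set of servers holding a copy."""
    # Collocation gain: weight of every edge whose endpoints share a server.
    gain = sum(w for (u, v), w in weights.items() if placement[u] & placement[v])
    # Replication cost: copies beyond the first, summed over users.
    replication = sum(len(servers) - 1 for servers in placement.values())
    # Server load cost: spread between the most and least loaded server.
    load = Counter(s for servers in placement.values() for s in servers)
    loads = [load.get(s, 0) for s in range(num_servers)]
    imbalance = max(loads) - min(loads)
    return gain - alpha * replication - beta * imbalance


weights = {("a", "b"): 2, ("a", "c"): 1, ("c", "d"): 1}
no_replica = {"a": {0}, "b": {0}, "c": {1}, "d": {1}}
with_replica = {"a": {0}, "b": {0}, "c": {0, 1}, "d": {1}}

score_plain = objective(weights, no_replica, num_servers=2)
score_replicated = objective(weights, with_replica, num_servers=2)
```

With these toy numbers, replicating `c` raises the collocation gain from 3 to 4 but adds one copy and one unit of imbalance, so whether it pays off depends on `alpha` and `beta`.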

Online Algorithm
• Current state of system
• User arrives
• Compare the two possible placements
• Assigned to server 1
• Replicated at server 2
• User arrives
• Placement trades off collocation gain and load balancing cost
[Figure: example graph with nodes a to f; users g and h arrive]
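
The arrival step can be sketched as a greedy rule. This is an illustrative sketch under stated assumptions, not the algorithm from the talk: the score (joint access weight to users already on a server, minus a load penalty), the parameters `load_weight` and `replicate_threshold`, and the function name `place_user` are all hypothetical.

```python
def place_user(user, neighbors, servers, load_weight=1.0, replicate_threshold=2.0):
    """Place an arriving user; return (primary index, list of replica indices).

    servers: list of sets of user ids, one set per server.
    neighbors: {friend id: joint access weight} for the arriving user.
    """
    def gain(members):
        # Collocation gain: total weight to friends already on this server.
        return sum(w for friend, w in neighbors.items() if friend in members)

    # Trade off collocation gain against the server's current load.
    scores = [gain(m) - load_weight * len(m) for m in servers]
    primary = max(range(len(servers)), key=scores.__getitem__)
    servers[primary].add(user)

    replicas = []
    for i, members in enumerate(servers):
        # Replicate where enough of the user's friends already live.
        if i != primary and gain(members) >= replicate_threshold:
            members.add(user)
            replicas.append(i)
    return primary, replicas


servers = [{"a", "b", "e"}, {"c", "d", "f"}]
result = place_user("g", {"a": 3, "b": 2, "c": 2}, servers)
```

In this example `g` is assigned to the first server (index 0), where most of its friends are, and replicated at the second (index 1), mirroring the "assigned to server 1, replicated at server 2" case on the slide.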

Evaluation
• Data Set
• Random walk on the Facebook data set from the Max Planck Institute for Software Systems
• 6373 vertices and 183,734 edges; average 28.83 neighbors
• Metrics
• Number of connections
• Number of TCP packets
• Experimental Setup
• 10 / 15 servers
• Consistent storage interface
• Offline algorithm
• 1000 read requests
• Tcpdump and Wireshark for bandwidth analysis
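
The "number of connections" metric can be illustrated by counting how many servers one multi-get has to contact under a given placement. The sketch below is an assumption about how such a count could be computed, using a greedy cover over replicas; it is not the measurement code from the evaluation, and the placements are made-up examples.

```python
from collections import Counter


def connections_needed(keys, placement):
    """Greedy estimate of how many servers one multi-get must contact.

    placement maps each key to the non-empty set of servers holding a copy.
    """
    remaining = set(keys)
    contacted = 0
    while remaining:
        # Pick the server that holds the most still-unserved keys.
        counts = Counter(s for k in remaining for s in placement[k])
        best, _ = counts.most_common(1)[0]
        remaining = {k for k in remaining if best not in placement[k]}
        contacted += 1
    return contacted


# Hypothetical placements of four users on three servers.
hashed = {"a": {0}, "b": {1}, "c": {2}, "d": {0}}
collocated = {"a": {0}, "b": {0}, "c": {0, 1}, "d": {1}}

request = ["a", "b", "c"]
hashed_connections = connections_needed(request, hashed)
collocated_connections = connections_needed(request, collocated)
```

For this request the hash-style placement needs three connections while the collocated placement needs one, which is the effect the connection and packet metrics are meant to capture.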

Bandwidth: Indra vs. Random
Replication factor: 1.2 (10 servers), 1.8 (15 servers)

Trade-off between Replication and Connections

Contributions
• Proposed in-memory distributed cache
• Takes advantage of data access relationships
• Retrieves small data items in online social networks
• Uses a dynamic partition and replication algorithm
• Results show
• Factor of 4 decrease in the number of TCP packets
• Can trade off replication for number of connections
Thanks!
