Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services
Ke Wang, Abhishek Kulkarni, Michael Lang, Dorian Arnold, Ioan Raicu
USRC @ Los Alamos National Laboratory
Datasys @ Illinois Institute of Technology
CS @ Indiana University
CS @ University of New Mexico
November 20th, 2013, at IEEE/ACM Supercomputing (SC) 2013
Current HPC System Services
• Extreme scale
• Lack of decomposition for insight
• Many services have centralized designs
• The impact of service architecture choices is an open question
Long Term Goals
• Modular component design for composable services
• Explore the design space for HPC services
• Evaluate the impacts of different design choices
Contributions
• A taxonomy for classifying HPC system services
• A simulation tool to explore distributed key-value store (KVS) design choices for large-scale system services
• An evaluation of KVS design choices for extreme-scale systems using both synthetic and real workload traces
Outline
• Introduction & Motivation
• Key-Value Store Taxonomy
• Key-Value Store Simulation
• Evaluation
• Conclusions & Future Work
Distributed System Services
• Job launch, resource management systems
• System monitoring
• I/O forwarding, file systems
• Function call shipping
• Key-value stores
Key Issues in Distributed System Services
• Scalability
• Dynamicity
• Fault tolerance
• Consistency
Key-Value Stores and HPC
• Large volume of data and state information
• Distributed NoSQL data stores used as building blocks
• Examples:
  • Resource management (job and node status info)
  • Monitoring (system activity logs)
  • File systems (metadata)
  • SLURM++, MATRIX [1], FusionFS [2]

[1] K. Wang, I. Raicu. “Paving the Road to Exascale through Many Task Computing”, Doctor Showcase, IEEE/ACM Supercomputing 2012 (SC12)
[2] D. Zhao, I. Raicu. “Distributed File Systems for Exascale Computing”, Doctor Showcase, IEEE/ACM Supercomputing 2012 (SC12)
Outline
• Introduction & Motivation
• Key-Value Store Taxonomy
• Key-Value Store Simulation
• Evaluation
• Conclusions & Future Work
HPC KVS Taxonomy: Why?
• Decomposition
• Categorization
• Suggestion
• Implication
HPC KVS Taxonomy: Components
• Service model: functionality
• Data model: distribution and management of data
• Network model: how the components are connected
• Recovery model: how to deal with component failures
• Consistency model: how rapidly data modifications propagate
Centralized Architectures
• Data model: centralized
• Network model: aggregation tree
• Recovery model: fail-over
• Consistency model: strong
Distributed Architectures
• Data model: distributed with partitioning
• Network model: fully connected, or partial knowledge
• Recovery model: consecutive replicas
• Consistency model: strong or eventual
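The partitioned data model above can be sketched with consistent hashing, where each key has a home server plus consecutive successors holding its replicas. This is an illustrative sketch, not the simulator's actual code; `Ring`, `home`, and `replicas` are hypothetical names, and the hash merely stands in for the 64-bit key space used later in the evaluation.

```python
import hashlib
from bisect import bisect_left

def key_hash(key: str) -> int:
    # Map a key into a 64-bit ID space (illustrative choice of hash)
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class Ring:
    """Hash-ring key partitioning with consecutive replicas (hypothetical sketch)."""

    def __init__(self, server_ids):
        self.ids = sorted(server_ids)  # server positions on the ring

    def home(self, key: str) -> int:
        # First server at or after the key's hash, wrapping around the ring
        i = bisect_left(self.ids, key_hash(key)) % len(self.ids)
        return self.ids[i]

    def replicas(self, key: str, n: int):
        # The home server plus its n-1 consecutive successors
        i = self.ids.index(self.home(key))
        return [self.ids[(i + j) % len(self.ids)] for j in range(n)]
```

With a fully connected network model, every server can compute `home()` for any key and reach the owner in one hop; with partial knowledge, a lookup must be routed toward the owner over several hops.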
Outline
• Introduction & Motivation
• Key-Value Store Taxonomy
• Key-Value Store Simulation
• Evaluation
• Conclusions & Future Work
KVS Simulation Design
• Discrete-event simulation built on PeerSim
  • Evaluated alternatives: OMNeT++, OverSim, SimPy
• Configurable number of servers and clients
• Different architectures
• Two parallel queues in each server:
  • Communication queue (send/receive requests)
  • Processing queue (process requests locally)
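The two-queue server design can be sketched as a tiny discrete-event loop. The real simulator is built on PeerSim (Java); this Python sketch, its event names, and its costs are purely illustrative.

```python
import heapq

events = []   # min-heap of (time, seq, kind, payload)
seq = 0

def schedule(t, kind, payload=None):
    global seq
    heapq.heappush(events, (t, seq, kind, payload))
    seq += 1

comm_free_at = 0     # when the communication queue is next free
proc_free_at = 0     # when the processing queue is next free
SEND, PROC = 1, 5    # illustrative per-request receive and processing costs

completed = []       # (request, finish_time)

def run():
    global comm_free_at, proc_free_at
    while events:
        t, _, kind, p = heapq.heappop(events)
        if kind == "recv":                    # request arrives at the comm queue
            start = max(t, comm_free_at)
            comm_free_at = start + SEND
            schedule(comm_free_at, "proc", p) # hand off to the processing queue
        elif kind == "proc":                  # request enters the processing queue
            start = max(t, proc_free_at)
            proc_free_at = start + PROC
            completed.append((p, proc_free_at))

for i in range(3):
    schedule(0, "recv", f"req{i}")
run()
```

Because the two queues drain in parallel, a request can be received while an earlier one is still being processed; the processing queue here is the bottleneck, which is the regime the evaluation later calls "request processing messages dominate".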
Simulation Cost Model
The time to resolve a query locally (tLR) and the time to resolve a remote query (tRR) are given by:

tLR = CS + SR + LP + SS + CR

For a fully connected topology:
tRR = tLR + 2 × (SS + SR)

For a partially connected topology:
tRR = tLR + 2k × (SS + SR)

where k is the number of hops needed to find the predecessor.
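Plugging illustrative numbers into the model makes the fully vs. partially connected gap concrete. The values below are invented for the example, and the expansion of CS/SR/LP/SS/CR as client send, server receive, local processing, server send, and client receive is an assumption, not stated on the slide.

```python
# Illustrative per-step costs (arbitrary time units). CS/SR/LP/SS/CR are
# presumably client send, server receive, local processing, server send,
# and client receive -- an assumption, not stated on the slide.
CS, SR, LP, SS, CR = 5.0, 5.0, 10.0, 5.0, 5.0

t_lr = CS + SR + LP + SS + CR        # time to resolve a query locally

def t_rr(k: int = 1) -> float:
    """Time to resolve a remote query; k is the number of hops needed to
    find the server holding the key (k = 1 when fully connected)."""
    return t_lr + 2 * k * (SS + SR)
```

With these numbers, a local query costs 30 units, a remote query in a fully connected topology costs 50, and a 3-hop lookup in a partially connected topology costs 90, which is why partial connectivity shows higher latency in the evaluation.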
Failure/Recovery Model
• Defines what to do when a node fails
• Defines how a node's state is recovered when it rejoins after a failure

[Diagram: a ring of six servers s0–s5, each holding two consecutive replicas (ri,1, ri,2) of its predecessors' data. Clients notify EM of failures; when the first and then the second replica of s0's data go down, the surviving servers re-replicate the data onto their successors, and when s0 comes back its data is replicated back and the stale copies are removed.]
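The consecutive-replica scheme above can be sketched as follows: each data item lives on its primary server plus the next servers on the ring, and when a primary fails, its successor adopts the item (so replicas remain on consecutive live servers). All names (`Cluster`, `owners`, `fail`) and the toy placement function are hypothetical.

```python
NREP = 2  # replication degree: primary + one consecutive replica

class Cluster:
    """Toy model of consecutive-replica placement and fail-over (hypothetical)."""

    def __init__(self, names):
        self.alive = list(names)   # servers in ring order
        self.primary = {}          # key -> primary server

    def put(self, key):
        # deterministic toy placement instead of a real hash function
        self.primary[key] = self.alive[sum(map(ord, key)) % len(self.alive)]

    def owners(self, key):
        # the primary plus its consecutive successors hold the replicas
        i = self.alive.index(self.primary[key])
        return [self.alive[(i + j) % len(self.alive)] for j in range(NREP)]

    def fail(self, server):
        # keys whose primary died are adopted by the next live server,
        # which re-replicates them onto its own successor
        idx = self.alive.index(server)
        succ = self.alive[(idx + 1) % len(self.alive)]
        self.alive.remove(server)
        for key, p in self.primary.items():
            if p == server:
                self.primary[key] = succ
```

Re-replication after a failure is exactly the extra message traffic the evaluation later measures as failure overhead.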
Consistency Model
• Strong consistency
  • Every replica observes every update in the same order
  • Clients send requests to a dedicated server (the primary replica)
• Eventual consistency
  • Requests are sent to a randomly chosen replica (the coordinator)
  • Three key parameters: N (replicas), R (read quorum), W (write quorum), satisfying R + W > N
  • Uses Dynamo-style version clocks [G. DeCandia, 2007] to track different versions of data and detect conflicts
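A minimal sketch of the quorum arithmetic behind those parameters: R + W > N guarantees that every read quorum intersects every write quorum, so at least one replica answering a read has seen the latest write. Versions are plain integers here for brevity; the simulator, like Dynamo, uses version clocks to order and reconcile them. Function names are hypothetical.

```python
def quorums_intersect(n: int, r: int, w: int) -> bool:
    # R + W > N means any R read replicas overlap any W written replicas
    return r + w > n

def read_latest(responses, r):
    """Coordinator keeps the newest version among the first r replica
    responses; each response is (replica_id, version_number)."""
    return max(responses[:r], key=lambda rv: rv[1])

# Dynamo-style configuration: 3 replicas, read quorum 2, write quorum 2
N, R, W = 3, 2, 2
assert quorums_intersect(N, R, W)
```

The extra replica round-trips needed to assemble R and W quorums are one source of the eventual-consistency overhead seen in the results.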
Outline
• Introduction & Motivation
• Key-Value Store Taxonomy
• Key-Value Store Simulation
• Evaluation
• Conclusions & Future Work
Evaluation
• Evaluate the overheads of:
  • Different architectures, with a focus on distributed ones
  • Different models
• Lightweight simulations: the largest experiments needed 25 GB of RAM and 40 minutes of walltime
• Workloads:
  • Synthetic workload over a 64-bit key space
  • Real workload traces from 3 representative system services: job launch, system monitoring, and I/O forwarding
Validation
• Validated against ZHT [1] (left) and Voldemort (right)
  • ZHT on BG/P, up to 8K nodes (32K cores)
  • Voldemort on the PRObE Kodiak cluster, up to 800 nodes

[1] T. Li, X. Zhou, K. Brandstatter, D. Zhao, K. Wang, A. Rajendran, Z. Zhang, I. Raicu. “ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table”, IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2013
Fully Connected vs. Partially Connected
• Partial connectivity incurs higher latency due to the additional routing
• The fully connected topology responds faster (twice as fast at extreme scale)
Replication Overhead
• Adding replicas always involves overhead
• Replicas have a larger impact on fully connected topologies than on partially connected ones
Failure Effect
• Higher failure frequency introduces more overhead, but the dominating cost is still the client request-processing messages
Combined Overhead
• Eventual consistency has more overhead than strong consistency
Real Workloads
• Job launch and I/O forwarding: eventual consistency performs worse; both the request types and the keys are almost uniformly randomly distributed (URD)
• Monitoring: eventual consistency works better; all requests are “put”

[Results shown for both fully connected and partially connected topologies.]
From Simulation to Real Services
• ZHT (distributed key/value storage): the DKVS implementation
• MATRIX (runtime system): DKVS is used to keep task metadata
• SLURM++ (job management system): DKVS is used to store task & resource information
• FusionFS (distributed file system): DKVS is used to maintain file/directory metadata
Outline
• Introduction & Motivation
• Key-Value Store Taxonomy
• Key-Value Store Simulation
• Evaluation
• Conclusions & Future Work
Conclusions
• Key-value stores are a building block for system services
• A service taxonomy is important
• A simulation framework enables studying services
• Distributed architectures are needed
• Replication adds overhead
• The fully connected topology works well, as long as request-processing messages dominate
• Consistency tradeoffs:

[Spectrum: strong consistency suits read-intensive workloads (performance); eventual and weak consistency suit write-intensive workloads (availability).]
Future Work
• Extend the simulator to cover more of the taxonomy
• Explore other recovery models:
  • Log-based
  • Information dispersal algorithms
• Explore other consistency models
• Explore using DKVS in the development of:
  • A general building-block library
  • A distributed monitoring system service
  • A distributed message queue system
Acknowledgements
• DOE contract: DE-FC02-06ER25750
• Part of NSF award: CNS-1042543 (PRObE)
• Collaboration with the FusionFS project under NSF grant NSF-1054974
• BG/P resources from ANL
• Thanks to Tonglin Li, Dongfang Zhao, and Hakan Akkan
More Information
• More information: http://datasys.cs.iit.edu/~kewang/
• Contact: kwang22@hawk.iit.edu
• Questions?
Related Work
• Service simulation
  • Peer-to-peer network simulation
  • Telephony simulations
  • Simulation of consistency
  • Problem: none focus on HPC, or on combining the distributed features
• Taxonomies
  • Investigations of distributed hash tables, and an algorithm taxonomy
  • Grid computing workflow taxonomy
  • Problem: none of them drive features in a simulation