Dynamo: Amazon’s Highly Available Key-value Store. Professor: Dr. Sheykh Esmaili. Presenters: Pourya Aliabadi, Boshra Ardallani, Paria Rakhshani.
Introduction • Amazon runs a world-wide e-commerce platform that serves tens of millions of customers at peak times using tens of thousands of servers located in many data centers around the world • Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust
Introduction • One of the lessons our organization has learned from operating Amazon’s platform is that the reliability and scalability of a system is dependent on how its application state is managed • To meet its reliability and scaling needs, Amazon has developed a number of storage technologies, of which the Amazon Simple Storage Service (S3) is probably the best known • There are many services on Amazon’s platform that only need primary-key access to a data store
System Assumptions and Requirements • Query Model • Operations to a data item that is uniquely identified by a key • State is stored as binary objects • No operations span multiple data items • Dynamo targets applications that need to store objects that are relatively small (less than 1 MB)
System Assumptions and Requirements • ACID Properties • ACID (Atomicity, Consistency, Isolation, Durability) • ACID is a set of properties that guarantee that database transactions are processed reliably • Dynamo targets applications that operate with weaker consistency • Dynamo does not provide any isolation guarantees and permits only single key updates
System Assumptions and Requirements • Efficiency • The system needs to function on a commodity hardware infrastructure • Services must be able to configure Dynamo such that they consistently achieve their latency and throughput requirements. • The tradeoffs are in performance, cost efficiency, availability, and durability guarantees.
System Assumptions and Requirements • Dynamo is used only by Amazon’s internal services • We will discuss the scalability limitations of Dynamo and possible scalability related extensions
Service Level Agreements (SLA) • To guarantee that the application can deliver its functionality in a bounded time, each and every dependency in the platform needs to deliver its functionality with even tighter bounds • An example of a simple SLA is a service guaranteeing that it will provide a response within 300ms for 99.9% of its requests for a peak client load of 500 requests per second • For example a page request to one of the e-commerce sites typically requires the rendering engine to construct its response by sending requests to over 150 services • These services often have multiple dependencies
The figure shows an abstract view of the architecture of Amazon’s platform
Design Considerations • Incremental scalability: Dynamo should be able to scale out one storage host (henceforth referred to as a “node”) at a time, with minimal impact on both operators of the system and the system itself • Symmetry: Every node in Dynamo should have the same set of responsibilities as its peers; there should be no distinguished node or nodes that take special roles or an extra set of responsibilities
Design Considerations • Decentralization: An extension of symmetry, the design should favor decentralized peer-to-peer techniques over centralized control. In the past, centralized control has resulted in outages and the goal is to avoid it as much as possible. This leads to a simpler, more scalable, and more available system. • Heterogeneity: The system needs to be able to exploit heterogeneity in the infrastructure it runs on, e.g. the work distribution must be proportional to the capabilities of the individual servers. This is essential for adding new nodes with higher capacity without having to upgrade all hosts at once.
System Architecture • The Dynamo data storage system contains items that are associated with a single key • Operations that are implemented: get( ) and put( ) • get(key): locates object with key and returns object or list of objects with a context • put(key, context, object): places an object at a replica along with the key and context • Context: metadata about object
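The two-operation interface can be sketched as follows. This is a toy single-node stand-in (a dict as the backing store, an integer counter standing in for the real context metadata), just to show the shape of get() and put():

```python
# Minimal sketch of Dynamo's two-operation interface, assuming an
# in-memory dict as the store; the integer "context" is a stand-in for
# Dynamo's real version metadata.
class Store:
    def __init__(self):
        self._data = {}            # key -> (context, object)

    def get(self, key):
        """Return (context, list of object versions) for key, or (None, [])."""
        if key not in self._data:
            return None, []
        context, obj = self._data[key]
        return context, [obj]

    def put(self, key, context, obj):
        """Store obj under key; bump a simple version counter as context."""
        version = 0 if context is None else context + 1
        self._data[key] = (version, obj)

s = Store()
s.put("cart:42", None, ["book"])
ctx, versions = s.get("cart:42")           # read back object and its context
s.put("cart:42", ctx, versions[0] + ["pen"])  # pass context on the next write
```

Note how the context read by get() is handed back on put(); in Dynamo this is how the system knows which version a write supersedes.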
Partitioning • Provides mechanism to dynamically partition the data over the set of nodes • Use consistent hashing • Similar to Chord • Each node gets an ID from the space of keys • Nodes are arranged in a ring • Data stored on the first node clockwise of the current placement of the data key
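The ring placement rule can be sketched in a few lines. This is an illustrative consistent-hashing lookup, with MD5 standing in for Dynamo’s actual hash function and made-up node names:

```python
# Hedged sketch of consistent hashing: each node hashes to a point on a
# ring, and a key is stored on the first node clockwise of its hash.
# MD5 is a stand-in for the real hash; node names are illustrative.
import bisect
import hashlib

def ring_hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Sorted (position, node) pairs form the ring.
        self._points = sorted((ring_hash(n), n) for n in nodes)

    def owner(self, key):
        """First node clockwise of the key's position on the ring."""
        h = ring_hash(key)
        i = bisect.bisect(self._points, (h, "")) % len(self._points)
        return self._points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:1001"))   # deterministic for a given hash function
```

The benefit over modulo hashing is that adding or removing one node only remaps the keys on the arc adjacent to it, not the whole keyspace.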
Virtual node • (single node) -> multiple points in the ring i.e. virtual nodes • Advantages of virtual nodes: • Graceful handling of failure of a node • Easy accommodation of a new node • Heterogeneity in physical infrastructure can be exploited
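The virtual-node idea maps each physical host to several ring points ("tokens"). A rough sketch, with made-up host names and token counts, showing how a more capable host can simply claim more tokens:

```python
# Sketch of virtual nodes: each physical host claims several tokens on
# the ring, so load spreads more evenly and heterogeneity is handled by
# giving bigger hosts more tokens. Counts below are illustrative.
import bisect
import hashlib

def ring_hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class VNodeRing:
    def __init__(self, tokens_per_node):
        # tokens_per_node: {"host": count}; more tokens = more capacity.
        self._points = sorted(
            (ring_hash(f"{host}#{i}"), host)
            for host, count in tokens_per_node.items()
            for i in range(count)
        )

    def owner(self, key):
        h = ring_hash(key)
        i = bisect.bisect(self._points, (h, "")) % len(self._points)
        return self._points[i][1]

ring = VNodeRing({"big-host": 8, "small-host": 2})
counts = {"big-host": 0, "small-host": 0}
for k in range(1000):
    counts[ring.owner(f"key{k}")] += 1
# With 4x the tokens, big-host typically owns a correspondingly larger
# share of the keys.
print(counts)
```

When a host fails, its tokens' load is scattered across many remaining hosts rather than dumped on one successor, which is the "graceful handling of failure" the slide mentions.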
Replication • Each data item replicated at N hosts • N is configured per-instance • Each node is responsible for the region of the ring between it and its Nth predecessor • Preference list: List of nodes responsible for storing a particular key
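Building a preference list amounts to walking clockwise from the key's position and collecting the first N distinct physical hosts (skipping further virtual nodes of a host already chosen). An illustrative sketch with made-up host names:

```python
# Sketch of preference-list construction: walk clockwise from the key's
# ring position and collect the first n distinct physical hosts.
import bisect
import hashlib

def ring_hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def preference_list(points, key, n):
    """points: sorted (hash, host) ring positions, possibly with several
    virtual nodes per host; returns the n hosts responsible for key."""
    h = ring_hash(key)
    start = bisect.bisect(points, (h, "")) % len(points)
    hosts = []
    for step in range(len(points)):
        host = points[(start + step) % len(points)][1]
        if host not in hosts:          # skip repeated virtual nodes
            hosts.append(host)
        if len(hosts) == n:
            break
    return hosts

points = sorted(
    (ring_hash(f"{host}#{i}"), host)
    for host in ("a", "b", "c", "d")
    for i in range(4)                  # 4 virtual nodes per host
)
print(preference_list(points, "cart:42", 3))   # three distinct hosts
```

The first host in the list is the key's coordinator; the rest hold the replicas.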
VERSIONING • Multiple versions of an object can be present in the system at same time • Vector clock is used for version control • Vector clock size issue
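Vector-clock version control can be shown in miniature. This sketch uses the paper's style of per-node counters (node names sx, sy are illustrative): one version descends from another if every counter is at least as large; otherwise the versions are concurrent and both must be kept for reconciliation:

```python
# Illustrative vector clocks: each entry counts updates coordinated by
# one node. Two versions conflict when neither descends from the other.
def increment(clock, node):
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def descends(a, b):
    """True if clock a has seen everything clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def conflict(a, b):
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "sx")         # {"sx": 1}
v2 = increment(v1, "sx")         # {"sx": 2}: same node updated again
v3 = increment(v1, "sy")         # {"sx": 1, "sy": 1}: concurrent branch
print(descends(v2, v1))          # True: v2 supersedes v1, v1 can be dropped
print(conflict(v2, v3))          # True: concurrent updates, keep both
```

The size issue the slide mentions arises because each distinct coordinating node adds an entry to the clock; Dynamo truncates old entries once the clock grows past a threshold.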
Execution of get() and put() Operations • Operations can originate at any node in the system • Coordinator: • node handling the read or write operation • The coordinator contacts R nodes for reading and W nodes for writing, where R + W > N
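The quorum condition R + W > N guarantees that every read set overlaps every write set in at least one replica, so a read sees the latest acknowledged write. A small sketch of that arithmetic, with an integer version standing in for Dynamo's vector-clock comparison:

```python
# Sketch of the quorum overlap condition and of picking the newest
# version among R read replies. An integer version is a stand-in for
# the real vector-clock comparison.
def quorums_overlap(n, r, w):
    """True if any R-node read set must intersect any W-node write set."""
    return r + w > n

def latest_value(read_replies):
    """Among R replies, return the one with the highest version."""
    return max(read_replies, key=lambda reply: reply["version"])

print(quorums_overlap(3, 2, 2))        # True: the common (N, R, W) = (3, 2, 2)
replies = [{"version": 4, "value": "new"},
           {"version": 3, "value": "old"}]
print(latest_value(replies)["value"])  # "new"
```

Lowering W makes writes faster and more available at the cost of a weaker durability guarantee; lowering R does the same for reads, which is the tuning knob discussed later in the slides.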
Handling Failures • Temporary failures: Hinted Handoff • Mechanism to ensure that the read and write operations are not failed due to temporary node or network failures. • Handling Permanent Failures: Replica Synchronization • Synchronize with another node • Use Merkle Trees
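Merkle trees let two replicas compare entire key ranges by exchanging one root hash. A simplified sketch (binary tree over a fixed, sorted key list, with the odd-leaf handling chosen here for brevity):

```python
# Rough sketch of Merkle-tree replica synchronization: each replica
# hashes its key range into a tree; equal roots mean the ranges are in
# sync, and differing subtrees localize the divergent keys.
import hashlib

def h(data):
    return hashlib.sha256(data).hexdigest()

def merkle_root(values):
    """Hash leaves pairwise up to a single root hash."""
    level = [h(v.encode()) for v in values]
    while len(level) > 1:
        if len(level) % 2:             # duplicate the last hash if odd
            level.append(level[-1])
        level = [h((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

replica_a = ["k1=v1", "k2=v2", "k3=v3", "k4=v4"]
replica_b = ["k1=v1", "k2=v2", "k3=STALE", "k4=v4"]
print(merkle_root(replica_a) == merkle_root(replica_a))  # True: in sync
print(merkle_root(replica_a) == merkle_root(replica_b))  # False: diverged
```

On a mismatch, the replicas recurse into child hashes and transfer only the keys that actually differ, which keeps anti-entropy traffic small.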
Membership and Failure Detection • Explicit mechanism available to initiate the addition and removal of nodes from a Dynamo ring • To prevent logical partitions, some Dynamo nodes play the role of seed nodes • Gossip-based distributed failure detection and membership protocol
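A gossip round can be sketched as each node periodically merging membership views with one random peer, so changes spread epidemically. A toy version, simplifying Dynamo's versioned membership entries to "highest generation number wins":

```python
# Toy gossip round: nodes repeatedly merge membership views with a
# random peer. A per-node generation number stands in for Dynamo's
# real versioned membership entries; names are illustrative.
import random

def merge(view_a, view_b):
    """Merge two membership views, keeping the newer entry per node."""
    merged = dict(view_a)
    for node, gen in view_b.items():
        if gen > merged.get(node, -1):
            merged[node] = gen
    return merged

views = {
    "n1": {"n1": 0, "n2": 0},
    "n2": {"n2": 0, "n3": 5},    # n2 already knows n3 joined
    "n3": {"n3": 5},
}
for _ in range(10):              # a few random gossip exchanges
    a, b = random.sample(list(views), 2)
    views[a] = views[b] = merge(views[a], views[b])
```

Seed nodes fit into this picture as peers every node eventually gossips with, which is what prevents the ring from splitting into logical partitions that never learn of each other.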
Implementation • Each storage node has three main software components: request coordination, membership & failure detection, and a local persistence engine • Local Persistence Engine • Pluggable storage engines: Berkeley Database (BDB) Transactional Data Store, BDB Java Edition, MySQL, and an in-memory buffer with a persistent backing store • Engine chosen based on the application’s object size distribution • Request Coordination • Built on top of an event-driven messaging substrate • Each state machine instance handles exactly one client request; the state machine contains the entire process and failure-handling logic • The coordinator executes client read & write requests; state machines are created on the nodes serving the requests
Experiences, Results & Lessons Learnt • Main Dynamo Usage Patterns • Business logic specific reconciliation • E.g. Merging different versions of a customer’s shopping cart • Timestamp based reconciliation • E.g. Maintaining customer’s session information • High performance read engine • E.g. Maintaining product catalog and promotional items • Client applications can tune parameters to achieve specific objectives: • N: Performance {no. of hosts a data item is replicated at} • R: Availability {min. no. of participating nodes in a successful read opr} • W: Durability {min. no. of participating nodes in a successful write opr} • Commonly used configuration (N,R,W) = (3,2,2)
Experiences, Results & Lessons Learnt • Balancing Performance and Durability • Figure: average and 99.9th-percentile latencies of Dynamo’s read and write operations during a period of 30 days • Figure: comparison of 99.9th-percentile latencies for buffered vs. non-buffered writes over 24 hours
Conclusion Dynamo: • Is a highly available and scalable data store • Is used for storing the state of a number of core services of Amazon.com’s e-commerce platform • Has provided the desired levels of availability and performance and has been successful in handling: • Server failures • Data center failures • Network partitions • Is incrementally scalable • Sacrifices consistency under certain failure scenarios • Extensively uses object versioning • Demonstrates that decentralized techniques can be combined to provide a single highly-available system.