Dynamo: Amazon's Highly Available Key-Value Store Offense: Jori and Ning
Outline • Presentation (Ning) • Symmetry (Jori) • WAN considerations (Ning) • Consistency (Jori) • Disaster Recovery (Ning) • Minor Quibbles (Jori, Ning)
Presentation (Ning) • Dynamo: • The basic functions are simple; • System implementation could be very complex; • Leads to many gaps in the explanation. Missing things that are mentioned, but not explained include: • overload handling • state transfer • concurrency • job scheduling • request marshalling • request routing • system monitoring • alarming • configuration management • If you don't want to talk about them, don't mention them.
Presentation contd. • Almost impossible to understand some concepts without reading the cited material. • Some concepts are used but not well explained: • the gossip protocol • vector clocks • Some concepts are not so important: SLAs • Too wordy: at least give a numbered list • No clear diagrams: please use a flow chart! • Despite the length and many cited resources, it is still very difficult to use the article as a design document. • Many open-source clones (Cassandra, Voldemort, Riak) have tried. • Many design concerns aren't touched upon • Why is the decentralized structure better? • The reader must be well-versed in distributed computing concepts to really understand what's going on in the first read-through.
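For readers who want the vector clock idea the slide says is under-explained, here is a minimal sketch. It is not Dynamo's implementation: the function names and the node names `Sx`, `Sy`, `Sz` are illustrative assumptions. It shows only the core rule, that one version supersedes another when every counter is at least as large, and that two versions are concurrent (in conflict) when neither supersedes the other.

```python
from typing import Dict

# Assumed representation: node id -> number of updates coordinated by that node.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every update recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the causal relationship between two versions."""
    a_over_b, b_over_a = descends(a, b), descends(b, a)
    if a_over_b and b_over_a:
        return "equal"
    if a_over_b:
        return "a supersedes b"
    if b_over_a:
        return "b supersedes a"
    return "concurrent"  # neither has seen the other: a conflict to reconcile


# Two writes through Sx, then two independent writes through Sy and Sz.
v2 = increment(increment({}, "Sx"), "Sx")
v3 = increment(v2, "Sy")
v4 = increment(v2, "Sz")
print(compare(v3, v2))  # v3 extends v2
print(compare(v3, v4))  # v3 and v4 diverged from v2
```

The last comparison is the case that forces the application-level reconciliation criticized later in the deck.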
Symmetry (Jori) • There are direct contradictions in regard to symmetry: • In section 2.3: "Symmetry: Every node in Dynamo should have the same set of responsibilities as its peers; there should be no distinguished node or nodes that take special roles or extra set of responsibilities." • In section 4.8.2: "To prevent logical partitions, some Dynamo nodes play the role of seeds... Seeds can be obtained either from static configuration or from a configuration service. Typically seeds are fully functional nodes in the Dynamo ring."
Symmetry contd. • No justification for this design choice except that it "simplifies the process of system provisioning and maintenance." • Membership and failure detection are presented in a hand-wavy manner. • In this sort of system, specialization can simplify the overall design. Symmetry is not necessary for high availability. • Chubby (Google's Paxos-based distributed lock service) uses a master coordinator approach, which results in much simpler consistency algorithms. It allows updates to be serialized, which prevents conflicts. • A distributed directory service layer for lookup would fix Dynamo's scalability issue, since nodes would no longer have to gossip the entire routing table.
Symmetry contd. • Network connectivity is not symmetric, e.g. connections between nodes in the same data center differ from those between nodes in separate data centers. • The symmetric ring-based system does not reflect this inherent asymmetry. • Server hardware configurations are inherently asymmetric. By making a symmetric system, you rule out the advantages of specialization. One can no longer use different hardware for different components of a complex system.
WAN Considerations (Ning) • No clear explanation of the interactions between data centers. • When a Dynamo cluster spans a WAN, the odds of nodes rejoining the cluster and remaining out of date are significantly increased. • If a node goes down, ‘hinted handoff’ sends updates to the next node in the ring. Since nodes of two data centers alternate, the updates are sent to the remote data center. When the node rejoins the cluster, if the network is partitioned (which happens all the time), the node will not catch up on pending updates for a long time (until the network partition is healed). • Authentication and authorization are ignored in this paper. However, these could cause problems in ring membership management.
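The hinted-handoff scenario above can be sketched as a toy model. Everything here is an assumption made for illustration (the `Node` class, the four-node ring, the `partitioned` flag); it simplifies Dynamo's preference lists down to "next live node in the ring", as the slide does. It shows how a hint stored in the remote data center stays undelivered while the WAN link is partitioned.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    datacenter: str
    up: bool = True
    store: dict = field(default_factory=dict)
    hints: list = field(default_factory=list)  # (intended owner name, key, value)


def write(ring, owner_index, key, value):
    """Write to the owner, or hand off to the next live node with a hint."""
    owner = ring[owner_index]
    if owner.up:
        owner.store[key] = value
        return owner
    for step in range(1, len(ring)):
        fallback = ring[(owner_index + step) % len(ring)]
        if fallback.up:
            fallback.hints.append((owner.name, key, value))
            return fallback
    raise RuntimeError("no live node available")


def deliver_hints(ring, partitioned):
    """Replay hints to live owners; cross-datacenter delivery fails while partitioned."""
    by_name = {node.name: node for node in ring}
    for holder in ring:
        remaining = []
        for owner_name, key, value in holder.hints:
            owner = by_name[owner_name]
            blocked = partitioned and owner.datacenter != holder.datacenter
            if owner.up and not blocked:
                owner.store[key] = value
            else:
                remaining.append((owner_name, key, value))
        holder.hints = remaining


# Hypothetical ring in which the two data centers alternate.
ring = [Node("a1", "dc1"), Node("b1", "dc2"), Node("a2", "dc1"), Node("b2", "dc2")]
ring[0].up = False
holder = write(ring, 0, "cart", ["book"])   # hint lands on b1, in the other data center
ring[0].up = True                           # a1 rejoins during a WAN partition
deliver_hints(ring, partitioned=True)
stale_after_rejoin = dict(ring[0].store)    # still empty: the hint could not cross the WAN
deliver_hints(ring, partitioned=False)      # partition heals, a1 finally catches up
```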
Consistency (Jori) • Principle of Symmetry and Decentralization • Centralization does not mean low availability, and consistency does not need to be sacrificed for high availability: BigTable+GFS • A decentralized architecture usually adds a lot of complexity • Hinted handoff is a complicated way to handle transient failures. • "0.06% of inconsistent values" • Amazon handles millions of transactions a day, so this ends up being a lot.
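To put the 0.06% figure in perspective, the arithmetic can be sketched as below. The daily request volumes are hypothetical; the slide only says "millions of transactions a day" and gives no exact number.

```python
DIVERGENT_PER_10K = 6  # 0.06% expressed as 6 per 10,000 requests


def divergent_requests(requests_per_day: int) -> int:
    """Estimate how many requests per day would see inconsistent values."""
    return requests_per_day * DIVERGENT_PER_10K // 10_000


# Hypothetical volumes, chosen only to illustrate the scale.
for volume in (1_000_000, 10_000_000, 100_000_000):
    print(f"{volume:>11,} requests/day -> about {divergent_requests(volume):,} affected")
```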
Consistency contd. • Stale reads are possible and inconvenient • A node that has been down for a significant amount of time can rejoin a cluster completely out-of-date. There is no resynchronization barrier for reentry and no concept of how far behind it is. Merkle trees lead to slow catch-up. • Dynamo provides no bounds on stale reads to the detriment of developers e.g. a stale read could indirectly lead to an incorrect write, which is hard to track. • Practical implications: • Committed writes don't show up in subsequent reads. • Committed writes may show up in some subsequent reads, but then go missing. • There is no SLA for when writes are globally committed i.e. no nodes are still playing catch-up.
Consistency contd. • Conflict Resolution • Dynamo exposes resolution logic to the developer, making application logic more complex. • Since there are no bounds for stale reads or any centralized commit logs, data returned may be woefully out-of-date. • As noted before, this data loss can lead to unexpected situations that are hard to predict. • If the returned object is a list, deleted objects may reemerge after a conflict (shopping cart example)
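The shopping cart problem in the last bullet can be reproduced in a few lines. This is a sketch assuming the application reconciles divergent carts by set union, which keeps every add but cannot distinguish "deleted" from "never seen"; `merge_carts` and the item names are made up for the example.

```python
def merge_carts(*versions: set) -> set:
    """Reconcile divergent cart versions by union: adds survive, deletes can be lost."""
    merged = set()
    for version in versions:
        merged |= version
    return merged


original = {"book", "lamp"}
after_delete = original - {"lamp"}  # one replica applies the customer's delete
after_add = original | {"pen"}      # another replica concurrently applies an add

reconciled = merge_carts(after_delete, after_add)
print(sorted(reconciled))  # the deleted "lamp" is back in the cart
```

Avoiding this requires recording deletions explicitly (for example as tombstones), which is exactly the kind of extra application logic the slide objects to.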
Disaster Recovery (Ning) • Disaster: • Entire data center fails: there is no way to describe the state of the surviving data centers, so data loss is unbounded: • One cannot quantify exactly how much data was lost. • The affected data may remain corrupted forever. • Lost data can result in stale reads: • these are transactional inconsistencies that most applications are ill-equipped to handle. • Recovery: • The paper does not outline how disk corruptions and failures are handled. • With standard log-shipping based replication, one can at least keep track of the replication log, and therefore have a general idea of how far behind a surviving cluster is.
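The point about log shipping can be made concrete with a small sketch. It assumes a hypothetical setup in which the primary assigns increasing log sequence numbers (LSNs) and each replica reports the last LSN it applied; the class and function names are illustrative and do not belong to any particular database.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    last_applied_lsn: int  # highest log sequence number this replica has applied


def lag_report(primary_last_lsn: int, replicas: list) -> dict:
    """Return, per replica, how many log records it is behind the primary."""
    return {r.name: max(0, primary_last_lsn - r.last_applied_lsn) for r in replicas}


# If the primary data center is lost after writing LSN 1000, the survivors'
# lag is a bound on how many updates each one is missing.
survivors = [Replica("dc2", 970), Replica("dc3", 1000)]
report = lag_report(1000, survivors)
print(report)
```

A gossip-based, multi-writer system has no single log like this, which is why the slide calls the data loss unbounded.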
Minor Quibbles • Amazon implemented the system in Java, but gave no justification as to why. If the concern is providing high-speed availability, why do it in a slow language like Java? • There are a few grammar mistakes and spelling mistakes throughout - could have used a couple more read-throughs. • Wish there were comparisons of various (N,R,W) configuration schemes • The size constraint on objects limits its applications. • End of section 4.4 "However, this problem has not surfaced in production and therefore this issue has not been thoroughly investigated."
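As a starting point for the (N,R,W) comparison wished for above, the sketch below enumerates every configuration for a given N and flags whether read and write quorums are guaranteed to overlap, using the R + W > N condition from the paper. The function names are assumptions, and the table says nothing about latency or availability, which a full comparison would also need.

```python
from itertools import product


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum must share at least one replica with every write quorum."""
    return r + w > n


def compare_configs(n: int) -> list:
    """List (N, R, W, overlap) for every R and W between 1 and N."""
    return [
        (n, r, w, quorums_overlap(n, r, w))
        for r, w in product(range(1, n + 1), repeat=2)
    ]


for n, r, w, overlap in compare_configs(3):
    note = "reads overlap the latest write" if overlap else "stale reads possible"
    print(f"(N={n}, R={r}, W={w}): {note}")
```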