
Free Recovery: A Step Towards Self-Managing State


Presentation Transcript


  1. Free Recovery: A Step Towards Self-Managing State
  Andy Huang and Armando Fox, Stanford University

  2. Persistent hash tables
  [Architecture diagram: frontends connect over a LAN to app servers, which connect over another LAN to the persistent hash table and the DB.]

  3. Two state management challenges
  • Failure handling: consistency requirements make node recovery costly and demand reliable failure detection. Approach: relax internal consistency to get fast, non-intrusive ("free") recovery.
  • System evolution: with large data sets, repartitioning is costly and requires good resource provisioning. Approach: free recovery enables automatic, online repartitioning.
  DStore: an easy-to-manage cluster-based persistent hash table for Internet services

  4. DStore architecture
  [Diagram: each app server runs a Dlib, which talks over the LAN to a cluster of bricks.]
  • Dlib: exposes the hash table API and is the "coordinator" for distributed operations
  • Brick: stores data by writing synchronously to disk (a minimal Brick sketch follows)
  DStore: an easy-to-manage cluster-based persistent hash table for Internet services
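
A minimal sketch of the brick's role as this slide describes it; the class name, file-based log, and method signatures are illustrative assumptions, not DStore's actual code.

```python
import os

class Brick:
    """Stores data, writing each update synchronously to disk before acking."""

    def __init__(self, path):
        self.path = path
        self.table = {}  # in-memory index; the append-only log makes writes durable

    def write(self, key, stamped_value):
        self.table[key] = stamped_value
        with open(self.path, "a") as log:
            log.write(f"{key}\t{stamped_value}\n")
            log.flush()
            os.fsync(log.fileno())  # synchronous write: durable before the ack

    def read(self, key):
        return self.table.get(key)  # returns a (timestamp, value) pair or None
```

Because every acknowledged write is already on disk, a rebooted brick has nothing to reconstruct, which is what makes the recovery in the next slides "free".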

  5. Focusing on recovery
  • Technique 1: Quorums. Write: send to all bricks, wait for a majority. Read: read from a majority. It is OK if some bricks' data differs, since a failure just means missing some writes. Result: tolerant to brick inconsistency.
  • Technique 2: Single-phase writes. In 2PC, a failure between phases complicates the protocol: the second phase depends on a particular set of bricks and relies on reliable failure detection. Single-phase quorum writes can instead be completed by any majority of bricks, so no request relies on specific bricks (see the sketch below).
  Result: simple, non-intrusive recovery; any brick can fail at any time.
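
A minimal sketch of the two techniques, assuming the Brick interface above; timestamps and error handling are simplified for brevity.

```python
import time

def quorum_write(bricks, key, value):
    """Single-phase write: send to all bricks, succeed once any majority acks."""
    stamped = (time.time(), value)  # timestamp orders competing writes
    acks = 0
    for brick in bricks:
        try:
            brick.write(key, stamped)
            acks += 1
        except OSError:
            pass  # a dead or slow brick is skipped, not waited on
    return acks >= len(bricks) // 2 + 1  # no second phase, no specific bricks

def quorum_read(bricks, key):
    """Read from a majority of bricks; the highest timestamp wins."""
    majority = len(bricks) // 2 + 1
    replies = [brick.read(key) for brick in bricks[:majority]]
    replies = [r for r in replies if r is not None]
    return max(replies, key=lambda r: r[0])[1] if replies else None
```

Since any majority can complete a write, a brick can be killed and rebooted at any moment without a recovery handshake, which is the point of the technique.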

  6. Considering consistency
  [Timeline with Dlibs Dl1, Dl2 and bricks B1, B2, B3: x = 0 initially; Dl1's write(1) reaches only a minority of bricks before Dl1 fails; Dl2 then reads 0, but a later read returns 1 (a delayed commit).]
  • A Dlib failure can cause a partial write, violating the quorum property
  • If timestamps differ on a read, read-repair restores the majority invariant (sketched below)
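
A minimal sketch of read-repair, assuming the quorum helpers above: if a partial write left the bricks disagreeing, the reader pushes the freshest value back before returning.

```python
def read_with_repair(bricks, key):
    """Majority read that writes the freshest value back to stale bricks."""
    majority = len(bricks) // 2 + 1
    replies = [(brick, brick.read(key)) for brick in bricks[:majority]]
    values = [r for _, r in replies if r is not None]
    if not values:
        return None
    freshest = max(values, key=lambda r: r[0])  # highest timestamp wins
    for brick, reply in replies:
        if reply != freshest:
            brick.write(key, freshest)  # restore the majority invariant
    return freshest[1]
```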

  7. Considering consistency (continued)
  [Timeline: x = 0 initially; Dl1 issues write(1); Dl2's next read detects the write in progress and writes the 1 through, committing it before returning.]
  • A write-in-progress cookie can be used to detect partial writes and commit/abort them on the next read (a rough sketch follows)
  • Result: an individual client's view of DStore is consistent with that of a single centralized server (as in Bayou)
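
A rough sketch of the cookie idea only, assuming the quorum helpers above. Where the cookie actually lives in DStore, and how commit versus abort is decided, is not specified here; the shared dict below is a stand-in for durable cookie storage that a Dlib crash would not erase.

```python
import time

cookies = {}  # stand-in for durable cookie storage: key -> pending (ts, value)

def write_with_cookie(bricks, key, value):
    stamped = (time.time(), value)
    cookies[key] = stamped            # declare the write before issuing it
    ok = quorum_write(bricks, key, value)
    if ok:
        cookies.pop(key, None)        # fully committed: clear the cookie
    return ok

def read_with_cookie(bricks, key):
    pending = cookies.get(key)
    if pending is not None:
        # The writer died mid-write: commit the pending write now, so the
        # value this reader returns can never later flip back.
        quorum_write(bricks, key, pending[1])
        cookies.pop(key, None)
    return quorum_read(bricks, key)
```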

  8. Benchmark: Free recovery
  [Two throughput timelines, annotated where a brick is killed and where it recovers: worst-case behavior (100% cache hit rate) and expected behavior (85% cache hit rate).]
  • Recovery is fast and non-intrusive

  9. Benchmark: Automatic failure detection
  [Two timelines of a fail-stutter fault detected by Pinpoint: a modest policy (anomaly threshold = 8) and an aggressive policy (anomaly threshold = 5).]
  • Fail-stutter faults are detected by Pinpoint
  • False positives have low cost

  10. Online repartitioning
  [Diagram: a partition's keys, labeled 0 and 1 by hash prefix, are split between the old brick and a new one.]
  • Take the brick offline
  • Copy data to the new brick
  • Bring both bricks online
  To the rest of the system, it appears as if the brick just failed and recovered (see the sketch below).
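
A minimal sketch of the three steps on this slide, assuming the Brick class above and hash-prefix partitioning; the `online` flag and the split-by-one-hash-bit rule are illustrative assumptions.

```python
def split_partition(old_brick, new_brick):
    """Split one brick's key range in two by extending the hash prefix one bit."""
    old_brick.online = False                  # 1. take the brick offline
    moved = {k: v for k, v in old_brick.table.items() if hash(k) & 1}
    for key, stamped in moved.items():        # 2. copy data to the new brick
        new_brick.write(key, stamped)
        del old_brick.table[key]
    old_brick.online = True                   # 3. bring both bricks online;
    new_brick.online = True                   #    quorum reads repair any
                                              #    writes missed while offline
```

Writes missed during the copy are exactly the "missed writes" a failed-and-recovered brick would have, so the quorum and read-repair machinery handles them with no extra protocol.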

  11. Benchmark: Automatic online repartitioning
  [Two timelines comparing naive brick selection against DStore's: evenly-distributed load (growing from 3 to 6 bricks) and a hotspot in the 01 partition (growing from 6 to 12 bricks).]
  • Repartitioning is non-intrusive
  • Brick selection is effective

  12. Next up for free recovery
  • Perform online checkpoints: take the checkpointing brick offline, just like a failure plus recovery
  • See if free recovery can simplify online data reconstruction after hard failures
  • Any other state management challenges you can think of?

  13. Summary
  DStore = Decoupled Storage: quorums [spatial decoupling] + single-phase ops [temporal decoupling] → free recovery
  • Failure handling → fast, non-intrusive. Mechanism: simple reboot. Policy: aggressively reboot anomalous bricks
  • System evolution → "plug-and-play". Mechanism: automatic, online repartitioning. Policy: dynamically add and remove nodes based on predicted load
  • Cost: extra overprovisioning; temporarily violates the "majority" invariant
  • Gain: fast, non-intrusive recovery; any brick can fail at any time
  The result: state managed like a stateless Web farm.

  14. DStore: an easy-to-manage cluster-based persistent hash table for Internet services
  andy.huang@stanford.edu
