Free Recovery: A Step Towards Self-Managing State
Andy Huang and Armando Fox, Stanford University
Persistent hash tables
[Architecture diagram: frontends connect over the LAN to app servers, which in turn reach a persistent hash table and a DB over the LAN]
Two state management challenges
• Failure handling: consistency requirements make node recovery costly and demand reliable failure detection. Approach: relax internal consistency to get fast, non-intrusive (“free”) recovery.
• System evolution: with large data sets, repartitioning is costly, which forces careful resource provisioning up front. Approach: free recovery enables automatic, online repartitioning.
DStore: an easy-to-manage cluster-based persistent hash table for Internet services
DStore architecture
[Diagram: app servers link against Dlib and communicate over the LAN with a set of bricks]
• Dlib: exposes the hash-table API and acts as the “coordinator” for distributed operations
• Brick: stores data by writing synchronously to disk
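To make the split concrete, here is a minimal Python sketch of what a brick and the Dlib hash-table API might look like. The names (Brick, Dlib, put/get) and the single-brick setup are illustrative assumptions, not DStore's actual interface; the quorum coordination is sketched with the next slide.

import os
import pickle

class Brick:
    """Stores data, writing synchronously to disk before acknowledging."""
    def __init__(self, path):
        self.path = path
        self.table = {}

    def write(self, key, value):
        self.table[key] = value
        with open(self.path, "wb") as f:       # persist before acking the write
            pickle.dump(self.table, f)
            f.flush()
            os.fsync(f.fileno())

    def read(self, key):
        return self.table.get(key)

class Dlib:
    """Linked into each app server; exposes the hash-table API and acts as
    the coordinator for operations spanning several bricks."""
    def __init__(self, bricks):
        self.bricks = bricks

    def put(self, key, value):
        for b in self.bricks:                  # simplified: quorum logic comes later
            b.write(key, value)

    def get(self, key):
        return self.bricks[0].read(key)

store = Dlib([Brick("/tmp/brick0.db")])
store.put("user:42", {"name": "Andy"})
print(store.get("user:42"))                    # -> {'name': 'Andy'}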
Focusing on recovery
Technique 1: Quorums (tolerant to brick inconsistency)
• Write: send to all bricks, wait for a majority
• Read: read from a majority; OK if some bricks' data differs
• A brick failure just means it missed some writes
Technique 2: Single-phase writes (no request relies on specific bricks)
• 2PC: a failure between phases complicates the protocol, the 2nd phase depends on a particular set of bricks, and it relies on reliable failure detection
• Single-phase quorum writes can instead be completed by any majority of bricks
Result: simple, non-intrusive recovery; any brick can fail at any time
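A minimal sketch of single-phase quorum writes and majority reads, using toy in-memory bricks (dicts mapping each key to a (timestamp, value) pair). The function names and the timestamp scheme are assumptions for illustration, not the exact DStore protocol.

import time

bricks = [dict() for _ in range(3)]            # three replica bricks: key -> (ts, value)
MAJORITY = len(bricks) // 2 + 1

def quorum_write(key, value):
    """Single phase: send the timestamped write to every brick and succeed once
    a majority acknowledges. There is no commit phase, so no particular brick
    ever has to stay up for the operation to finish."""
    ts = time.time()
    acks = 0
    for brick in bricks:                       # in the real system these are RPCs
        brick[key] = (ts, value)
        acks += 1                              # a failed brick simply would not count
    return acks >= MAJORITY

def quorum_read(key):
    """Read from a majority and return the value with the newest timestamp;
    stale bricks are fine, because the latest write reached a majority."""
    replies = [b[key] for b in bricks[:MAJORITY] if key in b]
    if not replies:
        return None
    ts, value = max(replies, key=lambda r: r[0])
    return value

quorum_write("x", 1)
print(quorum_read("x"))                        # -> 1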
Considering consistency
[Timeline: bricks B1-B3 hold x = 0; coordinator Dl1 issues write(1) but fails partway; a second client Dl2 then reads 0 and later reads 1]
• A Dlib failure can cause a partial write, violating the quorum property
• If the timestamps returned by a read differ, read-repair restores the majority invariant (in effect, a delayed commit)
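A minimal sketch of read-repair under the same toy-brick assumption: if a read observes differing timestamps, the newest value is written back so that it once again resides on a majority of bricks.

bricks = [dict() for _ in range(3)]            # key -> (timestamp, value)

def read_with_repair(key):
    replies = [b[key] for b in bricks if key in b]
    if not replies:
        return None
    newest_ts, newest_val = max(replies, key=lambda r: r[0])   # newest wins
    # Repair: push the newest pair to every brick that is missing it or stale,
    # restoring the "latest value lives on a majority" invariant.
    for brick in bricks:
        if brick.get(key, (None, None))[0] != newest_ts:
            brick[key] = (newest_ts, newest_val)
    return newest_val

# Simulate a partial write: the coordinator died after updating only brick 0.
bricks[0]["x"] = (2.0, 1)                      # new value, newer timestamp
bricks[1]["x"] = (1.0, 0)                      # stale
bricks[2]["x"] = (1.0, 0)
print(read_with_repair("x"))                   # -> 1, and all bricks now agree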
Considering consistency (cont.)
[Timeline: bricks B1-B3 hold x = 0; Dl1 issues write(1) while Dl2 reads concurrently]
• A write-in-progress cookie can be used to detect partial writes and commit or abort them on the next read
• Result: an individual client's view of DStore is consistent with that of a single centralized server (as in Bayou)
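A hedged sketch of the write-in-progress cookie idea: the coordinator marks each write as in progress, and a later reader that finds the mark either commits the write (if it reached a majority) or aborts back to the last committed value. The field names and the commit/abort rule are assumptions for illustration, not DStore's exact protocol.

import time

bricks = [dict() for _ in range(3)]            # key -> {"ts", "value", "cookie"}
MAJORITY = len(bricks) // 2 + 1

def write(key, value):
    record = {"ts": time.time(), "value": value, "cookie": True}   # mark in progress
    for brick in bricks:                       # single phase, no commit round
        brick[key] = dict(record)
    # A follow-up pass would normally clear the cookie; if the coordinator dies
    # before then, the next reader resolves the write instead.

def read(key):
    replies = [b[key] for b in bricks if key in b]
    if not replies:
        return None
    newest = max(replies, key=lambda r: r["ts"])
    if newest["cookie"]:                       # a write may still be partial
        reached = sum(1 for r in replies if r["ts"] == newest["ts"])
        if reached >= MAJORITY:
            resolved = dict(newest, cookie=False)          # commit the write
        else:
            committed = [r for r in replies if not r["cookie"]]
            if committed:                                  # abort: fall back to last committed
                resolved = dict(max(committed, key=lambda r: r["ts"]))
            else:
                resolved = dict(newest, cookie=False)      # nothing older survives
        for brick in bricks:                   # make the decision durable everywhere
            brick[key] = dict(resolved)
        newest = resolved
    return newest["value"]

write("x", 1)
print(read("x"))                               # -> 1, cookie resolved on this read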
Benchmark: Free recovery
[Throughput graphs: a brick is killed and later recovers; worst-case behavior (100% cache hit rate) vs. expected behavior (85% cache hit rate)]
Result: recovery is fast and non-intrusive
Benchmark: Automatic failure detection
[Throughput graphs: a modest policy (anomaly threshold = 8) vs. an aggressive policy (anomaly threshold = 5) reacting to a fail-stutter fault]
• Fail-stutter faults are detected by Pinpoint
• False positives: low cost
Online repartitioning
[Diagram: keys whose hash bit is 0 stay on the old brick; keys whose hash bit is 1 move to the new brick]
• Take the brick offline
• Copy its data to the new brick
• Bring both bricks online
To the rest of the system, it appears as if the brick just failed and recovered
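A hedged sketch of the repartitioning step itself: the brick's keys are split on one more bit of the hash while the brick is offline, exactly as if it had failed; the remaining replicas keep serving requests via quorums in the meantime. Function names and the hashing scheme are illustrative assumptions.

import hashlib

def hash_bits(key, nbits):
    """First nbits of a stable hash of the key, as a bit string (e.g. '01')."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return format(h >> (128 - nbits), "0{}b".format(nbits))

def repartition(old_brick, old_prefix):
    """Split old_brick (which owns hash prefix old_prefix) into two bricks that
    own old_prefix + '0' and old_prefix + '1'. While the copy runs, the brick
    is simply offline, just as it would be during a failure."""
    nbits = len(old_prefix) + 1
    new0, new1 = {}, {}
    for key, record in old_brick.items():      # offline copy of the data
        if hash_bits(key, nbits)[-1] == "0":
            new0[key] = record
        else:
            new1[key] = record
    return new0, new1                          # both bricks brought back online

brick = {"alice": 1, "bob": 2, "carol": 3, "dave": 4}
left, right = repartition(brick, "")
print(sorted(left), sorted(right))             # each key lands in exactly one half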
Benchmark: Automatic online repartitioning
[Throughput graphs: evenly distributed load (3 to 6 bricks) and a hotspot in the 01 partition (6 to 12 bricks), each compared against a naive brick-selection policy]
• Repartitioning: non-intrusive
• Brick selection: effective
Next up for free recovery
• Perform online checkpoints: take the checkpointing brick offline, just like a failure + recovery
• See whether free recovery can simplify online data reconstruction after hard failures
• Any other state management challenges you can think of?
Summary
DStore = Decoupled Storage
• Quorums [spatial decoupling]. Cost: extra overprovisioning. Gain: fast, non-intrusive recovery.
• Single-phase ops [temporal decoupling]. Cost: temporarily violates the “majority” invariant. Gain: any brick can fail at any time.
Together these yield free recovery, which addresses both challenges:
• Failure handling: fast, non-intrusive. Mechanism: simple reboot. Policy: aggressively reboot anomalous bricks.
• System evolution: “plug-and-play”. Mechanism: automatic, online repartitioning. Policy: dynamically add and remove nodes based on predicted load.
The result: persistent state managed like a stateless Web farm
DStore: an easy-to-manage cluster-based persistent hash table for Internet services
andy.huang@stanford.edu