Evaluating a Defragmented DHT Filesystem Jeff Pang Phil Gibbons, Michael Kaminsky, Haifeng Yu, Srinivasan Seshan Intel Research Pittsburgh, CMU
Problem Summary • TRADITIONAL DISTRIBUTED HASH TABLE (DHT) • Each server responsible for a pseudo-random range of the ID space • Objects are given pseudo-random IDs • [Figure: objects with IDs 324, 987, and 160 scattered across servers owning ranges 150-210, 211-400, 401-513, and 800-999]
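To make the contrast concrete, here is a minimal sketch (my own Python illustration, not code from the talk) of traditional DHT placement: object IDs come from a hash, so files a user accesses together scatter across unrelated servers. The server names and toy ID space are assumptions.

```python
# A minimal sketch of traditional DHT placement (illustration only).
# Object IDs are pseudo-random hashes, so files that a user accesses
# together end up on unrelated servers.
import hashlib

ID_SPACE = 1000                       # toy circular ID space
SERVER_RANGES = [                     # each server owns a slice of the space
    (0, 249, "srv0"), (250, 499, "srv1"),
    (500, 749, "srv2"), (750, 999, "srv3"),
]

def object_id(name: str) -> int:
    """Pseudo-random ID derived from a hash of the object name."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % ID_SPACE

def responsible_server(oid: int) -> str:
    """Find the server whose range covers this ID."""
    return next(s for lo, hi, s in SERVER_RANGES if lo <= oid <= hi)

for f in ["/home/u/a.c", "/home/u/b.c", "/home/u/c.c"]:
    oid = object_id(f)
    print(f, oid, responsible_server(oid))  # same directory, different servers
```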
Problem Summary • DEFRAGMENTED DHT • Each server responsible for a dynamically balanced range of the ID space • Objects are given contiguous IDs • [Figure: objects with contiguous IDs 320, 321, and 322 placed together within servers owning ranges 150-210, 211-400, 401-513, and 800-999]
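And a matching sketch (again my own toy illustration) of defragmented placement: related objects receive consecutive IDs, so a user's working set maps onto one or a few contiguous server ranges. The dynamic range rebalancing that Mercury provides is omitted here.

```python
# A minimal sketch of defragmented placement (illustration only).
# Related objects get consecutive IDs instead of hashes, so one user's
# files land on the same (or adjacent) servers.
from bisect import bisect_right
from itertools import count

_next_id = count(320)                 # assign contiguous IDs to new objects
_assigned: dict[str, int] = {}

def contiguous_id(name: str) -> int:
    """Assign the next consecutive ID instead of hashing the name."""
    if name not in _assigned:
        _assigned[name] = next(_next_id)
    return _assigned[name]

# Servers own contiguous ranges of the ID space (boundaries would be
# rebalanced dynamically in the real system).
RANGE_STARTS = [150, 211, 401, 800]   # sorted lower bounds of each range
SERVERS = ["srv0", "srv1", "srv2", "srv3"]

def responsible_server(oid: int) -> str:
    return SERVERS[bisect_right(RANGE_STARTS, oid) - 1]

for f in ["/home/u/a.c", "/home/u/b.c", "/home/u/c.c"]:
    oid = contiguous_id(f)
    print(f, oid, responsible_server(oid))  # IDs 320, 321, 322 -> same server
```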
Motivation • Better availability • You depend on fewer servers when accessing your files • Better end-to-end performance • You don’t have to perform as many DHT lookups when accessing your files
Availability Setup • Evaluated via simulation • ~250 nodes with 1.5Mbps each • Faultload: PlanetLab failure trace (2003) • included one 40-node failure event • Workload: Harvard NFS trace (2003) • primarily home directories used by researchers • Compare: • Traditional DHT: data placed using consistent hashing • Defragmented DHT: data placed contiguously and load balanced dynamically (via Mercury)
Availability Setup • Metric: failure rate of user “tasks” • Task(i,m) = sequence of accesses with an interarrival threshold of i and a maximum duration of m • Task(1sec,5min) = sequence of accesses that are spaced no more than 1 sec apart and last no more than 5 minutes • Idea: capture the notion of a “useful unit of work” • Not clear what values are right • Therefore we evaluated many variations • [Figure: timeline showing accesses spaced <1 sec apart grouped into Task(1sec,…) and Task(1sec,5min) units]
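A minimal sketch of how the Task(i,m) grouping could be computed from a trace of access timestamps; the function name and structure are mine, not the evaluation code.

```python
# Split a user's sorted access timestamps into Task(i, m) units: start a new
# task whenever the gap exceeds i seconds or the task would exceed m seconds.
def split_into_tasks(timestamps, i, m):
    tasks, current = [], []
    for t in timestamps:
        if current and (t - current[-1] > i or t - current[0] > m):
            tasks.append(current)
            current = []
        current.append(t)
    if current:
        tasks.append(current)
    return tasks

# Example: Task(1 sec, 300 sec) over a short access trace.
accesses = [0.0, 0.4, 0.9, 5.0, 5.3, 700.0]
print(split_into_tasks(accesses, i=1.0, m=300.0))
# -> [[0.0, 0.4, 0.9], [5.0, 5.3], [700.0]]
```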
Availability Results • Failure rate across 5 trials • Lower is better • Note log scale • Missing bars have 0 failures • Explanation: user tasks access 10-20x fewer nodes in the defragmented design
Performance Setup • Deploy real implementation • 200-1000 virtual nodes with 1.5Mbps each (Emulab) • Global end-to-end latencies from the MIT King dataset • Workload: Harvard NFS • Compare: • Traditional vs. Defragmented • Implementation • Uses the Symphony/Mercury DHTs, respectively • Both use TCP for data transport • Both employ a lookup cache: remembers recently contacted nodes and the DHT ranges they serve
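A hedged sketch of what such a lookup cache might look like: remember recently contacted nodes and the ID range each was serving, and fall back to a full DHT lookup on a miss. The class name, TTL, and interface are assumptions, not the actual implementation.

```python
# A toy lookup cache: maps recently seen (low, high) ranges to node addresses
# so that repeated accesses can skip the multi-hop DHT lookup.
import time

class LookupCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.entries = []      # list of (low, high, node_addr, inserted_at)

    def remember(self, low: int, high: int, node_addr: str) -> None:
        """Record that node_addr was responsible for IDs in [low, high]."""
        self.entries.append((low, high, node_addr, time.time()))

    def lookup(self, obj_id: int):
        """Return a cached node for obj_id, or None to force a DHT lookup."""
        now = time.time()
        for low, high, node, ts in reversed(self.entries):  # newest first
            if now - ts <= self.ttl and low <= obj_id <= high:
                return node
        return None            # miss: caller performs a real DHT lookup
```

Because ranges shift under load balancing, a cached entry is only a hint; the caller still has to handle a redirect if the contacted node no longer owns the ID.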
Performance Setup • Metric: Task(1sec,infinity) speedup • Task t takes 200 msec in Traditional • Task t takes 100 msec in Defragmented • speedup(t) = 200/100 = 2 • Idea: capture the speedup for each unit of work, independent of user think time • Note: a 1 second interarrival threshold is conservative => tasks are longer • Defragmented does better with shorter tasks (next slide)
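A tiny worked sketch of the metric as stated on the slide: only per-task completion times are compared, so user think time between tasks never enters the ratio.

```python
# speedup(t) = completion time in Traditional / completion time in Defragmented
def speedup(traditional_ms: float, defragmented_ms: float) -> float:
    return traditional_ms / defragmented_ms

print(speedup(200, 100))   # the slide's example: 2.0x
```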
Performance Setup • Accesses within a task may or may not be inter-dependent • Task = (A,B,…) • App. may read A, then, depending on the contents of A, read B • App. may read A and B regardless of contents • Replay the trace to capture both extremes • Sequential - each access must complete before the next starts (best for Defragmented) • Parallel - all accesses in a task are submitted in parallel (best for Traditional) [caveat: limited to 15 outstanding requests]
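A sketch (structure and names mine) of the two replay modes for one task's accesses: sequential issues one request at a time, while parallel issues them concurrently but caps outstanding requests at 15, per the caveat above.

```python
# Two ways to replay a task's accesses against the filesystem under test.
import asyncio

MAX_OUTSTANDING = 15

async def replay_sequential(task, fetch):
    """Each access completes before the next starts (best for Defragmented)."""
    for obj in task:
        await fetch(obj)

async def replay_parallel(task, fetch):
    """All accesses submitted in parallel, limited to 15 in flight."""
    sem = asyncio.Semaphore(MAX_OUTSTANDING)

    async def bounded(obj):
        async with sem:
            await fetch(obj)

    await asyncio.gather(*(bounded(obj) for obj in task))

# Example with a fake fetch that just sleeps instead of touching the DHT:
async def fake_fetch(obj):
    await asyncio.sleep(0.01)

asyncio.run(replay_parallel(list(range(40)), fake_fetch))
```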
Performance Results • Other factors: • TCP slow start • Most tasks are small
Overhead • Defragmented design is not free • We want to maintain load balance • Dynamic load balance => data migration
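A rough back-of-the-envelope sketch (my own model, not the paper's mechanism) of why rebalancing costs bandwidth: when the boundary between two adjacent nodes moves, the objects inside the shifted interval must migrate.

```python
# Bytes that must move to equalize two adjacent nodes' loads when their
# shared range boundary shifts (a simplified two-node model).
def rebalance_boundary(heavy_load_bytes: int, light_load_bytes: int) -> int:
    return max(0, (heavy_load_bytes - light_load_bytes) // 2)

# Example: a 3 GiB node next to a 1 GiB node -> 1 GiB of objects migrate.
print(rebalance_boundary(3 * 2**30, 1 * 2**30) / 2**30, "GiB migrated")
```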
Conclusions • Defragmented DHT Filesystem benefits: • Reduces task failures by an order of magnitude • Speeds up tasks by 50-100% • Overhead might be reasonable: 1 byte written = 1.5 bytes transferred • Key assumptions: • Most tasks are small to medium sized (file systems, web, etc. -- not streaming) • Wide area e2e latencies are tolerable
Performance Breakdown 2 • With parallel playback, the Defragmented design suffers on the small number of very long tasks • (these outliers can be ignored: they are an artifact of the topology)