CS 268: Lecture 22 DHT Applications

Ion Stoica, Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720-1776 (Presentation based on slides from Robert Morris and Sean Rhea)

Outline • Cooperative File System (CFS) • Open DHT

Target CFS Uses • Serving data with inexpensive hosts: • open-source distributions • off-site backups • tech report archive • efficient sharing of music [Figure: nodes connected through the Internet]

How to mirror open-source distributions? • Multiple independent distributions • Each has high peak load, low average • Individual servers are wasteful • Solution: aggregate • Option 1: single powerful server • Option 2: distributed service • But how do you find the data?

Design Challenges • Avoid hot spots • Spread storage burden evenly • Tolerate unreliable participants • Fetch speed comparable to whole-file TCP • Avoid O(#participants) algorithms • Centralized mechanisms [Napster], broadcasts [Gnutella] • CFS solves these challenges

CFS Architecture • Each node is a client and a server • Clients can support different interfaces • File system interface • Music keyword search [Figure: nodes, each containing a client and a server, connected through the Internet]

Client-server interface • Files have unique names • Files are read-only (single writer, many readers) • Publishers split files into blocks • Clients check files for authenticity [Figure: the FS client turns "insert file f" and "lookup file f" into "insert block" and "lookup block" requests sent to servers on other nodes]

Server Structure • DHash stores, balances, replicates, caches blocks • DHash uses Chord [SIGCOMM 2001] to locate blocks [Figure: Node 1 and Node 2, each running DHash layered on top of Chord]

Chord Hashes a Block ID to its Successor • Nodes and blocks have randomly distributed IDs • Successor: node with next highest ID [Figure: circular ID space with node IDs N10, N32, N60, N80, N100 and block IDs stored at their successors: N10 holds B112, B120, …, B10; N32 holds B11, B30; N60 holds B33, B40, B52; N80 holds B65, B70; N100 holds B100]
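The successor rule on this slide can be sketched in a few lines. This is only an illustration using the small integer IDs from the figure. The helper name `successor` is hypothetical, and a block whose ID equals a node ID is assumed to belong to that node (as B10 does on N10 in the figure).

```python
import bisect

def successor(node_ids, block_id):
    """Return the first node ID >= block_id, wrapping around the ring."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, block_id)
    return ring[i % len(ring)]  # past the last node: wrap to the smallest ID

# Node and block IDs taken from the slide's figure.
nodes = [10, 32, 60, 80, 100]
blocks = [112, 120, 10, 11, 30, 33, 40, 52, 65, 70, 100]

# Group blocks by the node responsible for them.
placement = {}
for b in blocks:
    placement.setdefault(successor(nodes, b), []).append(b)

for node in sorted(placement):
    print(f"N{node}: {placement[node]}")
```

Blocks with IDs above the largest node ID (B112, B120) wrap around to the smallest node, N10.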

DHash/Chord Interface • lookup() returns a list of nodes whose IDs are closer in ID space to the block ID • Sorted, closest first [Figure: the DHash server calls Lookup(blockID) on Chord; Chord consults its finger table of <node ID, IP address> entries and returns a list of <node-ID, IP address> pairs]

DHash Uses Other Nodes to Locate Blocks [Figure: ring of nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Lookup(BlockID=45) proceeds in three numbered steps across the ring]
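A rough simulation of such a multi-step lookup, using the node IDs from the figure. Everything else is an assumption made for illustration: a 7-bit ID space, fingers at n + 2^k, global knowledge of the ring to build the finger tables, and the choice of N80 as the starting node. It is a sketch of Chord-style routing, not the CFS code.

```python
import bisect

M = 7  # assumed ID width: IDs live in [0, 2**M)
NODES = sorted([5, 10, 20, 40, 50, 60, 68, 80, 99, 110])

def successor(ident):
    """First node at or after ident on the ring."""
    i = bisect.bisect_left(NODES, ident % 2**M)
    return NODES[i % len(NODES)]

def between(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    x, a, b = x % 2**M, a % 2**M, b % 2**M
    if a < b:
        return a < x <= b
    return x > a or x <= b

def fingers(n):
    """Finger k of node n points at successor(n + 2**k)."""
    return [successor(n + 2**k) for k in range(M)]

def lookup(start, block_id):
    """Return (responsible node, list of nodes contacted)."""
    n, path = start, [start]
    # Stop once block_id falls between n and n's immediate successor.
    while not between(block_id, n, successor(n + 1)):
        # Jump to the furthest finger that still precedes block_id.
        preceding = [f for f in reversed(fingers(n))
                     if between(f, n, block_id - 1)]
        n = preceding[0] if preceding else successor(n + 1)
        path.append(n)
    return successor(n + 1), path

owner, path = lookup(80, 45)
print(owner, path)
```

Each hop uses only the current node's finger table, so the querying node never needs a full membership list.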

Storing Blocks • Long-term blocks are stored for a fixed time • Publishers need to refresh periodically • Cache uses LRU [Figure: disk divided into a cache and long-term block storage]

Replicate blocks at r successors • Node IDs are SHA-1 of IP Address • Ensures independent replica failure [Figure: ring of nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110, with Block 17 marked on the ring]

Lookups find replicas • RPCs: 1. Lookup step 2. Get successor list 3. Failed block fetch 4. Block fetch [Figure: Lookup(BlockID=17) on the ring of nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110, with the four RPCs numbered along the path]

First Live Successor Manages Replicas • Node can locally determine that it is the first live successor [Figure: ring of nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110, showing Block 17 and a copy of 17]
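The three replication slides can be tied together with a small sketch, assuming the ring from the figures and r = 3. The function names are hypothetical, and liveness is modeled as a plain set; a real node would learn it from failed RPCs and its own successor list rather than from global state.

```python
import bisect

NODES = sorted([5, 10, 20, 40, 50, 60, 68, 80, 99, 110])

def replica_set(block_id, r):
    """The r successors of block_id, in ring order."""
    i = bisect.bisect_left(NODES, block_id)
    return [NODES[(i + k) % len(NODES)] for k in range(r)]

def fetch_from(block_id, r, alive):
    """Try replicas in order; return the first live one, or None."""
    for node in replica_set(block_id, r):
        if node in alive:
            return node
    return None

def manages_replicas(node, block_id, r, alive):
    """A node manages a block's replicas if it is the first live successor."""
    return fetch_from(block_id, r, alive) == node

alive = set(NODES) - {20}          # suppose N20 has failed
print(replica_set(17, 3))          # candidate holders of Block 17
print(fetch_from(17, 3, alive))    # where the fetch finally succeeds
```

With N20 down, the fetch falls through to the next successor, which also takes over replica management for Block 17.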

DHash Copies to Caches Along Lookup Path • RPCs: 1. Chord lookup 2. Chord lookup 3. Block fetch 4. Send to cache [Figure: Lookup(BlockID=45) on the ring of nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110, with the four RPCs numbered along the path]

Caching at Fingers Limits Load • Only O(log N) nodes have fingers pointing to N32 • This limits the single-block load on N32 [Figure: ring highlighting node N32]

Virtual Nodes Allow Heterogeneity • Hosts may differ in disk/net capacity • Hosts may advertise multiple IDs • Chosen as SHA-1(IP Address, index) • Each ID represents a “virtual node” • Host load proportional to its number of virtual nodes • Manually controlled [Figure: hosts Node A and Node B with virtual nodes N5, N10, N60, N101 on the ring]
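A minimal sketch of deriving virtual node IDs as SHA-1(IP Address, index). The slide does not say how the IP address and index are combined before hashing, so the comma-separated string below is an assumption, as is the helper name `virtual_ids`.

```python
import hashlib

def virtual_ids(ip, count):
    """Derive `count` 160-bit ring IDs for one host."""
    ids = []
    for index in range(count):
        # Assumed encoding: "ip,index" as UTF-8 text.
        digest = hashlib.sha1(f"{ip},{index}".encode()).digest()
        ids.append(int.from_bytes(digest, "big"))
    return ids

# A host with more capacity advertises more virtual nodes.
small_host = virtual_ids("192.0.2.10", 1)
big_host = virtual_ids("192.0.2.20", 4)
print(len(small_host), len(big_host))
```

Because the IDs are a deterministic function of the IP address and index, other nodes can recompute and verify them.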

Why Blocks Instead of Files? • Cost: one lookup per block • Can tailor cost by choosing good block size • Benefit: load balance is simple • For large files • Storage cost of large files is spread out • Popular files are served in parallel

Outline • Cooperative File System (CFS) • Open DHT

Questions: • How many DHTs will there be? • Can all applications share one DHT?

Benefits of Sharing a DHT • Amortizes costs across applications • Maintenance bandwidth, connection state, etc. • Facilitates “bootstrapping” of new applications • Working infrastructure already in place • Allows for statistical multiplexing of resources • Takes advantage of spare storage and bandwidth • Facilitates upgrading existing applications • “Share” DHT between application versions

The DHT as a Service • What is this interface? [Figure: OpenDHT drawn as a cloud of nodes, each holding key-value (K, V) pairs, with clients outside the cloud reaching it through an interface]

It’s not lookup() • Challenges: • Distribution • Security [Figure: lookup(k) routes key k to a node. What does this node do with it?]

How are DHTs Used? • Storage • CFS, UsenetDHT, PKI, etc. • Rendezvous • Simple: Chat, Instant Messenger • Load balanced: i3 • Multicast: RSS Aggregation, White Board • Anycast: Tapestry, Coral

What about put/get? • Works easily for storage applications • Easy to share • No upcalls, so no code distribution or security complications • But does it work for rendezvous? • Chat? Sure: put(my-name, my-IP) • What about the others?

Protecting Against Overuse • Must protect system resources against overuse • Resources include network, CPU, and disk • Network and CPU straightforward • Disk harder: usage persists long after requests • Hard to distinguish malice from eager usage • Don’t want to hurt eager users if utilization low • Number of active users changes over time • Quotas are inappropriate

Fair Storage Allocation • Our solution: give each client a fair share • Will define “fairness” in a few slides • Limits strength of malicious clients • Only as powerful as they are numerous • Protect storage on each DHT node separately • Must protect each subrange of the key space • Rewards clients that balance their key choices

The Problem of Starvation • Fair shares change over time • Decrease as system load increases [Figure: timeline. Client 1 arrives and fills 50% of disk; Client 2 arrives and fills 40% of disk; Client 3 arrives and its max share is 10%: starvation!]

Preventing Starvation • Simple fix: add time-to-live (TTL) to puts • put(key, value) → put(key, value, ttl) • Prevents long-term starvation • Eventually all puts will expire • Can still get short-term starvation [Figure: timeline. Client A arrives and fills the entire disk; Client B arrives and asks for space; B starves until Client A’s values start expiring]
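A toy, single-process illustration of put(key, value, ttl), reusing the chat rendezvous example put(my-name, my-IP) from an earlier slide. The class `TTLStore`, the one-value-per-key behavior, and passing the clock in as `now` are assumptions made to keep the sketch deterministic; this is not the OpenDHT API.

```python
class TTLStore:
    """In-memory key-value store whose entries expire after ttl seconds."""

    def __init__(self):
        self._items = {}  # key -> (value, expiry time)

    def put(self, key, value, ttl, now):
        self._items[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if now >= expiry:          # expired: reclaim the space
            del self._items[key]
            return None
        return value

store = TTLStore()
store.put("alice", "192.0.2.7", ttl=60, now=0)
print(store.get("alice", now=30))   # still present
print(store.get("alice", now=60))   # expired
```

A client that wants its value to persist has to re-put it before the TTL runs out, which is what eventually frees the disk for others.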

Preventing Starvation • Stronger condition: be able to accept r_min bytes/sec of new data at all times • This is non-trivial to arrange! [Figure: plots of space (0 to max) against time (now to max TTL). A line of slope r_min marks the space reserved for future puts; a candidate put is drawn as a rectangle of its size and TTL. The sum must stay below max capacity. In the second pair of plots the candidate put pushes the sum over capacity: violation!]

Preventing Starvation • Formalize graphical intuition: $f(\tau) = B(t_{now}) - D(t_{now}, t_{now}+\tau) + r_{min} \cdot \tau$ • $B(t_{now})$: bytes stored at time $t_{now}$ • $D(t_{now}, t_{now}+\tau)$: aggregate size of puts expiring in the interval $(t_{now}, t_{now}+\tau)$ • To accept put of size $x$ and TTL $l$: $f(\tau) + x < C$ for all $0 \le \tau < l$, where $C$ is the disk capacity • Can track the value of $f$ efficiently with a tree • Leaves represent inflection points of $f$ • Add put, shift time are O(log n), n = # of puts
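The acceptance test can be checked by brute force. The sketch below is O(n) per check and does not use the tree mentioned on the slide. Between expirations f only grows, so it is enough to test f just at each expiry time inside the candidate's lifetime and at the end of that lifetime. The data layout (a list of (size, remaining TTL) pairs) and the function names are assumptions.

```python
def f(tau, stored, r_min):
    """f(tau) = B(t_now) - D(t_now, t_now + tau) + r_min * tau.

    stored: list of (size_bytes, remaining_ttl_seconds) for puts on disk.
    """
    b_now = sum(size for size, _ in stored)
    # Bytes that expire strictly inside (t_now, t_now + tau).
    expiring = sum(size for size, ttl in stored if ttl < tau)
    return b_now - expiring + r_min * tau

def can_accept(x, ttl, stored, r_min, capacity):
    """Accept a put of x bytes if f(tau) + x < capacity over its lifetime."""
    # f rises with slope r_min and only drops at expirations, so its peaks
    # sit at the expiry times and at the end of the candidate's lifetime.
    # Testing at tau == ttl is slightly conservative.
    checkpoints = {t for _, t in stored if 0 < t < ttl} | {ttl}
    return all(f(tau, stored, r_min) + x < capacity for tau in checkpoints)

stored = [(600, 100), (300, 500)]   # 900 of 1000 bytes in use
print(can_accept(50, 10, stored, r_min=1, capacity=1000))
print(can_accept(50, 200, stored, r_min=1, capacity=1000))
```

The same 50-byte put is accepted with a short TTL and rejected with a long one, because over 200 seconds the reserved rate r_min claims more space than the expirations free up in time.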

Fair Storage Allocation • Per-client put queues • Queue full: reject put • Not full: enqueue put • Select most under-represented • Wait until can accept without violating r_min • Store and send accept message to client • The Big Decision: definition of “most under-represented”

Defining “Most Under-Represented” • Not just sharing disk, but disk over time • 1 byte put for 100s same as 100 byte put for 1s • So units are bytes × seconds, call them commitments • Equalize total commitments granted? • No: leads to starvation • A fills disk, B starts putting, A starves up to max TTL [Figure: timeline. Client A arrives and fills the entire disk; Client B arrives and asks for space; B catches up with A; now A starves!]

Defining “Most Under-Represented” • Instead, equalize rate of commitments granted • Service granted to one client depends only on others putting “at same time” • Mechanism inspired by Start-time Fair Queuing • Have virtual time, $v(t)$ • Each put $p_c^i$ (the $i$-th put of client $c$, arriving at time $A(p_c^i)$) gets a start time $S(p_c^i)$ and finish time $F(p_c^i)$ • $F(p_c^i) = S(p_c^i) + size(p_c^i) \times ttl(p_c^i)$ • $S(p_c^i) = \max(v(A(p_c^i)) - \alpha, F(p_c^{i-1}))$ • $v(t)$ = maximum start time of all accepted puts [Figure: timeline. Client A arrives and fills the entire disk; Client B arrives and asks for space; B catches up with A; then A and B share the available rate]
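A small sketch of the start/finish tagging, under two assumptions the slide does not spell out: "most under-represented" is taken to mean the queued put with the smallest start time (as in Start-time Fair Queuing), and a client's first put uses 0 as its previous finish time. The class and method names are hypothetical, and the disk-capacity check is left out.

```python
import heapq

class FairScheduler:
    def __init__(self, alpha=0.0):
        self.alpha = alpha
        self.v = 0.0            # virtual time: max start time accepted so far
        self.last_finish = {}   # client -> finish time of its previous put
        self.queue = []         # heap of (start, arrival order, client)
        self.seq = 0

    def enqueue(self, client, size, ttl):
        # S = max(v(arrival) - alpha, F of the client's previous put)
        start = max(self.v - self.alpha, self.last_finish.get(client, 0.0))
        # F = S + size * ttl, i.e. the commitment in byte-seconds
        self.last_finish[client] = start + size * ttl
        heapq.heappush(self.queue, (start, self.seq, client))
        self.seq += 1

    def accept_next(self):
        start, _, client = heapq.heappop(self.queue)
        self.v = max(self.v, start)
        return client

s = FairScheduler()
s.enqueue("A", size=100, ttl=10)
s.enqueue("A", size=100, ttl=10)
first_two = [s.accept_next(), s.accept_next()]   # A is alone so far

s.enqueue("B", size=100, ttl=10)   # B arrives late
s.enqueue("A", size=100, ttl=10)
s.enqueue("B", size=100, ttl=10)
order = [s.accept_next() for _ in range(3)]
print(first_two, order)
```

Because B's first start time is set from the current virtual time rather than from zero, B does not get credit for the period before it arrived; from then on A and B alternate, sharing the available rate.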

FST Performance
