Wide-Area Cooperative Storage with CFS

Learn about CFS, a wide-area storage solution allowing efficient data sharing through distributed services. Explore its unique features and architecture, overcoming design challenges for reliable, scalable performance.

Presentation Transcript


  1. Wide-Area Cooperative Storage with CFS
  Robert Morris
  Frank Dabek, M. Frans Kaashoek, David Karger, Ion Stoica
  MIT and Berkeley

  2. Target CFS Uses
  • Serving data with inexpensive hosts:
    • open-source distributions
    • off-site backups
    • tech report archive
    • efficient sharing of music
  [diagram: nodes connected through the Internet]

  3. How to Mirror Open-Source Distributions?
  • Multiple independent distributions
  • Each has high peak load, low average
  • Individual servers are wasteful
  • Solution: aggregate
    • Option 1: single powerful server
    • Option 2: distributed service
      • But how do you find the data?

  4. Design Challenges
  • Avoid hot spots
  • Spread storage burden evenly
  • Tolerate unreliable participants
  • Fetch speed comparable to whole-file TCP
  • Avoid O(#participants) algorithms
    • Centralized mechanisms [Napster], broadcasts [Gnutella]
  • CFS solves these challenges

  5. The Rest of the Talk
  • Software structure
  • Chord distributed hashing
  • DHash block management
  • Evaluation
  • Design focus: simplicity, proven properties

  6. CFS Architecture
  • Each node is a client and a server (like xFS)
  • Clients can support different interfaces
    • File system interface
    • Music key-word search (like Napster and Gnutella)
  [diagram: each node runs both a client and a server, connected over the Internet]

  7. Client-Server Interface
  • Files have unique names
  • Files are read-only (single writer, many readers)
  • Publishers split files into blocks
  • Clients check files for authenticity [SFSRO]
  [diagram: an FS client inserts and looks up files; servers insert and look up blocks]

  8. Server Structure
  • DHash stores, balances, replicates, caches blocks
  • DHash uses Chord [SIGCOMM 2001] to locate blocks
  [diagram: each node layers a DHash server over a Chord layer]

  9. Chord Hashes a Block ID to Its Successor
  • Nodes and blocks have randomly distributed IDs in a circular ID space
  • Successor: node with next highest ID
  [diagram: circular ID space; e.g. N10 stores B112, B120, …, B10; N32 stores B11, B30; N60 stores B33, B40, B52]
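The successor mapping can be sketched in a few lines of Python. This is a toy model with small integer IDs and a global view of the ring; real Chord uses 160-bit SHA-1 IDs and no node knows the whole ring.

```python
import bisect

def successor(block_id: int, ring: list[int]) -> int:
    """Return the first node at or clockwise after block_id.
    ring is the sorted list of node IDs; wrap past the top back to the start."""
    i = bisect.bisect_left(ring, block_id)
    return ring[i % len(ring)]

ring = sorted([10, 32, 60, 80, 100])   # node IDs from the slide
print(successor(70, ring))    # B70 is stored on N80
print(successor(112, ring))   # B112 wraps around the ring to N10
```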

  10. Basic Lookup
  • Lookups find the ID’s predecessor
  • Correct if successors are correct
  [diagram: “Where is block 70?” is forwarded around the ring until N60 answers “N80”]

  11. Successor Lists Ensure Robust Lookup
  • Each node stores r successors, r = 2 log N
  • Lookup can skip over dead nodes to find blocks
  [diagram: each node on the ring holds the IDs of its next three successors, e.g. N5 holds 10, 20, 32]
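A sketch of how a successor list lets a lookup step past failed nodes. For simplicity this walks the whole sorted ring; a real node holds only its own r = 2 log N successors.

```python
import bisect

def robust_successor(block_id: int, ring: list[int], alive: set[int]) -> int:
    """Try successors in ring order, skipping nodes that are down."""
    n = len(ring)
    start = bisect.bisect_left(ring, block_id)
    for step in range(n):
        candidate = ring[(start + step) % n]
        if candidate in alive:
            return candidate
    raise RuntimeError("no live nodes")

ring = sorted([5, 10, 20, 32, 40, 60, 80, 99, 110])
print(robust_successor(70, ring, alive=set(ring)))          # N80 is up
print(robust_successor(70, ring, alive=set(ring) - {80}))   # skip dead N80, find N99
```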

  12. Chord Finger Table Allows O(log N) Lookups
  • Fingers point ½, ¼, 1/8, 1/16, 1/32, 1/64, 1/128 of the way around the ring
  • See [SIGCOMM 2001] for table maintenance
  [diagram: N80’s fingers, spaced at exponentially decreasing distances]
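A toy construction of a finger table in a 7-bit ID space: finger i of node n points at the successor of (n + 2^i) mod 2^bits, which gives the ½, ¼, 1/8, … spacing above. Because each hop can cut the remaining ID-space distance roughly in half, lookups take O(log N) hops.

```python
import bisect

def successor(i: int, ring: list[int]) -> int:
    """First node at or clockwise after ID i on the sorted ring."""
    idx = bisect.bisect_left(ring, i)
    return ring[idx % len(ring)]

def finger_table(n: int, ring: list[int], bits: int = 7) -> list[int]:
    """Finger i is the successor of (n + 2**i) mod 2**bits."""
    size = 2 ** bits
    return [successor((n + 2 ** i) % size, ring) for i in range(bits)]

ring = sorted([5, 10, 20, 32, 40, 60, 80, 99, 110])
print(finger_table(80, ring))   # only a few distinct nodes appear as fingers
```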

  13. DHash/Chord Interface
  • lookup(blockID) returns a list of <node-ID, IP address> pairs for nodes closer in ID space to the block ID
  • Sorted, closest first
  [diagram: the DHash server calls Lookup(blockID) on Chord, which answers from a finger table of <node-ID, IP address> entries]

  14. DHash Uses Other Nodes to Locate Blocks
  [diagram: Lookup(BlockID=45) hops 1, 2, 3 across the ring, through intermediate nodes, toward the block’s successor]

  15. Storing Blocks
  • Long-term blocks are stored for a fixed time
    • Publishers need to refresh periodically
  • Cache uses LRU
  [diagram: each node’s disk holds an LRU cache alongside long-term block storage]
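A minimal Python sketch of this storage policy (the class and method names are my own, not CFS's API): long-term blocks expire unless the publisher refreshes them, and cached blocks are evicted least-recently-used.

```python
import time
from collections import OrderedDict

class BlockStore:
    """Per-node storage sketch: long-term area with expiry, LRU cache."""
    def __init__(self, ttl: float, cache_size: int):
        self.ttl = ttl
        self.long_term = {}            # block_id -> (data, expiry time)
        self.cache = OrderedDict()     # block_id -> data, in LRU order
        self.cache_size = cache_size

    def publish(self, block_id, data, now=None):
        now = time.time() if now is None else now
        self.long_term[block_id] = (data, now + self.ttl)  # refresh resets expiry

    def cache_put(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)     # evict least recently used

    def get(self, block_id, now=None):
        now = time.time() if now is None else now
        if block_id in self.long_term:
            data, expiry = self.long_term[block_id]
            if now < expiry:
                return data
            del self.long_term[block_id]       # publisher failed to refresh
        if block_id in self.cache:
            self.cache.move_to_end(block_id)   # mark as recently used
            return self.cache[block_id]
        return None
```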

  16. Replicate Blocks at r Successors
  • Node IDs are the SHA-1 of the node’s IP address
  • Ensures independent replica failure
  [diagram: copies of Block 17 stored on the first r successors of its ID]
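Replica placement can be sketched as: store the block on its successor plus the next r-1 nodes. Again a toy ring with small IDs; the `node_id` helper shows the SHA-1-of-IP idea, truncated to a small ID space for readability.

```python
import bisect
import hashlib

def node_id(ip: str, bits: int = 7) -> int:
    """CFS derives a node's ID from the SHA-1 of its IP address,
    so successive nodes on the ring are unrelated machines."""
    return int.from_bytes(hashlib.sha1(ip.encode()).digest(), "big") % (2 ** bits)

def replica_nodes(block_id: int, ring: list[int], r: int) -> list[int]:
    """Place the block on its successor and the r-1 nodes after it."""
    start = bisect.bisect_left(ring, block_id) % len(ring)
    return [ring[(start + k) % len(ring)] for k in range(r)]

ring = sorted([5, 10, 20, 32, 40, 60, 80, 99, 110])
print(replica_nodes(17, ring, r=3))   # Block 17 -> [20, 32, 40]
```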

  17. Lookups Find Replicas
  • RPCs: 1. lookup step  2. get successor list  3. failed block fetch  4. block fetch
  [diagram: Lookup(BlockID=17) falls back to another replica after a failed fetch]

  18. First Live Successor Manages Replicas
  • Node can locally determine that it is the first live successor
  [diagram: the first live successor holds Block 17; a copy sits further along the ring]

  19. DHash Copies to Caches Along Lookup Path
  • RPCs: 1. Chord lookup  2. Chord lookup  3. block fetch  4. send to cache
  [diagram: Lookup(BlockID=45) leaves a cached copy at a node on the lookup path]

  20. Caching at Fingers Limits Load
  • Only O(log N) nodes have fingers pointing to N32
  • This limits the single-block load on N32

  21. Virtual Nodes Allow Heterogeneity
  • Hosts may differ in disk/net capacity
  • Hosts may advertise multiple IDs
    • Chosen as SHA-1(IP address, index)
    • Each ID represents a “virtual node”
  • Host load proportional to # of virtual nodes
  • Manually controlled
  [diagram: host A runs one virtual node; host B runs several]
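The SHA-1(IP address, index) scheme can be sketched as below. The exact serialization of the (IP, index) pair is my assumption for illustration; the point is that one host deterministically derives several pseudo-random IDs, and its share of the ID space (hence its load) grows with the count.

```python
import hashlib

def virtual_node_ids(ip: str, count: int) -> list[int]:
    """One 160-bit ID per virtual node, derived from (IP address, index).
    The "ip,index" encoding is illustrative, not CFS's wire format."""
    ids = []
    for index in range(count):
        digest = hashlib.sha1(f"{ip},{index}".encode()).digest()
        ids.append(int.from_bytes(digest, "big"))
    return ids

# a well-provisioned host claims more of the ID space than a weak one
big_host = virtual_node_ids("18.26.4.9", count=4)
small_host = virtual_node_ids("18.26.4.10", count=1)
```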

  22. Fingers Allow Choice of Paths
  • Each node monitors RTTs to its own fingers
  • Tradeoff: progress in ID space vs. delay
  [diagram: Lookup(47) can route via fingers with RTTs from 10 ms to 100 ms]

  23. Why Blocks Instead of Files?
  • Cost: one lookup per block
    • Can tailor cost by choosing a good block size
  • Benefit: load balance is simple for large files
    • Storage cost of large files is spread out
    • Popular files are served in parallel
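The split itself is simple to sketch: chop the file into fixed-size blocks and name each by a hash of its contents (CFS uses 8 KByte blocks in its experiments), so the blocks of one large file land on servers spread around the ring.

```python
import hashlib

BLOCK_SIZE = 8 * 1024   # 8 KByte blocks, as in the CFS experiments

def split_into_blocks(data: bytes) -> list[tuple[str, bytes]]:
    """Return (block ID, block data) pairs; each ID is the SHA-1 of the
    block's contents, so IDs are spread uniformly over the ID space."""
    blocks = []
    for off in range(0, len(data), BLOCK_SIZE):
        chunk = data[off:off + BLOCK_SIZE]
        blocks.append((hashlib.sha1(chunk).hexdigest(), chunk))
    return blocks

blocks = split_into_blocks(b"x" * (20 * 1024))   # a 20 KB file
print([len(b) for _, b in blocks])               # two full blocks plus a tail
```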

  24. CFS Project Status
  • Working prototype software
    • Some abuse-prevention mechanisms
  • SFSRO file system client
    • Guarantees authenticity of files, updates, etc.
  • Napster-like interface in the works
    • Decentralized indexing system
  • Some measurements on the RON testbed
  • Simulation results to test scalability

  25. Experimental Setup (12 nodes)
  • One virtual node per host
  • 8 KByte blocks
  • RPCs use UDP
  • Caching turned off
  • Proximity routing turned off
  [map: testbed hosts with wide-area links to vu.nl, lulea.se, ucl.uk, kaist.kr, and .ve]

  26. CFS Fetch Time for 1MB File
  • Average over the 12 hosts
  • No replication, no caching; 8 KByte blocks
  [graph: fetch time (seconds) vs. prefetch window (KBytes)]

  27. Distribution of Fetch Times for 1MB
  [graph: fraction of fetches vs. time (seconds), for 8, 24, and 40 KByte prefetch windows]

  28. CFS Fetch Time vs. Whole-File TCP
  [graph: fraction of fetches vs. time (seconds), 40 KByte prefetch compared with whole-file TCP]

  29. Robustness vs. Failures
  • Six replicas per block; (1/2)^6 ≈ 0.016
  [graph: fraction of failed lookups vs. fraction of failed nodes]

  30. Future Work
  • Test load balancing with real workloads
  • Deal better with malicious nodes
  • Proximity heuristics
  • Indexing
  • Other applications

  31. Related Work
  • SFSRO
  • Freenet
  • Napster
  • Gnutella
  • PAST
  • CAN
  • OceanStore

  32. CFS Summary
  • CFS provides peer-to-peer read-only storage
  • Structure: DHash and Chord
  • It is efficient, robust, and load-balanced
  • It uses block-level distribution
  • The prototype is as fast as whole-file TCP
  http://www.pdos.lcs.mit.edu/chord
