
Pond: the OceanStore Prototype


Presentation Transcript


  1. Pond: the OceanStore Prototype CS 6464, Cornell University. Presented by Yeounoh Chung

  2. Motivation / Introduction • “OceanStore is an Internet-scale, persistent data store” • “for the first time, one can imagine providing truly durable, self-maintaining storage to every computer user.” • A “vision” of a highly available, reliable, and persistent data store under a utility model (Amazon S3?!)

  3. Motivation / Introduction

  4. Outline • Motivation / Introduction • System Overview • Consistency • Persistency • Failure Tolerance • Implementation • Performance • Conclusion • Related Work

  5. System Overview • [Figure: the OceanStore “cloud” of cooperating nodes; labels in the diagram: Hakim Weatherspoon, Patrick Eaton, Ben Zhao, Sean Rhea, Dennis Geels]

  6. System Overview (Data Object)

  7. System Overview (Update Model) • Updates are applied atomically • An update is an array of actions guarded by predicates • No explicit locks • Application-specific consistency, e.g. database, mailbox
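
A minimal sketch of the predicate-guarded update idea, assuming a simple "all guards must hold or the whole update aborts" semantics; the GuardedUpdate/Step types are hypothetical and Pond's real update format is richer:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch of predicate-guarded updates (not Pond's actual API).
// An update is a list of (guard, action) steps applied atomically: if any
// guard fails against the current object state, the whole update is
// rejected and the state is left unchanged.
final class GuardedUpdate<S> {
    record Step<S>(Predicate<S> guard, UnaryOperator<S> action) {}

    private final List<Step<S>> steps;

    GuardedUpdate(List<Step<S>> steps) { this.steps = steps; }

    /** Returns the new state, or null if any guard fails (update rejected). */
    S apply(S current) {
        S next = current;
        for (Step<S> step : steps) {
            if (!step.guard().test(next)) {
                return null;            // a predicate failed: abort atomically
            }
            next = step.action().apply(next);
        }
        return next;                    // commit: caller installs the new version
    }
}
```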

  8. System Overview (Tapestry) • [Figure: routing between overlay nodes; labels in the diagram: HotOS Attendee, Paul Hogan] • Scalable overlay network, built on TCP/IP • Performs decentralized object location and routing (DOLR) based on GUIDs: virtualization, location independence • Locality-aware • Self-organizing, self-maintaining
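
A hypothetical interface sketch (illustrative names, not Tapestry's real Java API) of the DOLR operations an application like Pond relies on: announcing a local replica under its GUID and routing a message toward some nearby node that published that GUID.

```java
// Hypothetical sketch of a DOLR interface (not Tapestry's actual API).
interface Dolr {
    /** Announce that this node stores an object with the given GUID. */
    void publish(byte[] guid);

    /** Withdraw an earlier announcement. */
    void unpublish(byte[] guid);

    /** Route a message to some (locality-close) node that published the GUID. */
    void routeToObject(byte[] guid, byte[] message);

    /** Route a message to the node whose ID is closest to the GUID (its root). */
    void routeToNode(byte[] guid, byte[] message);
}
```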

  9. System Overview (Primary Rep.) • Each data object is assigned an inner ring • The inner ring applies updates and creates new versions • Byzantine fault-tolerant • Inner-ring servers can be changed at any time (public-key cryptography, proactive threshold signatures, Tapestry) • Responsible party
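
A small illustrative sketch of the arithmetic behind a Byzantine fault-tolerant inner ring: the standard bound requires n = 3f + 1 servers to tolerate f faults, and with an (l, n) threshold signature scheme a result certificate can be assembled once l valid signature shares are collected. The threshold l = f + 1 below is an assumption for illustration, not a value taken from the paper.

```java
// Illustrative fault-tolerance arithmetic for a BFT inner ring.
// The threshold value used below is an assumption for illustration.
final class InnerRingMath {
    /** Ring size needed to tolerate f Byzantine-faulty servers (standard 3f+1 bound). */
    static int ringSize(int faultsTolerated) {
        return 3 * faultsTolerated + 1;
    }

    /**
     * Assume an (l, n) threshold signature with l = f + 1, so that no set of
     * f faulty servers can sign a result on its own; a certificate is
     * complete once l valid shares have been collected.
     */
    static boolean certificateComplete(int validShares, int faultsTolerated) {
        return validShares >= faultsTolerated + 1;
    }
}
```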

  10. System Overview (Archival Storage) • [Figure: a block is encoded into fragments by f and recovered from a subset by f⁻¹] • Erasure codes are more space-efficient than replication • Fragments are distributed uniformly among archival storage servers (BGUID, n_frag) • Pond uses a Cauchy Reed-Solomon code
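
A hypothetical sketch (not a real codec library's API) of the archival contract and the space-efficiency arithmetic: a block is encoded into n fragments of which any m suffice to reconstruct it, so the storage blow-up is n/m rather than the full-copy factor paid by plain replication.

```java
import java.util.List;

// Hypothetical encode/decode contract for the archival path
// (not a real Cauchy Reed-Solomon library API).
interface ErasureCodec {
    /** Encode a block into nFragments fragments; any mRequired reconstruct it. */
    List<byte[]> encode(byte[] block, int mRequired, int nFragments);

    /** Reconstruct the block from any mRequired surviving fragments. */
    byte[] decode(List<byte[]> fragments, int mRequired, int nFragments);
}

final class ArchivalMath {
    /** Storage blow-up of an (m, n) erasure code, e.g. a 16-of-32 code costs 2x. */
    static double erasureOverhead(int mRequired, int nFragments) {
        return (double) nFragments / mRequired;
    }

    /** Plain replication with k full copies costs k times the original size. */
    static double replicationOverhead(int copies) {
        return copies;
    }
}
```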

  11. System Overview (Caching) • Promiscuous caching • Whole-block caching • A host caches the block it reads and publishes its possession in Tapestry • Pond uses LRU replacement • Heartbeats are used to obtain the most recent copy
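
A minimal sketch of whole-block LRU caching using the standard library's access-ordered LinkedHashMap; the commented-out publish/unpublish calls are hypothetical placeholders for announcing possession in Tapestry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a whole-block LRU cache, keyed by block GUID.
// An access-ordered LinkedHashMap evicts the least recently used block
// once the capacity is exceeded.
final class BlockCache extends LinkedHashMap<String, byte[]> {
    private final int capacity;

    BlockCache(int capacity) {
        super(16, 0.75f, true);   // true = access order, i.e. LRU behavior
        this.capacity = capacity;
    }

    void cacheBlock(String blockGuid, byte[] data) {
        put(blockGuid, data);
        // hypothetical: dolr.publish(blockGuid) to announce possession in Tapestry
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        boolean evict = size() > capacity;
        // hypothetical: on eviction, dolr.unpublish(eldest.getKey())
        return evict;
    }
}
```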

  12. System Overview (Diss. Tree)

  13. Outline • Motivation / Introduction • System Overview • Consistency • Persistency • Failure Tolerance • Implementation • Performance • Conclusion • Related Work

  14. Consistency (Primary Replica) • Read-only blocks • Application-specific consistency • Primary-copy replication with heartbeats of the form <AGUID, VGUID, t, v_seq>
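
A small illustrative sketch mirroring the slide's heartbeat tuple; the Heartbeat class itself is hypothetical. It binds an object's active GUID to its latest version GUID, and a client can compare sequence numbers to decide which cached copy is most recent.

```java
// Hypothetical sketch of the heartbeat tuple <AGUID, VGUID, t, v_seq>.
record Heartbeat(String aguid, String vguid, long timestamp, long versionSeq) {

    /** A heartbeat for the same object is newer if its sequence number is higher. */
    boolean isNewerThan(Heartbeat other) {
        if (!aguid.equals(other.aguid())) {
            throw new IllegalArgumentException("heartbeats describe different objects");
        }
        return versionSeq > other.versionSeq();
    }
}
```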

  15. Persistency (Archival Storage) • Archival storage • Aggressive replication • Monitoring via introspection • Replacement via Tapestry

  16. Failure Tolerance (Everybody) • All newly created blocks are encoded and stored on archival servers • Aggressive replication • Byzantine agreement protocol for the inner ring • Responsible party: a single point of failure? Scalability?

  17. Outline • Motivation / Introduction • System Overview • Consistency • Persistency • Failure Tolerance • Implementation • Performance • Conclusion • Related Work

  18. Implementation • Built in Java, atop SEDA • Major subsystems are functional: self-organizing Tapestry, primary replica with Byzantine agreement, self-organizing dissemination tree, erasure-coding archive, and application interfaces (NFS, IMAP/SMTP, HTTP)
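
Pond is staged on SEDA. A minimal sketch of that pattern (illustrative, not the actual SEDA/Sandstorm classes): each stage owns an event queue and a small thread pool, and stages interact only by enqueuing events on one another.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Illustrative sketch of the SEDA staging pattern (not SEDA's real classes):
// a stage pairs an event queue with a fixed pool of worker threads.
final class Stage<E> {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers;

    Stage(int threads, Consumer<E> handler) {
        workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            workers.submit(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        handler.accept(queue.take());   // process one event at a time
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shut down cleanly
                }
            });
        }
    }

    /** Other stages hand work to this stage by enqueuing events. */
    void enqueue(E event) { queue.add(event); }

    void shutdown() { workers.shutdownNow(); }
}
```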

  19. Implementation

  20. Outline • Motivation / Introduction • System Overview • Consistency • Persistency • Failure Tolerance • Implementation • Performance • Conclusion • Related Work

  21. Performance • Update performance • Dissemination tree performance • Archival retrieval performance • The Andrew Benchmark

  22. Performance (test beds) • Local cluster: 42 machines at Berkeley; 2x 1.0 GHz CPUs, 1.5 GB SDRAM, 2x 36 GB hard drives; gigabit Ethernet adaptors and switch • PlanetLab: 101 nodes across 43 sites; 1.2 GHz CPUs, 1 GB memory

  23. Performance (update)

  24. Performance (update)

  25. Performance (archival)

  26. Performance (dissemination tree)

  27. Performance (Andrew benchmark)

  28. Conclusion • Pond is a working subset of the vision • Promising performance in the WAN • Threshold signatures and erasure-coded archival are expensive • Pond is a fault-tolerant system, but it was not tested with any failed nodes • Any thoughts?

  29. Related Work • FarSite • ITTC, COCA • PAST, CFS, IVY • Pangaea
