
OceanStore: An Architecture for Global-Scale Persistent Storage






Professor John Kubiatowicz, University of California at Berkeley
http://oceanstore.cs.berkeley.edu

Overview
• OceanStore is a global-scale data utility for Internet services
• How OceanStore is used
  • Application/user data is stored in objects
  • Objects are placed in the global OceanStore infrastructure
  • Objects are accessed via globally unique identifiers (GUIDs)
  • Objects are modified via action/predicate pairs
  • Each operation creates a new version of the object
  • Internet services (applications) define object format and content
• Potential Internet services
  • Web caches, global file systems, Hotmail-like mail portals, etc.
• Goals
  • Global scale
  • Extreme durability of data
  • Use of untrusted infrastructure
  • Maintenance-free operation
  • Privacy of data
  • Automatic performance tuning
• Enabling technologies
  • Peer-to-peer and overlay networks
  • Erasure encoding and replication
  • Byzantine agreement
  • Repair and automatic node failover
  • Encryption and access control
  • Introspection and data clustering

Key components: Tapestry and Inner Ring
• Tapestry
  • Decentralized Object Location and Routing (DOLR)
  • Provides routing to an object independent of its location
  • Automatically reroutes to backup nodes when failures occur
  • Based on the Plaxton algorithm
  • Overlay network that scales to systems with large numbers of nodes
  • See the Tapestry poster for more information
• Inner Ring
  • A set of nodes per object, chosen by the Responsible Party
  • Applies the updates/writes requested by the user
  • Checks all predicates and access control lists
  • Byzantine agreement is used to check and serialize updates
    • Based on the algorithm by Castro and Liskov
    • Ensures correctness even with f of 3f+1 nodes compromised
  • Threshold signatures are used

Key components: Archival Storage and Replicas
• Archival Storage
  • Provides extreme durability of data objects
  • Disseminates archival fragments throughout the infrastructure
  • Fragment replication and repair ensure durability
  • Utilizes erasure codes
    • Redundancy without the overhead of complete replication
    • Data objects are coded at a rate r = m/n, producing n fragments, any m of which can reconstruct the object
    • Storage overhead is n/m (a small arithmetic sketch follows the Replicas panel below)
• Replicas
  • Full copies of data objects stored in the peer-to-peer infrastructure
  • Enable fast access
  • Introspection allows replicas to self-organize
  • Replicas migrate toward client accesses
  • Encryption of objects ensures data privacy
  • A dissemination tree alerts replicas to object updates
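To make the erasure-coding numbers above concrete, here is a minimal Java sketch of the rate and overhead arithmetic. It is illustrative only, not part of the Pond code base; the fragment counts m and n and the object size are assumed values chosen for the example, not parameters taken from the poster.

    // Hedged sketch (not Pond code): the erasure-coding arithmetic from the
    // Archival Storage panel. An object coded at rate r = m/n is split into
    // n fragments such that any m of them reconstruct it; the storage
    // overhead relative to the original object is n/m.
    public class ErasureCodeArithmetic {
        public static void main(String[] args) {
            int m = 16;                  // fragments needed to reconstruct (assumed value)
            int n = 64;                  // total fragments produced (assumed value)
            int objectBytes = 4 * 1024;  // a 4 kB object, as in the benchmarks below

            double rate = (double) m / n;         // r = m/n
            double overhead = (double) n / m;     // storage overhead n/m
            int fragmentBytes = objectBytes / m;  // each fragment carries 1/m of the data
            int lossTolerance = n - m;            // fragments that may be lost before data is unrecoverable

            System.out.printf("rate r = m/n         : %.2f%n", rate);
            System.out.printf("storage overhead n/m : %.1fx the object size%n", overhead);
            System.out.printf("fragment size        : %d bytes%n", fragmentBytes);
            System.out.printf("loss tolerance       : %d of %d fragments%n", lossTolerance, n);
        }
    }

For the assumed m = 16 and n = 64, the sketch reports a rate of 0.25, a 4x storage overhead, and tolerance for the loss of 48 of the 64 fragments, which is the durability-for-overhead trade-off the Archival Storage panel describes.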
Current status: Pond implementation complete
• Pond implementation
  • All major subsystems completed
  • Fault-tolerant inner ring, erasure-coding archive
  • Software released to the developer community outside Berkeley
  • 280K lines of Java, plus JNI libraries for cryptography and the archive
  • Several applications implemented
  • See the FAST paper on the Pond prototype and its benchmarks
• Deployed on PlanetLab
  • An initiative to provide researchers with a wide-area testbed
  • http://www.planet-lab.org
  • ~100 hosts, ~40 sites, multiple continents
  • Allows Pond to run up to 1000 virtual nodes
  • Applications have been run successfully in the wide area
  • Tools created to allow quick deployment to PlanetLab

Internet services built on OceanStore
[Poster diagram: clients connect through an inner ring to replicas and archival storage in the OceanStore infrastructure]
• MINNO
  • Global-scale e-mail system built on OceanStore
  • Enables e-mail storage and access to user accounts
  • Send e-mail via an SMTP proxy; read and organize via IMAP
  • MINNO stores data in four types of OceanStore objects: folder list, folder, message, and maildrop
  • A relaxed consistency model enables fast wide-area access
• Riptide
  • Web caching infrastructure
  • Uses data migration to move web objects closer to users
  • Verifies the integrity of web content
• NFS
  • Provides traditional file system support
  • Enables time travel (reverting files/directories) through OceanStore's versioning primitives
• Many others
  • Palm Pilot synchronizer, AFS, etc.

Pond prototype benchmarks
• Object update latency
  • Measures the latency of the inner ring (Byzantine agreement commit time)
  • Shows that the threshold signature is costly; signature time is the bottleneck
  • ~100 ms latency on object writes; reasonable with a 512-bit key size

    Update latency
    Key Size    Update Size    Median Time (ms)
    512 bits    4 kB             40
    512 bits    2 MB           1086
    1024 bits   4 kB             99
    1024 bits   2 MB           1150

• Object update throughput
  • Measures object write throughput
  • The base system provides 8 MB/s
  • Updates must be batched to get good performance (a batching sketch follows the transcript)

Application benchmarks
• NFS: Andrew benchmark
  • Client in Berkeley, server in Seattle
  • 4.6x faster than NFS in read-intensive phases
  • 7.3x slower in write-intensive phases
• MINNO: login time
  • Client cache synchronization time with new-message retrieval
  • Measured time vs. latency to the inner ring; simulates mobile clients
  • MINNO adapts well with data migration and tentative commits enabled
  • Outperforms a traditional IMAP server with no processing overhead

Conclusions and future directions
• OceanStore's accomplishments
  • Major prototype completed
  • Several fully functional Internet services built and deployed
  • Demonstrated the feasibility of the approach
  • Published results on the system's performance
  • Collaborating with other global-scale research initiatives
• Current research directions
  • Investigate new introspective data placement strategies
  • Finish adding features: tentative update sharing between sessions, archival repair, replica management
  • Improve existing performance and deploy to larger networks: examine bottlenecks, improve stability, improve data structures
  • Develop more applications
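The throughput result above notes that updates must be batched to get good performance. The hedged Java sketch below illustrates that idea: small writes are buffered on the client side and submitted as one update, so that many writes share a single Byzantine-agreement/threshold-signature commit. The UpdateTarget interface and the batch size are assumptions made for the example; this is not the actual Pond API.

    import java.util.ArrayList;
    import java.util.List;

    // Hedged sketch of client-side update batching (not the Pond API).
    // Each call to UpdateTarget.applyUpdate stands in for one committed
    // OceanStore update, i.e. one round of inner-ring agreement
    // (~100 ms per commit in the latency benchmark).
    public class UpdateBatcher {
        public interface UpdateTarget {              // assumed stand-in for an object handle
            void applyUpdate(List<byte[]> actions);  // one commit per call
        }

        private final UpdateTarget target;
        private final int batchSize;
        private final List<byte[]> pending = new ArrayList<>();

        public UpdateBatcher(UpdateTarget target, int batchSize) {
            this.target = target;
            this.batchSize = batchSize;
        }

        // Queue a small write; flush once the batch is full.
        public void write(byte[] action) {
            pending.add(action);
            if (pending.size() >= batchSize) {
                flush();
            }
        }

        // Submit all queued writes as a single update, amortizing the per-commit cost.
        public void flush() {
            if (pending.isEmpty()) {
                return;
            }
            target.applyUpdate(new ArrayList<>(pending));
            pending.clear();
        }

        // Example: 100 small writes become 10 commits instead of 100.
        public static void main(String[] args) {
            UpdateBatcher batcher = new UpdateBatcher(
                    actions -> System.out.println("commit of " + actions.size() + " actions"), 10);
            for (int i = 0; i < 100; i++) {
                batcher.write(("write " + i).getBytes());
            }
            batcher.flush();
        }
    }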
