OceanStore: An Architecture for Global-Scale Persistent Storage
Professor John Kubiatowicz, University of California at Berkeley
http://oceanstore.cs.berkeley.edu

Overview
• OceanStore is a global-scale data utility for Internet services
• How OceanStore is used
  • Application/user data is stored in objects
  • Objects are placed in the global OceanStore infrastructure
  • Objects are accessed via globally unique identifiers (GUIDs)
  • Objects are modified via action/predicate pairs
  • Each operation creates a new version of the object
  • Internet services (applications) define object format and content
• Potential Internet services
  • Web caches, global file systems, Hotmail-like mail portals, etc.

Object update latency (Pond prototype)
Key size     Update size    Median time (ms)
512 bits     4 kB           40
512 bits     2 MB           1086
1024 bits    4 kB           99
1024 bits    2 MB           1150

Key components: Tapestry, Archival Storage, Inner Ring, and Replicas
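The object model in the Overview can be sketched in a few lines of Python. In that model, objects are named by GUIDs, updates are expressed as action/predicate pairs, and each operation yields a new version. This is a minimal illustration under stated assumptions, not Pond's code:

- The class `VersionedObject` and the method `apply_update` are hypothetical names.
- Deriving the GUID by hashing an owner key and a name is assumed for illustration.
- The rule that the first pair whose predicate holds supplies the action, and that the update aborts otherwise, is also assumed.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical update types: a predicate inspects the latest version and an
# action computes the new contents from it.
Predicate = Callable[[bytes], bool]
Action = Callable[[bytes], bytes]


@dataclass
class VersionedObject:
    """Toy object named by a GUID; each applied update appends a new version."""

    name: str
    owner_key: str
    versions: List[bytes] = field(default_factory=lambda: [b""])

    @property
    def guid(self) -> str:
        # Assumption: GUID = SHA-1 hash of owner key and name (illustrative only).
        return hashlib.sha1(f"{self.owner_key}/{self.name}".encode()).hexdigest()

    def latest(self) -> bytes:
        return self.versions[-1]

    def apply_update(self, pairs: List[Tuple[Predicate, Action]]) -> bool:
        """Apply the action of the first pair whose predicate holds.

        Assumed semantics: if no predicate holds, the update aborts and no
        version is created. Earlier versions are never overwritten.
        """
        current = self.latest()
        for predicate, action in pairs:
            if predicate(current):
                self.versions.append(action(current))
                return True
        return False


# Usage: append to an object only if it still holds the expected version.
inbox = VersionedObject(name="inbox", owner_key="alice-public-key")
inbox.apply_update([(lambda cur: cur == b"", lambda cur: cur + b"msg1")])
inbox.apply_update([(lambda cur: cur == b"", lambda cur: b"lost race")])  # aborts
print(len(inbox.versions), inbox.latest())  # prints: 2 b'msg1'
```

The predicate acts like a compare-and-swap guard. A writer that read a stale version fails cleanly instead of overwriting newer data, and every successful update leaves the prior versions intact.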
• Tapestry
  • Decentralized Object Location and Routing (DOLR)
  • Provides routing to an object independent of its location
  • Automatically reroutes to backup nodes when failures occur
  • Based on the Plaxton algorithm
  • Overlay network
  • Scales to systems with a large number of nodes
  • See Tapestry poster for more information
• Archival Storage
  • Provides extreme durability of data objects
  • Disseminates archival fragments throughout the infrastructure
  • Fragment replication and repair ensure durability
  • Uses erasure codes
    • Redundancy without the overhead of complete replication
    • Data objects are coded at a rate r = m/n
    • Produces n fragments, any m of which can reconstruct the object
    • Storage overhead is n/m
• Inner Ring
  • A set of nodes per object, chosen by the Responsible Party
  • Applies updates/writes requested by users
  • Checks all predicates and access control lists
  • Byzantine agreement used to check and serialize updates
    • Based on the algorithm by Castro and Liskov
    • Ensures correctness even with f of 3f+1 nodes compromised
  • Threshold signatures used
• Replicas
  • Full copies of data objects stored in the peer-to-peer infrastructure
  • Enable fast access
  • Introspection allows replicas to self-organize
    • Replicas migrate toward client accesses
  • Encryption of objects ensures data privacy
  • A dissemination tree is used to alert replicas of object updates

Goals
• Global scale
• Extreme durability of data
• Use of untrusted infrastructure
• Maintenance-free operation
• Privacy of data
• Automatic performance tuning

Enabling technologies
• Peer-to-peer and overlay networks
• Erasure encoding and replication
• Byzantine agreement
• Repair and automatic node failover
• Encryption and access control
• Introspection and data clustering

[Figure: clients, inner ring, replicas, and archival storage]

Current status: Pond implementation complete
• Pond implementation
  • All major subsystems completed
    • Fault-tolerant inner ring, erasure-coding archive
  • Software released to developer community outside Berkeley
  • 280K lines of Java, JNI libraries for crypto, archive
  • Several applications implemented
  • See FAST paper on Pond prototype and benchmarking
• Deployed on PlanetLab
  • Initiative to provide researchers with a wide-area testbed
  • http://www.planet-lab.org
  • ~100 hosts, ~40 sites, multiple continents
  • Allows Pond to run up to 1000 virtual nodes
  • Have successfully run applications in the wide area
  • Created tools to allow quick deployment to PlanetLab

Internet services built on OceanStore
• MINNO
  • Global-scale e-mail system built on OceanStore
  • Enables e-mail storage and access to user accounts
  • Send e-mail via SMTP proxy, read and organize via IMAP
  • MINNO stores data in four types of OceanStore objects:
    • Folder list, Folder, Message, and Maildrop
  • Relaxed consistency model enables fast wide-area access
• Riptide
  • Web caching infrastructure
  • Uses data migration to move web objects closer to users
  • Verifies integrity of web content
• NFS
  • Provides traditional file system support
  • Enables time travel (reverting files/dirs) through OceanStore's versioning primitives
• Many others
  • Palm Pilot synchronizer, AFS, etc.
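Tapestry's Plaxton-style routing, listed under Key components, resolves a destination identifier one digit at a time. Each hop therefore matches at least one more leading digit, and a route needs at most as many hops as the ID has digits. The sketch below makes these assumptions:

- Node IDs are hexadecimal strings.
- The full node set is known globally, standing in for per-node routing tables.
- `shared_prefix_len` and `route` are hypothetical names.
- Real Tapestry's surrogate routing, backup links, and latency-aware neighbor choice are omitted.

```python
from typing import List


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def route(source: str, target: str, nodes: List[str]) -> List[str]:
    """Prefix-route from source toward target, fixing at least one digit per hop.

    At level k the next hop is any node whose ID matches target on k + 1
    leading digits (the entry a Plaxton routing table keeps for digit
    target[k]). The smallest such ID is chosen only to keep the sketch
    deterministic; Tapestry would prefer the nearest node by network latency.
    """
    path = [source]
    current = source
    while True:
        level = shared_prefix_len(current, target)
        candidates = sorted(n for n in nodes if shared_prefix_len(n, target) > level)
        if not candidates:
            # No node resolves another digit. Real Tapestry would continue with
            # surrogate routing; this sketch simply stops here.
            return path
        current = candidates[0]
        path.append(current)


nodes = ["0123", "0456", "4521", "4529", "4567"]
print(route("0123", "4529", nodes))  # ['0123', '4521', '4529']
```

Because every hop strictly increases the matched prefix length, the loop always terminates. Location independence follows from routing on the identifier rather than on where the object is stored.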
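The Archival Storage arithmetic says that coding at rate r = m/n yields n fragments, that any m of them reconstruct the object, and that storage overhead is n/m. A toy Reed-Solomon-style code over the prime field GF(257) illustrates this. It is an assumed stand-in chosen for clarity, not the code Pond's archive uses, and `encode`, `decode`, and the field choice are illustrative. Fragment values fall in 0..256, so a real byte-oriented implementation would work in GF(2^8) instead.

```python
from typing import Dict, List

P = 257  # prime modulus; every byte value 0..255 is an element of GF(257)


def _interpolate_at(points: Dict[int, int], x: int) -> int:
    """Evaluate, at x, the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division in GF(P) via Fermat's little theorem: den^(P-2) = den^(-1).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data: List[int], n: int) -> Dict[int, int]:
    """Systematic rate-m/n code: fragment x is the data polynomial evaluated at x."""
    if not len(data) <= n <= P:
        raise ValueError("need m <= n <= P")
    base = {i: b for i, b in enumerate(data)}  # fragments 0..m-1 equal the data
    return {x: _interpolate_at(base, x) for x in range(n)}


def decode(fragments: Dict[int, int], m: int) -> List[int]:
    """Rebuild the m data symbols from any m surviving fragments."""
    if len(fragments) < m:
        raise ValueError("need at least m fragments")
    subset = dict(list(fragments.items())[:m])
    return [_interpolate_at(subset, i) for i in range(m)]


# Usage: m = 3 data symbols, n = 6 fragments (rate 1/2, storage overhead 2).
data = [72, 105, 33]
fragments = encode(data, n=6)
survivors = {x: fragments[x] for x in (1, 4, 5)}  # fragments 0, 2, 3 lost
print(decode(survivors, m=3))  # [72, 105, 33]
```

With m = 3 and n = 6, the storage overhead is n/m = 2, yet the data survives the loss of any three fragments. Full replication tolerating three lost copies would need four complete copies, an overhead of 4.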
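The Inner Ring bound, correctness with f of 3f+1 nodes compromised, rests on quorum intersection in Castro and Liskov's protocol, where each phase waits for 2f+1 matching messages. The short derivation below is the standard argument, included for illustration. Here q denotes the quorum size, and Q_1 and Q_2 are any two quorums.

```latex
\begin{aligned}
n &= 3f + 1, \qquad q = 2f + 1 = n - f \\
|Q_1 \cap Q_2| &\ge |Q_1| + |Q_2| - n = 2(2f + 1) - (3f + 1) = f + 1 > f
\end{aligned}
```

Since q = n - f, a quorum can still form when f faulty nodes stay silent. Since any two quorums overlap in at least f + 1 nodes, they share at least one correct node, so two conflicting updates cannot both be committed.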
Pond prototype benchmarks
• Object update latency
  • Measures inner-ring latency
  • Byzantine agreement commit time
  • Shows that the threshold signature is costly
  • 100 ms latency on object writes
• Object update throughput
  • Measures object write throughput
  • Base system provides 8 MB/s
  • Batch updates to get good performance

Application benchmarks
• NFS: Andrew benchmark
  • Client in Berkeley, server in Seattle
  • 4.6x faster than NFS in read-intensive phases
  • 7.3x slower in write-intensive phases
  • Reasonable time with a key size of 512 bits
  • Signature time is the bottleneck
• MINNO: Login time
  • Client cache synchronization time with new-message retrieval
  • Measured time vs. latency to the inner ring
  • Simulates mobile clients
  • MINNO adapts well with data migration and tentative commits enabled
  • Outperforms a traditional IMAP server with no processing overhead

Conclusions and future directions
• OceanStore's accomplishments
  • Major prototype completed
  • Several fully functional Internet services built and deployed
  • Demonstrated feasibility of the approach
  • Published results on the system's performance
  • Collaborating with other global-scale research initiatives
• Current research directions
  • Investigate new introspective data placement strategies
  • Finish adding features
    • Tentative update sharing between sessions
    • Archival repair
    • Replica management
  • Improve existing performance and deploy to larger networks
    • Examine bottlenecks
    • Improve stability
    • Data structure improvements
  • Develop more applications
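The throughput advice to batch updates follows from paying the fixed cost of Byzantine agreement and threshold signing once per batch rather than once per update. Below is a minimal client-side batcher. `UpdateBatcher` and the `commit` callback are hypothetical names, and the callback only stands in for one inner-ring commit.

```python
from typing import Callable, List


class UpdateBatcher:
    """Groups object updates so that one commit covers many of them.

    `commit` is a hypothetical callback standing in for one round of
    inner-ring agreement plus threshold signing; it runs once per batch.
    """

    def __init__(self, batch_size: int, commit: Callable[[List[bytes]], None]):
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        self.batch_size = batch_size
        self.commit = commit
        self.pending: List[bytes] = []

    def submit(self, update: bytes) -> None:
        """Queue an update and commit automatically when the batch is full."""
        self.pending.append(update)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Commit whatever is pending, even a partial batch."""
        if self.pending:
            self.commit(self.pending)
            self.pending = []


# Usage: 10 updates with batch size 4 cost 3 commits instead of 10.
commits: List[List[bytes]] = []
batcher = UpdateBatcher(batch_size=4, commit=commits.append)
for i in range(10):
    batcher.submit(bytes([i]))
batcher.flush()  # commit the final partial batch
print([len(batch) for batch in commits])  # [4, 4, 2]
```

Larger batches amortize the per-commit cost further but make each update wait longer before it commits. The batch size trades latency against throughput.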