This paper discusses the architecture and goals of OceanStore, a global-scale persistent storage system. It covers data location and routing, deep archival storage, applications for persistent storage, and the current state of the system.
OceanStore: An Architecture for Global-Scale Persistent Storage Authors: J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao University of California, Berkeley http://oceanstore.cs.berkeley.edu
Presentation Overview • Purpose and Vision of OceanStore • Data Location and Routing • Deep Archival Storage • Current Status
Applications for Persistent Storage • Storage for ubiquitous computing • Need for transparency • Large, inexpensive storage makes this possible • Personal Information Management tools: • Calendars, Contact Lists, etc. • E-mail • Need consistency • Need privacy and security • Repositories, Digital Libraries
OceanStore Goals • OceanStore aims to provide persistent storage for ubiquitous computing • Consistent • Highly Available • Durable Information • Divorced from location • Unique Goals • Operates across levels of trusted and untrusted servers • Nomadic Data
Data Location and Routing • Routing is location independent: objects are addressed by globally unique identifiers (GUIDs) rather than by server • Object locations are tracked by a randomized hierarchical distributed data structure (Plaxton et al.) • Routing is two-tiered (a minimal sketch follows below): • Local routing is fast but probabilistic • The backup is the highly redundant randomized hierarchical structure
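As a rough sketch of how the two tiers might combine, assuming hypothetical helpers (probabilistic_route and hierarchical_route are placeholders, and the neighbor "summaries" here are plain Python sets standing in for Bloom filters):

```python
# Sketch of OceanStore's two-tier lookup; all names and structures are illustrative.

def probabilistic_route(guid, neighbor_filters):
    """Tier 1: follow a neighbor whose (Bloom-filter-like) summary claims the GUID."""
    for neighbor, summary in neighbor_filters.items():
        if guid in summary:          # a real Bloom filter could false-positive here
            return neighbor
    return None                      # no local hint: fall back to tier 2

def hierarchical_route(guid):
    """Tier 2: deterministic Plaxton-style routing; always finds the object's root."""
    return f"root-of-{guid}"         # placeholder for the full suffix-routing walk

def locate(guid, neighbor_filters):
    hit = probabilistic_route(guid, neighbor_filters)
    return hit if hit is not None else hierarchical_route(guid)

# A node whose neighbor 'A' advertises GUID 4356, while nobody advertises 7382.
filters = {"A": {"4356"}, "B": set()}
print(locate("4356", filters))       # -> 'A'
print(locate("7382", filters))       # -> 'root-of-7382'
```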
Probabilistic Routing • Attenuated Bloom Filters • Multiple hashes of the same key each set one bit in a bit vector • Can give a false positive answer, but never a false negative • [Figure: GUIDs 4356 and 7382 hashed by three hash functions (Hash1, Hash2, Hash3) into bit-vector positions]
Attenuated Bloom Filters • Level i of a link's filter is the union (bitwise OR) of the Bloom filters of all nodes i hops away along that link • Cheap and easy to compute and exchange • Probabilistic: a query may chase a false positive (see the sketch below)
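A minimal Bloom-filter sketch to make the mechanics concrete (the parameters m=64 bits and k=3 hashes are arbitrary choices, not from the paper); the "attenuated" part is simply a list of such filters per link, where level i summarizes everything i+1 hops away:

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter: k hash functions each set one bit in an m-bit vector."""
    def __init__(self, m=64, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        # k bit positions derived from salted SHA-256 hashes of the key.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key):
        # May report a false positive, but never a false negative.
        return all((self.bits >> p) & 1 for p in self._positions(key))

    def union(self, other):
        """Bitwise OR summarizes the union of the two filters' contents."""
        merged = BloomFilter(self.m, self.k)
        merged.bits = self.bits | other.bits
        return merged

# One attenuated filter per outgoing link: level 0 summarizes the neighbor itself,
# level 1 additionally summarizes everything two hops away through it, and so on.
neighbor = BloomFilter(); neighbor.add("GUID-4356")
two_hops = BloomFilter(); two_hops.add("GUID-7382")
attenuated = [neighbor, neighbor.union(two_hops)]
print("GUID-7382" in attenuated[1])   # True: reachable within two hops on this link
print("GUID-9999" in attenuated[0])   # almost certainly False, but could false-positive
```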
Wide-Scale Data Location • Digits of an object’s GUID become node IDs in a randomized hierarchical tree • Each link in the tree is graded by how many digits of the node IDs match: • L1 = no match • L2 = least significant digit matches • At every level a node keeps 16 links (one per hex digit) to the nodes with the closest ping times
Random Trees • Roots occur where the highest-level links converge: the root’s node ID best matches the object’s GUID • A query is forwarded along links whose match is greater than or equal to the current one until the node holding the desired GUID’s location is found (see the sketch after this slide) • Only a disjoint (partitioned) network can prevent object location
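A toy version of the suffix-matching hop decision, assuming 4-digit hex node IDs and hand-made neighbor tables (real Tapestry tables hold 16 entries per level and advance exactly one digit per hop; this greedy sketch just illustrates the idea):

```python
def shared_suffix(a, b):
    """Number of trailing hex digits two IDs have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(current, guid, neighbors):
    """Forward to a neighbor that matches strictly more trailing digits of the
    GUID than we do; if none exists, we are the root for this GUID."""
    have = shared_suffix(current, guid)
    better = [n for n in neighbors if shared_suffix(n, guid) > have]
    return max(better, key=lambda n: shared_suffix(n, guid)) if better else None

# Route from node B4F8 toward the root for GUID 4598 (illustrative tables).
tables = {"B4F8": ["9098", "77A2"], "9098": ["4598", "1111"]}
node = "B4F8"
while True:
    print("at", node)
    nxt = next_hop(node, "4598", tables.get(node, []))
    if nxt is None:
        break
    node = nxt
print("root reached:", node)
```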
Deep Archival Storage • Assumes faults are largely uncorrelated • Data is coded into highly redundant fragments • Fragments are intelligently distributed to both trusted and untrusted systems
Erasure Codes • Reed-Solomon codes • Transform n fragments into 2n or 4n encoded fragments, at the cost of expensive code calculations • Any n fragments from the larger set suffice to reconstruct the data carried by the original n fragments (see the sketch below) • [Figure: “Using Erasure Codes” — blocks B1–B4 are expanded into fragments P1–P4; a surviving subset such as B1, B3, P2, P3 reconstructs B1–B4]
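To see why any n fragments out of m suffice, here is a small Reed-Solomon-style sketch over the prime field GF(257) (a simplification; real deployments use GF(2^8) byte arithmetic): the n data values are the coefficients of a degree-(n-1) polynomial, each fragment is the polynomial's value at a distinct point, and any n surviving points determine the polynomial by Lagrange interpolation.

```python
P = 257  # prime modulus; real Reed-Solomon uses GF(2^8), a prime keeps the math plain

def encode(data, m):
    """Treat n data values as polynomial coefficients; emit m (point, value) fragments."""
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(data)) % P
    return [(x, poly(x)) for x in range(1, m + 1)]

def decode(fragments, n):
    """Lagrange interpolation: any n surviving (x, y) fragments recover the data."""
    xs, ys = zip(*fragments[:n])
    coeffs = [0] * n
    for j in range(n):
        basis = [1]          # coefficients of prod_{k != j} (x - xs[k]), constant term first
        denom = 1
        for k in range(n):
            if k == j:
                continue
            # Multiply the basis polynomial by (x - xs[k]).
            basis = [(-xs[k] * basis[0]) % P] + [
                (basis[i - 1] - xs[k] * basis[i]) % P for i in range(1, len(basis))
            ] + [basis[-1]]
            denom = denom * (xs[j] - xs[k]) % P
        scale = ys[j] * pow(denom, -1, P) % P
        for i in range(n):
            coeffs[i] = (coeffs[i] + scale * basis[i]) % P
    return coeffs

data = [12, 7, 200, 33]                                # n = 4 original fragments
frags = encode(data, 2 * len(data))                    # 2n encoded fragments
survivors = [frags[1], frags[3], frags[6], frags[7]]   # any 4 of the 8 will do
print(decode(survivors, len(data)))                    # -> [12, 7, 200, 33]
```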
Smaller Example • Erasure coding generalizes the idea of parity bits over strings of bits (worked through in code below) • Encode: for bits b0–b3 = 1, 0, 1, 1, the parity bit is p = (1+0+1+1) mod 2 = 1 • Recover: if b1 is lost, b1 = (b0 + b2 + b3 + p) mod 2 = (1+1+1+1) mod 2 = 0
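The slide's parity example in a few lines of code (bit values taken from the slide):

```python
# Parity recovery, mirroring the slide's numbers.
bits = [1, 0, 1, 1]                            # b0, b1, b2, b3
parity = sum(bits) % 2                         # (1+0+1+1) % 2 = 1

# Lose b1; the surviving bits plus the parity bit give it back.
survivors = [bits[0], bits[2], bits[3]]        # b0, b2, b3
recovered_b1 = (sum(survivors) + parity) % 2   # (1+1+1+1) % 2 = 0
assert recovered_b1 == bits[1]
print(parity, recovered_b1)                    # -> 1 0
```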
Current State • Pond: a prototype of the OceanStore system • Tapestry: the infrastructure for fault-resilient, decentralized location and routing • Fast becoming a reality