Explore the architecture of OceanStore, a system designed for global-scale persistent storage, featuring connectivity to many kinds of devices, data safety, and a cooperative utility model. The design highlights, system architecture, access control, data location and routing strategies, update model, and archival storage are discussed in detail.
Principal Resource • OceanStore: An Architecture for Global-Scale Persistent Storage. John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, Ben Zhao. Proceedings of ACM ASPLOS 2000. • Additional Materials • http://oceanstore.cs.berkeley.edu/ • www.cs.fsu.edu/~awang/courses/cop5611_s2006/oceanstore.ppt • www.cs.fsu.edu/~awang/courses/cop5611_s2004/lecture_21_distributed_fs3.ppt • Presenter • Stanley Ziewacz • April 10, 2008
Global-Scale Persistent Storage • Global Scale • 10^10 users • 10,000 files per user (about 10^14 files in total) • Population Clocks • U.S. 303,807,847 • World 6,660,028,725 • 16:12 GMT (EST+5) Apr 08, 2008 • Mole • 6.022 × 10^23 atoms in exactly 12 grams of carbon-12
Persistent Data Storage • Connectivity to all types of computing devices • Desktop, laptop, palmtop, cellphone, etc. • Information Safety • Avoid prying eyes and survive malicious hands • Durable—1000 years? • Redundancy with continuous repair and redistribution for long-term storage • Uniform and highly-available access to data • Data divorced from location • Servers close to clients
Cooperative Utility Model • OceanStore Service Providers • Client pays one monthly bill to one company • Clients can use resources on other OSPs • OSPs buy and sell capacity among themselves • Millions of servers
Design Highlights • Infrastructure is only trusted in the aggregate • Servers may crash without warning • Servers may leak data • Data can be cached anywhere, anytime • Promiscuous caching • Nomadic Data
Applications • Groupware • Personal information managers • Digital libraries • Scientific data repositories
System Architecture • Naming • Data location and routing • Update model and conflict resolution • Deep archival storage • Introspection
Naming Globally Unique Identifiers GUIDs • Object GUID • Secure hash of owner’s key and human-readable name • Server GUID • Secure hash of server’s public key • Archival fragment GUID • Secure hash over the data it stores
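The three GUID kinds above can be sketched in a few lines. This is a minimal illustration, not OceanStore's actual encoding: the choice of SHA-256, the length prefix on the key, and the function names object_guid, server_guid, fragment_guid, and verify_fragment are all assumptions made for the example.

```python
import hashlib


def object_guid(owner_public_key: bytes, name: str) -> str:
    """Object GUID: secure hash of the owner's key and a human-readable name."""
    h = hashlib.sha256()  # assumed hash function, chosen for illustration
    # Length-prefix the key so different (key, name) pairs cannot produce
    # the same byte stream by shifting bytes between the two fields.
    h.update(len(owner_public_key).to_bytes(4, "big"))
    h.update(owner_public_key)
    h.update(name.encode("utf-8"))
    return h.hexdigest()


def server_guid(server_public_key: bytes) -> str:
    """Server GUID: secure hash of the server's public key."""
    return hashlib.sha256(server_public_key).hexdigest()


def fragment_guid(fragment: bytes) -> str:
    """Archival fragment GUID: secure hash over the data the fragment stores."""
    return hashlib.sha256(fragment).hexdigest()


def verify_fragment(guid: str, fragment: bytes) -> bool:
    """A fragment is self-verifying: rehash it and compare with its GUID."""
    return fragment_guid(fragment) == guid
```

Because a fragment GUID is a hash of the fragment's contents, any client can check a fragment fetched from an untrusted server without consulting anyone else.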
Access Control • Restricting Readers • If data is not completely public • Encrypt • Distribute key to users with read permission • To revoke read permission • Request that replicas be deleted or re-encrypted • Restricted at clients • Restricting Writers • All writes are signed • Owner can provide access control lists for objects • Restricted at servers
Data Location and Routing • Support location-independent routing • Message routes to discover a destination • Then message routes directly to destination • Two routing strategies • Bloom filters, fast probabilistic algorithm • Plaxton-style routing
Bloom Filter • A Bloom filter • Represents a set S = {S1, …, Sn} • Is represented by an m-bit array, filter[m] • Uses r independent hash functions • h1…hr • To build the filter: • for i = 1…n • for j = 1…r • filter[hj(Si)] = 1 www.cs.fsu.edu/~awang/courses/cop5611_s2004/lecture_21_distributed_fs3.ppt
Bloom Filter Example • filter[] = {1, 1, 0, 1, 0, 1} • Does x belong to the set? • filter[h1(x)] = filter[0] = 1 • filter[h2(x)] = filter[3] = 1 • filter[h3(x)] = filter[5] = 1 • All three bits are 1, so x is probably in the set (false positives are possible) • Does z belong to the set? • filter[h1(z)] = filter[2] = 0 • filter[h2(z)] = filter[3] = 1 • filter[h3(z)] = filter[5] = 1 • At least one bit is 0, so z is definitely not in the set www.cs.fsu.edu/~awang/courses/cop5611_s2004/lecture_21_distributed_fs3.ppt
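The construction and membership test above can be sketched as follows. The class name BloomFilter, the method names, and the way the r hash functions are derived (salting SHA-256 with the index j) are assumptions for illustration; the slides only require r independent hash functions.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, r: int):
        self.m = m            # number of bits in the filter
        self.r = r            # number of hash functions
        self.bits = [0] * m   # filter[m], initially all zero

    def _positions(self, item: str):
        # Simulate r hash functions h1..hr by salting one hash with j.
        for j in range(self.r):
            digest = hashlib.sha256(f"{j}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # filter[hj(Si)] = 1 for every hash function j
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Any 0 bit means "definitely not in the set";
        # all 1 bits means "probably in the set".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, r=3)
for guid in ["guid-a", "guid-b", "guid-c"]:
    bf.add(guid)
print(bf.might_contain("guid-a"))  # always True for an item that was added
```

An added item is never reported missing, so a negative answer lets a node skip a lookup with certainty, while a positive answer still has to be confirmed.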
Attenuated Bloom Filters www.cs.fsu.edu/~awang/courses/cop5611_s2006/oceanstore.ppt
Variation on Plaxton Routing • Each object GUID has root node • Root ID matches GUID’s hash in the most bits • But replicas can be placed anywhere • Publishing process for replicas • Do Plaxton hops from replica location to root • Place a pointer to replica locale at each hop
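The publishing process above can be sketched as a toy model. Several simplifications are assumptions of this example rather than part of OceanStore: IDs are short hex strings, matching is done on leading hex digits rather than bits, every node sees the full node list (real Plaxton routing uses per-node neighbor tables), and ties between equally good roots are not handled. The function names are hypothetical.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route_to_root(start: str, guid: str, nodes: list[str]) -> list[str]:
    """Hops from start toward the root; each hop matches more digits of guid."""
    path = [start]
    current = start
    while True:
        have = shared_prefix_len(current, guid)
        better = [n for n in nodes if shared_prefix_len(n, guid) > have]
        if not better:
            return path  # no closer node exists, so current is the root
        # Take the smallest improvement, resolving roughly one digit per hop.
        current = min(better, key=lambda n: (shared_prefix_len(n, guid), n))
        path.append(current)


def publish(replica_node: str, guid: str, nodes: list[str],
            pointers: dict[str, dict[str, str]]) -> list[str]:
    """Leave a pointer to the replica at every hop from the replica to the root."""
    path = route_to_root(replica_node, guid, nodes)
    for hop in path:
        pointers.setdefault(hop, {})[guid] = replica_node
    return path


def locate(start: str, guid: str, nodes: list[str],
           pointers: dict[str, dict[str, str]]):
    """Walk toward the root and stop at the first hop that holds a pointer."""
    for hop in route_to_root(start, guid, nodes):
        if guid in pointers.get(hop, {}):
            return pointers[hop][guid]
    return None
```

Since every query for the same GUID converges on the same root, a query meets the published pointer trail at the latest at the root, and often earlier.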
Update Model • Client generates updates • Primary tier of replicas commits them • Evaluate the update's predicates in order • Apply the actions of the earliest predicate that evaluates to true • If no predicate is true, the update aborts
Updating Ciphertext • Only 4 predicates available • Compare-version • Compare-size • Compare-block • Search • Actions available • Replace-block • Insert-block • Delete-block • Append
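The predicate-and-action update model can be sketched as below. This is a simplified illustration under stated assumptions: blocks are treated as opaque byte strings standing in for ciphertext, only two of the four predicates and two of the four actions are shown, the version counter is bumped on commit, and all names (DataObject, apply_update, and the helper constructors) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DataObject:
    version: int
    blocks: List[bytes]  # opaque (encrypted) blocks; the server never decrypts


Predicate = Callable[[DataObject], bool]
Action = Callable[[DataObject], None]


def compare_version(expected: int) -> Predicate:
    return lambda obj: obj.version == expected


def compare_block(index: int, expected: bytes) -> Predicate:
    # Compares ciphertext with ciphertext, so no key is needed.
    return lambda obj: index < len(obj.blocks) and obj.blocks[index] == expected


def replace_block(index: int, block: bytes) -> Action:
    def act(obj: DataObject) -> None:
        obj.blocks[index] = block
    return act


def append(block: bytes) -> Action:
    def act(obj: DataObject) -> None:
        obj.blocks.append(block)
    return act


def apply_update(obj: DataObject,
                 update: List[Tuple[Predicate, List[Action]]]) -> bool:
    """Apply the actions of the earliest true predicate; abort if none is true."""
    for predicate, actions in update:
        if predicate(obj):
            for action in actions:
                action(obj)
            obj.version += 1  # assumption: each commit yields a new version
            return True       # committed
    return False              # aborted, object unchanged
```

A client that wants "append only if nobody else wrote first" would send a single pair such as (compare_version(v), [append(block)]), where v is the version it last read.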
Serializing Updates • Primary tier of replicas • Byzantine agreement protocol • Final commit order • Multicast committed updates • Tentative Updates • Sent to several random replicas • Tentative commits spread by epidemic
Deep Archival Storage • Objects exist in both active and archival form • Archival form is permanent and read-only • Archival form is treated as a series of fragments of data • Fragments are spread over the network • Use any n fragments to reconstruct data • In principle every version is archived
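The idea of rebuilding data from any n fragments can be illustrated with a toy code. This sketch swaps in a single XOR parity fragment (n data fragments plus 1 parity, so any n of the n + 1 suffice), which tolerates only one lost fragment; it is not the erasure code OceanStore uses, and the function names encode and decode are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, n: int) -> list[bytes]:
    """Split data into n equal-size fragments plus one XOR parity fragment."""
    size = -(-len(data) // n)                 # ceiling division
    padded = data.ljust(size * n, b"\0")      # pad so fragments are equal size
    frags = [padded[i * size:(i + 1) * size] for i in range(n)]
    parity = bytes(size)
    for frag in frags:
        parity = xor_bytes(parity, frag)
    return frags + [parity]                   # index n holds the parity


def decode(fragments: dict[int, bytes], n: int, length: int) -> bytes:
    """Rebuild the data from any n of the n + 1 fragments, keyed by index."""
    missing = [i for i in range(n + 1) if i not in fragments]
    if len(missing) > 1:
        raise ValueError("need at least n fragments")
    if missing and missing[0] < n:
        # A data fragment is lost: XOR of all surviving fragments restores it.
        size = len(next(iter(fragments.values())))
        rebuilt = bytes(size)
        for frag in fragments.values():
            rebuilt = xor_bytes(rebuilt, frag)
        fragments = {**fragments, missing[0]: rebuilt}
    return b"".join(fragments[i] for i in range(n))[:length]


data = b"OceanStore archival"
frags = encode(data, 4)
survivors = {i: f for i, f in enumerate(frags) if i != 2}  # fragment 2 lost
assert decode(survivors, 4, len(data)) == data
```

A production archive would use a code with many more redundant fragments, so that the data survives the loss of a large fraction of the servers holding them.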
Introspection • Cluster recognition • Each client machine has an event handler triggered by each data access • Replica Management • Event handlers monitor client requests and system load • Detect periodic migration of clusters from site to site • OceanStore can monitor my work/travel routine