What is OceanStore?
• Worldwide infrastructure that gives access to persistent data
• Assumptions
  • Always-online, fast connections (not a requirement)
  • The network is untrusted (malicious hosts)
  • Hosts and routers can fail arbitrarily
  • Hosts are constantly entering and leaving
  • 10^10 users with 10,000 files each
• Goals: durability, availability, encryption and authentication, high performance
What does Tapestry do for OceanStore (OS)?
• Routing messages to nodes and objects
• Locating nodes and objects
• Publishing objects
OS Data Model
• Fundamental data unit: the object
• Objects are either active or archived
• AGUID (active GUID): hash of the object's name + the owner's public key
• BGUID (block GUID): hash of a data block
• Data blocks (leaves) are read-only and shared between object versions
• Each update creates a new version, which enables time travel
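The data model above can be sketched in a few lines. This is a minimal illustration, not OceanStore's actual code: the use of SHA-1, the `BlockStore` class, the `make_version` helper, and the way a version identifier is derived from the block list are all assumptions made for the example.

```python
import hashlib


def guid(data: bytes) -> str:
    """Hash arbitrary bytes into a hex identifier (SHA-1 is an assumption)."""
    return hashlib.sha1(data).hexdigest()


def aguid(name: str, owner_public_key: bytes) -> str:
    """AGUID: hash of the object's name plus the owner's public key."""
    return guid(name.encode("utf-8") + owner_public_key)


class BlockStore:
    """Hypothetical content-addressed store: blocks are keyed by their BGUID."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        bguid = guid(data)
        # Blocks are read-only: identical data always maps to the same key,
        # so storing it again leaves the existing block untouched.
        self.blocks.setdefault(bguid, data)
        return bguid


def make_version(store: BlockStore, blocks: list) -> tuple:
    """Store the blocks of one version and return (version_id, bguids).

    Deriving the version id from the list of BGUIDs is an assumption
    made for this sketch.
    """
    bguids = [store.put(block) for block in blocks]
    version_id = guid("".join(bguids).encode("ascii"))
    return version_id, bguids
```

Storing a second version that changes only one block adds only that block to the store. The unchanged blocks keep their BGUIDs and are shared between both versions, which is what keeps old versions cheap to retain.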
OS Actions (on objects)
• An application using OS may want to:
  • Store an object
  • Read the latest version
  • Read a previous version
  • Update (create a new version)
How a client stores an object
• ”Disco.mp3” → B98C5D…
• B98C5D… + PUK (the owner's public key) = AGUID
• Other labels in the figure: A1F39B…, 75DD2E…
How a client stores an object (2)
• The Primary Replicas (also called the Primary Tier or Inner Ring) are chosen by a Responsible Party
• [Figure: copies of Disco.mp3 on the primary replicas (1); the names ”Disco.mp3”, ”Disco.mp3 _”, ”Disco.mp3 _ _” and ”Disco.mp3 _ _ _” each map to an AGUID … and an RN]
Why have a primary replica?
• 1 - AGUID → VGUID mapping: the primary replica (PR) maps an AGUID to the latest VGUID
• 2 - Serialize concurrent updates
• 3 - Access control
  • Reads: decryption key
  • Writes: signed certificate, checked against the ACL at the PR
• 3f + 1 servers (tolerates up to f faulty servers)
• Consistency protocol for updates (Byzantine Agreement Protocol): O(n^2) messages
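The two numbers on this slide, 3f + 1 servers and O(n^2) messages, can be made concrete with some small arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the all-to-all message count is a simplification that shows where the quadratic growth comes from, not a model of the actual agreement protocol.

```python
def ring_size(f: int) -> int:
    """Number of servers needed to tolerate f faulty ones: 3f + 1."""
    return 3 * f + 1


def max_faulty(n: int) -> int:
    """Largest f such that n servers still satisfy n >= 3f + 1."""
    return (n - 1) // 3


def pairwise_messages(n: int, rounds: int = 1) -> int:
    """Messages sent if every server messages every other server once per round.

    This all-to-all pattern is an assumption used to illustrate O(n^2) growth.
    """
    return rounds * n * (n - 1)


if __name__ == "__main__":
    for f in (1, 2, 5):
        n = ring_size(f)
        print(f"f={f}: n={n}, messages per round={pairwise_messages(n)}")
```

The quadratic message cost is one reason to keep the primary tier small and leave wide distribution to the cached copies.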
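How a client reads the latest version of an object
• The client (C) computes the AGUID: ”Disco.mp3” → B98C5D…, then B98C5D… + PUK = AGUID
• The request travels from C via an RN to the primary replicas (1)
• 2/3 of PR servers: AGUID → VGUID
• Transfer blocks to client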
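One way to read the "2/3 of PR servers" step is that the client accepts the latest VGUID only when enough primary replicas report the same value. The sketch below assumes exactly that interpretation. The function name, the reply format, and the "at least two thirds of the ring" threshold are hypothetical and not taken from the slides.

```python
from collections import Counter
from typing import List, Optional


def agreed_vguid(replies: List[str], ring_size: int) -> Optional[str]:
    """Return the VGUID reported by at least 2/3 of the ring, else None.

    `replies` holds one VGUID string per primary replica that answered.
    """
    if not replies:
        return None
    needed = (2 * ring_size + 2) // 3  # integer ceiling of 2 * ring_size / 3
    vguid, votes = Counter(replies).most_common(1)[0]
    return vguid if votes >= needed else None
```

With a ring of four servers (f = 1), three matching replies are enough, so a single faulty or stale server cannot change the answer the client accepts.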
How a client reads a previous version
• AGUID → VGUID_{n-1}, VGUID_{n-2}, …, VGUID_1
• Older versions are read from fragments held in Deep Archival Storage
• Fragments are created via Erasure Coding
Deep Archival Storage (Erasure Coding)
• Example for ”Disco.mp3”: m = 4, n = 8
• A block is encoded into n fragments and can be reconstructed from any m of them
• Rate of encoding: r = m/n
• Increase in storage space: 1/r
• Tradeoff between performance and durability
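The "any m of n fragments" property can be demonstrated with a toy erasure code. The sketch below uses polynomial interpolation over the prime field GF(257), which is an assumption chosen for brevity and not the coding scheme OceanStore uses. The m data bytes are treated as the values of a polynomial at x = 0..m-1, and the n fragments are its values at x = 0..n-1. All function names are hypothetical.

```python
P = 257  # prime just above the byte range; fragment values lie in 0..256


def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    through the given (xi, yi) points, using Lagrange's formula mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(block: bytes, n: int):
    """Encode an m-byte block into n fragments (x, value), with m <= n <= P."""
    points = list(enumerate(block))
    return [(x, interpolate(points, x)) for x in range(n)]


def reconstruct(fragments, m: int) -> bytes:
    """Rebuild the original m-byte block from any m distinct fragments."""
    if len(fragments) < m:
        raise ValueError("need at least m fragments")
    points = fragments[:m]
    return bytes(interpolate(points, x) for x in range(m))


if __name__ == "__main__":
    fragments = encode(b"Disc", n=8)            # m = 4, n = 8, so r = 1/2
    survivors = [fragments[i] for i in (7, 2, 5, 4)]
    print(reconstruct(survivors, m=4))
```

With m = 4 and n = 8 the rate is r = 1/2, so the fragments take 1/r = 2 times the space of the original block, and any four of the eight are enough to rebuild it.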
Deep Archival Storage (2)
• Example: 1 million machines, 10% down, p = probability of finding a document
  • Replication (2 copies): p = 0.99
  • EC, m = 8, n = 16: p ≈ 0.999994
  • EC, m = 16, n = 32: p ≈ 0.999999999 (failure probability about 1.3 × 10^-9)
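The figures in this example can be checked with a short calculation. The sketch assumes that each fragment sits on a different machine and that machines are down independently with probability 0.1, so the number of unreachable fragments follows a binomial distribution. The function name is hypothetical.

```python
from math import comb


def failure_probability(m: int, n: int, down: float = 0.1) -> float:
    """Probability that fewer than m of the n fragments are reachable.

    Assumes every fragment is on a separate machine and that machines
    fail independently with probability `down`.
    """
    up = 1.0 - down
    return sum(comb(n, k) * up**k * down**(n - k) for k in range(m))


if __name__ == "__main__":
    # Plain replication with two copies is the special case m = 1, n = 2.
    for label, m, n in [("replication", 1, 2), ("EC", 8, 16), ("EC", 16, 32)]:
        print(f"{label} m={m} n={n}: failure = {failure_probability(m, n):.3e}")
```

All three schemes use twice the storage of the original data. Under this model the failure probability is 0.01 for two-copy replication, about 5.9 × 10^-6 for m = 8, n = 16, and about 1.3 × 10^-9 for m = 16, n = 32.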
Cached objects
• Known as Secondary Replicas
• Old versions can also be cached
• No consistency protocol
• No serialization of concurrent updates
• Soft state
• Increase performance and availability, but only if the application can accept a weaker degree of consistency
• [Figure: primary replicas (1), secondary replicas (2), transfer to client (C)]
Example Applications
• Distributed backup
  • Extremely high durability
  • Time travel
• Groupware
  • Serialization of concurrent updates
  • Time travel
• Email
  • Serialization of concurrent updates
  • Encryption and authentication
  • No single point of failure
  • Caching of mails close to clients