Discover the futuristic vision of personal information management as a utility service on a global scale. Explore challenges, observations, and solutions for secure, resilient, and wide-scale data storage in the OceanStore context.
OceanStore: Global-Scale Persistent Storage Ying Lu CSCE496/896 Spring 2011
Give Credits • Many slides are from John Kubiatowicz, University of California at Berkeley • I have modified them and added new slides
Motivation • Personal Information Mgmt is the Killer App • Not corporate processing but management, analysis, aggregation, dissemination, filtering for the individual • Automated extraction and organization of daily activities to assist people • Information Technology as a Utility • Continuous service delivery, on a planetary-scale, on top of a highly dynamic information base
OceanStore Context: Ubiquitous Computing • Computing everywhere: • Desktop, Laptop, Palmtop, Cars, Cellphones • Shoes? Clothing? Walls? • Connectivity everywhere: • Rapid growth of bandwidth in the interior of the net • Broadband to the home and office • Wireless technologies such as CDMA, Satellite, laser • Rise of the thin-client metaphor: • Services provided by interior of network • Incredibly thin clients on the leaves • MEMS devices: sensors + CPU + wireless net in 1 mm³ • Mobile society: people move and devices are disposable
Questions about information: • Where is persistent information stored? • 20th-century tie between location and content outdated • How is it protected? • Can disgruntled employee of ISP sell your secrets? • Can’t trust anyone (how paranoid are you?) • Can we make it indestructible? • Want our data to survive “the big one”! • Highly resistant to hackers (denial of service) • Wide-scale disaster recovery • Is it hard to manage? • Worst failures are human-related • Want automatic (introspective) diagnosis and repair
First Observation: Want Utility Infrastructure • Mark Weiser from Xerox: Transparent computing is the ultimate goal • Computers should disappear into the background • In storage context: • Don’t want to worry about backup, obsolescence • Need lots of resources to make data secure and highly available, BUT don’t want to own them • Outsourcing of storage already very popular • Pay monthly fee and your “data is out there” • Simple payment interface: one bill from one company
Second Observation: Need wide-scale deployment • Many components with geographic separation • System not disabled by natural disasters • Can adapt to changes in demand and regional outages • Wide-scale use and sharing also require wide-scale deployment • Bandwidth increasing rapidly, but latency bounded by speed of light • Handling many people with same system leads to economies of scale
OceanStore: Everyone’s data, One big Utility “The data is just out there” • Separate information from location • Locality is only an optimization (an important one!) • Wide-scale coding and replication for durability • All information is globally identified • Unique identifiers are hashes over names & keys • Single uniform lookup interface • No centralized namespace required
Amusing back-of-the-envelope calculation (courtesy Bill Bolotsky, Microsoft) • How many files in the OceanStore? • Assume 10^10 people in world • Say 10,000 files/person (very conservative?) • So 10^14 files in OceanStore! • If 1 gig files (not likely), get 10^23 bytes, on the order of 1 mole of bytes! Truly impressive number of elements… … but small relative to physical constants
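The arithmetic on this slide can be checked in a few lines. This is only a sketch: the population, files-per-person, and file-size figures are the slide's own assumptions, and Avogadro's number is the standard constant.

```python
# Figures taken from the slide; they are rough assumptions, not measurements.
PEOPLE = 10**10
FILES_PER_PERSON = 10**4
BYTES_PER_FILE = 10**9      # "1 gig" per file, which the slide itself calls unlikely
AVOGADRO = 6.022e23         # number of elements in one mole

total_files = PEOPLE * FILES_PER_PERSON
total_bytes = total_files * BYTES_PER_FILE

print(f"files: {total_files:.0e}")
print(f"bytes: {total_bytes:.0e}")
print(f"fraction of a mole of bytes: {total_bytes / AVOGADRO:.2f}")
```

Under these assumptions the byte count comes within an order of magnitude of a mole, which is the point of the comparison with physical constants.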
Utility-based Infrastructure • Service provided by confederation of companies • Monthly fee paid to one service provider • Companies buy and sell capacity from each other [Figure: example providers in the confederation: Canadian OceanStore, Sprint, AT&T, IBM, Pac Bell]
Outline • Motivation • Properties of the OceanStore • Specific Technologies and approaches: • Naming and Data Location • Conflict resolution on encrypted data • Replication and Deep archival storage • Introspective computing for optimization and repair • Economic models • Conclusion
Ubiquitous Devices → Ubiquitous Storage • Consumers of data move, change from one device to another, work in cafes, cars, airplanes, the office, etc. • Properties REQUIRED for OceanStore storage substrate: • Strong Security: data encrypted in the infrastructure; resistance to monitoring and denial of service attacks • Coherence: too much data for naïve users to keep coherent “by hand” • Automatic replica management and optimization: huge quantities of data cannot be managed manually • Simple and automatic recovery from disasters: probability of failure increases with size of system • Utility model: world-scale system requires cooperation across administrative boundaries
OceanStore Technologies I: Naming and Data Location • Requirements: • System-level names should help to authenticate data • Route to nearby data without global communication • Don’t inhibit rapid relocation of data • OceanStore approach: Two-level search with embedded routing • Underlying namespace is flat and built from secure cryptographic hashes (160-bit SHA-1) • Search process combines quick, probabilistic search with slower guaranteed search
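A minimal sketch of how a flat, self-authenticating namespace could be built from SHA-1. The function names and the exact byte layout (key followed by name) are assumptions for illustration; the slides only say that identifiers are 160-bit hashes over names and keys.

```python
import hashlib

def make_guid(owner_public_key: bytes, name: str) -> bytes:
    """Hypothetical object GUID: SHA-1 over the owner's key followed by the name."""
    h = hashlib.sha1()
    h.update(owner_public_key)
    h.update(name.encode("utf-8"))
    return h.digest()  # 20 bytes = 160 bits

def content_guid(block: bytes) -> bytes:
    """Hypothetical GUID for a read-only block: SHA-1 of its contents."""
    return hashlib.sha1(block).digest()

def verify_block(guid: bytes, block: bytes) -> bool:
    """A client can check untrusted data by re-hashing it against the name."""
    return content_guid(block) == guid

guid = make_guid(b"alice-public-key", "/home/alice/notes.txt")
print(guid.hex())
```

Because the name is derived from a hash, no central authority has to hand out identifiers, and a name that covers content doubles as an integrity check.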
Universal Location Facility • Takes 160-bit unique identifier (GUID) and returns the nearest object that matches [Figure: a universal name maps to a name OID; global object resolution leads from version OIDs to active data in a floating replica (root structure, commit logs, checkpoint OID, update OID) and to archive versions (Version OID1, OID2, OID3), each an erasure-coded archival copy or snapshot]
Routing: Two-tiered approach • Fast probabilistic routing algorithm (self-optimizing) • Entities that are accessed frequently are likely to reside close to where they are being used (ensured by introspection) • Slower, guaranteed hierarchical routing method
Probabilistic Routing Algorithm • Bloom filter on each node; attenuated Bloom filter on each directed edge • Self-optimizing on the depth of the attenuated Bloom filter array [Figure: a query for X (11010) is routed among nodes n1 to n4 by comparing 5-bit Bloom filters (bits 0 to 4) against the 1st, 2nd, and 3rd levels of each edge's attenuated filter]
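The probabilistic tier can be sketched as follows. This is an illustrative sketch, not OceanStore's implementation: the filter width `M`, the number of hash functions `K`, and all function names are assumptions. Each directed edge carries an array of Bloom filters, where the filter at depth d summarises objects d hops away through that edge (depth 1 being the neighbour itself).

```python
import hashlib

M = 64  # bits per Bloom filter (illustrative choice)
K = 3   # bit positions set per object (illustrative choice)

def bit_positions(guid: bytes):
    """Derive K bit positions from a SHA-1 digest of the GUID."""
    digest = hashlib.sha1(guid).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % M for i in range(K)]

def make_filter(guids) -> int:
    """Build a Bloom filter, stored as an int bitmask, over a set of GUIDs."""
    bits = 0
    for g in guids:
        for p in bit_positions(g):
            bits |= 1 << p
    return bits

def may_contain(bits: int, guid: bytes) -> bool:
    """True if every bit for guid is set (false positives are possible)."""
    return all((bits >> p) & 1 for p in bit_positions(guid))

def match_depth(attenuated, guid: bytes):
    """Smallest depth (1-based) whose filter matches guid, or None."""
    for depth, bits in enumerate(attenuated, start=1):
        if may_contain(bits, guid):
            return depth
    return None

def next_hop(edges, guid: bytes):
    """Pick the neighbour whose attenuated filter matches at the shallowest
    depth. None means: fall back to the slower, guaranteed routing tier."""
    best = None
    for neighbour, attenuated in edges.items():
        d = match_depth(attenuated, guid)
        if d is not None and (best is None or d < best[0]):
            best = (d, neighbour)
    return best[1] if best else None

fx = make_filter([b"X"])
edges = {"n3": [fx, 0], "n2": [make_filter([b"Y"]), fx]}
print(next_hop(edges, b"X"))  # X is one hop away via n3, two hops via n2
```

A match is only a hint, since Bloom filters admit false positives; a miss at every depth is what hands the query to the hierarchical tier.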
Hierarchical Routing Algorithm • Based on Plaxton scheme • Every server in the system is assigned a random node-ID • Object’s root • Each object is mapped to a single node whose node-ID matches the object’s GUID in the most bits (starting from the least significant) • Information about the GUID (such as location) is stored at its root
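The root-selection rule above can be sketched directly. The function names are hypothetical, and the sketch cheats by scanning a global list of node IDs, which a real deployment would not have; it also breaks ties by list order, where a real scheme needs a deterministic tie-break.

```python
def shared_suffix_bits(a: int, b: int, width: int = 160) -> int:
    """Count matching bits, starting from the least significant bit."""
    diff = a ^ b
    count = 0
    while count < width and not (diff >> count) & 1:
        count += 1
    return count

def find_root(guid: int, node_ids):
    """Node whose ID agrees with the GUID in the most low-order bits."""
    return max(node_ids, key=lambda n: shared_suffix_bits(guid, n))

# 16-bit IDs for readability; OceanStore GUIDs are 160 bits.
nodes = [0x79FE, 0x23FE, 0x993E, 0x73FF]
print(hex(find_root(0x43FE, nodes)))
```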
Construct Plaxton Mesh [Figure: a mesh over nodes with 4-digit IDs such as 0265, 1215, 1624, 2344, 3714, 5724, 7144, 9834, 0324, 1324, …, with links labeled by level 1 to 4 and neighbor patterns such as x927, x431, x633, x742]
Basic Plaxton Mesh: Incremental suffix-based routing [Figure: routing toward GUID 0x43FE over links labeled with levels 1 to 4, among nodes 0x79FE, 0x23FE, 0x993E, 0x43FE, 0x73FE, 0x44FE, 0xF990, 0x035E, 0x04FE, 0x13FE, 0x555E, 0xABFE, 0x9990, 0x239E, 0x73FF, 0x1290, 0x423E]
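Incremental suffix-based routing can be sketched with the node IDs that appear in the mesh figure. This is a simplification under stated assumptions: each hop is chosen from a global node list rather than from per-node neighbor tables, IDs are equal-length hex strings, and the helper names are hypothetical.

```python
NODES = ["79FE", "23FE", "993E", "43FE", "73FE", "44FE", "F990", "035E", "04FE",
         "13FE", "555E", "ABFE", "9990", "239E", "73FF", "1290", "423E"]

def shared_suffix_digits(a: str, b: str) -> int:
    """Number of trailing hex digits that a and b have in common."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def route(start: str, target: str, nodes):
    """Each hop moves to a node matching a longer suffix of the target."""
    path = [start]
    current = start
    while current != target:
        level = shared_suffix_digits(current, target)
        candidates = [n for n in nodes if shared_suffix_digits(n, target) > level]
        if not candidates:
            break  # no node resolves a further digit; current acts as the root
        # Advance by as few digits as possible, mimicking one level per hop.
        current = min(candidates, key=lambda n: shared_suffix_digits(n, target))
        path.append(current)
    return path

print(route("1290", "43FE", NODES))
```

Since every hop fixes at least one more digit, a route over 4-digit IDs takes at most four hops.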
OceanStore Enhancements of the Plaxton Mesh • Documents have multiple roots (salted hash of GUID) • Each node has multiple neighbor links • Searches proceed along multiple paths • Tradeoff between reliability, performance and bandwidth? • Dynamic node insertion and deletion algorithms • Continuous repair and incremental optimization of links • Goals: self-healing, self-optimizing, self-configuring
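One way to picture the "multiple roots" enhancement: hash the GUID together with a small salt to obtain several independent identifiers, each of which maps to its own root node. The salt encoding and the count of three are assumptions for illustration only.

```python
import hashlib

def salted_root_ids(guid: bytes, salts=(0, 1, 2)):
    """Hypothetical derivation: one 160-bit root identifier per salt value."""
    return [hashlib.sha1(guid + bytes([s])).digest() for s in salts]

guid = hashlib.sha1(b"example object").digest()
for root_id in salted_root_ids(guid):
    print(root_id.hex())
```

If one root is unreachable or compromised, a search can be retried toward another salted identifier.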
OceanStore Technologies II: Rapid Update in an Untrusted Infrastructure • Requirements: • Scalable coherence mechanism which can operate directly on encrypted data without revealing information • Handle Byzantine failures • Rapid dissemination of committed information • OceanStore Approach: • Operations-based interface using conflict resolution • Modeled after Xerox Bayou: update packets include predicate/action pairs which operate on encrypted data • User signs updates and principal party signs commits • Committed data multicast to clients
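The predicate/action update format can be sketched as below. This is a toy model under stated assumptions: class and helper names are hypothetical, blocks are treated as opaque bytes standing in for ciphertext, and signing and encryption are omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class DataObject:
    version: int = 0
    blocks: List[bytes] = field(default_factory=list)  # opaque (encrypted) blocks

Predicate = Callable[[DataObject], bool]
Action = Callable[[DataObject], None]

@dataclass
class Update:
    pairs: List[Tuple[Predicate, Action]]

def compare_version(expected: int) -> Predicate:
    """Predicate that needs no knowledge of the plaintext."""
    return lambda obj: obj.version == expected

def append_block(block: bytes) -> Action:
    """Action that appends an opaque block without inspecting it."""
    return lambda obj: obj.blocks.append(block)

def apply_update(obj: DataObject, update: Update) -> bool:
    """Run the action of the first predicate that holds.
    Returns True on commit, False if every predicate fails (abort)."""
    for predicate, action in update.pairs:
        if predicate(obj):
            action(obj)
            obj.version += 1
            return True
    return False

obj = DataObject()
update = Update([(compare_version(0), append_block(b"ciphertext-1"))])
print(apply_update(obj, update))  # commits against version 0
print(apply_update(obj, update))  # same update now conflicts and aborts
```

The design point is that both the check and the change work on versions and ciphertext blocks, so a server can resolve conflicts without being able to read the data.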
Update Model • Concurrent updates w/o wide-area locking • Conflict resolution • Update serialization • A master replica? • Role of primary tier of replicas • All updates are submitted to the primary tier of replicas, which chooses a final total order using a Byzantine agreement protocol • A secondary tier of replicas • The result of the updates is multicast down the dissemination tree to all the secondary replicas
Agreement • Need agreement in DS: leader, commit, synchronize • Distributed agreement algorithm: all non-faulty processes achieve consensus in a finite number of steps • Perfect processes, faulty channels: two-army problem • Faulty processes, perfect channels: Byzantine generals problem
Possible Consensus • Agreement is possible in a synchronous DS [e.g., Lamport et al.] • Messages can be guaranteed to be delivered within a known, finite time • Byzantine Generals Problem • A synchronous DS can distinguish a slow process from a crashed one
Byzantine Generals Problem
Byzantine Generals: Example (1) The Byzantine generals problem for 3 loyal generals and 1 traitor. (a) The generals announce the time to launch the attack (by messages marked by their ids). (b) The vectors that each general assembles based on (a). (c) The vectors that each general receives, where every general passes his vector from (b) to every other general.
Byzantine Generals: Example (2) The same as in the previous slide, except now with 2 loyal generals and 1 traitor.
Byzantine Generals • Given three processes, if one fails, consensus is impossible • Given N processes, if F processes fail, consensus is impossible if N ≤ 3F
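The bound can be restated as a small check: agreement with F Byzantine processes needs at least 3F + 1 processes in total. The helper names are hypothetical.

```python
def consensus_possible(n: int, f: int) -> bool:
    """Byzantine agreement requires n >= 3f + 1 processes."""
    return n >= 3 * f + 1

def max_tolerated_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3

for n in (3, 4, 7, 10):
    print(n, max_tolerated_faults(n))
```

This matches the two examples: four generals can tolerate one traitor, while three generals cannot tolerate any.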
Data Coding Model • Two distinct forms of data: active and archival • Active Data in Floating Replicas • Latest version of the object • Archival Data in Erasure Coded Fragments • A permanent, read-only version of the object • During commit, previous version coded with erasure-code and spread over 100s or 1000s of nodes • Advantage: any 1/2 or 1/4 of fragments regenerates data
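The "any subset of fragments regenerates the data" property can be shown with a toy erasure code. This is a sketch under stated assumptions: it is a Reed-Solomon-style code built from polynomial interpolation over the integers modulo 257, the function names are hypothetical, and the slides do not say which code OceanStore actually uses. With k = 5 data bytes and n = 10 fragments the rate is 1/2, so any half of the fragments suffices.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def _interpolate(points, x):
    """Evaluate at x the unique polynomial (mod P) through the given points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data: bytes, n: int):
    """Produce n fragments (x, y); the first len(data) carry the data itself."""
    k = len(data)
    points = list(enumerate(data))
    return [(x, data[x] if x < k else _interpolate(points, x)) for x in range(n)]

def decode(fragments, k: int) -> bytes:
    """Rebuild the k data bytes from any k distinct fragments."""
    points = fragments[:k]
    return bytes(_interpolate(points, x) for x in range(k))

data = b"ocean"
fragments = encode(data, 10)          # rate 1/2: 5 data bytes, 10 fragments
survivors = fragments[5:]             # lose every original-data fragment
print(decode(survivors, 5))
```

A production system would code large blocks with an optimized library, but the recovery property is the same: which fragments survive does not matter, only how many.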
Floating Replica and Deep Archival Coding [Figure: three floating replicas, each holding a full copy, a version list (Ver1: 0x34243, Ver2: 0x49873, Ver3: …), and conflict resolution logs; older versions are stored as erasure-coded fragments]
Proactive Self-Maintenance • Continuous testing and repair of information • Slow sweep through all information to make sure there are sufficient erasure-coded fragments • Continuously reevaluate risk and redistribute data • Slow sweep and repair of metadata/search trees • Continuous online self-testing of HW and SW • Detects flaky, failing, or buggy components via: • fault injection: triggering hardware and software error handling paths to verify their integrity/existence • stress testing: pushing HW/SW components past normal operating parameters • scrubbing: periodic restoration of potentially “decaying” hardware or software state • Automates preventive maintenance
OceanStore Technologies IV: Introspective Optimization • Requirements: • Reasonable job on global-scale optimization problem • Take advantage of locality whenever possible • Sensitivity to limited storage and bandwidth at endpoints • Repair of data structures, increasing of redundancy • Stability in chaotic environment (active feedback) • OceanStore Approach: • Introspective monitoring and analysis of relationships to cluster information by relatedness • Time-series analysis of user and data motion • Rearrangement and replication in response to monitoring • Clustered prefetching: fetch related objects • Proactive prefetching: get data there before needed • Rearrangement in response to overload and attack
Example: Client Introspection • Client observer and optimizer components • Greedy agents working on behalf of the client • Watch client activity and combine it with historical info • Perform clustering and time-series analysis • Forward results to infrastructure (privacy issues!) • Monitor state of network to adapt behaviour • Typical Actions: • Cluster related files together • Prefetch files that will be needed soon • Create/destroy floating replicas
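A client observer of this kind could be as simple as counting which files tend to follow which. This is a hypothetical sketch: the class name, the sliding window, and the counting heuristic are all assumptions, standing in for the clustering and time-series analysis the slide mentions.

```python
from collections import Counter, defaultdict

class ClientObserver:
    """Tracks which files are accessed shortly after which other files."""

    def __init__(self, window: int = 1):
        self.window = window                 # how many recent accesses to relate
        self.recent = []                     # the last `window` accesses
        self.follows = defaultdict(Counter)  # file -> counts of files seen next

    def record(self, name: str) -> None:
        for prev in self.recent:
            if prev != name:
                self.follows[prev][name] += 1
        self.recent = (self.recent + [name])[-self.window:]

    def prefetch_candidates(self, name: str, limit: int = 2):
        """Files most often accessed right after `name`."""
        return [n for n, _ in self.follows[name].most_common(limit)]

observer = ClientObserver()
for f in ["a", "b", "a", "b", "a", "c"]:
    observer.record(f)
print(observer.prefetch_candidates("a", limit=1))
```

An optimizer could use such counts both to prefetch and to place related files in the same cluster; anything forwarded to the infrastructure raises the privacy issue the slide flags.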
OceanStore Conclusion • The time is now for a Universal Data Utility • Ubiquitous computing and connectivity are (almost) here! • Confederation of utility providers is the right model • OceanStore holds all data, everywhere • Local storage is a cache on global storage • Provides security in an untrusted infrastructure • Exploits economies of scale to: • Provide high availability and extreme survivability • Lower maintenance cost: • self-diagnosis and repair • Insensitivity to technology changes: just unplug one set of servers, plug in others