Increasing Intrusion Tolerance Via Scalable Redundancy. Michael Reiter reiter@cmu.edu Anastasia Ailamaki Greg Ganger Priya Narasimhan Chuck Cranor
The Problem Space • Distributed services manage redundant state across servers to tolerate faults • We consider tolerance to Byzantine faults, as might result from an intrusion into a server or client • A faulty server or client may behave arbitrarily • We also make no timing assumptions in this work • An “asynchronous” system
Our Goals • To design, implement, and evaluate new protocols for implementing intrusion-tolerant services that scale better • Here, “scale” refers to efficiency as the number of servers and the number of failures tolerated grow • Targeting three types of services • Read-write data objects • Custom “flat” object types for particular applications, notably directories for implementing an intrusion-tolerant file system • Arbitrary objects that support object nesting
Expected Impact • Significant efficiency and scalability benefits over today’s approaches to intrusion tolerance • For example, for data services, we anticipate • At least a twofold latency improvement even at small configurations (e.g., tolerating 3-5 Byzantine server failures) over the current best • Improvements that grow as the system scales up • A twofold improvement in throughput, again growing with system size • Without such improvements, intrusion tolerance will remain relegated to small deployments in narrow application areas
Outline • Concepts • Challenges • Techniques • Systems • Technology transfer
Concepts: Distributed Services • [Figure: a service, or object, abstraction offering methods such as push, pop, and sort; a client sends an invocation to the implementation and receives a response]
Concepts: Linearizability [Herlihy & Wing 1991] • A strong and accepted semantics for shared objects • Mimics the semantics of a centralized object implementation • Each method appears to be executed at a distinct point between its invocation and response time • [Figure: object invocations by clients c1 and c2, with their apparent execution order]
Concepts: State Machine Replication • Offers no load dispersion, and degrades as system scales • [Figure: invocations (inv) fanned out to the servers]
Concepts: Wait-Freedom [Herlihy 1990] • A liveness property for object invocations • Informally, an implementation is wait-free if any client’s operation is guaranteed to complete • Assuming a limit on the number of faulty servers [Jayanti et al.] • But not assuming a limit on the number of faulty clients • Intuitively, wait-freedom precludes synchronization mechanisms that must be “unlocked” by a client • Only read-write objects can be implemented in a wait-free way • Virtually any other object cannot (in an asynchronous system)
Challenges: Concurrency • Concurrent updates can violate linearizability • [Figure: two Data updates issued concurrently to servers 1-5]
Challenges: Server Failures • Faulty servers can attempt to mislead clients • Typically addressed by “voting” • [Figure: servers 1-5, with server 4 returning an altered value 4’]
Challenges: Client Failures • Byzantine client failures can also mislead clients • Typically addressed by submitting a request via an agreement protocol • [Figure: servers 1-5, with conflicting values 2’ and 4’ appearing after a faulty client's Data write]
Challenges: Object Nesting • Distributed objects have stubs and replicas • [Figure: object stubs and replicas placed on servers]
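The “voting” idea can be sketched in a few lines. This is a minimal illustration, not the project's protocol: the function name `vote`, the fault bound `b`, and the rule "accept a value reported by at least b + 1 servers" are assumptions chosen for the example, and the sketch ignores concurrent writes.

```python
from collections import Counter

def vote(responses, b):
    """Return the value reported by at least b + 1 servers, else None.

    Assumption: at most b servers are faulty, so any value with b + 1
    matching reports is vouched for by at least one correct server.
    """
    if not responses:
        return None
    value, count = Counter(responses).most_common(1)[0]
    return value if count >= b + 1 else None

# Five servers, one of which (the fourth) returns an altered value.
replies = ["D", "D", "D", "D'", "D"]
result = vote(replies, b=1)
```

With one faulty server the altered reply is outvoted and `result` is the correct value; if no value reaches b + 1 matching reports, the caller learns nothing and gets `None`.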
Techniques: Versioning • [Figure: timeline for servers 1-5, each initially Ø. D0 is written at T0 (3 writes required) and D1 at T1. A client read operation after T1 takes D1 as the latest candidate, finds D1 incomplete, takes D0 as the latest candidate, determines D0 complete, and returns D0]
Techniques: Repair • [Figure: timeline for servers 1-5, each initially Ø, with D0 written at T0, D1 at T1, and D2 at T2; one server is unreachable. A client read operation after T2 takes D2 as the latest candidate, finds D2 unclassifiable, repairs D2, and returns D2]
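The read shown in the versioning figure can be approximated as follows. This is a simplified sketch under stated assumptions: each server is modeled as a dictionary from timestamp to value, all servers are reachable and honest, and the placement of D0 and D1 on particular servers is invented for illustration. Only the threshold of 3 writes comes from the slide.

```python
WRITES_REQUIRED = 3  # "3 writes required" in the figure

def read_latest_complete(servers):
    """Return the newest value held by at least WRITES_REQUIRED servers.

    servers: list of version histories, each a dict {timestamp: value}.
    Candidates are examined newest first; an incomplete candidate is
    skipped and the previous version becomes the next candidate.
    """
    timestamps = sorted({t for s in servers for t in s}, reverse=True)
    for ts in timestamps:
        holders = [s for s in servers if ts in s]
        if len(holders) >= WRITES_REQUIRED:
            return holders[0][ts]
    return None

# Hypothetical placement: D0 reached three servers at T0, D1 only one at T1.
servers = [{0: "D0", 1: "D1"}, {0: "D0"}, {0: "D0"}, {}, {}]
value = read_latest_complete(servers)
```

Because servers keep old versions, a partially written D1 does not hide the last complete value, which is what lets the read return D0.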
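One way to read the repair figure is as a three-way classification of the latest candidate. The sketch below is an interpretation, not the authors' algorithm: the names `classify` and `read_with_repair`, the modeling of an unreachable server as `None`, and the choice to repair by copying the candidate to every reachable server are all assumptions, and Byzantine servers are ignored.

```python
COMPLETE = 3  # writes required for a version to count as complete

def classify(holders, unreachable):
    """Classify a candidate from the number of servers seen holding it."""
    if holders >= COMPLETE:
        return "complete"
    if holders + unreachable < COMPLETE:
        return "incomplete"       # cannot be complete even in the best case
    return "unclassifiable"       # unreachable servers might hold it

def read_with_repair(servers):
    """servers: list of {timestamp: value} dicts, or None if unreachable."""
    reachable = [s for s in servers if s is not None]
    unreachable = len(servers) - len(reachable)
    for ts in sorted({t for s in reachable for t in s}, reverse=True):
        holders = [s for s in reachable if ts in s]
        verdict = classify(len(holders), unreachable)
        if verdict == "incomplete":
            continue              # fall back to the previous version
        value = holders[0][ts]
        if verdict == "unclassifiable":
            for s in reachable:   # repair: finish the write ourselves
                s[ts] = value
        return value
    return None

# Hypothetical state after T2: D2 seen on two servers, one server unreachable.
servers = [{0: "D0", 2: "D2"}, {0: "D0", 2: "D2"}, {0: "D0", 1: "D1"}, {}, None]
value = read_with_repair(servers)
```

Repairing before returning means a later read cannot observe an older value than this one did, which is the property the figure's "Repair D2, Return D2" sequence appears to protect.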
Techniques: Quorum Systems • A quorum system is a data redundancy technique that supports load dispersion among servers • Only a subset of servers is accessed in each operation • Ex: Grid with n=49, b=3
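The grid example can be made concrete with a small enumeration. The slide does not say which grid construction it means, so this sketch assumes one in which a quorum is the union of √(b+1) rows and √(b+1) columns of a √n × √n grid; the function name `grid_quorums` is hypothetical. The check at the end confirms that, for n=49 and b=3, any two such quorums overlap in at least 2b+1 servers, so correct servers outnumber faulty ones in every intersection.

```python
from itertools import combinations
from math import isqrt

def grid_quorums(n, b):
    """Yield every quorum of the assumed grid construction as a frozenset."""
    side = isqrt(n)
    k = isqrt(b + 1)
    assert side * side == n and k * k == b + 1, "n and b+1 must be squares"
    for rows in combinations(range(side), k):
        for cols in combinations(range(side), k):
            yield frozenset(
                (r, c)
                for r in range(side)
                for c in range(side)
                if r in rows or c in cols
            )

n, b = 49, 3
quorums = list(grid_quorums(n, b))
quorum_size = len(quorums[0])      # servers contacted per operation
min_overlap = min(len(p & q) for p, q in combinations(quorums, 2))
```

Under this assumption each operation touches 24 of the 49 servers rather than all of them, which is the load dispersion the slide refers to.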
Techniques: Cross Checksums [Gong 1989] • A mechanism for defending against Byzantine servers that attempt to alter data in their possession • Each data fragment is appended with a hash of all data fragments • When retrieved, hashes are used as “votes” to determine correct data fragments • [Figure: a data-item is divided into data-fragments; the hashes of all fragments form the cross checksum]
Techniques: Validating Timestamps • A technique for defending against Byzantine clients that attempt to write different data values at the same timestamp • Cross-checksum of write value recorded in its timestamp • Read results are used to regenerate all data fragments and compare them to the timestamp • [Figure: read results are used to rebuild the data-item and all data-fragments; their hashes form a cross checksum that is compared with the hash in the timestamp]
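A cross checksum can be sketched with standard hashing. The example below is illustrative only: it stripes the data into equal pieces as a stand-in for whatever encoding the real system uses, picks SHA-256 as the hash, and uses hypothetical names (`store`, `retrieve`). The fault bound `b` and the b + 1 vote threshold are likewise assumptions.

```python
import hashlib
from collections import Counter

def h(fragment):
    return hashlib.sha256(fragment).digest()

def store(data, n):
    """Split data into n fragments; pair each with the hashes of all fragments."""
    size = -(-len(data) // n)  # ceiling division
    fragments = [data[i * size:(i + 1) * size] for i in range(n)]
    cross_checksum = tuple(h(f) for f in fragments)
    return [(f, cross_checksum) for f in fragments]  # entry i goes to server i

def retrieve(replies, b):
    """Vote on the cross checksum, then keep only fragments that match it.

    Returns a list with None in place of any fragment that fails its hash,
    or None if no cross checksum gathers at least b + 1 votes.
    """
    checksum, votes = Counter(c for _, c in replies).most_common(1)[0]
    if votes < b + 1:
        return None
    return [f if h(f) == checksum[i] else None
            for i, (f, _) in enumerate(replies)]

replies = store(b"0123456789abcdef", 4)
forged = tuple(h(b"XXXX") for _ in range(4))
replies[2] = (b"XXXX", forged)   # server 2 alters its fragment and checksum
fragments = retrieve(replies, b=1)
```

The faulty server loses either way: if it keeps the real cross checksum its altered fragment fails the hash check, and if it forges the checksum it is outvoted.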
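The validation step can be illustrated with a toy encoding. Everything concrete here is an assumption made for the example: a 2-of-3 XOR parity code stands in for the real erasure code, SHA-256 stands in for the hash, and `Timestamp`, `make_timestamp`, and `validate_read` are hypothetical names. The point is only the shape of the check: decode, regenerate every fragment, and compare against the checksum bound into the timestamp.

```python
import hashlib
from dataclasses import dataclass

def xor(x, y):
    return bytes(p ^ q for p, q in zip(x, y))

def encode(data):
    """Toy 2-of-3 code: two halves plus their XOR parity (even length only)."""
    assert len(data) % 2 == 0
    half = len(data) // 2
    a, b = data[:half], data[half:]
    return [a, b, xor(a, b)]

def decode(frags):
    """Recover the data from any two fragments, keyed by index 0, 1, or 2."""
    if 0 in frags and 1 in frags:
        a, b = frags[0], frags[1]
    elif 0 in frags:
        a = frags[0]
        b = xor(a, frags[2])
    else:
        b = frags[1]
        a = xor(b, frags[2])
    return a + b

@dataclass(frozen=True)
class Timestamp:
    time: int
    checksum: bytes  # hash of the cross checksum of the written fragments

def make_timestamp(time, frags):
    cross = b"".join(hashlib.sha256(f).digest() for f in frags)
    return Timestamp(time, hashlib.sha256(cross).digest())

def validate_read(ts, read_results):
    """Return the data if regenerated fragments match the timestamp, else None."""
    data = decode(read_results)
    regenerated = encode(data)
    return data if make_timestamp(ts.time, regenerated) == ts else None

good = encode(b"intrusion!")
good_ts = make_timestamp(1, good)

# A faulty client writes a parity fragment inconsistent with the other two.
bad = [good[0], good[1], b"\x00" * 5]
bad_ts = make_timestamp(2, bad)
```

In this sketch a reader holding fragments 0 and 2 of the bad write and a reader holding fragments 0 and 1 both reject it, so the faulty client cannot make different readers accept different values for the same timestamp.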
Techniques: Replicated Invocation • b or fewer stub replicas cannot cause an invocation • More than b stub replicas can
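The threshold rule for replicated invocation might look like the following on the receiving side. This is a hedged sketch: the class name `InvocationGate`, the string form of requests, and the replica identifiers are hypothetical, and authentication of the submitting replicas is assumed to happen elsewhere.

```python
class InvocationGate:
    """Run a nested invocation only after more than b stub replicas request it."""

    def __init__(self, b):
        self.b = b
        self.senders = {}     # request -> set of replica ids that submitted it
        self.executed = set()

    def submit(self, replica_id, request):
        """Record a submission; return True only when it triggers execution."""
        if request in self.executed:
            return False      # already performed once, never repeated
        seen = self.senders.setdefault(request, set())
        seen.add(replica_id)  # duplicate submissions from one replica count once
        if len(seen) > self.b:
            self.executed.add(request)
            return True
        return False

gate = InvocationGate(b=2)
outcomes = [
    gate.submit("r1", "push(7)"),
    gate.submit("r1", "push(7)"),  # repeat from the same replica
    gate.submit("r2", "push(7)"),
    gate.submit("r3", "push(7)"),  # third distinct replica crosses the threshold
    gate.submit("r4", "push(7)"),
]
```

If at most b stub replicas are faulty, any request backed by b + 1 distinct replicas has at least one correct sender, so faulty replicas alone cannot inject an invocation.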
Our Research • To summarize, we will explore the use of these techniques for implementing • Read-write block storage (linearizable, wait-free) • Specialized metadata objects (e.g., directories) necessary to construct a fully functional file system (linearizable) • A general framework for arbitrary deterministic objects (linearizable) • Not all techniques will be appropriate for all cases • “Flat” objects as found in file systems will generally not utilize replicated clients • Nested objects may not benefit from versioning (TBD)
Systems: PASIS • PASIS is a survivable storage system developed in a DARPA IPTO project • Funding ended December 2003 • Examined the use of encoding schemes for efficiently distributing data storage while protecting confidentiality/integrity • Did not address concurrency control • Clients would have to handle it explicitly, e.g., using locking • Explored use of versioning for other purposes: recovery from user mistakes, system failures, penetrations • Showed viability of comprehensive versioning
Systems: Fleet • Fleet is a Java-based distributed object architecture developed in previous projects in DARPA ATO • Funding ended June 2004 • Focused on the use of quorum systems for efficient object replication • Fleet does not support nested objects and nested method invocations • Nor does it support potentially faulty clients
Technology Transition • Two primary channels are the industry consortia of two research centers at Carnegie Mellon: CyLab and the Parallel Data Lab • CyLab • A center focused on trustworthy and measurable computing • Founded in 2003 through the merger of the Center for Computer and Communications Security and the Sustainable Computing Consortium • Corporate affiliate program includes over fifty companies, including defense suppliers, tech companies and IT-based critical infrastructures • Parallel Data Lab • A ten-year-old center focused on storage infrastructures • Corporate affiliates include most major storage vendors • Both have a track record of technology transfer