
Increasing Intrusion Tolerance Via Scalable Redundancy

Increasing Intrusion Tolerance Via Scalable Redundancy. Mike Reiter (reiter@cmu.edu), Natassa Ailamaki, Greg Ganger, Priya Narasimhan, Chuck Cranor. Technical objective: to design, prototype, and evaluate new protocols for implementing intrusion-tolerant services that scale better.


Presentation Transcript


  1. Increasing Intrusion Tolerance Via Scalable Redundancy • Mike Reiter (reiter@cmu.edu), Natassa Ailamaki, Greg Ganger, Priya Narasimhan, Chuck Cranor

  2. Technical Objective • To design, prototype, and evaluate new protocols for implementing intrusion-tolerant services that scale better • Here, “scale” refers to efficiency as the number of servers and the number of failures tolerated grow • Targeting three types of services • Read-write data objects • Custom “flat” object types for particular applications, notably directories for implementing an intrusion-tolerant file system • Arbitrary objects that support object nesting

  3. Expected Impact • Significant efficiency and scalability benefits over today’s protocols for intrusion tolerance • For example, for data services, we anticipate • At least a twofold latency improvement over the current best, even at small configurations (e.g., tolerating 3-5 Byzantine server failures) • Improvements that grow as the system scales up • A twofold improvement in throughput, again growing with system size • Without such improvements, intrusion tolerance will remain relegated to small deployments in narrow application areas

  4. The Problem Space • Distributed services manage redundant state across servers to tolerate faults • We consider tolerance to Byzantine faults, as might result from an intrusion into a server or client • A faulty server or client may behave arbitrarily • We also make no timing assumptions in this work • An “asynchronous” system • Primary existing practice: replicated state machines • Offers no load dispersion, requires data replication, and degrades as the system scales in terms of the number of messages

  5. Evaluation • Baseline for current work: the BFT library • Popular, publicly available implementation of Byzantine fault-tolerant state machine replication (by Castro & Liskov) • Reported to be an efficient implementation of that approach • Two measures • Average latency of operations, from client’s perspective • Peak sustainable throughput of operations • Our consistency definition: linearizability of invocations

  6. Background - Read/Write protocol • Servers provide read/write block interface • Servers version blocks on every write • Decentralized, optimistic, scalable, Byzantine fault-tolerant [Figure: Client, Data block, Servers (D)]

  7. R/W semantics • R/W protocol appropriate for block storage • But R/W protocol inappropriate for building general services • Doesn’t provide replicated state machine semantics • A metadata service for a R/W-based block store motivated us to develop a protocol with stronger semantics

  8. R/W semantics insufficient for metadata • Consider 2 clients inserting a file in the same directory • Last write wins; good for blocks, bad for directories [Figure: Client A, Client B, Directory copies, servers (D)]
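
The lost update on this slide can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical in-memory `BlockStore` with plain read/write semantics; the class and the file names are illustrative and do not come from the presentation.

```python
class BlockStore:
    """Hypothetical store with plain read/write semantics (last write wins)."""

    def __init__(self):
        self.blocks = {}

    def read(self, key):
        # Return a copy so each caller modifies its own private view.
        return set(self.blocks.get(key, set()))

    def write(self, key, value):
        # Unconditional overwrite: whichever write arrives last wins.
        self.blocks[key] = set(value)


store = BlockStore()
store.write("dir", {"README"})

# Clients A and B both read the same directory version ...
view_a = store.read("dir")
view_b = store.read("dir")

# ... each inserts a different file into its private copy ...
view_a.add("a.txt")
view_b.add("b.txt")

# ... and each writes the whole directory back.
store.write("dir", view_a)
store.write("dir", view_b)

final_dir = store.read("dir")  # Client A's insert has been overwritten
```

For a data block, keeping only the last writer's bytes is acceptable; for a directory, it silently drops Client A's file.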

  9. Query/Update (Q/U) protocol • A protocol with replicated state machine semantics • Provides linearizable query and update operations • Protocol properties • Decentralized • Handles Byzantine clients & server failures, asynchronous • Efficient common case operation • Optimistic protocol leverages versioning servers • Single-phase queries and updates, if concurrency- and failure-free • Avoids expensive cryptography (digital signatures) • Scalable • Avoids server-to-server broadcast • Atomic multi-object updates

  10. Outline • Motivation • Query/Update protocol • Overview • Query, update operations • Validation, object syncing, multi-object operations • Evaluation

  11. Read/conditional-write primitive • Servers accept an update operation only if the object hasn’t been modified since it was read [Figure: Client A, Client B, Directory copies, servers (D)]
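
A rough sketch of a read/conditional-write primitive on a single server follows. It assumes a hypothetical `VersionedStore` that tracks one integer version per object; the real protocol conditions on a richer history, so this shows only the idea.

```python
class VersionedStore:
    """Hypothetical single server offering read / conditional-write."""

    def __init__(self, value):
        self.version = 0
        self.value = set(value)

    def read(self):
        return self.version, set(self.value)

    def conditional_write(self, read_version, new_value):
        # Accept only if nothing was written since the client's read.
        if read_version != self.version:
            return False
        self.value = set(new_value)
        self.version += 1
        return True


server = VersionedStore({"README"})
ver_a, dir_a = server.read()
ver_b, dir_b = server.read()

ok_a = server.conditional_write(ver_a, dir_a | {"a.txt"})
ok_b = server.conditional_write(ver_b, dir_b | {"b.txt"})  # stale read, rejected

# Client B retries from a fresh read, so both inserts survive.
ver_b, dir_b = server.read()
ok_retry = server.conditional_write(ver_b, dir_b | {"b.txt"})
```

The rejected write forces Client B to re-read, which is what prevents the directory from losing an entry.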

  12. Handling Byzantine clients • For Byzantine fault-tolerance, clients must pass operation to servers • Constrains clients to narrow object interface • Servers apply operation to old object to validate new object [Figure: Directory, directory + op, Op, servers (D)]

  13. Clients and objects • Client just sends operations • Client does not read/write object • Server applies operation to local object history [Figure: Op sent to servers (D)]
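
The idea of shipping operations rather than object state can be sketched as below. The `OpServer` class, the `insert_file` operation, and the `ALLOWED_OPS` table are hypothetical names chosen for illustration; the point is only that a client can name an operation from a narrow interface and cannot install arbitrary state.

```python
def insert_file(directory, name):
    """A deterministic directory operation: return a new directory with `name` added."""
    if name in directory:
        raise ValueError(f"{name!r} already exists")
    return directory | {name}


# The narrow object interface: only these named operations are accepted.
ALLOWED_OPS = {"insert": insert_file}


class OpServer:
    def __init__(self):
        self.history = [frozenset()]  # object versions, oldest first

    def apply(self, op_name, *args):
        op = ALLOWED_OPS.get(op_name)
        if op is None:
            return False  # not part of the object interface
        try:
            # The server computes the new version itself from its latest one.
            new_version = op(self.history[-1], *args)
        except ValueError:
            return False
        self.history.append(frozenset(new_version))
        return True


servers = [OpServer() for _ in range(3)]
results = [s.apply("insert", "a.txt") for s in servers]
bad = [s.apply("overwrite", "anything") for s in servers]
```

Because every operation is deterministic, servers that start from the same version and apply the same operation arrive at the same new version.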

  14. Query/Update protocol • Servers host objects • Optimistic protocol (relies on versioning) • Export an operation interface (more than read/write) • Can export any deterministic operation • Server exports three types of operations: • Read History (Object): returns timestamp vector • Query (Object, Version): read-only; returns object state; e.g., getattr • Update (Object, OHS, Value): mutating; updates object, conditioned on object not having been modified; e.g., setattr [Figure: Server hosting objects A, B, and C with numbered versions]

  15. Outline • Motivation • Query/Update protocol • Overview • Query, update operations • Validation, object syncing, multi-object operations • Evaluation

  16. Read history operation • Client requests version history of an object • Each server replies with a list of timestamps [Figure: timeline of read-history and history-reply messages; server timestamps (1, 2, 3) collected into the Object History Set (OHS)]

  17. Query operation • Client performs read history operation • Constructs OHS and identifies Latest version that is complete • Client queries Latest version at server [Figure: timeline of read-history, history-reply, query, and query-reply messages; Latest marked in the Object History Set (OHS)]

  18. Update operation • Client performs read-history operation • Constructs OHS and identifies Latest version that is complete • Client sends operation and OHS to servers • Operation is conditioned on OHS [Figure: timeline of read-history, history-reply, update, and update-reply messages; the OHS accompanies the update; Latest marked in the Object History Set (OHS)]
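
A small sketch of how a client might build the OHS from history replies and pick Latest. The slides do not define "complete", so this sketch assumes it means "held by at least `threshold` servers"; that rule, the `latest_complete` helper, and the server names are all assumptions made for illustration.

```python
from collections import Counter


def latest_complete(ohs, threshold):
    """Return the highest timestamp present in at least `threshold` server histories.

    `ohs` maps a server id to the list of timestamps that server reported.
    Treating "complete" as "held by at least `threshold` servers" is an
    assumption of this sketch, not a definition taken from the slides.
    """
    counts = Counter(ts for history in ohs.values() for ts in set(history))
    complete = [ts for ts, n in counts.items() if n >= threshold]
    return max(complete) if complete else None


# Replies to read-history from five servers; only one has seen version 3.
ohs = {
    "s1": [1, 2],
    "s2": [1, 2],
    "s3": [1, 2, 3],
    "s4": [1, 2],
    "s5": [1],
}
latest = latest_complete(ohs, threshold=4)
```

With these replies, version 3 is held by a single server and is not treated as complete, so the client conditions its query or update on version 2.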

  19. Server validation for update operations • A server needs to verify that the client conditioned operation on Latest • Validation steps: • Ensure read/conditional-write semantics • Check that local history matches that in OHS • Classify Latest write version • Ensures operation is based on appropriate timestamp • Protection against Byzantine failures • Check authenticators • Ensures integrity of OHS
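
The two validation steps can be sketched as follows. This assumes authenticators are HMACs computed with a single key shared by all servers, which is a simplification chosen for brevity; the slides say only that authenticators are checked. The helper names `authenticator` and `validate_update` are hypothetical.

```python
import hashlib
import hmac


def authenticator(key, server_id, history):
    # Assumed format: an HMAC over the server id and its timestamp list.
    msg = f"{server_id}:{','.join(map(str, history))}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


def validate_update(key, server_id, local_history, ohs):
    """Hypothetical server-side check before accepting an update.

    `ohs` maps server id -> (history, authenticator).
    """
    # 1. Integrity: every OHS entry must carry a valid authenticator.
    for sid, (history, tag) in ohs.items():
        if not hmac.compare_digest(tag, authenticator(key, sid, history)):
            return False
    # 2. Read/conditional-write: the OHS must match this server's own history.
    entry = ohs.get(server_id)
    return entry is not None and list(entry[0]) == list(local_history)


KEY = b"demo-key"  # illustrative only
replies = {"s1": [1, 2], "s2": [1, 2]}
ohs = {sid: (h, authenticator(KEY, sid, h)) for sid, h in replies.items()}

fresh = validate_update(KEY, "s1", [1, 2], ohs)
stale = validate_update(KEY, "s1", [1, 2, 3], ohs)  # server has moved on

# A client that alters a history without a matching authenticator is caught.
forged = dict(ohs, s2=([1, 2, 3], ohs["s2"][1]))
tampered = validate_update(KEY, "s1", [1, 2], forged)
```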

  20. Server validation example • Earlier example of 2 clients concurrently updating same directory • Servers reject client B’s operation, due to “stale” OHS [Figure: timeline of read-history, history-reply, and update messages from Client A and Client B]

  21. Q/U protocol details • Handling Byzantine clients and server faults • Through validating timestamps and OHS • During classification of Latest, may require repair • Incomplete operations: use barriers to fix failures • Flexible protocol: can handle different types and numbers of faults • For an asynchronous system with Byzantine clients: • N = 3t + 2b + 1 servers, to tolerate t server faults, b of which are Byzantine • Object syncing • Multi-object operations
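
As a quick worked example of the sizing formula on this slide, the sketch below evaluates N = 3t + 2b + 1 for a few configurations. The function name `min_servers` is illustrative.

```python
def min_servers(t, b):
    """N = 3t + 2b + 1 from the slide: t total server faults, b of them Byzantine."""
    if not 0 <= b <= t:
        raise ValueError("need 0 <= b <= t")
    return 3 * t + 2 * b + 1


# When every tolerated fault may be Byzantine (t == b), N = 5b + 1.
sizes = {b: min_servers(b, b) for b in range(1, 6)}

# Tolerating two faults that are crash-only (b == 0) needs fewer servers.
crash_only = min_servers(2, 0)
```

Setting t = b shows that each additional Byzantine fault tolerated costs five more servers.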

  22. Object syncing • A server may not have the latest version of an object • If a server lacks latest version of object, the OHS contains information about which other servers have that version • The server must sync the object with another server • Hashes in OHS allow server to validate the synced object
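
Hash-validated syncing might look like the sketch below. SHA-256 and the `fetch` callback are assumptions; the slide states only that hashes in the OHS let a server validate a synced object.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sync_object(expected_hash, holders, fetch):
    """Try each server the OHS names as a holder; keep the first copy whose hash matches."""
    for sid in holders:
        data = fetch(sid)
        if data is not None and digest(data) == expected_hash:
            return data
    return None  # no holder returned a valid copy


# Hypothetical peers: s2 returns bad data, s3 has the real version, s4 is down.
peers = {"s2": b"garbage", "s3": b"directory v3", "s4": None}


def fetch(sid):
    return peers.get(sid)


expected = digest(b"directory v3")
synced = sync_object(expected, ["s2", "s3", "s4"], fetch)
```

Because the copy is checked against the hash, a faulty peer can delay syncing but cannot cause a server to adopt a wrong object.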

  23. Multi-object operation • An update can span multiple objects • A client must construct OHS for each object • Servers perform validation for each object • Operations perform atomically across multiple objects
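
The all-or-nothing behaviour of a multi-object update can be sketched on one server as follows. `MultiObjectServer` and its integer versions are hypothetical simplifications; in the protocol the condition for each object is its OHS.

```python
class MultiObjectServer:
    def __init__(self, objects):
        # name -> (version, value)
        self.objects = {name: (0, value) for name, value in objects.items()}

    def update(self, expected_versions, new_values):
        # Validate every object first; apply nothing unless all checks pass.
        for name, version in expected_versions.items():
            if self.objects[name][0] != version:
                return False
        for name, value in new_values.items():
            version, _ = self.objects[name]
            self.objects[name] = (version + 1, value)
        return True


# Moving file "f" from directory "src" to directory "dst" touches both objects.
server = MultiObjectServer({"src": {"f"}, "dst": set()})
moved = server.update({"src": 0, "dst": 0}, {"src": set(), "dst": {"f"}})

# A second update with a stale view of "dst" changes neither object.
stale = server.update({"src": 1, "dst": 0}, {"src": {"g"}, "dst": set()})
```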

  24. Outline • Motivation • Query/Update protocol • Overview • Query, update operations • Validation, object syncing, multi-object operations • Evaluation

  25. Prototype evaluation • Built a counter object using Q/U and BFT protocols • inc method increments counter and returns new value • fetch method returns current counter value • Light-weight operations to demonstrate network and computation overhead inherent to protocols • Both Q/U and BFT implement efficient, optimistic queries • Evaluation focuses on updates • Q/U common case: no concurrency; preferred quorums • BFT common case: shared counter to allow batching
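
For reference, the counter object described on this slide is about as small as a replicated object can be. The sketch below shows only the local object with its `inc` and `fetch` methods; the class name `CounterObject` is illustrative, and no replication is modelled.

```python
class CounterObject:
    """Local view of the benchmark object: one update method, one query method."""

    def __init__(self):
        self.value = 0

    def inc(self):
        # Update: mutates the object and returns the new value.
        self.value += 1
        return self.value

    def fetch(self):
        # Query: read-only, returns the current value.
        return self.value


counter = CounterObject()
first = counter.inc()
second = counter.inc()
current = counter.fetch()
```

Since each operation does almost no work, measured cost is dominated by the protocol itself rather than by the object.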

  26. Experimental setup • Cluster of Pentium 4 2.8 GHz machines, 1 GB RAM • 1 Gb switched Ethernet, 18.3 Gbps/35.7 mpps switch • No background traffic • Working set of experiments fits in server memory • To focus on protocol overhead, not on disk accesses • Experiments are run for 30 seconds • Measurements from middle 10 seconds
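
Taking measurements from the middle of a run can be sketched as below. It assumes the 10-second window is centred in the 30-second run (seconds 10 to 20) and that each completed operation is logged with a timestamp relative to the start of the run; both are assumptions, and the timestamps are synthetic.

```python
def middle_window_throughput(completion_times, run_length=30.0, window=10.0):
    """Operations per second, counting only completions inside the centred window."""
    start = (run_length - window) / 2
    end = start + window
    completed = sum(1 for t in completion_times if start <= t < end)
    return completed / window


# Synthetic log: one operation completes every 10 ms for the whole run.
times = [i / 100 for i in range(3000)]
ops_per_sec = middle_window_throughput(times)
```

Discarding the first and last 10 seconds keeps start-up and shutdown effects out of the reported numbers.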

  27. Fault scalability (1) • Investigate throughput as the number of server faults tolerated (b) increases • Measured saturated throughput • Ran with 1, 3, 5, …, 20 clients, each with 2 outstanding requests • For each b, selected highest throughput value

  28. Fault scalability (2)

  29. Throughput and response time under load (1) • Investigate throughput & response time under load • Demonstrates protocol behavior beyond saturated throughput data point • Increased number of clients from 1 to 20 for b = 1

  30. Throughput and response time under load (2)

  31. Conclusions • Developed the Q/U protocol for accessing shared objects in a distributed system • Fault-scalable • Byzantine fault-tolerant • Optimistic, efficient • Atomic multi-object operations • Evaluation • Protocol scales with number of failures tolerated • Throughput & response time consistent under load
