PeerReview: Practical Accountability for Distributed Systems. Andreas Haeberlen, Petr Kuznetsov, and Peter Druschel. SOSP’07.


  1. SRG PeerReview: Practical Accountability for Distributed Systems Andreas Haeberlen, Petr Kuznetsov, and Peter Druschel SOSP’07

  2. Problems • How to: • Detect Byzantine faults whose effects are observed by a correct node. • Link faults to faulty nodes. • Defend correct nodes against false accusations.

  3. Accountability • Use accountability to detect and expose node faults. • Maintain a tamper-evident record that captures all actions of each node. • Detect a faulty node when its behavior deviates from that of a correct node.

  4. Limitations of current systems • Designed for a specific type of fault or for a specific application. • Based on many strong assumptions. • Do not provide verifiable evidence of misbehavior. • Require a formal specification of the system to check for misbehavior. • Can only detect faulty nodes that misbehave repeatedly.

  5. Overview • Model a node as a deterministic state machine. • Each node keeps a secure log that records all sent and received messages, all inputs and outputs. • To check a node j, node i will: • Get j’s log. • Replay j’s log using a reference implementation. • Compare the results.
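
The check in this overview can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the toy state machine `reference_step`, the log layout (a list of input/output pairs) and the helper `replay_matches` are hypothetical names introduced here, not part of PeerReview.

```python
from typing import Callable, List, Tuple

# Hypothetical deterministic state machine: (state, input) -> (new_state, output).
Step = Callable[[int, str], Tuple[int, str]]


def reference_step(state: int, msg: str) -> Tuple[int, str]:
    """Toy reference implementation: count inputs and echo each with its count."""
    new_state = state + 1
    return new_state, f"{new_state}:{msg}"


def replay_matches(log: List[Tuple[str, str]], step: Step, initial_state: int = 0) -> bool:
    """Replay the logged inputs through `step` and compare with the logged outputs."""
    state = initial_state
    for logged_input, logged_output in log:
        state, expected_output = step(state, logged_input)
        if expected_output != logged_output:
            return False  # the node's recorded behavior deviates from the reference
    return True


# Node j's log as obtained by node i (illustrative data only).
honest_log = [("a", "1:a"), ("b", "2:b")]
deviating_log = [("a", "1:a"), ("b", "2:x")]

print(replay_matches(honest_log, reference_step))     # True
print(replay_matches(deviating_log, reference_step))  # False
```

The comparison only works because the state machine is deterministic: the same inputs from the same starting state must always produce the same outputs.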

  6. The problem of detection • Ideal completeness: a faulty node should be exposed by all correct nodes. • Ideal accuracy: no correct node is ever exposed by a correct node (no false positives).

  7. Types of faults that can be detected • Available data: messages sent and received among nodes. • Can only detect faults that manifest themselves through messages. • Can only detect faults that are observed by a correct node. • Need to consider: • Verifiability of outputs. • Missing and long-delayed messages.

  8. Problem statement • Terms: • Detectably faulty, detectably ignorant. • Accomplices (of i): nodes that send messages caused by an incorrect message sent by i. • Completeness: • Eventually, every detectably ignorant node is suspected forever by every correct node. • If node i is detectably faulty, then eventually, some faulty accomplice is exposed or suspected forever by every correct node.

  9. Problem statement (cont) • Accuracy: • No correct node is forever suspected by a correct node. • No correct node is ever exposed by a correct node.

  10. System model • Failure indications: • exposed(j) • suspected(j) • trusted(j)

  11. Assumptions • The state machines S_i are deterministic. • A message sent from a correct node to another is eventually received. • Use a hash function H() that is pre-image resistant, second pre-image resistant, and collision resistant. • Each node has a unique identifier. Nodes can sign messages, and faulty nodes cannot forge the signatures of correct nodes.

  12. Assumptions (cont) • Each node has access to a reference implementation of all S_i. The implementation can take a snapshot and can be initialized from a snapshot. • A function ω maps each node to a set of witnesses. The set {i} ∪ ω(i) contains at least one correct node.

  13. Tamper-evident logs • Log entry • Hash value • Authenticator • If a prefix of a node’s log does not match the hash value then that node is faulty

  14. Tamper-evident logs • The authenticator α_k^j can be used to check whether j’s log contains e_k. • To inspect x entries of j’s log: • i challenges j to return e_{k-(x-1)}, …, e_k and h_{k-x}. • i calculates h_k and compares it with the value in the authenticator.
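
The inspection step above can be illustrated with a small hash chain. The sketch assumes the chaining rule h_k = H(h_{k-1} || s_k || t_k || H(c_k)) and an authenticator that signs (s_k, h_k), as recalled from the PeerReview paper; the slide text does not preserve these formulas. HMAC with a demo key stands in for a public-key signature, and all function names are hypothetical.

```python
import hashlib
import hmac
from typing import List, Tuple

Entry = Tuple[int, str, bytes]  # (sequence number s_k, type t_k, content c_k)


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def next_hash(prev_hash: bytes, entry: Entry) -> bytes:
    """Assumed chaining rule: h_k = H(h_{k-1} || s_k || t_k || H(c_k))."""
    seq, etype, content = entry
    return H(prev_hash + seq.to_bytes(8, "big") + etype.encode() + H(content))


def sign(key: bytes, seq: int, top_hash: bytes) -> bytes:
    """Stand-in for sigma_j(s_k || h_k); a real system would use a public-key signature."""
    return hmac.new(key, seq.to_bytes(8, "big") + top_hash, hashlib.sha256).digest()


def check_segment(h_before: bytes, entries: List[Entry],
                  authenticator: Tuple[int, bytes, bytes], key: bytes) -> bool:
    """Recompute h_k from h_{k-x} and the returned entries; compare with the authenticator."""
    seq, claimed_hash, signature = authenticator
    h = h_before
    for entry in entries:
        h = next_hash(h, entry)
    return (bool(entries)
            and entries[-1][0] == seq
            and hmac.compare_digest(h, claimed_hash)
            and hmac.compare_digest(sign(key, seq, claimed_hash), signature))


# Node j builds a three-entry log and issues an authenticator for entry 3.
KEY_J = b"demo-key-of-node-j"  # illustrative only
log = [(1, "SEND", b"hello"), (2, "RECV", b"world"), (3, "SEND", b"bye")]
hashes = [bytes(32)]  # h_0: all-zero base hash, an assumption of this sketch
for e in log:
    hashes.append(next_hash(hashes[-1], e))
alpha_3 = (3, hashes[3], sign(KEY_J, 3, hashes[3]))

# Node i inspects x = 2 entries: j returns e_2, e_3 and h_1.
segment_ok = check_segment(hashes[1], log[1:], alpha_3, KEY_J)
forged_segment = [(2, "RECV", b"WORLD"), log[2]]
forged_ok = check_segment(hashes[1], forged_segment, alpha_3, KEY_J)
print(segment_ok, forged_ok)  # True False
```

Because each hash covers its predecessor, j cannot alter an earlier entry without changing h_k, which would contradict an authenticator it has already signed.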

  15. Commitment protocol • Ensures that a node cannot add an entry for a message it has never received and that a node’s log is complete. • When i sends a message m to j: • i creates the entry (s_k, SEND, {j, m}), attaches h_{k-1}, s_k and σ_i(s_k || h_k) to m, and sends m. • j recomputes h_k and checks the signature; if it is valid, j creates (s_l, RECV, {i, m}) and returns an ACK to i with h_{l-1}, s_l and σ_j(s_l || h_l). • i verifies the signature in the ACK and sends a challenge to j’s witnesses if the signature is not valid.
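
A minimal sketch of the sender and receiver sides of this exchange follows. It assumes the same hash-chain rule as the PeerReview paper, uses HMAC with a demo key in place of the signature σ_i, and encodes the entry content {j, m} as the destination name, a separator, and the message. The helper names and the packet layout are hypothetical.

```python
import hashlib
import hmac


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_hash(prev_hash: bytes, seq: int, etype: str, content: bytes) -> bytes:
    """Assumed chaining rule: h_k = H(h_{k-1} || s_k || t_k || H(c_k))."""
    return H(prev_hash + seq.to_bytes(8, "big") + etype.encode() + H(content))


def sign(key: bytes, seq: int, h: bytes) -> bytes:
    """Stand-in for sigma_i(s_k || h_k); HMAC replaces a public-key signature here."""
    return hmac.new(key, seq.to_bytes(8, "big") + h, hashlib.sha256).digest()


def make_send(key_i: bytes, prev_hash: bytes, seq: int, dest: str, msg: bytes) -> dict:
    """Sender i: log (s_k, SEND, {j, m}) and attach h_{k-1}, s_k and the signature to m."""
    h_k = entry_hash(prev_hash, seq, "SEND", dest.encode() + b"|" + msg)
    return {"msg": msg, "prev_hash": prev_hash, "seq": seq, "sig": sign(key_i, seq, h_k)}


def receiver_accepts(key_i: bytes, me: str, packet: dict) -> bool:
    """Receiver j: recompute h_k from the attached fields and check i's signature."""
    h_k = entry_hash(packet["prev_hash"], packet["seq"], "SEND",
                     me.encode() + b"|" + packet["msg"])
    return hmac.compare_digest(sign(key_i, packet["seq"], h_k), packet["sig"])


KEY_I = b"demo-key-of-node-i"  # illustrative only
packet = make_send(KEY_I, bytes(32), 1, "j", b"hello")
altered = dict(packet, msg=b"something else")

print(receiver_accepts(KEY_I, "j", packet))   # True
print(receiver_accepts(KEY_I, "j", altered))  # False
```

Once j accepts the packet, it holds signed evidence that i's log must contain the corresponding SEND entry; the ACK gives i the same kind of evidence about j's RECV entry.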

  16. Consistency protocol • A faulty node can hide itself by keeping more than one log or a log with multiple branches

  17. Consistency protocol • If i receives authenticators from j, it must eventually forward those authenticators to j’s witnesses. • Periodically, each witness ω of j challenges j to return a list of entries (from k to l), then ω checks them for consistency. • Finally, ω extracts all authenticators that j received from other nodes and sends them to the corresponding witness sets.
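
One part of the witness's consistency check can be sketched as follows. This is a deliberate simplification: it only catches two authenticators from the same node that carry the same sequence number but different hashes, which indicates a forked log. Signature verification and the comparison against the returned log segment are omitted, and `find_fork` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional, Tuple

Authenticator = Tuple[int, str]  # (sequence number, hash as hex); signature omitted


def find_fork(authenticators: Iterable[Authenticator]
              ) -> Optional[Tuple[Authenticator, Authenticator]]:
    """Return a conflicting pair of authenticators, or None if none is found."""
    seen: Dict[int, str] = {}
    for seq, h in authenticators:
        if seq in seen and seen[seq] != h:
            return (seq, seen[seq]), (seq, h)  # same position, different history
        seen.setdefault(seq, h)
    return None


# Authenticators for node j collected by a witness from different peers (illustrative).
consistent = [(1, "aa"), (2, "bb"), (2, "bb")]
forked = [(1, "aa"), (2, "bb"), (2, "cc")]

print(find_fork(consistent))  # None
print(find_fork(forked))      # ((2, 'bb'), (2, 'cc'))
```

The returned pair is itself transferable evidence, since both authenticators were signed by the same node.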

  18. Audit protocol • Checks whether a node’s behavior is consistent with its reference implementation. • Each witness ω of i will: • Look up the most recent authenticator of i. • Challenge i to return all log entries since the last audit and append them to λ_{ωi}. • Create an instance of i’s reference implementation and initialize it from the most recent snapshot. • Replay all the inputs and compare the outputs. • Expose i if the outputs are not equal.

  19. Challenge/response protocol • Audit challenge: • Consists of two authenticators α_k^i and α_l^i (k < l). • i’s log must contain the entries e_k through e_l; otherwise i is faulty. • If i is correct, it returns the corresponding log segment.

  20. Challenge/response protocol • Send challenge: • Consists of the message m with all needed information attached. • i must acknowledge m; otherwise i is faulty. • If i has not yet received m, it accepts m and returns an ACK. • If i has already received m, it just resends the ACK.
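
The send-challenge rule for a correct node amounts to idempotent acknowledgement. The sketch below assumes messages carry a unique identifier and represents the ACK as a plain string; the class and method names are hypothetical.

```python
from typing import Dict, List


class CorrectNode:
    """Toy model of how a correct node answers send challenges."""

    def __init__(self) -> None:
        self.acks: Dict[str, str] = {}  # message id -> ACK already issued
        self.accepted: List[str] = []   # payloads accepted, in order

    def handle_send_challenge(self, msg_id: str, payload: str) -> str:
        if msg_id not in self.acks:
            # Not yet received: accept the message and create its ACK.
            self.accepted.append(payload)
            self.acks[msg_id] = f"ACK:{msg_id}"
        # Already received: resend the stored ACK without accepting m again.
        return self.acks[msg_id]


node = CorrectNode()
first = node.handle_send_challenge("m1", "hello")
second = node.handle_send_challenge("m1", "hello")
print(first, second, len(node.accepted))  # ACK:m1 ACK:m1 1
```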

  21. Evidence transfer protocol • Ensures that all correct nodes eventually collect the same evidence against faulty nodes. • Every node i periodically fetches the challenges collected by the witnesses of every other node j. • If a correct node i obtains a challenge for j, i indicates suspected(j). When i receives a message from j, i challenges j. • If i receives valid answers to all pending challenges for j, i indicates trusted(j). • If i obtains a proof of misbehavior by j, i indicates exposed(j).
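
The way node i derives its indication for node j from the collected evidence can be sketched as a small state holder. It assumes challenges are tracked by identifier and that exposure is permanent once a proof of misbehavior is held; `PeerView` and its methods are hypothetical names.

```python
from enum import Enum
from typing import Set


class Indication(Enum):
    TRUSTED = "trusted"
    SUSPECTED = "suspected"
    EXPOSED = "exposed"


class PeerView:
    """Node i's view of one other node j."""

    def __init__(self) -> None:
        self.pending: Set[str] = set()  # unanswered challenges for j
        self.has_proof = False          # proof of misbehavior by j

    def add_challenge(self, challenge_id: str) -> None:
        self.pending.add(challenge_id)

    def add_valid_answer(self, challenge_id: str) -> None:
        self.pending.discard(challenge_id)

    def add_proof_of_misbehavior(self) -> None:
        self.has_proof = True

    def indication(self) -> Indication:
        if self.has_proof:
            return Indication.EXPOSED
        return Indication.SUSPECTED if self.pending else Indication.TRUSTED


view = PeerView()
history = [view.indication()]   # no evidence yet
view.add_challenge("c1")
history.append(view.indication())
view.add_valid_answer("c1")
history.append(view.indication())
view.add_proof_of_misbehavior()
history.append(view.indication())
print([h.value for h in history])  # ['trusted', 'suspected', 'trusted', 'exposed']
```

Suspicion is reversible because an unanswered challenge may only reflect a slow or unreachable node, while exposure rests on verifiable evidence.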

  22. Overhead • Signing messages. • Extra messages to implement the protocols. • Taking snapshots of nodes. • Replaying nodes’ executions.

  23. Extension • P_f: probability that an all-faulty witness set exists. • P_m: probability that a given instance of misbehavior remains undetected. • The message complexity grows as O(log N).
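
A rough calculation shows why logarithmic growth is plausible. The model below is an assumption made for illustration and is not taken from the slides: each witness is faulty independently with probability `faulty_fraction`, every node has one witness set of size w, and a union bound gives P_f ≤ N · faulty_fraction^w. The function name and the example numbers are hypothetical.

```python
def min_witnesses(num_nodes: int, faulty_fraction: float, target_pf: float) -> int:
    """Smallest witness-set size w with num_nodes * faulty_fraction**w <= target_pf."""
    if not (0.0 < faulty_fraction < 1.0) or target_pf <= 0.0 or num_nodes < 1:
        raise ValueError("expected 0 < faulty_fraction < 1, target_pf > 0, num_nodes >= 1")
    w = 1
    while num_nodes * faulty_fraction ** w > target_pf:
        w += 1
    return w


# With 10% faulty nodes and a target of 5e-6 (illustrative values):
small = min_witnesses(1_000, 0.1, 5e-6)
large = min_witnesses(1_000_000, 0.1, 5e-6)
print(small, large)  # 9 12
```

Under this model, growing the system a thousandfold adds only three witnesses per node, which is the logarithmic behavior the slide refers to.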

  24. Applications • Overlay multicast. • NFS • P2P email (ePOST)

  25. Evaluation • Strategy of the freeloader in Overlay Multicast.

  26. Evaluation (cont) • Message latency in NFS

  27. Evaluation (cont) • Throughput of NFS

  28. Evaluation (cont) • Average traffic in ePOST

  29. Evaluation (cont) • Scalability

  30. Evaluation (cont) • Scalability
