SRG PeerReview: Practical Accountability for Distributed Systems Andreas Haeberlen, Petr Kuznetsov, and Peter Druschel SOSP’07
Problems • How to: • Detect Byzantine faults whose effects are observed by a correct node. • Link faults to faulty nodes. • Defend correct nodes against false accusations.
Accountability • Use accountability to detect and expose node faults. • Maintain a tamper-evident record that captures all actions of each node. • Detect a faulty node when its behavior deviates from that of a correct node.
Limitations of current systems • Designed for a specific type of fault or a specific application. • Rely on strong assumptions. • Do not provide verifiable evidence of misbehavior. • Require a formal specification of the system to check for misbehavior. • Can only detect faulty nodes that misbehave repeatedly.
Overview • Model a node as a deterministic state machine. • Each node keeps a secure log that records all sent and received messages, all inputs and outputs. • To check a node j, node i will: • Get j’s log. • Replay j’s log using a reference implementation. • Compare the results.
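A minimal sketch of this audit-by-replay idea, assuming an illustrative log-entry layout and a toy `EchoMachine` as the reference state machine (neither is from the paper):

```python
from dataclasses import dataclass

# Sketch: node i audits node j by replaying j's recorded inputs through a
# reference implementation of j's deterministic state machine and comparing
# the outputs with what j claims to have sent.

@dataclass
class Entry:
    kind: str        # "RECV" (input) or "SEND" (output), illustrative only
    message: str

class EchoMachine:
    """Toy deterministic state machine: echoes every received message."""
    def __init__(self):
        self.pending = []
    def deliver(self, msg):
        self.pending.append(msg)
    def next_send(self):
        return self.pending.pop(0)

def audit_by_replay(log, reference):
    """Replay j's log; expose j if a recorded output deviates from the
    output a correct node would have produced."""
    for entry in log:
        if entry.kind == "RECV":
            reference.deliver(entry.message)
        elif entry.kind == "SEND":
            if reference.next_send() != entry.message:
                return "exposed"      # behavior deviates from a correct node
    return "ok"                       # log is consistent with a correct node

# A correct log passes; a tampered SEND entry is exposed.
good = [Entry("RECV", "ping"), Entry("SEND", "ping")]
bad  = [Entry("RECV", "ping"), Entry("SEND", "pong")]
print(audit_by_replay(good, EchoMachine()))   # ok
print(audit_by_replay(bad,  EchoMachine()))   # exposed
```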
The problem of detection • Ideal completeness: a faulty node should be exposed by all correct nodes. • Ideal accuracy: no correct node is ever exposed by a correct node (no false positives).
Types of faults that can be detected • Available data: messages sent and received among nodes. • Can only detect faults that manifest themselves through messages. • Can only detect faults that are observed by a correct node. • Need to consider: • Verifiability of outputs. • Missing and long-delayed messages.
Problem statement • Terms: • Detectably faulty, detectably ignorant. • Accomplices (of i): nodes that send messages caused by an incorrect message sent by i. • Completeness: • Eventually, every detectably ignorant node is suspected forever by every correct node. • If node i is detectably faulty, then eventually some faulty accomplice is exposed or suspected forever by every correct node.
Problem statement (cont) • Accuracy: • No correct node is forever suspected by a correct node. • No correct node is ever exposed by a correct node.
System model • Failure indications: • exposed(j) • suspected(j) • trusted(j)
Assumptions • The state machines Si are deterministic. • A message sent from one correct node to another correct node is eventually received. • Use a hash function H() that is pre-image resistant, second pre-image resistant, and collision resistant. • Each node has a unique identifier. Nodes can sign messages, and faulty nodes cannot forge the signatures of correct nodes.
Assumptions (cont) • Each node has access to a reference implementation of all Si. The implementation can take a snapshot of its state and can be initialized from a snapshot. • A function ω maps each node i to a set of witnesses ω(i). The set {i} ∪ ω(i) contains at least one correct node.
Tamper-evident logs • Log entries • Hash values (a hash chain over the log) • Authenticators (signed hash values) • If a prefix of a node’s log does not match the hash value in an authenticator, that node is faulty.
Tamper-evident logs • An authenticator α_k^j can be used to check whether j’s log contains entry e_k. • To inspect the last x entries of j’s log: • i challenges j to return e_{k-(x-1)}, …, e_k and h_{k-x}. • i recalculates h_k and compares it with the value in the authenticator.
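A sketch of the underlying hash-chain check, using SHA-256 in place of the generic H; the byte encoding of entries is an assumption made for illustration:

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def chain_hash(prev_hash: bytes, seq: int, etype: str, content: bytes) -> bytes:
    """Top hash after appending entry (s_k, t_k, c_k):
    h_k = H(h_{k-1} || s_k || t_k || H(c_k))."""
    return H(prev_hash + str(seq).encode() + etype.encode() + H(content))

def verify_segment(base_hash: bytes, entries, claimed_hash: bytes) -> bool:
    """Recompute the chain over the returned entries e_{k-(x-1)}..e_k,
    starting from h_{k-x}, and compare with the hash value that j
    committed to in its authenticator."""
    h = base_hash
    for seq, etype, content in entries:
        h = chain_hash(h, seq, etype, content)
    return h == claimed_hash

# Example: j returns two entries plus the preceding hash; i recomputes.
h0 = b"\x00" * 32
entries = [(1, "SEND", b"hello"), (2, "RECV", b"ack")]
h2 = chain_hash(chain_hash(h0, *entries[0]), *entries[1])
print(verify_segment(h0, entries, h2))                               # True
print(verify_segment(h0, [(1, "SEND", b"HELLO"), entries[1]], h2))   # False: tampered
```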
Commitment protocol • Ensures that a node cannot add an entry for a message it has never received, and that a node’s log is complete. • When i sends a message m to j: • i creates (s_k, SEND, {j, m}), attaches h_{k-1}, s_k and σ_i(s_k || h_k) to m, and sends m. • j verifies the signature; if it is valid, j creates (s_l, RECV, {i, m}) and returns an ACK to i with h_{l-1}, s_l and σ_j(s_l || h_l). • i verifies the ACK’s signature and sends a challenge to j’s witnesses if the signature is not valid (or no ACK arrives).
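A self-contained sketch of this handshake, where HMAC over per-node demo keys stands in for the signatures σ_i and σ_j (the real protocol uses public-key signatures and full hash-chained logs):

```python
import hashlib, hmac

def H(b): return hashlib.sha256(b).digest()
def sign(key, seq, top_hash):
    return hmac.new(key, str(seq).encode() + top_hash, hashlib.sha256).digest()

KEY_I, KEY_J = b"key-i", b"key-j"   # stand-ins for the nodes' signing keys

def sender_send(msg, seq_k, h_prev):
    """i logs (s_k, SEND, {j,m}), then attaches h_{k-1}, s_k and
    sigma_i(s_k || h_k) to the outgoing message."""
    h_k = H(h_prev + str(seq_k).encode() + b"SEND" + H(msg))
    return {"msg": msg, "seq": seq_k, "h_prev": h_prev,
            "sig": sign(KEY_I, seq_k, h_k)}

def receiver_ack(packet, seq_l, h_prev_l):
    """j recomputes h_k from the attached fields and verifies i's
    signature; if valid, j logs (s_l, RECV, {i,m}) and returns an ACK
    carrying h_{l-1}, s_l and sigma_j(s_l || h_l)."""
    h_k = H(packet["h_prev"] + str(packet["seq"]).encode()
            + b"SEND" + H(packet["msg"]))
    if not hmac.compare_digest(packet["sig"], sign(KEY_I, packet["seq"], h_k)):
        return None                                   # invalid signature: no ACK
    h_l = H(h_prev_l + str(seq_l).encode() + b"RECV" + H(packet["msg"]))
    return {"seq": seq_l, "h_prev": h_prev_l, "sig": sign(KEY_J, seq_l, h_l)}

# Demo: a valid packet is acknowledged, a tampered one is not.
pkt = sender_send(b"hello", 1, b"\x00" * 32)
print(receiver_ack(pkt, 7, b"\x00" * 32) is not None)   # True
pkt["msg"] = b"hacked"
print(receiver_ack(pkt, 8, b"\x00" * 32) is not None)   # False
```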
Consistency protocol • A faulty node could try to escape detection by keeping more than one log, or a log with multiple branches.
Consistency protocol • If i receives authenticators from j, it must eventually forward them to j’s witnesses. • Periodically, each witness ω of j challenges j to return a list of entries (from k to l) and checks them for consistency. • Finally, ω extracts all authenticators j has received from other nodes and sends them to the corresponding witness sets.
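A sketch of the consistency check itself: every authenticator collected for j must lie on a single hash chain, so replaying j’s entries from k to l must reproduce each authenticated hash (same illustrative entry layout as above):

```python
import hashlib

def H(b): return hashlib.sha256(b).digest()

def consistent(base_hash, entries, authenticators):
    """entries: list of (seq, type, content) from k to l.
    authenticators: dict {seq: hash value j signed for that seq}.
    Returns False if j kept multiple logs or a branched log."""
    h = base_hash
    for seq, etype, content in entries:
        h = H(h + str(seq).encode() + etype.encode() + H(content))
        if seq in authenticators and authenticators[seq] != h:
            return False          # an authenticator points off this chain
    return True

# Demo: authenticators on the chain pass, a forged/branched one fails.
base = b"\x00" * 32
log = [(1, "SEND", b"a"), (2, "RECV", b"b")]
h1 = H(base + b"1" + b"SEND" + H(b"a"))
h2 = H(h1 + b"2" + b"RECV" + H(b"b"))
print(consistent(base, log, {1: h1, 2: h2}))      # True: single chain
print(consistent(base, log, {2: H(b"forged")}))   # False: inconsistent log
```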
Audit protocol • Checks whether a node’s behavior is consistent with its reference implementation. • Each witness ω of i will: • Look up the most recent authenticator of i. • Challenge i to return all log entries since the last audit and add them to λ_ω^i, ω’s local copy of i’s log. • Create an instance of i’s reference implementation and initialize it from the most recent snapshot. • Replay all the inputs and compare the outputs. • Expose i if the outputs do not match.
Challenge/response protocol • Audit challenge: • Consists of two authenticators α_k^i and α_l^i (k < l). • i’s log must contain e_k … e_l; otherwise i is faulty. • If i is correct, it returns the corresponding log segment.
Challenge/response protocol • Send challenge: • Consists of the message m with all required information attached. • i must acknowledge m; otherwise i is faulty. • If i has not yet received m, it accepts m and returns an ACK. • If i has already received m, it simply resends the ACK.
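A small sketch of how a correct node could answer the two challenge types; the data shapes (a log keyed by sequence number, an `acked` map) are illustrative assumptions:

```python
def handle_audit_challenge(log, k, l):
    """Audit challenge: return the log segment e_k..e_l named by the two
    authenticators; a correct node can always do so, and a missing entry
    is itself evidence of a fault."""
    segment = [log[s] for s in range(k, l + 1) if s in log]
    return segment if len(segment) == l - k + 1 else None   # None: entries missing

def handle_send_challenge(log, acked, msg_id, msg):
    """Send challenge: if m has not been seen, accept it (log a RECV
    entry) and return an ACK; if it has, simply resend the ACK."""
    if msg_id not in acked:
        log[max(log, default=0) + 1] = ("RECV", msg)
        acked[msg_id] = "ACK:" + msg_id
    return acked[msg_id]

# Demo
log = {1: ("SEND", "x"), 2: ("RECV", "y")}
print(handle_audit_challenge(log, 1, 2))                 # both entries returned
print(handle_send_challenge(log, {}, "m42", "hello"))    # ACK:m42
```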
Evidence transfer protocol • Ensures that all correct nodes eventually collect the same evidence against faulty nodes. • Every node i periodically fetches the challenges collected by the witnesses of every other node j. • If a correct node i obtains a challenge for j, i indicates suspected(j). When i receives a message from j, i challenges j. • If i receives valid answers to all pending challenges of j, i indicates trusted(j). • If i obtains a proof of misbehavior by j, i indicates exposed(j).
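The resulting failure indication follows directly from the evidence i holds; a minimal sketch of that rule (data shapes are illustrative):

```python
def indication(pending_challenges, proofs_of_misbehavior):
    """Failure indication node i emits for node j, given the evidence
    i has collected about j."""
    if proofs_of_misbehavior:     # verifiable evidence that j misbehaved
        return "exposed(j)"
    if pending_challenges:        # an unanswered challenge against j
        return "suspected(j)"
    return "trusted(j)"           # all challenges answered, no evidence

print(indication([], []))                    # trusted(j)
print(indication(["audit#3"], []))           # suspected(j)
print(indication([], ["bad SEND entry"]))    # exposed(j)
```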
Overhead • Signing messages. • Extra messages to implement the protocols. • Taking snapshots of nodes. • Replaying nodes’ executions.
Extension • Pf: probability that an all-faulty witness set exists. • Pm: probability that a given instance of misbehavior remains undetected. • The message complexity grows as O(log N).
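A back-of-the-envelope sketch (not the paper’s exact analysis) of why the complexity is logarithmic: assuming a fraction φ of faulty nodes and witness sets of size W drawn independently at random, a single node’s witness set is all-faulty with probability about φ^W, so Pf ≈ 1 − (1 − φ^W)^N; keeping Pf below a target ε then requires W to grow roughly with log N.

```python
def witness_set_size(n_nodes, phi, epsilon):
    """Smallest witness-set size W with 1 - (1 - phi**W)**n_nodes <= epsilon,
    under the simple independence assumptions described above."""
    w = 1
    while 1 - (1 - phi ** w) ** n_nodes > epsilon:
        w += 1
    return w

# W grows roughly logarithmically with the number of nodes.
for n in (100, 1_000, 10_000, 100_000):
    print(n, witness_set_size(n, phi=0.1, epsilon=1e-6))
```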
Applications • Overlay multicast. • NFS • P2P email (ePOST)
Evaluation • Strategy of the freeloader in Overlay Multicast.
Evaluation (cont) • Message latency in NFS
Evaluation (cont) • Throughput of NFS
Evaluation (cont) • Average traffic in ePOST
Evaluation (cont) • Scalability
Evaluation (cont) • Scalability