Explore the significance of accountability in distributed systems to detect faults, identify nodes, and maintain accuracy. Learn about challenges, advantages, and a practical approach to achieve accountability without relying on a trusted entity. Discover how tamper-evident logs and witness nodes play a crucial role in ensuring system reliability and integrity.
PeerReview: Practical Accountability for Distributed Systems (SOSP '07)
Why have Accountability? • Nodes can fail • An attacker can compromise a node • Accidental misconfiguration • Multiple administrative domains
Distributed state, incomplete information • General case: Multiple admins with different interests • www.sosp2007.org/talks/sosp118-haeberlen.ppt
What is Accountability? • Fault = Anything besides expected behavior • Ideal Accountability: • Detect a fault • Identify the faulty node (Completeness) • Correct node can prove its correctness (Accuracy) • Expose the faulty node
A few advantages: • Deterring faults • Augmenting fault-tolerant systems • Augmenting best-effort systems
Challenges: What can/cannot be detected? • Unobservable faults: • Node's internal state • CPU overheating, display failure • Need trusted probes! • Observable faults: • Causally affect a correct node • No trusted entity required! • How to verify that a node reports correctly? • How to distinguish omission from long delays?
Request • Grant • Release
Accountability: How much can we do? • Completeness: • Eventually suspected • Eventually exposed • Accuracy • No correct node is forever suspected • No correct node ever exposed by a correct node
FullReview • Characteristics: • A trusted entity exists • All messages go through trusted entity • Each node maintains a log for every other node • Check the log • Suspect/Expose a deviant node • Complete? • Accurate? • Practical?
PeerReview: Practical Accountability • No trusted entity • Nodes only keep their own log • May retrieve others when needed • Logs are tamper-evident • Witness nodes: check correctness of a node • Challenge/Response protocol
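The challenge/response bullet connects to the "suspected" and "exposed" notions from the completeness and accuracy slide. The following is a minimal sketch of how a witness might track its verdict on one node, assuming a three-state model (trusted, suspected, exposed). The class and method names are hypothetical and are not taken from the slides. The point it illustrates is that an unanswered challenge only causes suspicion, which a slow but correct node can clear by responding, while exposure requires verifiable evidence and is permanent.

```python
from enum import Enum


class Verdict(Enum):
    TRUSTED = "trusted"
    SUSPECTED = "suspected"
    EXPOSED = "exposed"


class WitnessView:
    """A witness's current opinion of one node (hypothetical sketch)."""

    def __init__(self):
        self.verdict = Verdict.TRUSTED
        self.open_challenges = set()

    def challenge_unanswered(self, challenge_id):
        # Silence cannot be told apart from a long delay, so the node
        # is only suspected, never exposed, on this basis.
        if self.verdict is not Verdict.EXPOSED:
            self.open_challenges.add(challenge_id)
            self.verdict = Verdict.SUSPECTED

    def challenge_answered(self, challenge_id):
        # A late but valid response clears the suspicion, so a correct
        # node is not suspected forever.
        self.open_challenges.discard(challenge_id)
        if self.verdict is Verdict.SUSPECTED and not self.open_challenges:
            self.verdict = Verdict.TRUSTED

    def proof_of_misbehavior(self):
        # Verifiable evidence (for example a failed audit) is permanent.
        self.verdict = Verdict.EXPOSED
        self.open_challenges.clear()


view = WitnessView()
view.challenge_unanswered("c1")
print(view.verdict.value)   # suspected
view.challenge_answered("c1")
print(view.verdict.value)   # trusted
view.proof_of_misbehavior()
view.challenge_answered("c1")
print(view.verdict.value)   # exposed
```

Keeping suspicion revocable and exposure irrevocable mirrors the two accuracy requirements: no correct node is forever suspected, and no correct node is ever exposed.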
System Model • Each node modeled as: • A state machine • A detector • An application • Assumptions: • Deterministic state machine • Correct nodes can communicate • A reference implementation of the node software • A secure signature mechanism available
Overview • Nodes maintain a log of I/O • Witnesses of a node audit its log • If faulty, gather evidence • Make it known
Tamper-evident logs • Append-only list of I/O • Log entries connected in a hash chain • Authenticator: A signed statement by a node • If a node tampers with the log, it will be evident • Logs must be complete • No entries missed • Logs must be correct • No forged entries • No multiple logs
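The hash-chain idea can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the PeerReview implementation: the entry layout, the names `TamperEvidentLog`, `entry_hash`, and `verify_chain`, and the use of SHA-256 are all choices made for the example. In particular, HMAC with a secret key stands in for the secure signature mechanism the slides assume; a real deployment would need a public-key signature so that third parties can check authenticators.

```python
import hashlib
import hmac

GENESIS = b"\x00" * 32  # assumed hash value preceding the first entry


def entry_hash(prev_hash, seq, kind, payload):
    """Hash of one entry, chained to the hash of the previous entry."""
    h = hashlib.sha256()
    h.update(prev_hash)
    h.update(seq.to_bytes(8, "big"))
    h.update(kind.encode())
    h.update(hashlib.sha256(payload).digest())
    return h.digest()


class TamperEvidentLog:
    def __init__(self, signing_key):
        self._key = signing_key
        self.entries = []  # list of (seq, kind, payload, hash)

    def append(self, kind, payload):
        """Append an I/O record and return an authenticator for it."""
        prev = self.entries[-1][3] if self.entries else GENESIS
        seq = len(self.entries) + 1
        digest = entry_hash(prev, seq, kind, payload)
        self.entries.append((seq, kind, payload, digest))
        return self.authenticator(seq)

    def authenticator(self, seq):
        """Signed statement that entry `seq` has this hash.
        HMAC is a stand-in for a real signature (assumption)."""
        digest = self.entries[seq - 1][3]
        msg = seq.to_bytes(8, "big") + digest
        tag = hmac.new(self._key, msg, hashlib.sha256).digest()
        return (seq, digest, tag)


def verify_chain(entries):
    """Recompute every hash; any edited, dropped, or reordered entry
    breaks the chain from that point on."""
    prev = GENESIS
    for expected_seq, (seq, kind, payload, digest) in enumerate(entries, 1):
        if seq != expected_seq or entry_hash(prev, seq, kind, payload) != digest:
            return False
        prev = digest
    return True


log = TamperEvidentLog(b"demo-key")
log.append("SEND", b"request")
auth = log.append("RECV", b"grant")
print(verify_chain(log.entries))   # True

forged = list(log.entries)
forged[0] = (1, "SEND", b"something else", forged[0][3])
print(verify_chain(forged))        # False
```

Because each authenticator commits to a hash that covers the whole prefix of the log, a node that later presents a different history for the same sequence number contradicts its own signed statement, which is what rules out forged entries and multiple logs.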
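Fault Detection • Audit • Replay the inputs to a reference implementation • Output == Log? • Evidence Transfer • Fetch evidence from witnesses • [Figure: a node's state machine (Module A, Module B) connected to the network and a log; logged input is replayed through Module A and Module B and the output is compared (=?), with a mismatch (≠) indicating a fault]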
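The audit step relies on the state machine being deterministic: replaying the logged inputs through a reference implementation must reproduce the logged outputs. Below is a minimal sketch of that check. The `LockServer` class is a hypothetical toy reference implementation built around the Request/Grant/Release exchange shown earlier, and the log format of `("in" | "out", message)` pairs is an assumption made for the example.

```python
from collections import deque


class LockServer:
    """Hypothetical deterministic reference implementation: one lock."""

    def __init__(self):
        self.holder = None

    def step(self, msg):
        """Consume one input and return the list of outputs it causes."""
        op, client = msg
        if op == "request" and self.holder is None:
            self.holder = client
            return [("grant", client)]
        if op == "release" and self.holder == client:
            self.holder = None
        return []


def audit(log, make_reference):
    """Replay logged inputs and compare logged outputs.

    Returns None if the log is consistent with the reference
    implementation, otherwise the index of the first inconsistent
    entry (or len(log) if an expected output is missing at the end).
    """
    reference = make_reference()
    expected = deque()
    for index, (direction, msg) in enumerate(log):
        if direction == "in":
            expected.extend(reference.step(msg))
        elif direction == "out":
            if not expected or expected.popleft() != msg:
                return index
        else:
            return index  # malformed entry
    return None if not expected else len(log)


good_log = [
    ("in", ("request", "a")),
    ("out", ("grant", "a")),
    ("in", ("request", "b")),   # lock is held, so no output is expected
    ("in", ("release", "a")),
]
bad_log = good_log[:3] + [("out", ("grant", "b"))]  # grant while still held

print(audit(good_log, LockServer))  # None
print(audit(bad_log, LockServer))   # 3
```

In this sketch an expected output that never appears in the log counts as a fault; a real system would have to decide how long an output may remain pending before treating it that way.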
PeerReview: Applications • Overlay Multicast • Large amounts of data • Freeloaders • Network File System • Latency-sensitive • Data tampering • Message loss in the network • Peer-to-peer email • DoS attack
Discussion • What if all witnesses are faulty? • How to choose T_trunc, T_audit, and T_buf?