Accountable distributed systems and the accountable cloud • Peter Druschel, joint work with Andreas Haeberlen¹, Petr Kuznetsov², Rodrigo Rodrigues • ¹University of Pennsylvania, ²TU Berlin/Deutsche Telekom Labs • Building and Programming the Cloud, Mysore, Jan 2010
Outline • Why accountability? • A definition • A practical implementation: PeerReview • Accountability in the Cloud • Technical Challenges • Conclusion
What is the problem? • Multiple administrative domains (federated, P2P) • Multiple stakeholders (hosting, Web) • different actors with somewhat different interests • lack of global visibility and control • Complex faults • software faults, misconfiguration, negligence, disgruntled employees, outside attacks, manipulation • Lack of transparency
Learning from the 'offline' world • Relies heavily on accountability to deal with faults and misbehavior • Example: Banking • The record can be used to (manually) • detect problems • identify the responsible party • convince others that a problem does (not) exist
What does accountability mean in distributed systems? • Tamper-evident record of each node's actions • (Automated) audit for fault detection and localization • Evidence to convince a third party that a fault has (not) occurred • Accountability provides • transparency • trust • incentives to avoid faults
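One common way to make a record tamper-evident is a hash chain, in which each entry's hash also covers the previous entry's hash. The sketch below assumes SHA-256 and hypothetical SEND/RECV entry types; it illustrates the idea and is not PeerReview's actual log format. A hash chain only makes tampering evident once the node has committed to a hash (for example, by sending it to other nodes); on its own, a node could rebuild the entire chain.

```python
import hashlib
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


@dataclass(frozen=True)
class Entry:
    seq: int
    kind: str     # hypothetical entry types, e.g. "SEND" or "RECV"
    content: str
    digest: str   # h_i = H(h_{i-1} | seq | kind | content)


def chain_hash(prev: str, seq: int, kind: str, content: str) -> str:
    data = f"{prev}|{seq}|{kind}|{content}".encode()
    return hashlib.sha256(data).hexdigest()


class TamperEvidentLog:
    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def top_hash(self) -> str:
        """Hash of the newest entry; committing to it commits to the whole history."""
        return self.entries[-1].digest if self.entries else GENESIS

    def append(self, kind: str, content: str) -> Entry:
        seq = len(self.entries)
        entry = Entry(seq, kind, content, chain_hash(self.top_hash(), seq, kind, content))
        self.entries.append(entry)
        return entry


def verify_chain(entries: list[Entry]) -> bool:
    """Recompute every link. Edited or reordered entries break the chain;
    truncation is caught only by comparing against a previously committed top hash."""
    prev = GENESIS
    for i, e in enumerate(entries):
        if e.seq != i or e.digest != chain_hash(prev, e.seq, e.kind, e.content):
            return False
        prev = e.digest
    return True
```

Changing the content of any single entry changes its recomputed hash, so `verify_chain` fails; this is what lets an auditor notice that a node's record was altered after the fact.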
Ideal accountability • Fault := node deviates from expected behavior • Our goal is to automatically • detect faults • identify the faulty nodes • convince others that a node is (or is not) faulty • Can we build a system that provides the following guarantee? "Whenever a node is faulty, the system generates a proof of misbehavior against that node."
Can we detect all faults? • Problem: Faults that affect only a node's internal state • Would require online trusted probes at each node • Focus on observable faults: • Faults that affect a correct node • Can detect observable faults without requiring trusted components
Can we always get a proof? [Figure: A says "I sent X!"; B says "I never received X!"; C cannot tell who is right] • Problem: He-said-she-said • Three possible causes: • A never sent X • B refuses to acknowledge X • X was delayed by the network • Cannot get proof of misbehavior! • Generalize to verifiable evidence: • a proof of misbehavior, or • a challenge that a faulty node cannot answer • What if the challenged node does not respond? • Does not prove a fault, but the node is suspected until it responds
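The difference between a proof of misbehavior and an unanswered challenge can be sketched as a per-node status: exposed if a proof exists, suspected while a challenge is still open, trusted otherwise. The names `EvidenceTracker` and `Status` and the string challenge IDs are hypothetical; how challenges are delivered and how responses are validated is left out.

```python
from enum import Enum


class Status(Enum):
    TRUSTED = "trusted"
    SUSPECTED = "suspected"  # at least one challenge is still unanswered
    EXPOSED = "exposed"      # a proof of misbehavior exists; answering does not undo it


class EvidenceTracker:
    """Hypothetical bookkeeping of the evidence one node holds about others."""

    def __init__(self) -> None:
        self.open_challenges: dict[str, set[str]] = {}
        self.proofs: dict[str, list[str]] = {}

    def add_proof(self, node: str, proof: str) -> None:
        self.proofs.setdefault(node, []).append(proof)

    def challenge(self, node: str, challenge_id: str) -> None:
        # e.g. "here is X again; acknowledge it"
        self.open_challenges.setdefault(node, set()).add(challenge_id)

    def answer(self, node: str, challenge_id: str) -> None:
        # Treat any response as valid here; real validation is out of scope.
        self.open_challenges.get(node, set()).discard(challenge_id)

    def status(self, node: str) -> Status:
        if self.proofs.get(node):
            return Status.EXPOSED
        if self.open_challenges.get(node):
            return Status.SUSPECTED
        return Status.TRUSTED
```

In the A/B example, one plausible use is to challenge B to acknowledge X: if B answers, the dispute is resolved; if B stays silent, it remains suspected without being proven faulty, which matches the "suspected until it responds" rule above.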
Practical accountability • Requirement for an accountable distributed system: "Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node." • This is useful • Any (!) fault that affects a correct node is eventually detected and linked to a faulty node • It can be implemented in practice
PeerReview • Adds accountability to a given system • Implemented as a library • Provides a tamper-evident record • Detects faults via state-machine replay • Assumptions: • Nodes can be modeled as deterministic state machines • There is a trusted reference implementation of the state machines • Correct nodes can eventually communicate • Nodes can sign messages
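Under the assumptions listed above (deterministic state machines and a trusted reference implementation), an audit can replay the logged inputs through the reference implementation and compare the outputs it produces with the outputs the node actually logged. This is a minimal sketch, assuming a hypothetical log of ("RECV", msg) / ("SEND", msg) pairs, a toy counter service as the reference, and the simplification that every output must be logged before the next input; PeerReview's real log format, checkpoints, and signatures are omitted.

```python
from collections import deque
from typing import Callable, Optional

# A deterministic state machine: (state, input) -> (new state, outputs)
Step = Callable[[int, str], tuple[int, list[str]]]


def reference_counter(state: int, msg: str) -> tuple[int, list[str]]:
    """Hypothetical reference implementation: count 'inc' requests and reply with the count."""
    if msg == "inc":
        state += 1
        return state, [f"count={state}"]
    return state, []


def replay_audit(step: Step, initial_state: int, log: list[tuple[str, str]]) -> Optional[int]:
    """Replay logged inputs; return the index of the first divergent entry, or None."""
    state = initial_state
    pending: deque = deque()  # outputs the reference produced that the log has not shown yet
    for i, (kind, msg) in enumerate(log):
        if kind == "RECV":
            if pending:  # node omitted an output the reference would have sent
                return i
            state, outputs = step(state, msg)
            pending.extend(outputs)
        elif kind == "SEND":
            if not pending or pending.popleft() != msg:
                return i  # node sent something the reference would not have sent
        else:
            return i  # unknown entry type
    return len(log) if pending else None
```

Because the reference is deterministic, any mismatch between replayed and logged outputs points to a specific log entry, which is what allows the fault to be linked to the node that signed that log.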
PeerReview is widely applicable • App #1: NFS server in the Linux kernel • Many small, latency-sensitive requests • Tampering with files • Lost updates • Metadata corruption • Incorrect access control • App #2: Overlay multicast • Transfers large volumes of data • Freeloading • Tampering with content • App #3: P2P email • Complex, large, decentralized • Denial of service • Attacks on DHT routing • Censorship • Details in [Haeberlen et al., SOSP'07] • NetReview [Haeberlen et al., NSDI'08]
How much does PeerReview cost? • Log storage • 10–100 GByte per month, depending on the application • Message signatures • Message latency (e.g., 1.5 ms RTT with RSA-1024) • CPU overhead (embarrassingly parallel) • Log/authenticator transfer, replay overhead • Depends on the number of witnesses • Can be deferred to exploit bursty/diurnal load patterns
Split administration in the Cloud [Figure: Alice, Alice's customers, and Bob's cloud] • What if there is a problem? • Bug in Alice's software • Subtle differences between Alice's and Bob's environments • ... • Bug in Bob's software • Insufficient resource allocation • Hacker attack • ...
Split administration: Alice's perspective • If something is wrong, how will I know? • How can I tell if it's my software or the cloud? • If it's the cloud, how can I convince Bob?
Split administration: Bob's perspective • If something is wrong, how will I know? • How can I tell if it's the cloud or Alice's software? • If it's Alice's software, how can I convince Alice?
An idealized solution • What if we had an oracle that Alice and Bob could ask about problems? • Completeness: If the cloud is faulty, the oracle will say so • Accuracy: If the cloud is not faulty, the oracle will say so • Verifiability: The oracle produces evidence that would convince a third party
The accountable cloud • Idea: Make the cloud accountable • The cloud records its actions in a tamper-evident log • Alice can audit the log and check for faults • Use the log to construct evidence that a fault does (not) exist • Should work even if one party was compromised!
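One way Alice could audit the cloud's log is to keep the authenticators (signed statements of the form "log entry seq has hash h") that the cloud attaches to its messages, and later check the log segment she retrieves against them. In this sketch, HMAC with a hypothetical shared key stands in for a digital signature only to keep the example self-contained; evidence that convinces a third party requires a public-key signature such as RSA. The function names are illustrative assumptions.

```python
import hashlib
import hmac

GENESIS = "0" * 64
# Stand-in for the cloud's private signing key (hypothetical). HMAC is NOT
# third-party verifiable; a real deployment would use public-key signatures.
CLOUD_KEY = b"hypothetical-cloud-signing-key"


def chain_hash(prev: str, seq: int, content: str) -> str:
    return hashlib.sha256(f"{prev}|{seq}|{content}".encode()).hexdigest()


def hashes_of(contents: list[str]) -> list[str]:
    """Recompute the hash chain over a list of log entry contents."""
    out, prev = [], GENESIS
    for seq, content in enumerate(contents):
        prev = chain_hash(prev, seq, content)
        out.append(prev)
    return out


def sign(seq: int, digest: str) -> str:
    return hmac.new(CLOUD_KEY, f"{seq}|{digest}".encode(), hashlib.sha256).hexdigest()


def make_authenticator(seq: int, digest: str) -> tuple[int, str, str]:
    """What the cloud attaches to a message: (seq, hash at seq, signature)."""
    return (seq, digest, sign(seq, digest))


def audit(log_contents: list[str], authenticators: list[tuple[int, str, str]]):
    """Return authenticators the log contradicts (evidence); [] if consistent."""
    recomputed = hashes_of(log_contents)
    evidence = []
    for seq, digest, sig in authenticators:
        if not hmac.compare_digest(sig, sign(seq, digest)):
            continue  # bad signature: cannot be attributed to the cloud, so ignore it
        if seq >= len(recomputed) or recomputed[seq] != digest:
            evidence.append((seq, digest, sig))
    return evidence
```

If the cloud later presents a modified or truncated log, the signed authenticator Alice already holds no longer matches, and that authenticator together with the presented log is the evidence that the log was changed.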
Discussion • Is this too pessimistic? The cloud isn't malicious! • Hacker attacks, software bugs, operator error, malicious clients, … • Difficult to come up with a more restrictive fault model • Without provable properties, evidence has little value • Why would a provider want to deploy this? • Attractive to prospective customers (peace of mind) • Helps in handling customer complaints and resolving disputes
Is the technology ready? • Cloud accountability should • have provable guarantees • work for most cloud applications • require no changes to application code • cover a wide spectrum of properties • have reasonable overhead • Can existing techniques deliver this? • CATS, Repeat&Compare, AIP, PeerReview, NetReview, AudIt, ... • More work is needed!
Work in progress: AVM • Goal: Provide accountability for arbitrary binary executables • Idea: Accountable virtual machine (AVM) • The cloud records enough data to enable deterministic replay • Alice can replay the log against a reference implementation • Can audit any part of the hosted execution
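Deterministic replay works by recording every nondeterministic input an execution observes and feeding exactly those values back during replay. The sketch below shows the idea for a single Python function with two nondeterministic sources (clock reads and random numbers); an AVM would record such events at the virtual-machine level, and the `Recorder`, `Replayer`, and `guest_program` names are hypothetical.

```python
import random
import time


class Recorder:
    """Runs on the cloud side: returns real values and logs each one."""

    def __init__(self) -> None:
        self.log: list[tuple[str, float]] = []

    def clock(self) -> float:
        v = time.time()
        self.log.append(("clock", v))
        return v

    def rand(self) -> float:
        v = random.random()
        self.log.append(("rand", v))
        return v


class Replayer:
    """Runs on the auditor side: feeds logged values back in the original order."""

    def __init__(self, log: list[tuple[str, float]]) -> None:
        self.log = list(log)
        self.pos = 0

    def _next(self, kind: str) -> float:
        if self.pos >= len(self.log) or self.log[self.pos][0] != kind:
            raise RuntimeError(f"replay diverged at event {self.pos}")
        value = self.log[self.pos][1]
        self.pos += 1
        return value

    def clock(self) -> float:
        return self._next("clock")

    def rand(self) -> float:
        return self._next("rand")


def guest_program(env) -> str:
    """Hypothetical guest whose output depends only on its nondeterministic inputs."""
    t = env.clock()
    r = env.rand()
    return f"t={t:.3f} r={r:.6f}"
```

Alice would rerun her reference copy of the program against a `Replayer` built from the cloud's log and compare the result with the output the cloud actually produced; a mismatch, or a request for an event the log does not contain, indicates a fault.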
Challenges • Complete state-machine replay is expensive • limit to spot checks and investigation of suspected faults • multi-core replay is hard • replay the log against an abstract model? • Checking performance properties • Checking information flow • Lots of research opportunities
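A back-of-the-envelope model of spot checking: if a fraction f of log segments would reveal a fault and the auditor replays k segments chosen uniformly at random (with replacement), the fault is detected with probability 1 - (1 - f)^k. The model assumes independent, uniform sampling and is an illustration, not a result from this work.

```python
import math


def detection_probability(f: float, k: int) -> float:
    """Chance that at least one of k random spot checks hits a faulty segment."""
    return 1 - (1 - f) ** k


def checks_needed(f: float, p: float) -> int:
    """Smallest k with detection_probability(f, k) >= p, for 0 < f < 1 and 0 < p < 1."""
    return math.ceil(math.log(1 - p) / math.log(1 - f))
```

For example, with f = 0.01 reaching 99% detection takes 459 checks. A fault confined to a single segment out of many is the hard case, which is why spot checks trade completeness for lower replay cost.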
Summary • Accountability is a useful capability in distributed systems • tamper-evident record • fault detection and localization • evidence • Proposal: the accountable cloud • Can verify correct operation and produce evidence • Provable guarantees provide a solid foundation for both players • Challenges remain • Questions?