Byzantine Fault Isolation in the Farsite Distributed File System John R. Douceur and Jon Howell
Definitions
Byzantine fault \'biz-ən-tēn folt\ n (1982): a failure of a system component that produces arbitrary behavior
Byzantine fault isolation \'biz-ən-tēn folt ī-sə-'lā-shən\ n (2006): methodology for designing a distributed system that can, under Byzantine failure, operate with application-defined partial correctness
BFI \bē-ef-'ī\ n (2006): Byzantine fault isolation
Farsite \'fär-sīt\ n (2000): serverless distributed file system developed at Microsoft Research, designed to be scalable, strongly consistent, and secure despite running on an untrusted infrastructure of desktop PCs
Talk Outline • Context – Farsite system • Why Byzantine fault tolerance (BFT) doesn’t scale • Farsite’s use of multiple BFT groups • The need for isolating Byzantine faults • Formal system specification • BFI in Farsite
Farsite System [figure: interconnected client and server machines]
Farsite System – Metadata [figure: users work through clients, whose metadata operations are handled by a BFT group]
Farsite System – Metadata • T = number of tolerable faults; R = count of replicas; R > 3T • Using a Byzantine agreement protocol, assign sequence numbers to messages • Prepare-commit among 2T + 1 servers • Deterministically update metadata • Reply to client [figure: users, clients, BFT group]
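The replica rule R > 3T and the 2T + 1 prepare-commit quorum fit together as follows. This minimal sketch (function names are hypothetical) takes the smallest group, R = 3T + 1, and checks two standard quorum properties: the correct replicas alone can assemble a quorum even if T replicas never respond, and any two quorums share at least one correct replica, so two conflicting sequence-number assignments cannot both commit.

```python
def bft_group_size(t: int) -> tuple:
    """Smallest replica count R with R > 3T, plus the 2T + 1 agreement quorum."""
    return 3 * t + 1, 2 * t + 1


def quorum_properties(r: int, t: int) -> tuple:
    """Return (live, safe) for an r-replica group using 2t + 1 quorums."""
    q = 2 * t + 1
    live = r - t >= q          # a quorum forms even if t replicas stay silent
    safe = 2 * q - r >= t + 1  # two quorums overlap in at least one correct replica
    return live, safe


for t in (1, 2, 3):
    r, q = bft_group_size(t)
    print(f"T={t}: R={r}, quorum={q}, (live, safe)={quorum_properties(r, t)}")
```

For T = 1, 2, 3 this yields 4-, 7-, and 10-member groups. With only R = 3T replicas, `live` becomes False: the 2T correct replicas cannot form a 2T + 1 quorum on their own.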
The Cost of BFT Groups (single server vs. 4-member BFT group) • Computation: 1 vs. 4 • Message delays: 2 vs. 5 • Messages: 2 vs. 32
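The per-operation costs can be reproduced by tallying one agreement round. The sketch below assumes a PBFT-style pattern: the request is sent to every replica, followed by pre-prepare, prepare, commit, and reply phases. That pattern is our assumption rather than something the slide states, but it yields the 5 message delays and 32 messages for a 4-replica group, and 200 messages for 10 replicas.

```python
def bft_round_cost(r: int) -> dict:
    """Tally one agreement round for an r-replica group (PBFT-style pattern, assumed)."""
    request = r                  # client sends its request to every replica
    pre_prepare = r - 1          # primary proposes a sequence number to each backup
    prepare = (r - 1) * (r - 1)  # each backup tells every other replica
    commit = r * (r - 1)         # every replica tells every other replica
    reply = r                    # every replica answers the client
    return {
        "computation": r,        # every replica executes the operation
        "message_delays": 5,     # request, pre-prepare, prepare, commit, reply
        "messages": request + pre_prepare + prepare + commit + reply,
    }


# Unreplicated baseline: one server executes; request and reply are the only messages.
SINGLE_SERVER = {"computation": 1, "message_delays": 2, "messages": 2}

print(bft_round_cost(4))  # {'computation': 4, 'message_delays': 5, 'messages': 32}
```

The prepare and commit phases grow quadratically in R, which is why adding replicas to one group raises cost faster than it adds capacity.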
Throughput vs. Scale [figure: throughput multiple (0 to 7) vs. machine count (1 to 7) for ideal, typical, and flat BFT systems]
Workload Sharing [figure: the workload divided among client and server machines]
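Tree of BFT Groups [figure: namespace tree rooted at / (directories such as public, users, emacs, cruft, Alice, Bob, Outlook, vi, code, docs, C++, C#, Proj X, foo, bar, src, bin) whose subtrees are managed by different BFT groups]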
Delegation to New Group [figure: a subtree of the same namespace tree is delegated from its BFT group to a new BFT group]
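Delegation can be pictured as updating a map from subtree roots to owning groups. In this hypothetical sketch (class and method names are ours, and the BFT agreement each group would run is omitted), a path belongs to the group assigned to its longest delegated prefix, and delegating a subtree reassigns that prefix to a new group.

```python
class DelegationMap:
    """Hypothetical record of which BFT group owns each delegated subtree."""

    def __init__(self, root_group: str):
        self.owner = {(): root_group}  # tuple of path components -> group id

    @staticmethod
    def parts(path: str) -> tuple:
        return tuple(p for p in path.split("/") if p)

    def group_for(self, path: str) -> str:
        """The owner is the group of the longest delegated prefix of the path."""
        parts = self.parts(path)
        for i in range(len(parts), -1, -1):
            if parts[:i] in self.owner:
                return self.owner[parts[:i]]
        raise KeyError(path)  # unreachable: the root is always owned

    def delegate(self, subtree: str, new_group: str) -> str:
        """Hand a subtree to new_group; return the group that gave it up."""
        old = self.group_for(subtree)
        self.owner[self.parts(subtree)] = new_group
        return old


ns = DelegationMap("G0")
ns.delegate("/users/Alice", "G1")
print(ns.group_for("/users/Alice/code"), ns.group_for("/users/Bob"))  # G1 G0
```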
Pathname Resolution [figure: resolving /users/Alice/code/C#/bar by walking the namespace tree from / through users, Alice, code, and C#]
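Resolving a pathname may cross several BFT groups. This hypothetical sketch walks /users/Alice/code/C#/bar one component at a time over an assumed delegation map and lists each group it must consult.

```python
def resolve(path: str, owner: dict) -> list:
    """List the BFT groups consulted while resolving `path`.

    `owner` maps delegated subtree roots (tuples of components) to group ids.
    """
    parts = [p for p in path.split("/") if p]
    groups = [owner[()]]  # resolution starts at the root group
    for i in range(1, len(parts) + 1):
        g = owner.get(tuple(parts[:i]))
        if g is not None and g != groups[-1]:
            groups.append(g)  # crossed a delegation boundary
    return groups


# Hypothetical delegations along the example path.
owner = {(): "G0", ("users", "Alice"): "G1", ("users", "Alice", "code", "C#"): "G2"}
print(resolve("/users/Alice/code/C#/bar", owner))  # ['G0', 'G1', 'G2']
```

Every group on the returned list must answer correctly for the lookup to succeed. This is one way a single faulty group can affect files it does not hold when faults are not isolated.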
Quantitative Fault Analysis • Example system • File system distributed among interacting BFT groups • Simplifying assumptions • Files are partitioned evenly among BFT groups • Machine failures are independent • Machine fault probability = 0.001 • Evaluate: operational fault rate • Probability that an operation on a randomly selected file exhibits a fault
Operational Faults vs. System Scale [figure: operational fault rate (log scale) vs. system scale (count of BFT groups, 1 to 100,000), with curves for BFT 4, no BFI; BFT 7, no BFI; BFT 10, no BFI; BFT 4, ideal BFI; BFT 4, tree (4) BFI; and BFT 4, tree (16) BFI]
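The no-BFI and ideal-BFI cases can be reproduced with a short calculation. The modeling choices here are ours:

- A group's faults become non-maskable when more than T of its R members fail, with T = (R − 1)/3, so BFT 4, 7, and 10 tolerate 1, 2, and 3 faults.
- Without BFI, a faulty group anywhere can corrupt any operation.
- With ideal BFI, only the group holding the file matters.

The tree-based BFI curves are not modeled. With the machine fault probability of 0.001, this gives about 6 × 10^-6 per 4-member group and an operational fault rate of about 0.45 at 100,000 groups without BFI.

```python
from math import comb


def group_fault_prob(r: int, t: int, p: float) -> float:
    """Probability that more than t of r independently failing replicas are faulty."""
    return sum(comb(r, k) * p**k * (1 - p) ** (r - k) for k in range(t + 1, r + 1))


def op_fault_rate_no_bfi(groups: int, g: float) -> float:
    """Assumed model: any faulty group can corrupt any operation."""
    return 1 - (1 - g) ** groups


def op_fault_rate_ideal_bfi(g: float) -> float:
    """Assumed model: only the group holding the file can corrupt the operation."""
    return g


g4 = group_fault_prob(4, 1, 0.001)
print(f"{g4:.2e}")                                     # 5.99e-06
print(f"{op_fault_rate_no_bfi(100_000, g4):.2f}")      # 0.45
print(f"{op_fault_rate_ideal_bfi(g4):.2e}")            # 5.99e-06, independent of scale
```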
BFI versus No BFI (4-member BFT groups with BFI vs. 10-member BFT groups without BFI) • Computation: 4 vs. 10 • Messages: 32 vs. 200 • Throughput reduction without BFI: 60% (computation), 84% (messages)
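The 60% and 84% figures follow from the per-operation costs if throughput is taken to be inversely proportional to cost, which is an assumption on our part. Under that assumption, a 10-member group does 10/4 as much computation and sends 200/32 as many messages per operation as a 4-member group.

```python
def throughput_reduction(small_cost: float, large_cost: float) -> float:
    """Fraction of throughput lost by switching to the costlier configuration,
    assuming throughput is inversely proportional to per-operation cost."""
    return 1 - small_cost / large_cost


computation = throughput_reduction(4, 10)   # 4-member vs. 10-member groups
messages = throughput_reduction(32, 200)
print(f"{computation:.0%} {messages:.0%}")  # 60% 84%
```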
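BFI via Formal Specification [figure: a distributed-system spec (state, actions) refines a semantic spec (state, actions); after both are extended with faults, the distributed-system spec is marked "NEW Improved!" and still refines the semantic spec]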
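Farsite Semantic Spec [figure: semantic state comprising a namespace tree (directories such as tools, code, C++, emacs, src, bin; files such as cl.exe, a.h, a.cpp, a.obj, a.exe), open handles, and pending operations such as read, open, and move]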
Farsite Refinement [figure: the same namespace tree, open handles, and pending operations (read, move, del), viewed as the semantic interpretation of the distributed-system state]
Actions are State Transitions [figure: an action moves the system from one state (namespace containing a.cpp, open handles, pending operations) to the next]
Proving Refinement Inductively [figure: state containing a.cpp, open handles, and pending operations]
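Farsite's specifications are written in TLA+, but the idea of refinement can be sketched in a few lines of Python. The toy specs below are hypothetical:

- The semantic spec lets each step create or delete one file.
- The distributed spec splits each operation into a client request and a later server update.

The refinement mapping keeps only the server's files, so queuing a request is a stutter and applying it is a semantic step. The check enumerates every reachable state of this tiny instance. It confirms that the initial state maps to the semantic initial state and that each step maps to a stutter or a legal semantic step. An inductive proof establishes the same step condition for all states satisfying an invariant, without enumerating them.

```python
from collections import deque

NAMES = ("a", "b")

# Hypothetical distributed spec. State = (pending request or None, files on server).
INIT = (None, frozenset())


def dist_next(state):
    """Low-level actions: a client issues a request, or the server applies it."""
    pending, files = state
    if pending is None:
        for n in NAMES:
            op = "delete" if n in files else "create"
            yield ((op, n), files)  # request queued; files unchanged
    else:
        op, n = pending
        yield (None, files - {n} if op == "delete" else files | {n})


def abstract(state):
    """Refinement mapping: users see only the server's files."""
    return state[1]


def semantic_step_ok(before, after):
    """Hypothetical semantic spec: one action creates or deletes exactly one file."""
    return len(before ^ after) == 1


def check_refinement():
    """Check init and every reachable low-level step; return (ok, states explored)."""
    if abstract(INIT) != frozenset():
        return False, 0
    seen, frontier = {INIT}, deque([INIT])
    while frontier:
        s = frontier.popleft()
        for t in dist_next(s):
            a, b = abstract(s), abstract(t)
            if a != b and not semantic_step_ok(a, b):  # neither stutter nor legal step
                return False, len(seen)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return True, len(seen)


print(check_refinement())  # (True, 12)
```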
Refinement with Byzantine Faults [figure: namespace tree (tools, code, C++, emacs, src, bin; cl.exe, a.h, a.cpp, a.obj, a.exe), open handles, and pending operations (read, del, move) under Byzantine faults]
Semantic Fault Specification • Safety • A tainted file may have arbitrary contents and attributes • A tainted file may appear not linked into namespace • A tainted file may pretend not to have children it actually has • A tainted file may pretend to have children that do not exist • A tainted file may pretend another tainted file is a child or parent • Liveness • Operations involving a tainted file may not complete [figure: namespace tree (tools, code, C++, emacs, src, bin, foo, bar; cl.exe, a.h, a.cpp, a.obj, a.exe) with a tainted file whose contents ("Hello world") appear as garbage]
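Weakening the semantic spec while strengthening the distributed spec can be illustrated with a similar toy. In this hypothetical sketch:

- A file is tainted when its owning BFT group is faulty, which simplifies the taint rules above.
- A fault step may flip which files exist.
- The weakened semantic spec allows only tainted files to change during a fault.

An isolated design, in which a faulty group can reach only its own files, refines the weakened spec. An unisolated design, in which forged messages can reach any file, does not.

```python
from itertools import combinations

NAMES = ("a", "b", "c")
OWNER = {"a": "G1", "b": "G1", "c": "G2"}  # hypothetical file-to-group assignment
GROUPS = sorted(set(OWNER.values()))


def subsets(items):
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def byzantine_steps(files, faulty, isolated):
    """Fault action in the distributed spec: the faulty group flips some files'
    existence. Isolated design: only files it owns. Unisolated: any file."""
    reach = [n for n in NAMES if OWNER[n] == faulty] if isolated else list(NAMES)
    for changed in subsets(reach):
        yield files ^ changed


def weakened_spec_ok(before, after, faulty):
    """Weakened semantic spec: only tainted files (owned by the faulty group) may change."""
    tainted = {n for n in NAMES if OWNER[n] == faulty}
    return (before ^ after) <= tainted


def refines(isolated):
    return all(
        weakened_spec_ok(files, after, faulty)
        for files in subsets(NAMES)
        for faulty in GROUPS
        for after in byzantine_steps(files, faulty, isolated)
    )


print(refines(isolated=True), refines(isolated=False))  # True False
```

When refinement fails for `isolated=False`, that failure signals that the distributed design still needs work. This matches the methodology's point that refinement lets you know when you are done.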
Distributed-System Improvements • Maintain redundant info across BFT group boundaries • Augment messages with info that justifies correctness • Ensure unambiguous chains of authority over data • Carefully order messages and state updates for operations involving multiple BFT groups
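One way to make "unambiguous chains of authority" checkable is to have a message carry the delegation records that justify its sender's authority. In the hypothetical sketch below, each record names a grantor, a grantee, and a subtree. The recipient accepts a claim only if every link is granted by the current authority and strictly narrows the scope. A real system would also need to authenticate each record, for example with signatures, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Delegation:
    """Hypothetical record: `grantor` hands authority over `subtree` to `grantee`."""
    grantor: str
    grantee: str
    subtree: Tuple[str, ...]


def authority_for(chain: Sequence[Delegation], root_group: str,
                  path: Tuple[str, ...]) -> Optional[str]:
    """Return the group a justified chain makes authoritative for `path`, or None."""
    holder, scope = root_group, ()
    for d in chain:
        if d.grantor != holder:
            return None  # link not granted by the current authority
        if len(d.subtree) <= len(scope) or d.subtree[:len(scope)] != scope:
            return None  # each link must strictly narrow the previous scope
        holder, scope = d.grantee, d.subtree
    return holder if path[:len(scope)] == scope else None


chain = [
    Delegation("G0", "G1", ("users", "Alice")),
    Delegation("G1", "G2", ("users", "Alice", "code", "C#")),
]
print(authority_for(chain, "G0", ("users", "Alice", "code", "C#", "bar")))  # G2
```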
Summary of BFI Methodology • Formally specify your system • Semantic spec: user’s view of system • Distributed-system spec: designer’s view of system • Refinement interprets distributed-system spec in semantic terms • Modify distributed-system spec to express Byzantine faults • Simultaneously • Strategically weaken semantic spec to describe faults • Improve distributed-system spec to quarantine faults • Refinement lets you know when you are done
Conclusions • BFT groups have negative throughput scaling • Scalable systems can be built from multiple BFT groups • System scale increases the probability of non-maskable Byzantine faults • If faults are not isolated, a single faulty group can corrupt the entire system. • BFI is a methodology for isolating Byzantine faults • BFI uses formal system specification • Improves fault tolerance without hurting throughput, unlike increasing BFT group size
Contact Information JohnDo@microsoft.com Howell@microsoft.com http://research.microsoft.com/farsite
Farsite Spec Stats • Semantic specification • 1800 lines of TLA+ • 114 definitions • Distributed-system specification • 11,500 lines of TLA+ • 775 definitions • Why so big? • Windows file-system semantics are complex • Scalability and strong consistency • Byzantine fault isolation