Coterie availability in sites. Flavio Junqueira and Keith Marzullo, University of California, San Diego. DISC, Krakow, Poland, September 2005.

Multi-site systems
• Emerging class of distributed systems
  • Collection of sites across a WAN
  • Multiple nodes in each site
• Share resources
  • Data sets
  • Computational power
• E.g. BIRN, Geon, TeraGrid, PlanetLab
• Site failure
  • All the nodes in a site simultaneously unavailable

Site availability: BIRN
• 10 sites experience at least one outage
• One site under 97% availability

Improving availability
• Better availability through replication
• Coteries
  • Set system of processes: a set of subsets of processes
  • Each subset is called a quorum
  • Quorums are minimal sets and pairwise intersect
• Coteries are useful
  • Distributed mutual exclusion
  • Distributed registers
  • Consensus through Paxos
• Coterie availability in multi-site systems

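The two coterie properties named on this slide (quorums pairwise intersect, and no quorum contains another) can be checked mechanically. The following is a minimal sketch; the function name `is_coterie` and the process names `a`, `b`, `c` are illustrative and do not come from the talk.

```python
from itertools import combinations

def is_coterie(quorums):
    """Return True if `quorums` is a coterie: non-empty quorums that
    pairwise intersect and where no quorum contains another."""
    qs = [frozenset(q) for q in quorums]
    if not qs or any(not q for q in qs):
        return False
    for a, b in combinations(qs, 2):
        if not (a & b):        # intersection property violated
            return False
        if a <= b or b <= a:   # minimality violated
            return False
    return True

# Majority quorums over three hypothetical processes a, b, c.
majority = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
print(is_coterie(majority))               # True
print(is_coterie([{"a"}, {"b", "c"}]))    # False: disjoint quorums
print(is_coterie([{"a"}, {"a", "b"}]))    # False: not minimal
```
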
Roadmap
• System model
• Availability metrics
  • Previous deterministic metrics not necessarily good
  • A new metric
• Failure model
  • Characterize failures using survivor sets
  • Survivor sets: more expressive
• Quorum construction
  • Multi-site hierarchical construction
• Practical issues
  • Failure model in practice
  • PlanetLab experiment
• Conclusions

System model
• Set $P$ of processes
  • Pairwise connected by quasi-reliable asynchronous channels
  • Process failure: crash
  • Processes can recover
• Set $B$ of sites
  • Partition of the set of processes
  • Site failure: simultaneous failure of all the processes in the site
  • Process failures are not independent
• Execution
  • Sequence of steps of processes
  • $E$: set of all executions
• In a step $s$
  • Available process in $s$: $p \in P$ is available if $p \notin F(s)$

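Survivor sets
• A set $S \subseteq P$ is a survivor set iff [defining condition not preserved in the slide text]
• Example: $E = \{E_1, E_2, E_3, E_4\}$
[Figure: processes grouped into sites, the non-faulty sets $NF(s_i)$ for steps of $E_1$ to $E_4$, and the resulting survivor sets]
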
Availability metrics
• Traditional deterministic metrics
  • Undirected graph: nodes = processes, edges = communication links
  • Node vulnerability: minimal number of nodes whose failure prevents any quorum from forming
  • Edge vulnerability: minimal number of edges whose failure prevents any quorum from forming
  • Majority is optimal [Barbara and Garcia-Molina ’86]
    • Complete graphs

A counterexample
• Majority
  • Quorum: 5 processes
  • In some step, no quorum can be formed
• Using $S_P$ as quorums
  • In every step, at least one quorum can be formed
• Majority is not optimal
[Figure: sites, processes, survivor sets]

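The counterexample can be replayed in a few lines. This is a sketch under stated assumptions: the figure is not preserved, so the layout below (three sites of three processes, with survivor sets made of two processes from each of two sites) is assumed, and the names `sites`, `majority_available`, and `survivor_quorum_available` are illustrative.

```python
from itertools import combinations

# Assumed layout: 3 sites with 3 processes each, named (site, index).
sites = {b: [(b, i) for i in (1, 2, 3)] for b in ("B1", "B2", "B3")}
n = sum(len(ps) for ps in sites.values())
majority_size = n // 2 + 1

# Assumed survivor sets: two processes from each of two sites.
survivor_sets = [
    frozenset(x) | frozenset(y)
    for s1, s2 in combinations(sites, 2)
    for x in combinations(sites[s1], 2)
    for y in combinations(sites[s2], 2)
]

def majority_available(alive):
    """A majority quorum exists iff at least majority_size processes are alive."""
    return len(alive) >= majority_size

def survivor_quorum_available(alive):
    """With survivor sets as quorums, some quorum must be fully alive."""
    return any(q <= alive for q in survivor_sets)

# Worst-case steps: exactly the processes of one survivor set are alive.
worst_steps = survivor_sets
print(majority_size)                                           # 5
print(any(majority_available(a) for a in worst_steps))         # False
print(all(survivor_quorum_available(a) for a in worst_steps))  # True
```

Under these assumptions every survivor set has four processes, one short of a majority of nine, which is why majority quorums are unavailable in steps where the survivor-set quorums still work.
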
Availability metrics (continued)
• A new metric $A(Q)$, where $Q$ is a coterie
  • Number of covered survivor sets in $Q$
  • A survivor set $S$ is covered in $Q$ if: [condition not preserved in the slide text]

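A small sketch of how such a count could be computed. The covering condition is not preserved in the slide text, so the code assumes that a survivor set is covered when it contains at least one quorum; that assumption, the function name `availability`, and the toy process set are not from the talk.

```python
from itertools import combinations

def availability(coterie, survivor_sets):
    """Count survivor sets that contain at least one quorum.
    The containment test is an assumption about what 'covered' means."""
    quorums = [frozenset(q) for q in coterie]
    return sum(
        1 for s in survivor_sets
        if any(q <= frozenset(s) for q in quorums)
    )

# Toy system: five processes, but only 1, 2, 3 appear in survivor sets.
processes = [1, 2, 3, 4, 5]
survivor_sets = [{1, 2}, {2, 3}, {1, 3}]
majority = [set(c) for c in combinations(processes, 3)]

print(availability(majority, survivor_sets))       # 0
print(availability(survivor_sets, survivor_sets))  # 3
```

In this toy system the majority coterie covers no survivor set, while using the survivor sets themselves as quorums covers all three.
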
Failure model
• Multi-site hierarchical model
  • A set $F_s$ of subsets of $B$
    • Subsets of simultaneously faulty sites
  • An array $F_p$
    • One entry per site
    • Each entry: subsets of processes in the site
    • Subsets of simultaneously faulty processes at a site
  • A survivor set $S$: $\exists F_S \in F_s$ such that
    • $\forall B_i \notin F_S: \exists F_P \in F_p[i]: B_i \setminus F_P \subseteq S$
    • $\forall B_i \in F_S: B_i \cap S = \emptyset$
• Example: sites $B = \{B_1, B_2, B_3\}$, each with processes 1, 2, 3; write $p_{m,i}$ for process $i$ of site $B_m$
  • $F_s = \{\{B_1\}, \{B_2\}, \{B_3\}\}$
  • $F_p[m] = \{\{p_{m,i}\} : i \in \{1,2,3\}\}$ for $m \in \{1,2,3\}$
  • $S_p = \bigcup_{a < b} \{\{p_{a,i}, p_{a,j}, p_{b,k}, p_{b,l}\} : i,j,k,l \in \{1,2,3\} \land i \neq j \land k \neq l\}$

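The survivor sets of the multi-site hierarchical model can be enumerated directly from the site-level and process-level failure sets. The sketch below is illustrative: the function name, the dictionary encoding, and the process labels such as `B1.2` are assumptions, and it builds only the minimal sets (exactly the non-faulty processes of each surviving site).

```python
from itertools import product

def survivor_sets(sites, Fs, Fp):
    """sites: dict site -> set of processes.
    Fs: iterable of sets of simultaneously faulty sites.
    Fp: dict site -> iterable of sets of simultaneously faulty processes."""
    result = set()
    for failed_sites in Fs:
        up = [b for b in sites if b not in failed_sites]
        # Pick one faulty-process subset for every site that stays up.
        for choice in product(*(Fp[b] for b in up)):
            alive = (sites[b] - set(f) for b, f in zip(up, choice))
            result.add(frozenset().union(*alive))
    return result

# Three sites of three processes; one site and one process per site may fail.
sites = {b: {f"{b}.{i}" for i in (1, 2, 3)} for b in ("B1", "B2", "B3")}
Fs = [{"B1"}, {"B2"}, {"B3"}]
Fp = {b: [{p} for p in sorted(ps)] for b, ps in sites.items()}

Sp = survivor_sets(sites, Fs, Fp)
print(len(Sp))                   # 27
print({len(s) for s in Sp})      # {4}
```

Each survivor set here holds two processes from each of the two surviving sites, matching the four-index form of the example.
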
Quorum construction
• Optimal availability with respect to $A$
  • Coterie $Q$: $S_p = Q$ or $Q$ dominates $S_p$
  • Survivor sets in $S_p$ pairwise intersect
  • If not, then optimally discarding survivor sets is NP-complete
• A special case: Qsite
  • All subsets of $B$ of size $f_s$ are in $F_s$
  • All subsets of size $t$ of $B_i$ are in $F_p[i]$, for every $i$
• E.g.: $f_s = 1$, $t = 1$
[Figure: quorums over Site 1, Site 2, Site 3]

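For the threshold special case, the quorums can be generated from the two thresholds and then checked for pairwise intersection. This is a hedged sketch, not the authors' implementation: `qsite`, `pairwise_intersect`, and `make_sites` are illustrative names, and sites are assumed to hold three processes each.

```python
from itertools import combinations, product

def qsite(sites, fs, t):
    """Quorums = survivor sets under thresholds: all but `fs` sites,
    and all but `t` processes in each of those sites."""
    quorums = set()
    for up in combinations(sorted(sites), len(sites) - fs):
        choices = [combinations(sorted(sites[b]), len(sites[b]) - t) for b in up]
        for pick in product(*choices):
            quorums.add(frozenset(p for part in pick for p in part))
    return quorums

def pairwise_intersect(sets):
    return all(a & b for a, b in combinations(sets, 2))

def make_sites(n):
    """Hypothetical layout: n sites with three processes each."""
    return {f"B{m}": {(m, i) for i in (1, 2, 3)} for m in range(1, n + 1)}

q3 = qsite(make_sites(3), fs=1, t=1)
print(len(q3), pairwise_intersect(q3))   # 27 True

q4 = qsite(make_sites(4), fs=2, t=1)
print(pairwise_intersect(q4))            # False
```

With three sites and $f_s = 1$, any two quorums share a site and hold two of its three processes, so they intersect. With four sites and $f_s = 2$, two quorums can sit on disjoint pairs of sites, which is the case where survivor sets would have to be discarded.
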
Model in practice
• Qsite
  • $f_s$: threshold on site failures
    • Data on site availability
  • $t$: threshold on process failures
    • Markov chains
• One Markov chain for each site
• Transitions
  • Failure transitions: same probability, homogeneous processes
  • Repair transitions: variable probability, amount of resources used
[Figure: Markov chain with failure transitions and repair transitions]

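One way to picture the per-site chain is as a birth-death chain whose state is the number of failed processes in the site. That structure is an assumption made for illustration, since the slide only says that failure transitions share one probability and repair transitions vary. The numeric probabilities below are invented, not measured.

```python
def stationary(n, fail, repair):
    """Stationary distribution over k = 0..n failed processes.
    `fail`: probability of one more failure (same in every state).
    `repair[k]`: probability of one repair when k processes are failed."""
    weights = [1.0]
    for k in range(1, n + 1):
        # Detailed balance: pi[k-1] * fail == pi[k] * repair[k]
        weights.append(weights[-1] * fail / repair[k])
    total = sum(weights)
    return [w / total for w in weights]

def prob_more_than(pi, t):
    """Probability that more than t processes of the site are failed."""
    return sum(pi[t + 1:])

# Hypothetical numbers for a site with three processes.
pi = stationary(3, fail=0.01, repair={1: 0.5, 2: 0.5, 3: 0.25})
print(pi)
print(prob_more_than(pi, 1))
```

Under this reading, a threshold $t$ could be chosen as the smallest value for which `prob_more_than(pi, t)` falls below a target, though the talk does not state how the threshold is picked.
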
PlanetLab experiment
• Toy application
  • Paxos: quorums of acceptors
  • Client accessing quorums
• Hosts used
  • Three sites: three hosts from each site
  • One UCSD host: proposer, learner
• Three settings
  • 3Sites: one acceptor per site
    • Quorum: two hosts
  • 3SitesMaj: all hosts
    • Quorum: four hosts, a majority from each of two sites
  • SimpleMaj: all hosts
    • Quorum: any five processes
• 3SitesMaj has better availability
• SimpleMaj has worse availability
[Figure: hosts at UC Davis, UC San Diego, Duke, UT Austin]

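The three quorum rules can be written as predicates over the set of live acceptor hosts. This sketch does not reproduce the experiment: the site names `A`, `B`, `C`, the host names, and the choice of host 1 as the single acceptor in 3Sites are all hypothetical.

```python
# Hypothetical host names: site X has hosts X1, X2, X3.
SITES = {s: [f"{s}{i}" for i in (1, 2, 3)] for s in ("A", "B", "C")}
ALL = [h for hs in SITES.values() for h in hs]

def three_sites(alive):
    """3Sites: one acceptor per site (assumed to be host 1); quorum = any two."""
    return sum(1 for s in SITES if f"{s}1" in alive) >= 2

def three_sites_maj(alive):
    """3SitesMaj: a majority (2 of 3) of the hosts in each of two sites."""
    ok = [s for s, hs in SITES.items() if sum(h in alive for h in hs) >= 2]
    return len(ok) >= 2

def simple_maj(alive):
    """SimpleMaj: any five of the nine hosts."""
    return len(set(alive) & set(ALL)) >= 5

# Site C is down, and one host is down in each of sites A and B.
alive = {"A1", "A2", "B1", "B3"}
print(three_sites(alive))      # True
print(three_sites_maj(alive))  # True
print(simple_maj(alive))       # False
```

In this scenario only four hosts are alive, so the simple majority rule cannot form a quorum while both site-aware rules can.
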
The Bimodal model
• Sites are survivor sets
• $S_p$ is not a coterie
  • “Throw out” survivor sets
  • In general, optimal solution is NP-complete
  • Simple solution for this model
• Practical issues
  • Practical for two sites
  • More than two sites: open problem

Conclusions
• Coteries for multi-site systems
  • Site failures: process failures not independent
• A new metric
  • Counts covered survivor sets
• Multi-site hierarchical construction
  • Practical
  • Illustrated with Markov model
  • Experiment shows better availability
• Using majority quorums is not a good idea
  • Not optimal
  • Poor performance
• Future work
  • More experiments, more constructions, real deployment

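Failure models
• The multi-site hierarchical model (MSH)
  • A set $F_s$ of subsets of $B$
  • An array $F_p$
    • One entry per site
    • Each entry: subsets of processes in the site
  • A survivor set $S$: $\exists F_S \in F_s$ such that
    • $\forall B_i \notin F_S: \exists F_P \in F_p[i]: B_i \setminus F_P \subseteq S$
    • $\forall B_i \in F_S: B_i \cap S = \emptyset$
• The bimodal model (B)
  • A set $F_s$ of subsets of $B$
    • There is one site that is in no element of $F_s$
  • An array $F_p$
  • A survivor set $S$
    • As in the previous model, or
    • $\exists B_i \in B: S = B_i$
• Example: sites $B_1$, $B_2$, each with processes 1, 2, 3; write $p_{m,i}$ for process $i$ of site $B_m$
  • $F_s = \emptyset$
  • $F_p[1] = \{\{p_{1,i}\} : i \in \{1,2,3\}\}$
  • $F_p[2] = \{\{p_{2,i}\} : i \in \{1,2,3\}\}$
  • MSH: $S_p = \{\{p_{1,i}, p_{1,j}, p_{2,k}, p_{2,l}\} : i,j,k,l \in \{1,2,3\} \land i \neq j \land k \neq l\}$
  • B: $S_p = \{\{p_{1,i}, p_{1,j}, p_{2,k}, p_{2,l}\} : i,j,k,l \in \{1,2,3\} \land i \neq j \land k \neq l\} \cup B$
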
Bimodal construction
• Bimodal model
  • By construction: not all pairs of survivor sets intersect
  • Discard survivor sets until the remaining ones intersect
  • Selecting optimally is NP-complete
• Solution: remove $|B| - 1$ survivor sets
  • Survivor sets containing processes from multiple sites pairwise intersect
  • Construction is also optimal with respect to metric $A$
• A special case: Bsite
  • All elements of $F_s$ have size $f_s$
  • All elements of $F_p[i]$ have the same size $t$, for every $i$
• E.g.: $f_s = 1$, $t = 1$
[Figure: quorums over sites B1 and B2]

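The discard step can be illustrated on a two-site system. The sketch assumes two sites of three processes, mixed survivor sets holding two processes from each site, and that the $|B| - 1$ discarded survivor sets are whole-site sets; the slide does not say which sets are removed, so that choice and all names below are assumptions.

```python
from itertools import combinations

sites = {b: frozenset((b, i) for i in (1, 2, 3)) for b in ("B1", "B2")}

# Mixed survivor sets: two processes from each of the two sites.
mixed = {
    frozenset(x) | frozenset(y)
    for x in combinations(sorted(sites["B1"]), 2)
    for y in combinations(sorted(sites["B2"]), 2)
}

# Bimodal model: every site is also a survivor set on its own.
Sp = mixed | set(sites.values())

def pairwise_intersect(sets):
    return all(a & b for a, b in combinations(sets, 2))

# Keep one whole-site survivor set and discard the other |B| - 1 of them.
quorums = mixed | {sites["B1"]}

print(len(Sp), pairwise_intersect(Sp))            # 11 False
print(len(quorums), pairwise_intersect(quorums))  # 10 True
```

The two whole-site sets are disjoint, which is why the full set of survivor sets is not a coterie; after dropping one of them, every remaining pair shares a process.
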
Site availability
• Goals
  • Show that sites are unavailable frequently enough
• BIRN: Biomedical Informatics Research Network
  • Test bed projects centered around brain imaging
  • Currently: 19 universities, 26 research groups
• Availability
  • Monthly basis
  • Pings (BIRN-CC)
  • Storage broker logs
• Site availability
  • Jan/04-Aug/04
  • Availability under 100%
    • On average in 5 out of the 8 months

Causes of site failures
• Misconfigured software
• Shared resources
  • Storage
  • Power circuits
  • Cooling pipes
  • Air conditioning
  • Network