
Coterie availability in sites

Coterie availability in sites. Flavio Junqueira and Keith Marzullo University of California, San Diego DISC, Krakow, Poland, September 2005. Multi-site systems. Emerging class of distributed systems Collection of sites across a WAN Multiple nodes in each site Share resources Data sets


Presentation Transcript


  1. Coterie availability in sites Flavio Junqueira and Keith Marzullo University of California, San Diego DISC, Krakow, Poland, September 2005

  2. Multi-site systems • Emerging class of distributed systems • Collection of sites across a WAN • Multiple nodes in each site • Share resources • Data sets • Computational power • E.g. BIRN, Geon, TeraGrid, PlanetLab • Site failure • All the nodes in a site simultaneously unavailable

  3. Site availability: BIRN • 10 sites experience at least one outage • One site under 97%

  4. Improving availability • Better availability through replication • Coteries • Set system of processes: a set of subsets of processes • Each subset is called a quorum • Minimal sets, pairwise intersect • Coteries are useful • Distributed mutual exclusion • Distributed registers • Consensus through Paxos • Coterie availability in multi-site systems
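The two coterie properties on this slide (minimal sets, pairwise intersection) can be checked mechanically. The following is a minimal sketch in Python; the function name `is_coterie` and the example quorums are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations

def is_coterie(quorums):
    """Return True if `quorums` is a non-empty set system whose members
    pairwise intersect and none of which is a proper subset of another."""
    qs = [frozenset(q) for q in quorums]
    # Intersection: every two quorums share at least one process.
    intersect = all(a & b for a, b in combinations(qs, 2))
    # Minimality: no quorum is a proper subset of another quorum.
    minimal = not any(a < b or b < a for a, b in combinations(qs, 2))
    return bool(qs) and intersect and minimal

majority_of_three = [{1, 2}, {1, 3}, {2, 3}]
print(is_coterie(majority_of_three))   # True
print(is_coterie([{1, 2}, {3, 4}]))    # False: the quorums are disjoint
print(is_coterie([{1}, {1, 2}]))       # False: {1, 2} is not minimal
```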

  5. Roadmap • System model • Availability metrics • Previous deterministic metrics not necessarily good • A new metric • Failure model • Characterize failures using survivor sets • Survivor sets: more expressive • Quorum construction • Multi-site hierarchical construction • Practical issues • Failure model in practice • PlanetLab experiment • Conclusions

  6. System model • Set P of processes • Pairwise connected by quasi-reliable asynchronous channels • Process failure: crash • Processes can recover • Set B of sites • Partition of the set of processes • Site failure: simultaneous failure of all the processes in the site • Process failures are not independent • Execution • Sequence of steps of processes • E: set of all executions • In a step s • Available process in s • p ∈ P is available if p ∉ F(s)

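  7. Survivor sets • A set S ⊆ P is a survivor set iff [condition not preserved in the transcript] • Example: E = {E1, E2, E3, E4} • [Figure: processes grouped into sites; steps s1 and s2 of executions E1 to E4 with NF(si); the resulting survivor sets]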

  8. Availability metrics • Traditional deterministic metrics • Undirected graph: nodes = processes, edges = comm. links • Node vulnerability: Minimal number of nodes • Edge vulnerability: Minimal number of edges • Majority is optimal [Barbara and Garcia-Molina’86] • Complete graphs

  9. A counterexample • Majority • Quorum: 5 processes • In some step, no quorum can be formed • Using Sp as quorums • In every step, at least one quorum can be formed • Majority is not optimal • [Figure: sites, processes, survivor sets]

  10. Availability metrics • Traditional deterministic metrics • Undirected graph: nodes = processes, edges = comm. links • Node vulnerability: Minimal number of nodes • Edge vulnerability: Minimal number of edges • Majority is optimal [Barbara and Garcia-Molina’86] • Complete graphs • A new metric A(Q), Q is a coterie • Number of covered survivor sets in Q • A survivor set S is covered in Q if:
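The metric A(Q) can be sketched in a few lines. The transcript does not preserve the formal definition of "covered", so the code below assumes that a survivor set S is covered when at least one quorum of Q is a subset of S. The three-site layout and process names are hypothetical, chosen to mirror the counterexample slide (nine processes, majority quorums of five).

```python
from itertools import combinations

def availability(coterie, survivor_sets):
    """A(Q): number of survivor sets containing at least one quorum.
    The 'quorum is a subset of S' reading of 'covered' is an assumption."""
    return sum(1 for s in survivor_sets if any(q <= s for q in coterie))

# Hypothetical layout: 3 sites, 3 processes per site, named (site, index).
sites = {b: [(b, i) for i in range(3)] for b in ("B1", "B2", "B3")}
processes = [p for members in sites.values() for p in members]

# Survivor sets: two processes from each of two distinct sites.
survivor_sets = [
    frozenset(x) | frozenset(y)
    for a, b in combinations(sites, 2)
    for x in combinations(sites[a], 2)
    for y in combinations(sites[b], 2)
]

# Majority coterie: every subset of 5 of the 9 processes.
majority = [frozenset(q) for q in combinations(processes, 5)]

print(len(survivor_sets))                          # 27
print(availability(majority, survivor_sets))       # 0
print(availability(survivor_sets, survivor_sets))  # 27
```

Under these assumptions the majority coterie covers no survivor set, because every survivor set has only four processes, while using the survivor sets themselves as quorums covers all of them.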

  11. Failure model • Multi-site hierarchical model • A set Fs of subsets of B • Subsets of simultaneously faulty sites • An array Fp • One entry per site • Each entry: subsets of processes in the site • Subsets of simultaneously faulty processes at a site • A survivor set S: ∃ FS ∈ Fs such that • ∀ Bi ∉ FS: ∃ FP ∈ Fp[i]: P\FP ⊆ S • ∀ Bi ∈ FS: Bi ∩ S = ∅ • Example: sites B = {B1, B2, B3}, three processes (1, 2, 3) per site • Fs = {{B1},{B2},{B3}} • Fp[1], Fp[2], Fp[3]: each contains the three single-process subsets of its site (i ∈ {1,2,3}) • Sp: union over the three pairs of sites of the sets holding processes i, j of one site and k, l of the other, with i, j, k, l ∈ {1,2,3} ∧ i ≠ j ∧ k ≠ l
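The survivor sets of the multi-site hierarchical model can be enumerated directly from Fs and Fp. The sketch below is one possible reading of the definition: for each set of failed sites, pick one process-failure pattern per surviving site and take the processes that remain in those sites. Interpreting the per-site term as "the site's processes minus FP" is an assumption, as are the function name `survivor_sets` and the process names.

```python
from itertools import product

def survivor_sets(sites, Fs, Fp):
    """Enumerate survivor sets for a multi-site hierarchical failure model.

    sites: dict mapping site name -> frozenset of processes
    Fs:    iterable of sets of site names that may fail together
    Fp:    dict mapping site name -> list of frozensets of processes
           that may fail together inside that site
    """
    candidates = set()
    for failed_sites in Fs:
        alive = [b for b in sites if b not in failed_sites]
        # Choose one process-failure pattern for every surviving site.
        for pattern in product(*(Fp[b] for b in alive)):
            remaining = (sites[b] - fp for b, fp in zip(alive, pattern))
            candidates.add(frozenset().union(*remaining))
    # Keep only the minimal candidate sets.
    return {s for s in candidates if not any(t < s for t in candidates)}

# Hypothetical instance matching the slide: 3 sites, 3 processes each,
# at most one failed site and one failed process per surviving site.
sites = {
    "B1": frozenset({"a1", "a2", "a3"}),
    "B2": frozenset({"b1", "b2", "b3"}),
    "B3": frozenset({"c1", "c2", "c3"}),
}
Fs = [{"B1"}, {"B2"}, {"B3"}]
Fp = {b: [frozenset({p}) for p in members] for b, members in sites.items()}

sp = survivor_sets(sites, Fs, Fp)
print(len(sp))                  # 27
print({len(s) for s in sp})     # {4}
```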

  12. Quorum construction • Optimal availability with respect to A • Coterie Q : Sp = Q OR Q dominates Sp • Survivor sets in Sp pairwise intersect • If not, then optimally discarding survivor sets is NP-Complete • A special case: Qsite • All subsets of B of size fs in Fs • All subsets of size t of Bi in Fp[i], for every i • E.g.: fs = 1, t = 1 • [Figure: quorums over Site 1, Site 2, Site 3]

  13. Model in practice • Qsite • fs: Threshold on site failures • Data on site availability • t : Threshold on process failures • Markov chains • One Markov chain for each site • Transitions • Failure transitions: same probability, homogeneous processes • Repair transitions: variable probability, amount of resources used • [Figure: Markov chain with failure transitions and repair transitions]

  14. PlanetLab experiment • Toy application • Paxos: quorums of acceptors • Client accessing quorums • Hosts used • Three sites: three from each site • One UCSD host: proposer, learner • Three settings • 3Sites: One acceptor per site • Quorum: two hosts • 3SitesMaj: All hosts • Quorum: four hosts, majority from each of two sites • SimpleMaj: All hosts • Quorum: any five processes • Results • 3SitesMaj has better availability • SimpleMaj has worse availability • [Figure: UC Davis, UC San Diego, Duke, UT Austin]
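The three quorum settings can be written as predicates over the set of acceptors that are currently up. This is only a sketch: the site and host names are hypothetical placeholders, and choosing the first host of each site as the single 3Sites acceptor is an assumption.

```python
# Hypothetical acceptor layout: three sites with three hosts each.
SITES = {
    "site_a": {"a1", "a2", "a3"},
    "site_b": {"b1", "b2", "b3"},
    "site_c": {"c1", "c2", "c3"},
}

def three_sites(up):
    """3Sites: one acceptor per site (assumed: a1, b1, c1); quorum is two hosts."""
    acceptors = {"a1", "b1", "c1"}
    return len(up & acceptors) >= 2

def three_sites_maj(up):
    """3SitesMaj: a majority (two of three hosts) from each of two sites."""
    return sum(len(up & hosts) >= 2 for hosts in SITES.values()) >= 2

def simple_maj(up):
    """SimpleMaj: any five of the nine hosts."""
    return len(up) >= 5

# One site is down and one host has crashed in each remaining site.
up = {"a1", "a2", "b1", "b2"}
print(three_sites(up))       # True
print(three_sites_maj(up))   # True
print(simple_maj(up))        # False
```

In this scenario only four hosts are up, so no simple majority exists, while the site-aware settings can still form a quorum.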

  15. The Bimodal model • Sites are survivor sets • Sp is not a coterie • “Throw out” survivor sets • In general, optimal solution is NP-Complete • Simple solution for this model • Practical issues • Practical for two sites • More than two sites: open problem

  16. Conclusions • Coteries for multi-site systems • Site failures: process failures not independent • A new metric • Counts covered survivor sets • Multi-site hierarchical construction • Practical • Illustrated with Markov model • Experiment shows better availability • Using majority quorums is not a good idea • Not optimal • Poor performance • Future work • More experiments, more constructions, real deployment

  17. END

  18. Backup Slides

  19. Failure models • The multi-site hierarchical model (MSH) • A set Fs of subsets of B • An array Fp • One entry per site • Each entry: subsets of processes in the site • A survivor set S: ∃ FS ∈ Fs such that • ∀ Bi ∉ FS: ∃ FP ∈ Fp[i]: P\FP ⊆ S • ∀ Bi ∈ FS: Bi ∩ S = ∅ • The bimodal model • A set Fs of subsets of B • There is one site that is in no element of Fs • An array Fp • A survivor set S • As in the previous model OR • ∃ Bi ∈ B: S = Bi • Example: sites B1, B2, three processes (1, 2, 3) per site • Fs = ∅ • Fp[1], Fp[2]: each contains the three single-process subsets of its site (i ∈ {1,2,3}) • MSH: Sp = the sets holding processes i, j of B1 and k, l of B2, with i, j, k, l ∈ {1,2,3} ∧ i ≠ j ∧ k ≠ l • Bimodal: Sp = the MSH survivor sets ∪ B

  20. Bimodal construction • Bimodal model • By construction: Not all pairs of survivor sets intersect • Discard survivor sets until remaining intersect • Selecting optimally is NP-Complete • Solution: Remove |B|-1 survivor sets • Survivor sets containing processes from multiple sites pairwise intersect • Construction is also optimal with respect to metric A • A special case: Bsite • All elements of Fs have size fs • All elements of Fp[i] have the same size t, for every i • E.g.: fs = 1, t = 1 • [Figure: quorums over B1 and B2]
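The "remove |B|-1 survivor sets" step can be illustrated on a two-site instance. The sketch below assumes that the discarded sets are the site-sized survivor sets and that the caller chooses which site to keep; the slide does not say how that choice is made. Site and process names are hypothetical.

```python
from itertools import combinations

# Hypothetical two-site layout with three processes per site.
sites = {
    "B1": frozenset({"a1", "a2", "a3"}),
    "B2": frozenset({"b1", "b2", "b3"}),
}

# Multi-site survivor sets for t = 1: two processes from each site.
multi_site = [
    frozenset(x) | frozenset(y)
    for x in combinations(sorted(sites["B1"]), 2)
    for y in combinations(sorted(sites["B2"]), 2)
]
# Bimodal model: every site is a survivor set as well.
bimodal_sp = multi_site + list(sites.values())

def bimodal_quorums(survivor_sets, sites, keep):
    """Discard the |B| - 1 site-sized survivor sets other than `keep`."""
    discarded = {members for name, members in sites.items() if name != keep}
    return [s for s in survivor_sets if s not in discarded]

def pairwise_intersect(sets):
    """True if every two sets share at least one process."""
    return all(a & b for a, b in combinations(sets, 2))

quorums = bimodal_quorums(bimodal_sp, sites, keep="B1")
print(pairwise_intersect(bimodal_sp))   # False: B1 and B2 are disjoint
print(pairwise_intersect(quorums))      # True
print(len(bimodal_sp) - len(quorums))   # 1, i.e. |B| - 1
```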

  21. Site availability • Goals • Show that sites are unavailable frequently enough • BIRN - Biomedical Informatics Research Network • Test bed projects centered around brain imaging • Currently: 19 universities, 26 research groups • Availability • Monthly basis • Pings (BIRN-CC) • Storage broker logs • Site availability • Jan/04-Aug/04 • Availability under 100% • On average in 5 out of the 8 months

  22. Causes of site failures • Misconfigured software • Shared resources • Storage • Power circuits • Cooling pipes • Air conditioning • Network
