
Scalable Computing on Open Distributed Systems


Presentation Transcript


  1. Scalable Computing on Open Distributed Systems
     Jon Weissman, University of Minnesota and National E-Science Center, CLADE 2008

  2. What is the Problem?
     • Open distributed systems
       • Tasks are submitted to the “system” for execution
       • Workers do the computing: execute a task, return an answer
     • The challenge
       • Computations that are erroneous or late are less useful
       • Workers can fail, err, be hacked, or be misconfigured
       • The time to return answers is unpredictable
     • Applies to both local- and wide-area systems; the focus here is on volunteer wide-area systems

  3. Shape of the Solution
     • Replication
       • works for all sources of unreliability, in both computation and data
     • How do we do this intelligently, i.e. scalably?

  4. Replication Challenges
     • How many replicas?
       • too many: a waste of resources
       • too few: the application suffers
     • Most approaches assume ad-hoc replication
       • under-replicate: task re-execution (higher latency)
       • over-replicate: wasted resources (lower throughput)
     • Using information about the past behavior of a node, we can intelligently size the amount of redundancy

  5. Problems with ad-hoc replication
     [Figure: task x is sent to group A, which contains an unreliable node; task y is sent to group B, made up of reliable nodes]

  6. System Model
     • Reputation rating r_i: the degree of a node’s reliability
     • Dynamically size the redundancy based on r_i
     • Note: variable-sized groups
     • Assume no correlated errors for now; this is relaxed later
     [Figure: example worker groups annotated with ratings ranging from 0.3 to 0.9]

  7. Smart Replication
     • Rating based on past interactions with clients
       • probability r_i over a window t
       • correct/total or timely/total
     • Extend to a worker group (assuming no collusion) => likelihood of correctness (LOC)
     • Smarter redundancy
       • variable-sized worker groups
       • intuition: higher-reliability clients => smaller groups
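To make the rating bookkeeping concrete, here is a minimal Python sketch of a windowed correct/total (or timely/total) rating. The class name `ReputationTracker`, the window size, and the 0.5 prior for unseen workers are our assumptions, not details from the slides.

```python
from collections import deque

class ReputationTracker:
    """Per-worker reliability rating over a sliding window of outcomes."""

    def __init__(self, window: int = 100):
        self.window = window
        self.history: dict[str, deque] = {}  # worker id -> recent outcomes

    def record(self, worker: str, ok: bool) -> None:
        """Record one interaction: True if the answer was correct (or timely)."""
        h = self.history.setdefault(worker, deque(maxlen=self.window))
        h.append(ok)

    def rating(self, worker: str) -> float:
        """r_i = correct/total (or timely/total) over the window."""
        h = self.history.get(worker)
        if not h:
            return 0.5  # assumed neutral prior for a worker with no history
        return sum(h) / len(h)
```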

  8. Terms
     • LOC (likelihood of correctness), l_g
       • the ‘actual’ probability of getting a correct or timely answer from a group g of clients
     • Target LOC (l_target)
       • the success rate that the system tries to ensure while forming client groups
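The slides do not spell out how l_g is computed, but under majority voting with independent workers it follows directly from the individual ratings. A sketch, assuming independent errors (no collusion) and a strict-majority consensus rule:

```python
def group_loc_majority(ratings: list[float]) -> float:
    """Probability that a strict majority of independent workers is correct.

    Dynamic program over workers: dist[k] is the probability that exactly k
    of the workers considered so far return a correct answer.
    """
    dist = [1.0]
    for r in ratings:
        new = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            new[k] += p * (1.0 - r)  # this worker answers incorrectly
            new[k + 1] += p * r      # this worker answers correctly
        dist = new
    need = len(ratings) // 2 + 1     # strict majority
    return sum(dist[need:])
```

For example, `group_loc_majority([0.9, 0.8, 0.7])` returns roughly 0.902, the chance that at least two of the three workers return the correct answer.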

  9. Scheduling Metrics
     • Guiding metrics
       • throughput r: the number of successfully completed tasks in an interval
       • success rate s: the ratio of throughput to the number of tasks attempted
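Stated as code, a trivial sketch (the per-interval accounting is an assumption):

```python
def scheduling_metrics(completed_ok: int, attempted: int) -> tuple[int, float]:
    """Throughput r: successful tasks in the interval; success rate s: r / attempted."""
    r = completed_ok
    s = r / attempted if attempted else 0.0
    return r, s
```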

  10. Algorithm Space
     • How many replicas?
       • algorithms compute how many replicas are needed to meet a success threshold (a greedy sketch follows)
     • How to reach consensus?
       • Majority (better for byzantine threats)
       • M-1 (better for timeliness)
       • M-2 (two matching answers)
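One plausible way to size a group against the success threshold: greedily add the most reliable available workers until the group's LOC, computed as in the sketch above, reaches l_target. This greedy rule is our illustration of the idea, not the paper's algorithm.

```python
def form_group(workers: dict[str, float], l_target: float) -> list[str]:
    """Add workers in decreasing order of rating until LOC >= l_target.

    `workers` maps worker id -> reputation rating r_i. Note that majority
    LOC is not monotone in group size (even-sized groups can hurt), so a
    production scheduler might consider odd sizes only.
    """
    group: list[str] = []
    ratings: list[float] = []
    for wid, r in sorted(workers.items(), key=lambda kv: -kv[1]):
        group.append(wid)
        ratings.append(r)
        if group_loc_majority(ratings) >= l_target:
            break
    return group  # may fall short of l_target if workers run out
```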

  11. One Scheduling Algorithm

  12. Evaluation
     • Baselines
       • Fixed algorithm: statically sized, equal groups; uses no reliability information
       • Random algorithm: forms groups by randomly assigning nodes until l_target is reached
     • Simulated a wide variety of node-reliability distributions

  13. Experimental Results: correctness
     [Figure: simulation with byzantine behavior only, using majority voting]

  14. Role of l_target
     • Key parameter, but hard to specify
       • too large: groups will be too large (low throughput)
       • too small: groups will be too small (low success rate)
     • Instead, adaptively learn it, biasing toward r, s, or both (one possible update rule is sketched below)
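A minimal sketch of one way such adaptation could work: nudge l_target up when the observed success rate falls short of the goal, and down otherwise to recover throughput. The additive step, the bounds, and the single-metric bias are assumptions; the slides do not specify the update rule.

```python
def adapt_l_target(l_target: float, s_observed: float, s_desired: float,
                   step: float = 0.01) -> float:
    """Nudge l_target toward the user's desired operating point.

    Raising l_target grows groups (higher success rate, lower throughput);
    lowering it shrinks groups (the reverse trade-off).
    """
    if s_observed < s_desired:
        return min(0.999, l_target + step)  # grow groups to lift success rate
    return max(0.5, l_target - step)        # shrink groups to lift throughput
```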

  15. Adaptive Algorithm

  16. What about time?
     • Timeliness
       • a result returned after time T is less useful, or not useful at all
       • (1) soft deadlines
         • user is interacting, e.g. visualizing output from the computation
       • (2) hard deadlines
         • need to get X results done before the HPDC/NSDI/… deadline
     • Live experimentation on PlanetLab
     • Real application: BLAST

  17. Some PL data
     [Figure: PlanetLab measurements showing variability in computation and communication, both across and within nodes, as well as temporal variability]

  18. PL Environment
     • RIDGE is our live system that implements reputation
     • 120 wide-area nodes, fully correct, M-1 consensus
     • Three timeliness environments based on deadlines: D = 120 s, D = 180 s, D = 240 s

  19. Experimental Results: timeliness
     [Figure: the best BOINC configuration (BOINC*) and a conservative configuration (BOINC-) vs. RIDGE]

  20. Makespan Comparison

  21. Collusion
     • What if errors are correlated?
     • How can that happen?
       • widespread bug (hardware or software)
       • misconfiguration
       • virus
       • Sybil attack
       • malicious group
     • Joint work with Emmanuel Jeannot (INRIA)

  22. Key Ideas
     • Executing a task yields answer groups A_1, A_2, …, A_k
       • each A_i has associated workers W_i1, W_i2, …, W_in
     • Learn the probability of correlated errors between worker pairs: P_collusion(W_1, W_2)
     • Estimate the probability of group correlated errors:
       P_collusion(G), G = [W_1, W_2, W_3, …], via f({P_collusion(W_i, W_j) for all i, j})
     • Rank and select an answer using P_collusion(G) and |G|
     • Update the matrix of pairwise P_collusion values
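A sketch of the group-level estimate and the ranking step. The slides leave the combining function f and the scoring rule open; here we assume f is the maximum pairwise collusion probability (a conservative choice), that `p_collude` is keyed by sorted worker-id pairs, and that answers are scored by |G| * (1 - P_collusion(G)).

```python
from itertools import combinations

def group_collusion_prob(group: list[str],
                         p_collude: dict[tuple[str, str], float]) -> float:
    """Estimate P_collusion(G) from pairwise probabilities (assumed f = max)."""
    pairs = combinations(sorted(group), 2)
    return max((p_collude.get(p, 0.0) for p in pairs), default=0.0)

def rank_answers(answer_groups: dict[str, list[str]],
                 p_collude: dict[tuple[str, str], float]) -> str:
    """Pick the answer whose supporting group is large and unlikely to collude.

    Score = |G| * (1 - P_collusion(G)); the scoring rule is an assumption.
    """
    return max(answer_groups,
               key=lambda a: len(answer_groups[a])
                             * (1.0 - group_collusion_prob(answer_groups[a],
                                                           p_collude)))
```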

  23. Bootstrap Problem
     • Building the collusion matrix requires first “baiting” colluders
       • over-replicate so that the majority group is still correct, exposing the colluders
     • a: probability of worker collusion
     • e: probability that colluders fool the system
     • Given a and e => group size k (see the sketch below)
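One way to derive k from a and e, assuming workers collude independently with probability a and that colluders fool the system only when they form a strict majority of the group. Both assumptions are our reading of the slide, not the paper's derivation.

```python
from math import comb

def bait_group_size(a: float, e: float, k_max: int = 64) -> int:
    """Smallest k with P(colluders form a strict majority of k workers) <= e."""
    for k in range(1, k_max + 1):
        need = k // 2 + 1  # colluders need a strict majority to win the vote
        p_fooled = sum(comb(k, j) * a**j * (1 - a)**(k - j)
                       for j in range(need, k + 1))
        if p_fooled <= e:
            return k
    raise ValueError("no k <= k_max satisfies the bound; increase k_max")
```

For a < 0.5 the binomial tail shrinks as k grows, so a suitable k always exists; the search bound `k_max` simply keeps the loop finite.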

  24. correctness
     [Figure: correctness under collusion; scenario 4 is one group of 30% colluders that always collude, scenario 5 is the same group colluding 30% of the time, scenario 7 is two groups (40% and 30% colluders)]

  25. throughput

  26. Summary
     • Reliable, scalable computing
       • correctness and timeliness
     • Future work
       • combined models and metrics
       • workflows: coupling data and computation reliability
     Visit ridge.cs.umn.edu to learn more
