Scalable Computing on Open Distributed Systems Jon Weissman University of Minnesota National E-Science Center CLADE 2008
What is the Problem?
• Open distributed systems
  • Tasks submitted to the “system” for execution
  • Workers do the computing, execute a task, return an answer
• The Challenge
  • Computations that are erroneous or late are less useful
  • Failure, errors, hacked, misconfigured
  • Unpredictable time to return answers
• Both local- and wide-area systems
• Focus on volunteer wide-area systems
Shape of the Solution
• Replication
  • works for all sources of unreliability: computation and data
• How to do this intelligently – scalably?
Replication Challenges
• How many replicas?
  • too many – waste of resources
  • too few – application suffers
• Most approaches assume ad-hoc replication
  • under-replicate: task re-execution (increased latency)
  • over-replicate: wasted resources (reduced throughput)
• Using information about the past behavior of a node, we can intelligently size the amount of redundancy
Problems with Ad-hoc Replication
[Figure: task x sent to group A, which contains an unreliable node; task y sent to group B, which contains a reliable node]
System Model
• Reputation rating ri – degree of node reliability
• Dynamically size the redundancy based on ri
• Note: variable-sized groups
• Assume no correlated errors (relaxed later)
[Figure: example nodes with ratings 0.9, 0.8, 0.8, 0.7, 0.7, 0.4, 0.3, 0.4, 0.8, 0.8]
Smart Replication
• Rating based on past interaction with clients
  • prob. (ri) over window t
  • correct/total or timely/total
  • extend to worker group (assuming no collusion) => likelihood of correctness (LOC)
• Smarter Redundancy
  • variable-sized worker groups
  • intuition: higher-reliability clients => smaller groups
Terms
• LOC (Likelihood of Correctness), lg
  • computes the ‘actual’ probability of getting a correct or timely answer from a group g of clients
• Target LOC (ltarget)
  • the success rate that the system tries to ensure while forming client groups
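The group-level LOC can be derived from the individual reputation ratings. A minimal sketch for majority voting, assuming node failures are independent (the slides do not show the exact formulation used, so this is illustrative):

```python
from itertools import product
from math import prod

def group_loc_majority(ratings):
    """Probability that a strict majority of the group returns a correct
    answer, assuming node i is independently correct with probability
    ratings[i] (its reputation rating ri)."""
    n = len(ratings)
    return sum(
        # probability of this exact correct/incorrect outcome pattern
        prod(r if ok else 1 - r for r, ok in zip(ratings, pattern))
        for pattern in product([True, False], repeat=n)  # fine for small groups
        if sum(pattern) > n / 2  # strict majority correct
    )
```

For example, a group of three nodes rated 0.8 yields an LOC of 0.896, higher than any single member's rating, which is the intuition behind sizing groups from ratings.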
Scheduling Metrics
• Guiding metrics
  • throughput r: the number of successfully completed tasks in an interval
  • success rate s: ratio of throughput to number of tasks attempted
Algorithm Space
• How many replicas?
  • algorithms compute how many replicas are needed to meet a success threshold
• How to reach consensus?
  • Majority (better for byzantine threats)
  • M-1 (better for timeliness)
  • M-2 (2 matching answers)
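One plausible way such an algorithm could size a group: greedily take the most reliable available nodes until the group's majority-vote LOC meets ltarget. This sketch assumes independent nodes and majority consensus; the greedy rule and function names are illustrative, not the talk's actual algorithm:

```python
from itertools import product
from math import prod

def majority_loc(ratings):
    """Probability a strict majority of independent nodes is correct."""
    n = len(ratings)
    return sum(
        prod(r if ok else 1 - r for r, ok in zip(ratings, pattern))
        for pattern in product([True, False], repeat=n)
        if sum(pattern) > n / 2
    )

def form_group(ratings, l_target):
    """Pick the smallest odd-sized prefix of the most reliable nodes whose
    majority-vote LOC meets l_target (odd sizes avoid voting ties).
    Falls back to all nodes if the target is unreachable."""
    pool = sorted(ratings, reverse=True)
    for k in range(1, len(pool) + 1, 2):
        if majority_loc(pool[:k]) >= l_target:
            return pool[:k]
    return pool  # target unreachable; use everyone
```

With nodes rated 0.7 and ltarget = 0.75, a single node is not enough but three are (LOC 0.784), so the group size adapts to the ratings rather than being fixed.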
Evaluation
• Baselines
  • Fixed algorithm: statically sized, equal groups; uses no reliability information
  • Random algorithm: forms groups by randomly assigning nodes until ltarget is reached
• Simulated a wide variety of node reliability distributions
Experimental Results: Correctness
[Figure: simulation with byzantine behavior only, majority voting]
Role of ltarget
• Key parameter
  • hard to specify
• Too large: groups will be too large (low throughput)
• Too small: groups will be too small (low success rate)
• Instead, adaptively learn it
  • bias toward r or s, or both
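An adaptive controller for ltarget might nudge the target up when too many groups fail and down when oversized groups throttle throughput. The update rule, parameter names, and bounds below are all assumptions for illustration; the talk does not specify its learning rule:

```python
def adapt_l_target(l_target, success_rate, throughput, capacity,
                   step=0.01, s_goal=0.95):
    """One illustrative adaptation step (not the talk's actual rule):
    raise the target when the success rate is below its goal (groups too
    small or unreliable); lower it when throughput lags capacity (groups
    larger than needed)."""
    if success_rate < s_goal:
        l_target = min(0.999, l_target + step)
    elif throughput < capacity:
        l_target = max(0.5, l_target - step)
    return l_target
```

Biasing toward s corresponds to a high s_goal; biasing toward r corresponds to reacting more aggressively to the throughput branch.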
What About Time?
• Timeliness
  • a result arriving after time T is less (or not) useful
• (1) soft deadlines
  • user interacting, visualization output from computation
• (2) hard deadlines
  • need to get X results done before the HPDC/NSDI/… deadline
• Live experimentation on PlanetLab
• Real application: BLAST
Some PlanetLab Data
[Figures: computation times, both across and within nodes; temporal variability; communication times, both across and within nodes]
PL Environment
• RIDGE is our live system that implements reputation
• 120 wide-area nodes, fully correct, M-1 consensus
• 3 timeliness environments based on deadlines: D=120s, D=180s, D=240s
Experimental Results: Timeliness
[Figure: best BOINC (BOINC*) and conservative BOINC (BOINC-) vs. RIDGE]
Collusion
• Suppose errors are correlated?
• How?
  • Widespread bug (hardware or software)
  • Misconfiguration
  • Virus
  • Sybil attack
  • Malicious group
• With Emmanuel Jeannot (Inria)
Key Ideas
• Execute a task => answer groups
  • A1, A2, … Ak
  • for each Ai there are associated workers Wi1, Wi2, … Win
  • Pcollusion(workers in Ai)
• Learn probability of correlated errors
  • Pcollusion(W1, W2)
• Estimate probability of group correlated errors
  • Pcollusion(G), G = [W1, W2, W3, …] via f{Pcollusion(Wi, Wj)} for all i, j
• Rank and select answer
  • based on Pcollusion(G) and |G|
• Update matrix: Pcollusion(W1, W2)
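The aggregation function f that lifts pairwise collusion probabilities to a whole answer group is left open in the slides. One plausible choice, sketched here, is the maximum over all pairs in the group (a pessimistic estimate); the function name and the max rule are assumptions:

```python
from itertools import combinations

def group_collusion_prob(group, p_collusion):
    """Estimate the probability that an answer group's workers colluded,
    taken as the max over the learned pairwise collusion probabilities.
    p_collusion maps frozenset({wi, wj}) -> learned Pcollusion(wi, wj);
    unseen pairs default to 0."""
    if len(group) < 2:
        return 0.0  # a single worker cannot collude with itself
    return max(p_collusion.get(frozenset(pair), 0.0)
               for pair in combinations(group, 2))
```

Ranking answers by a combination of this estimate and group size |G| then favors large groups with no history of correlated errors.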
Bootstrap Problem
• Building the collusion matrix
  • must first “bait” colluders
  • over-replicate such that the majority group is still correct, to expose colluders
• a: probability of worker collusion
• e: probability colluders fool the system
• Given a, e => group size k
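The a, e => k step can be sketched as a binomial tail bound: pick the smallest (odd) group size for which colluders, each colluding independently with probability a, form a majority with probability at most e. The independence assumption and this exact bound are illustrative; the talk's precise model may differ:

```python
from math import comb

def bait_group_size(a, eps, k_max=101):
    """Smallest odd k such that the probability of colluders forming a
    strict majority of a size-k group is at most eps, assuming each
    worker colludes independently with probability a."""
    for k in range(1, k_max + 1, 2):
        # P(number of colluders > k/2) under Binomial(k, a)
        p_majority = sum(comb(k, j) * a**j * (1 - a)**(k - j)
                         for j in range(k // 2 + 1, k + 1))
        if p_majority <= eps:
            return k
    return None  # no k up to k_max meets the bound
```

For a = 0.1 and e = 0.05, groups of 3 already suffice; as a grows toward 0.5 the required over-replication grows quickly, which is why baiting colluders is expensive.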
correctness 4: 1 group 30% colluders, always collude 5. Same group – colludes 30% of the time 7. 2 groups (40%, 30% colluders)
Summary
• Reliable scalable computing
  • correctness and timeliness
• Future work
  • combined models and metrics
  • workflows: coupling data and computation reliability

Visit ridge.cs.umn.edu to learn more