A Statistical Scheduling Technique for a Computational Market Economy Neal Sample Stanford University
Research Interests • Compositional Computing (GRID) • Reliability and Quality of Service • Value-based and model-based mediation • Languages: “Programming for the non-programmer expert” • Database Research • Semistructured indexing and storage • Massive table/stream compression • Approximate algorithms for streaming data
Why We’re Here • [Chart: from 1970 to 2010, development effort shifts away from coding and toward integration/composition]
GRID: Commodity Computing • Distributed Supercomputing (chip design, cryptography) • High Throughput (FightAIDSAtHome, Nug30) • On Demand (computer-in-the-loop) • Data Intensive (Large Hadron Collider) • Collaborative (data exploration, education)
Composition of Large Services • Remote, autonomous • Services are not free • Fee ($), execution time • 2nd order dependencies • “Open Service Model” • Principles: GRID, CHAIMS • Protocols: UDDI, IETF SLP • Runtime: Globus, CPAM
Grid Life is Tough • Increased complexity throughout • New tools and applications • Diverse resources such as computers, storage media, networks, sensors • Programming • Control flow & data flow separation • Service mediation • Infrastructure • Resource discovery, brokering, monitoring • Security/authorization • Payment mechanisms
Our GRID Contributions • Programming models and tools • System architecture • Resource management • Instrumentation and performance analysis • Network protocols and infrastructure • Service mediation
Other GRID Research Areas • The nature of applications • Algorithms and problem-solving methods • Security, payment/escrow, reputation • End systems • …plus the contribution areas from the previous slide: programming models and tools, system architecture, resource management, instrumentation and performance analysis, network protocols and infrastructure, service mediation
Roadmap • Brief introduction to CLAM language • Some related scheduling methods • Surety-based scheduling • Sample program • Monitoring • Rescheduling • Results • A few future directions
CLAM Composition Language • Decomposition of the CALL statement • Parallelism by asynchrony in a sequential program • Reduced complexity of invoke statements • Control of new GRID requirements (estimation, trading, brokering, etc.) • Abstracts out data flow • Mediation for data flow control and optimization • Extraction model mediation • Purely compositional • No primitives for arithmetic • No primitives for input/output • Targets the “non-programmer expert”
CLAM Primitives • Pre-invocation • SETUP: set up the connection to a service • SETPARAM, GETPARAM: set/get parameters of a service • ESTIMATE: service cost estimation • Invocation and result gathering • INVOKE: start a method asynchronously • EXAMINE: test the progress of an invoked method • EXTRACT: extract results from an invoked method • Termination • TERMINATE: terminate a method invocation/connection to a service
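A minimal sketch of the primitive lifecycle above, modeled in Python. CLAM itself is a compositional language, not Python, and the service name, parameters, and return shapes here are hypothetical; the sketch only illustrates how the SETUP/ESTIMATE/INVOKE/EXAMINE/EXTRACT/TERMINATE sequence yields parallelism by asynchrony in a sequential program.

```python
# Hypothetical model of the CLAM primitive lifecycle (not real CLAM syntax).
import time

class ServiceHandle:
    """Stand-in for a connection to a remote, autonomous service."""
    def __init__(self, name):
        self.name, self.params, self.started = name, {}, None

def setup(service_name):                 # SETUP: open the connection
    return ServiceHandle(service_name)

def setparam(h, **kwargs):               # SETPARAM: set service parameters
    h.params.update(kwargs)

def estimate(h):                         # ESTIMATE: ask the service for a bid
    return {"fee": 40, "hours": 6}       # illustrative values only

def invoke(h, method):                   # INVOKE: start the method, don't block
    h.started = time.time()

def examine(h):                          # EXAMINE: poll progress
    return "DONE" if time.time() - h.started > 0.1 else "RUNNING"

def extract(h):                          # EXTRACT: fetch results
    return {"result": 42}

def terminate(h):                        # TERMINATE: release the service
    h.params.clear()

# Parallelism by asynchrony: INVOKE both services, then poll and EXTRACT.
a, b = setup("ServiceA"), setup("ServiceB")
setparam(a, input="dataset-1")
if estimate(a)["fee"] <= 50:             # bid fits the budget
    invoke(a, "run"); invoke(b, "run")   # both proceed concurrently
    while examine(a) != "DONE":
        time.sleep(0.01)
    print(extract(a))
    terminate(a); terminate(b)
```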
Resources + Scheduling • Computational Model • Multithreading • Automatic parallelization • Resource Management • Process creation • OS signal delivery • OS scheduling • Scope: end system
Resources + Scheduling • Computational Model • Synchronous communication • Distributed shared memory • Resource Management • Parallel process creation • Gang scheduling • OS-level signal propagation • Scope: cluster, end system
Resources + Scheduling • Computational Model • Client/server • Loosely synchronous: pipelines • IWIM • Resource Management • Resource discovery • Signal distribution networks • Scope: intranet, cluster, end system
Resources + Scheduling • Computational Model • Collaborative systems • Remote control • Data mining • Resource Management • Brokers • Trading • Mobile code negotiation • Scope: Internet, intranet, cluster, end system
Scheduling Difficulties • Adaptation: Repair and Reschedule • Schedules made at t0 are only guesses • Estimates for multiple stages may become invalid • ⇒ Schedules must be revised during runtime • [Timeline: work proceeds from t0; a hazard triggers a reschedule, and work continues to tfinish]
Scheduling Difficulties • Service Autonomy: No Resource Allocation • The scheduler does not handle resource allocation • Users observe resources without controlling them • This means competing objectives need orthogonal scheduling techniques • Changing goals for tasks or users vastly increases scheduling complexity
Some Related Work • Legend: R = rescheduling, M = monitoring execution, A = autonomy of services, Q = QoS / probabilistic execution • [Chart, built up over several slides: related systems classified by which of R, M, A, Q they address: PERT, CPM, ePERT (AT&T), Condor (Wisconsin), Mariposa (UCB), and SBS (Stanford, this work)]
Sample Program • [Diagram: a sample program composed of four services A, B, C, D]
Budgeting • Time • Maximum allowable execution time • Expense • Funding available to lease services • Surety • Goal: schedule probability of success • Assessment technique
Program Schedule as a Template • Instantiated at runtime • Service provider selection, etc. • [Diagram, animated across four slides: the abstract A/B/C/D template mapped onto concrete candidate service providers]
t0 Schedule Selection • Guided by runtime “bids” • Constrained by budget • [Diagram: bids against the template, e.g. A: 7±2h, $50; B: 6±1h, $40; C: 5±2h, $30; D: 3±1h, $30]
t0 Schedule Constraints • Budget • Time: upper bound, e.g. 22h • Cost: upper bound, e.g. $250 • Surety: lower bound, e.g. 90% • {Time, Cost, Surety} = {22, 250, 90} • Steered by user preferences/weights • <Time, Cost, Surety> = <10, 1, 5> • Selection: weighted slack against the budget, higher is better • S1est [20, 150, 90] = (22-20)*10 + (250-150)*1 + (90-90)*5 = 120 • S2est [22, 175, 95] = (22-22)*10 + (250-175)*1 + (95-90)*5 = 100 • S3est [18, 190, 96] = (22-18)*10 + (250-190)*1 + (96-90)*5 = 130
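A sketch of the selection arithmetic above, reproducing the slide's numbers. It assumes the score is a weighted sum of budget slack and that the highest-scoring candidate wins (here S3); the deck does not spell out the tie-breaking rule.

```python
# Weighted-slack scoring: budget {22h, $250, 90%}, weights <10, 1, 5>.
budget  = {"time": 22, "cost": 250, "surety": 90}
weights = {"time": 10, "cost": 1,  "surety": 5}

def score(est):
    """est = (time, cost, surety) estimate for one candidate schedule."""
    t, c, s = est
    return ((budget["time"] - t) * weights["time"]      # time slack
          + (budget["cost"] - c) * weights["cost"]      # cost slack
          + (s - budget["surety"]) * weights["surety"]) # surety margin

candidates = {"S1": (20, 150, 90), "S2": (22, 175, 95), "S3": (18, 190, 96)}
for name, est in candidates.items():
    print(name, score(est))                    # S1=120, S2=100, S3=130
best = max(candidates, key=lambda n: score(candidates[n]))
print("selected:", best)                       # S3, assuming highest wins
```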
Search Space • [Plot: candidate plans in the plane of expected program execution time vs. expected program cost; the time and cost budgets bound a feasible region, and user preferences pick a plan from the Pareto frontier within it]
Program Evaluation and Review Technique • Service times: most likely (m), optimistic (a) and pessimistic (b) • (1) expected duration (service): e = (a + 4m + b) / 6 • (2) standard deviation: σ = (b − a) / 6 • (3) expected duration (program): E = Σ eᵢ, with σE = √(Σ σᵢ²) • (4) test value: z = (deadline − E) / σE • (5) expectation test: surety = Φ(z) • (6) where Φ is the CDF of the standard normal N(0, 1)
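A sketch of this PERT arithmetic for a sequential chain of services, using Python's math.erf for the standard normal CDF. The three-point (a, m, b) estimates below are invented for illustration; they are not figures from the deck.

```python
import math

def pert(a, m, b):
    """Three-point PERT estimate: (expected duration, std deviation)."""
    return (a + 4 * m + b) / 6.0, (b - a) / 6.0

def phi(z):
    """CDF of the standard normal N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def surety(services, deadline):
    """P(program finishes by deadline), services run in sequence."""
    e = sum(pert(*s)[0] for s in services)                   # formula (3)
    sigma = math.sqrt(sum(pert(*s)[1] ** 2 for s in services))
    z = (deadline - e) / sigma                               # formula (4)
    return phi(z)                                            # formula (5)

# Hypothetical (optimistic, most likely, pessimistic) hours per service:
chain = [(5, 7, 9), (5, 6, 7), (3, 5, 7)]
print(f"surety at 22h: {surety(chain, 22):.1%}")
```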
t0 Complete Schedule Properties • Bank = $100 • [Plot: probability density of the probable program completion time, with the deadline and the user-specified surety marked]
Individual Service Properties • [Plot: per-service finish-time probability densities for A (7±2h), B (6±1h), and C (5±2h)]
t0 Combined Service Properties • [Plot: the combined finish-time density spans roughly 14–23h; against the 22h deadline the current surety is 99.6%, well above the required 90%]
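A Monte Carlo sketch of how such a combined finish-time density can be simulated from the per-service bids. It assumes the three services run strictly in sequence with normally distributed durations; under that reading the surety comes out near 91%, so the 99.6% on the slide presumably reflects the schedule's actual structure, which the deck does not spell out.

```python
import random, statistics

BIDS = [(7, 2), (6, 1), (5, 2)]   # (mean hours, std dev) for A, B, C
DEADLINE = 22

# Sample total finish times for a purely sequential A -> B -> C chain.
runs = [sum(random.gauss(mu, sd) for mu, sd in BIDS) for _ in range(100_000)]
print("mean finish:", round(statistics.mean(runs), 2))       # ~18h
print("surety:", sum(t <= DEADLINE for t in runs) / len(runs))
```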
Tracking Surety • [Plot: schedule surety (%) tracked over execution time against the user-specified minimum of 90%]
Runtime Hazards • With control over resource allocation, or without runtime hazards, scheduling becomes much easier • At runtime, hazards invalidate the t0 schedule • Sample hazards • Delays and slowdowns • Stoppages • Inaccurate estimations • Communication loss • Competitive displacement…
Progressive Hazard Definition + Detection • [Plot: surety % over execution time; after serviceA and serviceB start, a slow serviceB drags surety below the 90% minimum: a progressive hazard]
Catastrophic Hazard Definition + Detection • [Plot: surety % over execution time; when serviceB fails outright, surety drops to 0%: a catastrophic hazard]
Pseudo-Hazard Definition + Detection • [Plot: surety % over execution time; when communication with serviceB fails, the measured surety drops to 0% even though the service may still be running: a pseudo-hazard]
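A sketch of how the three hazard types on these slides might be distinguished by a monitor. The classifier and its inputs (current surety, service liveness, link status) are hypothetical; the deck defines the hazards but not a detection API.

```python
from enum import Enum

class Hazard(Enum):
    NONE = 0
    PROGRESSIVE = 1
    CATASTROPHIC = 2
    PSEUDO = 3

def classify(surety_pct, service_alive, link_up, minimum=90.0):
    """Hypothetical classifier for the three hazard types above."""
    if not link_up:                  # cannot observe the service at all:
        return Hazard.PSEUDO         # looks dead, may still be running
    if not service_alive:            # service failed outright
        return Hazard.CATASTROPHIC
    if surety_pct < minimum:         # alive but slow: surety eroding
        return Hazard.PROGRESSIVE
    return Hazard.NONE

print(classify(84.0, True, True))    # PROGRESSIVE (serviceB slow)
print(classify(0.0, False, True))    # CATASTROPHIC (serviceB fails)
print(classify(0.0, True, False))    # PSEUDO (communication failure)
```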
Monitoring + Repair • Observe, not control • Complete set of repairs • Sufficient (not minimal) • Simple cost model: early termination = linear cost recovery • Greedy selection of a single repair: O(s*r) for s services and r repair strategies (see the sketch below)
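A sketch of the O(s*r) greedy selection: score every (service, strategy) pair once and take the best feasible repair. The strategy names mirror the four strategies on the following slides, but their costs and surety gains are invented stand-ins, and "cheapest feasible" is one plausible greedy criterion, not necessarily the deck's.

```python
STRATEGIES = {            # strategy: (extra $ cost, surety gain in points)
    "baseline":  (0,  2),   # wait for self-recovery
    "replace":   (35, 8),   # terminate the hazarded service, start anew
    "duplicate": (50, 12),  # run a duplicate alongside the original
    "pushdown":  (5,  6),   # repair a downstream service instead
}

def choose_repair(hazarded_services, bank, needed_gain):
    """Greedy single repair: cheapest option restoring enough surety."""
    feasible = [(cost, svc, strat)
                for svc in hazarded_services            # s services ...
                for strat, (cost, gain) in STRATEGIES.items()  # x r repairs
                if cost <= bank and gain >= needed_gain]
    return min(feasible, default=None)   # cheapest feasible repair

print(choose_repair(["B"], bank=100, needed_gain=6))  # (5, 'B', 'pushdown')
```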
Schedule Repair • [Plot: surety % for the A/B/C/D schedule over execution time; surety falls below the 90% minimum at thazard and is restored at trepair]
Strategy 0: baseline (no repair) • pro: no additional $ cost • pro: ideal solution for partitioning hazards • con: depends on self-recovery
Strategy 1: service replacement • pro: reduces $ lost • con: loses the $ and time already invested • con: concedes the recovery chance • [Plot: hazarded service B is terminated and replaced by B′]
Strategy 2: service duplication • pro: larger surety boost; leverages the recovery chance • con: large $ cost • [Plot: a duplicate B′ runs alongside the hazarded service B]
Strategy 3: pushdown repair • pro: cheap, no $ lost • pro: no time lost • con: cannot handle catastrophic hazards • con: requires a recovery chance • [Plot: the repair is pushed downstream, replacing a later service C with C′ while the hazarded service B recovers]
Experimental Results • Rescheduling options • Baseline: no repairs • Single-strategy repairs (limits flexibility and effectiveness) • All strategies available • Setup • 1000 random DAG schedules, 2–10 services • 1–3 hazards per execution • Fixed service availability • All schedules are repairable
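A sketch of one way the experimental workload could be generated: random DAG schedules of 2–10 services with 1–3 injected hazards. The generation scheme (edges only from earlier to later services, fixed edge probability) is an assumption; the deck does not specify how its 1000 DAGs were drawn.

```python
import random

def random_dag_schedule(rng):
    """Random acyclic schedule matching the setup on this slide."""
    n = rng.randint(2, 10)                       # 2-10 services
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < 0.3]              # acyclic by construction
    hazards = rng.sample(range(n), k=min(n, rng.randint(1, 3)))
    return {"services": n, "edges": edges, "hazards": hazards}

rng = random.Random(42)                          # reproducible trials
trials = [random_dag_schedule(rng) for _ in range(1000)]
print(trials[0])
```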