Scheduling Bag Of Tasks Under Budget Constraints. Ana-Maria Oprescu, Thilo Kielmann (Vrije University). Presented by Gal Cohen. Cloud Computing Seminar, CS Technion, Spring 2012. Bag Of Tasks: high throughput computing jobs, no interactive deadline, tasks are independent of each other.
Scheduling Bag Of Tasks Under Budget Constraints
Ana-Maria Oprescu, Thilo Kielmann (Vrije University)
Presented by Gal Cohen
Cloud Computing Seminar, CS Technion, Spring 2012
Bag Of Tasks
• High throughput computing jobs
• No interactive deadline
• Tasks are independent of each other
• All tasks are ready for execution
• Unknown runtimes
• Execution model:
  • Allocate resources (e.g. machines)
  • Run each task (once) from the bag on some machine
Assumptions: Bag Of Tasks
• The runtime distribution is unknown
• However, some distribution exists
• The total number of tasks is known
• Tasks can be aborted
Using cloud computing to run a bag of tasks: Abstractions
• There are many cloud providers (EC2, Azure, Rackspace, 3Tera)
• Each provider offers many machine types, at different prices, which differ in:
  • CPU count and speed
  • Memory size
• There is a (self-imposed) upper limit on the number of machines assignable from a provider
• A machine is charged per ATU (accountable time unit, here one hour)
Problem description
• The goal
  • Run all the tasks from a given bag on cloud machines within a limited budget
  • Minimize the makespan of the whole bag (without exceeding the budget constraint)
• Assumption
  • Each machine runs one task at a time, in FIFO order
Model Description
• The scheduler (BaTS) runs outside of the cloud (for free)
• The scheduler gets the bag of tasks
• It allocates machines from each cloud
• It dispatches tasks to the allocated machines
• It receives feedback on task completion
BaTS: Budget constrained task scheduler (Outline)
1. Pick a sampling set of tasks of size n
2. Pick initial workers from each machine type
3. Run a test set on each type of machine (in parallel)
4. Estimate the average task execution time for each type
5. Construct a configuration based on the estimates
6. Acquire machines and run tasks
7. At regular monitoring intervals, go back to step 5
Picking the sampling set size: confidence interval
• Error level, typical values: 0.10, 0.15, 0.20, 0.25
Picking the sampling set size
[Figure: required sample size (n) as a function of bag-of-tasks size (N)]
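The sample-size formula itself is not reproduced in this transcript. As a rough illustration of how n can depend on the bag size N and an error level, the sketch below assumes the textbook finite-population (Cochran) formula. That choice, the function name `sample_size`, and the defaults `z = 1.96` and `p = 0.5` are assumptions for illustration and may differ from what BaTS actually uses.

```python
import math


def sample_size(bag_size: int, error: float, z: float = 1.96, p: float = 0.5) -> int:
    """Sample size for a bag of `bag_size` tasks at a given error level.

    Assumption: Cochran's formula with finite-population correction.
    z is the z-score of the confidence level (1.96 ~ 95%), p the
    worst-case proportion. None of this is taken from the slides.
    """
    if bag_size <= 0 or not 0 < error < 1:
        raise ValueError("bag_size must be positive and error in (0, 1)")
    # Sample size for an infinite population.
    n0 = (z ** 2) * p * (1 - p) / error ** 2
    # Finite-population correction: small bags need fewer samples.
    n = n0 / (1 + (n0 - 1) / bag_size)
    return min(bag_size, math.ceil(n))


for err in (0.10, 0.15, 0.20, 0.25):
    print(err, sample_size(1000, err))
```

Under this assumed formula the required sample size levels off as N grows, which matches the general shape one would expect from a plot of n against N.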
Estimating the average task execution time for each machine type
• Estimate the runtime of each task that is still running as the average runtime of the completed tasks that took longer than the running task's elapsed time
• During the computation, update a moving average of the task execution time (in minutes) for each machine type
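The first rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the fallback used when no completed task ran longer than the running one (use the elapsed time as a lower bound) is an assumption.

```python
from statistics import mean


def estimate_running_task(elapsed, finished_runtimes):
    """Estimate the total runtime of a task that is still running.

    Uses the average runtime of completed tasks that ran longer than
    the time this task has already been running.
    """
    longer = [t for t in finished_runtimes if t > elapsed]
    # Assumption: with no longer completed task, fall back to the
    # elapsed time, which is a lower bound on the real runtime.
    return mean(longer) if longer else elapsed


def average_task_time(finished_runtimes, running_elapsed):
    """Average task time (minutes) over finished and running tasks."""
    estimates = list(finished_runtimes) + [
        estimate_running_task(e, finished_runtimes) for e in running_elapsed
    ]
    if not estimates:
        raise ValueError("no tasks observed yet")
    return mean(estimates)


# Four finished tasks and one task that has been running for 5 minutes.
print(average_task_time([2, 4, 6, 10], [5]))
```

Ignoring running tasks would bias the average downward, because short tasks finish first; including an estimate for them counteracts that bias.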
Construct a configuration based on estimates
• We need to decide the number of machines to use from each type
• We want to minimize the makespan
• While not exceeding the budget B, given the ATU cost of a machine of each type i
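The formulas for this slide are not reproduced in the transcript. The sketch below shows one plausible way to estimate the makespan and cost of a candidate configuration. It assumes that machines of type i each complete one task every `avg_task_minutes`, that throughput adds up across machines, and that every machine is paid for whole ATUs of 60 minutes; the names `estimate` and `ATU_MINUTES` are hypothetical.

```python
import math

ATU_MINUTES = 60  # assumption: one ATU is one hour


def estimate(config, n_tasks):
    """Estimate (makespan in minutes, total cost) of a configuration.

    config: list of (machines, avg_task_minutes, cost_per_atu) tuples,
    one per machine type.
    """
    # Tasks completed per minute by all machines together.
    throughput = sum(a / t for a, t, _ in config)
    if throughput <= 0:
        raise ValueError("configuration has no machines")
    makespan = n_tasks / throughput
    # Every machine is charged for each started ATU.
    atus = math.ceil(makespan / ATU_MINUTES)
    cost = atus * sum(a * c for a, _, c in config)
    return makespan, cost


# 10 fast machines (5 min/task, 1.0 per ATU) and 5 slow ones (10 min/task, 0.5 per ATU).
makespan, cost = estimate([(10, 5.0, 1.0), (5, 10.0, 0.5)], n_tasks=1000)
print(makespan, cost)  # 400.0 87.5
```

A configuration is acceptable when the returned cost does not exceed the budget B; among acceptable configurations the one with the smallest makespan is preferred.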
Construct a configuration based on estimates (cont.)
• The problem is recast as maximizing an objective subject to constraints
• It is solved as a Bounded Knapsack Problem (BKP)
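The objective and constraints are missing from the transcript. A plausible reading, sketched below, treats each machine type as a knapsack item whose weight is its cost per ATU, whose value is its speed (tasks per minute), and whose multiplicity is bounded by the per-type machine limit. The integer budget per ATU, the function name `best_configuration`, and the choice of speed as the value are all assumptions.

```python
def best_configuration(types, budget_per_atu):
    """Bounded knapsack over machine types.

    types: list of (cost_per_atu, avg_task_minutes, max_machines),
    with positive integer costs.
    Returns (total_speed, machines_per_type) maximizing the total
    speed without spending more than budget_per_atu per ATU.
    """
    # best[b] = best (speed, counts) found with at most b budget units.
    best = [(0.0, [0] * len(types)) for _ in range(budget_per_atu + 1)]
    for i, (cost, task_minutes, limit) in enumerate(types):
        speed = 1.0 / task_minutes
        new_best = list(best)
        for b in range(budget_per_atu + 1):
            # Try taking k machines of type i on top of the previous table.
            for k in range(1, limit + 1):
                if k * cost > b:
                    break
                prev_speed, prev_counts = best[b - k * cost]
                candidate = prev_speed + k * speed
                if candidate > new_best[b][0]:
                    counts = list(prev_counts)
                    counts[i] = k
                    new_best[b] = (candidate, counts)
        best = new_best
    return best[budget_per_atu]


# Fast type: cost 3, 5 min/task, at most 4. Slow type: cost 1, 20 min/task, at most 10.
speed, counts = best_configuration([(3, 5.0, 4), (1, 20.0, 10)], budget_per_atu=10)
print(counts)  # [3, 1]
```

In this example three fast machines use 9 of the 10 budget units and the last unit buys one slow machine, which beats any mix with fewer fast machines.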
Refining the initial configuration
• Continuous monitoring is needed:
  • The configuration was decided based on estimates of average speeds, which might not be accurate
  • The estimated speed of a machine type converges during the run
  • The estimated budget and makespan neglect startup time
  • Machines start their ATUs at different times, so we can’t monitor just before an ATU ends
Refining the initial configuration (cont.)
• Thus, BaTS continuously tries to avoid budget violations
• In theory it’s easy: as the execution continues, both the bag and the budget get smaller
• The trouble is estimating the size of the bag at a given moment (some machines will finish their current task before their ATU ends)
Refining the initial configuration (cont.)
• For every type i, we maintain a list of all machines that participated at some point in the computation
• For every machine we remember:
  • The number of executed tasks
  • The total uptime
Refining the initial configuration (cont.)
For each machine:
• The total uptime after executing its tasks so far
• The machine speed
• The remaining unused time of the current ATU
• The expected number of future tasks executed by the machine
• The number of tasks still to be paid for
Refining the initial configuration (cont.)
• The potential number of executed tasks is computed using the remaining time from the previous ATU that was not large enough for a whole task
Refining the initial configuration (cont.)
• A budget violation is prevented by checking a budget condition
• If the condition does not hold, BaTS uses the remaining budget and the remaining tasks to compute a new, slower and cheaper configuration
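The exact condition is not reproduced in the transcript. The sketch below shows one way such a check could work, using the per-machine bookkeeping described earlier (executed tasks and total uptime). The `Machine` class, the `within_budget` function, the 60-minute ATU, and the cost model (all machines kept for whole extra ATUs) are assumptions for illustration.

```python
import math
from dataclasses import dataclass

ATU_MINUTES = 60.0  # assumption: one ATU is one hour


@dataclass
class Machine:
    executed_tasks: int
    uptime_minutes: float
    cost_per_atu: float

    def minutes_per_task(self) -> float:
        # Observed speed; infinite if the machine has finished nothing yet.
        if self.executed_tasks == 0:
            return math.inf
        return self.uptime_minutes / self.executed_tasks

    def paid_minutes_left(self) -> float:
        # Unused part of the ATU that has already been paid for.
        used = self.uptime_minutes % ATU_MINUTES
        return ATU_MINUTES - used if used else 0.0


def within_budget(machines, tasks_left, budget_left) -> bool:
    """Check whether the remaining tasks fit in the remaining budget."""
    # Tasks expected to finish inside time that is already paid for.
    free = sum(
        math.floor(m.paid_minutes_left() / m.minutes_per_task()) for m in machines
    )
    unpaid_tasks = max(0, tasks_left - free)
    if unpaid_tasks == 0:
        return True
    throughput = sum(1.0 / m.minutes_per_task() for m in machines)
    if throughput == 0:
        return False
    extra_atus = math.ceil(unpaid_tasks / throughput / ATU_MINUTES)
    extra_cost = extra_atus * sum(m.cost_per_atu for m in machines)
    return extra_cost <= budget_left


machines = [Machine(10, 50.0, 1.0), Machine(5, 50.0, 0.5)]
print(within_budget(machines, tasks_left=33, budget_left=3.0))  # True
print(within_budget(machines, tasks_left=33, budget_left=2.0))  # False
```

When the check fails, the scheduler would recompute a configuration from the remaining budget and tasks rather than continue with the current one.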
BaTS Algorithm
• Compute n = sample size
• Construct initial config C, acquire machines
• While bag has tasks do
  • Wait for any machine M to ask for work
  • If M returned result of task T
    • Update stats for machine M
    • Update the moving average for M’s type
    • If sample set tasks for M’s type finished
      • Update cluster stats for M’s type
  • If (monitoring interval || first cluster stats ready)
    • Compute estimates
    • If (constraint violation || first cluster stats ready)
      • Call BKP to compute a new config, acquire/release machines
  • Send M a random task T’, remove T’ from the bag
Evaluation
• Emulating 2 clouds with 32 identical machines each
• tasks, sample size
• Normal distribution of task lengths
Evaluation • “Machine speed” in each “cloud” was simulated according to 5 scenarios:
Evaluation
In each scenario, comparing round-robin (RR) to BaTS:
• RR always uses 32+32 machines
• The BaTS initial configuration is 30+30 machines, with one of two budgets:
  • Budget B = the cost of running RR for that scenario
  • Budget B = the cost of running only on the most “profitable” machine type (computed offline)
Conclusions
• BaTS helps choose the cloud resources suitable for an application
• BaTS helps schedule within budget while still performing reasonably well
Conclusions
• Limitations
• The provided tests “cheat” because the number of machines is very small
• The “tail phase” is not handled well (the “faster” machines will be released before the “slow” ones)
• Guessing a proper budget
• Actual bags on actual clouds
• What about data transfer costs?
• Storage constraints?
• Other metric: maximize the profitability (or minimize the budget) while not exceeding a given makespan