Predicting Queue Waiting Time in Batch Controlled Systems
Rich Wolski, Dan Nurmi, John Brevik, Graziano Obertelli
Computer Science Department
University of California, Santa Barbara
Problem: Predicting Delay in Batch Queues
• Time in queue is experienced as application delay
• Sounds like an easy problem, but
  • Distribution of load from users is a matter of some debate
  • Scheduling policy is partially hidden
  • Sites need to change the policies dynamically and without warning
  • Job execution times are difficult to predict
• Much research in this area over the past 20 years, but few solutions
  • Current commercial systems provide high-variance estimates
  • Most sites simply disable this feature
For Scheduling: It’s all about the big Q
• Predictions of the form
  • “What is the maximum time my job will wait with X% certainty?”
  • “What is the minimum time my job will wait with X% certainty?”
• Requires two estimates if certainty is to be quantified
  • Estimate the (1-X) quantile for the distribution of availability => Qx
  • Estimate the upper or lower X% confidence bound on the statistic Qx => Q(x,lb)
• If the estimates are unbiased and the distribution is stationary, future availability duration will be larger than Q(x,lb) X% of the time, guaranteed
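A minimal Python sketch of the two-estimate idea for the “maximum wait with X% certainty” form, assuming wait times are i.i.d. draws from a stationary, continuous distribution. The k-th smallest of n observed waits lies at or above the true q quantile with probability P(Binomial(n, q) <= k - 1), so the smallest k whose probability reaches the confidence level gives an upper confidence bound on that quantile. This is the standard nonparametric order-statistic bound; the function name, defaults, and log-space summation are illustrative choices, not the authors' production code.

```python
import math
from typing import Iterable, Optional


def quantile_upper_bound(samples: Iterable[float], q: float = 0.95,
                         conf: float = 0.95) -> Optional[float]:
    """Return a conf-level upper confidence bound on the q quantile of `samples`.

    The k-th smallest of n i.i.d. samples lies at or above the true q quantile
    with probability P(Binomial(n, q) <= k - 1). We return the smallest order
    statistic whose probability reaches `conf`, or None if even the largest
    sample does not (the history is too short).
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    xs = sorted(samples)
    n = len(xs)
    log_q, log_1mq = math.log(q), math.log1p(-q)
    log_n_fact = math.lgamma(n + 1)
    cdf = 0.0  # running value of P(Binomial(n, q) <= j)
    for j in range(n):
        # Binomial pmf at j, computed in log space so large n does not overflow;
        # terms far in the left tail underflow harmlessly to 0.0.
        log_pmf = (log_n_fact - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_q + (n - j) * log_1mq)
        cdf += math.exp(log_pmf)
        if cdf >= conf:
            return xs[j]  # order statistic X_(k) with k = j + 1
    return None


# Hypothetical history of queue waits, in seconds (100 samples).
waits = [30, 45, 60, 120, 300, 600, 900, 1800, 3600, 7200] * 10
print(quantile_upper_bound(waits))  # 7200 for this history
```

With 100 samples the 95%/95% bound lands on the 99th smallest wait; with fewer than 59 samples no order statistic qualifies, so the function returns None.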
New Predictive Methodology
• New quantile estimator based on the Binomial distribution
  • Requires a carefully engineered numerical system to deal with large-scale combinatorics
• New changepoint detector
  • Applying the Binomial method in a time-series context is difficult
  • Need a system to determine
    • Stationary regions in the data
    • Minimum statistically meaningful history in each region
• New clustering methodology (coming soon)
  • More accurate estimates are possible if predictions are made from jobs with similar characteristics
  • Takes dynamic policy changes into account more effectively
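The slides do not spell out the changepoint detector, so what follows is only a hedged sketch of the two quantities it has to produce, under the simplifying assumption that each prediction is an independent Bernoulli trial that succeeds with probability q. The minimum history is the smallest n for which the largest of n samples already bounds the q quantile at confidence C (that is, 1 - q^n >= C). A run of r consecutive misses then has probability (1 - q)^r, so a run whose probability falls below a threshold alpha is treated as evidence of a changepoint, after which history would be trimmed back to the minimum. All function names and the alpha default are hypothetical, and real queue traces are autocorrelated, so the independence assumption is optimistic.

```python
import math
from typing import List, Sequence


def min_history(q: float = 0.95, conf: float = 0.95) -> int:
    """Smallest n with 1 - q**n >= conf, i.e. n >= log(1 - conf) / log(q)."""
    return math.ceil(math.log(1.0 - conf) / math.log(q))


def misses_for_changepoint(q: float = 0.95, alpha: float = 0.01) -> int:
    """Smallest run length r of consecutive misses with (1 - q)**r < alpha."""
    if not (0.0 < q < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("q and alpha must lie strictly between 0 and 1")
    r = 1
    while (1.0 - q) ** r >= alpha:
        r += 1
    return r


def detect_changepoints(misses: Sequence[bool], q: float = 0.95,
                        alpha: float = 0.01) -> List[int]:
    """Indices at which a run of consecutive misses becomes improbable."""
    needed = misses_for_changepoint(q, alpha)
    run, points = 0, []
    for i, missed in enumerate(misses):
        run = run + 1 if missed else 0
        if run >= needed:
            points.append(i)
            run = 0  # history is trimmed here, so start a fresh count
    return points


print(min_history())             # 59 samples for a 95%/95% bound
print(misses_for_changepoint())  # 2 consecutive misses at q = 0.95, alpha = 0.01
```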
See It in Action
• http://pompone.cs.ucsb.edu/~rgarver/bqindex.php
Predicting Things Upside Down
• Deadline scheduling: my job needs to start in the next X seconds for the results to be meaningful
  • Amitava Mujumdar, Tharaka Devaditha, Adam Birnbaum (SDSC)
  • Need to run a 4-minute image reconstruction that must complete within the next 8 minutes
• Given a
  • Machine
  • Queue
  • Processor count
  • Run time
  • Deadline
• What is the probability that a job will meet the deadline?
• http://pompone.cs.ucsb.edu/~rgarver/invbqueue.php
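One way to answer the deadline question, sketched under stated assumptions rather than as a description of the service at the URL above: the job must start within a slack of deadline minus run time (8 - 4 = 4 minutes, or 240 s, in the SDSC example), and the reported probability is the largest quantile q on a grid whose upper confidence bound on queue wait still fits within that slack. `waits` is assumed to be a history already filtered for the given machine, queue, and processor count; the function names, grid step, and confidence default are hypothetical.

```python
import math
from typing import Optional, Sequence


def _upper_bound(xs: Sequence[float], q: float, conf: float) -> Optional[float]:
    """conf-level upper bound on the q quantile of sorted samples xs (None if too few)."""
    n = len(xs)
    cdf = 0.0  # running value of P(Binomial(n, q) <= j), summed in log space
    for j in range(n):
        cdf += math.exp(math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                        + j * math.log(q) + (n - j) * math.log1p(-q))
        if cdf >= conf:
            return xs[j]
    return None


def prob_meets_deadline(waits: Sequence[float], runtime: float, deadline: float,
                        conf: float = 0.95, step: float = 0.01) -> float:
    """Largest grid quantile q whose wait bound fits in the slack (a conservative probability)."""
    slack = deadline - runtime  # latest acceptable start time
    if slack < 0:
        return 0.0
    xs = sorted(waits)
    best = 0.0
    for i in range(1, round(1.0 / step)):
        q = i * step
        bound = _upper_bound(xs, q, conf)
        # The bound never decreases as q grows, so the first miss ends the search.
        if bound is None or bound > slack:
            break
        best = q
    return best


# Hypothetical history of waits (seconds); 4-minute job, 8-minute deadline.
history = [10 * i for i in range(1, 101)]
print(prob_meets_deadline(history, runtime=240, deadline=480))
```

In this hypothetical history 24% of past waits were at most 240 s, but the reported probability is lower because it is backed by a 95% confidence bound rather than the raw empirical fraction.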
How Well Does It Work with an Application?
[Figure: EMAN workflow diagram. Labels: Electron Micrograph, Particles, Preliminary 3D Model, Refine, Final 3D Model]
EMAN has been developed at Baylor College of Medicine by the research group of Wah Chiu and Steven Ludtke ({wah,sludtke}@bcm.tmc.edu)
VGrADS EMAN Batch Scheduler
• EMAN emulator
  • Run the EMAN scheduler to determine a job launch sequence
  • Launch the jobs by submitting them to the queues specified by the scheduler
  • When an EMAN job acquires the processors, exit and “sleep” the emulator for the predicted execution time
    • Saves system allocation time
  • Record the overall makespan
• Experiment:
  • Chicago TeraGrid, SDSC TeraGrid, NCSA TeraGrid, and CNSI Dell at UCSB
  • 57 separate runs
• Results: mean observed and mean predicted makespans are not significantly different at alpha = 0.05
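The slide does not say which test produced the alpha = 0.05 result. A paired t-test on per-run (observed, predicted) makespans is one natural choice, sketched here with only the standard library. The function names are hypothetical, and the default critical value of about 2.003 is the two-sided 5% point of Student's t with 56 degrees of freedom, which matches 57 paired runs; a different number of runs needs a different critical value.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for per-run differences d = observed - predicted."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired runs of equal length")
    diffs = [o - p for o, p in zip(observed, predicted)]
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0.0:
        return 0.0 if mean_d == 0.0 else math.copysign(math.inf, mean_d)
    return mean_d / (sd_d / math.sqrt(len(diffs)))


def not_significantly_different(observed: Sequence[float], predicted: Sequence[float],
                                t_crit: float = 2.003) -> bool:
    """Two-sided test at alpha = 0.05; t_crit of about 2.003 assumes 57 runs (56 df)."""
    return abs(paired_t_statistic(observed, predicted)) < t_crit
```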
Clustering
• RMS ratio of Binomial bounds with clustering to Binomial bounds without clustering
• Both achieve 95% correctness
• Measures “tightness” improvement through clustering
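The slide's metric is not defined precisely, so this sketch assumes one reading: for each job, the over-prediction is the bound minus the observed wait, the RMS of that gap measures tightness, and the ratio of the clustered RMS to the unclustered RMS falls below 1 when clustering tightens the bounds. Correctness is taken to be the fraction of jobs whose observed wait did not exceed its bound. Function names and data are hypothetical.

```python
import math
from typing import Sequence


def correctness(bounds: Sequence[float], actuals: Sequence[float]) -> float:
    """Fraction of jobs whose observed wait did not exceed the predicted bound."""
    if len(bounds) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(a <= b for b, a in zip(bounds, actuals)) / len(actuals)


def rms_gap(bounds: Sequence[float], actuals: Sequence[float]) -> float:
    """Root-mean-square distance between bound and observed wait (smaller = tighter)."""
    if len(bounds) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")
    return math.sqrt(sum((b - a) ** 2 for b, a in zip(bounds, actuals)) / len(actuals))


def rms_ratio(bounds_clustered: Sequence[float], bounds_plain: Sequence[float],
              actuals: Sequence[float]) -> float:
    """Below 1.0 means clustering produced tighter bounds for the same jobs."""
    return rms_gap(bounds_clustered, actuals) / rms_gap(bounds_plain, actuals)
```

The ratio is only meaningful when both methods reach comparable correctness, which is why the slide pairs it with the 95% figure.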
Batch Queue Prediction for Grid Systems
• A good point-valued prediction remains elusive
• Grid users can certainly use bounds instead
  • Early job completion is typically okay
  • Bounds give a good intuitive feel for which queue will be quickest
• Automatic schedulers are coming
  • EMAN doesn’t use ranges…it should
  • VGrADS is developing new schedulers (workflow)
  • NEESGrid and ISI are in development (workflow)
  • Large-scale sensor network simulation
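As a sketch of how a scheduler might turn bounds into a queue choice, assume it has, for each candidate queue, an upper bound on queue wait (at a fixed quantile and confidence) and a run-time estimate for that machine. It can then pick the queue whose bound on start time plus run time is smallest, skipping queues whose history is too short to yield a bound. The `QueueOption` type, its fields, and the example queue names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class QueueOption:
    name: str
    wait_bound_s: Optional[float]  # upper bound on queue wait; None if history too short
    runtime_s: float               # estimated run time on that machine


def quickest_queue(options: Sequence[QueueOption]) -> Optional[QueueOption]:
    """Queue with the smallest bound on completion time (wait bound + run time)."""
    usable = [o for o in options if o.wait_bound_s is not None]
    if not usable:
        return None
    return min(usable, key=lambda o: o.wait_bound_s + o.runtime_s)


choice = quickest_queue([
    QueueOption("machineA/normal", 3600.0, 600.0),
    QueueOption("machineB/normal", 1200.0, 1800.0),
    QueueOption("machineC/debug", None, 300.0),
])
print(choice.name if choice else "no usable queue")  # machineB/normal
```

With 95% bounds, a job placed this way should start no later than its bound about 95% of the time, and an early start is typically harmless.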
What’s Next?
• Open questions:
  • Does the availability of predictions affect load?
    • We are rolling out production tools now and will be monitoring
    • Job cancellation does not affect results
  • If it does, will allocations be stable?
  • Grid economies
• Virtual resource reservations (VGrADS)
  • Conditional prediction and resubmission
  • Virtual cluster?
• Thanks
  • NSF SCI, VGrADS, SDSC, TACC
• Us: rich@cs.ucsb.edu, nurmi@cs.ucsb.edu