
Predicting Queue Waiting Time in Batch Controlled Systems

Rich Wolski, Dan Nurmi, John Brevik, Graziano Obertelli. Computer Science Department, University of California, Santa Barbara.


Presentation Transcript


  1. Predicting Queue Waiting Time in Batch Controlled Systems. Rich Wolski, Dan Nurmi, John Brevik, Graziano Obertelli. Computer Science Department, University of California, Santa Barbara

  2. Problem: Predicting Delay in Batch Queues
     • Time in queue is experienced as application delay
     • Sounds like an easy problem, but
       • Distribution of load from users is a matter of some debate
       • Scheduling policy is partially hidden
       • Sites need to change the policies dynamically and without warning
       • Job execution times are difficult to predict
     • Much research in this area over the past 20 years, but few solutions
     • Current commercial systems provide high variance estimates
       • Most sites simply disable this feature

  3. Hard Problem

  4. For Scheduling: It’s all about the big Q
     • Predictions of the form
       • “What is the maximum time my job will wait with X% certainty?”
       • “What is the minimum time my job will wait with X% certainty?”
     • Requires two estimates if certainty is to be quantified
       • Estimate the (1-X) quantile for the distribution of availability => Qx
       • Estimate the upper or lower X% confidence bound on the statistic Qx => Q(x,lb)
     • If the estimates are unbiased, and the distribution is stationary, future availability duration will be larger than Q(x,lb) X% of the time, guaranteed
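
The two-step estimate on this slide (a quantile, then a confidence bound on that quantile) can be illustrated with a standard order-statistic argument. The following is a minimal sketch, not the authors' implementation: it assumes the observed waits are independent draws from one stationary, continuous distribution, and the names binomial_cdf and quantile_upper_bound are hypothetical. The number of observations falling below the true q-quantile is Binomial(n, q), so the k-th smallest observation is an upper bound on that quantile with probability P(Binomial(n, q) <= k-1).

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(Binomial(n, p) <= k), computed by direct summation."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def quantile_upper_bound(waits, q=0.95, confidence=0.95):
    """Upper confidence bound on the q-quantile of the wait distribution.

    Returns the smallest order statistic x_(k) such that
    P(x_(k) >= true q-quantile) >= confidence, or None when the
    history is too short to support any bound at this confidence.
    Assumes i.i.d. observations from a stationary distribution.
    """
    xs = sorted(waits)
    n = len(xs)
    for k in range(1, n + 1):
        # x_(k) is at or above the quantile when at most k-1 observations
        # fall below it; that count is Binomial(n, q).
        if binomial_cdf(k - 1, n, q) >= confidence:
            return xs[k - 1]
    return None
```

The direct summation is only practical for short histories; long histories need a more careful numerical treatment of the binomial terms.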

  5. New Predictive Methodology
     • New quantile estimator based on the Binomial distribution
       • Requires a carefully engineered numerical system to deal with large-scale combinatorics
     • New changepoint detector
       • Binomial method in a time series context is difficult
       • Need a system for determining
         • Stationary regions in the data
         • Minimum statistically meaningful history in each region
     • New clustering methodology (coming soon)
       • More accurate estimates are possible if predictions are made from jobs with similar characteristics
       • Takes dynamic policy changes into account more effectively
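
One way to read "minimum statistically meaningful history" is as the shortest sample for which a binomial upper bound exists at all. The sketch below rests on that assumption and is not taken from the slides; min_history is a hypothetical name. With n independent observations, the sample maximum lies at or above the q-quantile with probability 1 - q^n, so the smallest usable n satisfies 1 - q^n >= confidence.

```python
from math import ceil, log


def min_history(q=0.95, confidence=0.95):
    """Smallest n such that 1 - q**n >= confidence.

    With fewer observations, even the largest observed wait cannot
    serve as an upper bound on the q-quantile at this confidence.
    """
    n = ceil(log(1 - confidence) / log(q))
    # Guard against floating-point rounding right at the boundary.
    while 1 - q**n < confidence:
        n += 1
    return n
```

Under these assumptions, a 95% confidence bound on the 0.95 quantile needs at least 59 observations in the current stationary region.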

  6. Ten Years of Supercomputing

  7. See it In Action • http://pompone.cs.ucsb.edu/~rgarver/bqindex.php

  8. Predicting Things Upside Down
     • Deadline scheduling: My job needs to start in the next X seconds for the results to be meaningful.
       • Amitava Mujumdar, Tharaka Devaditha, Adam Birnbaum (SDSC)
       • Need to run a 4 minute image reconstruction that completes in the next 8 minutes
     • Given a
       • Machine
       • Queue
       • Processor count
       • Run time
       • Deadline
     • What is the probability that a job will meet the deadline?
     • http://pompone.cs.ucsb.edu/~rgarver/invbqueue.php
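
The inverted question can be sketched by searching over quantiles. This is one plausible reading, not the method behind the tool linked above: the job meets its deadline if it starts within the slack (deadline minus run time), and the reported probability is the largest quantile whose binomial upper bound still fits inside that slack. The function names, the 1% search granularity, and the i.i.d. assumption are all hypothetical.

```python
from math import comb


def upper_bound(sorted_waits, q, confidence):
    """Smallest order statistic that bounds the q-quantile from above
    with the given confidence, or None if the history is too short."""
    n = len(sorted_waits)
    cdf = 0.0
    for k in range(1, n + 1):
        # Accumulate P(Binomial(n, q) <= k-1) one term at a time.
        cdf += comb(n, k - 1) * q ** (k - 1) * (1 - q) ** (n - k + 1)
        if cdf >= confidence:
            return sorted_waits[k - 1]
    return None


def deadline_probability(waits, run_time, deadline, confidence=0.95):
    """Largest quantile (in 1% steps) whose upper bound on queue wait
    leaves enough time to finish a job of length run_time by deadline.
    All times are in the same unit. Returns 0.0 if no quantile fits."""
    slack = deadline - run_time
    xs = sorted(waits)
    best = 0.0
    for pct in range(1, 100):
        q = pct / 100
        bound = upper_bound(xs, q, confidence)
        # The bound never decreases as q grows, so stop at the first miss.
        if bound is None or bound > slack:
            break
        best = q
    return best
```

For the example on the slide, run_time would be 240 seconds and deadline 480 seconds, with waits drawn from the recent history of the chosen machine, queue, and processor count.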

  9. How Well Does it Work with an Application?
     [Figure: EMAN refinement workflow. Labels: Electron Micrograph, Particles, Preliminary 3D Model, Refine, Final 3D Model]
     EMAN has been developed at Baylor College of Medicine by the research group of Wah Chiu and Steven Ludtke ({wah,sludtke}@bcm.tmc.edu)

  10. VGrADS EMAN Batch Scheduler
     • EMAN emulator
       • Run the EMAN scheduler to determine a job launch sequence
       • Launch the jobs by submitting them to the queues specified by the scheduler
       • When an EMAN job acquires the processors, exit and “sleep” the emulator for the predicted execution time
         • Saves system allocation time
       • Record the overall makespan
     • Experiment:
       • Chicago TeraGrid, SDSC TeraGrid, NCSA TeraGrid and CNSI Dell at UCSB
       • 57 separate runs
     • Results: mean observed and mean predicted makespans are not significantly different at alpha = 0.05

  11. 95% Upper Bound on Median

  12. Clustering
     • RMS ratio of Binomial with Clustering to without
     • Both achieve 95% correctness
     • Measures “tightness” improvement through clustering
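
The two metrics named on this slide can be sketched as follows. The definitions are assumptions, since the slide does not spell them out: "correctness" is taken to be the fraction of jobs whose observed wait fell at or below the predicted bound, and the RMS ratio is taken to be the root-mean-square over-prediction with clustering divided by the same quantity without. All function names are hypothetical.

```python
from math import sqrt


def correctness(bounds, waits):
    """Fraction of jobs whose observed wait did not exceed its bound."""
    return sum(w <= b for b, w in zip(bounds, waits)) / len(waits)


def rms_error(bounds, waits):
    """Root-mean-square distance between predicted bound and observed wait."""
    return sqrt(sum((b - w) ** 2 for b, w in zip(bounds, waits)) / len(waits))


def rms_ratio(clustered_bounds, unclustered_bounds, waits):
    """Ratio below 1.0 means clustering gave tighter bounds on the same jobs."""
    return rms_error(clustered_bounds, waits) / rms_error(unclustered_bounds, waits)
```

A ratio is only meaningful when both predictors reach the target correctness, which is why the slide reports the two figures together.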

  13. Batch Queue Prediction for Grid Systems
     • A good point-valued prediction remains elusive
     • Grid users certainly can use bounds instead
       • Early job completion is okay, typically
       • Bounds give a good intuitive feel for which queue will be quickest
     • Automatic schedulers are coming
       • EMAN doesn’t use ranges…it should
       • VGrADS is developing new schedulers (workflow)
       • NEESGrid and ISI are in development (workflow)
       • Large-scale sensor network simulation

  14. What’s Next?
     • Open questions:
       • Does the availability of predictions affect load?
         • Rolling out production tools now and we will be monitoring
         • Job cancellation does not affect results
       • If it does, will allocations be stable?
         • Grid economies
         • Virtual resource reservations (VGrADS)
       • Conditional prediction and resubmission
       • Virtual Cluster??
     • Thanks
       • NSF SCI, VGrADS, SDSC, TACC
     • Us: rich@cs.ucsb.edu, nurmi@cs.ucsb.edu
