
Enabling Cost-Effective Resource Leases with Virtual Machines

Enabling Cost-Effective Resource Leases with Virtual Machines. HPDC 2007 Hot Topics Session. Borja Sotomayor University of Chicago borja@cs.uchicago.edu. Kate Keahey Argonne National Laboratory/ University of Chicago keahey@mcs.anl.gov. Ian Foster


Presentation Transcript


  1. Enabling Cost-Effective Resource Leases with Virtual Machines HPDC 2007 Hot Topics Session Borja Sotomayor University of Chicago borja@cs.uchicago.edu Kate Keahey Argonne National Laboratory/University of Chicago keahey@mcs.anl.gov Ian Foster Argonne National Laboratory/University of Chicago keahey@mcs.anl.gov Tim Freeman Argonne National Laboratory/University of Chicago tfreeman@mcs.anl.gov

  2. Motivation • Leasing resources for short periods of time can be of great value to many applications: workflows, real-time applications, and applications requiring resource co-scheduling. • Leasing semantics: the glidein approach (Condor glideins, MyCluster, and Falkon); advance reservations (meta-scheduling, deadlines, demos). • Utilization problems. • We argue that virtualization can make resource leasing cost-effective, despite the overhead of using VMs, thus: providing an incentive for resource providers to allow short-term leasing of resources, and creating an opportunity for scientific applications (resource consumers) that require multi-level scheduling.

  3. Approach • Separate resource provisioning from execution management. • Resource provisioning is handled by a new component called the Lease Manager. • Execution management can continue to be handled by a site's current scheduler (PBS/Maui, SGE, Condor, ...). • All provisioning is handled through VMs, including provisioning resources for a batch job. • Use the VMs' suspend/resume mechanisms to backfill and suspend non-interactive/batch applications.

  4. [Figure: schedules across Node 1, Node 2, and Node 3, comparing scheduling the short-term lease without using virtualization against scheduling the short-term lease using virtualization.]
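
The contrast in slides 3 and 4 can be illustrated with a minimal sketch. It is not the authors' scheduler: it assumes a single node, one batch job submitted at time 0, and one advance reservation occupying the interval from `ar_start` to `ar_end` (all in minutes). The function names, the `suspend_resume_cost` parameter, and the sample numbers are hypothetical; the 10% runtime overhead is the figure assumed in the slides.

```python
def finish_without_vm(duration, ar_start, ar_end):
    """Completion time when the job cannot be suspended.

    The job must either fit entirely before the reservation or wait
    until the reservation ends and then run from scratch.
    """
    if duration <= ar_start:
        return duration
    return ar_end + duration


def finish_with_vm(duration, ar_start, ar_end,
                   runtime_overhead=0.10, suspend_resume_cost=0.0):
    """Completion time when the job runs in a VM that can be suspended.

    The job runs until the reservation starts, is suspended for the
    length of the reservation, and resumes afterwards. The VM makes the
    job slower by `runtime_overhead` (assumed 10%, as in the slides).
    `suspend_resume_cost` is a hypothetical extra cost, zero by default.
    """
    work = duration * (1 + runtime_overhead)
    if work <= ar_start:
        return work
    remaining = work - ar_start
    return ar_end + suspend_resume_cost + remaining


if __name__ == "__main__":
    # Hypothetical numbers: a 60-minute job, reservation from minute 30 to 45.
    print("no VM:", finish_without_vm(60, 30, 45))
    print("VM   :", finish_with_vm(60, 30, 45))
```

In this toy case the non-virtualized job leaves the node idle for the 30 minutes before the reservation, while the suspendable VM uses that time, which is why it can finish earlier despite the runtime overhead.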

  5. Experiment Setting • Simulated testbed of 8 nodes connected by a 100 Mbps network, such that at most two VMs can run simultaneously on one node. • We consider the best and worst cases. • Traces: artificial traces combining serial batch requests and advance reservations (ARs), which would require 10 h to run on the testbed (assuming perfect utilization). • VM runtime overhead is assumed to be 10%. • Experiments

  6. Experiment I • Is using VMs for suspend/resume backfill worth the overhead? • Assumption: we are using only one VM image. • Prototype scheduler supporting batch serial requests and advance reservations, using backfilling or suspend/resume to plan around the ARs. • A Resource Management Model for VM-Based Virtual Workspaces, B. Sotomayor, Master's paper, University of Chicago, February 2007.

  7. Best-case trace • Trace characteristics: duration of batch requests: avg = 15 min; AR resource consumption: 75% - 100%; proportion of Batch/AR: 75%/25%. • Benefits from suspend/resume because the large number of relatively long batch requests limits the efficiency of backfilling.

  8. One Image (best case) Baseline Not using VMs (no runtime overhead) and backfilling instead of suspend/resume

  9. One Image (best case) Add Runtime Overhead Running inside a VM adds runtime overhead, but not a big hit since images are predeployed.

  10. One Image (best case) Use Suspend/Resume Allows for better resource utilization than backfilling, even better than baseline (because of long batch requests)

  11. Worst-case trace • Same as the previous trace, but with shorter batch requests (avg = 5 minutes). • This also means there are more batch requests, since the total running time of the trace is still 10 h. • With a large number of relatively short requests, backfilling is already very effective, and little is gained from suspend/resume. • Furthermore, many more images have to be deployed in this case, which increases the preparation overhead.

  12. One Image (worst case) Baseline Not using VMs (no runtime overhead) and backfilling instead of suspend/resume

  13. One Image (worst case) Add Runtime Overhead Running inside a VM adds runtime overhead, but not a big hit since images are predeployed.

  14. One Image (worst case) Use Suspend/Resume Doesn't provide any significant advantage over backfilling because of short batch requests.

  15. Experiment II • How much do we pay for the added flexibility of operating in multiple virtualized environments? • Assumption: we are using multiple images. • The scheduler also has application-specific knowledge (i.e., it knows it is scheduling VMs), so it can also schedule timely VM image transfers. • Image reuse strategies: realistically, not all images will be different. • Modification of Experiment I: use 37 possible 600 MB VM images; 7 images account for 70% of requests.
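
To get a feel for the deployment overhead at stake, here is a back-of-the-envelope sketch of the time needed to move one image across the testbed network. It assumes 1 MB = 10^6 bytes, 1 Mbps = 10^6 bits per second, a transfer that gets the full link to itself, and no protocol overhead; none of these assumptions are stated in the slides, and the function name is illustrative.

```python
def transfer_seconds(image_mb, link_mbps):
    """Ideal time to move an image of `image_mb` megabytes over a link
    of `link_mbps` megabits per second (8 bits per byte, no overhead)."""
    return image_mb * 8 / link_mbps


if __name__ == "__main__":
    seconds = transfer_seconds(600, 100)  # 600 MB image, 100 Mbps network
    print(f"one image transfer: {seconds:.0f} s")

    # Compare with the average batch request durations in the two traces.
    for label, avg_minutes in (("best case", 15), ("worst case", 5)):
        share = seconds / (avg_minutes * 60)
        print(f"{label}: transfer is {share:.1%} of an average request")
```

Under these idealized assumptions a single transfer takes 48 s, a small fraction of a 15-minute request but a noticeably larger fraction of a 5-minute one. Concurrent transfers sharing the link would take longer.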

  16. Multiple Images (best case) Baseline Not using VMs (no runtime overhead) and backfilling instead of suspend/resume

  17. Multiple Images (best case) Transferring images Adds deployment overhead which delays starting time of batch requests.

  18. Multiple Images (best case) Adding Runtime Overhead Makes running time even larger

  19. Multiple Images (best case) Use Suspend/Resume Better resource utilization compensates for deployment overhead.

  20. Multiple Images (best case) Image Reuse Improves performance slightly.
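
The slides do not describe the reuse strategy itself. As a rough illustration of why reuse can help with this request mix, the sketch below simulates a single node that keeps the most recently used images on local disk (an LRU cache, which is an assumption and not necessarily the strategy the authors used). The request distribution follows slide 15: 37 images, 7 of which account for 70% of requests; requests are assumed to be spread uniformly within each group. All names are hypothetical.

```python
import random
from collections import OrderedDict

NUM_IMAGES = 37        # from slide 15
NUM_POPULAR = 7        # from slide 15
POPULAR_SHARE = 0.70   # from slide 15


def draw_image(rng):
    """Pick an image id: 70% of requests go to the 7 popular images,
    the rest to the other 30 (uniform within each group: an assumption)."""
    if rng.random() < POPULAR_SHARE:
        return rng.randrange(NUM_POPULAR)
    return rng.randrange(NUM_POPULAR, NUM_IMAGES)


def reuse_hit_rate(num_requests, cache_slots, seed=0):
    """Fraction of requests whose image is already on the node, for a
    node that keeps the `cache_slots` most recently used images."""
    rng = random.Random(seed)
    cache = OrderedDict()  # image id -> True, ordered oldest to newest
    hits = 0
    for _ in range(num_requests):
        image = draw_image(rng)
        if image in cache:
            hits += 1
            cache.move_to_end(image)       # mark as most recently used
        else:
            cache[image] = True            # image had to be transferred
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict least recently used
    return hits / num_requests


if __name__ == "__main__":
    for slots in (0, 3, 7, 15, 37):
        print(f"{slots:2d} cached images -> hit rate "
              f"{reuse_hit_rate(10_000, slots):.2f}")
```

Every hit is a request that avoids an image transfer, so a skewed request mix lets even a small per-node cache remove a meaningful share of the deployment overhead.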

  21. Multiple Images (worst case) Baseline Not using VMs (no runtime overhead) and backfilling instead of suspend/resume

  22. Multiple Images (worst case) Transferring images Adds deployment overhead which delays starting time of batch requests.

  23. Multiple Images (worst case) Adding Runtime Overhead Relatively small performance hit (the least of our concerns here)

  24. Multiple Images (worst case) Use Suspend/Resume Doesn't improve significantly over backfilling, which already does a good job thanks to the presence of small batch requests

  25. Multiple Images (worst case) Image Reuse Compensates for deployment overhead. Still not as good as baseline, but relatively small difference

  26. Conclusions • Using virtualization can make short-term leasing with interesting semantics cost-effective, even in the presence of runtime overhead. • Given reasonable strategies for managing deployment overhead, the cost of using multiple images is acceptable. • However, only artificial stress traces have been used so far. • Preliminary results with real traces suggest that short-term leases can be integrated into real workloads and still be cost-effective (we will release these results as soon as they're solid).

  27. Ongoing Work • Develop a better scheduler • Handle parallel batch submissions • Integrate this virtualized resource manager with existing LRM • This work is our top-down effort • We also have a bottom-up effort • Better modeling of traces • Based on real world batch submissions • Non-uniform overhead • Understanding VM overhead in practice • Virtualization in Practice: http://press.mcs.anl.gov/virtualization/

  28. Questions? Borja Sotomayor University of Chicago borja@cs.uchicago.edu Kate Keahey Argonne National Laboratory/University of Chicago keahey@mcs.anl.gov Ian Foster Argonne National Laboratory/University of Chicago keahey@mcs.anl.gov Tim Freeman Argonne National Laboratory/University of Chicago tfreeman@mcs.anl.gov
