
Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Controls resource shares to enforce fair-share scheduling among virtual machines in server consolidation, using a generic mechanism based on the virtual-time formalism. Implemented and measured in Linux.





Presentation Transcript


  1. Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher • Tal Ben-Nun (School of Eng & CS, Hebrew University) • Yoav Etsion (CS Dept, Barcelona SC Ctr) • Dror Feitelson (School of Eng & CS, Hebrew University) • Supported by the Israel Science Foundation, grant no. 28/09

  2. Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher • Goal is to control share of resources, not to optimize performance – important in virtualization

  3. Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher • Goal is to control share of resources, not to optimize performance – important in virtualization • Same module used for diverse resources

  4. Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher • Goal is to control share of resources, not to optimize performance – important in virtualization • Same module used for diverse resources • Mechanism used: dispatch the most deserving client at each instant

  5. Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher • Goal is to control share of resources, not to optimize performance – important in virtualization • Same module used for diverse resources • Mechanism used: dispatch the most deserving client at each instant • Selection of deserving client using virtual time formalism

  6. Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher • Goal is to control share of resources, not to optimize performance – important in virtualization • Same module used for diverse resources • Mechanism used: dispatch the most deserving client at each instant • Selection of deserving client using virtual time formalism • Implemented and measured in Linux

  7. Motivation • Context: VMM for server consolidation • Multiple legacy servers share physical platform • Improved utilization and easier maintenance • Flexibility in allocating resources to virtual machines • Virtual machines typically run a single application (“appliances”)

  8. Motivation • Assumed goal: enforce predefined allocation of resources to different virtual machines (“fair share” scheduling) • Based on importance / SLA • Can change with time or due to external events • Problem: what is “30% of the resources” when there are many different resources, and diverse requirements?

  9. Global Scheduling • “Fair share” usually applied to a single resource • But what if this resource is not a bottleneck? • Global scheduling idea: • Identify the system bottleneck resource • Apply fair share scheduling on this resource • This induces appropriate allocations on other resources • This paper: how to apply fair-share scheduling on any resource in the system

  10. Previous Work I: Virtual Time • Accounting is inversely proportional to allocation • Schedule the client that is farthest behind
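The two bullets above can be illustrated with a minimal sketch of classic virtual-time scheduling. This is not the authors' code: the function names, the fixed quantum, and the 50/30/20 shares are assumptions chosen for illustration. Each client is charged virtual time at a rate inversely proportional to its share, and the client with the smallest virtual time (the one farthest behind) runs next.

```python
def pick_farthest_behind(virtual_times):
    """Return the client with the smallest virtual time (farthest behind)."""
    return min(virtual_times, key=virtual_times.get)


def run_virtual_time(shares, quantum, steps):
    """Simulate `steps` dispatches; return real time consumed per client."""
    virtual_times = {name: 0.0 for name in shares}
    consumed = {name: 0.0 for name in shares}
    for _ in range(steps):
        client = pick_farthest_behind(virtual_times)
        consumed[client] += quantum
        # Accounting is inversely proportional to the allocation: a client
        # with a large share is charged less virtual time per real quantum.
        virtual_times[client] += quantum / shares[client]
    return consumed


# Hypothetical shares, for illustration only.
usage = run_virtual_time({"A": 0.5, "B": 0.3, "C": 0.2}, quantum=1.0, steps=1000)
```

Over many dispatches the consumed time per client approaches the configured shares, because a client with a larger share advances its virtual clock more slowly and is therefore selected more often.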

  11. Previous Work II: Traffic Shaping • Leaky bucket • Variable requests • Constant rate transmission • Bucket represents buffer • Token bucket • Variable requests • Constant allocations • Bucket represents stored capacity
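As a rough illustration of the token bucket described above, the following sketch adds tokens at a constant rate (the constant allocations) up to a capacity (the stored capacity) and lets a request through only if enough tokens are available. The class name, the explicit `now` argument, and the numbers are assumptions made for this example, not part of the presentation.

```python
class TokenBucket:
    """Token bucket: constant allocations, stored up to a fixed capacity."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum number of stored tokens
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # time of the previous refill

    def try_send(self, now, size):
        """Refill up to `now`, then spend `size` tokens if available."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


bucket = TokenBucket(rate=10, capacity=20)
first = bucket.try_send(0.0, 15)    # accepted: 15 of the 20 stored tokens
second = bucket.try_send(0.0, 10)   # rejected: only 5 tokens remain
third = bucket.try_send(1.0, 10)    # accepted: one second added 10 tokens
```

Time is passed in explicitly so the behavior is deterministic; a real shaper would read a clock instead.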

  12. Putting them Together: RSVT • “Resource sharing”: all clients make progress continuously • Generalization of processor sharing • Each job has its ideal resource sharing progress • This is considered to be the allocation a_i • Grows at constant rate • Each job has its actual consumption c_i • Grows only when job runs • Scheduling priority is the difference: • p_i = a_i – c_i

  13. Example • Three clients • Allocations roughly 50%, 30%, 20% • Consumption always occurs in resource time • [Figure: consumed resource time vs. wallclock time]

  14. Bookkeeping • The set of active jobs is A • The relative allocation of job i is r_i • During an interval T job k has run • Update allocations: a_i ← a_i + T · r_i / Σ_{j∈A} r_j for each i ∈ A • Update consumptions: c_k ← c_k + T
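A minimal sketch of this bookkeeping, under the assumption that each active job's allocation grows by its share of the interval T (its relative allocation divided by the sum of relative allocations over the active set) and that only the job that ran is charged the full interval. The `Client` class and the function names are hypothetical; the dispatcher picks the job with the largest difference between allocation and consumption.

```python
from dataclasses import dataclass


@dataclass
class Client:
    share: float               # relative allocation r_i
    allocation: float = 0.0    # ideal progress a_i
    consumption: float = 0.0   # actual consumption c_i

    @property
    def priority(self):
        # Scheduling priority p_i = a_i - c_i.
        return self.allocation - self.consumption


def account(clients, active, ran, interval):
    """Update allocations of all active jobs and consumption of the one that ran."""
    total = sum(clients[name].share for name in active)
    for name in active:
        clients[name].allocation += interval * clients[name].share / total
    clients[ran].consumption += interval


def dispatch(clients, active):
    """Return the active job with the highest priority."""
    return max(active, key=lambda name: clients[name].priority)


# Hypothetical three-client example with shares 5 : 3 : 2.
clients = {"A": Client(5), "B": Client(3), "C": Client(2)}
active = ["A", "B", "C"]
for _ in range(1000):
    account(clients, active, dispatch(clients, active), interval=1.0)
```

In this simulation the consumptions stay close to the allocations, so the three clients end up with roughly 50%, 30%, and 20% of the resource time.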

  15. The Active Set • Active jobs (the set A) are those that can use the resource now • Allocations are relative to the active set • The active set may change • New job arrives • Job terminates • Job stops using resource temporarily • Job resumes use of resource

  16. Grace Period • Intermittent activity: process data / send packet • Should retain allocations even when inactive • Thus a_i continues to grow during a grace period after the job becomes inactive • Grace period reflects notion of continuity • Sub-second time scale

  17. Rebirth • Resumption after very long inactive periods should be treated as new arrivals • Due to grace period, job that becomes inactive accrues extra allocation • Forget this extra allocation after rebirth period • (set a_i = c_i) • Two orders of magnitude larger than grace period
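The grace and rebirth rules can be sketched as a periodic check per client. The constants below (0.1 s and 10 s) are assumed values that only respect the stated relationship (sub-second grace period, rebirth two orders of magnitude larger); the `Client` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_S = 0.1     # assumed grace period (sub-second scale)
REBIRTH_S = 10.0  # assumed rebirth period (100x the grace period)


@dataclass
class Client:
    allocation: float = 0.0
    consumption: float = 0.0
    idle_since: Optional[float] = None  # None while the client uses the resource


def set_inactive(client, now):
    """Record the moment the client stopped using the resource."""
    client.idle_since = now


def set_active(client):
    """The client resumed use of the resource."""
    client.idle_since = None


def on_tick(client, now):
    """Return True if the client should still receive allocation at `now`."""
    if client.idle_since is None:
        return True
    idle = now - client.idle_since
    if idle >= REBIRTH_S:
        # Rebirth: forget credit accrued during the grace period.
        client.allocation = client.consumption
    # Allocation keeps growing only while the grace period lasts.
    return idle <= GRACE_S


job = Client(allocation=5.0, consumption=3.0)
set_inactive(job, now=0.0)
in_grace = on_tick(job, now=0.05)    # still within the grace period
after_grace = on_tick(job, now=0.5)  # grace over, credit still remembered
```

A job that resumes shortly after going idle keeps its accumulated priority, while one that returns after the rebirth period starts with priority zero, like a new arrival.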

  18. Implementation • Kernel module with generic functionality • Create / destroy module • Create / destroy client • Make request / set active / set inactive • Make allocations • Dispatch • Check-in (note resource usage) • Glue code for specific subsystems • Currently networking and CPU • Plan to add disk I/O

  19. Networking Glue Code • Use the Linux QoS framework: create an RSVT queueing discipline • [Diagram: App → TCP → IP → QoS queueing discipline → NIC]

  20. Networking Glue Code • Non-RSVT traffic has priority (e.g. NFS traffic) and is counted as dead time • [Diagram: App → TCP → IP → RSVT? (no: send immediately; yes: enqueue, then select and send) → NIC]

  21. CPU Scheduling Glue Code • Use Linux modular scheduling core • Add an RSVT scheduling policy • RSVT module essentially replaces the policy runqueue • Initial implementation only for uniprocessors • CFS and possibly other policies also exist and have higher priority • When they run, this is considered dead time

  22. Timer Interrupts • Linux employs timer interrupts (250 Hz) • Allocations are done at these times • Translate time into microseconds • Subtract known dead time (unavailable to us) • Divide among active clients according to relative allocations • Bound divergence of allocation from consumption • Also handling of grace period (mark as inactive) • Also handling of rebirth (set a_i = c_i)
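The allocation step performed at each timer interrupt might look like the sketch below. It runs in user-space Python rather than in the kernel, and several details are assumptions: the dictionaries, the function name, and in particular the way divergence is bounded (clamping the allocation to within a fixed distance of the consumption, with a made-up limit).

```python
TICK_US = 1_000_000 // 250  # 4000 microseconds per tick at 250 Hz
MAX_DIVERGENCE_US = 100_000  # assumed bound on |a_i - c_i|


def allocate_tick(allocation, consumption, shares, active, dead_us):
    """Divide one tick of resource time among the active clients."""
    # Dead time was used by others and is not ours to hand out.
    available = max(0, TICK_US - dead_us)
    total = sum(shares[name] for name in active)
    if total == 0:
        return
    for name in active:
        allocation[name] += available * shares[name] / total
        # Bound divergence of allocation from consumption (assumed policy).
        low = consumption[name] - MAX_DIVERGENCE_US
        high = consumption[name] + MAX_DIVERGENCE_US
        allocation[name] = min(max(allocation[name], low), high)


shares = {"a": 3, "b": 1}
allocation = {"a": 0.0, "b": 0.0}
consumption = {"a": 0.0, "b": 0.0}
allocate_tick(allocation, consumption, shares, ["a", "b"], dead_us=0)
```

With no dead time, client "a" receives three quarters of the 4000 microsecond tick and "b" the remaining quarter.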

  23. Multi-Queue • At dispatch, need to find client with highest priority • But priorities change at different rates • Solution: allow only a limited discrete set of relative priorities • Each priority has a separate queue • Maintain all clients in each queue in priority order • Only need to check the first in each queue to find the maximum
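One possible way to realize the multi-queue idea is sketched below; it is an illustration under stated assumptions, not the module's actual data structure. Clients with the same relative allocation gain priority at the same rate, so their relative order changes only when one of them consumes. Each queue therefore keeps a shared `credit` counter and stores priorities relative to it, and dispatch compares only the queue heads. The class names, the use of `heapq`, and the assumption that every queued client is active are all choices made for this example.

```python
import heapq


class LevelQueue:
    """Clients that share one relative allocation level."""

    def __init__(self, share):
        self.share = share
        self.credit = 0.0  # allocation granted to every member so far
        self.heap = []     # entries: (-(priority - credit at insert), name)

    def push(self, name, priority):
        heapq.heappush(self.heap, (-(priority - self.credit), name))

    def head(self):
        """Return (current priority, name) of the best client in this queue."""
        neg_key, name = self.heap[0]
        return (-neg_key + self.credit, name)

    def pop(self):
        neg_key, name = heapq.heappop(self.heap)
        return (-neg_key + self.credit, name)


class MultiQueue:
    def __init__(self, shares):
        self.queues = {share: LevelQueue(share) for share in shares}

    def add(self, name, share, priority=0.0):
        self.queues[share].push(name, priority)

    def allocate(self, interval):
        """Grant `interval` of resource time; one update per queue, not per client."""
        total = sum(q.share * len(q.heap) for q in self.queues.values())
        if total == 0:
            return
        for q in self.queues.values():
            q.credit += interval * q.share / total

    def dispatch(self, interval):
        """Pick the highest-priority client, charge it, and requeue it."""
        candidates = [q for q in self.queues.values() if q.heap]
        if not candidates:
            return None
        best = max(candidates, key=lambda q: q.head()[0])
        priority, name = best.pop()
        best.push(name, priority - interval)
        return name


mq = MultiQueue([5, 3, 2])
mq.add("A", 5)
mq.add("B", 3)
mq.add("C", 2)
counts = {"A": 0, "B": 0, "C": 0}
for _ in range(1000):
    counts[mq.dispatch(1.0)] += 1
    mq.allocate(1.0)
```

Only the dispatched client is re-inserted into its queue; everyone else keeps their stored key, and the per-queue credit accounts for the allocation they gained in the meantime.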

  24. Experiment – Basic Allocations

  25. Experiment – Basic Allocations

  26. Experiment – Active Set

  27. Experiment – Grace Period

  28. Experiment – Rebirth

  29. Experiment – Throttling • Two competing MPlayers • The one with higher allocation does not need all of it • Allocation tracks consumption

  30. Conclusions • Demonstrated generic virtual-time based resource sharing dispatcher • Need to complete implementation • Support for I/O scheduling • More details, e.g. SMP support • Building block of global scheduling vision
