
QoS Support in Operating Systems



Presentation Transcript


  1. QoS Support in Operating Systems Banu Özden Bell Laboratories ozden@research.bell-labs.com

  2. Vision • Service providers will offer storage and computing services • through their distributed data centers • connected with high bandwidth networks • to globally distributed clients. • Clients will access these services via diverse devices and networks, e.g.: • mobile devices and wireless networks, • high-end computer systems and high bandwidth networks. • These services will become utilities (e.g., storage utility, computing utility). • Eventually resources will be exchanged and traded between geographically dispersed data centers to address fluctuating demand.

  3. Eclipse/BSD:an Operating System with Quality of Service Support Banu Özden ozden@research.bell-labs.com

  4. Motivation • QoS support for (server) applications: • web servers • video servers • Isolation and differentiation of different • entities serviced on the same platform • applications running on the same platform • QoS requirements: • client-based • service-based • content-based

  5. Design Goals • QoS support in a general purpose operating system • Remain compatible with the underlying operating system • QoS parameters: • Isolation • Differentiation • Fairness • (Cumulative) throughput • Flexible resource management • capable of implementing a large set of provisioning needs • supports a large set of server applications without imposing significant changes to their design

  6. Talk Outline • Schedulers • Reservation File System (reservfs) • Tagging • Web Server Experiments • Access Control and Profiles • Eclipse/BSD Status • Related Work • Future Work

  7. Proportional Sharing • Generalized processor sharing (GPS): flow i has weight phi_i; W_i(t1, t2) is the service received by flow i in [t1, t2]; B is the set of backlogged flows. • For any flow i continuously backlogged in [t1, t2]: W_i(t1, t2) / W_j(t1, t2) >= phi_i / phi_j for every flow j. • Thus, the rate of flow i in [t1, t2] is at least: r_i = (phi_i / sum of phi_j over j in B) * r
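The GPS rate formula above can be sketched in a few lines. This is an illustrative sketch only; the function name and data shapes are mine, not part of Eclipse/BSD:

```python
def gps_rates(weights, link_rate):
    """Instantaneous GPS rates: each backlogged flow i receives
    r_i = (phi_i / sum of backlogged weights) * link_rate."""
    total = float(sum(weights.values()))
    return {flow: w / total * link_rate for flow, w in weights.items()}
```

For example, backlogged flows with weights 2, 1, 1 on a 100 Mbps link receive 50, 25 and 25 Mbps respectively.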

  8. QoS Guarantees • Fairness • Throughput • Packet delay

  9. Schedulers in Eclipse • Resource characteristics differ • Different hierarchical proportional-share schedulers for resources • Link scheduler: WF2Q • Disk scheduler: YFQ • CPU scheduler: MTR-LS • Network input: SRP

  10. Hierarchical GPS Example • [Figure: two server hierarchies. Flat proportional sharing assigns weights 0.4, 0.4 and 0.2 directly to company A page 1, company A page 2 and company B. Hierarchical proportional sharing assigns 0.8 to company A and 0.2 to company B, with company A's share split 0.5/0.5 between its page 1 and page 2.]

  11. Schedulers • Hierarchical proportional sharing (GPS): each node n of the scheduler tree has weight phi_n; W_n(t1, t2) is the service received by node n from its parent scheduler; B is the set of immediate descendant nodes of n's parent that are backlogged. • For any node n continuously backlogged in [t1, t2]: W_n(t1, t2) / W_m(t1, t2) >= phi_n / phi_m for every sibling node m in B.
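Under hierarchical GPS, a leaf's effective share of the resource is the product of its weight fractions among siblings down the path from the root. A small sketch (my own names and tree encoding, assumed for illustration; it reproduces the 0.8/0.2 and 0.5/0.5 example from the previous slide):

```python
def leaf_fractions(node, fraction=1.0, prefix=""):
    """Effective resource fraction of each leaf: at every level, a child
    gets its weight divided by the sum of its siblings' weights, times
    the parent's fraction."""
    children = node.get("children")
    if not children:
        return {prefix: fraction}
    total = sum(c["weight"] for c in children.values())
    out = {}
    for name, child in children.items():
        out.update(leaf_fractions(child, fraction * child["weight"] / total,
                                  prefix + "/" + name))
    return out
```

With company A at weight 0.8 (pages split 0.5/0.5) and company B at 0.2, each of A's pages ends up with 0.4 of the resource.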

  12. Link Aggregation • Need to incrementally scale bandwidth • Resource aggregation is emerging as a solution: • grouping multiple resources into a single logical unit • How to provide QoS over such aggregated links?

  13. Multi-Server Model • Multi-Server Fair Queuing (MSFQ) • A packetized algorithm for a system with N links, each with a bandwidth of r, that approximates a GPS system with a single link of bandwidth Nr • [Figure: reference model (one GPS server of rate Nr) vs. the packetized MSFQ scheduler (N servers of rate r each)]

  14. Multi-Server Model (Contd.) • Goals: • Guarantee bandwidth and packet delay bounds that are independent of the number of flows • Allow flows to arrive and depart dynamically • Be work-conserving • Algorithm: • When a server is idle, schedule the packet that would complete transmission earliest under a single-server GPS system with a bandwidth of Nr (SIGCOMM 2001)
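The MSFQ rule above can be sketched as a simple simulation. This is a simplified illustration under assumptions of my own: the GPS finish times are taken as given, all packets are assumed already queued, and arrivals/departures are omitted:

```python
def msfq_order(packets, n_servers, rate):
    """MSFQ rule sketch: whenever a server is free, start the queued
    packet with the earliest finish time under the reference
    single-server GPS system of rate N*r.
    packets: list of (gps_finish_time, length).
    Returns (gps_finish_time, server_index, start_time) tuples."""
    free_at = [0.0] * n_servers              # when each server goes idle
    schedule = []
    for finish, length in sorted(packets):   # earliest GPS finish first
        s = min(range(n_servers), key=free_at.__getitem__)
        schedule.append((finish, s, free_at[s]))
        free_at[s] += length / rate          # each physical server has rate r
    return schedule
```

Note this also illustrates the ordering property from the next slide: two packets started in GPS-finish order on different servers may still complete in reverse order if their lengths differ.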

  15. MSFQ Preliminary Properties • [Figure: timelines comparing GPS, WFQ and MSFQ schedules of the same packet arrival sequences on one, two and three servers] • Multi-server-specific properties: • Ordering: a pair of packets scheduled in the order of their GPS finishing times may complete in reverse order • GPS busy implies MSFQ busy, but the converse is not true • Non-coinciding busy periods • Work backlog?

  16. MSFQ Properties • [Figure: GPS vs. MSFQ service curves over time, illustrating packet delay and service discrepancy, in aggregate and per flow] • Maximum service discrepancy (buffer requirement) • Maximum packet delay • Maximum per-flow service discrepancy

  17. Schedulers (contd.) • Disk scheduling with QoS • tradeoffs between QoS and total disk performance • driver queue management • queue depth • queue ordering • fragmentation • Hierarchical YFQ • CPU scheduling with QoS • lengths of CPU phases are not known a priori • cumulative throughput • Hierarchical MTR-LS

  18. Eclipse’s Key Elements • Hierarchical, proportional share resource schedulers • Reservation, reservation file system (reservfs) • Tagging mechanism • Access and admission control, reservation domain

  19. Reservations and Schedulers • (Resource)reservations • unit for QoS assignment • similar to the concept of a flow in packet scheduling • Hierarchical schedulers • a tree with two kinds of nodes: • scheduler nodes • queue nodes • each node corresponds to a reservation • Schedulers are dynamically reconfigurable

  20. Web Server Example • Hosting two companies' web sites, each with two web pages • [Figure: three scheduler hierarchies, one each for network bandwidth, disk bandwidth and CPU cycles; in each, company A has weight 0.8 and company B 0.2, and each company's share is split 0.5/0.5 between its page 1 and page 2]

  21. Reservfs • We built the reservation file system (reservfs) • to create and manipulate reservations • to access and configure resource schedulers • [Figure: web and video servers use reservfs as the application interface; a scheduler interface connects reservfs to the CPU, link and disk schedulers managing CPU 1, Net 1, Net 2 and Disks 1-3]

  22. Reservfs • Hierarchical • Each reservation directory corresponds to a node at a scheduler • Each resource is represented by a reservation directory under /reserv • [Figure: /reserv tree with per-resource directories cpu, fxp0, fxp1 and da0]

  23. Reservfs • Two types of reservation directories: • scheduler directories • queue directories • Scheduler directories are hierarchically expandable • Queue directories are not expandable

  24. Reservfs • [Figure: /reserv tree with resource directories (cpu, fxp0, fxp1, da0), a scheduler directory r1 and queue directories q0, q1] • Scheduler directory: • share • newqueue • newreserv • special queue: q0 • Queue directory: • share • backlog

  25. Reservfs • [Figure: reservfs sits between applications (web server, video server) and the resources; the application interface is the reservation file system, and a scheduler interface connects reservfs to the CPU, link and disk schedulers]

  26. Reservfs API • Creation of a new queue/scheduler reservation: • fd = open("newqueue"/"newreserv", O_CREAT) • returns the fd of the newly created share file

  27. Creating a Queue Reservation • fd = open("newqueue", O_CREAT) • [Figure: under /reserv/da0, a new queue directory q1 appears next to q0; the queue directory contains share and backlog files]

  28. Creating a Scheduler Reservation • fd = open("newreserv", O_CREAT) • [Figure: under /reserv/da0, a new scheduler directory r0 appears, with its own share, newqueue and newreserv files and a special queue q0]

  29. Reservfs API • Changing QoS parameters • writing a weight and min value to the share file • Getting QoS parameters • reading the share file • Getting/setting queue parameters • reading/writing the backlog file
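Judging from the shell transcript on the next slides, the share file holds the weight and minimum value as plain text (e.g., "50 1000000"). A hedged sketch of reading and writing that format; the helper names are mine and the format is inferred from the transcript, not from a specification:

```python
def set_share(path, weight, minimum):
    """Write 'weight min' to a reservation's share file
    (format as it appears in the transcript, e.g. '50 1000000')."""
    with open(path, "w") as f:
        f.write("%d %d" % (weight, minimum))

def get_share(path):
    """Read back the (weight, min) pair from a share file."""
    with open(path) as f:
        weight, minimum = f.read().split()
    return int(weight), int(minimum)
```

The appeal of this design is that standard file permissions and ordinary tools (echo, cat) suffice to manage reservations, with no new system calls for the common operations.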

  30. Reservfs API Command line output:
killerbee$ cd /reserv
killerbee$ ls -al
total 5
dr-xr-xr-x   0 root  wheel  512 Sep 15 11:37 .
drwxr-xr-x  20 root  wheel  512 Sep 12 21:54 ..
dr-xr-xr-x   0 root  wheel  512 Sep 15 11:37 cpu
dr-xr-xr-x   0 root  wheel  512 Sep 15 11:37 fxp0
dr-xr-xr-x   0 root  wheel  512 Sep 15 11:37 fxp1
killerbee$ cd fxp0
killerbee$ ls -alR
total 6
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 .
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 ..
-rw-------  1 root  wheel    1 Sep 15 11:39 newqueue
-rw-------  1 root  wheel    1 Sep 15 11:39 newreserv
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 q0
-r--------  1 root  wheel    1 Sep 15 11:39 share
./q0:
total 4
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 .
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 ..
-rw-------  1 root  wheel    1 Sep 15 11:39 backlog
-rw-------  1 root  wheel    1 Sep 15 11:39 share

  31. Reservfs API
killerbee$ cd r0
killerbee$ ls -al
total 6
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 .
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 ..
-rw-------  1 root  wheel    1 Sep 15 11:39 newqueue
-rw-------  1 root  wheel    1 Sep 15 11:39 newreserv
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 q0
-r--------  1 root  wheel    1 Sep 15 11:39 share
killerbee$ echo "50 1000000" > newqueue
killerbee$ ls -al
total 6
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 .
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 ..
-rw-------  1 root  wheel    1 Sep 15 11:39 newqueue
-rw-------  1 root  wheel    1 Sep 15 11:39 newreserv
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 q0
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 q1
-r--------  1 root  wheel    1 Sep 15 11:39 share
killerbee$ cd q1
killerbee$ ls -al
total 4
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 .
dr-xr-xr-x  0 root  wheel  512 Sep 15 11:39 ..
-rw-------  1 root  wheel    1 Sep 15 11:39 share
-rw-------  1 root  wheel    1 Sep 15 11:39 backlog
killerbee$ cat share
50 1000000
killerbee$

  32. Reservfs • [Figure (repeated from slide 25): reservfs between applications and the CPU, link and disk schedulers; the focus now shifts from the application interface to the scheduler interface]

  33. Reservfs Scheduler Interface • Schedulers register by providing the following interface routines via reservfs_register(): • init(priv) • create(priv, parent, type) • start(priv, parent, type) • delete(priv, node) • get/set(priv, node, values, type)

  34. Reservfs Implementation • Built via the vnode/vfs interface • A reserv{} structure represents each reservfs file • A reserv{} representing a directory contains a pointer to the corresponding node at the scheduler • Scheduler independent • Implements a garbage collection mechanism

  35. Talk Outline • Introduction • Schedulers • Reservation File System (reservfs) • Tagging • Web Server Experiments • Access Control and Profiles • Eclipse/BSD Status • Related Work • Future Work

  36. Tagging • A request arriving at a scheduler must be associated with the appropriate reservation • Each request is tagged with a pointer to a queue node • mbuf{}, buf{} and proc{} are augmented • How is a request tagged?

  37. Tagging (contd.) • For a file, its file descriptor is tagged with a disk reservation • For a connected socket, its file descriptor is tagged with a network reservation • For unconnected sockets, we provide a late tagging mechanism • Each process is tagged with a cpu reservation • We associate reservations with references to objects

  38. Default List of a Process • Default reservations of a process, one for each resource • A list of tags (pointers to queue directories) • Used when a tag is otherwise not specified • Two new files are added for each process pid in /proc/pid • /proc/pid/default to represent the default list • /proc/pid/cdefault to represent the child default list

  39. Default List of a Process (contd.) • Reading these files returns the names of the default queue directories, e.g., /reserv/cpu/q1 /reserv/fxp0/r2/q1 /reserv/da0/r1/q3 • A process with the appropriate access rights can change the entries of the default files
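The default list maps each resource to one queue directory, which can be recovered from the pathnames themselves. A small parsing sketch (function name mine, format taken from the example above):

```python
def parse_default_list(text):
    """Map each resource (first path component under /reserv) to its
    default queue directory, given the contents of /proc/<pid>/default
    as shown on the slide."""
    tags = {}
    for entry in text.split():
        parts = entry.strip("/").split("/")   # e.g. reserv/fxp0/r2/q1
        if len(parts) >= 2 and parts[0] == "reserv":
            tags[parts[1]] = "/" + "/".join(parts)
    return tags
```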

  40. Implicit Tagging • The file descriptor returned by open(), accept() or connect() is automatically tagged with the default reservation • The tag of the file descriptor of an unconnected socket is set to the default at sendto() and sendmsg() • When a process forks, the child process is tagged with the default cpu reservation

  41. Explicit Tagging • The tag of a file descriptor can be set/read with new commands to fcntl(): • F_SET_RES • F_GET_RES • A new system call chcpures() to change the cpu reservation of a process

  42. Reservation Domains • Permissions of a process to use, create and manipulate reservations • The reservation domain of a process is independent of its protection domain

  43. Reservations and Reservation Domains • [Figure: three scheduler hierarchies (network bandwidth, disk bandwidth, CPU cycles); in each, reservation A has weight 0.8 and reservation B 0.2, with each split 0.5/0.5 between reservations 1 and 2. Reservation domain 1 and reservation domain 2 each group the corresponding reservations across all three resources.]

  44. Reservfs Garbage Collection • Based on reference counts • every application using a specific node adds a reference to it (to the vnode) • Triggered by the vnode layer • when the last application finishes using the node, it is garbage collected • An fcntl() is available to retain the node even if no references to it exist
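The reference-counting behaviour described above can be sketched as follows; this is a toy model (class and field names are mine), not the kernel's vnode code:

```python
class ReservNode:
    """Toy model of a reservfs node: reclaimed when the last reference
    is dropped, unless pinned (the fcntl() retain behaviour on the
    slide)."""
    def __init__(self):
        self.refs = 0
        self.pinned = False
        self.alive = True

    def ref(self):
        self.refs += 1

    def unref(self):
        self.refs -= 1
        if self.refs == 0 and not self.pinned:
            self.alive = False   # garbage collected by the vnode layer
```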

  45. SRP Input Processing • Demultiplexes incoming packets • before network and higher-level protocol processing • Unprocessed input queue per socket • Processes input protocols in the context of the receiving process • Drops packets when the per-socket queue is full • Avoids receive livelock

  46. Talk Outline • Introduction • Schedulers • Reservation File System (reservfs) • Tagging • Web Server Experiments • Access Control and Profiles • Eclipse/BSD Status • Related Work • Future Work

  47. QoS Support for Web Server • Virtual hosting with Apache server: • separate Apache server for each virtual host • single Apache server for all virtual hosts • Eclipse/BSD isolates and differentiates performance of virtual hosts • multiple Apache servers: implicit tagging • single Apache server: explicit tagging • We implemented an Apache module for explicit tagging

  48. Experimental Setup • Apache Web Server: • A multi-process server • (Pre)spawns helper processes • A process handles one request at a time • Each process calls accept() to service the next connection request • HTTP clients run on five different machines • Servers are running FreeBSD 2.2.8 or Eclipse/BSD 2.2.8 on a PC (266 MHz Pentium Pro, 64 MB RAM, 9 GB Seagate ST39173W fast wide SCSI disk) • Machines are connected with a 10/100 Mbps Ethernet switch

  49. Experiments • Hosting two sites with two servers • [Figure: /reserv tree in which cpu, fxp0 and da0 each hold queues q0, q1 and q2; the q1 queues form the reservation domain of server 1 and the q2 queues the reservation domain of server 2]

  50. CPU Intensive Workload
