300 likes | 761 Views
I/O Management and Disk Scheduling. Chapter 11. I/O Driver. OS module which controls an I/O device hides the device specifics from the above layers in the OS/kernel translates logical I/O into device I/O (logical disk blocks into {track, head, sector}).
E N D
I/O Management and Disk Scheduling Chapter 11
I/O Driver • OS module which controls an I/O device • hides the device specifics from the above layers in the OS/kernel • translates logical I/O into device I/O (logical disk blocks into {track, head, sector}) • performs data buffering and scheduling of I/O operations • structure: several synchronous entry points (device initialization, queue I/O requests, state control, read/write) and an asynchronous entry point (to handle interrupts)
Typical driver structure driver_init() { } driver_interrupt(state) /* asynchronous part */ { if (state==ERROR) && (retries++<MAX) { driver_start(current_request); return; } add_current_process_to_active_queue if (! (empty(request_queue)) driver_start(get_next(request_queue)) } driver_strategy(request) { if (empty(request-queue)) driver_start(request) else add(request, request-queue) block_current_process; reschedule() } driver_start(request) { current_request= request; start_dma(request); } driver_ioctl(request) { }
User to Driver Control Flow read, write, ioctl user kernel ordinary file special file File System character device block device Buffer Cache Character queue driver_read/write driver-strategy
I/O Buffering • before an I/O request is placed the source/destination of the I/O transfer must be locked in memory • I/O buffering: data is copied from user space to kernel buffers which are pinned to memory • works for character devices (terminals), network and disks • buffer cache: a buffer in main memory for disk sectors • character queue: follows the producer/consumer model (characters in the queue are read once) • unbuffered I/O to/from disk (block device): VM paging for instance
Buffer Cache • when an I/O request is made for a sector, the buffer cache is checked first • if it is missing from the cache, it is read into the buffer cache from the disk • exploits locality of reference as any other cache • usually replacements done in chunks (a whole track can be written back at once to minimize seek time) • replacement policies are global and controlled by the kernel
Replacement policies • buffer cache organized like a stack: replace from the bottom • LRU: replace the block that has been in the cache longest with no reference to it (on reference a block is moved to the top of the stack) • LFU: replace the block with the fewest references (counters which are incremented on reference and blocks move accordingly) • frequency-based replacement: define a new section on the top of the stack, counter is unchanged while the block is in the new section
Least Recently Used • The block that has been in the cache the longest with no reference to it is replaced • The cache consists of a stack of blocks • Most recently referenced block is on the top of the stack • When a block is referenced or brought into the cache, it is placed on the top of the stack
Least Frequently Used • The block that has experienced the fewest references is replaced • A counter is associated with each block • Counter is incremented each time block accessed • Block with smallest count is selected for replacement • Some blocks may be referenced many times in a short period of time and then not needed any more
Application-controlled File Caching • two-level block replacement: responsibility is split between kernel and user level • a global allocation policy performed by the kernel which decides which process will give up a block • a block replacement policy decided by the user: • kernel provides the candidate block as a hint to the process • the process can overrule the kernel’s choice by suggesting an alternative block • the suggested block is replaced by the kernel • examples of alternative replacement policy: most-recently used (MRU)
Sound kernel-user cooperation • oblivious processes should do no worse than under LRU • foolish processes should not hurt other processes • smart processes should perform better than LRU whenever possible and they should never perform worse • if kernel selects block A and user chooses B instead, the kernel swaps the position of A and B in the LRU list and places B in a “placeholder” which points to A (kernel’s choice) • if the user process misses on B (i.e. he made a bad choice), and B is found in the placeholder, then the block pointed to by the placeholder is chosen (prevents hurting other processes)
Disk Performance Parameters • To read or write, the disk head must be positioned at the desired track and at the beginning of the desired sector • Seek time • time it takes to position the head at the desired track • Rotational delay or rotational latency • time its takes for the beginning of the sector to reach the head
Disk Performance Parameters • Access time • Sum of seek time and rotational delay • The time it takes to get in position to read or write • Data transfer occurs as the sector moves under the head
Disk I/O Performance • disks are at least four orders of magnitude slower than the main memory • the performance of disk I/O is vital for the performance of the computer system as a whole • disk performance parameters • seek time (to position the head at the track): 20 ms • rotational delay (to reach the sector): 8.3 ms • transfer time: 1-2 MB/sec • access time (seek time+ rotational delay) >> transfer time for a sector • therefore the order in which sectors are read matters a lot
Disk Scheduling Policies • Seek time is the reason for differences in performance • For a single disk there will be a number of I/O requests • If requests are selected randomly, we will get the worst possible performance
Disk Scheduling Policies • First-in, first-out (FIFO) • Process request sequentially • Fair to all processes • Approaches random scheduling in performance if there are many processes
Disk Scheduling Policies • Priority • Goal is not to optimize disk use but to meet other objectives • Short batch jobs may have higher priority • Provide good interactive response time
Disk Scheduling Policies • Last-in, first-out • Good for transaction processing systems • The device is given to the most recent user so there should be little arm movement • Possibility of starvation since a job may never regain the head of the line
Disk Scheduling Policies • Shortest Service Time First • Select the disk I/O request that requires the least movement of the disk arm from its current position • Always choose the minimum Seek time
Disk Scheduling Policies • SCAN • Arm moves in one direction only, satisfying all outstanding requests until it reaches the last track in that direction • Direction is reversed
Disk Scheduling Policies • C-SCAN • Restricts scanning to one direction only • When the last track has been visited in one direction, the arm is returned to the opposite end of the disk and the scan begins again
Disk Scheduling Policies • N-step-SCAN • Segments the disk request queue into subqueues of length N • Subqueues are process one at a time, using SCAN • New requests added to other queue when queue is processed • FSCAN • Two queues • One queue is empty for new request
Disk Scheduling Policies • usually based on the position of the requested sector rather than according to the process priority • shortest-service-time-first (SSTF): pick the request that requires the least movement of the head • SCAN (back and forth over disk): good distribution • C-SCAN(one way with fast return):lower service variability but head may not be moved for a considerable period of time • N-step SCAN: scan of N records at a time by breaking the request queue in segments of size at most N • FSCAN: uses two subqueues, during a scan one queue is consumed while the other one is produced
RAID • Redundant Array of Independent Disks (RAID) • idea: replace large-capacity disks with multiple smaller-capacity drives to improve the I/O performance • RAID is a set of physical disk drives viewed by the OS as a single logical drive • data are distributed across physical drives in a way that enables simultaneous access to data from multiple drives • redundant disk capacity is used to compensate the increase in the probability of failure due to multiple drives • size RAID levels (design architectures)
RAID Level 0 • does not include redundancy • data is stripped across the available disks • disk is divided into strips • strips are mapped round-robin to consecutive disks • a set of consecutive strips that map exactly one strip to each array member is called stripe strip 2 strip 3 strip 1 strip 0 strip 4 strip 5 strip 6 strip 7 ...
RAID Level 1 • redundancy achieved by duplicating all the data • each logical disk is mapped to two separate physical disks so that every disk has a mirror disk that contains the same data • a read can be serviced by either of the two disks which contains the requested data (improved performance over RAID 0 if reads dominate) • a write request must be done on both disks but can be done in parallel • recovery is simple but cost is high strip 0 strip 1 strip 1 strip 0 strip 2 strip 3 strip 2 strip 3 ...
RAID Levels 2 and 3 • parallel access: all disks participate in every I/O request • small strips (byte or word size) • RAID 2: error correcting code (Hamming) is calculated across corresponding bits on each data disk and stored on log(data) parity disks; necessary only if error rate is high • RAID 3: a single redundant disk which keeps the parity bit P(i) = X2(i) + X1(i) + X0(i) • in the event of failure, data can be reconstructed but only one request at the time can be satisfied X2(i) = P(i) + X1(i) + X0(i) b1 b0 b2 P(b) ...
RAID Levels 4 and 5 • independent access: each disk operates independently, so multiple I/O request can be satisfied in parallel • large strips • RAID 4: for small writes: 2 reads + 2 writes example: if write performed only on strip 0: P’(i) = X2(i) + X1(i) + X0’1(i) = X2(i) + X1(i) + X0’(i) + X0(i) + X0(i) = P(i) + X0’(i) + X0(i) • RAID 5: parity strips are distributed across all disks strip 2 P(0-2) strip 1 strip 0 strip 3 strip 4 strip 5 P(3-5)