CS 519: Lecture 4 I/O, Disks, and File Systems
I/O Devices • So far we have talked about how to abstract and manage CPU and memory • Computation “inside” the computer is useful only if some results are communicated “outside” of the computer • I/O devices are the computer’s interface to the outside world (I/O = Input/Output) • Example devices: display, keyboard, mouse, speakers, network interface, and disk
Basic Computer Structure • [Diagram: the CPU and memory are connected by the memory bus (system bus); a bridge links the memory bus to the I/O bus, to which devices such as the NIC and the disk are attached]
OS: Abstractions and Access Calls • OS must virtualize a wide range of devices into a few simple abstractions: • Storage • Hard drives, tapes, CD-ROM • Networking • Ethernet, radio, serial line • Multimedia • DVD, camera, microphones • Operating system should provide consistent calls to access the abstractions • Otherwise, programming is too hard
User/OS Interface • Same interface is used to access devices (like disks and network cards) and more abstract resources (like files) • 4 main calls: • open() • close() • read() • write() • Semantics depend on the type of the device (block, char, net)
Unix I/O Calls • fileHandle = open(pathName, flags, mode) • a file handle is a small integer, valid only within a single process, to operate on the device or file • pathName: a name in the file system. In Unix, devices are put under /dev. E.g., /dev/ttya is the first serial port, /dev/sda the first SCSI drive • flags: blocking or non-blocking … • mode: read only, read/write, append … • errorCode = close(fileHandle) • Kernel will free the data structures associated with the device
Unix I/O Calls • byteCount = read(fileHandle, buf, count) • read at most count bytes from the device and put them in the byte buffer buf. Bytes are placed into buf starting at byte 0. • Kernel can give the process fewer bytes; the user process must check byteCount to see how many were actually returned • A negative byteCount signals an error (the value identifies the error type) • byteCount = write(fileHandle, buf, count) • write at most count bytes from the buffer buf • actual number written is returned in byteCount • a negative byteCount signals an error
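A minimal sketch of these calls in C, copying a file or device to standard output (note that at the C library level a failing call returns -1 and sets errno, rather than returning the error code in byteCount directly):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy the file or device named on the command line to standard output.
   read() may legally return fewer bytes than requested, so we loop. */
int main(int argc, char *argv[]) {
    char buf[4096];
    ssize_t n;

    if (argc != 2) { fprintf(stderr, "usage: %s path\n", argv[0]); return 1; }

    int fh = open(argv[1], O_RDONLY);       /* blocking, read-only */
    if (fh < 0) { perror("open"); return 1; }

    while ((n = read(fh, buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, n);       /* write may also transfer fewer bytes */
    if (n < 0) perror("read");              /* negative count signals an error */

    close(fh);                              /* kernel frees the handle's state */
    return 0;
}
```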
I/O Semantics • From this basic interface, there are two dimensions to how I/O is processed: • blocking vs. non-blocking vs. asynchronous • buffered vs. unbuffered • The OS tries to support as many of these dimensions as possible for each device • The semantics are specified in the open() system call
Blocking vs. Non-blocking vs. Asynchronous I/O • Blocking – process is blocked until all bytes in the count field are read or written • E.g., for a network device, if the user wrote 1000 bytes, then the OS would only unblock the process after the write() call completes • + Easy to use and understand • - If the device just can’t perform the operation (e.g., you unplug the cable), what to do? Give up and return the number of bytes transferred so far • Non-blocking – the OS only reads or writes as many bytes as possible without blocking the process • + Returns quickly • - More work for the programmer (but really good for robust programs)
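A sketch of a non-blocking read on Unix using the standard O_NONBLOCK flag; when no data is ready, the call fails with EAGAIN (or EWOULDBLOCK) instead of blocking:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Return whatever bytes are available right now: 0 if none are ready,
   the byte count on success, or -1 on a real error. */
ssize_t read_nonblocking(const char *path, char *buf, size_t count) {
    int fh = open(path, O_RDONLY | O_NONBLOCK);
    if (fh < 0)
        return -1;

    ssize_t n = read(fh, buf, count);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        n = 0;                   /* nothing ready: a robust program retries later */

    close(fh);
    return n;
}
```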
Blocking vs. Non-blocking vs. Asynchronous I/O • Asynchronous – similar to non-blocking I/O. The I/O call returns immediately, without waiting for the operation to complete. The I/O subsystem signals the process when the I/O is done. Same advantages and disadvantages as non-blocking I/O. • Difference between non-blocking and asynchronous I/O: a non-blocking read() returns immediately with whatever data is available; an asynchronous read() requests a transfer that will be performed in its entirety, but that will complete at some future time.
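For comparison, a sketch using the POSIX asynchronous I/O interface from <aio.h> (on Linux this typically requires linking with -lrt):

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Request a full transfer up front, overlap it with computation,
   then collect the result once the operation has completed. */
ssize_t async_read(int fd, char *buf, size_t count) {
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = count;
    cb.aio_offset = 0;

    if (aio_read(&cb) < 0)            /* the request itself returns immediately */
        return -1;

    /* ... the process can do useful work here while the I/O is in flight ... */

    while (aio_error(&cb) == EINPROGRESS)
        ;                             /* or block in aio_suspend(), or take a signal */
    return aio_return(&cb);           /* bytes transferred, or -1 on error */
}
```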
Buffered vs. Unbuffered I/O • Sometimes we want the ease of programming of blocking I/O without the long waits that occur when the buffers on the device are small • Buffered I/O allows the kernel to make a copy of the data and adjust to different device speeds • write(): allows the process to write bytes and continue processing • read(): as the device signals that data is ready, the kernel places the data in the buffer. When the process calls read(), the kernel just makes a copy. • Why not use buffered I/O? • -- Extra copy overhead • -- Delays sending data
Getting Back to Device Types • Most OSs have three device types (in terms of transfer modes): • Character devices • Used for serial-line types of devices (e.g., USB port) • Block devices • Used for mass storage (e.g., disks and CD-ROM) • Network devices • Used for network interfaces (e.g., Ethernet card) • What you can expect from the read/write calls changes with each device type
Character Devices • Device is represented by the OS as an ordered stream of bytes • bytes sent out to the device by the write system call • bytes read from the device by the read system call • Byte stream has no “start”, just open and start reading/writing
Block Devices • OS presents device as a large array of blocks • Each block has a fixed size (1KB - 8KB is typical) • User can read/write only in fixed-size blocks • Unlike other devices, block devices support random access • We can read or write anywhere in the device without having to ‘read all the bytes first’
Network Devices • Like block-based I/O devices, but each write call either sends the entire block (packet), up to some maximum fixed size, or none. • On the receiver, the read call returns all the bytes in the block, or none.
Random Access: The File Pointer • For random access in block devices, OS adds a concept called the file pointer • A file pointer is associated with each open file or device, if the device is a block device • The next read or write operates at the position in the device pointed to by the file pointer • The file pointer points to bytes, not blocks
The Seek Call • To set the file pointer: • absoluteOffset = lseek(fileHandle, offset, from); • from specifies whether the offset is absolute, from byte 0, or relative to the current file pointer position • The absolute offset is returned; negative numbers signal error codes • For block devices, the offset should be an integral number of blocks
Block Device Example • You want to read the 10th block of a disk • Each disk block is 4096 bytes long • fh = open("/dev/sda", O_RDONLY); • pos = lseek(fh, 4096*9, SEEK_SET); • if (pos < 0) error; • bytesRead = read(fh, buf, 4096); • if (bytesRead < 0) error; • …
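The same example as a complete, runnable C program (reading /dev/sda directly normally requires root privileges, and the 4096-byte block size is just the assumption above):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

/* Read the 10th block (index 9, counting from 0) of the first SCSI disk. */
int main(void) {
    char buf[BLOCK_SIZE];

    int fh = open("/dev/sda", O_RDONLY);
    if (fh < 0) { perror("open"); return 1; }

    off_t pos = lseek(fh, (off_t)BLOCK_SIZE * 9, SEEK_SET);  /* absolute offset */
    if (pos < 0) { perror("lseek"); return 1; }

    ssize_t bytesRead = read(fh, buf, BLOCK_SIZE);
    if (bytesRead < 0) { perror("read"); return 1; }

    printf("read %zd bytes at offset %lld\n", bytesRead, (long long)pos);
    close(fh);
    return 0;
}
```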
Getting and Setting Device-Specific Info • Unix has an I/O control system call: • errorCode = ioctl(fileHandle, request, object); • request is a numeric command to the device • Can also pass an optional, arbitrary object to a device • The meaning of the command and the type of the object are device-specific
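As one concrete (Linux-specific) example, the BLKGETSIZE64 request from <linux/fs.h> asks a block device for its capacity in bytes:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */

/* Query a block device's size via ioctl(); needs permission to open /dev/sda. */
int main(void) {
    int fh = open("/dev/sda", O_RDONLY);
    if (fh < 0) { perror("open"); return 1; }

    unsigned long long size;
    if (ioctl(fh, BLKGETSIZE64, &size) < 0) { perror("ioctl"); return 1; }

    printf("device size: %llu bytes\n", size);
    close(fh);
    return 0;
}
```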
Programmed I/O vs. DMA • Programmed I/O is OK for sending commands, receiving status, and communication of a small amount of data • Inefficient for large amounts of data • Keeps CPU busy during the transfer • Programmed I/O memory operations are slow • Direct Memory Access • Device reads/writes directly from/to memory • Memory → device transfers: typically initiated by the CPU • Device → memory transfers: can be initiated by either the device or the CPU
Direct Memory Access • Used to avoid programmed I/O for large data movement • Requires DMA controller • Bypasses CPU to transfer data directly between I/O device and memory
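To make the DMA programming model concrete, here is a hypothetical sketch of a memory-mapped DMA controller; the register layout is invented for illustration, not taken from a real device:

```c
#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers (illustrative only). */
struct dma_regs {
    volatile uint64_t src;    /* physical source address                         */
    volatile uint64_t dst;    /* physical destination address                    */
    volatile uint32_t count;  /* number of bytes to transfer                     */
    volatile uint32_t ctrl;   /* bit 0: start; device raises interrupt when done */
};

/* The CPU programs the controller and then continues; the transfer itself
   proceeds without the CPU touching the data. */
void dma_start(struct dma_regs *dma, uint64_t src, uint64_t dst, uint32_t n) {
    dma->src   = src;
    dma->dst   = dst;
    dma->count = n;
    dma->ctrl  = 1;           /* kick off the transfer */
}
```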
Programmed I/O vs. DMA • [Diagram: three configurations of CPU, memory, and disk on an interconnect — programmed I/O (the CPU copies data between the disk and memory), DMA (the controller moves data directly between the disk and memory), and DMA to/from device memory] • Problems?
Six Steps to Perform a DMA Transfer. Source: SGG
Life Cycle of a Blocking I/O Request • Naïve processing of a blocking request: the device driver is executed by a dedicated kernel thread; only one I/O can be processed at a time. A more sophisticated approach would not block the device driver and would not require a dedicated kernel thread. Source: SGG
Performance • I/O is a major factor in system performance • Demands CPU time to execute device driver and kernel I/O code • State save/restore due to interrupts • Data copying • Disk I/O is extremely slow
Improving Performance • Reduce number of context switches • Reduce data copying • Reduce interrupts by using large transfers, smart controllers, polling • Use DMA • Balance CPU, memory, bus, and I/O performance for highest throughput
Device Driver • OS module controlling an I/O device • Hides the device specifics from the layers above in the kernel • Supports a common API • UNIX: block or character device • Block: device communicates with the CPU/memory in fixed-size blocks • Character/Stream: stream of bytes • Translates logical I/O into device I/O • E.g., logical disk blocks into {head, track, sector} • Performs data buffering and scheduling of I/O operations • Structure • Several synchronous entry points: device initialization, queue I/O requests, state control, read/write • An asynchronous entry point to handle interrupts
Some Common Entry Points for UNIX Device Drivers • Attach: attach a new device to the system. • Close: note the device is not in use. • Halt: prepare for system shutdown. • Init: initialize driver globals at load or boot time. • Intr: handle device interrupt. • Ioctl: implement control operations. • Mmap: implement memory-mapping. • Open: connect a process to a device. • Read: character-mode input. • Size: return logical size of block device. • Start: initialize driver at load or boot time. • Write: character-mode output.
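To make the driver structure concrete, here is a hypothetical entry-point table in C, loosely modeled on the classic UNIX device switch; the type and field names are invented for illustration:

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical entry points for a character device driver. The kernel
   dispatches through these pointers; the driver supplies the functions. */
struct char_driver_ops {
    int     (*open)(int minor, int flags);                   /* connect a process  */
    int     (*close)(int minor);                             /* device not in use  */
    ssize_t (*read)(int minor, char *buf, size_t n);         /* character input    */
    ssize_t (*write)(int minor, const char *buf, size_t n);  /* character output   */
    int     (*ioctl)(int minor, int request, void *arg);     /* control operations */
    void    (*intr)(int minor);                              /* asynchronous entry
                                                                for interrupts     */
};
```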
User to Driver Control Flow • [Diagram: read, write, and ioctl calls cross from user to kernel mode. Requests on ordinary files go through the file system; requests on special files go directly to a device. Block devices are reached through the buffer cache and character devices through a character queue, with each path ending at the corresponding device driver]
Buffer Cache • When an I/O request is made for a block, the buffer cache is checked first • If the block is missing from the cache, it is read into the buffer cache from the device • Exploits locality of reference as any other cache • Replacement policies similar to those for VM, but LRU is feasible • UNIX • Historically, UNIX has a buffer cache for the disk which does not share buffers with character/stream devices • Adds overhead in a path that has become increasingly common: disk → NIC
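A minimal sketch of the lookup path described above; cache_lookup(), cache_alloc(), and disk_read() are hypothetical helpers, and struct buf is deliberately simplified:

```c
#include <stddef.h>

/* Simplified buffer-cache read: return a cached block, filling it from
   the device on a miss. */
struct buf { int dev, blockno; char data[4096]; };

struct buf *cache_lookup(int dev, int blockno);   /* NULL on miss              */
struct buf *cache_alloc(int dev, int blockno);    /* evicts LRU buffer if full */
void        disk_read(int dev, int blockno, char *data);

struct buf *bread(int dev, int blockno) {
    struct buf *b = cache_lookup(dev, blockno);
    if (b == NULL) {                  /* miss: fetch the block from the device */
        b = cache_alloc(dev, blockno);
        disk_read(dev, blockno, b->data);
    }
    return b;                         /* hit, or a freshly filled buffer */
}
```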
Disks • Seek time: time to move the disk head to the desired track • Rotational delay: time to reach the desired sector once the head is over the desired track • Transfer rate: rate at which data is read from or written to the disk • Some typical parameters: • Seek: ~2-10ms • Rotational delay: ~3ms for 10000 rpm • Transfer rate: 200 MB/s • [Diagram: disk platter showing tracks and sectors]
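Plugging in the typical parameters above, a back-of-the-envelope estimate of the time to read one 4 KB block (the 5 ms seek is an assumed midpoint of the 2-10 ms range):

```c
#include <stdio.h>

/* Estimate the time to read one 4 KB block with the parameters above. */
int main(void) {
    double seek_ms     = 5.0;                      /* midpoint of ~2-10 ms          */
    double rotation_ms = 3.0;                      /* half a 6 ms turn at 10000 rpm */
    double transfer_ms = 4096.0 / 200e6 * 1000.0;  /* ~0.02 ms at 200 MB/s          */
    double total_ms    = seek_ms + rotation_ms + transfer_ms;

    /* Prints ~8.02 ms, with ~99.7% of it spent positioning the head. */
    printf("total: %.2f ms (positioning: %.1f%%)\n",
           total_ms, 100.0 * (seek_ms + rotation_ms) / total_ms);
    return 0;
}
```

The point of the arithmetic: positioning dominates, which is exactly why the order in which sectors are serviced matters so much (next slide).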
Disk Scheduling • Disks are at least four orders of magnitude slower than main memory • The performance of disk I/O is vital for the performance of the computer system as a whole • Access time (seek time + rotational delay) >> transfer time for a sector • Therefore the order in which sectors are read matters a lot • Disk scheduling • Usually based on the position of the requested sector rather than on the priority of the requesting process • Possibly reorders the stream of read/write requests to improve performance
Disk Scheduling (Cont.) • Several algorithms exist to schedule the servicing of disk I/O requests. • We illustrate them with a request queue (tracks 0-199). • 98, 183, 37, 122, 14, 124, 65, 67 • Head pointer 53
FCFS • Illustration shows total head movement of 640 cylinders. Source: SGG
SSTF • Selects the request with the minimum seek time from the current head position. • SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests. • Illustration shows total head movement of 236 cylinders.
SSTF (Cont.) Source: SGG
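A small simulation of the two policies on the example queue, reproducing the head-movement totals above (FCFS = 640, SSTF = 236):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N 8

/* FCFS: service requests strictly in arrival order. */
int fcfs(const int *req, int n, int head) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(req[i] - head);
        head = req[i];
    }
    return total;
}

/* SSTF: always service the pending request closest to the head. */
int sstf(const int *req, int n, int head) {
    int pending[N], total = 0;
    memcpy(pending, req, n * sizeof(int));
    for (int served = 0; served < n; served++) {
        int best = -1, bestDist = 0;
        for (int i = 0; i < n; i++) {
            if (pending[i] < 0) continue;             /* already serviced */
            int d = abs(pending[i] - head);
            if (best < 0 || d < bestDist) { best = i; bestDist = d; }
        }
        total += bestDist;
        head = pending[best];
        pending[best] = -1;                           /* mark as serviced */
    }
    return total;
}

int main(void) {
    int queue[N] = {98, 183, 37, 122, 14, 124, 65, 67};
    printf("FCFS: %d cylinders\n", fcfs(queue, N, 53));   /* 640 */
    printf("SSTF: %d cylinders\n", sstf(queue, N, 53));   /* 236 */
    return 0;
}
```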
SCAN • The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues. • Sometimes called the elevator algorithm. • Illustration shows total head movement of 208 cylinders.
SCAN (Cont.) Source: SGG
C-SCAN • Provides a more uniform wait time than SCAN. • The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip. • Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.
C-SCAN (Cont.) Source: SGG
C-LOOK • A version of C-SCAN • The arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.
C-LOOK (Cont.) Source: SGG
Disk Scheduling Policies • Shortest-service-time-first (SSTF): pick the request that requires the least movement of the head • SCAN (back and forth over disk): good service distribution • C-SCAN (one way with fast return): lower service variability • Problem with SSTF, SCAN, and C-SCAN: the arm may not move for a long time (due to rapid-fire accesses to the same track) • N-step SCAN: scan N requests at a time by breaking the request queue into segments of size at most N and cycling through them • FSCAN: uses two sub-queues; during a scan, one queue is serviced while new requests accumulate in the other
Disk Management • Low-level formatting, or physical formatting — dividing a disk into sectors that the disk controller can read and write. • To use a disk to hold files, the operating system still needs to record its own data structures on the disk. • Partition the disk into one or more groups of cylinders. • Logical formatting or “making a file system”. • Boot block initializes the system. • The bootstrap is stored in ROM. • A bootstrap loader program loads the full bootstrap from the boot block. • Methods such as sector sparing are used to handle bad blocks.
Swap Space Management • Virtual memory uses disk space as an extension of main memory. • Swap space is necessary for pages that have been written and then evicted from memory. • Swap space can be carved out of the normal file system or, more commonly, it can be in a separate disk partition. • Swap space management • 4.3BSD allocates swap space when a process starts; it holds the text segment (the program) and the data segment. (Stack and heap pages are created in main memory first.) • Kernel uses swap maps to track swap space use. • Solaris 2 allocates swap space only when a page is forced out of physical memory, not when the virtual memory page is first created.
Disk Reliability • Several improvements in disk-use techniques involve the use of multiple disks working cooperatively. • RAID is one important technique currently in common use.
RAID • Redundant Array of Inexpensive Disks (RAID) • A set of physical disk drives viewed by the OS as a single logical drive • Replace large-capacity disks with multiple smaller-capacity drives to improve the I/O performance (at lower price) • Data are distributed across physical drives in a way that enables simultaneous access to data from multiple drives • Redundant disk capacity is used to compensate for the increase in the probability of failure due to multiple drives • Improves availability because there is no single point of failure • Six levels of RAID representing different design alternatives
RAID Level 0 • Does not include redundancy • Data is striped across the available disks • Total storage space across all disks is divided into strips • Strips are mapped round-robin to consecutive disks • A set of consecutive strips that maps exactly one strip to each disk in the array is called a stripe • Can you see how this improves the disk I/O bandwidth? • What access pattern gives the best performance? • [Diagram: strips 0-3 form stripe 0, mapped round-robin across four disks; strips 4-7 form the next stripe, and so on]
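A minimal sketch of the round-robin strip mapping just described (the function and variable names are my own):

```c
/* Map a logical strip number to its disk and its position within that disk
   under RAID 0 round-robin striping. */
void raid0_map(int logicalStrip, int numDisks, int *disk, int *strip) {
    *disk  = logicalStrip % numDisks;   /* round-robin across the array   */
    *strip = logicalStrip / numDisks;   /* how far down that disk it sits */
}
```

With 4 disks, logical strips 0-3 form stripe 0 (one strip per disk), so a sequential read of one stripe can proceed on all 4 disks at once.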
RAID Level 1 • Redundancy achieved by duplicating all the data • Every disk has a mirror disk that stores exactly the same data • A read can be serviced by either of the two disks that contain the requested data (improved performance over RAID 0 if reads dominate) • A write request must be done on both disks but can be done in parallel • Recovery is simple but cost is high • [Diagram: each strip (0, 1, 8, 9, …) is stored identically on a disk and on its mirror]
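A sketch of the mirrored read/write logic above, assuming hypothetical disk_read()/disk_write() helpers and a two-disk mirror pair:

```c
void disk_read(int disk, int blockno, void *data);         /* hypothetical */
void disk_write(int disk, int blockno, const void *data);  /* hypothetical */

/* Writes must update both copies (the two writes can be issued in parallel). */
void raid1_write(int blockno, const void *data) {
    disk_write(0, blockno, data);
    disk_write(1, blockno, data);
}

/* Reads may be serviced by either copy; alternate for crude load balancing. */
void raid1_read(int blockno, void *data) {
    disk_read(blockno % 2, blockno, data);
}
```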