830 likes | 1.08k Views
I/O and File Systems. CS 519: Operating System Theory Computer Science, Rutgers University Instructor: Thu D. Nguyen TA: Xiaoyan Li Spring 2002. I/O Devices. So far we have talked about how to abstract and manage CPU and memory
E N D
I/O and File Systems CS 519: Operating System Theory Computer Science, Rutgers University Instructor: Thu D. Nguyen TA: Xiaoyan Li Spring 2002
I/O Devices • So far we have talked about how to abstract and manage CPU and memory • Computation “inside” computer is useful only if some results are communicated “outside” of the computer • I/O devices are the computer’s interface to the outside world (I/O Input/Output) • Example devices: display, keyboard, mouse, speakers, network interface, and disk CS 519: Operating System Theory
Disk Basic Computer Structure CPU Memory Memory Bus (System Bus) Bridge I/O Bus NIC CS 519: Operating System Theory
Intel SR440BX Motherboard CPU System Bus & MMU/AGP/PCI Controller I/O Bus IDE Disk Controller USB Controller Another I/O Bus Serial & Parallel Ports Keyboard & Mouse CS 519: Operating System Theory
OS: Abstractions and Access Methods • OS must virtualizes a wide range of devices into a few simple abstractions: • Storage • Hard drives, Tapes, CDROM • Networking • Ethernet, radio, serial line • Multimedia • DVD, Camera, microphones • Operating system should provide consistent methods to access the abstractions • Otherwise, programming is too hard CS 519: Operating System Theory
User/OS method interface • The same interface is used to access devices (like disks and network lines) and more abstract resources like files • 4 main methods: • open() • close() • read() • write() • Semantics depend on the type of the device (block, char, net) • These methods are system calls because they are the methods the OS provides to all processes. CS 519: Operating System Theory
Unix I/O Methods • fileHandle = open(pathName, flags, mode) • a file handle is a small integer, valid only within a single process, to operate on the device or file • pathname: a name in the file system. In unix, devices are put under /dev. E.g. /dev/ttya is the first serial port, /dev/sda the first SCSI drive • flags: blocking or non-blocking … • mode: read only, read/write, append … • errorCode = close(fileHandle) • Kernel will free the data structures associated with the device CS 519: Operating System Theory
Unix I/O Methods • byteCount = read(fileHandle, byte [] buf, count) • read at most count bytes from the device and put them in the byte buffer buf. Bytes placed from 0th byte. • Kernel can give the process less bytes, user process must check the byteCount to see how many were actually returned. • A negative byteCount signals an error (value is the error type) • byteCount = write(fileHandle, byte [] buf, count) • write at most count bytes from the buffer buf • actual number written returned in byteCount • a Negative byteCount signals an error CS 519: Operating System Theory
Unix I/O Example • What’s the correct way to write a 1000 bytes? • calling: • ignoreMe = write(fileH, buffer, 1000); • works most of the time. What happens if: • can’t accept 1000 bytes right now? • disk is full? • someone just deleted the file? • Let’s work this out on the board CS 519: Operating System Theory
I/O semantics • From this basic interface, three different dimension to how I/O is processed: • blocking vs. non-blocking • buffered vs. unbuffered • synchronous vs. asynchronous • The O/S tries to support as many of these dimensions as possible for each device • The semantics are specified during the open() system call CS 519: Operating System Theory
Blocking vs. Non-Blocking I/O • Blocking: process is suspended until all bytes in the count field are read or written • E.g., for a network device, if the user wrote 1000 bytes, then the operating system would write the bytes to the device one at a time until the write() completed. • + Easy to use and understand • - if the device just can’t perform the operation (e.g. you unplug the cable), what to do? Give up an return the successful number of bytes. • Nonblocking: the OS only reads or writes as many bytes as is possible without suspending the process • + Returns quickly • - more work for the programmer (but really for robust programs)? CS 519: Operating System Theory
Buffered vs. Unbuffered I/O • Sometime we want the ease of programming of blocked I/O without the long waits if the buffers on the device are small. • buffered I/O allows the kernel to make a copy of the data • write() side: allows the process to write() many bytes and continue processing • read() side: As device signals data is ready, kernel places data in the buffer. When the process calls read(), the kernel just copies the buffer. • Why not use buffered I/O? • - Extra copy overhead • - Delays sending data CS 519: Operating System Theory
Synchronous vs. Asynchronous I/O • Synchronous I/O: the user process does not run at the same time the I/O does --- it is suspended during I/O • So far, all the methods presented have been synchronous. • Asynchronous I/O: The read() or write() call returns a small object instead of a count. • separate set of methods in unix: aio_read(), aio_write() • The user can call methods on the returned object to check “how much” of the I/O has completed • The user can also allow a signal handler to run when the the I/O has completed. CS 519: Operating System Theory
Handling Multiple I/O streams • If we use blocking I/O, how do we handle > 1 device at a time? • Example : a small program that reads from serial line and outputs to a tape and is also reading from the tape and outputting to a serial line • Structure the code like this? • while (TRUE) { • read(tape); // block until tape is ready • write(serial line);// send data to serial line • read(serial line); • write(tape); • } • Could use non-blocking I/O, but huge waste of cycles if data is not ready. CS 519: Operating System Theory
Solution: select system call • totalFds = select(nfds, readfds, writefds, errorfds, timeout); • nfds: the range (0.. nfds) of file descriptors to check • readfds: bit map of fileHandles. user sets the bit X to ask the kernel to check if fileHandle X ready for reading. Kernel returns a 1 if data can be read on the fileHandle. • writefds: bit map if fileHandle Y for writing. Operates same as read bitmap. • errorfds: bit map to check for errors. • timeout: how long to wait before un-suspending the process • totalFds = number of set bits, negative number is an error CS 519: Operating System Theory
Three Device Types • Most operating system have three device types: • Character devices • Used for serial-line types of devices (e.g. USB port) • Network devices • Used for network interfaces (E.g. Ethernet card) • Block devices: • Used for mass-storage (E.g. disks and CDROM) • What you can expect from the read/write methods changes with each device type CS 519: Operating System Theory
Character Devices • Device is represented by the OS as an ordered stream of bytes • bytes sent out the device by the write() system call • bytes read from the device by the read() system call • Byte stream has no “start”, just open and start/reading writing • The user has no control of the read/write ratio, the sender process might write() 1 time of 1000 bytes, and the receiver may have to call 1000 read calls each receiving 1 byte. CS 519: Operating System Theory
Network Devices • Like I/O devices, but each write() call either sends the entire block (packet), up to some maximum fixed size, or none. • On the receiver, the read() call returns all the bytes in the block, or none. CS 519: Operating System Theory
Block Devices • OS presents device as a large array of blocks • Each block has a fixed size (1KB - 8KB is typical) • User can read/write only in fixed sized blocks • Unlike other devices, block devices support random access • We can read or write anywhere in the device without having to ‘read all the bytes first’ • But how to specify the nth block with current interface? CS 519: Operating System Theory
The file pointer • O/S adds a concept call the file pointer • A file pointer is associated with each open file, if the device is a block device • the next read or write operates at the position in the device pointed to by the file pointer • The file pointer is measured in bytes, not blocks CS 519: Operating System Theory
Seeking in a device • To set the file pointer: • absoluteOffset = lseek(fileHandle, offset, whence); • whence specifies if the offset is absolute, from byte 0, or relative to the current file pointer position • the absolute offset is returned; negative numbers signal error codes • For devices, the offset should be a integral number of bytes relative to the size of a block. • How could you tell what the current position of the file pointer is without changing it? CS 519: Operating System Theory
Block Device Example • You want to read the 10th block of a disk • each disk block is 2048 bytes long • fh = open(/dev/sda, , , ); • pos = lseek(fh, size*count, ); • if (pos < 0 ) error; • bytesRead = read(fh, buf, blockSize); • if (bytesRead < 0) error; • … CS 519: Operating System Theory
Getting and setting device specific information • Unix has an IO ConTroL system call: • ErrorCode = ioctl(fileHandle, int request, object); • request is a numeric command to the device • can also pass an optional, arbitrary object to a device • the meaning of the command and the type of the object are device specific CS 519: Operating System Theory
Communication Between CPU and I/O Devices • How does the CPU communicate with I/O devices? • Memory-mapped communication • Each I/O device assigned a portion of the physical address space • CPU I/O device • CPU writes to locations in this area to "talk" to I/O device • I/O device CPU • Polling: CPU repeatedly check location(s) in portion of address space assigned to device • Interrupt: Device sends an interrupt (on an interrupt line) to get the attention of the CPU • Programmed I/O, Interrupt-Driven, Direct Memory Access • PIO and ID = word at a time • DMA = block at a time CS 519: Operating System Theory
Programmed I/O vs. DMA • Programmed I/O is ok for sending commands, receiving status, and communication of a small amount of data • Inefficient for large amount of data • Keeps CPU busy during the transfer • Programmed I/O memory operations slow • Direct Memory Access • Device read/write directly from/to memory • Memory device typically initiated from CPU • Device memory can be initiated by either the device or the CPU CS 519: Operating System Theory
Disk Disk Disk Programmed I/O vs. DMA CPU Memory CPU Memory CPU Memory Interconnect Interconnect Interconnect Programmed I/O DMA DMA Device Memory Problems? CS 519: Operating System Theory
Direct Memory Access • Used to avoid programmed I/O for large data movement • Requires DMA controller • Bypasses CPU to transfer data directly between I/O device and memory CS 519: Operating System Theory
Six step process to perform DMA transfer CS 519: Operating System Theory
Performance • I/O a major factor in system performance • Demands CPU to execute device driver, kernel I/O code • Context switches due to interrupts • Data copying • Network traffic especially stressful CS 519: Operating System Theory
Improving Performance • Reduce number of context switches • Reduce data copying • Reduce interrupts by using large transfers, smart controllers, polling • Use DMA • Balance CPU, memory, bus, and I/O performance for highest throughput CS 519: Operating System Theory
Device Driver • OS module controlling an I/O device • Hides the device specifics from the above layers in the kernel • Supporting a common API • UNIX: block or character device • Block: device communicates with the CPU/memory in fixed-size blocks • Character/Stream: stream of bytes • Translates logical I/O into device I/O • E.g., logical disk blocks into {head, track, sector} • Performs data buffering and scheduling of I/O operations • Structure • Several synchronous entry points: device initialization, queue I/O requests, state control, read/write • An asynchronous entry point to handle interrupts CS 519: Operating System Theory
Some Common Entry Points for UNIX Device Drivers • Attach: attach a new device to the system. • Close: note the device is not in use. • Halt: prepare for system shutdown. • Init: initialize driver globals at load or boot time. • Intr: handle device interrupt (not used). • Ioctl: implement control operations. • Mmap: implement memory-mapping (SVR4). • Open: connect a process to a device. • Read: character-mode input. • Size: return logical size of block device. • Start: initialize driver at load or boot time. • Write: character-mode output. CS 519: Operating System Theory
User to Driver Control Flow read, write, ioctl user kernel ordinary file special file file system character device block device buffer cache character queue driver_read/write driver-strategy CS 519: Operating System Theory
Buffer Cache • When an I/O request is made for a block, the buffer cache is checked first • If block is missing from the cache, it is read into the buffer cache from the device • Exploits locality of reference as any other cache • Replacement policies similar to those for VM • UNIX • Historically, UNIX has a buffer cache for the disk which does not share buffers with character/stream devices • Adds overhead in a path that has become increasingly common: disk NIC CS 519: Operating System Theory
Disks • Seek time: time to move the disk head to the desired track • Rotational delay: time to reach desired sector once head is over the desired track • Transfer rate: rate data read/write to disk • Some typical parameters: • Seek: ~10-15ms • Rotational delay: ~4.15ms for 7200 rpm • Transfer rate: 30 MB/s Sectors Tracks CS 519: Operating System Theory
Disk Scheduling • Disks are at least four orders of magnitude slower than main memory • The performance of disk I/O is vital for the performance of the computer system as a whole • Access time (seek time+ rotational delay) >> transfer time for a sector • Therefore the order in which sectors are read matters a lot • Disk scheduling • Usually based on the position of the requested sector rather than according to the process priority • Possibly reorder stream of read/write request to improve performance CS 519: Operating System Theory
Disk Scheduling Policies • Shortest-service-time-first (SSTF): pick the request that requires the least movement of the head • SCAN (back and forth over disk): good service distribution • C-SCAN (one way with fast return): lower service variability • Problem with SSTF, SCAN, and C-SCAN: arm may not move for long time (due to rapid-fire accesses to same track) • N-step SCAN: scan of N records at a time by breaking the request queue in segments of size at most N and cycling through them • FSCAN: uses two sub-queues, during a scan one queue is consumed while the other one is produced CS 519: Operating System Theory
Disk Management • Low-level formatting, or physical formatting — Dividing a disk into sectors that the disk controller can read and write. • To use a disk to hold files, the operating system still needs to record its own data structures on the disk. • Partition the disk into one or more groups of cylinders. • Logical formatting or “making a file system”. • Boot block initializes system. • The bootstrap is stored in ROM. • Bootstrap loader program. • Methods such as sector sparing used to handle bad blocks. CS 519: Operating System Theory
Swap-Space Management • Swap-space — Virtual memory uses disk space as an extension of main memory. • Swap-space can be carved out of the normal file system,or, more commonly, it can be in a separate disk partition. • Swap-space management • 4.3BSD allocates swap space when process starts; holds text segment (the program) and data segment. • Kernel uses swap maps to track swap-space use. • Solaris 2 allocates swap space only when a page is forced out of physical memory, not when the virtual memory page is first created. CS 519: Operating System Theory
Disk Reliability • Several improvements in disk-use techniques involve the use of multiple disks working cooperatively. • RAID is one important technique currently in common use. CS 519: Operating System Theory
RAID • Redundant Array of Inexpensive Disks (RAID) • A set of physical disk drives viewed by the OS as a single logical drive • Replace large-capacity disks with multiple smaller-capacity drives to improve the I/O performance (at lower price) • Data are distributed across physical drives in a way that enables simultaneous access to data from multiple drives • Redundant disk capacity is used to compensate for the increase in the probability of failure due to multiple drives • Improve availability because no single point of failure • Six levels of RAID representing different design alternatives CS 519: Operating System Theory
RAID Level 0 • Does not include redundancy • Data is stripped across the available disks • Total storage space across all disks is divided into strips • Strips are mapped round-robin to consecutive disks • A set of consecutive strips that maps exactly one strip to each disk in the array is called a stripe • Can you see how this improves the disk I/O bandwidth? • What access pattern gives the best performance? stripe 0 strip 2 strip 3 strip 1 strip 0 strip 4 strip 5 strip 6 strip 7 ... CS 519: Operating System Theory
RAID Level 1 • Redundancy achieved by duplicating all the data • Every disk has a mirror disk that stores exactly the same data • A read can be serviced by either of the two disks which contains the requested data (improved performance over RAID 0 if reads dominate) • A write request must be done on both disks but can be done in parallel • Recovery is simple but cost is high strip 0 strip 1 strip 1 strip 0 strip 2 strip 3 strip 2 strip 3 ... CS 519: Operating System Theory
RAID Levels 2 and 3 • Parallel access: all disks participate in every I/O request • Small strips since size of each read/write = # of disks * strip size • RAID 2: error correcting code is calculated across corresponding bits on each data disk and stored on log(# data disks) parity disks • Hamming code: can correct single-bit errors and detect double-bit errors • Less expensive than RAID 1 but still pretty high overhead – not really needed in most reasonable environments • RAID 3: a single redundant disk that keeps parity bits • P(i) = X2(i) X1(i) X0(i) • In the event of a failure, data can be reconstructed • Can only tolerate a single failure at a time X2(i) = P(i) X1(i) X0(i) b1 b0 b2 P(b) CS 519: Operating System Theory
P(0-2) strip 0 RAID Levels 4 and 5 • RAID 4 • Large strips with a parity strip like RAID 3 • Independent access - each disk operates independently, so multiple I/O request can be satisfied in parallel • Independent access small write = 2 reads + 2 writes • Example: if write performed only on strip 0: • P’(i) = X2(i) X1(i) X0’1(i) • = X2(i) X1(i) X0’(i) X0(i) X0(i) • = P(i) X0’(i) X0(i) • Parity disk can become bottleneck • RAID 5 • Like RAID 4 but parity strips are distributed across all disks strip 2 strip 1 strip 4 strip 5 P(3-5) strip 3 CS 519: Operating System Theory
File System • File system is an abstraction of the disk • File Track/sector • To a user process • A file looks like a contiguous block of bytes (Unix) • A file system provides a coherent view of a group of files • A file system provides protection • API: create, open, delete, read, write files • Performance: throughput vs. response time • Reliability: minimize the potential for lost or destroyed data • E.g., RAID could be implemented in the OS as part of the disk device driver CS 519: Operating System Theory
File API • To read or write, need to open • open() returns a handle to the opened file • OS associates a data structure with the handle • Data structure maintains current “cursor” position in the stream of bytes in the file • Read and write takes place from the current position • Can specify a different location explicitly • When done, should close the file CS 519: Operating System Theory
Files vs. Disk Disk Files ??? CS 519: Operating System Theory
Files vs. Disk Disk Files Contiguous Layout What’s the problem with this mapping function? What’s the potential benefit of this mapping function? CS 519: Operating System Theory
Files vs. Disk Disk Files What’s the problem with this mapping function? CS 519: Operating System Theory