CS 1104 Help Session III: I/O and Buses Colin Tan, ctank@comp.nus.edu.sg S15-04-15
Why do we need I/O Devices? • No computer is an island • Need to interact with people and other computers and devices. • People don’t speak binary, so we need keyboards and display units that “speak” English (or Chinese, etc.) to interface with people. • Communicating with other computers presents its own problems: • Synchronization • Erroneous transmission/reception • Error and failure recovery (e.g. the other computer dies mid-conversation)
The Need for I/O • There is also a need for permanent storage: • Main memory, cache, etc. are volatile devices: contents are lost when power is lost. • So we have hard disks to store files, data, etc. • It is impractical to build all of this into the CPU: • Too complex, too big to put on the CPU chip. • The varying needs of users mean that the types of I/O devices will vary: • An office user needs a laser printer, but a ticketing officer needs a dot-matrix printer.
Devices We Will Be Looking At • Disk Drives • Data is organized into fixed-size blocks. • Blocks are organized in concentric circles called tracks. The tracks are created on both sides of metallic disks called “platters”. • The corresponding tracks on each side of each platter form a cylinder (e.g. Track 0 on side 0 of platter 0, track 0 on side 1 of platter 0, track 0 on side 0 of platter 1 etc.) • Latencies are involved in finding the data and reading or writing it.
Devices We Will Be Looking At • Network Devices • Transmits data between computers. • Data often organized in blocks called “packets”, or sometimes into frames. • Latencies are involved in sending/receiving data over a network.
Devices We Will Be Looking At • Buses • Buses carry data and control signals from CPU to/from memory and I/O devices. • DMA Controller (DMAC) • The DMAC performs transfers between I/O devices and memory without CPU intervention.
Disk Drives • Latencies involved in accessing drives: • Controller Overhead: To read or write a drive, a request must be made to the drive controller, which may take some time to respond. This delay is called the “controller overhead”, and is usually ignored. Controller overhead also includes delays introduced by the controller circuitry in transferring data. • Head-Selection Time: Each side of each platter has its own head. To read or write, the drive first selects the correct side of the correct platter by activating its head and de-activating the other heads. Normally this time is ignored.
Disk Drives • Latencies involved in accessing drives: • Seek Time: Once the correct head has been selected, it must be moved over the correct track. The time taken to do this is called the “seek time”, and is usually between 8 and 20 ms (NOT NEGLIGIBLE!) • Rotational Latency: Even when the head is over the correct track, it must wait for the block it wants to rotate under the head. • The average rotational latency is T/2, where T is the period of rotation in seconds (T = 60/R if the rotation speed R is given in RPM, or 1/R if R is given in RPS).
Disk Drives • Latencies involved in accessing drives: • Transfer Time: This is the time taken to actually read the data. If the throughput of the drive is X MB/s and we want to read Y bytes of data, then the transfer time in seconds is: Y/(X * 10^6)
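Putting the three non-negligible latencies together gives a simple per-access model. The following is a minimal sketch in Python; the function name is mine, and it assumes 1 KB = 1000 bytes to match the slides' use of X * 10^6:

```python
def disk_access_time(seek_ms, rpm, block_bytes, throughput_mb_s):
    """Estimate the average time (in ms) to access one disk block.
    Controller overhead and head-selection time are ignored,
    as the slides do."""
    rotation_period_ms = 60_000 / rpm               # one full rotation
    rotational_latency_ms = rotation_period_ms / 2  # average: half a rotation
    transfer_ms = block_bytes / (throughput_mb_s * 10**6) * 1000
    return seek_ms + rotational_latency_ms + transfer_ms

# e.g. 12 ms seek, 7200 rpm, 16 KB block, 10 MB/s throughput
print(round(disk_access_time(12, 7200, 16_000, 10), 2))  # 17.77 ms
```

Note how the seek time (12 ms) dominates: the rotational latency is about 4.17 ms and the transfer itself only 1.6 ms.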
Example • A program is written to access 3 blocks of data (the blocks are not contiguous and may exist anywhere on the disk) from a disk with a rotation speed of 7200 rpm, 12ms seek time, a throughput of 10 MB/s and a block size of 16 KB. Compute the worst-case timing for accessing the 3 blocks.
Example • Analysis: • Each block can be anywhere on the disk • In the worst case, we must incur seek, rotational and transfer delays for every block. • What is the timing for each delay? • Controller Overhead - Negligible (since not given) • Head-switching time - Negligible (since not given) • Seek time • Rotational Latency • Transfer time. • How many times are each of these delays incurred?
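The analysis above can be turned into a short worked calculation. This sketch makes two assumptions of mine: the worst-case rotational latency is one full rotation (the head just misses the block), and 16 KB is taken as 16 × 10^3 bytes, consistent with the slides' transfer-time formula:

```python
SEEK_MS = 12
ROTATION_MS = 60_000 / 7200               # 8.33 ms per full rotation
WORST_ROT_MS = ROTATION_MS                # worst case: wait a whole rotation
TRANSFER_MS = 16_000 / (10 * 10**6) * 1000  # 16 KB at 10 MB/s = 1.6 ms

# Worst case: every block incurs seek + rotational + transfer delay
per_block_ms = SEEK_MS + WORST_ROT_MS + TRANSFER_MS
total_ms = 3 * per_block_ms
print(round(total_ms, 1))  # 65.8 ms
```

Each block costs about 21.9 ms, almost all of it seek and rotation rather than useful data transfer.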
Example • A disk drive has a rotational speed of 7200 rpm. Each block is 16KB, and there are 16 blocks per track. There are 22 platters with 25 tracks each. The average seek time is 12ms. • What is the capacity of this disk? • How long does it take to read 1 block of data?
Example • Analysis • Size: • How many sides are there? How many tracks per side? How many blocks per track? How big is each block? • Time to read 1 block • Throughput is not given. How to work it out?
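A sketch answering both questions. Two assumptions of mine: the 25 tracks are per side (so each platter side contributes 25 tracks), and the drive reads one full track of 16 blocks per rotation, which fixes the transfer time per block:

```python
PLATTERS, SIDES_PER_PLATTER = 22, 2
TRACKS_PER_SIDE = 25
BLOCKS_PER_TRACK = 16
BLOCK_KB = 16
RPM, SEEK_MS = 7200, 12

# Capacity: sides x tracks x blocks x block size
capacity_kb = (PLATTERS * SIDES_PER_PLATTER * TRACKS_PER_SIDE
               * BLOCKS_PER_TRACK * BLOCK_KB)
print(capacity_kb)  # 281600 KB (about 275 MB if 1 KB = 1024 bytes)

# Throughput: one track (16 blocks) passes under the head per rotation,
# so one block takes 1/16 of a rotation to transfer.
rotation_ms = 60_000 / RPM                     # 8.33 ms
transfer_ms = rotation_ms / BLOCKS_PER_TRACK   # 0.52 ms per block
read_ms = SEEK_MS + rotation_ms / 2 + transfer_ms
print(round(read_ms, 2))  # 16.69 ms for one block, on average
```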
Network Devices • Some major network types in use today: • Ethernet • The most common networking technology • Poor performance under high traffic. • FDDI (Fiber Distributed Data Interface): uses lasers and fibre-optic technology to transmit data • Fast, expensive. • Slowly being replaced by Gigabit Ethernet. • Asynchronous Transfer Mode (ATM) • Fast throughput achieved by using simple and fast components • Very expensive. • Example of daily ATM use: Singtel Magix (ADSL)
Ethernet Packet Format • Field layout: Preamble (8 bytes) | Dest Addr (6 bytes) | Src Addr (6 bytes) | Length of Data (2 bytes) | Data (0–1500 bytes) | Pad (0–46 bytes) | Check (4 bytes) • Preamble to recognize the beginning of a packet • Unique address per Ethernet Network Interface Card, so you can just plug in & use • Pad ensures the minimum packet is 64 bytes • Easier to find the packet on the wire • Header + Trailer: 24B + Pad
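The padding rule can be illustrated with a toy frame builder. This is a hypothetical sketch, not a real Ethernet implementation: the preamble is omitted, and the 4-byte check field is a zero placeholder rather than a real CRC:

```python
import struct

def build_frame(dest: bytes, src: bytes, payload: bytes) -> bytes:
    """Toy framing after the slide's layout (preamble omitted).
    Pads the payload so that header + data + pad + check >= 64 bytes."""
    assert len(dest) == 6 and len(src) == 6
    pad = b"\x00" * max(0, 46 - len(payload))   # data + pad >= 46 bytes
    length = struct.pack(">H", len(payload))    # 2-byte data length field
    check = struct.pack(">I", 0)                # placeholder for 4-byte CRC
    return dest + src + length + payload + pad + check

frame = build_frame(b"\xff" * 6, b"\x01" * 6, b"hi")
print(len(frame))  # 64: a tiny payload is padded up to the minimum frame size
```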
Software Protocol to Send and Receive • SW Send steps • 1: Application copies data to OS buffer • 2: OS calculates checksum, starts timer • 3: OS sends data to network interface HW and says start • SW Receive steps • 3: OS copies data from network interface HW to OS buffer • 2: OS calculates checksum, if OK, send ACK; if not, delete message(sender resends when timer expires) • 1: If OK, OS copies data to user address space, & signals application to continue
Network Devices • Latencies Involved: • Interconnect Time: This is the time taken for 2 stations to “hand-shake” and establish a communications session. • Hardware Latencies: There is some latency in gaining access to a medium (e.g. in Ethernet, the Network Interface Card (NIC) must wait for the Ethernet cable to be free of other activity) and in reading/writing to the medium. • Software Latencies: Network access often requires multiple buffer-copying operations, leading to delays.
Network Devices • Latencies Involved: • Propagation Delays: For very large networks stretching thousands of miles, signals do not reach their destination immediately, and take some time to travel in the wire. More details in CS2105. • Switching Delays: Large networks often have intermediate switches to receive and re-transmit data (to restore signal integrity, for routing etc.). These switches introduce delays too. More details in CS2105.
Network Devices • Latencies Involved: • Data Transfer Time: Time taken to actually transfer the data. If we wish to transfer Y bytes of data over a network link with a throughput of X MB/s, the data transfer time in seconds is: (Y bytes)/(X * 10^6) • Aside from the Data Transfer Time (when real, useful work is being done), the other latencies accomplish nothing useful (though they are still necessary); they are termed “overheads”.
Network Devices • Note that if the overheads are much larger than the data transfer time, it is possible for a slow network with low overheads to perform better than a fast network with high overheads. • E.g. Page 654 of Patterson & Hennessy.
Example • A communications program was written and profiled, and it was found that it takes 40ns to copy data to and from the network. It was also found that it takes 100ns to establish a connection, and that the effective throughput was 5 MB/s. Compute how long it takes to send a 32KB block of data over the network.
Example • Analysis: • What are the overheads? What is the data transfer time?
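A sketch of the calculation. Two assumptions of mine: 32 KB is taken as 32 × 10^3 bytes, and the 40 ns copy cost is paid once on each side (sender and receiver); either way, the overheads here are dwarfed by the transfer time:

```python
CONNECT_NS = 100
COPY_NS = 40
DATA_BYTES = 32_000              # assuming 32 KB = 32 x 10^3 bytes
THROUGHPUT_BPS = 5 * 10**6       # 5 MB/s

transfer_ns = DATA_BYTES / THROUGHPUT_BPS * 10**9   # 6.4 ms of real work
overhead_ns = CONNECT_NS + 2 * COPY_NS              # 180 ns of overhead
total_ns = overhead_ns + transfer_ns
print(round(total_ns / 10**6, 4))  # total in ms: ~6.4 ms
```

The nanosecond-scale overheads are six orders of magnitude smaller than the transfer time, so for this block size the network's throughput completely determines performance; overheads would only matter for very small messages.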
Buses • Buses are extremely important devices (essentially they’re groups of wires) that bring data and control signals from one part of a system to another. • Categories of bus lines: • Control Lines: These carry control signals like READ/WRITE signals and the CLK signal. • Address Lines: These contain identifiers of devices to read/write from, or addresses of memory locations to access.
Buses • Data Lines: These actually carry the data we want to transfer • Sometimes the data/address lines are multiplexed onto the same set of lines. This allows us to build cheaper but slower buses • Must alternate between sending addresses and sending data, instead of spending all the time sending data.
Types of Buses • 3 Broad Categories of Buses • CPU/Memory Bus: These are very fast (100 MHz or more), very short buses that connect the CPU to the cache system and the cache system to the main memory. If the cache is on-chip, then this bus connects the CPU to the main memory. • I/O Bus: The I/O bus connects I/O devices to the CPU/Memory Bus, and is often much slower (12 MHz to 66 MHz). • Backplane Bus: The backplane bus is a mix of the two; often the CPU, memory and I/O devices all connect to the same backplane bus.
Combining Bus Types • Can have several schemes: • 1 bus system: CPU, memory, I/O devices all connected to the memory bus. • 2 bus system: CPU, memory connected via memory bus, and I/O connected via I/O bus. • 3 bus system: CPU and memory connected via memory bus, I/O connected via small set of backplane buses.
1-Bus System • 1-bus system: CPU, memory and I/O share a single bus. • Bad, bad, bad: the slow I/O devices tie up the shared bus. • This hurts the performance of memory accesses and hence overall CPU performance.
2-Bus System • 2-bus system: CPU and memory communicate via memory bus. • I/O devices send data via I/O bus.
2-Bus System • The I/O bus is de-coupled from the memory bus by an I/O controller. • The I/O controller coordinates transfers between the fast memory bus and the slow I/O bus. • Buffers data between the buses so no data is lost. • Arbitrates for the memory bus if necessary. • In the notes, the I/O controller is called a “Bus Adaptor”. Both terms mean the same thing.
3-Bus System • Memory and CPU still connected directly • This is important because it allows fast CPU/memory interaction.
3-Bus System • A backplane bus interfaces with the memory bus via a Bus Adapter. • Backplane buses typically have very high bandwidth, • Not quite as high as memory bus though. • Multiple I/O buses interface with the backplane bus. • Possible to have devices on different I/O buses communicating with each other, with the CPU completely uninvolved! • Very efficient I/O transfers possible.
Synchronous vs Asynchronous • Synchronous buses: Operations are coordinated based on a common clock. • Asynchronous buses: Operations are coordinated based on control signals.
Synchronous Example (Optional) • A typical memory system works in the following way: • Addresses are first placed on the address bus. • After a delay of 1 cycle (the hold time), the READ signal is asserted. • After 4 cycles, the data will become available on the data lines. • The data remains available for 2 cycles after the READ signal is de-asserted, during which time no new read operations may be performed.
Synchronous Example (Optional) • (Timing diagram: CLK, ADDR, READ and DATA waveforms for the read sequence described above.)
Synchronous Example (Optional) • Given that the synchronous bus in the previous example is operating at 200MHz, and that the time taken to read 1 word (4 bytes) of data from the DATA bus is 40ns, compute the maximum memory read bandwidth for this bus (assume that the READ line is dropped only after reading the data). Assume also that the time taken to place the address on the address bus is negligible.
Synchronous Example (Optional) • Analysis: • How long is each clock cycle in seconds or ns? • How long does it take to set up the read? (put the address on the address bus, assert the READ signal, wait for the data to appear, read the data, de-assert the READ signal) • How long does it take before you can repeat the READ operation? • Therefore, in 1 second, how many bytes of data can you read?
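Working through the questions above as a sketch. One interpretive assumption of mine: the 40 ns data read does not overlap the 5 setup cycles or the 2 recovery cycles, which matches the phrasing of the example:

```python
FREQ_HZ = 200 * 10**6
CYCLE_NS = 10**9 / FREQ_HZ         # 5 ns per cycle at 200 MHz

setup_ns = (1 + 4) * CYCLE_NS      # 1-cycle hold + 4 cycles for data: 25 ns
read_ns = 40                       # reading 1 word (4 bytes) off the bus
recovery_ns = 2 * CYCLE_NS         # 2 idle cycles before the next read: 10 ns

per_read_ns = setup_ns + read_ns + recovery_ns   # 75 ns per 4-byte word
bandwidth_mb_s = 4 / per_read_ns * 1000          # bytes/ns -> MB/s
print(round(bandwidth_mb_s, 1))  # 53.3 MB/s maximum read bandwidth
```

Note that only 40 of every 75 ns moves data; the rest is protocol overhead, which is why wider buses and burst transfers matter in real designs.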
Asynchronous Bus Example (Optional) • Asynchronous buses use a set of request/grant lines to perform data transfers instead of a central clock. • E.g. suppose the CPU wants to write to memory: • 1. The CPU makes a request by asserting the MEMW line. • 2. Memory sees the MEMW line asserted, and knows that the CPU wants to write to memory. It asserts a WGNT line to indicate that the CPU may proceed with the write. • 3. The CPU sees the WGNT line asserted, and begins writing.
Asynchronous Bus Example (Optional) • 4. When the CPU has finished writing, it de-asserts the MEMW line. • 5. Memory sees the MEMW line de-asserted, and knows that the CPU has completed writing. • 6. In response, memory de-asserts the WGNT line. The CPU sees the WGNT line de-asserted, and knows that memory understands that writing is complete.
Asynchronous vs. Synchronous: A Summary • Asynchronous Buses • Coordination is based on the status of control lines (MEMW, WGNT in our example). • Timing is not very critical. Devices can work as fast or as slow as they want without worrying about timing. • More difficult to design and build devices for async buses. • Need a good understanding of the protocol. • Synchronous Buses • Coordination is based on a central clock. • Timing is CRITICAL. If a device takes more or fewer clock cycles than specified, the system will fail (“clock skewing”). • However, synchronous buses are fast, and it is simpler to design devices for them.
Bus Arbitration • Often more than one device is trying to gain access to a bus: • A CPU and a DMA controller may both be trying to use the CPU-Memory bus. • Only 1 device can use the bus each time, so need a way to arbitrate who gets to use it. • Buses are common set of wires shared by many devices. • If >1 device tries to access the bus at the same time, there will be collisions and the data sent/received along the bus will be corrupted beyond recovery. • Solve by prioritizing: If n devices need to use the bus, the one with the highest priority will use it.
Bus Arbitration • Bus arbitration may be done in a co-operative way (each device knows and co-operates in determining who has higher priority) • No single point of failure • Complicated • May also have a central arbiter to make decisions • Easier to implement • Bottleneck, single point of failure.
Central Arbitration • Devices wishing to use the bus will send a request to the controller. • The controller will decide which device can use the bus, and assert its grant line (GNTx) to tell it.
Distributed Arbitration • Devices can also decide amongst themselves who should use the bus. • Every device knows which other devices are requesting. • Each device will use an algorithm to collectively agree who will use the bus. • The device that wins will assert its GNTx line to show that it knows that it has won and will proceed to use the bus.
Arbitration Schemes • Round Robin (Centralized or Distributed Arbitration) • Arbiter keeps record of which device last had the highest priority to use the bus. • If dev0 had the highest priority, on the next request cycle dev1 will have the highest priority, then dev2 all the way to devn, and it begins again with dev0.
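The rotating-priority rule above can be sketched as a minimal arbiter. This is an illustrative model of mine, not hardware: the closure-based interface and names are hypothetical:

```python
def round_robin_arbiter(n):
    """Return a grant function for n devices. Priority rotates:
    the device just after the last winner gets first claim."""
    state = {"last": n - 1}   # so that dev0 has top priority initially

    def grant(requests):
        """Given the set of requesting device ids, return the winner."""
        if not requests:
            return None
        # Scan from the device after the last winner, wrapping around
        for i in range(1, n + 1):
            dev = (state["last"] + i) % n
            if dev in requests:
                state["last"] = dev
                return dev

    return grant

arb = round_robin_arbiter(4)
print(arb({1, 3}))  # 1: priority starts at dev0 and dev1 is first requester
print(arb({1, 3}))  # 3: priority has rotated past dev1, so dev3 wins
```

This guarantees no device is starved: even under constant contention, every requester wins within n request cycles.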
Arbitration Schemes • Daisy Chain (Usually centralized arbitration) • Only 1 request and 1 grant line. • Request lines are relayed to the bus controller through the intervening devices. • If the bus controller sees a request, it will assert the GNT line
Arbitration Schemes • The GNT line is again relayed through intervening devices, until it finally reaches the requesting device, which can now use the bus. • If an intervening device also needs the bus, it can hijack the GNT signal and use the bus itself, instead of relaying it to the downstream requesting device. • E.g. if both Dev3 and Dev1 request the bus, the controller will assert GNT. Dev1 will hijack the GNT and use the bus instead of passing the GNT on to Dev3. • Devices closer to the arbiter have higher priority. • Possible to starve lower-priority devices.
Arbitration Schemes • Collision Detection • This scheme is used in Ethernet, the main LAN technology that connects computers together. • Properly called “Carrier Sense Multiple Access with Collision Detection”, or CSMA/CD. • In such schemes, all devices (“stations”) have permanent and continuous access to the bus:
Arbitration Schemes • CSMA/CD Algorithm • Suppose a station A wishes to transmit: • Check the bus to see if any station is transmitting. • If no, transmit. If yes, wait until the bus becomes free, then start transmitting. • While transmitting, listen to the bus for collisions. • Collisions can be detected by a sudden rise in the average bus voltage level. • Collisions occur when at least 2 stations A and B see that the bus is free, and begin transmitting simultaneously. • In the event of a collision: • All stations stop transmitting immediately. • All stations wait a random amount of time, test the bus, and restart transmission if it is free.
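The steps above can be sketched as a sender loop. This is a toy model: the Bus class and its fixed collision probability are hypothetical stand-ins for real carrier sensing and collision detection. One added detail beyond the slide: real Ethernet's "random amount of time" is specifically binary exponential backoff:

```python
import random

class Bus:
    """Toy bus: never busy, and collides with fixed probability p_col."""
    def __init__(self, p_col):
        self.p_col = p_col
    def is_busy(self):
        return False
    def transmit(self, frame):
        return random.random() < self.p_col   # True means a collision occurred

def csma_cd_send(bus, frame, slot_ns=512, max_attempts=16):
    """Sketch of the CSMA/CD sender loop."""
    for attempt in range(1, max_attempts + 1):
        while bus.is_busy():
            pass                              # carrier sense: wait for a free bus
        if not bus.transmit(frame):           # listen while transmitting
            return True                       # no collision: success
        # Collision: binary exponential backoff, wait 0..2^k - 1 slot
        # times before retrying (k capped at 10, as in real Ethernet)
        slots = random.randint(0, 2 ** min(attempt, 10) - 1)
        _ = slots * slot_ns                   # a real NIC would wait this long
    return False                              # give up after too many collisions

print(csma_cd_send(Bus(p_col=0.0), b"data"))  # True: a quiet bus always succeeds
```

With p_col = 1.0 every attempt collides and the sender eventually gives up, mirroring the worst-case throughput collapse described on the next slide.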
Arbitration Schemes • Advantages: • Completely distributed arbitration; little coordination between stations needed. • Very good performance under light traffic (few stations transmitting). • Disadvantages: • Performance degrades exponentially with the number of stations transmitting. • If many stations wish to transmit together, there will be many collisions and stations will need to resend data repeatedly. • In the worst case, effective throughput can fall to 0.
Arbitration Schemes • Fixed Priority (Centralized or Distributed Arbitration) • Some devices have higher priority than others. • This priority is fixed.