ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design Part 17: Input/Output Chapter 6 http://www.ecs.umass.edu/ece/ece232/

Anatomy: 5 components of any Computer
• Processor: Control, Datapath
• Memory
• Devices: Input (keyboard, mouse), Output (display, printer), Disk
[Figure: processor and cache connected by a memory-I/O bus to main memory and to I/O controllers for the disks, graphics, and network; the controllers signal the processor with interrupts]

Handling IO
• Users like to connect devices to their computers
  • Keyboard, mouse, printer, …
• External devices may require attention from the processor at unpredictable times
  • CPU doesn’t know when you’re about to hit a key
• IO devices can be very fast or very slow
• Need to have a flexible way to control all devices

I/O Device Examples and Speeds
• I/O Speed: bytes transferred per second (from mouse to display: million-to-1)
Device            Behavior  Partner  Data Rate (Mbit/sec)
Keyboard          Input     Human    0.0001
Mouse             Input     Human    0.0038
Laser Printer     Output    Human    3.2000
Magnetic Disk     Storage   Machine  240-2560
Modem             I or O    Machine  0.016-0.064
Network-LAN       I or O    Machine  100-1000
Graphics Display  Output    Human    800-8000
See Fig. 6.2 in the text

Hardware Solution (875 Chipset)
[Figure: Pentium 4 processor connected by the system bus (800 MHz, 6.4 GB/sec) to the memory controller hub (north bridge, 82875P). The north bridge connects to two channels of DDR 400 main memory DIMMs (3.2 GB/sec each), AGP 8X graphics output (2.1 GB/sec), and CSA 1 Gbit Ethernet (0.266 GB/sec). A 266 MB/sec link joins it to the I/O controller hub (south bridge, 82801EB), which connects Serial ATA disks (150 MB/sec each), Parallel ATA CD/DVD and tape (100 MB/sec each), AC/97 stereo surround sound (1 MB/sec), 10/100 Mbit Ethernet (20 MB/sec), USB 2.0 (60 MB/sec), and the PCI bus (132 MB/sec)]

Disk Device Terminology
[Figure labels: platter, inner track, outer track, sector, arm, head, actuator]
• Several platters, with information recorded magnetically on both surfaces (usually)
• Bits are recorded in tracks, which in turn are divided into sectors (e.g., 512 bytes)
• Actuator moves the head (at the end of the arm, one per surface) over the track (“seek”), selects the surface, waits for the sector to rotate under the head, then reads or writes
• “Cylinder”: all tracks under the heads

Disk Device Performance
• Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead
• Seek Time - depends on the number of tracks the arm moves and the seek speed
  • Average number of tracks the arm moves?
    • Sum all possible seek distances from all possible tracks / total number
    • Assumes seek distances are random
    • Disk industry standard benchmark
• Rotation Time - depends on rotation speed and how far the sector is from the head
  • On average, 1/2 the time of a rotation
  • Example: 7200 Revolutions Per Minute => 120 Rev/sec
  • 1 revolution = 1/120 sec => 8.33 milliseconds
  • 1/2 rotation (revolution) => 4.17 ms
• Transfer Time - depends on the data rate (bandwidth) of the disk (bit density) and the size of the request
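The two averages on this slide can be checked with a few lines of Python. This is a minimal sketch, not a vendor benchmark: the function names are made up for illustration, and the seek average assumes every (start, target) track pair is equally likely, which is the "random seek" assumption stated above.

```python
def average_seek_distance(num_tracks: int) -> float:
    """Average |start - target| over every (start, target) track pair.

    Assumes all pairs are equally likely (the slide's random-seek assumption).
    """
    total = sum(abs(start - target)
                for start in range(num_tracks)
                for target in range(num_tracks))
    return total / (num_tracks * num_tracks)


def average_rotational_latency_ms(rpm: float) -> float:
    """Half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


if __name__ == "__main__":
    # For 100 tracks the average works out to roughly one third of the tracks.
    print(average_seek_distance(100))
    # 7200 RPM: about 4.17 ms
    print(average_rotational_latency_ms(7200))
```

Under this assumption the average seek covers about one third of the tracks, which is why "average seek time" is usually far less than a full-stroke seek.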

Disk Performance Model/Trends
• Capacity
  • + 100%/year (2X/1 yr)
• Transfer rate (BW)
  • + 40%/year (2X/2 yrs)
• Rotation + Seek time
  • – 8%/year (1/2 in 10 yrs)
• MB/$
  • > 100%/yr (2X/<1.5 yr)
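The parenthetical doubling and halving times follow from compound growth. The sketch below assumes a constant annual rate, which the slide does not state explicitly; `years_to_scale` is a hypothetical helper name. The slide's figures are rounded, so the computed values only roughly match them.

```python
import math


def years_to_scale(annual_rate: float, factor: float) -> float:
    """Years for a quantity changing at annual_rate to change by factor.

    annual_rate is a fraction: 0.40 means +40%/year, -0.08 means -8%/year.
    Assumes the rate compounds once per year and stays constant.
    """
    return math.log(factor) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    print(years_to_scale(1.00, 2.0))   # capacity: time to double
    print(years_to_scale(0.40, 2.0))   # transfer rate: time to double
    print(years_to_scale(-0.08, 0.5))  # rotation + seek: time to halve
```

At +40%/year the doubling time comes out at about 2.1 years, and at -8%/year the halving time is about 8.3 years.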

Disk Performance
• Calculate time to read 1 sector (512 B) for UltraStar 72 using advertised performance; sector is on outer track
• Disk latency = average seek time + average rotational delay + transfer time + controller overhead
  = 5.3 ms + 0.5 * 1/(10000 RPM) + 0.5 KB / (50 MB/s) + 0.15 ms
  = 5.3 + 3.0 + 0.01 + 0.15 ms
  = 8.46 ms
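The latency model can be written as a small function to make the unit conversions explicit. This is a sketch under the slide's stated parameters; the function and parameter names are illustrative, and 1 MB is taken as 10^6 bytes. Note that 0.5 KB at 50 MB/s takes about 0.01 ms, so the transfer term is negligible next to seek and rotation.

```python
def disk_latency_ms(seek_ms: float, rpm: float, request_bytes: int,
                    transfer_mb_per_s: float, controller_ms: float) -> float:
    """Seek + average rotational delay + transfer + controller overhead, in ms."""
    rotation_ms = 0.5 * 60_000.0 / rpm                        # half a revolution
    transfer_ms = request_bytes / (transfer_mb_per_s * 1e6) * 1e3
    return seek_ms + rotation_ms + transfer_ms + controller_ms


if __name__ == "__main__":
    # Parameters from the slide: 5.3 ms seek, 10000 RPM, 512 B sector,
    # 50 MB/s transfer rate, 0.15 ms controller overhead.
    print(disk_latency_ms(5.3, 10_000, 512, 50.0, 0.15))  # about 8.46 ms
```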

Instruction Set Architecture for I/O
• Some machines have special input and output instructions
• Alternative model (used by MIPS):
  • Input: like reading a sequence of bytes
  • Output: like writing a sequence of bytes
  • Memory is also a sequence of bytes, so use loads for input, stores for output
• Called “Memory Mapped Input/Output”
• A portion of the address space is dedicated to communication paths to input or output devices (no memory there)
• These addresses are not regular memory; instead, they correspond to registers in I/O devices
[Figure: address space from 0 to 0xFFFFFFFF; the region from 0xFFFF0000 to 0xFFFFFFFF holds device registers such as cmd reg. and data reg.]

Memory Mapped IO
• Make control registers and I/O device data registers appear to be part of the system’s main memory
• Reads and writes to the mapped region of memory are translated by memory controller hardware into accesses of the hardware device
• Makes it easy to support variable numbers/types of devices: just map them onto different regions of memory
• I/O device registers and memory can be accessed as data structures via device pointers
• Most device drivers are now written in C/C++. Memory mapped I/O makes this feasible without any changes to the way a CPU is programmed
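The address decoding described above can be modelled in software. The sketch below is a toy simulation, not real driver code: the `Bus` and `Device` classes are hypothetical, the base address 0xFFFF0000 is borrowed from the address-space figure, and the register offsets (0 for command, 4 for data) are assumptions made for illustration.

```python
IO_BASE = 0xFFFF0000   # assumed start of the device region
CMD_REG = 0            # assumed offset of the command register
DATA_REG = 4           # assumed offset of the data register


class Device:
    """Toy device exposing two registers."""

    def __init__(self):
        self.regs = {CMD_REG: 0, DATA_REG: 0}

    def read(self, offset):
        return self.regs[offset]

    def write(self, offset, value):
        self.regs[offset] = value


class Bus:
    """Routes ordinary loads/stores to RAM or to the device by address."""

    def __init__(self, device):
        self.ram = {}
        self.device = device

    def load(self, addr):
        if addr >= IO_BASE:                       # mapped region: no memory there
            return self.device.read(addr - IO_BASE)
        return self.ram.get(addr, 0)

    def store(self, addr, value):
        if addr >= IO_BASE:
            self.device.write(addr - IO_BASE, value)
        else:
            self.ram[addr] = value


if __name__ == "__main__":
    dev = Device()
    bus = Bus(dev)
    bus.store(0x1000, 7)                  # regular memory write
    bus.store(IO_BASE + DATA_REG, 0x41)   # same operation, lands in the device
    print(bus.load(0x1000), hex(dev.regs[DATA_REG]))
```

The program issues the same load and store operations in both cases; only the address decides whether memory or a device register responds.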

Processor-I/O Speed Mismatch
• A 1 GHz microprocessor can execute 1000 million load or store instructions per second; at 4 bytes each, that is a 4 million KB/s data rate
• I/O device data rates range from 0.01 KB/s to 30,000 KB/s
• Input: device may not be ready to send data as fast as the processor loads it
  • Also, might be waiting for a human to act
• Output: device may not be ready to accept data as fast as the processor stores it
• What to do?

Processor Checks Status before Acting: Polling
• Path to device generally has 2 registers:
  • 1 register says it’s OK to read/write (I/O ready), often called the Control Register
  • 1 register that contains data, often called the Data Register
• Processor reads from the Control Register in a loop, waiting for the device to set the Ready bit in the Control Register to say it’s OK (0 => 1)
• Processor then loads from (input) or writes to (output) the Data Register
• The load from or store into the Data Register resets the Ready bit (1 => 0) of the Control Register
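The polling protocol can be sketched as a simulation. Everything here is hypothetical: `InputDevice` is a toy model in which each read of the Control Register advances simulated time, and the Ready bit is assumed to be bit 0. A real driver would read hardware registers instead.

```python
class InputDevice:
    """Toy input device with a Control register (bit 0 = Ready) and a Data register."""

    def __init__(self, incoming, delay):
        self._incoming = list(incoming)   # values the device will deliver
        self._delay = delay               # polls needed before each value is ready
        self._countdown = delay
        self._ready = False
        self._data = 0

    def read_control(self):
        # Each poll advances simulated time by one step.
        if not self._ready and self._incoming:
            self._countdown -= 1
            if self._countdown <= 0:
                self._data = self._incoming.pop(0)
                self._ready = True        # Ready bit goes 0 => 1
        return 1 if self._ready else 0

    def read_data(self):
        value = self._data
        self._ready = False               # reading data resets Ready (1 => 0)
        self._countdown = self._delay
        return value


def poll_read(device, max_polls=1000):
    """Spin on the Control register until Ready is set, then read the data."""
    for polls in range(1, max_polls + 1):
        if device.read_control() & 1:
            return device.read_data(), polls
    raise TimeoutError("device never became ready")


if __name__ == "__main__":
    dev = InputDevice([65, 66], delay=3)
    print(poll_read(dev))   # (65, 3)
    print(poll_read(dev))   # (66, 3)
```

The poll count returned alongside each value is the wasted work: every poll before the last one found the device not ready.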

Cost of Polling?
• Assume: a 1 GHz processor takes 400 clock cycles for a polling operation (call polling routine, accessing the device, and returning). Determine % of processor time for polling
  • Mouse: polled 30 times/sec so as not to miss user movement
  • Hard disk: transfers data in 16-byte chunks and can transfer at 8 MB/second. No transfer can be missed
• Mouse Polling Clocks/sec = 30 * 400 = 12,000 clocks/sec
• % Processor for polling = 12*10^3 / (1*10^9) = 0.0012% => Polling mouse has little impact on processor
• Times Polling Disk/sec = 8 MB/s / 16 B = 500K polls/sec
• Disk Polling Clocks/sec = 500K * 400 = 200,000,000 clocks/sec
• % Processor for polling = 2*10^8 / (1*10^9) = 20% => Unacceptable
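The same arithmetic as a short function, using the slide's numbers. The function name is illustrative, and 1 MB is taken as 10^6 bytes.

```python
def polling_fraction(clock_hz: float, cycles_per_poll: float,
                     polls_per_sec: float) -> float:
    """Fraction of processor cycles consumed by polling."""
    return polls_per_sec * cycles_per_poll / clock_hz


if __name__ == "__main__":
    clock_hz = 1e9          # 1 GHz
    cycles_per_poll = 400

    mouse = polling_fraction(clock_hz, cycles_per_poll, 30)
    disk_polls_per_sec = 8e6 / 16     # 8 MB/s in 16-byte chunks
    disk = polling_fraction(clock_hz, cycles_per_poll, disk_polls_per_sec)

    print(f"mouse: {mouse * 100:.4f}%")   # mouse: 0.0012%
    print(f"disk:  {disk * 100:.0f}%")    # disk:  20%
```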

What is the alternative to polling? Interrupt
• Wasteful to have the processor spend most of its time “spin-waiting” for I/O to be ready
• Wish we could have an unplanned procedure call that would be invoked only when the I/O device is ready
• Solution: use the exception mechanism to help I/O. Interrupt the program when I/O is ready, return when done with the data transfer
• Analogy: polling is like picking up the phone every few seconds to see if you have a call. An interrupt is like letting the phone ring

I/O Interrupt
• Controller sends an interrupt to the processor along with additional information
  • which device
  • nature of interrupt: error, no paper, no ink, …
• Processor halts execution of the current program
  • Saves state
• Processor looks up which handler to start from the interrupt information
• When the interrupt is handled, the processor restores the program state and resumes

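Interrupt Driven Data Transfer
[Figure: memory holds the user program (add, sub, and, or, ...) and the interrupt service routine (read, store, ..., jr). Steps: (1) an I/O interrupt arrives during the user program; (2) save PC; (3) jump to the interrupt service address; (4) the interrupt service routine runs; (5) control returns to the user program]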
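The five steps of an interrupt-driven transfer can be traced with a toy interpreter. This is a sketch only: the `run` function, the `arrivals` table (which instruction an interrupt arrives during), and the handler table are all hypothetical, and real hardware saves more state than the PC.

```python
def run(program, handlers, arrivals):
    """Execute program, servicing interrupts between instructions.

    program  : list of instruction names
    handlers : device id -> function(info) returning the routine's instructions
    arrivals : instruction index -> (device id, info) arriving during it
    Returns the full trace of executed instructions.
    """
    trace = []
    pc = 0
    while pc < len(program):
        trace.append(program[pc])
        pending = arrivals.get(pc)         # (1) interrupt arrives during this instruction
        pc += 1
        if pending is not None:
            device_id, info = pending
            saved_pc = pc                  # (2) save PC
            handler = handlers[device_id]  # (3) look up the service routine
            trace.extend(handler(info))    # (4) run the service routine
            pc = saved_pc                  # (5) return to the user program
    return trace


if __name__ == "__main__":
    user_program = ["add", "sub", "and", "or"]
    handlers = {"disk": lambda info: ["read", "store", "jr"]}
    arrivals = {1: ("disk", "block ready")}
    print(run(user_program, handlers, arrivals))
    # ['add', 'sub', 'read', 'store', 'jr', 'and', 'or']
```

The user program never checks the device; it simply resumes at the saved PC once the routine finishes.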

Benefit of Interrupt-Driven I/O
• Assume 500 clock cycles of overhead for each transfer, including the interrupt. Find the % of processor consumed if the hard disk is only active 5% of the time
• If interrupt rate = polling rate:
  • Disk Interrupts/sec = 8 MB/s / 16 B = 500K interrupts/sec
  • Disk Interrupt Clocks/sec = 500K * 500 = 250,000,000 clocks/sec
  • % Processor used during transfers: 250*10^6 / (1*10^9) = 25%
• If disk is active 5% of the time => 5% * 25% = 1.25% busy
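The interrupt-driven cost as a function, assuming the same 1 GHz processor as in the polling example. The names are illustrative, and 1 MB is taken as 10^6 bytes.

```python
def interrupt_fraction(clock_hz: float, cycles_per_transfer: float,
                       bytes_per_sec: float, bytes_per_transfer: float,
                       active_fraction: float) -> float:
    """Fraction of processor cycles spent servicing interrupts."""
    interrupts_per_sec = bytes_per_sec / bytes_per_transfer
    busy_while_active = interrupts_per_sec * cycles_per_transfer / clock_hz
    return busy_while_active * active_fraction


if __name__ == "__main__":
    during_transfer = interrupt_fraction(1e9, 500, 8e6, 16, 1.0)
    overall = interrupt_fraction(1e9, 500, 8e6, 16, 0.05)
    print(f"during transfers: {during_transfer * 100:.0f}%")  # 25%
    print(f"overall:          {overall * 100:.2f}%")          # 1.25%
```

Each interrupt costs more cycles than a poll, but the processor pays only while the disk is actually transferring.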

Interrupts – Multiple devices
• Aggregates interrupts
• Prioritization (network, keyboard, ...)
[Figure: Device 1, Device 2, ..., Device i, ..., Device n feed an Advanced Programmable Interrupt Controller (APIC), which connects to the processor]
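Aggregation and prioritization can be sketched with a priority queue. This is a toy model and not how a real APIC is programmed: the class name, the priority table, and the rule that a lower number is more urgent are all assumptions for illustration.

```python
import heapq


class PriorityInterruptController:
    """Collects interrupt requests and hands the most urgent one to the processor."""

    def __init__(self, priorities):
        self.priorities = priorities   # device name -> int, lower = more urgent (assumed)
        self._queue = []
        self._seq = 0                  # tie-breaker: first come, first served

    def request(self, device):
        heapq.heappush(self._queue, (self.priorities[device], self._seq, device))
        self._seq += 1

    def next_interrupt(self):
        """Return the highest-priority pending device, or None if idle."""
        if not self._queue:
            return None
        return heapq.heappop(self._queue)[2]


if __name__ == "__main__":
    pic = PriorityInterruptController({"network": 0, "disk": 1, "keyboard": 2})
    for dev in ["keyboard", "network", "keyboard", "disk"]:
        pic.request(dev)
    order = []
    while (dev := pic.next_interrupt()) is not None:
        order.append(dev)
    print(order)   # ['network', 'disk', 'keyboard', 'keyboard']
```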

Interrupt vs. Polling
• Which is better: interrupts or polling?
  • Interrupts are better if the processor has something else to do and the time-to-response is not critical
  • Polling is better if the processor has to respond to an event ASAP
• Polling is also used when data is expected at regular intervals, such as in a modem
  • Modem typically connects to a “com” port
  • The “com” port can be polled at expected intervals

Direct Memory Access (DMA)
• How to transfer large amounts of data between a device and memory? It is a waste of CPU cycles if done through the CPU
• Let the device controller transfer data directly to and from memory => DMA
• The CPU sets up the DMA transfer by supplying the type of operation, memory address, and number of bytes to be transferred
• The DMA controller accesses the bus directly, provides the memory address, and transfers the data
• Once the DMA transfer is complete, the controller interrupts the CPU to signal completion
• Cycle Stealing: the bus gives priority to the DMA controller, thus stealing cycles from the CPU
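The setup, transfer, and completion-interrupt sequence can be sketched as a simulation. The `DMAController` class, its method names, and the one-byte-per-bus-cycle model are assumptions for illustration; a real controller is programmed through device registers and moves wider words.

```python
class DMAController:
    """Toy DMA controller that copies between a device buffer and memory."""

    def __init__(self, memory, on_complete):
        self.memory = memory            # bytearray standing in for main memory
        self.on_complete = on_complete  # called like an interrupt when done
        self.bus_cycles = 0             # cycles "stolen" from the CPU

    def setup(self, operation, mem_addr, num_bytes, device_buffer):
        """CPU supplies operation type, memory address, and byte count."""
        if operation not in ("read", "write"):
            raise ValueError("operation must be 'read' or 'write'")
        self.operation = operation
        self.mem_addr = mem_addr
        self.num_bytes = num_bytes
        self.device_buffer = device_buffer

    def run(self):
        """Transfer the data without CPU involvement, then signal completion."""
        for i in range(self.num_bytes):
            if self.operation == "read":    # device -> memory
                self.memory[self.mem_addr + i] = self.device_buffer[i]
            else:                           # memory -> device
                self.device_buffer[i] = self.memory[self.mem_addr + i]
            self.bus_cycles += 1            # assumed: one byte per bus cycle
        self.on_complete(self.num_bytes)    # completion interrupt


if __name__ == "__main__":
    memory = bytearray(16)
    completed = []
    dma = DMAController(memory, completed.append)
    dma.setup("read", mem_addr=4, num_bytes=3, device_buffer=bytearray(b"abc"))
    dma.run()
    print(bytes(memory[4:7]), completed, dma.bus_cycles)
```

The CPU's only work is the `setup` call and handling the single completion callback, instead of one load and one store per byte.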

OS control of I/O operations
• Low-level control of an I/O device is complex because it requires managing a set of concurrent events and because requirements for correct device control are often very detailed
• I/O systems often use interrupts to communicate information about I/O operations, and these can occur at random times
• The I/O system is shared by multiple programs using the processor
• Would like I/O services for all user programs under safe control
