Chapter 8

makoto

Presentation Transcript


  1. Chapter 8 Interfacing Processors and Peripherals

  2. Performance Analysis of Synchronous vs. Asynchronous • Compare the maximum bandwidth for a synchronous and an asynchronous bus: • synchronous bus: clock cycle=50ns, each bus transmission takes 1 clock cycle. • asynchronous bus: 40ns per handshake • bus width = 32 bits • Find the bandwidth for each bus when performing one-word reads from a 200ns memory.

  3. Synchronous Bus • Send the address to memory: 50ns • Read the memory: 200ns • Send the data to the device: 50ns • Total time= 300ns • Bandwidth = 4bytes/300ns = 13.3 MB/sec

  4. Asynchronous Bus • Step 1: 40ns • Step 2,3,4: max(3x40ns, 200ns) = 200ns(steps 2,3,4 can be overlapped with memory access) • Step 5,6,7 3x40ns =120ns • Total=360ns • Bandwidth = 4bytes/360ns = 11.1 MB/sec
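
The two one-word-read calculations above can be reproduced with a short sketch. The function and constant names are illustrative and not part of the slides; MB is taken as 10^6 bytes.

```python
# Sketch of the one-word-read comparison (names are hypothetical).
WORD_BYTES = 4   # 32-bit bus, one word per read
MEM_NS = 200     # memory access time from the slides


def sync_read_ns(clock_ns=50, mem_ns=MEM_NS):
    # send address (1 cycle) + memory read + send data (1 cycle)
    return clock_ns + mem_ns + clock_ns


def async_read_ns(handshake_ns=40, mem_ns=MEM_NS):
    # step 1, then steps 2-4 overlapped with the memory access, then steps 5-7
    return handshake_ns + max(3 * handshake_ns, mem_ns) + 3 * handshake_ns


def bandwidth_mb_per_s(bytes_moved, time_ns):
    # bytes/ns is GB/s (decimal), so scale by 1000 to get MB/s
    return bytes_moved / time_ns * 1000


sync_bw = bandwidth_mb_per_s(WORD_BYTES, sync_read_ns())
async_bw = bandwidth_mb_per_s(WORD_BYTES, async_read_ns())
print(f"synchronous:  {sync_read_ns()} ns, {sync_bw:.1f} MB/s")
print(f"asynchronous: {async_read_ns()} ns, {async_bw:.1f} MB/s")
```

Changing `mem_ns` shows when the asynchronous bus wins: once the memory is faster than three handshakes, the `max` term is set by the handshakes rather than by the memory.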

  5. Performance Analysis of Two Bus Schemes • Given a system with • a memory and bus system supporting block access of 4 to 16 words • a 64-bit synchronous bus clocked at 200MHz, with each 64-bit transfer taking 1 clock cycle, and 1 clock cycle to send an address to memory • two clock cycles needed between each bus operation • memory access for first 4 words takes 200ns, each additional set of 4 words requires 20ns

  6. Question • Find the sustained bandwidth and latency for a read of 256 words for transfers using 4-word blocks and 16-word blocks. • Find the effective number of bus transactions per second for each case.

  7. 4-Word Block Transfer • 1 clock cycle to send address to memory • 200ns/(5ns/cycle) = 40 cycles to read memory • 2 cycles to send data from memory • 2 idle cycles • Total = 45 cycles per transaction • 256 words require 256/4 = 64 transactions, so 45x64 = 2880 cycles

  8. 4-Word Block Transfer • Latency = 2880 cycles x 5ns/cycle = 14400 ns • Bus transactions per second = 64 transactions/14400ns = 4.44M transactions/s • Bandwidth = (256x4 bytes)/14400ns = 71.11 MB/s

  9. 16-Word Block Transfer • 1 clock cycle to send address to memory • 40 cycles to read first 4 words from memory • 2 cycles to send data, during which the read of the next 4 words is started. • 2 idle cycles between transfers, during which the read of the next 4 words is completed. • The last two steps are performed 4 times in total (once, then repeated 3 more times) to read all 16 words.

  10. 16-Word Block Transfer • Total cycles required = 1 + 40 + 4x(2+2) = 57 cycles • 256/16 = 16 transactions are required • Total number of cycles required for 256 words = 16x57 = 912 cycles, latency = 4560 ns • Bus transactions per second = 16 transactions/4560ns = 3.51M transactions/s • Bandwidth = (256x4 bytes)/4560ns = 224.56 MB/s
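
The 4-word and 16-word cases follow the same pattern, so a single parameterized sketch can check both. The names are illustrative. The sketch assumes, as the slides do, that the 20 ns read of each additional 4 words is fully hidden behind the 2 transfer cycles and 2 idle cycles, and that a word is 4 bytes.

```python
# Sketch of the block-transfer analysis (names are hypothetical).
CLOCK_NS = 5         # 200 MHz bus clock
WORD_BYTES = 4       # word size used in the slides' bandwidth figures
TOTAL_WORDS = 256


def cycles_per_transaction(block_words):
    address = 1
    first_access = 200 // CLOCK_NS   # 40 cycles for the first 4 words
    groups = block_words // 4        # number of 4-word groups in one block
    # each group costs 2 transfer cycles + 2 idle cycles;
    # later memory reads are assumed to overlap with these 4 cycles
    return address + first_access + groups * (2 + 2)


def summarize(block_words):
    transactions = TOTAL_WORDS // block_words
    cycles = transactions * cycles_per_transaction(block_words)
    latency_ns = cycles * CLOCK_NS
    return {
        "cycles": cycles,
        "latency_ns": latency_ns,
        "transactions_per_s": transactions / (latency_ns * 1e-9),
        "bandwidth_mb_per_s": TOTAL_WORDS * WORD_BYTES / latency_ns * 1000,
    }


for block in (4, 16):
    print(block, summarize(block))
```

Most of the gain from the larger block comes from paying the 40-cycle first access 16 times instead of 64 times.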

  11. Bus Standards • PCI (a general-purpose backplane bus) • SCSI (Small Computer System Interface) • IEEE 1394 (FireWire) • USB 2.0

  12. Interfacing I/O Devices • How is a user I/O request transformed into a device command and communicated to the device? • How is data actually transferred to or from a memory location? • What is the role of the operating system?

  13. Role of the OS • The OS plays a major role in handling I/O, in that: • the I/O system is shared by multiple programs using the processor • the I/O system often uses interrupts (which cause a transfer to supervisor mode) • low-level control of I/O is complex

  14. Communications between OS and I/O Devices • The OS must be able to give commands to I/O devices. • An I/O device must be able to notify the OS when an operation has completed or an error has occurred. • Data must be transferred between memory and an I/O device.

  15. Giving Commands to I/O • To give a command, the processor must be able to address the device and to supply command words: • memory-mapped I/O: portions of the address space are assigned to I/O devices • special I/O: dedicated I/O instructions in the processor.

  16. Communicating with the Processor • Polling: processor periodically checks the status of I/O. • Overhead of polling in an I/O system • Example 1: mouse • Example 2: floppy disk • Example 3: hard disk

  17. Mouse • Assume the number of clock cycles for a polling operation, including transferring to the polling routine, accessing the device, and restarting the user program, is 400, with a 500 MHz clock. • The mouse must be polled 30 times a second to ensure that no user movement is missed. • Fraction of CPU time = 30x400/(500x10^6) = 0.0024%, or about 0.002%

  18. Floppy Disk • The floppy disk transfers data to the processor in 16-bit units and has a data rate of 50KB/s. • Polling rate = (50KB/s)/(2 bytes/poll) = 25K polls/s • Fraction of CPU time = 25Kx400/(500x10^6) = 2%

  19. Hard Disk • Transfer in 4-word blocks • transfer rate: 4MB/s • Polling rate = (4MB/s)/(4x4 bytes/poll) = 250K polls/s • Fraction of CPU time = 250Kx400/(500x10^6) = 20%
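
The three polling examples share one formula: polls per second times cycles per poll, divided by the clock rate. A minimal sketch, with illustrative names and decimal units (1 KB = 10^3 bytes, 1 MB = 10^6 bytes) assumed:

```python
# Sketch of the polling-overhead examples (names are hypothetical).
CLOCK_HZ = 500e6     # 500 MHz processor clock
POLL_CYCLES = 400    # cycles consumed by one polling operation


def polling_fraction(polls_per_s):
    # fraction of all processor cycles spent polling
    return polls_per_s * POLL_CYCLES / CLOCK_HZ


mouse = polling_fraction(30)            # 30 polls/s
floppy = polling_fraction(50e3 / 2)     # 50 KB/s in 2-byte units
disk = polling_fraction(4e6 / (4 * 4))  # 4 MB/s in 4-word (16-byte) blocks

for name, frac in (("mouse", mouse), ("floppy", floppy), ("hard disk", disk)):
    print(f"{name}: {frac:.4%}")
```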

  20. Overhead of Polling • Can do the polling only when the device is active, thus reducing the overhead. • However, the overhead is still significant, resulting in another design called interrupt-driven I/O.

  21. Overhead of Interrupt-Driven I/O • Assume the overhead for each transfer, including the interrupt, is 500 cycles. • Cycles per second for disk = 250Kx500 = 125x10^6 cycles/s • Fraction of processor consumed = 125x10^6/(500x10^6) = 25% • Assuming the disk is transferring data 5% of the time, fraction of CPU on average = 25%x5% = 1.25%
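
The interrupt-driven figure differs from polling only in the per-transfer cost and in being paid only while the device is active. A short sketch under the slide's numbers; the function name and the `active` parameter are illustrative:

```python
# Sketch of the interrupt-driven overhead (names are hypothetical).
def interrupt_fraction(transfers_per_s, cycles_per_transfer=500,
                       clock_hz=500e6, active=1.0):
    # 'active' is the fraction of time the device is actually transferring
    return transfers_per_s * cycles_per_transfer / clock_hz * active


busy = interrupt_fraction(250e3)                  # disk transferring constantly
average = interrupt_fraction(250e3, active=0.05)  # disk active 5% of the time
print(f"while transferring: {busy:.2%}, on average: {average:.2%}")
```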

  22. Direct Memory Access (DMA) • If the disk is transferring data most of the time, the overhead for interrupt-driven I/O is still high. • For a high-bandwidth device, let the device controller transfer data directly to or from memory without involving the processor; this is known as direct memory access. • An interrupt is used to signal completion of the I/O transfer or an error.

  23. Overhead of I/O Using DMA • Assume initial setup of a DMA transfer takes 1000 cycles, handling of the interrupt at DMA completion takes 500 cycles, and the average transfer from disk is 8KB • Each DMA transfer takes 8KB/(4MB/s) = 2x10^-3s • If the disk is constantly transferring data, it requires: (1000+500)/(2x10^-3) = 750x10^3 cycles/s • Fraction of CPU time = 750x10^3/(500x10^6) = 0.15%
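
The DMA estimate can be checked the same way. This sketch uses illustrative names and assumes decimal units (8 KB = 8x10^3 bytes, 4 MB/s = 4x10^6 bytes/s), which is what the slide's 2x10^-3 s figure implies.

```python
# Sketch of the DMA overhead calculation (names are hypothetical).
CLOCK_HZ = 500e6
SETUP_CYCLES = 1000        # processor work to start one DMA transfer
COMPLETION_CYCLES = 500    # interrupt handling when the transfer finishes
TRANSFER_BYTES = 8e3       # average transfer size, decimal KB assumed
DISK_BYTES_PER_S = 4e6     # disk transfer rate

seconds_per_transfer = TRANSFER_BYTES / DISK_BYTES_PER_S
cycles_per_s = (SETUP_CYCLES + COMPLETION_CYCLES) / seconds_per_transfer
dma_fraction = cycles_per_s / CLOCK_HZ

print(f"{seconds_per_transfer * 1e3:.1f} ms per transfer")
print(f"{cycles_per_s:.0f} cycles/s, {dma_fraction:.2%} of the CPU")
```

The processor cost is now paid once per 8 KB transfer rather than once per 16 bytes, which is where the drop from 25% to 0.15% comes from.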

  24. I/O System Design • Latency constraints: ensuring the latency to complete an I/O operation is bounded. • Bandwidth constraints • Performance analysis techniques: queuing theory, simulation, analysis

  25. I/O System Design - Example • CPU: 300 MIPS, average 5000 instructions in the OS per I/O operation • backplane bus transfer rate: 100 MB/s • SCSI-2 controller with transfer rate = 20 MB/s, accommodating up to 7 disks • Disk bandwidth = 5MB/s, seek+rotational latency = 10ms • Workload: 64-KB reads, user program needs 100000 instructions per I/O

  26. Example • Find • the maximum sustainable I/O rate • the number of disks and SCSI controllers required.
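
The slides stop at the question. One possible way to work it is sketched below; it is not taken from the slides. It assumes decimal units (64 KB = 64x10^3 bytes), that the CPU and the backplane bus are the only candidates for the system bottleneck, that every I/O pays the full 10 ms seek and rotational latency, and that disk accesses are spread evenly. All names are illustrative.

```python
# Sketch of one approach to the design example (assumptions in the lead-in).
import math

CPU_IPS = 300e6                  # 300 MIPS
INSTR_PER_IO = 100_000 + 5_000   # user + OS instructions per I/O
BUS_BYTES_PER_S = 100e6
SCSI_BYTES_PER_S = 20e6
MAX_DISKS_PER_CONTROLLER = 7
DISK_BYTES_PER_S = 5e6
DISK_LATENCY_S = 10e-3           # seek + rotational latency
IO_BYTES = 64e3                  # assumption: decimal KB

# The sustainable rate is limited by the slower of the CPU and the bus.
cpu_rate = CPU_IPS / INSTR_PER_IO
bus_rate = BUS_BYTES_PER_S / IO_BYTES
max_io_rate = min(cpu_rate, bus_rate)

# Each disk I/O costs the fixed latency plus the transfer time.
time_per_io = DISK_LATENCY_S + IO_BYTES / DISK_BYTES_PER_S
disks = math.ceil(max_io_rate * time_per_io)

# Check how many disks one controller can carry without saturating.
per_disk_bytes_per_s = IO_BYTES / time_per_io
disks_per_controller = min(MAX_DISKS_PER_CONTROLLER,
                           math.floor(SCSI_BYTES_PER_S / per_disk_bytes_per_s))
controllers = math.ceil(disks / disks_per_controller)

print(f"CPU limit: {cpu_rate:.0f} I/Os per s, bus limit: {bus_rate:.0f} I/Os per s")
print(f"disks: {disks}, controllers: {controllers}")
```

Under these assumptions the bus, not the CPU, sets the maximum I/O rate, and the disk count follows from how many disks are needed to keep the bus busy.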