1 / 45

14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 12 Buses and I/O system

14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 12 Buses and I/O system. [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]. Head’s Up. This week’s material Buses: Connecting I/O devices Reading assignment – PH 8.4

feleti
Download Presentation

14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 12 Buses and I/O system

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 14:332:331Computer Architecture and Assembly LanguageFall 2003Week 12Buses and I/O system [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

  2. Head’s Up • This week’s material • Buses: Connecting I/O devices • Reading assignment – PH 8.4 • Memory hierarchies • Reading assignment – PH 7.1 and B.5 • Reminders • Next week’s material • Basics of caches • Reading assignment – PH 7.2

  3. Review: Major Components of a Computer Processor Devices Control Output Memory Datapath Input Cache Main Memory Secondary Memory (Disk)

  4. Input and Output Devices • I/O devices are incredibly diverse wrt • Behavior • Partner • Data rate

  5. Magnetic Disk • Purpose • Long term, nonvolatile storage • Lowest level in the memory hierarchy • slow, large, inexpensive • General structure • A rotating platter coated with a magnetic surface • Use a moveable read/write head to access the disk • Advantages of hard disks over floppy disks • Platters are more rigid (metal or glass) so they can be larger • Higher density because it can be controlled more precisely • Higher data rate because it spins faster • Can incorporate more than one platter

  6. Organization of a Magnetic Disk • Typical numbers (depending on the disk size) • 1 to 15 (2 surface) platters per disk with 1” to 8” diameter • 1,000 to 5,000 tracks per surface • 63 to 256 sectors per track • the smallest unit that can be read/written (typically 512 to 1,024 B) • Traditionally all tracks have the same number of sectors • Newer disks with smart controllers can record more sectors on the outer tracks (constant bit density) Sector Platters Track

  7. Track Sector Cylinder Platter Head Magnetic Disk Characteristic • Cylinder: all the tracks under the heads at a given point on all surfaces • Read/write data is a three-stage process: • Seek time: position the arm over the proper track (6 to 14 ms avg.) • due to locality of disk references the actual average seek time may be only 25% to 33% of the advertised number • Rotational latency: wait for the desired sectorto rotate under the read/write head (½ of 1/RPM) • Transfer time: transfer a block of bits (sector)under the read-write head (2 to 20 MB/sec typical) • Controller time: the overhead the disk controller imposes in performing an disk I/O access (typically < 2 ms)

  8. Magnetic Disk Examples

  9. bus I/O System Interconnect Issues • A bus is a shared communication link (a set of wires used to connect multiple subsystems) • Performance • Expandability • Resilience in the face of failure – fault tolerance Processor Receiver Main Memory Keyboard

  10. Performance Measures • Latency (execution time, response time) is the total time from the start to finish of one instruction or action • usually used to measure processor performance • Throughput – total amount of work done in a given amount of time • aka execution bandwidth • the number of operations performed per second • Bandwidth – amount of information communicated across an interconnect (e.g., a bus) per unit time • the bit width of the operation * rate of the operation • usually used to measure I/O performance

  11. I/O System Expandability • Usually have more than one I/O device in the system • each I/O device is controlled by an I/O Controller interrupt signals Processor Cache Memory Memory - I/O Bus I/O Controller I/O Controller I/O Controller Main Memory Terminal Disk Disk Network

  12. Bus Characteristics • Control lines • Signal requests and acknowledgments • Indicate what type of information is on the data lines • Data lines • Data, complex commands, and addresses • Bus transaction consists of • Sending the address • Receiving (or sending) the data Control Lines Data Lines

  13. Step 1: Processor sends read request and read address to memory Control Main Memory Processor Data Step 2: Memory accesses data Control Main Memory Processor Data Step 3: Memory transfers data to disk Control Main Memory Processor Data Output (Read) Bus Transaction • Defined by what they do to memory • read = output: transfers data from memory (read) to I/O device (write)

  14. Step 1: Processor sends write request and write address to memory Control Main Memory Processor Data Step 2: Disk transfers data to memory Control Main Memory Processor Data Input (Write) Bus Transaction • Defined by what they do to memory • write = input: transfers data from I/O device (read) to memory (write)

  15. Advantages and Disadvantages of Buses • Advantages • Versatility: • New devices can be added easily • Peripherals can be moved between computer systems that use the same bus standard • Low Cost: • A single set of wires is shared in multiple ways • Disadvantages • It creates a communication bottleneck • The bus bandwidth limits the maximum I/O throughput • The maximum bus speed is largely limited by • The length of the bus • The number of devices on the bus • It needs to support a range of devices with widely varying latencies and data transfer rates

  16. Types of Buses • Processor-Memory Bus (proprietary) • Short and high speed • Matched to the memory system to maximize the memory-processor bandwidth • Optimized for cache block transfers • I/O Bus (industry standard, e.g., SCSI, USB, ISA, IDE) • Usually is lengthy and slower • Needs to accommodate a wide range of I/O devices • Connects to the processor-memory bus or backplane bus • Backplane Bus (industry standard, e.g., PCI) • The backplane is an interconnection structure within the chassis • Used as an intermediary bus connecting I/O busses to the processor-memory bus

  17. Bus Adaptor Bus Adaptor Bus Adaptor I/O Bus I/O Bus I/O Bus A Two Bus System • I/O buses tap into the processor-memory bus via Bus Adaptors (that do speed matching between buses) • Processor-memory bus: mainly for processor-memory traffic • I/O busses: provide expansion slots for I/O devices Processor-Memory Bus Processor Memory

  18. Bus Adaptor Bus Adaptor I/O Bus Backplane Bus I/O Bus Bus Adaptor A Three Bus System • A small number of Backplane Buses tap into the Processor-Memory Bus • Processor-Memory Bus is used for processor memory traffic • I/O buses are connected to the Backplane Bus • Advantage: loading on the Processor-Memory Bus is greatly reduced Processor-Memory Bus Processor Memory

  19. I/O System Example (Apple Mac 7200) • Typical of midrange to high-end desktop system in 1997 Processor Processor-Memory Bus Cache Memory Audio I/O Serial ports PCI Interface/ Memory Controller Main Memory I/O Controller I/O Controller PCI CDRom I/O Controller I/O Controller SCSI bus Disk Graphic Terminal Network Tape

  20. Example: Pentium System Organization Processor-Memory Bus Memory controller (“Northbridge”) PCI Bus I/O Busses http://developer.intel.com/design/chipsets/850/animate.htm?iid=PCG+devside&

  21. Control: Master initiates requests Bus Master Bus Slave Data can go either way A Bus Transaction • A bus transaction includes three parts: • Gaining access to the bus - arbitration • Issuing the command (and address) - request • Transferring the data - action • Gaining access to the bus • How is the bus reserved by a devices that wishes to use it? • Chaos is avoided by a master-slave arrangement • The bus master initiates and controls all bus requests • In the simplest system: • The processor is the only bus master • Major drawback - the processor must be involved in every bus transaction

  22. Step 1: Disk wants to use the bus so it generates a bus request to processor Control Memory Processor Data Step 2: Processor responds and generates appropriate control signals Control Memory Processor Data Step 3: Processor gives slave (disk) permission to use the bus Control Memory Processor Data Single Master Bus Transaction • All bus requests are controlled by the processor • it initiates the bus cycle on behalf of the requesting device

  23. Multiple Potential Bus Masters: Arbitration • Bus arbitration scheme: • A bus master wanting to use the bus asserts the bus request • A bus master cannot use the bus until its request is granted • A bus master must release the bus after its use • Bus arbitration schemes usually try to balance two factors: • Bus priority - the highest priority device should be serviced first • Fairness - Even the lowest priority device should never be completely locked out from using the bus • Bus arbitration schemes can be divided into four broad classes • Daisy chain arbitration: all devices share 1 request line • Centralized, parallel arbitration: multiple request and grant lines • Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus • Distributed arbitration by collision detection: Ethernet uses this

  24. Grant1 Req Grant2 Req Bus Arbiter GrantN Req Centralized Parallel Arbitration • Used in essentially all backplane and high-speed I/O busses Device 1 Device 2 Device N Control Data

  25. Synchronous and Asynchronous Buses • Synchronous Bus • Includes a clock in the control lines • A fixed protocol for communication that is relative to the clock • Advantage: involves very little logic and can run very fast • Disadvantages: • Every device on the bus must run at the same clock rate • To avoid clock skew, they cannot be long if they are fast • Asynchronous Bus • It is not clocked, so requires handshaking protocol (req, ack) • Implemented with additional control lines • Advantages: • Can accommodate a wide range of devices • Can be lengthened without worrying about clock skew or synchronization problems • Disadvantage: slow(er)

  26. ReadReq 1 2 addr data Data 3 4 Ack 6 5 7 DataRdy Asynchronous Handshaking Protocol • Output (read) data from memory to an I/O device. • Memory sees ReadReq, reads addr from data lines, and raises Ack • I/O device sees Ack and releases the ReadReq and data lines • Memory sees ReadReq go low and drops Ack • When memory has data ready, it places it on data lines and raises DataRdy • I/O device sees DataRdy, reads the data from data lines, and raises Ack • Memory sees Ack, releases the data lines, and drops DataRdy • I/O device sees DataRdy go low and drops Ack I/O device signals a request by raising ReadReq and putting the addr on the data lines

  27. Key Characteristics of Two Bus Standards

  28. Review: Major Components of a Computer Processor Devices Control Input Memory Datapath Output

  29. A Typical Memory Hierarchy • By taking advantage of the principle of locality: • Present the user with as much memory as is available in the cheapest technology. • Provide access at the speed offered by the fastest technology. On-Chip Components Control eDRAM Secondary Memory (Disk) Instr Cache Second Level Cache (SRAM) ITLB Main Memory (DRAM) Datapath Data Cache RegFile DTLB Speed (ns): .1’s 1’s 10’s 100’s 1,000’s Size (bytes): 100’s K’s 10K’s M’s T’s Cost: highest lowest

  30. Inclusive– what is in L1$ is a subset of what is in L2$ is a subset of what is in MM that is a subset of is in SM 4-8 bytes (word) 8-32 bytes (block) 1 block 1,023+ bytes (disk sector = page) Characteristics of the Memory Hierarchy Processor Increasing distance from the processor in access time L1$ L2$ Main Memory Secondary Memory (Relative) size of the memory at each level

  31. Memory Hierarchy Technologies • Random Access • “Random” is good: access time is the same for all locations • DRAM: Dynamic Random Access Memory • High density (1 transistor cells), low power, cheap, slow • Dynamic: need to be “refreshed” regularly (~ every 8 ms) • SRAM: Static Random Access Memory • Low density (6 transistor cells), high power, expensive, fast • Static: content will last “forever” (until power turned off) • Size: DRAM/SRAM ­ 4 to 8 • Cost/Cycle time: SRAM/DRAM ­ 8 to 16 • “Non-so-random” Access Technology • Access time varies from location to location and from time to time (e.g., Disk, CDROM)

  32. bit (data) lines Each intersection represents a 6-T SRAM cell word (row) select Classical SRAM Organization (~Square) r o w d e c o d e r RAM Cell Array Column Selector & I/O Circuits column address row address One memory row holds a block of data, so the column address selects the requested word from that block data word

  33. RAM Cell Array Classical DRAM Organization (~Square Planes) bit (data) lines The column address selects the requested bit from the row in each plane . . . r o w d e c o d e r Each intersection represents a 1-T DRAM cell word (row) select column address Column Selector & I/O Circuits row address . . . data bit data bit data bit data word

  34. RAM Memory Definitions • Caches use SRAM for speed • Main Memory is DRAM for density • Addresses divided into 2 halves (row and column) • RASor Row Access Strobe triggering row decoder • CAS or Column Access Strobe triggering column selector • Performance of Main Memory DRAMs • Latency: Time to access one word • Access Time: time between request and when word arrives • Cycle Time: time between requests • Usually cycle time > access time • Bandwidth: How much data can be supplied per unit time • width of the data channel * the rate at which it can be used

  35. N cols RAS Classical DRAM Operation Column Address • DRAM Organization: • N rows x N column x M-bit • Read or Write M-bit at a time • Each M-bit access requiresa RAS / CAS cycle DRAM Row Address N rows M bits M-bit Output Cycle Time 1st M-bit Access 2nd M-bit Access CAS Row Address Col Address Row Address Col Address

  36. Ways to Improve DRAM Performance • Memory interleaving • Fast Page Mode DRAMs – FPM DRAMs • www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.htm • Extended Data Out DRAMs – EDO DRAMs • www.chips.ibm.com/products/memory/88H2011/88H2011.pdf • Synchronous DRAMS – SDRAMS • www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.htm • Rambus DRAMS • www.rambus.com/developer/quickfind_documents.html • www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B.htm • Double Data Rate DRAMs – DDR DRAMS • www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323HA.htm • . . .

  37. Memory Bank 0 Memory Bank 1 CPU Memory Bank 2 Memory Bank 3 Access Bank 1 Access Bank 0 Access Bank 2 Access Bank 3 We can Access Bank 0 again Increasing Bandwidth - Interleaving Access pattern without Interleaving: Cycle Time CPU Memory Access Time D1 available Start Access for D1 D2 available Start Access for D2 Access pattern with 4-way Interleaving:

  38. Problems with Interleaving • How many banks? • Ideally, the number of banks  number of clocks we have to wait to access the next word in the bank • Only works for sequential accesses (i.e., first word requested in first bank, second word requested in second bank, etc.) • Increasing DRAM sizes => fewer chips => harder to have banks • Growth bits/chip DRAM : 50%-60%/yr • Only can use for very large memory systems (e.g., those encountered in supercomputer systems)

  39. N x M “SRAM” M bits 1st M-bit Access 2nd M-bit 3rd M-bit 4th M-bit RAS CAS Row Address Col Address Col Address Col Address Col Address Fast Page Mode DRAM Operation Column Address • Fast Page Mode DRAM • N x M “SRAM” to save a row N cols DRAM Row Address • After a row is read into the SRAM “register” • Only CAS is needed to access other M-bit blocks on that row • RAS remains asserted while CAS is toggled N rows M-bit Output

  40. µProc 60%/year (2X/1.5yr) DRAM 9%/year (2X/10yrs) Why Care About the Memory Hierarchy? Processor-DRAM Memory Gap 1000 CPU “Moore’s Law” Processor-Memory Performance Gap:(grows 50% / year) 100 Performance 10 DRAM 1 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 Time

  41. Probability of reference 0 2n - 1 Address Space Memory Hierarchy: Goals • Fact: Large memories are slow, fast memories are small • How do we create a memory that gives the illusion of being large, cheap and fast (most of the time)? by taking advantage of • The Principle of Locality: Programs access a relatively small portion of the address space at any instant of time.

  42. Memory Hierarchy: Why Does it Work? • Temporal Locality (Locality in Time): => Keep most recently accessed data items closer to the processor • Spatial Locality (Locality in Space): => Move blocks consists of contiguous words to the upper levels Lower Level Memory Upper Level Memory To Processor Blk X From Processor Blk Y

  43. Lower Level Memory Upper Level Memory To Processor Blk X From Processor Blk Y Memory Hierarchy: Terminology • Hit: data appears in some block in the upper level (Block X) • Hit Rate: the fraction of memory accesses found in the upper level • Hit Time: Time to access the upper level which consists of RAM access time + Time to determine hit/miss • Miss: data needs to be retrieve from a block in the lower level (Block Y) • Miss Rate = 1 - (Hit Rate) • Miss Penalty: Time to replace a block in the upper level + Time to deliver the block the processor • Hit Time << Miss Penalty

  44. How is the Hierarchy Managed? • registers <-> memory • by compiler (programmer?) • cache <-> main memory • by the hardware • main memory <-> disks • by the hardware and operating system (virtual memory) • by the programmer (files)

  45. Summary • DRAM is slow but cheap and dense • Good choice for presenting the user with a BIG memory system • SRAM is fast but expensive and not very dense • Good choice for providing the user FAST access time • Two different types of locality • Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon. • Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon. • By taking advantage of the principle of locality: • Present the user with as much memory as is available in the cheapest technology. • Provide access at the speed offered by the fastest technology.

More Related