CS 230: Computer Organization and Assembly Language
Aviral Shrivastava
Department of Computer Science and Engineering
School of Computing and Informatics
Arizona State University
Slides courtesy: Prof. Yann Hang Lee, ASU, Prof. Mary Jane Irwin, PSU, Ande Carle, UCB
Announcements • Alternate Project • Due Today • Real Examples • Finals • Tuesday, Dec 08, 2009 • Please come on time (You’ll need all the time) • Open book, notes, and internet • No communication with any other human
Time, Time, Time • Making a Single Cycle Implementation is very easy • Difficulty and excitement is in making it fast • Two fundamental methods to make Computers fast • Pipelining • Caches • [Figure: single-cycle datapath with PC, instruction memory, register file, ALU, and data memory]
Effect of high memory Latency • Single Cycle Implementation • Cycle time becomes very large • Operations that do not need memory also slow down • [Figure: single-cycle datapath with PC, instruction memory, register file, ALU, and data memory]
Effect of high memory Latency • Multi-cycle Implementation • Cycle time becomes long • But • Can make memory access multi-cycle • Avoid penalty to instructions that do not use memory • [Figure: multi-cycle datapath with IR, A, B, ALUOut, and MDR registers sharing one memory and one ALU with the register file]
Effects of high memory latency • Pipelined Implementation • Cycle time becomes long • But • Can make memory access multi-cycle • Avoid penalty to instructions that do not use memory • Can overlap execution of other instructions with a memory operation • [Figure: pipeline stages IM, Reg, ALU, DM, Reg]
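To make the comparison concrete, here is a small back-of-the-envelope sketch (all latencies below are assumed numbers, not from the slides): a slow memory stretches the single-cycle clock, while a multi-cycle design keeps the clock short and simply spends more cycles on memory accesses.

```c
/* Illustrative sketch with made-up latencies (ns): memory = 10,
 * register read, ALU, and register write = 1 each. Not from the slides. */
#include <stdio.h>

int main(void) {
    double mem = 10.0, regread = 1.0, alu = 1.0, regwrite = 1.0;

    /* Single-cycle: the clock must fit the slowest instruction (a load does
     * fetch + regread + alu + mem + regwrite), so every instruction pays
     * the memory latency twice (instruction fetch and data access). */
    double single_cycle_clock = mem + regread + alu + mem + regwrite;   /* 23 ns */

    /* Multi-cycle: the clock only has to fit the slowest single step; a
     * memory access simply occupies several of those short cycles, and
     * instructions that skip memory finish in fewer cycles. */
    double multi_cycle_clock = 2.0;                                     /* chosen step time */
    int mem_access_cycles = (int)(mem / multi_cycle_clock + 0.999);     /* 5 cycles */

    printf("single-cycle clock: %.0f ns\n", single_cycle_clock);
    printf("multi-cycle clock:  %.0f ns (memory access takes %d cycles)\n",
           multi_cycle_clock, mem_access_cycles);
    return 0;
}
```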
Kinds of Memory (fastest and smallest at the top, largest and slowest at the bottom)
• Flip-flops / CPU registers: 100s of bytes, <10s ns
• SRAM: K bytes, 10-20 ns, ~$.00003/bit
• DRAM: M bytes, 50-100 ns, ~$.00001/bit
• Disk: G bytes, ~ms access, ~10^-6 cents/bit
• Tape: effectively infinite capacity, sec-min access
Memories • CPU Registers, Latches • Flip-flops: very fast, but very small • SRAM – Static RAM • Very fast, low power, but small • Data persists as long as power is on • DRAM – Dynamic RAM • Very dense • Like vanishing ink: data disappears with time • Need to refresh the contents
Flip Flops • Fastest form of memory • Store data using logic gates only (cross-coupled), no capacitors • SR, JK, T, and D flip-flops
SRAM Cell • [Figure: an SRAM cell with complementary bit lines b and b', drawn both as a simple storage element (computer scientist view) and as a transistor-level circuit (electrical engineering view)]
A 4-bit SRAM • [Figure: four SRAM cells sharing one word line; each bit has a write driver fed by Din 3..Din 0, gated by WrEn, with precharge on the bit lines]
A 16x4 Static RAM (SRAM) • [Figure: a 16x4 SRAM array; a 4-bit address (A0-A3) drives an address decoder that selects one of 16 word lines (Word 0..Word 15); write drivers on Din 3..Din 0 are gated by WrEn with bit-line precharge, and sense amps produce Dout 3..Dout 0]
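As a rough software analogue of the figure above, the sketch below models a 16x4 SRAM behaviorally in C (the function names and the choice to pack each 4-bit word into a byte are mine, not from the slides): a 4-bit address selects one of 16 words, WrEn gates the write, and a read returns Dout 3..Dout 0.

```c
#include <stdint.h>
#include <stdio.h>

static uint8_t sram[16];   /* 16 words; only the low 4 bits of each are used */

void sram_write(uint8_t addr, uint8_t din, int wr_en) {
    if (wr_en)                          /* WrEn gates the write drivers */
        sram[addr & 0xF] = din & 0xF;   /* the address decoder selects one word line */
}

uint8_t sram_read(uint8_t addr) {
    return sram[addr & 0xF];            /* sense amps drive Dout 3..Dout 0 */
}

int main(void) {
    sram_write(5, 0xA, 1);              /* store 1010 in word 5 */
    printf("word 5 = 0x%X\n", sram_read(5));
    return 0;
}
```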
Dynamic RAM (DRAM) • Value is stored on a capacitor • Discharges with time • Needs to be refreshed regularly • A dummy read will recharge the capacitor • Very high density • The newest process technology is usually tried on DRAMs first • Intel first became popular because of DRAM • Was once the biggest DRAM vendor
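A toy simulation can show why refresh matters; every constant below (leak rate, refresh period, sensing threshold) is an invented number chosen only to illustrate the idea that charge leaks away unless a periodic dummy read restores it.

```c
/* Toy model of DRAM charge leakage and refresh; all constants are made up. */
#include <stdio.h>

#define FULL    100   /* fully charged cell */
#define LEAK      7   /* charge lost per time step */
#define THRESH   50   /* below this, the stored bit can no longer be sensed */
#define PERIOD    8   /* refresh every 8 time steps */

int main(void) {
    int refreshed = FULL, unrefreshed = FULL;

    for (int t = 1; t <= 20; t++) {
        refreshed   -= LEAK;             /* both cells leak ...              */
        unrefreshed -= LEAK;
        if (t % PERIOD == 0)
            refreshed = FULL;            /* ... but only one gets the dummy read */

        printf("t=%2d refreshed=%3d unrefreshed=%3d %s\n",
               t, refreshed, unrefreshed,
               unrefreshed < THRESH ? "<- bit lost without refresh" : "");
    }
    return 0;
}
```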
Why Not Only DRAM? • Not large enough for some things • Backed up by storage (disk) • Virtual memory, paging, etc. • Will get back to this • Not fast enough for processor accesses • Takes hundreds of cycles to return data • OK in very regular applications • Can use SW pipelining, vectors • Not OK in most other applications
Is there a problem with DRAM? • Processor-memory performance gap grows ~50% per year • [Figure: processor-DRAM latency gap, 1980-2000, log-scale performance vs. time: CPU performance ("Moore's Law") improves ~60%/yr (2x every 1.5 years) while DRAM improves ~9%/yr (2x every 10 years)]
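Compounding the two quoted growth rates shows how quickly the gap opens; the loop below is just that arithmetic (the yearly rates are the slide's round numbers, and normalizing both to 1x in 1980 is an assumption).

```c
/* Compound the quoted growth rates: CPU ~60%/yr vs DRAM ~9%/yr, 1980-2000. */
#include <stdio.h>

int main(void) {
    double cpu = 1.0, dram = 1.0;       /* normalized to 1x in 1980 */
    for (int year = 1980; year <= 2000; year++) {
        if (year % 5 == 0)
            printf("%d: CPU %7.0fx, DRAM %4.1fx, gap %6.0fx\n",
                   year, cpu, dram, cpu / dram);
        cpu  *= 1.60;                   /* processor performance growth */
        dram *= 1.09;                   /* DRAM latency improvement */
    }
    return 0;
}
```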
Memory Hierarchy Analogy: Library (1/2) • You're writing a term paper (Anthropology) at a table in Hayden • Hayden Library is equivalent to disk • essentially limitless capacity • very slow to retrieve a book • Table is memory • smaller capacity: you must return a book when the table fills up • easier and faster to find a book there once you've already retrieved it
Memory Hierarchy Analogy: Library (2/2) • Open books on the table are cache • smaller capacity: only a few open books fit on the table; again, when the table fills up, you must close a book • much, much faster to retrieve data • Illusion created: the whole library is open on the tabletop • Keep as many recently used books open on the table as possible, since they are likely to be used again • Also keep as many books on the table as possible, since that is faster than going to the library
Memory Hierarchy: Goals • Fact: Large memories are slow, fast memories are small • How do we create a memory that gives the illusion of being large, cheap and fast (most of the time)?
Memory Hierarchy: Insights • Temporal Locality (Locality in Time): => Keep most recently accessed data items closer to the processor • Spatial Locality (Locality in Space): => Move blocks consisting of contiguous words to the upper levels (see the sketch below) • [Figure: upper-level and lower-level memory exchanging blocks Blk X and Blk Y with the processor]
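A concrete illustration (my example, not from the slides): the loop nests below show a traversal with good spatial locality, one with poor spatial locality, and a scalar that enjoys temporal locality.

```c
#include <stdio.h>

#define N 512

static int a[N][N];

int main(void) {
    long sum = 0;

    /* Spatial locality: row-major traversal touches contiguous words, so each
     * cache block brought in is fully used before moving on. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Poor spatial locality: column-major traversal strides through memory,
     * using one word per block before jumping N*sizeof(int) bytes away. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];

    /* Temporal locality: sum (and the loop indices) are reused on every
     * iteration, so they stay in registers or the closest cache level. */
    printf("%ld\n", sum);
    return 0;
}
```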
Memory Hierarchy: Solution (upper levels are faster, lower levels are larger)
• Registers: 100s of bytes, <10s ns; staging unit: instruction operands (1-8 bytes), managed by the program/compiler
• Cache: K bytes, 10-100 ns, 1-0.1 cents/bit; staging unit: blocks (8-128 bytes), managed by the cache controller (our current focus)
• Main Memory: M bytes, 200-500 ns, 0.0001-0.00001 cents/bit; staging unit: pages (4K-16K bytes), managed by the OS
• Disk: G bytes, ~10 ms (10,000,000 ns), 10^-5 to 10^-6 cents/bit; staging unit: files (Mbytes), managed by the user/operator
• Tape: infinite capacity, sec-min access, 10^-8 cents/bit
Memory Hierarchy: Terminology • [Figure: upper-level and lower-level memory exchanging blocks Blk X and Blk Y with the processor] • Hit: data appears in some block in the upper level (Block X) • Hit Rate: fraction of memory accesses found in the upper level • Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss • Miss: data needs to be retrieved from a block in the lower level (Block Y) • Miss Rate = 1 - (Hit Rate) • Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor • Hit Time << Miss Penalty
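These terms combine into the usual average memory access time (AMAT) relation; the helper below is a minimal sketch of that textbook formula (the function name and the sample numbers are mine, not from the slides).

```c
/* AMAT = hit time + miss rate * miss penalty, using the terms defined above. */
#include <stdio.h>

double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    /* Example: 2-cycle hit time, 10% miss rate, 100-cycle miss penalty. */
    printf("AMAT = %.1f cycles per access\n", amat(2.0, 0.10, 100.0));  /* 12.0 */
    return 0;
}
```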
Memory Hierarchy: Show me numbers • Consider an application where 30% of instructions are loads/stores • Suppose memory latency = 100 cycles • Time to execute 100 instructions = 70*1 + 30*100 = 3070 cycles • Add a cache with a 2-cycle latency • Suppose the hit rate is 90% • Time to execute 100 instructions = 70*1 + 27*2 + 3*100 = 70 + 54 + 300 = 424 cycles
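The slide's arithmetic can be checked directly; the snippet below just redoes that calculation (assuming, as the slide does, that non-memory instructions take 1 cycle each, a cache hit costs 2 cycles, and a miss costs the full 100-cycle memory latency).

```c
#include <stdio.h>

int main(void) {
    int insts = 100, mem_ops = 30, non_mem = insts - mem_ops;

    /* No cache: every memory operation pays the full 100-cycle latency. */
    int no_cache = non_mem * 1 + mem_ops * 100;          /* 70 + 3000 = 3070 */

    /* 2-cycle cache with a 90% hit rate: 27 hits, 3 misses go to memory. */
    int with_cache = non_mem * 1 + 27 * 2 + 3 * 100;     /* 70 + 54 + 300 = 424 */

    printf("no cache: %d cycles, with cache: %d cycles\n", no_cache, with_cache);
    return 0;
}
```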
Yoda says… You will find only what you bring in