Spring Quarter, 2002 Final Review
Final: June 10, 2002, 3:00 p.m. to 6:00 p.m., Knudsen 1200B
Extra office hours: Friday 6/7/02, 4:30 p.m. to 7:30 p.m.; Saturday 6/8/02, 4:00 p.m. to 6:00 p.m.
Areas for Study
• What is computer architecture?
• Number Representation
  • Floating point number representation and IEEE 754
  • Floating point operations with IEEE 754
• MIPS instruction set
  • Be able to write simple assembly code with the MIPS instruction set
  • Understanding of procedure calls and stack management
    • Procedure call
    • Stack management
• General ideas about single-cycle/multicycle datapath and control unit design
• Pipelined Processor
  • Basic concepts and data flow in the pipeline
  • Hazards
    • Data hazard
      • Stalling the pipe
      • Forwarding (including the special case of lw followed by R-type)
    • Control hazard
      • Branch prediction
Areas for Study
• Memory Hierarchy and Virtual Memory
  • Concept of memory hierarchy and locality (spatial and temporal)
  • Performance of memory hierarchy: calculation of average access time
  • Cache organizations and overheads
    • Associativity: direct mapped, set associative, fully associative
    • Block size
    • Replacement policies
    • Write back vs. write through
  • Virtual Memory
    • Virtual to physical address translation: page table, page frame table
    • Translation lookaside buffer (TLB)
  • You should know how to read/write data from a memory hierarchy with a virtual address
• I/O System
  • I/O system architecture
  • I/O system design process
  • I/O system design parameters
  • I/O device interface design
  • You should be able to do both system-level and detailed design
What is Computer Architecture?
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
[Figure: bottom-up view of the levels of abstraction: application, operating system, compiler, firmware (software), instruction set architecture, instruction set processor and I/O system, datapath & control, digital design, circuit design, physical design (hardware). Courtesy D. Patterson.]
IEEE 754 Standard for Floating Point Numbers
Two formats: single precision (32-bit) and double precision (64-bit).
Single precision format: [sign (1 bit)][exponent, biased (8 bits)][significand only, leading 1 is implicit (23 bits)]
• Maximize precision of representation with a fixed number of bits
  • Gain 1 bit by making the leading 1 of the mantissa implicit. Therefore, $F = 1 + \text{significand}$ and $\text{Value} = (-1)^s \times (1 + \text{significand}) \times 2^{E - \text{bias}}$
• Easy to compare numbers
  • Put the sign bit at the MSB
  • Use a bias instead of a sign bit for the exponent field: real exponent value = exponent - bias, with bias = 127 for single precision
  • Examples:
    • Exponent A = -126: IEEE 754 field 00000001, value $(-1)^s \times F \times 2^{(1-127)} = (-1)^s \times F \times 2^{-126}$
    • Exponent B = 127: IEEE 754 field 11111110, value $(-1)^s \times F \times 2^{(254-127)} = (-1)^s \times F \times 2^{127}$
  • This is much easier to compare than having $A = -126_{10} = 10000010_2$ and $B = 127_{10} = 01111111_2$ in two's complement, where the smaller exponent A has the larger unsigned bit pattern
• Need to take care of special cases (by convention), where f = significand:
  • Value = 0: E = 0, f = 0
  • Value = $(-1)^s \times \infty$: E = 255, f = 0
  • Value = $(-1)^s \times (0.f) \times 2^{-126}$: E = 0, f ≠ 0 (the value has been denormalized)
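To make the field layout and the value formula concrete, here is a minimal Python sketch that uses the standard `struct` module to get the single-precision bit pattern. The helper names `fields`, `value`, and `as_unsigned` are hypothetical, and the NaN branch (E = 255, f ≠ 0) is not covered on the slide; it is included only so every exponent value is handled.

```python
import struct

BIAS = 127  # single-precision exponent bias


def fields(x):
    """Return (sign, biased exponent E, 23-bit fraction f) of x rounded to single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def value(s, e, f):
    """Rebuild the value from its fields, including the special cases."""
    sign = -1.0 if s else 1.0
    if e == 255:
        # f = 0 encodes infinity; f != 0 (NaN) is not covered on the slide
        return sign * float("inf") if f == 0 else float("nan")
    if e == 0:
        # zero (f = 0) or denormalized: (-1)^s * (0.f) * 2^-126
        return sign * (f / 2**23) * 2.0**-126
    # normalized: (-1)^s * (1 + significand) * 2^(E - bias)
    return sign * (1 + f / 2**23) * 2.0 ** (e - BIAS)


def as_unsigned(x):
    """Raw 32-bit pattern; for non-negative floats, integer order matches numeric order."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


print(fields(40.0))   # (0, 132, 2097152): 1.01_2 x 2^(132-127)
print(fields(-80.0))  # (1, 133, 2097152): 1.01_2 x 2^(133-127)
```

The `as_unsigned` helper illustrates the "easy to compare" point: because the biased exponent sits above the significand, comparing two non-negative patterns as plain unsigned integers gives the same order as comparing the numbers.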
IEEE 754 Computation Example: 40 - 80
A) $40 = (-1)^0 \times 1.25 \times 2^5 = (-1)^0 \times 1.01_2 \times 2^{(132-127)}$ = [0][10000100][0100 0000 0000 0000 0000 000]
B) $-80 = (-1)^1 \times 1.25 \times 2^6 = (-1)^1 \times 1.01_2 \times 2^{(133-127)}$ = [1][10000101][0100 0000 0000 0000 0000 000]
C) Denormalize the significands and align the exponents (here both operands are aligned to $2^7$; the significand field now holds the full significand $0.xxxx_2$, with no implicit 1):
   $40 = (-1)^0 \times 0.3125 \times 2^7 = (-1)^0 \times 0.0101_2 \times 2^{(134-127)}$ = [0][10000110][0101 0000 0000 0000 0000 000]
   $-80 = (-1)^1 \times 0.625 \times 2^7 = (-1)^1 \times 0.1010_2 \times 2^{(134-127)}$ = [1][10000110][1010 0000 0000 0000 0000 000]
D) Convert the IEEE 754 significand of -80 into 2's complement before the addition:
   -80 = [1][10000110][1010 0000 0000 0000 0000 000] → [1][10000110][0110 0000 0000 0000 0000 000]
   40 - 80 = [0][10000110][0101 0000 0000 0000 0000 000] + [1][10000110][0110 0000 0000 0000 0000 000] = [1][10000110][1011 0000 0000 0000 0000 000] (negative, in 2's complement)
E) Convert the result from 2's complement back into IEEE 754 sign-magnitude form: [1][10000110][0101 0000 0000 0000 0000 000]
F) Renormalize: [1][10000110][0101 0000 0000 0000 0000 000] = $(-1)^1 \times 0.0101_2 \times 2^7$ = [1][10000100][0100 0000 0000 0000 0000 000] = $(-1)^1 \times 1.01_2 \times 2^5$
Check: $40 - 80 = -40 = (-1)^1 \times 1.25 \times 2^5 = (-1)^1 \times 1.01_2 \times 2^5$
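Steps C to F above can be sketched in Python with integer significands. This is a simplified illustration, not a full IEEE 754 adder: it assumes normalized single-precision inputs, aligns both operands to one exponent above the larger one (as the 40 - 80 example does with $2^7$), truncates any bits shifted out (no guard, round, or sticky bits), and uses Python's signed integers in place of an explicit invert-and-add-1 step. The names `unpack` and `fp_add` are hypothetical.

```python
import struct

FRAC_BITS = 23
BIAS = 127


def unpack(x):
    """Sign, biased exponent, and 24-bit significand (hidden 1 made explicit) of a normalized float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s, e, f = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    return s, e, (1 << FRAC_BITS) | f


def fp_add(x, y):
    sx, ex, mx = unpack(x)
    sy, ey, my = unpack(y)
    # Step C: align both significands to one exponent above the larger one,
    # leaving headroom for the sum; shifted-out bits are simply dropped
    e = max(ex, ey) + 1
    mx >>= e - ex
    my >>= e - ey
    # Step D: negate negative operands (Python ints behave like 2's complement)
    total = (-mx if sx else mx) + (-my if sy else my)
    if total == 0:
        return 0.0
    # Step E: back to sign-magnitude
    s, m = (1, -total) if total < 0 else (0, total)
    # Step F: renormalize so the hidden 1 sits at bit 23 again
    while m >= 1 << (FRAC_BITS + 1):
        m >>= 1
        e += 1
    while m < 1 << FRAC_BITS:
        m <<= 1
        e -= 1
    return (-1.0 if s else 1.0) * (m / 2**FRAC_BITS) * 2.0 ** (e - BIAS)


print(fp_add(40.0, -80.0))  # -40.0
```

For 40 - 80, the aligned significands are 0x280000 and 0x500000 (0.0101 and 0.1010 in the slide's notation), their signed sum is -0x280000, and two left shifts in Step F bring the exponent back from 134 to 132.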
Procedure Call: An Overly Simplified Example
Caller: main() /* Caller */ { x = y + z; funct(arg); /* procedure call */ … }
Callee: int funct(arg) /* Callee */ { w = arg - v; return (w); }
[Figure: register usage and control flow in three steps: (1) main places arg in $a0 ($4) and jumps to funct, saving the return address (main addr) in $ra ($31); (2) funct computes w and places it in $v0 ($2); (3) funct returns and the PC goes back to the main addr held in $ra. x, y, and z are held in $t0 ($8), $t1 ($9), and $t2 ($10).]
• But!
  • What if there are more than 4 arguments?
  • What if some register values need to be preserved across the procedure call (e.g., if you want to preserve the value of x)?
  • What if another procedure call happens before the current procedure is completed?
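The third question can be made concrete with a tiny Python model of `jal` and `jr $ra`. It is a sketch under stated assumptions: the register dictionary, the helper names `jal` and `jr_ra`, and all addresses (main's call at 0x0008, funct at 0x0100, a nested procedure g at 0x0200) are hypothetical.

```python
def jal(regs, target):
    """jal target: $ra <- address of the next instruction (PC + 4), PC <- target."""
    regs["$ra"] = regs["pc"] + 4
    regs["pc"] = target


def jr_ra(regs):
    """jr $ra: PC <- $ra."""
    regs["pc"] = regs["$ra"]


# Hypothetical addresses: the jal in main sits at 0x0008, funct starts at 0x0100,
# and funct itself calls g (at 0x0200) from 0x0104.
regs = {"pc": 0x0008, "$ra": 0}
jal(regs, 0x0100)        # main -> funct: $ra = 0x000C (main addr)
regs["pc"] = 0x0104      # funct runs until its own call
jal(regs, 0x0200)        # funct -> g: $ra = 0x0108, main addr is lost
jr_ra(regs)              # g returns to funct at 0x0108 (correct)
jr_ra(regs)              # funct tries to return to main but jumps to 0x0108 again
print(hex(regs["pc"]))   # 0x108
```

Because there is only one `$ra`, the nested call overwrites main's return address, which is exactly why that information has to be saved somewhere else.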
Call-Return Linkage: Stack Frames
• Solution:
  • Save the needed information (e.g., arguments, return address) onto a stack in memory
  • Information needed by the called procedure is grouped into a stack frame
  • Many variations on stacks are possible (up/down, last pushed / next )
[Figure: stack frame (activation record), from high memory to low memory: arguments, with FP (the frame pointer) pointing to the 1st word of the frame; callee-save registers (old $fp, $ra, $s0, etc.); local variables, with SP (the stack pointer) pointing to the last word of the frame. Arguments and local variables are referenced at fixed (negative) offsets from FP; the bottom of the stack grows and shrinks during expression evaluation.]
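A minimal Python sketch of the stack-frame idea, assuming a downward-growing word stack, FP pointing at the first word of the frame as described above, and a frame that holds only the saved `$ra` and old `$fp` (arguments, `$s` registers, and locals are left out). The class `Machine`, its methods, and the addresses are hypothetical.

```python
WORD = 4  # bytes per word


class Machine:
    """Toy model: a few registers plus a stack in memory that grows toward low addresses."""

    def __init__(self, stack_top=0x7FFFFFFC):
        self.mem = {}
        self.reg = {"pc": 0, "$ra": 0, "$sp": stack_top, "$fp": stack_top}

    def push(self, value):
        self.reg["$sp"] -= WORD
        self.mem[self.reg["$sp"]] = value

    def call(self, target):
        """jal target, then a callee prologue that builds a new frame."""
        self.reg["$ra"] = self.reg["pc"] + 4
        self.reg["pc"] = target
        frame_start = self.reg["$sp"] - WORD
        self.push(self.reg["$ra"])        # callee-saved $ra
        self.push(self.reg["$fp"])        # callee-saved old $fp
        self.reg["$fp"] = frame_start     # FP -> first word of the new frame

    def ret(self):
        """Epilogue: reload saved registers at fixed offsets from FP, pop the frame, jr $ra."""
        fp = self.reg["$fp"]
        self.reg["$ra"] = self.mem[fp]            # saved $ra at offset 0
        self.reg["$fp"] = self.mem[fp - WORD]     # saved old $fp at offset -4
        self.reg["$sp"] = fp + WORD               # discard the whole frame
        self.reg["pc"] = self.reg["$ra"]


m = Machine()
m.reg["pc"] = 0x0008
m.call(0x0100)          # main -> funct
m.reg["pc"] = 0x0104
m.call(0x0200)          # funct -> g: funct's return address is safe in its frame
m.ret()                 # g returns to 0x0108 in funct
m.ret()                 # funct returns to 0x000C in main
print(hex(m.reg["pc"]))  # 0xc
```

Unlike the single-register linkage, each activation keeps its own return address in memory, so nested calls unwind in the right order and SP/FP end where they started.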
Performance of An Ideal Pipeline
• Latency of Pipeline = Latency of a Single Task
• Potential Throughput Improvement = Number of Pipeline Stages, under the ideal situation that all instructions are independent and there are no branch instructions
• Pipeline Rate is Limited by the Slowest Pipeline Stage
[Figure: three lw instructions overlapped over cycles 1 to 7; each passes through IFetch, Reg/Dec, Exec, Mem, and WrBack, starting one cycle after the previous one.]
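The cycle count in the figure and the "speedup = number of stages" limit can be checked with a short Python sketch. It assumes one stage per cycle, no hazards, and an unpipelined machine whose per-instruction time equals the sum of the same k stage times; the function names are hypothetical.

```python
def pipeline_cycles(n_instructions, n_stages):
    """Cycles for n independent instructions on an ideal k-stage pipeline: k + n - 1."""
    return n_stages + n_instructions - 1


def ideal_speedup(n_instructions, n_stages):
    """Unpipelined time (n * k cycles) divided by pipelined time."""
    return n_instructions * n_stages / pipeline_cycles(n_instructions, n_stages)


print(pipeline_cycles(3, 5))           # 7, the three lw instructions in cycles 1-7
print(ideal_speedup(1_000_000, 5))     # approaches 5 for long instruction streams
```

A single task gains nothing (speedup 1), which matches "latency of pipeline = latency of a single task"; the gain is in throughput.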
Example of Detailed Pipeline Operations
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers carrying the wb, m, and ex control fields (RegWrite, MemtoReg, MemRd, MemWr, Branch, ALUop, ALUsrc, RegDst), the PCsrc mux, instruction memory, register file, ALU, and data memory.]
Program (address, instruction):
00 lw $2, 0($3)
04 add $4, $0, $5
08 sw $6, 4($3)
12 addi $7, $2, 100
16 add $8, $2, $5
20 add $9, $2, $4
24 sub $10, $4, $7
28 add $11, $7, $8
See MIPS Example in Class
Pipeline Hazards
• Pipelining Limitations: Hazards are Situations that Prevent the Next Instruction from Executing During its Designated Cycle
  • Structural Hazard: Resource Conflict When Several Pipelined Instructions Need the Same Functional Unit Simultaneously
  • Data Hazard: An Instruction Depends on the Result of a Prior Instruction that is Still in the Pipeline
  • Control Hazard: Pipelining of Branches and Other Instructions that Change the PC
• Solutions:
  • Common to all: Stall the Pipeline by Inserting "Bubbles" Until the Hazard is Resolved
  • Structural: Don't share components between instructions, use special components (e.g., 2-port memory)
  • Data: re-ordering of instructions, forwarding
  • Control Hazard: Branch prediction, re-ordering of instructions
To Stall a Pipelined Data Path
• Don't Change the PC, Keep Fetching the Same Instruction, and Set All Control Signals in the ID/EX Pipeline Register to Benign Values (0)
• Each refetch creates a bubble
[Figure: sub r4, r1, r3 is fetched repeatedly while the PC is not updated; each copy whose control signals are all set to 0 does nothing (a bubble), and the final copy executes.]
Hardware to Stall The Pipeline
[Figure: pipelined datapath with a hazard detection unit that reads ID/EX.MemRead, ID/EX.rt, IF/ID.opcode, IF/ID.rs, and IF/ID.rt, and drives PCWr, IF/IDWr, and a mux that forces the control signals entering ID/EX to 0; a forwarding unit drives Fwd A and Fwd B.]
• Step 1: Detecting the hazard (check if lw is being executed and if the memory data is loaded to one of the operands in the next instruction)
  • Stall = ID/EX.MemRead and ((ID/EX.rt = IF/ID.rs) or (ID/EX.rt = IF/ID.rt))
• Step 2: If Stall is true
  • Do not fetch the next instruction by disabling the writing to the PC and IF/ID registers
  • Disable all control signals of the current instruction
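The two steps above can be sketched as plain Python functions. The dataclasses `IDEX` and `IFID` are hypothetical stand-ins holding only the pipeline-register fields the Stall equation reads, and `next_cycle_controls` is a hypothetical summary of Step 2's outputs.

```python
from dataclasses import dataclass


@dataclass
class IDEX:
    mem_read: bool   # ID/EX.MemRead: True when the instruction in EX is a lw
    rt: int          # ID/EX.rt: register the lw will write


@dataclass
class IFID:
    rs: int          # IF/ID.rs of the instruction being decoded
    rt: int          # IF/ID.rt of the instruction being decoded


def must_stall(id_ex, if_id):
    """Step 1: Stall = ID/EX.MemRead and ((ID/EX.rt = IF/ID.rs) or (ID/EX.rt = IF/ID.rt))."""
    return id_ex.mem_read and id_ex.rt in (if_id.rs, if_id.rt)


def next_cycle_controls(stall):
    """Step 2: freeze PC and IF/ID and zero the control signals sent into ID/EX."""
    return {"PCWr": not stall, "IF/IDWr": not stall, "zero_controls": stall}


# lw r1, 0(r2) in EX, sub r4, r1, r3 in ID
print(must_stall(IDEX(mem_read=True, rt=1), IFID(rs=1, rt=3)))  # True
```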
Stalling The Pipeline Example: R-type after lw
Program: lw r1, 0(r2); sub r4, r1, r3; and r6, r7, r1; or r8, r1, r9
[Figure: lw r1, 0(r2) is in ID/EX (ID/EX.MemRead = 1, ID/EX.rt = R1; control RegWr = 1, MemRead = 1, MemWr = 0) and sub r4, r1, r3 is in IF/ID (IF/ID.rs = R1), so the hazard detection unit sees a hazard.]
Stalling The Pipeline Example: R-type after lw
[Figure: with ID/EX.MemRead = 1 (lw instruction) and ID/EX.rt = IF/ID.rs = R1 (sub), the hazard detection unit sets PCWr = 0 and IF/IDWr = 0.]
Stalling The Pipeline Example: R-type after lw
[Figure: lw moves on (RegWr = 1, MemRead = 1, MemWr = 0); the copy of sub entering ID/EX has all control signals set to 0 (RegWr = 0, MemRead = 0, MemWr = 0), so it becomes a bubble that is not doing anything, while sub is re-fetched into IF/ID.]
Stalling The Pipeline Example: R-type after lw
[Figure: lw (RegWr = 1), the bubble (RegWr = 0), and sub (RegWr = 1, MemRead = 0, MemWr = 0) advance one stage; and r6, r7, r1 is fetched.]
Stalling The Pipeline Example: R-type after lw
[Figure: the lw data is available; and, sub, and the bubble hold RegWr = 1, RegWr = 1, and RegWr = 0 respectively; or r8, r1, r9 is fetched.]
Stalling The Pipeline Example: R-type after lw
[Figure: or, and, and sub are in the pipeline, all with RegWr = 1 (MemRead = 0, MemWr = 0); the lw data has been written.]
The bubble has not changed any state of the pipeline.
Stalling The Pipeline Example: R-type after lw
[Figure: the sub data and the lw data have been written; and and or continue down the pipeline (RegWr = 1).]
The bubble has not changed any state of the pipeline.
Data Hazard Solution: Forwarding
[Figure: forwarding unit driving the ALU input muxes Fwd A and Fwd B; mux input 0 = register file (ID/EX), 1 = EX/MEM, 2 = MEM/WB.]
Logic Equation for the Control Outputs of the Forwarding Unit
• Fwd A = 1 (i.e., Type 1a) if (EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRs))
  Fwd A = 2 (i.e., Type 2a) if (MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (MEM/WB.RegRd = ID/EX.RegRs))
• Fwd B = 1 (i.e., Type 1b) if (EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRt))
  Fwd B = 2 (i.e., Type 2b) if (MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (MEM/WB.RegRd = ID/EX.RegRt))
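The forwarding equations can be written directly as a Python function. The `Stage` dataclass is a hypothetical stand-in for the RegWrite and RegRd fields of the EX/MEM and MEM/WB registers. The slide does not say what happens when both conditions hold; checking EX/MEM first, so that the most recent result wins, is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    reg_write: bool  # RegWrite control bit held in this pipeline register
    rd: int          # RegRd: destination register of the instruction in this stage


def forward_select(ex_mem, mem_wb, src):
    """Mux select for one ALU operand: 0 = register file, 1 = EX/MEM, 2 = MEM/WB."""
    # Assumption: EX/MEM is checked first so the newest value wins if both match
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return 1
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return 2
    return 0


def forwarding_unit(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Return (Fwd A, Fwd B) for the instruction currently in ID/EX."""
    return (forward_select(ex_mem, mem_wb, id_ex_rs),
            forward_select(ex_mem, mem_wb, id_ex_rt))


# add r1, r2, r3 in EX/MEM while sub r4, r1, r3 is in ID/EX: Type 1a
print(forwarding_unit(Stage(True, 1), Stage(False, 0), 1, 3))  # (1, 0)
```

For the next cycle, `and r6, r7, r1` in ID/EX with sub (rd = r4) in EX/MEM and add (rd = r1) in MEM/WB gives (0, 2), the Type 2b case.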
Forwarding Example
add r1, r2, r3; sub r4, r1, r3; and r6, r7, r1
[Figure: sub needs r1 while add is in EX/MEM (Type 1a hazard, Fwd A = 01, forwarding the ALU result A + B); and needs r1 while add is in MEM/WB (Type 2b hazard, Fwd B = 10).]
One Case Forwarding Can't Avoid Stalling
• Problem: lw followed by R-type: the lw instruction is still reading memory when the sub instruction needs the data for EX. Need to stall 1 cycle (see previous example).
[Figure: a lw writing r1 is in EX/MEM while sub r4, r1, r3 is in ID/EX. This is a Type 1a hazard, but the EX/MEM output cannot be forwarded: it holds the lw's address (A + B), not a valid output of lw. The valid output, Mem[addr], is only available after the memory access.]
Control Hazard Solution: Branch Prediction (e.g., Predict Branch Not Taken)
[Figure: PC = 12, 16, 20: assume branch not taken. PC = 24: the result of the comparison is not to branch (or $15, $7, $3). PC = 28.]
Prediction is correct, so branching does not cause any penalty.
Penalty of Wrong Prediction
[Figure: PC = 12, 16, 20: assume branch not taken. PC = 24: the result of the comparison is to take the branch. PC = 36: branch target.]
Prediction is incorrect, so the pipe needs to be flushed; the penalty is the same as without branch prediction (3 cycles).
To Reduce Branch Penalty: Move Address Calculation Hardware Forward
[Figure: with the branch resolved late in the pipeline, a branch causes a 1st, 2nd, and 3rd clock delay.]
To Reduce Branch Penalty: Move Address Calculation Hardware Forward
[Figure: with the address calculation hardware moved forward, only a 1st clock delay remains.]
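The effect of the branch penalties above on average CPI can be estimated with a short Python sketch. It assumes a base CPI of 1 and no other stalls; the workload numbers (20% branches, half of them taken) are hypothetical, and `cpi_with_branches` is a hypothetical helper.

```python
def cpi_with_branches(branch_fraction, taken_fraction, penalty, predict_not_taken=True):
    """CPI = 1 + (fraction of instructions that pay the penalty) x penalty.

    With predict-not-taken, only taken branches (mispredictions) are flushed;
    without prediction, every branch pays the full penalty.
    """
    paying = branch_fraction * (taken_fraction if predict_not_taken else 1.0)
    return 1.0 + paying * penalty


# Hypothetical workload: 20% branches, half of them taken
print(cpi_with_branches(0.2, 0.5, 3, predict_not_taken=False))  # no prediction, 3-cycle penalty
print(cpi_with_branches(0.2, 0.5, 3))                            # predict not taken, 3 cycles
print(cpi_with_branches(0.2, 0.5, 1))                            # hardware moved forward, 1 cycle
```

Under these assumptions the CPI drops from about 1.6 to 1.3 with prediction, and to about 1.1 once the branch is resolved one cycle after fetch.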
Memory Hierarchy
• Motivations:
  • Large Memories (DRAM) are Slow and Lower Cost
  • Small Memories (SRAM) are Fast but Higher Cost
• Goal: Present the User with a Large Memory at the Lowest Cost while Providing Access at a Speed Comparable to the Fastest Technology
• Reduce the Required Bandwidth of the Large Memory
[Figure: memory hierarchy, from fast memory (small) down to large memory (slow).]
Typical Memory Hierarchy Performance
[Figure: registers, cache, memory, disk, tape.]
• CPU registers: in 100's of bytes, <10's of ns
• Cache: in K bytes, 10-100 ns, $0.01-0.001/bit
• Main memory: in M bytes, 100 ns-1 µs, $0.01-0.001/bit
• Disk: in G bytes, ms, 10^-3 to 10^-4 cents/bit
• Tape: infinite capacity, sec-min, 10^-6 cents/bit
Why Does the Memory Hierarchy Work?
• The Principle of Locality:
  • Programs Access a Relatively Small Portion of the Address Space at Any Instant of Time. Example: 90% of Time in 10% of the Code
  • Put All Data in Large Slow Memory and Put the Portion of the Address Space Being Accessed into the Small Fast Memory
• Two Different Types of Locality:
  • Temporal Locality (Locality in Time): If an Item is Referenced, It will Tend to be Referenced Again Soon
  • Spatial Locality (Locality in Space): If an Item is Referenced, Items Whose Addresses are Close by Tend to be Referenced Soon
Analysis of Memory Hierarchy Performance
General Idea
• Average Memory Access Time = Upper level hit rate × Upper level hit time + Upper level miss rate × Miss penalty
• Example, let:
  • h = Hit rate: the percentage of memory references that are found in the upper level
  • 1 - h = Miss rate
  • $t_m$ = the hit time of the main memory
  • $t_c$ = the hit time of the cache memory
• Then, Average Memory Access Time = $h\,t_c + (1-h)(t_c + t_m) = t_c + (1-h)\,t_m$
  Note: This example assumes the cache has to be looked up to determine whether a miss has occurred. The time to look up the cache is also equal to $t_c$.
• This formula can be applied recursively to multiple levels. Let:
  • The subscript $L_n$ refer to the upper level memory (e.g., a cache)
  • The subscript $L_{n-1}$ refer to the lower level memory (e.g., main memory)
• Average Memory Access Time = $h_{L_n} t_{L_n} + (1-h_{L_n})\left[t_{L_n} + \{h_{L_{n-1}} t_{L_{n-1}} + (1-h_{L_{n-1}})(t_{L_{n-1}} + t_m)\}\right]$
• The trick is how to find the miss penalty
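The recursive form of the formula maps naturally onto a recursive function: the miss penalty of each level is its own lookup time plus the average access time of everything below it. This Python sketch follows the slide's convention that a miss still costs the lookup time; the name `amat` and the example numbers are hypothetical.

```python
def amat(levels, t_mem):
    """Average memory access time.

    levels: [(hit_rate, hit_time), ...] ordered from the upper level down.
    t_mem:  access time of the last level (always hits).
    """
    if not levels:
        return t_mem
    h, t = levels[0]
    # hit: pay t; miss: pay the lookup time t, then the average time of the levels below
    return h * t + (1 - h) * (t + amat(levels[1:], t_mem))


# One cache level: h = 0.9, t_c = 10 ns, t_m = 100 ns -> t_c + (1 - h) t_m = 20 ns
print(amat([(0.9, 10)], 100))
# Two levels: h = 0.9, t = 1 ns, then h = 0.8, t = 10 ns, then t_m = 100 ns -> 4 ns
print(amat([(0.9, 1), (0.8, 10)], 100))
```

The printed values may show small floating-point rounding (for example 19.999999999999996).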
Cache Organization
• Mechanism for looking up data
  • Index: to look up a block or a set in the cache
  • Tag: to determine if the data is what you want (hit or miss)
  • Byte Select (or Word Select): to select the byte (or word) that you need in a block
• Block size: to take advantage of spatial locality
  • Temporal locality might be compromised if the block size is too large (fewer blocks fit in the cache)
  • In general, a larger block size has a higher miss penalty (unless wide parallel memory is used)
• Associativity: to reduce conflicts
  • Direct Mapping
  • Set Associative
  • Fully Associative
• Write Policy: to ensure consistency between cache and memory
  • Write Through
  • Write Back
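Large Block Size
• For a $2^N$-Byte Direct-Mapped Cache:
  • The Uppermost (32 - N) Bits Are Always the Cache Tag
  • The Lowest M Bits Are the Byte Select (Block Size = $2^M$)
  • The Middle (N - M) Bits Are the Cache Index
[Figure: cache lookup with cache tag 0x50, cache index 0x01, and byte select 0x00; the tag comparison produces Hit and a mux selects the requested byte.]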
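The tag/index/byte-select split can be checked with a short Python sketch. The cache parameters (N = 10, a 1 KB cache; M = 5, 32-byte blocks) are assumptions chosen only for illustration, and `split_address` is a hypothetical helper.

```python
def split_address(addr, n, m):
    """Split a 32-bit byte address for a direct-mapped 2^n-byte cache with 2^m-byte blocks."""
    byte_select = addr & ((1 << m) - 1)            # lowest m bits
    index = (addr >> m) & ((1 << (n - m)) - 1)     # middle n - m bits
    tag = addr >> n                                # uppermost 32 - n bits
    return tag, index, byte_select


# Assumed 1 KB cache (N = 10) with 32-byte blocks (M = 5)
addr = (0x50 << 10) | (0x01 << 5) | 0x00
print([hex(x) for x in split_address(addr, 10, 5)])  # ['0x50', '0x1', '0x0']
```

Index and byte select together use exactly N bits, which is why the tag is always the top 32 - N bits regardless of block size.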
Associativity
[Figure: memory blocks placed into an 8-block cache organized as direct mapped (blocks 0 to 7), as two sets (Set 0, Set 1), and as one fully associative set (entire cache).]
• Direct Mapped: memory block M goes only into a single block, cache block (M mod N), where N is the number of cache blocks
• Set Associative: memory block M can go anywhere in a set of blocks, set (M mod N), where N is the number of sets
• Fully Associative: memory block M can go anywhere in the cache
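The three placement rules differ only in how many sets the cache is divided into, which this Python sketch captures with a single `ways` parameter. The helper name and the layout assumption (set s occupies cache blocks s·ways through s·ways + ways - 1) are hypothetical.

```python
def candidate_blocks(mem_block, num_blocks, ways):
    """Cache blocks where memory block M may be placed.

    ways = 1: direct mapped; ways = num_blocks: fully associative; otherwise set associative.
    Assumes set s occupies cache blocks s*ways .. s*ways + ways - 1.
    """
    num_sets = num_blocks // ways
    s = mem_block % num_sets
    return list(range(s * ways, s * ways + ways))


print(candidate_blocks(12, 8, 1))  # direct mapped: [4]
print(candidate_blocks(12, 8, 4))  # 2 sets of 4 blocks: set 0 -> [0, 1, 2, 3]
print(candidate_blocks(12, 8, 8))  # fully associative: all 8 blocks
```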
Cache Overhead Estimation Example
• Memory size = 4 Gbytes (i.e., 32-bit address)
• Cache size = 64 Kbytes
• Word addressable; 2 words per block; 2 blocks per set (2-way set associative)
• Number of indexes = $2^{16}$ bytes × (1 word / 4 bytes) × (1 block / 2 words) × (1 set / 2 blocks) = $2^{12}$ (number of sets = number of indexes)
• Number of index bits = 12 bits
• Number of word select bits = 1
• Number of bits in tag = 32 bits - 12 bits - 1 bit = 19 bits
• Storage overhead = (19 bits tag + 1 bit V + 1 bit D) per block × 2 blocks/set × $2^{12}$ sets = 172,032 bits
• Number of comparators = 19 bits per block × 2 blocks per set = 38 (i.e., two 19-bit comparators)
• Number of multiplexors = 32 + 32 + 32 = 96 (2-to-1 muxes): a 32-bit word select in each of the two blocks of a set, plus a 32-bit select between the two blocks
• Miscellaneous gates: 2 AND gates and 1 OR gate (to combine valid bits and tag matches into Hit)
[Figure: 2-way set-associative cache; the address is split into a 19-bit tag, a 12-bit index, and a 1-bit word select. Each way holds $2^{12}$ entries of V, D, a 19-bit tag, and two 32-bit words; comparators, 2-to-1 muxes, and AND/OR gates produce Hit and the 32-bit data.]
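The same bookkeeping can be scripted so other cache sizes are easy to try. This Python sketch follows the example's accounting (tag = address bits minus index and word-select bits, no byte-offset bits). The mux formula generalizes "32 + 32 + 32" as a tree of 2-to-1 muxes; that generalization and the function name `cache_overhead` are assumptions.

```python
def cache_overhead(addr_bits, cache_bytes, word_bytes, words_per_block, ways,
                   status_bits=2, word_bits=32):
    """Hardware estimate following the example's accounting (all sizes powers of two)."""
    sets = cache_bytes // word_bytes // words_per_block // ways
    index_bits = sets.bit_length() - 1                   # log2(number of sets)
    word_select_bits = words_per_block.bit_length() - 1  # log2(words per block)
    tag_bits = addr_bits - index_bits - word_select_bits
    return {
        "sets": sets,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        # (tag + V + D) bits for every block in every set
        "storage_bits": (tag_bits + status_bits) * ways * sets,
        # one tag_bits-wide comparator per block in a set
        "comparator_bits": tag_bits * ways,
        # 2-to-1 muxes: word select inside each way, then way select
        "muxes": word_bits * (words_per_block - 1) * ways + word_bits * (ways - 1),
    }


print(cache_overhead(32, 64 * 1024, 4, 2, 2))
```

With the example's parameters this reproduces 2^12 sets, a 19-bit tag, 172,032 bits of storage overhead, 38 comparator bits, and 96 muxes.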
Similarities Between Cache and Virtual Memory
[Figure: cache design (cache over main memory) and virtual memory design (main memory over secondary storage).]
• Both Use Two Levels of Memories
  • Higher Level: Faster and Smaller
  • Lower Level: Slower and Larger
• Both Rely on the Principle of Locality
• Both Use Associativity to Reduce Conflicts
• Both Need to Decide Which Block in the Higher Level Has to Be Replaced Upon a Miss
Differences Between Cache and Virtual Memory
• Cache is several orders of magnitude faster than virtual memory, while virtual memory is several orders of magnitude larger than cache
• Consequently:
  • Virtual memory can use software to track blocks in use, while cache has to use hardware
  • The cost to implement full associativity is low for virtual memory and very high for cache
  • Virtual memory can use more sophisticated block replacement algorithms
  • Virtual memory has to use write-back, while cache can use write-back or write-through
Page Table and Page Frame Table
• Page Table:
  • Used by the program to keep track of which pages are in secondary storage and which are in main memory
  • Translates virtual memory addresses into physical addresses

  Virtual Page # | Valid | Access Right | Physical Page Address
  00001000 | 0 | R/W | 29B000
  00002000 | 1 | R | 737000
  ... | ... | ... | ...
  000F000 | 1 | X | C37000

  [Figure: a page table pointer locates the page table.]
• Page Frame Table:
  • Used by the operating system to know how the pages in main memory are allocated to different active jobs
  • Provides information for deciding which page is a candidate for replacement
Address Mapping
[Figure: read and write addresses are mapped through page table entries (V = 1 or V = 0) to physical addresses sent to the cache; when V = 0, an old page is replaced by the new page at a new physical address.]
• Address Translation Determines Whether Main Memory Has the Requested Page by Examining the Valid Bit of the Page in the Page Table
• If the Requested Page Is Not in Main Memory, the Operating System Transfers Data from Secondary Memory to Main Memory and Then Sets the Valid Bit. The old page is first written back to secondary memory if necessary (e.g., the page was modified but not saved).
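Translation of Virtual to Physical Address
[Figure: the page table entry supplies V, the access rights, and the physical page #; the physical page # forms the physical address, which goes to memory if V = 1.]
• Page Table Located in Physical Memory
• V = Valid Bit:
  • V = 1: Page is in Main Memory
  • V = 0: Page is Not in Main Memory (page fault)
• Access Rights: R = Read-Only, R/W = Read/Write, X = Execute Only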
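The page table check described above (valid bit, then access rights, then concatenating the physical page # with the page offset) can be sketched in Python. Assumptions: 4 KB pages (a 12-bit offset), a dictionary standing in for the in-memory page table, and hypothetical names `translate` and `PageFault`; the page table entries in the example are made up.

```python
PAGE_OFFSET_BITS = 12  # assumption: 4 KB pages


class PageFault(Exception):
    """V = 0: the page is not in main memory; the OS must bring it in."""


def translate(vaddr, page_table, access="R"):
    """page_table maps virtual page # -> (valid, rights, physical page #)."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, rights, ppn = page_table[vpn]
    if not valid:
        raise PageFault(hex(vaddr))
    # R = read-only, R/W = read/write, X = execute only
    allowed = {"R": {"R", "R/W"}, "W": {"R/W"}, "X": {"X"}}[access]
    if rights not in allowed:
        raise PermissionError(f"{access} access to page {vpn:#x} with rights {rights}")
    return (ppn << PAGE_OFFSET_BITS) | offset


# Hypothetical page table entries
pt = {0x1: (0, "R/W", 0x29B), 0x2: (1, "R", 0x737), 0xF: (1, "X", 0xC37)}
print(hex(translate(0x2ABC, pt)))  # 0x737abc
```

Reading virtual page 0x1 raises `PageFault` (V = 0), and writing to read-only page 0x2 raises `PermissionError`.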
Translation Lookaside Buffer
• Cache of Recently Used Page Table Entries
• Can Be Fully Associative, Set Associative, or Direct Mapped
• Direct Mapped TLB Example: [Figure: direct-mapped TLB accessed with an index.]
Note: The dirty bit indicates whether the page in memory has been modified. If it has not been modified, it can be replaced without being copied back to secondary storage.
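Virtual Memory and Cache Mappings Example: Decstation 3100
[Figure: the virtual address is split into a virtual page number (bits 31 to 12, 20 bits) and a page offset (bits 11 to 0, 12 bits). The virtual page number is compared against the TLB entries (Valid, Dirty, Tag, Physical Page #); on a TLB hit, the 20-bit physical page # and the page offset form the physical address. The physical address is split into a cache tag, a 14-bit index, and a 2-bit byte offset; the tag comparison produces Cache Hit and 32-bit data.]
Note: Another important bookkeeping bit, the write access bit for write protection, is not shown.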
Accessing Data from Memory Hierarchy
Virtual Address Format: [Virtual Page # = TLB Tag | TLB Index][Offset]
Procedure:
Step 1: Translate the virtual address to a physical address
  • Use the TLB to reduce page table lookup time
  • If hit, use the physical address in the TLB to look up the cache (Step 2)
  • If miss, go to the page table in main memory
    • If found in the page table, update the TLB and look up the cache (Step 2)
    • If page fault, use the page frame table to pick a page in memory to be replaced
      • Update the page frame table
      • Update the page table in memory
      • If the selected page is dirty, write it back to disk first; then copy data from disk to the selected memory page
      • Update the cache if the data from disk has a cache hit
      • Update the TLB, get the physical address, and go to Step 2
Step 2: Use the physical address to access data from the cache
  • If hit, use data from the cache
  • If miss, go to main memory to access the data and update the cache
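The two-step procedure can be traced with a small Python sketch. It is a toy model under stated assumptions: 4 KB pages, a hypothetical 16-entry direct-mapped TLB, an unbounded dictionary as the cache, and a list of free frames standing in for the page frame table (victim selection, dirty write-back, and the disk copy are not modeled). The function `access` and all data are hypothetical.

```python
PAGE_BITS = 12   # assumption: 4 KB pages (12-bit page offset)
TLB_SIZE = 16    # assumption: 16-entry direct-mapped TLB


def access(vaddr, tlb, page_table, cache, memory, free_frames, events):
    """Read one word following Steps 1 and 2; appends what happened to `events`."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    tlb_index, tlb_tag = vpn % TLB_SIZE, vpn // TLB_SIZE

    # Step 1: translate the virtual address, trying the TLB first
    entry = tlb.get(tlb_index)
    if entry is not None and entry[0] == tlb_tag:
        events.append("TLB hit")
        ppn = entry[1]
    else:
        events.append("TLB miss")
        ppn = page_table.get(vpn)           # None plays the role of V = 0
        if ppn is None:
            events.append("page fault")
            # Toy stand-in for the page frame table: take a free frame.
            ppn = free_frames.pop(0)
            page_table[vpn] = ppn            # update the page table
        tlb[tlb_index] = (tlb_tag, ppn)      # update the TLB
    paddr = (ppn << PAGE_BITS) | offset

    # Step 2: use the physical address to access the cache, then main memory
    if paddr in cache:
        events.append("cache hit")
    else:
        events.append("cache miss")
        cache[paddr] = memory.get(paddr, 0)  # fetch from main memory, update cache
    return cache[paddr]


tlb, cache, page_table = {}, {}, {0x1: 0x5}
memory, free_frames = {0x5034: 99}, [0x7]
events = []
print(access(0x1034, tlb, page_table, cache, memory, free_frames, events), events)
# 99 ['TLB miss', 'cache miss']
```

A second access to 0x1034 hits in both the TLB and the cache, while an access to an unmapped page (for example 0x2000) goes through the page-fault path before Step 2.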
I/O System Architecture Overview
[Figure: layered I/O system. Software: user application, system call / system interface, operating system, device drivers. Logical: memory or I/O bus. Hardware: I/O controllers. Physical media: I/O devices.]
• Protocol can be defined at all levels
A Classification of I/O According to the Targets of I/O Operation
• Processor to Memory: very low latency, very high throughput, very low protocol overhead
• Processor to Peripheral: latency, throughput, and protocol overhead vary according to the I/O devices
• Processor to Processors
  • Tightly Coupled: all processors share a physical memory; low latency, high throughput, low-overhead protocol, coherence problem
  • Loosely Coupled: each processor has its own physical memory; medium latency, medium throughput, high protocol overhead, scalable
• Processor to Network: high latency, low throughput, high protocol overhead, very scalable
I/O System Example
[Figure: two processors, each with a cache, and main memory on a memory-I/O bus. Attached to the bus are a network interface controller, an IEEE 1394 bus interface controller (to other processors or peripherals on the IEEE 1394 bus), and I/O controllers for disks, graphics, and network devices.]
I/O System Design Process
[Figure: deciding which buses (Bus A, Bus B, Bus C) should connect which devices (Device A to Device D).]
• Establish Requirements: Understand What You Need
• Select the I/O System That Has the Required Capability: Understand What the I/O System Being Considered Can Do
• Integration: Understand How Everything Fits Together
• Implementation
I/O System Design Example: Establish Requirements
• Design an I/O architecture for a spacecraft that has the following equipment: star tracker, inertial measurement unit, power control unit, telecom subsystem, thruster control unit, wide angle camera, high resolution camera, radar sounder, and altimeter, connected through an I/O system (to be designed) to three flight computers: CDH, ACS, and Payload
• Requirements annotated on the equipment:
  • Data rate: 8 Mbps, 1000 samples/sec, latency < 0.1 ms
  • Data rate: 5 Kbps, 1 transaction/sec, latency < 10 ms
  • Data rate: 10 Kbps, 1000 samples/sec, latency < 0.1 ms
  • Data rate: 400 bps, 2 commands/sec, latency < 0.5 sec
  • Data rate: < 100 bps, 10 commands/sec, latency < 0.1 ms
  • Data rate: 20 Mbps, 2 frames/sec, latency < 0.5 sec
  • Data rate: 20 Mbps, 2 frames/sec, latency < 0.5 sec
  • Data rate: 1 Mbps, 1 transaction/sec, latency < 1 sec
  • Data rate: 5 Kbps, 100 samples/sec, latency < 0.01 sec
• System Constraints (Prioritized):
  • Total power consumption of the avionics system < 100 W
  • The I/O system power consumption should be less than 35% of the avionics system
  • Each subsystem has to meet the latency and throughput requirements
  • System reliability should exceed 12 years (i.e., requires fault tolerance)
  • The system design should be scalable and distributed
  • Maximum distance between subsystems is 5 meters; average distance is 3 m
  • Minimize the cable mass
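A first pass at "establish requirements" is simple arithmetic over the annotated numbers. This Python sketch totals the peak data rates, finds the tightest latency bound, and computes the I/O power budget. Assumptions: the "< 100 bps" entry is counted at its 100 bps upper bound, rates are simply summed (ignoring protocol overhead and which bus each device ends up on), and the variable names are hypothetical.

```python
# (data rate in bits/s, latency bound in seconds), as annotated on the slide
REQUIREMENTS = [
    (8_000_000, 0.1e-3), (5_000, 10e-3), (10_000, 0.1e-3), (400, 0.5),
    (100, 0.1e-3),  # "< 100 bps", counted at its upper bound
    (20_000_000, 0.5), (20_000_000, 0.5), (1_000_000, 1.0), (5_000, 0.01),
]

AVIONICS_POWER_W = 100   # total avionics power limit
IO_POWER_PERCENT = 35    # I/O share of avionics power

total_bps = sum(rate for rate, _ in REQUIREMENTS)
tightest_latency_s = min(latency for _, latency in REQUIREMENTS)
io_power_budget_w = AVIONICS_POWER_W * IO_POWER_PERCENT / 100

print(total_bps, tightest_latency_s, io_power_budget_w)
```

Under these assumptions the aggregate is about 49 Mbps, the tightest latency is 0.1 ms, and the I/O system must fit in 35 W, which are the numbers a candidate bus architecture would be checked against.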