15-213 Recitation 5 – 2/15/01. Outline Visualizing execution Control hazards Branch prediction Memory Hierarchy Access times Exam 1 Preparation Old exam review. e-mail: staff-213@cs.cmu.edu Office Hours: Usual times Wean 52xx cluster. Reminders Exam 1 tomorrow.
Visualizing Execution • Programmers tend to picture a processor like this: [Figure: an instruction stream feeding a box labeled "Pentium X"] • Each instruction is processed completely, and then the processor moves on to the next one. • Modern processors do not work this way. • This recitation covers pipelined execution in detail. • More advanced techniques exist (superscalar, VLIW, etc.), but they are too complicated for now.
Visualizing Execution • Pipelined execution is like an assembly line: while one car is in its last stage of production, another is halfway through and a third is just beginning. • The 'standard' processor pipeline usually looks like: Fetch/Decode → Read Registers → Execute → Memory → Writeback Registers
Visualizing Execution • Fetch/Decode • Fetch an instruction from the stream and decode what it means (figure out whether it's an add, subtract, multiply, jump, etc.). • Read Registers • Read the operands from registers so we know what data to perform the computation on. • Execute • Execute the instruction and calculate its output. • Memory • Access (read or write) memory. • Writeback Registers • Write the calculated result back to the registers.
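To see why overlapping the stages pays off, the cycle counts can be sketched in C. This is a minimal model, not a description of any real processor: it assumes every stage takes exactly one cycle and that there are no stalls or hazards. The function names and the `stages` parameter are made up for illustration.

```c
#include <assert.h>

/* Cycles needed when each instruction must pass through every stage
   before the next instruction is allowed to start (no overlap). */
unsigned long cycles_unpipelined(unsigned long n, unsigned long stages)
{
    assert(stages > 0);
    return n * stages;
}

/* Cycles needed in an ideal pipeline: the first instruction takes
   `stages` cycles to travel through the pipe, and every later
   instruction finishes one cycle after the one in front of it.
   Assumption: one cycle per stage, no stalls, no hazards. */
unsigned long cycles_pipelined(unsigned long n, unsigned long stages)
{
    assert(stages > 0);
    if (n == 0)
        return 0;
    return stages + (n - 1);
}
```

With the five stages listed above, `cycles_unpipelined(10, 5)` gives 50 cycles while `cycles_pipelined(10, 5)` gives 14. Real pipelines fall short of this ideal whenever a hazard forces a stall.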
Example 1 • Loop that adds the numbers in an array.

int addarr(int *a, int n)
{
    int i, sum = 0;          /* sum must start at 0 */
    for (i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}

Pretend n is in eax, sum is in ecx, i is in ebx, and a is in eex (yeah, it doesn't exist, but it's make-believe).

    movl $0, ecx             # sum = 0
    movl $0, ebx             # i = 0
.L3:
    cmpl eax, ebx            # compare i with n
    jge .L4                  # leave the loop once i >= n
.L6:
    leal 0(,ebx,4), edx      # edx = 4*i (byte offset of a[i])
    movl (eex,edx), edx      # edx = a[i]
    addl edx, ecx            # sum += a[i]
    incl ebx                 # ++i
    jmp .L3
.L4:
    movl ecx, eax            # return value = sum
    movl ebp, esp
    popl ebp
    ret
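Example 2 • Unrolling the loop so that we pay fewer branch penalties. Here, we assume n is an even number. Notice that there is no branch, and therefore no branch penalty, between the two copies of the loop body.

    movl $0, ecx             # sum = 0
    movl $0, ebx             # i = 0
.L3:
    cmpl eax, ebx            # compare i with n
    jge .L4                  # leave the loop once i >= n
.L6:
    leal 0(,ebx,4), edx      # first copy of the body: sum += a[i]
    movl (eex,edx), edx
    addl edx, ecx
    incl ebx
    leal 0(,ebx,4), edx      # second copy of the body: sum += a[i+1]
    movl (eex,edx), edx
    addl edx, ecx
    incl ebx
    jmp .L3
.L4:
    movl ecx, eax            # return value = sum
    movl ebp, esp
    popl ebp
    ret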
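The same transformation can be written at the C level. The sketch below is one possible rendering, assuming (as the example does) that n is even; the name `addarr_unrolled2` is invented here, and `addarr` is repeated only as a reference for comparison.

```c
#include <assert.h>

/* Reference version: one loop test and one backward jump per element. */
int addarr(const int *a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

/* Unrolled by two: one loop test and one backward jump per PAIR of
   elements, so roughly half as many branches are executed.
   Assumption: n is even and non-negative. */
int addarr_unrolled2(const int *a, int n)
{
    assert(n >= 0 && n % 2 == 0);
    int sum = 0;
    for (int i = 0; i < n; i += 2) {
        sum += a[i];         /* first copy of the loop body  */
        sum += a[i + 1];     /* second copy of the loop body */
    }
    return sum;
}
```

Both functions return the same sum for any even-length array. A version that also handles odd n would need a short cleanup step after the loop for the last element.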
Memory Technology • SRAM • access time ~4 nsec • persistent as long as power is supplied • used for high-performance memories (registers, cache, video memory, …) • DRAM • access time ~60 nsec • nonpersistent (every row must be accessed every ~1 ms) • used for main memory
Memory Technology Terminology • Access Time - time it takes to read from or write to a memory location • Cycle Time - minimum time required between two successive memory references • Hit Rate - the fraction of memory references that hit
Memory Technology Effective Access Time - A way to characterize the performance of hierarchical memory - Let HL1 = L1 Cache hit rate, HL2 = L2 Cache hit rate, H = HL1 + HL2 = Total Cache hit rate, T(x) = Access time of memory x. Then TEff = HL1*T(L1) + HL2*T(L2) + (1 - H)*T(disk), where disk stands for the slowest level, which serves every reference that misses in both caches.
Memory Technology • Example: Assume the following memory access times and hit rates.

                Access time    Hit rate
  L1 Cache      5 ns           80%
  L2 Cache      10 ns          15%
  Main Memory   200 ns         -

Compute the Effective Access Time for this system.
Memory Technology Answer: TEff = (.8 * 5) + (.15 * 10) + (1 - .95) * 200 = 4 + 1.5 + 10 = 15.5 ns. The memory behaves as if it were composed entirely of relatively fast chips with a 15.5 ns access time, even though it’s composed mostly of 200 ns chips!
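The arithmetic above can be checked with a few lines of C. This is only a sketch of the two-cache formula used in this recitation. The function and parameter names are made up, and the hit rates are taken as fractions of all memory references, which is how the example uses them.

```c
#include <assert.h>

/* Effective access time for a two-level cache in front of a slower
   memory.  h_l1 and h_l2 are the fractions of ALL references that hit
   in L1 and L2 respectively; every remaining reference is served by
   the slowest level at cost t_mem.  All times share one unit (ns). */
double effective_access_time(double h_l1, double t_l1,
                             double h_l2, double t_l2,
                             double t_mem)
{
    double h_total = h_l1 + h_l2;   /* total cache hit rate H */
    assert(h_l1 >= 0.0 && h_l2 >= 0.0 && h_total <= 1.0);
    return h_l1 * t_l1 + h_l2 * t_l2 + (1.0 - h_total) * t_mem;
}
```

Calling `effective_access_time(0.80, 5.0, 0.15, 10.0, 200.0)` reproduces the 15.5 ns result, up to floating-point rounding.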
Exam Review Let’s take a look at a prior semester’s exam…