Announcements
• Assignment #3 is due today. The solution will be posted soon.
• Midterm exam:
  • Thursday, November 25, 2010, 2:15–3:45 PM, Room 4D.
  • Previous exams and solutions are posted.
  • Sample midterm exam?
  • Midterm exam study guide.
Review for Midterm Exam
Lecture 8
Assignment #2, Problem 1
• What does the given program do? Disassemble the given program:
  02EE: 4300 = 0100 0011 0000 0000 → MPY 300
  02EF: 8000 = 1000 0000 0000 0000 → DEC (C)
  02F0: 92EE = 1001 0010 1110 1110 → BNZ 2EE
• Data in main memory: 0300 holds 5; 0301 holds 19.
• Trace: C goes 3 → 2 → 1 → 0 and AC goes 1 → 5 → 5² → 5³. The Z flag of the condition-codes register (C Z N V) changes from 0 to 1 when C reaches 0, so BNZ falls through.
• The program computes 5³ and stores the result in AC.
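The disassembly and trace above can be replayed with a tiny simulator. This is a sketch under stated assumptions: the opcode values (4 = MPY, 8 = DEC, 9 = BNZ) and the 4-bit-opcode / 12-bit-address split are inferred from the listing; C is treated as a register, so `DEC` touches no memory; and AC = 1, C = 3 initially, as the trace shows. The names `MEMORY`, `disassemble`, and `run` are hypothetical.

```python
# Hypothetical encoding inferred from the disassembly: 4-bit opcode + 12-bit address.
MPY, DEC, BNZ = 0x4, 0x8, 0x9
NAMES = {MPY: "MPY", DEC: "DEC", BNZ: "BNZ"}

# Program at 02EE-02F0 and data at 0300-0301, as listed on the slide.
MEMORY = {0x02EE: 0x4300, 0x02EF: 0x8000, 0x02F0: 0x92EE,
          0x0300: 5, 0x0301: 19}


def disassemble(word):
    """Split a 16-bit word into a mnemonic and a 12-bit hex address."""
    return f"{NAMES[word >> 12]} {word & 0xFFF:03X}"


def run(memory, pc, end, ac=1, c=3):
    """Execute from pc until control leaves the program (pc reaches end); return (AC, C).

    Assumes C is a register (DEC reads no memory) and AC = 1, C = 3 initially.
    """
    zero = False                      # Z flag of the condition-codes register
    while pc < end:
        word = memory[pc]
        opcode, addr = word >> 12, word & 0xFFF
        pc += 1
        if opcode == MPY:
            ac *= memory[addr]        # AC <- AC * M[addr]
        elif opcode == DEC:
            c -= 1                    # C <- C - 1
            zero = (c == 0)           # Z is set when the result is 0
        elif opcode == BNZ and not zero:
            pc = addr                 # branch back while C != 0
    return ac, c


print([disassemble(MEMORY[a]) for a in (0x02EE, 0x02EF, 0x02F0)])
print(run(MEMORY, 0x02EE, 0x02F1))    # (125, 0), i.e. AC = 5**3
```

Because the address field of `8000` is zero, the sketch prints `DEC 000` where the slide writes `DEC (C)`.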
Assignment #2, Problem 1 (Cont.)
• 2-word blocks → 1 bit for the word field
• 2-line cache → 1 bit for the line field
• 16-bit address → tag = 16 - 1 - 1 = 14 bits
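The field widths above can be checked with a small sketch that splits a 16-bit address into tag, line, and word fields. It assumes the usual direct-mapped layout, tag | line | word from the most to the least significant bit. The function name `split` is hypothetical.

```python
WORD_BITS = 1   # 2-word blocks -> 1-bit word field
LINE_BITS = 1   # 2-line cache  -> 1-bit line field
                # the tag is whatever remains: 16 - 1 - 1 = 14 bits


def split(addr):
    """Return (tag, line, word) for a 16-bit address in the 2-line direct-mapped cache."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


for a in (0x02EE, 0x02EF, 0x02F0, 0x0300):
    tag, line, word = split(a)
    print(f"{a:04X}: tag={tag:04X} line={line} word={word}")
```

The printed tags (00BB, 00BC, 00C0) match the tags used in the cache diagram of this problem.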
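Assignment #2, Problem 1 (Cont.)
• Address fields (tag | line | word):
  02EE = 0000 0010 1110 1110 → tag 00BB, line 1
  02EF = 0000 0010 1110 1111 → tag 00BB, line 1
  02F0 = 0000 0010 1111 0000 → tag 00BC, line 0
  02F1 = 0000 0010 1111 0001 → tag 00BC, line 0
  0300 = 0000 0011 0000 0000 → tag 00C0, line 0
  0301 = 0000 0011 0000 0001 → tag 00C0, line 0
• Cache contents: line 1 holds tag 00BB (4300, 8000); line 0 alternates between tag 00C0 (5, 19) and tag 00BC (92EE, ….).
• A hit costs 25 ns. A miss costs (2*150) + 25 ns: the 2-word block is brought in, then the cache is accessed. In every pass, the operand fetch from 0300 and the instruction fetch from 02F0 compete for line 0.
• Pass 1: 2(150+25) + (2*150)+25 + (2*150)+25 = 1000 ns
• Pass 2: 2*25 + (2*150)+25 + (2*150)+25 = 700 ns
• Pass 3: 700 ns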
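The per-pass times can be reproduced by replaying the memory accesses of each loop pass through a 2-line direct-mapped cache. This is a minimal sketch under assumptions read off the formulas above: a hit costs 25 ns, a miss costs 2×150 + 25 ns, `DEC (C)` makes no memory access, and each pass accesses 02EE (fetch MPY), 0300 (operand), 02EF (fetch DEC), 02F0 (fetch BNZ) in that order. `pass_times` and `TRACE` are hypothetical names.

```python
T_CACHE, T_MEM, BLOCK_WORDS = 25, 150, 2   # ns, ns, words per block


def split(addr):
    """(tag, line) for the 2-line cache with 2-word blocks."""
    return addr >> 2, (addr >> 1) & 1


def pass_times(trace, passes):
    """Time in ns of each pass over `trace`; the cache starts empty."""
    lines = [None, None]                   # tag held by each line
    times = []
    for _ in range(passes):
        t = 0
        for addr in trace:
            tag, line = split(addr)
            if lines[line] == tag:
                t += T_CACHE                            # hit
            else:
                lines[line] = tag                       # bring in the block
                t += BLOCK_WORDS * T_MEM + T_CACHE      # miss
        times.append(t)
    return times


TRACE = [0x02EE, 0x0300, 0x02EF, 0x02F0]   # fetch MPY, read operand, fetch DEC, fetch BNZ
print(pass_times(TRACE, 3))                # [1000, 700, 700]
```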
Assignment #2, Problem 4
• Cache memory: 1K words
• Block: 16 words
• Set: 4 blocks
• Mapping: set associative
• Cache access time: τ
• Main memory access time: 12τ
• Replacement policy: LRU
• CPU fetches the words 0, 1, 2, …, 1087.
• CPU repeats that 9 more times.
• Required: the improvement due to the cache.
# of lines in cache = 1K/16 = 64
# of sets in cache = 64/4 = 16 sets
Assignment #2, Problem 4 (Cont.)
• CPU fetches words 0, 1, 2, …, 1087, 10 times in total.
• Words 0, 1, 2, …, 15: block 0
• Words 16, 17, …, 31: block 1
• ……………………………………
• Words 1072, 1073, …, 1087: block 67 (68 blocks in all)
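The geometry above (64 lines, 16 sets, 68 blocks) and the block-to-set mapping can be computed directly. This sketch assumes the standard set-associative rule: set = block number mod number of sets. `block_of` and `set_of` are hypothetical helper names.

```python
CACHE_WORDS, BLOCK_WORDS, WAYS = 1024, 16, 4
LINES = CACHE_WORDS // BLOCK_WORDS   # 1K / 16 = 64 lines
SETS = LINES // WAYS                 # 64 / 4 = 16 sets


def block_of(word):
    """Main-memory block that holds a given word address."""
    return word // BLOCK_WORDS


def set_of(block):
    """Set that a block maps to: block number modulo the number of sets."""
    return block % SETS


n_blocks = block_of(1087) + 1        # blocks 0..67 -> 68 blocks
per_set = [sum(1 for b in range(n_blocks) if set_of(b) == s) for s in range(SETS)]
print(LINES, SETS, n_blocks)         # 64 16 68
print(per_set)                       # sets 0-3 get 5 blocks, sets 4-15 get 4
```

Sets 0 to 3 receive five blocks for four lines, which causes the repeated misses analyzed next.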
Cache Memory: first iteration, 68 misses
• Block j of main memory maps to set j mod 16:
  Set 0: blocks 0, 16, 32, 48, 64
  Set 1: blocks 1, 17, 33, 49, 65
  Set 2: blocks 2, 18, 34, 50, 66
  Set 3: blocks 3, 19, 35, 51, 67
  Set 4: blocks 4, 20, 36, 52
  …
  Set 15: blocks 15, 31, 47, 63
• The cache starts empty, so each of the 68 blocks misses once.
• Time = 68*13τ*block size
Cache Memory: any following iteration, 20 misses
• Sets 0–3 each receive five blocks but have only four lines. Under LRU, each fetch in these sets evicts the block needed next. After the first iteration, set 0 holds blocks 16, 32, 48, 64; fetching block 0 evicts 16, fetching 16 evicts 32, and so on, so all five blocks miss.
• Misses: 4 sets × 5 blocks = 20. The remaining 48 blocks (sets 4–15, four blocks each) hit.
• Time = (20*13τ + 48*τ)*block size
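The miss counts (68 in the first iteration, 20 in each later one) can be checked with a small LRU simulation. This sketch tracks blocks rather than individual words: the 16 words of a block are fetched consecutively, so only the first word of a block can miss. Each set is an `OrderedDict` kept in LRU-to-MRU order. `misses_per_iteration` is a hypothetical name.

```python
from collections import OrderedDict


def misses_per_iteration(n_blocks=68, sets=16, ways=4, iterations=10):
    """Block misses in each pass over blocks 0..n_blocks-1 (cache starts empty)."""
    cache = [OrderedDict() for _ in range(sets)]   # per set: LRU first, MRU last
    result = []
    for _ in range(iterations):
        misses = 0
        for block in range(n_blocks):
            s = cache[block % sets]
            if block in s:
                s.move_to_end(block)               # hit: mark most recently used
            else:
                misses += 1
                if len(s) == ways:
                    s.popitem(last=False)          # evict the LRU block
                s[block] = True
        result.append(misses)
    return result


print(misses_per_iteration())   # [68, 20, 20, 20, 20, 20, 20, 20, 20, 20]
```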
Assignment #2, Problem 4 (Cont.)
• Let s be the block size in words.
• Without cache:
  • Time = 10 × 68 × 12τ × s = 8160τs
• With cache:
  • First iteration: time = 68 × 13τ × s = 884τs
  • Any following iteration: time = 20 × 13τ × s + 48 × τ × s = 308τs
  • Total time = 884τs + 9 × 308τs = 3656τs
• Improvement factor = 8160τs / 3656τs ≈ 2.23
• Or improvement % = (8160 - 3656)/8160 ≈ 55.2%
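The arithmetic above can be reproduced in units of τs. The sketch uses the slide's cost model: each word of a missed block costs 12τ + τ = 13τ, each word of a hit block costs τ, and with no cache every word costs 12τ. `times` is a hypothetical helper name.

```python
def times(first_misses=68, later_misses=20, blocks=68, iterations=10,
          t_cache=1, t_mem=12):
    """Return (without_cache, with_cache) in units of tau*s under the slide's model."""
    miss = t_mem + t_cache                                   # 13 tau per word
    without = iterations * blocks * t_mem                    # 10 * 68 * 12
    first = first_misses * miss + (blocks - first_misses) * t_cache   # 68 * 13
    later = later_misses * miss + (blocks - later_misses) * t_cache   # 20*13 + 48
    return without, first + (iterations - 1) * later


without, with_cache = times()
factor = without / with_cache
saving = (without - with_cache) / without
print(without, with_cache, round(factor, 2), round(100 * saving, 1))  # 8160 3656 2.23 55.2
```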
Cache Memory
Problem 4.8: Pseudo Least Recently Used
• 4-way set-associative cache.
• To do replacement, three bits, B0, B1, and B2, are associated with each set.
• B0 determines which half (upper: L0, L1; lower: L2, L3) holds the LRU line.
• B1 determines which line in the upper half is the LRU.
• B2 determines which line in the lower half is the LRU.
• Decision tree for one set: B0 = 0 → upper half, then B1 = 0 → L0, B1 = 1 → L1; B0 = 1 → lower half, then B2 = 0 → L2, B2 = 1 → L3.
Problem 4.8 (Cont.)
The algorithm works as follows. Assume the access is to Lx.
• If Lx was L0, set B0 to 1 and B1 to 1.
• If Lx was L1, set B0 to 1 and B1 to 0.
• If Lx was L2, set B0 to 0 and B2 to 1.
• If Lx was L3, set B0 to 0 and B2 to 0.
Problem 4.8 (Cont.)
This algorithm only approximates true LRU. Consider the following scenario:
• Lines are used in this order: L1, then L3, then L2, then L0.
After these accesses B0 = 1, B1 = 1, and B2 = 1, so the algorithm would replace L3. That is not the true LRU line (L1 is).
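The update rules and the scenario can be checked with a small model of one set. The update rules come straight from the slide. The victim rule (B0 = 0 → upper half chosen by B1, B0 = 1 → lower half chosen by B2) is an inference, chosen because it agrees with both the update rules and the L1, L3, L2, L0 example. `PseudoLRU4` is a hypothetical name.

```python
class PseudoLRU4:
    """Tree pseudo-LRU for one 4-way set (lines L0..L3) using bits B0, B1, B2.

    Victim rule (inferred): B0 = 0 -> upper half, B1 picks L0 (0) or L1 (1);
    B0 = 1 -> lower half, B2 picks L2 (0) or L3 (1).
    """

    def __init__(self):
        self.b0 = self.b1 = self.b2 = 0

    def access(self, line):
        """Apply the slide's update rule for an access to L<line>."""
        if line == 0:
            self.b0, self.b1 = 1, 1
        elif line == 1:
            self.b0, self.b1 = 1, 0
        elif line == 2:
            self.b0, self.b2 = 0, 1
        elif line == 3:
            self.b0, self.b2 = 0, 0
        else:
            raise ValueError("line must be 0..3")

    def victim(self):
        """Line the algorithm would replace next."""
        if self.b0 == 0:
            return self.b1            # 0 -> L0, 1 -> L1
        return 2 + self.b2            # 0 -> L2, 1 -> L3


plru = PseudoLRU4()
for line in (1, 3, 2, 0):
    plru.access(line)
print((plru.b0, plru.b1, plru.b2), plru.victim())   # (1, 1, 1) 3, though true LRU is L1
```

The design trades exactness for storage: three bits per set instead of the six needed for the exact ordering of four lines.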
Problem 4.8 (Cont.)
To implement true LRU, consider a matrix A of 4 rows and 4 columns. Keep only the upper-right triangle, excluding the diagonal. Entry (i, j) with i < j is 1 if line i was referenced more recently than line j; the lower triangle would only hold the complements.
• Referencing line k sets row k to 1s and column k to 0s.
• Replace the line with all zeros in its row and all ones in its column.
Example: lines 1, 3, 0, 2 are referenced in that order. The final upper-triangle entries are (0,1) = 1, (0,2) = 0, (0,3) = 1, (1,2) = 0, (1,3) = 0, (2,3) = 1. Line 1 has zeros in its row and a one in its column, so line 1 is replaced.
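The matrix method can be sketched using only the upper-triangle bits (i < j). It is stored here as a dictionary keyed by `(i, j)` for readability; a hardware version would use 6 flip-flops for 4 lines. `MatrixLRU` is a hypothetical name.

```python
class MatrixLRU:
    """True LRU for n lines using only entries (i, j) with i < j of an n x n bit matrix.

    bit[(i, j)] == 1 means line i was referenced more recently than line j.
    """

    def __init__(self, n=4):
        self.n = n
        self.bit = {(i, j): 0 for i in range(n) for j in range(i + 1, n)}

    def reference(self, k):
        for j in range(k + 1, self.n):
            self.bit[(k, j)] = 1      # row k -> 1
        for i in range(k):
            self.bit[(i, k)] = 0      # column k -> 0

    def victim(self):
        """Line with all zeros in its row and all ones in its column."""
        for k in range(self.n):
            row_zero = all(self.bit[(k, j)] == 0 for j in range(k + 1, self.n))
            col_one = all(self.bit[(i, k)] == 1 for i in range(k))
            if row_zero and col_one:
                return k
        raise RuntimeError("no line qualifies")


lru = MatrixLRU()
for k in (1, 3, 0, 2):
    lru.reference(k)
print(lru.victim())   # 1
```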
• When an instruction is brought from memory to the CPU, on which bus does it travel?
  (A) Address bus. (B) Data bus. (C) Control bus. (D) CPU internal bus. (E) None of the above.
• Which of the following is (are) components of an instruction fetch?
  I. The contents of the PC register are copied into MAR.
  II. Interrupts are checked.
  III. The contents of the MDR register are copied into the IR.
  (A) I only. (B) II only. (C) I and III only. (D) II and III only. (E) I, II, and III.
Which of the following is (are) true about interrupts? • I. CPU checks for interrupts at the end of every fetch cycle. • II. CPU saves context when an interrupt is to be serviced. • III. A priority scheme helps in handling multiple/nested interrupts. • (A) I only. (B) II only. (C) I and III only. • (D) II and III only. (E) I, II, and III.
Consider a 32-byte direct-mapped write-back cache with 8-byte blocks. Complete the following table for a sequence of memory references (occurring from left to right). Addresses are in decimal. Assume the cache is initially empty.
# of lines in cache = 32/8 = 4 lines, so an address splits into tag | line (2 bits) | byte offset (3 bits), e.g. …001 00 000, …000 10 000, …000 01 000, …001 11 100, …001 10 000, …001 11 000.
[Answer table: word, tag, line, hit/miss (H/M), and a Yes/No row for each of the 11 references]
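The address split and hit/miss behavior of this direct-mapped cache can be sketched as follows. Two assumptions: the fields are tag | line | byte offset, and the reference sequence in the usage line is hypothetical, not the exam's sequence. Dirty bits and write-backs are not modeled, because that needs to know which references are writes. `split_direct` and `direct_mapped` are hypothetical names.

```python
OFFSET_BITS = 3   # 8-byte blocks
LINE_BITS = 2     # 32 / 8 = 4 lines


def split_direct(addr):
    """Return (tag, line, offset) for a byte address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    line = (addr >> OFFSET_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (OFFSET_BITS + LINE_BITS)
    return tag, line, offset


def direct_mapped(addresses):
    """Hit/miss sequence for a direct-mapped cache that starts empty."""
    tags = [None] * (1 << LINE_BITS)     # tag currently stored in each line
    result = []
    for a in addresses:
        tag, line, _ = split_direct(a)
        result.append("H" if tags[line] == tag else "M")
        tags[line] = tag                 # on a miss the new block replaces the old one
    return result


print(split_direct(60))                  # (1, 3, 4), i.e. ...001 11 100
print(direct_mapped([32, 16, 36, 0]))    # hypothetical sequence -> ['M', 'M', 'H', 'M']
```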
Repeat assuming 2-way set-associative mapping and LRU replacement.
# of lines in cache = 32/8 = 4 lines
# of sets in cache = 4/2 = 2 sets, so an address splits into tag | set (1 bit) | byte offset (3 bits), e.g. …0010 0 000, …0011 1 000, …0000 1 000, …0011 0 000, …0011 1 100, …0001 0 000.
[Answer table: word, tag, set, hit/miss (H/M), and a Yes/No row for each of the 11 references]
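For the 2-way version, each of the two sets can be modeled as an LRU-ordered `OrderedDict` of tags. This sketch assumes the tag | set | byte-offset layout, and its reference sequence is again hypothetical. `split_2way` and `two_way_lru` are hypothetical names.

```python
from collections import OrderedDict

OFFSET_BITS, SET_BITS, WAYS = 3, 1, 2    # 8-byte blocks, 4 lines / 2 ways = 2 sets


def split_2way(addr):
    """Return (tag, set, offset) for a byte address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    s = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, s, offset


def two_way_lru(addresses):
    """Hit/miss sequence for a 2-way set-associative LRU cache that starts empty."""
    sets = [OrderedDict() for _ in range(1 << SET_BITS)]   # LRU first, MRU last
    result = []
    for a in addresses:
        tag, s, _ = split_2way(a)
        ways = sets[s]
        if tag in ways:
            ways.move_to_end(tag)          # hit: mark most recently used
            result.append("H")
        else:
            if len(ways) == WAYS:
                ways.popitem(last=False)   # evict the LRU block in this set
            ways[tag] = True
            result.append("M")
    return result


print(split_2way(56))                     # (3, 1, 0), i.e. ...0011 1 000
print(two_way_lru([32, 0, 32, 48, 0]))    # hypothetical -> ['M', 'M', 'H', 'M', 'M']
```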
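A given computer has 16-bit instructions. Operand addresses are specified using 6-bit fields. At most, there are N0 zero-address instructions and N1 one-address instructions. There are no three-address instructions. What is the maximum number of two-address instructions in this computer?
Each zero-address instruction uses one of the 2^16 encodings, each one-address instruction uses 2^6, and each two-address instruction uses 2^6 × 2^6 = 2^12:
$$N_0 + N_1 \times 2^{6} + N_2 \times 2^{6} \times 2^{6} \le 2^{16}$$
Solving for N2 (rounding down, since N2 must be a whole number):
$$N_2 = \left\lfloor \frac{2^{16} - N_0 - N_1 \times 2^{6}}{2^{12}} \right\rfloor$$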
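The bound can be evaluated for concrete values of N0 and N1. This sketch applies the formula above with integer (floor) division. The function name `max_two_address` and the sample values are hypothetical.

```python
def max_two_address(n0, n1, instr_bits=16, addr_bits=6):
    """Maximum N2 given n0 zero-address and n1 one-address instructions."""
    remaining = 2 ** instr_bits - n0 - n1 * 2 ** addr_bits
    if remaining < 0:
        raise ValueError("N0 and N1 already exceed the 2^16 encodings")
    return remaining // 2 ** (2 * addr_bits)   # each two-address instruction uses 2^12


print(max_two_address(0, 0))    # 16
print(max_two_address(1, 0))    # 15: one leftover encoding costs a whole 2^12 block
print(max_two_address(0, 64))   # 15
```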