Solution to Assignment #4
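Problem 5.6
[Figure: program layout. Execution starts at address 17 and ends at 1500. The outer loop spans 23 to 1200 and runs 10 times; the inner loop spans 165 to 239 and runs 20 times per pass of the outer loop.]
• Cache memory: 1 k words
• Main memory: 64 k words
• Block: 128 words
• Mapping: Direct
• Cache access time: τ
• Main memory access time: 10 τ
• Required: time for instruction fetching.
# of blocks in cache = 1 k/128 = 8
# of program blocks = 1500/128 ≈ 11.7, rounded up to 12 (blocks 0 to 11, words 0..1535)
Under direct mapping, main-memory block j goes to cache block j mod 8.
Timing used below: each miss loads the whole 128-word block from main memory (128*10τ), and every instruction word is then read from the cache (τ per word).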
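The block arithmetic above can be checked with a short Python sketch. It uses only the parameters on the slide (128-word blocks, a 1 k-word direct-mapped cache); the function names are illustrative, not part of the original solution.

```python
# Direct mapping for Problem 5.6: 128-word blocks, 1 k-word cache (8 blocks).
BLOCK_WORDS = 128
CACHE_BLOCKS = 1024 // BLOCK_WORDS  # 8 cache blocks


def main_block(address: int) -> int:
    """Main-memory block that holds a word address."""
    return address // BLOCK_WORDS


def cache_block(address: int) -> int:
    """Cache block a word address is placed in under direct mapping."""
    return main_block(address) % CACHE_BLOCKS


def program_blocks(last_address: int) -> list[int]:
    """Blocks 0..main_block(last_address) spanned by a program starting near 0."""
    return list(range(main_block(last_address) + 1))


if __name__ == "__main__":
    for b in program_blocks(1500):
        first, last = b * BLOCK_WORDS, (b + 1) * BLOCK_WORDS - 1
        print(f"block {b:2d} (words {first}..{last}) -> cache block {b % CACHE_BLOCKS}")
```

Blocks 8 and 9 land on cache blocks 0 and 1, which is why the loop body keeps evicting its own first two blocks.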
First iteration of the outer loop
[Figure: main-memory blocks 0 to 11 and the cache blocks they occupy; main-memory block j goes to cache block j mod 8.]
• Block 0 (words 0..127): miss; load the block, then fetch words 17..127: 128*10τ + (127-17+1)τ
• Block 1 (words 128..255): miss; fetch all 128 words once, then run the inner loop (165..239) 19 more times: 128*10τ + 128τ + 19(239-165+1)τ
• Blocks 2 to 8 (words 256..1151): seven misses, each followed by 128 fetches; block 8 replaces block 0 in cache block 0: 7(128*10τ + 128τ)
• Block 9 (words 1152..1279): miss, replacing block 1 in cache block 1; fetch words 1152..1200: 128*10τ + (1200-1152+1)τ
Time = [128*10τ+(127-17+1)τ] + [128*10τ +128τ + 19(239-165+1)τ] + [7(128*10τ +128τ)] + [128*10τ +(1200-1152+1)τ] = 15409 τ
Following iterations of the outer loop
[Figure: block placement at the start of a later pass, with blocks 8 and 9 occupying cache blocks 0 and 1.]
• Block 0: miss (cache block 0 holds block 8); fetch words 23..127: 128*10τ + (127-23+1)τ
• Block 1: miss (cache block 1 holds block 9); fetch all 128 words, plus 19 more passes of the inner loop: 128*10τ + 128τ + 19(239-165+1)τ
• Blocks 2 to 7: still in the cache, so every fetch hits: 6(128τ)
• Block 8: miss (cache block 0 now holds block 0): 128*10τ + 128τ
• Block 9: miss (cache block 1 now holds block 1); fetch words 1152..1200: 128*10τ + (1200-1152+1)τ
Time = [128*10τ+(127-23+1)τ] + [128*10τ +128τ + 19(239-165+1)τ] + [6(128τ)] + [128*10τ +128τ] + [128*10τ +(1200-1152+1)τ] = 7723 τ
Rest of the program
[Figure: blocks 10 and 11 loaded into cache blocks 2 and 3.]
• Block 9: still in the cache after the last pass; fetch words 1201..1279: (1279-1200)τ
• Block 10 (words 1280..1407): miss: 128*10τ + 128τ
• Block 11 (words 1408..1535): miss; fetch words 1408..1500: 128*10τ + (1500-1408+1)τ
Time = [(1279-1200)τ] + [128*10τ+128τ] + [128*10τ+(1500-1408+1)τ] = 2860 τ
Problem 5.6 (Cont.)
Total time = first iteration + 9 following iterations + rest of the program = 15409τ + 9*7723τ + 2860τ = 87776τ
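As a cross-check, the sketch below replays the whole instruction-fetch trace through an 8-slot direct-mapped cache. It assumes the timing model used in the solution (a miss costs 128*10τ to load the block, and each fetched word then costs τ) and the loop structure read from the figure (outer loop 23..1200 run 10 times, first entered at 17; inner loop 165..239 run 20 times per outer pass; program ends at 1500). The names `outer_pass`, `fetch_cost`, and `simulate` are hypothetical.

```python
# Replays the instruction-fetch trace of Problem 5.6 through a direct-mapped cache.
BLOCK_WORDS = 128
CACHE_BLOCKS = 8
MISS_LOAD = BLOCK_WORDS * 10  # loading one block: 128 words at 10τ each
HIT = 1                       # every fetched word is then read from the cache in τ


def outer_pass(start: int) -> list[int]:
    """Word addresses of one outer-loop pass (start..1200), inner loop run 20 times."""
    trace = list(range(start, 165))
    for _ in range(20):
        trace += range(165, 240)
    trace += range(240, 1201)
    return trace


def fetch_cost(trace, cache: list) -> int:
    """Cost in units of τ; `cache[slot]` holds the resident block and is updated."""
    cost = 0
    for addr in trace:
        block = addr // BLOCK_WORDS
        slot = block % CACHE_BLOCKS
        if cache[slot] != block:  # miss: bring the whole block in
            cache[slot] = block
            cost += MISS_LOAD
        cost += HIT
    return cost


def simulate() -> dict:
    cache = [None] * CACHE_BLOCKS  # cache starts empty
    first = fetch_cost(outer_pass(17), cache)
    following = [fetch_cost(outer_pass(23), cache) for _ in range(9)]
    rest = fetch_cost(range(1201, 1501), cache)
    return {"first": first, "following": following, "rest": rest,
            "total": first + sum(following) + rest}


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions `simulate()` returns 15409 for the first pass, 7723 for each of the nine later passes, 2860 for the tail, and 87776 in total (all in units of τ), matching the hand calculation.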
Problem 5.10
• Cache memory: 4 k words
• Block: 64 words
• Set: 4 blocks
• Mapping: Set associative
• Cache access time: τ
• Main memory access time: 10 τ
• Replacement policy: LRU
• CPU fetches the words: 0, 1, 2, …, 4351
• CPU repeats that 9 more times.
• Required: improvement due to the cache.
# of blocks in cache = 4 k/64 = 64
# of sets in cache = 64/4 = 16 sets
Problem 5.10 (Cont.)
• CPU fetches the words 0, 1, 2, …, 4351 ten times.
• Words 0, 1, 2, …, 63: block 0
• Words 64, 65, …, 127: block 1
• …
• Words 4288, 4289, …, 4351: block 67
Each pass therefore touches 68 blocks, and block j maps to set j mod 16.
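The block-to-set mapping can be tabulated with a few lines of Python. This sketch only encodes the slide's parameters (64-word blocks, 4 blocks per set, 16 sets); the helper names are illustrative.

```python
# Block-to-set mapping for Problem 5.10: 64-word blocks, 16 sets of 4 blocks.
from collections import defaultdict

BLOCK_WORDS = 64
NUM_SETS = (4096 // BLOCK_WORDS) // 4  # 64 cache blocks / 4 per set = 16


def block_of(word: int) -> int:
    return word // BLOCK_WORDS


def set_of(block: int) -> int:
    return block % NUM_SETS


def blocks_per_set(last_word: int) -> dict[int, list[int]]:
    """Group the blocks touched by words 0..last_word by the set they map to."""
    groups = defaultdict(list)
    for block in range(block_of(last_word) + 1):
        groups[set_of(block)].append(block)
    return dict(groups)


if __name__ == "__main__":
    for s, blocks in sorted(blocks_per_set(4351).items()):
        print(f"set {s:2d}: {blocks}")
```

Sets 0 to 3 each receive five blocks while holding only four, which is where all the repeat misses come from.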
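First iteration: 68 misses
[Figure: placement of main-memory blocks 0 to 67 in the 16 cache sets.]
• The cache is initially empty, so each of the 68 blocks misses once.
• Sets 0 to 3 each receive five blocks (set 0: blocks 0, 16, 32, 48, 64; set 1: 1, 17, 33, 49, 65; set 2: 2, 18, 34, 50, 66; set 3: 3, 19, 35, 51, 67), so under LRU the fifth block evicts the first (block 64 replaces block 0, and so on).
• Sets 4 to 15 each receive four blocks (e.g., set 4: blocks 4, 20, 36, 52; set 15: blocks 15, 31, 47, 63), which all fit.
• The solution charges 11τ per word of a missed block (10τ main memory plus τ cache) and τ per word that hits.
Time = 68*11τ*block size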
For any following iteration: 20 misses
[Figure: LRU replacement in sets 0 to 3 during a later pass, with a running miss count reaching 20.]
• Sets 0 to 3: five blocks compete for four slots and are accessed in the same cyclic order every pass, so LRU always evicts the block that is needed next. All five blocks miss: 5 misses per set, 4 × 5 = 20 misses.
• Sets 4 to 15: their 12 × 4 = 48 blocks stay resident, so they all hit.
Time = (20*11τ + 48*τ)*block size
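The miss counts can be reproduced with a small LRU simulation. This Python sketch tracks blocks rather than individual words (the 64 words of a block are fetched back to back, so only the first can miss) and assumes an initially empty cache; the function name is illustrative.

```python
# 4-way set-associative cache with LRU replacement, replaying blocks 0..67 per pass.
from collections import OrderedDict

BLOCKS = 68    # words 0..4351 in 64-word blocks
NUM_SETS = 16
WAYS = 4


def misses_per_pass(passes: int) -> list[int]:
    """Number of block misses in each pass over blocks 0..67."""
    # One OrderedDict per set: resident blocks, least recently used first.
    sets = [OrderedDict() for _ in range(NUM_SETS)]
    result = []
    for _ in range(passes):
        misses = 0
        for block in range(BLOCKS):
            lru = sets[block % NUM_SETS]
            if block in lru:
                lru.move_to_end(block)       # hit: now most recently used
            else:
                misses += 1
                if len(lru) == WAYS:
                    lru.popitem(last=False)  # evict the least recently used block
                lru[block] = True
        result.append(misses)
    return result


if __name__ == "__main__":
    print(misses_per_pass(10))
```

Running it for 10 passes gives 68 misses in the first pass and 20 in each of the remaining nine.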
Problem 5.10 (Cont.)
• Let s be the block size in words (here s = 64; it cancels in the ratio).
• Without cache:
  Time = 10*68*10τ*s = 6800τs
• With cache:
  First iteration: time = 68*11τ*s = 748τs
  Any following iteration: time = 20*11τ*s + 48*τ*s = 268τs
  Total time = 748τs + 9*268τs = 3160τs
• Improvement factor = 6800τs/3160τs ≈ 2.15
• Or, as a reduction in fetch time: (6800 - 3160)/6800 ≈ 53.5%
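The same arithmetic as a small Python function, taking the miss counts derived above as inputs. It assumes the solution's per-word costs (10τ from main memory alone, 11τ for a word of a missed block, τ for a hit); `fetch_times` is a hypothetical name.

```python
def fetch_times(s: int = 64, blocks: int = 68, passes: int = 10,
                first_misses: int = 68, later_misses: int = 20) -> tuple[int, int]:
    """Return (time without cache, time with cache) in units of τ."""
    # Without a cache every word of every pass comes from main memory at 10τ.
    without_cache = passes * blocks * s * 10
    # With the cache: missed blocks cost 11τ per word, resident blocks τ per word.
    first = (first_misses * 11 + (blocks - first_misses) * 1) * s
    later = (later_misses * 11 + (blocks - later_misses) * 1) * s
    with_cache = first + (passes - 1) * later
    return without_cache, with_cache


if __name__ == "__main__":
    w, c = fetch_times()
    print(f"improvement factor = {w / c:.2f}, time saved = {100 * (w - c) / w:.1f}%")
```

Because s multiplies both totals, the factor (about 2.15) and the 53.5% saving do not depend on the block size.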
Problem 5.21
• Array: 1024 × 1024 elements, 4 bytes each
• Page size: 4 kB
• Memory allocated: 1 MB
• Time to fetch a page from disk: 40 ms
• Required:
(a) # of page faults if elements are stored in column order.
(b) Same as (a) if elements are stored in row order.
(c) Total time needed for each of (a) and (b).
• # of pages allocated in memory = 1 MB/4 kB = 256 pages
• # of pages needed for the array = 1024*1024*4 B/4 kB = 1024 pages
• Each page holds 4 kB/4 B = 1024 elements, i.e., exactly one row or one column.
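The page counts follow from a few divisions; this sketch assumes k = 1024 and M = 1024*1024 bytes, and the variable names are illustrative.

```python
KB, MB = 1024, 1024 * 1024
PAGE = 4 * KB   # page size in bytes
ELEM = 4        # bytes per array element
N = 1024        # the array is N x N

frames = MB // PAGE                 # page frames available for the array
array_pages = N * N * ELEM // PAGE  # pages the whole array occupies
elems_per_page = PAGE // ELEM       # elements per page (one row or one column)

if __name__ == "__main__":
    print(frames, array_pages, elems_per_page)
```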
Problem 5.21 (Cont.)
Algorithm:
for every column c
    find the largest element (max) in c
    for every element e in c
        e ← e/max
Note: The memory is initially empty.
(a) 1024 page faults (one per column).
(b) For every column, finding the largest element takes 1024 page faults. The normalization pass over the same column then takes another 1024 page faults, because only 256 of the 1024 row pages fit in memory and the rows read first have already been replaced (with LRU or FIFO replacement). This is a total of 2048 page faults per column.
Total # of page faults = 2048 * 1024 = 2,097,152
(c) Column order: 1024 * 40 ms = 40.96 s. Row order: 2,097,152 * 40 ms ≈ 83,886 s ≈ 23.3 hours.
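For reference, the normalization algorithm as runnable Python on a list-of-rows matrix. It is a sketch: it assumes each column's largest element is nonzero, and it models only the computation, not the paging (the storage order on disk, not this indexing, decides the page faults).

```python
def normalize_columns(a: list[list[float]]) -> None:
    """In place: divide every element of each column by that column's largest element.

    `a[r][c]` is row r, column c.
    """
    rows, cols = len(a), len(a[0])
    for c in range(cols):
        col_max = max(a[r][c] for r in range(rows))  # pass 1: find max of column c
        for r in range(rows):                         # pass 2: e <- e/max
            a[r][c] = a[r][c] / col_max


if __name__ == "__main__":
    m = [[1.0, 4.0], [2.0, 8.0]]
    normalize_columns(m)
    print(m)
```

Each column is read twice (once to find max, once to normalize), which is why row-order storage pays for every row page twice per column.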
• Page size: 4 kB
• # of data pages on disk = 1024 (data: 1024*1024*4 B = 4 MB)
• # of data pages in memory = 256, in page frames i to i+255 (data: 256*4*1024 B = 1 MB)
• Each page can hold one row or one column.
[Figure: disk pages 0 to 1023 and main-memory page frames i to i+255.]
(a) Column order
[Figure: columns 0 to 1023 on disk pages 0 to 1023, loaded into page frames i to i+255.]
• Column c is stored in page c, so finding the max of column c and normalizing it touch only that page: one page fault per column.
• Columns 0 to 255 fill the 256 page frames (256 page faults); each of the next three groups of 256 columns (256 to 511, 512 to 767, 768 to 1023) replaces them, again 256 faults per group.
Total # of page faults = 256 * 4 = 1024
(b) Row order
[Figure: rows 0 to 1023 on disk pages 0 to 1023, loaded into page frames i to i+255.]
• Read column 0 and find its largest value: this touches one element in every row, i.e., all 1024 pages, of which only 256 fit in memory. # of page faults to find max of column 0 = 1024
• # of page faults to update values in column 0 = 1024
• # of page faults for column 0 = 1024 + 1024 = 2048
Total # of page faults = 2048 * 1024 = 2,097,152
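Both page-fault counts can be checked with a small simulation. This Python sketch assumes LRU replacement (the slides do not name a policy) and an initially empty memory; in column order it records the n back-to-back references to page c as a single reference, since repeating the page just used cannot fault. `count_faults` and `FETCH_SECONDS` are hypothetical names.

```python
from collections import OrderedDict

FETCH_SECONDS = 0.040  # 40 ms per page fetched from disk


def count_faults(n: int, frames: int, column_order: bool) -> int:
    """Page faults while normalizing an n x n array stored one row or one column per page."""
    memory = OrderedDict()  # resident pages, least recently used first
    faults = 0
    for c in range(n):
        # Pages referenced by one pass over column c.
        # Column order: all of column c is on page c. Row order: element (r, c) is on page r.
        pages = (c,) if column_order else range(n)
        for _ in range(2):  # pass 1: find max; pass 2: e <- e/max
            for page in pages:
                if page in memory:
                    memory.move_to_end(page)
                else:
                    faults += 1
                    if len(memory) == frames:
                        memory.popitem(last=False)  # evict the LRU page
                    memory[page] = None
    return faults
```

With n = 1024 and 256 frames, `count_faults` gives 1024 faults in column order and 2,097,152 in row order; multiplying by `FETCH_SECONDS` gives 40.96 s and about 23.3 hours respectively.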