Chapter 5: 22540 - Computer Arch. & Org. (2) Memory Hierarchy
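Memory Hierarchy
• Principle of Locality
   • Temporal Locality (Locality in Time): a recently accessed item is likely to be accessed again soon
   • Spatial Locality (Locality in Space): items near a recently accessed item are likely to be accessed soon
• Speed & Size: faster memories are smaller and cost more per byte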
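A tiny simulation can make spatial locality concrete. The sketch below is hypothetical: a direct-mapped cache of 4 blocks, 4 words per block, word addresses, and the whole block number kept as the tag. It counts hits when an 8 × 8 array stored row by row is read in row order and in column order.

```python
def count_hits(addresses, num_blocks=4, block_words=4):
    """Count hits in a tiny direct-mapped cache (hypothetical parameters)."""
    resident = [None] * num_blocks        # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // block_words       # neighbouring words share a block
        line = block % num_blocks         # direct-mapped placement
        if resident[line] == block:
            hits += 1
        else:
            resident[line] = block        # miss: bring the block in
    return hits

N = 8  # 8 x 8 array stored row by row: element (r, c) is at word r*N + c
row_order = [r * N + c for r in range(N) for c in range(N)]
col_order = [r * N + c for c in range(N) for r in range(N)]

print(count_hits(row_order), count_hits(col_order))
```

Row order reuses each fetched block for the next three words, so 48 of the 64 reads hit; column order jumps 8 words per read and, in this tiny cache, never finds the block it needs.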
Memory Hierarchy
[Figure: the levels of the hierarchy, from the CPU through Cache and Main Memory to Magnetic Disks]
Cache Memory
• High Speed (close to the CPU)
• Conceals Slow Memory
• Small Size (Low Cost)
[Figure: CPU, Cache (fast), and Main Memory (slow). On a Cache Hit the CPU is served by the cache; on a Mem Miss the access costs Cache + Mem time. Example hit ratio: 95%]
Cache Memory
• The CPU issues Main Memory addresses
• Cache Size < Main Memory Size
[Figure: a CPU with a 32-bit address for a 4 GB Main Memory, but a 1 MB Cache has only 20 address bits (2^32 = 4 G, 2^20 = 1 M)]
Cache Memory
[Figure: Main Memory addresses 00000000 to 3FFFFFFF must be mapped onto Cache locations 00000 to FFFFF, so an address mapping is needed]
Associative Memory: Cache Location
[Figure: Main Memory words at addresses 00012000, 08000000, and 15000000 placed in arbitrary Cache locations; each Cache entry stores the full Address (Key) together with its Data]
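Associative Memory
• The cache can have any number of locations; each stores a 32-bit Key (the address) and 8 bits of Data.
[Figure: a cache holding keys 00012000, 15000000, and 08000000 with their 8-bit data; looking up Address 00012000 returns Data 48]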
Associative Memory
• The input Address (00012000) is compared (=) against every stored 32-bit Key at the same time.
• How many comparators? One per cache location.
Associative Memory
• Each entry also carries a Valid Bit; a Key match counts as a hit only if the entry's Valid Bit is 1.
[Figure: entries 00012000, 15000000, and 08000000, each with Valid Bit = 1, a 32-bit Key, and 8-bit Data]
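The lookup with a valid bit can be sketched in Python. The `Entry` record and the data values are illustrative assumptions; the point is that a hit requires both a set valid bit and a key equal to the full address.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Entry:
    valid: bool   # Valid Bit: only valid entries may match
    key: int      # full 32-bit address used as the search key
    data: int     # 8-bit data (illustrative values below)

def associative_lookup(entries: List[Entry], address: int) -> Optional[int]:
    """Return the data on a hit, or None on a miss.

    Hardware compares the address with every key at once (one comparator
    per entry); this loop performs the same comparisons one after another.
    """
    for entry in entries:
        if entry.valid and entry.key == address:
            return entry.data
    return None

cache = [
    Entry(True, 0x00012000, 0x48),
    Entry(True, 0x15000000, 0x63),
    Entry(True, 0x08000000, 0x13),
    Entry(False, 0x00040000, 0x99),   # invalid entry: never matches
]
print(hex(associative_lookup(cache, 0x00012000)))
```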
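Associative Memory (32-bit Data words)
• The 32-bit Address 0000 0000 0000 0000 0000 0000 1000 1000 is split into a 30-bit Key (its upper 30 bits, 0000 0000 0000 0000 0000 0000 1000 10) and a 2-bit byte offset (00).
• Each cache entry stores a 30-bit Key and a 32-bit Data word; the byte offset selects one byte of the matching word (here Data = 48).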
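Direct Mapping Cache
• A 32-bit Address is split into a 12-bit Tag and a 20-bit Index; each cache line stores a Tag and 8 bits of Data.
• The Index selects exactly one line (00000 to FFFFF), and the stored Tag is compared with the address Tag: Match = hit, No match = miss.
• Example: Address 000 00040 selects line 00040 and hits if that line holds Tag 000.
• What happens when Address = 100 00040? It selects the same line 00040, Tag 100 does not match Tag 000, so it is a miss and the line is replaced.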
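A minimal sketch of the direct-mapped lookup, assuming the 20-bit Index / 12-bit Tag split above, one byte of data per line, and a dictionary standing in for main memory (`memory`, `split`, and `DirectMappedCache` are hypothetical names). It reproduces the Address = 100 00040 case: the same line is selected, the tag differs, and the old block is replaced.

```python
INDEX_BITS = 20   # 2^20 cache lines (00000 to FFFFF)

def split(address: int):
    """Split a 32-bit address into (tag, index): 12-bit tag, 20-bit index."""
    index = address & ((1 << INDEX_BITS) - 1)
    tag = address >> INDEX_BITS
    return tag, index

class DirectMappedCache:
    def __init__(self, memory):
        self.memory = memory   # hypothetical main memory: address -> byte
        self.lines = {}        # index -> (tag, data); an absent index is an invalid line

    def read(self, address: int):
        tag, index = split(address)
        line = self.lines.get(index)
        if line is not None and line[0] == tag:   # compare the stored tag
            return "hit", line[1]
        data = self.memory.get(address, 0)        # miss: fetch from memory
        self.lines[index] = (tag, data)           # overwrite the only candidate line
        return "miss", data

memory = {0x00000040: 0x7C, 0x10000040: 0x55}
cache = DirectMappedCache(memory)
print(cache.read(0x00000040))  # ('miss', 124): first use of line 00040
print(cache.read(0x00000040))  # ('hit', 124)
print(cache.read(0x10000040))  # ('miss', 85): tag 100 replaces tag 000 in line 00040
```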
Direct Mapping Cache (32-bit Data words)
• Address fields: 12-bit Tag, 18-bit Index (lines 00000 to 3FFFF), and a 2-bit byte offset that selects one byte of the 32-bit Data word.
• Example: Address 0000 0000 0000 0000 0000 0000 0100 0000 (hex 00000040) has Tag 000, Index 00010, and byte offset 00. Line 00010 holds Tag 000, so the tags match and the selected byte is Data = 48.
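Set Associative Cache
• 2-Way Set Associative Cache: each Index (00000 to FFFFF) selects a set of two lines, each with its own Tag and 8-bit Data.
• Address fields: 12-bit Tag, 20-bit Index. Both stored Tags in the set are compared with the address Tag at the same time (two comparators); a match in either way is a hit, otherwise a miss.
• Example: Address 000 00040 selects set 00040 and is compared against both ways.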
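The same lookup for a 2-way set associative cache, as a sketch. It assumes least-recently-used replacement within a set (the slide does not name a policy) and uses an `OrderedDict` per set to remember which way was used least recently; `TwoWayCache` and `memory` are hypothetical names. Unlike the direct-mapped case, addresses 000 00040 and 100 00040 can now both stay in the cache.

```python
from collections import OrderedDict

INDEX_BITS = 20   # 2^20 sets, each holding WAYS lines
WAYS = 2

class TwoWayCache:
    def __init__(self, memory):
        self.memory = memory   # hypothetical main memory: address -> byte
        self.sets = {}         # index -> OrderedDict(tag -> data), least recently used first

    def read(self, address: int):
        index = address & ((1 << INDEX_BITS) - 1)
        tag = address >> INDEX_BITS
        ways = self.sets.setdefault(index, OrderedDict())
        if tag in ways:                  # both stored tags are checked (two comparators)
            ways.move_to_end(tag)        # this way is now the most recently used
            return "hit", ways[tag]
        if len(ways) == WAYS:
            ways.popitem(last=False)     # set full: evict the least recently used way
        ways[tag] = self.memory.get(address, 0)
        return "miss", ways[tag]

memory = {0x00000040: 0x7C, 0x10000040: 0x55}
cache = TwoWayCache(memory)
for addr in (0x00000040, 0x10000040, 0x00000040, 0x10000040):
    print(hex(addr), cache.read(addr)[0])
```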
Cache Size
Example:
• Number of Blocks = 4 K
• Block Size = 4 Words
• Word Size = 32 bits
• Address Size = 32 bits
• Tag Bits (Direct Mapping Cache) =
• Tag Bits (2-Way Set Associative) =
• Tag Bits (4-Way Set Associative) =
• Tag Bits (Associative Cache) =
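One way to fill in the blanks, assuming byte addressing (4 bytes per 32-bit word, so 16 bytes and 4 offset bits per block). The number of sets is the number of blocks divided by the associativity, and a fully associative cache is treated as a single set. `tag_bits` and `log2_exact` are hypothetical helper names.

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two, computed exactly with integers."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1

def tag_bits(address_bits, num_blocks, words_per_block, bytes_per_word, ways):
    offset_bits = log2_exact(words_per_block * bytes_per_word)  # byte within the block
    index_bits = log2_exact(num_blocks // ways)                  # selects the set
    return address_bits - index_bits - offset_bits

for name, ways in [("Direct Mapping", 1), ("2-Way", 2), ("4-Way", 4), ("Associative", 4096)]:
    print(name, tag_bits(32, 4096, 4, 4, ways))
```

Under this assumption the tags are 16, 17, 18, and 28 bits: each doubling of associativity halves the number of sets, moving one bit from the index to the tag.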
Block Size
• Increasing Block Size
   • Utilizes Spatial Locality
   • Reduces the Number of Blocks (for a fixed cache size)
Cache Performance
Example:
• CPU CPI = 2 clocks/instruction
• Loads & Stores = 36% of instructions
• Instruction Cache = 2% miss rate
• Data Cache = 4% miss rate
• Memory Miss Penalty = 100 clocks
• Instructions Penalties =
• Data Penalties =
• CPI (with penalties) =
• Perfect Cache Speedup =
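Working the example, assuming the miss penalties simply add stall cycles to the base CPI: every instruction is fetched through the instruction cache, but only loads and stores use the data cache.

```python
base_cpi = 2.0
load_store_fraction = 0.36
i_miss_rate, d_miss_rate = 0.02, 0.04
miss_penalty = 100  # clocks

instruction_penalties = i_miss_rate * miss_penalty                 # every instruction is fetched
data_penalties = load_store_fraction * d_miss_rate * miss_penalty  # only loads/stores use the D-cache
cpi = base_cpi + instruction_penalties + data_penalties
speedup = cpi / base_cpi  # a perfect cache never misses, so its CPI stays at base_cpi

print(f"{instruction_penalties:.2f} {data_penalties:.2f} {cpi:.2f} {speedup:.2f}")
```

This gives 2 + 1.44 = 3.44 stall cycles per instruction, a CPI of 5.44, and a speedup of 5.44 / 2 = 2.72 for a perfect cache.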
Cache Performance
• Average Memory Access Time (AMAT)
• AMAT = Time for a Hit + Miss Rate × Miss Penalty
Example:
• Clock Cycle = 1 ns
• Cache Access Time (Hit) = 1 ns
• Cache Miss Penalty = 20 Clocks
• Miss Rate = 5%
• AMAT =
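The formula applied to the example, with the miss penalty converted from clocks to nanoseconds using the 1 ns clock (`amat` is a hypothetical helper name):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty

clock_ns = 1.0
print(amat(hit_time=1.0, miss_rate=0.05, miss_penalty=20 * clock_ns))  # result in ns
```

AMAT = 1 ns + 0.05 × 20 ns = 2 ns.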
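Cache Misses
Example: Block Address Sequence = 0, 8, 0, 6, 8
Cache Size = 4 blocks (Direct Mapping)
CPU Reference: 0, 8, 0, 6, 8
Result: Miss, Miss, Miss, Miss, Miss
• Blocks 0 and 8 both map to cache line 0 (block address mod 4), so they keep evicting each other; block 6 maps to line 2.
[Figure: cache lines 0 to 3 with their tags]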
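Cache Misses
Example: Block Address Sequence = 0, 8, 0, 6, 8
Cache Size = 2 blocks (2-Way Set Associative)
CPU Reference: 0, 8, 0, 6, 8
Result: Miss, Miss, Hit, Miss, Miss
• Blocks 0, 8, and 6 all map to set 0; with LRU replacement, 6 evicts 8, and then 8 evicts 0.
[Figure: sets 0 and 1 with their tags]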
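Cache Misses
Example: Block Address Sequence = 0, 8, 0, 6, 8
Cache Size = 4 blocks (Associative)
CPU Reference: 0, 8, 0, 6, 8
Result: Miss, Miss, Hit, Miss, Hit
• Any block can go in any of the 4 locations, so only the first reference to each block misses.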
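The three results can be checked with a small simulator. This is a sketch under stated assumptions: blocks are placed in set `block % num_sets`, the whole block address is kept as the tag, and a full set evicts its least recently used block. The 2-way case is modelled as 2 sets of 2 blocks; because 0, 8, and 6 all fall in set 0, one set of 2 blocks gives the same result.

```python
from collections import OrderedDict

def simulate(sequence, num_sets, ways):
    """Return 'Hit'/'Miss' for each block address in a set-associative LRU cache."""
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for block in sequence:
        resident = sets[block % num_sets]     # set chosen by block address mod number of sets
        if block in resident:
            resident.move_to_end(block)       # most recently used goes to the back
            results.append("Hit")
        else:
            if len(resident) == ways:
                resident.popitem(last=False)  # evict the least recently used block
            resident[block] = True
            results.append("Miss")
    return results

sequence = [0, 8, 0, 6, 8]
print("Direct (4 sets x 1):", simulate(sequence, 4, 1))
print("2-way (2 sets x 2):", simulate(sequence, 2, 2))
print("Associative (1 set x 4):", simulate(sequence, 1, 4))
```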
Instruction Cache
• Cache Miss
   • Send the original PC value to memory
   • Perform a read operation
   • Wait for the cache to receive the instruction
   • Restart instruction execution
Data Cache Writes
• Write-Through
   • Consistent Copies
   • Slow
[Figure: every CPU write updates both the Cache and Memory]
Data Cache Writes
• Write-Through
   • Consistent Copies
   • Slow
Example:
• CPI without miss = 1
• Memory delays = 100 clocks
• 10% of memory references are writes
• Overall CPI =
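One reading of the example, stated as an assumption: the 10% is taken as 0.1 writes per instruction (as in the common textbook version, where 10% of instructions are stores), and every write stalls for the full 100-clock memory delay because there is no write buffer. `write_through_cpi` is a hypothetical helper name.

```python
def write_through_cpi(base_cpi, write_stall_clocks, writes_per_instruction):
    """CPI when every write stalls the processor until memory finishes (assumed)."""
    return base_cpi + writes_per_instruction * write_stall_clocks

print(write_through_cpi(base_cpi=1, write_stall_clocks=100, writes_per_instruction=0.10))
```

With these numbers the CPI is 1 + 0.1 × 100 = 11, which is why plain write-through is described as slow.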
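Data Cache Writes
• Write-Through with Write Buffer
   • Buffer size
   • Fill-Rate and Mem-Rate: if the CPU fills the buffer faster than Memory drains it, the buffer becomes full and the CPU stalls
[Figure: CPU writes go to the Cache and into a write buffer that drains to Memory]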
Data Cache Writes
• Write-Back
   • Fast
   • Complex & inconsistent copies
[Figure: CPU writes go only to the Cache; Memory is updated when a modified block is replaced (Block Replacement)]
Data Cache Writes
• Write-Back
   • Fast
   • Complex & inconsistent copies
[Figure: a Miss forces Block Replacement; a modified victim block must be written back to Memory before the new block is brought in]
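Data Cache Writes
• Write-Back with Buffer
   • Reduces the "Miss" penalty by up to 50%: when a miss replaces a modified block, that block is moved into the buffer and the new block is read from Memory right away, instead of waiting for the write-back to finish first
[Figure: on a Miss, the replaced block goes to a buffer between the Cache and Memory]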
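The difference in memory traffic between the two policies can be sketched with a tiny direct-mapped cache that counts writes to main memory. The sketch assumes write-allocate for both policies and block-sized writes; `memory_writes` and the access sequence are hypothetical.

```python
def memory_writes(ops, num_lines, policy):
    """Count writes to main memory for a tiny direct-mapped cache.

    ops: sequence of ("r" or "w", block_address) pairs.
    Both policies allocate the block in the cache on a write miss (assumed).
    """
    lines = {}   # line index -> [block_address, dirty]
    writes = 0
    for op, block in ops:
        index = block % num_lines
        line = lines.get(index)
        if line is None or line[0] != block:   # miss: replace the line
            if policy == "write-back" and line is not None and line[1]:
                writes += 1                    # dirty victim goes back to memory
            line = [block, False]
            lines[index] = line
        if op == "w":
            if policy == "write-through":
                writes += 1                    # every store also updates memory
            else:
                line[1] = True                 # write-back: just mark the block dirty
    return writes

ops = [("w", 0)] * 10 + [("r", 4)]   # ten stores to block 0, then block 4 evicts it
print(memory_writes(ops, 4, "write-through"), memory_writes(ops, 4, "write-back"))
```

Ten stores to the same block cost ten memory writes under write-through but only one under write-back, paid when the dirty block is evicted.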
Cache Replacement Policies
• First In First Out (FIFO)
   • Simple
   • May replace a block that is still heavily used, leading to a miss
• Least Recently Used (LRU)
   • More complex
   • Usually a better hit rate
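A small comparison of the two policies on a fully associative cache of 3 blocks (a hypothetical setup; `count_misses` is a hypothetical name). Both evict the entry at the front of an `OrderedDict`; the only difference is that LRU moves a block to the back on every hit, while FIFO leaves the insertion order alone.

```python
from collections import OrderedDict

def count_misses(sequence, capacity, policy):
    """Misses in a fully associative cache using 'FIFO' or 'LRU' replacement."""
    resident = OrderedDict()   # front = next block to evict
    misses = 0
    for block in sequence:
        if block in resident:
            if policy == "LRU":
                resident.move_to_end(block)  # a hit makes the block most recently used
            continue                         # FIFO: hits do not change the order
        misses += 1
        if len(resident) == capacity:
            resident.popitem(last=False)     # oldest arrival (FIFO) or least recent use (LRU)
        resident[block] = True
    return misses

sequence = [1, 2, 1, 3, 1, 4, 1]
print("FIFO:", count_misses(sequence, 3, "FIFO"), "LRU:", count_misses(sequence, 3, "LRU"))
```

On this sequence FIFO evicts the frequently used block 1 when block 4 arrives and misses 5 times; LRU keeps block 1 and misses 4 times.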
Multilevel Cache
Example:
• CPU CPI = 1 @ 4 GHz (0.25 ns Clock)
• Primary Cache Miss Rate = 2%
• Memory Access Time = 100 ns (400 Clocks)
• CPI (Single Level Cache) =
• Secondary Cache Miss Rate = 0.5%
• Secondary Cache Access Time = 5 ns (20 Clocks)
• CPI (2-Level Cache) =
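Working the example, assuming (as in the usual textbook treatment) that both miss rates are per instruction and that the 0.5% is the global rate of accesses that miss in both caches and go to main memory:

```python
clock_ns = 0.25                  # 4 GHz
base_cpi = 1.0
l1_miss_rate = 0.02
memory_clocks = 100 / clock_ns   # 100 ns -> 400 clocks
l2_clocks = 5 / clock_ns         # 5 ns -> 20 clocks
global_miss_rate = 0.005         # assumed: fraction that misses in both caches

cpi_one_level = base_cpi + l1_miss_rate * memory_clocks
cpi_two_level = base_cpi + l1_miss_rate * l2_clocks + global_miss_rate * memory_clocks
print(f"{cpi_one_level:.1f} {cpi_two_level:.1f} speedup {cpi_one_level / cpi_two_level:.2f}")
```

The single-level CPI is 1 + 0.02 × 400 = 9; the two-level CPI is 1 + 0.02 × 20 + 0.005 × 400 = 3.4, roughly 2.6 times faster.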
Main Memory
• Latency & Bandwidth
   • Address (Selection of row & column)
   • Data Transfer (Number of bits)
Example:
• Send Address = 1 clock
• Memory Access = 15 clocks
• Transfer a 32-bit Word = 1 clock
• Cache Block = 4 words
• Cache Miss =
• Memory Bandwidth =
Main Memory
• Simple Design
• Wide Bus
• Interleaved
[Figure: three CPU, Cache, and Memory organizations: a one-word-wide memory and bus; a wider memory and bus; and several one-word memory banks sharing a one-word bus]
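The three organizations can be compared with the timing from the previous example (1 clock to send the address, 15 clocks per access, 1 clock per word transferred, 4-word blocks). The sketch assumes a 4-word-wide memory for the wide bus and 4 one-word banks for interleaving; the figure may show a different width, so the widths are parameters. Function names are hypothetical.

```python
def simple_design(block_words, send=1, access=15, transfer=1):
    """One-word-wide memory and bus: each word needs its own access and transfer."""
    return send + block_words * (access + transfer)

def wide_bus(block_words, width_words, send=1, access=15, transfer=1):
    """Memory and bus width_words wide: one access + transfer per group of words."""
    return send + (block_words // width_words) * (access + transfer)

def interleaved(block_words, banks, send=1, access=15, transfer=1):
    """`banks` one-word banks accessed in parallel, then words sent one at a time."""
    return send + (block_words // banks) * access + block_words * transfer

BLOCK_WORDS, BYTES_PER_WORD = 4, 4
for name, clocks in [("simple", simple_design(BLOCK_WORDS)),
                     ("wide (4 words)", wide_bus(BLOCK_WORDS, 4)),
                     ("interleaved (4 banks)", interleaved(BLOCK_WORDS, 4))]:
    bandwidth = BLOCK_WORDS * BYTES_PER_WORD / clocks
    print(f"{name}: miss = {clocks} clocks, bandwidth = {bandwidth:.2f} bytes/clock")
```

With these assumptions a miss costs 65, 17, and 20 clocks, giving about 0.25, 0.94, and 0.80 bytes per clock.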
Virtual Memory
• Allow Efficient & Safe Sharing of Memory
   • Memory Protection
   • Program Relocatability
• Remove Programming Burdens of Small Memory
   • Much Larger Memory Space
   • Reuse Physical Memory
Virtual Memory Segmentation
• Segments
   • Variable Size
   • Two-Part Address: Segment Number + Offset
[Figure: Segment0 and Segment1 of a program placed in physical memory frames; a 32-bit two-part virtual address (Segment Number, Offset) is translated into a 24-bit address]
Virtual Memory Paging
• Virtual Memory
   • Pages
   • Stored on Disk
   • Virtual Address
• Physical Memory
   • Frames
   • Stored in RAM
   • Physical Address
• Page Faults
[Figure: virtual pages (Page 0, Page 1) mapped to physical frames (Frame 0, Frame 1)]
Virtual Memory Paging
• Address Translation
   • Virtual Address (32 bits): Virtual Page Number (bits 31 to 12) + Page Offset (bits 11 to 0)
   • Physical Address (24 bits): Physical Page # (bits 23 to 12) + Page Offset (bits 11 to 0)
   • Translation replaces the Virtual Page Number with a Physical Page #; the Page Offset is copied unchanged
[Figure: virtual pages mapped to physical frames]
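Address translation as a sketch, using the field widths above (12-bit Page Offset, so 4 KB pages). The page table is a plain dictionary from virtual to physical page numbers; its contents and the use of `LookupError` for a page fault are illustrative assumptions.

```python
OFFSET_BITS = 12                     # 4 KB pages: bits 11..0 are the Page Offset
OFFSET_MASK = (1 << OFFSET_BITS) - 1

def translate(virtual_address: int, page_table: dict) -> int:
    """Map a 32-bit virtual address to a physical address via a page table."""
    vpn = virtual_address >> OFFSET_BITS       # Virtual Page Number, bits 31..12
    offset = virtual_address & OFFSET_MASK     # copied unchanged
    if vpn not in page_table:
        raise LookupError(f"page fault: virtual page {vpn:#x} is not in memory")
    ppn = page_table[vpn]                      # Physical Page #, bits 23..12
    return (ppn << OFFSET_BITS) | offset

page_table = {0x00000: 0x001, 0x00001: 0x000}  # hypothetical: page 0 -> frame 1, page 1 -> frame 0
print(hex(translate(0x00001ABC, page_table)))  # 0xabc
```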
Page Table
• Page Table
   • Virtual to Physical Page Number Translation
   • Stored in RAM
   • Page Table Register: holds the starting address of the page table in RAM
Page Table
• Page Faults
   • Swap Space
      • Space reserved for the full virtual memory space of a process
      • Stored on Disk
   • Page Table
   • LRU Replacement Scheme
[Figure: page table indexed by Virtual Page Number (0, 1, 2, ...), with entries pointing to physical memory or to Disk Storage]
Page Table
• Page Table Size
Example:
• Virtual Address: 32 bits
• Page Size: 4 KB
• Page Table: 4 Bytes/Entry
• Number of Pages =
• Page Table Size =
[Figure: page table indexed by Virtual Page Number, with Disk Storage]
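The arithmetic, assuming a single-level page table with one entry for every virtual page of a process:

```python
virtual_address_bits = 32
page_size = 4 * 1024   # 4 KB
entry_bytes = 4

offset_bits = page_size.bit_length() - 1   # 12 bits for a 4 KB page
num_pages = 2 ** (virtual_address_bits - offset_bits)
table_bytes = num_pages * entry_bytes
print(num_pages, table_bytes, table_bytes // 2**20, "MB")
```

2^32 / 2^12 = 2^20 pages (1,048,576), and 2^20 × 4 bytes = 4 MB of page table per process.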
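Translation-Lookaside Buffer (TLB)
• Address Translation Cache: the TLB keeps recent Virtual Page Number to physical page translations, so most translations avoid reading the page table in memory
[Figure: TLB indexed by Virtual Page Number, backed by the page table and Disk Storage]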
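Virtual Memory Misses
• TLB Miss
• Page Fault
• Cache Miss
[Figure: a Virtual Address is looked up in the TLB; on a TLB hit the physical address goes to the Cache; on a TLB miss the Page Table is consulted (hit: Update TLB; miss: Page Fault); the Cache access then hits or misses]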
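The order of checks can be sketched as below. Assumptions: 4 KB pages, 16-byte cache blocks, dictionaries for the TLB and page table, and an unbounded set as the cache; a real TLB and cache have limited size and replacement. `access` and `PageFault` are hypothetical names.

```python
OFFSET_BITS = 12   # 4 KB pages (assumed)
BLOCK_BITS = 4     # 16-byte cache blocks (assumed)

class PageFault(Exception):
    """Raised when the page table has no entry: the OS must bring the page from disk."""

def access(vaddr, tlb, page_table, cache, events):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn in tlb:                    # TLB hit: no page-table read needed
        events.append("TLB hit")
    else:
        events.append("TLB miss")
        if vpn not in page_table:
            events.append("page fault")
            raise PageFault(vpn)
        tlb[vpn] = page_table[vpn]    # Update TLB from the page table
    paddr = (tlb[vpn] << OFFSET_BITS) | offset
    block = paddr >> BLOCK_BITS
    if block in cache:
        events.append("cache hit")
    else:
        events.append("cache miss")
        cache.add(block)              # fetch the block (no size limit in this sketch)
    return paddr

tlb, page_table, cache, events = {}, {0x1: 0x5}, set(), []
access(0x1004, tlb, page_table, cache, events)
access(0x1008, tlb, page_table, cache, events)
print(events)  # ['TLB miss', 'cache miss', 'TLB hit', 'cache hit']
```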
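Memory Hierarchy Misses
• Compulsory Miss: the first access to a block
• Capacity Miss: the cache cannot hold all the blocks the program is using
• Conflict Miss: too many blocks map to the same set, even though the cache has room elsewhere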
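The three categories can be separated by simulation, using the usual method: a miss is compulsory if the block was never referenced before, capacity if a fully associative LRU cache of the same total size would also miss, and conflict otherwise. This is a sketch assuming LRU replacement; `LRUSets` and `classify` are hypothetical names.

```python
from collections import OrderedDict

class LRUSets:
    """Set-associative cache of block addresses with LRU replacement."""
    def __init__(self, num_sets, ways):
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.ways = ways

    def access(self, block):
        """Return True on a hit; on a miss, insert the block (evicting LRU if needed)."""
        resident = self.sets[block % len(self.sets)]
        if block in resident:
            resident.move_to_end(block)
            return True
        if len(resident) == self.ways:
            resident.popitem(last=False)
        resident[block] = True
        return False

def classify(sequence, num_sets, ways):
    actual = LRUSets(num_sets, ways)
    fully_associative = LRUSets(1, num_sets * ways)  # same capacity, no placement limits
    seen = set()
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in sequence:
        hit = actual.access(block)
        fa_hit = fully_associative.access(block)
        if not hit:
            if block not in seen:
                counts["compulsory"] += 1   # first reference ever
            elif not fa_hit:
                counts["capacity"] += 1     # even a fully associative cache misses
            else:
                counts["conflict"] += 1     # only the placement restriction caused it
        seen.add(block)
    return counts

print(classify([0, 8, 0, 6, 8], num_sets=4, ways=1))
```

For the direct-mapped example (0, 8, 0, 6, 8 with 4 blocks) the five misses split into 3 compulsory and 2 conflict misses; a fully associative cache of 2 blocks on 0, 1, 2, 0 gives 3 compulsory and 1 capacity miss.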
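Parallelism & Cache Coherence
• Coherence
   • What values can be returned by a read
• Consistency
   • When a written value will be returned by a read
[Figure: two processors with private caches share Main Memory; both caches hold a copy of a location with value 0, then one processor writes 1 into its own cache, leaving the other cache and Main Memory with the stale value 0]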
Cache Coherence Enforcement
• Migration (of Data to Local Caches)
   • Reduces latency and the bandwidth demand on shared memory
• Replication (of Read-shared Data)
   • Reduces latency & contention for access
Cache Coherence Protocol
• Snooping
   • Each cache monitors bus reads/writes.
   • Processors exchange full blocks.
   • Large block sizes may lead to false sharing.
[Figure: when one processor writes 1, an Invalidate is sent on the bus and the other cache's copy (0) is invalidated; Main Memory changes from 0 to 1]
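A write-invalidate snooping scheme can be sketched as follows. This is a deliberate simplification and an assumption, not a full protocol: caches are write-through, coherence is tracked per address rather than per block, and a write simply broadcasts an invalidate that the other caches act on. Practical snooping caches usually pair write-back with states such as MSI. `Bus` and `SnoopingCache` are hypothetical names.

```python
class Bus:
    """Shared bus that every cache snoops (simplified model)."""
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def invalidate(self, sender, addr):
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(addr, None)   # snooped write: drop any stale copy

class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                       # addr -> value (present means valid)
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch the current value from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.invalidate(self, addr)       # broadcast Invalidate before writing
        self.lines[addr] = value
        self.bus.memory[addr] = value         # write-through keeps memory up to date

bus = Bus({0x40: 0})
p0, p1 = SnoopingCache(bus), SnoopingCache(bus)
p0.read(0x40)
p1.read(0x40)             # both caches now hold 0
p0.write(0x40, 1)         # p1's copy is invalidated
print(p1.read(0x40))      # 1: p1 misses and re-reads memory
```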
Cache Coherence Protocol
• Directory-based protocols
   • Caches and memory record the sharing status of blocks in a directory.
Chapter 5 The End