Ch. 3-2 CPU • Input and output. • Supervisor mode, exceptions, traps. • Memory management and address translation. • Caches. • CPU performance and power consumption. • Co-processors.
Caches and CPUs • Cache hit vs. cache miss. • [Figure: the CPU puts an address on the bus; the cache controller checks the cache and returns the data on a hit, or fetches it from main memory on a miss.]
Cache operation • Many main memory locations are mapped onto one cache entry. • May have caches for: • instructions; • data; • data + instructions (unified). • Memory access time is no longer deterministic.
Terms • Cache hit: required location is in cache. • Cache miss: required location is not in cache. • Working set: set of active locations used by a program in a time interval.
Types of misses • Compulsory (cold): location has never been accessed. • Capacity: working set is too large. • Conflict: multiple locations in working set map to same cache entry.
Memory system performance • h = cache hit rate. • t_cache = cache access time. • t_main = main memory access time. • Average memory access time: t_av = h·t_cache + (1 − h)·t_main.
Multiple levels of cache • [Figure: CPU → L1 cache → L2 cache → main memory.]
Multi-level cache access time • h1 = L1 cache hit rate. • h2 = rate of accesses that miss L1 but hit L2. • Average memory access time: t_av = h1·t_L1 + h2·t_L2 + (1 − h1 − h2)·t_main.
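To make the two formulas concrete, here is a minimal sketch in C that evaluates both expressions; the hit rates and cycle counts are made-up illustrative values, not measurements from any particular machine.

#include <stdio.h>

/* Single-level: t_av = h*t_cache + (1 - h)*t_main */
static double avg_access_1level(double h, double t_cache, double t_main) {
    return h * t_cache + (1.0 - h) * t_main;
}

/* Two-level: t_av = h1*t_L1 + h2*t_L2 + (1 - h1 - h2)*t_main,
   where h2 is the rate of accesses that miss L1 but hit L2. */
static double avg_access_2level(double h1, double h2,
                                double t_L1, double t_L2, double t_main) {
    return h1 * t_L1 + h2 * t_L2 + (1.0 - h1 - h2) * t_main;
}

int main(void) {
    /* Illustrative numbers (assumed): 1-cycle L1, 10-cycle L2, 100-cycle main memory. */
    printf("1-level: %.2f cycles\n", avg_access_1level(0.95, 1.0, 100.0));
    printf("2-level: %.2f cycles\n", avg_access_2level(0.90, 0.08, 1.0, 10.0, 100.0));
    return 0;
}

Even a modest L2 hit rate on L1 misses cuts the average sharply, because it keeps most misses away from the 100-cycle main memory.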
Replacement policies • Replacement policy: strategy for choosing which cache entry to throw out to make room for a new memory location. • Two popular strategies: • Random. • Least-recently used (LRU).
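As a sketch of how LRU victim selection might be realized for one set (the associativity, structure, and names here are illustrative, not from the slides): keep a last-use timestamp per way and evict the way with the smallest one.

#include <stdint.h>
#include <stdio.h>

#define WAYS 4   /* assumed associativity for this sketch */

/* One cache set: per-way tag, valid bit, and last-use timestamp. */
struct set {
    uint32_t tag[WAYS];
    int      valid[WAYS];
    uint64_t last_use[WAYS];
};

/* Choose the way to replace: any invalid way first, otherwise the
   least-recently used one (smallest timestamp). */
static int lru_victim(const struct set *s) {
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!s->valid[w])
            return w;   /* free way: nothing needs to be thrown out */
        if (s->last_use[w] < s->last_use[victim])
            victim = w;
    }
    return victim;
}

int main(void) {
    struct set s = {
        .tag      = { 1, 2, 3, 4 },
        .valid    = { 1, 1, 1, 1 },
        .last_use = { 9, 4, 7, 8 },  /* way 1 was touched longest ago */
    };
    printf("evict way %d\n", lru_victim(&s));  /* prints 1 */
    return 0;
}

Hardware implementations usually track recency with a few status bits per set rather than full timestamps; the counter version above just makes the policy easy to read.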
Cache organizations • Fully-associative: any memory location can be stored anywhere in the cache (almost never implemented). • Direct-mapped: each memory location maps onto exactly one cache entry. • N-way set-associative: each memory location maps to one set and can be stored in any of that set's n blocks (ways).
Cache performance benefits • Keep frequently-accessed locations in fast cache. • Cache retrieves more than one word at a time. • Sequential accesses are faster after first access.
Direct-mapped cache • [Figure: a 32-bit address is split into a 15-bit tag, 15-bit index, and 2-bit offset; the index selects one of 32K cache blocks, each holding a valid bit, a tag, and data; a comparator checks the stored tag against the address tag to signal a hit, and the offset selects the required value from the data field.]
Direct-mapped cache with block size of 1 word • If each block contains only one memory location: • 2^k words in cache memory; • 2^n words in main memory; • n-bit memory address = k bits for the index field + (n − k) bits for the tag field.
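A minimal sketch of that field split in C, for a hypothetical 1024-word cache (k = 10) with one word per block, so the address is just tag + index:

#include <stdint.h>
#include <stdio.h>

#define K 10   /* index bits: a 2^10 = 1024-word cache (assumed size) */

static void split(uint32_t addr) {
    uint32_t index = addr & ((1u << K) - 1);  /* low k bits */
    uint32_t tag   = addr >> K;               /* remaining n - k bits */
    printf("addr=0x%08x  tag=0x%x  index=%u\n", addr, tag, index);
}

int main(void) {
    split(0x00001234u);
    return 0;
}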
Direct-mapped cache with block size of 8 words, 64 blocks • [Figure: a CPU-generated 32-bit address splits into a 23-bit tag, 6-bit index, and 3-bit offset; the index selects one of 64 blocks, each holding a valid bit, a tag, and 8 data words; e.g. the 8 words at addresses xx…xx000001000 through xx…xx000001111 fill block 1, while address yy…yy000001000 with a different tag maps to the same block.]
Direct Mapping Cache • Two blocks whose addresses have the same index but different tag values cannot reside in the cache at the same time.
Write operations • Write-through: immediately copy write to main memory. • Write-back: write to main memory only when a location is removed from cache; can reduce the number of main-memory writes.
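A sketch of the two policies for a single cache line, with a dirty bit marking lines that must be flushed on eviction; the structure and function names are illustrative, and memory_write() is a stub standing in for the bus write.

#include <stdint.h>
#include <stdio.h>

/* One cache line; the dirty bit is used only by the write-back policy. */
struct line {
    uint32_t addr;
    uint32_t data;
    int      valid;
    int      dirty;
};

/* Stub standing in for the bus write to main memory. */
static void memory_write(uint32_t addr, uint32_t value) {
    printf("main memory write: [0x%08x] = 0x%08x\n", addr, value);
}

/* Write-through: update the cache line and main memory immediately. */
static void write_through(struct line *l, uint32_t value) {
    l->data = value;
    memory_write(l->addr, value);
}

/* Write-back: update only the line and mark it dirty... */
static void write_back(struct line *l, uint32_t value) {
    l->data = value;
    l->dirty = 1;
}

/* ...the deferred main-memory write happens only on eviction. */
static void evict(struct line *l) {
    if (l->valid && l->dirty)
        memory_write(l->addr, l->data);
    l->valid = l->dirty = 0;
}

int main(void) {
    struct line l = { 0x1000, 0, 1, 0 };
    write_through(&l, 0xAA);  /* one memory write per store */
    write_back(&l, 0xBB);     /* no memory write yet */
    write_back(&l, 0xCC);     /* repeated stores coalesce */
    evict(&l);                /* single write of the final value */
    return 0;
}

The run shows why write-back reduces traffic: the two buffered stores produce one memory write instead of two.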
Direct-mapped cache locations • Many locations map onto the same cache block. • Conflict misses are easy to generate: • Array a[] uses locations 0, 1, 2, … • Array b[] uses locations 1024, 1025, 1026, … • Operation a[i] + b[i] generates conflict misses: different tags with the same block address (index), e.g. 0 = 0000000000000000 and 1024 = 0000010000000000 (tag | index | offset).
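The slide's access pattern written out as code: assuming a hypothetical 1024-word direct-mapped cache and the stated placement (a[] at word address 0, b[] at word address 1024, which a real linker does not guarantee), a[i] and b[i] always share an index, so each evicts the line the other just loaded.

#define N 1024

int a[N];   /* assumed to start at word address 0    */
int b[N];   /* assumed to start at word address 1024 */
            /* => a[i] and b[i] share a cache index  */

int sum(void) {
    int s = 0;
    for (int i = 0; i < N; i++)
        s += a[i] + b[i];   /* two conflict misses per iteration */
    return s;
}

Padding one array by a block, or using a set-associative cache, breaks the collision.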
Set-associative cache • A set of direct-mapped caches: [Figure: n direct-mapped banks (Set 1, Set 2, … Set n) are indexed in parallel by the line (block) address; the matching tag signals a hit and selects the data.]
Two-way Set-Associative Cache • [Figure: each index line (000000 … 111111) holds two tag-data pairs side by side.] • Each word is stored together with its tag; the tag-data pairs that share one index are said to form a set.
Direct-mapped cache behavior • Access sequence: 001, 010, 011, 100, 101, 111 (block address = index).
After 001 access:
block  tag  data
00     -    -
01     0    1111
10     -    -
11     -    -
After 010 access:
block  tag  data
00     -    -
01     0    1111
10     0    0000
11     -    -
Direct-mapped cache behavior, cont'd. • Access sequence: 001, 010, 011, 100, 101, 111.
After 011 access:
block  tag  data
00     -    -
01     0    1111
10     0    0000
11     0    0110
After 100 access:
block  tag  data
00     1    1000
01     0    1111
10     0    0000
11     0    0110
Direct-mapped cache behavior, cont'd. • Access sequence: 001, 010, 011, 100, 101, 111.
After 101 access:
block  tag  data
00     1    1000
01     1    0001
10     0    0000
11     0    0110
After 111 access:
block  tag  data
00     1    1000
01     1    0001
10     0    0000
11     1    0100
2-way set-associative cache behavior • Access sequence: 001, 010, 011, 100, 101, 111. • Final state of cache (twice as big as direct-mapped); the set address is the index:
set  blk 0 tag  blk 0 data  blk 1 tag  blk 1 data
00   1          1000        -          -
01   0          1111        1          0001
10   0          0000        -          -
11   0          0110        1          0100
2-way set-associative cache behavior • Access sequence: 001, 010, 011, 100, 101, 111. • Final state of cache (same size as direct-mapped: 1-bit set index, 2-bit tags; 101 and 111 replace the LRU blocks of set 1):
set  blk 0 tag  blk 0 data  blk 1 tag  blk 1 data
0    01         0000        10         1000
1    10         0001        11         0100
Example caches • StrongARM: • 16 Kbyte, 32-way, 32-byte block instruction cache. • 16 Kbyte, 32-way, 32-byte block data cache (write-back). • SHARC: • 32-instruction, 2-way instruction cache.
Direct-Mapped vs. Set-Associative • SA provides higher hit rates than DM because conflicts between a small number of locations can be resolved within the cache. • The SA structure makes misses less predictable, since conflicts in a set-associative cache are more difficult to analyze.
Suppose two different memory locations A and B are mapped onto the same cache address in a DM cache and into the same set in a 2-way SA cache, respectively. • If the CPU accesses A, B, A for the first time and in sequence, how many cache misses occur? • In DM: three misses (B evicts A, so the second access to A misses again). • In 2-way SA: two misses (A and B occupy the two blocks of the set, so the second access to A hits).
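A toy simulation of the A, B, A sequence under both organizations; the addresses and the single-set cache sizes are illustrative, chosen only so that A and B collide by construction.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t A = 0x0000, B = 0x1000;   /* assumed: same index, different tags */
    uint32_t seq[3] = {A, B, A};

    /* Direct-mapped: one conflicting entry, so A and B keep evicting each other. */
    uint32_t dm_tag = 0; int dm_valid = 0, dm_miss = 0;
    for (int i = 0; i < 3; i++) {
        if (!dm_valid || dm_tag != seq[i]) { dm_miss++; dm_tag = seq[i]; dm_valid = 1; }
    }

    /* 2-way set-associative: one set with two blocks, filled on miss. */
    uint32_t way[2] = {0, 0}; int valid[2] = {0, 0}, sa_miss = 0, next = 0;
    for (int i = 0; i < 3; i++) {
        int hit = (valid[0] && way[0] == seq[i]) || (valid[1] && way[1] == seq[i]);
        if (!hit) { sa_miss++; way[next] = seq[i]; valid[next] = 1; next ^= 1; }
    }

    printf("direct-mapped misses: %d (expected 3)\n", dm_miss);
    printf("2-way SA misses:      %d (expected 2)\n", sa_miss);
    return 0;
}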
Memory management units • Memory management unit (MMU) translates addresses. • [Figure: the CPU issues a logical (program-generated) address; the MMU translates it to a physical address (main memory location); data is swapped between main memory and secondary storage.]
Why address translation? • Address space: 1024KW (auxiliary memory). • Memory space: 32KW (main memory). • Address field of instruction code: 20 bits. • Physical memory address: 15 bits. • A program may refer to any 20-bit address, so the accessed location must be brought somewhere into the 15-bit-addressable physical memory.
Relation between AS and MS • [Figure: the address space (1024K = 2^20) holds Program 1 with Data 1,1 and Data 1,2, and Program 2 with Data 2,1 and Data 2,2; only the portions currently in use (e.g. Program 1 and Data 1,1) reside in the memory space (32K = 2^15).]
Memory management tasks • Allows programs to move in physical memory during execution. • Allows virtual memory: • memory images kept in secondary storage; • images returned to main memory on demand during execution. • Page fault: request for location not resident in memory.
Address translation • Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses. • [Figure: a 20-bit virtual address register indexes a memory mapping table; the table output is loaded into the 15-bit main memory address register, which accesses main memory through the table and memory buffer registers.]
Memory table in a paged system • [Figure: AS = 8K = 2^13 and MS = 4K = 2^12 are split into 1K-word groups: 8 pages and 4 blocks. A virtual address = 3-bit page number + 10-bit line number. The page number indexes a memory page table whose entries hold a presence bit and a block number (page 010 → block 11, page 011 → block 00, page 101 → block 01, page 110 → block 10); e.g. virtual address 101 0101010011 becomes main memory address 01 0101010011.]
Associative memory page table • [Figure: only resident pages are stored. The page number of the virtual address (e.g. 001) is loaded into an argument register and compared associatively against all stored page numbers; the matching entry supplies the block number. The table holds the (page, block) pairs (001, 11), (010, 00), (101, 01), (110, 10).]
Segment-page mapping • [Figure: a logical address splits into segment, page, and word fields; the segment and page fields select a block through the mapping tables, and the block number is combined with the word field to form the physical address.]
Associative memory (TLB) • [Figure: the segment and page fields are loaded into an argument register and compared associatively against the stored entries; a matching entry supplies the block number.]
Address Translation Scheme • Two basic schemes: • segmented; • paged. • Segmentation and paging can be combined (x86).
Segments and pages • [Figure: memory is divided into large, variable-sized segments (Segment 1, Segment 2); each segment is further divided into fixed-size pages (page 1, page 2, …).]
Segment address translation • [Figure: the virtual (logical) address is added to the segment base address to form the physical address; a range check against the segment's lower and upper bounds raises a range error when the address falls outside the segment.]
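A sketch of the check-and-add in C; the descriptor layout, field names, and the base/limit values are hypothetical.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical segment descriptor: base plus a limit for the range check. */
struct segment {
    uint32_t base;   /* segment base address */
    uint32_t limit;  /* segment size in bytes (upper bound) */
};

/* Translate a logical offset within the segment; report a range error
   when the offset falls outside the segment. Returns 0 on success. */
static int seg_translate(const struct segment *seg, uint32_t offset, uint32_t *phys) {
    if (offset >= seg->limit)
        return -1;                 /* range error: outside the segment */
    *phys = seg->base + offset;    /* physical = base + logical offset */
    return 0;
}

int main(void) {
    struct segment s = { 0x20000000u, 0x00100000u };  /* 1 MB segment (assumed) */
    uint32_t p;
    if (seg_translate(&s, 0x1234, &p) == 0)
        printf("physical = 0x%08x\n", p);
    return 0;
}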
Page address translation • [Figure: the virtual (logical) address splits into page number and offset; the page number selects the page i base address, which is concatenated with the unchanged offset to form the memory address.]
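The same lookup as a minimal C sketch: split the virtual address, index a page table, and concatenate the frame base with the offset. The page size and table contents are made up for illustration.

#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 10                 /* assumed 1K-word pages */
#define NUM_PAGES 8

static uint32_t page_table[NUM_PAGES] = { 3, 1, 0, 2, 7, 6, 5, 4 }; /* fake mapping */

static uint32_t translate(uint32_t vaddr) {
    uint32_t page   = vaddr >> PAGE_BITS;             /* page number */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);/* offset, unchanged */
    /* "concatenate": frame number in the high bits, offset in the low bits */
    return (page_table[page] << PAGE_BITS) | offset;
}

int main(void) {
    printf("0x%x -> 0x%x\n", 0x0C53u, translate(0x0C53u));  /* page 3 -> frame 2 */
    return 0;
}

Unlike segment translation there is no adder in the data path: because pages are power-of-two sized and aligned, the offset bits pass through untouched.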
Page table organizations • [Figure: a flat page table indexes its page descriptors directly by page number; a tree organization reaches a page descriptor through multiple levels of tables.]
Caching address translations • Large translation tables are themselves kept in main memory, so each translation would require extra memory accesses. • Translation lookaside buffer (TLB): a cache for address translations. • Typically small.
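A TLB can be sketched as a small associative array searched before the page table; the entry count and field names below are illustrative.

#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 8   /* assumed: TLBs are typically small */

struct tlb_entry {
    uint32_t page;    /* virtual page number */
    uint32_t frame;   /* physical frame number */
    int      valid;
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Associative lookup: the page number is compared against every entry.
   Returns 1 on a TLB hit; on a miss the (slow) in-memory page table
   would be walked and the result used to refill an entry. */
static int tlb_lookup(uint32_t page, uint32_t *frame) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].page == page) {
            *frame = tlb[i].frame;
            return 1;
        }
    }
    return 0;
}

int main(void) {
    tlb[0] = (struct tlb_entry){ .page = 5, .frame = 2, .valid = 1 };
    uint32_t f;
    printf("page 5: %s\n", tlb_lookup(5, &f) ? "TLB hit" : "TLB miss");
    printf("page 6: %s\n", tlb_lookup(6, &f) ? "TLB hit" : "TLB miss");
    return 0;
}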
ARM memory management • Memory region types: • section: 1 Mbyte block; • large page: 64 kbytes; • small page: 4 kbytes. • An address is marked as section-mapped or page-mapped. • Two-level translation scheme.
ARM two-stage address translation • [Figure: the translation table base register points to the first-level table; the address's 1st index selects a first-level descriptor, which is concatenated with the 2nd index to locate a second-level descriptor; that descriptor is concatenated with the offset to form the physical address.]
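A schematic of the two-level walk for a small-page-mapped address, assuming the 4 KB split (12-bit first index, 8-bit second index, 12-bit offset). The tables below are toy arrays and the descriptors plain frame pointers; real ARM descriptors also carry type and permission bits that are omitted here.

#include <stdint.h>
#include <stdio.h>

#define L1_SIZE 4096   /* 12-bit first-level index */
#define L2_SIZE 256    /* 8-bit second-level index */

static uint32_t l1[L1_SIZE];     /* entry = number of an L2 table (simplified) */
static uint32_t l2[4][L2_SIZE];  /* a few L2 tables; entry = physical page base */

static uint32_t walk(uint32_t vaddr) {
    uint32_t i1     = vaddr >> 20;            /* selects the second-level table */
    uint32_t i2     = (vaddr >> 12) & 0xFF;   /* selects the page descriptor */
    uint32_t offset = vaddr & 0xFFF;          /* 4 KB page offset */
    uint32_t frame  = l2[l1[i1]][i2];         /* physical page base */
    return frame | offset;                    /* concatenate with the offset */
}

int main(void) {
    l1[0x123]   = 1;               /* made-up mapping for the demo */
    l2[1][0x45] = 0x80070000u;
    printf("0x%08x -> 0x%08x\n", 0x12345678u, walk(0x12345678u));
    return 0;
}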