Cache Physical Implementation Panayiotis Charalambous Xi Research Group
Contents • Cache Logical View • Physical View • Case Study – Power 4 L2 Cache
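Logical Cache Structure • n-way associative cache: n elements per set, $2^m$ sets • Address (32 bits): Tag ($32 - m - k$ bits) | Index ($m$ bits) | Offset ($k$ bits) • Each way compares its stored tag with the address tag (=); the comparator outputs are ORed to produce Hit • The matching way supplies Data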
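The tag/index/offset split on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `split_address` and its parameters are hypothetical, and it assumes the field order tag, index, offset from the most significant bit down, as in the slide.

```python
def split_address(addr, m, k, addr_bits=32):
    """Split an address into (tag, index, offset).

    m = number of index bits (2**m sets), k = number of offset bits
    (2**k bytes per block). The tag gets the remaining addr_bits - m - k bits.
    """
    if not 0 <= addr < (1 << addr_bits):
        raise ValueError("address does not fit in addr_bits")
    offset = addr & ((1 << k) - 1)          # lowest k bits
    index = (addr >> k) & ((1 << m) - 1)    # next m bits
    tag = addr >> (k + m)                   # everything above index and offset
    return tag, index, offset


tag, index, offset = split_address(0x12345678, m=8, k=6)
print(hex(tag), hex(index), hex(offset))
```

With m = 8 and k = 6 the tag is the upper 32 - 8 - 6 = 18 bits of the address.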
Cache Access • Steps: • Decode the address • Enable the word line • Precharge the bit lines high • Read the tag value from the tag array • Check for a tag match • Select the data output
Conventional Cache Organization Memory Cell
Memory Cell • Read: • Set bit and bit´ high • If the value in the cell is 1, then bit´ is discharged. If the value is 0, then bit is discharged • Write: • Set bit´ to 0. This forces 1 in the latch.
Various Components • Output Driver: at most one input bit is high; if the input is 0, the output is high-impedance • Comparator: XOR logic • Multiplexer hierarchy for the offset (inputs I0, I1, …, I7): first select the block (from the output driver), then the word, then the byte
Banking • Idea: Support multiple simultaneous cache accesses • Options: • Use multiported bit cells (high cost) • Divide the cache into independent banks
Cache Search • Steps: • Find Bank (bank index) • Find Set in Bank (index) • Check if data is valid and in the cache (tag match) • If all ok return data (block and byte offset), else check lower level memory
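The search steps above can be sketched as a small software model. This is a rough sketch under stated assumptions, not a description of any real hardware: the class `BankedCache`, its method names, and the choice to take the bank index from the low bits of the block address are all hypothetical, and the fill logic uses no real replacement policy.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: bytes = b""


class BankedCache:
    def __init__(self, banks=8, sets_per_bank=64, ways=8, block_bytes=128):
        self.banks = banks
        self.sets_per_bank = sets_per_bank
        self.block_bytes = block_bytes
        # store[bank][set] is a list of `ways` lines
        self.store = [[[Line() for _ in range(ways)]
                       for _ in range(sets_per_bank)]
                      for _ in range(banks)]

    def locate(self, addr):
        """Return (bank, set index, tag, byte offset) for an address."""
        offset = addr % self.block_bytes
        block = addr // self.block_bytes
        bank = block % self.banks                       # assumed: low index bits
        set_idx = (block // self.banks) % self.sets_per_bank
        tag = block // (self.banks * self.sets_per_bank)
        return bank, set_idx, tag, offset

    def fill(self, addr, block_data):
        """Install a block; reuse a matching way, else an invalid way, else way 0."""
        if len(block_data) != self.block_bytes:
            raise ValueError("block_data must be exactly one block")
        bank, set_idx, tag, _ = self.locate(addr)
        ways = self.store[bank][set_idx]
        victim = ways[0]
        for line in ways:
            if line.valid and line.tag == tag:
                victim = line
                break
        else:
            for line in ways:
                if not line.valid:
                    victim = line
                    break
        victim.valid, victim.tag, victim.data = True, tag, bytes(block_data)

    def read_byte(self, addr):
        """Steps 1-4: find bank, find set, match tag, select byte. None on a miss."""
        bank, set_idx, tag, offset = self.locate(addr)
        for line in self.store[bank][set_idx]:
            if line.valid and line.tag == tag:
                return line.data[offset]
        return None  # miss: the caller would go to the lower memory level


cache = BankedCache()
cache.fill(0x1234, bytes(range(128)))
print(cache.read_byte(0x1234), cache.read_byte(0x11234))
```

A miss returns `None` here; a real cache would forward the request to the next memory level instead.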
Case Study - Power 4 • Dual Core 64-bit Processors • 32KB L1 D-Cache (Per Processor) • 2-way associative • 128 Bytes Line • 64KB L1 I-Cache (Per Processor) • Direct Mapped • 128 Bytes Line (4 sectors x 32B) • ~1.5MB L2 Cache • 8-way set associative • 128 Bytes line
Power4 L2 Logical View • Cache split into 3 parts, 0.5 MB each • Controlled by 4 coherency processors • One 64 B store queue per processor
Power4 L2U • ~512 KB • 8 Banks • 128 B block size • 8-way associative [Figure: bank array layout showing bit lines, word lines, address bus, and decoders]
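Power4 • L2 cache block (one of the 3 parts) size C = 512 KB = $2^{19}$ B • Block size = 128 B = $2^{7}$ B • 8-way associative • 8 banks per cache block • Therefore: • Set size is $2^{3} \cdot 2^{7}$ B = $2^{10}$ B • Sets in cache: $2^{19} / 2^{10} = 2^{9}$ sets • Sets per bank: $2^{9} / 2^{3} = 2^{6}$ sets • Address (64-bit): tag | index (9 bits) | offset (7 bits); the 9-bit index splits into a 3-bit bank index and a 6-bit set index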
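The arithmetic on this slide can be checked with a short script. The helper names `log2_exact` and `cache_geometry` are hypothetical, and the tag width is derived here under the assumption that the full 64-bit address is used, which the slide does not state explicitly.

```python
def log2_exact(n):
    """Return log2(n) for a positive power of two, else raise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def cache_geometry(cache_bytes, block_bytes, ways, banks, addr_bits=64):
    set_bytes = ways * block_bytes        # 2^3 * 2^7 = 2^10 B
    sets = cache_bytes // set_bytes       # 2^19 / 2^10 = 2^9
    sets_per_bank = sets // banks         # 2^9 / 2^3 = 2^6
    offset_bits = log2_exact(block_bytes)
    index_bits = log2_exact(sets)
    return {
        "set_bytes": set_bytes,
        "sets": sets,
        "sets_per_bank": sets_per_bank,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "bank_index_bits": log2_exact(banks),
        "set_index_bits": log2_exact(sets_per_bank),
        # assumption: every remaining address bit belongs to the tag
        "tag_bits": addr_bits - index_bits - offset_bits,
    }


for name, value in cache_geometry(512 * 1024, 128, ways=8, banks=8).items():
    print(f"{name:16s} {value}")
```

The same function can be rerun with 16 banks to see that only the bank/set split of the index changes, not the total index width.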
Power4: CACTI Results

Run 1 (8 subbanks):
    cacti 524288 128 8 0.8um 8
    ---------- CACTI version 3.2 ----------
    Cache Parameters:
      Number of Subbanks: 8
      Total Cache Size: 524288
      Size in bytes of Subbank: 65536
      Number of sets: 64
      Associativity: 8
      Block Size (bytes): 128
      Read/Write Ports: 1
      Read Ports: 0
      Write Ports: 0
      Technology Size: 0.80um
      Vdd: 4.5V
    Access Time (ns): 12.3473
    Cycle Time (wave pipelined) (ns): 4.97337
    Total Power all Banks (nJ): 418.337
    Total Power Without Routing (nJ): 198.563
    Total Routing Power (nJ): 219.774
    Maximum Bank Power (nJ): 63.5175
    Best Ndwl (L1): 16
    Best Ndbl (L1): 1
    Best Nspd (L1): 1
    Best Ntwl (L1): 1
    Best Ntbl (L1): 1
    Best Ntspd (L1): 1
    Nor inputs (data): 2
    Nor inputs (tag): 2

Run 2 (16 subbanks):
    cacti 524288 128 8 0.8um 16
    ---------- CACTI version 3.2 ----------
    Cache Parameters:
      Number of Subbanks: 16
      Total Cache Size: 524288
      Size in bytes of Subbank: 32768
      Number of sets: 32
      Associativity: 8
      Block Size (bytes): 128
      Read/Write Ports: 1
      Read Ports: 0
      Write Ports: 0
      Technology Size: 0.80um
      Vdd: 4.5V
    Access Time (ns): 12.434
    Cycle Time (wave pipelined) (ns): 4.85483
    Total Power all Banks (nJ): 793.381
    Total Power Without Routing (nJ): 341.424
    Total Routing Power (nJ): 451.957
    Maximum Bank Power (nJ): 63.1382
    Best Ndwl (L1): 16
    Best Ndbl (L1): 1
    Best Nspd (L1): 1
    Best Ntwl (L1): 1
    Best Ntbl (L1): 1
    Best Ntspd (L1): 1
    Nor inputs (data): 2
    Nor inputs (tag): 2
CACTI • Data Array • Ndwl: Word line split factor • Ndbl: Bit line split factor • Nspd: Number of sets mapped to a single word line (sectors) • Tag Array • Ntwl: Word line split factor • Ntbl: Bit line split factor • Ntspd: Number of sets mapped to a single word line (sectors) • Increasing Ndbl, Nspd, Ntbl, or Ntspd requires more sense amplifiers • Increasing Ndwl or Ntwl increases the number of word line drivers