Cache Physical Implementation


  1. Cache Physical Implementation. Panayiotis Charalambous, Xi Research Group

  2. Contents
     • Cache Logical View
     • Physical View
     • Case Study – Power 4 L2 Cache

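  3. Logical Cache Structure
     • n-way associative cache: n elements per set, 2^m sets
     • Address (32 bits) is split into Tag (32 - m - k bits), Index (m bits), and Offset (k bits)
     • [Figure: the index selects a set, the stored tags are compared (=) with the address tag, the comparator outputs are ORed into Hit, and the offset selects the Data output]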
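The tag/index/offset split on this slide can be sketched in Python as follows. This is a minimal illustration, not code from the presentation: the function name `split_address` and the example values of m and k are assumptions.

```python
def split_address(addr: int, m: int, k: int, addr_bits: int = 32):
    """Split an address into (tag, index, offset).

    m: number of index bits (the cache has 2**m sets)
    k: number of offset bits (a block holds 2**k bytes)
    The tag is whatever remains: addr_bits - m - k high-order bits.
    """
    if not 0 <= addr < (1 << addr_bits):
        raise ValueError("address does not fit in addr_bits")
    offset = addr & ((1 << k) - 1)          # low k bits
    index = (addr >> k) & ((1 << m) - 1)    # next m bits
    tag = addr >> (k + m)                   # remaining high bits
    return tag, index, offset


# Hypothetical geometry: 128 sets (m = 7) and 32-byte blocks (k = 5).
tag, index, offset = split_address(0x12345678, m=7, k=5)
print(hex(tag), index, offset)
```

With m = 7 and k = 5 the tag is 32 - 7 - 5 = 20 bits wide, which is why it prints as five hex digits.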

  4. Cache Structure

  5. Cache Access
     • Steps
       • Decode address
       • Enable the word line
       • Raise the bit lines to high
       • Get the tag value from the tag array
       • Check for tag match
       • Select data output

  6. Conventional Cache Organization Memory Cell

  7. Memory Cell
     • Read:
       • Set bit and bit´ high
       • If the value in the cell is 1, then bit´ is discharged. If the value is 0, then bit is discharged
     • Write:
       • Set bit´ to 0. This forces 1 in the latch.
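The read and write behaviour described on this slide can be mimicked with a small behavioural model. It is a sketch only: the class name `SramCell` is hypothetical, the model ignores all electrical detail, and writing a 0 by pulling bit low is an assumption made by symmetry (the slide only describes writing a 1).

```python
class SramCell:
    """Behavioural model of one memory cell with bit / bit' lines."""

    def __init__(self, value: int = 0):
        self.value = value

    def read(self):
        # Both bit lines start high (precharged).
        bit, bit_bar = 1, 1
        # The cell then discharges exactly one of them.
        if self.value == 1:
            bit_bar = 0      # stored 1: bit' is discharged
        else:
            bit = 0          # stored 0: bit is discharged
        return bit, bit_bar

    def write(self, bit: int, bit_bar: int):
        if (bit, bit_bar) == (1, 0):
            self.value = 1   # bit' driven to 0 forces a 1 into the latch
        elif (bit, bit_bar) == (0, 1):
            self.value = 0   # assumed symmetric case: bit driven to 0
        else:
            raise ValueError("exactly one bit line must be driven low")


cell = SramCell()
cell.write(1, 0)
print(cell.read())
```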

  8. Decoder with Driver

  9. Various Components
     • Output driver
       • At most one input bit is high
       • If the input is 0, the output is high-impedance
     • Comparator is XOR logic
     • Multiplexer hierarchy for the offset: first get the block (from the output driver), then the word, then the byte
     • [Figure labels: I0, I1, …, I7]
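A rough software analogue of the comparator and the multiplexer hierarchy is sketched below. The function names and the 4-byte word size are illustrative assumptions; the slide does not give a word size.

```python
def tags_match(stored_tag: int, address_tag: int) -> bool:
    # XOR produces a 1 in every bit position where the two tags differ,
    # so the tags are equal exactly when the XOR result is all zeros.
    return (stored_tag ^ address_tag) == 0


def select_byte(blocks, way: int, offset: int, word_bytes: int = 4) -> int:
    """Pick the block first, then the word, then the byte."""
    block = blocks[way]                                   # block level
    word_index, byte_index = divmod(offset, word_bytes)
    start = word_index * word_bytes
    word = block[start:start + word_bytes]                # word level
    return word[byte_index]                               # byte level


# Two hypothetical 16-byte blocks holding the values 0..15 and 16..31.
blocks = [bytes(range(16)), bytes(range(16, 32))]
print(tags_match(0x1A2B, 0x1A2B), select_byte(blocks, way=1, offset=6))
```

Selecting in three stages gives the same byte as indexing the block directly; the staged form only mirrors the order in which the hardware narrows the selection.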

  10. Banking
     • Idea: support multiple simultaneous cache accesses
     • Solutions:
       • Use multiporting on the bit cells (high cost)
       • Divide the cache into independent banks
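To illustrate why independent banks allow several accesses at once, the sketch below groups pending accesses into cycles: accesses that fall in different banks share a cycle, while accesses to the same bank are serialized. Choosing the bank from the low-order bits of the block address is an assumption, as are the function name and the default sizes.

```python
from collections import defaultdict


def schedule_accesses(addresses, num_banks: int = 8, block_bytes: int = 128):
    """Return a list of cycles; each cycle lists the addresses served together."""
    queues = defaultdict(list)
    for addr in addresses:
        # Assumed mapping: consecutive blocks go to consecutive banks.
        bank = (addr // block_bytes) % num_banks
        queues[bank].append(addr)

    cycles = []
    while any(queues.values()):
        # Each bank can serve one access per cycle.
        cycles.append([q.pop(0) for q in queues.values() if q])
    return cycles


# Addresses 0 and 1024 both map to bank 0, so they cannot share a cycle.
print(schedule_accesses([0, 128, 1024, 256]))
```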

  11. Cache Search
     • Steps:
       • Find bank (bank index)
       • Find set in bank (index)
       • Check if data is valid and in the cache (tag match)
       • If all ok, return data (block and byte offset), else check lower level memory
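The search steps above can be sketched as a small functional model. Everything named here (`BankedCache`, the field order within the address, FIFO replacement) is an assumption chosen for brevity; the slides do not specify a replacement policy or which address bits select the bank.

```python
class BankedCache:
    def __init__(self, bank_bits: int, set_bits: int, offset_bits: int, ways: int):
        self.bank_bits, self.set_bits = bank_bits, set_bits
        self.offset_bits, self.ways = offset_bits, ways
        # banks[bank][set] is a list of (tag, block) pairs, at most `ways` long.
        # An entry being present stands in for the valid bit.
        self.banks = [[[] for _ in range(1 << set_bits)]
                      for _ in range(1 << bank_bits)]

    def _decode(self, addr: int):
        offset = addr & ((1 << self.offset_bits) - 1)
        addr >>= self.offset_bits
        bank = addr & ((1 << self.bank_bits) - 1)       # step 1: find bank
        addr >>= self.bank_bits
        set_index = addr & ((1 << self.set_bits) - 1)   # step 2: find set
        tag = addr >> self.set_bits
        return tag, bank, set_index, offset

    def fill(self, addr: int, block: bytes):
        tag, bank, set_index, _ = self._decode(addr)
        entries = self.banks[bank][set_index]
        entries[:] = [e for e in entries if e[0] != tag]  # drop a stale copy
        if len(entries) == self.ways:
            entries.pop(0)                                # FIFO eviction (assumed)
        entries.append((tag, block))

    def lookup(self, addr: int):
        tag, bank, set_index, offset = self._decode(addr)
        for stored_tag, block in self.banks[bank][set_index]:
            if stored_tag == tag:                         # step 3: tag match
                return block[offset]                      # step 4: return data
        return None  # miss: the caller would check the lower level memory


cache = BankedCache(bank_bits=3, set_bits=6, offset_bits=7, ways=8)
cache.fill(0x1234580, bytes(range(128)))
print(cache.lookup(0x1234585), cache.lookup(0x9999980))
```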

  12. Case Study - Power 4
     • Dual-core 64-bit processors
     • 32 KB L1 D-cache (per processor)
       • 2-way associative
       • 128-byte line
     • 64 KB L1 I-cache (per processor)
       • Direct mapped
       • 128-byte line (4 sectors x 32 B)
     • ~1.5 MB L2 cache
       • 8-way set associative
       • 128-byte line

  13. Power4 Floorplan

  14. Power4 L2 Logical View
     • Cache split into 3 parts, 0.5 MB each
     • Controlled by 4 coherency processors
     • One 64 B store queue per processor

  15. Power4 L2U
     • ~512 KB
     • 8 banks
     • 128 B block size
     • 8-way associative
     • [Figure labels: bit lines, word lines, address bus, decoders]

  16. Power4
     • L2 cache block (one L2 part) size: C = 512 KB = 2^19 B
     • Block size = 128 B = 2^7 B
     • 8-way associative
     • 8 banks per cache block
     • Therefore:
       • Set size is 2^3 * 2^7 B = 2^10 B
       • Sets in the cache: 2^19 / 2^10 = 2^9 sets
       • Sets per bank: 2^9 / 2^3 = 2^6 sets
     • 64-bit address: tag | index (9 bits) | offset (7 bits); the 9-bit index is split into a 3-bit bank index and a 6-bit set index
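The arithmetic on this slide can be checked with a few lines of Python. The variable names are illustrative, and the tag width is derived here as whatever is left of the 64-bit address; the slide itself does not state it.

```python
cache_bytes = 512 * 1024   # one L2 part: 2^19 B
block_bytes = 128          # 2^7 B
ways = 8                   # 8-way associative
banks = 8                  # 8 banks per part
address_bits = 64

set_bytes = ways * block_bytes            # bytes held by one set
total_sets = cache_bytes // set_bytes     # sets in the whole part
sets_per_bank = total_sets // banks       # sets in one bank


def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two, without floating point."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


offset_bits = log2_exact(block_bytes)
index_bits = log2_exact(total_sets)
bank_bits = log2_exact(banks)
set_bits = log2_exact(sets_per_bank)
tag_bits = address_bits - index_bits - offset_bits  # derived, not on the slide

print(set_bytes, total_sets, sets_per_bank)
print(tag_bits, index_bits, offset_bits, bank_bits, set_bits)
```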

  17. Power4: CACTI Results

     cacti 524288 128 8 0.8um 8
     ---------- CACTI version 3.2 ----------
     Cache Parameters:
     Number of Subbanks: 8
     Total Cache Size: 524288
     Size in bytes of Subbank: 65536
     Number of sets: 64
     Associativity: 8
     Block Size (bytes): 128
     Read/Write Ports: 1
     Read Ports: 0
     Write Ports: 0
     Technology Size: 0.80um
     Vdd: 4.5V
     Access Time (ns): 12.3473
     Cycle Time (wave pipelined) (ns): 4.97337
     Total Power all Banks (nJ): 418.337
     Total Power Without Routing (nJ): 198.563
     Total Routing Power (nJ): 219.774
     Maximum Bank Power (nJ): 63.5175
     Best Ndwl (L1): 16
     Best Ndbl (L1): 1
     Best Nspd (L1): 1
     Best Ntwl (L1): 1
     Best Ntbl (L1): 1
     Best Ntspd (L1): 1
     Nor inputs (data): 2
     Nor inputs (tag): 2

     cacti 524288 128 8 0.8um 16
     ---------- CACTI version 3.2 ----------
     Cache Parameters:
     Number of Subbanks: 16
     Total Cache Size: 524288
     Size in bytes of Subbank: 32768
     Number of sets: 32
     Associativity: 8
     Block Size (bytes): 128
     Read/Write Ports: 1
     Read Ports: 0
     Write Ports: 0
     Technology Size: 0.80um
     Vdd: 4.5V
     Access Time (ns): 12.434
     Cycle Time (wave pipelined) (ns): 4.85483
     Total Power all Banks (nJ): 793.381
     Total Power Without Routing (nJ): 341.424
     Total Routing Power (nJ): 451.957
     Maximum Bank Power (nJ): 63.1382
     Best Ndwl (L1): 16
     Best Ndbl (L1): 1
     Best Nspd (L1): 1
     Best Ntwl (L1): 1
     Best Ntbl (L1): 1
     Best Ntspd (L1): 1
     Nor inputs (data): 2
     Nor inputs (tag): 2
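The two runs are easier to compare with a little arithmetic. The sketch below copies the figures from the CACTI output on this slide into a dictionary (the key names are made up for this example) and derives the routing share and the 16-versus-8 subbank ratios.

```python
# Values copied from the two CACTI 3.2 runs on the slide.
runs = {
    8:  {"access_ns": 12.3473, "cycle_ns": 4.97337,
         "total_nj": 418.337, "routing_nj": 219.774},
    16: {"access_ns": 12.434, "cycle_ns": 4.85483,
         "total_nj": 793.381, "routing_nj": 451.957},
}

for subbanks, r in runs.items():
    routing_share = r["routing_nj"] / r["total_nj"]
    print(f"{subbanks:2d} subbanks: routing is {routing_share:.1%} of the total")

# How the 16-subbank run compares with the 8-subbank run.
energy_ratio = runs[16]["total_nj"] / runs[8]["total_nj"]
access_ratio = runs[16]["access_ns"] / runs[8]["access_ns"]
print(f"total: x{energy_ratio:.2f}, access time: x{access_ratio:.3f}")
```

In these two runs, doubling the subbank count nearly doubles the reported total while the access time changes by less than one percent, and routing accounts for more than half of the total in both cases.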

  18. CACTI
     • Data Array
       • Ndwl: Word line split factor
       • Ndbl: Bit line split factor
       • Nspd: Number of sets mapped to a single word line (sectors)
     • Tag Array
       • Ntwl: Word line split factor
       • Ntbl: Bit line split factor
       • Ntspd: Number of sets mapped to a single word line (sectors)
     • Increasing Ndbl, Nspd, Ntbl, or Ntspd requires more sense amplifiers
     • Increasing Ndwl or Ntwl increases the number of word line drivers
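As a rough illustration of what the data-array parameters do, the sketch below computes the shape of one data subarray. The row and column formulas are the ones commonly quoted from the CACTI technical reports and are an assumption here, not something stated on the slides; the function name is also hypothetical, and applying the formulas to a single subbank is a further assumption.

```python
def data_subarray_shape(cache_bytes: int, block_bytes: int, ways: int,
                        ndwl: int, ndbl: int, nspd: int):
    """Return (rows, columns_in_bits) of one data subarray.

    Assumed formulas:
      rows    = C / (B * A * Ndbl * Nspd)
      columns = 8 * B * A * Nspd / Ndwl
    """
    rows = cache_bytes // (block_bytes * ways * ndbl * nspd)
    cols = (8 * block_bytes * ways * nspd) // ndwl
    return rows, cols


# One 65536-byte subbank from the 8-subbank run, with the reported
# best values Ndwl = 16, Ndbl = 1, Nspd = 1.
print(data_subarray_shape(65536, 128, 8, ndwl=16, ndbl=1, nspd=1))
```

Under these assumptions, raising Ndwl shortens each word line (fewer columns per subarray), raising Ndbl shortens each bit line (fewer rows), and raising Nspd trades rows for columns.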

  19. Thank You
