TLC: Transmission Line Caches
Brad Beckmann, David Wood
Multifacet Project
http://www.cs.wisc.edu/multifacet/
University of Wisconsin-Madison
12/3/03

Overview
• Problem: Global interconnect
• Opportunity: On-chip transmission lines
  • What are they?
  • Why now?
• Application: Large on-chip caches
• Solution: TLC: Transmission Line Caches
  • Consistent high performance
  • Simple logical design
  • Less substrate area
  • Circuit verification
  • Wafer manufacturing cost
MICRO ’03 - TLC: Transmission Line Caches

Outline
• Problem: Global interconnect
• Opportunity: On-chip transmission lines
• Application: Large on-chip caches
• Solution: TLC: Transmission Line Caches
• Evaluation
• Conclusions

Global Interconnect Problem
• Global interconnect latency → Bottleneck
  • RC delay dominant
  • Held constant using repeaters
  • Doesn’t scale with transistors
• Large structures particularly hurt
  • Partitioning mitigates intra-partition delay
  • Performance dominated by inter-partition delay

Conventional Solution
• ↑ wire size → ↓ RC delay
  • 3x size → 3x reduced delay
  • ↑ wire segment length
  • 3x channel area
• Doesn’t scale
  • Intrinsic repeater delay
  • Inductive effects
A Better Solution?

RC vs. TL Communication
[Figure: two panels, "Conventional Global RC Wire" and "On-chip Transmission Line"; each plots voltage against distance between driver and receiver, with Vt marked]

RC Wire vs. TL Design
• Conventional Global RC Wire: ~0.375 mm, RC delay dominated
• On-chip Transmission Line: ~10 mm, LC delay dominated
[Figure labels: Driver, Receiver]

On-chip Transmission Lines
• Why now? → 2010 technology
  • Relative RC delay ↑
  • Improve latency by 10x or more
• What are their limitations?
  • Require thick wires and dielectric spacing
  • Increase wafer cost
Presents a different Latency/Bandwidth Tradeoff

Latency Comparison

Bandwidth Comparison
[Figure labels: 2 transmission line signals; 50 conventional signals]
• Key observation
  • Transmission lines: route over large structures
  • Conventional wires: substrate area & vias for repeaters

Texas Non-uniform Cache Architectures (NUCA)
[Figure labels: Bank; Switch; Cache Controller; Request 0x….3; Request 0x….C]
SNUCA: statically partitions addresses across the banks

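The static partitioning in SNUCA can be pictured as a fixed address-to-bank function. The sketch below is a minimal illustration, not the NUCA authors' implementation: the bank count, the block size, and the choice of low-order block-address bits as the bank index are all assumptions made for the example, and `snuca_bank` is a hypothetical name.

```python
# Hypothetical parameters; none of these values come from the slides.
NUM_BANKS = 16      # assumed number of banks
BLOCK_BYTES = 64    # assumed cache block size in bytes


def snuca_bank(address: int) -> int:
    """Return the bank that statically owns the block containing `address`."""
    block_address = address // BLOCK_BYTES
    return block_address % NUM_BANKS


# Addresses in the same block share a bank; consecutive blocks are
# spread round-robin across the banks.
for addr in (0x0000, 0x003F, 0x0040, 0x0400):
    print(f"address {addr:#06x} -> bank {snuca_bank(addr)}")
```

Because the mapping depends only on the address, the controller never has to search for a block, but a block's access latency is fixed by whichever bank it happens to map to.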
Texas DNUCA Solution
• Issues with DNUCA
  • Locating cache blocks
  • Power consumed accessing distant banks
  • 15% of total area devoted to routing channels
[Figure labels: A, B]
Frequently requested blocks migrate towards the cache controller

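The migration behaviour can be illustrated with a toy model. This is a sketch under stated assumptions, not the DNUCA mechanism itself: one cache set is assumed to hold one block per bank, banks are ordered by distance from the controller, and a hit swaps the block with the one in the next-closer bank. The class and method names are hypothetical.

```python
class ToyDnucaSet:
    """One cache set spread over banks; index 0 is closest to the controller."""

    def __init__(self, blocks):
        # blocks[i] is the tag held in bank i (assumed: one way per bank).
        self.banks = list(blocks)

    def access(self, tag):
        """Return the bank index that hit, or None on a miss.

        On a hit the block moves one bank closer (assumed promotion policy).
        """
        if tag not in self.banks:
            return None
        i = self.banks.index(tag)
        if i > 0:
            self.banks[i - 1], self.banks[i] = self.banks[i], self.banks[i - 1]
        return i


s = ToyDnucaSet(["A", "B", "C", "D"])
for _ in range(3):
    s.access("D")   # repeated hits pull D towards the controller
print(s.banks)
```

Even in this toy form the lookup has to consider every bank, which mirrors the block-location issue listed on the slide.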
TLC - Transmission Line Cache
[Figure labels: TL Drivers & Receivers; TL link 2x8 bytes; TLC Cache Controller; 512 KB Bank]
High bandwidth, low latency interface between the controller and banks

TLC Cache Controller
[Figure labels: Transmission Lines; Transmission Line Transceivers; Latches; Multi-cycle delay; Repeaters; Central Cache Controller Logic]

Methodology
• Assumptions
  • ITRS projection for 2010
  • 45 nm technology
  • Low-k (2.1) intermetal dielectric
  • 10 GHz operational frequency
• Physical Evaluation
  • Linpar RLC extractor
  • Hspice W element transmission line
• Performance Evaluation
  • Full system simulation
  • Simics extended with an Out-of-Order processor and memory system timing models

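As a rough sanity check on these assumptions, the time of flight over a transmission line can be estimated from the dielectric constant alone. The sketch below assumes an ideal lossless line whose signal velocity is c/sqrt(εr), and uses the ~10 mm line length quoted on the RC Wire vs. TL Design slide. It ignores driver, receiver, and loss effects, so it is a lower bound for illustration rather than a result from the talk; `time_of_flight` is a hypothetical helper.

```python
import math

C_VACUUM = 299_792_458.0   # speed of light in vacuum, m/s


def time_of_flight(length_m: float, rel_permittivity: float) -> float:
    """Ideal lossless-line propagation delay in seconds."""
    velocity = C_VACUUM / math.sqrt(rel_permittivity)
    return length_m / velocity


delay = time_of_flight(10e-3, 2.1)   # ~10 mm line, low-k dielectric of 2.1
cycle = 1.0 / 10e9                   # clock period at 10 GHz
print(f"time of flight: {delay * 1e12:.1f} ps")
print(f"fraction of a 10 GHz cycle: {delay / cycle:.2f}")
```

Under these idealized assumptions the wire delay alone comes to roughly 48 ps, about half of a 100 ps cycle.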
Cache Characteristics
• Exclusive write-back caches
• 4-wide, 30-stage pipeline, OoO processor
• 300-cycle memory latency

Performance
[Figure panels: SpecINT, SpecFP, Commercial]

Substrate Area
[Figure annotation: 18% reduction]
• On-chip transmission lines allow direct routing from the driver to receiver without repeaters
  • Facilitates compact layout
  • Devotes less substrate area to the routing channels

Link Utilization

Optimized TLC Designs
• Utilize fewer transmission lines
  • Base design: requires 2k transmission lines
  • Opt designs: require 1k, 500, & 350
• Reduce manufacturing cost
• Increase logic complexity

Link Utilization (TLC Family)

Performance (TLC Family)

Conclusions 1
• Transmission lines offer a different latency/bandwidth tradeoff
• Advantages
  • Lower latency for global links
  • Direct routing over large structures
• Limitations
  • Large, sparsely populated metal layers
  • Greater circuit verification effort

Conclusions 2
• Possible application: TLC
• Advantages
  • Consistent high performance
  • Simpler logical design
  • 18% less substrate area
  • Less power in the communication network
• Disadvantages
  • Circuit verification
  • Wafer cost

Other Applications?

Optimized TLC Designs
• TLCopt 1000
  • Blocks are partitioned across 2 banks
  • Each transmission line link is 126 bits wide
  • 1008 total data TLs
• TLCopt 500
  • Blocks are partitioned across 4 banks
  • Each transmission line link is 64 bits wide
  • 512 total data TLs
• TLCopt 350
  • Blocks are partitioned across 8 banks
  • Each transmission line link is 44 bits wide
  • 352 total data TLs
[Figure labels: TL link 2x126 bits; TL link 2x64 bits; TL link 2x44 bits; 1 MB Bank; TLCopt Cache Controller]

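The totals on this slide are consistent with a fixed number of links per design. The short check below divides each design's total data transmission lines by its stated link width, assuming one transmission line per data bit. The resulting link count is an inference from the slide's numbers, not a figure stated in the talk.

```python
# (link width in bits, total data transmission lines) as listed on the slide.
designs = {
    "TLCopt 1000": (126, 1008),
    "TLCopt 500": (64, 512),
    "TLCopt 350": (44, 352),
}

link_counts = {}
for name, (width_bits, total_tls) in designs.items():
    # Assumption: one transmission line carries one data bit.
    links, remainder = divmod(total_tls, width_bits)
    assert remainder == 0, "total is not a whole number of links"
    link_counts[name] = links
    print(f"{name}: {total_tls} TLs / {width_bits} bits per link = {links} links")
```

Each design works out to eight links of the stated width, so the reduction in transmission lines comes from narrowing the links rather than from removing them.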
Equake Performance

Additional Transceiver Delay