
Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power

Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power. Steve Dropsho, Alper Buyuktosunoglu, Rajeev Balasubramonian, David H. Albonesi, Sandhya Dwarkadas, Greg Semeraro, Grigorios Magklis, and Michael Scott. ECE and CS Departments, University of Rochester.



Presentation Transcript


  1. Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power • Steve Dropsho, Alper Buyuktosunoglu, Rajeev Balasubramonian, David H. Albonesi, Sandhya Dwarkadas, Greg Semeraro, Grigorios Magklis, and Michael Scott • ECE and CS Departments, University of Rochester

  2. Why Adaptive Structures? • General-purpose microprocessors (µPs) are “one size fits all” • But needs vary across, and within, applications • Can save considerable energy by matching resources to the application • Objective: less energy for the same performance by adapting storage structures to the application

  3. Related Work • Adaptable cache • Balasubramonian et al., MICRO 2000 • Dhodapkar and Smith, ISCA 2002 • Adaptable issue logic • Buyuktosunoglu et al., GLS VLSI 2001 • Folegnani and Gonzalez, ISCA 2000

  4. Common Themes • A single adaptive structure • Use of global information for feedback • Exploration-based (caches)

  5. Related Work (cont) • Adaptable IQ, LSQ, and ROB • Ponomarev et al., MICRO 2001 • Three (3) adaptable structures • Reconfigurations based on local state

  6. Integrating Multiple Adaptive Structures [Figure: processor block diagram. Labeled blocks: L1 Icache, branch predictor, FetchQ, rename map, ROB; Integer: IIQ, IPREG, Int FUs; Floating Pt: FPQ, FPREG, FP FUs; Memory: LSQ, L1 Dcache, L2 unified cache.]

  7. Challenges • Multiple (9) adaptive structures create a state-explosion problem • Use of global information makes assigning cause and effect difficult • Potential for additive performance effects among the structures

  8. Approach: Local Management • Local information for configuration decisions • Tight control over performance variance

  9. Part I: The Caches [Figure: processor block diagram repeated from slide 6.]

  10. The Accounting Cache • Sequential accesses, A (primary) then B (secondary) • Save energy on A access hit • Swap blocks on A access miss [Figure: one 4-way set (ways 0 to 3) split into an A partition and a B partition, shown for the configurations A1/B3, A2/B2, A3/B1, and A4/B0, with a swap between the A and B partitions.]

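  11. Most-Recently-Used Statistics • MRU state: the recency order of the lines (A, B, C, D) held in the four ways of a set • MRU state transitions: each access moves the accessed line to the most recent position • MRU state counters: MRU[0] = 3, MRU[1] = 2, MRU[2] = 1, MRU[3] = 0, Misses = 0 [Figure: example accesses stepping the MRU state and incrementing the counters.]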
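The counters on this slide can be sketched in a few lines. This is a minimal illustration, assuming a single set with true LRU ordering and one counter per recency position; the function name `count_mru_hits` and the example access trace are hypothetical and do not come from the presentation.

```python
def count_mru_hits(accesses, num_ways=4):
    """Return (mru_counts, misses) for one cache set.

    mru_counts[i] is the number of hits that found their block at
    recency position i (0 = most recently used) at the time of the access.
    """
    stack = []                      # tags in recency order, most recent first
    mru_counts = [0] * num_ways
    misses = 0
    for tag in accesses:
        if tag in stack:
            pos = stack.index(tag)  # recency position of this hit
            mru_counts[pos] += 1
            stack.pop(pos)
        else:
            misses += 1
            if len(stack) == num_ways:
                stack.pop()         # evict the least recently used tag
        stack.insert(0, tag)        # the accessed block becomes most recent
    return mru_counts, misses


# Hypothetical trace on an initially empty 4-way set.
counts, misses = count_mru_hits(list("AABABCA"))
print(counts, misses)
```

Because a hit at recency position i would have been served by any primary (A) partition with more than i ways, one pass over these counters is enough to estimate the behavior of every A/B split without exploring each configuration in turn.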

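  12. Configuration Evaluation • Counters for the interval, from most recent (mru) to least recent (lru): MRU[0] = 3, MRU[1] = 2, MRU[2] = 1, MRU[3] = 0, Misses = 0 (6 accesses) • A = 1 way, B = 3 ways: Delay = 6 DA + 3 DB, Energy = 6 E1 + 3 E3 • A = 2 ways, B = 2 ways: Delay = 6 DA + 1 DB, Energy = 7 E2 • A = 3 ways, B = 1 way: Delay = 6 DA, Energy = 6 E3 • A = 4 ways, no B (BASE): Delay = 6 DA, Energy = 6 E4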
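The per-configuration arithmetic on this slide can be reproduced with a short sketch. It assumes that every access probes the A partition, that every access not satisfied in A also probes B, and that the energy of a probe depends only on how many ways it touches. The function name and the numeric delay and energy values are hypothetical placeholders chosen for illustration.

```python
def evaluate_configs(mru_counts, misses, d_a, d_b, way_energy):
    """Estimate (delay, energy) for every A/B split of one interval.

    way_energy[k] is the assumed energy of probing k ways at once.
    Returns a dict mapping the number of A ways to (delay, energy).
    """
    num_ways = len(mru_counts)
    total = sum(mru_counts) + misses
    results = {}
    for a_ways in range(1, num_ways + 1):
        b_ways = num_ways - a_ways
        a_hits = sum(mru_counts[:a_ways])        # hits served by the A probe
        b_probes = (total - a_hits) if b_ways else 0
        delay = total * d_a + b_probes * d_b
        energy = total * way_energy[a_ways]
        if b_ways:
            energy += b_probes * way_energy[b_ways]
        results[a_ways] = (delay, energy)
    return results


# Counters from the slide; delays and energies are made-up integers.
results = evaluate_configs(
    mru_counts=[3, 2, 1, 0], misses=0,
    d_a=2, d_b=3,
    way_energy={1: 10, 2: 18, 3: 25, 4: 32},
)
for a_ways, (delay, energy) in results.items():
    print(a_ways, delay, energy)
```

With these placeholder values the two-way A partition has the lowest energy but a longer delay than BASE, which is the kind of trade-off the tolerance setting is meant to arbitrate.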

  13. Tolerance and the Bank Account • Tolerance allows more delay than BASE • DTOL = DBASE (1 + TOL) • TOL is one of 1/64, 1/16, or 1/4 (about 1.5%, 6.2%, or 25%) • Bank account allows accumulation of unused tolerance • Use account credits in later intervals • Allows aggressive resizing • Amortizes mistakes over many intervals
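One way to read the tolerance and bank-account bullets is as a per-interval delay budget. The sketch below is an assumption-laden illustration rather than the authors' exact policy: it picks the lowest-energy configuration whose estimated delay fits within the tolerated delay plus any saved credit, then carries the unused part of the budget forward. The function name `choose_config` and the example numbers are hypothetical.

```python
def choose_config(results, base_delay, tol, account):
    """Pick the lowest-energy configuration that fits the delay budget.

    results maps a configuration id to (delay, energy) and must include
    the BASE configuration, whose delay equals base_delay.
    Returns (chosen configuration id, updated account).
    """
    budget = base_delay * (1 + tol) + account    # tolerated delay plus credit
    feasible = {k: v for k, v in results.items() if v[0] <= budget}
    # Lowest energy first; break ties in favor of the shorter delay.
    best = min(feasible, key=lambda k: (feasible[k][1], feasible[k][0]))
    new_account = budget - feasible[best][0]     # unused delay becomes credit
    return best, new_account


# Hypothetical (delay, energy) estimates keyed by number of A ways.
results = {1: (21, 135), 2: (15, 126), 3: (12, 150), 4: (12, 192)}

print(choose_config(results, base_delay=12, tol=1 / 4, account=0.0))
print(choose_config(results, base_delay=12, tol=1 / 16, account=0.0))
print(choose_config(results, base_delay=12, tol=1 / 16, account=3.0))
```

In the third call, credit saved in earlier intervals lets the tight 1/16 tolerance select the same low-energy configuration that the 1/4 tolerance reaches with no credit, which illustrates how accumulated credit permits more aggressive resizing.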

  14. Memory Hierarchy: One Possible Configuration [Figure: three 4-way caches (ways 0 to 3): L1 I-Cache (A/B), L1 D-Cache (A, no B), and L2 Unified Cache (A/B).]

  15. Environment • SimpleScalar simulator • Microarchitecture is similar to the Alpha 21264 • Benchmarks are a mix of SPEC95, SPEC2K, and Olden • Energy models for buffers and caches from Buyuktosunoglu et al., GLS VLSI 2001 and Balasubramonian et al., MICRO 2000

  16. Cache Results

  17. Part II: Queues, Regs, and ROB [Figure: processor block diagram repeated from slide 6.]

  18. Resizable Queues/Reg File [Figure: a buffer built from N partitions, P1 through PN, of m elements each.]

  19. Buffer Sizing With Limited Histogramming • 8K-cycle period • Tolerances: 1.5% (1/64), 6.2% (1/16), 25.0% (1/4) [Figure: three distributions of buffer size, each plotted from 0 to Full with the average (ave) marked, labeled "Grow buffer", "Proper size", and "Precise shrink".]
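The slide does not spell out the sizing rule, so the following is only one plausible reading, sketched under stated assumptions: occupancy is sampled once per cycle over a period, the buffer grows by one partition when it sits full for more than the tolerated fraction of the period, and otherwise it shrinks to the smallest partition count whose overflow cycles stay within tolerance. The function name `next_size`, its parameters, and the short sample lists (standing in for an 8K-cycle period) are hypothetical.

```python
import math


def next_size(occupancy_samples, cur_parts, part_elems, max_parts, tol):
    """Return the number of partitions to enable for the next period."""
    allowed = tol * len(occupancy_samples)       # cycles we may sacrifice
    capacity = cur_parts * part_elems

    # Grow: the buffer was full for more of the period than tolerated.
    full_cycles = sum(1 for occ in occupancy_samples if occ >= capacity)
    if full_cycles > allowed and cur_parts < max_parts:
        return cur_parts + 1

    # Histogram of how many partitions each cycle actually needed.
    hist = [0] * (cur_parts + 1)
    for occ in occupancy_samples:
        needed = max(1, math.ceil(occ / part_elems))
        hist[min(needed, cur_parts)] += 1

    # Shrink: drop partitions while the cycles that would overflow
    # the smaller buffer stay within the allowed count.
    size = cur_parts
    overflow = 0
    for parts in range(cur_parts, 1, -1):
        overflow += hist[parts]                  # cycles needing >= parts
        if overflow > allowed:
            break
        size = parts - 1
    return size


# Hypothetical 8-sample periods with 4-element partitions.
print(next_size([4, 4, 4, 1, 2, 4, 4, 3], 1, 4, 4, 1 / 4))   # buffer often full
print(next_size([5, 6, 3, 7, 5, 6, 2, 5], 2, 4, 4, 1 / 4))   # already well sized
print(next_size([2, 3, 1, 4, 2, 3, 5, 2], 4, 4, 4, 1 / 4))   # mostly empty
```

The three calls correspond loosely to the "Grow buffer", "Proper size", and "Precise shrink" panels: a histogram lets the shrink case jump directly to a suitable size instead of stepping down one partition per period.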

  20. Resizing the Register File • Issue: the hardware does not know when registers expire • Solution: to make the register file smaller, move values out of the partition (P) to be turned off • First, inhibit new assignments to P • Next, use a software interrupt routine to move values via normal rename logic (for example, mov r1 r1) • Register mappings are updated automatically
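The two-step shrink can be illustrated with a toy rename model. This is a simplified sketch, not the hardware mechanism: it assumes only architectural mappings are live when the routine runs, models the rename map as a dictionary and the free list as a Python list, and treats each self-move as "allocate a new physical register outside P, copy the value, update the mapping". All names and register numbers are hypothetical.

```python
def shrink_register_file(rename_map, free_list, values, partition):
    """Vacate the physical registers in `partition` so it can be turned off.

    rename_map: architectural name -> physical register number
    free_list:  physical registers available for allocation (edited in place)
    values:     physical register number -> stored value (edited in place)
    partition:  set of physical registers to be powered down
    """
    # Step 1: inhibit new assignments to the partition.
    free_list[:] = [p for p in free_list if p not in partition]

    # Step 2: the interrupt routine issues a self-move for every
    # architectural register whose value still lives in the partition.
    for arch, old_phys in list(rename_map.items()):
        if old_phys in partition:
            if not free_list:
                raise RuntimeError("no free register outside the partition")
            new_phys = free_list.pop(0)          # rename allocates a new home
            values[new_phys] = values[old_phys]  # effect of the self-move
            rename_map[arch] = new_phys          # mapping updated by rename
            # old_phys is not returned to the free list: it is being shut off


# Hypothetical 8-register file; registers 4 to 7 form the partition to disable.
rename_map = {"r1": 6, "r2": 1, "r3": 7}
free_list = [2, 3, 5]
values = {1: 11, 6: 66, 7: 77}
shrink_register_file(rename_map, free_list, values, partition={4, 5, 6, 7})
print(rename_map, free_list)
```

Relying on the existing rename path means no dedicated copy hardware is modeled here; the only additions in this sketch are the free-list filter and the decision not to recycle vacated registers.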

  21. Floating Point App Results

  22. Summary Results

  23. Conclusion • Simultaneous adaptation of all major regular structures • Accounting cache • Limited histogramming for buffers • Adaptable register file • Local control yet tolerable performance loss • Future work • Augment local control with global control for bounded performance loss
