A Highly Configurable Cache Architecture for Embedded Systems
Chuanjun Zhang, Frank Vahid and Walid Najjar
University of California, Riverside
ISCA 2003
Presenter: Jianwei Dai
Outline
• Introduction
• Background
• Way Concatenation for Dynamic Power Reduction
• Adding Way Shutdown for Static Power Reduction
• Application
• Conclusion
Introduction
• Caches consume up to 50% of a microprocessor's energy
• Important observations
  1. Lower cache associativity results in lower power consumption per access. Example: a direct-mapped cache consumes only about 30% of the energy per access of a same-sized four-way set-associative cache.
  2. In some cases, not all of the cache's capacity is required.
• How can these observations be exploited to reduce cache power consumption?
  1. Way concatenation
  2. Way shutdown
Background
• Energy consumed by caches
• Simplification:
  k_miss_energy: ratio of energy_miss to energy_hit, in the range 50 to 200
  k_static: static energy as a percentage of total energy, in the range 30% to 50%
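The two ratios above can be tied together in a small energy model. The following is a minimal sketch, not the authors' exact formulation: it assumes dynamic energy is the sum of hit and miss energy, that static energy accrues per cycle, and that k_static is the static share of total energy. The function names and the per-cycle static term are assumptions made for illustration.

```python
def dynamic_energy(hits, misses, energy_hit, k_miss_energy):
    """Dynamic energy, with the miss energy expressed as a multiple of the hit energy."""
    energy_miss = k_miss_energy * energy_hit  # slide: ratio is roughly 50 to 200
    return hits * energy_hit + misses * energy_miss


def static_energy_per_cycle(dynamic, cycles, k_static):
    """Per-cycle static energy such that static energy is a share k_static of the total.

    Assumption: total = dynamic + static and static = k_static * total,
    which gives static = k_static / (1 - k_static) * dynamic.
    """
    return k_static / (1.0 - k_static) * dynamic / cycles


def total_energy(hits, misses, cycles, energy_hit, k_miss_energy, static_per_cycle):
    """Total cache energy: dynamic part plus static energy accumulated over all cycles."""
    dynamic = dynamic_energy(hits, misses, energy_hit, k_miss_energy)
    return dynamic + cycles * static_per_cycle
```

Under this reading, a configuration that lowers energy_hit but raises the miss count can still lose overall, because each miss costs k_miss_energy hits and also lengthens execution, which adds static energy.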
Background (continued)
• The impact of cache associativity
• Tuning the associativity to a particular application is extremely important for minimizing energy.
• For general-purpose microprocessors: tuning is hard because of the wide range of applications they must support.
• For embedded microprocessors: the applications executed are well defined, so this approach is practical.
Background (continued)
• Base cache: 8 Kbytes, four-way set associative, 32-byte line size
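The base cache parameters determine how an address splits into tag, index, and offset, and how that split changes when the same 8 Kbytes is configured with fewer ways. The sketch below works this out; the 32-bit address width and the function name are assumptions, not stated on the slide.

```python
def address_fields(cache_bytes=8 * 1024, line_bytes=32, ways=4, addr_bits=32):
    """Return the tag/index/offset bit widths for a set-associative cache.

    addr_bits=32 is an assumption; sizes are assumed to be powers of two.
    """
    sets = cache_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the number of sets
    tag_bits = addr_bits - index_bits - offset_bits
    return {"sets": sets, "offset_bits": offset_bits,
            "index_bits": index_bits, "tag_bits": tag_bits}


for ways in (4, 2, 1):
    print(ways, address_fields(ways=ways))
```

Each halving of the associativity at fixed capacity doubles the number of sets, so one tag bit becomes an index bit. This is the property that way concatenation relies on.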
Way Concatenation for Dynamic Power Reduction
• A way-concatenable four-way set-associative cache architecture
Way Concatenation for Dynamic Power Reduction (continued)
• How it works (figure panels, labeled by configuration register and address bit values):
  reg1reg2 = 00, a11a22 = 00
  reg1reg2 = 00, a11a22 = 11
  reg1reg2 = 11, a11a22 = 00
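The way-selection idea can be sketched in software. This is a behavioral illustration only, under stated assumptions: the cache consists of four 2 Kbyte banks, the two address bits just above a bank's 11-bit range (bits 11 and 12) are the ones used for bank selection, and the bank numbering and pairing shown here are hypothetical. The function takes the configured associativity directly rather than modeling the configuration register encoding.

```python
def enabled_banks(associativity, addr):
    """Return the physical banks (0..3) that are activated for one access.

    Hypothetical bank numbering; only the count of active banks per mode
    follows from the way-concatenation idea.
    """
    a11 = (addr >> 11) & 1
    a12 = (addr >> 12) & 1
    if associativity == 4:
        # Four-way: every bank is probed, address bits 11 and 12 stay in the tag.
        return [0, 1, 2, 3]
    if associativity == 2:
        # Two-way: banks {0,1} and {2,3} are each concatenated into one 4 Kbyte way;
        # bit 11 picks the bank inside each logical way.
        return [a11, 2 + a11]
    if associativity == 1:
        # Direct mapped: all four banks form one 8 Kbyte way;
        # bits 11 and 12 pick the single bank to activate.
        return [2 * a12 + a11]
    raise ValueError("associativity must be 1, 2 or 4")
```

Fewer active banks per access means less dynamic energy per access, while the full 8 Kbytes stays usable in every mode.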
Way Concatenation for Dynamic Power Reduction (continued)
• Time and area overhead
  1. Negligible impact on timing performance:
     (1) the configuration circuit is not on the critical path;
     (2) by resizing the configuration circuit, its operation time can be hidden, since the circuit executes concurrently with index decoding;
     (3) the NAND gates on the critical path are enlarged to speed them up.
  2. Area overhead: about 1%
Way Concatenation for Dynamic Power Reduction (continued)
• Simulation and results for benchmark g3fax
• Configuration naming, e.g. I8KD8KI4D4: an instruction cache with 8 Kbytes active (I8K), a data cache with 8 Kbytes active (D8K), the instruction cache configured as four-way set associative (I4), and the data cache configured as four-way set associative (D4)
• First group: configurable cache with way concatenation
• Second group: configurable cache with way shutdown
• Third group: conventional four-way and direct-mapped caches
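The configuration labels follow a fixed pattern, so they can be decoded mechanically when post-processing results. A small sketch, assuming every label has exactly the four fields shown in the I8KD8KI4D4 example; the function name and the dictionary keys are illustrative.

```python
import re

# Pattern inferred from the single example label on the slide (assumption).
_CONFIG_PATTERN = re.compile(r"I(\d+)KD(\d+)KI(\d+)D(\d+)")


def parse_config(label):
    """Split a label such as 'I8KD8KI4D4' into sizes (Kbytes) and associativities."""
    match = _CONFIG_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized configuration label: {label!r}")
    i_kbytes, d_kbytes, i_ways, d_ways = (int(g) for g in match.groups())
    return {"icache_kbytes": i_kbytes, "dcache_kbytes": d_kbytes,
            "icache_ways": i_ways, "dcache_ways": d_ways}
```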
Way Concatenation for Dynamic Power Reduction (continued)
• Simulation and results: two main observations
• The way-concatenation cache outperforms a non-configurable direct-mapped cache in many cases
• Way concatenation is better than way shutdown at reducing dynamic power consumption
Adding Way Shutdown for Static Energy Reduction
• Motivation: for some applications, way shutdown has a negligible impact on performance.
Adding Way Shutdown for Static Energy Reduction (continued)
• Results
• Penalties: area overhead of 5%; performance overhead of 8%
Using a Configurable Cache
• How to determine the configuration for a specific application: choose it from simulation or from actual executions on the platform. The designer might then need to modify the boot or reset part of the program.
• Parameters used: k_static = 30%, k_miss_energy = 50
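The selection step can be sketched as an exhaustive comparison of the candidate configurations. This is an illustration under assumptions, not the authors' tool flow: it assumes each candidate comes with hit, miss, and cycle counts from simulation plus per-hit and per-cycle static energy figures, and that the miss energy is k_miss_energy times the hit energy of a reference (base) configuration. All names are hypothetical.

```python
def pick_configuration(candidates, base_energy_hit, k_miss_energy=50):
    """Return the name of the candidate with the lowest estimated total energy.

    candidates maps a configuration name to a dict with the keys
    'hits', 'misses', 'cycles', 'energy_hit' and 'static_per_cycle'.
    """
    energy_miss = k_miss_energy * base_energy_hit  # assumption: tied to the base cache

    def estimated_energy(name):
        c = candidates[name]
        dynamic = c["hits"] * c["energy_hit"] + c["misses"] * energy_miss
        static = c["cycles"] * c["static_per_cycle"]
        return dynamic + static

    return min(candidates, key=estimated_energy)
```

The chosen name would then be translated into the configuration register values written by the boot or reset code.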
Conclusion
• Introduces a novel configurable cache design method called way concatenation
• A way-concatenation cache is very efficient at saving energy: compared to a conventional four-way set-associative cache, it saves 37% when only dynamic power consumption is considered and 40% when both dynamic and static power consumption are considered
• Imposes little area overhead