Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design Tony D. Givargis & Frank Vahid Department of Computer Science University of California Riverside, CA 92521 {givargis,vahid}@cs.ucr.edu Jörg Henkel C&C Research Laboratories, NEC USA 4 Independence Way, Princeton, NJ 08540 henkel@ccrl.nj.nec.com This research was supported in part by a DAC scholarship and an NSF grant. University of California, Riverside & NEC USA
Introduction • Systems-on-a-chip (SOC) era • increased chip capacity • parameterizable core-based system design • Large power/performance tradeoffs are possible just by varying bus/cache parameter values [givargis99] • But simulation-based cache/bus power evaluation is slow
Introduction • We present a two-step approach for fast cache power evaluation • collect intermediate data using simulation • use equations to rapidly predict power • couple with a fast bus estimation approach • Our approach • is orders of magnitude faster than simulation • yields good accuracy
Target Architecture • [Figure: block diagram with CPU, I-Cache, D-Cache, Memory, Bus A, Bus B, Bridge, Peripheral Bus, and Peripheral 1 through Peripheral n]
Focus on Cache/Bus Parameters • [Figure: target architecture block diagram with CPU, I-Cache, D-Cache, Memory, Bus A, Bus B, Bridge, Peripheral Bus, and Peripheral 1 through Peripheral n] • [Figure: power dissipation breakdown in a Digital Camera example]
Cache Parameters • [Figure: target architecture block diagram with CPU, I-Cache, D-Cache, Memory, Bus A, Bus B, Bridge, Peripheral Bus, and Peripheral 1 through Peripheral n]
Cache Parameters • Line Size • Associativity • Cache Size • [Figure: cache organization showing the address fields Tag, Index, Offset; V, T, D fields per way; tag comparators (==); a Mux; and the Data output]
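The way the three cache parameters determine the Tag, Index, and Offset fields of an address can be sketched as follows. This is only an illustration: the function name `split_address` and the choice of byte units are assumptions, not taken from the slides.

```python
def split_address(addr, cache_size, line_size, assoc):
    """Split a byte address into (tag, index, offset).

    Hypothetical helper: cache_size and line_size are assumed to be in
    bytes, assoc is the number of ways per set.
    """
    # Number of sets follows from the three cache parameters.
    num_sets = cache_size // (line_size * assoc)
    offset = addr % line_size                # byte within the line
    index = (addr // line_size) % num_sets   # which set the line maps to
    tag = addr // (line_size * num_sets)     # remaining high-order bits
    return tag, index, offset


# Example: 1 KB cache, 16-byte lines, 2-way set associative (32 sets).
fields = split_address(0x1234, 1024, 16, 2)
```

Doubling the line size or the associativity at a fixed cache size halves the number of sets, which is why the three parameters have to be explored together.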
Bus Parameters • [Figure: target architecture block diagram with CPU, I-Cache, D-Cache, Memory, Bus A, Bus B, Bridge, Peripheral Bus, and Peripheral 1 through Peripheral n]
Bus Parameters • Change Bus Width [givargis98] • [Figure: two Bus A/B configurations, each with Mux/Demux pairs at the bus ends, labeled C1 and C2, with C1 < C2]
Bus Parameters • Change Data Representation (Bus Invert) [Stan95] • Reduce Bus Switching • [Figure: two Bus A/B configurations with Encoder/Decoder pairs at the bus ends; one includes an invert_ctr line]
Bus Parameters • Binary Encoding: 01001011 followed by 10010110, Hamming Dist = 6 • Bus-Invert Encoding: 01001011 followed by 01101001 with inverted_ctr going from 0 to 1, Hamming Dist = 3
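The two Hamming distances on this slide can be reproduced with a short sketch of the bus-invert rule as it is commonly described for [Stan95]: send the inverted word, and raise the invert line, when more than half of the lines would otherwise switch. The function names are hypothetical, and this is an illustration rather than the authors' implementation.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert(prev_bus, prev_inv, data, width):
    """Return (bus_value, invert_line) for the next transfer.

    Assumed rule: count the switching that sending `data` unchanged would
    cause (including the invert line returning to 0); if it exceeds half
    the bus width, send the inverted word and set the invert line.
    """
    mask = (1 << width) - 1
    dist = hamming(prev_bus, data) + prev_inv
    if dist > width / 2:
        return (~data) & mask, 1
    return data & mask, 0


def transitions(prev_bus, prev_inv, bus, inv):
    """Lines that switch between two bus states, invert line included."""
    return hamming(prev_bus, bus) + (prev_inv ^ inv)


prev, nxt = 0b01001011, 0b10010110
binary_dist = hamming(prev, nxt)                  # plain binary encoding
bus, inv = bus_invert(prev, 0, nxt, 8)            # bus-invert encoding
invert_dist = transitions(prev, 0, bus, inv)
```

Here `binary_dist` is 6 and `invert_dist` is 3, matching the slide: two data lines switch and the invert line goes from 0 to 1.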
Related Work • Important to explore various cache and bus parameters for best performance and power [Wilton96][Li98][givargis99] • large number of cache/bus configurations • need to estimate power/performance in constant time • Trace stripping [Wolf99], configuration ordering, single-pass simulation [Kirovski]
Approach Overview • Given a trace of memory refs • Cache parameters • Size (S) • Line/block-size (L) • Associativity (A) • Compute # of misses (N) • [Figure: plot of # of misses (N) versus Size (S)]
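What "compute # of misses (N)" means for one (S, L, A) point can be sketched with a tiny trace-driven simulator. This is a minimal illustration that assumes LRU replacement and byte addresses; it is not the Dinero simulator used in the experiments, and `count_misses` is a hypothetical name.

```python
from collections import OrderedDict


def count_misses(trace, cache_size, line_size, assoc):
    """Count misses N for a set-associative cache over a trace of addresses.

    Assumptions (not from the slides): LRU replacement, sizes in bytes,
    and no distinction between reads and writes.
    """
    num_sets = cache_size // (line_size * assoc)
    # One ordered dict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_size
        index = block % num_sets
        tag = block // num_sets
        ways = sets[index]
        if tag in ways:
            ways.move_to_end(tag)          # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) >= assoc:
                ways.popitem(last=False)   # evict least recently used line
            ways[tag] = True
    return misses


trace = [0, 4, 8, 64, 0, 128, 64]
n_2way = count_misses(trace, cache_size=64, line_size=8, assoc=2)
n_4way = count_misses(trace, cache_size=64, line_size=8, assoc=4)
```

Running a simulator like this for every configuration is the slow path; the approach in these slides simulates only a few points and predicts the rest with equations.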
Approach Overview • Capture improvements obtainable by: • changing line-size at small/large values of cache-size • changing associativity at small/large values of cache-size
Approach Overview • Bus equation: • m items/second (denotes the traffic N on the bus) • n bits/item • k bit wide bus • binary encoding • random data assumption
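Under the random data assumption, a rough estimate of binary-encoded bus switching can be written from the three quantities m, n, and k. The sketch below is not the authors' bus equation; it only uses the fact that two independent, uniformly random k-bit words differ in k/2 bits on average, and it assumes each item takes ceil(n/k) transfers with any padding bits also random. The function name is hypothetical.

```python
import math


def binary_bus_toggles_per_second(m, n, k):
    """Expected bit toggles per second on a k-bit binary-encoded bus.

    m: items per second, n: bits per item, k: bus width in bits.
    Assumption: successive words on the bus are independent and uniformly
    random, so each transfer toggles k/2 lines on average.
    """
    transfers_per_item = math.ceil(n / k)
    expected_toggles_per_transfer = k / 2
    return m * transfers_per_item * expected_toggles_per_transfer


# 1000 items/second of 32 bits each, on an 8-bit and on a 32-bit bus.
narrow = binary_bus_toggles_per_second(1000, 32, 8)
wide = binary_bus_toggles_per_second(1000, 32, 32)
```

When k divides n, the estimate reduces to m * n / 2 regardless of width, so under this simple model any power difference between bus widths has to come from other factors, such as per-wire capacitance.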
Approach Overview • Bus equation: • m items/second (denotes the traffic N on the bus) • n bits/item • k bit wide bus • bus-invert encoding • random data assumption
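The effect of bus-invert encoding under the random data assumption can be checked empirically with a small Monte Carlo sketch. This is an illustration, not the authors' equation: the function name, the fixed seed, and the sample count are all assumptions, and the invert rule is the commonly described one (invert when more than half of the lines would switch).

```python
import random


def average_toggles(width, samples, invert, seed=0):
    """Average lines switched per transfer for uniformly random data.

    With invert=True the hypothetical bus carries one extra invert line,
    whose own switching is included in the count.
    """
    rng = random.Random(seed)       # fixed seed so runs are repeatable
    mask = (1 << width) - 1
    bus, inv, total = 0, 0, 0
    for _ in range(samples):
        data = rng.getrandbits(width)
        new_bus, new_inv = data, 0
        # Invert when sending the word unchanged would switch too many lines.
        if invert and bin(bus ^ data).count("1") + inv > width / 2:
            new_bus, new_inv = ~data & mask, 1
        total += bin(bus ^ new_bus).count("1") + (inv ^ new_inv)
        bus, inv = new_bus, new_inv
    return total / samples


binary_avg = average_toggles(8, 20000, invert=False)
invert_avg = average_toggles(8, 20000, invert=True)
```

For an 8-bit bus the binary average comes out close to 4 toggles per transfer, and the bus-invert average is lower because no transfer can switch more than half of the data lines.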
Experiments • Cache parameters • size: 128, 256, 512, 1k, 2k, 4k, 8k, 16k, 32k • assoc: 2, 4, 8 • line: 8, 16, 32 • Bus parameters • width: 4, 8, 16, 32 • code: binary/bus-invert • Analyzed 45K sets exhaustively • Applications: 3d-Image, CKey, MPEG, Diesel • 5kB to 230kB of C code • [Figure: target architecture block diagram]
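The parameter values listed above can be enumerated as a configuration space with a few lines of code. The sketch below only builds the per-cache and per-bus combinations; how these combine into the 45K analyzed sets is not stated on the slide, and the variable names and the reading of "1k" as 1024 are assumptions.

```python
from itertools import product

# Parameter values as listed on the slide ("k" read as 1024, an assumption).
CACHE_SIZES = [128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768]
ASSOCS = [2, 4, 8]
LINES = [8, 16, 32]
BUS_WIDTHS = [4, 8, 16, 32]
BUS_CODES = ["binary", "bus-invert"]

# Every (size, assoc, line) combination for a single cache.
cache_configs = list(product(CACHE_SIZES, ASSOCS, LINES))
# Every (width, code) combination for a single bus.
bus_configs = list(product(BUS_WIDTHS, BUS_CODES))

num_cache_configs = len(cache_configs)
num_bus_configs = len(bus_configs)
```

Because the architecture has two caches and two buses, the combined space grows multiplicatively, which is what makes exhaustive simulation so expensive.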
Experiment Setup • Dinero [Edler, Hill] • CPU power [Tiwari96] • [Figure: flow diagram with C Program, ISS, Trace Generator, Cache Simulator + Bus Simulator, and the outputs Performance, CPU Power, Memory Power, I/D Cache Power, and Power]
Experiment Results • Diesel application’s performance • Blue (light-gray) is obtained using full simulation • Red (dark-gray) is obtained using our equations • 4% error • 320x faster
Experiment Results • Diesel application’s energy consumption • Blue (light-gray) is obtained using full simulation • Red (dark-gray) is obtained using our equations • 2% error • 420x faster
Experiment Results • CKey application’s performance • Blue (light-gray) is obtained using full simulation • Red (dark-gray) is obtained using our equations • 8% error • 125x faster
Experiment Results • CKey application’s energy consumption • Blue (light-gray) is obtained using full simulation • Red (dark-gray) is obtained using our equations • 3% error • 125x faster
Experiment Results • 125-400x speedup • 1-18% absolute error (power & performance) • 2% average power error • [Figure: charts of Time (hours) and Power Error (%)]
Conclusion • Presented a technique for rapidly estimating the power and performance of cache and bus sub-systems • orders of magnitude faster than exhaustive simulation • yields good accuracy • Enables exploration of parameters in parameterized system-on-a-chip architectures