Another Performance Evaluation of Memory Hierarchy in Embedded Systems
Nelson Barnes
CPE 631
04/14/03
Outline
• Introduction
• Related Work
• Problem Statement
• Proposed Solutions
• Experimental Setup
• Experimental Results
• Conclusions
UAH, ECE
Introduction
Why is cache design so important in embedded systems?
Cache Design Parameters
• Cache organization
  • Unified vs. split (instruction + data) caches
• Cache size
• Cache block (line) size
• Block placement policy
  • Direct-mapped, fully associative, set-associative
• Block replacement policy
  • Random, Least-Recently Used (LRU), Round-robin, Pseudo-LRU, OPT (Optimal)
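To make the parameters above concrete, here is a minimal sketch of a set-associative cache model with LRU replacement. It is illustrative only and is not the simulator used in this study; the class name `Cache` and its fields are assumptions made for the example. Setting `assoc=1` gives a direct-mapped cache, and setting `assoc` to the total number of blocks gives a fully associative one.

```python
from collections import OrderedDict


class Cache:
    """Illustrative set-associative cache with LRU replacement (hypothetical model)."""

    def __init__(self, size_bytes, block_bytes, assoc):
        # The cache must divide evenly into sets of `assoc` blocks each.
        assert size_bytes % (block_bytes * assoc) == 0
        self.block_bytes = block_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (block_bytes * assoc)
        # One ordered dict per set; the oldest entry is the least recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        """Simulate one access; return True on a hit, False on a miss."""
        self.accesses += 1
        block = addr // self.block_bytes   # drop the block-offset bits
        index = block % self.num_sets      # block placement: which set
        tag = block // self.num_sets       # identifies the block within the set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.assoc:
            lines.popitem(last=False)      # block replacement: evict the LRU line
        lines[tag] = True
        return False


# Two addresses 1 KB apart conflict in a 1 KB direct-mapped cache,
# but coexist in a 2-way set-associative cache of the same size.
pattern = [0, 1024, 0, 1024]
direct_mapped = Cache(size_bytes=1024, block_bytes=32, assoc=1)
two_way = Cache(size_bytes=1024, block_bytes=32, assoc=2)
for a in pattern:
    direct_mapped.access(a)
    two_way.access(a)
print(direct_mapped.misses, two_way.misses)
```

In this small pattern the direct-mapped cache misses on every access, while the 2-way cache misses only on the first touch of each block.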
Related Work
MiBench vs. NetBench
Problem Statement
• Comprehensive performance evaluation of cache design issues in embedded systems
  • Split versus unified cache
  • Block placement and cache size
  • Cache block size
  • Block replacement policy
• Performance metrics
  • Static measure: the number of cache misses per 1K instructions executed, measured at the end of application execution
  • Dynamic measure: the number of cache misses per 1K instructions executed, measured over every 100K-instruction interval
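The two metrics above can be sketched as follows. This is an assumed formulation for illustration, not code from the study: the function names and the input format (a list giving the instruction count at which each miss occurred) are hypothetical.

```python
def static_mpki(total_misses, total_instructions):
    """Misses per 1K instructions over the whole run."""
    return 1000.0 * total_misses / total_instructions


def dynamic_mpki(miss_positions, total_instructions, window=100_000):
    """Misses per 1K instructions, reported separately for each window.

    miss_positions: 0-based instruction index at which each miss occurred
    (a hypothetical trace format chosen for this example).
    """
    num_windows = -(-total_instructions // window)  # ceiling division
    counts = [0] * num_windows
    for pos in miss_positions:
        counts[pos // window] += 1
    result = []
    for w, count in enumerate(counts):
        # The last window may be shorter than `window` instructions.
        length = min(window, total_instructions - w * window)
        result.append(1000.0 * count / length)
    return result


# Three misses in a 200K-instruction run: two early, one late.
misses = [10, 20, 150_000]
print(static_mpki(len(misses), 200_000))
print(dynamic_mpki(misses, 200_000))
```

The static value hides the fact that the misses cluster at the start of the run; the per-window values expose it.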
Proposed Solution
Why use NetBench?
Experimental Setup
• ARM version of the SimpleScalar toolset
  • sim-cache
  • sim-cheetah
• NetBench applications include:
  • Micro-level programs
    • CRC – Checksum calculation
    • TL – Table lookup
  • IP-level programs
    • Route – IPv4 routing
    • DRR – Deficit round robin
  • Application-level programs
    • DH – Public key encryption/decryption
    • MD5 – Message digest algorithm (secure signature)
Experimental Setup
• Cache memory setup
  • Split first-level instruction and data caches
  • Unified first-level cache
• Cache parameters
  • Cache size: 0.5KB to 32KB
  • Cache associativity: direct-mapped, 2-way, 4-way, and 8-way set-associative
  • Cache replacement policies: FIFO, Random, LRU, pLRUt, pLRUm, and Optimal
  • Cache block size: 32B, 64B
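As a rough sketch of how these parameters interact, the number of sets in each configuration follows from size / (associativity × block size). The intermediate cache sizes below are an assumption (powers of two between the stated 0.5KB and 32KB endpoints); the slides give only the range.

```python
from itertools import product

SIZES_KB = [0.5, 1, 2, 4, 8, 16, 32]  # assumed: powers of two within the stated range
ASSOCS = [1, 2, 4, 8]                  # 1 = direct-mapped
BLOCK_BYTES = [32, 64]


def num_sets(size_kb, assoc, block_bytes):
    """Return the number of sets, or None if the geometry is not realizable."""
    size_bytes = int(size_kb * 1024)
    sets, remainder = divmod(size_bytes, assoc * block_bytes)
    if remainder or sets == 0:
        return None
    return sets


# Enumerate every size/associativity/block-size combination in the sweep.
configs = [
    (size, assoc, block, num_sets(size, assoc, block))
    for size, assoc, block in product(SIZES_KB, ASSOCS, BLOCK_BYTES)
]

for size, assoc, block, sets in configs:
    print(f"{size:>4} KB  {assoc}-way  {block} B blocks  ->  {sets} sets")
```

Note that the smallest cache at the highest associativity and largest block size (0.5KB, 8-way, 64B) has a single set, so it is effectively fully associative.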
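Experimental Setup (cont’d)
[Figure: two configurations. Split: ARM core with separate L1 instruction cache (L1I $) for instructions and L1 data cache (L1D $) for data. Unified: ARM core with a single L1 cache (L1U $) for instructions and data.]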
Data Cache Misses
Instruction Cache Misses
Unified Cache Misses
Dynamic Behavior
Dynamic Behavior
Replacement Policies
Experimental Results
NetBench Discussion
Conclusions
• Split caches outperform the equivalent unified cache for relatively small direct-mapped caches
• Unified cache almost always outperforms the split caches for set-associative caches
Conclusions
• Increasing cache associativity reduces the number of cache misses (up to 8-way associative caches)
  • More beneficial for data and unified caches than for instruction caches
• Pseudo-LRU techniques perform as well as LRU for data caches
• Random performs the best for instruction caches
• Relatively significant difference between the optimal replacement policy and the best non-optimal policy
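For readers unfamiliar with pseudo-LRU, the sketch below shows one common variant, tree-based pseudo-LRU, for a single cache set. It assumes that pLRUt refers to a tree-based scheme of this kind; the slides do not define it, and the class name `TreePLRU` is hypothetical. Each set keeps ways − 1 bits instead of a full LRU ordering, which is why the policy only approximates LRU.

```python
class TreePLRU:
    """Tree-based pseudo-LRU state for one cache set (illustrative sketch)."""

    def __init__(self, ways):
        # Requires a power-of-two number of ways, at least 2.
        assert ways >= 2 and ways & (ways - 1) == 0
        self.ways = ways
        # Heap-ordered binary tree: bit 0 = next victim is in the left half,
        # bit 1 = next victim is in the right half.
        self.bits = [0] * (ways - 1)

    def touch(self, way):
        """Record an access: flip each bit on the path to point away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1        # accessed left, so victim goes right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0        # accessed right, so victim goes left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the bits from the root to choose the way to replace."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(ways=4)
for way in [0, 1, 2, 3, 0]:
    plru.touch(way)
# True LRU would now evict way 1; the tree approximation picks way 2.
print(plru.victim())
```

The example access sequence shows where the approximation diverges from true LRU while still avoiding the most recently used way.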