Another Performance Evaluation of Memory Hierarchy in Embedded Systems

Nelson Barnes, CPE 631, 04/14/03

Outline
• Introduction
• Related Work
• Problem Statement
• Proposed Solutions
• Experimental Setup
• Experimental Results
• Conclusions
UAH, ECE

Introduction
Why is cache design so important in embedded systems?

Cache Design Parameters
• Cache organization
  • Unified vs. split (instruction + data) caches
• Cache size
• Cache block (line) size
• Block placement policy
  • Direct-mapped, fully-associative, set-associative
• Block replacement policy
  • Random, Least-Recently Used (LRU), Round-robin, Pseudo-LRU, OPT (Optimal)
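The placement policies above differ only in how many sets the cache has and how many blocks share a set. A minimal sketch of how a byte address maps to a set under each policy follows; the function name `split_address` and the 1 KB / 32 B example values are illustrative assumptions, not taken from the slides.

```python
def split_address(addr, cache_size, block_size, assoc):
    """Split a byte address into (tag, set_index, block_offset).

    All sizes are in bytes and assumed to be powers of two.
    assoc = 1 gives a direct-mapped cache; assoc = number of blocks
    gives a fully-associative cache (a single set).
    """
    num_sets = cache_size // (block_size * assoc)
    offset = addr % block_size          # byte position inside the block
    block_number = addr // block_size   # which memory block is addressed
    set_index = block_number % num_sets # set the block must be placed in
    tag = block_number // num_sets      # identifies the block within its set
    return tag, set_index, offset


# Hypothetical 1 KB cache with 32 B blocks (32 blocks in total).
direct_mapped = split_address(0x1234, 1024, 32, 1)   # 32 sets
two_way = split_address(0x1234, 1024, 32, 2)         # 16 sets
fully_assoc = split_address(0x1234, 1024, 32, 32)    # 1 set
print(direct_mapped, two_way, fully_assoc)
```

As associativity grows, index bits move into the tag: the same address has more candidate locations, which is why the replacement policy only matters once a set holds more than one block.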

Related Work
MiBench vs. NetBench

Problem Statement
• Comprehensive performance evaluation of cache design issues in embedded systems
  • Split versus unified cache
  • Cache placement and size
  • Cache block size
  • Block replacement policy
• Performance metrics
  • Static measure: the number of cache misses per 1K instructions executed, measured at the end of application execution
  • Dynamic measure: the number of cache misses per 1K instructions executed, measured on every 100K instructions executed
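The two metrics can be sketched as follows, assuming a hypothetical per-instruction record of how many cache misses each instruction caused. The trace format, the function names, and the synthetic data are assumptions made for illustration; they are not the output format of any simulator named in the slides.

```python
def static_measure(miss_counts):
    """Misses per 1K instructions over the whole run."""
    return 1000.0 * sum(miss_counts) / len(miss_counts)


def dynamic_measure(miss_counts, window=100_000):
    """Misses per 1K instructions, sampled once per window of instructions."""
    samples = []
    for start in range(0, len(miss_counts), window):
        chunk = miss_counts[start:start + window]  # last chunk may be shorter
        samples.append(1000.0 * sum(chunk) / len(chunk))
    return samples


# Synthetic run of 300K instructions: a miss-heavy start-up phase
# (one miss every 50 instructions) followed by a steady phase
# (one miss every 500 instructions).
trace = [1 if i % 50 == 0 else 0 for i in range(100_000)]
trace += [1 if i % 500 == 0 else 0 for i in range(200_000)]

static_result = static_measure(trace)
dynamic_result = dynamic_measure(trace)
print(static_result)   # single number for the whole run
print(dynamic_result)  # one sample per 100K instructions
```

The static measure averages the two phases into one number, while the dynamic samples keep them apart, which is the reason for reporting both.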

Proposed Solution
Why use NetBench?

Experimental Setup
• ARM version of the SimpleScalar toolset
  • Sim-cache
  • Sim-cheetah
• NetBench applications include:
  • Micro-level programs
    • CRC – Checksum calculation
    • TL – Table lookup
  • IP-level programs
    • Route – IPv4 routing
    • DRR – Deficit round robin
  • Application-level programs
    • DH – Public key encryption/decryption
    • MD5 – Message digest algorithm (secure signature)

Experimental Setup
• Cache memory setup
  • Split first-level instruction and data caches
  • Unified first-level cache
• Cache parameters
  • Cache size: ranging from 0.5KB to 32KB
  • Cache associativity: direct mapped, 2-way, 4-way, and 8-way set associative
  • Cache replacement policies: FIFO, Random, LRU, pLRUt, pLRUm, and Optimal
  • Cache block size: 32B, 64B
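The parameter space above can be enumerated as a simple sweep. The sketch below assumes the cache sizes step in powers of two between 0.5KB and 32KB (the slides give only the end points) and uses hypothetical names throughout; it only derives the number of sets for each configuration and does not run any simulator.

```python
from itertools import product

KB = 1024
# Assumption: sizes double from 0.5 KB to 32 KB; the slides state only the range.
cache_sizes = [KB // 2, 1 * KB, 2 * KB, 4 * KB, 8 * KB, 16 * KB, 32 * KB]
associativities = [1, 2, 4, 8]   # 1 = direct mapped
policies = ["FIFO", "Random", "LRU", "pLRUt", "pLRUm", "Optimal"]
block_sizes = [32, 64]           # bytes


def configurations():
    """Yield one dict per cache configuration in the sweep."""
    for size, ways, policy, block in product(
        cache_sizes, associativities, policies, block_sizes
    ):
        num_sets = size // (block * ways)
        if num_sets < 1:
            continue  # cache too small to hold one full set
        yield {
            "size": size,
            "ways": ways,
            "policy": policy,  # has no effect when ways == 1
            "block": block,
            "sets": num_sets,
        }


configs = list(configurations())
print(len(configs), "configurations")
print(min(c["sets"] for c in configs), "to", max(c["sets"] for c in configs), "sets")
```

Note that the smallest cache with the largest block size and highest associativity (0.5KB, 64B, 8-way) degenerates to a single set, that is, a fully-associative cache.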

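Experimental Setup (cont’d)
[Diagram: two configurations. Split: the ARM core connects to an L1 instruction cache (L1I $) for instructions and an L1 data cache (L1D $) for data. Unified: the ARM core connects to a single L1 unified cache (L1U $) for instructions and data.]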

MiBench Experimental Results

Data Cache Misses

Instruction Cache Misses

Unified Cache Misses

Dynamic Behavior

Replacement Policies

Experimental Results
NetBench Discussion

Conclusions
• Split caches outperform the equivalent unified cache for relatively small direct mapped caches
• Unified cache almost always outperforms the split caches for set-associative caches
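One mechanism that can favor split caches in the direct-mapped case is a conflict between an instruction block and a data block that map to the same set of a unified cache. The toy example below is hand-built to show that effect only; the trace, the cache sizes, and the function name are assumptions, and it does not reproduce the measured results.

```python
def direct_mapped_misses(blocks, num_sets):
    """Count misses for a sequence of block numbers in a direct-mapped cache."""
    resident = {}  # set index -> block number currently stored there
    misses = 0
    for block in blocks:
        index = block % num_sets
        if resident.get(index) != block:
            misses += 1
            resident[index] = block  # replace whatever was in this set
    return misses


# Hypothetical loop: fetch instruction block 0, then access data block 8,
# repeated 100 times.
instr = [0] * 100
data = [8] * 100
interleaved = [b for pair in zip(instr, data) for b in pair]

# Unified cache with 8 sets vs. split caches with 4 sets each (same total size).
unified = direct_mapped_misses(interleaved, num_sets=8)
split = direct_mapped_misses(instr, num_sets=4) + direct_mapped_misses(data, num_sets=4)
print("unified misses:", unified, "split misses:", split)
```

In the unified cache both blocks compete for set 0 and evict each other on every access; with higher associativity the two blocks could share a set, which is consistent with the unified cache doing better in the set-associative case.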

Conclusions
• Increasing cache associativity reduces the number of cache misses (up to 8-way associative caches)
  • More beneficial for data and unified caches than for instruction caches
• Pseudo-LRU techniques perform as well as LRU for data caches
• Random performs the best for instruction caches
• Relatively significant difference between optimal replacement policy and the best non-optimal policy
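The gap between the optimal policy and a realizable one can be illustrated on a small fully-associative cache. The sketch below compares LRU with Belady's optimal policy (evict the block whose next use is farthest in the future) on a hand-picked cyclic trace; the trace and capacity are assumptions chosen to make the gap visible, not data from the experiments.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Misses for a fully-associative cache with LRU replacement."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return misses


def opt_misses(trace, capacity):
    """Misses for the same cache with optimal (Belady) replacement."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) == capacity:
            def next_use(b):
                # Position of the next reference to b, or infinity if none.
                for j in range(i + 1, len(trace)):
                    if trace[j] == b:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses


# Cyclic access to 4 blocks with room for only 3: a worst case for LRU.
trace = [0, 1, 2, 3] * 3
lru = lru_misses(trace, capacity=3)
opt = opt_misses(trace, capacity=3)
print("LRU misses:", lru, "OPT misses:", opt)
```

The optimal policy needs knowledge of future references, so it serves as a lower bound for comparison rather than an implementable design.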
