Computer Architecture Project #2: Cache Simulator
Objectives
• To understand cache memory
  • Organization
    • Set associativity
  • Operation
    • Cache read & write, hit & miss
    • LRU replacement policy
  • Performance
    • Hit/miss ratio, miss penalty
• To develop your own cache simulator
[Figure: cache simulator block diagram, labeled Memory Access Pattern, Cache Organization, Display Option, Hit/Miss, Performance]
General Cache Organization (S, E, B)
• S = 2^s sets
• E = 2^e lines per set
• B = 2^b bytes per cache block (the data)
• Each line holds a valid bit (v), a tag, and B data bytes (0, 1, 2, …, B-1)
• Cache size: C = S × E × B data bytes
• If E = 1 (e = 0): “Direct Mapped Cache”
• Else if S = 1 (s = 0): “Fully Associative Cache”
• Else: “E-Way Set Associative Cache”
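The (S, E, B) organization above can be sketched as a C data structure. This is only an illustrative layout, not the reference simulator's: the names `cache_line`, `cache`, `cache_init`, `cache_data_bytes`, and `cache_free` are hypothetical, and the `last_used` field is an assumption added for a later LRU policy. The data bytes are left out because a hit/miss simulator never needs to read them.

```c
#include <stdint.h>
#include <stdlib.h>

/* One cache line: valid bit, tag, and a timestamp for LRU (assumed field).
 * The B data bytes are not stored; only hits and misses are simulated. */
typedef struct {
    int valid;
    uint64_t tag;
    uint64_t last_used;
} cache_line;

typedef struct {
    int s, b;           /* set index bits, block offset bits */
    uint64_t S, E, B;   /* sets, lines per set, bytes per block */
    cache_line *lines;  /* S * E lines; set i is lines[i*E .. i*E + E - 1] */
} cache;

/* Allocate an empty cache (all valid bits 0).
 * Returns 0 on success, -1 if memory allocation fails. */
int cache_init(cache *c, int s, uint64_t E, int b)
{
    c->s = s;
    c->b = b;
    c->S = (uint64_t)1 << s;
    c->E = E;
    c->B = (uint64_t)1 << b;
    c->lines = calloc(c->S * c->E, sizeof(cache_line));
    return c->lines ? 0 : -1;
}

/* Cache size in data bytes: C = S x E x B. */
uint64_t cache_data_bytes(const cache *c)
{
    return c->S * c->E * c->B;
}

void cache_free(cache *c)
{
    free(c->lines);
    c->lines = NULL;
}
```

With s = 4, E = 1, b = 4 this gives 16 sets of one 16-byte block each, so `cache_data_bytes` returns 256.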
E-way Set Associative Cache (Here: E = 2)
• E = 2: two lines per set
• Assume that the cache block size is 8 bytes
• Address of short int: [ t tag bits | set index 0…01 | block offset 100 ]
• Step 1: find the set selected by the set index bits
• Step 2: compare the tag against both lines in the set; valid? + match: yes = hit
• Step 3: use the block offset to locate the data; the short int (2 bytes) is at offset 4 (binary 100) in the block
• No match:
  • One line in the set is selected for eviction and replacement
  • Replacement policies: random, least recently used (LRU), …
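The "compare both, valid + match = hit" step can be sketched in C as a search over the E lines of the selected set. The type `line_t` and the function `find_line` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

typedef struct {
    int valid;
    uint64_t tag;
} line_t;

/* Search the E lines of one set for a valid line with a matching tag.
 * Returns the index of the matching line (a hit) or -1 (a miss). */
int find_line(const line_t *set, int E, uint64_t tag)
{
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag)
            return i;
    }
    return -1;
}
```

A line whose tag matches but whose valid bit is 0 is still a miss, which is why both conditions are checked.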
LRU Replacement Policy • Theoretically… • Practically…
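One common way to simulate LRU in software is to stamp each line with a counter value on every access and evict the line with the smallest stamp. The sketch below assumes that approach; it is not necessarily what the slide's "Practically…" refers to. The names `lru_line`, `access_result`, and `access_set` are hypothetical, and the caller is assumed to pass a `now` value that increases on every access.

```c
#include <stdint.h>

typedef struct {
    int valid;
    uint64_t tag;
    uint64_t last_used;  /* value of the access counter at the last use */
} lru_line;

typedef enum { ACCESS_HIT, ACCESS_MISS, ACCESS_MISS_EVICT } access_result;

/* Access one set of E lines with the given tag. */
access_result access_set(lru_line *set, int E, uint64_t tag, uint64_t now)
{
    /* Hit: a valid line with a matching tag; refresh its timestamp. */
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {
            set[i].last_used = now;
            return ACCESS_HIT;
        }
    }

    /* Miss: prefer an empty (invalid) line, so nothing is evicted. */
    int target = -1;
    for (int i = 0; i < E; i++) {
        if (!set[i].valid) {
            target = i;
            break;
        }
    }

    access_result result = ACCESS_MISS;
    if (target < 0) {
        /* Set is full: evict the least recently used line. */
        target = 0;
        for (int i = 1; i < E; i++) {
            if (set[i].last_used < set[target].last_used)
                target = i;
        }
        result = ACCESS_MISS_EVICT;
    }

    set[target].valid = 1;
    set[target].tag = tag;
    set[target].last_used = now;
    return result;
}
```

The linear scan costs O(E) per access, which is usually acceptable for a simulator; a linked list ordered by recency is an alternative when E is large.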
Performance
• (Average Access Time) = (Hit Time) + (Miss Rate) × (Miss Penalty) = (Hit Time) + [1 - (Hit Rate)] × (Miss Penalty)
• Example
  • Suppose the cache hit time is 1 cycle, the miss penalty is 100 cycles, and the hit rate is 97%.
  • Then the average access time is: 1 cycle + (1 - 0.97) × 100 cycles = 1 + 0.03 × 100 = 4 cycles.
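The formula translates directly into a small C helper that works from hit and miss counts. The function name `average_access_time` is hypothetical, and returning the bare hit time when there are no accesses is an assumption made only to avoid dividing by zero.

```c
/* Average access time = hit_time + miss_rate * miss_penalty,
 * where miss_rate = misses / (hits + misses). */
double average_access_time(unsigned long hits, unsigned long misses,
                           double hit_time, double miss_penalty)
{
    unsigned long total = hits + misses;
    if (total == 0)
        return hit_time;  /* assumption: no accesses means miss rate 0 */

    double miss_rate = (double)misses / (double)total;
    return hit_time + miss_rate * miss_penalty;
}
```

For the example above, 97 hits and 3 misses with a 1-cycle hit time and a 100-cycle miss penalty give 4 cycles.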
Requirements of the cache simulator (1)
• The cache simulator (hereinafter referred to as CSIM) shall implement arbitrary numbers of sets and lines, and an arbitrary block size.
  • You should implement a way to provide the numbers of sets and lines, and the block size, as inputs to CSIM.
• CSIM shall read a trace file line by line and process it.
  • You should determine whether each memory operation is a cache hit or miss.
  • You should implement the LRU replacement policy.
• CSIM shall report the result of the cache simulation.
  • You should report these three basic results: the numbers of hits, misses, and evictions.
  • You should be able to report the average access time of the cache simulation.
  • You should be able to report whether each memory access in the trace file results in a cache hit or miss.
Restrictions & Advice
• Implement a method for input parameters.
  • You should implement it by argument passing. (full credit)
  • If you can’t, you can use standard input such as scanf(). (low credit)
• Evaluate only data cache performance.
  • Therefore, you should ignore instruction loads.
• You should assume that the memory accesses are aligned properly. Therefore, you can ignore the requested size in the trace file.
• You should evaluate your CSIM with at least 3 different sets of trace data. You can use the one provided with this project.
• Calculate the average access time using the assumption below:
  • Hit time = 1 cycle, miss penalty = 100 cycles.
• Compile your CSIM without warnings.
How to trace memory accesses
• “valgrind”
  • GPL-licensed programming tool for memory debugging, memory leak detection, and profiling. (from http://en.wikipedia.org/wiki/Valgrind)
• Usage: >> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l
• Valgrind prints the memory accesses of “ls -l” to stdout, so you need to capture them with: >> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l > ls.trace
• Output format: [space]operation address,size
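A trace line in the "[space]operation address,size" format can be parsed with `sscanf`. The sketch below assumes that addresses are printed in hexadecimal, that data accesses use the operations L (load), S (store), and M (modify), and that instruction fetches are marked I, as in lackey's trace output. The function name `parse_trace_line` is hypothetical.

```c
#include <stdio.h>

/* Parse one trace line of the form " L 7ff000398,8".
 * Returns 1 and fills op/addr/size for a data access (L, S, M).
 * Returns 0 for anything else (instruction fetches, valgrind log lines),
 * which the simulator should skip. */
int parse_trace_line(const char *line, char *op,
                     unsigned long long *addr, unsigned *size)
{
    /* The leading space in the format skips any leading whitespace. */
    if (sscanf(line, " %c %llx,%u", op, addr, size) != 3)
        return 0;
    return *op == 'L' || *op == 'S' || *op == 'M';
}
```

Because instruction lines are rejected here, the rest of the simulator only ever sees data cache accesses.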
Reference Cache Simulator
• Usage: >>./csim [-v] -s <s> -E <E> -b <b> -t <trace file>
  • -v: Optional verbose flag that displays trace info
  • -s <s>: Number of set index bits (S = 2^s is the number of sets)
  • -E <E>: Associativity (number of lines per set)
  • -b <b>: Number of block bits (B = 2^b is the block size)
  • -t <trace file>: Name of the valgrind trace to replay
• Cache size: C = S × E × B data bytes
Cache Simulation Example (1)
• Usage: >>./csim [-v] -s <s> -E <E> -b <b> -t <trace file>
• Example: >>./csim -v -s 4 -E 1 -b 4 -t ./traces/yi.trace
  • Number of set index bits = 4 (16 sets)
  • Associativity = 1 (Direct Mapped Cache)
  • Number of block bits = 4 (16 bytes per cache block)
• Output:
  L 10,1 miss
  M 20,1 miss hit
  ….
  hits: 4 misses: 5 evictions: 3
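To see which set each access in the example goes to, the address can be split into tag, set index, and block offset with shifts and masks. This sketch assumes the trace addresses are hexadecimal and that s + b is less than 64; the names `addr_parts` and `split_address` are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t tag;
    uint64_t set;
    uint64_t offset;
} addr_parts;

/* Split an address into [ tag | s set index bits | b block offset bits ].
 * Assumes s + b < 64. */
addr_parts split_address(uint64_t addr, int s, int b)
{
    addr_parts p;
    p.offset = addr & (((uint64_t)1 << b) - 1);         /* low b bits */
    p.set    = (addr >> b) & (((uint64_t)1 << s) - 1);  /* next s bits */
    p.tag    = addr >> (s + b);                         /* remaining bits */
    return p;
}
```

With s = 4 and b = 4, address 0x10 maps to set 1 and address 0x20 to set 2, both with tag 0 and offset 0. An address such as 0x110 also maps to set 1 but with tag 1, so in a direct-mapped cache it would evict the block loaded for 0x10.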
Cache Simulation Example (2) • Example memory access pattern
Cache Simulation Example (9) Average Access Time = 1 + (5 / 9) × 100 ≈ 56.6 cycles
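Report Writing Guidelines (1)
• Include the following:
  • Design requirements
    • Redefine the given CSIM design requirements to fit your own CSIM
  • Implementation
    • Describe how your CSIM works and how it reflects the design requirements
    • Describe how to use your CSIM and how it outputs the simulation results
  • Testing
    • Describe how you verified the CSIM requirements
    • Perform the verification with at least 3 sets of trace data; additionally, describe how you obtained the trace data
• Attach captured screenshots that show what your CSIM implementation does
[Figure: Design, Testing, Implementation]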
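Report Writing Guidelines (2)
• Include the following:
  • Performance evaluation
    • Measure the performance of each cache organization (direct mapped, E-way set associative, and fully associative cache) and compare the results
[Figure: Design, Testing, Coding]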
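Submission
• Submit the deliverables listed below by e-mail
  • E-mail address: yonghunlee@archi.snu.ac.kr
  • E-mail subject: “[CSIM]StudentID_Name”
  • Compress the deliverables into “StudentID_Name.zip” or “StudentID_Name.tar” before submitting
• Deliverables
  • CSIM source code
  • Project report
  • Trace files used to verify CSIM
• Deadline: Wednesday, December 18, 2013, 23:59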