Computer Systems Principles: Architecture • Emery Berger and Mark Corner • University of Massachusetts Amherst
The Memory Hierarchy • Registers • Caches • Associativity • Misses • “Locality” • [diagram: registers → L1 → L2 → RAM]
Registers • Register = dedicated name for one word of memory managed by the CPU • General-purpose: “AX”, “BX”, “CX” on x86 • Special-purpose: • “SP” = stack pointer • “FP” = frame pointer • “PC” = program counter • Changing processes = save current registers & load saved registers = context switch • [figure: stack frame, with FP and SP bracketing the arguments]
Caches • Access to main memory: “expensive” • ~ 100 cycles (slow, but relatively cheap ($)) • Caches: small, fast, expensive memory • Hold recently-accessed data (D$) or instructions (I$) • Different sizes & locations • Level 1 (L1) – on-chip, smallish • Level 2 (L2) – on or next to chip, larger • Level 3 (L3) – pretty large, on bus • Manages lines of memory (32-128 bytes)
Memory Hierarchy • Higher = small, fast, more $, lower latency • Lower = large, slow, less $, higher latency • Registers: 1-cycle latency • L1 (separate D$ and I$): 2-cycle latency • L2 (unified D$ and I$): 7-cycle latency • RAM: 100-cycle latency • Disk: 40,000,000-cycle latency • Network: 200,000,000+ cycle latency
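These numbers compound: every L1 miss pays the L2 latency, every L2 miss pays the RAM latency, and so on. A minimal sketch of the standard average-memory-access-time calculation, using the cycle counts above (the miss rates are made up purely for illustration):

```cpp
#include <iostream>

// Average Memory Access Time (AMAT): each level's latency is weighted
// by how often the levels above it miss.
//   AMAT = L1 + missL1 * (L2 + missL2 * RAM)
int main() {
    const double l1 = 2, l2 = 7, ram = 100;    // cycle latencies (from the slide)
    const double missL1 = 0.05, missL2 = 0.20; // illustrative miss rates

    double amat = l1 + missL1 * (l2 + missL2 * ram);
    std::cout << "AMAT = " << amat << " cycles\n"; // 2 + 0.05 * (7 + 20) = 3.35
    return 0;
}
```

Even a 5% L1 miss rate pushes the average access from 2 cycles to 3.35, which is why the locality discussion below matters.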
Orders of Magnitude (cycles) • 10^0: registers, L1 • 10^1: L2 • 10^2: RAM • 10^3-10^6: (nothing; the gap between RAM and disk) • 10^7: Disk • 10^8-10^9: Network
Cache Jargon • Cache starts out cold • Initial accesses miss • Fetch from the lower level in the hierarchy • Bring the line into the cache (populate the cache) • Next access: hit • Warmed up = cache holds the most-frequently-used data • Context switch implications? • LRU: Least Recently Used • Uses the past as a predictor of the future
Cache Details • An ideal cache would be fully associative • That is, an LRU (least-recently-used) queue • Generally too expensive • Instead, partition memory addresses into separate bins, each divided into ways • 1-way = direct-mapped • 2-way = 2 entries per bin • 4-way = 4 entries per bin, etc.
Associativity Example • Hash memory addresses to different indices in the cache, as in the sketch below
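A minimal sketch of that hashing step in C++ (the 64-byte line size and 4-set layout here are assumptions for illustration, not values from the slides):

```cpp
#include <cstdint>
#include <iostream>

// Map an address to a cache set: drop the offset bits within a line,
// then take the line number modulo the number of sets (bins).
constexpr uint64_t LINE_SIZE = 64; // bytes per line (assumed)
constexpr uint64_t NUM_SETS  = 4;  // bins; each holds 'ways' entries

uint64_t set_index(uint64_t addr) {
    return (addr / LINE_SIZE) % NUM_SETS;
}

int main() {
    // 0x0 and 0x100 both map to set 0; in a direct-mapped (1-way)
    // cache they evict each other, in a 2-way cache they coexist.
    for (uint64_t a : {0x0ULL, 0x40ULL, 0x100ULL})
        std::cout << std::hex << a << " -> set " << set_index(a) << "\n";
    return 0;
}
```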
Miss Classification • First access = compulsory miss • Unavoidable without prefetching • Too many items mapping to one bin = conflict miss • Avoidable with higher associativity • No space left in cache = capacity miss • Avoidable with a larger cache • Invalidated = coherence miss • Avoidable if the cache were unshared
Quick Activity • Cache with 8 slots, 2-way associativity • Assume hash(x) = x % 4 (modulo) • For the trace on the slide, how many misses? • # compulsory misses? 10 • # conflict misses? 2 • # capacity misses? 0 • (Try it with the simulator sketch below)
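A minimal sketch of a 2-way set-associative cache with LRU replacement inside each set, matching the activity's parameters (the trace below is a placeholder, since the slide's trace was in a figure):

```cpp
#include <deque>
#include <iostream>
#include <vector>

// 2-way set-associative cache: 4 sets x 2 ways = 8 slots,
// hash(x) = x % 4, LRU replacement within each set.
int main() {
    constexpr int NUM_SETS = 4, WAYS = 2;
    std::vector<std::deque<int>> sets(NUM_SETS); // front = most recently used

    std::vector<int> trace = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5}; // placeholder

    int hits = 0, misses = 0;
    for (int x : trace) {
        auto& set = sets[x % NUM_SETS];
        bool hit = false;
        for (auto it = set.begin(); it != set.end(); ++it) {
            if (*it == x) {                  // hit: promote to MRU
                set.erase(it);
                hit = true;
                break;
            }
        }
        if (hit) ++hits;
        else {
            ++misses;
            if ((int)set.size() == WAYS) set.pop_back(); // evict LRU way
        }
        set.push_front(x);
    }
    std::cout << hits << " hits, " << misses << " misses\n";
    return 0;
}
```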
Locality • Locality = re-use of recently-used items • Temporal locality: re-use in time • Spatial locality: use of nearby items • In the same cache line, or the same page (4K chunk) • Intuitively, greater locality = fewer misses • # misses depends on cache layout, # of levels, associativity… • Machine-specific; the loop-order sketch below shows the effect
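A classic way to see spatial locality in action (a sketch in C++; the matrix size is arbitrary): summing a row-major matrix row-by-row walks consecutive addresses and uses each cache line fully, while column-by-column jumps a whole row between loads and misses far more often.

```cpp
#include <chrono>
#include <iostream>
#include <vector>

// Sum a row-major N x N matrix in two loop orders: stride-1 (good
// spatial locality) vs stride-N (poor spatial locality).
int main() {
    const int N = 4096;
    std::vector<int> m(N * N, 1);
    using clk = std::chrono::steady_clock;

    long long sum = 0;
    auto t0 = clk::now();
    for (int i = 0; i < N; ++i)              // row order: stride 1
        for (int j = 0; j < N; ++j) sum += m[i * N + j];
    auto t1 = clk::now();
    for (int j = 0; j < N; ++j)              // column order: stride N
        for (int i = 0; i < N; ++i) sum += m[i * N + j];
    auto t2 = clk::now();

    std::cout << "sum=" << sum
              << "  row order: " << std::chrono::duration<double>(t1 - t0).count() << "s"
              << "  column order: " << std::chrono::duration<double>(t2 - t1).count() << "s\n";
    return 0;
}
```

On most machines the column-order pass is several times slower, even though it performs exactly the same arithmetic.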
Quantifying Locality • Instead of counting misses, compute the hit curve from the LRU histogram • Assume a perfect LRU cache • Ignore compulsory misses • [figure: LRU reuse-distance histogram built up bar by bar over distances 1-6]
Quantifying Locality • Start with the total misses on the right-hand side • Subtract histogram values moving left • Normalize • [figure: running totals 1 1 3 3 3 3 over cache sizes 1-6, normalized to .3 .3 1 1 1 1] • A code sketch of this computation follows below
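A minimal sketch of the whole pipeline in C++ (the trace is a placeholder): record each access's LRU reuse distance in a histogram; the hit rate at cache size c is then the cumulative fraction of reuses with distance at most c, ignoring compulsory (first-touch) misses as the slide says.

```cpp
#include <algorithm>
#include <deque>
#include <iostream>
#include <vector>

// Build an LRU reuse-distance histogram for a trace, then emit the
// hit curve: hit_rate(c) = fraction of reuses with distance <= c.
int main() {
    std::vector<int> trace = {1, 2, 2, 2, 3, 3, 4, 5, 6}; // placeholder
    std::deque<int> lru;                 // front = most recently used
    std::vector<int> hist;               // hist[d-1] = # reuses at distance d

    int reuses = 0;
    for (int x : trace) {
        auto it = std::find(lru.begin(), lru.end(), x);
        if (it != lru.end()) {           // reuse: distance = 1-based position
            int d = (int)(it - lru.begin()) + 1;
            if ((int)hist.size() < d) hist.resize(d, 0);
            ++hist[d - 1];
            ++reuses;
            lru.erase(it);
        }                                // first touch = compulsory, ignored
        lru.push_front(x);
    }

    double cum = 0;                      // cumulative sum -> hit curve
    for (size_t c = 0; c < hist.size(); ++c) {
        cum += hist[c];
        std::cout << "cache size " << c + 1
                  << ": hit rate " << cum / reuses << "\n";
    }
    return 0;
}
```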
Hit Curve Exercise • Derive the hit curve for the following trace (positions 1-9): 1 2 2 2 3 3 4 5 6
What can we do with this? • What would the hit rate be with a cache size of 4 or 9?
Simple cache simulator • Only argument is N, the length of the LRU queue • Read addresses (ints) from cin • Output hits & misses to cout • Use a deque<int> (std::queue lacks the operations below) • push_front(v) = put v on the front of the queue • pop_back() = remove the back element • erase(i) = erase the element at iterator i • size() = number of elements • for (deque<int>::iterator i = q.begin(); i != q.end(); ++i) cout << *i << endl; • A full sketch follows below
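Putting the spec together, a minimal sketch of the simulator (one reasonable reading of the assignment, not the official solution):

```cpp
#include <algorithm>
#include <cstdlib>
#include <deque>
#include <iostream>

// LRU cache simulator: argv[1] is N, the LRU queue length (cache size).
// Reads integer addresses from cin; prints hits and misses to cout.
int main(int argc, char* argv[]) {
    if (argc != 2 || std::atoi(argv[1]) <= 0) {
        std::cerr << "usage: " << argv[0] << " N\n";
        return 1;
    }
    const int n = std::atoi(argv[1]);
    std::deque<int> q;                   // front = most recently used

    int hits = 0, misses = 0, addr;
    while (std::cin >> addr) {
        auto it = std::find(q.begin(), q.end(), addr);
        if (it != q.end()) {             // hit: promote to front
            ++hits;
            q.erase(it);
        } else {                         // miss: evict LRU entry if full
            ++misses;
            if ((int)q.size() == n) q.pop_back();
        }
        q.push_front(addr);
    }
    std::cout << "hits: " << hits << ", misses: " << misses << "\n";
    return 0;
}
```

Running it over the exercise trace at several values of N traces out the hit curve point by point.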
Important CPU Internals • Other issues that affect performance • Pipelining • Branches & prediction • System calls (kernel crossings)
Scalar architecture • Straight-up sequential execution • Fetch an instruction • Decode it • Execute it • Problem: an I-cache or D-cache miss • Result: a stall, where everything stops • How long do we wait for a miss? • A long time, compared to CPU speed; the sketch below makes the stall visible
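One way to make that stall visible (a sketch; the array size is arbitrary, chosen to be much larger than any cache): chase pointers through a random cycle so that every load depends on the previous one and almost every load misses; the CPU can do nothing but wait.

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Pointer chasing: each load's address comes from the previous load,
// so a cache miss stalls the whole chain and exposes memory latency.
int main() {
    const size_t N = 1 << 24;            // ~16M entries, far larger than cache
    std::vector<size_t> next(N);
    std::iota(next.begin(), next.end(), 0);

    // Sattolo's algorithm: turn the identity into one big random cycle,
    // so the chase visits every slot before repeating.
    std::mt19937_64 rng{42};
    for (size_t k = N - 1; k > 0; --k) {
        std::uniform_int_distribution<size_t> d(0, k - 1);
        std::swap(next[k], next[d(rng)]);
    }

    size_t i = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (size_t step = 0; step < N; ++step) i = next[i]; // dependent loads
    auto t1 = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / N;
    std::cout << "~" << ns << " ns per dependent load (i = " << i << ")\n";
    return 0;
}
```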