Lecture 25: Wrap-Up • Mid-term-II stats: • High: 91 • Mean: 73.12 • Q1-3: half the class got 25/25 • Q4: only one student got 25/25; almost no one mentioned that we'll need a mechanism to determine exclusivity • Q5: the highest score was 22/30; very few mentioned that allowing blocks to move would complicate search
[Figure: eight-core chip layout, CPU 0 through CPU 7, each core with private L1 instruction (L1I) and L1 data (L1D) caches]
Non-Uniform Cache Access (NUCA) • Many open problems in NUCA and D-NUCA • How should search happen? • Allocation/replacement/migration policies • Managing bandwidth/latency on the network • Prefetch mechanisms • Selective replication of blocks • Efficient write-throughs • Power/performance trade-offs • P.S. We have simulators, etc., to help model such caches in case anyone is interested
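The first open question, how search should happen, can be made concrete with a toy latency model. This is a minimal sketch under stated assumptions, not a simulator: a single D-NUCA bank set whose banks are listed from nearest to farthest, with purely illustrative round-trip latencies. It contrasts two simple policies, here called incremental search (probe the nearest bank, wait for its reply, then try the next) and multicast search (probe every bank at once). The function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Toy model (assumption): one D-NUCA bank set, banks listed from nearest to
# farthest; latencies[i] = round-trip cycles to probe bank i. Values are illustrative.


def incremental_search(latencies: List[int], hit_bank: Optional[int]) -> Tuple[int, int]:
    """Probe banks one at a time, nearest first, waiting for each reply.

    Returns (cycles until hit/miss is known, number of bank probes sent)."""
    cycles = 0
    for i, lat in enumerate(latencies):
        cycles += lat  # wait for bank i to reply before probing the next one
        if i == hit_bank:
            return cycles, i + 1
    return cycles, len(latencies)  # miss: every bank was probed in turn


def multicast_search(latencies: List[int], hit_bank: Optional[int]) -> Tuple[int, int]:
    """Probe every bank in parallel.

    A hit is known when the owning bank replies; a miss only after the slowest
    (farthest) bank has replied. Every search costs one probe per bank."""
    if hit_bank is not None:
        return latencies[hit_bank], len(latencies)
    return max(latencies), len(latencies)


if __name__ == "__main__":
    banks = [4, 8, 12, 16]  # hypothetical per-bank round-trip latencies
    for hit in (0, 2, None):
        print(hit, incremental_search(banks, hit), multicast_search(banks, hit))
```

With these numbers, a hit in the nearest bank costs 4 cycles either way, but incremental search sends one probe instead of four. A block in a far bank, or a miss, is resolved much sooner by multicast search, at the price of consuming every bank's and link's bandwidth on every access; that is the bandwidth/latency and power/performance tension listed on the slide.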
Shameless Plug • CS 7810: Advanced Architecture • Lectures based on seminal (and still relevant) papers • Not much work, apart from a class project (in teams) • The class project can involve as little as one week's worth of concentrated effort… • … or enough to get a paper out of it • You WILL work on novel problems • Lots of help from me and other students with the simulator
3-D • Imagine a similar problem in 3D • Must schedule threads to manage temperature
[Figure: stacked die layers of processor (P) and cache (C) tiles arranged in a checkerboard pattern]
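One way to picture "schedule threads to manage temperature" is a greedy placement heuristic. The sketch below is hypothetical and assumes that a core's thermal pressure can be approximated by the power of threads already running on cores directly above/below it or beside it; that proxy is an illustrative assumption, not a thermal model. The function names, coordinates, and power values are all made up for illustration.

```python
from typing import Dict, List, Tuple

Core = Tuple[int, int, int]  # hypothetical (layer, row, col) coordinates of a core


def neighbor_power(core: Core, placement: Dict[Core, str], power: Dict[str, float]) -> float:
    """Sum the power of threads already placed on cores adjacent to `core`:
    directly above/below it (same row/col, adjacent layer) or beside it
    (same layer, Manhattan distance 1)."""
    layer, row, col = core
    total = 0.0
    for (o_layer, o_row, o_col), thread in placement.items():
        vertical = (o_row, o_col) == (row, col) and abs(o_layer - layer) == 1
        lateral = o_layer == layer and abs(o_row - row) + abs(o_col - col) == 1
        if vertical or lateral:
            total += power[thread]
    return total


def thermal_aware_schedule(cores: List[Core], power: Dict[str, float]) -> Dict[Core, str]:
    """Greedy heuristic: place threads in decreasing order of power, each on the
    free core whose already-occupied neighbors dissipate the least power.
    Ties are broken by core coordinates; threads beyond the core count stay unplaced."""
    placement: Dict[Core, str] = {}
    for thread in sorted(power, key=lambda t: power[t], reverse=True):
        free = [c for c in cores if c not in placement]
        if not free:
            break
        best = min(free, key=lambda c: (neighbor_power(c, placement, power), c))
        placement[best] = thread
    return placement


if __name__ == "__main__":
    # Two stacked layers, each a 1x2 row of cores.
    cores = [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)]
    print(thermal_aware_schedule(cores, {"hot": 10.0, "cool": 2.0}))
```

In this two-layer example the hot thread lands on (0, 0, 0) and the cool thread on the diagonally opposite core (1, 0, 1), so neither sits directly on top of or beside the other. A real scheduler would also account for cache tiles, heat-sink distance, and migration cost.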
Single Thread Performance • To improve single-thread performance, we can even schedule a single thread's instructions across cores • A large window of in-flight instructions helps mine high ILP • But this requires high levels of speculation (power-hungry!) • Any solutions?
[Figure: stacked die layers of processor (P) and cache (C) tiles arranged in a checkerboard pattern]
Heterogeneous CMPs (Alpha EVx and Cell)
[Figure: a chip mixing in-order (in-o) and out-of-order (o-o-o) cores]
NASCAR Applied to CPUs !?! • Source: Eric Rotenberg (NCSU)
Runahead Execution
[Figure, two panels: a single thread in a baseline architecture; a single thread executing in tandem with a helper thread]
Reliability
[Figure: two SMT cores (SMT core 1 and SMT core 2) running threads P1, C2, P2, and C1; configurations labeled "For power" and "For performance"]