CS 15-447: Computer Architecture • Lecture 26: Emerging Architectures • November 19, 2007 • Nael Abu-Ghazaleh • naelag@cmu.edu • http://www.qatar.cmu.edu/~msakr/15447-f08
Last Time: Buses and I/O • Buses: bunch of wires • Shared interconnect: multiple "devices" connect to the same bus • Versatile: new devices can connect (even ones we didn't know existed when the bus was designed) • Can become a bottleneck • Shorter -> faster; fewer devices -> faster • Have to: • Define the protocol that lets devices communicate • Come up with an arbitration mechanism • (Figure: bus drawn as shared control lines and data lines)
Types of Buses • System (processor-memory) bus • Connects processor and memory • Short, fast, synchronous, design specific • I/O bus • Usually longer and slower; industry standard • Needs to match a wide range of I/O devices • Connects to the processor-memory bus or backplane bus through a bus adaptor • (Figure: processor-memory bus linked via bus adaptors to a backplane bus and I/O buses)
Bus "Mechanics" • Master/slave roles • Have to define how devices handshake • Depends on whether the bus is synchronous or not • Bus arbitration protocol • Contention vs. reservation; centralized vs. distributed • I/O model • Programmed I/O; interrupt-driven I/O; DMA (see the sketch below) • Increasing performance (mainly bandwidth) • Shorter; closer; wider • Block transfers (instead of byte transfers) • Split-transaction buses • …
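To make the I/O-model bullet concrete, here is a minimal C sketch of programmed (polled) I/O against a hypothetical memory-mapped device; the register addresses, names, and status bit below are illustrative assumptions, not part of the lecture.

```c
/* Minimal sketch of programmed (polled) I/O, assuming a hypothetical
 * memory-mapped device with a status register and a data register.
 * Addresses and bit layout are made up for illustration. */
#include <stdint.h>

#define DEV_STATUS ((volatile uint32_t *)0x40000000)  /* hypothetical */
#define DEV_DATA   ((volatile uint32_t *)0x40000004)  /* hypothetical */
#define STATUS_READY 0x1u

uint32_t programmed_io_read(void)
{
    /* Programmed I/O: the CPU busy-waits on the status register,
     * burning cycles until the device signals it has data. */
    while ((*DEV_STATUS & STATUS_READY) == 0)
        ;                       /* spin: no useful work gets done */
    return *DEV_DATA;           /* then the CPU copies the data itself */
}
/* Interrupt-driven I/O replaces the spin loop with an interrupt handler,
 * and DMA lets the device copy whole blocks to memory without the CPU. */
```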
Today—Emerging Architectures • We are at an interesting point in computer architecture evolution • What is emerging and why is it emerging?
Uniprocessor Performance (SPECint) • (Figure: SPECint uniprocessor performance over time, annotated "3X" and "??%/year"; from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, Sept. 15, 2006) • VAX: 25%/year, 1978 to 1986 • RISC + x86: 52%/year, 1986 to 2002 • RISC + x86: ??%/year, 2002 to present • Sea change in chip design: what is emerging?
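As a quick sanity check on what those growth rates mean, compound growth of (1 + rate) per year multiplies out dramatically over a decade or two; the tiny program below is just that arithmetic, not data from the figure.

```c
/* Compound annual growth: performance after n years at rate r
 * is (1 + r)^n. A quick comparison of the rates on the slide. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    printf("25%%/year over 8 years  (1978-1986): %.1fx\n", pow(1.25, 8));
    printf("52%%/year over 16 years (1986-2002): %.0fx\n", pow(1.52, 16));
    return 0;
}
```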
How did we get there? • First, what allowed the ridiculous 52% improvement per year to continue for around 20 years? • If cars had improved as much, we would have 1 million km/hr cars! • Is it just the number of transistors/clock rate? • No! It's also all the stuff that we've been learning about!
Walk down memory lane • What was the first processor organization we looked at? • Single cycle processors • How did multi-cycle processors improve those? • What did we do after that to improve performance? • Pipelining; why does that help? What are the limitations? • From there we discussed superscalar architectures • Out of order execution; multiple ALUs • This is basically state of the art in uniprocessors • What gave us problems there?
Detour: a couple of other design points • Very Large Instruction Word (VLIW) architectures: let the compiler do the work • Great for energy efficiency: the hardware no longer has to discover instruction-level parallelism at run time • Not binary compatible? The Transmeta Crusoe processor worked around this with dynamic binary translation
SIMD ISA Extensions—Parallelism from the Data? • Same instruction applied to multiple data elements at the same time • How can this help? (see the sketch below) • MMX (Intel) and 3DNow! (AMD) ISA extensions • Great for graphics; originally invented for scientific codes (vector processors) • Not a general solution • End of detour!
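A minimal sketch of the SIMD idea in C, using Intel's SSE intrinsics (a later extension in the same family as MMX); the function name and the assumption that n is a multiple of 4 are illustrative.

```c
/* One SIMD instruction adds four packed floats at once, versus four
 * scalar adds. Uses SSE intrinsics (the successor family to MMX). */
#include <xmmintrin.h>

/* Assumes n is a multiple of 4 and the arrays do not overlap. */
void add_arrays_simd(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);            /* load 4 floats from a */
        __m128 vb = _mm_loadu_ps(&b[i]);            /* load 4 floats from b */
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb)); /* 4 adds in one op   */
    }
}
```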
Back to Moore's law • Why are the "good times" over? • Three walls • "Instruction Level Parallelism" (ILP) Wall • Less and less parallelism available in programs (2 -> 4 -> 8 -> 16) • Tremendous increase in complexity to get more • Does VLIW help? • What can help? (see the example below) • Conclusion: standard architectures cannot continue to do their part in sustaining Moore's law
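A small illustrative C fragment (not from the lecture) of why the ILP well runs dry: the first loop is a serial dependence chain that extra issue width cannot speed up, while the second has fully independent iterations that out-of-order hardware can overlap.

```c
/* Illustrative only: how much ILP exists depends on data dependences,
 * not on how many ALUs the processor has. */

/* Almost no ILP: each iteration needs the previous result (a chain). */
double running_sum(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s = s + x[i];           /* the next add must wait for this one */
    return s;
}

/* Plenty of ILP: iterations are independent and can issue in parallel. */
void scale(const double *x, double *y, int n, double k)
{
    for (int i = 0; i < n; i++)
        y[i] = k * x[i];        /* no iteration depends on another */
}
```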
Wall 2: Memory Wall • (Figure: processor vs. DRAM performance, 1980-2000, log scale; µProc improves ~52%/yr (2X/1.5 yr), DRAM ~9%/yr (2X/10 yrs), so the processor-memory performance gap grows ~50% per year, "Moore's Law") • What did we do to help this? • Still very, very expensive to access memory • How do we see the impact in practice? (see the sketch below) • Very different from when I learned architecture!
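A small illustrative sketch (not from the lecture) of how the memory wall shows up in practice: the same amount of work is cheap when accesses stream through the caches and painful when every load is a dependent cache miss.

```c
/* Illustrative sketch: sequential access streams through the caches,
 * while a dependent pointer chase pays close to full memory latency
 * on every step. */

/* Sequential: the prefetcher and caches hide most of the latency. */
long sum_sequential(const long *a, long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer chase: each load's address depends on the previous load,
 * so misses cannot be overlapped; this is the memory wall up close. */
long chase(const long *next, long start, long steps)
{
    long i = start;
    for (long k = 0; k < steps; k++)
        i = next[i];            /* serialized cache misses if next[]
                                   holds a random permutation */
    return i;
}
```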
Ways out? Multithreaded Processors • Can we switch to other threads if we need to access memory? • When do we need to access memory? • What support is needed? • Can I use it to help with the ILP wall as well?
Simultaneous Multithreaded (SMT) Processors • How do I switch between threads? • Hardware support for that • How does this help? (see the sketch below) • But, increased contention for everything (BW, TLB, caches…)
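A minimal pthreads sketch (illustrative, not from the lecture) of why hardware multithreading helps: each software thread spends much of its time stalled on memory, so a core that holds several thread contexts can issue instructions from one thread while another waits on a miss.

```c
/* Illustrative: several threads each do independent memory-bound work.
 * On a multithreaded (e.g., SMT) core, one thread's cache-miss stalls
 * can be filled with instructions from another thread context. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N (1 << 20)

static long data[NTHREADS][N];
static long result[NTHREADS];

static void *worker(void *arg)
{
    long id = (long)arg;
    long sum = 0;
    /* Strided accesses miss in the cache often; the resulting stalls
     * are exactly what another hardware thread context could fill. */
    for (long i = 0; i < N; i += 16)
        sum += data[id][i];
    result[id] = sum;
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    for (long i = 0; i < NTHREADS; i++)
        printf("thread %ld: sum = %ld\n", i, result[i]);
    return 0;
}
```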
Third Wall: Physics/Power Wall • We're down to the level of playing with a few atoms • More error-prone; lower yield • But also soft errors and wear-out • Logic that sometimes works! • Can we do something in architecture to recover?
So, what is our way out? Any ideas? • Maybe architecture becomes a commodity; this is the best we can do • This happens to a lot of technologies: why don't we have the million-km/hr car? • Do we actually need more processing power? • 8-bit embedded processors are good enough for calculators; 4-bit ones are probably good enough for elevators • Does it make sense to keep investing so much time and energy into this stuff? • Power Wall + Memory Wall + ILP Wall = Brick Wall
A lifeline? Multi-core architectures • How does this help? • Think of the three walls • The new Moore’s law: • the number of cores will double every 3 years! • Many-core architectures
Overcoming the three walls • ILP Wall? • Don't need to restrict myself to a single thread • Natural parallelism available across threads/programs (see the sketch below) • Memory wall? • Hmm, that is a tough one; on the surface, it seems like we made it worse • Maybe help is coming from industry • Physics/power wall? • Use less aggressive core technology • Simpler processors, shallower pipelines • But more processors • Throw-away cores to improve yield • Do you buy it?
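A minimal sketch (illustrative, not from the lecture) of the thread-level parallelism multi-core relies on: the work is split into independent chunks, one per core, instead of hoping the hardware finds more ILP inside a single instruction stream.

```c
/* Illustrative: thread-level parallelism for multi-core. Each thread
 * sums an independent chunk of the array; the cores work side by side. */
#include <pthread.h>
#include <stdio.h>

#define NCORES 4
#define N 1000000

static double a[N];

struct chunk { long lo, hi; double partial; };

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    double s = 0.0;
    for (long i = c->lo; i < c->hi; i++)
        s += a[i];
    c->partial = s;             /* no sharing: each thread writes its own */
    return NULL;
}

int main(void)
{
    for (long i = 0; i < N; i++)
        a[i] = 1.0;

    pthread_t t[NCORES];
    struct chunk c[NCORES];
    for (int k = 0; k < NCORES; k++) {
        c[k].lo = (long)N * k / NCORES;
        c[k].hi = (long)N * (k + 1) / NCORES;
        pthread_create(&t[k], NULL, sum_chunk, &c[k]);
    }
    double total = 0.0;
    for (int k = 0; k < NCORES; k++) {
        pthread_join(t[k], NULL);
        total += c[k].partial;
    }
    printf("total = %f\n", total);
    return 0;
}
```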
7 Questions for Parallelism • Applications: 1. What are the apps? 2. What are kernels of apps? • Hardware: 3. What are the HW building blocks? 4. How to connect them? • Programming Models: 5. How to describe apps and kernels? 6. How to program the HW? • Evaluation: 7. How to measure success? (Inspired by a view of the Golden Gate Bridge from Berkeley)
Sea Change in Chip Design • Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm2 chip • RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm2 chip • A 125 mm2 chip in 0.065 micron CMOS = 2312 copies of RISC II + FPU + Icache + Dcache • RISC II shrinks to 0.02 mm2 at 65 nm • The processor is the new transistor!
Architecture Design space • What should each core look like? • Should all cores look the same? • How should the chip interconnect between them look? • What level of the cache should they share? • And what are the implications of that? • Are there new security issues? • Side channel attacks; denial of service attacks • Many other questions… Brand new playground; exciting time to do architecture research
Hardware Building Blocks: Small is Beautiful • Given the difficulty of design/validation of large designs • Given power limits on what we can build, parallelism is an energy-efficient way to achieve performance • Lower threshold voltage means much lower power • Given that redundant processors can improve chip yield • Cisco Metro: 188 processors + 4 spares • Sun Niagara sells 6- or 8-processor versions • Expect modestly pipelined (5- to 9-stage) CPUs, FPUs, vector, SIMD PEs • One size fits all? • Amdahl's Law => a few fast cores + many small cores
Elephant in the room • We tried this parallel processing thing before • Very difficult • It failed, pretty much • A lot of academic progress and neat algorithms, but little commercial impact • We actually have to do new programming • A lot of effort to develop; error-prone; etc. • The La-Z-Boy programming era is over • Need new programming models • Amdahl's law (see the sketch below) • Applications: what will you use 1024 cores for? • These concerns are being voiced by a substantial segment of academia/industry • What do you think? • It's coming, no matter what
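To put a number on the Amdahl's-law worry, here is a small sketch (illustrative, not from the lecture) of the classic formula, speedup = 1 / ((1 - p) + p/n) for parallel fraction p on n cores: even highly parallel code leaves most of 1024 cores idle.

```c
/* Amdahl's law: with parallel fraction p of the work on n cores,
 * speedup = 1 / ((1 - p) + p / n). */
#include <stdio.h>

static double amdahl(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    const double cores = 1024.0;
    const double fractions[] = { 0.50, 0.90, 0.95, 0.99 };
    /* Even 95% parallel code tops out around 20x on 1024 cores;
     * at 99% parallel it is still under 100x. */
    for (int i = 0; i < 4; i++)
        printf("parallel fraction %.2f -> speedup %.1fx on %.0f cores\n",
               fractions[i], amdahl(fractions[i], cores), cores);
    return 0;
}
```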