What to do and why in microprocessor research
Mario Nemirovsky
University of California, Santa Cruz
XStream Logic, Inc.
mario@ieee.org
Agenda • Microcontroller era • PC era • Post-PC era • Directions and challenges
Microcontroller era • Intel • 4004 => 8008 => 8080 => 8085 • Motorola • 6800 => 68000
Real-time systems • First microprocessors used as controllers • Late 70's to early 80's • Delco Electronics (General Motors): #1 microprocessor user and producer • TIO: first real-time multithreaded microprocessor • Motorola dominated the market • CISC needed, why?
The PC era • 68K used for workstations • Apollo and the MMU issue • Apple and the A-trap • Intel 80x86 • IBM introduces the PC
RISC vs. CISC • RISC values • simplicity • fast design cycles • small area • CISC values • small footprint • fewer instruction fetches • small register file?
General purpose microprocessors • Performance • For the past 10 years, annual performance growth has averaged 1.59×! • Architectural directions • Exploiting instruction level parallelism • Memory hierarchies • Special purpose micros not needed!
Today's Uniprocessor • Hardware techniques • Pipelining • Dynamic issue (i.e. superscalar) • Dynamic multistreaming (SMT) • Dynamic scheduling • Dynamic branch prediction • Dynamic disambiguation • Dynamic "super"speculation • Dynamic recompilation • Software techniques • Static scheduling • Static issue (i.e. VLIW) • Static branch prediction • Alias/pointer analysis • Static speculation
Limit of ILP
IPC on a real machine
Multistreamed Superscalar Processor • Exploit thread level parallelism • Interleaved execution of instructions from distinct threads • Multiple hardware contexts (streams) • Improve performance by making better use of processor resources
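A toy sketch of the idea on this slide: an N-wide issue stage fills its slots from whichever streams have ready instructions, so one thread's stall does not leave slots empty. The function name and the greedy round-robin fill are illustrative assumptions, not the machine described later in the talk.

```python
def fill_issue_slots(ready_per_stream, width):
    """ready_per_stream: count of ready instructions per hardware context.
    Greedily fills `width` issue slots round-robin across the streams;
    returns how many instructions each stream issues this cycle."""
    issued = [0] * len(ready_per_stream)
    slots = width
    while slots:
        progressed = False
        for s, ready in enumerate(ready_per_stream):
            if slots and issued[s] < ready:
                issued[s] += 1
                slots -= 1
                progressed = True
        if not progressed:          # every stream exhausted its ready work
            break
    return issued

# 4-wide machine: one stream alone offers only 2 ready instructions,
# but two extra streams soak up the otherwise-idle slots.
print(fill_issue_slots([2, 3, 1], 4))  # [2, 1, 1]
```

With a single stream the same call wastes half the slots (`fill_issue_slots([2], 4)` issues only 2), which is the resource-utilization argument the slide makes.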
Multistreaming Work • The beginning: the CDC6600 - J.E.Thornton • Early 80's: the HEP - B.Smith • Mid 80's: Delco TIO - M.Nemirovsky • Late 80's: UCSB DISC - M.Nemirovsky • Early 90's: UCSB MSP (SMT) - M.Nemirovsky & M.Serrano • In ISCA91 "Simultaneous Instruction Issuing" - H.Hirata • In HICSS94 "Performance Estimation of Multistreamed, Superscalar Processors" - W.Yamamoto & M.Nemirovsky et al. • In ISCA95 "Simultaneous Multithreading" - D.Tullsen • In PACT95 "Increasing superscalar performance through multistreaming" - W.Yamamoto & M.Nemirovsky
Multistreamed, Superscalar Processor (PACT'95, Yamamoto & Nemirovsky)
Performance Regions • Linear • Performance limited by workload parallelism • Saturation • Performance limited by machine parallelism
Limits on Performance • Machine Parallelism (mp) • Determined by the functional unit configuration and the dynamic instruction mix • Example: 2 integer, 60%; 1 memory, 40% • Workload Parallelism • Characteristic of a program • Compiler dependence
Functional Unit Effect on Performance (Ph.D. Dissertation (UCSB), March'94, M.Serrano)
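The slide's example can be worked through numerically. Each functional-unit class caps sustainable IPC at (number of units) / (fraction of the dynamic mix it serves), and the tightest class sets the machine-parallelism bound. This is a minimal sketch of that arithmetic, assuming the simple steady-state model the example implies:

```python
def machine_parallelism(units):
    """units: list of (num_units, fraction_of_dynamic_mix) per class.
    A class with n units serving a fraction f of instructions caps
    IPC at n / f; the minimum over classes is the mp bound."""
    return min(n / frac for n, frac in units)

# Slide example: 2 integer units (60% of mix), 1 memory unit (40%).
# Integer caps IPC at 2/0.6 = 3.33; memory at 1/0.4 = 2.5.
print(machine_parallelism([(2, 0.60), (1, 0.40)]))  # 2.5
```

So in the slide's configuration the single memory unit, not the integer units, limits the machine to 2.5 IPC.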
Execution Profiles: 1 stream, 2 streams (PACT'95, Yamamoto & Nemirovsky)
Execution Profiles: 3 streams, 4 streams (PACT'95, Yamamoto & Nemirovsky)
Caches • Caches are shared among the streams • Miss rate increases due to interstream conflicts • Individual thread performance decreases • Overall performance increases • Bus utilization increases • Increase is the product of the speedup and the miss rate increase • Design to maximize speedup while minimizing miss rate increase
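The slide's rule of thumb is simple enough to state as code: relative bus traffic grows with both the throughput gain (more instructions per cycle) and the extra misses per instruction from interstream conflicts. The 1.5× speedup below is a hypothetical number for illustration; the 18% miss-rate increase is the 1-to-2-stream figure reported later in the talk.

```python
def bus_utilization_factor(speedup, miss_rate_increase):
    """Relative bus traffic vs. a single stream: instructions/cycle
    scale by `speedup`, misses/instruction by `miss_rate_increase`."""
    return speedup * miss_rate_increase

# Hypothetical 1.5x speedup, misses up 18% (factor 1.18):
print(bus_utilization_factor(1.5, 1.18))  # ~1.77x the bus traffic
```

This is why the design goal on the slide is stated as maximizing speedup while minimizing the miss-rate increase: the bus pays for both at once.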
Extrinsic Misses • Extrinsic misses make up a significant portion of the miss rate (direct-mapped, 16-byte line) (MTEAC'98, Nemirovsky & Yamamoto)
The new era • Even if the large performance gains of the last 15 years can be sustained (which may be very hard), there are new applications growing even faster • New applications other than PC-centric • Larger diversity of requirements
"Post-Desktop Era"? • Information appliances • Multiple computers per person • Internet and web centric • Access to services is "one of" the killer apps • 3-D is "one of" the killer apps, …
Applications Fueling the Growth of the Internet (chart: throughput in MB/s, log scale 1-10000, vs. year, 1990-2003) • Text: e-mail, ftp, news (low-speed connections) • Graphics: web browsing (direct connections at work) • Transactions: e-commerce (v.90 access at home) • Telephony: voice over IP (DSL) • Streaming Video: video on demand
Future • Larger growth outside desktop PC • New performance metrics • "DoomMarks" vs. SPECmarks, MPPs vs. MFLOPs • Wider spectrum of requirements • Performance • Power • TTM • Reliability • Real Time • Cost
Opportunities • Application specific processors vs. GP • "Multiple" high-end CPU designs • Low-power architectures • Better CAD support • Fault-tolerant systems • Real Time architectures • Integration - System on a chip
Conclusions • Processors will have new constraints • "Multiple" general-purpose processors • Stream data • Light threads • New interfaces • Cache friendliness • Internet and communication will dominate • Reliability
Multithreading Work in 87 • Multiprocessor Systems • Fine grained instruction interleaving (HEP) • Coarse grained instruction interleaving (Sparcle) • Embedded Real Time Control • GM engine controller: the TIO has up to 33 streams active simultaneously; each stream controls spark, fuel, and other functions per cylinder
Multistreaming Work in 90 • Multiprocessor Systems • Fine grained instruction interleaving (TERA) • Coarse grained instruction interleaving (Sparcle) • Embedded Real Time Control • GM engine controller • Fine grained, dynamic instruction interleaving (DISC) DISC uses dynamic interleaving where the instruction dispatch algorithm dynamically reallocates throughput to the unblocked streams. This algorithm eliminates data and control hazards without degrading single stream latency.
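The dispatch policy described for DISC can be sketched in a few lines: each cycle, pick the next unblocked stream, so blocked streams donate their slots instead of stalling the pipeline. This is a minimal illustrative model, not the actual DISC hardware; the function name and the per-cycle blocked-set input are assumptions made for the sketch.

```python
def dispatch_trace(blocked_by_cycle, n_streams):
    """blocked_by_cycle: one set of blocked stream ids per cycle.
    Each cycle, rotate from the last start point and dispatch the
    first unblocked stream; returns the chosen stream per cycle
    (None if every stream is blocked that cycle)."""
    trace, nxt = [], 0
    for blocked in blocked_by_cycle:
        chosen = None
        for i in range(n_streams):
            s = (nxt + i) % n_streams
            if s not in blocked:
                chosen = s
                nxt = (s + 1) % n_streams  # reallocate the rotation fairly
                break
        trace.append(chosen)
    return trace

# Stream 0 blocks (e.g. on a hazard) in cycles 1-2; streams 1 and 2
# absorb those slots, so no cycle is wasted.
print(dispatch_trace([set(), {0}, {0}, set()], 3))  # [0, 1, 2, 0]
```

Note how this matches the slide's claim: a blocked stream's hazard never injects a bubble, and a lone unblocked stream still dispatches every cycle, so single-stream latency is not degraded.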
Multistreaming Work in 92 • Multiprocessor Systems • Fine grained instruction interleaving (TERA) • Coarse grained instruction interleaving (Sparcle) • Embedded Real Time Control • GM engine controller • Fine grained, dynamic instruction interleaving (DISC) • Multistreamed, Superscalar Processors • Fine grained, dynamic instruction interleaving • Each stream is a logical superscalar processor • Multiple functional unit design
Multistream Performance: 1 stream
Multistream Performance: 2 streams
Multistream Performance: 3 streams
Multistream Performance: 4 streams
Multistream Performance • Performance Bounds • Workload parallelism: 1-2 streams • Machine parallelism: 3-4 streams • Data cache miss rate increased by 18% when moving from a single stream to 2 streams
Interference • Associativity reduces interference • Increasing capacity reduces interference for large associative caches
Interference • Increasing the line size increases interference
Interference • Increasing the number of streams increases interference (2-way set associativity)
Overall Miss Rate • Increasing the line size: • decreases the miss rate for large caches • increases the miss rate for small caches • Multistreaming favors smaller line sizes
Individual Thread Performance • Round Robin Scheduling • Streams share the throughput equally • Individual thread execution time increased by 13% for 2 streams • Priority Scheduling • Streams are assigned a priority • Individual thread execution time increased by 2% for 2 streams • Lower priority stream executed at 73% of single stream performance
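The numbers on this slide are related by one identity: since time = work / rate, a thread running at fraction f of its single-stream rate takes 1/f as long. A small sketch tying the slide's figures together (the mapping of each figure through this identity is my reading, not stated on the slide):

```python
def time_increase(perf_fraction):
    """Execution-time growth factor for a thread running at
    `perf_fraction` of its single-stream performance (time = work/rate)."""
    return 1.0 / perf_fraction

# Round robin, 2 streams: +13% time <=> ~88.5% of single-stream rate.
print(round(time_increase(1 / 1.13), 2))  # 1.13
# Priority, high-priority stream: +2% time.
print(round(time_increase(1 / 1.02), 2))  # 1.02
# Priority, low-priority stream at 73% of single-stream rate:
print(round(time_increase(0.73), 2))      # 1.37 -> +37% time
```

The trade is visible directly: priority scheduling shifts almost the whole multistreaming cost onto the low-priority thread (+37% time) to keep the high-priority thread within 2% of running alone.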
Better ways to exploit parallelism? • Key to improving architectural gain/transistor • More SW & algorithmic involvement may be required! • Think about high-level forms of parallelism • More explicit, but a gentle slope is crucial • Can speculative multithreading help? • More evolutionary: microarchitecture level • Reduce importance of binary compatibility? • Multi-purpose ISAs rather than general-purpose? • Single architecture adapts to different applications • Possible directions • More static pipeline structures (LIW, VLIW) • Easier adoption of multiprocessing? • "Configurable" architectures (multi-use vs. g.p.)