Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah http://www.cs.utah.edu/~rajeev
What is Computer Architecture? • If the Intel Pentium4 has a faster clock speed than the IBM Power4, does it execute your programs faster? [Figure: two cases plotting completing instructions against clock ticks over time]
What is Computer Architecture? • To a large extent, computer architecture determines: • the number of instructions used to execute a program • the time each instruction takes to execute • the idle cycles when no work gets done • the number of instructions that can execute in parallel
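These factors combine in the standard "iron law" of processor performance: execution time = instruction count × cycles per instruction (CPI) ÷ clock rate. Idle cycles raise CPI and parallel execution lowers it. The numbers below are hypothetical, not measurements of the Pentium4 or Power4. They only show how the chip with the faster clock can still take longer.

```python
def execution_time_s(instructions: int, cpi: float, clock_hz: float) -> float:
    """Iron law of performance: time = instructions x CPI / clock rate."""
    return instructions * cpi / clock_hz

# Hypothetical chips running the same 1-billion-instruction program.
# Chip A: faster clock, but each instruction needs more cycles on average.
fast_clock = execution_time_s(instructions=1_000_000_000, cpi=1.5, clock_hz=3.0e9)
# Chip B: slower clock, but more work completes per cycle (CPI below 1).
slow_clock = execution_time_s(instructions=1_000_000_000, cpi=0.8, clock_hz=1.8e9)

print(f"faster clock: {fast_clock:.3f} s, slower clock: {slow_clock:.3f} s")
# prints "faster clock: 0.500 s, slower clock: 0.444 s"
```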
A Typical Microprocessor [Figure: block diagram with a branch predictor, L1 instruction cache, decode & rename, issue logic, register file, four ALUs, L1 data cache, and L2 cache]
Architecture Trends in the 90s • Performance was the ultimate metric • Transistors were a limiting factor • As more on-chip transistors became available in the 90s, more functionality and more complex circuitry were added to boost performance; most of the low-hanging fruit has now been picked
Hitting the Wall • We have now hit the following walls: • Single core performance • Memory • Complexity • Power, temperature
Hitting the Power Wall [Figure from Shekhar Borkar, MICRO’99] • Power is as important a metric today as performance
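A standard first-order model shows why power became a wall: dynamic (switching) power in CMOS logic is P = α·C·V²·f, where α is the switching activity, C the switched capacitance, V the supply voltage, and f the clock frequency. Raising the clock usually also requires a higher voltage, so power grows faster than frequency. The activity, capacitance, and scaling factors below are hypothetical.

```python
def dynamic_power_w(activity: float, capacitance_f: float, voltage_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = alpha * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

# Hypothetical baseline: alpha = 0.1, 100 nF switched capacitance, 1.0 V, 2 GHz.
base = dynamic_power_w(0.1, 100e-9, 1.0, 2.0e9)
# Hypothetical boost: 20% higher clock that also needs 10% more supply voltage.
boosted = dynamic_power_w(0.1, 100e-9, 1.1, 2.4e9)

print(f"base: {base:.1f} W, boosted: {boosted:.2f} W, ratio: {boosted / base:.3f}")
```

Here a 20% clock increase costs about 45% more dynamic power (1.2 × 1.1² = 1.452). Static (leakage) power is not modeled and adds to the total.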
The Advent of Multi-Core Chips [Figure: chip with multiple cores and cache banks] • In the past, performance magically increased by 50% every year • In the future, this improvement will be only ~20% every year • … unless … the application is multi-threaded!
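Compounding shows how large the gap between those two growth rates becomes. This sketch applies the 50% and ~20% annual rates over a hypothetical five-year span.

```python
def cumulative_speedup(annual_gain: float, years: int) -> float:
    """Compound a fixed yearly performance improvement over several years."""
    return (1.0 + annual_gain) ** years

for gain in (0.50, 0.20):
    print(f"{gain:.0%} per year over 5 years -> {cumulative_speedup(gain, 5):.2f}x")
# prints:
# 50% per year over 5 years -> 7.59x
# 20% per year over 5 years -> 2.49x
```

Over five years, 50% per year compounds to roughly a 7.6x speedup, while 20% per year yields only about 2.5x.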
Upcoming Architecture Challenges • Improving single core performance • Functionalities in multi-core chips • Simplifying the programmer’s task • Efficient interconnects • Power and temperature-efficient designs • Designs tolerant of errors For publications, see http://www.cs.utah.edu/~rajeev/research.html
Interconnects as a Bottleneck • In the past, on-chip data transmission on wires cost almost nothing • Interconnect speed and power have been improving, but not at the same rate as transistor speed • Hence, relative to computation, communication is much more expensive • In the near future, it will take 100 cycles for a signal to travel across the chip • 50% of chip power can be attributed to interconnects
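One reason communication lags computation is that the delay of an unrepeated on-chip wire grows roughly with the square of its length, because both its resistance and its capacitance grow linearly with length. The sketch uses the Elmore estimate for a distributed RC line (0.5·R·C). The per-millimeter resistance and capacitance and the 4 GHz clock are made-up illustrative values, not figures from this talk.

```python
def wire_delay_s(length_mm: float, r_ohm_per_mm: float, c_farad_per_mm: float) -> float:
    """Elmore delay estimate of an unrepeated distributed RC wire: 0.5 * R_total * C_total."""
    r_total = r_ohm_per_mm * length_mm
    c_total = c_farad_per_mm * length_mm
    return 0.5 * r_total * c_total

CLOCK_HZ = 4e9  # hypothetical clock rate

for length in (5, 10, 20):
    # Hypothetical wire parameters: 100 ohm/mm and 0.2 pF/mm.
    delay = wire_delay_s(length, r_ohm_per_mm=100.0, c_farad_per_mm=0.2e-12)
    print(f"{length:>2} mm: {delay * 1e9:.2f} ns = {delay * CLOCK_HZ:.1f} cycles")
```

Doubling the length quadruples the delay. Inserting repeaters makes delay roughly linear in length, but the repeaters themselves consume power and area, which contributes to the interconnect power share noted above.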
Interconnects in Multi-Core Chips [Figure: CPUs 1-3 with L1 caches, connected to an L2 cache through L2 controllers]
Not all Wires are Created Equal
                       B-Wires   L-Wires   W-Wires   PW-Wires
Relative latency       1x        0.5x      1.6x      3.2x
Relative area          1x        4x        0.5x      0.5x
Dynamic power (W/m)    2.65α     1.46α     2.9α      0.87α
Static power (W/m)     1.02      0.57      1.16      0.31
(α = switching factor)
Data Transfers have Varying Needs • Example of a cache coherence transaction: • Read exclusive request for a shared block
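To see why different messages have different needs, the sketch models a read-exclusive request for a shared block under a hypothetical directory protocol with a single sharer: the requester sends the request to the directory, the directory returns the data and invalidates the sharer, and the sharer acknowledges the requester directly. It assumes every hop has the same length and uses the relative latencies from the wire comparison (B = 1x, L = 0.5x, W = 1.6x, PW = 3.2x). The message names, the flow, and the wire assignments are illustrative assumptions, not the author's protocol.

```python
# Relative per-hop latency of each wire class (B-wires = 1.0), from the wire comparison.
WIRE_LATENCY = {"B": 1.0, "L": 0.5, "W": 1.6, "PW": 3.2}

def transaction_latency(wires: dict) -> float:
    """Latency of a read-exclusive request for a shared block (hypothetical flow).

    Each message is one hop of equal length:
      request:    requester -> directory
      data:       directory -> requester
      invalidate: directory -> sharer
      ack:        sharer    -> requester
    The requester proceeds only after the data AND the ack arrive,
    so the transaction latency is the slower of the two paths.
    """
    def hop(message: str) -> float:
        return WIRE_LATENCY[wires[message]]

    data_path = hop("request") + hop("data")
    ack_path = hop("request") + hop("invalidate") + hop("ack")
    return max(data_path, ack_path)

all_b = {"request": "B", "data": "B", "invalidate": "B", "ack": "B"}
# Narrow control messages on fast L-wires, the wide data block on B-wires.
control_on_l = {"request": "L", "data": "B", "invalidate": "L", "ack": "L"}
# Moving the data onto slow, low-power PW-wires puts it on the critical path.
data_on_pw = {"request": "L", "data": "PW", "invalidate": "L", "ack": "L"}

for name, assignment in [("all B", all_b), ("control on L", control_on_l), ("data on PW", data_on_pw)]:
    print(f"{name}: {transaction_latency(assignment):.1f}")
# prints:
# all B: 3.0
# control on L: 1.5
# data on PW: 3.7
```

With all messages on B-wires, the data reply has slack because the acknowledgment path is longer. Once the control messages move to L-wires, the data path becomes critical, so placing the data on PW-wires hurts in this toy model. A real evaluation would also weigh message widths (L-wires take 4x the area) and power.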
Other Interconnect Choices • Optical interconnects: signals travel at the speed of light, but converting between the optical and electrical domains is costly • 3D chips: shorter communication distances and low-cost vertical signal transmission, but higher power density
3D Layouts [Figure: clusters and cache banks on two stacked dies (Die 0, Die 1), linked by intra-die horizontal wires and inter-die vertical wires; (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered)]
Upcoming Architecture Challenges: Improving single core performance • Clustered architectures: a relatively low-complexity, scalable solution that easily handles multiple threads
Upcoming Architecture Challenges: Functionalities in multi-core chips • Heterogeneous cores with different performance/power trade-offs • Cores that execute the OS • Cores that verify results
Upcoming Architecture Challenges: Simplifying the programmer’s task • Hardware to support transactional memory
Upcoming Architecture Challenges: Designs tolerant of errors • Transient faults are caused by high-energy particles that deposit enough charge to toggle bits • Variations in operating conditions may cause a circuit to not produce its result in time
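A single parity bit per word is one of the simplest ways hardware can detect a toggled bit. The sketch below is a generic illustration, not a design from this talk, and the data word and flipped bit positions are arbitrary. It also shows the limitation: two flips in the same word cancel out and go undetected, which is why stronger codes such as SEC-DED ECC are used where that matters.

```python
def parity(word: int) -> int:
    """Return 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") % 2

def check(word: int, stored_parity: int) -> bool:
    """True if the word still matches the parity bit recorded when it was written."""
    return parity(word) == stored_parity

data = 0b1011_0010
p = parity(data)          # stored alongside the data when it is written
upset = data ^ (1 << 5)   # a particle strike toggles bit 5
double_upset = upset ^ 1  # a second flip in the same word (bit 0)

print(check(data, p), check(upset, p), check(double_upset, p))
# prints "True False True": the single flip is caught, the double flip is missed
```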
Research Methodologies • It’s all about the simulators! • SimpleScalar, Wattch, and HotSpot: about 10,000 lines of C code that models the flow of instructions through a modern processor • Inputs: a configuration file that specifies processor parameters, and a benchmark program (say, gzip) • Outputs: how long the program runs on the simulated processor (SimpleScalar), how much power is consumed (Wattch), and the peak temperature (HotSpot)
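Real simulators such as SimpleScalar model the pipeline cycle by cycle. The toy sketch below only illustrates the input/output shape described above: a configuration plus a program go in, and a run time in cycles comes out. The parameter names, the key=value file format, the synthetic trace, and the timing formula are all hypothetical simplifications, not SimpleScalar's options or model.

```python
import math
from dataclasses import dataclass

@dataclass
class Config:
    issue_width: int   # instructions issued per cycle (hypothetical parameter name)
    miss_penalty: int  # stall cycles added per L1 cache miss (hypothetical parameter name)

def parse_config(text: str) -> Config:
    """Parse 'key=value' lines, a stand-in for a simulator configuration file."""
    values = {}
    for line in text.strip().splitlines():
        key, value = line.split("=", 1)
        values[key.strip()] = int(value)
    return Config(values["issue_width"], values["miss_penalty"])

def simulate(trace: list, cfg: Config) -> int:
    """Toy timing model: ideal issue rate plus a fixed, serialized stall per miss.

    trace holds one bool per dynamic instruction; True means it misses in the L1 cache.
    """
    issue_cycles = math.ceil(len(trace) / cfg.issue_width)
    stall_cycles = cfg.miss_penalty * sum(trace)
    return issue_cycles + stall_cycles

# Hypothetical "benchmark": 1000 instructions, every 50th one misses in L1.
trace = [i % 50 == 0 for i in range(1000)]

for cfg_text in ("issue_width=4\nmiss_penalty=20", "issue_width=2\nmiss_penalty=20"):
    cfg = parse_config(cfg_text)
    print(cfg, simulate(trace, cfg), "cycles")
```

In this toy model, doubling the issue width shrinks only the ideal-issue term (500 to 250 cycles) while the 400 stall cycles stay fixed, a small instance of the memory wall listed earlier.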
Evaluating a New Idea • Lots of reading (it’s better than waiting for divine inspiration) • Identify bottlenecks, identify problems, develop an idea, and repeatedly question that idea • Understand the simulator • Engineer a solution and modify the simulator code (perhaps writing fewer than 1000 lines of C code) • Analyze the data (things never work the first time); engineer, optimize, and debug your solution • Write papers • Implement in silicon?
To Learn More… • CS/EE 3810: Computer Organization • CS/EE 6810: Computer Architecture • CS/EE 7810: Advanced Computer Architecture • CS/EE 7820: Parallel Computer Architecture • CS 7937 / 7940: Architecture Reading Seminar