
Today’s topics

Today’s topics. Performance & Computer Architecture Notes from David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface , Morgan Kaufmann, 1997. http://computer.howstuffworks.com/pc.htm Slides from Alvy Lebeck, Duke CS

edward


  1. Today’s topics • Performance & Computer Architecture • Notes from David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 1997. • http://computer.howstuffworks.com/pc.htm • Slides from • Alvy Lebeck, Duke CS • Marti Hearst, UC Berkeley SIMS • David Patterson, UC Berkeley CS • Mounir Hamdi, HKUST CS • Upcoming • Complexity

  2. Performance • Performance = 1/Time • The goal for all software and hardware developers is to increase performance • Metrics for measuring performance (pros/cons?) • Elapsed time • CPU time • Instruction count (RISC vs. CISC) • Clock cycles per instruction • Clock cycle time • MIPS vs. MFLOPS • Throughput (tasks/time) • Other more subjective metrics? • What kind of workload should be used? • Applications, kernels and benchmarks (toy or synthetic)
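
The metrics on this slide combine into the CPU-time relation from Patterson and Hennessy: CPU time = instruction count × clock cycles per instruction × clock cycle time. The sketch below is only an illustration; the function names and the sample workload are hypothetical, not taken from the slides.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instructions * cycles/instruction / cycles/second."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count, seconds):
    """MIPS rating: millions of instructions executed per second."""
    return instruction_count / (seconds * 1e6)


# Hypothetical workload: 1 billion instructions, CPI of 2, 2 GHz clock.
t = cpu_time(1_000_000_000, 2.0, 2e9)
print(t)                        # CPU time in seconds
print(1 / t)                    # performance = 1 / time
print(mips(1_000_000_000, t))   # MIPS rating for this run
```

Because the three factors multiply, a lower instruction count (the CISC argument) only helps if CPI and cycle time do not rise by more, which is one reason MIPS alone can mislead.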

  3. What is Realtime? • Response time • Panic • How to tell “I am still computing” • Progress bar • Flicker • Fusion frequency • Update rate vs. refresh rate • Movie film standards (24 fps projected at 48 fps) • Interactive media • Interactive vs. non-interactive graphics • computer games vs. movies • animation tools vs. animation • Interactivity => real-time systems • system must respond to user inputs without any perceptible delay (A Primary Challenge in VR)

  4. The Big Picture • Since 1946 all computers have had 5 components: Control, Datapath, Memory, Input, Output • The Von Neumann Machine: Processor (Control + Datapath), Memory, Input, Output • What is computer architecture? • Computer Architecture = Machine Organization + Instruction Set Architecture + ...

  5. Fetch, Decode, Execute Cycle • Computer instructions are stored (as bits) in memory • A program’s execution is a loop • Fetch instruction from memory • Decode instruction • Execute instruction • Cycle time • Measured in hertz (cycles per second) • 2 GHz processor can execute this cycle up to 2 billion times a second • Not all cycles are the same though…
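
The loop on this slide can be sketched as a tiny interpreter. The 8-bit instruction format and the three opcodes below are a hypothetical toy ISA invented for illustration, not a real machine.

```python
# Hypothetical 8-bit toy ISA: high 4 bits = opcode, low 4 bits = operand.
HALT, LOADI, ADDI = 0x0, 0x1, 0x2


def run(memory):
    """Run the fetch/decode/execute loop; return (accumulator, instructions run)."""
    pc = 0         # program counter: address of the next instruction
    acc = 0        # single accumulator register
    executed = 0   # loop trips; real instructions may need differing cycle counts
    while True:
        word = memory[pc]                        # fetch
        pc += 1
        opcode, operand = word >> 4, word & 0xF  # decode
        executed += 1
        if opcode == HALT:                       # execute
            return acc, executed
        elif opcode == LOADI:
            acc = operand
        elif opcode == ADDI:
            acc += operand
        else:
            raise ValueError(f"unknown opcode {opcode}")


program = [0x15, 0x23, 0x24, 0x00]  # LOADI 5; ADDI 3; ADDI 4; HALT
print(run(program))                 # (12, 4)
```

The program and its data share one memory of bit patterns; only the program counter decides which words get treated as instructions.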

  6. Organization: Logic Designer's View • [Diagram: ISA Level above FUs & Interconnect] • Capabilities & Performance Characteristics of Principal Functional Units (FUs) • (e.g., Registers, ALU, Shifters, Logic Units, ...) • Ways in which these components are interconnected • Information flows between components • Logic and means by which such information flow is controlled • Choreography of FUs to realize the ISA

  7. Instruction Set Architecture • "... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." (Amdahl, Blaauw, and Brooks, 1964) • Organization of Programmable Storage • Data Types & Data Structures: Encodings & Representations • Instruction Set • Instruction Formats • Modes of Addressing and Accessing Data Items and Instructions • Exceptional Conditions • [Diagram label: SOFTWARE]

  8. The Instruction Set: a Critical Interface • What is an example of an Instruction Set Architecture? • [Diagram label: instruction set]

  9. Forces on Computer Architecture • Technology • Programming Languages • Applications • Cleverness • Operating Systems • History

  10. Technology • DRAM chip capacity (Year: Size) • 1980: 64 Kb • 1983: 256 Kb • 1986: 1 Mb • 1989: 4 Mb • 1992: 16 Mb • 1996: 64 Mb • 1999: 256 Mb • 2002: 1 Gb • 2007: 2 Gb • 2009: 4 Gb • [Chart: Microprocessor Logic Density] • In ~1985 the single-chip processor (32-bit) and the single-board computer emerged • => workstations, personal computers, multiprocessors have been riding this wave since • Now, we have multicore processors

  11. Technology => dramatic change • Processor • logic capacity: about 30% per year • clock rate: about 20% per year • Memory • DRAM capacity: about 60% per year (4x every 3 years) • Memory speed: about 10% per year • Cost per bit: improves about 25% per year • Disk • capacity: about 60% per year • Total use of data: 100% per 9 months! • Network Bandwidth • Bandwidth increasing more than 100% per year!
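
The parenthetical "4x every 3 years" follows from compounding 60% per year. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
def growth_factor(annual_rate, years):
    """Total multiplier after compounding `annual_rate` for `years` years."""
    return (1 + annual_rate) ** years


def annualized(factor, months):
    """Convert 'grows by `factor` every `months` months' to a per-year multiplier."""
    return factor ** (12 / months)


print(growth_factor(0.60, 3))   # DRAM capacity: about 4.1x in 3 years
print(growth_factor(0.10, 3))   # memory speed over the same 3 years
print(annualized(2.0, 9))       # data use doubling every 9 months, per year
```

Comparing the first two lines shows why capacity and speed drift apart so quickly even though both improve every year.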

  12. Performance Trends

  13. Processor Transistor Count (from http://en.wikipedia.org/wiki/Transistor_count)

  14. Processor-Memory Speed Gap • [Figure: performance (log scale, 1 to 1000) vs. year, 1980 to 2000] • µProc (CPU, “Moore’s Law”): 50%/yr. • DRAM: 9%/yr. (2X/10 yrs) • Processor-Memory Performance Gap: grows 50% / year
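
How fast the two curves separate can be estimated by dividing one compound growth rate by the other. The helper below is a hypothetical sketch that takes the per-year rates as inputs; it assumes both rates stay constant over the whole period.

```python
def relative_gap(cpu_rate, dram_rate, years):
    """Ratio of processor performance to DRAM performance after `years`,
    assuming both start equal and each compounds at a constant annual rate."""
    return ((1 + cpu_rate) / (1 + dram_rate)) ** years


# Rates as given on the slide: processor 50%/yr, DRAM 9%/yr.
for years in (1, 10, 20):
    print(years, relative_gap(0.50, 0.09, years))
```

Even a modest difference in annual rates compounds into a gap of several hundred times over two decades, which motivates the memory hierarchy.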

  15. Latency vs. Throughput

  16. Memory bottleneck • CPU can execute dozens of instructions in the time it takes to retrieve one item from memory • Solution: Memory Hierarchy • Use fast memory • Registers • Cache memory • Rule: small memory is fast, large memory is slow

  17. A great idea in computer science • Temporal locality • Programs tend to access data that has been accessed recently (i.e. close in time) • Spatial locality • Programs tend to access data at an address near recently referenced data (i.e. close in space) • Useful in graphics and virtual reality as well • Realistic images require significant computational power • Don’t need to represent distant objects as well • Efficient distributed systems rely on locality • Memory access time increases over a network • Want to access data on local machine
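
A small cache simulation makes both kinds of locality visible. This is a sketch under stated assumptions: a fully associative cache with least-recently-used replacement, 8 lines of 4 addresses each. The sizes, names, and access patterns are hypothetical, not from the slides.

```python
from collections import OrderedDict
import random


def hit_rate(addresses, num_lines=8, block_size=4):
    """Fraction of accesses served by a small fully associative LRU cache."""
    cache = OrderedDict()   # block number -> None, ordered oldest to newest
    hits = 0
    for addr in addresses:
        block = addr // block_size      # neighbours share a block (spatial locality)
        if block in cache:
            hits += 1
            cache.move_to_end(block)    # recently used blocks stay (temporal locality)
        else:
            cache[block] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)   # evict the least recently used block
    return hits / len(addresses)


sequential = list(range(1024))          # walk an array in order
rng = random.Random(0)
scattered = [rng.randrange(1_000_000) for _ in range(1024)]

print(hit_rate(sequential))   # 0.75: 3 of every 4 accesses land in a loaded block
print(hit_rate(scattered))    # close to 0: almost no reuse
```

The same tiny cache serves most sequential accesses and almost none of the scattered ones, so the benefit comes from the access pattern rather than from cache size alone.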

  18. Microprocessor Generations • First Generation: 1971-78 • Behind the power curve (16-bit, <50k transistors) • Second Generation: 1979-85 • Becoming “real” computers (32-bit, >50k transistors) • Third Generation: 1985-89 • Challenging the “establishment” (Reduced Instruction Set Computer/RISC, >100k transistors) • Fourth Generation: 1990- • Architectural and performance leadership (64-bit, >1M transistors, Intel/AMD translate into RISC internally)

  19. In the beginning (8-bit): Intel 4004 • First general-purpose, single-chip microprocessor • Shipped in 1971 • 8-bit architecture, 4-bit implementation • 2,300 transistors • Performance < 0.1 MIPS (Million Instructions Per Second) • 8008: 8-bit implementation in 1972 • 3,500 transistors • First microprocessor-based computer (Micral) • Targeted at laboratory instrumentation • Mostly sold in Europe • All chip photos in this talk courtesy of Michael W. Davidson and The Florida State University

  20. 1st Generation (16-bit) Intel 8086 • Introduced in 1978 • Performance < 0.5 MIPS • New 16-bit architecture • “Assembly language” compatible with 8080 • 29,000 transistors • Includes memory protection, support for Floating Point coprocessor • In 1981, IBM introduces PC • Based on 8088--8-bit bus version of 8086

  21. 2nd Generation (32-bit): Motorola 68000 • Major architectural step in microprocessors: • First 32-bit architecture • Initial 16-bit implementation • First flat 32-bit address • Support for paging • General-purpose register architecture • Loosely based on PDP-11 minicomputer • First implementation in 1979 • 68,000 transistors • < 1 MIPS (Million Instructions Per Second) • Used in • Apple Mac • Sun, Silicon Graphics, & Apollo workstations

  22. 3rd Generation: MIPS R2000 • Several firsts: • First (commercial) RISC microprocessor • First microprocessor to provide integrated support for instruction & data cache • First pipelined microprocessor (sustains 1 instruction/clock) • Implemented in 1985 • 125,000 transistors • 5-8 MIPS (Million Instructions per Second)

  23. 4th Generation (64 bit) MIPS R4000 • First 64-bit architecture • Integrated caches • On-chip • Support for off-chip, secondary cache • Integrated floating point • Implemented in 1991: • Deep pipeline • 1.4M transistors • Initially 100MHz • > 50 MIPS • Intel translates 80x86/ Pentium X instructions into RISC internally

  24. Key Architectural Trends • Increase performance at 1.6x per year (2X/1.5yr) • True from 1985-present • Combination of technology and architectural enhancements • Technology provides faster transistors (∝ 1/lithographic feature size) and more of them • Faster transistors lead to higher clock rates • More transistors (“Moore’s Law”): • Architectural ideas turn transistors into performance • Responsible for about half the yearly performance growth • Two key architectural directions • Sophisticated memory hierarchies • Exploiting instruction level parallelism
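
The "2X/1.5yr" figure is the doubling time implied by 1.6x per year. A short check, using a hypothetical helper name:

```python
import math


def doubling_time(annual_factor):
    """Years needed to double when performance multiplies by `annual_factor` yearly."""
    return math.log(2) / math.log(annual_factor)


print(doubling_time(1.6))   # roughly 1.5 years
print(doubling_time(2.0))   # exactly 1 year, as a sanity check
```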

  25. Where have all the transistors gone? • Superscalar (multiple instructions per clock cycle) • 3 levels of cache • Branch prediction (predict outcome of decisions) • Out-of-order execution (executing instructions in different order than programmer wrote them) • Intel Pentium III (10M transistors) • [Chip photo labels: Execution, 2, Bus Intf, D cache, TLB, Out-Of-Order, branch, SS, Icache]
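
To make "predict outcome of decisions" concrete, the sketch below uses a 2-bit saturating counter, a common textbook scheme. The slide does not say which predictor the Pentium III uses, so this illustrates the idea only; the loop pattern is hypothetical.

```python
def predict_accuracy(outcomes):
    """Accuracy of one 2-bit saturating counter over a list of branch outcomes.

    Counter states 0-1 predict "not taken"; states 2-3 predict "taken".
    """
    counter = 2      # start in "weakly taken" (an arbitrary choice)
    correct = 0
    for taken in outcomes:
        prediction = counter >= 2
        if prediction == taken:
            correct += 1
        # Move toward the observed outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then falls through; repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(predict_accuracy(loop_branch))   # 0.9: only each loop exit is mispredicted
```

A single loop exit moves the counter only one step, so the predictor still guesses "taken" when the loop starts again.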

  26. Laws? • Define each of the following. What has its effect been on the advancement of computing technology? • Moore’s Law • Amdahl’s Law • Metcalfe’s Law
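
Two of these laws have compact formulas that can be tried out directly. The sketch below uses the usual statements: Amdahl's Law as speedup = 1 / ((1 - p) + p / s), and Metcalfe's Law as network value proportional to the square of the number of users. The function names and sample inputs are hypothetical.

```python
def amdahl_speedup(improved_fraction, factor):
    """Overall speedup when `improved_fraction` of the work runs `factor` times faster."""
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / factor)


def metcalfe_value(users):
    """Metcalfe's Law: network value scales with the square of the user count."""
    return users ** 2


print(amdahl_speedup(0.9, 10))    # 90% of the work made 10x faster: about 5.3x overall
print(amdahl_speedup(0.9, 1e9))   # approaches the 1 / (1 - 0.9) = 10x ceiling
print(metcalfe_value(20) / metcalfe_value(10))   # doubling the users gives 4.0x the value
```

The unimproved 10% caps the overall speedup at 10x however fast the rest becomes. Moore's Law, by contrast, is an empirical observation about transistor counts doubling at a roughly fixed interval.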
