Chapter 1. Fundamentals of Computer Design • Introduction • Performance improvement is due to (1) advances in technology and (2) innovation in computer design • 1945-1970: both (1) and (2) made major contributions to performance improvement • 1970~: 25% to 30% performance improvement per year for mainframes and minicomputers • 1975~: 35% performance improvement per year for microprocessors, due simply to (1)
Changes in the Marketplace That Made a Successful New Architecture Possible • The virtual elimination of assembly language programming reduced the need for object-code compatibility • The creation of standardized, vendor-independent operating systems, such as Unix and Linux, lowered the cost and risk of launching a new architecture • Consequences of these changes • Enabled the development of RISC machines, which focus on • Exploitation of instruction-level parallelism • Use of caches • Led to a 50% increase in performance per year
The Effect of the Growth Rate in Computer Performance • Significantly enhanced the capability available to computer users • Led to the dominance of microprocessor-based computers across the entire range of computer design • Workstations and PCs emerged as major products • Servers replaced minicomputers • Multiprocessors replaced mainframes and supercomputers • The advance of IC technology brought • The emergence of RISC • The renewal of CISC, such as the x86 (IA-32) microprocessors
The Changing Face of Computing • 1960s: Large mainframes • Business data processing and scientific computing • 1970s: Minicomputers • Time-sharing • 1980s: Desktop computing (personal computing) • 1990s: Internet and World Wide Web (servers) • 2000s: Embedded computing, mobile computing, and pervasive computing
Tasks of a Computer Designer • Determine what attributes are important for a new machine, then design a machine to maximize performance while staying within cost constraints • Task aspects: instruction set design, functional organization, logic design, and implementation • In the past, “computer architecture” often referred only to instruction set design; the other aspects of computer design were called “implementation” • In this book, “computer architecture” is intended to cover all three aspects of computer design: instruction set architecture, organization, and hardware • “Instruction set architecture” refers to the actual programmer-visible instruction set; it serves as the boundary between hardware and software
“Organization” includes the high-level aspects of a computer’s design, such as the memory system, the bus structure, and the internal CPU design • The NEC VR5432 and NEC VR4122 have the same instruction set architecture but different organizations • “Hardware” includes the detailed logic design and packaging technology of the machine • For example, different Pentium microprocessors running at different clock frequencies have the same instruction set architecture and organization but different hardware implementations • Organization and hardware are the two components of implementation
Functional Requirements (Fig. 1.4) • Application area • General-purpose desktop, scientific desktops and servers, commercial servers, embedded computing • Level of software compatibility • At the programming-language level, or object-code (binary) compatibility • Operating system requirements • Size of address space, memory management, protection • Standards • Floating point, I/O bus, operating systems, networks, programming languages
Technology Trends • A successful instruction set architecture must be designed to survive changes in computer implementation technology • Trends in implementation technology: • Integrated circuit logic technology • Transistor density: ~35% increase per year, quadrupling in a bit over four years • Die size: 10%~20% increase per year • Transistor count per chip: ~55% increase per year • Transistor speed: scales more slowly • DRAM • Density: 40%~60% increase per year recently • Cycle time: decreases by about one-third every 10 years • Magnetic disk • Density: recently 100% increase per year; before 1990, 30% per year, doubling in three years • Network technology • Ethernet bandwidth: 10 Mbit/s to 100 Mbit/s to 1 Gbit/s
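These compound growth rates can be sanity-checked with a short calculation. The sketch below (an illustration added here, not part of the original slides) converts an annual growth rate into the number of years needed to reach a given factor:

    import math

    def years_to_factor(annual_rate: float, factor: float) -> float:
        # Years for compound growth at annual_rate to reach the given factor.
        return math.log(factor) / math.log(1.0 + annual_rate)

    # Transistor density at ~35%/year quadruples in a bit over 4 years.
    print(f"4x at 35%/yr: {years_to_factor(0.35, 4):.1f} years")  # ~4.6
    # Disk density at 30%/year (pre-1990) doubles in about 3 years.
    print(f"2x at 30%/yr: {years_to_factor(0.30, 2):.1f} years")  # ~2.6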
Scaling of IC Technology • IC process technology: 10 µm (1971) to 0.18 µm (2001) • Effects of IC technology on computer performance • Transistor performance • Wire delay • Power consumption
Cost, Price, and Their Trends • Cost-reduction factors • The learning curve drives cost down: manufacturing costs fall over time, i.e., yield improvement • High volume, i.e., mass production • Commodities: products sold by multiple vendors in large volumes that are essentially identical, i.e., competition • Price of DRAM (Fig. 1.5) • Price of the Pentium III (Fig. 1.6) • Cost of an integrated circuit • Cost of die = f(die area) • The computer designer affects die size both by what functions are included on the die and by the number of I/O pins • Distribution of cost in a system (Figs. 1.9, 1.10)
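To make “cost of die = f(die area)” concrete, here is a minimal sketch of the standard die-cost model (cost of die = wafer cost / (dies per wafer × die yield)); the wafer price, diameter, and defect density below are illustrative assumptions, not figures from the slides:

    import math

    def dies_per_wafer(wafer_diam_cm, die_area_cm2):
        # Usable dies: wafer area / die area, minus dies lost along the edge.
        return (math.pi * (wafer_diam_cm / 2) ** 2) / die_area_cm2 \
               - (math.pi * wafer_diam_cm) / math.sqrt(2 * die_area_cm2)

    def die_yield(defects_per_cm2, die_area_cm2, alpha=4.0):
        # Wafer yield assumed 100%; alpha models defect clustering.
        return (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

    def die_cost(wafer_cost, wafer_diam_cm, die_area_cm2, defects_per_cm2):
        return wafer_cost / (dies_per_wafer(wafer_diam_cm, die_area_cm2)
                             * die_yield(defects_per_cm2, die_area_cm2))

    # Illustrative: a $3500, 30 cm wafer with 0.7 defects/cm^2.
    print(f"1.0 cm^2 die: ${die_cost(3500, 30, 1.0, 0.7):.2f}")  # ~$10
    print(f"2.0 cm^2 die: ${die_cost(3500, 30, 2.0, 0.7):.2f}")  # ~$38

Doubling the die area roughly quadruples the cost here: fewer dies fit on the wafer, and a larger die is more likely to contain a defect, which is why die cost grows faster than linearly in die area.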
Measuring and Reporting Performance • “X is n times faster than Y” means n = (execution time of Y) / (execution time of X); for example, if Y takes 15 seconds and X takes 5 seconds, then X is 3 times faster than Y • The term “system performance” refers to elapsed time on an unloaded system • “CPU performance” refers to user CPU time on an unloaded system • A user evaluates a new system by comparing the execution time of her workload: the mixture of programs and operating system commands she runs on the machine
Choosing Programs to Evaluate Performance • Best case: measure the execution time of the user’s own workload • General case: five levels of programs are used • Real programs: C compilers, TeX, Spice, etc. • Modified (scripted) applications: a collection of real applications, modified for portability or scripted to simulate interactive behavior • Kernels: small, key pieces extracted from real programs, e.g., the Livermore Loops and Linpack • Toy benchmarks: 10 to 100 lines of code producing a result the user already knows, e.g., puzzle, quicksort • Synthetic benchmarks: try to match the average frequency of operations and operands of a large set of programs, e.g., Whetstone and Dhrystone • Performance prediction accuracy: real programs are best, while synthetic benchmarks are worst
Benchmark Suites • SPEC (Standard Performance Evaluation Corporation) • www.spec.org • Benchmark types • Desktop benchmarks • Server benchmarks • Embedded benchmarks
Desktop Benchmarks • SPEC benchmarks • SPEC CPU2000 (preceded by SPEC95, SPEC92, SPEC89) (Fig. 1.12) • Graphics benchmarks • SPECviewperf • SPECapc • Windows OS benchmarks (Fig. 1.11) • Business Winstone • CC Winstone • Winbench
Server Benchmarks • SPEC • File server benchmark: SPECSFS • Measures NFS performance • Web server benchmark: SPECWeb • Simulates multiple clients requesting both static and dynamic pages • TPC (Transaction Processing Council) • TPC-A, TPC-C, TPC-H, TPC-R, TPC-W • Simulate business-oriented transactions (queries) • www.tpc.org
Embedded Benchmarks • EDN Embedded Microprocessor Benchmark Consortium (EEMBC) (Fig. 1.13) • Automotive/industrial • Consumer • Networking • Office automation • Telecommunications
Reporting Performance Results • Guiding principle • Performance measurements should be reproducible • Need to report • Hardware configuration • Software used • Whether source code modification of the benchmarks is allowed
Comparing Performance

                 Computer A   Computer B   Computer C
    P1 (secs)             1           10           20
    P2 (secs)          1000          100           20
    Total time         1001          110           40

• Total execution time: a consistent summary measure • Other metrics • Average (arithmetic mean) execution time: (1/n) × Σ Time_i • Weighted execution time: Σ Weight_i × Time_i • Harmonic mean of rates: n / Σ (1/Rate_i) • Geometric mean: (Π Ratio_i)^(1/n), where Ratio_i is the execution time normalized to a reference machine
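As a quick check of how the choice of summary metric can change a ranking, the sketch below (added for illustration; not from the slides) computes summary measures for the table above:

    import math

    # Execution times in seconds, from the table above.
    times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

    for name, progs in times.items():
        total = sum(progs)                           # total execution time
        arith = total / len(progs)                   # arithmetic mean
        geom = math.prod(progs) ** (1 / len(progs))  # geometric mean of raw times
        print(f"{name}: total={total:5}  arith={arith:6.1f}  geom={geom:6.1f}")

    # A and B get the same geometric mean (~31.6) even though their total
    # times differ by 9x, which is why total execution time is the consistent
    # measure and why SPEC applies the geometric mean to normalized ratios.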
Quantitative Principles of Computer Design • Make the common case fast: a fundamental law, called Amdahl’s Law, quantifies this principle • Amdahl’s Law • The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used • Amdahl’s Law defines speedup as: Speedup = (execution time without the enhancement) / (execution time with the enhancement) = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced) • Example on pages 41 and 42
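A minimal sketch of the speedup formula above; the 40%/10x numbers are made up for illustration and are not the textbook example:

    def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
        # Overall speedup when fraction_enhanced of the original execution
        # time runs speedup_enhanced times faster (Amdahl's Law).
        return 1.0 / ((1.0 - fraction_enhanced)
                      + fraction_enhanced / speedup_enhanced)

    # A 10x improvement that applies to only 40% of execution time
    # yields just 1/(0.6 + 0.04) = 1.5625x overall:
    print(amdahl_speedup(0.40, 10))    # 1.5625
    # Even an infinitely fast enhancement is capped at 1/(1 - 0.40):
    print(amdahl_speedup(0.40, 1e12))  # ~1.667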
CPU Performance Equation • CPU time = Instruction count × CPI × Clock cycle time • Dependencies • Clock cycle time: hardware technology and organization • CPI: organization and instruction set architecture • Instruction count: instruction set architecture and compiler technology • Sometimes it is useful to sum over the n instruction classes: CPU time = (Σ IC_i × CPI_i) × Clock cycle time, and overall CPI = (Σ IC_i × CPI_i) / Instruction count • Example on page 44
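A minimal sketch of the per-class form of the equation; the instruction mix, per-class CPIs, and 1 GHz clock below are assumed for illustration:

    # (instruction count in millions, CPI) per instruction class -- illustrative.
    mix = {"ALU": (400, 1.0), "load/store": (300, 2.0), "branch": (100, 3.0)}
    clock_rate_hz = 1e9                       # assumed 1 GHz clock

    total_ic = sum(ic for ic, _ in mix.values()) * 1e6
    total_cycles = sum(ic * 1e6 * cpi for ic, cpi in mix.values())

    overall_cpi = total_cycles / total_ic     # (sum of IC_i * CPI_i) / IC
    cpu_time = total_cycles / clock_rate_hz   # IC * CPI * clock cycle time

    print(f"overall CPI = {overall_cpi:.3f}")   # 1.625
    print(f"CPU time    = {cpu_time:.2f} s")    # 1.30 s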
Measuring the Components of CPU Performance • Clock cycle time: measured with timing simulators or timing verifiers • IC (instruction count) • Direct measurement by running the application on hardware • Instruction set simulator: slow but accurate • Instrumentation-code approach: the binary program is modified by inserting extra counting code into every basic block; fast, but needs instruction set translation if the simulated machine differs from the machine running the simulation • A basic block is a straight-line code sequence entered at a label and ended by a branch [the original slide illustrated this with a diagram of labels and branches delimiting basic blocks] • CPI: very difficult to measure; CPI = pipeline CPI + memory system CPI • Use the CPU performance equation to compute performance
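A minimal sketch of the CPI decomposition named above, using the common approximation that memory system CPI = memory accesses per instruction × miss rate × miss penalty; all numbers are illustrative assumptions:

    def effective_cpi(pipeline_cpi, mem_refs_per_instr, miss_rate, miss_penalty):
        # Pipeline CPI plus average memory-stall cycles per instruction.
        memory_cpi = mem_refs_per_instr * miss_rate * miss_penalty
        return pipeline_cpi + memory_cpi

    # Illustrative: base CPI 1.2, 1.3 memory refs/instr,
    # 2% miss rate, 50-cycle miss penalty.
    print(effective_cpi(1.2, 1.3, 0.02, 50))  # 1.2 + 1.3 = 2.5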
Principle of Locality • An application of the same reasoning as Amdahl’s Law: a program spends 90% of its execution time in only 10% of its code • Temporal locality: recently accessed items are likely to be accessed again in the near future • Spatial locality: items whose addresses are near one another tend to be referenced close together in time
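Spatial locality can be demonstrated with a small experiment (added for illustration; not from the slides): traversing a matrix row by row touches neighboring addresses, while traversing it column by column strides across rows. The effect is strongest in languages with contiguous arrays; in CPython it is usually still visible, though smaller:

    import time

    N = 2000
    matrix = [[1] * N for _ in range(N)]  # each row is a contiguous list

    def sum_rows():   # good spatial locality: walk each row in order
        return sum(matrix[i][j] for i in range(N) for j in range(N))

    def sum_cols():   # poor spatial locality: jump between rows (stride N)
        return sum(matrix[i][j] for j in range(N) for i in range(N))

    for fn in (sum_rows, sum_cols):
        t0 = time.perf_counter()
        fn()
        print(f"{fn.__name__}: {time.perf_counter() - t0:.2f} s")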
Putting It All Together • Performance and price-performance • Desktop computers • Server computers • Embedded processors