Lecture 2: Intro to Computer Architecture
Michael B. Greenwald
Computer Architecture, CIS 501, Fall 1999
General Information
• Class: TR 1:30-3, in LRSM Auditorium
  Recitation: T 10:30-12, in Moore 225
• Instructor: Professor Michael Greenwald
  Office: Moore (GRW), room 260
  Email: cis501@cis.upenn.edu
  Office hours: R 10:30-12 noon or by appt.
• TA: Sotiris Ioannidis
  Office: Moore, room 102e
  Email: sotiris@dsl.cis.upenn.edu
  Office hours: TR 5-6 PM or by appt.
• Secretary: Christine Metz
  Office: Moore, room 556
Outline
• Review
• Quantitative principles of computer design
  • Amdahl's law
  • CPU performance equation
• Quantitative measurements
  • Costs
  • Performance
Typos in HW 3c. • New version on web page. • D = defects/ • Defects per layer
Technology Trends: Microprocessor Capacity
• Moore's Law (figure: transistors per chip over time, with a "graduation window" marked)
• Transistor counts: Alpha 21264: 15 million; Alpha 21164: 9.3 million; PowerPC 620: 6.9 million; Pentium Pro: 5.5 million; Sparc Ultra: 5.2 million
• CMOS improvements:
  • Die size: 2X every 3 years
  • Line width: halves every 7 years
Trends in Application Demands
• Programs increase their memory demands by a factor of 1.5 to 2 per year (1/2 to 1 address bit per year).
• Available disk space (or network bandwidth) is always consumed.
• User I/O bandwidth grows: tty -> crt -> bitmap -> video -> virtual reality?
• Processing power: it is cheapest to produce one version of a program, so developers optimize for the mid-range. The program is slow on the low end and fast on the high end.
Are these demands growing because of increased capabilities or increased appetites?
Measurement and Evaluation: Quantitative Approach
• Architecture is an iterative process:
  • Searching the space of possible designs
  • At all levels of computer systems
• Not a guarantee of good ideas, just a way to discard bad ideas.
(Figure: creativity feeds cost/performance analysis, which sorts the results into good ideas, mediocre ideas, and bad ideas.)
Computer Engineering Methodology
(Figure, built up over four slides: a repeating cycle of three steps.)
• Evaluate existing systems for bottlenecks
• Simulate new designs and organizations
• Implement next generation system
Inputs shown around the cycle: technology trends, benchmarks, workloads, implementation complexity.
Measurement Tools
• Benchmarks, traces, mixes
• Hardware: cost, delay, area, power estimation
• Simulation (many levels)
  • ISA, RT, gate, circuit
• Queuing theory
• Rules of thumb
• Fundamental "laws"/principles
(Figure labels: Measure, Experiment, Analyze, Design.)
All produce "measures": what do measures mean? How do they compare?
The Bottom Line: Performance (and Cost)

Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200

(pmph = passenger-miles per hour)
• Time to run the task (ExTime)
  • Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
  • Throughput, bandwidth
• Which is better? It depends on whether you are trying to win a race from DC to Paris or trying to move the most people.
• Even if you are trying to move the most people, performance is useless without understanding cost. Otherwise, why not just fly two Concordes at once, doubling throughput? (747-400: $160M in '98)
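The throughput column is simply speed times passengers, and "which is better" depends on which column you rank by. A minimal Python sketch of that arithmetic follows; the `Plane` class and `best_by` helper are names invented here for illustration, and only the speeds and passenger counts come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    """One row of the slide's table (hypothetical helper type)."""
    name: str
    speed_mph: int
    passengers: int

    def throughput_pmph(self) -> int:
        # Passenger-miles per hour = cruising speed * passengers carried.
        return self.speed_mph * self.passengers


PLANES = [
    Plane("Boeing 747", 610, 470),
    Plane("Concorde", 1350, 132),
]


def best_by(planes, metric):
    """Return the name of the plane that maximizes the given metric."""
    return max(planes, key=metric).name


if __name__ == "__main__":
    for plane in PLANES:
        print(f"{plane.name}: {plane.throughput_pmph():,} pmph")
    print("Fastest:", best_by(PLANES, lambda p: p.speed_mph))
    print("Highest throughput:", best_by(PLANES, Plane.throughput_pmph))
```

Ranking by speed and ranking by throughput pick different planes, which is the point of the slide: the metric has to match the goal.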
Costs • Performance metrics are mostly useless without understanding costs.
Integrated Circuits Costs

$$\text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$$

$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$

(Figure: a wafer divided into dies, with defects marked.)
Smaller dies are cheaper, and reduce cost per defect.
IC Cost Parameters
• α = number of masking levels (a measure of manufacturing complexity); was typically 3.0, and growing
• Wafer yield = fraction of wafers that are not completely bad; typically close to 100%
• Defects per unit area = 0.6 to 1.2 per cm²; drops with the learning curve
Die cost goes roughly with die area^4.
Integrated Circuits Costs

$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Dies per wafer} = \frac{\pi \, (\text{Wafer\_diam} / 2)^2}{\text{Die\_Area}} - \frac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die\_Area}}} - \text{Test dies}$$

$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha}\right)^{-\alpha}$$

Substituting (with wafer yield taken as close to 100% and test dies ignored):

$$\text{Die cost} = \text{Wafer cost} \times \frac{\left(1 + \dfrac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha}\right)^{\alpha}}{\dfrac{\pi \, (\text{Wafer\_diam} / 2)^2}{\text{Die\_Area}} - \dfrac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die\_Area}}}}$$

Die cost goes roughly with die area^4.
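The die cost formulas can be sketched in Python as below. The function names, the default α = 3.0, and the 15 cm wafer diameter used in the example are assumptions made for illustration; the slides do not state a wafer size. Areas are in cm² so that they match defects per cm².

```python
import math


def dies_per_wafer(wafer_diam_cm, die_area_cm2, test_dies=0):
    """Usable wafer area divided by die area, minus dies lost along the edge."""
    gross = math.pi * (wafer_diam_cm / 2) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diam_cm / math.sqrt(2 * die_area_cm2)
    return gross - edge_loss - test_dies


def die_yield(defects_per_cm2, die_area_cm2, alpha=3.0, wafer_yield=1.0):
    """Fraction of dies that are good, for the given defect density."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost, wafer_diam_cm, die_area_cm2, defects_per_cm2,
             alpha=3.0, wafer_yield=1.0, test_dies=0):
    """Wafer cost spread over the good dies on one wafer."""
    dies = dies_per_wafer(wafer_diam_cm, die_area_cm2, test_dies)
    good_fraction = die_yield(defects_per_cm2, die_area_cm2, alpha, wafer_yield)
    return wafer_cost / (dies * good_fraction)


if __name__ == "__main__":
    # Assumed 15 cm wafer; 0.43 cm^2 die, 1.0 defects/cm^2, $900 wafer.
    print(math.floor(dies_per_wafer(15.0, 0.43)))
    print(round(die_yield(1.0, 0.43), 3))
    print(round(die_cost(900, 15.0, 0.43, 1.0), 2))
```

Doubling the die area in this sketch more than doubles the die cost, because fewer dies fit on the wafer and a smaller fraction of them are good. Yields computed this way with α = 3 need not match published yield figures, which may rest on a different model.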
IC Cost Parameters
• Defects per unit area = 0.6 to 1.2 per cm²
• Technologies that can fix defects (e.g., lasers, à la MIT Lincoln Labs) reduce the effective defects per unit area and increase yield.
• However, you need to understand their costs, which differ from the formula.
• Still: die cost goes roughly with die area^(α+1)
Real World Examples (circa '93)

Chip          Metal    Line width   Wafer   Defects   Area    Dies/   Yield   Die
              layers   (µm)         cost    /cm²      (mm²)   wafer           cost
386DX         2        0.90         $900    1.0       43      360     71%     $4
486DX2        3        0.80         $1200   1.0       81      181     54%     $12
PowerPC 601   4        0.80         $1700   1.3       121     115     28%     $53
HP PA 7100    3        0.80         $1300   1.0       196     66      27%     $73
DEC Alpha     3        0.70         $1500   1.2       234     53      19%     $149
SuperSPARC    3        0.70         $1700   1.6       256     48      13%     $272
Pentium       3        0.80         $1500   1.5       296     40      9%      $417

• From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
Other Costs

$$\text{Die test cost} = \frac{\text{Test jig cost} \times \text{Ave. test time}}{\text{Die yield}}$$

Packaging cost: depends on pins, heat dissipation

Chip          Die     Package   Package   Package   Test &     Total
              cost    pins      type      cost      assembly
386DX         $4      132       QFP       $1        $4         $9
486DX2        $12     168       PGA       $11       $12        $35
PowerPC 601   $53     304       QFP       $3        $21        $77
HP PA 7100    $73     504       PGA       $35       $16        $124
DEC Alpha     $149    431       PGA       $30       $23        $202
SuperSPARC    $272    293       PGA       $20       $34        $326
Pentium       $417    273       PGA       $19       $37        $473
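The Total column is the sum of the die, package, and test and assembly costs. A small Python sketch can check this against the IC cost formula; `ic_cost` is a name chosen here, and the default final test yield of 1.0 is an assumption made so that the formula reduces to the plain sum the table shows.

```python
def ic_cost(die_cost, testing_cost, packaging_cost, final_test_yield=1.0):
    """IC cost = (die + testing + packaging) / final test yield."""
    if not 0 < final_test_yield <= 1:
        raise ValueError("final_test_yield must be in (0, 1]")
    return (die_cost + testing_cost + packaging_cost) / final_test_yield


# (chip, die cost, package cost, test & assembly cost, total) from the table.
ROWS = [
    ("386DX", 4, 1, 4, 9),
    ("486DX2", 12, 11, 12, 35),
    ("PowerPC 601", 53, 3, 21, 77),
    ("HP PA 7100", 73, 35, 16, 124),
    ("DEC Alpha", 149, 30, 23, 202),
    ("SuperSPARC", 272, 20, 34, 326),
    ("Pentium", 417, 19, 37, 473),
]

if __name__ == "__main__":
    for chip, die, package, test, total in ROWS:
        computed = ic_cost(die, test, package)
        print(f"{chip}: computed ${computed:.0f}, table ${total}")
```

A final test yield below 1.0 would raise every total, since the cost of parts that fail the final test is spread over the parts that pass.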
Cost/Performance: What Is the Relationship of Cost to Price?
• Component costs
• Direct costs (add 25% to 40%): recurring costs: labor, purchasing, scrap, warranty
• Gross margin (add 82% to 186%): nonrecurring costs: R&D, marketing, sales, equipment maintenance, rental, financing cost, pretax profits, taxes
• Average discount to get list price (add 33% to 66%): volume discounts and/or retailer markup
(Figure: stacked bar from component cost up to list price. Shares of list price: average discount 25% to 40%; gross margin 34% to 39%; direct cost 6% to 8%; component cost 15% to 33%. List price less the average discount is the average selling price. A second version of the figure adds the label "Discretion".)
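The "add X%" steps compound: each percentage is applied to the running total below it. The Python sketch below shows one reading of that build-up; `price_breakdown` and its field names are invented for illustration, and the interpretation that each percentage applies to the accumulated cost so far is an assumption, although it does reproduce the 15% to 33% component share quoted in the figure.

```python
def price_breakdown(component_cost, direct_add, margin_add, discount_add):
    """Build list price from component cost by compounding each markup."""
    direct = component_cost * direct_add                 # recurring costs
    margin = (component_cost + direct) * margin_add      # gross margin
    avg_selling_price = component_cost + direct + margin
    discount = avg_selling_price * discount_add          # average discount
    list_price = avg_selling_price + discount
    return {
        "direct": direct,
        "margin": margin,
        "avg_selling_price": avg_selling_price,
        "discount": discount,
        "list_price": list_price,
        "component_share": component_cost / list_price,
    }


if __name__ == "__main__":
    low = price_breakdown(100, 0.25, 0.82, 0.33)
    high = price_breakdown(100, 0.40, 1.86, 0.66)
    for label, b in (("low markups", low), ("high markups", high)):
        print(f"{label}: list price ${b['list_price']:.2f}, "
              f"component share {b['component_share']:.0%}")
```

With the low end of every range, $100 of components lists for about $303; with the high end, about $665.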
Chip Prices (August 1993)

Chip          Area    Mfg.    Price   Multiplier   Comment
              (mm²)   cost
386DX         43      $9      $31     3.4          Intense competition
486DX2        81      $35     $245    7.0          No competition
PowerPC 601   121     $77     $280    3.6
DEC Alpha     234     $202    $1231   6.1          Recoup R&D?
Pentium       296     $473    $965    2.0          Early in shipments

• Assume purchase of 10,000 units
Cost/Price/ProfitHow is R&D funded? • R&D 4% to 12%, contributes to gross margin (it is an indirect cost) • Two views: • Only 4% of income on R&D! • Investment: every $1 spent on R&D should lead to $8 to $25 in sales!
Performance Terminology • Time versus Performance: duration vs. rate. • Time: response time = execution time • Rate: throughput • Reciprocals: there is both a time and a performance measure for any performance metric. • “Improve performance”: time decreases, performance increases For computer systems the key performance metric is total execution time
Meaning of “Execution Time”(a.k.a. Response time) • Wall-clock-time, response time, elapsed-time: latency (including idle time) • vs. CPU Time: non-idle • System vs. User time: both elapsed and CPU • system performance: elapsed time on unloaded system (includes OS + idle time) • CPU performance: user CPU time on unloaded system
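The difference between elapsed time and CPU time is easy to observe. The Python sketch below uses the standard library's `time.perf_counter` (wall-clock time) and `time.process_time` (user plus system CPU time of the current process); the `measure` helper and the two sample tasks are invented for illustration.

```python
import time


def measure(task):
    """Run task() and return (elapsed wall-clock seconds, CPU seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def mostly_idle():
    # Sleeping counts toward elapsed time but uses almost no CPU time.
    time.sleep(0.2)


def mostly_busy():
    # A pure computation: elapsed time and CPU time should be close.
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    for task in (mostly_idle, mostly_busy):
        wall, cpu = measure(task)
        print(f"{task.__name__}: elapsed {wall:.3f} s, CPU {cpu:.3f} s")
```

`time.process_time` does not separate user time from system time; `os.times()` reports the two separately on platforms that support it.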
Terminology • What do we mean when we compare two measures and say that “X is n times faster than Y”?
The Bottom Line: Performance (and Cost)
• "X is n times faster than Y" means

$$n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$

• Speed of Boeing 747 vs. Concorde
• Throughput of Boeing 747 vs. Concorde:

$$\frac{\text{Performance}(\text{747})}{\text{Performance}(\text{Concorde})} = \frac{286{,}700}{178{,}200} \approx 1.6$$

Note: use natural or meaningful units. Hours per passenger-mile is slightly weirder than passenger-miles per hour.
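The definition takes a ratio of times with Y on top, or a ratio of rates with X on top. A short Python sketch applies both forms to the plane figures from the earlier table; the two function names are invented here.

```python
def times_faster_by_time(extime_x, extime_y):
    """n such that X is n times faster than Y, from execution times."""
    return extime_y / extime_x


def times_faster_by_rate(perf_x, perf_y):
    """n such that X is n times faster than Y, from rates (performance)."""
    return perf_x / perf_y


if __name__ == "__main__":
    # Latency view: Concorde (3 h, 1350 mph) vs. Boeing 747 (6.5 h, 610 mph).
    print(round(times_faster_by_time(3, 6.5), 2))
    print(round(times_faster_by_rate(1350, 610), 2))
    # Throughput view: Boeing 747 vs. Concorde, in passenger-miles per hour.
    print(round(times_faster_by_rate(286_700, 178_200), 2))
```

The two latency ratios come out slightly different because the trip times and speeds in the table are rounded.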
Measurement Tools
• Benchmarks, traces, mixes
• Hardware: cost, delay, area, power estimation
• Simulation (many levels)
  • ISA, RT, gate, circuit
• Queuing theory
• Rules of thumb
• Fundamental "laws"/principles
(Figure labels: Measure, Experiment, Analyze, Design.)
ENGINEERING: Convert this to that
Fundamental Principle of Computer Design
• Make the common case fast
• In every trade-off, favor the frequent case over the infrequent case.
• But how do we quantify this? At what point is the cost to the infrequent case large enough to offset the speedups to the frequent case?
Amdahl's Law quantifies this principle.
Amdahl's Law
Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
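Amdahl's Law

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}\right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$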
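The formula translates directly into a few lines of Python. This is a minimal sketch: the function name `amdahl_speedup` and the input checks are choices made here, and the fraction is measured as a share of the original execution time.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the task is sped up by a factor."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # New execution time relative to an old execution time of 1.
    relative_new_time = (1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1 / relative_new_time


if __name__ == "__main__":
    # Half of the task made twice as fast.
    print(round(amdahl_speedup(0.5, 2), 3))
    # The same half made a million times faster: still bounded by 1 / (1 - F).
    print(round(amdahl_speedup(0.5, 1e6), 3))
```

However large the enhancement, the overall speedup cannot exceed 1 / (1 - Fraction_enhanced), because the unenhanced part of the task still takes its original time.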
Amdahl's Law: Example
• Floating point instructions improved to run 2X faster, but only 10% of actual instructions are FP

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left(0.9 + \frac{0.1}{2}\right) = 0.95 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$
Amdahl's Law: Example
• Suppose fetching a page from a web cache is 1000 times faster than getting the page over the net, but the hit rate on the cache is only 30%
ExTime_new =
Speedup_overall =
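One way to fill in the two blanks above is to apply Amdahl's Law with Fraction_enhanced = 0.3 and Speedup_enhanced = 1000. This assumes that every page takes about the same time to fetch over the net, so that a 30% hit rate corresponds to 30% of the original fetch time:

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left(0.7 + \frac{0.3}{1000}\right) = 0.7003 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{1}{0.7003} \approx 1.428$$

Even an infinitely fast cache could not do better than 1/0.7 ≈ 1.429 here, since 70% of fetches still go over the net.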