230 likes | 328 Views
Computer performance. Speeding it up. Pipelining On board cache On board L1 & L2 cache Branch prediction Data flow analysis Speculative execution. Performance Balance. Processor speed increased Memory capacity increased Memory speed lags behind processor speed.
E N D
Speeding it up • Pipelining • On board cache • On board L1 & L2 cache • Branch prediction • Data flow analysis • Speculative execution
Performance Balance • Processor speed increased • Memory capacity increased • Memory speed lags behind processor speed
Solutions • Increase number of bits retrieved at one time • Make DRAM “wider” rather than “deeper” • Change DRAM interface • Cache • Reduce frequency of memory access • More complex cache and cache on chip • Increase interconnection bandwidth • High speed buses • Hierarchy of buses
I/O Devices • Peripherals with intensive I/O demands • Large data throughput demands • Processors can handle this • Problem moving data • Solutions: • Caching • Buffering • Higher-speed interconnection buses • More elaborate bus structures • Multiple-processor configurations
Key is Balance • Processor components • Main memory • I/O devices • Interconnection structures
Improvements in Chip Organization and Architecture • Increase hardware speed of processor • Fundamentally due to shrinking logic gate size • More gates, packed more tightly, increasing clock rate • Propagation time for signals reduced • Increase size and speed of caches • Dedicating part of processor chip • Cache access times drop significantly • Change processor organization and architecture • Increase effective speed of execution • Parallelism
Problems with Clock Speed and Login Density • Power • Power density increases with density of logic and clock speed • Dissipating heat • RC delay • Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them • Delay increases as RC product increases • Wire interconnects thinner, increasing resistance • Wires closer together, increasing capacitance • Memory latency • Memory speeds lag processor speeds • Solution: • More emphasis on organizational and architectural approaches
Increased Cache Capacity • Typically two or three levels of cache between processor and main memory • Chip density increased • More cache memory on chip • Faster cache access • Pentium chip devoted about 10% of chip area to cache • Pentium 4 devotes about 50%
More Complex Execution Logic • Enable parallel execution of instructions • Pipeline works like assembly line • Different stages of execution of different instructions at same time along pipeline • Superscalar allows multiple pipelines within single processor • Instructions that do not depend on one another can be executed in parallel
Diminishing Returns • Internal organization of processors complex • Can get a great deal of parallelism • Further significant increases likely to be relatively modest • Benefits from cache are reaching limit • Increasing clock rate runs into power dissipation problem • Some fundamental physical limits are being reached
New Approach – Multiple Cores • Multiple processors on single chip • Large shared cache • Within a processor, increase in performance proportional to square root of increase in complexity • If software can use multiple processors, doubling number of processors almost doubles performance • So, use two simpler processors on the chip rather than one more complex processor • With two processors, larger caches are justified • Power consumption of memory logic less than processing logic • Example: IBM POWER4 • Two cores based on PowerPC
Pentium Evolution (1) • 8080 • first general purpose microprocessor • 8 bit data path • Used in first personal computer – Altair • 8086 • much more powerful • 16 bit • instruction cache, prefetch few instructions • 8088 (8 bit external bus) used in first IBM PC • 80286 • 16 Mbyte memory addressable • up from 1Mb • 80386 • 32 bit • Support for multitasking
Pentium Evolution (2) • 80486 • sophisticated powerful cache and instruction pipelining • built in maths co-processor • Pentium • Superscalar • Multiple instructions executed in parallel • Pentium Pro • Increased superscalar organization • Aggressive register renaming • branch prediction • data flow analysis • speculative execution
Pentium Evolution (3) • Pentium II • MMX technology • graphics, video & audio processing • Pentium III • Additional floating point instructions for 3D graphics • Pentium 4 • Note Arabic rather than Roman numerals • Further floating point and multimedia enhancements • Itanium • 64 bit • see chapter 15 • Itanium 2 • Hardware enhancements to increase speed • See Intel web pages for detailed information on processors
PowerPC • 1975, 801 minicomputer project (IBM) RISC • Berkeley RISC I processor • 1986, IBM commercial RISC workstation product, RT PC. • Not commercial success • Many rivals with comparable or better performance • 1990, IBM RISC System/6000 • RISC-like superscalar machine • POWER architecture • IBM alliance with Motorola (68000 microprocessors), and Apple, (used 68000 in Macintosh) • Result is PowerPC architecture • Derived from the POWER architecture • Superscalar RISC • Apple Macintosh • Embedded chip applications
PowerPC Family (1) • 601: • Quickly to market. 32-bit machine • 603: • Low-end desktop and portable • 32-bit • Comparable performance with 601 • Lower cost and more efficient implementation • 604: • Desktop and low-end servers • 32-bit machine • Much more advanced superscalar design • Greater performance • 620: • High-end servers • 64-bit architecture
PowerPC Family (2) • 740/750: • Also known as G3 • Two levels of cache on chip • G4: • Increases parallelism and internal speed • G5: • Improvements in parallelism and internal speed • 64-bit organization
Internet Resources • http://www.intel.com/ • Search for the Intel Museum • http://www.ibm.com • http://www.dec.com • Charles Babbage Institute • PowerPC • Intel Developer Home