Explore advancements in CPU design focusing on system bus speed, clock frequencies, casing, cooling systems, caches, and more to boost overall system performance.
Computer Architecture Part II-D: Survey of Processor Architecture
Microprocessors in the Market • What’s the difference?
Areas of Development • Below are technologies which can be improved in CPU design: • System bus speed • Internal and external clock frequency • Casing • Cooling system • Instruction set • Material used for the die • End result: enhance speed of the CPU and the system in general
The System Bus • Conduit for moving data between the processor and other system components • [Figure: the CPU and its caches connect over the system bus to memory (via the memory bus) and to the adapters and controllers for I/O devices: disks, displays, keyboards, networks]
System Bus Speeds • Intel: Core 2 Quad/Duo processors have CPU clocks of 2.66/3 GHz with system bus speeds of 1066/1333 MHz • AMD: the 2nd Generation Opteron (dual core) processor has a clock speed of 1.8 GHz with a 1000 MHz system bus
Split Clock Frequency • Internal clock frequency • Speed of data processing inside the CPU • External clock frequency • Speed of data transfer to and from the CPU via the system bus • The Intel 486DX2 25/50 was the first to use clock doubling to implement a split clock system
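The relation between the two clocks can be sketched in a few lines. This is a minimal illustration, assuming the internal clock is simply the external clock times a fixed multiplier; the function name `internal_clock_mhz` is made up for this example.

```python
def internal_clock_mhz(external_mhz, multiplier):
    """Internal (core) clock derived from the external (bus) clock."""
    return external_mhz * multiplier


# 486DX2 25/50: a 25 MHz external clock, doubled inside the CPU.
print(internal_clock_mhz(25, 2))  # 50
```

With a multiplier of 1 the two clocks are identical, which is the situation before split clocks were introduced.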
The GHz Race in CPU Frequency • June 1999: API (Alpha Processor Inc.) demonstrated a 1 GHz chip • March 2000: AMD released the 1 GHz Athlon; within days Intel released the 1 GHz Pentium III • 2002: AMD and Intel use 0.13 micron technology • Athlon XP 2200+ (June) • Pentium 4 2.53 GHz (May), mobile Pentium 4 2 GHz (June) • 2004 • Pentium 4: 3.6 GHz, 800 MHz system bus • AMD: 3200+, 2.2 GHz, 400 MHz system bus (32-bit CPUs same as 2003; now concentrating on 64-bit) • 2005 • Pentium 4: 3.73 – 3.8 GHz, 800/1066 MHz system bus • AMD: Same as 2004
Is Moore’s Law Dead? • Intel’s vision of a 10 GHz CPU cannot be realized due to heat problems • Some have pushed speed limits through high-end cooling systems • Both Intel and AMD are no longer concentrating on clock speed as the performance driver • SIA says “Moore’s Law is still going strong after 40 years”
Micron Technology • A micron is one millionth of a meter • A human hair strand is about 100 microns thick • Objective: thinner wires • Allow the CPU to operate at lower voltage • Results in the CPU generating less heat and operating at higher speeds • Currently, processors are in the range of 0.065 microns (65 nm) • Intel’s Roadmap: 45 nm down to 15 nm
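The unit arithmetic on this slide can be checked with a short sketch. The process sizes are the ones quoted in this deck; the helper name `microns_to_nm` is an assumption made for illustration.

```python
NM_PER_MICRON = 1000  # 1 micron = 1,000 nanometers


def microns_to_nm(microns):
    """Convert a feature size in microns to whole nanometers."""
    return round(microns * NM_PER_MICRON)


# Process sizes mentioned in these slides, largest first.
for node in (0.25, 0.18, 0.13, 0.065):
    print(f"{node} micron = {microns_to_nm(node)} nm")

# How many 0.065-micron features fit across a 100-micron hair strand.
HAIR_MICRONS = 100
features_per_hair = HAIR_MICRONS / 0.065
print(round(features_per_hair))  # 1538
```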
Thinner Wires = Increased Transistors • 8086/8088: 22,000 • 286: 128,000 • 386DX/386SX: 250,000 • 486SX/486DX, 486DX2/486DX4: 1,200,000 • Pentium, Cyrix, AMD K5, MMX: 3,100,000 • AMD K6: 8,800,000 • Athlon 1.4 GHz: 37,000,000 • Pentium 4: 42,000,000
The Switch to Copper • Aluminum limits how small chips can be made • Copper is a good choice because it • is a better conductor • consumes less energy, and • takes up less space than aluminum • Copper allowed processors to boost speeds to the GHz range • IBM pioneered the use of copper on September 1, 1998 (IBM PowerPC 740/750)
PC on a Chip • Integrates a number of key components into one chip • Result: The chip replaces a dozen or so separate chips (memory, FPU, graphics, video, etc.) • Applications: PDAs, cellphones, set-top boxes, embedded processors, etc.
Impact of PC-on-a-Chip • Smaller and quieter desktops • Battery of devices lasts longer because of the low power drain • Proliferation of information appliances
CPU Receptacle • ZIF • Zero Insertion Force socket: a type of socket designed for easy insertion of chips that have a high density of pins • Socket 7: a popular implementation of ZIF
CPU Receptacle • Slot 1 • Consists of receptacle on the motherboard that holds an Intel Single Edge Contact (SEC) cartridge • Cartridge may contain up to two CPUs and an L2 cache (runs at half the speed of CPU) and plugs into 242-pin receptacle • Started with Pentium II
CPU Receptacle • A Pentium II mounted on Slot 1
CPU Receptacle • Slot 2 • An enhanced Slot 1 • Uses 330-pin SEC • Holds up to four CPUs • L2 cache runs at full processor speed • First used in Intel's Pentium II Xeon
CPU Receptacle • AMD’s Slot A • Receptacle on motherboard for K7 CPU • Physically similar to Slot 1, but has different electrical requirements
Casing: FC-PGA (Flip-Chip) • [Figure: traditional wiring compared with flip-chip (IBM)]
Advantages of FC-PGA • Greater # of I/O pins available • Shorter electrical connections • Better manufacturing efficiency
Casing: FC-LGA • [Figure: bottom view of an LGA/BGA-based CPU; LGA Socket 775]
Advantages of FC-LGA • Lower voltage used (less distance traveled, reduced signal loss) • Less heat dissipation
Cache • Works as buffer between CPU and memory • Two types: • Internal • External
Levels of Cache • Level 1 • Level 2 • Level 3
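How a cache hierarchy buffers the CPU from slow memory can be sketched with the usual average-access-time calculation. All latencies and hit rates below are hypothetical values chosen for illustration, not measurements of any real CPU, and `average_access_time` is a name made up for this example.

```python
def average_access_time(levels, memory_latency):
    """Average cycles per access for a cache hierarchy.

    levels: list of (hit_latency_cycles, hit_rate), ordered from L1 outward.
    memory_latency: cycles to reach main memory when every level misses.
    """
    time = memory_latency
    # Work backward from the level closest to memory toward L1:
    # each level costs its own latency plus, on a miss, the cost beyond it.
    for hit_latency, hit_rate in reversed(levels):
        time = hit_latency + (1 - hit_rate) * time
    return time


# Hypothetical numbers: L1, L2, L3 as (latency in cycles, hit rate).
hierarchy = [(1, 0.90), (10, 0.80), (30, 0.50)]
with_caches = average_access_time(hierarchy, memory_latency=200)
without_caches = average_access_time([], memory_latency=200)
print(with_caches, without_caches)  # roughly 4.6 cycles versus 200
```

Under these assumed numbers, most accesses are served by the small, fast levels, so the average cost stays close to the L1 latency.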
Cache Placement • Intel used to have external L2 cache • Pentium Pro • L2 cache is internal to the chip package, but the CPU and L2 cache are separate • Result: larger chip that requires a larger socket
Overclocking • Going beyond recommended clock frequency settings • 3 methods of overclocking • System bus frequency • CPU frequency multiplier • Change both of the above • Some CPUs have locked frequencies
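The three methods can be compared with a small sketch, assuming the CPU clock is the system bus frequency times the multiplier. The stock settings (100 MHz bus, 5.0x multiplier) and the overclocked values are hypothetical, and `cpu_clock_mhz` is a name made up for this example.

```python
def cpu_clock_mhz(bus_mhz, multiplier):
    """CPU clock as the product of bus frequency and multiplier."""
    return bus_mhz * multiplier


def gain_percent(stock, overclocked):
    """Percentage increase of the overclocked clock over stock."""
    return (overclocked - stock) / stock * 100


# Hypothetical stock settings: 100 MHz bus, 5.0x multiplier.
stock = cpu_clock_mhz(100, 5.0)

methods = {
    "raise bus frequency": cpu_clock_mhz(112, 5.0),
    "raise multiplier": cpu_clock_mhz(100, 5.5),
    "raise both": cpu_clock_mhz(112, 5.5),
}
for name, mhz in methods.items():
    print(f"{name}: {mhz} MHz (+{gain_percent(stock, mhz):.1f}%)")
```

Raising the bus frequency also speeds up everything else attached to the bus, which is why it stresses main memory while a multiplier change affects only the CPU.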
Overclocking: How to... • Done through BIOS program • Older systems require motherboard jumpers • Some motherboards (e.g. ASUS TX97) contain jumper codes
Overclocking Issues • Heat! • Can main memory cope? • Will the software still work?
Cooling Systems • CPUs get hotter as they get faster • Developed to keep the CPU from overheating • Sophisticated cooling systems allow more reliable CPU operation
Liquid Nitrogen: Extremely Cool! • CPU: Pentium 4 (Northwood) • Date: Christmas 2003
Multimedia Processing • Multimedia applications require geometric transformation • Re-computation of location and size of an image to determine new position • Deals with FP • FPU handles all real number computations • Drawing landscapes (e.g. games) involves lots of computations and CPU may not handle it as fast as the player could react
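A geometric transformation of the kind described above can be sketched as a scale followed by a translation. The shape, scale factor, and offsets are made-up values, and `transform` is a name chosen for this example.

```python
def transform(points, scale, dx, dy):
    """Scale each (x, y) point about the origin, then move it by (dx, dy)."""
    return [(x * scale + dx, y * scale + dy) for x, y in points]


# A unit square, enlarged 2.5 times and moved to a new position.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = transform(square, scale=2.5, dx=10.0, dy=-4.0)
print(moved)
# [(10.0, -4.0), (12.5, -4.0), (12.5, -1.5), (10.0, -1.5)]

# Each point costs two FP multiplies and two FP adds.
fp_operations = 4 * len(square)
print(fp_operations)  # 16
```

A game scene repeats this for thousands of points on every frame, which is where the FPU workload comes from.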
Ways of Handling Multimedia • Speed up the CPU • Improve the CPU’s FPU by adding more pipelines • Use high-end 3D graphics cards • Add new multimedia instructions
Multimedia Innovations in CPUs • MMX • 3DNow! • SSE
MMX • Introduced in 1997 in the Pentium MMX processor • Had 57 new instructions for multimedia • Introduced SIMD (Single Instruction Multiple Data) instructions: a technique that processes more than one integer simultaneously • Problems: • Only works with integers • CPU can only work with either MMX or FPU, not both simultaneously, because they share registers
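The SIMD idea of processing several integers at once can be sketched by simulating one 64-bit register that holds four 16-bit lanes. This is a software model for illustration only, not actual MMX code; the helper names `pack`, `unpack`, and `packed_add` are assumptions.

```python
LANES = 4                # four integers per register
LANE_BITS = 16           # each integer is 16 bits wide
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Place four 16-bit integers side by side in one 64-bit value."""
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & LANE_MASK) << (i * LANE_BITS)
    return reg


def unpack(reg):
    """Split a 64-bit value back into its four 16-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add lane by lane; each lane wraps around on overflow
    without carrying into its neighbor."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 3, 65535])
b = pack([10, 20, 30, 1])
print(unpack(packed_add(a, b)))  # [11, 22, 33, 0]
```

One call to `packed_add` stands in for a single instruction doing four additions; the last lane shows the wraparound when a 16-bit result overflows.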
3DNow! • Introduced in the summer of 1998 in the AMD K6-2 • Characteristics • Supports SIMD instructions • Improved handling of floating-point numbers • Successful! • Integrated in Windows, games, and drivers • Does not use the same registers
SSE • Stands for Streaming SIMD Extensions • Introduced in the Pentium III (Katmai) 500 MHz as Intel’s response to 3DNow! • Characteristics • 8 new 128-bit registers (each can hold four 32-bit numbers) • 50 new instructions that perform calculations on several FP numbers with a single instruction • New Media Instructions designed for coding and decoding MPEGs
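How four 32-bit numbers fit into one 128-bit register can be sketched with Python's `struct` module. This models the register as 16 raw bytes and is an illustration only, not real SSE code; `pack4`, `unpack4`, and `packed_add_fp` are names made up for this example.

```python
import struct


def pack4(floats):
    """Store four 32-bit floats as 16 bytes (one 128-bit register)."""
    return struct.pack("<4f", *floats)


def unpack4(reg):
    """Read the four 32-bit floats back out of the 16 bytes."""
    return list(struct.unpack("<4f", reg))


def packed_add_fp(a, b):
    """Add the four float pairs, as one SIMD instruction would."""
    return pack4([x + y for x, y in zip(unpack4(a), unpack4(b))])


a = pack4([1.0, 2.0, 3.0, 4.0])
b = pack4([0.5, 0.5, 0.5, 0.5])
print(len(a) * 8)                      # 128
print(unpack4(packed_add_fp(a, b)))    # [1.5, 2.5, 3.5, 4.5]
```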
Problems with SSE • Pipelines can only handle two 32-bit numbers at a time • To take advantage of 128-bit registers, FPU pipeline should have been doubled (would have pushed back release date of Katmai) • Potentially, it could have enhanced 3D graphics since registers can handle four 32-bit numbers at a time
SSE Enhancements • SSE2 • Started in Pentium 4 • Has 144 new instructions (since SSE) • Data width now 64 bits (double-precision FP) • SSE3 • 13 additional SIMD instructions (since SSE2) • New instructions primarily designed to improve thread synchronization and specific application areas such as media and gaming • Supplemental SSE3 (Core 2) • SSE4
Other CPU Innovations • Data width • Internal: How many bits can the CPU process simultaneously? • External: How many bits can the CPU receive simultaneously for processing? • Superscalar architecture • Superpipelined architecture • [Figure: superscalar processing]
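The effect of a narrow external data width can be sketched by counting bus transfers. The example uses the 8086 (16-bit external data bus) and the 8088 (8-bit external data bus), both of which process 16 bits internally; `bus_transfers` is a name made up for this example.

```python
def bus_transfers(data_bits, external_width_bits):
    """Number of bus transfers needed to move data_bits
    over a bus that carries external_width_bits at a time."""
    # Ceiling division using integers only.
    return -(-data_bits // external_width_bits)


# Fetching one 16-bit word:
print(bus_transfers(16, 16))  # 1 transfer on the 8086
print(bus_transfers(16, 8))   # 2 transfers on the 8088
```

The internal width sets how much the CPU can work on at once; the external width sets how many trips across the bus it takes to get that data in.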
Intel Corporation • Produced biggest impact on microprocessor technology • Main line of business is CPU but also has other hardware products (e.g. motherboards)
Short History of Intel • 1968: Birth of Intel • Started in the memory business • First product was a 64-bit memory chip • 1970s: Increase in market share • Early 1980s: Japanese manufacturers eat up the memory market with 16 - 256 KB chips • 1984: Business slowing down; “Get us out of memory!” • 1986: Exited from memory due to success of the 80386
Intel Processor Time Line • 1971: 4004 (Intel’s first microprocessor; 108 kHz, 4-bit bus width) • 1978: 8086 (first 16-bit CPU from Intel) • 1979: 8088 (reengineered CPU to fit existing 8-bit hardware) • 1982: 286 (16-bit processor; optimized instruction handling) • 1985: 386 (first 32-bit CPU; 32-bit system bus) • 1988: 386SX (cheaper version of the 386DX) • 1989: 486 (built-in math co-processor; L1 cache on-chip)
Intel Processor Time Line • 486SX (discount chip; no math co-processor) • 486DX4 (triple the clock speed: from 25 MHz to 75 MHz, 33 MHz to 100 MHz) • 1993: Pentium Classic (superscalar, 5x 486DX-33 MHz; width of system bus: 64 bit; speed of system bus: 60 to 66 MHz; initially produced a lot of heat) • Nov 1, 1995: Pentium Pro (RISC processor; 32-bit processing; L2 cache is built in) • Jan 8, 1997: Pentium MMX (new set of instructions for multimedia; 32 KB L1 cache) • May 7, 1997: Pentium II (Klamath) (512 KB L2; L1 cache of 32 KB)
Intel Processor Time Line • Jan 26, 1998: Deschutes (333 MHz; 0.25 micron technology) • 1Q 1998: Celeron (Covington) (Pentium II without the L2 cache) • July 26, 1998: Pentium II Xeon (450 MHz; custom SRAM; different L2 caches: 512 KB, 1 MB, or 2 MB; can have 4 - 8 Xeons in one server) • 1998: Celeron (Mendocino) (333 MHz; 128 KB L2 internal cache) • 1999: Pentium III (Katmai) (enhanced MMX2 graphics instructions) • 1999: Pentium III Xeon (Tanner) • 2000: Pentium 4 (7th generation; 0.18 micron technology) • 2001: Itanium (formerly Merced) (64-bit CPU; 0.18 micron technology; > 25 million transistors) • 2005: Core
Current Intel CPU Innovations • Hyperthreading • Multi-core • Core • Core 2 (64-bit architecture)
Intel’s First 64-Bit Chip (Server): Itanium • Was known as IA-64 (but IA-32 compatible) • EPIC (Explicitly Parallel Instruction Computing) processor • Enables up to 20 operations/clock cycle • Employs branch prediction and speculation • Three levels of cache: 2 MB / 4 MB L3 cache, 96K L2 cache, and 32K L1 cache • 128 integer registers, 128 FP registers
Itanium 2 • Available from 1 - 1.66 GHz • Internal L3 cache (1.5 MB, 3 MB, 4 MB, 6 MB, or 9 MB) • System bus: 400/533/667 MHz, 128-bits wide • 0.13 microns, 592 million transistors • Next version (“Montecito”) has 1.72 billion transistors, 26 MB on-die cache, 90 nm
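The bus figures above translate into a peak transfer rate that can be sketched as width times speed. This assumes one 128-bit transfer per quoted MHz and ignores protocol overhead, so the results are upper bounds; `peak_bandwidth_gb_s` is a name made up for this example.

```python
def peak_bandwidth_gb_s(bus_mhz, width_bits):
    """Peak bus bandwidth in GB/s (1 GB = 10**9 bytes),
    assuming one full-width transfer per bus cycle."""
    bytes_per_transfer = width_bits // 8
    return bus_mhz * 1_000_000 * bytes_per_transfer / 1e9


# Itanium 2 system bus speeds from the slide, all 128 bits wide.
for mhz in (400, 533, 667):
    print(f"{mhz} MHz x 128 bits: {peak_bandwidth_gb_s(mhz, 128):.3f} GB/s")
```

For comparison, the same function applied to the 64-bit, 66 MHz Pentium Classic bus mentioned in the time line gives about 0.5 GB/s.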
Current Intel CPU Lineup • Mobile • Centrino (Core and Core 2) • Desktop • Core 2 Extreme • Core 2 (now used in Apple Mac Mini) • Servers and workstations • Xeon (now used in Apple Mac Pro) • Itanium 2