Intel® Core 2 Duo Desktop Processor Architecture CMPE 511 Computer Architecture Caner AKSOY CmpE Boğaziçi University December 2006
What's next?
• History
• Intel Core 2 Duo
• Intel Core 2 Microarchitecture
• Intel Core 2 Models
• Architectural Features of Core 2
• What is an instruction set?
• SSSE3 (x86)
• Execute Disable Bit
• Intel® Wide Dynamic Execution
• 14 Stage pipeline
• MacroFusion
• Micro-op Fusion
• What is L1 and L2?
• Intel® Advanced Smart Cache
• Intel® Smart Memory Access
• Intel® Advanced Digital Media Boost
History (List of Intel microprocessors)
• The 4-bit processors: 4004, 4040
• The 8-bit processors: 8008, 8080, 8085
• The 16-bit processors (origin of x86): 8086, 8088, 80186, 80188, 80286
• The 32-bit processors (non-x86): iAPX 432, 80960, 80860, XScale
• The 32-bit processors (the 80386 range): 80386DX, 80386SX, 80376, 80386SL, 80386EX
• The 32-bit processors (the 80486 range): 80486DX, 80486SX, 80486DX2, 80486SL, 80486DX4
• The 32-bit processors (the Pentium "I"): Pentium, Pentium MMX
• The 32-bit processors (P6/Pentium M): Pentium Pro, Pentium II, Celeron, Pentium III, Pentium II and III Xeon, Celeron (PIII), Pentium M, Celeron M, Intel Core, Dual Core Xeon LV
• The 32-bit processors (NetBurst microarchitecture): Pentium 4, Xeon, Pentium 4 EE
• The 64-bit processors (IA-64): Itanium, Itanium 2
• The 64-bit processors (EM64T, NetBurst): Pentium D, Pentium Extreme Edition, Xeon
• The 64-bit processors (EM64T, Core microarchitecture): Xeon, Intel Core 2
Intel Core 2 Duo
Intel Core 2 Microarchitecture
• One 65 nm microarchitecture, three variants
  • Woodcrest: server optimized
  • Conroe: desktop optimized
  • Merom: mobile optimized
• Common features
  • Intel® Wide Dynamic Execution
  • Intel® Intelligent Power Capability
  • Intel® Advanced Smart Cache
  • Intel® Smart Memory Access
  • Intel® Advanced Digital Media Boost
Intel Core 2 models
• Allendale, Conroe: 65 nm process technology
• Desktop CPU
• Introduced on July 27, 2006
• Number of transistors: 291 million on 4 MB models
• Number of transistors: 167 million on 2 MB models
• Variants
  • Core 2 Duo E6700 - 2.67 GHz (4 MB L2, 1066 MHz FSB)
  • Core 2 Duo E6600 - 2.40 GHz (4 MB L2, 1066 MHz FSB)
  • Core 2 Duo E6400 - 2.13 GHz (2 MB L2, 1066 MHz FSB)
  • Core 2 Duo E6300 - 1.86 GHz (2 MB L2, 1066 MHz FSB)
  • Core 2 Duo E4200 - 1.60 GHz (2 MB L2, 800 MHz FSB)
Intel Core 2 models
• Woodcrest: 65 nm process technology
• Server-optimized CPU
• Introduced on June 26, 2006
• Same features as Conroe
• Variants
  • Xeon 5160 - 3.00 GHz (4 MB L2, 1333 MHz FSB, 80 W)
  • Xeon 5150 - 2.66 GHz (4 MB L2, 1333 MHz FSB, 65 W)
  • Xeon 5140 - 2.33 GHz (4 MB L2, 1333 MHz FSB, 65 W)
  • Xeon 5130 - 2.00 GHz (4 MB L2, 1333 MHz FSB, 65 W)
  • Xeon 5120 - 1.86 GHz (4 MB L2, 1066 MHz FSB, 65 W)
  • Xeon 5110 - 1.60 GHz (4 MB L2, 1066 MHz FSB, 65 W)
  • Xeon 5148LV - 2.33 GHz (4 MB L2, 1333 MHz FSB, 40 W)
Intel Core 2 models
• Merom: 65 nm process technology
• Mobile CPU
• Introduced on July 27, 2006
• Same features as Conroe
• Variants
  • Core 2 Duo T7600 - 2.33 GHz (4 MB L2, 667 MHz FSB)
  • Core 2 Duo T7400 - 2.16 GHz (4 MB L2, 667 MHz FSB)
  • Core 2 Duo T7200 - 2.00 GHz (4 MB L2, 667 MHz FSB)
  • Core 2 Duo T5600 - 1.83 GHz (2 MB L2, 667 MHz FSB)
  • Core 2 Duo T5500 - 1.66 GHz (2 MB L2, 667 MHz FSB)
  • Core 2 Duo T5200 - 1.60 GHz (2 MB L2, 533 MHz FSB)
Architectural Features of Core 2
• SSSE3 SIMD instructions
• Intel Virtualization Technology, multiple OS support
• LaGrande Technology, enhanced security hardware extensions
• Execute Disable Bit
• EIST (Enhanced Intel SpeedStep Technology)
• Intel Wide Dynamic Execution
• Intel Intelligent Power Capability
• Intel Advanced Smart Cache
• Intel Smart Memory Access
• Intel Advanced Digital Media Boost
What is an instruction set?
• All instructions, and all their variations, that a processor can execute
• Types
  • Arithmetic instructions such as add and subtract
  • Logic instructions such as and, or, and not
  • Data instructions such as move, input, output, load, and store
• Part of the computer architecture
• Distinguished from the microarchitecture
  • Different microarchitectures can share a common instruction set while their internal designs differ
• (Figure: instruction flow through Fetch, Decode, Operand Fetch, Execute, Retire)
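The three instruction types listed above can be sketched with a toy interpreter. This is only an illustration: the three-register machine, its mnemonics, and the `run` function are invented for this example and are not x86.

```python
import operator

# Hypothetical toy ISA: arithmetic (add, sub), logic (and, or),
# and data instructions (load, store). Not x86.
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


def run(program, memory):
    """Execute a list of instruction tuples against a list-based memory."""
    regs = {"r0": 0, "r1": 0, "r2": 0}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]          # fetch and decode
        pc += 1
        if op == "load":                 # data: register <- memory
            dst, addr = args
            regs[dst] = memory[addr]
        elif op == "store":              # data: memory <- register
            src, addr = args
            memory[addr] = regs[src]
        elif op in ALU_OPS:              # arithmetic and logic
            dst, a, b = args
            regs[dst] = ALU_OPS[op](regs[a], regs[b])
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


memory = [6, 3, 0]
program = [
    ("load", "r0", 0),
    ("load", "r1", 1),
    ("add", "r2", "r0", "r1"),
    ("and", "r0", "r2", "r1"),
    ("store", "r2", 2),
]
regs = run(program, memory)
```

The program is the architectural contract; how `run` is implemented internally plays the role of the microarchitecture and could change without affecting the program.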
SSSE3 (x86): Supplemental Streaming SIMD Extension 3
• Intel's name for the SSE instruction set's fourth iteration
• Single Instruction Multiple Data instruction set
• A revision of SSE3
• CPUs with SSSE3
  • Xeon 5100 series
  • Intel Core 2
• Development
  • Faster permutation of bytes
  • Multiplying 16-bit fixed-point numbers with correct rounding
  • Better word accumulation
SSSE3 (x86): Supplemental Streaming SIMD Extension 3
• 16 new instructions
  • PSIGNB, PSIGNW, PSIGND: Packed Sign
  • PABSB, PABSW, PABSD: Packed Absolute Value
  • PALIGNR: Packed Align Right
  • PSHUFB: Packed Shuffle Bytes
  • PMULHRSW: Packed Multiply High with Round and Scale
  • PMADDUBSW: Multiply and Add Packed Signed and Unsigned Bytes
  • PHSUBW, PHSUBD: Packed Horizontal Subtract (Words or Doublewords)
  • PHSUBSW: Packed Horizontal Subtract and Saturate Words
  • PHADDW, PHADDD: Packed Horizontal Add (Words or Doublewords)
  • PHADDSW: Packed Horizontal Add and Saturate Words
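To make two of these instructions concrete, the sketch below emulates the lane-wise behavior of PSHUFB (byte permutation) and PMULHRSW (rounded 16-bit fixed-point multiply) on plain Python lists. The function names and the list-based register model are assumptions made for illustration; real code would use the hardware instructions through a compiler or assembler.

```python
def pshufb(src, mask):
    """Emulate 128-bit PSHUFB on lists of 16 byte values.

    For each output byte: if the mask byte has its top bit set the result
    is 0, otherwise the low 4 bits of the mask select a source byte.
    """
    assert len(src) == 16 and len(mask) == 16
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]


def to_signed16(value):
    """Wrap an integer to a signed 16-bit value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pmulhrsw(a, b):
    """Emulate PMULHRSW per signed 16-bit lane: (((a * b) >> 14) + 1) >> 1."""
    return [to_signed16((((x * y) >> 14) + 1) >> 1) for x, y in zip(a, b)]


# Reverse 16 bytes with a single shuffle.
data = list(range(100, 116))
reversed_data = pshufb(data, list(range(15, -1, -1)))

# Q15 fixed point: 16384 represents 0.5, so 0.5 * 0.5 should give 8192 (0.25).
half = 16384
products = pmulhrsw([half, -half, 3, 1, 0, 0, 0, 0], [half] * 8)
```

The `+ 1` before the final shift is the rounding step: multiplying 3 by 0.5 yields 2 rather than the truncated 1.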
Execute Disable Bit
• Problem: buffer overflow attacks by malicious software
• Must be combined with a supporting operating system
• Classifies areas in memory as executable or non-executable
• Blocks code execution from protected areas during an attack
• Decreases the need for software patches and antivirus software
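The mechanism can be sketched as a per-page flag that is checked on every instruction fetch. The `PageTable` class, the 4 KB page size, and the exception name below are illustrative assumptions, not the actual x86 page-table format or any operating system API.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ExecuteDisableFault(Exception):
    """Raised when code is fetched from a page marked non-executable."""


class PageTable:
    def __init__(self):
        self.no_execute = {}  # page number -> True if execution is disabled

    def map_page(self, page_number, no_execute):
        self.no_execute[page_number] = no_execute

    def check_fetch(self, address):
        """Allow an instruction fetch only from executable pages."""
        page_number = address // PAGE_SIZE
        if self.no_execute[page_number]:
            raise ExecuteDisableFault(f"fetch from data page {page_number}")
        return True


table = PageTable()
table.map_page(0, no_execute=False)  # code page
table.map_page(1, no_execute=True)   # stack/data page set up by the OS

code_ok = table.check_fetch(0x0100)  # normal execution
try:
    table.check_fetch(0x1008)        # injected code on the stack
    attack_blocked = False
except ExecuteDisableFault:
    attack_blocked = True
```

The operating system decides which pages are marked, which is why the feature needs OS support to be effective.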
Intel® Wide Dynamic Execution
• Performance increases while energy consumption decreases
• Advantage
  • Wider execution
  • Comprehensive advancements
  • Enabled in each core
  • Each core fetches, dispatches, executes and retires up to four full instructions simultaneously
• (Figure: two cores sharing the L2 cache, with example instructions Branch, Add, Mul, Load, Store)
14 Stage pipeline
• Pentium D has a 31-stage pipeline
• AMD Athlon 64 has a 12-stage pipeline
• A question for the class:
  • Why didn't Intel make the pipeline even deeper after its 31-stage experience with Pentium D?
• (Figure: instructions I1, I2, I3, ..., I99, I100 in the pipeline; a jump leaves a bubble of non-work)
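One way to approach the question is a back-of-the-envelope model in which every mispredicted branch costs a flush roughly as long as the pipeline is deep. The formula and the workload numbers below (20% branches, 5% mispredicted) are assumptions chosen for illustration, not measurements of any of these processors.

```python
def effective_cpi(pipeline_depth, branch_fraction, mispredict_rate, base_cpi=1.0):
    """Idealized cycles per instruction with branch misprediction flushes.

    Assumes a mispredicted branch wastes about `pipeline_depth` cycles.
    """
    return base_cpi + branch_fraction * mispredict_rate * pipeline_depth


# Assumed workload: 20% of instructions are branches, 5% of those mispredict.
cpi_31_stage = effective_cpi(31, branch_fraction=0.20, mispredict_rate=0.05)
cpi_14_stage = effective_cpi(14, branch_fraction=0.20, mispredict_rate=0.05)
print(f"31 stages: CPI ~ {cpi_31_stage:.2f}, 14 stages: CPI ~ {cpi_14_stage:.2f}")
```

Under these assumptions the deeper pipeline needs a noticeably higher clock frequency just to break even, which also costs power.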
MacroFusion
• Example: if (myVariable == myConstant) doThis(); else doThat();
• Compare instruction + Jump instruction = one Compare-Jump micro-op
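The idea can be sketched as a decoder pass that merges an adjacent compare and conditional jump into a single micro-op. The tuple-based instruction format and the `decode` function are hypothetical, and the real hardware applies additional restrictions on which pairs may fuse.

```python
def decode(instructions):
    """Toy decoder: fuse CMP/TEST followed by a conditional jump."""
    micro_ops = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        is_compare = current[0] in ("cmp", "test")
        is_cond_jump = (
            following is not None
            and following[0].startswith("j")
            and following[0] != "jmp"  # unconditional jumps are not fused here
        )
        if is_compare and is_cond_jump:
            micro_ops.append(("cmp+jcc", current, following))
            i += 2
        else:
            micro_ops.append(("uop", current))
            i += 1
    return micro_ops


# Hypothetical instruction stream for the if/else example above.
program = [
    ("mov", "eax", "[myVariable]"),
    ("cmp", "eax", "myConstant"),
    ("jne", "else_branch"),
    ("call", "doThis"),
    ("jmp", "done"),
    ("call", "doThat"),
]
micro_ops = decode(program)
```

Six instructions become five micro-ops, so the same four-wide machine gets more work through per cycle.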
Micro-op Fusion
• Example: a read-modify-write instruction such as ADD [mem], EAX breaks down into
  • Load the contents of [mem] into a register (MOV EBX, [mem])
  • An ALU operation that adds the two registers together (ADD EBX, EAX)
  • Store the result back to memory (MOV [mem], EBX)
• Micro-ops derived from the same macro-op are fused to reduce the number of micro-ops that need to be executed
• Benefits
  • Fewer micro-ops to execute
  • Lower power consumption
  • Better scheduling
  • Fewer micro-ops handled by the out-of-order logic
What is L1 and L2?
• Level-1 and Level-2 caches
• The cache memories in a computer
• Much faster than RAM
• L1 is built on the microprocessor chip itself
• L2 was traditionally a separate chip; on Core 2 it sits on the processor die and is shared by both cores
• L2 cache is much larger than L1 cache
Intel® Advanced Smart Cache
• Shared L2 design
  • L2 cache is shared by both cores
  • Data stored in one place
  • Optimizes cache resource
  • Up to 100% utilization of L2 cache by a single core
• Advantage
  • Higher cache hit rate
  • Reduced bus traffic
  • Lower latency to data
• (Figure: decreased traffic with a shared L2 cache, increased traffic with independent L2 caches)
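The "up to 100% utilization" point can be illustrated with a small LRU simulation: when one core is idle, the busy core may use the whole shared cache instead of only its private half. The cache sizes, the access pattern, and the `LRUCache` class are assumptions chosen to make the effect visible; they do not model the real replacement policy.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal fully associative cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.lines[address] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def hit_rate(capacity, working_set, passes):
    """Hit rate of one core looping over `working_set` cache lines."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for address in range(working_set):
            cache.access(address)
    return cache.hits / (cache.hits + cache.misses)


TOTAL_L2_LINES = 8  # toy size; the second core is assumed idle
split_rate = hit_rate(TOTAL_L2_LINES // 2, working_set=6, passes=10)
shared_rate = hit_rate(TOTAL_L2_LINES, working_set=6, passes=10)
```

With a private half of four lines the six-line loop thrashes and never hits, while the shared eight-line cache misses only on the first pass.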
Intel® Smart Memory Access
• Why?
  • Lost opportunities for out-of-order execution
• What is the idea?
  • Speculatively ignore store-load dependencies and let loads execute ahead of earlier stores
  • If there is a dependency, flush the load instruction and re-execute it
• How is it checked?
  • Verify by checking all dispatched store addresses in the memory order buffer
  • There is a watchdog
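A minimal sketch of the check described above: every load is assumed to have been hoisted above all earlier stores, and once the store addresses are known, any load that read an address written by an earlier store is flagged for a flush. The operation format and the function name are invented for this example; the real memory order buffer also uses a predictor, which is not modeled here.

```python
def check_speculative_loads(ops):
    """Classify hoisted loads as safe or flushed.

    `ops` is a program-ordered list of ("store", address) or
    ("load", address) tuples. Returns two lists of indices into `ops`.
    """
    safe, flushed = [], []
    earlier_store_addresses = set()
    for index, (kind, address) in enumerate(ops):
        if kind == "store":
            earlier_store_addresses.add(address)
        elif kind == "load":
            if address in earlier_store_addresses:
                flushed.append(index)  # true dependency: load must re-execute
            else:
                safe.append(index)     # speculation paid off
        else:
            raise ValueError(f"unknown operation: {kind}")
    return safe, flushed


ops = [
    ("store", 0x10),
    ("load", 0x20),   # independent of the store above
    ("store", 0x30),
    ("load", 0x10),   # depends on the first store
    ("load", 0x30),   # depends on the second store
]
safe, flushed = check_speculative_loads(ops)
```

Speculation wins whenever most loads land in the `safe` list, since those loads no longer wait for store addresses to be computed.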
Intel® Advanced Digital Media Boost
• (Figure: without the feature, a 128-bit instruction processes the lower 64 bits in one cycle and the upper 64 bits in the next)
• (Figure: with the feature, a 128-bit instruction completes in one cycle)
• Improves performance when executing SSE instructions
  • 128-bit SIMD integer arithmetic
  • 128-bit SIMD double-precision floating point
• Accelerates a broad range of applications
  • Video, speech, image processing
  • Encryption
  • Financial
  • Engineering and scientific
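The difference between the two execution styles can be sketched with a packed 16-bit add over eight lanes (128 bits). The cycle counts returned below are the idealized one-versus-two figures from the slides, and the function names are assumptions for this example.

```python
MASK16 = 0xFFFF


def add_lanes(a, b):
    """Packed 16-bit add with wraparound, lane by lane."""
    return [(x + y) & MASK16 for x, y in zip(a, b)]


def packed_add_split(a, b):
    """64-bit datapath: low four lanes in one cycle, high four in the next."""
    low = add_lanes(a[:4], b[:4])
    high = add_lanes(a[4:], b[4:])
    return low + high, 2  # (result, cycles)


def packed_add_full(a, b):
    """128-bit datapath: all eight lanes in a single cycle."""
    return add_lanes(a, b), 1  # (result, cycles)


a = [1, 2, 3, 4, 5, 6, 7, 65535]
b = [10, 20, 30, 40, 50, 60, 70, 1]
split_result, split_cycles = packed_add_split(a, b)
full_result, full_cycles = packed_add_full(a, b)
```

Both paths compute the same values; only the number of cycles per instruction changes, which is where the throughput gain for SSE-heavy code comes from.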