Superscalar Microprocessors • Robert Hock 4/23/02
Superscalar Microprocessors • Topics Covered • Superscalar Processor Overview • MIPS R10000 • Intel IA32 • PowerPC
What does superscalar mean? • Definition: • Superscalar machines are able to issue multiple instructions for each clock cycle from a conventional linear instruction stream
In English This Time • A superscalar processor can run code out of sequence to speed it up. Instructions take different numbers of cycles to execute, which introduces latency into program execution. By pipelining these instructions and overlapping independent ones, the processor can have multiple instructions in progress at the same time.
How Does it Work? • Instructions are introduced in sequence • These instructions are scheduled dynamically by the hardware • More than one instruction can be issued each clock cycle • The number of instructions issued is also set dynamically by the hardware
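The dynamic multiple-issue behaviour described above can be sketched as a toy cycle-by-cycle model. This is a minimal sketch, not how any real processor is built: it assumes a fixed issue width, strictly in-order issue, no result forwarding within the same cycle, and that every register not written by the program is ready at cycle 0. The `Instr` record and `in_order_issue` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    dest: str      # register written
    srcs: tuple    # registers read
    latency: int   # cycles until the result can be used

def in_order_issue(program, width):
    """Issue up to `width` instructions per cycle, strictly in program order.

    Returns a list whose entry c holds the names issued in cycle c.
    A result produced in cycle c becomes usable in cycle c + latency.
    """
    ready_at = {}  # register -> first cycle its value can be read
    schedule = []
    i, cycle = 0, 0
    while i < len(program):
        issued = []
        while i < len(program) and len(issued) < width:
            ins = program[i]
            if any(ready_at.get(r, 0) > cycle for r in ins.srcs):
                break  # operand not ready: this and every later instruction wait
            ready_at[ins.dest] = cycle + ins.latency
            issued.append(ins.name)
            i += 1
        schedule.append(issued)
        cycle += 1
    return schedule

program = [
    Instr("a", "r1", (), 3),       # long-latency op, e.g. a load
    Instr("b", "r2", ("r1",), 1),  # needs a's result
    Instr("c", "r3", (), 1),       # independent of a and b
    Instr("d", "r4", (), 1),
]
print(in_order_issue(program, width=2))  # [['a'], [], [], ['b', 'c'], ['d']]
```

The number of instructions issued per cycle (1, 0, 0, 2, 1) is decided by the model at run time rather than fixed in advance. Instruction `c` does not depend on `a` or `b`, yet it still waits behind `b` because issue here is strictly in order.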
Phases of the Superscalar Pipeline • Fetch • Pre-fetch • Decode • Rename • Issue • Execute • Complete • Reorder • Commit • Retire • Write-Back
Fetch & Decode • Fetching & Decoding can be done faster than Execution • Processor Fetches & Decodes more instructions than it Commits, because it discards instructions from mispredicted branch paths
Pre-Fetch & Pre-Decoding • Pre-Decoding is done when instructions are transferred from memory to the cache • The Pre-Decoded instruction is simpler than the original • The Decoder can decode this format faster than the original
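Renaming • Renaming maps each logical (architectural) register onto a physical register, so that instructions that merely reuse the same register name no longer have to wait for each other (removing false write-after-read and write-after-write dependencies)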
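Renaming can be illustrated with a rename map from logical to physical registers plus a free list of unused physical registers. This is a simplified sketch under stated assumptions: physical registers are never returned to the free list (real designs reclaim them later), and a source that has not been renamed yet simply reads the logical register. `rename`, `rename_map`, and `free_list` are hypothetical names.

```python
def rename(program, num_physical):
    """Give every destination a fresh physical register.

    program: list of (dest, srcs) tuples using logical register names.
    Returns a list of (physical_dest, physical_srcs) tuples.
    """
    free_list = [f"p{i}" for i in range(num_physical)]
    rename_map = {}  # logical register -> physical register of its latest writer
    renamed = []
    for dest, srcs in program:
        # Sources read the mapping of the most recent earlier writer;
        # never-written registers keep their logical name (simplification).
        phys_srcs = tuple(rename_map.get(r, r) for r in srcs)
        if not free_list:
            raise RuntimeError("no free physical register: rename must stall")
        phys_dest = free_list.pop(0)
        rename_map[dest] = phys_dest
        renamed.append((phys_dest, phys_srcs))
    return renamed

program = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1",)),       # r4 = r1 * 2
    ("r1", ("r5",)),       # r1 = r5 - 1  (reuses the name r1)
    ("r6", ("r1",)),       # r6 = r1 + 1  (must see the new r1)
]
print(rename(program, num_physical=8))
# [('p0', ('r2', 'r3')), ('p1', ('p0',)), ('p2', ('r5',)), ('p3', ('p2',))]
```

The third instruction writes `r1` again, but it receives a fresh physical register `p2`, so it no longer has to wait for the first instruction's write or for the second instruction's read of the old `r1`; the fourth instruction correctly reads `p2`.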
Issue • Waiting instructions are analyzed to find instructions beyond the current instruction that can be executed independently • This is “Look-Ahead” capability • Instructions can be issued in-order or out-of-order
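The look-ahead capability can be sketched by letting the scheduler scan a window of waiting instructions and issue any whose operands are ready, not only the oldest one. This is a toy model with hypothetical names (`out_of_order_issue`, `window`); it assumes registers have already been renamed so each destination is written exactly once, and it ignores structural limits such as the number of functional units.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int

def out_of_order_issue(program, width, window):
    """Each cycle, look ahead at the `window` oldest waiting instructions and
    issue up to `width` of them whose source registers are ready, oldest first."""
    # Results of instructions that have not issued yet are not available.
    ready_at = {ins.dest: math.inf for ins in program}
    waiting = list(program)
    schedule, cycle = [], 0
    while waiting:
        issued = []
        for ins in waiting[:window]:
            if len(issued) == width:
                break
            if all(ready_at.get(r, 0) <= cycle for r in ins.srcs):
                ready_at[ins.dest] = cycle + ins.latency
                issued.append(ins)
        for ins in issued:
            waiting.remove(ins)
        schedule.append([ins.name for ins in issued])
        cycle += 1
    return schedule

program = [
    Instr("a", "r1", (), 3),       # long-latency op, e.g. a load
    Instr("b", "r2", ("r1",), 1),  # needs a's result
    Instr("c", "r3", (), 1),       # independent
    Instr("d", "r4", (), 1),       # independent
]
print(out_of_order_issue(program, width=2, window=4))  # [['a', 'c'], ['d'], [], ['b']]
```

With a window of 4, the independent instructions `c` and `d` are issued while `b` waits for the long-latency `a`, so the code finishes issuing in 4 cycles; with a window of 1 (which also limits issue to one instruction per cycle) the same code needs 6 cycles because there is no look-ahead.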
Execute • An instruction executes in either a single cycle or multiple cycles • After Execution, the Completion phase is reached
Reorder • The Reorder logic determines whether the instruction was on a predicted (speculative) branch path, and whether that prediction was correct • Execution exceptions are marked
Commit • An executed instruction is committed when: • All previous instructions in program order have already been committed • No interrupt has occurred • If the instruction was executed speculatively after a branch prediction, that prediction was correct
Retire • An instruction is Retired when: • The instruction has been committed • The instruction has been removed because of a branch misprediction or an exception
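The Commit and Retire rules can be sketched with a reorder buffer (ROB): a queue that holds instructions in program order, from whose head finished instructions are committed. This is a minimal sketch with hypothetical names (`RobEntry`, `commit_cycle`); it does not model interrupts or exceptions, and it assumes a mispredicted branch is itself committed while every younger instruction (fetched down the wrong path) is removed without committing.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class RobEntry:
    name: str
    done: bool = False          # execution has finished
    mispredicted: bool = False  # a branch whose prediction turned out wrong

def commit_cycle(rob, width):
    """Commit up to `width` finished instructions from the head of the ROB,
    in program order. Returns (committed names, squashed names)."""
    committed, squashed = [], []
    while rob and len(committed) < width:
        head = rob[0]
        if not head.done:
            break  # an older instruction is still executing, so nothing younger commits
        rob.popleft()
        committed.append(head.name)
        if head.mispredicted:
            # Everything younger came from the wrong path: remove without committing.
            squashed = [e.name for e in rob]
            rob.clear()
            break
    return committed, squashed

rob = deque([RobEntry("i1", done=True),
             RobEntry("br", done=True, mispredicted=True),
             RobEntry("i3", done=True),
             RobEntry("i4")])
print(commit_cycle(rob, width=4))   # (['i1', 'br'], ['i3', 'i4'])

rob2 = deque([RobEntry("i1"), RobEntry("i2", done=True)])
print(commit_cycle(rob2, width=4))  # ([], [])
```

In the second example `i2` has finished but cannot commit, because the older `i1` is still executing.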
Write-Back • As the name implies, final instruction data is written back
MIPS R10000 Overview • 64-bit instruction set • Can decode 4 instructions per cycle • Has 5 execution pipelines • Uses dynamic scheduling and out-of-order execution • Executes speculatively past predicted branches
R10000 Functional Units • Integer ALU1 • Integer ALU2 • Load/Store Unit • Float Adder • Float Multiply
R10000 Pipeline Stages • Stage 1 • Fetch 4 Instructions per cycle • Stage 2 • 4 Instructions are Decoded & Renamed • Only 1 Branch Instruction can be decoded per cycle • Stage 3 • Decoded Instructions Issued
R10000 Pipeline Stages (cont) • Stages 4-6 (depending on the instruction) • Float Multiply (3-stage pipeline) • Float Adder (3-stage pipeline) • Integer ALU1 (1-stage pipeline) • Integer ALU2 (1-stage pipeline)
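The stage numbers on the two R10000 pipeline slides follow from simple arithmetic: three front-end stages (fetch, decode/rename, issue), followed by the execution pipeline of the chosen unit. The sketch below encodes only the depths listed above; the load/store unit is left out because its depth is not given here, and the names `R10000_EXEC_DEPTH` and `execute_stages` are hypothetical.

```python
# Execution-pipeline depths as listed on the R10000 slides. The load/store
# unit is omitted because its depth is not given there.
R10000_EXEC_DEPTH = {
    "Integer ALU1": 1,
    "Integer ALU2": 1,
    "Float Adder": 3,
    "Float Multiply": 3,
}

# Stage 1: fetch, stage 2: decode & rename, stage 3: issue.
FRONT_END_STAGES = 3

def execute_stages(unit):
    """Return the pipeline stage numbers an instruction spends executing on `unit`."""
    first = FRONT_END_STAGES + 1
    return list(range(first, first + R10000_EXEC_DEPTH[unit]))

for unit in R10000_EXEC_DEPTH:
    print(f"{unit}: stages {execute_stages(unit)}")
```

Integer operations occupy only stage 4, while floating-point operations occupy stages 4 through 6, which is where the "Stages 4-6" range comes from.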
Intel IA-32 Overview • 32-bit instruction set • 3-way superscalar • 12-stage pipeline • “Optimized” scheduling, which requires instructions to be retired in linear (program) order
IA-32 Functional Units • Integer • Float • Load • Store1 • Store2 • Jump • MMX (Multimedia Instructions)
IA-32 Pipeline Stages • Stages 1-5 • Fetch and Predecode • Stages 6&7 • Decode • Stage 8 • Renaming
IA-32 Pipeline (cont) • Stages 9&10 • Issue • Stage 11 • Execution • Stage 12 • Retirement
IA-32 Latencies (in clock cycles) • Integer Arithmetic – 1 • Integer Mult – 4 • Float Add – 3 • Float Mult – 5 • Load & Store – 3 • MMX Arithmetic – 1 • MMX Mult – 3
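Latencies like these matter most along chains of dependent instructions: an operation cannot start until the results it needs are available, so the longest dependency chain bounds execution time. The sketch below estimates that bound from the table above. It is idealized: it assumes unlimited, fully pipelined functional units and that operands not produced inside the snippet are already in registers; the operation labels, `IA32_LATENCY`, and `critical_path` are hypothetical. The slide's single "Load & Store – 3" entry is split into separate `load` and `store` entries of 3 cycles each.

```python
# Latencies (clock cycles) from the IA-32 slide; labels are illustrative.
IA32_LATENCY = {
    "int_add": 1, "int_mul": 4, "fp_add": 3, "fp_mul": 5,
    "load": 3, "store": 3, "mmx_arith": 1, "mmx_mul": 3,
}

def critical_path(ops):
    """ops: dict name -> (kind, [names of ops it depends on]), listed so that
    every dependency appears before its user.
    Returns the cycle in which the last result is available, assuming every
    op starts as soon as its inputs are ready (unlimited units)."""
    done_at = {}
    for name, (kind, deps) in ops.items():
        start = max((done_at[d] for d in deps), default=0)
        done_at[name] = start + IA32_LATENCY[kind]
    return max(done_at.values())

# a*b + c*d: the two multiplies are independent and can overlap.
sum_of_products = {
    "m1": ("fp_mul", []),
    "m2": ("fp_mul", []),
    "sum": ("fp_add", ["m1", "m2"]),
}
# Pointer chasing: each load needs the previous load's result.
pointer_chase = {
    "l1": ("load", []),
    "l2": ("load", ["l1"]),
    "add": ("int_add", ["l2"]),
}
print(critical_path(sum_of_products))  # 8
print(critical_path(pointer_chase))    # 7
```

For `a*b + c*d` the two multiplies overlap, so the result is ready after 5 + 3 = 8 cycles instead of the 13 cycles a fully serial execution would take; a chain of two dependent loads followed by an integer add takes 3 + 3 + 1 = 7 cycles.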
PowerPC 750 Overview • 32-bit implementation of the PowerPC RISC architecture (64-bit floating-point registers and data bus) • 32-bit addressing
PowerPC 750 Functional Units • Float (3-stage pipeline) • Branch • Load/Store • Single Cycle Integer • Multi Cycle Integer
PowerPC Pipeline • Fetch • Issue • Integer OP (+3 Depth) • Load OP (+7 Depth) • Store OP (+5 Depth) • Float OP (+6 Depth)
Conclusion • While the R10000 and PowerPC are truly RISC-based, the IA-32 has its roots in the CISC world. • The IA-32 has a deeper pipeline, allowing for higher clock rates, which in turn help sales. This is despite the fact that it delivers only mediocre performance.
Conclusion (cont) • For intensive numerical computation and 3D rendering, the MIPS R10000 is superior • For everyday applications where low voltage and low heat output matter, the PowerPC line has an edge. • For the home user, the IA-32 will be sufficient until the AMD 64-bit Hammer line is introduced.
For More Information • http://www.mips.com • http://www.intel.com • http://www.ibm.com • http://e-www.motorola.com/