
Computer architecture



Presentation Transcript


  1. Computer architecture Lecture 6: Processor’s structure Piotr Bilski

  2. Processor’s tasks: • Instruction fetching • Instruction interpretation (decoding) • Data fetching • Data processing • Data saving (writing results). These tasks justify the existence of registers (temporary storage inside the processor).

  3. Internal structure of the processor (figure): • ALU: status flags, shifter, complementer, arithmetic and Boolean logic • Registers • Control unit

  4. Block diagram of the Pentium 3 processor

  5. Block diagram of the P6 core (Pentium Pro, 1995) • Front end of the processor • Core • Completion unit

  6. Register types • Visible to the user (addressing, data, etc.) • Not visible to the user (control, status) • This categorization is informal!

  7. User-visible registers • General-purpose registers (GPR) • Data registers • Address registers (segment pointers, stack pointer, index registers) • Condition codes (status flags): read-only for the user!

  8. Control and status registers • Basic: • Program Counter (PC) • Instruction Register (IR), holding the instruction being decoded • Memory Address Register (MAR) • Memory Buffer Register (MBR) • Program Status Word (PSW) • Interrupt Vector Register • Page Table Pointer

  9. Program Status Word (figure: 16-bit word, bits 0-15, containing the flags S, Z, P, R, O, I, N and other fields) • S – sign bit • Z – zero bit, set if the result of the operation is zero • P – carry bit • R – logical comparison result bit • O – overflow bit • I – interrupt enable/disable bit • N – supervisor mode bit
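How an ALU derives these flags can be sketched in a few lines of Python. This is a minimal illustration assuming a hypothetical 16-bit adder; the helper name `add16_with_flags` is ours, and the letters follow the slide (S sign, Z zero, P carry, O overflow).

```python
MASK16 = 0xFFFF


def add16_with_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Add two 16-bit words and derive PSW-style flags (hypothetical helper)."""
    a, b = a & MASK16, b & MASK16
    full = a + b                      # up to 17 bits
    result = full & MASK16            # what the 16-bit register keeps
    sign_a, sign_b, sign_r = (a >> 15) & 1, (b >> 15) & 1, (result >> 15) & 1
    flags = {
        "S": sign_r,                                      # sign bit: MSB of the result
        "Z": int(result == 0),                            # zero bit
        "P": int(full > MASK16),                          # carry out of bit 15
        "O": int(sign_a == sign_b and sign_r != sign_a),  # two's complement overflow
    }
    return result, flags


print(add16_with_flags(0x7FFF, 0x0001))  # (32768, {'S': 1, 'Z': 0, 'P': 0, 'O': 1})
print(add16_with_flags(0xFFFF, 0x0001))  # (0, {'S': 0, 'Z': 1, 'P': 1, 'O': 0})
```

The two calls show that carry and overflow are independent: the largest positive number plus one overflows without a carry, while -1 + 1 produces a carry but no overflow.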

  10. Registers in the Motorola MC68000 processor • 32-bit data and address registers • Specialization: 8 data registers (D0-D7) and 9 address registers (A0-A7, where A7 exists in two copies used alternately in user and supervisor mode) • 24-bit address bus, 16-bit data bus • A7 register used as the Stack Pointer (SP) • 16-bit status register (SR), whose lower byte is the condition code register (CCR) • 32-bit program counter (PC) • Instructions must be stored at even addresses

  11. Registers in the Intel 8086 processor • 16-bit address and data registers • Data (general-purpose) registers (AX, BX, CX, DX) • Pointer and index registers (SP, BP, SI, DI) • Segment registers (CS, DS, SS, ES) • Instruction pointer (IP) • Status (flags) register

  12. Intel 8086 registers (cont.): AX – accumulator, BX – base, CX – count, DX – data; SP – stack pointer, BP – base pointer, SI – source index, DI – destination index
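The 8086 segment registers are combined with 16-bit offsets to form 20-bit physical addresses (physical address = segment × 16 + offset). A minimal sketch of that real-mode rule; the helper name `physical_address` is ours:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: (segment * 16 + offset) truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF  # 20-bit address bus (1 MB)


# Code is fetched from CS:IP, stack accesses use SS:SP, data usually DS:offset.
cs, ip = 0x1234, 0x0010
print(hex(physical_address(cs, ip)))          # 0x12350
print(hex(physical_address(0xFFFF, 0x0010)))  # wraps around to 0x0
```

Many different segment:offset pairs map to the same physical address, and on the 8086 an address past 1 MB wraps around to the beginning of memory.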

  13. Register organization of the Intel 386 to Pentium processors • 32-bit data and address registers • Eight general-purpose registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI) • For backward compatibility, the lower 16 bits of each register are accessible as the corresponding 16-bit register (AX, BX, CX, DX, SP, BP, SI, DI) • 32-bit status register (EFLAGS) • 32-bit instruction pointer (EIP)
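The backward-compatible aliasing can be modeled with bit masks. A minimal sketch with a hypothetical class `Reg32` standing for EAX, EBX, ECX or EDX (only these four also expose 8-bit high/low halves such as AH and AL; ESP, EBP, ESI and EDI have only the 16-bit view):

```python
class Reg32:
    """Hypothetical model of EAX/EBX/ECX/EDX with its 16- and 8-bit views."""

    def __init__(self, value: int = 0) -> None:
        self.full = value & 0xFFFFFFFF       # 32-bit register, e.g. EAX

    @property
    def word(self) -> int:                   # e.g. AX: low 16 bits
        return self.full & 0xFFFF

    @word.setter
    def word(self, value: int) -> None:      # writing AX keeps the upper half of EAX
        self.full = (self.full & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def high(self) -> int:                   # e.g. AH: bits 8-15
        return (self.full >> 8) & 0xFF

    @property
    def low(self) -> int:                    # e.g. AL: bits 0-7
        return self.full & 0xFF


eax = Reg32(0x12345678)
print(hex(eax.word), hex(eax.high), hex(eax.low))  # 0x5678 0x56 0x78
eax.word = 0xABCD
print(hex(eax.full))                               # 0x1234abcd
```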

  14. Floating-point registers of the Pentium processor • Eight 80-bit numeric registers • 16-bit control register • 16-bit status register • 16-bit tag word describing the content type of each floating-point register • 48-bit instruction pointer • 48-bit data (operand) pointer
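The tag word holds two bits per physical x87 register R0-R7: 00 valid, 01 zero, 10 special (NaN, infinity, denormal), 11 empty. A small decoding sketch; the function name `decode_tag_word` is ours:

```python
TAGS = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word: int) -> list[str]:
    """Return the 2-bit tag of each physical x87 register R0..R7."""
    return [TAGS[(tag_word >> (2 * i)) & 0b11] for i in range(8)]


print(decode_tag_word(0xFFFF))  # all eight registers empty (state after FINIT)
print(decode_tag_word(0xFFFC))  # R0 holds a valid number, the rest are empty
```

The tags refer to physical registers, not to the stack-relative names ST(0)-ST(7).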

  15. EFLAGS register • TF – trap flag • IF – interrupt enable flag • DF – direction flag • IOPL – I/O privilege level (2-bit field) • RF – resume flag • AC – alignment check • ID – identification flag. Bit layout (figure, bits 0-31): CF 0, PF 2, AF 4, ZF 6, SF 7, TF 8, IF 9, DF 10, OF 11, IOPL 12-13, NT 14, RF 16, VM 17, AC 18, VIF 19, VIP 20, ID 21; bits 22-31 reserved
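A register value can be decoded against this layout. A minimal sketch (the function name `decode_eflags` is ours); reserved bits, including bit 1, which always reads as 1, are simply ignored:

```python
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
    19: "VIF", 20: "VIP", 21: "ID",
}


def decode_eflags(value: int) -> tuple[list[str], int]:
    """Return the set one-bit flags (in bit order) and the 2-bit IOPL field."""
    flags = [name for bit, name in sorted(EFLAGS_BITS.items()) if (value >> bit) & 1]
    iopl = (value >> 12) & 0b11   # bits 12-13: I/O privilege level
    return flags, iopl


print(decode_eflags(0x00000246))  # (['PF', 'ZF', 'IF'], 0)
print(decode_eflags(0x00003202))  # (['IF'], 3)
```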

  16. Registers in the Athlon 64 processor • Compatible with the x86-64 architecture (40-bit physical address space, 48-bit virtual address space) • 64-bit data and address registers • 16 general-purpose registers in 64-bit mode: the 64-bit versions of the classic eight (RAX, RBX, RCX, RDX, RBP, RSI, RDI, RSP) plus R8-R15 (the same in the Opteron); in the 32-bit compatibility mode only the classic eight are available • 16 SSE registers (XMM0-XMM15) • 8 x87 floating-point registers, 80-bit

  17. Registers in the PowerPC processor • 32 general-purpose registers (64-bit) plus the fixed-point exception register (XER) • 32 floating-point registers (64-bit) plus the floating-point status and control register (FPSCR) • Branch processing unit registers: 32-bit condition register (CR), 64-bit count register (CTR) and link register (LR)

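  18. Instruction cycle state diagram (figure). States, in order: instruction address calculation, instruction fetch, instruction decoding, operand address calculation (with indirect addressing if needed), operand fetch (repeated for multiple operands), data operation, operand address calculation for the result (again with possible indirect addressing), operand store (repeated for multiple results; the cycle also returns for string or vector data), interrupt check, interrupt handling. With no interrupt pending, the instruction is complete and the next one is fetched.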

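  19. Instruction fetch cycle (figure: PC, MAR, MBR, IR and control unit (CU) in the processor, connected to memory by the data, address and control buses)

  20. Indirect cycle (figure: MAR, MBR and CU in the processor, connected to memory by the data, address and control buses)

  21. Interrupt cycle (figure: PC, MAR, MBR and CU in the processor, connected to memory by the data, address and control buses)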
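The three data-flow figures can be summarized as register transfers. A minimal Python sketch, assuming a hypothetical machine with 16-bit words (4-bit opcode, 12-bit address); the class, method and constant names are ours, and the transfers follow the usual textbook description of these cycles:

```python
from dataclasses import dataclass

ADDR_MASK = 0x0FFF  # hypothetical 16-bit word: 4-bit opcode + 12-bit address


@dataclass
class CPU:
    memory: list[int]
    pc: int = 0
    mar: int = 0
    mbr: int = 0
    ir: int = 0

    def fetch_cycle(self) -> None:
        self.mar = self.pc                  # MAR <- PC
        self.mbr = self.memory[self.mar]    # MBR <- memory[MAR] (over the data bus)
        self.pc += 1                        # PC  <- PC + 1
        self.ir = self.mbr                  # IR  <- MBR

    def indirect_cycle(self) -> None:
        self.mar = self.ir & ADDR_MASK      # MAR <- address field of IR
        self.mbr = self.memory[self.mar]    # MBR <- memory[MAR]: the real operand address
        self.ir = (self.ir & ~ADDR_MASK) | (self.mbr & ADDR_MASK)  # IR(address) <- MBR(address)

    def interrupt_cycle(self, save_addr: int, handler_addr: int) -> None:
        self.mbr = self.pc                  # MBR <- PC
        self.mar = save_addr                # MAR <- save address (supplied by the CU)
        self.pc = handler_addr              # PC  <- address of the interrupt routine
        self.memory[self.mar] = self.mbr    # memory[MAR] <- MBR


mem = [0] * 4096
mem[0x000] = 0x1100            # opcode 1, indirect address 0x100
mem[0x100] = 0x0200            # the operand actually lives at 0x200
cpu = CPU(mem)
cpu.fetch_cycle()
cpu.indirect_cycle()
print(hex(cpu.ir), cpu.pc)     # 0x1200 1
cpu.interrupt_cycle(save_addr=0xFF0, handler_addr=0x800)
print(mem[0xFF0], hex(cpu.pc))  # 1 0x800
```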

  22. Pipeline • Problem: during one instruction cycle only one instruction is processed • Solution: divide the cycle into smaller stages that different instructions can occupy at the same time • Condition: there must be moments in the cycle when main memory is not accessed, so that the next instruction can be fetched then

  23. Pipeline example: laundry. Each load passes through three stages (LA, DR, PA); one full cycle takes 3 hours. Done one after another, three loads take 9 hours; pipelined, with each load entering a stage as soon as the previous load has left it, they take only 5 hours!

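  24. Prefetch (two-stage pipeline: instruction fetch and execution) • NOTE: the speedup is less than twofold, because fetch and execution do not last equally long, so one stage has to wait for the other; moreover, when a branch supplies a new address, the prefetched instruction must be discarded (figure: simplified view Instruction, Fetch, Instruction, Execution, Result; expanded view with waiting, the new address passed back from execution, and discarding of the prefetched instruction)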

  25. Basic phases of the instruction cycle: • Instruction fetching (FI) • Instruction decoding (DI) • Operand address calculation (CO) • Operand fetching (FO) • Instruction execution (EI) • Writing the result (WO) (Timing diagram, cycles 1-11: instructions I1-I4 pass through FI, DI, CO, FO, EI, WO, each entering the pipeline one cycle after the previous one.)
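A timing diagram like this one can be generated for any number of instructions. A minimal sketch assuming an ideal pipeline with no stalls, so n instructions need k + n - 1 cycles on a k-stage pipeline; the function name `space_time_diagram` is ours:

```python
STAGES = ("FI", "DI", "CO", "FO", "EI", "WO")


def space_time_diagram(n: int, stages: tuple[str, ...] = STAGES) -> list[str]:
    """Rows of an ideal (stall-free) pipeline chart; column c is clock cycle c + 1."""
    k = len(stages)
    total_cycles = k + n - 1            # last instruction enters at cycle n, needs k cycles
    rows = []
    for i in range(n):
        cells = ["  "] * total_cycles
        for s, name in enumerate(stages):
            cells[i + s] = name          # instruction i+1 is in stage s during cycle i+s+1
        rows.append(f"I{i + 1}: " + " ".join(cells).rstrip())
    return rows


for row in space_time_diagram(4):
    print(row)
```

For I1-I4 this yields 9 cycles; any stall (for example a memory conflict between FI and FO) would stretch the chart.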

  26. Branches and pipelining (timing diagram, cycles 1-13: instructions I1-I6 enter the pipeline; when the conditional branch is taken, the partially processed instructions behind it are discarded and fetching restarts from the branch target I21, followed by I22)

  27. Pipeline implementation algorithm

  28. Problems of pipelining • Successive pipeline stages do not all take the same amount of time • Transferring data between the buffers may significantly increase the pipeline execution time • Handling register and memory dependencies to optimize the pipeline requires control logic whose cost can grow considerably

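  29. Efficiency of pipelining. Cycle time: $\tau = \max_i \tau_i + d = \tau_m + d$, where $\tau_i$ is the delay of stage $i$, $\tau_m$ the maximum stage delay and $d$ the delay of the buffer (latch) between stages. Time required to execute $n$ instructions on a $k$-stage pipeline: $T_{k,n} = [k + (n-1)]\tau$. Instruction pipeline speedup: $S_k = \frac{T_{1,n}}{T_{k,n}} = \frac{nk\tau}{[k + (n-1)]\tau} = \frac{nk}{k + (n-1)}$.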
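These formulas can be checked numerically. A small sketch (function names are ours), taking the laundry example's numbers as assumed input: three stages of 1 hour each and three loads, with the buffer delay d ignored:

```python
def pipeline_time(k: int, n: int, tau: float) -> float:
    """T_{k,n} = [k + (n - 1)] * tau for n instructions on a k-stage pipeline."""
    return (k + (n - 1)) * tau


def speedup(k: int, n: int) -> float:
    """S_k = T_{1,n} / T_{k,n} = n*k / [k + (n - 1)]."""
    return n * k / (k + (n - 1))


# Laundry: 3 stages of 1 hour, 3 loads -> 9 h sequentially, 5 h pipelined.
print(3 * 3 * 1.0, pipeline_time(3, 3, 1.0), speedup(3, 3))  # 9.0 5.0 1.8
# For long instruction streams the speedup approaches the number of stages k.
print(round(speedup(6, 1_000_000), 3))  # 6.0
```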

  30. Example of the pipeline efficiency

  31. Pipelines of modern processors • Pentium 3 – 10 stages • Athlon – 10 stages for the ALU, 15 stages for the FPU • Pentium M – 12 stages • Athlon 64 / 64 X2 – 12 stages for the ALU, 17 stages for the FPU • Pentium 4 Northwood – 20 stages (a so-called hyperpipeline!) • Pentium 4 Prescott – 31 stages • Core 2 Duo – 14 stages

  32. Hazards • Hazards are situations that disturb (stall) the pipeline • There are data hazards, resource (structural) hazards and control hazards
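The most common data hazard is read-after-write: an instruction needs a register that an earlier instruction, still in the pipeline, has not yet written. A minimal detection sketch, assuming a hypothetical encoding of each instruction as (destination, sources) and a `window` parameter standing for how many preceding instructions may still be in flight (in a real pipeline this depends on the stages and on forwarding):

```python
def find_raw_hazards(program, window=2):
    """Return (writer, reader, register), with 1-based instruction numbers, for each
    read-after-write dependence at most `window` instructions apart."""
    hazards = []
    for j, (_, sources) in enumerate(program):
        for reg in dict.fromkeys(sources):                # unique sources, in order
            for i in range(j - 1, max(j - window, 0) - 1, -1):
                if program[i][0] == reg:                  # nearest earlier writer
                    hazards.append((i + 1, j + 1, reg))
                    break
    return hazards


program = [
    ("R1", ("R2", "R3")),  # I1: R1 <- R2 + R3
    ("R4", ("R1", "R5")),  # I2: R4 <- R1 - R5  (needs R1 from I1)
    ("R6", ("R7", "R8")),  # I3: independent
    ("R9", ("R4", "R1")),  # I4: R9 <- R4 * R1  (needs R4 from I2)
]
print(find_raw_hazards(program))  # [(1, 2, 'R1'), (2, 4, 'R4')]
```

With a longer window (a deeper pipeline) the dependence of I4 on I1 through R1 would also become a hazard.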

  33. Branch handling • Pipeline multiplication (multiple streams) • Prefetching the branch target • Loop buffer • Branch prediction • Delayed branch

  34. Multiplied pipelines (multiple streams) • When a branch occurs, both possible next instructions are loaded into two separate pipelines and processed simultaneously • The main problem is that both pipelines compete for memory access

  35. Prefetch and loop buffer. Prefetch • When a branch instruction is decoded, its target instruction is fetched as well and kept until the branch is executed. Loop buffer • A small, fast buffer in the processor holds the most recently fetched instructions in sequence • If the branch target is already in the buffer, no memory access is needed, which is especially useful for small loops and short conditional jumps
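The effect of a loop buffer depends on whether the loop fits in it. A minimal simulation sketch with a hypothetical `LoopBuffer` that keeps the most recently fetched instructions and evicts the oldest first; the sizes and the loop are assumptions chosen for illustration:

```python
from collections import OrderedDict


class LoopBuffer:
    """Hypothetical loop buffer: keeps the `capacity` most recently fetched
    instructions, evicting the oldest one first (FIFO)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[int, str] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, addr: int, memory: dict[int, str]) -> str:
        if addr in self.entries:
            self.hits += 1                        # served from the buffer
            return self.entries[addr]
        self.misses += 1                          # main memory access needed
        instr = memory[addr]
        self.entries[addr] = instr
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # drop the oldest instruction
        return instr


def run_loop(capacity: int, body_len: int = 4, iterations: int = 10) -> tuple[int, int]:
    memory = {a: f"instr@{a}" for a in range(body_len)}
    buf = LoopBuffer(capacity)
    for _ in range(iterations):
        for addr in range(body_len):              # the loop body, then a branch back
            buf.fetch(addr, memory)
    return buf.hits, buf.misses


print(run_loop(capacity=8))  # (36, 4): the loop fits, only the first pass touches memory
print(run_loop(capacity=2))  # (0, 40): the loop does not fit, every fetch misses
```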

  36. Conditional branch prediction • Static: • Branch never taken (Sun SPARC, MIPS) • Branch always taken • Prediction based on the operation code • Dynamic: • Taken/not-taken switch • Branch history table
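The difference between static prediction and the simplest dynamic one (the taken/not-taken switch, one history bit) shows up on a loop-closing branch. A toy comparison; the outcome sequence (a 10-iteration loop executed 3 times) and all function names are assumptions for illustration:

```python
def accuracy(predictions: list[bool], outcomes: list[bool]) -> float:
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)


def never_taken(outcomes: list[bool]) -> list[bool]:
    return [False] * len(outcomes)        # static: branch never occurs


def always_taken(outcomes: list[bool]) -> list[bool]:
    return [True] * len(outcomes)         # static: branch always occurs


def one_bit_switch(outcomes: list[bool], initial: bool = True) -> list[bool]:
    """Dynamic: predict whatever the branch did last time (one history bit)."""
    predictions, last = [], initial
    for taken in outcomes:
        predictions.append(last)
        last = taken                       # flip the switch on every change
    return predictions


# Loop-closing branch: taken 9 times, then not taken when the loop exits; 3 runs.
outcomes = ([True] * 9 + [False]) * 3
for predictor in (never_taken, always_taken, one_bit_switch):
    print(predictor.__name__, round(accuracy(predictor(outcomes), outcomes), 3))
```

Here the one-bit switch (about 0.833) does worse than "always taken" (0.9), because it mispredicts twice per loop run: at the exit and again at the next entry.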

  37. Static prediction • The simplest method, used as a fallback, for instance in the Motorola MPC7450 processor • The Pentium 4 allowed the code to include hints (so-called prediction hints) telling the static predictor whether the branch is likely to be taken

  38. Dynamic prediction of conditional branches • The history of each conditional branch instruction is stored • It is represented by bits stored in the cache memory • Every instruction has its own history bits • Another solution is a table storing information about the results of conditional branches
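A common form of per-instruction history bits is a 2-bit saturating counter: the prediction flips only after two consecutive surprises. A minimal sketch of a small table of such counters indexed by branch address; the table size, initial state and names are assumptions:

```python
class TwoBitTable:
    """Hypothetical branch history table: one 2-bit saturating counter per entry,
    selected by the branch address. Counter 0-1: predict not taken; 2-3: taken."""

    def __init__(self, entries: int = 16) -> None:
        self.entries = entries
        self.counters = [1] * entries          # start as "weakly not taken"

    def predict(self, addr: int) -> bool:
        return self.counters[addr % self.entries] >= 2

    def update(self, addr: int, taken: bool) -> None:
        i = addr % self.entries
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)


def count_correct(table: TwoBitTable, addr: int, outcomes: list[bool]) -> int:
    correct = 0
    for taken in outcomes:
        correct += table.predict(addr) == taken
        table.update(addr, taken)
    return correct


# Loop branch taken 9 times, then not taken; the loop is executed 3 times.
outcomes = ([True] * 9 + [False]) * 3
print(count_correct(TwoBitTable(), 0x40, outcomes), "of", len(outcomes))  # 26 of 30
```

After warm-up the counter mispredicts only the loop exit, once per run.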

  39. History bits prediction

  40. Branch history table

  41. Local branch prediction • Requires a separate history buffer for each branch instruction, although the history (pattern) table can be shared by all instructions • The Pentium MMX, Pentium 2 and Pentium 3 processors have local prediction circuits with 4 history bits and 16 entries for each branch instruction • The efficiency of local prediction is estimated at 97%
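A local predictor can be sketched as a two-level scheme: each branch keeps its own history bits (4 here, as on the slide), and that history selects a 2-bit counter in a shared pattern table. This is a minimal illustration under those assumptions, not the actual Pentium circuit; the class name and the test pattern are ours:

```python
class LocalPredictor:
    """Hypothetical two-level local predictor: a private history register per
    branch address and one shared table of 2-bit counters indexed by that history."""

    def __init__(self, history_bits: int = 4) -> None:
        self.mask = (1 << history_bits) - 1
        self.histories: dict[int, int] = {}              # per-branch history bits
        self.pattern_table = [1] * (1 << history_bits)   # 16 counters for 4 bits

    def predict(self, addr: int) -> bool:
        return self.pattern_table[self.histories.get(addr, 0)] >= 2

    def update(self, addr: int, taken: bool) -> None:
        h = self.histories.get(addr, 0)
        c = self.pattern_table[h]
        self.pattern_table[h] = min(3, c + 1) if taken else max(0, c - 1)
        self.histories[addr] = ((h << 1) | int(taken)) & self.mask  # shift in outcome


pattern = [True, True, True, False] * 25   # branch taken 3 times, then not taken
p = LocalPredictor()
correct = 0
for taken in pattern:
    correct += p.predict(0x80) == taken
    p.update(0x80, taken)
print(correct, "of", len(pattern))  # 94 of 100
```

A single 2-bit counter would mispredict every fourth outcome of this pattern; with 4 history bits the repeating pattern is learned after a short warm-up.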

  42. Global branch prediction • A common history of all branch instructions is stored, which makes it possible to take into account dependencies between different branch instructions • On its own, it is rarely better than local prediction • Hybrid solutions: a shared global prediction unit combined with the history table (AMD processors, Pentium M, Core, Core 2)
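The advantage of a common history appears when one branch depends on another. A minimal sketch assuming a hypothetical predictor whose 2-bit counters are indexed by low address bits concatenated with a shared global history register; this indexing is only one of several possible variants, chosen because it is easy to reason about. Branch A is random, and branch B tests the same condition:

```python
import random


class GlobalPredictor:
    """Hypothetical global-history predictor: one history register shared by all
    branches; 2-bit counters indexed by (low address bits, global history)."""

    def __init__(self, history_bits: int = 4, addr_bits: int = 8) -> None:
        self.history_bits = history_bits
        self.addr_mask = (1 << addr_bits) - 1
        self.ghr = 0                                     # global history register
        self.counters = [1] * (1 << (history_bits + addr_bits))

    def _index(self, addr: int) -> int:
        return ((addr & self.addr_mask) << self.history_bits) | self.ghr

    def predict(self, addr: int) -> bool:
        return self.counters[self._index(addr)] >= 2

    def update(self, addr: int, taken: bool) -> None:
        i = self._index(addr)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << self.history_bits) - 1)


def correlated_misses(n: int = 200, seed: int = 1) -> int:
    """Count mispredictions of branch B, which repeats branch A's random outcome."""
    rng = random.Random(seed)
    p, misses_b = GlobalPredictor(), 0
    for _ in range(n):
        a_taken = rng.random() < 0.5
        p.update(0x10, a_taken)               # branch A (its prediction is not scored)
        b_taken = a_taken                     # B depends on the same condition as A
        misses_b += p.predict(0x20) != b_taken
        p.update(0x20, b_taken)
    return misses_b


print(correlated_misses())  # small: B is mispredicted only while the table warms up
```

Because A's outcome is the newest bit of the global history when B is predicted, B is mispredicted at most once per history value, whereas a predictor seeing only B's own history would be right only about half the time.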

  43. Branch prediction unit • A processor circuit responsible for predicting disruptions of sequential code execution • Often combined with the micro-operation cache • In the Pentium 4 the branch prediction buffer has 4096 entries, in the Pentium 3 only 512; accordingly, the former has a hit ratio about 33 percent better than the latter
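Why buffer size matters can be shown with a direct-mapped branch target buffer. This sketch is illustrative only: the real Pentium 3 and Pentium 4 buffers are organized differently, and the branch addresses, the 4-byte alignment and the workload of 600 branches are assumptions:

```python
from typing import Optional


class BranchTargetBuffer:
    """Hypothetical direct-mapped BTB: remembers the last target of each branch."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.tags: list[Optional[int]] = [None] * entries
        self.targets = [0] * entries

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries        # assume 4-byte aligned branches

    def lookup(self, pc: int) -> Optional[int]:
        i = self._index(pc)
        return self.targets[i] if self.tags[i] == pc else None

    def update(self, pc: int, target: int) -> None:
        i = self._index(pc)
        self.tags[i], self.targets[i] = pc, target


def hits(entries: int, branches: int = 600, passes: int = 3) -> int:
    btb, count = BranchTargetBuffer(entries), 0
    pcs = [0x1000 + 4 * k for k in range(branches)]
    for _ in range(passes):
        for pc in pcs:
            count += btb.lookup(pc) is not None
            btb.update(pc, pc + 0x100)         # arbitrary target address
    return count


print(hits(4096), hits(512))  # 1200 848
```

With 600 active branches, the 4096-entry buffer hits on every branch after the first pass, while in the 512-entry buffer 88 pairs of branches keep evicting each other.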

  44. Location of the Branch Prediction Unit
