Datorarkitektur 1 & Datorsystem 1 – Lecture 10, Wednesday 14 November 2007

Presentation Transcript


  1. Photo: Rona Proudfoot (some rights reserved). Datorarkitektur 1 & Datorsystem 1 – Lecture 10, Wednesday 14 November 2007

  2. 1977

  3. 1982

  4. 1982

  5. 1982

  6. 1986

  7. 1986

  8. 1991: 42 MB disk, 1.4 MB floppy, 1 MB RAM

  9. 1996

  10. 2002

  11. 2007

  12. What determines whether a program runs fast or slowly?

  13. • How large the program is, i.e. the number of lines of code (LOC)... compiler → number of instructions...
      • How often the processor can perform a task → clock cycle time... Depends on the hardware!
      • Clock cycles per instruction (CPI)...

  14. Clock cycles
      • Instead of reporting execution time in seconds, we often use cycles.
      • Clock "ticks" indicate when to start activities (one abstraction).
      • clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
      • cycle time = seconds per cycle
      • How do we improve performance, i.e. reduce sec/prog?
      • How long is the cycle time for a 4 GHz processor?
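
The question about the 4 GHz processor can be worked out directly from the definition cycle time = 1 / clock rate. A minimal sketch follows; the function name `cycle_time_seconds` is chosen here for illustration and does not come from the slides.

```python
def cycle_time_seconds(clock_rate_hz: float) -> float:
    """Cycle time (seconds per cycle) is the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz


clock_rate_hz = 4e9  # 4 GHz, the processor asked about on the slide
cycle_time_ns = cycle_time_seconds(clock_rate_hz) * 1e9  # seconds -> nanoseconds
print(f"{cycle_time_ns:.2f} ns per cycle")  # prints "0.25 ns per cycle"
```

A 4 GHz clock therefore ticks every 0.25 ns, i.e. 250 ps.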

  15. So, to improve performance (everything else being equal) you can either (increase or decrease?) ________ the # of required cycles for a program, or ________ the clock cycle time or, said another way, ________ the clock rate.
      Answers: decrease, decrease, increase.

  16. CPU time = Instruction_count x CPI x clock_cycle_time
      or
      CPU time = (Instruction_count x CPI) / clock_rate
      where CPI = clock cycles per instruction, clock_cycle_time = 1/clock_rate and clock_rate = 1/clock_cycle_time.
      These equations separate the three key factors that affect performance:
      • Can measure the CPU execution time by running the program
      • The clock rate is usually given in the documentation
      • Can measure instruction count by using profilers/simulators without knowing all of the implementation details
      • CPI varies by instruction type and ISA implementation, for which we must know the implementation details…
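
As a small worked example of the CPU time equation, the sketch below plugs in numbers. The instruction count, CPI and clock rate are made-up values for illustration, not measurements from the lecture.

```python
def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1e9 instructions, average CPI of 2.0, 4 GHz clock.
t = cpu_time(1_000_000_000, 2.0, 4e9)
print(f"CPU time: {t:.2f} s")  # prints "CPU time: 0.50 s"

# Halving any one factor (fewer instructions, lower CPI, shorter cycle time)
# halves the CPU time when the other two stay the same.
t_lower_cpi = cpu_time(1_000_000_000, 1.0, 4e9)
```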

  17. How many cycles are required for a program?
      • Could assume that the number of cycles equals the number of instructions. Is this assumption correct?
      (Figure: 1st, 2nd, 3rd, 4th, 5th, 6th, ... instruction laid out one per clock cycle along a time axis.)

  18. Different numbers of cycles for different instructions
      • Multiplication takes more time than addition
      • Floating point operations take longer than integer ones
      • Accessing memory takes more time than accessing registers
      Changing the cycle time often changes the number of cycles required for various instructions…
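
When instruction classes need different numbers of cycles, the CPI of a whole program is the average over its instruction mix, weighted by how often each class occurs. The sketch below illustrates this; the class names, fractions and cycle counts are invented for the example.

```python
def average_cpi(mix: dict[str, tuple[float, int]]) -> float:
    """mix maps an instruction class to (fraction of instructions, cycles each)."""
    total_fraction = sum(fraction for fraction, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix.values())


# Hypothetical mix: half the instructions are cheap ALU operations,
# the rest are slower memory accesses and multiplications.
mix = {
    "alu": (0.5, 1),
    "load_store": (0.3, 2),
    "multiply": (0.2, 4),
}
cpi = average_cpi(mix)
print(f"average CPI = {cpi:.2f}")  # prints "average CPI = 1.90"
```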

  19. Our implementation of the MIPS is simplified
      • Memory-reference instructions: lw, sw
      • Arithmetic-logical instructions: add, sub, and, or, slt
      • Control flow instructions: beq, j
      Generic implementation
      • 1 Use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC: PC = PC+4)
      • 2 Decode the instruction (and read registers)
      • 3 Execute the instruction (possibly write registers)
      (Figure: loop Fetch → Decode → Exec.)

  20. How long does it take to reach a stable state? When can signals be read and written?
      An edge-triggered methodology, within one clock cycle:
      • 1 read contents of state elements
      • 2 send values through combinational logic
      • 3 write results to one or more state elements
      Aha! A state element can be read and written in the same clock cycle!
      (Figure: state element (1) → combinational logic (2) → state element (3), both state elements driven by the clock.)

  21. We start building a datapath
      To fetch an instruction from memory, we simply read at the location given by the PC. To fetch the next instruction, we advance the PC by 32 bits, i.e. four bytes.
      (Figure: PC → Read Address of the Instruction Memory → 32-bit Instruction; an adder computes PC + 4.)
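
The fetch step can be sketched in a few lines. The dictionary-based instruction memory, the addresses and the two encoded instructions (add $t0, $t1, $t2 and lw $t0, 4($t1)) are assumptions made for this illustration, not part of the slides.

```python
WORD_BYTES = 4  # one MIPS instruction is 32 bits, i.e. four bytes


def fetch(instruction_memory: dict[int, int], pc: int) -> tuple[int, int]:
    """Read the instruction the PC points at and return it with the next PC."""
    instruction = instruction_memory[pc]
    return instruction, pc + WORD_BYTES


# Hypothetical instruction memory: byte address -> 32-bit instruction word.
instruction_memory = {
    0x00400000: 0x012A4020,  # add $t0, $t1, $t2
    0x00400004: 0x8D280004,  # lw  $t0, 4($t1)
}

pc = 0x00400000
first, pc = fetch(instruction_memory, pc)
second, pc = fetch(instruction_memory, pc)
print(hex(first), hex(second), hex(pc))
```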

  22. Design principles...
      • Simplicity favors regularity
      • Smaller is faster
      • Make the common case fast
      • Good design demands good compromises

  23. R-type (32 bits): op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
      I-type (32 bits): op (6 bits) | rs (5 bits) | rt (5 bits) | address/constant (16 bits)

  24. DRAM Register? SRAM

  25. A simple datapath
      Putting fetch together... with R-type (add, sub, and, or, slt...) and I-type lw/sw.
      (Figure: PC, Instruction Memory, an adder for PC + 4, Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), Sign Extend (16 to 32 bits), ALU with ALU control and zero/ovf outputs, and Data Memory (Address, Write Data, Read Data). Control signals: RegWrite, ALUSrc, MemWrite, MemRead, MemtoReg.)

  26. How do branch instructions work...
      beq $t1, $t2, my_label
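
One way to picture what beq does is a sketch of the next-PC computation, assuming the standard MIPS rule that the 16-bit constant is a signed word offset relative to PC + 4 (sign-extend, shift left 2, add). The function names are chosen for this example.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, constant: int) -> int:
    """Target address = (PC + 4) + (sign-extended constant shifted left by 2)."""
    return (pc + 4 + (sign_extend_16(constant) << 2)) & 0xFFFFFFFF


def next_pc_beq(pc: int, rs_value: int, rt_value: int, constant: int) -> int:
    """beq takes the branch only when the two register values are equal."""
    if rs_value == rt_value:
        return branch_target(pc, constant)
    return pc + 4


taken = next_pc_beq(0x00400000, 7, 7, 3)      # equal registers: branch 3 words ahead
not_taken = next_pc_beq(0x00400000, 7, 8, 3)  # different registers: fall through
print(hex(taken), hex(not_taken))  # prints "0x400010 0x400004"
```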

  27. R-type (32 bits): op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
      I-type (32 bits): op (6 bits) | rs (5 bits) | rt (5 bits) | address/constant (16 bits)
      J-type (32 bits): op (6 bits) | target address (26 bits)
      • op: the opcode is always here
      • rs: the operand-1 register is always here, even for lw/sw (base register)
      • rt: the operand-2 register is always here for add, sub, and, slt, etc., but also for sw (value to store); for lw, the result is written to this register
      • rd: the result is written to this register for add, sub, and, slt, etc.

  28. Single cycle design: fetch, decode and execute each instruction in one clock cycle
      • No datapath resource can be used more than once per instruction, so some must be duplicated (e.g., separate Instruction Memory and Data Memory, several adders)
      • Multiplexors are needed at the input of shared elements, with control lines to do the selection
      • Write signals control writing to the Register File and Data Memory
      • Cycle time is determined by the length of the longest path

  29. Single Cycle Datapath with Control Unit
      (Figure: PC, Instruction Memory, Register File, ALU, Data Memory, Sign Extend (16 to 32 bits), Shift left 2, two adders and multiplexors. The Control Unit decodes Instr[31-26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; PCSrc selects the next PC. Instr[25-21] and Instr[20-16] supply the register read addresses, Instr[20-16] or Instr[15-11] the write address, Instr[15-0] goes to Sign Extend and Instr[5-0] to ALU control. The ALU outputs zero and ovf.)

  30. ALU control
      This is a subset of all the possible operations.
      (Figure: 32-bit ALU with a 4-bit ALU control input and zero/ovf outputs.)
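
The slide's table of operations did not survive extraction. As a stand-in, the sketch below uses the 4-bit encodings from the common textbook version of this datapath (AND 0000, OR 0001, add 0010, subtract 0110, set-on-less-than 0111) and the usual 2-bit ALUOp convention; both are assumptions here, not values taken from the slide.

```python
# Assumed 4-bit ALU control encodings (common textbook convention).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# MIPS funct field (Instr[5-0]) of the R-type instructions in this subset.
FUNCT_TO_ALU = {
    0x20: ALU_ADD,  # add
    0x22: ALU_SUB,  # sub
    0x24: ALU_AND,  # and
    0x25: ALU_OR,   # or
    0x2A: ALU_SLT,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Combine the 2-bit ALUOp from the control unit with the funct field."""
    if alu_op == 0b00:  # lw/sw: add base register and offset
        return ALU_ADD
    if alu_op == 0b01:  # beq: subtract and test the zero output
        return ALU_SUB
    if alu_op == 0b10:  # R-type: the funct field selects the operation
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp {alu_op:#04b}")


print(f"{alu_control(0b10, 0x2A):04b}")  # slt -> prints "0111"
```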

  31. Truth table → Karnaugh map → Hardware

  32. R-type Instruction Data/Control Flow
      (Figure: the single-cycle datapath with control unit, showing the data and control flow of an R-type instruction.)

  33. Load Word Instruction Data/Control Flow
      (Figure: the single-cycle datapath with control unit, showing the data and control flow of a load word instruction.)

  34. NOTE: this is a single-cycle implementation
      Clock cycle time must be long enough for the longest possible path.
      A good candidate for the longest path? Load Word
      • Uses five functional units:
        • Instruction memory
        • Register file
        • ALU
        • Data memory
        • Register file
      • An R-type instruction such as add only uses four functional units:
        • Instruction memory
        • Register file
        • ALU
        • Register file
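
To see why load word sets the clock cycle time, the sketch below adds up the delays along each instruction's path. The latencies in picoseconds are hypothetical values picked for illustration; the slides give no timing figures.

```python
# Hypothetical functional-unit latencies in picoseconds.
LATENCY_PS = {
    "instruction_memory": 200,
    "register_read": 100,
    "alu": 200,
    "data_memory": 200,
    "register_write": 100,
}

# Functional units each instruction passes through, in order.
PATHS = {
    "lw": ["instruction_memory", "register_read", "alu", "data_memory", "register_write"],
    "add": ["instruction_memory", "register_read", "alu", "register_write"],
}


def path_delay_ps(units: list[str]) -> int:
    """Total delay of one instruction: the sum of the units on its path."""
    return sum(LATENCY_PS[unit] for unit in units)


delays = {name: path_delay_ps(units) for name, units in PATHS.items()}
cycle_time_ps = max(delays.values())  # the clock must fit the slowest instruction
print(delays, cycle_time_ps)  # prints "{'lw': 800, 'add': 600} 800"
```

With these numbers every add would leave 200 ps of each cycle unused.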

  35. Single Cycle Disadvantages & Advantages
      • Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
        • especially problematic for more complex instructions like floating point multiply
      • May waste area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
      but
      • Is simple and easy to understand
      (Figure: clock with Cycle 1 and Cycle 2; lw fills Cycle 1, add finishes early in Cycle 2 and the remainder is marked "Waste".)

  36. Multicycle Datapath Approach
      • Let an instruction take more than 1 clock cycle to complete
        • Break up instructions into steps where each step takes a cycle while trying to
          • balance the amount of work to be done in each step
          • restrict each cycle to use only one major functional unit
        • Not every instruction takes the same number of clock cycles
      • In addition to faster clock rates, multicycle allows functional units to be used more than once per instruction as long as they are used on different clock cycles; as a result
        • only need one memory, but only one memory access per cycle
        • need only one ALU/adder, but only one ALU operation per cycle

  37. Multicycle Datapath Approach, cont.
      • At the end of a cycle
        • Store values needed in a later cycle by the current instruction in an internal register (not visible to the programmer). All (except IR) hold data only between a pair of adjacent clock cycles (no write control signal needed)
          • IR: Instruction Register
          • MDR: Memory Data Register
          • A, B: regfile read data registers
          • ALUout: ALU output register
        • Data used by subsequent instructions are stored in programmer-visible registers (i.e., register file, PC, or memory)
      (Figure: PC, a single Memory (Address, Write Data, Read Data for instructions or data), IR, MDR, Register File, registers A and B, ALU and ALUout.)

  38. The Multicycle Datapath with Control Signals
      (Figure: PC, a single Memory, IR, MDR, Register File, registers A and B, ALU with ALU control (Instr[5-0]) and zero output, ALUout, Sign Extend on Instr[15-0], two Shift left 2 units, and the jump address formed from PC[31-28] and Instr[25-0] shifted left 2 (28 bits). The Control unit decodes Instr[31-26] and drives PCWriteCond, PCWrite, PCSource, IorD, ALUOp, MemRead, MemWrite, ALUSrcA, ALUSrcB, MemtoReg, RegWrite, IRWrite and RegDst.)

  39. The Five Steps of the Load Instruction
      lw: IFetch (Cycle 1) → Dec (Cycle 2) → Exec (Cycle 3) → Mem (Cycle 4) → WB (Cycle 5)
      • IFetch: Instruction Fetch and Update PC
      • Dec: Instruction Decode, Register Read, Sign Extend Offset
      • Exec: Execute R-type; Calculate Memory Address; Branch Comparison; Branch and Jump Completion
      • Mem: Memory Read; Memory Write Completion; R-type Completion (RegFile write)
      • WB: Memory Read Completion (RegFile write)
      INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!
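
The step list above implies how many cycles each instruction class needs: branches and jumps complete in Exec, sw and R-type complete in Mem, and lw completes in WB. The sketch below counts cycles for a short instruction sequence; the sequence itself is made up for illustration.

```python
# Steps each instruction class goes through, derived from where it completes.
STEPS = {
    "lw": ["IFetch", "Dec", "Exec", "Mem", "WB"],
    "sw": ["IFetch", "Dec", "Exec", "Mem"],
    "R-type": ["IFetch", "Dec", "Exec", "Mem"],
    "beq": ["IFetch", "Dec", "Exec"],
    "j": ["IFetch", "Dec", "Exec"],
}


def cycles_for(instruction_class: str) -> int:
    """On the multicycle datapath every step takes exactly one clock cycle."""
    return len(STEPS[instruction_class])


# Hypothetical instruction sequence: a load, an add and a branch.
program = ["lw", "R-type", "beq"]
total_cycles = sum(cycles_for(name) for name in program)
print(total_cycles)  # prints "12"
```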

  40. Hungry!
