Lec 8: Pipelining

Lec 8: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University

Basic Pipelining Five stage “RISC” load-store architecture • Instruction fetch (IF) • get instruction from memory • Instruction Decode (ID) • translate opcode into control signals and read regs • Execute (EX) • perform ALU operation • Memory (MEM) • Access memory if load/store • Writeback (WB) • update register file Following slides thanks to Sally McKee

Pipelined Implementation • Break the execution of the instruction into cycles (five, in this case) • Design a separate stage for the execution performed during each cycle • Build pipeline registers (latches) to communicate between the stages Slides thanks to Sally McKee

Stage1: Fetch and Decode • Design a datapath that can fetch an instruction from memory every cycle • Use PC to index memory to read instruction • Increment the PC (assume no branches for now) • Write everything needed to complete execution to the pipeline register (IF/ID) • The next stage will read this pipeline register • Note that pipeline register must be edge triggered Slides thanks to Sally McKee

M U X 1 PC + 1 incr Instruction bits IF / ID Pipeline register Rest of pipelined datapath Instruction Memory/ Cache PC en en Slides thanks to Sally McKee

Stage2: Decode • Reads the IF/ID pipeline register, decodes instruction, and reads register file (specified by regA and regB of instruction bits) • Decode can be easy, just pass on the opcode and let later stages figure out their own control signals for the instruction • Write everything needed to complete execution to the pipeline register (ID/EX) • Pass on the offset field and destination register specifiers (or simply pass on the whole instruction!) • Pass on PC+1 even though decode didn’t use it

PC + 1 PC + 1 Contents Of regA Destreg Contents Of regB Data Instruction bits Control Signals ID / EX Pipeline register IF / ID Pipeline register regA Register File regB Rest of pipelined datapath Stage 1: Inst Fetch datapath en Slides thanks to Sally McKee

Stage3: Execute • Design a datapath that performs the proper ALU operation for the instruction specified and values present in the ID/EX pipeline register • The inputs are the contents of regA and either the contents of regB or the offset field in the instruction • Also, calculate PC+1+offset, in case this is a branch • Write everything needed to complete execution to the pipeline register (EX/Mem) • ALU result, contents of regB and PC+1+offset • Instruction bits for opcode and destReg specifiers

Alu Result Contents Of regA Rest of pipelined datapath Contents Of regB M U X contents of regB A L U Control Signals ID / EX Pipeline register EX/Mem Pipeline register PC + 1 + offset Magic PC + 1 Stage 2: Decode datapath Control Signals Slides thanks to Sally McKee

Stage4: Memory Operation • Design a datapath that performs the proper memory operation for the instruction specified and values present in the EX/Mem pipeline register • ALU result contains address for ld and st instructions • Opcode bits control memory R/W and enable signals • Write everything needed to complete execution to the pipeline register (Mem/WB) • ALU result and MemData • Instruction bits for opcode and destReg specifiers

This goes back to the MUX before the PC in stage 1 MUX control for PC input Alu Result Alu Result Rest of pipelined datapath Control Signals Mem/WB Pipeline register EX/Mem Pipeline register PC+1 +offset Data Memory Stage 3: Execute datapath contents of regB Memory Read Data en R/W Control Signals Slides thanks to Sally McKee

Stage5: Write Back • Design a datapath that completes the execution of this instruction, writing to the register file if required • Write MemData to destReg for ld instruction • Write ALU result to destReg for arithmetic/logic instructions • Opcode bits also control register write enable signal Slides thanks to Sally McKee

This goes back to data input of register file M U X bits 0-2 This goes back to the destination register specifier M U X bits 15-17 register write enable Alu Result Memory Read Data Stage 4: Memory datapath Control Signals Mem/WB Pipeline register Slides thanks to Sally McKee

Sample Code (Simple) • Assume eight-register machine • Run the following code on a pipelined datapath add 3 1 2 ; reg 3 = reg 1 + reg 2 nand 6 4 5 ; reg 6 = ~(reg 4 & reg 5) lw 4 20 (2) ; reg 4 = Mem[reg2+20] add 5 2 5 ; reg 5 = reg 2 + reg 5 sw 7 12(3) ; Mem[reg3+12] = reg 7 Slides thanks to Sally McKee

+ A L U M U X 1 target PC+1 PC+1 0 R0 R1 regA ALU result R2 Register file regB valA M U X PC Inst mem Data mem instruction R3 ALU result mdata R4 valB R5 R6 M U X data R7 offset dest valB Bits 0-2 dest dest dest Bits 15-17 M U X Bits 21-23 op op op IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

+ A L U M U X 1 0 0 0 0 R0 0 36 R1 0 9 R2 Register file 0 M U X PC Inst mem Data mem nop 12 R3 0 0 18 R4 7 0 R5 41 R6 M U X data 22 R7 0 dest 0 Initial State Bits 0-2 0 0 0 Bits 15-17 M U X Bits 21-23 nop nop nop IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

+ A L U add 3 1 2 M U X 1 0 1 0 0 R0 0 36 R1 0 9 R2 Register file 0 M U X PC Inst mem Data mem add 3 1 2 12 R3 0 0 18 R4 7 0 R5 41 R6 M U X data 22 R7 0 dest 0 Fetch: add 3 1 2 Bits 0-2 0 0 0 Bits 15-17 M U X Bits 21-23 nop nop nop IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 1 Slides thanks to Sally McKee

+ A L U nand 6 4 5 add 3 1 2 M U X 1 0 2 1 0 R0 0 36 R1 1 0 9 R2 Register file 2 36 M U X PC Inst mem Data mem nand 6 4 5 12 R3 0 0 18 R4 7 9 R5 41 R6 M U X data 22 R7 3 dest 0 Fetch: nand 6 4 5 Bits 0-2 3 0 0 Bits 15-17 M U X Bits 21-23 add nop nop IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 2 Slides thanks to Sally McKee

+ A L U lw 4 20(2) nand 6 4 5 add 3 1 2 M U X 1 4 3 2 0 R0 0 36 R1 4 0 36 9 R2 Register file 5 18 M U X PC Inst mem Data mem lw 4 20(2) 12 R3 45 0 18 R4 9 7 7 R5 41 R6 M U X data 22 R7 6 dest 9 Fetch: lw 4 20(2) Bits 0-2 3 6 3 0 Bits 15-17 M U X Bits 21-23 nand add nop IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 3 Slides thanks to Sally McKee

+ A L U add 5 2 5 lw 4 20(2) nand 6 4 5 add 3 1 2 M U X 1 8 4 3 0 R0 0 36 R1 2 45 18 9 R2 Register file 4 9 M U X PC Inst mem Data mem add 5 2 5 12 R3 -3 0 18 R4 45 7 7 18 R5 41 R6 M U X data 22 R7 20 dest 7 Fetch: add 5 2 5 Bits 0-2 3 6 4 6 3 Bits 15-17 M U X Bits 21-23 lw nand add IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 4 Slides thanks to Sally McKee

+ A L U sw 7 12(3) add 5 2 5 lw 4 20 (2) nand 4 5 6 add M U X 1 23 5 4 0 R0 0 45 36 R1 2 -3 9 9 R2 Register file 5 9 M U X PC Inst mem Data mem sw 7 12(3) 45 R3 29 0 18 R4 -3 7 7 R5 41 R6 M U X data 22 R7 20 5 dest 18 Fetch: sw 7 12(3) Bits 0-2 6 3 4 5 4 6 Bits 15-17 M U X Bits 21-23 add lw nand IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 5 Slides thanks to Sally McKee

+ A L U sw 7 12(3) add 5 2 5 lw 4 20(2) nand M U X 1 9 5 0 R0 0 -3 36 R1 3 29 9 9 R2 Register file 7 45 M U X PC Inst mem Data mem 45 R3 16 99 18 R4 29 7 7 22 R5 -3 R6 M U X data 22 R7 12 dest 7 No more instructions Bits 0-2 4 6 5 7 5 4 Bits 15-17 M U X Bits 21-23 sw add lw IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 6 Slides thanks to Sally McKee

+ A L U sw 7 12(3) add 5 2 5 lw M U X 1 15 0 R0 0 36 R1 16 45 9 R2 Register file M U X PC Inst mem Data mem 45 R3 99 57 0 99 R4 16 7 R5 -3 R6 M U X data 22 R7 12 dest 22 No more instructions Bits 0-2 5 4 7 7 5 Bits 15-17 M U X Bits 21-23 sw add IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 7 Slides thanks to Sally McKee

+ A L U sw 7 12(3) add M U X 1 0 R0 16 36 R1 57 9 R2 Register file M U X PC Inst mem Data mem 45 R3 0 99 22 R4 57 16 R5 -3 R6 M U X data 22 R7 dest 22 No more instructions Bits 0-2 5 7 Bits 15-17 M U X Bits 21-23 sw IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 8 Slides thanks to Sally McKee

+ A L U sw M U X 1 0 R0 36 R1 9 R2 Register file M U X PC Inst mem Data mem 45 R3 99 R4 16 R5 -3 R6 M U X data 22 R7 dest No more instructions Bits 0-2 Bits 15-17 M U X Bits 21-23 IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 9 Slides thanks to Sally McKee

Time Graphs Time: 1 2 3 4 5 6 7 8 9 add nand lw add sw fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback Slides thanks to Sally McKee

Pipelining Recap • Powerful technique for masking latencies • Logically, instructions execute one at a time • Physically, instructions execute in parallel • Instruction level parallelism • Decouples the processor model from the implementation • Interface vs. implementation • BUT dependencies between instructions complicate the implementation

Lec 8: Pipelining

Lec 8: Pipelining

Presentation Transcript

Pipelining

A Modular Data Pipelining Architecture (MDPA) for enabling Universal Accessibility in P2P Grids

Performing impossible feats of XML processing with pipelining

Introduction to Advanced Pipelining

Pipelining and Vector Processing

G é rard: The Hard Man behind the Soft Core

Chapter 5 Overview

A Tutorial on High Performance Computing Taxonomy

Instruction Set Architecture & Pipelining

Pipelining: Basic and Intermediate Concepts

Chap.6: Enhancing Performance with Pipelining

March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

September 17 th , 2003 Prof. John Kubiatowicz

UNIT-III PIPELINING AND I/O ORGANISATION

CS4100: 計算機結構 Pipelining

Chapter 8

16.482 / 16.561 Computer Architecture and Design

Chapter 4

The Pipelined CPU

Lec 8: Pipelining

Lec 8: Pipelining

Presentation Transcript

Pipelining

A Modular Data Pipelining Architecture (MDPA) for enabling Universal Accessibility in P2P Grids

Performing impossible feats of XML processing with pipelining

Introduction to Advanced Pipelining

Pipelining and Vector Processing

G é rard: The Hard Man behind the Soft Core

Chapter 5 Overview

A Tutorial on High Performance Computing Taxonomy

Instruction Set Architecture &amp; Pipelining

Pipelining: Basic and Intermediate Concepts

Chap.6: Enhancing Performance with Pipelining

March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

September 17 th , 2003 Prof. John Kubiatowicz

UNIT-III PIPELINING AND I/O ORGANISATION

CS4100: 計算機結構 Pipelining

Chapter 8

16.482 / 16.561 Computer Architecture and Design

Chapter 4

The Pipelined CPU

Instruction Set Architecture & Pipelining