CS 230: Computer Organization and Assembly Language

CS 230: Computer Organization and Assembly Language Aviral Shrivastava Department of Computer Science and Engineering School of Computing and Informatics Arizona State University Slides courtesy: Prof. Yann Hang Lee, ASU, Prof. Mary Jane Irwin, PSU, Ande Carle, UCB

Announcements • Alternate Project • Submit Nov 24 • Quiz 5 • Thursday, Nov 19, 2009 • Pipelining • Finals • Tuesday, Dec 08, 2009 • Please come on time (You’ll need all the time) • Open book, notes, and internet • No communication with any other human

Benefits of Pipelining Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Dec/Reg • Exec Mem • Wr Ifetch Dec/Reg • Exec Mem • Wr Ifetch Dec/Reg • Exec Mem • Wr pipelined Ifetch Dec/Reg • Exec Mem • Wr • Pipeline latches: pass the status and result of the current instruction to next stage • Comparison: Clock Single- cycle inst. Dec/Reg • Exec Ifetch Mem Ifetch sw lw

Branch Hazards • So far, we’ve limited discussion of hazards to: • Arithmetic/logic operations • Data transfers • Also need to consider hazards involving branches: • Example: • 40: beq $1, $3, 28 • 44: and $12, $2, $5 • 48: or $13, $6, $2 • 52: add $14, $2, $2 • 72: lw $4, 50($7) • How long will it take before the branch decision takes effect? • What happens in the meantime?

Branch signal determined in MEM stage PCSrc ID/EX M u WB EX/MEM x M MEM/WB Control WB ALUOp EX M WB IF/ID e d ALUSrc t h a i r g c e W n e R a R m m r o e B e e t t Add i M m r M W e M g Add e Shift R left 2 Read t s reg 1 D Read PC g e address Read R Read data 1 Read reg 2 Zero addr Read Write Read M data Instruction reg data 2 u ALU Write M Memory x u addr Write x data Write Data data Memory Inst[15-0] Sign ALU extend control Inst[20-16] M u Inst[15-11] x Registers

Pipeline impact on branch • If branch condition true, must skip 44, 48, 52 • But, these have already started down the pipeline • They will complete unless we do something about it • How do we deal with this? • We’ll consider 2 possibilities

Dealing w/branch hazards: always stall • Branch taken • Wait 3 cycles • No proper instructions in the pipeline • Same delay as without stalls (no time lost)

Dealing w/branch hazards: always stall • Branch not taken • Still must wait 3 cycles • Time lost • Could have spent cycles fetching and decoding next instructions

Assume branch not taken • On average, branches are taken ½ the time • If branch not taken… • Continue normal processing • Else, if branch is taken… • Need to flush improper instruction from pipeline • Cuts overall time for branch processing in ½

Flushing unwanted instructions from pipeline • Useful to compare w/stalling pipeline: • Simple stall: inject bubble into pipe at ID stage only • Change control to 0 in the ID stage • Let “bubbles” percolate to the right • Flushing pipe: must change inst. In IF, ID, and EX • IF Stage: • Zero instruction field of IF/ID pipeline register • Use new control signal IF.Flush • ID Stage: • Use existing “bubble injection” mux that zeros control for stalls • Signal ID.Flush is ORed w/stall signal from hazard detection unit • EX Stage: • Add new muxes to zero EX pipeline register control lines • Both muxes controlled by single EX.Flush signal • Control determines when to flush: • Depends on Opcode and value of branch condition

Flushing Pipeline IF.Flush EX.Flush Flush Pipeline Hazard ID.Flush Detection Unit ID/EX 0 M WB EX/MEM u M x M MEM/WB u Control WB x 0 EX M M WB IF/ID u x 0 PC Branch Decision

Assume “branch not taken”…and branch is not taken… • Execution proceeds normally – no penalty

Assume “branch not taken”…and branch is taken… • Bubbles injected into 3 stages during cycle 5

Reservation Table Picture Assume Branch Not Taken and Correct 40: beq $1, $3, 72 44: and $12, $2, $5 48: or $13, $6, $2 52: add $14, $2, $2 … 72: lw $4, 50($7) • Another way of looking at it… No penalty 3 cycle penalty Assume Branch Not Taken and NOT Correct (FYI, branch Freq ~ 20%; & 3 cycle penalty 50% of time)

Branch Penalty Impact • Assume 16% of all instructions are branches • 4% unconditional branches: 3 cycle penalty • 12% conditional: 50% taken • For a sequence of N instructions (assume N is large) • N cycles to initiate each • 3 * 0.04 * N delays due to unconditional branches • 0.5 * 3 * 0.12 * N delays due to conditional taken • Also, an extra 4 cycles for pipeline to empty • Total: • 1.3*N + 4 total cycles (or 1.3 cycles/instruction) (CPI) • 30% Performance Hit!!! (Bad thing)

Branch Penalty Impact • Some solutions: • In ISA: branches always execute next 1 or 2 instructions • Instruction so executed said to be in delay slot • See SPARC ISA • (example – loop counter update) • In organization: move comparator to ID stage and decide in the ID stage • Reduces branch delay by 2 cycles • Increases the cycle time

Branch Prediction • Prior solutions are “ugly” • Better (& more common): guess in IF stage • Technique is called “branch predicting”; needs 2 parts: • “Predictor” to guess where/if instruction will branch (and to where) • “Recovery Mechanism”: i.e. a way to fix your mistake • Prior strategy: • Predictor: always guess branch never taken • Recovery: flush instructions if branch taken • Alternative: accumulate info. in IF stage as to… • Whether or not for any particular PC value a branch was taken next • To where it is taken • How to update with information from later stages

A Branch Predictor

Branch History Table

Branch Prediction Information • One bit predictor: • Use result from last time we saw this instruction • Problem: • Even if branch is almost always taken, we will be wrong at least twice • 1st time we the instruction • 1st time the branch is not taken • Also, 1st time branch is taken again after than • And if branch alternates b/t taken, not taken… • We get 0% accuracy • Can we do better? Yep.

Branch Prediction Information • How to do better? • Keep a “counter” in each entry of the number of times taken in the last N times executed • Keep information about the “pattern” of previous branches • Book’s scheme: a “2-bit saturating counter” • Increment when branch is taken • Decrement when branch is not taken • Don’t increment or decrement above or below a max/min count • Use sign of count as predictor

Book’s 2 Bit Branch Counter

Computing Performance • Program assumptions: • 23% loads and in ½ of cases, next instruction uses load value • 13% stores • 19% conditional branches • 2% unconditional branches • 43% other • Machine Assumptions: • 5 stage pipe with all forwarding • Only penalty is 1 cycle on use of load value immediately after a load) • Jumps are totally resolved in ID stage for a 1 cycle branch penalty • 75% branch prediction accuracy • 1 cycle delay on misprediction

The Answer: • CPI penalty calculation: • Loads: • 50% of the 23% of loads have 1 cycle penalty: .5*.23=0.115 • Jumps: • All of the 2% of jumps have 1 cycle penalty: 0.02*1 = 0.02 • Conditional Branches: • 25% of the 19% are mispredicted for a 1 cycle penalty: 0.25*0.19*1 = 0.0475 • Total Penalty: 0.115 + 0.02 + 0.0475 = 0.1825 • Average CPI: 1 + 0.1825 = 1.1825

Yoda says… Death is a natural part of life. Rejoice for those around you who transform into the Force. Mourn them do not. Miss them do not

CS 230: Computer Organization and Assembly Language

CS 230: Computer Organization and Assembly Language

Presentation Transcript

LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing

CSC 317 Computer Organization and Architecture

Assembly Language and Computer Architecture Using C++ and Java

Introduction to Computer Architecture (Section-2)

Solid Edge ST5 Instructor-Led Training Assembly

CSC 211 Data Structures Lecture 4

CSE 5344 Computer Networks

Basic Structure of Computer

EEM 486 : Computer Architecture Lecture 2 MIPS I nstruction Set Architecture

The 8051 Assembly Language

Intel SIMD architecture

OMSE 510: Computing Foundations 3: Caches, Assembly, CPU Overview

55:035 Computer Architecture and Organization

Chapter 6 Assembly Drawings

Assembly Language

INTRODUCTION TO IBM PC ASSEMBLY LANGUAGE

Introduction to 8086 Assembly Language Programming

Computer Organization and Architecture

Computer Organization

Fall 2012

Review of ECE301: Computer Organization