CDA 4150 Notes
Pipelining
February 18, 2004
By: Nicholas Bray & San Chong Cheng
Matrices

A =
    a11  a12  a13  a14
    a21  a22  a23  a24
    a31  a32  a33  a34
    a41  a42  a43  a44

B =
    b11  b12  b13  b14
    b21  b22  b23  b24
    b31  b32  b33  b34
    b41  b42  b43  b44

How do we store A and B in a vector processor so that we can access the rows of A and the columns of B?
Solution:

Each element of C = A x B is

    cij = ai1*b1j + ai2*b2j + ai3*b3j + ai4*b4j

So we need to access aik (row i of A) and bkj (column j of B) in parallel. This can be done by switching the rows of memory with the columns (we can access all of the Mi in parallel):

              A                          B
    M1   M2   M3   M4          M1   M2   M3   M4
    a11  a12  a13  a14         b11  b21  b31  b41
    a21  a22  a23  a24         b12  b22  b32  b42
    a31  a32  a33  a34         b13  b23  b33  b43
    a41  a42  a43  a44         b14  b24  b34  b44

Note: this is not the best solution for computing A x B = C.
    M1   M2   M3   M4
    a11  a12  a13  a14
    a24  a21  a22  a23
    a33  a34  a31  a32
    a42  a43  a44  a41

The matrix can now be accessed in parallel both horizontally and vertically.

(Review cont.)
How should we store the arrays so that we can access the matrix in parallel (rows & cols of A and B)?

    M1   M2   M3   M4   M5
    a11  a12  a13  a14  --
    --   a21  a22  a23  a24
    a34  --   a31  a32  a33
    a43  a44  --   a41  a42

The matrix can now also be accessed diagonally in parallel.
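The two layouts above follow a pattern usually called skewed storage: element aij goes to module ((i - 1) + (j - 1)) mod m, where m is the number of modules. The sketch below is only an illustration of that mapping, using 0-based indices; the function names `module_of`, `conflict_free`, and `report` are hypothetical and not part of the notes. It checks which access patterns hit each module at most once.

```python
def module_of(i, j, num_modules):
    """0-based module index holding a[i][j] (0-based i, j) under skewed storage."""
    return (i + j) % num_modules


def conflict_free(cells, num_modules):
    """True if no two cells in the access pattern map to the same module."""
    modules = [module_of(i, j, num_modules) for i, j in cells]
    return len(set(modules)) == len(modules)


def report(n, num_modules):
    """Check rows, columns, and the main diagonal of an n x n matrix."""
    rows = [[(i, j) for j in range(n)] for i in range(n)]
    cols = [[(i, j) for i in range(n)] for j in range(n)]
    diag = [(k, k) for k in range(n)]
    return {
        "rows": all(conflict_free(r, num_modules) for r in rows),
        "cols": all(conflict_free(c, num_modules) for c in cols),
        "diag": conflict_free(diag, num_modules),
    }


if __name__ == "__main__":
    print(report(4, 4))  # four modules: rows and columns only
    print(report(4, 5))  # five modules: the main diagonal as well
```

With four modules, a11 and a33 both land in M1, which is why the fifth module is needed for diagonal access.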
Pipelining – Load Store
Instruction Fetch → Execute
2 Stages

Fetch:
    MAR ← PC
    MDR ← M[MAR]
    IR ← MDR
    PC ← PC + 1

Exec:
    A ← R1
    B ← R2
    ALU ← A (op) B
    C ← ALU (results)
    R3 ← C

[Figure: block diagram labeled Fetch Unit, Execute, M1, MAR, MDR, IR]
Further Pipelining
Instruction Fetch → ID / OF → Execute
ID / OF: instruction decode / operand fetch
3 Stages

Fetch:
    MAR ← PC
    MDR ← M[MAR]
    IR ← MDR
    PC ← PC + 1

ID / OF:
    A ← R1
    B ← R2

Exec:
    ALUop ← A (op) B
    C ← ALUop
    R3 ← C
4 Stages
Instruction Fetch → ID / OF → Exec. → Write Back
4 Stages in RTN

Fetch:
    MAR ← PC
    MDR ← M[MAR]
    IR ← MDR
    PC ← PC + 1

ID / OF:
    A ← R1
    B ← R2

Exec:
    ALUop ← A (op) B
    C ← ALUop

WB:
    R3 ← C

* There could be a conflict between OF and Exec, or between Exec and WB.
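As a rough illustration, the four-stage RTN can be traced for a single instruction in a few lines of Python. This is a sketch under simplifying assumptions: instruction decoding is skipped, the operands are hard-wired to R1, R2, and R3 as in the notes, and the names `run_one_instruction`, `state`, `memory`, and `regs` are hypothetical.

```python
import operator


def run_one_instruction(state, memory, regs, alu_op):
    """Apply the Fetch, ID/OF, Exec, and WB register transfers in order."""
    # Fetch
    state["MAR"] = state["PC"]
    state["MDR"] = memory[state["MAR"]]
    state["IR"] = state["MDR"]
    state["PC"] = state["PC"] + 1
    # ID / OF (decode is skipped; operands are fixed to R1 and R2)
    state["A"] = regs["R1"]
    state["B"] = regs["R2"]
    # Exec
    state["ALUop"] = alu_op(state["A"], state["B"])
    state["C"] = state["ALUop"]
    # WB
    regs["R3"] = state["C"]
    return state


if __name__ == "__main__":
    state = {"PC": 0}
    memory = ["ADD R3, R1, R2"]          # instruction stored as plain text
    regs = {"R1": 2, "R2": 3, "R3": 0}
    run_one_instruction(state, memory, regs, operator.add)
    print(state["IR"], regs["R3"], state["PC"])
```

The sketch runs the stages one after another for one instruction; it does not model overlap between instructions, which is where the conflicts noted above arise.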
4 Stages Example

V3 = V1 + V2

[Figure: datapath with instruction memory (I.M.), memory (M), MAR, PC, MDR, the stages IF, OF, EXEC, WB, latches A, B, C, and register files R1 to R8 holding V1, V2, V3]

There is a conflict between OF and WB because both stages try to access the register files.
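The OF/WB conflict can be seen by laying out which instruction occupies which stage in each clock cycle. The following is a minimal sketch, assuming one instruction is issued per cycle with no stalls; `schedule` and `register_file_conflicts` are hypothetical helper names.

```python
STAGES = ["IF", "OF", "EXEC", "WB"]


def schedule(num_instructions):
    """Map each clock cycle to {stage: instruction index}, one issue per cycle."""
    table = {}
    for instr in range(num_instructions):
        for offset, stage in enumerate(STAGES):
            table.setdefault(instr + offset, {})[stage] = instr
    return table


def register_file_conflicts(table):
    """Cycles in which OF and WB are both active, so both touch the register files."""
    return [cycle for cycle, busy in sorted(table.items())
            if "OF" in busy and "WB" in busy]


if __name__ == "__main__":
    table = schedule(4)
    for cycle in sorted(table):
        print(cycle, table[cycle])
    print("conflict cycles:", register_file_conflicts(table))
```

With this stage ordering, instruction k writes back in the same cycle in which instruction k + 2 fetches its operands, so the conflict appears as soon as three or more instructions are in flight.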
5 Stages
Inst. Fetch → ID / OF → Exec. → Mem → WB
Resources: Inst. Memory, Data Memory, Register File
Load (5 stages)
LOAD Reg, Addr

    IF:       Inst. fetch
    ID / OF:  Get values from the register file
    Exec:     Might need to apply an offset
    Mem:      Gets value from data mem.
    WB:       Write value back to reg.
Add (5 stages)
ADD R3, R1, R2

    IF:       Inst. fetch
    ID / OF:  V1 = R1, V2 = R2
    Exec:     V3 gets V1 + V2
    Mem:      (no operation)
    WB:       R3 gets V3
Store (5 stages)
STORE Addr, Reg

    IF:       Inst. fetch
    ID / OF:  V1 gets Reg
    Exec:     Calculate effective address
    Mem:      Addr gets V1
    WB:       (no operation)
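The per-stage actions for ADD, LOAD, and STORE can be sketched as one function that walks a single instruction through the five stages. This is an illustration only: the tuple encoding of instructions, the base-register-plus-offset form of Addr, and the name `run` are assumptions, not something the notes specify.

```python
def run(instr, regs, mem):
    """Push one instruction through IF, ID/OF, Exec, Mem, WB; return a 5-line trace."""
    op = instr[0]
    trace = [f"IF: fetch {op}"]
    if op == "ADD":                       # ("ADD", dest, src1, src2)
        _, dest, src1, src2 = instr
        v1, v2 = regs[src1], regs[src2]
        trace.append(f"ID/OF: V1={v1}, V2={v2}")
        v3 = v1 + v2
        trace.append(f"Exec: V3={v3}")
        trace.append("Mem: idle")
        regs[dest] = v3
        trace.append(f"WB: {dest}={v3}")
    elif op == "LOAD":                    # ("LOAD", dest, base, offset), assumed form
        _, dest, base, offset = instr
        base_value = regs[base]
        trace.append(f"ID/OF: base={base_value}")
        addr = base_value + offset        # effective address
        trace.append(f"Exec: addr={addr}")
        value = mem[addr]
        trace.append(f"Mem: read {value}")
        regs[dest] = value
        trace.append(f"WB: {dest}={value}")
    elif op == "STORE":                   # ("STORE", base, offset, src), assumed form
        _, base, offset, src = instr
        v1, base_value = regs[src], regs[base]
        trace.append(f"ID/OF: V1={v1}, base={base_value}")
        addr = base_value + offset        # effective address
        trace.append(f"Exec: addr={addr}")
        mem[addr] = v1
        trace.append(f"Mem: write {v1}")
        trace.append("WB: idle")
    else:
        raise ValueError(f"unknown opcode: {op}")
    return trace


if __name__ == "__main__":
    regs = {"R0": 0, "R1": 5, "R2": 7, "R3": 0}
    mem = {100: 42}
    for instr in [("ADD", "R3", "R1", "R2"),
                  ("LOAD", "R1", "R0", 100),
                  ("STORE", "R0", 104, "R3")]:
        print("\n".join(run(instr, regs, mem)), end="\n\n")
```

Every instruction passes through all five stages even when a stage has nothing to do: ADD is idle in Mem and STORE is idle in WB.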
Bernstein’s Conditions (Review)

If
    input(S1) ∩ output(S2) = ∅     (antidependency, WAR)
and
    input(S2) ∩ output(S1) = ∅     (data dependency, RAW)
and
    output(S1) ∩ output(S2) = ∅    (output dependency, WAW)
then S1 || S2.

WAR: Write After Read
RAW: Read After Write
WAW: Write After Write
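The three conditions translate directly into set intersections. Below is a minimal sketch, assuming each statement is described by a set of names it reads and a set of names it writes; the function names `dependencies` and `can_run_in_parallel` are hypothetical.

```python
def dependencies(s1_in, s1_out, s2_in, s2_out):
    """Return the Bernstein conditions violated by S1 followed by S2."""
    found = []
    if s1_in & s2_out:      # S2 writes something S1 reads
        found.append("WAR")
    if s2_in & s1_out:      # S2 reads something S1 writes
        found.append("RAW")
    if s1_out & s2_out:     # both write the same location
        found.append("WAW")
    return found


def can_run_in_parallel(s1_in, s1_out, s2_in, s2_out):
    """S1 || S2 holds only when all three intersections are empty."""
    return not dependencies(s1_in, s1_out, s2_in, s2_out)


if __name__ == "__main__":
    # S1: R3 = R1 + R2    S2: R4 = R3 + R5    (S2 reads what S1 writes)
    print(dependencies({"R1", "R2"}, {"R3"}, {"R3", "R5"}, {"R4"}))
    # S1: R3 = R1 + R2    S2: R6 = R4 + R5    (independent)
    print(can_run_in_parallel({"R1", "R2"}, {"R3"}, {"R4", "R5"}, {"R6"}))
```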