
CDA 4150 Notes

CDA 4150 Notes. Pipelining. February 18, 2004. By: Nicholas Bray & San Chong Cheng.


Presentation Transcript


  1. CDA 4150 Notes Pipelining February 18, 2004 By: Nicholas Bray & San Chong Cheng

  2. Matrices

         | a11 a12 a13 a14 |          | b11 b12 b13 b14 |
     A = | a21 a22 a23 a24 |      B = | b21 b22 b23 b24 |
         | a31 a32 a33 a34 |          | b31 b32 b33 b34 |
         | a41 a42 a43 a44 |          | b41 b42 b43 b44 |

     How do we store A and B in a vector processor so we can access the rows of A and the columns of B?

  3. Solution:

     Note: this is not the best solution for computing AxB = C.

         cij = a1j*bi1 + a2j*bi2 + a3j*bi3 + a4j*bi4

     So we would need to access akj and bik in parallel. This can be done by switching the rows of memory with the columns (we can access all of Mi in parallel).

         A:  M1  M2  M3  M4        B:  M1  M2  M3  M4
             a11 a21 a31 a41           b11 b12 b13 b14
             a12 a22 a32 a42           b21 b22 b23 b24
             a13 a23 a33 a43           b31 b32 b33 b34
             a14 a24 a34 a44           b41 b42 b43 b44

  4. (Review cont.)

     How should we store the arrays so that we can access the matrix in parallel (rows and columns of A and B)?

     With four memory modules, each module holds:

         M1: a11 a24 a33 a42
         M2: a12 a21 a34 a43
         M3: a13 a22 a31 a44
         M4: a14 a23 a32 a41

     The matrix can now be accessed in parallel both horizontally and vertically.

     With five memory modules, each module holds:

         M1: a11 a34 a43
         M2: a12 a21 a44
         M3: a13 a22 a31
         M4: a14 a23 a32 a41
         M5: a24 a33 a42

     The matrix can now also be accessed diagonally in parallel.
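The module contents on slide 4 are consistent with a skew of one module per row. The sketch below assumes the mapping `(i + j - 2) mod M` for element a(i,j); that formula is inferred from the listed contents and is not stated on the slide. The function names are illustrative.

```python
N = 4  # the slides use 4x4 matrices


def module(i, j, num_modules):
    """0-based module holding a(i, j); i and j are 1-based as on the slide.

    Assumed mapping: (i + j - 2) mod num_modules (inferred, not from the slide).
    """
    return (i + j - 2) % num_modules


def layout(num_modules):
    """List the elements stored in each module, in row order."""
    contents = [[] for _ in range(num_modules)]
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            contents[module(i, j, num_modules)].append(f"a{i}{j}")
    return contents


def conflict_free(cells, num_modules):
    """True if every cell in `cells` lives in a different module."""
    mods = [module(i, j, num_modules) for i, j in cells]
    return len(set(mods)) == len(mods)


rows = [[(i, j) for j in range(1, N + 1)] for i in range(1, N + 1)]
cols = [[(i, j) for i in range(1, N + 1)] for j in range(1, N + 1)]
main_diag = [(k, k) for k in range(1, N + 1)]

for m in (4, 5):
    print(m, "modules:",
          "rows", all(conflict_free(r, m) for r in rows),
          "cols", all(conflict_free(c, m) for c in cols),
          "main diagonal", conflict_free(main_diag, m))
```

Under this assumed mapping, four modules serve rows and columns without conflict but place a11 and a33 in the same module, while five modules also separate the main diagonal.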

  5. Pipelining – Load Store

     Stages: Instruction Fetch → Execute

  6. 2 Stages

     Fetch:
         MAR ← PC
         MDR ← M[MAR]
         IR ← MDR
         PC ← PC + 1

     Exec:
         A ← R1
         B ← R2
         ALU ← A (op) B
         C ← ALU (results)
         R3 ← C

     [Diagram labels: Fetch Unit, Execute, M1, MAR, MDR, IR]

  7. Further Pipelining

     Stages: Instruction Fetch → ID / OF → Execute

     ID / OF: instruction decode / operand fetch

  8. 3 Stages

     Fetch:
         MAR ← PC
         MDR ← M[MAR]
         IR ← MDR
         PC ← PC + 1

     ID / OF:
         A ← R1
         B ← R2

     Exec:
         ALUop ← A (op) B
         C ← ALUop
         R3 ← C

  9. 4 Stages

     Stages: Instruction Fetch → ID / OF → Exec. → Write Back

  10. 4 Stages in RTN

      Fetch:
          MAR ← PC
          MDR ← M[MAR]
          IR ← MDR
          PC ← PC + 1

      ID / OF:
          A ← R1
          B ← R2

      Exec:
          ALUop ← A (op) B
          C ← ALUop

      WB:
          R3 ← C

      * There could be a conflict between OF and Exec, or between Exec and WB.
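The four RTN stages on slide 10 can be traced in software. The following is a minimal sketch, not a model of any particular machine: the tuple instruction encoding `(op, dst, src1, src2)`, the `OPS` table, and the `run` function are assumptions made for illustration, and the stages execute one after another rather than overlapped.

```python
import operator

# Hypothetical ALU operations; the slides only write "(op)".
OPS = {"ADD": operator.add, "SUB": operator.sub}


def run(program, regs):
    """Execute each instruction by stepping through the four RTN stages."""
    pc = 0
    while pc < len(program):
        # Fetch: MAR <- PC; MDR <- M[MAR]; IR <- MDR; PC <- PC + 1
        mar = pc
        mdr = program[mar]
        ir = mdr
        pc = pc + 1

        # ID / OF: decode IR, then A <- first source, B <- second source
        op, dst, src1, src2 = ir
        a = regs[src1]
        b = regs[src2]

        # Exec: ALUop <- A (op) B; C <- ALUop
        alu_out = OPS[op](a, b)
        c = alu_out

        # WB: destination register <- C
        regs[dst] = c
    return regs


result = run([("ADD", "R3", "R1", "R2")], {"R1": 5, "R2": 7, "R3": 0})
print(result["R3"])  # 12
```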

  11. 4 Stages Example

      V3 = V1 + V2

      There is a conflict between OF and WB because both are trying to access the register file.

      [Diagram labels: register file R1 to R8; A, B, C; V1, V2, V3; I. M.; MAR, PC, MDR; stages IF, OF, EXEC., WB]
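The OF/WB conflict can be located by recording which resource each stage claims in each cycle. This sketch assumes one instruction enters the pipeline per cycle and that the register file allows only one access per cycle; neither assumption is stated on the slide, and the `RESOURCE` table and `conflicts` function are illustrative names.

```python
STAGES = ["IF", "OF", "EXEC", "WB"]

# Assumed resource claimed by each stage (single-ported register file).
RESOURCE = {
    "IF": "instruction memory",
    "OF": "register file",
    "EXEC": "ALU",
    "WB": "register file",
}


def conflicts(num_instructions):
    """Map (cycle, resource) to its users wherever a resource is claimed twice."""
    usage = {}
    for instr in range(num_instructions):
        for offset, stage in enumerate(STAGES):
            cycle = instr + offset  # instruction `instr` enters IF in cycle `instr`
            usage.setdefault((cycle, RESOURCE[stage]), []).append((instr, stage))
    return {key: users for key, users in usage.items() if len(users) > 1}


print(conflicts(3))
# {(3, 'register file'): [(0, 'WB'), (2, 'OF')]}
```

With three instructions in flight, instruction 0 reaches WB in the same cycle that instruction 2 reaches OF, which is the clash the slide describes.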

  12. 5 Stages

      Stages: Inst. Fetch → ID / OF → Exec. → Mem → WB

      [Diagram labels: Inst. Memory, Data Memory, Register File]

  13. Load (5 stages)

      Load Reg, Addr

      IF:       instruction fetch
      ID / OF:  get values from the register file
      Exec:     might need to apply an offset
      Mem:      gets the value from data memory
      WB:       write the value back to Reg

  14. Add (5 stages)

      ADD R3, R1, R2

      IF:       instruction fetch
      ID / OF:  V1 = R1, V2 = R2
      Exec:     V3 gets V1 + V2
      Mem:      (no action listed)
      WB:       R3 gets V3

  15. Store (5 stages)

      STORE Addr, Reg

      IF:       instruction fetch
      ID / OF:  V1 gets Reg
      Exec:     calculate the effective address
      Mem:      Addr gets V1
      WB:       (no action listed)
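The five stages used by Load, Add, and Store overlap when instructions are issued back to back. The sketch below builds a space-time table for that case. It assumes an ideal pipeline with one instruction issued per cycle and no stalls, and the function names `space_time` and `render` are illustrative.

```python
STAGES = ["IF", "ID/OF", "Exec", "Mem", "WB"]


def space_time(instructions):
    """Return (name, cells) per instruction; cells[c] is the stage occupied in cycle c."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for k, name in enumerate(instructions):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[k + s] = stage  # instruction k enters IF in cycle k
        rows.append((name, cells))
    return rows


def render(rows):
    """Format the table as fixed-width text, one instruction per line."""
    lines = []
    for name, cells in rows:
        line = f"{name:<18}" + "".join(f"{c:<7}" for c in cells)
        lines.append(line.rstrip())
    return "\n".join(lines)


table = space_time(["LOAD Reg, Addr", "ADD R3, R1, R2", "STORE Addr, Reg"])
print(render(table))
```

Each row is shifted one cycle to the right of the row above it, so three instructions finish in seven cycles instead of fifteen.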

  16. Bernstein’s Conditions (Review)

      If
          input(S1) ∩ output(S2) = ∅   (antidependency – WAR) and
          input(S2) ∩ output(S1) = ∅   (data dependency – RAW) and
          output(S1) ∩ output(S2) = ∅  (output dependency – WAW)
      then S1 || S2.

      WAR: Write After Read
      RAW: Read After Write
      WAW: Write After Write
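Bernstein's conditions translate directly into set intersections. The following sketch checks two statements given their input and output sets; the function name `bernstein` and the register-based examples are assumptions made for illustration.

```python
def bernstein(in1, out1, in2, out2):
    """Return the dependencies that prevent S1 || S2 (empty list if none)."""
    in1, out1, in2, out2 = map(set, (in1, out1, in2, out2))
    found = []
    if in1 & out2:   # S2 writes something S1 reads
        found.append("WAR")
    if in2 & out1:   # S2 reads something S1 writes
        found.append("RAW")
    if out1 & out2:  # both write the same location
        found.append("WAW")
    return found


# S1: R3 = R1 + R2 ; S2: R5 = R3 + R4  (S2 reads what S1 writes)
print(bernstein({"R1", "R2"}, {"R3"}, {"R3", "R4"}, {"R5"}))  # ['RAW']

# S1: R3 = R1 + R2 ; S2: R6 = R4 + R5  (independent, so S1 || S2)
print(bernstein({"R1", "R2"}, {"R3"}, {"R4", "R5"}, {"R6"}))  # []
```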
