
CMPUT229 - Fall 2003



Presentation Transcript


  1. CMPUT229 - Fall 2003 Topic G: IA-64 Highlights José Nelson Amaral http://www.cs.ualberta.ca/~amaral/courses/680 CMPUT 680 - Compiler Design and Optimization

  2. Some Highlights of the EPIC Architecture
     • Control Speculation
     • Data Speculation
     • Predication
     • Rotating Registers
     • Hardware-Supported Software Pipelining

  3. Control Speculation

     Before control speculation:
         br.cond.dptk L1
         ld8 r3=[r5]
         shr r7=r3,r87

     After control speculation (the load becomes a speculative load, ld8.s, and a chk.s takes its original place):
         ld8.s r3=[r5]
         br.cond.dptk L1
         chk.s r3, recovery
         shr r7=r3,r87
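The effect of hoisting the load above the branch can be modelled in a few lines of Python. This is a minimal sketch, not full IA-64 semantics: `NAT`, `ld_s`, `chk_s`, and `run` are hypothetical names, memory is a plain dict, and a missing address stands in for an access that would fault.

```python
# Toy model of control speculation with a deferred-exception token (NaT).
NAT = object()  # stands in for the NaT bit attached to a register


def ld_s(memory, addr):
    """Speculative load: a faulting access yields NAT instead of trapping."""
    return memory.get(addr, NAT)


def chk_s(value, recovery):
    """Check: if the speculative load was deferred, run the recovery code."""
    return recovery() if value is NAT else value


def run(memory, addr, shift, branch_taken):
    r3 = ld_s(memory, addr)        # load hoisted above the branch
    if branch_taken:               # br.cond L1: the loaded value is never used
        return None

    def recovery():
        return memory[addr]        # non-speculative reload; may really fault

    r3 = chk_s(r3, recovery)       # chk.s r3, recovery
    return r3 >> shift             # shr r7=r3,r87


good = {0x100: 64}
print(run(good, 0x100, 2, branch_taken=False))  # valid load, result used
print(run({}, 0x100, 2, branch_taken=True))     # bad address, branch taken: no fault
```

In this model the fault is only raised when the check is actually reached, which is the point of deferring it.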

  4. Data Speculation
     An advanced load allows a load to be moved above a store even if it is not known whether the load and the store may reference overlapping memory locations.

     Original code:
         st8 [r55]=r45         // r55 may or may not contain
         ld8 r3=[r5] ;;        // the same address as r5
         shr r7=r3,r87

     With an advanced load:
         ld8.a r3=[r5] ;;      // Advanced Load
         // other, unrelated instructions
         st8 [r55]=r45
         ld8.c r3=[r5] ;;      // Check load
         shr r7=r3,r87
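The bookkeeping behind the advanced load and the check load can be sketched as a small table keyed by register, in the spirit of the Advanced Load Address Table (ALAT). This is an illustrative model only: the `Machine` class and its method names are hypothetical, and addresses are compared for exact equality, whereas real hardware checks for overlapping accesses.

```python
# Toy model of the ALAT bookkeeping used by ld.a / ld.c.
class Machine:
    def __init__(self, memory):
        self.memory = dict(memory)
        self.regs = {}
        self.alat = {}      # register name -> address it was loaded from
        self.reloads = 0    # how many check loads had to redo the load

    def ld_a(self, reg, addr):
        """Advanced load: load early and record the address in the ALAT."""
        self.regs[reg] = self.memory[addr]
        self.alat[reg] = addr

    def st(self, addr, value):
        """Store: invalidate every ALAT entry that matches the address."""
        self.memory[addr] = value
        self.alat = {r: a for r, a in self.alat.items() if a != addr}

    def ld_c(self, reg, addr):
        """Check load: reload only if the ALAT entry was invalidated."""
        if self.alat.get(reg) != addr:
            self.regs[reg] = self.memory[addr]
            self.reloads += 1


def example(store_addr):
    m = Machine({0x10: 7, 0x20: 0})
    m.ld_a("r3", 0x10)      # ld8.a r3=[r5]
    m.st(store_addr, 99)    # st8 [r55]=r45
    m.ld_c("r3", 0x10)      # ld8.c r3=[r5]
    return m.regs["r3"], m.reloads


print(example(0x20))  # no overlap: the early value is kept
print(example(0x10))  # overlap: the check load re-executes the load
```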

  5. Moving Up Loads + Uses: Recovery Code

     Original code:
         st8 [r4] = r12        // cycle 0: ambiguous store
         ld8 r6 = [r8] ;;      // cycle 0: load to advance
         add r5 = r6,r7        // cycle 2
         st8 [r18] = r5        // cycle 3

     Speculative code:
         ld8.a r6 = [r8] ;;    // cycle -3
         add r5 = r6,r7        // cycle -1; add that uses r6
         st8 [r4]=r12          // cycle 0
         chk.a r6, recover     // cycle 0: check
     back:                     // Return point from jump to recover
         st8 [r18] = r5        // cycle 0

     recover:
         ld8 r6 = [r8] ;;      // Reload r6 from [r8]
         add r5 = r6,r7        // Re-execute the add
         br back               // Jump back to main code
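When the use of the loaded value is hoisted together with the load, a failed check has to redo both. The sketch below compares the original order with the speculative order plus recovery; the function names and the dict-based memory are assumptions made for illustration, and the address comparison stands in for the hardware check performed by chk.a.

```python
# Toy model of chk.a: the load AND its dependent add were both hoisted,
# so a failed check must redo both in recovery code.
def original(memory, r4, r8, r12, r7):
    memory[r4] = r12              # st8 [r4]=r12 (ambiguous store)
    r6 = memory[r8]               # ld8 r6=[r8]
    return r6 + r7                # add r5=r6,r7; value later stored to [r18]


def speculative(memory, r4, r8, r12, r7):
    r6 = memory[r8]               # ld8.a r6=[r8]
    r5 = r6 + r7                  # add r5=r6,r7 (uses the speculative r6)
    memory[r4] = r12              # st8 [r4]=r12
    check_failed = (r4 == r8)     # the store hit the advanced-load address
    if check_failed:              # chk.a r6, recover
        r6 = memory[r8]           # recover: reload r6 from [r8]
        r5 = r6 + r7              #          re-execute the add
    return r5                     # back: value stored by st8 [r18]=r5


mem = {0: 1, 8: 2}
print(speculative(dict(mem), r4=0, r8=8, r12=50, r7=3))  # no aliasing
print(speculative(dict(mem), r4=8, r8=8, r12=50, r7=3))  # aliasing: recovery runs
```

Both orders produce the same value in either case; only the aliasing case pays for the recovery path.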

  6. If-conversion
     If-conversion uses predicates to transform conditional code into code with a single control stream.

     Example 1:
         if(r4) {
             add r1= r2, r3
             ld8 r6=[r5]
         }

     becomes
         cmp.ne p1, p0=r4, 0 ;;    // Set predicate reg
         (p1) add r1=r2, r3
         (p1) ld8 r6=[r5]

     Example 2:
         if(r1)
             r2 = r3 + r4
         else
             r7 = r6 - r5

     becomes
         cmp.ne p1, p2 = r1, 0 ;;  // Set predicate reg
         (p1) add r2 = r3, r4
         (p2) sub r7 = r6,r5
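The if/else case can be mimicked in Python to show that the predicated form has no branch yet leaves the same register state. This is a sketch under assumptions: registers are a dict, `cmp_ne`, `if_converted`, and `branching` are hypothetical helper names, and the add follows the assembly version (add r2 = r3, r4).

```python
# Toy model of predicated execution for the if/else example.
def cmp_ne(a, b):
    """cmp.ne p1, p2 = a, b: writes a predicate and its complement."""
    p1 = a != b
    return p1, not p1


def if_converted(regs):
    r = dict(regs)
    p1, p2 = cmp_ne(r["r1"], 0)
    # Both instructions are issued; each takes effect only if its
    # qualifying predicate is true.
    predicated = (
        (p1, "r2", lambda: r["r3"] + r["r4"]),   # (p1) add r2 = r3, r4
        (p2, "r7", lambda: r["r6"] - r["r5"]),   # (p2) sub r7 = r6, r5
    )
    for pred, dest, op in predicated:
        if pred:
            r[dest] = op()
    return r


def branching(regs):
    r = dict(regs)
    if r["r1"]:
        r["r2"] = r["r3"] + r["r4"]
    else:
        r["r7"] = r["r6"] - r["r5"]
    return r


base = {"r1": 0, "r2": 0, "r3": 3, "r4": 4, "r5": 5, "r6": 6, "r7": 0}
for r1 in (0, 5):
    state = dict(base, r1=r1)
    print(r1, if_converted(state) == branching(state))
```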

  7. In the old days…

         for(k=1 ; k<=5 ; k++)
             y[k] = x[k]+1;

     MIPS Assembly:
         # $a0 = x[]
         # $a1 = y[]
         # $t0 = k
               addi $t0, $zero, 1
               addi $t1, $zero, 5
         Loop: sll  $t2, $t0, 2
               add  $t3, $a0, $t2
               lw   $t4, 0($t3)
               addi $t4, $t4, 1
               add  $t5, $a1, $t2
               sw   $t4, 0($t5)
               addi $t0, $t0, 1
               ble  $t0, $t1, Loop

  8-44. Software Pipelining Example in the IA-64 (animation, one frame per slide)

         loop: (p16) ld1 r32 = [r12], 1
               (p17) add r34 = 1, r33
               (p18) st1 [r13] = r35, 1
                     br.ctop loop

      Each frame shows the physical general registers r32-r39, the logical names they carry under the current RRB, the predicate registers p16-p18, the LC and EC counters, and memory (x1..x5 plus the y values stored so far). The state seen by each pass through the loop body is:

         Slides   LC  EC  RRB   p16 p17 p18   Loads  Adds  Stores
         8-10      4   3    0     1   0   0   x1     -     -
         11-16     3   3   -1     1   1   0   x2     y1    -
         17-21     2   3   -2     1   1   1   x3     y2    y1
         22-26     1   3   -3     1   1   1   x4     y3    y2
         27-31     0   3   -4     1   1   1   x5     y4    y3
         32-37     0   2   -5     0   1   1   -      y5    y4
         38-43     0   1   -6     0   0   1   -      -     y5
         44        0   0   -7     0   0   0   (loop exits)

      The first frame of each group shows the rotation performed by the preceding br.ctop.

  45. The Software Pipelining Branch Instruction
      LC = Loop Counter, EC = Epilog Counter, RRB = Rotating Register Base, PR = Predicate Register

         Condition                    Updates                                Result
         LC ≠ 0 (prolog/kernel)       LC--, EC unchanged, PR[16]=1, RRB--    branch
         LC = 0 (epilog), EC > 1      EC--, PR[16]=0, RRB--                  branch
         LC = 0 (epilog), EC = 1      EC--, PR[16]=0, RRB--                  fall-thru
         LC = 0 (epilog), EC = 0      EC unchanged, PR[16]=0                 fall-thru
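The branch decision and the rotation it triggers can be simulated for the three-instruction loop of the example. This is a rough sketch, not an architectural model: `br_ctop` and `run_loop` are hypothetical names, a single `rrb` value is shared by the general and predicate registers, only r32-r39 and p16-p63 are modelled, and the input is assumed to have at least one element.

```python
def br_ctop(lc, ec):
    """Return (lc, ec, value rotated into p16, rotate?, branch taken?)."""
    if lc != 0:                        # prolog / kernel
        return lc - 1, ec, 1, True, True
    if ec > 1:                         # epilog, more stages left to drain
        return lc, ec - 1, 0, True, True
    if ec == 1:                        # last epilog pass
        return lc, 0, 0, True, False
    return lc, ec, 0, False, False     # EC already 0


def run_loop(x, stages=3):
    gr = [None] * 8                    # physical r32..r39
    pr = [0] * 48                      # physical p16..p63
    pr[0] = 1                          # p16=1, p17=p18=0 on loop entry
    rrb, lc, ec = 0, len(x) - 1, stages
    src, y, trace = iter(x), [], []

    def g(n):                          # logical rN -> physical index
        return (n - 32 + rrb) % 8

    def p(n):                          # logical pN -> physical index
        return (n - 16 + rrb) % 48

    def snapshot():
        return (lc, ec, rrb, pr[p(16)], pr[p(17)], pr[p(18)])

    while True:
        trace.append(snapshot())
        if pr[p(16)]:
            gr[g(32)] = next(src)              # (p16) load  r32 = [r12], 1
        if pr[p(17)]:
            gr[g(34)] = 1 + gr[g(33)]          # (p17) add   r34 = 1, r33
        if pr[p(18)]:
            y.append(gr[g(35)])                # (p18) store [r13] = r35, 1
        lc, ec, new_p, rotate, taken = br_ctop(lc, ec)
        pr[p(63)] = new_p                      # becomes p16 after rotation
        if rotate:
            rrb -= 1
        if not taken:
            trace.append(snapshot())
            return y, trace


y, trace = run_loop([10, 20, 30, 40, 50])
print(y)
for row in trace:
    print(row)   # (LC, EC, RRB, p16, p17, p18)
```

With five elements the loop body runs seven times (five kernel/prolog passes plus two epilog passes), and the value written by the load as r32 is read one pass later as r33, then as r35 by the store, without any register copies.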
