CMPUT229 - Fall 2003 Topic G: IA-64 Highlights José Nelson Amaral http://www.cs.ualberta.ca/~amaral/courses/680 CMPUT 680 - Compiler Design and Optimization
Some Highlights of the EPIC Architecture
• Control Speculation
• Data Speculation
• Predication
• Rotating Registers
• Hardware-Supported Software Pipelining
Control Speculation

Before Control Speculation:
    br.cond.dptk L1
    ld8   r3=[r5]
    shr   r7=r3,r87

After Control Speculation:
    ld8.s r3=[r5]          // speculative load, hoisted above the branch
    br.cond.dptk L1
    chk.s r3, recovery     // go to recovery code if the load was deferred
    shr   r7=r3,r87
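The effect of hoisting the load can be sketched in Python rather than IA-64 assembly. This is only a toy model: it assumes the usual IA-64 behaviour in which a speculative load defers a fault by setting the NaT bit of its target register and the check instruction branches to recovery code when that bit is set. The names `Reg`, `spec_load`, `normal_load`, and `run`, and the dictionary standing in for memory, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    value: int = 0
    nat: bool = False  # NaT bit: marks a deferred fault (assumed model)


def spec_load(memory, addr):
    """Toy ld8.s: never raises; an unmapped address only sets NaT."""
    if addr in memory:
        return Reg(memory[addr], False)
    return Reg(0, True)


def normal_load(memory, addr):
    """Toy non-speculative ld8: an unmapped address raises KeyError."""
    return Reg(memory[addr], False)


def run(memory, addr, branch_taken, shift):
    """Speculative schedule: the load executes before the branch."""
    r3 = spec_load(memory, addr)        # ld8.s r3=[r5]
    if branch_taken:                    # br.cond.dptk L1
        return None                     # the loaded value is never used
    if r3.nat:                          # chk.s r3, recovery
        r3 = normal_load(memory, addr)  # recovery: redo the load for real
    return r3.value >> shift            # shr r7=r3,r87
```

When the branch is taken, a bad address costs nothing because the deferred fault is never checked; when it is not taken, the fault surfaces in the recovery path, exactly where the original load would have raised it.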
Data Speculation

An advanced load allows a load to be moved above a store even when it is not known whether the load and the store reference overlapping memory locations.

Original code:
    st8   [r55]=r45        // r55 may or may not contain
    ld8   r3=[r5] ;;       // the same address as r5
    shr   r7=r3,r87

With an advanced load:
    ld8.a r3=[r5] ;;       // Advanced Load
    // other, unrelated instructions
    st8   [r55]=r45
    ld8.c r3=[r5] ;;       // Check load: reloads r3 only if the store overlapped [r5]
    shr   r7=r3,r87
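A minimal Python sketch of the bookkeeping behind the advanced load and the check load. It assumes a simplified Advanced Load Address Table (ALAT) that tracks one exact address per target register; real hardware compares address ranges and has limited capacity. The class `ALAT` and the two `*_sequence` functions are hypothetical names used only for illustration.

```python
class ALAT:
    """Toy Advanced Load Address Table: target register -> loaded address."""

    def __init__(self):
        self.entries = {}

    def advanced_load(self, memory, reg, addr):
        """Toy ld8.a: load the value and remember which address it came from."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, memory, addr, value):
        """Toy st8: write memory and drop every entry tracking this address."""
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check_load(self, memory, reg, addr, current):
        """Toy ld8.c: keep the speculated value if the entry survived."""
        if self.entries.get(reg) == addr:
            return current
        return memory[addr]  # entry was invalidated: reload


def speculative_sequence(memory, load_addr, store_addr, store_val, shift):
    alat = ALAT()
    r3 = alat.advanced_load(memory, "r3", load_addr)   # ld8.a r3=[r5]
    alat.store(memory, store_addr, store_val)          # st8 [r55]=r45
    r3 = alat.check_load(memory, "r3", load_addr, r3)  # ld8.c r3=[r5]
    return r3 >> shift                                 # shr r7=r3,r87


def original_sequence(memory, load_addr, store_addr, store_val, shift):
    memory[store_addr] = store_val                     # st8 [r55]=r45
    return memory[load_addr] >> shift                  # ld8 + shr
```

Both sequences return the same value whether or not the store aliases the load, which is the property the check load exists to guarantee.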
Moving Up Loads + Uses: Recovery Code

Original Code:
    st8   [r4] = r12       // cycle 0: ambiguous store
    ld8   r6 = [r8] ;;     // cycle 0: load to advance
    add   r5 = r6,r7       // cycle 2
    st8   [r18] = r5       // cycle 3

Speculative Code:
    ld8.a r6 = [r8] ;;     // cycle -3
    add   r5 = r6,r7       // cycle -1; add that uses r6
    st8   [r4]=r12         // cycle 0
    chk.a r6, recover      // cycle 0: check
back:                      // Return point from jump to recover
    st8   [r18] = r5       // cycle 0

recover:
    ld8   r6 = [r8] ;;     // Reload r6 from [r8]
    add   r5 = r6,r7       // Re-execute the add
    br    back             // Jump back to main code
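The recovery path can be checked with a small Python model. This is a sketch under stated assumptions: registers are passed as plain integers, memory is a dictionary, and the ALAT is reduced to a single dictionary entry for r6 that an exactly matching store removes. The function names `original` and `speculative` are hypothetical.

```python
def original(mem, r4, r8, r18, r7, r12):
    mem[r4] = r12          # st8 [r4]=r12   (ambiguous store)
    r6 = mem[r8]           # ld8 r6=[r8]
    r5 = r6 + r7           # add r5=r6,r7
    mem[r18] = r5          # st8 [r18]=r5
    return mem


def speculative(mem, r4, r8, r18, r7, r12):
    alat = {"r6": r8}      # ld8.a r6=[r8]: remember the load address
    r6 = mem[r8]
    r5 = r6 + r7           # add hoisted together with the load
    mem[r4] = r12          # st8 [r4]=r12
    if r4 == r8:           # the store hit the tracked address (toy exact match)
        del alat["r6"]
    if "r6" not in alat:   # chk.a r6, recover
        r6 = mem[r8]       # recover: reload r6 from [r8]
        r5 = r6 + r7       # recover: re-execute the add
    mem[r18] = r5          # back: st8 [r18]=r5
    return mem
```

Because the add was moved along with the load, a plain check load would not be enough: the recovery code has to redo every hoisted instruction that consumed r6.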
If-conversion

If-conversion uses predicates to transform conditional code into straight-line code with a single control stream.

    if(r4) {
        add r1= r2, r3
        ld8 r6=[r5]
    }

         cmp.ne p1, p0=r4, 0 ;;    // Set predicate reg
    (p1) add r1=r2, r3
    (p1) ld8 r6=[r5]

    if(r1)
        r2 = r3 + r4
    else
        r7 = r6 - r5

         cmp.ne p1, p2 = r1, 0 ;;  // Set predicate reg
    (p1) add r2 = r3, r4
    (p2) sub r7 = r6,r5
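As a rough illustration of the second example, the Python sketch below compares the branching form with a predicated form that follows the assembly (`add r2 = r3, r4` under p1, `sub r7 = r6,r5` under p2). The function names are hypothetical, and the conditional expressions only model the nullification of an instruction whose predicate is false; they are not how hardware implements it.

```python
def branching(r1, r3, r4, r5, r6, r2=0, r7=0):
    """Control-flow version: only one arm executes."""
    if r1 != 0:
        r2 = r3 + r4
    else:
        r7 = r6 - r5
    return r2, r7


def if_converted(r1, r3, r4, r5, r6, r2=0, r7=0):
    """Predicated version: both instructions issue, predicates gate the writes."""
    p1 = r1 != 0                      # cmp.ne p1, p2 = r1, 0
    p2 = not p1                       # p2 receives the complement of p1
    r2 = (r3 + r4) if p1 else r2      # (p1) add r2 = r3, r4
    r7 = (r6 - r5) if p2 else r7      # (p2) sub r7 = r6, r5
    return r2, r7
```

For any input the two functions leave r2 and r7 in the same state, which is the correctness condition for if-conversion.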
In the old days…

    for(k=1 ; k<=5 ; k++)
        y[k] = x[k]+1;

MIPS Assembly:
    # $a0 = x[]
    # $a1 = y[]
    # $t0 = k
          addi $t0, $zero, 1
          addi $t1, $zero, 5
    Loop: sll  $t2, $t0, 2
          add  $t3, $a0, $t2
          lw   $t4, 0($t3)
          addi $t4, $t4, 1
          add  $t5, $a1, $t2
          sw   $t4, 0($t5)
          addi $t0, $t0, 1
          ble  $t0, $t1, Loop
Software Pipelining Example in the IA-64

    loop: (p16) ld1 r32 = [r12], 1
          (p17) add r34 = 1, r33
          (p18) st1 [r13] = r35, 1
                br.ctop loop

[Animation, 37 frames: physical and logical general registers r32-r39, predicate registers p16-p18, memory (x1..x5 loaded, y1..y5 stored), LC, EC, and RRB as the loop executes. Each yi is xi + 1. The state after each br.ctop is summarized below.]

Initial state: LC = 4, EC = 3, RRB = 0, p16 p17 p18 = 1 0 0.

    Pass  ld1 loads  add produces  st1 stores  |  LC  EC  RRB  p16 p17 p18
    1     x1         -             -           |  3   3   -1   1   1   0
    2     x2         y1            -           |  2   3   -2   1   1   1
    3     x3         y2            y1          |  1   3   -3   1   1   1
    4     x4         y3            y2          |  0   3   -4   1   1   1
    5     x5         y4            y3          |  0   2   -5   0   1   1
    6     -          y5            y4          |  0   1   -6   0   0   1
    7     -          -             y5          |  0   0   -7   0   0   0

After pass 7 the branch falls through and the loop exits.
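The animation can be replayed with a short Python simulation. This is a sketch under stated assumptions: only the eight rotating registers r32-r39 shown in the figure are modelled, the three stage predicates are kept in a list that shifts on every rotation, r12 is reduced to an index into the input list, and the input must hold at least one element. The function name `run_pipelined` is hypothetical.

```python
def run_pipelined(x):
    """Simulate the 3-stage kernel; returns (y, passes, LC, EC, RRB)."""
    n_rot = 8                      # rotating registers r32..r39, as in the figure
    phys = [None] * n_rot          # physical register contents
    rrb = 0                        # rotating register base
    pred = [True, False, False]    # p16, p17, p18
    lc, ec = len(x) - 1, 3         # LC = trip count - 1, EC = number of stages
    src = 0                        # stands in for the post-incremented r12
    y = []                         # stands in for memory written through r13
    passes = 0

    def reg(logical):
        """Map a logical register number (32..39) to a physical slot."""
        return (logical - 32 + rrb) % n_rot

    while True:
        passes += 1
        if pred[0]:                                # (p16) ld1 r32 = [r12], 1
            phys[reg(32)] = x[src]
            src += 1
        if pred[1]:                                # (p17) add r34 = 1, r33
            phys[reg(34)] = 1 + phys[reg(33)]
        if pred[2]:                                # (p18) st1 [r13] = r35, 1
            y.append(phys[reg(35)])

        # br.ctop loop
        if lc != 0:                                # prolog / kernel
            lc -= 1
            new_p16, taken = True, True
        elif ec > 1:                               # epilog, pipeline still draining
            ec -= 1
            new_p16, taken = False, True
        elif ec == 1:                              # last epilog pass
            ec -= 1
            new_p16, taken = False, False
        else:                                      # EC == 0: nothing rotates
            break
        rrb -= 1                                   # rotate the general registers
        pred = [new_p16] + pred[:-1]               # rotate the stage predicates
        if not taken:
            break
    return y, passes, lc, ec, rrb
```

With five inputs the simulation takes 5 + 3 - 1 = 7 passes and ends with LC = 0, EC = 0, and RRB = -7, matching the final frame. A value written to r32 in one pass is read as r33 in the next because decrementing RRB renames every rotating register by one.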
The Software Pipelining Branch Instruction

LC = Loop Counter
EC = Epilog Counter
RRB = Rotating Register Base
PR = Predicate Register

    Condition                      LC         EC         PR[16]     RRB        Result
    LC != 0 (prolog/kernel)        LC--       unchanged  PR[16]=1   RRB--      branch
    LC == 0 (epilog), EC > 1       unchanged  EC--       PR[16]=0   RRB--      branch
    LC == 0 (epilog), EC == 1      unchanged  EC--       PR[16]=0   RRB--      fall-thru
    LC == 0 (epilog), EC == 0      unchanged  unchanged  PR[16]=0   unchanged  fall-thru
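The decision procedure on this slide fits in one small Python function. It is a sketch of the four cases only, with the machine state passed in and returned as plain integers; `br_ctop` and `trace` are hypothetical helper names, not part of any real toolchain.

```python
def br_ctop(lc, ec, rrb):
    """Return (lc, ec, rrb, p16, branch_taken) after one br.ctop."""
    if lc != 0:                          # prolog / kernel
        return lc - 1, ec, rrb - 1, 1, True
    if ec > 1:                           # epilog, more stages to drain
        return lc, ec - 1, rrb - 1, 0, True
    if ec == 1:                          # final epilog pass
        return lc, ec - 1, rrb - 1, 0, False
    return lc, ec, rrb, 0, False         # EC == 0: no rotation, fall through


def trace(lc, ec):
    """Apply br_ctop repeatedly from RRB = 0 until the branch falls through."""
    rrb, steps = 0, []
    while True:
        lc, ec, rrb, p16, taken = br_ctop(lc, ec, rrb)
        steps.append((lc, ec, rrb, p16, taken))
        if not taken:
            return steps
```

Starting from LC = 4 and EC = 3, `trace(4, 3)` yields seven steps: four with LC counting down while p16 stays 1, then three epilog steps with p16 = 0, the last of which falls through.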