EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science Laboratory University of Illinois at Urbana-Champaign IMPACT Group http://www.crhc.uiuc.edu/IMPACT/
Outline • History and Background • Control Speculation • Predication • IMPACT EPIC Architecture • Compiler Technology • Outlook
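Vision: Bridging the Gap Between Programs and Hardware
  if (x>=0)
    if (x==1 || x==2 || x==3) m=f(x);
    else m=g(x);
[Figure: control-flow graph of the code above (tests x>=0, x!=1, x!=2, x!=3 with T/F edges leading to m=f(x) or m=g(x)) shown beside a hardware circuit that compares x against 1, 2, 3 and 0, combines the results, and selects between f and g to produce m under an enable signal]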
Hardware • highly speculative • parallel in nature • efficient logic manipulation • special purpose • area efficient • energy efficient
Programming • conservative semantics • sequential in nature • awkward logic manipulation • easily retargeted • area inefficient • energy inefficient
Can we get the best of both worlds?
EPIC Design Objectives • To define a programmable architecture model that allows compiled programs to approach special hardware design in • logic manipulation capability • speculation and parallelism • chip area efficiency • energy efficiency
Significant Milestones • 1994 Intel/HP form IA-64 alliance with U. of Illinois contribution • 1997 Announcement of IA-64 • 1997 Motorola/Lucent form StarCore alliance with U. of Illinois contribution • 1998 Major computer vendors adopt IA-64 • 1998 Announcement of StarCore • 1999 Release of user mode architecture
EPIC - the IMPACT Perspective • IMPACT work done since 1987 to lay foundation for EPIC architectures • Intel/HP IA-64, Motorola/Lucent StarCore • Key Technologies • control speculation [ISCA-91] [ASPLOS-92] [MICRO-96] • data (dependence) speculation [ICS-92] [ASPLOS-94] • predicated execution [MICRO-92][ISCA-95] [MICRO-97] • integrated architecture and inline recovery [ISCA-98] • logic minimization approach to predication [ISCA-99] • implementation neutral predication architecture [TBD]
Outline • History and Background • Control Speculation • Predication • IMPACT EPIC Architecture • Compiler Technology • Outlook
Control Speculation • Executing an instruction before knowing that its execution is required • Moving an instruction above a branch • Removes control dependences to increase ILP • Win when branch directions predicted correctly • Instruction sequence seen by hardware is changed! • Must ensure that execution result unaffected by such movement
Control Speculation Example
Original code:
  A: r6 = r4+1
  B: if (r9==0) goto L1
  C: r1 = MEM(r2+0)
  D: r3 = MEM(r2+4)
  E: r4 = r3+1
  F: r5 = r1+1
  G: MEM(r2+r4) = r4
Scheduled code (C, D, E, F moved above branch B):
  C: r1 = MEM(r2+0)
  A: r6 = r4+1
  D: r3 = MEM(r2+4)
  E: r4 = r3+1
  F: r5 = r1+1
  B: if (r9==0) goto L1
  G: MEM(r2+4) = r4
Scheduling Error • An ordering of instructions that will • cause early program termination or • produce results that differ from those of the unscheduled program. • To avoid scheduling errors • Live values must be properly preserved - register renaming • Spurious exception conditions must be suppressed
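The spurious-exception case can be seen in a few lines of Python. This is only an illustrative sketch: the dictionary `mem` stands in for memory, a `KeyError` stands in for a memory fault, and the function names `original` and `hoisted` are made up for this example.

```python
mem = {0: 5}  # hypothetical memory: only address 0 is mapped


def original(r2, r9):
    """Load stays below the branch, as in the unscheduled program."""
    if r9 == 0:          # B: if (r9==0) goto L1
        return None      # branch taken: the load never executes
    return mem[r2]       # C: r1 = MEM(r2+0)


def hoisted(r2, r9):
    """Load moved above the branch with no exception suppression."""
    r1 = mem[r2]         # C now executes on every path
    if r9 == 0:
        return None
    return r1


print(original(999, 0))  # None: the illegal address is never touched
try:
    hoisted(999, 0)
except KeyError:
    print("spurious exception: early termination the original program never had")
```

Both versions agree whenever the load is legal; they differ only when the branch would have skipped a faulting load, which is exactly the case the scheduler has to guard against.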
Safe Speculation • Compiler analysis to identify • instructions that are always safe. • speculation that will not introduce a new exception. • Trivial analysis examples: • array references with constant indices • divide and remainder with non-zero divisor • Complex analysis examples: • Branches to ensure legal input operands • Earlier use of the same input operand • Loop analysis
Silent Instructions • Architecture provides silent versions of instructions that may potentially cause exceptions. • Multiflow - silent FP instructions • HPPA - silent FP instructions, silent de-referenced null pointer • SPARC V9 - silent load instruction • To move an instr. above a branch, convert it into its silent version. • Both Multiflow TRACE and Cydrome Cydra-5 used similar ideas.
Silent Instructions • Memory access instructions • If a segmentation fault condition occurs, the instruction is canceled before it reaches the memory system. An arbitrary garbage value is returned. • If a page fault happens without segmentation fault, the OS page fault handler is immediately invoked as usual. Extra page faults may occur from speculation. • Arithmetic instructions • If a trap condition occurs, an arbitrary garbage value is deposited into the destination register. • The exception condition is either immediately handled or simply ignored.
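A minimal sketch of the silent-load behavior for the segmentation-fault case, assuming a toy memory model. The `Memory` class, the `limit` bound, and the `GARBAGE` constant are all hypothetical names introduced for illustration; page faults are not modeled.

```python
from dataclasses import dataclass, field

GARBAGE = 0xDEADBEEF  # stand-in for the "arbitrary garbage value"


class SegmentationFault(Exception):
    """Raised by a normal (non-silent) access to an illegal address."""


@dataclass
class Memory:
    cells: dict = field(default_factory=dict)  # address -> value
    limit: int = 1024                          # addresses outside [0, limit) are illegal

    def load(self, addr):
        """Ordinary load: an illegal address raises an exception."""
        if not 0 <= addr < self.limit:
            raise SegmentationFault(addr)
        return self.cells.get(addr, 0)

    def silent_load(self, addr):
        """Silent load: an illegal access is cancelled and garbage is returned."""
        if not 0 <= addr < self.limit:
            return GARBAGE
        return self.cells.get(addr, 0)


mem = Memory(cells={8: 42})
print(mem.silent_load(8))          # 42
print(hex(mem.silent_load(5000)))  # 0xdeadbeef
```

Nothing in the returned value distinguishes garbage from real data, which is why a later use of the result cannot tell that an exception condition occurred.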
Debugging Implications • If the speculated instruction turns out not to be needed (the branch disagrees with compile-time prediction): • the garbage value generated by a silent instruction would not be used. • the exception condition is correctly ignored since the silent instruction should not have been executed. • If the branch agrees with compile-time prediction: • the exception condition that occurred to a silent instruction is incorrectly ignored. • the garbage value generated may be used by a subsequent instruction without warning. • Not acceptable if exceptions must be reported timely and accurately
Performance Issues • Page faults caused by silent loads are handled right away • no support to defer a page fault until execution of the instruction is confirmed. • Additional page faults may result from speculation. • The number of additional page faults should be small for systems that are designed not to page. • Similar issues exist if TLB misses are handled through the exception mechanism.
Sentinel Scheduling • Design Objective • Correctly ignore exceptions generated by speculative instructions whose execution turns out to be unnecessary. • Correctly report exceptions generated by speculative instructions whose execution is confirmed. • Support recovery from exceptions thus reported. • Provide the option to handle page faults after the need for executing a speculative instruction is confirmed. • Minimize the extra hardware and instructions needed to achieve the objectives above.
Accurate Exception Report • Each instruction has two parts: • Non-excepting part which performs the actual operation • Sentinel part that flags an exception if necessary • Non-excepting part of I can be speculatively executed provided the sentinel part stays in I's home block
Sentinel Speculation Example
Original code:
  A: r6 = r4+1
  B: if (r9==0) goto L1
  C: r1 = MEM(r2+0)
  D: r3 = MEM(r2+4)
  E: r4 = r3+1
  F: r5 = r1+1
  G: MEM(r2+r4) = r4
Scheduled code:
  C: r1 = MEM(r2+0)
  A: r6 = r4+1
  D: r3 = MEM(r2+4)
  E: r4 = r3+1
  F: r5 = r1+1
  B: if (r9==0) goto L1
     [sentinels B, C, D, E]
  G: MEM(r2+4) = r4
Sentinel Elimination • The sentinel of I can be eliminated if • there is another instruction in I's home block which uses the result of I OR • I is non-excepting and is not the last direct or indirect use of an excepting instruction's destination • Unprotected instruction - an instruction whose sentinel cannot be eliminated. • If an unprotected instruction is speculated, an explicit instruction must be created to serve as the sentinel
Sentinel Speculation Example
Original code:
  A: r6 = r4+1
  B: if (r9==0) goto L1
  C: r1 = MEM(r2+0)
  D: r3 = MEM(r2+4)
  E: r4 = r3+1
  F: r5 = r1+1
  G: MEM(r2+r4) = r4
Scheduled code with explicit sentinel H:
  C: r1 = MEM(r2+0)
  A: r6 = r4+1
  D: r3 = MEM(r2+4)
  E: r4 = r3+1
  F: r5 = r1+1
  B: if (r9==0) goto L1
  H: check r5
  G: MEM(r2+4) = r4
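The choice of r5 for the explicit check can be reproduced with a small sketch of the first elimination rule only (a sentinel is unnecessary when another instruction in the home block uses the result). The `Instr` tuple and the `unprotected` function are hypothetical helpers, and the second rule about non-excepting instructions is deliberately left out.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    label: str
    dest: Optional[str]   # destination register, None for a store
    srcs: tuple           # source registers


def unprotected(home_block):
    """Labels of instructions whose result no later home-block instruction reads."""
    result = []
    for i, ins in enumerate(home_block):
        if ins.dest is None:
            continue                      # stores produce no register result
        used = False
        for later in home_block[i + 1:]:
            if ins.dest in later.srcs:
                used = True               # a later use can act as the sentinel
                break
            if later.dest == ins.dest:
                break                     # overwritten before any use
        if not used:
            result.append(ins.label)
    return result


# Home block of the example: the instructions that originally follow branch B.
home_block = [
    Instr("C", "r1", ("r2",)),        # r1 = MEM(r2+0)
    Instr("D", "r3", ("r2",)),        # r3 = MEM(r2+4)
    Instr("E", "r4", ("r3",)),        # r4 = r3+1
    Instr("F", "r5", ("r1",)),        # r5 = r1+1
    Instr("G", None, ("r2", "r4")),   # MEM(r2+r4) = r4
]
print(unprotected(home_block))  # ['F']
```

C, D and E are each consumed later in the block, so only F needs an explicit sentinel, which matches the inserted `check r5`.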
Architectural Support • Additional bit in opcode field to specify speculative instruction. • can be partially supported by adding speculative version of all opcodes that should be considered for speculative scheduling and that can directly or indirectly cause exceptions. • Exception bit (vector) added to each register to mark exceptions caused by a speculative instruction. • These bits need to be preserved across context switches.
Execution Model • Speculative instructions • src(I).except = 0 and I does not cause an exception: normal execution • src(I).except = 0 and I causes an exception: dest(I).except = 1, dest(I).data = PC of I • src(I).except = 1 (exception propagation): dest(I).except = 1, dest(I).data = src(I).data
Execution Model • Non-speculative instructions • src(I).except = 0 and I does not cause an exception: normal execution • src(I).except = 0 and I causes an exception: I reported as source of exception • src(I).except = 1 (report exception for speculative instruction): signal exception, src(I).data is PC of exception
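The two execution-model slides can be read as one small state machine. The following is a sketch under stated assumptions, not the hardware definition: `Reg`, `ReportedException`, and `execute` are hypothetical names, the PC values 3, 6 and 8 are arbitrary, and a Python `KeyError` stands in for a memory fault.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    data: int = 0
    exc: bool = False   # per-register exception bit ("except" is a Python keyword)


class ReportedException(Exception):
    """Signalled by a non-speculative instruction; pc names the faulting instruction."""

    def __init__(self, pc):
        super().__init__(f"exception at pc {pc}")
        self.pc = pc


def execute(pc, op, srcs, dest, speculative):
    """Apply one instruction following the speculative / non-speculative rules."""
    tagged = next((s for s in srcs if s.exc), None)
    if tagged is not None:                      # src(I).except = 1
        if not speculative:
            raise ReportedException(tagged.data)
        dest.data, dest.exc = tagged.data, True  # propagate the faulting PC
        return
    try:
        result = op(*(s.data for s in srcs))
    except (ArithmeticError, LookupError):      # I itself causes an exception
        if not speculative:
            raise ReportedException(pc) from None
        dest.data, dest.exc = pc, True
        return
    dest.data, dest.exc = result, False         # normal execution


memory = {0: 7}                                  # address 100 is unmapped
r1, r2, r5 = Reg(), Reg(data=100), Reg()
execute(3, lambda a: memory[a + 0], [r2], r1, speculative=True)  # C: r1 = MEM(r2+0)
execute(6, lambda a: a + 1, [r1], r5, speculative=True)          # F: r5 = r1+1
try:
    execute(8, lambda a: a, [r5], Reg(), speculative=False)      # H: check r5
    reported_pc = None
except ReportedException as err:
    reported_pc = err.pc
print(reported_pc)  # 3
```

The faulting load at PC 3 is only reported when the non-speculative check reads r5, and the reported PC is that of the load, not of the check.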
Scheduling Algorithm • Identify unprotected instructions • Perform conventional scheduling • if an unprotected instruction is moved above a branch, an explicit sentinel instruction is inserted into list of to-be-scheduled instructions • Explicit sentinel restricted to remain in I's home block with control dependences • All instructions moved above a branch are marked as speculative
Recovery from Exception • Important to allow accurate handling of page faults and TLB misses. • Issues: • ensure that instructions can be retried after the exception condition is handled • minimize the negative performance impact in terms of register pressure and instruction count due to recovery.
Recovery Block • Copy speculative instructions into recovery blocks • One entrance point per potential exception reported by a sentinel • Code Expansion vs. Efficiency • must provide a means to reach recovery block - explicit checks • Source registers of the instructions not in the recovery blocks are not preserved. • Instructions re-executed during recovery are reduced.
Recovery Block Example
Original code:
  A: r6 = r4+1
  B: if (r9==0) goto L1
  C: r1 = MEM(r2+0)
  D: r3 = MEM(r2+4)
  E: r4 = r3+1
  F: r5 = r1+1
  G: MEM(r2+r4) = r4
Scheduled code with checks that branch to recovery blocks:
  C: r1 = MEM(r2+0)
  A: r6 = r4+1
  D: r3 = MEM(r2+4)
  E: r4 = r3+1
  F: r5 = r1+1
  B: if (r9==0) goto L1
  H: check r5, L2
  I: check r4, L3
  G: MEM(r2+4) = r4
Recovery Block Example
Recovery block for C:
  L2: C: r1 = MEM(r2+0)
      F: r5 = r1+1
Recovery block for D:
  L3: D: r3 = MEM(r2+4)
      E: r4 = r3+1
      G: MEM(r2+4) = r4
Multiple Exceptions • Different basic blocks • first sequential exception always reported since check instruction guaranteed to remain in home block of each potential trap-causing instruction • Same basic block • An exception will be signaled but no guarantee it will be the first according to original source code
Outline • History and Background • Control Speculation • Predication • IMPACT EPIC Architecture • Compiler Technology • Outlook
Predicated Execution • Conditional execution of instructions based on a Boolean source operand • Execution model • Load r1, r2, r3 <p1> • If p1 is TRUE, instruction executes normally • If p1 is FALSE, instruction treated as NOP (with some exceptions)
Full Predication Support • Predicate defining instructions • Full set of predicated instructions • Separate predicate register file • Best performance • Cydra-5, IA-64, TI-C60, StarCore
Partial Predication Support • Adds limited set of predicated instructions to existing ISA • no extension to operand format • CMOV • Brings some performance increase to existing ISAs • SPARC, Alpha, MIPS, P6
HP-PD Predicate Defines • pred<cmp> dest<type>, src1, src2 (Pin) • <cmp> - condition: =, >, <, etc. • <type> • Unconditional (U, Ū) • OR-type (O, Ō) • AND-type (A, Ā) • An overbar marks the complement form, which acts on the negated comparison result
Unconditional Predicate Defines • For blocks reached on one condition
Source:
  if (a < 10) c = c+1;
  else if (b > 20) d = d+1;
  else e = e+1;
Branching code:
      bge a, 10, L1
      add c, c, 1
      jmp L3
  L1: ble b, 20, L2
      add d, d, 1
      jmp L3
  L2: add e, e, 1
  L3:
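Unconditional Predicate Define
Predicated code (Ū is the complement form of U; the guarding predicate follows each instruction in parentheses):
  pred p1(U), p2(Ū), a >= 10
  add c, c, 1 (p2)
  pred p3(U), p4(Ū), b <= 20 (p1)
  add d, d, 1 (p4)
  add e, e, 1 (p3)
[Figure: control-flow graph of the branching code (bge a, 10, L1; ble b, 20, L2) with the blocks add c, add d and add e annotated by the predicate outputs (Pout) that guard them]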
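As a sanity check on the if-conversion, the predicated sequence can be compared against the original if/else chain. This sketch assumes the usual unconditional-define behavior (both destinations are cleared when the input predicate is false, otherwise one gets the comparison result and the other its complement); `pred_u`, `branching`, and `predicated` are hypothetical helper names, and a Python `if p:` models an instruction guarded by predicate `p`.

```python
def pred_u(cond, pin=True):
    """Unconditional define: returns the (U, complement-U) destination values."""
    if not pin:
        return False, False      # a false input predicate clears both outputs
    return cond, not cond


def branching(a, b, c, d, e):
    """Reference behavior: the original if / else-if / else chain."""
    if a < 10:
        c += 1
    elif b > 20:
        d += 1
    else:
        e += 1
    return c, d, e


def predicated(a, b, c, d, e):
    """Straight-line version: every instruction is fetched, predicates decide effect."""
    p1, p2 = pred_u(a >= 10)             # pred p1(U), p2(U-bar), a >= 10
    if p2:
        c += 1                           # add c, c, 1 (p2)
    p3, p4 = pred_u(b <= 20, pin=p1)     # pred p3(U), p4(U-bar), b <= 20 (p1)
    if p4:
        d += 1                           # add d, d, 1 (p4)
    if p3:
        e += 1                           # add e, e, 1 (p3)
    return c, d, e


equivalent = all(
    branching(a, b, 0, 0, 0) == predicated(a, b, 0, 0, 0)
    for a in range(0, 20)
    for b in range(10, 30)
)
print(equivalent)  # True
```

Guarding the second define with p1 is what keeps p3 and p4 false on the a < 10 path, so exactly one of the three adds takes effect.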
Or Predicate Defines • For blocks reached on multiple conditions
Source:
  if (a && b) d = d+1;
  else e = e+1;
Branching code:
      beq a, 0, L1
      beq b, 0, L1
      add d, d, 1
      jmp L2
  L1: add e, e, 1
  L2:
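Or-type Predicate Define
Predicated code (Ū is the complement form of U; the guarding predicate follows each instruction in parentheses):
  pred_clr p1
  pred p1(O), p2(Ū), a = 0
  pred p1(O), p3(Ū), b = 0 (p2)
  add d, d, 1 (p3)
  add e, e, 1 (p1)
[Figure: control-flow graph of the branching code, with the edges into L1 (add e, e, 1) annotated by the OR-type predicate output (Pout)]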
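And-type Predicate Define
Predicated code (Ā is the complement form of A; the guarding predicate follows each instruction in parentheses):
  pred_clr p1
  pred_set p3
  pred p1(O), p3(Ā), a = 0
  pred p1(O), p3(Ā), b = 0
  add d, d, 1 (p3)
  add e, e, 1 (p1)
[Figure: control-flow graph of the branching code, with the fall-through path to add d, d, 1 annotated by the AND-type predicate output (Pout)]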
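The OR-type and AND-type sequences can be checked the same way. The sketch assumes these semantics: an OR-type destination is set when the input predicate and the comparison are both true and is otherwise unchanged; a complement AND-type destination is cleared when the input predicate and the comparison are both true and is otherwise unchanged; a complement unconditional destination gets the negated comparison, or false under a false input predicate. All function names are hypothetical.

```python
def pred_or(dest, cond, pin=True):
    """OR-type: set when pin and cond hold, otherwise leave dest unchanged."""
    return True if (pin and cond) else dest


def pred_and_bar(dest, cond, pin=True):
    """Complement AND-type: clear when pin and cond hold, otherwise unchanged."""
    return False if (pin and cond) else dest


def pred_u_bar(cond, pin=True):
    """Complement unconditional: negated comparison, or False when pin is False."""
    return pin and not cond


def branching(a, b, d, e):
    """Reference: beq a,0,L1; beq b,0,L1; add d; jmp L2; L1: add e."""
    if a != 0 and b != 0:
        d += 1
    else:
        e += 1
    return d, e


def or_type(a, b, d, e):
    p1 = False                               # pred_clr p1
    p2 = pred_u_bar(a == 0)                  # pred p1(O), p2(U-bar), a = 0
    p1 = pred_or(p1, a == 0)
    p3 = pred_u_bar(b == 0, pin=p2)          # pred p1(O), p3(U-bar), b = 0 (p2)
    p1 = pred_or(p1, b == 0, pin=p2)
    return d + (1 if p3 else 0), e + (1 if p1 else 0)


def and_type(a, b, d, e):
    p1, p3 = False, True                     # pred_clr p1, pred_set p3
    p1 = pred_or(p1, a == 0)                 # pred p1(O), p3(A-bar), a = 0
    p3 = pred_and_bar(p3, a == 0)
    p1 = pred_or(p1, b == 0)                 # pred p1(O), p3(A-bar), b = 0
    p3 = pred_and_bar(p3, b == 0)
    return d + (1 if p3 else 0), e + (1 if p1 else 0)


values = range(-2, 3)
all_match = all(
    branching(a, b, 0, 0) == or_type(a, b, 0, 0) == and_type(a, b, 0, 0)
    for a in values
    for b in values
)
print(all_match)  # True
```

In the AND-type version neither define depends on the other's output, so the two comparisons can issue in the same cycle; the OR-type version serializes them through p2.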
Outline • History and Background • Control Speculation • Predication • IMPACT EPIC Architecture • Compiler Technology • Outlook
IMPACT EPIC Architecture • Predication • base model is HP-PD [Schlansker, Rau, Kathail] • added implicit predicate pR to facilitate speculation • prefix alternative for code size control [EuroPar-99] • added new conjunctive and disjunctive types to facilitate minimization of program decision logic • moving towards implementation-neutral predication • Control Speculation • based on Sentinel model [ASPLOS-92] • added R-Tags (in addition to E-Tags) and pR (implicit recovery predicate) to enable inline recovery
IMPACT EPIC Architecture
[Figure: block diagram. Register File entries: Value/PC, E-Tag, R-Tag. Predicate Register File entries: T/F, E-Tag, R-Tag, plus pR. Instructions: S OPERATION Pred; S DS LOAD Pred; DS CHECK Pred. Other blocks: Memory Conflict Buffer; Register Tag and Attribute]
Control Speculative Execution • Speculative instruction causes an exception • write current PC into destination register • set E-Tag in destination register • Speculative instruction propagates an exception • a source register with set E-Tag • Propagate PC from source to destination register • set E-Tag in destination register • Non-speculative instruction detects exceptions • a source register with set E-Tag
Integrated Predication and Control Speculation • All of the following must be true for a predicated instruction to take effect • input predicate true • input predicate E-Tag false • either • pR false, or • R-Tag of at least one input register true
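Read literally, the conditions above form a single Boolean function. The sketch below only restates the slide's rule; the function name `takes_effect` and its parameter names are hypothetical.

```python
def takes_effect(pred_value, pred_e_tag, pR, input_r_tags):
    """True when a predicated instruction is allowed to take effect.

    pred_value   -- value of the input predicate
    pred_e_tag   -- E-Tag of the input predicate
    pR           -- implicit recovery predicate (set in recovery mode)
    input_r_tags -- R-Tags of the instruction's input registers
    """
    return pred_value and not pred_e_tag and (not pR or any(input_r_tags))


# Normal mode: only the predicate and its E-Tag matter.
print(takes_effect(True, False, False, [False, False]))  # True
# Recovery mode: only instructions fed by a recovered value re-execute.
print(takes_effect(True, False, True, [False, False]))   # False
print(takes_effect(True, False, True, [False, True]))    # True
```

The third clause is what makes recovery inline: with pR set, the same instruction stream is replayed, but only the instructions that depend on the re-executed result have any effect.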
Speculation Example • Speculative (affected by exception) • speculative (not affected) • Non-speculative • branch • check (non-speculative use)
Inline Recovery Model • Processor enters recovery mode, set pR • PC in source register used as recovery PC • The speculative instruction at recovery PC is executed non-speculatively. • Exception processing is performed. • If exception is non-terminating, the result is stored into destination register, set R-Tag. • Instructions with R-Tag set in source registers are executed, set R-Tag in destination register