IMPACT Second Generation EPIC Architecture
Wen-mei Hwu
Department of Electrical and Computer Engineering
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
IMPACT Compiler Group
http://www.crhc.uiuc.edu/IMPACT/

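Vision: Bridging the Gap Between Programs and Hardware
    if (x>=0)
        if (x==1 || x==2 || x==3) m=f(x);
        else m=g(x);
[Figure: two renderings of this code. Hardware: comparators on x (=, =, =, >=) against 1, 2, 3, 0, a combining gate (+), a two-input selection between f and g, and an output m with an enable. Program: a control flow graph with T/F edges that tests x>=0, x!=1, x!=2, x!=3 in sequence before reaching m=g(x) or m=f(x).]
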
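The contrast on this slide can be sketched in a few lines of Python. This is only an illustration: `f` and `g` are hypothetical stand-ins for the slide's functions, and `hardware_style` mimics the hardware view by evaluating every comparison and both functions unconditionally, then selecting a result and gating the write with an enable.

```python
def f(x):
    """Hypothetical stand-in for the slide's f."""
    return x * 10


def g(x):
    """Hypothetical stand-in for the slide's g."""
    return -x


def program_style(x, m):
    """Sequential form from the slide; m keeps its old value when x < 0."""
    if x >= 0:
        if x == 1 or x == 2 or x == 3:
            m = f(x)
        else:
            m = g(x)
    return m


def hardware_style(x, m):
    """Evaluate all comparisons and both functions up front, then select."""
    eq1, eq2, eq3, ge0 = (x == 1), (x == 2), (x == 3), (x >= 0)
    f_out, g_out = f(x), g(x)          # both results computed speculatively
    select_f = eq1 or eq2 or eq3       # combining gate
    selected = f_out if select_f else g_out
    return selected if ge0 else m      # ge0 acts as the write enable for m


for x in range(-2, 6):
    print(x, program_style(x, 99), hardware_style(x, 99))
```

Both functions return the same value for every input; the difference is that the second form has no control dependences between the comparisons.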
Hardware
• highly speculative
• parallel in nature
• efficient logic manipulation
• special purpose
• area efficient
• energy efficient
Programming
• conservative semantics
• sequential in nature
• awkward logic manipulation
• easily retargeted
• area inefficient
• energy inefficient
Can we get the best of both worlds?

EPIC Design Objectives
• To define a programmable architecture model that allows compiled programs to approach special hardware design in
    • logic manipulation capability
    • speculation and parallelism
    • chip area efficiency
    • energy efficiency

EPIC - the IMPACT Perspective
• IMPACT work done since 1987 to lay foundation for EPIC architectures
    • Intel/HP IA-64, Motorola/Lucent StarCore
• Key Technologies
    • control speculation [ISCA-91] [ASPLOS-92] [MICRO-96]
    • data (dependence) speculation [ICS-92] [ASPLOS-94]
    • predicated execution [MICRO-92] [ISCA-95] [MICRO-97]
    • integrated architecture and inline recovery [ISCA-98]
    • logic minimization approach to predication [ISCA-99]
    • implementation neutral predication architecture [TBD]

Focus of this Talk
• Inline recovery of speculative exceptions in the IMPACT second generation EPIC architecture
    • minimal explicit checks
    • no recovery blocks
    • minimal additional hardware requirement
• Other aspects NOT covered: new predication architecture, new integrated architecture for speculation and predication

Control Speculation
• Executing an instruction before knowing that its execution is required
• Moving an instruction above a branch or a predicate-defining instruction
• Removes control dependences to increase ILP
• Code seen by hardware is changed!
• Must ensure that the execution result is unaffected by such movement

Control Speculation Example
Before speculation:
    A: r6 = r6+1
    B: if (r9==0) goto L1
    C: r1 = MEM(r2+0)
    D: r3 = MEM(r2+4)
    E: r4 = r3+1
    F: r5 = r1+1
    G: MEM(r2+4) = r4
After speculation (C, D, E, F moved above branch B and marked (s)):
    C: (s) r1 = MEM(r2+0)
    D: (s) r3 = MEM(r2+4)
    E: (s) r4 = r3+1
    F: (s) r5 = r1+1
    A: r6 = r6+1
    B: if (r9==0) goto L1
    G: MEM(r2+4) = r4

Speculative Exceptions and Events
• C and D are speculative instructions that can potentially cause exceptions and events
    • program exceptions: bus errors, segmentation faults, divide by 0, …
    • transparent events: page faults, TLB misses, cache misses
    C: (s) r1 = MEM(r2+0)
    D: (s) r3 = MEM(r2+4)
    E: (s) r4 = r3+1
    F: (s) r5 = r1+1
    A: r6 = r6+1
    B: if (r9==0) goto L1
    G: MEM(r2+4) = r4

Spurious Data Cache Misses and Exceptions
• Spurious cache misses, TLB misses, and page faults are frequent in speculated code. Failing to suppress them can have a detrimental effect on performance.

Sentinel Speculation
• Design Objective
    • Ignore exceptions and events caused by speculative instructions whose execution proves to be unnecessary.
    • Support accurate reporting and recovery from exceptions generated by speculative instructions whose execution is confirmed.
    • Provide the option to handle speculative transparent events after the need for handling them is confirmed.
    • Minimize the extra hardware and instructions incurred.

Sentinels and Protected Instructions
• A speculated excepting instruction I is protected if there is a non-speculative instruction in I's home block which directly or indirectly uses the result of I
• The non-speculative use instruction is defined as the sentinel for I
• All speculative excepting instructions must be protected to guarantee exception recovery and delayed event handling

Sentinel Speculation Example
• G is the sentinel for D
    • G indirectly uses D's result through E
• H is the sentinel for C
    • H indirectly uses C's result through F
    • H is explicitly created to make C protected
    C: (s) r1 = MEM(r2+0)
    D: (s) r3 = MEM(r2+4)
    E: (s) r4 = r3+1
    F: (s) r5 = r1+1
    A: r6 = r6+1
    B: if (r9==0) goto L1
    H: use r5
    G: MEM(r2+4) = r4

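The sentinel definition can be checked mechanically. The sketch below is one possible way to do it, not the IMPACT compiler's algorithm: the `Instr` tuple, `find_sentinel`, and the explicit `home_block` label set are illustrative names. It follows a speculative result through later speculative instructions until a non-speculative instruction in the home block uses it.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    label: str
    speculative: bool
    dest: Optional[str]
    srcs: tuple


def find_sentinel(schedule, target, home_block):
    """Return the label of the sentinel for `target`, or None if unprotected."""
    idx = next(i for i, ins in enumerate(schedule) if ins.label == target)
    tainted = {schedule[idx].dest}          # registers derived from target's result
    for ins in schedule[idx + 1:]:
        if any(s in tainted for s in ins.srcs):
            if not ins.speculative and ins.label in home_block:
                return ins.label            # non-speculative use: the sentinel
            if ins.dest is not None:
                tainted.add(ins.dest)       # indirect use continues the chain
        elif ins.dest in tainted:
            tainted.discard(ins.dest)       # overwritten by an unrelated value
    return None


schedule = [
    Instr("C", True, "r1", ("r2",)),
    Instr("D", True, "r3", ("r2",)),
    Instr("E", True, "r4", ("r3",)),
    Instr("F", True, "r5", ("r1",)),
    Instr("A", False, "r6", ("r6",)),
    Instr("B", False, None, ("r9",)),
    Instr("H", False, None, ("r5",)),
    Instr("G", False, None, ("r2", "r4")),
]
home = {"H", "G"}                           # instructions below branch B

print(find_sentinel(schedule, "C", home))   # H
print(find_sentinel(schedule, "D", home))   # G
```

Dropping H from the schedule makes `find_sentinel` return `None` for C, which is why the example creates H explicitly.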
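IMPACT EPIC Architecture
[Figure: architecture block diagram. Labels recoverable from the slide: Register File with entries holding Value/PC, E-Tag, and R-Tag (Register Tag and Attribute); Predicate Register File with entries holding T/F, E-Tag, and R-Tag; instruction formats "S OPERATION Pred", "S DS LOAD Pred", and "DS CHECK Pred"; Memory Conflict Buffer; pR.]
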
Speculation Example
    C: (s) r1 = MEM(r2+0)
    D: (s) r3 = MEM(r2+4)
    E: (s) r4 = r3+1
    F: (s) r5 = r1+1
    A: r6 = r6+1
    B: if (r9==0) goto L1
    H: check r5
    G: MEM(r2+4) = r4
Legend:
• speculative (affected by exception)
• speculative (not affected)
• non-speculative (not affected)
• branch
• check (non-speculative use) that detects exception

Speculative Instruction Execution
• If src(I).E-Tag = 0
    • I does not cause an exception: normal execution
    • I causes an exception:
        • dest(I).E-Tag = 1
        • dest(I).data = pc of I
• If src(I).E-Tag = 1 (exception propagation)
    • dest(I).E-Tag = 1
    • dest(I).data = src(I).data

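As a minimal sketch of these rules, assuming a register modeled as a data word plus an E-Tag bit: `Reg`, `Fault`, `exec_speculative`, and the pc values are hypothetical names and numbers chosen for illustration. The slide does not say which source wins when several carry an E-Tag; the sketch simply takes the first.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    data: int = 0
    e_tag: bool = False   # when set, data holds the pc of the faulting instruction


class Fault(Exception):
    """Hypothetical stand-in for any exception raised by an operation."""


def exec_speculative(pc, op, srcs, dest):
    """Apply the speculative-execution rule to one instruction."""
    for s in srcs:
        if s.e_tag:                      # exception propagation
            dest.e_tag = True
            dest.data = s.data           # forward pc of the original fault
            return
    try:
        dest.data = op(*(s.data for s in srcs))
        dest.e_tag = False               # normal execution
    except Fault:
        dest.e_tag = True                # defer the exception
        dest.data = pc                   # remember where it happened


def bus_error(addr):
    raise Fault("bus error")


r2, r3, r4 = Reg(data=0x1000), Reg(), Reg()
exec_speculative(0x104, bus_error, [r2], r3)           # D: (s) r3 = MEM(r2+4)
exec_speculative(0x108, lambda v: v + 1, [r3], r4)     # E: (s) r4 = r3+1
print(r3, r4)
```

After both calls, `r3` and `r4` carry a set E-Tag and the pc of D, and no exception has been reported yet.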
Speculation Example
    C: (s) r1 = MEM(r2+0)
    D: (s) r3 = MEM(r2+4)
    E: (s) r4 = r3+1
    F: (s) r5 = r1+1
    A: r6 = r6+1
    B: if (r9==0) goto L1
    H: check r5
    G: MEM(r2+4) = r4
• D causes a bus error
    • r3.E-Tag = 1
    • r3.data = pc of D
• E propagates exception
    • r4.E-Tag = 1
    • r4.data = r3.data = pc of D

Non-Speculative Instruction Execution
• If src(I).E-Tag = 0
    • I does not cause an exception: normal execution
    • I causes an exception: I is reported as the source of the exception
• If src(I).E-Tag = 1 (report exception for speculative instruction)
    • processor enters recovery mode
    • src(I).data is the PC of the exception

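The non-speculative rule can be sketched the same way. Here entering recovery mode is modeled as raising a hypothetical `EnterRecovery` exception that carries the recovery pc; `Reg`, `Fault`, `exec_non_speculative`, and the pc values are likewise illustrative assumptions rather than part of the architecture definition.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    data: int = 0
    e_tag: bool = False   # when set, data holds the pc of the faulting instruction


class Fault(Exception):
    """Hypothetical stand-in for an exception raised by an operation."""


class EnterRecovery(Exception):
    """Signals that the processor should enter recovery mode."""

    def __init__(self, recovery_pc):
        super().__init__(f"recover at pc {recovery_pc:#x}")
        self.recovery_pc = recovery_pc


def exec_non_speculative(pc, op, srcs):
    """Apply the non-speculative rule to one instruction."""
    for s in srcs:
        if s.e_tag:                          # deferred exception is now confirmed
            raise EnterRecovery(s.data)      # src data is the pc of the exception
    try:
        return op(*(s.data for s in srcs))   # normal execution
    except Fault as exc:
        # the instruction itself is reported as the source of the exception
        raise Fault(f"exception at pc {pc:#x}") from exc


r2 = Reg(data=0x1000)
r4 = Reg(data=0x104, e_tag=True)             # tagged with pc of D
r5 = Reg(data=7)

exec_non_speculative(0x118, lambda v: None, [r5])                 # H: check r5
try:
    exec_non_speculative(0x11C, lambda a, v: None, [r2, r4])      # G: store
except EnterRecovery as rec:
    print(hex(rec.recovery_pc))
```

H passes through unaffected, while G finds the E-Tag on `r4` and hands the pc of D to the recovery mechanism.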
Speculation Example
    C: (s) r1 = MEM(r2+0)
    D: (s) r3 = MEM(r2+4)
    E: (s) r4 = r3+1
    F: (s) r5 = r1+1
    A: r6 = r6+1
    B: if (r9==0) goto L1
    H: check r5
    G: MEM(r2+4) = r4
• H executes, not affected
    • r5.E-Tag = 0
    • r5.data = result of F
• G initiates recovery
    • r4.E-Tag = 1
    • r4.data = pc of D

Inline Recovery Model
• Processor enters recovery mode and sets pR
• PC in the source register is used as the recovery PC
• The speculative instruction at the recovery PC is executed non-speculatively.
• Exception processing is performed.
• If the exception is non-terminating, the result is stored into the destination register and its R-Tag is set.
• Instructions with R-Tag set in their source registers are executed and set R-Tag in their destination registers

Inline Recovery Example
    C: (s) r1 = MEM(r2+0)
    D: (s) r3 = MEM(r2+4)
    E: (s) r4 = r3+1
    F: (s) r5 = r1+1
    A: r6 = r6+1
    B: if (r9==0) goto L1
    H: check r5
    G: MEM(r2+4) = r4
• G triggers recovery with pc of D
• D re-executed non-speculatively
    • r3.R-Tag = 1
    • r3.data = result of D
• E re-executed
    • r4.R-Tag = 1
    • r4.data = result of E
• F, A skipped over

Inline Recovery Model (cont.)
• Non-speculative instructions are not repeated.
    • Stores, self-incrementing loads and stores, etc. are safe.
    • Same effect is achieved by recovery blocks.
    • Source registers of non-speculative instructions do not need to be preserved.

Inline Recovery Model (cont.)
• Branches and predicate defines are repeated
    • to reproduce the original control flow
    • their input conditions must be preserved
    • they are not to be speculated
• Recovery mode is turned off on reaching a check whose source R-Tag is set.

Inline Recovery Example
    C: (s) r1 = MEM(r2+0)
    D: (s) r3 = MEM(r2+4)
    E: (s) r4 = r3+1
    F: (s) r5 = r1+1
    A: r6 = r6+1
    B: if (r9==0) goto L1
    H: check r5
    G: MEM(r2+4) = r4
• B repeated
• H skipped over
• G re-executed, turns off recovery mode, R-Tag for all registers = 0

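The whole recovery walk-through can be replayed with a small simulator. This is a sketch under stated assumptions, not the IMPACT implementation: the pc is just an index into the instruction list, the memory contents and register values are made up, the fault on D is assumed to be recoverable and is "handled" simply by letting the non-speculative re-execution succeed, and the trace labels (`recover`, `redo`, `skip`, `repeat`) are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    data: int = 0
    e_tag: bool = False  # data holds the pc of a faulting speculative instruction
    r_tag: bool = False  # value was regenerated in recovery mode


class Fault(Exception):
    """Hypothetical recoverable fault raised by a speculative load."""


def simulate():
    mem = {0x1000: 10, 0x1004: 20}           # assumed memory contents
    faulting = {0x1004}                      # assumed: speculative access here faults
    R = {n: Reg() for n in ("r1", "r2", "r3", "r4", "r5", "r6", "r9")}
    R["r2"].data, R["r9"].data = 0x1000, 1   # r9 != 0, so branch B falls through
    trace = []

    def load(addr, speculative):
        if addr in faulting:
            if speculative:
                raise Fault(addr)
            faulting.discard(addr)           # stands in for exception processing
        return mem[addr]

    def store(v, _speculative):
        mem[v[0] + 4] = v[1]

    # (label, kind, dest, sources, fn(source values, speculative flag))
    prog = [
        ("C", "spec", "r1", ["r2"], lambda v, s: load(v[0] + 0, s)),
        ("D", "spec", "r3", ["r2"], lambda v, s: load(v[0] + 4, s)),
        ("E", "spec", "r4", ["r3"], lambda v, s: v[0] + 1),
        ("F", "spec", "r5", ["r1"], lambda v, s: v[0] + 1),
        ("A", "op", "r6", ["r6"], lambda v, s: v[0] + 1),
        ("B", "branch", None, ["r9"], lambda v, s: v[0] == 0),
        ("H", "op", None, ["r5"], lambda v, s: None),   # check r5
        ("G", "op", None, ["r2", "r4"], store),
    ]

    def exec_spec(i, speculative, r_tag=False):
        _, _, dest, srcs, fn = prog[i]
        src = [R[n] for n in srcs]
        tagged = [r for r in src if r.e_tag]
        d = R[dest]
        if tagged:                            # exception propagation
            d.data, d.e_tag = tagged[0].data, True
        else:
            try:
                d.data, d.e_tag = fn([r.data for r in src], speculative), False
            except Fault:
                d.data, d.e_tag = i, True     # record pc of faulting instruction
        d.r_tag = r_tag

    pc, recovery = 0, False
    while pc < len(prog):
        label, kind, dest, srcs, fn = prog[pc]
        src = [R[n] for n in srcs]
        vals = [r.data for r in src]
        if recovery:
            redo = any(r.r_tag for r in src)
            if kind == "spec":
                if redo:
                    exec_spec(pc, True, r_tag=True)
                trace.append(label + (" redo" if redo else " skip"))
                pc += 1
                continue
            if kind == "op" and not redo:     # non-speculative: not repeated
                trace.append(label + " skip")
                pc += 1
                continue
            if kind == "op":                  # check with R-Tag source reached
                recovery = False
                for r in R.values():
                    r.r_tag = False
        tagged = [r for r in src if r.e_tag]
        if kind == "spec":
            exec_spec(pc, True)
            trace.append(label)
        elif tagged:                          # E-Tag seen: enter recovery mode
            recovery = True
            pc = tagged[0].data               # recovery pc from source register
            exec_spec(pc, False, r_tag=True)  # re-execute non-speculatively
            trace.append(prog[pc][0] + " recover")
        elif kind == "branch":
            trace.append(label + (" repeat" if recovery else ""))
            if fn(vals, False):
                break                         # goto L1 (outside this sketch)
        else:
            result = fn(vals, False)
            if dest is not None:
                R[dest].data = result
            trace.append(label)
        pc += 1
    return R, mem, trace


R, mem, trace = simulate()
print(trace)
```

In this run, A is executed only once and G stores the recovered value of `r4`, matching the slide's sequence of D and E re-executed, F, A, and H skipped, and B repeated.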
Why both E-Tag and R-Tag?
    C: (s) r1 = MEM(r2+0)
    D: (s) r3 = MEM(r2+4)
    J: (s) r4 = MEM(r3+0)
    F: (s) r5 = r1+1
    A: r6 = r6+1
    B: if (r9==0) goto L1
    H: check r5
    G: MEM(r2+4) = r4
• Additional exceptions may occur in speculative instructions during recovery
• R-Tag designates a new value generated in recovery mode: execute the instruction
• E-Tag designates an exception: propagate the pc in the source register

Implementation Considerations
• R-Tag, E-Tag, and PC value can all be confined to the retirement stage of the processor
    • full register file manipulated at the retirement stage
    • no need to introduce PC values into the main data path
    • increased detection latency
    • no impact on size and clock rate of the main data path
• A prediction scheme is needed for selective handling of transparent events. Traditional branch prediction schemes could be used, but lower cost approaches exist [ISCA-99].

Conclusion
• Inline recovery lays the foundation for selective handling of exceptions and transparent events caused by speculative instructions
• Additional architectural cost: one bit per register file entry