Static ILP: Static (Compiler-Based) Scheduling
Notes from UW-Madison. Read ch. 4 of the textbook, and the paper on Itanium on the web page.
Today's Theme and Contents
• Let the compiler uncover the ILP
• Objective: more ILP, simpler hardware, faster clock, less power
• How:
• Static scheduling
• Loop unrolling
• Software pipelining
• Static multiple issue: VLIW
• Local and global scheduling
• Static branch prediction
• Software speculation: trace scheduling, superblocks
• Nops, lockstep
• Conditional moves, predication
• Speculative loads
• IA-64 and Itanium
Basic Idea
• The compiler moves dependent instructions apart to avoid hazards
• This means:
• independent instructions to place between them must exist (if not, apply transformations to create them)
• the compiler knows implementation details: latency AND superscalarity (issue width)
• What happens if the implementation changes?
• Static ILP techniques apply to both statically and dynamically scheduled processors
• Statically scheduled processors: the compiler dictates which instructions can execute together (scheduling is done in software)
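As a rough illustration of "employ transformations", here is a minimal sketch of loop unrolling on an array sum. The function names `sum_rolled` and `sum_unrolled4` and the unroll factor of 4 are assumptions made for this example, not something taken from the notes.

```c
#include <stddef.h>

/* Original loop: every addition depends on the previous one through s,
   so there is nothing independent to schedule between them. */
long sum_rolled(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4 (hypothetical factor) with four separate accumulators.
   The four additions in the body are mutually independent, which gives
   a static scheduler instructions it can place between dependent ones. */
long sum_unrolled4(const int *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Cleanup loop for the iterations left over when n is not a multiple of 4. */
    for (; i < n; i++)
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same value for integer data. How much the unrolled form helps depends on the latency and issue width of the target, which is exactly the implementation knowledge the compiler needs.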
How? Static prediction: profile, frequency, path. Which is better: the above, or dynamic prediction?
Superblocking: overcomes some of the complexities of trace scheduling (single entry vs. multiple entries)
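Predicated Execution & Conditional Moves
Convert control dependences to data dependences.
Source: if (a == 0) s = t; with a in R1, s in R2, t in R3
With a branch: bnez R1,L; addu R2,R3,R0; L:
With a conditional move: cmovz R2,R3,R1
Doing the above for all instruction types is called predication. Advantages and disadvantages?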
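The same conversion can be sketched in C. This is only an illustration under assumptions: the names `select_branch` and `select_nobranch` are made up here, and the mask arithmetic stands in for what a `cmovz` does in hardware. A real compiler may or may not emit a conditional move for either form.

```c
/* Control dependence: whether the assignment executes depends on the branch. */
int select_branch(int a, int s, int t) {
    if (a == 0)
        s = t;
    return s;
}

/* Data dependence: the result is computed from a, s and t with no branch.
   (a == 0) evaluates to 1 or 0, so mask is all ones or all zeros. */
int select_nobranch(int a, int s, int t) {
    int mask = -(a == 0);
    return (t & mask) | (s & ~mask);
}
```

In the second version both `s` and `t` are always read, which hints at one cost of predication: work is done on both paths even though only one result is kept.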
Speculative Loads
Loads bypass earlier stores speculatively; repair code runs in case of misspeculation.
Use an address buffer:
1. Lookup table: updated with the address of each speculative load
2. Updated with the addresses of intervening stores
3. A check instruction verifies that no store conflicted and releases the entry
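The three steps above can be mimicked in software. The following is a minimal sketch, assuming a small fixed-size table and exact address matching; the names `SpecTable`, `spec_load`, `store_word` and `check_and_release` are hypothetical and do not describe any real hardware interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8  /* assumed capacity, chosen only for the sketch */

typedef struct {
    uintptr_t addr;   /* address read by the speculative load */
    bool in_use;      /* entry is allocated */
    bool conflicted;  /* an intervening store hit this address */
} Entry;

typedef struct { Entry e[TABLE_SIZE]; } SpecTable;

void table_init(SpecTable *t) {
    for (size_t i = 0; i < TABLE_SIZE; i++)
        t->e[i].in_use = false;
}

/* Step 1: perform the load early and record its address.
   Returns the slot index, or -1 if the table is full. */
int spec_load(SpecTable *t, const int *p, int *out) {
    *out = *p;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (!t->e[i].in_use) {
            t->e[i].addr = (uintptr_t)p;
            t->e[i].in_use = true;
            t->e[i].conflicted = false;
            return i;
        }
    }
    return -1;
}

/* Step 2: every store compares its address against the table. */
void store_word(SpecTable *t, int *p, int v) {
    *p = v;
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (t->e[i].in_use && t->e[i].addr == (uintptr_t)p)
            t->e[i].conflicted = true;
}

/* Step 3: the check releases the entry and reports whether the early
   value is still valid. A false result means repair code must reload. */
bool check_and_release(SpecTable *t, int slot) {
    if (slot < 0)
        return false;  /* no entry was recorded: treat as misspeculation */
    bool ok = !t->e[slot].conflicted;
    t->e[slot].in_use = false;
    return ok;
}
```

A caller would issue `spec_load` above the stores it wants to bypass, then call `check_and_release` at the load's original position and reload the value if it returns false.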
Let the compiler do the work
• All
• Most of it
• As long as it improves performance
• …
"Itanium Processor Microarchitecture" by Harsh Sharangpani and Ken Arora (see the web page)
EPIC Conceptual View
Idea: the compiler has a larger instruction window than the hardware. Communicate to the hardware more of the information gleaned at compile time.
Hardware Pipeline
Six instructions wide and ten stages deep. Tries to minimize the latency of the most frequent operations. Hardware support for compile-time indeterminacies.
Front End
• Software-initiated prefetch (requests filtered by the instruction cache)
• The prefetch must be issued 12 cycles before the branch to hide latency
• L2 -> streaming buffer -> instruction cache
• Four-level branch predictor hierarchy to prevent a 9-cycle pipeline stall
• Decoupling buffer holds up to 8 bundles of code (bundle?)
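On the "bundle?" question: in IA-64 a bundle is 128 bits, holding a 5-bit template and three 41-bit instruction slots. The sketch below unpacks those fields from two 64-bit halves. The type and function names (`Bundle`, `Decoded`, `decode_bundle`) are made up for illustration, and the code only extracts the fields; it does not interpret them.

```c
#include <stdint.h>

/* A 128-bit bundle as two halves: lo holds bits 0..63, hi holds bits 64..127. */
typedef struct { uint64_t lo, hi; } Bundle;

typedef struct {
    unsigned template_field;  /* bits 0..4 */
    uint64_t slot[3];         /* three 41-bit instruction slots */
} Decoded;

#define SLOT_MASK ((UINT64_C(1) << 41) - 1)

Decoded decode_bundle(Bundle b) {
    Decoded d;
    d.template_field = (unsigned)(b.lo & 0x1F);
    /* Slot 0: bits 5..45, entirely in the low half. */
    d.slot[0] = (b.lo >> 5) & SLOT_MASK;
    /* Slot 1: bits 46..86, 18 bits from the low half and 23 from the high half. */
    d.slot[1] = ((b.lo >> 46) | (b.hi << 18)) & SLOT_MASK;
    /* Slot 2: bits 87..127, entirely in the high half. */
    d.slot[2] = (b.hi >> 23) & SLOT_MASK;
    return d;
}
```

With three instructions per bundle, a decoupling buffer of 8 bundles corresponds to 24 instruction slots.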
Conclusion/Future
• The compiler can do a lot of the work but needs hardware assistance
• Currently in pursuit of the best of both worlds
• Future:
• How long will IA-32 last, and will IA-64 take over the IA-32 market?
• Will IA-64 be the only ISA in the world?