Optimizing Compilers, CISC 673, Spring 2011: Global Instruction Scheduling. John Cavazos (Ben Perry), University of Delaware
Overview • Introduction • Pipelining • Instruction Pipeline • Pipeline Execution • Constraints and Dependences
Current Processors • Can execute several operations in a single cycle • “How fast can a program run on a processor with instruction-level parallelism?” • Potential parallelism in the program • Available parallelism on the processor • Ability to parallelize a sequential program • Find best schedule given constraints
Best targets • Programs whose operations are completely dependent on one another offer nothing to schedule; for them the work lies in analyzing the constraints, not in scheduling. • Numeric applications with large aggregate data structures are good targets.
Pipelines • Instruction pipelines are found in every processor • Instructions go through multiple steps in the pipeline from fetch to write-back • Fetch, decode, execute, access memory, write result • Pipelined execution: a new instruction can be fetched while the current instruction is still being processed • Each step in the pipeline takes one clock cycle
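A minimal sketch of why pipelining pays off, assuming the five stages above, one cycle per stage, and no stalls (the stage names and counts here are illustrative, not from the slides):

    STAGES = ["fetch", "decode", "execute", "memory", "write"]

    def unpipelined_cycles(n):
        # Each instruction drains the whole pipeline before the next starts.
        return n * len(STAGES)

    def pipelined_cycles(n):
        # A new instruction is fetched each cycle while earlier ones advance,
        # so after the pipeline fills, one instruction completes per cycle.
        return n + len(STAGES) - 1

    print(unpipelined_cycles(100))  # 500 cycles
    print(pipelined_cycles(100))    # 104 cycles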
Pipelines – Speculative Computing • Fetch the next instruction even though a branch may skip it (speculation) • On a taken branch, the pipeline is flushed and the branch target must be fetched (a delay) • Hardware can predict which way a branch will go, but it may be wrong
Pipeline Execution • Execution of an instruction is pipelined if succeeding instructions that do not depend on its result are allowed to proceed. • Hardware can often detect dependences (superscalar machines) and stall execution until an operand is available
Pipeline Execution • Some processors (those in an Android phone, perhaps) leave the grouping of parallel operations to the compiler. • Very long instruction words (VLIW), created by the compiler, indicate a batch of instructions to execute in parallel. • Advanced hardware schedulers can issue instructions out of order; where the hardware is limited, this scheduling is best done in software by the compiler.
Code-scheduling Constraints • Control dependence – all operations executed in the original program must be executed • Data dependence – the schedule must produce the same results as the original • Resource constraints – the schedule must not oversubscribe the machine's finite resources (functional units, registers)
Data dependence • X = 5; Y = 6 • Obviously, we can reorder these operations. • X = 5; Y = X • Obviously, we cannot reorder these.
Data dependence • RAW – Read after write. True dependence. • If a write is followed by a read of the same location, the read depends on the value written and cannot be moved above it. • WAR – Write after read. Anti-dependence. • If a read is followed by a write of the same location, reordering them would make the read see the wrong (too-new) value.
Dependence • WAW – Write after write. Output dependence. • If two writes to the same location are reordered, the location ends up holding the wrong final value. • WAR and WAW can be eliminated by using different locations to store the different values (renaming); see the sketch below.
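A small illustration of the three cases, and of how renaming removes the false dependences (the variable names are mine, not from the slides):

    x = 5          # write
    y = x + 1      # RAW: the read of x must stay after the write above
    x = 7          # WAR with the read on the previous line, WAW with x = 5

    # Renaming gives each write its own location, so only RAW remains:
    x1 = 5
    y  = x1 + 1    # true dependence -- cannot be renamed away
    x2 = 7         # independent of x1 and y; the writes may now be reordered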
Finding dependences • Compiler: GUILTY until proven innocent! (Always assume two operations may refer to the same location, and prove otherwise.) • The accesses *p and *(p + 10) cannot possibly refer to the same location. • Array data-dependence analysis: • for i = 0 to n: a[2i] = a[2i + 1] • The writes touch only even indices and the reads only odd ones, so the loop carries no dependence through the array.
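One standard way to prove such facts is the GCD test (my choice of test; the slide does not name one): a dependence between accesses a[a1*i + b1] and a[a2*j + b2] requires gcd(a1, a2) to divide b2 - b1. A sketch:

    from math import gcd

    def gcd_test(a1, b1, a2, b2):
        # Necessary condition for a1*i + b1 == a2*j + b2 to have an
        # integer solution: gcd(a1, a2) divides b2 - b1.
        return (b2 - b1) % gcd(a1, a2) == 0

    # a[2i] = a[2i+1]: writes index 2i + 0, reads index 2j + 1.
    # gcd(2, 2) = 2 does not divide 1, so no dependence is possible.
    print(gcd_test(2, 0, 2, 1))  # False

Note the test is only necessary, not sufficient: passing it fails to rule a dependence out, but does not prove one exists.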
Finding dependences • Pointer alias analysis • Two pointers are aliased if they refer to the same object – a difficult problem. • Interprocedural analysis • Needed when parameters are passed by reference, or when globals are accessed.
Register allocation
LD temporary_register1, a
ST b, temporary_register1
LD temporary_register2, c
ST d, temporary_register2
• Two RAW dependences, but the two load/store pairs can be reordered with respect to each other. • If temporary_registers 1 and 2 get mapped to the same physical register, register allocation creates another dependence between the pairs.
Control dependence • All operations in a basic block are guaranteed to execute. • But basic blocks are small • and their operations are often highly interrelated. • Optimizing across basic blocks is therefore crucial.
Control dependence • An instruction i1 is control dependent on instruction i2 if the outcome of i2 determines whether i1 is to be executed • Scheduling across basic blocks may therefore require speculative execution
Speculative computing • Prefetching • Bring data from memory into the cache before it is needed • Poison bits • Don't throw exceptions while computing speculatively. Instead, set a poison bit on the result register; if the poisoned register is actually used, then throw the exception.
Speculative computing • Predicated Execution • Change if (a == 0) b = c • To
st r4, r3
movif r2, r4, r1
• The processor supports a conditional store/move, enabling basic blocks to be combined
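A Python model of what such a conditional move buys (the function name and semantics are my assumptions, patterned on a generic move-if instruction):

    def movif(guard, src, dst):
        # Conditional move: dst receives src only when the guard holds.
        # No branch is taken, so the pipeline is never flushed.
        return src if guard else dst

    # if (a == 0) b = c  becomes one straight-line, branch-free operation:
    a, b, c = 0, 10, 42
    b = movif(a == 0, c, b)   # b == 42; with a != 0, b keeps its old value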
Basic Block List Scheduling • NP-complete, but don't give up: basic blocks are typically small. • Start with the data-dependence graph • Nodes are instructions, annotated with the resources they use • Edges are data dependences, labeled with the delay the destination has to wait (some instructions may take 10 cycles, others only 1).
List Scheduling • The data-dependence graph cannot have cycles • Build a topological ordering of the nodes • Several such orderings may exist, though some are better than others • Choose an ordering of the nodes such that no node depends on a node that follows it.
List Scheduling
RT = an empty reservation table
for each n in SortedNodes:
    find the earliest time the instruction could begin
    delay the instruction until resources are available
    schedule the node after all delays
    claim its resources in RT
List Scheduling – better orderings • The longest path through the data-dependence graph is a lower bound on the shortest schedule. • Available resources constrain the schedule too; the critical resource is the one with the largest ratio of uses to the number of units of that resource available. A runnable sketch of the whole algorithm follows.
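A minimal list scheduler in Python, assuming a single-issue machine (one instruction begins per cycle) and using critical-path length as the priority; the data format and function names are my own, not from the slides:

    def list_schedule(latency, deps):
        # latency: {instr: cycles until its result is ready}
        # deps:    {instr: set of instructions it must follow}
        succs = {i: set() for i in latency}
        for i, preds in deps.items():
            for p in preds:
                succs[p].add(i)

        # Priority = latency-weighted longest path to any leaf (critical path).
        prio = {}
        def critical_path(i):
            if i not in prio:
                prio[i] = latency[i] + max(
                    (critical_path(s) for s in succs[i]), default=0)
            return prio[i]
        for i in latency:
            critical_path(i)

        finish = {}        # instr -> cycle its result becomes available
        busy_until = 0     # the single unit's "reservation table"
        todo = set(latency)
        while todo:
            ready = [i for i in todo if deps[i].issubset(finish)]
            def earliest(i):   # operands ready and the unit free
                return max([busy_until] + [finish[p] for p in deps[i]])
            n = min(ready, key=lambda i: (earliest(i), -prio[i]))
            start = earliest(n)
            finish[n] = start + latency[n]
            busy_until = start + 1
            todo.remove(n)
            print(f"cycle {start}: {n}")
        return finish

    # Example: two loads (2 cycles each) feeding an add, then a store.
    list_schedule({"ld1": 2, "ld2": 2, "add": 1, "st": 1},
                  {"ld1": set(), "ld2": set(),
                   "add": {"ld1", "ld2"}, "st": {"add"}})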
Global Code Scheduling • Optimize the use of resources across basic blocks. • Global code scheduling – moving instructions from one basic block to another • Must respect data AND control dependences. • All instructions must still be performed • Speculative computation must not be disruptive.
Global Code Scheduling example • if (!a) { c = b; }  e = d + d • What are the data dependences? • What are the control dependences? • What can intuitively be run in parallel?
Global Code Scheduling Example • if (!a) { c = b; }  e = d + d • Loads take two clock ticks and always hit the cache; R1 = a, R2 = b, … • The processor can execute two instructions per cycle
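One plausible lowering of this example and its dependence edges (the instruction selection and register names are my assumptions, not the slides'):

    # if (!a) { c = b; }  e = d + d
    insns = {
        "LD R1, a":       set(),
        "BNEZ R1, L":     {"LD R1, a"},        # skips the store when a != 0
        "LD R2, b":       set(),               # no data deps: may be hoisted
        "ST c, R2":       {"LD R2, b", "BNEZ R1, L"},   # control dependent
        "LD R3, d":       set(),
        "ADD R4, R3, R3": {"LD R3, d"},
        "ST e, R4":       {"ADD R4, R3, R3"},
    }
    # e = d + d does not depend on the branch at all, so on the two-issue
    # machine above, LD R3, d can issue alongside LD R1, a in the first cycle.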
Code movement • Definitions: • Dominates – A dominates B if every path to B passes through A. • Post-dominates – B post-dominates A if every path from A (to the exit) passes through B. • Downward – move an operation down along a control path • Upward – move an operation up along a control path
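Dominators can be computed with the standard iterative dataflow algorithm; post-dominators are the same computation on the reversed flow graph. A sketch (the CFG representation is my own):

    def dominators(succ, entry):
        # succ: {block: set of successor blocks}
        # dom(n) = {n} union intersection of dom(p) over predecessors p
        nodes = set(succ)
        preds = {n: {p for p in nodes if n in succ[p]} for n in nodes}
        dom = {n: set(nodes) for n in nodes}
        dom[entry] = {entry}
        changed = True
        while changed:
            changed = False
            for n in nodes - {entry}:
                if preds[n]:
                    new = {n} | set.intersection(*(dom[p] for p in preds[n]))
                else:
                    new = {n}
                if new != dom[n]:
                    dom[n], changed = new, True
        return dom

    # if/join diamond: "join" is dominated only by itself and "entry".
    cfg = {"entry": {"then", "join"}, "then": {"join"}, "join": set()}
    print(dominators(cfg, "entry"))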
Upward Code Movement • Moving an instruction from block src to block dst, where src comes after dst in the topologically sorted control-flow graph. Assume no dependences prevent the move. • If dst dominates src and src post-dominates dst, the instruction executes exactly as often as before, and we're done.
Upward Code Movement • If src does not post-dominate dst, we have to compute speculatively • Only desirable if the operation is cheap • Only useful if the path through src is actually taken • If dst does not dominate src, copies of the instruction are needed on the other paths into src
Downward Code Movement • Moving an instruction from block src to block dst, where src comes before dst in the topologically sorted control-flow graph. Assume no dependences prevent the move. • If src dominates dst and dst post-dominates src, we're done.
Downward Code Movement • If src does not dominate dst, • the operation would also execute on paths that never passed through src, where its write would clobber live values; extra operations are needed: • replicate basic blocks and place the operation in a new copy of dst, • or use predicated (speculative) instructions • If dst does not post-dominate src, • compensation code is needed on the paths from src that miss dst. Both directions are summarized in the sketch below.
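A compact restatement of the last four slides as code; dom and pdom are assumed predicates supplied by the caller (dom(a, b): a dominates b; pdom(a, b): a post-dominates b):

    def upward_motion(src, dst, dom, pdom):
        # Moving an operation up from src into the earlier block dst.
        if dom(dst, src) and pdom(src, dst):
            return ["free: executes exactly as often as before"]
        needs = []
        if not pdom(src, dst):
            needs.append("speculation: may run when src is never reached")
        if not dom(dst, src):
            needs.append("copies of the operation on the other paths to src")
        return needs

    def downward_motion(src, dst, dom, pdom):
        # Moving an operation down from src into the later block dst.
        if dom(src, dst) and pdom(dst, src):
            return ["free: executes exactly as often as before"]
        needs = []
        if not dom(src, dst):
            needs.append("replication of dst (or predication): other paths "
                         "reach dst without executing src")
        if not pdom(dst, src):
            needs.append("compensation code on paths from src that miss dst")
        return needs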
Conclusion • Processors can execute several instructions in parallel • We take advantage of this by moving code • Code can be moved when no dependences are violated, though sometimes at a cost.