
Compiler Research in HPC Lab





  1. Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in

  2. Organization • HPC Lab Research Overview • Compiler Analysis & Optimizations • Precise Dataflow Analysis • Energy Reduction for Embedded Systems • Array Allocation for Partitioned Memory Arch. • Dynamic Voltage Scaling • Integrated Spill Code Generation & Scheduling • Conclusions

  3. HPC Team (or HPC–XI) • Mruggesh Gajjar • B.C. Girish • R. Karthikeyan • R. Manikantan • Santosh Nagarakatte • Rupesh Nasre • Sreepathi Pai • Kaushik Rajan • T.S. Rajesh Kumar • V. Santhosh Kumar • Aditya Thakur • Coach: R. Govindarajan

  4. HPC Lab Research Overview • Compiler Optimizations • Traditional analysis & optimizations, power-aware compiling techniques, compilation techniques for embedded systems • Computer Architecture • Superscalar architecture, architecture-compiler interaction, application-specific processors, embedded systems • High Performance Computing • Cluster computing, HPC Applications

  5. Compiler Research in HPC Lab. • ILP Compilation Techniques • Compiling Techniques for Embedded Systems • Compiling Techniques for Application-Specific Systems • Dataflow Analysis

  6. ILP Compilation Techniques • Instruction Scheduling • Software pipelining • Register Allocation • Power/Energy Aware Compilation techniques • Compiling Techniques for embedded systems/application specific processors (DSP, Network Processors, …)

  7. Compiling Techniques for Embedded Systems • Power-aware software pipelining method (using integer linear program formulation) • Simple Offset Assignment for code-size reduction. • Loop transformation and memory bank assignment for power reduction. • Compiler Assisted Dynamic Voltage Scaling • Memory layout problem for embedded systems • MMX code generation using vectorization

  8. Compiling Techniques for Application Specific Systems • Framework for exploring the application design space for network applications • Compiling techniques for streaming applications and program models • Buffer-aware, schedule-size-aware, throughput-optimal schedules

  9. Compiler Analysis • Precise Dataflow Analysis • Pointer Analysis

  10. So, What is the Connection? • Compiler problems are • Optimization problems – solved by formulating the problem as an Integer Linear Programming (ILP) problem. • Involves non-trivial effort! • An efficient formulation is key to reducing execution time! • Other evolutionary approaches can also be used. • Graph-theoretic problems – leverage existing, well-known approaches • Modelled using an automaton – an elegant problem formulation that ensures correctness
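As a reminder of the common shape such formulations take, a generic 0-1 ILP (generic notation, not taken from the talk) is:

    \min_{x} \; c^{T}x \quad \text{subject to} \quad Ax \le b, \qquad x_{i} \in \{0, 1\} \;\; \forall i

where the 0-1 variables encode discrete choices (e.g., which register, memory bank, or time slot) and the constraints encode resource limits.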

  11. Precise Dataflow Analysis • The Problem: Improve precision of data-flow analysis used in compiler optimization

  12. Constant Propagation [Figure: CFG in which one path establishes {x = 1} and the other {x = 2}; at the merge the fact becomes {x = nc}.] Can't replace the use of x at G with a constant. (Legend: … = statements unrelated to x or y; nc = not constant; { } = data-flow information.)
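A minimal C rendering of this situation (function and variable names are hypothetical, chosen only to mirror the figure):

    /* One path establishes x = 1, the other x = 2. At the merge the
     * analysis can only record x = nc (not constant), so the use of x
     * below (node "G" in the figure) cannot be constant-folded. */
    int example(int cond) {
        int x;
        if (cond)
            x = 1;      /* {x = 1} on this path */
        else
            x = 2;      /* {x = 2} on this path */
        /* control-flow merge: {x = nc} */
        return x + 1;   /* use of x at G: not foldable */
    }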

  13. Overview of our Solution [Figure: restructured CFG in which node G is duplicated into G1 and G2.] Can replace the uses of x at G1 and G2 with constants.

  14. Challenges • The Problem: Improve the precision of data-flow analysis • Approach: Restructure the control flow of the program • Challenges: • Develop a generic framework • Guarantee optimization opportunities • Handle the precision vs. code-size trade-off • Keep the approach simple and clean

  15. A brief look at our example. [Figure: one path carries {x = 1}, the other {x = 2}; at the control-flow merge D we lose precision, and the fact becomes {x = nc}.]

  16. Need to duplicate this in order to optimize node G…

  17. …such that paths with differing dataflow information do not intersect.

  18. No need to duplicate this.

  19. Control-flow Graph = Automaton • View a control-flow graph G as a finite automaton with • states as nodes • start state as entry node • accepting state as exit node • transitions as the edges

  20–21. The Automaton [Figure: split automaton for D, with states 0, 1, and 2 and transitions labelled B-D, C-D, and G-H/G-I.]

  22. CFG × Automaton = Split Graph [Figure: taking the product of the control-flow graph with the split automaton for D yields the split graph.]
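A hedged sketch of this product construction in C (the representation – edge lists, state counts, and the transition table – is assumed for illustration, not taken from the talk):

    #include <string.h>

    enum { NODES = 8, STATES = 3, EDGES = 10 };

    /* CFG edges (edge_src[e] -> edge_dst[e]) and the split automaton's
     * transition function: next_state[s][e] is the state reached from
     * state s when traversing CFG edge e. */
    int edge_src[EDGES], edge_dst[EDGES];
    int next_state[STATES][EDGES];

    /* Mark the reachable (node, state) pairs; each such pair becomes one
     * node of the split graph, so paths carrying different dataflow
     * facts (different states) no longer merge. */
    void build_split_graph(int entry, int reachable[NODES][STATES]) {
        int work[NODES * STATES][2], top = 0;
        memset(reachable, 0, sizeof(int) * NODES * STATES);
        reachable[entry][0] = 1;                    /* start state 0 at entry */
        work[top][0] = entry; work[top][1] = 0; top++;
        while (top > 0) {
            top--;
            int n = work[top][0], s = work[top][1];
            for (int e = 0; e < EDGES; e++) {
                if (edge_src[e] != n) continue;
                int n2 = edge_dst[e], s2 = next_state[s][e];
                if (!reachable[n2][s2]) {           /* new split-graph node */
                    reachable[n2][s2] = 1;
                    work[top][0] = n2; work[top][1] = s2; top++;
                }
            }
        }
    }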

  23. Energy Reduction: Array Alloc. for Partitioned Memory Arch. • Dynamic Energy reduction in Memory Subsystem. • Memory subsystem consumes significant energy • Many embedded applications are array intensive • Memory architecture with multiple banks • Exploiting various low-power modes of partitioned memory architectures. • Put idle memory banks in low-power mode • Allocate arrays to memory banks s.t. more memory banks can be in low-power mode for longer duration

  24. Partitioned Memory Architectures • Memory banks with low-power modes. • Active, Stand-by, Napping, Power-down, Disabled. • Resynchronization time – the time to move from a low-power mode back to Active mode
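A small C sketch of how these modes might be tabulated; the per-mode power and resynchronization numbers below are illustrative placeholders only, not figures from the talk:

    /* Power modes of a memory bank, from most to least active. Deeper
     * modes draw less power but pay a larger resynchronization delay to
     * return to Active (Disabled cannot be resynchronized, marked -1). */
    enum bank_mode { ACTIVE, STANDBY, NAPPING, POWERDOWN, DISABLED };

    static const double power_mw[]      = { 300.0, 180.0, 30.0,  3.0, 0.0 };
    static const int    resync_cycles[] = {     0,     2,   30, 9000,  -1 };

    /* A deeper mode pays off only when a bank's idle period exceeds the
     * break-even point implied by its resynchronization cost. */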

  25. Motivating Example
  Example:
  float a[N], d[N]; double b[N], c[N];
  L1: for (ia = 0; ia < N; ia++) d[ia] = a[ia] + k;
  L2: for (ia = 0; ia < N; ia++) a[ia] = b[ia] * k;
  L3: for (ia = 0; ia < N; ia++) c[ia] = d[ia] / k;
  L4: for (ia = 0; ia < N; ia++) b[ia] = c[ia] - k;
  L5: for (ia = 0; ia < N; ia++) b[ia] = d[ia] + k;
  Arrays a, d ~ 1 MB each; arrays b, c ~ 2 MB each; memory bank size = 4 MB.
  [Figure: Array Relation Graph over a, b, c, d with edge weights N, 2N, 4N, and 8N.]
  Memory banks are active for a total of 32N cycles!

  26. Motivating Example – Our Approach • Array allocation requires partitioning the ARG! • Graph partitioning such that each subgraph can be accommodated in a memory bank. • The weight of the edges across subgraphs is the cost of keeping multiple banks active together – minimize it! • Arrays b and c go in one subgraph, and a and d in another. [Figure: the Array Relation Graph partitioned into {b, c} and {a, d}.] Memory banks are now active for a total of only 23N cycles!
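A minimal sketch of the quantity being minimized (the representation – an edge-weighted ARG plus a candidate bank assignment – is assumed for illustration):

    enum { ARRAYS = 4 };   /* e.g., a, b, c, d in the example */

    /* weight[i][j]: ARG edge weight between arrays i and j, i.e., the
     * number of cycles during which both arrays' banks must be active
     * together; bank[i]: candidate bank assignment of array i. */
    long cut_weight(const long weight[ARRAYS][ARRAYS], const int bank[ARRAYS]) {
        long cut = 0;
        for (int i = 0; i < ARRAYS; i++)
            for (int j = i + 1; j < ARRAYS; j++)
                if (bank[i] != bank[j])     /* edge crosses the partition */
                    cut += weight[i][j];
        return cut;
    }

Minimizing this cut, subject to each subgraph fitting in a 4 MB bank, yields the {b, c} / {a, d} split above.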

  27. Dynamic Voltage Scaling • Dynamically vary the CPU frequency and supply voltage. • Dynamic power is proportional to C · V² · f • C – capacitance • V – supply voltage • f – operating frequency • Processors support different voltage (and frequency) modes and can switch between them. • AMD, Transmeta, and XScale processors provide support for DVS and have multiple operating frequencies.
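In code form (a trivial sketch; names and units are ours):

    /* Dynamic power ~ C * V^2 * f. Since supply voltage typically scales
     * down with frequency, energy for a fixed amount of work falls
     * super-linearly as frequency is lowered, while execution time grows
     * only linearly -- the trade-off DVS exploits. */
    double dynamic_power(double capacitance, double vdd, double freq) {
        return capacitance * vdd * vdd * freq;
    }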

  28. Compiler Assisted DVS • Identify program regions where DVS can be performed. • For each program region, identify the voltage (freq.) mode to operate on, s.t. energy is minimized • Ensure that performance is not degraded.

  29. Motivating Example [Figure: a 2% increase in execution time yields a 30% decrease in energy.]

  30. DVS Problem Formulation • The program is divided into a number of regions. • Assign an operating frequency to each program region. • Constraint • Only a marginal increase in the execution time of the program. • Objective • Minimize the program's energy consumption. • This is a Multiple-Choice Knapsack Problem (see the formulation below).
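One way to write that Multiple-Choice Knapsack formulation (notation assumed: x_{ij} = 1 if region i runs at frequency level j, with energy E_{ij}, execution time t_{ij}, and total time budget T):

    \min \sum_{i}\sum_{j} E_{ij}\, x_{ij}
    \quad \text{subject to} \quad
    \sum_{i}\sum_{j} t_{ij}\, x_{ij} \le T,
    \qquad
    \sum_{j} x_{ij} = 1 \;\; \forall i,
    \qquad
    x_{ij} \in \{0, 1\}

The per-region equality constraints are the "multiple choice": exactly one frequency level must be picked for each region.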

  31. Compiler Problem as Optimization Problem • Integrated register allocation, spill-code generation, and scheduling in software-pipelined loops • Problem: Given a machine M, a loop L, and a software-pipelined schedule S with initiation interval II, perform register allocation and generate spill code, if necessary, and schedule it such that the register requirement of the schedule ≤ the number of registers and the resource constraints are met!

  32. Modeling Live Ranges [Figure: live-range representation – a live range A occupies one of registers R0…Rn over time steps 0–7, from its def to its last use.]

  33. Modeling Spill Stores [Figure: store decision variables mark the time steps at which a live range may be spilled to memory after its def. Latencies: load = 1, store = 1, instruction = 1.]

  34. Modeling Spill Loads [Figure: load decision variables mark the time steps at which a spilled live range may be reloaded before a use. Latencies: load = 1, store = 1, instruction = 1.]
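Reading these two slides together, the decision variables can be interpreted as follows (this indexing is our assumption, consistent with the sum on slide 36):

    STN_{i,r,t} =
    \begin{cases}
    1 & \text{if live range } i \text{ is spill-stored from register } r \text{ at time } t,\\
    0 & \text{otherwise,}
    \end{cases}
    \qquad
    LTN_{i,r,t} =
    \begin{cases}
    1 & \text{if live range } i \text{ is spill-loaded into register } r \text{ at time } t,\\
    0 & \text{otherwise.}
    \end{cases}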

  35. Constraints – Overview • Every live range must be in a register at its definition time and at each use time. • A spill load can take place only if the corresponding spill store has already taken place. • After a spill store, a live range can continue or cease to exist. • Ensure that the spill loads and stores don't saturate the memory units. • Minimize the number of spill loads and stores.
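For instance, the second constraint ("a spill load only after its spill store") might be written, with the variables interpreted as above and lat_st the store latency (a hedged sketch, not necessarily the talk's exact formulation):

    \sum_{r} \; \sum_{t' \le t - \text{lat}_{st}} STN_{i,r,t'} \;\ge\; \sum_{r} LTN_{i,r,t} \qquad \forall\, i, t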

  36. Objective • No objective function – it is just a constraint-solving problem! • Alternatively, minimize the number of spill loads and stores: ∑_{i,r,t} (STN_{i,r,t} + LTN_{i,r,t})

  37. Conclusions • Compiler research is fun! • It is cool to do compiler research! • But remember Proebsting's Law: Compiler Technology Doubles CPU Power Every 18 YEARS!! • Plenty of opportunities in compiler research! • However, NO VACANCY in the HPC lab this year! :-)
