Optimizing Compiler. Scalar optimizations.

Main characteristics of an application that affect its performance
• Calculation efficiency
• Effective memory usage
• Correct branch prediction
• Efficient use of vector instructions
• Effectiveness of parallelization
• Level of instruction-level parallelism

Optimizing compiler role
A compiler translates the entire source program into an equivalent program in machine code or assembly language.
• The main objective of an optimizing compiler is to produce efficient code for the target computer system.
• From a developer's point of view, the program must be:
  • easy to read and modify
  • easy to debug
  • fast to execute
• A developer needs:
  • a reliable, unified development environment
  • the ability to vary the levels of debugging and performance
  • the possibility to obtain high-performance code for different operating systems and microprocessor architectures.

An optimizing compiler is a complex software system driven by the requirements on the resulting code.
Compiler developers face the difficulty of proving that an optimization is legal, profitability calculations, the lack of representative input data at compile time, etc.
Achieving the best results therefore requires close cooperation with the developer. To use the compiler's features successfully, the programmer must:
• understand the computer systems on which the application will run;
• know the compiler's command-line options;
• learn the basic performance-improvement techniques used by the compiler;
• be familiar with the main causes of application slowdown;
• have an idea of the input data the application will process;
• know how to analyze program performance.

Intel compilers
Intel provides C/C++ and Fortran compilers for the Windows, Linux and Mac OS operating systems. On Windows, the Intel compiler is delivered as a plug-in for Microsoft Visual Studio. The main goals of the Intel compilers are:
• timely support for all new computer systems;
• compatibility with Microsoft Visual Studio on Windows and with gcc on Linux and Mac OS;
• a convenient environment for developing efficient applications.
www.intel.com/software/products

Two-pass and single-pass compilation scheme
[Diagram. Stages shown: Source files; FE (C/C++ or Fortran); Internal representation; Profiler; Temporary files or object files with IR; Interprocedural optimizations (-Qipo/-Qip); Scalar optimizations; Loop optimizations; Code generation; Object files; Executable file or library]

Front End
• Parsing is the process of analyzing the input characters, usually in accordance with a given formal grammar.
• During parsing the source code is converted into a data structure. Usually it is a tree that reflects the syntactic structure of the input sequence and is well suited for further processing.
• Typically, parsing is divided into two levels:
  • lexical analysis: the input stream of characters is partitioned into a linear sequence of tokens, the "words" of the language (e.g., integers, identifiers, string constants, etc.);
  • syntactic analysis: tokens are combined into the statements and expressions of the language according to its grammatical rules.
• The output of the FE is a set of related tables, called the internal representation of the program. The usual practice is to share one internal representation among the various high-level languages.
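The lexical level described above can be sketched in a few lines of C. This is only an illustrative tokenizer, not the front end of any real compiler: the names TokKind, Token, next_token and count_tokens are hypothetical, and the sketch recognizes only integers, identifiers and single-character punctuation.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
typedef enum { TOK_INT, TOK_IDENT, TOK_PUNCT, TOK_END } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* first character of the token in the input */
    size_t len;         /* token length in characters */
} Token;

/* Reads one token starting at *p and advances *p past it. */
Token next_token(const char **p) {
    const char *s = *p;
    while (isspace((unsigned char)*s)) s++;          /* skip blanks */
    Token t = { TOK_END, s, 0 };
    if (*s == '\0') { *p = s; return t; }
    if (isdigit((unsigned char)*s)) {
        t.kind = TOK_INT;                            /* run of digits */
        while (isdigit((unsigned char)*s)) s++;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        t.kind = TOK_IDENT;                          /* letter, then letters/digits */
        while (isalnum((unsigned char)*s) || *s == '_') s++;
    } else {
        t.kind = TOK_PUNCT;                          /* any other single character */
        s++;
    }
    t.len = (size_t)(s - t.start);
    *p = s;
    return t;
}

/* Counts the tokens in a NUL-terminated source string. */
size_t count_tokens(const char *src) {
    size_t n = 0;
    while (next_token(&src).kind != TOK_END) n++;
    return n;
}
```

For the input `sum = sum + 12;` the sketch yields six tokens: two identifiers, one integer and three punctuation characters. The second parsing level would then group such tokens into statements and expressions.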

Internal representation
The list of statements is the base structure of the internal representation.
• Statements may be regarded as the smallest independent elements of the programming language. Statements describe assignments, flow control commands (such as IF, GOTO, CALL, RETURN), function calls, etc.

The statements are usually kept in a list and can be linked in two ways:
1) Lexically. Each statement has a predecessor and a successor.
2) By the control flow graph.

  struct Stmt {
      /* common members */
      int type;
      Stmt *pred;
      Stmt *succ;
      Basic_Block bblock;
      …
  }

Some simple scalar optimizations work by walking through the list of statements, finding statements of a specific kind and processing them:

  For_All_Subroutine_Stmt(subroutine, stmt) {
      if (Stmt_type(stmt) == Stmt_Assign) {
          // assignment processing
      }
  }
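A self-contained version of such a walk might look as follows. It is a sketch under assumptions: the StmtType values and the helpers link_stmts and count_assignments are hypothetical names, and the "processing" is reduced to counting assignment statements.

```c
#include <stddef.h>

/* Hypothetical statement kinds for this sketch. */
typedef enum { STMT_ASSIGN, STMT_IF, STMT_GOTO, STMT_CALL, STMT_RETURN } StmtType;

typedef struct Stmt {
    StmtType type;
    struct Stmt *pred;  /* lexical predecessor, NULL for the first statement */
    struct Stmt *succ;  /* lexical successor, NULL for the last statement */
} Stmt;

/* Links an array of n statements lexically and returns the list head. */
Stmt *link_stmts(Stmt *stmts, size_t n) {
    for (size_t i = 0; i < n; i++) {
        stmts[i].pred = (i > 0) ? &stmts[i - 1] : NULL;
        stmts[i].succ = (i + 1 < n) ? &stmts[i + 1] : NULL;
    }
    return n ? &stmts[0] : NULL;
}

/* Walks the list and counts assignment statements; a real pass would
   transform each assignment here instead of counting it. */
size_t count_assignments(const Stmt *head) {
    size_t n = 0;
    for (const Stmt *s = head; s != NULL; s = s->succ)
        if (s->type == STMT_ASSIGN) n++;
    return n;
}
```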

Expressions
Expressions form an expression tree. For the statement a = b + c; the tree is:

  Stmt_Assign
      lval: Expr_Var 'a'
      rval: Expr_Add
                Expr_Var 'b'
                Expr_Var 'c'

Leaf expressions can be variables or constants. The internal representation also contains many tables describing objects such as variables, functions, types, etc.

Control Flow Graph
• A Control Flow Graph (CFG) represents all paths that control could traverse through a program during its execution. Each node of the graph represents a basic block (a straight-line piece of code without any jumps or jump targets). A jump target starts a block, and a jump ends a block. Directed edges represent transfers of control. There are two specially designated blocks: the entry block, through which control enters the flow graph, and the exit block, through which all control flow leaves.
• The CFG is essential to many compiler optimizations.

CFG example
Source:

  int main() {
      int sum = 0;
      int i = 1;
      while (i < 11) {
          sum = sum + i;
          i = i + 1;
      }
      printf("%d\n", sum);
  }

CFG blocks (from the diagram): Entry; { sum = 0; i = 1; }; { L12: if (i < 11) }; { sum = sum + i; i = i + 1; goto L12 }; { printf(..) }; Return

Basic block structure:

  struct BBLOCK {
      STMT first_stmt;
      STMT last_stmt;
      BBLOCK_LIST pred_list;
      BBLOCK_LIST succ_list;
      …
  }
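The rule "a jump target starts a block, and a jump ends a block" can be sketched as a search for block leaders in a linear instruction list. The Instr type and find_leaders are hypothetical names, and the eight-instruction encoding of the loop used in the note below is one possible linearization of the example, not the compiler's actual representation.

```c
#include <stddef.h>

/* Hypothetical linear instruction for this sketch. */
typedef struct {
    int is_jump;  /* nonzero for conditional or unconditional jumps */
    int target;   /* index of the jump target, or -1 if none */
} Instr;

/* Sets leaders[i] = 1 for every instruction that starts a basic block
   and returns the number of basic blocks. */
size_t find_leaders(const Instr *code, size_t n, int *leaders) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) leaders[i] = 0;
    if (n > 0) leaders[0] = 1;                    /* first instruction */
    for (size_t i = 0; i < n; i++) {
        if (!code[i].is_jump) continue;
        if (code[i].target >= 0 && (size_t)code[i].target < n)
            leaders[code[i].target] = 1;          /* a jump target starts a block */
        if (i + 1 < n) leaders[i + 1] = 1;        /* a jump ends a block */
    }
    for (size_t i = 0; i < n; i++) count += (size_t)leaders[i];
    return count;
}
```

If the loop is linearized as `sum=0; i=1; L12: if !(i<11) goto 6; sum=sum+i; i=i+1; goto 2; printf; return`, the leaders are instructions 0, 2, 3 and 6, giving four basic blocks.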

Scalar optimizations
Well-known scalar optimizations include constant folding, constant propagation and copy propagation.
• Constant folding evaluates constant expressions at compile time.
• Constant propagation replaces variables whose values are known constants with those constants in expressions.
• Copy propagation replaces uses of a variable that was assigned a copy of another variable with that other variable.

Constant propagation:
  int x = 14; int y = 7 - x/2;      →   int x = 14; int y = 7 - 14/2;
Constant folding:
  int x = 14; int y = 7 - 14/2;     →   int x = 14; int y = 0;
Copy propagation:
  y = x; z = 3 + y;                 →   y = x; z = 3 + x;
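Constant propagation followed by constant folding can be sketched on a small expression tree. The Expr layout and the functions propagate_const and fold_consts are hypothetical, variables are single characters for brevity, and the sketch ignores integer overflow, which a real compiler must handle.

```c
#include <stddef.h>

typedef enum { EXPR_CONST, EXPR_VAR, EXPR_ADD, EXPR_SUB, EXPR_DIV } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;               /* valid when kind == EXPR_CONST */
    char name;               /* valid when kind == EXPR_VAR */
    struct Expr *lhs, *rhs;  /* operands of binary nodes, NULL for leaves */
} Expr;

/* Constant propagation: replaces every use of variable `name` by `value`. */
void propagate_const(Expr *e, char name, int value) {
    if (e == NULL || e->kind == EXPR_CONST) return;
    if (e->kind == EXPR_VAR) {
        if (e->name == name) { e->kind = EXPR_CONST; e->value = value; }
        return;
    }
    propagate_const(e->lhs, name, value);
    propagate_const(e->rhs, name, value);
}

/* Constant folding: evaluates binary nodes whose operands are both
   constants, bottom-up, rewriting them into constant nodes in place. */
void fold_consts(Expr *e) {
    if (e->kind == EXPR_CONST || e->kind == EXPR_VAR) return;
    fold_consts(e->lhs);
    fold_consts(e->rhs);
    if (e->lhs->kind != EXPR_CONST || e->rhs->kind != EXPR_CONST) return;
    int a = e->lhs->value, b = e->rhs->value;
    if (e->kind == EXPR_DIV && b == 0) return;  /* leave division by zero alone */
    e->value = (e->kind == EXPR_ADD) ? a + b
             : (e->kind == EXPR_SUB) ? a - b
             : a / b;
    e->kind = EXPR_CONST;
}
```

Building the tree for `7 - x/2`, calling `propagate_const(&tree, 'x', 14)` and then `fold_consts(&tree)` reduces the whole tree to the constant 0, matching the example above.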

Common subexpression elimination (CSE)
• Search for identical subexpressions and save the result of the calculation in a temporary variable for later reuse.

Before:
  a = b * c + g;
  d = b * c * d;
After CSE:
  tmp = b * c;
  a = tmp + g;
  d = tmp * d;

Dead code elimination
• Removal of code that does not change the output of the program.

Before:
  int foo() {
      int a = 24;
      int b = 25;
      int c;
      if (a < 0) printf("a<0");
      c = a << 2;
      return c;
  }
After dead code elimination:
  int foo() {
      int a = 24;
      int c;
      c = a << 2;
      return c;
  }

Dead code can appear in many ways. It can be the result of scalar optimizations, inlining, etc.

Removal of redundant branches, condition propagation
• Sometimes conditional branches can be deleted because their outcome is already decided by an enclosing condition.

Before:
  if (x > 0) {
      …
      if (x > 0) { a = x; } else { a = -x; }
      …
  }
After condition propagation:
  if (x > 0) {
      …
      a = x;
      …
  }

Why is the Control Flow Graph important for scalar optimizations?
[Diagram: a CFG whose blocks contain the statements X = C1; L = X; Y = X; X = C2; Z = X; IF (X > C1)]
When can we propagate information about the value of X? For a straight-line piece of code the answer is trivial. With branches, the CFG resolves the ambiguity.

Data flow analysis
• Data flow analysis is a technique for gathering information about the possible set of values of each variable at various points of a program. The control flow graph (CFG) is used to identify the parts of the program to which a value assigned to a variable can be propagated.
• A definition-use graph contains an edge from each variable definition point in the program to every point where that definition is used.

Constructing def-use chains within a basic block is trivial. Each variable definition is associated with all subsequent uses of it. Each subsequent redefinition ends the current chain and starts a new one.
• To extend these local chains across the CFG, several sets are computed that characterize the behavior of each block b:
  • Uses(b): the set of variables that are used in the block but have no preceding definition within the block.
  • Defsout(b): the set of definitions that are made in b and reach the end of the block.
  • Killed(b): the set of definitions that are canceled within the block by other definitions.
  • Reaches(b): the set of all definitions, made in any block (possibly b itself), that can reach the entry of b.

To understand which definitions are used in a basic block b, it is important to know Reaches(b).
• It can be constructed by an iterative process that calculates Reaches(b) from the sets of the preceding blocks:
  Reaches(b) = ∪ over all predecessors p of b of ( Defsout(p) ∪ ( Reaches(p) ∩ ¬Killed(p) ) )
• The problem is that in the presence of loops, the set Reaches(b) may depend on itself. Applying this equation repeatedly to every basic block of the CFG, until no set changes, yields the final solution.
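The iterative process can be sketched with bit sets, one bit per definition. The Block layout, MAX_PREDS and compute_reaches are hypothetical names chosen for this illustration, and the sketch assumes at most 32 definitions so that a set fits in one unsigned value.

```c
#define MAX_PREDS 4

/* Hypothetical per-block data; bit i of each set stands for definition i. */
typedef struct {
    unsigned defsout;      /* definitions made in the block that reach its end */
    unsigned killed;       /* definitions canceled inside the block */
    int preds[MAX_PREDS];  /* indices of predecessor blocks */
    int npreds;
    unsigned reaches;      /* result: definitions reaching the block entry */
} Block;

/* Repeats Reaches(b) = U over preds p of (Defsout(p) | (Reaches(p) & ~Killed(p)))
   for every block until no set changes (a fixed point). */
void compute_reaches(Block *b, int n) {
    int changed = 1;
    for (int i = 0; i < n; i++) b[i].reaches = 0;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            unsigned in = 0;
            for (int k = 0; k < b[i].npreds; k++) {
                const Block *p = &b[b[i].preds[k]];
                in |= p->defsout | (p->reaches & ~p->killed);
            }
            if (in != b[i].reaches) { b[i].reaches = in; changed = 1; }
        }
    }
}
```

For the `while` loop used earlier as the CFG example, number the definitions `sum=0`, `i=1`, `sum=sum+i`, `i=i+1` as bits 0 to 3. All four definitions then reach the loop header, the loop body and the exit block. The sets only grow and are bounded, so the loop terminates.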

The constructed sets are used by many scalar optimizations, such as dead code elimination, constant propagation, etc. The main problems of this approach are the large number of edges in the def-use graph and the long time needed to calculate these sets. As a result, processing requires a lot of resources.
[Diagram: S1, S2 and S3 each define X ("X="); all three flow into S4; S5, S6 and S7 each use X ("=X")]
This example illustrates the problem. The definitions in S1, S2 and S3 all pass through S4. Since each of the three definitions reaches each of the three uses, there are nine edges. Static single assignment form (SSA) was proposed to simplify def/use chains.

SSA (Static single assignment form)
• SSA form gives a unique name to each variable definition and introduces special pseudo-assignments.
[Diagram: S1: X1 = ; S2: X2 = ; S3: X3 = ; S4: X4 = φ(X1, X2, X3); S5, S6 and S7 each use X4 ("= X4")]

SSA is designed to spare developers from building complex use/def chains for local variables. The power of SSA is that each variable has exactly one definition in the program, so the use/def chain is trivial.
• At points where it is uncertain which definition reaches a use, SSA introduces a special Phi function that creates a new variable. This is the so-called pseudo-assignment. Constructing SSA form requires placing Phi functions and creating new unique variables.
• New variables are generated by extending the variable name with a unique version suffix. Inserting Phi functions correctly requires some concepts from graph theory.

Node N dominates node M if every path from the entry node to M passes through N.
A node is the immediate dominator of node M if it is the last strict dominator of M on any path from the entry node to M.
The dominance frontier of node x is the set of all nodes w such that x dominates a predecessor of w but does not strictly dominate w.
Example: Dom[5] = {5, 6, 7, 8} (the nodes dominated by node 5), DF[5] = {5, 4, 12, 11}
[Diagram: example graph with nodes 1 to 12]

In SSA form, each variable definition must dominate the uses of this variable.
• The set of dominators of each basic block can be constructed as follows:
  • The set of dominators of a node N is the intersection of the dominator sets of all its predecessors, plus the node N itself.
• A strict dominator of N is a dominator of N other than N itself. The immediate dominator is the closest node in the set of strict dominators.
  • idom(N): the immediate dominator of basic block N
  • children(N): the set of basic blocks that N immediately dominates
[Diagram: example graph with nodes 2 to 6]
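The intersection rule above leads to a simple iterative computation, sketched here with bit sets. The names Node, compute_dominators and immediate_dominator are hypothetical; the sketch assumes node 0 is the entry, every node is reachable from it, and there are at most 32 nodes. Production compilers typically use faster algorithms.

```c
#define MAX_PREDS 4

typedef struct {
    int preds[MAX_PREDS];  /* indices of predecessor nodes */
    int npreds;
} Node;

/* dom[i] receives the bit set of nodes that dominate node i. */
void compute_dominators(const Node *g, int n, unsigned *dom) {
    unsigned all = (n >= 32) ? ~0u : ((1u << n) - 1u);
    int changed = 1;
    dom[0] = 1u;                               /* entry is dominated only by itself */
    for (int i = 1; i < n; i++) dom[i] = all;  /* optimistic initial guess */
    while (changed) {
        changed = 0;
        for (int i = 1; i < n; i++) {
            unsigned d = all;
            for (int k = 0; k < g[i].npreds; k++)
                d &= dom[g[i].preds[k]];       /* intersect over predecessors */
            d |= 1u << i;                      /* plus the node itself */
            if (d != dom[i]) { dom[i] = d; changed = 1; }
        }
    }
}

/* Returns idom(node): the strict dominator whose own dominator set equals
   the set of strict dominators of node, or -1 for the entry node. */
int immediate_dominator(const unsigned *dom, int n, int node) {
    unsigned strict = dom[node] & ~(1u << node);
    for (int d = 0; d < n; d++)
        if (((strict >> d) & 1u) && dom[d] == strict) return d;
    return -1;
}
```

On a diamond-shaped graph (0 branches to 1 and 2, which both join at 3), the join node 3 is dominated only by 0 and itself, so idom(3) is 0.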

Dominance frontier criterion: if basic block N contains a definition of variable A, then every node on the dominance frontier of N requires a Phi function for A. Each Phi function is itself a definition, so the criterion must be applied repeatedly while there are nodes that require a Phi function.

Before:
  B = A
  A = x
After inserting the φ function:
  A_2 = φ(A_1, A_3)
  B = A_2
  A_3 = x

• Inserting φ functions for node 5 of the scheme on slide 25.
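The criterion can be sketched in two steps: compute dominance frontiers from already known dominator sets, then apply the criterion repeatedly. The names Node, compute_frontiers and place_phis are hypothetical. The sketch uses the standard definition of the frontier (x dominates a predecessor of w but does not strictly dominate w), takes the dominator bit sets as input, and assumes at most 32 nodes.

```c
#define MAX_PREDS 4

typedef struct {
    int preds[MAX_PREDS];  /* indices of predecessor nodes */
    int npreds;
} Node;

/* dom[w] is the bit set of dominators of w (input); df[x] receives DF(x). */
void compute_frontiers(const Node *g, int n, const unsigned *dom, unsigned *df) {
    for (int x = 0; x < n; x++) {
        df[x] = 0;
        for (int w = 0; w < n; w++) {
            int strictly_dominates = (x != w) && ((dom[w] >> x) & 1u);
            if (strictly_dominates) continue;
            for (int k = 0; k < g[w].npreds; k++)
                if ((dom[g[w].preds[k]] >> x) & 1u) {  /* x dominates a pred of w */
                    df[x] |= 1u << w;
                    break;
                }
        }
    }
}

/* Returns the set of blocks that need a Phi function for a variable defined
   in the blocks of `defs`. Each inserted Phi is treated as a new definition,
   so the criterion is reapplied until no new block is found. */
unsigned place_phis(const unsigned *df, int n, unsigned defs) {
    unsigned phis = 0, work = defs;
    while (work) {
        unsigned next = 0;
        for (int x = 0; x < n; x++)
            if ((work >> x) & 1u) next |= df[x];
        next &= ~phis;   /* keep only newly discovered blocks */
        phis |= next;
        work = next;
    }
    return phis;
}
```

For a simple loop (entry 0, header 1, body 2, exit 3) with a variable defined in blocks 0 and 2, the only block that needs a Phi function is the loop header.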

Optimizations using the SSA form:
• Dead code elimination
  • If the variable a_ver is not used, then its definition should be removed.
• Constant propagation
  • If there is an assignment a_ver = const, then all uses of a_ver should be replaced by const.
  • If there is a φ-function a_next = φ(c, c), then the φ should be replaced by c.
• Copy propagation
  • If there is an assignment a_n = b_k, then all uses of a_n should be replaced with b_k.
  • If there is an assignment a_n = φ(b_k, b_k), then the φ should be replaced with b_k.
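Because every SSA name has exactly one definition, these rules reduce to a lookup of that definition. The following sketch combines the constant and copy rules into one query. The Def layout and const_value are hypothetical, φ functions are limited to two arguments, and the depth bound is a simple conservative guard against φ cycles in loops rather than a full propagation algorithm.

```c
typedef enum { DEF_CONST, DEF_COPY, DEF_PHI, DEF_OTHER } DefKind;

/* The single definition of SSA name i is stored at index i. */
typedef struct {
    DefKind kind;
    int value;    /* DEF_CONST: the constant */
    int args[2];  /* DEF_COPY: args[0]; DEF_PHI: both arguments (SSA names) */
} Def;

/* Returns 1 and stores the constant in *out if SSA name v provably holds
   a constant; returns 0 otherwise. */
int const_value(const Def *defs, int v, int depth, int *out) {
    int a, b;
    if (depth <= 0) return 0;  /* give up: possibly a cycle through phis */
    switch (defs[v].kind) {
    case DEF_CONST:            /* a_ver = const */
        *out = defs[v].value;
        return 1;
    case DEF_COPY:             /* a_n = b_k: look through the copy */
        return const_value(defs, defs[v].args[0], depth - 1, out);
    case DEF_PHI:              /* phi(c, c) can be replaced by c */
        if (const_value(defs, defs[v].args[0], depth - 1, &a) &&
            const_value(defs, defs[v].args[1], depth - 1, &b) && a == b) {
            *out = a;
            return 1;
        }
        return 0;
    default:
        return 0;
    }
}
```

A name defined as a copy of φ(5, 5) resolves to the constant 5, while φ(5, 6) stays unknown.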

Thank you!
