ICG & Code Generation
IR
The many choices for the intermediate representation (IR) include:
• Three-address representations such as quadruples, triples, and indirect triples
• Virtual-machine representations such as bytecode and stack-machine code
• Linear representations such as postfix notation
• Graphical representations such as syntax trees and DAGs
Intermediate Language (properties)
• It should be easy to translate from a high-level language to the intermediate language, so that it can serve a wide range of source languages.
• It should be easy to translate from the intermediate language to machine code, so that it can serve a wide range of target architectures.
• The intermediate format should be suitable for optimizations.
IR
1. Postfix notation
• Postfix notation can be evaluated with a stack: operands are pushed onto the stack; each operator pops the number of operands it needs, performs the operation, and pushes the result back onto the stack.
• It is used only for simple arithmetic expressions and cannot be used to express most programming-language constructs.
• E.g.: a+b → ab+, a+b*c → abc*+
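The stack discipline described above can be sketched in a few lines of Python. The function name `eval_postfix` and the sample values are assumptions made for illustration; operands are single-letter names, as in the slide's examples.

```python
def eval_postfix(tokens, env):
    """Evaluate a postfix token list using an explicit operand stack."""
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
        "/": lambda x, y: x / y,
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            # The right operand was pushed last, so it is popped first.
            right = stack.pop()
            left = stack.pop()
            stack.append(ops[tok](left, right))
        else:
            # An operand: push its value.
            stack.append(env[tok])
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


values = {"a": 2, "b": 3, "c": 4}          # hypothetical sample values
print(eval_postfix(list("ab+"), values))    # a + b
print(eval_postfix(list("abc*+"), values))  # a + b * c
```

Note that no parentheses or precedence rules are needed: the position of each operator in the postfix string already fixes the order of evaluation.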
2. Syntax tree
• A syntax tree is a condensed form of the parse tree.
• The operator and keyword nodes of the parse tree are moved up to their parent, and a chain of single productions is replaced by a single link.
Cont… a := b * -c + b * -c;
Cont… Exercise: draw the syntax tree for x = a*b*c + d + e + f.
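One way to check an answer to this exercise is to let an existing parser build the tree. The sketch below uses Python's standard `ast` module as a stand-in parser (an assumption; the slides do not name a tool) and prints the tree as nested tuples of the form (operator, left, right). The helper names `to_tree` and `syntax_tree` are hypothetical.

```python
import ast

# Map Python AST operator classes to the symbols used on the slides.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_tree(node):
    """Condense a Python AST node into (operator, left, right) tuples."""
    if isinstance(node, ast.Assign):
        return ("=", to_tree(node.targets[0]), to_tree(node.value))
    if isinstance(node, ast.BinOp):
        return (OPS[type(node.op)], to_tree(node.left), to_tree(node.right))
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError("construct not covered by this sketch")


def syntax_tree(src):
    return to_tree(ast.parse(src).body[0])


print(syntax_tree("x=a*b*c+d+e+f"))
```

Because * and + are left-associative, the tree leans to the left: a*b is the deepest node and the addition of f sits directly under the assignment.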
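TAC
Three-address code (TAC) can be represented in three forms:
1. Quadruples: (operator, argument 1, argument 2, result)
2. Triples: (operator, argument 1, argument 2); a result is referred to by the position of the triple that computes it
3. Indirect triples: triples plus a separate list of pointers to them that fixes the execution order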
Cont… Exercise: convert x + y*z into each of the three forms of TAC.
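A possible answer, written as Python data so the three forms can be compared side by side. The temporary names t1 and t2, the helper `quads_to_triples`, and the use of plain integers as triple references are assumptions made for this sketch.

```python
# Quadruples for x + y*z: (operator, argument 1, argument 2, result).
quads = [
    ("*", "y", "z", "t1"),
    ("+", "x", "t1", "t2"),
]


def quads_to_triples(quads):
    """Drop the result field; refer to earlier results by position (an int)."""
    position = {}  # temporary name -> index of the triple that computes it
    triples = []
    for op, arg1, arg2, result in quads:
        arg1 = position.get(arg1, arg1)
        arg2 = position.get(arg2, arg2)
        position[result] = len(triples)
        triples.append((op, arg1, arg2))
    return triples


triples = quads_to_triples(quads)

# Indirect triples: a separate list of indices gives the execution order,
# so statements can be reordered without renumbering the triples.
order = list(range(len(triples)))

for slot in order:
    print(slot, triples[slot])
```

Here the second triple is ("+", "x", 0): the integer 0 stands for "the value computed by triple 0", which replaces the explicit temporary t1 of the quadruple form.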
3. Three-Address Code
• Three-address code is a sequence of statements of the form x = y op z.
• Because such a statement involves no more than three references, it is called a "three-address statement", and a sequence of such statements is referred to as three-address code.
• For example, the three-address code for the expression A + B * C + D is:
T1 = B * C
T2 = A + T1
T3 = T2 + D
Cont… The three-address code for a = b * c + b * d; is:
t1 = b * c;
t2 = b * d;
t3 = t1 + t2;
a = t3;
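The translation above amounts to a post-order walk of the syntax tree that emits one statement per operator. A minimal sketch, assuming Python's `ast` module as the parser and a hypothetical `gen_tac` helper; it handles only names and the four binary arithmetic operators.

```python
import ast
import itertools

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def gen_tac(src):
    """Return three-address statements for one expression or assignment."""
    code = []
    temps = itertools.count(1)  # t1, t2, ...

    def walk(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp):
            # Operands first (post-order), then the operator itself.
            left = walk(node.left)
            right = walk(node.right)
            temp = f"t{next(temps)}"
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("construct not covered by this sketch")

    stmt = ast.parse(src).body[0]
    if isinstance(stmt, ast.Assign):
        code.append(f"{stmt.targets[0].id} = {walk(stmt.value)}")
    else:
        walk(stmt.value)
    return code


for line in gen_tac("a = b * c + b * d"):
    print(line)
```

Every statement produced this way has at most one operator on the right-hand side, which is what keeps it within three addresses.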
Code Generation
• The code generator takes the intermediate representation of the input program and produces an object (target) program.
• It is the final phase of the compiler.
• The generated code should preserve the semantic meaning of the source program and be of high quality.
A code generator's primary tasks
• Instruction selection: choosing appropriate target-machine instructions to implement the IR statements.
• Register allocation and assignment: deciding what values to keep in which registers.
• Instruction ordering: deciding in what order to schedule the execution of instructions.
Instruction Selection
• The code generator must map the IR program into a code sequence that can be executed by the target machine.
• The complexity of performing this mapping is determined by factors such as:
  • The level of the IR
  • The nature of the instruction-set architecture
  • The desired quality of the generated code
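As a rough illustration of the mapping, the sketch below translates each quadruple independently into a fixed load/operate/store template. The target machine is hypothetical: the mnemonics LD, ST, ADD, SUB, MUL, DIV and the single register R0 are assumptions, not a real instruction set.

```python
# Hypothetical opcodes for the four arithmetic operators.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def select(quad):
    """Expand one quadruple (op, arg1, arg2, result) with a fixed template."""
    op, arg1, arg2, result = quad
    return [
        f"LD R0, {arg1}",                 # load the first operand
        f"{OPCODES[op]} R0, R0, {arg2}",  # R0 = R0 op arg2
        f"ST {result}, R0",               # store the result
    ]


def generate(quads):
    return [instr for quad in quads for instr in select(quad)]


program = [("*", "b", "c", "t1"), ("+", "t1", "d", "a")]
for line in generate(program):
    print(line)
```

The output contains "ST t1, R0" immediately followed by "LD R0, t1". The load is redundant because R0 already holds t1, which shows why statement-by-statement selection is simple but gives lower-quality code.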
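Register Allocation
• A key problem in code generation is deciding what values to hold in what registers.
• Registers are the fastest storage locations on the target machine, but there are usually not enough of them to hold every value.
• The use of registers is often subdivided into two sub-problems:
1. Register allocation: selecting the set of variables that will reside in registers at each point in the program.
2. Register assignment: picking the specific register in which each such variable will reside.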
Evaluation Order
• The order in which computations are performed can affect the efficiency of the target code: some orders need fewer registers to hold intermediate results than others.
Target Program
• The instruction-set architecture of the target machine has a significant impact on the difficulty of constructing a good code generator that produces high-quality machine code.
• The most common target-machine architectures are RISC (reduced instruction set computer), CISC (complex instruction set computer), and stack-based.
• A RISC machine typically has many registers, three-address instructions, simple addressing modes, and a relatively simple instruction-set architecture.
• A CISC machine typically has few registers, two-address instructions, and a variety of addressing modes.
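Basic blocks
• Many code generators partition IR instructions into "basic blocks": sequences of instructions that are always executed together.
• Control enters a basic block only at its first instruction and leaves it only at its last, with no jumps into or out of the middle.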
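The partitioning can be sketched with the usual "leader" rule: the first instruction, every jump target, and every instruction that follows a jump each start a new block. The instruction encoding below, a (text, jump target index or None) pair, and the sample program are assumptions made for this example.

```python
def basic_blocks(instrs):
    """Split a list of (text, jump_target_or_None) pairs into basic blocks."""
    if not instrs:
        return []
    leaders = {0}  # the first instruction always starts a block
    for i, (_, target) in enumerate(instrs):
        if target is not None:
            leaders.add(target)         # the target of a jump
            if i + 1 < len(instrs):
                leaders.add(i + 1)      # the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [[text for text, _ in instrs[s:e]] for s, e in zip(starts, ends)]


program = [
    ("i = 1", None),
    ("t1 = i * 2", None),
    ("i = i + 1", None),
    ("if i <= 10 goto 1", 1),  # jumps back to instruction index 1
    ("x = t1", None),
]
for block in basic_blocks(program):
    print(block)
```

For this program the leaders are instructions 0, 1, and 4, so the loop body from index 1 through the conditional jump forms one block.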
Code from DAG
• A useful data structure for automatically analyzing basic blocks is a directed acyclic graph (DAG).
• A DAG is a directed graph with no cycles. Here it gives a picture of how the value computed by each statement in a basic block is used in subsequent statements in the block. It is also called a "computation DAG".
• Constructing a DAG from three-address statements is a good way of determining common sub-expressions within a block.
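A minimal sketch of DAG construction for one basic block, using the earlier example a := b * -c + b * -c. A node is created only if no node with the same operator and children already exists, which is how the repeated b * -c is detected. The quadruple encoding, the "neg" operator name, and the function `build_dag` are assumptions; arrays, pointers, and calls are ignored.

```python
def build_dag(quads):
    """Build a DAG from quadruples (op, arg1, arg2_or_None, result)."""
    nodes = []    # node id -> ("leaf", name, None) or (op, left_id, right_id)
    index = {}    # node key -> node id, so identical nodes are shared
    current = {}  # variable name -> id of the node holding its current value

    def node_for(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(name):
        # A name with no definition yet in the block is an initial-value leaf.
        if name not in current:
            current[name] = node_for(("leaf", name, None))
        return current[name]

    for op, arg1, arg2, result in quads:
        left = operand(arg1)
        right = operand(arg2) if arg2 is not None else None
        # A copy attaches the result to the existing node; no new node.
        current[result] = left if op == "=" else node_for((op, left, right))
    return nodes, current


quads = [
    ("neg", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("neg", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]
nodes, current = build_dag(quads)
print(len(nodes), current["t2"] == current["t4"])
```

The six statements collapse to five nodes, and t2 and t4 map to the same node, so b * -c needs to be computed only once.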
Advantages of DAG
• From a DAG we can easily rearrange the order of the final computation sequence of quadruples.
• Central to our discussion is the case where the DAG is a tree. For this case we can generate code that we can prove is optimal under such criteria as program length or the fewest number of temporaries used.
Rearranging the order
Example: (A+B)-(E-(C+D))
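One way to pick an order for this tree is a Sethi-Ullman-style labelling, sketched below. This is a swapped-in illustration, not necessarily the method the slides go on to use. It assumes a machine whose right operand may come directly from memory, so a left leaf is labelled 1 and a right leaf 0; at each operator the subtree with the larger label is evaluated first. The names `label` and `order` are hypothetical.

```python
import itertools


def label(node, is_left=True):
    """Estimate of registers needed to evaluate a subtree without stores."""
    if isinstance(node, str):
        return 1 if is_left else 0
    _, left, right = node
    l, r = label(left, True), label(right, False)
    return max(l, r) if l != r else l + 1


def order(tree):
    """Emit three-address statements, harder subtree first at each node."""
    code = []
    temps = itertools.count(1)

    def walk(node):
        if isinstance(node, str):
            return node
        op, left, right = node
        if label(right, False) > label(left, True):
            r = walk(right)
            l = walk(left)
        else:
            l = walk(left)
            r = walk(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    walk(tree)
    return code


tree = ("-", ("+", "A", "B"), ("-", "E", ("+", "C", "D")))
for line in order(tree):
    print(line)
```

For this tree the right subtree E-(C+D) gets the larger label, so C+D and E-(C+D) are computed before A+B. Computing A+B last means its value is used immediately by the final subtraction instead of being held while the other subtree is evaluated.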