Overview
• Functionality of LANCE
• Software structure
• C frontend
• Intermediate representation (IR)
• IR optimizations
• Control and data flow analysis
• Backend interface
The LANCE V2.0 compiler system
• Purpose of LANCE:
  • Facilitate C compiler development for new target processors
  • Give insight into compiler structure
• Tasks covered by LANCE:
  • Source code analysis
  • Generation of IR
  • Machine-independent optimizations
  • Data flow graph generation
• Tasks not covered by LANCE:
  • Assembly code generation (backend)
  • Machine-specific optimizations
  • Code assembly and linking
Key features
• Full ANSI C coverage (C89)
• Modular tool and library structure
• Simple three-address code IR (C subset)
• Plug & play IR optimizations
• Backend interface compatible with OLIVE
• Proven in numerous compiler projects
LANCE software structure
• LANCE library: header file lance2.h, C++ library liblance2.a
• LANCE tools (all use the LANCE library): C frontend, IR optimization 1 ... IR optimization n, machine-specific backend
• The tools exchange data through a common IR
ANSI C frontend
• Functionality:
  • Lexical, syntactic, and semantic analysis of C source
  • Generation of three-address code IR for a C file
  • Emission of error messages if required (gcc style)
  • Machine-specific constants (type bitwidth, alignment) stored in a configuration file
• Implementation:
  • Based on a context-free C grammar, according to the K&R spec
  • C source automatically generated with an attribute grammar compiling system (OX, an extension of lex & yacc)
  • In total approx. 26,000 lines of C source code
  • Validated with a comprehensive test suite
Setup and IR generation
• Environment variables:
  • setenv LANCE2_CPP "gcc -E"
  • setenv LANCE2_CONFIG "config.sparc"
• Call the C frontend with the "compile" command:
  • > compile test.c
• Inputs: file test.c and configuration file config.sparc; output: file test.ir.c
General IR format
• One IR file (*.ir.c) is generated for each C source file (*.c)
• External IR format: C subset (compilable!)
• Internal IR format: accessible via the LANCE library
• IR contains a symbol table + three-address code (3AC) for each C function defined in the source code
• 3AC is a sequence of IR statements
• 3AC = at most two operands and one result per statement
• IR statements (mostly) consist of IR expressions
• Blocks of 3AC are augmented with source information (C code, source line no.) for debugging purposes
Classes of IR statements
• Assignment: a = b + c; *p = !a; x = f(y,z); cond = *x;
• Jump: goto lab;
• Conditional jump: if (cond) goto lab;
• Label: lab:
• Return void: return;
• Return value: return x;
Classes of IR expressions
• Symbol: "a", "b", "main", "count", ...
• Binary expression: a * b, x / 2, 3 ^ v, f & 4, q % r, ...
• Unary expression: !a, *p, ~x, -z, ...
• Function call: f1(), f2(a,b), f3(*x, 1, y), ...
• Type cast: (char)z, (int)a, (float*)b, ...
• String constant: "compiler", "design", "is", "fun", ...
• Integer constant: 1000, 3456, -234, -112, ...
• Float constant: "3.1415926536", "2.718281828459", ...
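To make the statement and expression classes concrete, here is a minimal hand-written sketch of what a three-address form of a small loop could look like. It is not actual LANCE output: the function names, the labels L1/L2, and the auxiliary variables t1..t3 are all made up for illustration. The second function uses only assignments, a jump, a conditional jump, labels, and a value return.

```cpp
// Ordinary C-style function: sums the squares of 1..n.
int sum_squares(int n) {
    int s = 0;
    for (int i = 1; i <= n; ++i) s += i * i;
    return s;
}

// Hand-lowered equivalent (illustrative, not generated by LANCE).
// Every assignment has at most two operands and one result.
int sum_squares_3ac(int n) {
    int s, i, t1, t2, t3;   // t1..t3 are hypothetical auxiliary variables
    s = 0;
    i = 1;
L1:
    t1 = i > n;             // loop exit condition
    if (t1) goto L2;        // conditional jump
    t2 = i * i;
    t3 = s + t2;
    s = t3;
    i = i + 1;
    goto L1;                // jump back to the loop head
L2:
    return s;               // return value
}
```

Because the lowered form is still compilable, both versions can be run on the same input and their results compared.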
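Why is the LANCE IR a C subset?
• Validation of the frontend (or any IR optimization): compile both the C source and the IR-C source produced by the frontend with a C compiler (CC), run the two executables (exe 1, exe 2) on the same test input, and check whether output 1 = output 2
• C-to-C optimization: the IR optimization tools produce optimized C source, which can be compiled with CC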
IR data structure overview
[Figure: layout of the IR data structures]
• Global symbol table: int x1,x2,x3; double y1,y2,y3; ...
• Function list: fun 1 "name1" ... fun n "name n"
• Each function has a local symbol table (int a,b,c; ...) and an IR statement list (stm 1, stm 2, ..., stm m)
• Example stm info:
  • Class: assignment, ID: 4123, Left hand side: *p, Right hand side: a + b
  • Class: cond. jump, ID: 4124, Target: "L1", Condition: c
• Example exp info (IR expression): Class: binary, ID: 10034, Left arg: a, Right arg: b, Oper: +, Type: int
The IR type class
• C++ class IRType stores type info for all symbols and expressions
• Primary type: void, char, short, int, array, pointer, struct, function, ...
• Secondary type: subtype of arrays and pointers
• Storage class: extern, static, register, ...
• Qualifiers: const, volatile
• Example: const int* A[100];
  Type->Class() = IRTYPE_ARRAY   // primary type
  Type->IsConst() = true
  Type->Subtype()->Class() = IRTYPE_POINTER
  Type->Subtype()->Subtype()->Class() = IRTYPE_INT
  Type->ArrayDim() = 100
  Type->SizeOf() = 400           // in bytes, for 32-bit pointers
  Type->MemoryWords() = 200      // for a 16-bit word memory
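The size figures in the example follow from simple arithmetic over the nested type. The sketch below reproduces them with a hypothetical miniature type descriptor; it is not the LANCE IRType API, and the names Type, size_of, memory_words, and make_example_type are invented here. It assumes 4-byte ints and 4-byte (32-bit) pointers.

```cpp
#include <cassert>
#include <memory>

// Hypothetical miniature type descriptor (not the LANCE IRType class).
enum class TypeClass { Int, Pointer, Array };

struct Type {
    TypeClass cls = TypeClass::Int;
    int array_dim = 0;               // only meaningful for arrays
    std::shared_ptr<Type> subtype;   // element type or pointee type
};

// Size in bytes, assuming 4-byte int and 4-byte (32-bit) pointers.
int size_of(const Type& t) {
    switch (t.cls) {
        case TypeClass::Int:     return 4;
        case TypeClass::Pointer: return 4;
        case TypeClass::Array:
            assert(t.subtype);       // arrays must have an element type
            return t.array_dim * size_of(*t.subtype);
    }
    return 0;
}

// Number of memory words of word_bytes bytes each, rounded up.
int memory_words(const Type& t, int word_bytes) {
    return (size_of(t) + word_bytes - 1) / word_bytes;
}

// Builds the shape of: int* A[100]  (qualifiers are left out of the sketch)
Type make_example_type() {
    auto int_t = std::make_shared<Type>();
    int_t->cls = TypeClass::Int;
    auto ptr_t = std::make_shared<Type>();
    ptr_t->cls = TypeClass::Pointer;
    ptr_t->subtype = int_t;
    Type arr;
    arr.cls = TypeClass::Array;
    arr.array_dim = 100;
    arr.subtype = ptr_t;
    return arr;
}
```

With these assumptions the array occupies 100 * 4 = 400 bytes, which is 200 words on a machine with 16-bit (2-byte) memory words.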
The symbol table class
• Symbol table stores all relevant information for symbols/identifiers
• Two hierarchy levels:
  • Global symbol table: IR->GlobalSymbolTable()
  • One local symbol table per function: fun->LocalSymbolTable()
• All local symbols get a unique numerical suffix, e.g.
  int f(int x) { int a,b; }   becomes   int f(int x_1) { int a_2, b_3; }
• Important access methods:
  • ST->LookupSymbol(char* name)
  • IRSymbol* ST->CreateSymbol(IRType* tp)
  • Iterators: ST->FirstObject(), ST->NextObject()
• Information stored in a table entry (class IRSymbol):
  • Symbol type: IRType* sym->Type()
  • Symbol name: char* sym->Name()
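The suffixing of local symbols can be pictured with a running counter. The following is a minimal sketch of that idea only; SuffixGenerator and rename_locals are hypothetical helpers, not part of the LANCE library, and the sketch assumes the counter starts at 1 for each call, which may differ from how LANCE numbers symbols across functions.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: appends a running number to each name so that
// every local symbol becomes unique.
class SuffixGenerator {
public:
    std::string rename(const std::string& name) {
        return name + "_" + std::to_string(++counter_);
    }
private:
    int counter_ = 0;
};

// Renames a list of local symbols in declaration order.
std::vector<std::string> rename_locals(const std::vector<std::string>& names) {
    SuffixGenerator gen;
    std::vector<std::string> out;
    for (const std::string& n : names) out.push_back(gen.rename(n));
    return out;
}
```

Passing the names x, a, b in that order yields x_1, a_2, b_3, matching the example on the slide.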
IR generation example
[Figure: a source file next to the generated IR file, with annotations: forward declaration, automatic conversion, suffix 3 for parameter i, auxiliary vars, debug info]
IR optimization tools
• Purpose: perform machine-independent optimizations on IR
• Identical IR format for all tools, "plug & play" concept
• Currently available tools:
  • Constant folding: cfold tool
  • Constant propagation: constprop tool
  • Copy propagation: copyprop tool
  • Common subexpression elimination: cse tool
  • Dead code elimination: dce tool
  • Jump optimization: jmpopt tool
  • Loop invariant code motion: licm tool
  • Induction variable elimination: ive tool
• Automatic iteration of IR optimizations via the "iropt" shell script
IR optimization example
[Figure: C source code passed through compile, yielding the unoptimized IR]
Constant folding cfold
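As a rough illustration of what constant folding does to a single three-address assignment, the sketch below evaluates an operation at compile time when both operands are constants. It is not the cfold implementation: Operand, Assign, and fold are hypothetical names, only four integer operators are handled, and overflow is ignored.

```cpp
#include <string>

// Hypothetical operand: either an integer constant or a named symbol.
struct Operand {
    bool is_const;
    int value;          // used when is_const is true
    std::string name;   // used when is_const is false
};

// Hypothetical assignment: dst = lhs op rhs, or dst = lhs when op == 0.
struct Assign {
    std::string dst;
    Operand lhs;
    char op;
    Operand rhs;
};

// Folds "dst = c1 op c2" into "dst = c". Returns true if the statement
// was changed. Division by zero is left untouched.
bool fold(Assign& s) {
    if (s.op == 0 || !s.lhs.is_const || !s.rhs.is_const) return false;
    const int a = s.lhs.value;
    const int b = s.rhs.value;
    int r = 0;
    switch (s.op) {
        case '+': r = a + b; break;
        case '-': r = a - b; break;
        case '*': r = a * b; break;
        case '/': if (b == 0) return false; r = a / b; break;
        default:  return false;   // operator not handled by this sketch
    }
    s.lhs = Operand{true, r, ""}; // the statement is now a plain copy
    s.op = 0;                     // rhs is ignored once op == 0
    return true;
}
```

For example, t = 3 * 4 becomes t = 12, while t = a + 1 is left unchanged because a is not a constant.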
Constant propagation constprop
Copy propagation copyprop
Jump optimization jmpopt
Control flow analysis
• Purpose: identify the basic block structure of a C function
• Basic block (BB): IR statement sequence with unique entry and exit points
• Control flow graph (CFG): one node per BB, edge (BB1, BB2) iff BB2 may be an immediate successor of BB1 during execution
• Assembly code generation is usually done one BB at a time
• Example: while (x) { BB1; if (x) then BB2; else BB3; BB4; }
[Figure: CFG of the example with nodes BB1, BB2, BB3, BB4]
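One common way to find basic blocks in three-address code is to mark "leaders": the first statement, every label, and every statement that follows a jump, conditional jump, or return. The sketch below implements that textbook rule on a hypothetical statement type; it is not the LANCE ControlFlowGraph class, and it conservatively treats every label as a possible jump target.

```cpp
#include <set>
#include <string>
#include <vector>

enum class Kind { Assign, Jump, CondJump, Label, Return };

// Hypothetical IR statement: only the fields needed to find block borders.
struct Stmt {
    Kind kind;
    std::string label;  // label defined (Label) or targeted (Jump, CondJump)
};

// Returns the sorted indices of all statements that start a basic block.
std::vector<int> find_leaders(const std::vector<Stmt>& code) {
    std::set<int> leaders;
    const int n = static_cast<int>(code.size());
    if (n > 0) leaders.insert(0);                   // function entry
    for (int i = 0; i < n; ++i) {
        const Kind k = code[i].kind;
        if (k == Kind::Label) leaders.insert(i);    // possible jump target
        const bool transfers =
            k == Kind::Jump || k == Kind::CondJump || k == Kind::Return;
        if (transfers && i + 1 < n) leaders.insert(i + 1);
    }
    return std::vector<int>(leaders.begin(), leaders.end());
}

// Small made-up example: a; if (c) goto L1; b; L1: return;
std::vector<Stmt> small_example() {
    return {
        {Kind::Assign, ""},
        {Kind::CondJump, "L1"},
        {Kind::Assign, ""},
        {Kind::Label, "L1"},
        {Kind::Return, ""},
    };
}
```

Each basic block then runs from one leader up to, but not including, the next leader; CFG edges follow from the last statement of each block.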
CFG generation by LANCE
• Class ControlFlowGraph is contained in the LANCE library
• Constructor ControlFlowGraph(Function* fun) generates the CFG for any function fun
• LANCE tool showcfg exports CFGs in the VCG text format
• VCG can be used to visualize generated CFGs
• Tool flow: IR file, then showcfg, then CFG VCG file, then xvcg
CFG visualization example showcfg + VCG tool
Data flow analysis
• Goal: convert IR into data flow graph (DFG) representation for assembly code generation by tree pattern matching
• Performed by def/use analysis between IR statements/expressions
• LANCE lib class DataFlowAnalysis provides the required methods
• Constructor DataFlowAnalysis(Function* fun) constructs data flow information for any function fun
• Example: x = 5; goto lab; ... x = 6; lab: y = x + 1; ... z = 1 - y; u = y / 5;
  • The use of x in y = x + 1 has two definitions: x = 5 and x = 6
  • The definition of y has two uses: in z = 1 - y and in u = y / 5
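The def/use relations in this example can be computed with a standard reaching-definitions iteration. The sketch below is a generic textbook formulation on a hypothetical statement type, not the LANCE DataFlowAnalysis class. The elided code ("...") is left out, so in this model the statement x = 6 simply has no predecessor.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical statement: one optional definition, a list of used
// variables, and the indices of its control flow successors.
struct Stmt {
    std::string def;
    std::vector<std::string> uses;
    std::vector<int> succs;
};

// For each statement, the set of definitions (statement indices) that
// may reach its entry. Iterates until nothing changes.
std::vector<std::set<int>> reaching_in(const std::vector<Stmt>& code) {
    const int n = static_cast<int>(code.size());
    std::vector<std::set<int>> in(n);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int s = 0; s < n; ++s) {
            std::set<int> out;  // out = gen(s) plus (in[s] minus kill(s))
            for (int d : in[s])
                if (code[s].def.empty() || code[d].def != code[s].def)
                    out.insert(d);
            if (!code[s].def.empty()) out.insert(s);
            for (int t : code[s].succs)
                for (int d : out)
                    if (in[t].insert(d).second) changed = true;
        }
    }
    return in;
}

// Definitions of var that may reach statement `at`.
std::set<int> defs_reaching(const std::vector<Stmt>& code, int at,
                            const std::string& var) {
    const std::vector<std::set<int>> in = reaching_in(code);
    std::set<int> result;
    for (int d : in[at])
        if (code[d].def == var) result.insert(d);
    return result;
}

// Statements that use the value defined by statement def_index.
std::set<int> uses_of(const std::vector<Stmt>& code, int def_index) {
    const std::vector<std::set<int>> in = reaching_in(code);
    std::set<int> result;
    for (int s = 0; s < static_cast<int>(code.size()); ++s) {
        if (in[s].count(def_index) == 0) continue;
        for (const std::string& u : code[s].uses)
            if (u == code[def_index].def) result.insert(s);
    }
    return result;
}

// The slide's example: 0: x = 5;  1: goto lab;  2: x = 6;
// 3: lab: y = x + 1;  4: z = 1 - y;  5: u = y / 5;
std::vector<Stmt> slide_example() {
    std::vector<Stmt> c(6);
    c[0].def = "x"; c[0].succs.push_back(1);
    c[1].succs.push_back(3);
    c[2].def = "x"; c[2].succs.push_back(3);
    c[3].def = "y"; c[3].uses.push_back("x"); c[3].succs.push_back(4);
    c[4].def = "z"; c[4].uses.push_back("y"); c[4].succs.push_back(5);
    c[5].def = "u"; c[5].uses.push_back("y");
    return c;
}
```

On this input, statements 0 and 2 are the definitions of x reaching statement 3, and statements 4 and 5 are the uses of the y defined in statement 3.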
DFG visualization example showdfg + VCG tool
Backend interface
• LANCE lib classes LANCEDataFlowTree and DFTManager provide the link between LANCE IR and tree pattern matching
• OLIVE/IBURG accept only trees instead of general DFGs
• Hence: split DFGs at the common subexpressions (CSEs)
[Figure: a DFG in which the product a * b is a CSE feeding two additions (one with c, one with 2) that produce x and y; after splitting, a * b is written to an auxiliary variable t and both additions read t]
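The splitting step can be sketched as follows: count how often each inner node of the DFG is used, give every node with more than one use its own tree and an auxiliary variable, and let the other trees refer to that variable. This is only an illustration of the idea, not the DFTManager implementation. The types Node and Root, the names t1, t2, ..., and the textual output are assumptions; only binary operators are handled and nodes must be listed operands-first.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node {
    std::string op;  // operator for inner nodes, symbol or constant for leaves
    int left;        // operand indices into the node vector, -1 for leaves
    int right;
};

struct Root {
    std::string target;  // variable that receives the value
    int node;            // index of the node computing it
};

// Renders node i as text. Inner nodes that own an auxiliary variable are
// referenced by name unless `expand` asks for their full expression.
std::string render(const std::vector<Node>& dag,
                   const std::vector<std::string>& aux, int i, bool expand) {
    const Node& n = dag[i];
    if (n.left < 0) return n.op;                    // leaf
    if (!expand && !aux[i].empty()) return aux[i];  // cut point
    assert(n.right >= 0);  // this sketch handles binary operators only
    return "(" + render(dag, aux, n.left, false) + " " + n.op + " " +
           render(dag, aux, n.right, false) + ")";
}

// Returns one assignment per resulting tree, auxiliary definitions first.
std::vector<std::string> split_at_cses(const std::vector<Node>& dag,
                                       const std::vector<Root>& roots) {
    std::vector<int> uses(dag.size(), 0);
    for (const Node& n : dag) {
        if (n.left >= 0) ++uses[n.left];
        if (n.right >= 0) ++uses[n.right];
    }
    for (const Root& r : roots) ++uses[r.node];

    std::vector<std::string> aux(dag.size());
    std::vector<std::string> trees;
    int counter = 0;
    for (int i = 0; i < static_cast<int>(dag.size()); ++i) {
        if (dag[i].left >= 0 && uses[i] > 1) {      // inner node used twice
            aux[i] = "t" + std::to_string(++counter);
            trees.push_back(aux[i] + " = " + render(dag, aux, i, true));
        }
    }
    for (const Root& r : roots)
        trees.push_back(r.target + " = " + render(dag, aux, r.node, false));
    return trees;
}

// A DFG shaped like the slide's figure: a * b shared by two additions.
std::vector<std::string> slide_like_example() {
    std::vector<Node> dag = {
        {"a", -1, -1}, {"b", -1, -1}, {"*", 0, 1},
        {"c", -1, -1}, {"2", -1, -1},
        {"+", 3, 2}, {"+", 2, 4},
    };
    std::vector<Root> roots = {{"x", 5}, {"y", 6}};
    return split_at_cses(dag, roots);
}
```

For the example graph this produces three trees: t1 = (a * b), x = (c + t1), and y = (t1 + 2). Each of them can be handed to a tree pattern matcher on its own.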
Data structure overview
• Constructor DFTManager(Function* fun) generates the data flow tree (DFT) representation for an entire function fun
• DFTManager contains an internal list of basic blocks (BB 1, BB 2, ..., BB n)
• Each BB in turn is a list of DFTs (DFT 1, DFT 2, ..., DFT m)
DFT covering with OLIVE
• DFTs are directly in the format required by code generators produced by OLIVE
• All DFTs consist of a fixed set of terminal symbols (e.g. cs_STORE), specified in file INCL/termlist.c
• Example (only a single DFT):
[Figure: C file, IR file, and the corresponding DFT representation]
Example (cont.)
[Figure: DFT in OLIVE format, simplified OLIVE spec, and resulting assembly code for a hypothetical machine]
Summary
• LANCE provides you with ...
  • C frontend
  • IR optimizations
  • C++ library for IR access (+ important basic classes)
  • Interface to OLIVE data flow trees
• A full C compiler additionally requires ...
  • OLIVE-based backend for the concrete target machine
  • Target-specific optimizations (e.g. scheduling, address generation)