Compilers

Presentation Transcript


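  1. Compilers [Diagram: the source language passes through the Scanner (lexical analysis), which produces tokens; the Parser (syntax analysis), which produces the syntactic structure; Semantic Analysis (IC generator), which produces the syntactic/semantic structure; and the Code Optimizer and Code Generator, which produce machine language. The phases share a Symbol Table. The Computer runs the machine language on the Input Data to produce the Output.]

  2. Interpreters [Diagram: the Interpreter takes the source language and the Input Data and produces the Output.]

  3. Hybrid [Diagram: the source language passes through the Scanner (lexical analysis), which produces tokens; the Parser (syntax analysis), which produces the syntactic structure; and Semantic Analysis (IC generator), which produces Intermediate Code. The phases share a Symbol Table. The Interpreter runs the Intermediate Code on the Input Data to produce the Output.]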

  4. The Compilation Process
     Source Program:
       if (a >= b+1) { a *= 2; } …
     Front End (analysis): Lexical Analysis, Syntax Analysis, Semantic Analysis, Intermediate Code Gen
     Intermediate Representation:
       _t1 = b + 1
       _t2 = a < _t1
       if _t2 goto L0
       …
     Back End (synthesis): IR Optimization, Object Code Gen, Object Code Optimization
     Target Program:
       lw $t1, -16($fp)
       add $t0, $t1, 1
       …

  5. The Analysis Stage • Broken up into four phases • Lexical Analysis (also called scanning or tokenization) • Parsing • Semantic Analysis • Intermediate Code Generation

  6. Lexical Analysis and Scanners/Lexers • Lexical analysis is the first phase of compilation, in which the compiler attempts to recognize the symbols of the actual source code • Lexical analyzers, also called scanners or lexers, are usually subroutines or coroutines of the parser. • The parser asks for the next token from the source file and the lexer returns that token.

  7. Lexing Example
     Source: double d1; double d2; d2 = d1 * 2.0;
     Lexeme   Token             Note
     double   TOK_DOUBLE        reserved word
     d1       TOK_ID            variable name
     ;        TOK_PUNCT         has value of “;”
     double   TOK_DOUBLE        reserved word
     d2       TOK_ID            variable name
     ;        TOK_PUNCT         has value of “;”
     d2       TOK_ID            variable name
     =        TOK_OPER          has value of “=”
     d1       TOK_ID            variable name
     *        TOK_OPER          has value of “*”
     2.0      TOK_FLOAT_CONST   has value of 2.0
     ;        TOK_PUNCT         has value of “;”
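The table above can be reproduced with a small regular-expression lexer. This is only a sketch: the token names come from the slide, but the patterns in `TOKEN_SPEC`, the `SKIP` entry for whitespace, and the function name `tokenize` are assumptions made for illustration. It is written as a generator so that a parser could pull one token at a time, as described on the previous slide.

```python
import re

# Hypothetical patterns, tried in order; the reserved word must come before TOK_ID.
TOKEN_SPEC = [
    ("TOK_DOUBLE", r"double\b"),
    ("TOK_FLOAT_CONST", r"[0-9]+\.[0-9]+"),
    ("TOK_ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("TOK_OPER", r"[=*+\-/]"),
    ("TOK_PUNCT", r";"),
    ("SKIP", r"\s+"),  # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Yield (token, lexeme) pairs in source order."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()


tokens = list(tokenize("double d1; double d2; d2 = d1 * 2.0;"))
for token, lexeme in tokens:
    print(f"{lexeme:8}{token}")
```

The `\b` after `double` keeps a name such as `doubled` from being split into a reserved word plus an identifier.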

  8. Lexical Analysis – sequences • Expression • Base * base - 0x4 * height * width • Token sequence • name:Base operator:times name:base operator:minus hexConstant:4 operator:times name:height operator:times name:width • Lexical phase returns token and value

  9. Tokens and lexemes • Lexers work with patterns, tokens, and lexemes. • Patterns formally describe tokens in some way. • Tokens are the terminal symbols in the grammar for the language. • Lexemes are the actual strings that match the patterns

  10. Example
      Token        Lexeme    Pattern Description
      int          int       int
      identifier   MyVar     letter followed by digits or letters
      literal      ``foo''   characters enclosed in quotes

  11. Expressing Patterns for Tokens • As you may have already guessed (or know), the easiest way to specify a token is with a regular expression.

  12. Regexs • Regular expressions (regexs) are used to describe (regular) languages. Here are the rules of regular expressions: • The empty string, ε, is a regular expression • A symbol is a regular expression (e.g., a) • If R and S are regexs, then so are • R|S (denoting R or S) • RS (concatenation) • R* (zero or more of R) • (R) (grouping)

  13. Regex Conventions • There are various conventions used in the world of regular expressions to make things a bit easier. • R+ (one or more of R) • R? (zero or one of R) • [a-z], [A-Z], [0-9] (character classes) • . (any single character/symbol) • Precedence rules for operators to avoid excessive parentheses. • All operators group left-to-right. • *, + and ? have the highest precedence • concatenation is second highest • | is the lowest

  14. Examples • a...b - five-character words starting with a and ending with b • a*(bb)*a* - words made of a's around a single run of b's of even length (so not every word with an even number of b's: bab does not match) • .*(ing|er)s? - words ending with ing or er, followed by zero or one s • [0-9]+\.[0-9]+(e|E)-?[0-9](l|L|f|F)? - simplified version of floating point constants in C (the backslash (\) means ``take the next character literally'') • (R|ε)* - equivalent to R*
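The example patterns can be tried out directly. The sketch below uses Python's `re` module, whose syntax happens to accept these patterns unchanged; the helper `matches` and the sample strings are assumptions chosen for illustration. `re.fullmatch` is used because the slide's patterns describe whole words, not substrings.

```python
import re


def matches(pattern, text):
    """True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


FLOAT = r"[0-9]+\.[0-9]+(e|E)-?[0-9](l|L|f|F)?"

print(matches(r"a...b", "adlib"))               # five characters, a first, b last
print(matches(r"a*(bb)*a*", "abba"))            # one run of two b's
print(matches(r"a*(bb)*a*", "bab"))             # two b's, but not adjacent
print(matches(r".*(ing|er)s?", "compilers"))    # "er" followed by one s
print(matches(FLOAT, "3.14e-2f"))               # exponent and suffix present
print(matches(FLOAT, "3.14"))                   # no exponent
```

The last call returns False because the simplified pattern makes the exponent part mandatory; the optional `?` applies only to the sign and the suffix.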

  15. Another Example
      Expression -> Expression + Expression
                  | Expression - Expression
                  | ...
                  | Variable
                  | Constant
                  | ...
      Variable   -> T_IDENTIFIER
      Constant   -> T_INTCONSTANT
                  | T_DOUBLECONSTANT

  16. The Parse
      Input: a + 2
      Expression -> Expression + Expression
                 -> Variable + Expression
                 -> T_IDENTIFIER + Expression
                 -> T_IDENTIFIER + Constant
                 -> T_IDENTIFIER + T_INTCONSTANT
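A parser for this fragment of the grammar could look like the sketch below. It is not the slide's method: the grammar as written is left-recursive and ambiguous, so this sketch assumes the equivalent iterative form Expression -> Operand (('+' | '-') Operand)* and builds a left-associative tree. The function names, the `OP` token kind, and the tuple-based tree shape are all hypothetical.

```python
import re

# Token kinds follow the slide; OP is an assumed name for '+' and '-'.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<T_DOUBLECONSTANT>[0-9]+\.[0-9]+)"
    r"|(?P<T_INTCONSTANT>[0-9]+)"
    r"|(?P<T_IDENTIFIER>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<OP>[+\-]))"
)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse_operand(tokens, pos):
    """Operand -> Variable | Constant; returns (tree, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, lexeme = tokens[pos]
    if kind == "T_IDENTIFIER":
        return ("Variable", kind, lexeme), pos + 1
    if kind in ("T_INTCONSTANT", "T_DOUBLECONSTANT"):
        return ("Constant", kind, lexeme), pos + 1
    raise SyntaxError(f"expected a variable or constant, found {lexeme!r}")


def parse_expression(text):
    """Expression -> Operand (('+' | '-') Operand)*, grouped left to right."""
    tokens = tokenize(text)
    tree, pos = parse_operand(tokens, 0)
    while pos < len(tokens) and tokens[pos][0] == "OP":
        operator = tokens[pos][1]
        right, pos = parse_operand(tokens, pos + 1)
        tree = ("Expression", tree, operator, right)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return tree


print(parse_expression("a + 2"))
```

For `a + 2` the resulting tree has a Variable on the left and a Constant on the right of the `+`, mirroring the last line of the derivation.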

  17. Semantic Analysis • The syntactically correct parse tree (or derivation) is checked for semantic errors • Check for constructs that, while syntactically valid, do not obey the semantic rules of the source language. • Examples: • Use of an undeclared or uninitialized variable • Function called with improper arguments • Incompatible operands and type mismatches

  18. Examples
      void fun1(int i);
      double d;
      d = fun1(2.1);

      int i;
      int j;
      i = i + 2;

      int arr[2], c;
      c = arr * 10;

      Most semantic analysis pertains to the checking of types.
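One way to picture these checks is a symbol table that records each name's type and whether it has been given a value. The sketch below is an assumption-heavy illustration, not how any particular compiler does it: the class and function names are hypothetical, types are plain strings, and only one problem per slide example is modelled (the void result, the uninitialized read, and the array operand).

```python
class SemanticError(Exception):
    """A construct that parses correctly but breaks a semantic rule."""


class SymbolTable:
    """Tracks the declared type of each name and whether it holds a value yet."""

    def __init__(self):
        self.entries = {}

    def declare(self, name, type_name):
        self.entries[name] = {"type": type_name, "initialized": False}

    def read(self, name):
        entry = self.entries.get(name)
        if entry is None:
            raise SemanticError(f"'{name}' is not declared")
        if not entry["initialized"]:
            raise SemanticError(f"'{name}' is read before it is initialized")
        return entry["type"]

    def write(self, name, value_type):
        entry = self.entries.get(name)
        if entry is None:
            raise SemanticError(f"'{name}' is not declared")
        if value_type == "void":
            raise SemanticError(f"cannot assign a void result to '{name}'")
        entry["initialized"] = True


def arithmetic_type(left, right):
    """Result type of an arithmetic operator applied to two scalar operands."""
    for operand in (left, right):
        if operand not in ("int", "double"):
            raise SemanticError(f"'{operand}' operand is not valid in arithmetic")
    return "double" if "double" in (left, right) else "int"


def report(check):
    """Run one check and return its error message, or None when it passes."""
    try:
        check()
    except SemanticError as error:
        return str(error)
    return None


table = SymbolTable()
table.declare("d", "double")
table.declare("i", "int")

print(report(lambda: table.write("d", "void")))          # d = fun1(2.1);
print(report(lambda: table.read("i")))                   # i = i + 2;
print(report(lambda: arithmetic_type("int[2]", "int")))  # c = arr * 10;
```

All three checks concern types or the state recorded alongside them, which is the point the slide makes about semantic analysis.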

  19. Intermediate Code Generation • Where the intermediate representation of the source program is created. • The representation can have a variety of forms, but a common one is called three-address code (TAC) • Like assembly – the TAC is a sequence of simple instructions, each of which can have at most three operands.

  20. Example
      Source:
        a = b * c + b * d
      Three-address code (note the temps):
        _t1 = b * c
        _t2 = b * d
        _t3 = _t1 + _t2
        a = _t3
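The translation above can be sketched as a post-order walk over an expression tree that allocates a fresh temporary for every operator. The tree encoding (nested `(operator, left, right)` tuples with variable names as strings) and the function name `generate_tac` are assumptions made for this illustration.

```python
def generate_tac(target, expression):
    """Return three-address code that assigns expression to target."""
    code = []
    temp_count = 0

    def emit(node):
        """Emit code for node and return the name that holds its value."""
        nonlocal temp_count
        if isinstance(node, str):  # a variable needs no code of its own
            return node
        operator, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp_count += 1
        temp = f"_t{temp_count}"
        code.append(f"{temp} = {left_name} {operator} {right_name}")
        return temp

    result = emit(expression)
    code.append(f"{target} = {result}")
    return code


# a = b * c + b * d
tree = ("+", ("*", "b", "c"), ("*", "b", "d"))
tac = generate_tac("a", tree)
print("\n".join(tac))
```

Each emitted instruction has one operator and at most three names, which is what keeps the form close to assembly.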

  21. Another Example
      Source:
        if (a <= b)
          a = a - c;
        c = b * c;
      Three-address code (note the temps and the symbolic addresses):
        _t1 = a > b
        if _t1 goto L0
        _t2 = a - c
        a = _t2
        L0: _t3 = b * c
        c = _t3
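To see that the inverted test and the jump to L0 preserve the meaning of the source, the three-address code can be executed by a tiny interpreter and compared with the original `if` statement. This is a sketch under several assumptions: the instruction spelling follows the slide, operands are variable names or integer literals, and `run_tac` and `OPS` are hypothetical names.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       ">": operator.gt, "<": operator.lt}


def run_tac(lines, variables):
    """Execute three-address code and return the final variable values."""
    env = dict(variables)
    program, labels = [], {}
    for line in lines:
        line = line.strip()
        if ":" in line:  # "L0: instr" marks a symbolic address
            label, line = line.split(":", 1)
            labels[label.strip()] = len(program)
            line = line.strip()
        if line:
            program.append(line.split())

    def value(token):
        return env[token] if token in env else int(token)

    pc = 0
    while pc < len(program):
        parts = program[pc]
        pc += 1
        if parts[0] == "if":            # if <cond> goto <label>
            if value(parts[1]):
                pc = labels[parts[3]]
        elif len(parts) == 3:           # dest = src
            env[parts[0]] = value(parts[2])
        else:                           # dest = left op right
            env[parts[0]] = OPS[parts[3]](value(parts[2]), value(parts[4]))
    return env


TAC = [
    "_t1 = a > b",
    "if _t1 goto L0",
    "_t2 = a - c",
    "a = _t2",
    "L0: _t3 = b * c",
    "c = _t3",
]

taken = run_tac(TAC, {"a": 1, "b": 5, "c": 2})     # a <= b, so a = a - c runs
skipped = run_tac(TAC, {"a": 7, "b": 5, "c": 2})   # a > b, so the jump is taken
print(taken["a"], taken["c"])
print(skipped["a"], skipped["c"])
```

In both runs `c` ends up as `b * c`, while `a` changes only when `a <= b`, which matches the source statement.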

  22. Backend (Synthesis) • Basic Steps • Intermediate Code optimization • Object Code Generation • Object Code Optimization • Synthesis is not as deterministic/predictable as analysis. Thus, synthesis must be conservative and this is why optimizing can be lengthy and not ``perfect''.

  23. Intermediate Code optimization • Input is IR, output is optimized IR • What are some of the optimizations that can be performed? • Algebraic simplifications (*1, /1, *0, factoring, etc.) • Moving invariant code out of loops • Removal of isolated (unreachable) code • Removal of unused variables

  24. IR Optimization • Optimizations take place with IR and when manipulating actual machine code. • However the optimizations done at the IR stage can be done to any program, regardless of architecture. • The optimizations done with machine/object code usually exploit some feature of the target architecture in some way • What’s this say about a JITC approach?

  25. Example
      Before optimization:
        _t1 = b * c
        _t2 = _t1 + 0
        _t3 = b * c
        _t4 = _t2 + _t3
        a = _t4
      After optimization:
        _t1 = b * c
        _t2 = _t1 + _t1
        a = _t2
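The reduction in this example combines two of the optimizations listed earlier: an algebraic simplification (adding 0) and reuse of a value that was already computed (`b * c`). The sketch below applies both in one pass and then renumbers the surviving temporaries. It assumes straight-line code in which every name is assigned at most once, so no recorded expression ever needs to be invalidated; the function name `optimize` and the string-based instruction format are illustrative choices.

```python
def optimize(lines):
    """Simplify x + 0 and x * 1, reuse repeated expressions, renumber temps."""
    alias = {}      # dropped name -> earlier name holding the same value
    available = {}  # (left, op, right) -> name that already holds the result
    kept = []       # surviving instructions as (dest, operand tokens)

    for line in lines:
        dest, rhs = [part.strip() for part in line.split("=", 1)]
        tokens = [alias.get(token, token) for token in rhs.split()]
        if len(tokens) == 3:
            left, op, right = tokens
            if (op == "+" and right == "0") or (op == "*" and right == "1"):
                alias[dest] = left            # dest is just another name for left
                continue
            key = (left, op, right)
            if key in available:
                alias[dest] = available[key]  # value was computed earlier
                continue
            available[key] = dest
        kept.append((dest, tokens))

    # Renumber the temporaries that survived so they run _t1, _t2, ...
    renamed = {}
    for dest, _ in kept:
        if dest.startswith("_t"):
            renamed[dest] = f"_t{len(renamed) + 1}"
    return [
        f"{renamed.get(dest, dest)} = "
        + " ".join(renamed.get(token, token) for token in tokens)
        for dest, tokens in kept
    ]


before = [
    "_t1 = b * c",
    "_t2 = _t1 + 0",
    "_t3 = b * c",
    "_t4 = _t2 + _t3",
    "a = _t4",
]
after = optimize(before)
print("\n".join(after))
```

Because the pass only looks at the IR, it would apply unchanged whatever the target architecture is.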

  26. Object Code Generation • The output of this stage is machine or assembly code • Variables get mapped to memory locations (variables are just a shorthand for those anyway) • Symbolic instructions are replaced by actual machine instructions

  27. Object Code Optimization • May follow code generation • Optional – only on demand • Variable • Like IR Optimization may be expensive • Levels • Exploits machine detail • Examples: • Register pools • Instruction Pipelining
