
Introduction to Compiler Design

A compiler is a language translator that takes as input a program written in a high-level language and produces an equivalent program in a low-level language.



  1. Introduction to Compiler Design

  2. What is a Compiler?
  • A compiler is a language translator that takes as input a program written in a high-level language and produces an equivalent program in a low-level language.
  • For example, a compiler may translate a C++ program into an executable program that runs on a processor.

  3. Phases of Compilation
  • In the process of translation, a compiler goes through several phases:
  • Lexical analysis (also called scanning)
  • Syntax analysis (also called parsing)
  • Semantic analysis
  • Optimization (not in this course!)
  • Code generation

  4. Phases of a Compiler

    Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer →
    Intermediate Code Generator → Code Optimizer → Code Generator → Target Program

  • Each phase transforms the source program from one representation into another representation.
  • All phases communicate with the error handlers.
  • All phases communicate with the symbol table.

  5. Lexical Analysis
  • The job of the lexical analyzer, or scanner, is to read the source program one character at a time and produce as output a stream of tokens.
  • The tokens produced by the scanner serve as input to the next phase, the parser.
  • Thus, the lexical analyzer’s job is to translate the source program into a form more conducive to recognition by the parser.

  6. Tokens
  • The scanner takes a stream of characters and identifies tokens from the lexemes.
  • Tokens are used to represent low-level program units such as:
  • Identifiers, such as sum, value, and x
  • Numeric literals, such as 123 and 1.35e06
  • Operators, such as +, *, &&, <=, and %
  • Keywords, such as if, else, and return
  • Many other language symbols
  • The scanner also eliminates comments and redundant whitespace.

  7. Tokens
  • Whitespace: a sequence of space, tab, newline, carriage-return, form-feed characters, etc.
  • Lexeme: a sequence of non-whitespace characters delimited by whitespace or special characters (e.g., operators like +, -, *).
  • Examples of lexemes:
  • reserved words, keywords, identifiers, etc.
  • each comment is usually a single lexeme
  • preprocessor directives

  8. Classes of Tokens
  • There are many ways we could represent the tokens of a programming language. One possibility is to use a 2-tuple of the form <token_class, value>.
  • For example, consider the token class identifier. The identifiers sum and value may be represented as <ident, “sum”> and <ident, “value”>, respectively.

  9. Classes of Tokens
  • The token class NumericLiteral may be represented in the same way; for example, the literals 123 and 1.35e06 may be represented as <NumericLiteral, “123”> and <NumericLiteral, “1.35e06”>, respectively.
  • The same applies to operators; for example, <relop, “>=”> and <addop, “-”>.

  10. Representing Tokens
  • These 2-tuples are easily represented as a struct or class in C++:

    enum TokenClass {ident, numlit, …};

    struct Token {
        TokenClass tokenClass;
        string tokenValue;
    };

  11. Tokens: An Example
  The scanner may take the expression

    x = 2 + f(3);

  and produce the following stream of tokens:

    <ident, “x”> <assign_op, “=”> <numlit, “2”> <addop, “+”> <ident, “f”>
    <lparen, “(”> <numlit, “3”> <rparen, “)”> <semicolon, “;”>
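  The character-by-character scanning described above can be sketched as a small hand-written scanner. This is a minimal, illustrative sketch: the function name tokenize is an assumption, and the token set covers only what the example expression needs.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Token classes follow the slides' examples; the exact set is illustrative.
enum TokenClass { ident, numlit, addop, assign_op, lparen, rparen, semicolon };

struct Token {
    TokenClass tokenClass;
    std::string tokenValue;
};

// A minimal scanner: reads the source one character at a time,
// groups characters into tokens, and skips whitespace.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    size_t i = 0;
    while (i < src.size()) {
        unsigned char c = src[i];
        if (std::isspace(c)) { ++i; continue; }          // drop whitespace
        if (std::isalpha(c)) {                           // identifier
            size_t j = i;
            while (j < src.size() && std::isalnum((unsigned char)src[j])) ++j;
            tokens.push_back({ident, src.substr(i, j - i)});
            i = j;
        } else if (std::isdigit(c)) {                    // numeric literal
            size_t j = i;
            while (j < src.size() && std::isdigit((unsigned char)src[j])) ++j;
            tokens.push_back({numlit, src.substr(i, j - i)});
            i = j;
        } else {                                         // single-character tokens
            switch (c) {
                case '+': tokens.push_back({addop,     "+"}); break;
                case '=': tokens.push_back({assign_op, "="}); break;
                case '(': tokens.push_back({lparen,    "("}); break;
                case ')': tokens.push_back({rparen,    ")"}); break;
                case ';': tokens.push_back({semicolon, ";"}); break;
                default: break; // unknown characters are ignored in this sketch
            }
            ++i;
        }
    }
    return tokens;
}
```

  Running tokenize on the slide's expression x = 2 + f(3); yields the nine tokens listed above, in the same order.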

  12. The Scanner in Action
  Given the input

    if (x < y) min = x;

  the scanner emits one token at a time, starting with:

    <IFTOK, “if”> <LPAREN, “(”> <IDENT, “x”> … <SEMICOLON, “;”>

  13. Syntax Analysis
  • The job of the syntax analyzer, or parser, is to take a stream of tokens produced by the lexical analyzer and build a parse tree (or syntax tree).
  • The parser is basically a program that determines if sentences in a language are constructed properly according to the rules of the language.

  14. A Parse Tree
  The parse tree for the statement if (x < 10) y = 23;:

    if_statement
      if
      (
      expr
        ident   x
        relop   <
        numlit  10
      )
      statement
        assign
          var   (ident y)
          =
          expr  (numlit 23)

  15. Parsing Techniques
  • Depending on how the parse tree is created, there are different parsing techniques.
  • These parsing techniques are categorized into two groups: Top-Down Parsing and Bottom-Up Parsing.
  • Top-Down Parsing:
  • Construction of the parse tree starts at the root and proceeds towards the leaves.
  • Efficient top-down parsers can be easily constructed by hand.
  • Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
  • Bottom-Up Parsing:
  • Construction of the parse tree starts at the leaves and proceeds towards the root.
  • Normally, efficient bottom-up parsers are created with the help of software tools.
  • Bottom-up parsing is also known as shift-reduce parsing.
  • Operator-Precedence Parsing: simple, restrictive, easy to implement.
  • LR Parsing: a much more general form of shift-reduce parsing (LR, SLR, LALR).
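  The hand-written top-down parsers mentioned above can be sketched as a tiny recursive-descent recognizer. This illustrative sketch recognizes the expression grammar from slide 17, rewritten without left recursion so that top-down parsing terminates; the names Parser, primary, and parses are assumptions, not from the slides.

```cpp
#include <cctype>
#include <string>

// Grammar (left recursion removed for top-down parsing):
//   expression -> primary { "+" primary }
//   primary    -> identifier | number
struct Parser {
    std::string src;
    size_t pos = 0;

    void skipSpaces() {
        while (pos < src.size() && std::isspace((unsigned char)src[pos])) ++pos;
    }

    bool primary() {                     // identifier or number
        skipSpaces();
        if (pos < src.size() && std::isalpha((unsigned char)src[pos])) {
            while (pos < src.size() && std::isalnum((unsigned char)src[pos])) ++pos;
            return true;
        }
        if (pos < src.size() && std::isdigit((unsigned char)src[pos])) {
            while (pos < src.size() && std::isdigit((unsigned char)src[pos])) ++pos;
            return true;
        }
        return false;
    }

    bool expression() {                  // primary { "+" primary }
        if (!primary()) return false;
        skipSpaces();
        while (pos < src.size() && src[pos] == '+') {
            ++pos;
            if (!primary()) return false;
            skipSpaces();
        }
        return true;
    }
};

// True if the whole input is a well-formed expression.
bool parses(const std::string& s) {
    Parser p;
    p.src = s;
    bool ok = p.expression();
    p.skipSpaces();
    return ok && p.pos == p.src.size();
}
```

  Each nonterminal of the grammar becomes one function; that one-to-one mapping is what makes recursive predictive parsers easy to build by hand.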

  16. Syntax Analysis
  • The syntax of a language is defined by using a context-free grammar (CFG).
  • A CFG uses BNF rules to describe the syntax:

    <if-stmt> -> if ( <cond> ) <stmt> [ else <stmt> ]

  17. Syntax Analyzer (CFG)
  • The syntax of a language is specified by a context-free grammar (CFG).
  • The rules in a CFG are mostly recursive.
  • A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.
  • If it does, the syntax analyzer creates a parse tree for the given program.
  • Ex: We use BNF (Backus-Naur Form) to specify a CFG:

    assgstmt   -> identifier := expression
    expression -> identifier
    expression -> number
    expression -> expression + expression

  18. Syntax Analyzer versus Lexical Analyzer
  • Which constructs of a program should be recognized by the lexical analyzer, and which ones by the syntax analyzer?
  • Both of them do similar things, but the lexical analyzer deals with simple, non-recursive constructs of the language, while the syntax analyzer deals with recursive constructs.
  • The lexical analyzer simplifies the job of the syntax analyzer.
  • The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program.
  • The syntax analyzer works on those tokens to recognize meaningful structures in the programming language.

  19. Semantic Analyzer
  • A semantic analyzer checks the source program for semantic errors and collects type information for code generation.
  • Type checking is an important part of the semantic analyzer.
  • Normally, semantic information cannot be represented by the context-free languages used in syntax analyzers.
  • The semantic analyzer’s job is to attach some meaning to the structure produced by the parser. Activities include:
  • Ensuring an identifier is defined before being used in a statement or expression.
  • Enforcing the scope rules of the language.
  • Performing type checking.
  • Producing intermediate code.

  20. Semantic Analysis
  • Static semantics can be determined by the compiler prior to execution, including:
  • Declarations
  • Determine the structure and attributes of a user-defined data type
  • Determine the type of a variable
  • Determine the number and types of parameters of a procedure
  • Type checking
  • The process of ensuring that the type(s) of the operand(s) are appropriate for an operation
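  The type-checking process described above can be sketched for a single operation. This is an illustrative sketch only: the type names, the int-to-float promotion rule, and the function name checkAdd are assumptions, not rules from the slides.

```cpp
#include <stdexcept>
#include <string>

// Given the types of the two operands of "+", compute the result type
// or report a semantic (type) error.
std::string checkAdd(const std::string& lhsType, const std::string& rhsType) {
    if (lhsType == "int" && rhsType == "int")
        return "int";
    // Mixed int/float arithmetic promotes to float in this sketch.
    if ((lhsType == "int" || lhsType == "float") &&
        (rhsType == "int" || rhsType == "float"))
        return "float";
    // Operand types are not appropriate for the operation.
    throw std::runtime_error("type error: cannot add " + lhsType + " and " + rhsType);
}
```

  A real semantic analyzer applies such a check at every operator node of the syntax tree, using types it looked up in the symbol table.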

  21. Semantic Analysis
  • Attributes are extra pieces of information computed by the semantic analyzer. These include the types of variables, constants, operators, etc.
  • An annotated syntax tree is a syntax tree that has been “decorated” with attributes.
  • Inherited attributes come down the syntax tree from parent or sibling nodes.
  • Synthesized attributes come up the syntax tree from child nodes.

  22. Semantic Analysis
  Annotated syntax tree for a[ndx] = x + 3:

    Assign-expr              lhs.type = rhs.type
      Subscript-expr         : integer
        Identifier a         : integer
        Identifier ndx       : integer
      Add-operator           : integer
        Identifier x         : integer
        Number 3             : integer

  23. Semantic Analysis
  • Some optimization may be done during this phase:
  • Source code optimization (e.g., constant folding):

    X := 2 + 4;    can be optimized to    X := 6;

  • Intermediate code optimization:

    Temp := 5; A[index] := Temp    can be optimized to    A[index] := 5;
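  Constant folding as described above can be sketched over a tiny expression tree: whenever both operands of an operator are literals, the compiler evaluates the operation at compile time. The node layout and the helper names lit, bin, and fold are illustrative assumptions.

```cpp
#include <memory>

// A minimal expression node: either a literal value or a binary operation.
struct Expr {
    bool isLiteral;
    int value;                       // valid when isLiteral
    char op;                         // '+' or '*', valid when !isLiteral
    std::shared_ptr<Expr> lhs, rhs;  // children, valid when !isLiteral
};

std::shared_ptr<Expr> lit(int v) {
    return std::make_shared<Expr>(Expr{true, v, 0, nullptr, nullptr});
}
std::shared_ptr<Expr> bin(char op, std::shared_ptr<Expr> l, std::shared_ptr<Expr> r) {
    return std::make_shared<Expr>(Expr{false, 0, op, l, r});
}

// Fold bottom-up: fold the children first, then replace this node with a
// literal if both folded children turned out to be literals.
std::shared_ptr<Expr> fold(std::shared_ptr<Expr> e) {
    if (e->isLiteral) return e;
    auto l = fold(e->lhs);
    auto r = fold(e->rhs);
    if (l->isLiteral && r->isLiteral) {
        int v = (e->op == '+') ? l->value + r->value : l->value * r->value;
        return lit(v);               // 2 + 4 becomes the literal 6
    }
    return bin(e->op, l, r);         // keep the operation if a variable remains
}
```

  Folding the tree for 2 + 4 replaces the whole addition with the literal 6, which is exactly the X := 2 + 4; to X := 6; rewrite shown on the slide.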

  24. Intermediate Code Generation
  • A compiler may produce explicit intermediate code representing the source program.
  • This intermediate code is generally machine (architecture) independent, but its level is close to the level of machine code.
  • Ex: newval := oldval * fact + 1 is first rewritten as id1 := id2 * id3 + 1 and then translated into intermediate code (quadruples):

    MULT id2, id3, temp1
    ADD  temp1, #1, temp2
    MOV  temp2, , id1
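  The quadruples above can be represented directly as records held in an array, as the data-structure slides later suggest. The struct and field names here are an illustrative assumption.

```cpp
#include <string>
#include <vector>

// A quadruple: (operator, arg1, arg2, result).
struct Quad {
    std::string op, arg1, arg2, result;
};

// The intermediate code for: id1 := id2 * id3 + 1
std::vector<Quad> exampleCode() {
    return {
        {"MULT", "id2",   "id3", "temp1"},  // temp1 := id2 * id3
        {"ADD",  "temp1", "#1",  "temp2"},  // temp2 := temp1 + 1
        {"MOV",  "temp2", "",    "id1"},    // id1   := temp2
    };
}
```

  Keeping the code as a flat sequence of quadruples makes the next phase's job easy: an optimizer can rewrite the list in place, and a code generator can walk it instruction by instruction.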

  25. Code Optimizer (for Intermediate Code Generator)
  • The code optimizer optimizes the code produced by the intermediate code generator in terms of time and space.
  • Ex: the three quadruples above can be reduced to two by eliminating the extra temporary:

    MULT id2, id3, temp1
    ADD  temp1, #1, id1

  26. Code Generation
  • Takes intermediate code and produces object code:

    la   $t0, _a
    addi $sp, $sp, 4
    sw   $t0, 0($sp)
    la   $t0, _ndx
    lw   $t0, 0($t0)
    . . .

  27. Compiler Data Structures
  • Tokens
  • Often represented as an enumeration type
  • May include other information, such as:
  • Spelling of identifier
  • Value of constant
  • The scanner needs to generate only one token at a time (single-symbol lookahead)

  28. Compiler Data Structures
  • Parse tree (or syntax tree)
  • A linked structure built by the parser, with information added by the semantic analyzer
  • Each node is a record whose fields contain information about the syntactic construct which the node represents
  • Nodes may be represented in various ways:
  • A C structure
  • A Pascal or Ada variant record
  • A C++ or Java class

  29. Compiler Data Structures
  • Symbol Table
  • Keeps information about identifiers declared in a program.
  • Efficient insertion and lookup are required, so a hash table or tree structure may be used.
  • Several tables may be maintained in a list or stack.
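  The hash-table-per-scope idea above can be sketched as a stack of hash tables, one pushed per scope. This is an illustrative sketch: the class name SymbolTable and the choice of storing a type name as the attribute are assumptions.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// A symbol table as a stack of hash tables: one table per scope, so
// insertion and lookup are average O(1) and scope exit is a single pop.
class SymbolTable {
    std::vector<std::unordered_map<std::string, std::string>> scopes;
public:
    SymbolTable() { enterScope(); }        // start with the global scope

    void enterScope() { scopes.emplace_back(); }
    void exitScope()  { scopes.pop_back(); }

    // Declare an identifier in the current (innermost) scope.
    void insert(const std::string& name, const std::string& type) {
        scopes.back()[name] = type;
    }

    // Look up from the innermost scope outward; an inner declaration
    // shadows an outer one. Empty string means "not declared".
    std::string lookup(const std::string& name) const {
        for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return "";
    }
};
```

  The innermost-first search order is what implements the language's scope rules: a lookup of x inside a nested block finds the nearest enclosing declaration of x.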

  30. Compiler Data Structures
  • Literal Table
  • Stores constants and strings used in a program.
  • Data in the literal table applies globally to a program, so deletions are not necessary.

  31. Compiler Data Structures
  • Intermediate Code
  • Could be kept in an array, temporary file, or linked list.
  • Representations include P-code and 3-address code.

  32. The Structure of a Compiler

    Source code
      → Scanner [Lexical Analyzer]
      → Tokens
      → Parser [Syntax Analyzer]
      → Parse tree
      → Semantic Process [Semantic Analyzer]
      → Abstract Syntax Tree w/ Attributes
      → Code Generator [Intermediate Code Generator]
      → Non-optimized Intermediate Code
      → Code Optimizer
      → Optimized Intermediate Code
      → Code Generator
      → Target machine code

  33. The Structure of a Compiler
  • Compiler writing tools
  • Compiler generators or compiler-compilers
  • E.g., scanner and parser generators
  • Examples: Lex, Yacc
