210 likes | 390 Views
Modules in a Compiler. Read Pages 39-52. Module - a group of code that performs a specific task Compiler can be thought of in two distinct parts Front end – the modules associated with a specific source language Back end – the modules associated with an instruction set.
E N D
Modules in a Compiler Read Pages 39-52
Module - a group of code that performs a specific task • Compiler can be thought of in two distinct parts • Front end – the modules associated with a specific source language • Back end – the modules associated with an instruction set
Modules in a compiler Source Scanner tokens Parser Program (lexical analysis) (syntax analysis) syntax tree Parse Table Code Target (symbol table)Generator Program
Modules in a compiler • Scanner – considers source code as a series of ascii characters • Lexeme – a string of characters that mean something when put together • Token – a symbol that represents a lexeme • Built on the concepts of a RL • Symbol Table – an add on to get outside of CFL
Modules in a compiler • Parser – determines if a group of tokens forms a semantically correct “sentence” / “program” • Built on the concepts of CFL • Code Generator – considers the semantics from the parser and generates semantically equivalent target code • What is a CFL?
What it looks like. • Consider the simple program int main() { int x, j; float a ; x= 7; a=5.2; j= 3; if( j < x) a= 4.6; } http://jhelum.cs.iastate.edu/cgi-bin/comp.cgi Homework: Lab 1
Context Free Grammars • CFG • a set of terminals (tokens) corresponding to characters in Σ • a set of non-terminals • a special non-terminal called the start state • a list of productions Э the LHS of each production is a single non-terminal, then an arrow and a sequence of tokens and/or terminals (RHS)
Derivations from CFGs • A grammar derives strings by beginning with the start symbol and repeatedly replacing non-terminals by the body of the production for the given non-terminal • The strings that can be derived from the start symbol form the language defined by the grammar, the CFL
CFG Examples • Convention is to have Capitals for non-terminals, lowercase terminals S Aa|Bb A Aa|Λ B bB|b • Which of the following are accepted by the grammar • aaa? • bbb? • abba? • CFG example • Find CFG for anbn
Homework- Create a high-level programming language • Read pages 25 -36 and address each aspect in creating your language. • Your language will have the following • basic constructs – case sensitive? Blocking?... • Operators – add, subtract, multiply, divide, modulus, unary, comparison, logical, assignment… • Control structures – if-else, case, for loop, while loop, repeat until loop, • Functions – declaration, call, return values • Variables – integer, real, character, string, pointer, array, void, Boolean • Input / Output • Comments
Converting between RE and CFG • Any RE can be represented by a CFG R.E. CFG • Λ S Λ • a S a • (r1 ) S s1 (s1 is a start state for the grammar r1 ) • r1r2 S s1 s2 • r1+r2 S s1 | s2 • r1* S S s1 | Λ
Example • (aa+b)*a convert to CFG • Give a CFG for a palindrome • Homework page 51 #2.2.1, 2.2.2 a) b) c)
Backus-Naur Form • BNF – a notation for a context free grammar • terminal are tokens • nonterminals are surrounded by <> • productions are ::= - “can be” • pipe(|) used for OR • ex. • <program> ::= <declaration section><body>; • <real> ::= <integer>.<fractional><integer> ::= <digit>|<integer><digit>|ε<fraction> ::= <digit>|<fractional><digit><digit> ::= 0|1|2|3|4|5|6|7|8|b • <expr> ::= <expr><op><expr>|id<op> ::= +|-|*|/
Backus-Naur Form examples • Convert the REs to CFL • b(a+b)*a • (aa+bb+ba+ab)* • Λ+a+(b+aa)(a+b)*
Parse Trees • productions are rules for building strings of tokens • parse tree shows how the strings can be built • a parse tree with respect to a grammar is a tree satisfying the following • the root is the starting node • each leaf is either a terminal or ε • each non-leaf node is labeled with a non-terminal • label of a non-leaf node is the left side of a production and the labels of the children of the node represent the right hand side of the production • parse tree plus the symbol table capture the semantics of the language
Parse Tree Example: x+y*z(using previous production example) <expr> <expr> <op> <expr> <expr> <op> <expr> * id id + id
Example Continued: x+y*z <expr> <expr> <op> <expr> id + <expr> <op> <expr> id * id Homework: Lab 2
Ambiguity • Syntactically Ambiguous – if the same string has more than one parse tree • To fix ambiguity create productions to rule out all but one of the trees
Ambiguity Removed <expr> ::= <term><op1><expr>|<term><term> ::= <factor><op2><term>|<factor><factor> ::= id<op1> ::= +|-<op2> ::= *|/ • Create the tree for x+y*z • Lexical analysis (scanner) – will take x+y*z & give id+id*id then syntax analysis (parser) uses the above to determine if it is a “legal” structure of our language
scanner – built on regular language concepts • takes in lexemes • return tokens • Example list of tokens { - beginToken } - endToken ; - semiToken forToken, idToken, lessthanToken
Homework • create transition graph for each of the literals • stringLiteral, intLiteral, realLiteral, charLiteral, hexLiteral – 0x….. (0-9,A-F), boolLiteral • create a complete set of tokens for the language • create a complete set of reserved words for the language