280 likes | 392 Views
Introduction to Compilers. CMSC 431 Shon Vick 01/28/02. What is a compiler?. Translates source code to target code Source code is typically a high level programming language (Java, C++, etc) but does not have to be
E N D
Introduction to Compilers CMSC 431 Shon Vick 01/28/02
What is a compiler? • Translates source code to target code • Source code is typically a high level programming language (Java, C++, etc) but does not have to be • Target code is often a low level language like assembly or machine code but does not have to be • Can you think of other compilers that you have used – according to this definition?
Other Compilers • Javadoc -> HTML • SQL Query output -> Table • Poscript -> PDF • High level description of a circuit -> machine instructions to fabricate circuit
The analysis Stage • Broken up into four phases • Lexical Analysis (also called scanning or tokenization) • Parsing • Semantic Analysis • Intermediate Code Generation
Lexing Example double d1; double d2; d2 = d1 * 2.0; double TOK_DOUBLE reserved word d1 TOK_ID variable name ; TOK_PUNCT has value of “;” double TOK_DOUBLE reserved word d2 TOK_ID variable name ; TOK_PUNCT has value of “;” d2 TOK_ID variable name = TOK_OPER has value of “=” d1 TOK_ID variable name * TOK_OPER has value of “*” 2.0 TOK_FLOAT_CONST has value of 2.0 ; TOK_PUNCT has value of “;” lexemes
Syntax and Semantics • Syntax - the form or structure of the expressions – whether an expression is well formed • Semantics – the meaning of an expression
Syntactic Structure • Syntax almost always expressed using some variant of a notation called a context-free grammar (CFG) or simply grammar • BNF • EBNF
A CFG has 4 parts • A set of tokens (lexemes), known as terminal symbols • A set of non-terminals • A set of rules (productions) where each production consists of a left-hand side (LHS) and a right-hand side (RHS) The LHS is a non-terminal and the RHS is a sequence of terminals and/or non-terminal symbols. • A special non-terminal symbol designated as the start symbol
An example of BNF syntax for real numbers <r> ::= <ds> . <ds> <ds> ::= <d> | <d> <ds> <d> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7| 8 | 9 < > encloses non-terminal symbols ::= 'is' or 'is made up of ' or 'derives' (sometimes denoted with an arrow ->) | or
Example • On the example from the previous slide: • What are the tokens? • What are the lexemes? • What are the non terminals? • What are the productions?
BNF Points • A non terminal can have more than RHS or an OR can be used • Lists or sequences are expressed via recursion • A derivation is just a repeated set of production (rule) applications • Examples
Example Grammar <program> -> <stmts> <stmts> -> <stmt> | <stmt> ; <stmts> <stmt> -> <var> = <expr> <var> -> a | b | c | d <expr> -> <term> + <term> | <term> - <term> <term> -> <var> | const
Example Derivation <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const
Parse Trees • Alternative representation for a derivation • Example parse tree for the previous example stmts stmt expr var = term term + a var const b
Another Example Expression -> Expression + Expression | Expression - Expression | ... Variable | Constant | ... Variable -> T_IDENTIFIER Constant -> T_INTCONSTANT | T_DOUBLECONSTANT
The Parse a + 2 Expression -> Expression + Expression -> Variable + Expression -> T_IDENTIFIER + Expression -> T_IDENTIFIER + Constant -> T_IDENTIFIER + T_INTCONSTANT
Parse Trees PS -> P | P PS P -> e | '(' PS ')' | '<' PS '>' | '[' PS ']' What’s the parse tree for this statement ? < [ ] [ < > ] >
EBNF - Extended BNF • Like BNF except that • Non-terminals start w/ uppercase • Parens are used for grouping terminals • Braces {} represent zero or more occurrences (iteration ) • Brackets [] represent an optional construct , that is a construct that appears either once or not at all.
EBNF example Exp -> Term { ('+' | '-') Term } Term -> Factor { ('*' | '/') Factor } Factor -> '(' Exp ')' | variable | constant
EBNF/BNF • EBNF and BNF are equivalent • How can {} be expressed in BNF? • How can ( ) be expressed? • How can [ ] be expressed?
Semantic Analysis • The syntactically correct parse tree (or derivation) is checked for semantic errors • Check for constructs that while valid syntax do not obey the semantic rules of the source language. • Examples: • Use of an undeclared/un-initialized variable • Function called with improper arguments • Incompatible operands and type mismatches,
Examples void fun1(int i); double d; d = fun1(2.1); int i; int j; i = i + 2; int arr[2], c; c = arr * 10; Most semantic analysis pertains to the checking of types.
Intermediate Code Generation • Where the intermediate representation of the source program is created. • The representation can have a variety of forms, but a common one is called three-address code (TAC) • Like assembly – the TAC is a sequence of simple instructions, each of which can have at most three operands.
Example _t1 = b * c _t2 = b * d _t3 = _t1 + _t2 a = _t3 a = b * c + b * d Note temps
Another Example _t1 = a > b if _t1 goto L0 _t2 = a - c a = _t2 L0: t3 = b * c c = _t3 if (a <= b) a = a - c; c = b * c; Note Temps Symbolic addresses
Next Time • Finish introduction to compilation stages • Read Aho/Sethi/Ullman Chapter 1
Selected References • Compilers Principles, Techniques and Tools, Aho, Sethi, and Ullman • http://www.stanford.edu/class/cs143/