Compilation: Backus-Naur Form (BNF) and Context Free Grammars (CFGs)
We care about • Completeness of specification • Determination of legal expressions • Resolution of ambiguities • Avoiding small mistakes that cause major errors
BNF/CFG • Backus-Naur Form • Context Free Grammars • Similar mechanisms for specifying legal syntax • Both focused on production rules
Phases of compilation • Lexical analysis (tokenization): grouping the characters into tokens (int, {, x, etc.) with a linear scan • Syntax analysis (parsing): grouping tokens into expressions or statements (int x = 10;) using recursion and a CFG • Semantic analysis (syntax-directed translation): type checking, implicit typecasting, checking array indices, checking that variables are declared before use, etc. Necessary because most programming languages can't be completely captured by a CFG • Code generation (next): actually generating assembler code • Code optimization: looking for ways to make the generated assembler code faster
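The lexical analysis phase can be sketched in a few lines of Python. This is a minimal illustration only: the token classes (NUMBER, IDENT, PUNCT, KEYWORD) and the function name tokenize are assumptions made for this example, not part of any particular compiler, and the rules cover just enough to scan int x = 10;.

```python
import re

# Hypothetical token classes for a tiny C-like fragment; not a full C lexer.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[{}()=;+\-*/]"),
    ("SKIP", r"\s+"),
]
KEYWORDS = {"int"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Linear scan: group characters into (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"  # reserved words look like identifiers to the regex
        tokens.append((kind, text))
    return tokens


print(tokenize("int x = 10;"))
```

The scan never looks back or recurses, which is why this phase is linear; nesting and structure are left to the parser.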
Code generation: using the parse tree to generate assembler code • To prune the leaves of a parse tree means to eliminate all the leaves of a node, replacing the leaves (and their parent) with the intended "meaning" (a value or a chunk of code) • After code generation, the code optimizer runs
Parse trees are central to compiler theory • They allow us to identify the production rule corresponding to the chunk of code, and from there replace that chunk of code with a chunk of assembly.
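Pruning a tree bottom-up can be sketched as a post-order walk. The example below is an illustration under stated assumptions: the tree is a simplified nested tuple rather than a full parse tree, and the target is a hypothetical stack machine with PUSH, ADD and MUL instructions, not a real assembler. The small run function exists only to check that the generated code computes the expected value.

```python
def generate(node):
    """Replace each subtree with the chunk of code that computes it."""
    if isinstance(node, int):
        return [f"PUSH {node}"]          # a leaf becomes a single instruction
    op, left, right = node
    opcode = {"+": "ADD", "*": "MUL"}[op]
    # Code for the children first, then the instruction for their parent.
    return generate(left) + generate(right) + [opcode]


def run(program):
    """Tiny interpreter for the hypothetical stack machine."""
    stack = []
    for instr in program:
        if instr.startswith("PUSH "):
            stack.append(int(instr[5:]))
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if instr == "ADD" else left * right)
    return stack.pop()


# Simplified tree for 2*3+(1+2)*4
tree = ("+", ("*", 2, 3), ("*", ("+", 1, 2), 4))
program = generate(tree)
print(program)
print(run(program))
```

Each recursive call corresponds to one pruning step: once the children of a node have been turned into code, the node itself is replaced by the concatenated chunk.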
Ambiguity • A grammar that produces more than one parse tree (equivalently, more than one leftmost derivation) for the same sentence is ambiguous • E.g.: E -> E+E | E*E | (E) | -E | id • There are two leftmost derivations for id+id*id: find them both.
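One way to check an answer to this exercise is to count parse trees mechanically. The sketch below is written only for the grammar E -> E+E | E*E | (E) | -E | id; the function name count_parse_trees and the token-list input format are assumptions made for this example. It tries every production on every span of the input and memoizes the results.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for E -> E+E | E*E | (E) | -E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        span = tokens[i:j]
        if not span:
            return 0
        total = 0
        if span == ("id",):
            total += 1                                  # E -> id
        if span[0] == "-":
            total += count(i + 1, j)                    # E -> -E
        if len(span) >= 3 and span[0] == "(" and span[-1] == ")":
            total += count(i + 1, j - 1)                # E -> (E)
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)  # E -> E+E | E*E
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A count above 1 for any sentence shows the grammar is ambiguous; a count of 1 for the sentences you happen to try proves nothing about the grammar as a whole.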
Ambiguity continued • There are some languages for which there is no unambiguous grammar • HOWEVER there are some rules of thumb we can use to help us deal with many ambiguous grammars • Ambiguity often arises if the right side of a production rule contains 2 or more occurrences of the same non-terminal • Ambiguity can cause problems if the grammar needs to observe rules of precedence or associativity
Rewriting the grammar so that precedence and associativity are built in • E -> E+T | T • T -> T*F | F • F -> -P | P • P -> (E) | id
Derivations • A leftmost derivation is one in which only the leftmost non-terminal in a sentential form is replaced at each step • A rightmost derivation is one in which only the rightmost non-terminal in a sentential form is replaced at each step • Remember ambiguity? In an unambiguous grammar, every sentence has exactly one leftmost derivation and exactly one rightmost derivation • Generally pick leftmost or rightmost and stick with it • If two derivations of the same sentence generate different parse trees, the grammar is ambiguous (if, not iff: one pair of derivations agreeing on a tree does not show the grammar is unambiguous) • In general, deciding whether a grammar is ambiguous is undecidable
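A leftmost derivation can be replayed step by step. The sketch below assumes the grammar E -> E+T | T, T -> T*F | F, F -> -P | P, P -> (E) | id and derives id+id*id; the function name leftmost_derivation and the list-of-productions input are hypothetical conveniences for this example. At each step it locates the leftmost non-terminal and refuses any production that does not rewrite it.

```python
NONTERMINALS = {"E", "T", "F", "P"}


def leftmost_derivation(start, productions):
    """Apply each (lhs, rhs) production to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for lhs, rhs in productions:
        index = next((i for i, sym in enumerate(form) if sym in NONTERMINALS), None)
        if index is None or form[index] != lhs:
            raise ValueError(f"{lhs} is not the leftmost non-terminal in {' '.join(form)}")
        form[index:index + 1] = rhs      # replace that one symbol by the right side
        steps.append(" ".join(form))
    return steps


productions = [
    ("E", ["E", "+", "T"]), ("E", ["T"]), ("T", ["F"]), ("F", ["P"]), ("P", ["id"]),
    ("T", ["T", "*", "F"]), ("T", ["F"]), ("F", ["P"]), ("P", ["id"]),
    ("F", ["P"]), ("P", ["id"]),
]
for step in leftmost_derivation("E", productions):
    print("=>", step)
```

Changing the search to pick the last non-terminal instead of the first would give a rightmost derivation of the same sentence, with the productions supplied in a different order.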
2*3+(1+2)*4 • E -> E+T | T • T -> T*F | F • F -> -P | P • P ->(E) | id
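The grammar above can be turned into a small recursive-descent evaluator. This is a sketch under two assumptions that are not in the slides: id is taken to be an unsigned integer literal, and the left-recursive rules E -> E+T and T -> T*F are implemented as loops, since a naive recursive-descent function for a left-recursive rule would call itself forever. All function names are hypothetical. Characters outside digits, +, *, -, ( and ) are silently ignored by the simple tokenizer.

```python
import re


def evaluate(text):
    """Evaluate text using E -> E+T | T, T -> T*F | F, F -> -P | P, P -> (E) | id."""
    tokens = re.findall(r"\d+|[-+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_e():              # E -> E+T | T, written as T (+ T)*
        value = parse_t()
        while peek() == "+":
            advance()
            value += parse_t()  # left-to-right loop keeps + left-associative
        return value

    def parse_t():              # T -> T*F | F, written as F (* F)*
        value = parse_f()
        while peek() == "*":
            advance()
            value *= parse_f()
        return value

    def parse_f():              # F -> -P | P
        if peek() == "-":
            advance()
            return -parse_p()
        return parse_p()

    def parse_p():              # P -> (E) | id
        token = peek()
        if token == "(":
            advance()
            value = parse_e()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if token is not None and token.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {token!r}")

    result = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("2*3+(1+2)*4"))  # 18
```

Because T sits below E in the grammar, every multiplication is finished before the enclosing addition sees its operands, which is how the grammar encodes precedence without any explicit priority table.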
Compilation vs interpretation • An interpreted language is a programming language for which most of its implementations execute instructions directly, without previously compiling a program into machine-language instructions • The interpreter executes the program directly, translating each statement into a sequence of one or more subroutines already compiled into machine code