LANGUAGE TRANSLATORS: WEEK 3
LECTURE: Grammar Theory; Introduction to Parsing; Parser-Generators
TUTORIAL: Questions on grammar theory
WEEKLY WORK: Read Chapter 1 of Appel’s book, read the JavaCup manual, and read an introduction to parsing (all online).
GRAMMARS
• Theoretical properties and results about the nature of GRAMMARS have been used for 40 years to influence the design of PARSERS.
• A CONTEXT FREE (BNF) GRAMMAR contains a vocabulary = {terminals, non-terminals} and “production rules” (also called “re-write rules”) of the form:
    non-terminal ::= v1 v2 ... vn
  where the vi are members of the vocabulary.
• One non-terminal is called the SPECIAL SYMBOL (often called the start symbol).
PROPERTIES OF GRAMMARS - 1
A syntax tree (parse tree) of a BNF grammar G is a tree where
- the root is the special symbol of G,
- the leaves are terminals of G,
- the interior nodes are non-terminals of G,
- each interior node is the LHS of some rule P of G; the node’s children are connected to the node by arcs and, read left to right, form the RHS of P,
- the leaves are all connected up to the root via arcs.
PROPERTIES OF GRAMMARS - 2
A string of tokens is a member of the language generated by a BNF grammar G IFF it forms the leaves (in the correct order) of some syntax tree generated by G.
G is AMBIGUOUS IFF at least one of the strings in its language has more than one distinct syntax tree.
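PROPERTIES OF GRAMMARS - 3
A BNF grammar G is LEFT-RECURSIVE if it has at least one production of the form:
    X ::= X w
where X is a non-terminal and w is a string of symbols in G’s vocabulary.
(This is direct left recursion. G is also left-recursive if X can derive a string beginning with X through a chain of several productions.)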
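The direct form of this property is easy to check mechanically. The sketch below is only an illustration: the class name LeftRecursion and the map-based grammar representation are assumptions made for this example, and it looks for direct left recursion only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not from any library): reports every non-terminal X
// that has a production of the form X ::= X w.
class LeftRecursion {
    // The grammar maps each non-terminal to its alternative right-hand sides.
    static Set<String> directlyLeftRecursive(Map<String, List<List<String>>> grammar) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<List<String>>> entry : grammar.entrySet()) {
            for (List<String> rhs : entry.getValue()) {
                // Direct left recursion: the RHS starts with the LHS itself.
                if (!rhs.isEmpty() && rhs.get(0).equals(entry.getKey())) {
                    result.add(entry.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // E ::= E + T | T        T ::= id
        Map<String, List<List<String>>> g = Map.of(
            "E", List.of(List.of("E", "+", "T"), List.of("T")),
            "T", List.of(List.of("id")));
        System.out.println(directlyLeftRecursive(g)); // prints [E]
    }
}
```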
PROPERTIES OF GRAMMARS - 4
• If w is a list of symbols in the vocabulary of BNF grammar G then
    First(w) = set of TERMINAL symbols that may be at the front of ANY string derived from w using G’s productions
PROPERTIES OF GRAMMARS - 5
• If X is a non-terminal of BNF grammar G then
    Follow(X) = set of TERMINAL symbols that can follow X in a derivation from G’s special symbol using G’s productions.
• Nullable(X) is true IFF the empty word can be derived from X.
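Nullable, First and Follow are usually computed together by repeating the same pass over the productions until no set grows. The sketch below shows one way to do that for non-terminals. The class GrammarSets, its Production type and the example grammar are assumptions made for illustration, and the end-of-input marker that many textbooks add to Follow of the special symbol is left out to match the definition above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: fixed-point computation of Nullable, First and Follow.
class GrammarSets {
    // A production lhs ::= rhs; an empty rhs stands for the empty word.
    static final class Production {
        final String lhs;
        final List<String> rhs;
        Production(String lhs, List<String> rhs) { this.lhs = lhs; this.rhs = rhs; }
    }

    final Set<String> nonTerminals = new TreeSet<>();
    final Set<String> nullable = new TreeSet<>();
    final Map<String, Set<String>> first = new TreeMap<>();
    final Map<String, Set<String>> follow = new TreeMap<>();

    GrammarSets(List<Production> productions) {
        for (Production p : productions) nonTerminals.add(p.lhs);
        for (String x : nonTerminals) {
            first.put(x, new TreeSet<>());
            follow.put(x, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) { // repeat until no set grows
            changed = false;
            for (Production p : productions) {
                List<String> rhs = p.rhs;
                if (!nullable.contains(p.lhs) && allNullable(rhs, 0, rhs.size())) {
                    nullable.add(p.lhs);
                    changed = true;
                }
                for (int i = 0; i < rhs.size(); i++) {
                    String v = rhs.get(i);
                    // v can start a string derived from lhs if all symbols before it are nullable.
                    if (allNullable(rhs, 0, i))
                        changed |= first.get(p.lhs).addAll(firstOf(v));
                    if (!nonTerminals.contains(v)) continue;
                    // Whatever can start the rest of the RHS can follow v.
                    for (int j = i + 1; j < rhs.size(); j++)
                        if (allNullable(rhs, i + 1, j))
                            changed |= follow.get(v).addAll(firstOf(rhs.get(j)));
                    // If the rest of the RHS is nullable, whatever follows lhs can follow v.
                    if (allNullable(rhs, i + 1, rhs.size()))
                        changed |= follow.get(v).addAll(new TreeSet<>(follow.get(p.lhs)));
                }
            }
        }
    }

    // True if every symbol in rhs[from, to) is a nullable non-terminal.
    private boolean allNullable(List<String> rhs, int from, int to) {
        for (int k = from; k < to; k++)
            if (!nullable.contains(rhs.get(k))) return false;
        return true;
    }

    // First of a single symbol: the symbol itself for a terminal, a copy of the set so far otherwise.
    private Set<String> firstOf(String v) {
        return nonTerminals.contains(v) ? new TreeSet<>(first.get(v)) : Set.of(v);
    }

    public static void main(String[] args) {
        // S ::= A b      A ::= a A      A ::= (empty)
        GrammarSets g = new GrammarSets(List.of(
            new Production("S", List.of("A", "b")),
            new Production("A", List.of("a", "A")),
            new Production("A", List.<String>of())));
        System.out.println(g.nullable); // prints [A]
        System.out.println(g.first);    // prints {A=[a], S=[a, b]}
        System.out.println(g.follow);   // prints {A=[b], S=[]}
    }
}
```

First(w) for a whole string w can be obtained the same way the loop treats a right-hand side: take First of each symbol in turn for as long as the symbols before it are nullable.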
PARSING
• A PARSER (or ‘PARSING ALGORITHM’) derives SYNTAX/PARSE TREES from a sequence of TOKENS.
• Some well-known tools are little more than PARSERS, e.g. Syntax Directed Editors (those that colour different syntax classes or point out syntax errors as you make them).
• PARSING ALGORITHMS are invariably based on the CONTEXT FREE GRAMMAR that defines the language being parsed.
Example
• Input String:
    b = 5; a = 5 * b; PRINT(b * a)
• Output PARSE TREE (as a Java data structure):
    new CompoundStm(
        new AssignStm("b", new NumExp(5)),
        new CompoundStm(
            new AssignStm("a",
                new OpExp(new NumExp(5), OpExp.Times, new IdExp("b"))),
            new PrintStm(new LastExpList(
                new OpExp(new IdExp("b"), OpExp.Times, new IdExp("a"))))
        )
    );
PARSING ALGORITHMS
Parsers that follow the language’s grammar can be
TOP DOWN:
- starting with the grammar’s special symbol, try to find a derivation of the string of tokens being parsed, consuming one token at a time
BOTTOM UP:
- shift the tokens one at a time onto a stack. Whenever the top ‘n’ symbols of the stack match the RHS of a grammar rule P, replace those symbols with the LHS of P. The parse succeeds when all tokens have been consumed and the stack holds just the special symbol.
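A common hand-written form of the TOP DOWN approach is recursive descent: one method per non-terminal, each choosing a production by looking at the next token. The sketch below is a recogniser only (it builds no tree). The grammar in the comment, the class name TopDownParser and the token spellings are assumptions made for this example.

```java
import java.util.List;

// Hypothetical recursive-descent (top-down) recogniser for the grammar
//   Exp     ::= Term ExpRest
//   ExpRest ::= "+" Term ExpRest | (empty)
//   Term    ::= "num" | "id" | "(" Exp ")"
class TopDownParser {
    private final List<String> tokens;
    private int pos = 0; // index of the next unconsumed token

    TopDownParser(List<String> tokens) { this.tokens = tokens; }

    // Next token, or "$" once the input is exhausted.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "$"; }

    // Consume one token, which must be t.
    private void expect(String t) {
        if (!peek().equals(t))
            throw new IllegalArgumentException("expected " + t + " but found " + peek());
        pos++;
    }

    void exp() { term(); expRest(); }

    void expRest() {
        if (peek().equals("+")) { expect("+"); term(); expRest(); }
        // otherwise use the empty production and consume nothing
    }

    void term() {
        switch (peek()) {
            case "num": expect("num"); break;
            case "id":  expect("id"); break;
            case "(":   expect("("); exp(); expect(")"); break;
            default: throw new IllegalArgumentException("unexpected token " + peek());
        }
    }

    // True if the whole token sequence can be derived from Exp.
    static boolean accepts(List<String> tokens) {
        TopDownParser p = new TopDownParser(tokens);
        try {
            p.exp();
            return p.pos == tokens.size(); // every token must have been consumed
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("id", "+", "(", "num", "+", "id", ")"))); // prints true
        System.out.println(accepts(List.of("id", "+")));                             // prints false
    }
}
```

The grammar uses ExpRest rather than Exp ::= Exp "+" Term because a left-recursive rule would make exp() call itself without consuming a token.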
PRACTICAL EXAMPLE OF THEORY
A grammar G is LL(1) if and only if
(i) G is NOT ambiguous
(ii) G is NOT left recursive
(iii) for EVERY two of G’s productions of the form X ::= W1, X ::= W2, it is the case that First(W1) and First(W2) have no common element
(iv) for every such pair where W1 can derive the empty word, First(W2) and Follow(X) have no common element
LL(1) grammars are nice grammars: they can be translated into a very efficient LL(1) PARSING TABLE.
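To make the idea of an LL(1) PARSING TABLE concrete, here is a minimal table-driven recogniser. The table is written out by hand for a small grammar chosen for this example. The class name LL1TableParser, the grammar and the "$" end-of-input marker are all assumptions of the sketch, not something a particular tool produces.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical table-driven LL(1) recogniser for the grammar
//   E ::= T R      R ::= "+" T R | (empty)      T ::= "id" | "(" E ")"
class LL1TableParser {
    // TABLE.get(X).get(t) is the RHS used to expand non-terminal X on lookahead t.
    // R has entries for ")" and "$" because those tokens can follow R,
    // so the empty production is chosen there.
    static final Map<String, Map<String, List<String>>> TABLE = Map.of(
        "E", Map.of("id", List.of("T", "R"), "(", List.of("T", "R")),
        "R", Map.of("+", List.of("+", "T", "R"),
                    ")", List.<String>of(),
                    "$", List.<String>of()),
        "T", Map.of("id", List.of("id"), "(", List.of("(", "E", ")")));

    static boolean accepts(List<String> input) {
        List<String> tokens = new ArrayList<>(input);
        tokens.add("$");                  // end-of-input marker
        Deque<String> stack = new ArrayDeque<>();
        stack.push("$");
        stack.push("E");                  // start from the special symbol
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String look = tokens.get(pos);
            if (TABLE.containsKey(top)) { // non-terminal: expand using the table
                List<String> rhs = TABLE.get(top).get(look);
                if (rhs == null) return false; // empty table cell means a syntax error
                // Push the RHS in reverse so its first symbol ends up on top.
                for (int i = rhs.size() - 1; i >= 0; i--) stack.push(rhs.get(i));
            } else {                      // terminal: must match the lookahead
                if (!top.equals(look)) return false;
                pos++;
            }
        }
        return pos == tokens.size();
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("id", "+", "(", "id", "+", "id", ")"))); // prints true
        System.out.println(accepts(List.of("id", "+")));                            // prints false
    }
}
```

Each step inspects only the top of the stack and one token of lookahead, which is why the parse runs in time proportional to the length of the input.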
Practical: JavaCup
• JavaCup is a tool that inputs a GRAMMAR in BNF form (+ other stuff..) and outputs a PARSER written in Java; that parser can build a PARSE TREE (using Java constructors).
• Parsers created using JavaCup accept a sequence of TOKENS as input from a SCANNER such as last week’s.
• An easy way to show JavaCup’s use is to implement a simple interpreter with it:
  (i) JavaCup inputs a grammar and creates a corresponding parser P in Java.
  (ii) P accepts tokens from the scanned input, generates the parse tree, and passes it to some Java code which then EVALUATES it.
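Step (ii) can be pictured with a small evaluator that walks a parse tree. The sketch below does not use JavaCup at all: the tree is built by hand with the constructors from the earlier example, and the class definitions are simplified stand-ins written for this illustration (the Interpreter class and its field names are assumptions, not the course’s actual code).

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the parse tree classes named in the example slide.
abstract class Stm {}
abstract class Exp {}

class CompoundStm extends Stm {
    final Stm first, second;
    CompoundStm(Stm first, Stm second) { this.first = first; this.second = second; }
}
class AssignStm extends Stm {
    final String id; final Exp exp;
    AssignStm(String id, Exp exp) { this.id = id; this.exp = exp; }
}
class PrintStm extends Stm {
    final LastExpList exps;
    PrintStm(LastExpList exps) { this.exps = exps; }
}
class LastExpList {
    final Exp head;
    LastExpList(Exp head) { this.head = head; }
}
class NumExp extends Exp {
    final int num;
    NumExp(int num) { this.num = num; }
}
class IdExp extends Exp {
    final String id;
    IdExp(String id) { this.id = id; }
}
class OpExp extends Exp {
    static final int Plus = 1, Minus = 2, Times = 3, Div = 4;
    final Exp left, right; final int oper;
    OpExp(Exp left, int oper, Exp right) { this.left = left; this.oper = oper; this.right = right; }
}

// Hypothetical evaluator: executes statements and collects printed output.
class Interpreter {
    private final Map<String, Integer> env = new HashMap<>(); // variable values
    final StringBuilder out = new StringBuilder();            // everything printed so far

    void exec(Stm s) {
        if (s instanceof CompoundStm) {
            CompoundStm c = (CompoundStm) s;
            exec(c.first);
            exec(c.second);
        } else if (s instanceof AssignStm) {
            AssignStm a = (AssignStm) s;
            env.put(a.id, eval(a.exp));
        } else if (s instanceof PrintStm) {
            out.append(eval(((PrintStm) s).exps.head)).append('\n');
        } else {
            throw new IllegalArgumentException("unknown statement");
        }
    }

    int eval(Exp e) {
        if (e instanceof NumExp) return ((NumExp) e).num;
        if (e instanceof IdExp) {
            Integer v = env.get(((IdExp) e).id);
            if (v == null) throw new IllegalArgumentException("undefined variable " + ((IdExp) e).id);
            return v;
        }
        OpExp o = (OpExp) e;
        int l = eval(o.left), r = eval(o.right);
        switch (o.oper) {
            case OpExp.Plus:  return l + r;
            case OpExp.Minus: return l - r;
            case OpExp.Times: return l * r;
            case OpExp.Div:   return l / r;
            default: throw new IllegalArgumentException("unknown operator " + o.oper);
        }
    }

    public static void main(String[] args) {
        // Parse tree for: b = 5; a = 5 * b; PRINT(b * a)
        Stm prog = new CompoundStm(
            new AssignStm("b", new NumExp(5)),
            new CompoundStm(
                new AssignStm("a", new OpExp(new NumExp(5), OpExp.Times, new IdExp("b"))),
                new PrintStm(new LastExpList(
                    new OpExp(new IdExp("b"), OpExp.Times, new IdExp("a"))))));
        Interpreter in = new Interpreter();
        in.exec(prog);
        System.out.print(in.out); // prints 125
    }
}
```

In a JavaCup-based interpreter, the constructor calls in main would instead be made by the generated parser as it recognises each grammar rule.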
SUMMARY
• Parsers check that strings are legal according to grammatical definitions, and build up a structure representing legal strings (parse trees).
• Parsers can sometimes be auto-generated from the defining grammar.
• Theory of grammars helps us in the auto-generation of parsers.