Languages and Compilers (SProg og Oversættere)

Parsing

Parsing
• Describe the purpose of the parser
• Discuss top-down vs. bottom-up parsing
• Explain the necessary conditions for constructing recursive descent parsers
• Discuss the construction of an RD parser from a grammar

Top-Down vs. Bottom-Up Parsing
[Figure: LR analysis (bottom-up) works by reduction; LL analysis (top-down) works by derivation; both use look-ahead.]

Development of a Recursive Descent Parser
(1) Express the grammar in EBNF.
(2) Apply grammar transformations: left factorization and left recursion elimination.
(3) Create a parser class with
  • a private variable currentToken
  • methods to call the scanner: accept and acceptIt
(4) Implement the parsing methods:
  • a private parseN method for each nonterminal N
  • a public parse method that
    • gets the first token from the scanner
    • calls parseS (S is the start symbol of the grammar)
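As a minimal sketch of steps (2) to (4), assuming a hypothetical expression grammar that is not from these slides: Expr ::= Expr + Term | Expr - Term | Term is left-recursive, so a naive parseExpr would call itself forever. Eliminating the left recursion gives Expr ::= Term ((+ | -) Term)*, and the * becomes a while loop. (Left factorization is the analogous rewrite X Y | X Z into X (Y | Z).) To keep the example self-contained, Term is just a single digit, the "tokens" are characters, and the class name ExprParser is illustrative; it evaluates the expression so the left associativity of the loop is visible.

```java
// Hypothetical grammar (not from the slides), after left-recursion elimination:
//   Expr ::= Term ((+ | -) Term)*
//   Term ::= digit
// Original left-recursive form: Expr ::= Expr + Term | Expr - Term | Term
class ExprParser {
    private final String input;
    private int pos = 0;
    private char currentChar; // plays the role of currentToken; '\0' marks end of input

    ExprParser(String input) {
        this.input = input;
        this.currentChar = input.isEmpty() ? '\0' : input.charAt(0);
    }

    // acceptIt: advance to the next character unconditionally.
    private void acceptIt() {
        pos++;
        currentChar = pos < input.length() ? input.charAt(pos) : '\0';
    }

    // Public entry point: parse the start symbol, then demand end of input.
    int parse() {
        int value = parseExpr();
        if (currentChar != '\0') {
            throw new IllegalArgumentException("unexpected '" + currentChar + "' at position " + pos);
        }
        return value;
    }

    // Expr ::= Term ((+ | -) Term)*
    // The loop folds operands from the left, so 9-3-2 means (9-3)-2.
    private int parseExpr() {
        int value = parseTerm();
        while (currentChar == '+' || currentChar == '-') {
            char op = currentChar;
            acceptIt();
            int rhs = parseTerm();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // Term ::= digit (ASCII 0-9 only)
    private int parseTerm() {
        if (currentChar < '0' || currentChar > '9') {
            throw new IllegalArgumentException("digit expected at position " + pos);
        }
        int digit = currentChar - '0';
        acceptIt();
        return digit;
    }
}
```

For example, new ExprParser("9-3-2").parse() yields 4; a right-recursive rewrite such as Expr ::= Term - Expr would instead group it as 9-(3-2).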

Recursive Descent Parsing
    public class MicroEnglishParser {
        private TerminalSymbol currentTerminal;
        // Auxiliary methods will go here
        ...
        // Parsing methods will go here
        ...
    }

Recursive Descent Parsing: Auxiliary Methods
    public class MicroEnglishParser {
        private TerminalSymbol currentTerminal;

        private void accept(TerminalSymbol expected) {
            if (currentTerminal matches expected)
                currentTerminal = next input terminal;
            else
                report a syntax error
        }
        ...
    }

Recursive Descent Parsing: Parsing Methods
Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        accept('.');
    }

Recursive Descent Parsing: Parsing Methods
Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        if (currentTerminal matches 'I')
            accept('I');
        else if (currentTerminal matches 'a') {
            accept('a');
            parseNoun();
        } else if (currentTerminal matches 'the') {
            accept('the');
            parseNoun();
        } else
            report a syntax error
    }

Recursive Descent Parsing: Parsing Methods
Noun ::= cat | mat | rat
    private void parseNoun() {
        if (currentTerminal matches 'cat')
            accept('cat');
        else if (currentTerminal matches 'mat')
            accept('mat');
        else if (currentTerminal matches 'rat')
            accept('rat');
        else
            report a syntax error
    }
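The pseudocode above can be turned into a runnable sketch. Several things here are assumptions, not taken from these slides: terminals are plain Strings instead of TerminalSymbol, the scanner simply splits on whitespace with "." as its own terminal, "<eot>" marks end of text, and the Verb and Object productions (Verb ::= like | is | see | sees, Object ::= me | a Noun | the Noun) are assumed because the slides do not show them. Each if/else chain is written as a switch on the current terminal, and once a case has already matched the terminal, acceptIt is used instead of accept.

```java
import java.util.Arrays;
import java.util.List;

// Runnable sketch of the Micro-English parser. Terminals are plain Strings.
// Assumed (not on these slides): Verb ::= like | is | see | sees
//                                Object ::= me | a Noun | the Noun
class MicroEnglishParser {
    private final List<String> terminals;
    private int index = 0;
    private String currentTerminal;

    MicroEnglishParser(String text) {
        // Hypothetical scanner: split on whitespace, treating '.' as a separate terminal.
        terminals = Arrays.asList(text.replace(".", " . ").trim().split("\\s+"));
        currentTerminal = terminals.get(0);
    }

    // Accept the expected terminal or report a syntax error.
    private void accept(String expected) {
        if (!currentTerminal.equals(expected)) throw syntaxError("'" + expected + "'");
        acceptIt();
    }

    // Advance unconditionally; "<eot>" stands for end of text.
    private void acceptIt() {
        index++;
        currentTerminal = index < terminals.size() ? terminals.get(index) : "<eot>";
    }

    private IllegalStateException syntaxError(String expected) {
        return new IllegalStateException(
            "syntax error: expected " + expected + " but found '" + currentTerminal + "'");
    }

    // Public entry point: parse the start symbol, then require end of text.
    void parse() {
        parseSentence();
        if (!currentTerminal.equals("<eot>")) throw syntaxError("end of text");
    }

    // Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        accept(".");
    }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        switch (currentTerminal) {
            case "I": acceptIt(); break;
            case "a": case "the": acceptIt(); parseNoun(); break;
            default: throw syntaxError("Subject");
        }
    }

    // Object ::= me | a Noun | the Noun   (assumed)
    private void parseObject() {
        switch (currentTerminal) {
            case "me": acceptIt(); break;
            case "a": case "the": acceptIt(); parseNoun(); break;
            default: throw syntaxError("Object");
        }
    }

    // Verb ::= like | is | see | sees   (assumed)
    private void parseVerb() {
        switch (currentTerminal) {
            case "like": case "is": case "see": case "sees": acceptIt(); break;
            default: throw syntaxError("Verb");
        }
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        switch (currentTerminal) {
            case "cat": case "mat": case "rat": acceptIt(); break;
            default: throw syntaxError("Noun");
        }
    }

    // Convenience wrapper: true if the text is a well-formed sentence.
    static boolean accepts(String text) {
        try {
            new MicroEnglishParser(text).parse();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

With these assumptions, "I like the cat." and "the rat sees me." are accepted, while "I like cat." fails in parseObject and "I like me" fails at accept(".").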

LL(1) Grammars
• The presented algorithm for converting EBNF into a parser does not work for all possible grammars.
• It only works for so-called LL(1) grammars.
• Basically, an LL(1) grammar is a grammar that can be parsed by a top-down parser with a lookahead of one token (in the input stream of tokens).
• Which grammars are LL(1)? How can we recognize whether a grammar is LL(1)?
  • We can deduce the necessary conditions from the parser generation algorithm.
  • We can use a formal definition.

LL(1) Grammars
• Parsing X*:
    while (currentToken.kind is in starters[X]) {
        parse X
    }
  Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X*.
• Parsing X | Y:
    switch (currentToken.kind) {
        cases in starters[X]:
            parse X
            break;
        cases in starters[Y]:
            parse Y
            break;
        default:
            report syntax error
    }
  Condition: starters[X] and starters[Y] must be disjoint sets.
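Both code patterns, and the X* condition, can be seen in one small sketch. It assumes a hypothetical grammar that is not from the slides, Program ::= Command* eot with Command ::= print ident | skip, and a hypothetical token-kind enum. Here starters[Command] = {print, skip} and the only token that can follow Command* is eot, so the while loop knows exactly when to stop; the switch works because starters[print ident] = {print} and starters[skip] = {skip} are disjoint.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;

// Hypothetical grammar (not from the slides):
//   Program ::= Command* eot
//   Command ::= print ident | skip
class CommandListParser {
    enum Kind { PRINT, SKIP, IDENT, EOT }

    // starters[Command] drives both the while loop and the switch.
    static final EnumSet<Kind> STARTERS_COMMAND = EnumSet.of(Kind.PRINT, Kind.SKIP);
    // Tokens that can immediately follow Command* in Program.
    static final EnumSet<Kind> FOLLOW_COMMAND_STAR = EnumSet.of(Kind.EOT);

    private final List<Kind> tokens;
    private int pos = 0;
    private Kind currentKind;
    final List<String> recognised = new ArrayList<>(); // commands seen, for inspection

    CommandListParser(List<Kind> tokens) {
        this.tokens = tokens;
        this.currentKind = tokens.isEmpty() ? Kind.EOT : tokens.get(0);
    }

    // The X* condition: starters[X] disjoint from the tokens that can follow X*.
    static boolean starConditionHolds() {
        return Collections.disjoint(STARTERS_COMMAND, FOLLOW_COMMAND_STAR);
    }

    private void acceptIt() {
        pos++;
        currentKind = pos < tokens.size() ? tokens.get(pos) : Kind.EOT;
    }

    private void accept(Kind expected) {
        if (currentKind != expected) {
            throw new IllegalStateException("syntax error: expected " + expected + ", found " + currentKind);
        }
        acceptIt();
    }

    // Program ::= Command* eot
    void parseProgram() {
        while (STARTERS_COMMAND.contains(currentKind)) { // parse X*
            parseCommand();
        }
        accept(Kind.EOT);
    }

    // Command ::= print ident | skip      (parse X | Y)
    private void parseCommand() {
        switch (currentKind) {
            case PRINT:            // starters[print ident] = { print }
                acceptIt();
                accept(Kind.IDENT);
                recognised.add("print");
                break;
            case SKIP:             // starters[skip] = { skip }
                acceptIt();
                recognised.add("skip");
                break;
            default:
                throw new IllegalStateException("syntax error: command expected, found " + currentKind);
        }
    }
}
```

If the grammar were changed so that print could also follow Command*, the while loop could no longer decide with one token whether to parse another Command, which is exactly the conflict the condition rules out.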

Formal definition of LL(1)
• A grammar G is LL(1) iff, for each set of productions M ::= X1 | X2 | … | Xn:
  1. starters[X1], starters[X2], …, starters[Xn] are all pairwise disjoint;
  2. if Xi ⇒* ε, then starters[Xj] ∩ follow[M] = Ø for 1 ≤ j ≤ n, j ≠ i.
• If G is ε-free, condition 1 is sufficient.
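The two conditions can be checked mechanically once the starters and follow sets are known. The sketch below, which is an illustration rather than any tool's implementation, computes nullable nonterminals, starters (often called FIRST sets) and follow sets by fixed-point iteration, then tests conditions 1 and 2 for every nonterminal M. The grammar encoding is a hypothetical one: a map from each nonterminal to its alternatives, each alternative a list of symbols, with an empty list for ε; any symbol that is not a key is a terminal, and "$" marks end of input. It works on BNF, so an EBNF X* must first be expanded into a helper nonterminal. Some textbooks additionally require that at most one alternative derives ε; only the two conditions stated above are checked here.

```java
import java.util.*;

// Sketch: nullable, starters and follow sets by fixed-point iteration, then the
// two LL(1) conditions. Hypothetical encoding: nonterminal -> alternatives,
// each alternative a list of symbols; an empty list stands for ε.
// Any symbol that is not a key of the map is treated as a terminal.
class LL1Checker {
    private final Map<String, List<List<String>>> grammar;
    private final Set<String> nullable = new HashSet<>();
    final Map<String, Set<String>> starters = new HashMap<>();
    final Map<String, Set<String>> follow = new HashMap<>();

    LL1Checker(Map<String, List<List<String>>> grammar, String start) {
        this.grammar = grammar;
        for (String n : grammar.keySet()) {
            starters.put(n, new HashSet<>());
            follow.put(n, new HashSet<>());
        }
        follow.get(start).add("$"); // end of input follows the start symbol
        boolean changed = true;
        while (changed) { // repeat until no set grows any more
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : grammar.entrySet()) {
                String m = e.getKey();
                for (List<String> alt : e.getValue()) {
                    if (isNullable(alt)) changed |= nullable.add(m);
                    changed |= starters.get(m).addAll(startersOf(alt));
                    for (int i = 0; i < alt.size(); i++) {
                        String b = alt.get(i);
                        if (!grammar.containsKey(b)) continue; // follow is kept for nonterminals only
                        List<String> rest = alt.subList(i + 1, alt.size());
                        changed |= follow.get(b).addAll(startersOf(rest));
                        if (isNullable(rest)) changed |= follow.get(b).addAll(new HashSet<>(follow.get(m)));
                    }
                }
            }
        }
    }

    private boolean isNullable(List<String> symbols) {
        for (String s : symbols) if (!nullable.contains(s)) return false;
        return true;
    }

    // starters of a sequence: scan left to right while the symbols are nullable.
    Set<String> startersOf(List<String> symbols) {
        Set<String> result = new HashSet<>();
        for (String s : symbols) {
            if (!grammar.containsKey(s)) { result.add(s); break; } // terminal
            result.addAll(starters.get(s));
            if (!nullable.contains(s)) break;
        }
        return result;
    }

    boolean isLL1() {
        for (Map.Entry<String, List<List<String>>> e : grammar.entrySet()) {
            List<List<String>> alts = e.getValue();
            Set<String> followM = follow.get(e.getKey());
            for (int i = 0; i < alts.size(); i++) {
                for (int j = 0; j < alts.size(); j++) {
                    if (i == j) continue;
                    Set<String> startersJ = startersOf(alts.get(j));
                    // Condition 1: starters of the alternatives are pairwise disjoint.
                    if (!Collections.disjoint(startersOf(alts.get(i)), startersJ)) return false;
                    // Condition 2: if X_i =>* ε, starters[X_j] ∩ follow[M] = Ø.
                    if (isNullable(alts.get(i)) && !Collections.disjoint(startersJ, followM)) return false;
                }
            }
        }
        return true;
    }
}
```

For instance, S ::= a S | ε passes, the left-recursive E ::= E + n | n fails condition 1 (both alternatives start with n), and S ::= A a with A ::= a | ε fails condition 2 because a is both in starters[a] and in follow[A].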

Converting EBNF into RD parsers
• The conversion of an EBNF specification into a Java implementation of a recursive descent parser is so “mechanical” that it can easily be automated!
• => JavaCC, the “Java Compiler Compiler”

JavaCC and JJTree

LR parsing
• The algorithm makes use of a stack.
• The first item on the stack is the initial state of a DFA.
• A state of the automaton is a set of LR(0)/LR(1) items.
• The initial state is constructed from the productions of the form S ::= •α (with lookahead $ for LR(1) items), where S is the start symbol of the CFG.
• The stack contains, in alternating order:
  • a DFA state
  • a terminal symbol or part (subtree) of the parse tree being constructed
• The items on the stack are related by transitions of the DFA.
• There are two basic actions in the algorithm:
  • shift: get the next input token
  • reduce: build a new node (remove its children from the stack)
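To make shift and reduce concrete, here is a hand-built sketch for a hypothetical toy grammar that is not from the slides: S' ::= E $, E ::= E + n, E ::= n. Its LR(0) automaton has five states, listed in the comments, and the driver loop consults them directly instead of a generated table. The single alternating stack from the slide is split into two parallel stacks (DFA states, and symbols or subtrees), which carries the same information. Because states 2 and 4 reduce without looking at the next token, this is LR(0); generators such as JavaCUP build LALR(1) tables instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hand-built LR(0) automaton for the hypothetical grammar
//   S' ::= E $      E ::= E + n      E ::= n
// States (sets of LR(0) items):
//   0: S' ::= •E $,  E ::= •E + n,  E ::= •n
//   1: S' ::= E •$,  E ::= E •+ n
//   2: E ::= n•
//   3: E ::= E + •n
//   4: E ::= E + n•
class TinyLR0Parser {
    // Returns the parse tree as a bracketed string, e.g. E(E(n) + n).
    static String parse(List<String> input) {
        Deque<Integer> states = new ArrayDeque<>(); // DFA states
        Deque<String> trees = new ArrayDeque<>();   // symbols/subtrees, one per state above state 0
        states.push(0);
        int pos = 0;
        while (true) {
            String token = pos < input.size() ? input.get(pos) : "$";
            int state = states.peek();
            if ((state == 0 || state == 3) && token.equals("n")) {
                trees.push("n");                     // shift n
                states.push(state == 0 ? 2 : 4);
                pos++;
            } else if (state == 1 && token.equals("+")) {
                trees.push("+");                     // shift +
                states.push(3);
                pos++;
            } else if (state == 1 && token.equals("$")) {
                return trees.pop();                  // accept
            } else if (state == 2) {                 // reduce by E ::= n
                String n = trees.pop();
                states.pop();
                pushE(states, trees, "E(" + n + ")");
            } else if (state == 4) {                 // reduce by E ::= E + n
                String n = trees.pop();
                String plus = trees.pop();
                String e = trees.pop();
                states.pop();
                states.pop();
                states.pop();
                pushE(states, trees, "E(" + e + " " + plus + " " + n + ")");
            } else {
                throw new IllegalArgumentException("syntax error at token " + pos + " ('" + token + "')");
            }
        }
    }

    // After a reduction: goto(0, E) = 1 is the only transition on E.
    private static void pushE(Deque<Integer> states, Deque<String> trees, String node) {
        if (states.peek() != 0) throw new IllegalStateException("no goto on E from state " + states.peek());
        trees.push(node);
        states.push(1);
    }
}
```

Parsing the tokens n + n shifts n, reduces to E(n), shifts + and n, reduces to E(E(n) + n), and accepts when the end marker $ is seen in state 1.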

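JavaCUP: An LALR parser generator for Java
[Figure: the definition of tokens (regular expressions) is fed to JFlex, which produces a Java file with the Scanner class that recognizes tokens; the grammar (a BNF-like specification) is fed to JavaCUP, which produces a Java file with the Parser class, which uses the scanner to get tokens and parses the stream of tokens. Together they form the syntactic analyzer.]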

Steps to build a compiler with SableCC
• Create a SableCC specification file.
• Call SableCC.
• Create one or more working classes, possibly inheriting from classes generated by SableCC.
• Create a Main class that activates the lexer, the parser and the working classes.
• Compile with javac.

Hierarchy
