Languages and Compilers (SProg og Oversættere): Parsing
Parsing
• Describe the purpose of the parser
• Discuss top-down vs. bottom-up parsing
• Explain the necessary conditions for constructing recursive descent parsers
• Discuss the construction of an RD (recursive descent) parser from a grammar
Top-Down vs. Bottom-Up Parsing
[Figure: LR analysis (bottom-up) works by reduction, with look-ahead; LL analysis (top-down) works by derivation, with look-ahead.]
Development of a Recursive Descent Parser
(1) Express the grammar in EBNF.
(2) Transform the grammar: left factorization and elimination of left recursion.
(3) Create a parser class with
  • a private variable currentToken
  • methods to call the scanner: accept and acceptIt
(4) Implement the private parsing methods:
  • a private parseN method for each nonterminal N
  • a public parse method that
    • gets the first token from the scanner
    • calls parseS (S is the start symbol of the grammar)
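The steps above can be sketched in Java for a hypothetical toy grammar S ::= x S | y (not from the slides). The class name ToyParser, the whitespace-splitting stand-in for the scanner, plain strings as tokens, "$" as an end-of-input marker, and reporting errors by throwing IllegalStateException are all assumptions made for illustration. Steps (1) and (2) are trivial here because the grammar is already in EBNF and has neither common prefixes nor left recursion.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of a recursive descent parser for the hypothetical grammar
//   S ::= x S | y
// Tokens are plain strings; "$" stands for end of input (an assumption).
public class ToyParser {
    private final List<String> tokens;
    private int position = 0;
    private String currentToken;          // step (3): the private current token

    public ToyParser(String input) {
        // Stand-in "scanner": split the input on whitespace.
        String trimmed = input.trim();
        tokens = trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // Fetch the next token; past the last token, keep returning "$".
    private void nextToken() {
        currentToken = position < tokens.size() ? tokens.get(position++) : "$";
    }

    // accept: check that the current token is the expected one, then advance.
    private void accept(String expected) {
        if (currentToken.equals(expected)) {
            nextToken();
        } else {
            throw new IllegalStateException(
                    "syntax error: expected " + expected + " but found " + currentToken);
        }
    }

    // acceptIt: advance unconditionally (the caller has already checked the token).
    private void acceptIt() {
        nextToken();
    }

    // Step (4): one private parsing method per nonterminal.
    private void parseS() {
        if (currentToken.equals("x")) {   // S ::= x S
            acceptIt();
            parseS();
        } else {                          // S ::= y
            accept("y");
        }
    }

    // Public entry point: get the first token, parse S, then require end of input.
    public void parse() {
        nextToken();
        parseS();
        accept("$");
    }
}
```

For example, new ToyParser("x x y").parse() returns normally, while new ToyParser("x x").parse() throws because the final y is missing. accept is used where the grammar fixes the expected token; acceptIt is used where the parsing method has already inspected currentToken.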
Recursive Descent Parsing
Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees
Define a procedure parseN for each nonterminal N:
private void parseSentence();
private void parseSubject();
private void parseObject();
private void parseNoun();
private void parseVerb();
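To link this grammar to the derivation side of top-down parsing, here is a leftmost derivation of the sentence "I like the cat ." (an example sentence chosen for illustration, not from the slides). A recursive descent parser performs these expansions in the same order in which it calls parseSubject, parseVerb, parseObject and parseNoun.

```latex
\begin{aligned}
\textit{Sentence} &\Rightarrow \textit{Subject}\ \textit{Verb}\ \textit{Object}\ . \\
&\Rightarrow \mathtt{I}\ \textit{Verb}\ \textit{Object}\ . \\
&\Rightarrow \mathtt{I}\ \mathtt{like}\ \textit{Object}\ . \\
&\Rightarrow \mathtt{I}\ \mathtt{like}\ \mathtt{the}\ \textit{Noun}\ . \\
&\Rightarrow \mathtt{I}\ \mathtt{like}\ \mathtt{the}\ \mathtt{cat}\ .
\end{aligned}
```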
Recursive Descent Parsing
public class MicroEnglishParser {
    private TerminalSymbol currentTerminal;
    // Auxiliary methods will go here
    ...
    // Parsing methods will go here
    ...
}
Recursive Descent Parsing: Auxiliary Methods
public class MicroEnglishParser {
    private TerminalSymbol currentTerminal;

    private void accept(TerminalSymbol expected) {
        if (currentTerminal matches expected)
            currentTerminal = next input terminal;
        else
            report a syntax error;
    }
    ...
}
Recursive Descent Parsing: Parsing Methods
Sentence ::= Subject Verb Object .
private void parseSentence() {
    parseSubject();
    parseVerb();
    parseObject();
    accept('.');
}
Recursive Descent Parsing: Parsing Methods
Subject ::= I | a Noun | the Noun
private void parseSubject() {
    if (currentTerminal matches 'I')
        accept('I');
    else if (currentTerminal matches 'a') {
        accept('a');
        parseNoun();
    } else if (currentTerminal matches 'the') {
        accept('the');
        parseNoun();
    } else
        report a syntax error;
}
Recursive Descent Parsing: Parsing Methods
Noun ::= cat | mat | rat
private void parseNoun() {
    if (currentTerminal matches 'cat')
        accept('cat');
    else if (currentTerminal matches 'mat')
        accept('mat');
    else if (currentTerminal matches 'rat')
        accept('rat');
    else
        report a syntax error;
}
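Putting the pieces together, the sketch below is one runnable version of the MicroEnglishParser. Assumptions not on the slides: terminals are plain strings instead of TerminalSymbol, input words are separated by whitespace (so the final "." must be a separate word), "$" marks end of input, syntax errors throw IllegalStateException, and parseObject and parseVerb, which the slides do not show, follow the same recipe. Each method uses a switch on the current terminal instead of the if/else chain above; the two are equivalent here.

```java
import java.util.Arrays;
import java.util.List;

// Runnable sketch of the Micro-English recursive descent parser.
// Assumptions: terminals are Strings, words are whitespace-separated,
// and "$" stands for end of input.
public class MicroEnglishParser {
    private final List<String> words;
    private int position = 0;
    private String currentTerminal;

    public MicroEnglishParser(String input) {
        String trimmed = input.trim();
        words = trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // Auxiliary methods
    private void nextTerminal() {
        currentTerminal = position < words.size() ? words.get(position++) : "$";
    }

    private void accept(String expected) {
        if (currentTerminal.equals(expected)) nextTerminal();
        else syntaxError();
    }

    private void syntaxError() {
        throw new IllegalStateException("syntax error at '" + currentTerminal + "'");
    }

    // Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        accept(".");
    }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        switch (currentTerminal) {
            case "I": accept("I"); break;
            case "a": accept("a"); parseNoun(); break;
            case "the": accept("the"); parseNoun(); break;
            default: syntaxError();
        }
    }

    // Object ::= me | a Noun | the Noun
    private void parseObject() {
        switch (currentTerminal) {
            case "me": accept("me"); break;
            case "a": accept("a"); parseNoun(); break;
            case "the": accept("the"); parseNoun(); break;
            default: syntaxError();
        }
    }

    // Verb ::= like | is | see | sees
    private void parseVerb() {
        switch (currentTerminal) {
            case "like": case "is": case "see": case "sees":
                accept(currentTerminal);
                break;
            default: syntaxError();
        }
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        switch (currentTerminal) {
            case "cat": case "mat": case "rat":
                accept(currentTerminal);
                break;
            default: syntaxError();
        }
    }

    // Public entry point: read the first terminal, parse a Sentence, require end of input.
    public void parse() {
        nextTerminal();
        parseSentence();
        if (!currentTerminal.equals("$")) syntaxError();
    }
}
```

With these assumptions, "I like the cat ." and "the rat sees me ." parse successfully, while "I like ." fails in parseObject because "." is not a starter of Object.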
LL(1) Grammars
• The algorithm presented for converting EBNF into a parser does not work for all grammars.
• It works only for so-called LL(1) grammars.
• Basically, an LL(1) grammar is one that can be parsed by a top-down parser using a lookahead of one token in the input stream.
• Which grammars are LL(1)? How can we recognize whether a grammar is LL(1)?
• We can deduce the necessary conditions from the parser-generation algorithm.
• We can use a formal definition.
LL(1) Grammars
parse X*:
    while (currentToken.kind is in starters[X]) {
        parse X
    }
Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X*.
parse X | Y:
    switch (currentToken.kind) {
        cases in starters[X]: parse X; break;
        cases in starters[Y]: parse Y; break;
        default: report a syntax error;
    }
Condition: starters[X] and starters[Y] must be disjoint sets.
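To make the two translation rules concrete, here is a sketch for a hypothetical grammar List ::= Item ( , Item )* and Item ::= x | y (not from the slides). The loop for ( , Item )* tests starters[, Item] = {,}; it is safe because "," cannot also follow the loop (only end of input, written '$' here, can). The switch for Item relies on starters[x] = {x} and starters[y] = {y} being disjoint. Single-character tokens and the class name ListParser are assumptions for brevity.

```java
// Sketch: translating X* into a while loop and X | Y into a switch.
// Hypothetical grammar:  List ::= Item ( , Item )*     Item ::= x | y
// Each character of the input is one token; '$' marks end of input.
public class ListParser {
    private final String input;
    private int position = 0;
    private char currentToken;

    public ListParser(String input) {
        this.input = input;
    }

    // Advance to the next token (also used once to load the first token).
    private void acceptIt() {
        currentToken = position < input.length() ? input.charAt(position++) : '$';
    }

    private void error() {
        throw new IllegalStateException("syntax error at '" + currentToken + "'");
    }

    // List ::= Item ( , Item )*   -- returns the number of items parsed
    private int parseList() {
        int items = 1;
        parseItem();
        while (currentToken == ',') {   // currentToken is in starters[, Item]
            acceptIt();
            parseItem();
            items++;
        }
        return items;
    }

    // Item ::= x | y
    private void parseItem() {
        switch (currentToken) {
            case 'x':                   // starters[x]
                acceptIt();
                break;
            case 'y':                   // starters[y]
                acceptIt();
                break;
            default:
                error();
        }
    }

    // Returns the number of items; throws on a syntax error.
    public int parse() {
        acceptIt();
        int items = parseList();
        if (currentToken != '$') error();
        return items;
    }
}
```

For instance, new ListParser("x,y,x").parse() returns 3, and "x,,y" fails inside parseItem because "," is in neither starters[x] nor starters[y].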
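Formal Definition of LL(1)
• A grammar G is LL(1) iff, for each set of productions $M ::= X_1 \mid X_2 \mid \dots \mid X_n$:
  1. $\text{starters}[X_1], \text{starters}[X_2], \dots, \text{starters}[X_n]$ are all pairwise disjoint;
  2. if $X_i \Rightarrow^* \varepsilon$, then $\text{starters}[X_j] \cap \text{follow}[M] = \emptyset$ for all $1 \le j \le n$ with $j \ne i$.
• If G is ε-free, condition 1 alone is sufficient.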
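For an ε-free grammar only condition 1 needs checking, and the starters sets can be computed by a simple fixed-point iteration. The sketch below assumes a grammar encoding of our own (a map from each nonterminal to its alternatives, each alternative a non-empty list of symbols, with every symbol that is not a map key treated as a terminal). It is deliberately not the full LL(1) test: it does not handle ε-productions and therefore never consults follow sets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: checking LL(1) condition 1 for an ε-free grammar.
// Encoding (an assumption): nonterminal -> alternatives, each a non-empty
// list of symbols; symbols that are not keys of the map are terminals.
public class LL1Check {

    // starters of one symbol: {t} for a terminal t, the computed set for a nonterminal
    static Set<String> startersOfSymbol(String symbol, Map<String, Set<String>> starters) {
        return starters.containsKey(symbol) ? starters.get(symbol) : Set.of(symbol);
    }

    // In an ε-free grammar, starters[alternative] = starters[its first symbol].
    // Iterate until no set grows; this also terminates for left-recursive rules.
    static Map<String, Set<String>> computeStarters(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> starters = new HashMap<>();
        for (String nonterminal : grammar.keySet()) starters.put(nonterminal, new HashSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : grammar.entrySet()) {
                Set<String> set = starters.get(rule.getKey());
                for (List<String> alternative : rule.getValue()) {
                    // copy first, because the source set may be `set` itself
                    List<String> first = new ArrayList<>(startersOfSymbol(alternative.get(0), starters));
                    if (set.addAll(first)) changed = true;
                }
            }
        }
        return starters;
    }

    // Condition 1: the alternatives of every rule have pairwise disjoint starters.
    static boolean isLL1EpsilonFree(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> starters = computeStarters(grammar);
        for (List<List<String>> alternatives : grammar.values()) {
            Set<String> seen = new HashSet<>();
            for (List<String> alternative : alternatives) {
                for (String terminal : startersOfSymbol(alternative.get(0), starters)) {
                    if (!seen.add(terminal)) return false;  // terminal starts two alternatives
                }
            }
        }
        return true;
    }
}
```

Under this encoding the Micro-English grammar passes, while S ::= a b | a c fails (both alternatives start with a) and the left-recursive E ::= E + n | n fails as well, which is one reason step (2) of the RD recipe eliminates left recursion.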
Converting EBNF into RD Parsers
• The conversion of an EBNF specification into a Java implementation of a recursive descent parser is so mechanical that it can easily be automated.
• This is what JavaCC (the "Java Compiler Compiler") does: it generates a recursive descent parser in Java from a grammar specification.
LR Parsing
• The algorithm uses a stack.
• The first item pushed onto the stack (its bottom) is the initial state of a DFA.
• Each state of the automaton is a set of LR(0) or LR(1) items.
• The initial state is constructed from the productions of the form S ::= •α (LR(0)) or [S ::= •α, $] (LR(1)), where S is the start symbol of the CFG.
• The stack contains, in alternating order:
  • a DFA state
  • a terminal symbol or a part (subtree) of the parse tree being constructed
• Consecutive states on the stack are related by transitions of the DFA.
• There are two basic actions in the algorithm:
  • shift: get the next input token and push it, together with the new state
  • reduce: build a new node, removing its children from the stack
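The shift-reduce loop can be sketched for a tiny hypothetical grammar E ::= E + n | n, whose LR(0) automaton was built by hand and is listed in the comments; a generator such as JavaCUP would compute such tables automatically. The sketch only illustrates the alternating stack and the shift and reduce actions described above; it is not a general LR parser. Note that the left-recursive rule, which an RD parser cannot handle directly, poses no problem here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the LR shift-reduce loop for the hypothetical grammar  E ::= E + n | n
// with a hand-built LR(0) automaton (augmented start production S' ::= E $):
//   state 0: S' ::= .E $    E ::= .E + n    E ::= .n
//   state 1: S' ::= E .$    E ::= E .+ n
//   state 2: E ::= n.
//   state 3: E ::= E + .n
//   state 4: E ::= E + n.
// The stack alternates DFA states (Integer) and tokens/subtrees (String),
// with state 0 at the bottom and the current state on top.
public class ShiftReduceDemo {

    static String parse(List<String> tokens) {
        List<Object> stack = new ArrayList<>();
        stack.add(0);
        int pos = 0;
        while (true) {
            int state = (Integer) stack.get(stack.size() - 1);
            String look = pos < tokens.size() ? tokens.get(pos) : "$";
            switch (state) {
                case 0:
                case 3:
                    if (!look.equals("n")) throw new IllegalStateException("expected n, found " + look);
                    shift(stack, look, state == 0 ? 2 : 4);
                    pos++;
                    break;
                case 1:
                    if (look.equals("$")) return (String) stack.get(1);  // accept: the E subtree
                    if (!look.equals("+")) throw new IllegalStateException("expected + or end, found " + look);
                    shift(stack, look, 3);
                    pos++;
                    break;
                case 2:
                    reduce(stack, 1);   // E ::= n
                    break;
                case 4:
                    reduce(stack, 3);   // E ::= E + n
                    break;
                default:
                    throw new IllegalStateException("unknown state " + state);
            }
        }
    }

    // shift: push the input token, then the DFA state reached on it
    static void shift(List<Object> stack, String token, int nextState) {
        stack.add(token);
        stack.add(nextState);
    }

    // reduce: pop rhsLength (state, subtree) pairs, build an E node from the
    // subtrees, then push the node and the state reached on E from the exposed state
    static void reduce(List<Object> stack, int rhsLength) {
        List<String> children = new ArrayList<>();
        for (int i = 0; i < rhsLength; i++) {
            stack.remove(stack.size() - 1);                            // pop a state
            children.add(0, (String) stack.remove(stack.size() - 1));  // pop a subtree
        }
        int exposed = (Integer) stack.get(stack.size() - 1);
        if (exposed != 0) throw new IllegalStateException("no transition on E from state " + exposed);
        stack.add("E(" + String.join(" ", children) + ")");
        stack.add(1);                                                  // goto(0, E) = 1
    }
}
```

For the input n + n the stack evolves from [0] to [0, n, 2], is reduced to [0, E(n), 1], grows to [0, E(n), 1, +, 3, n, 4], and is reduced again to [0, E(E(n) + n), 1], at which point the end of input triggers acceptance.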
JavaCUP: An LALR Parser Generator for Java
[Figure: Definition of tokens (regular expressions) → JFlex → Java file: Scanner class, which recognizes tokens. Grammar (BNF-like specification) → JavaCUP → Java file: Parser class, which uses the scanner to get tokens and parses the stream of tokens. Together they form the syntactic analyzer.]
Steps to Build a Compiler with SableCC
• Create a SableCC specification file
• Call SableCC
• Create one or more working classes, possibly subclassing classes generated by SableCC
• Create a Main class that activates the lexer, the parser, and the working classes
• Compile with javac