
The Recursive Descent Algorithm


clementn

Presentation Transcript


  1. The Recursive Descent Algorithm A useful predictive parser for many applications. Under Construction (Nov 16)

  2. The Recursive Descent Algorithm • The recursive descent algorithm directly implements a grammar written as EBNF rules. • The rules must not contain left recursion: a left-recursive rule would make its function call itself before consuming any input. • There is one function (method) for each EBNF rule. • Each method parses the input corresponding to its EBNF rule and returns a value. The value may be: • a node of the abstract syntax tree of the input • a value computed by evaluating the input (e.g. a calculator) • Recursive descent is a predictive parser. • Limited look-ahead ("peek" at the next token) can be incorporated.

  3. Recursive-descent intro (0) Grammar:
       expr => expr + term | expr - term | term
       term => term * factor | factor
       factor => '(' expr ')' | number

  4. Recursive-descent intro (0.5) Grammar in EBNF (no "self-recursion"):
       expr => term { (+ | -) term }
       term => factor { * factor }
       factor => '(' expr ')' | number

  5. Recursive-descent intro (1) • Grammar:
       expr => term { + term }
       term => factor { * factor }
       factor => '(' expr ')' | number
     • Generic C code for concept only (don't use this): expr() { term(); while (token == '+') { match('+'); term(); } } term() { factor(); while (token == '*') { match('*'); factor(); } }

  6. Recursive-descent intro (2) • Grammar:
       expr => term { + term }
       term => factor { * factor }
       factor => '(' expr ')' | number
     • Factor and number: factor() { if (token == '(') { match('('); expr(); match(')'); } else number(); } number() { if ( isNumber(token) ) { add_to_parse_tree(); nextToken(); } else error("invalid number"); }

  7. Recursive-descent intro (3) • match(value) is a utility that requires a match: if the current token matches the argument, it consumes the token and gets the next one. • Otherwise it prints an error. ... and then what? void match(char what) { if ( *token == what ) { nextToken( ); } else { /* 'printf' style error function */ error("expected %c got %s", what, token); } }

  8. Where's the token? • In this algorithm, token is a global variable that always contains the next unread token. • nextToken() returns true if there are more tokens, and also sets the token variable. boolean nextToken( ) { token = scanner( ); return ( token != EOF ); } • Another utility function is match(value): 1) if value matches token, get a new token 2) if value doesn't match, raise an error condition.
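The generic algorithm of slides 5 to 8 can be sketched as a small recognizer. This is only an illustration in Java, not the slides' C code: the class name `Recognizer`, the `accepts` helper, the single-character tokens (one digit stands for a number) and the use of an unchecked exception in place of `error(...)` are all assumptions made to keep the sketch short.

```java
// Hypothetical recognizer for the grammar
//   expr => term { + term }   term => factor { * factor }   factor => '(' expr ')' | number
// Assumption: every token is a single character and a digit stands for a number.
final class Recognizer {
    private final String input;
    private int pos = 0;
    private char token;   // always holds the next unread token, '\0' at end of input

    private Recognizer(String input) {
        this.input = input;
        nextToken();      // prime the look-ahead token
    }

    /** Returns true if the whole input is derivable from expr. */
    static boolean accepts(String input) {
        Recognizer r = new Recognizer(input);
        try {
            r.expr();
            return r.token == '\0';          // reject trailing tokens
        } catch (IllegalStateException e) {
            return false;                    // a rule reported an error
        }
    }

    /** Sets token to the next non-blank character; returns false at end of input. */
    private boolean nextToken() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        token = pos < input.length() ? input.charAt(pos++) : '\0';
        return token != '\0';
    }

    /** Requires the current token to equal what, then consumes it. */
    private void match(char what) {
        if (token == what) nextToken();
        else throw new IllegalStateException("expected " + what + " got " + token);
    }

    private void expr()   { term();   while (token == '+') { match('+'); term(); } }

    private void term()   { factor(); while (token == '*') { match('*'); factor(); } }

    private void factor() {
        if (token == '(') { match('('); expr(); match(')'); }
        else number();
    }

    private void number() {
        if (Character.isDigit(token)) nextToken();
        else throw new IllegalStateException("invalid number");
    }
}
```

Under these assumptions, `Recognizer.accepts("2*(3+4)")` holds while `Recognizer.accepts("2+")` does not. Throwing from `match` is one possible answer to the "and then what?" question on slide 7; the later slides on exceptions return to it.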

  9. Where's the output? • In the generic algorithm, the result is a global variable. • The methods must either return a value or accumulate a value as a side effect. • Rules which have terminal values should return the terminal value. • factor => '(' expr ')' | number number() { if ( isNumber(token) ) { // add token to the parse tree // or return a value } else error("invalid number"); }

  10. Recursive Descent Example (1) Let's look at recursive descent code for a calculator. We will modify the generic algorithm so that each function returns a double value.
       input: expr '\n'
       expr: term { (+|-) term }
       term: factor { (*|/) factor }
       factor: '(' expr ')' | number

  11. Recursive Descent Example (1) Example: here is a modified expr( ) function that returns a double value. double expr() { double expr = term(); while ( token == '+' || token == '-' ) { if (token == '+') { match('+'); expr = expr + term(); } else { match('-'); expr = expr - term(); } } return expr; } Grammar Rule: expr: term { (+|-) term }

  12. Recursive Descent Example (2) The rule for factor is more interesting: we must check the first token to decide which alternative to use. double factor() { double fact; if ( token == '(' ) { nextToken( ); fact = expr( ); match( ')' ); return fact; } else { fact = number( ); return fact; } } Grammar Rule: factor: '(' expr ')' | number
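Putting the calculator rules of slides 10 to 12 together gives something like the following. It is a sketch under stated assumptions rather than the slides' own program: the class name `Calculator`, the `evaluate` entry point, the hand-written tokenizer (a number is a run of digits and dots) and the use of `IllegalArgumentException` for errors are all hypothetical choices.

```java
// Hypothetical calculator for
//   expr: term { (+|-) term }   term: factor { (*|/) factor }   factor: '(' expr ')' | number
final class Calculator {
    private final String input;
    private int pos = 0;
    private String token;   // next unread token, or null at end of input

    private Calculator(String input) {
        this.input = input;
        nextToken();
    }

    /** Parses and evaluates a complete expression. */
    static double evaluate(String input) {
        Calculator c = new Calculator(input);
        double value = c.expr();
        if (c.token != null) throw new IllegalArgumentException("unexpected token " + c.token);
        return value;
    }

    private static boolean isNumberChar(char c) { return Character.isDigit(c) || c == '.'; }

    /** Assumed tokenizer: a run of digits/dots is a number, anything else is one character. */
    private void nextToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) { token = null; return; }
        int start = pos;
        if (isNumberChar(input.charAt(pos))) {
            while (pos < input.length() && isNumberChar(input.charAt(pos))) pos++;
        } else {
            pos++;
        }
        token = input.substring(start, pos);
    }

    private boolean tokenIs(String s) { return s.equals(token); }

    private void match(String what) {
        if (!tokenIs(what)) throw new IllegalArgumentException("expected " + what + " got " + token);
        nextToken();
    }

    private double expr() {
        double value = term();
        while (tokenIs("+") || tokenIs("-")) {
            if (tokenIs("+")) { match("+"); value += term(); }
            else { match("-"); value -= term(); }
        }
        return value;
    }

    private double term() {
        double value = factor();
        while (tokenIs("*") || tokenIs("/")) {
            if (tokenIs("*")) { match("*"); value *= factor(); }
            else { match("/"); value /= factor(); }
        }
        return value;
    }

    private double factor() {
        if (tokenIs("(")) { match("("); double v = expr(); match(")"); return v; }
        return number();
    }

    private double number() {
        if (token == null || !isNumberChar(token.charAt(0)))
            throw new IllegalArgumentException("invalid number: " + token);
        double v = Double.parseDouble(token);   // rejects malformed text such as "1.2.3"
        nextToken();
        return v;
    }
}
```

Calling `Calculator.evaluate("2 * 3 + ( 4 - 5 ) / 6")` evaluates the input that the following slides trace by hand: the first term gives 6, the second gives -1/6. Because the loops in `expr` and `term` fold values from left to right, `8-3-2` evaluates as `(8-3)-2`.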

  13. Recursive Descent Example (3) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "2" Call stack: input line: nextToken( ); ans = expr( );

  14. Recursive Descent Example (4) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "2" Code: expr( ) { expr = term( ); while ( token=='+' || token=='-' ) { if ( token=='+' ) { match('+'); expr = expr + term( ); } else { match('-'); expr = expr - term( ); } } return expr; } Call stack: input line: ans = expr( ); -> expr: expr = term( );

  15. Recursive Descent Example (5) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "2" Code: term( ) { term = factor( ); while ( token=='*' || token=='/' ) { if ( token=='*' ) { match('*'); term = term * factor( ); } else { match('/'); term = term / factor( ); } } return term; } Call stack: input line: ans = expr( ); -> expr: expr = term( ); -> term: term = factor( );

  16. Recursive Descent Example (6) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "*" Code: factor( ) { if ( token=='(' ) { match('('); fact = expr( ); match(')'); } else { fact = number( ); } return fact; } Call stack: input line: ans = expr( ); -> expr: expr = term( ); -> term: term = factor( ); -> factor: fact = number( ); /* = 2, token = '*' */ return fact;

  17. Recursive Descent Example (7) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "3" Code: term( ) { term = factor( ); while ( token=='*' || token=='/' ) { if ( token=='*' ) { match('*'); term = term * factor( ); } else { match('/'); term = term / factor( ); } } return term; } Call stack: input line: ans = expr( ); -> expr: expr = term( ); -> term: term = 2; match('*'); term = term * factor( );

  18. Recursive Descent Example (8) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "+" Code: factor( ) { if ( token=='(' ) { match('('); fact = expr( ); match(')'); } else { fact = number( ); } return fact; } Call stack: input line: ans = expr( ); -> expr: expr = term( ); -> term: term = term * factor( ); -> factor: fact = number( ); /* = 3, token = '+' */ return fact;

  19. Recursive Descent Example (9) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "+" Code: term( ) { term = factor( ); while ( token=='*' || token=='/' ) { if ( token=='*' ) { match('*'); term = term * factor( ); } else { match('/'); term = term / factor( ); } } return term; } Call stack: input line: ans = expr( ); -> expr: expr = term( ); -> term: term = term * 3; /* = 6 */ return term;

  20. Recursive Descent Example (10) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "(" Code: expr( ) { expr = term( ); while ( token=='+' || token=='-' ) { if ( token=='+' ) { match('+'); expr = expr + term( ); } else { match('-'); expr = expr - term( ); } } return expr; } Call stack: input line: ans = expr( ); -> expr: expr = 6; /* token = '+' */ match('+'); expr = expr + term( );

  21. Recursive Descent Example (11) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "(" Code: term( ) { term = factor( ); while ( token=='*' || token=='/' ) { if ( token=='*' ) { match('*'); term = term * factor( ); } else { match('/'); term = term / factor( ); } } return term; } Call stack: input line: ans = expr( ); -> expr: expr = expr + term( ); -> term: term = factor( );

  22. Recursive Descent Example (12) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "4" Code: factor( ) { if ( token=='(' ) { match('('); fact = expr( ); match(')'); } else { fact = number( ); } return fact; } Call stack: input line: ans = expr( ); -> expr: expr = expr + term( ); -> term: term = factor( ); -> factor: match('('); fact = expr( );

  23. Recursive Descent Example (13) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "4" Code: expr( ) { expr = term( ); while ( token=='+' || token=='-' ) { if ( token=='+' ) { match('+'); expr = expr + term( ); } else { match('-'); expr = expr - term( ); } } return expr; } Call stack: input line: ans = expr( ); -> expr: expr = expr + term( ); -> term: term = factor( ); -> factor: fact = expr( ); -> expr: expr = term( );

  24. Recursive Descent Example (14) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "-" Call stack: input line: ans = expr( ); -> expr: expr = expr + term( ); -> term: term = factor( ); -> factor: fact = expr( ); -> expr: expr = term( ); -> term: term = factor( ); -> factor: fact = number( ); /* = 4, token = "-" */

  25. Recursive Descent Example (15) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "-", then token = "5" Code: expr( ) { expr = term( ); while ( token=='+' || token=='-' ) { if ( token=='+' ) { match('+'); expr = expr + term( ); } else { match('-'); expr = expr - term( ); } } return expr; } Call stack: input line: ans = expr( ); -> expr: expr = expr + term( ); -> term: term = factor( ); -> factor: fact = expr( ); -> expr: match('-'); expr = expr - term( );

  26. Recursive Descent Example (16) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "5" Call stack: input line: ans = expr( ); -> expr: expr = expr + term( ); -> term: term = factor( ); -> factor: fact = expr( ); -> expr: expr = expr - term( ); -> term: term = factor( ); -> factor: fact = number( ); /* = 5, token = ")" */

  27. Recursive Descent Example (16) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "/" Code: term( ) { term = factor( ); while ( token=='*' || token=='/' ) { if ( token=='*' ) { ... } Call stack: input line: ans = expr( ); -> expr: expr = expr + term( ); -> term: term = factor( ); -> factor: fact = expr( ); match(')'); return fact; -> expr: expr = 4 - 5; return expr; -> term: term = 5; return term; -> factor: return 5;

  28. Recursive Descent Example (17) Input: 2 * 3 + ( 4 - 5 ) / 6 Progress: token = "6" Code: term( ) { term = factor( ); while ( token=='*' || token=='/' ) { if ( token=='*' ) { match('*'); term = term * factor( ); } else { match('/'); term = term / factor( ); } } return term; } Call stack: input line: ans = expr( ); -> expr: expr = expr + term( ); -> term: term = -1; match('/'); term = term / factor( );

  29. Imperative Approach to Parsing • In the generic algorithm, the token is a global variable, and the results of the parse are a side effect (a change to global variables or structures). • bison and flex operate this way, too. • Programs written this way are difficult to understand and maintain. • There is no error recovery in the generic algorithm. /* yylex uses global variables / constants. */ int yylex( ) { ... if ( isdigit(c) ) { ungetc(c, stdin); scanf("%lf", &yylval); return INT; } }

  30. O-O Approach to Parsing • In the O-O approach, methods can return objects, which allows a scanner and parser without global variables. • First, let's look at the overall design. [Class diagram]
       <<interface>> Iterator: hasNext( ), next( )
       <<enum>> TokenType: regex : Pattern; values IDENTIFIER, OPERATOR, NUMBER
       Scanner: instream : InputStream, token : Token; hasNext( ) : boolean, next( ) : Token
       Token: type, value
       Parser: parseTree : TreeSet, token : Token, scanner : Iterator; expression( ) : Node, term( ) : Node, factor( ) : Node, match( String ) : boolean

  31. O-O Scanner • The Scanner should provide two services: test for more tokens and return the next token. • In this view, a Scanner looks like an Iterator<Token>. • A "token" has both a type and a value. /** Token class */ public class Token { Type type; /* consider an enumeration */ public Object value; /* can be anything */ public Token(Type type, Object value) {...} public Object getValue( ) { ... } }
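A scanner that looks like an `Iterator<Token>` might be sketched as follows. The slides give only the outline of `Token` and the names in the class diagram, so everything else here is an assumption: the class name `TokenScanner` (chosen to avoid clashing with `java.util.Scanner`), scanning from a `String` instead of an `InputStream`, and the regular expression that defines the three token types.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

enum TokenType { NUMBER, IDENTIFIER, OPERATOR }

/** A token has both a type and a value. */
final class Token {
    final TokenType type;
    final Object value;   // Double for NUMBER, String otherwise (an assumption)
    Token(TokenType type, Object value) { this.type = type; this.value = value; }
}

/** Hypothetical scanner: offers "are there more tokens?" and "give me the next token". */
final class TokenScanner implements Iterator<Token> {
    // Optional blanks, then a number (group 1), an identifier (group 2),
    // or any other single non-blank character (group 3).
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:(\\d+(?:\\.\\d+)?)|([A-Za-z_]\\w*)|(\\S))");

    private final Matcher matcher;
    private Token lookahead;   // next token to hand out, null when exhausted

    TokenScanner(String input) {
        matcher = TOKEN.matcher(input);
        advance();
    }

    private void advance() {
        if (!matcher.find()) {
            lookahead = null;
        } else if (matcher.group(1) != null) {
            lookahead = new Token(TokenType.NUMBER, Double.valueOf(matcher.group(1)));
        } else if (matcher.group(2) != null) {
            lookahead = new Token(TokenType.IDENTIFIER, matcher.group(2));
        } else {
            lookahead = new Token(TokenType.OPERATOR, matcher.group(3));
        }
    }

    @Override public boolean hasNext() { return lookahead != null; }

    @Override public Token next() {
        if (lookahead == null) throw new NoSuchElementException("no more tokens");
        Token result = lookahead;
        advance();
        return result;
    }
}
```

With this sketch, scanning `x1 = 3.5 * (y + 2)` yields nine tokens, starting with the identifier `x1`, the operator `=` and the number `3.5`. Every non-blank character matches one of the three alternatives, so the scanner never skips input silently; deciding whether a given operator is legal is left to the parser.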

  32. O-O Parser • The Parser implements the parsing algorithm. • Result is either a parse tree or a value (calculator application). • Use an attribute to represent next token. /** Parser class */ public class Parser { Iterator<Token> scanner; private Token token; private TreeNode result; /* parse tree */ TreeNode expression( ) { ... }; TreeNode term( ) { ... }; TreeNode factor( ) { ... }; boolean match( String what ) { ... }; boolean match( Type what ) { ... }; }
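When the result is a parse tree, each rule method returns a node instead of a number. The sketch below shows one way that could look; the node classes `Node`, `Num` and `BinOp`, the class name `TreeParser`, the character-level scanning and the prefix-style `toString` are illustrative assumptions and are not taken from the slides.

```java
// Hypothetical tree-building parser for
//   expr: term { (+|-) term }   term: factor { (*|/) factor }   factor: '(' expr ')' | number
abstract class Node {
    abstract double eval();
}

final class Num extends Node {
    private final String text;
    Num(String text) { this.text = text; }
    @Override double eval() { return Double.parseDouble(text); }
    @Override public String toString() { return text; }
}

final class BinOp extends Node {
    private final char op;
    private final Node left, right;
    BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    @Override double eval() {
        double a = left.eval(), b = right.eval();
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default:  return a / b;   // the parser only creates '/' besides the cases above
        }
    }
    @Override public String toString() { return "(" + op + " " + left + " " + right + ")"; }
}

final class TreeParser {
    private final String input;
    private int pos = 0;

    private TreeParser(String input) { this.input = input; }

    static Node parse(String input) {
        TreeParser p = new TreeParser(input);
        Node tree = p.expression();
        if (p.peek() != '\0') throw new IllegalArgumentException("unexpected '" + p.peek() + "'");
        return tree;
    }

    /** Skips blanks and returns the next character without consuming it, '\0' at end. */
    private char peek() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private boolean match(char what) {
        if (peek() != what) return false;
        pos++;
        return true;
    }

    private Node expression() {
        Node left = term();
        while (true) {
            if (match('+')) left = new BinOp('+', left, term());
            else if (match('-')) left = new BinOp('-', left, term());
            else return left;
        }
    }

    private Node term() {
        Node left = factor();
        while (true) {
            if (match('*')) left = new BinOp('*', left, factor());
            else if (match('/')) left = new BinOp('/', left, factor());
            else return left;
        }
    }

    private Node factor() {
        if (match('(')) {
            Node inner = expression();
            if (!match(')')) throw new IllegalArgumentException("missing right parenthesis");
            return inner;
        }
        peek();   // skip blanks before the digits
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (pos == start) throw new IllegalArgumentException("invalid number");
        return new Num(input.substring(start, pos));
    }
}
```

For the input `2 * 3 + (4 - 5)` this sketch builds a tree whose `toString()` is `(+ (* 2 3) (- 4 5))`. Parentheses leave no node of their own: they only steer which subtree is built, which is the usual difference between an abstract syntax tree and a full parse tree.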

  33. O-O Parser for Calculator • For a calculator, the parser can compute result. • Can use a primitive data type for expression, etc. /** Parser class */ public class Parser { Iterator<Token> scanner; private Token token; private double result; double expression( ) { ... }; double term( ) { ... }; double factor( ) { ... }; boolean match( String what ) {...}; }

  34. Observation: match • In the generic algorithm, the token is almost always tested before calling match. • Eliminate the redundancy by redefining match(value) to return true if the token matches and false otherwise. • On a match, also consume the token. private boolean match( String what ) { if ( ! (token.value instanceof String) ) return false; if ( what.equals( (String)(token.value) ) ) { token = scanner.next( ); return true; } return false; }

  35. O-O Parser for Calculator (2) • Example method: expression • EBNF: expr ::= term { (+ | -) term } private double expression( ) { double result = term( ); while( true ) { if ( match("+") ) result += term( ); else if ( match("-") ) result -= term( ); else break; /* why not error( )? */ } return result; }
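A complete parser in this style could look like the sketch below. To stay short it deviates from the slides in ways that should be read as assumptions: tokens are plain strings from an `Iterator<String>` instead of `Token` objects, the `evaluate` helper splits the input on whitespace instead of using a real scanner, and errors are reported with `IllegalArgumentException`. The class name `CalcParser` is likewise hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical O-O calculator parser built around a boolean match().
final class CalcParser {
    private final Iterator<String> scanner;
    private String token;   // next unread token, null at end of input

    CalcParser(Iterator<String> scanner) {
        this.scanner = scanner;
        advance();
    }

    /** Convenience entry point; assumes tokens are separated by whitespace. */
    static double evaluate(String spaced) {
        CalcParser p = new CalcParser(Arrays.asList(spaced.trim().split("\\s+")).iterator());
        double result = p.expression();
        if (p.token != null) throw new IllegalArgumentException("unexpected token " + p.token);
        return result;
    }

    private void advance() { token = scanner.hasNext() ? scanner.next() : null; }

    /** Returns true and consumes the token if it equals what; otherwise leaves it alone. */
    private boolean match(String what) {
        if (!what.equals(token)) return false;
        advance();
        return true;
    }

    double expression() {
        double result = term();
        while (true) {
            if (match("+")) result += term();
            else if (match("-")) result -= term();
            else break;   // not an error: the next token may be ")" or the end of input
        }
        return result;
    }

    double term() {
        double result = factor();
        while (true) {
            if (match("*")) result *= factor();
            else if (match("/")) result /= factor();
            else break;
        }
        return result;
    }

    double factor() {
        if (match("(")) {
            double inner = expression();
            if (!match(")")) throw new IllegalArgumentException("missing right parenthesis");
            return inner;
        }
        if (token == null) throw new IllegalArgumentException("unexpected end of input");
        double value = Double.parseDouble(token);   // NumberFormatException if not a number
        advance();
        return value;
    }
}
```

The `break` in `expression` suggests one reading of the slide's "why not error( )?" comment: a token that is neither `+` nor `-` merely ends this rule, and only the caller knows whether that token is legal. In this sketch the check happens in `factor` (for a missing `)`) and in `evaluate` (for leftover tokens).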

  36. O-O Parser : Top-Level • What is the top-level routine of the parser? • Look at standard bison code for inspiration: %% /* Bison grammar rules */ input : /* empty input */ | input line ; line : expr '\n' { output( $1 ); } ;

  37. Parsing Errors • How are you going to handle parsing errors? • You might have many levels of function calls... Call stack: input line: result = expr( ); -> expr: expr = term( ) { +|- term( ) }; -> term: term = factor( ) { *|/ factor( ) }; -> factor: factor = '(' expr() ')' | number() ...; -> expr -> term -> factor: parse error found here. • Using recursive descent, parse errors are usually detected at the bottom of the tree: in factor, number, etc.

  38. Parsing Errors • If you set an error flag or return an error result, then all the methods must check for this condition... Call stack: input line: if ( error ) print "parse error"; -> expr: if ( error ) return /* what value? */; -> term: if ( error ) return /* what value? */; -> factor: if ( error ) return /* what value? */; -> expr: if ( error ) return /* what value? */; -> term: if ( error ) return /* what value? */; -> factor: parse error found here. • This error checking will make your methods longer and harder to understand.

  39. Throwing an Exception • Your code will be simpler if the methods simply throw an exception, and let the top-most method catch it. Let someone else handle it! Call stack: input line: try { result = expr( ); } catch (ParseException e) {/*error*/} -> expr: expr( ) throws ParseException { ... } -> term: term( ) throws ParseException { ... } -> factor: factor( ) throws ParseException { ... } -> expr: expr( ) throws ParseException { ... } -> term: term( ) throws ParseException { ... } -> factor: throw new ParseException( )

  40. Using Java's ParseException • Java has a ParseException class you can use: java.text.ParseException • The constructor requires two parameters, the error message and the offset in the input where the error was found: new ParseException("error message", offset); • Example: number( ) { /* parse a number */ whitespace(); token = tokenizer.next(); if ( token.type != TokenType.NUMBER ) throw new ParseException( "invalid number", cptr);
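The exception-based approach can be sketched end to end with `java.text.ParseException`. The class name `CheckedCalc`, the `report` helper, the integer-only numbers and the character-level scanning are assumptions made for brevity; only the exception class, its two-argument constructor, `getMessage()` and `getErrorOffset()` come from the Java library.

```java
import java.text.ParseException;

// Hypothetical calculator whose rule methods throw ParseException instead of
// checking an error flag at every level.
final class CheckedCalc {
    private final String input;
    private int pos = 0;

    private CheckedCalc(String input) { this.input = input; }

    static double evaluate(String input) throws ParseException {
        CheckedCalc c = new CheckedCalc(input);
        double value = c.expr();
        c.skipSpaces();
        if (c.pos < input.length())
            throw new ParseException("unexpected '" + input.charAt(c.pos) + "'", c.pos);
        return value;
    }

    /** Top-most method: the only place where the exception is caught. */
    static String report(String input) {
        try {
            return String.valueOf(evaluate(input));
        } catch (ParseException e) {
            return e.getMessage() + " at offset " + e.getErrorOffset();
        }
    }

    private void skipSpaces() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
    }

    private boolean match(char what) {
        skipSpaces();
        if (pos < input.length() && input.charAt(pos) == what) { pos++; return true; }
        return false;
    }

    private double expr() throws ParseException {
        double result = term();
        while (true) {
            if (match('+')) result += term();
            else if (match('-')) result -= term();
            else return result;
        }
    }

    private double term() throws ParseException {
        double result = factor();
        while (true) {
            if (match('*')) result *= factor();
            else if (match('/')) result /= factor();
            else return result;
        }
    }

    private double factor() throws ParseException {
        if (match('(')) {
            double inner = expr();
            if (!match(')')) throw new ParseException("missing right parenthesis", pos);
            return inner;
        }
        return number();
    }

    private double number() throws ParseException {
        skipSpaces();
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (pos == start) throw new ParseException("invalid number", pos);
        return Double.parseDouble(input.substring(start, pos));
    }
}
```

Here `CheckedCalc.report("(1+2")` returns `missing right parenthesis at offset 4`, and `CheckedCalc.report("2+*3")` returns `invalid number at offset 2`. None of `expr`, `term` or `factor` contains error-handling code of its own; the `throws` clause is the only trace of it.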

  41. Defining your own ParseException • You can define a new Exception type for your own use import java.io.IOException; class ParseException extends IOException { /* constructors */ ParseException() { super("Parse Error"); } ParseException(String msg) { super(msg); } ParseException(String msg, int column) { super(msg + " in column " + column); } }

  42. Using ParseException • You should try to return useful error messages, such as... factor( ) { if ( match('(') ) { result = expr( ); if ( ! match(')') ) throw new ParseException("missing right parenthesis"); } • The getMessage( ) method returns the error message... try { result = expr( ); } catch(ParseException e) { println( e.getMessage() ); } • Including the column number in error messages can be helpful.

  43. Parsing Unary Minus Sign • Parsing negative numbers and unary minus can also be tricky. The following are valid expressions in most languages: sum = sum + -1; sum = sum - -2; sum = sum * -x; • The GNU C compiler (gcc) allows a space after the unary "-" : sum = sum - - 2; • Exponentiation has higher precedence than unary minus, so it should be incorporated in a rule at the bottom of your grammar rules: -2 ^ 3 means - (2^3)
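One way to fit unary minus and exponentiation into the grammar is to add two rules below `term`: `unary => '-' unary | power` and `power => primary [ '^' unary ]`, with `primary => '(' expr ')' | number`. These rule names, the class `UnaryCalc`, the right-associative treatment of `^` and the integer-only numbers are assumptions of this sketch; the slides only state the required precedence.

```java
// Hypothetical calculator with unary minus and exponentiation:
//   expr    => term { (+|-) term }
//   term    => unary { (*|/) unary }
//   unary   => '-' unary | power
//   power   => primary [ '^' unary ]
//   primary => '(' expr ')' | number
final class UnaryCalc {
    private final String input;
    private int pos = 0;

    private UnaryCalc(String input) { this.input = input; }

    static double evaluate(String input) {
        UnaryCalc c = new UnaryCalc(input);
        double value = c.expr();
        c.skipSpaces();
        if (c.pos < input.length())
            throw new IllegalArgumentException("unexpected input at offset " + c.pos);
        return value;
    }

    private void skipSpaces() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
    }

    private boolean match(char what) {
        skipSpaces();
        if (pos < input.length() && input.charAt(pos) == what) { pos++; return true; }
        return false;
    }

    private double expr() {
        double result = term();
        while (true) {
            if (match('+')) result += term();
            else if (match('-')) result -= term();
            else return result;
        }
    }

    private double term() {
        double result = unary();
        while (true) {
            if (match('*')) result *= unary();
            else if (match('/')) result /= unary();
            else return result;
        }
    }

    /** Unary minus sits above power, so it applies to the whole power expression. */
    private double unary() {
        if (match('-')) return -unary();
        return power();
    }

    /** The exponent is parsed with unary(), which allows 2 ^ -1 and makes ^ right-associative. */
    private double power() {
        double base = primary();
        if (match('^')) return Math.pow(base, unary());
        return base;
    }

    private double primary() {
        if (match('(')) {
            double inner = expr();
            if (!match(')')) throw new IllegalArgumentException("missing right parenthesis");
            return inner;
        }
        skipSpaces();
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (pos == start) throw new IllegalArgumentException("invalid number at offset " + pos);
        return Double.parseDouble(input.substring(start, pos));
    }
}
```

With these rules `-2 ^ 3` is parsed as `-(2 ^ 3)`, and `5 - - 2` is accepted with a space after the unary minus, because `match` skips blanks before each token.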

  44. What's Next? Later we will add to the implementation... • symbol table and assignments x = 3.5E7 a = 5 b = 0.1 y = ( a*x + b ) / ( a*x - b ) • built-in functions y = sqrt( x ) • user defined functions function f(x) = a*x + b f(0.5)
