
Programming Language Concepts (CIS 635)


Presentation Transcript


  1. Programming Language Concepts (CIS 635)
     Elsa L Gunter, 4303 GITC, NJIT
     www.cs.njit.edu/~elsa/635

  2. Sample Grammar
       <expr>   ::= <term> | <term> + <expr> | <term> - <expr>
       <term>   ::= <factor> | <factor> * <term> | <factor> / <term>
       <factor> ::= <id> | ( <expr> )
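
As an illustration (the input is chosen here, not taken from the slides), a leftmost derivation of <id> + <id> * <id> shows how this grammar places * inside a <term>, so * groups more tightly than +:

```latex
\begin{align*}
\langle\mathit{expr}\rangle
  &\Rightarrow \langle\mathit{term}\rangle + \langle\mathit{expr}\rangle \\
  &\Rightarrow \langle\mathit{factor}\rangle + \langle\mathit{expr}\rangle \\
  &\Rightarrow \langle\mathit{id}\rangle + \langle\mathit{expr}\rangle \\
  &\Rightarrow \langle\mathit{id}\rangle + \langle\mathit{term}\rangle \\
  &\Rightarrow \langle\mathit{id}\rangle + \langle\mathit{factor}\rangle * \langle\mathit{term}\rangle \\
  &\Rightarrow \langle\mathit{id}\rangle + \langle\mathit{id}\rangle * \langle\mathit{term}\rangle \\
  &\Rightarrow \langle\mathit{id}\rangle + \langle\mathit{id}\rangle * \langle\mathit{factor}\rangle \\
  &\Rightarrow \langle\mathit{id}\rangle + \langle\mathit{id}\rangle * \langle\mathit{id}\rangle
\end{align*}
```

Each step rewrites the leftmost nonterminal using exactly one rule of the grammar.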

  3. Tokens as SML Datatypes
     • The tokens + - * / ( ) <id>
     • become an SML datatype:
         datatype token = Id_token of string
                        | Left_parenthesis | Right_parenthesis
                        | Times_token | Divide_token
                        | Plus_token | Minus_token
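
A rough C counterpart of this datatype, offered only as a sketch: C has no datatypes, so one common encoding pairs an enum tag with a payload field that only the identifier case uses. The names TokenKind, Token, make_id and make_simple are hypothetical, and the 32-byte name buffer is an arbitrary limit chosen here.

```c
#include <string.h>

/* C analogue of the SML datatype `token`: an enum tag plus a payload
   that is meaningful only for identifiers (SML's `Id_token of string`). */
typedef enum {
    ID_TOKEN,
    LEFT_PARENTHESIS,
    RIGHT_PARENTHESIS,
    TIMES_TOKEN,
    DIVIDE_TOKEN,
    PLUS_TOKEN,
    MINUS_TOKEN
} TokenKind;

typedef struct {
    TokenKind kind;
    char id_name[32]; /* meaningful only when kind == ID_TOKEN (size is arbitrary) */
} Token;

/* Counterpart of `Id_token name`; names that are too long are truncated. */
Token make_id(const char *name) {
    Token t;
    t.kind = ID_TOKEN;
    strncpy(t.id_name, name, sizeof t.id_name - 1);
    t.id_name[sizeof t.id_name - 1] = '\0';
    return t;
}

/* Counterpart of the payload-free constructors such as `Plus_token`. */
Token make_simple(TokenKind kind) {
    Token t;
    t.kind = kind;
    t.id_name[0] = '\0';
    return t;
}
```

For example, make_id("x") plays the role of Id_token "x", and make_simple(PLUS_TOKEN) plays the role of Plus_token.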

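  4. Parsing Token Streams
     • A token stream is a pair: the current token (if any) and a function that, when called, returns the next token
     • Each parsing function returns whether it succeeded, together with the remaining token stream
     • We will create three mutually recursive parsing functions:
         expr   : token option * (unit -> token option) -> bool * (token option * (unit -> token option))
         term   : token option * (unit -> token option) -> bool * (token option * (unit -> token option))
         factor : token option * (unit -> token option) -> bool * (token option * (unit -> token option))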
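
Reading the types in slide 4: a token stream is the current token, paired with a function that produces the next one. A minimal C sketch of that pairing, assuming single-character tokens, -1 standing in for NONE, and hypothetical names (CharSource, TokenStream, next_char_token, stream_start, stream_advance):

```c
#include <stddef.h>

/* The underlying character source that the "thunk" reads from. */
typedef struct {
    const char *text; /* source being consumed */
    size_t pos;       /* index of the next unread character */
} CharSource;

/* Pair of the current token (token option) and a function playing the
   role of `unit -> token option`. -1 models NONE (a simplification). */
typedef struct {
    int current;
    int (*next)(CharSource *src);
    CharSource *src;
} TokenStream;

/* The "thunk": returns the next non-blank character, or -1 at end of input.
   Calling it has the side effect of advancing the shared source. */
int next_char_token(CharSource *src) {
    while (src->text[src->pos] == ' ')
        src->pos++;
    if (src->text[src->pos] == '\0')
        return -1;
    return (unsigned char)src->text[src->pos++];
}

/* Start a stream: like `(tokens(), tokens)`, fetch the first token eagerly. */
TokenStream stream_start(CharSource *src) {
    TokenStream s;
    s.next = next_char_token;
    s.src = src;
    s.current = s.next(src);
    return s;
}

/* Consume the current token, like building `(tokens(), tokens)` again. */
TokenStream stream_advance(TokenStream s) {
    s.current = s.next(s.src);
    return s;
}
```

In this sketch the thunk advances shared state, so an older TokenStream value cannot be replayed; it only ever reports the next unread token.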

  5. Parsing an Expression
     <expr> ::= <term> [( + | - ) <expr> ]
       fun expr tokens =
         (case term tokens of
            (true, tokens_after_term) =>
              (case tokens_after_term of
                 (SOME Plus_token, tokens_after_plus) =>

  6. Parsing a Plus Expression
     <expr> ::= <term> + <expr>
       fun expr tokens =
         (case term tokens of
            (true, tokens_after_term) =>
              (case tokens_after_term of
                 (SOME Plus_token, tokens_after_plus) =>

  7-9. Parsing a Plus Expression
     <expr> ::= <term> + <expr>
       (case expr (tokens_after_plus(), tokens_after_plus) of
          (true, tokens_after_expr) => (true, tokens_after_expr)

  10. What If No Expression After Plus
      <expr> ::= <term> + <expr>
        | (false, rem_tokens) => (false, rem_tokens))
      • Code for Minus_token is almost identical

  11. What If No Plus or Minus
      <expr> ::= <term>
        | _ => (true, tokens_after_term))

  12. What If No Term
      <expr> ::= <term> [( + | - ) <expr> ]
        | (false, rem_tokens) => (false, rem_tokens))
      • Code for term is the same as for expr, except that it calls factor where expr calls term, recurses on term where expr recurses on expr, and tests for Times_token and Divide_token in place of Plus_token and Minus_token

  13. Parsing Factor as Id
      <factor> ::= <id>
        and factor (SOME (Id_token id_name), tokens) = (true, (tokens(), tokens))

  14. Parsing Factor as Parenthesized Expression
      <factor> ::= ( <expr> )
        | factor (SOME Left_parenthesis, tokens) =
            (case expr (tokens(), tokens) of
               (true, tokens_after_expr) =>

  15. Parsing Factor as Parenthesized Expression
      <factor> ::= ( <expr> )
        (case tokens_after_expr of
           (SOME Right_parenthesis, tokens_after_rparen) =>
             (true, (tokens_after_rparen(), tokens_after_rparen))

  16. What If No Right Parenthesis
      <factor> ::= ( <expr> )
        | _ => (false, tokens_after_expr))

  17. What If No Expression After Left Parenthesis
      <factor> ::= ( <expr> )
        | (false, rem_tokens) => (false, rem_tokens))

  18. What If No Id or Left Parenthesis
      <factor> ::= <id> | ( <expr> )
        | factor tokens = (false, tokens)
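
Slides 5 through 18 together describe a complete recognizer. The following C sketch keeps the SML shape: each function takes an input position and returns a success flag plus the position after what it consumed, in place of the SML pair of bool and remaining token stream. It is a sketch under assumptions not in the slides: it works directly on characters, treats any run of letters as an <id>, skips blanks, and adds a hypothetical recognize wrapper that also checks that all input was consumed.

```c
#include <ctype.h>
#include <stddef.h>

/* Plays the role of the SML result `bool * (remaining stream)`. */
typedef struct {
    int ok;     /* true / false in the SML code */
    size_t pos; /* the "remaining tokens" */
} Result;

static Result expr(const char *s, size_t pos);

/* Skip blanks so pos points at the start of the next token (or '\0'). */
static size_t skip(const char *s, size_t pos) {
    while (s[pos] == ' ') pos++;
    return pos;
}

/* <factor> ::= <id> | ( <expr> ) */
static Result factor(const char *s, size_t pos) {
    Result r;
    pos = skip(s, pos);
    if (isalpha((unsigned char)s[pos])) {           /* factor (SOME (Id_token _), ...) */
        while (isalpha((unsigned char)s[pos])) pos++;
        r.ok = 1; r.pos = pos;
        return r;
    }
    if (s[pos] == '(') {                             /* factor (SOME Left_parenthesis, ...) */
        r = expr(s, pos + 1);
        if (!r.ok) return r;                         /* (false, rem_tokens) */
        pos = skip(s, r.pos);
        if (s[pos] == ')') { r.ok = 1; r.pos = pos + 1; return r; }
        r.ok = 0; r.pos = pos;                       /* no right parenthesis */
        return r;
    }
    r.ok = 0; r.pos = pos;                           /* factor tokens = (false, tokens) */
    return r;
}

/* <term> ::= <factor> [( * | / ) <term>] */
static Result term(const char *s, size_t pos) {
    Result r = factor(s, pos);
    if (!r.ok) return r;
    pos = skip(s, r.pos);
    if (s[pos] == '*' || s[pos] == '/')
        return term(s, pos + 1);                     /* success or failure passes through */
    return r;                                        /* _ => (true, tokens_after_factor) */
}

/* <expr> ::= <term> [( + | - ) <expr>] */
static Result expr(const char *s, size_t pos) {
    Result r = term(s, pos);
    if (!r.ok) return r;
    pos = skip(s, r.pos);
    if (s[pos] == '+' || s[pos] == '-')
        return expr(s, pos + 1);
    return r;
}

/* Hypothetical wrapper: succeed only if an <expr> spans the whole input. */
int recognize(const char *s) {
    Result r = expr(s, 0);
    return r.ok && s[skip(s, r.pos)] == '\0';
}
```

For example, recognize("(a + b) * c") succeeds, while "a +" and "(a + b" fail in the same places the SML error cases would.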

  19. Parsing in C
      • Assume a global variable nextToken that holds the latest token removed from the token stream
      • Assume a subroutine lex( ) that analyzes the character stream, finds the next token at the head of that stream, and updates nextToken with that token
      • Assume a subroutine error( ) that raises an exception
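
One way to supply the assumed lex( ) and error( ), sketched under assumptions of its own: the character stream is a C string, an identifier is a letter followed by letters or digits, the token code values are arbitrary, and error( ) uses setjmp/longjmp from <setjmp.h> as a stand-in, since C has no exceptions. The driver count_tokens is hypothetical and exists only to exercise lex( ).

```c
#include <ctype.h>
#include <setjmp.h>

/* Token codes; the names follow the later slides, the values are arbitrary. */
enum { ID_CODE, PLUS_CODE, MINUS_CODE, TIMES_CODE, DIVIDE_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, END_CODE, UNKNOWN_CODE };

int nextToken;              /* the token at the head of the stream */
static const char *input;   /* assumption: the character stream is a C string */
static jmp_buf on_error;    /* where error() "raises" to */

/* Scan past one token at the head of the character stream and
   record its code in nextToken. */
void lex(void) {
    while (*input == ' ') input++;
    if (isalpha((unsigned char)*input)) {
        while (isalnum((unsigned char)*input)) input++;
        nextToken = ID_CODE;
        return;
    }
    switch (*input) {
    case '\0': nextToken = END_CODE; return;  /* never advance past the end */
    case '+':  nextToken = PLUS_CODE; break;
    case '-':  nextToken = MINUS_CODE; break;
    case '*':  nextToken = TIMES_CODE; break;
    case '/':  nextToken = DIVIDE_CODE; break;
    case '(':  nextToken = LEFT_PAREN_CODE; break;
    case ')':  nextToken = RIGHT_PAREN_CODE; break;
    default:   nextToken = UNKNOWN_CODE; break;
    }
    input++;
}

/* Jump back to the most recent setjmp(on_error) in the driver. */
void error(void) {
    longjmp(on_error, 1);
}

/* Hypothetical driver: returns the number of tokens, or -1 if error()
   fired (here, on a character that starts no token). */
int count_tokens(const char *text) {
    int n = 0;
    input = text;
    if (setjmp(on_error) != 0)
        return -1;
    for (lex(); nextToken != END_CODE; lex()) {
        if (nextToken == UNKNOWN_CODE)
            error();
        n++;
    }
    return n;
}
```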

  20. Parsing expr in C
      <expr> ::= <term> [( + | - ) <expr> ]
        void expr ( ) {
          term ( );
          if (nextToken == PLUS_CODE) {
            lex ( );
            expr ( );
          } else if (nextToken == MINUS_CODE) {
            lex ( );
            expr ( );
          }
        }

  21. SML Code
        fun expr tokens =
          (case term tokens of
             (true, tokens_after_term) =>
               (case tokens_after_term of
                  (SOME Plus_token, tokens_after_plus) =>
                    (case expr (tokens_after_plus(), tokens_after_plus) of
                       (true, tokens_after_expr) => (true, tokens_after_expr)

  22. Parsing expr in C (optimized)
      <expr> ::= <term> [( + | - ) <expr> ]
        void expr ( ) {
          term ( );
          while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
            lex ( );
            term ( );
          }
        }
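
A small check that the loop in slide 22 accepts the same inputs as the recursion in slide 20: the recursive call to expr is the last thing the function does, so it can be replaced by a loop. To keep the sketch short, a <term> here is a single letter, and the functions pass an index and return -1 on failure instead of using nextToken and error( ); expr_recursive, expr_loop and term_letter are hypothetical names.

```c
#include <ctype.h>

/* Cut-down <term>: a single letter. Returns the index after it, or -1. */
static int term_letter(const char *s, int i) {
    return isalpha((unsigned char)s[i]) ? i + 1 : -1;
}

/* Slide 20 style: <expr> ::= <term> [( + | - ) <expr>] via recursion. */
int expr_recursive(const char *s, int i) {
    i = term_letter(s, i);
    if (i < 0) return -1;
    if (s[i] == '+' || s[i] == '-')
        return expr_recursive(s, i + 1);
    return i;
}

/* Slide 22 style: the tail call becomes a while loop. */
int expr_loop(const char *s, int i) {
    i = term_letter(s, i);
    if (i < 0) return -1;
    while (s[i] == '+' || s[i] == '-') {
        i = term_letter(s, i + 1);
        if (i < 0) return -1;
    }
    return i;
}
```

For recognition the two versions agree on every input; the loop simply avoids growing the call stack with each operator.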

  23. Parsing factor in C
      <factor> ::= <id>
        void factor ( ) {
          if (nextToken == ID_CODE)
            lex ( );

  24. Parsing Factor as Id
      <factor> ::= <id>
        and factor (SOME (Id_token id_name), tokens) = (true, (tokens(), tokens))

  25. Parsing factor in C
      <factor> ::= ( <expr> )
          else if (nextToken == LEFT_PAREN_CODE) {
            lex ( );
            expr ( );
            if (nextToken == RIGHT_PAREN_CODE)
              lex ( );

  26. Comparable SML Code
        | factor (SOME Left_parenthesis, tokens) =
            (case expr (tokens(), tokens) of
               (true, tokens_after_expr) =>
                 (case tokens_after_expr of
                    (SOME Right_parenthesis, tokens_after_rparen) =>
                      (true, (tokens_after_rparen(), tokens_after_rparen))

  27. Parsing factor in C
            else
              error ( );  /* Right parenthesis missing */
          }
          else
            error ( );    /* Neither <id> nor ( was found at start */
        }
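
Assembling slides 19 through 27 gives a complete recognizer in C. This sketch fills in what the slides assume or leave out, so each such detail is an assumption: the token codes and their values, a string-based lex( ), an error( ) built on longjmp, a term( ) written as a loop in the style of slide 22, and a hypothetical driver parse( ) that primes nextToken and checks for end of input.

```c
#include <ctype.h>
#include <setjmp.h>

enum { ID_CODE, PLUS_CODE, MINUS_CODE, TIMES_CODE, DIVIDE_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, END_CODE, UNKNOWN_CODE };

static int nextToken;          /* token at the head of the stream */
static const char *input;      /* remaining characters (assumed to be a C string) */
static jmp_buf on_error;       /* target of error() */

/* Read one token from input into nextToken. */
static void lex(void) {
    while (*input == ' ') input++;
    if (isalpha((unsigned char)*input)) {
        while (isalnum((unsigned char)*input)) input++;
        nextToken = ID_CODE;
        return;
    }
    switch (*input) {
    case '\0': nextToken = END_CODE; return;
    case '+':  nextToken = PLUS_CODE; break;
    case '-':  nextToken = MINUS_CODE; break;
    case '*':  nextToken = TIMES_CODE; break;
    case '/':  nextToken = DIVIDE_CODE; break;
    case '(':  nextToken = LEFT_PAREN_CODE; break;
    case ')':  nextToken = RIGHT_PAREN_CODE; break;
    default:   nextToken = UNKNOWN_CODE; break;
    }
    input++;
}

static void error(void) { longjmp(on_error, 1); }

static void expr(void);

/* <factor> ::= <id> | ( <expr> ) */
static void factor(void) {
    if (nextToken == ID_CODE)
        lex();
    else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE)
            lex();
        else
            error();   /* right parenthesis missing */
    } else
        error();       /* neither <id> nor ( was found at start */
}

/* <term> ::= <factor> [( * | / ) <term>], written as a loop */
static void term(void) {
    factor();
    while (nextToken == TIMES_CODE || nextToken == DIVIDE_CODE) {
        lex();
        factor();
    }
}

/* <expr> ::= <term> [( + | - ) <expr>], as in slide 22 */
static void expr(void) {
    term();
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex();
        term();
    }
}

/* Returns 1 if text is exactly one <expr>, 0 otherwise. */
int parse(const char *text) {
    input = text;
    if (setjmp(on_error) != 0)
        return 0;
    lex();             /* prime nextToken with the first token */
    expr();
    return nextToken == END_CODE;
}
```

The final check against END_CODE is what rejects trailing input such as the b in "a b"; the slides' expr( ) alone would stop after a.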

  28. Error Cases in SML
        (* No right parenthesis *)
        | _ => (false, tokens_after_expr))
        (* No expression found *)
        | (false, rem_tokens) => (false, rem_tokens))
        (* Neither <id> nor left parenthesis found *)
        | factor tokens = (false, tokens)

  29. Lexers: Simple Parsers
      • Lexers are parsers driven by regular grammars
      • They use character codes and arithmetic comparisons, rather than case analysis, to determine the syntactic category of each character
      • Often some semantic action must be taken
      • For example, compute a number, or build a string and record it in a symbol table

  30. Example
      • <pos> ::= <digit> <pos> | <digit>
      • <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
        fun digit c =
          (case Char.ord c of
             n => if n >= Char.ord #"0" andalso n <= Char.ord #"9"
                  then SOME (n - Char.ord #"0")
                  else NONE)

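  31. Example
      • pos returns SOME (p, m), where m is the value of the digits read and p is 10 raised to the number of those digits
        fun pos [] = (NONE, [])
          | pos (chars as ch :: rem_chars) =
              (case digit ch of
                 NONE => (NONE, chars)
               | SOME n =>
                   (case pos rem_chars of
                      (NONE, more_chars) => (SOME (10, n), more_chars)
                    | (SOME (p, m), more_chars) => (SOME (10 * p, (p * n) + m), more_chars)))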
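
A C transliteration of digit and pos, as a sketch: NONE is encoded as -1 for digit and as a found flag for pos, and the input is a C string read from index i. PosResult and these encodings are choices made here, not part of the slides. As in the SML, place holds p (10 raised to the number of digits read) and value holds m, so each step computes 10 * p and (p * n) + m.

```c
#include <stddef.h>

/* digit: arithmetic on character codes rather than a ten-way case split. */
int digit(char c) {
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;                      /* NONE */
}

typedef struct {
    int found;      /* SOME (p, m) versus NONE */
    long place;     /* p: 10 raised to the number of digits read */
    long value;     /* m: the number those digits spell (no overflow check) */
    size_t rest;    /* index of the first unconsumed character */
} PosResult;

/* <pos> ::= <digit> <pos> | <digit>, recursing on the tail like the SML pos. */
PosResult pos(const char *s, size_t i) {
    PosResult r = { 0, 0, 0, i };
    int n = digit(s[i]);            /* '\0' is not a digit, so end of input is NONE */
    if (n < 0)
        return r;                   /* (NONE, chars) */
    PosResult tail = pos(s, i + 1);
    r.found = 1;
    r.rest = tail.rest;
    if (!tail.found) {
        r.place = 10;               /* SOME (10, n) */
        r.value = n;
    } else {
        r.place = 10 * tail.place;              /* 10 * p */
        r.value = tail.place * n + tail.value;  /* (p * n) + m */
    }
    return r;
}
```

For example, pos("2024+x", 0) reads four digits and stops at the +, giving value 2024, place 10000 and rest 4.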

  32. Problems for Recursive-Descent Parsing
      • Left recursion: A ::= A w translates to a subroutine that calls itself before consuming any input, so it loops forever
      • Indirect left recursion: A ::= B w, B ::= A v causes the same problem

  33. Problems for Recursive-Descent Parsing
      • The parser must always be able to choose the next action based only on the very next token
      • Pairwise disjointedness test: can we always determine which rule (in the non-extended BNF) to choose based on just the first token?

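  34. Pairwise Disjointedness Test
      • For each rule A ::= y, calculate
          $\mathrm{FIRST}(y) = \{\, a \mid y \Rightarrow^{*} a w \,\} \cup \{\, \varepsilon \mid y \Rightarrow^{*} \varepsilon \,\}$
      • For each pair of rules A ::= y and A ::= z, require $\mathrm{FIRST}(y) \cap \mathrm{FIRST}(z) = \emptyset$
      • The test is too strong: it can't handle <expr> ::= <term> [ ( + | - ) <expr> ], because in non-extended BNF all three <expr> rules begin with <term> and so have the same FIRST set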

  35. Example
      Grammar:
        <S> ::= <A> a <B> b
        <A> ::= <A> b | b
        <B> ::= a <B> | a
      FIRST(<A> b) = {b}, FIRST(b) = {b}
      • Rules for <A> are not pairwise disjoint
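
The test can be mechanized. This C sketch computes FIRST sets for the example grammar by iterating to a fixed point, then checks each nonterminal's rules pairwise. Its encoding is an assumption: uppercase letters are nonterminals, lowercase letters are terminals, and FIRST sets are bitmasks. Because no rule in this grammar derives the empty string, the FIRST set of a right-hand side is just that of its first symbol, so the sketch does not handle ε. The function names are hypothetical.

```c
#include <string.h>

typedef struct { char lhs; const char *rhs; } Rule;

/* The example grammar: <S> ::= <A> a <B> b, <A> ::= <A> b | b, <B> ::= a <B> | a */
static const Rule rules[] = {
    { 'S', "AaBb" },
    { 'A', "Ab" }, { 'A', "b" },
    { 'B', "aB" }, { 'B', "a" },
};
enum { NRULES = sizeof rules / sizeof rules[0] };

static unsigned first_nt[26];   /* FIRST set of each nonterminal 'A'..'Z' */

/* FIRST of a right-hand side (valid only because there are no epsilon rules). */
unsigned first_of_rhs(const char *rhs) {
    char c = rhs[0];
    if (c >= 'a' && c <= 'z')
        return 1u << (c - 'a');
    return first_nt[c - 'A'];
}

/* Grow every nonterminal's FIRST set until nothing changes. */
void compute_first(void) {
    int changed = 1;
    memset(first_nt, 0, sizeof first_nt);
    while (changed) {
        changed = 0;
        for (int r = 0; r < NRULES; r++) {
            unsigned before = first_nt[rules[r].lhs - 'A'];
            unsigned after = before | first_of_rhs(rules[r].rhs);
            if (after != before) {
                first_nt[rules[r].lhs - 'A'] = after;
                changed = 1;
            }
        }
    }
}

/* Pairwise disjointness test for one nonterminal: 1 if every pair of its
   rules has disjoint FIRST sets, 0 otherwise. */
int pairwise_disjoint(char nt) {
    for (int i = 0; i < NRULES; i++)
        for (int j = i + 1; j < NRULES; j++)
            if (rules[i].lhs == nt && rules[j].lhs == nt &&
                (first_of_rhs(rules[i].rhs) & first_of_rhs(rules[j].rhs)) != 0)
                return 0;
    return 1;
}
```

After compute_first( ), the sketch reports that the rules for <A> fail the test, matching the slide; the rules for <B> fail as well, since both a <B> and a begin with a.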
