CS 363 Comparative Programming Languages


  1. CS 363 Comparative Programming Languages Lecture 3: Syntax & Notation

  2. Topics • The General Problem of Describing Syntax • Formal Methods of Describing Syntax • Context Free Grammars, BNF • Parse Trees CS 363 Spring 2005 GMU

  3. Introduction • Who must use a language definition: • Language designers • Implementors • Programmers (the users of the language) • Syntax - the form or structure of the expressions, statements, and program units (our focus today) • Semantics - the meaning of the expressions, statements, and program units

  4. What is a language? • Alphabet (Σ) – finite set of basic syntactic elements (characters, tokens) • The Σ of C++ includes {while, for, +, identifiers, integers, …} • Sentence – finite sequence of elements in Σ – can be λ, the empty string (some texts use ε for the empty string) • A legal C++ program is a single sentence in that language • Language – possibly infinite set of sentences over some alphabet – can be { }, the empty language • The set of all legal C++ programs defines the language

  5. Suppose Σ = {a,b,c}. Some languages over Σ could be: • {aa,ab,ac,bb,bc,cc} • {ab,abc,abcc,abccc, . . .} • { λ }, where λ (ε) is the empty string (length = 0) • { } • {a,b,c,λ}
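The languages listed above can be modelled directly in code. The following is a minimal sketch, assuming strings of single-character symbols; the names `SIGMA`, `in_ab_c_star`, and `over_alphabet` are illustrative and not from the lecture. A finite language is a plain set, while an infinite one has to be given as a membership test.

```python
import re

SIGMA = {"a", "b", "c"}  # the alphabet Σ from the slide

# A finite language is simply a set of sentences.
finite_language = {"aa", "ab", "ac", "bb", "bc", "cc"}

def in_ab_c_star(sentence: str) -> bool:
    """Membership test for the infinite language {ab, abc, abcc, ...}."""
    return re.fullmatch(r"abc*", sentence) is not None

def over_alphabet(sentence: str, alphabet=SIGMA) -> bool:
    """True if every symbol of the sentence comes from the alphabet."""
    return all(symbol in alphabet for symbol in sentence)

empty_string_language = {""}   # { λ }: exactly one sentence, of length 0
empty_language = set()         # { }: no sentences at all

print(in_ab_c_star("abccc"), in_ab_c_star("ac"))
```

Note that `{ λ }` and `{ }` are different languages: the first contains one sentence, the second contains none.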

  6. Recognizing Languages • Typically the task of a compiler • Find tokens (elements of Σ) in the input • Check whether the tokens appear in an appropriate order • Determine what that token ordering means • All of this must be formally specified

  7. A Typical Compiler Architecture [Figure: block diagram. Components: Source language, Scanner (lexical analysis), Parser (syntax analysis), Semantic Analysis (IC generator), Code Optimizer, Code Generator, Symbol Table. Edge labels: tokens, syntactic structure, syntactic/semantic structure.]

  8. Token • lexeme – indivisible string in an input language, e.g. while, (, main, … • token – (possibly infinite) set of lexemes defining an atomic element with a defined meaning: while_token = {"while"}, identifier_token = {"main", "x", … } • Tokens are often describable using a pattern • The language of tokens is regular

  9. Lexical Analysis • Break input string of characters into tokens • Input: while (a < limit) { a=a + 1; } • Tokens: while ( a < limit ) { a = a + 1 ; } • Remove white space, comments
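The scanning step on this slide can be sketched with regular expressions, which fits the earlier remark that the language of tokens is regular. This is a minimal sketch, not a real C++ scanner: the token names and patterns in `TOKEN_SPEC` are assumptions chosen to cover only the slide's example, and comments are not handled.

```python
import re

# Hypothetical token patterns, just enough for the slide's example.
TOKEN_SPEC = [
    ("WHILE",   r"while\b"),        # keyword, checked before identifiers
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("INTEGER", r"\d+"),
    ("OP",      r"[<=+]"),
    ("PUNCT",   r"[(){};]"),
    ("SKIP",    r"\s+"),            # white space is recognized, then dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs, dropping white space."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for token, lexeme in tokenize("while (a < limit) { a=a + 1; }"):
    print(token, lexeme)
```

Each pair holds the token class and the lexeme that matched it, so `a` and `limit` are two different lexemes of the same identifier token.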

  10. Describing Language Syntax • Enumeration – what are all the possible legal token orderings? • Formal approaches to describing syntax: • Recognizers – used in compilers – "Is the given sentence in the language?" • Generators – generate the sentences of a language

  11. Metalanguages for Describing Syntax • A metalanguage is a language used to describe another language • Abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols) • The metalanguages below define a class of languages called context-free languages • Context-Free Grammars (Noam Chomsky, mid-1950s) • Backus-Naur Form or BNF (invented in 1959 by John Backus to describe Algol 58)

  12. Backus-Naur Form (BNF) <while_stmt> -> while (<logic_expr>) <stmt> • This is a rule describing the structure of a while statement • Non-terminals are placeholders for other rules: <while_stmt>, <logic_expr>, <stmt> • Tokens (terminal symbols) are part of the language alphabet

  13. BNF Examples • Vt = {+,-,0..9}, Vn = {<L>,<D>}, s = {<L>} <L> -> <L> + <D> | <L> - <D> | <D> <D> -> 0 | … | 9 • Vt = {(,)}, Vn = {<L>}, s = {<L>} <L> -> ( <L> ) <L> <L> -> λ (note the recursion: <L> appears on both sides of a rule)

  14. BNF Examples • Vt = {a,b,c,d,;,=,+,-,const}, Vn = {<program>, <stmts>, <stmt>, <var>, <expr>, <term>} <program> -> <stmts> <stmts> -> <stmt> | <stmt> ; <stmts> <stmt> -> <var> = <expr> <var> -> a | b | c | d <expr> -> <term> + <term> | <term> - <term> <term> -> <var> | const

  15. Applying BNF rules • Definition: Given a string α A β and a production A -> γ, we can replace A with γ: α A β => α γ β is a single-step derivation. (α, β, and γ are strings of zero or more terminals/non-terminals) Examples: • <L> + <D> => <L> - <D> + <D> using <L> -> <L> - <D> • ( <L> ) ( <L> ) => ( ( <L> ) <L> ) ( <L> ) using <L> -> ( <L> ) <L>
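A single derivation step is just a list splice. The sketch below assumes a sentential form is represented as a Python list of symbols and a grammar as a dict from non-terminal to its right-hand sides; `GRAMMAR` and `derive_step` are illustrative names, not part of the lecture.

```python
# The first grammar from the BNF Examples slide, as data.
GRAMMAR = {
    "<L>": [["<L>", "+", "<D>"], ["<L>", "-", "<D>"], ["<D>"]],
    "<D>": [[str(d)] for d in range(10)],
}

def derive_step(form, index, rhs, grammar=GRAMMAR):
    """Replace the non-terminal at form[index] with rhs (one derivation step)."""
    nonterminal = form[index]
    if rhs not in grammar.get(nonterminal, []):
        raise ValueError(f"no production {nonterminal} -> {' '.join(rhs)}")
    # alpha A beta  =>  alpha gamma beta
    return form[:index] + rhs + form[index + 1:]

# The slide's first example: <L> + <D>  =>  <L> - <D> + <D>
step = derive_step(["<L>", "+", "<D>"], 0, ["<L>", "-", "<D>"])
print(" ".join(step))
```

Here `form[:index]` plays the role of α, `form[index + 1:]` the role of β, and `rhs` the role of γ.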

  16. Derivations Definition: A sequence of rule applications w0 => w1 => … => wn is a derivation of wn from w0 (written w0 =>* wn). Example: • <L> (apply production <L> -> ( <L> ) <L>) • => ( <L> ) <L> (apply production <L> -> λ) • => ( ) <L> (apply production <L> -> λ) • => ( ) • Hence <L> =>* ( ) • Each wi, which may contain non-terminal symbols, is referred to as a sentential form.

  17. Derivation • A sentence is a sentential form that has only terminal symbols • A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded • A rightmost derivation expands the rightmost nonterminal instead • A derivation may be neither leftmost nor rightmost

  18. Derivation of (())() Grammar: <L> -> (<L>)<L> <L> -> λ <L> =>* (( )) ( ) [Figure: derivation steps not preserved in the transcript]

  19. Same String, Leftmost Derivation Grammar: <L> -> (<L>)<L> <L> -> λ <L> =>* (( )) ( ) [Figure: derivation steps not preserved in the transcript]
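A leftmost derivation of (())() under this grammar can be regenerated mechanically. The following is a minimal sketch, assuming the non-terminal is written as the single character `L` rather than `<L>`; the function name `leftmost_derivation` is illustrative. It picks the production by looking at the target character that the leftmost `L` has to produce.

```python
def leftmost_derivation(target):
    """Yield the sentential forms of a leftmost derivation of target from L.

    Grammar: L -> ( L ) L | λ. If target is not in the language, the last
    form yielded differs from target, so callers should compare the two.
    """
    form = ["L"]
    yield "".join(form)
    while "L" in form:
        i = form.index("L")                 # leftmost non-terminal
        # Every symbol before position i is a terminal, so i is also the
        # position in target that this L must start matching.
        if i < len(target) and target[i] == "(":
            rhs = ["(", "L", ")", "L"]      # L -> ( L ) L
        else:
            rhs = []                        # L -> λ
        form = form[:i] + rhs + form[i + 1:]
        yield "".join(form) or "λ"

for sentential_form in leftmost_derivation("(())()"):
    print("=>", sentential_form)
```

Every line printed except the last still contains an `L`, so only the final form is a sentence.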

  20. Same String, Rightmost Derivation Grammar: <L> -> (<L>)<L> <L> -> λ <L> =>* (( )) ( ) [Figure: derivation steps not preserved in the transcript]

  21. L(G), the language generated by grammar G, is {w in Vt*: s =>* w for start symbol s} • Both () and (())() are in L(G) for the previous grammar.
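Since L(G) is usually infinite, one way to make the definition concrete is to enumerate only the sentences up to a length bound. This is a sketch under the assumption that the non-terminal is written as the single character `L`; `sentences_up_to` is an illustrative name. It explores leftmost derivations and discards any sentential form that already has too many terminals.

```python
def sentences_up_to(max_len):
    """Return all sentences of L(G) with at most max_len terminals.

    Grammar G: L -> ( L ) L | λ, with start symbol L.
    """
    seen, frontier, sentences = {"L"}, ["L"], set()
    while frontier:
        form = frontier.pop()
        if "L" not in form:
            sentences.add(form)              # only terminals left: a sentence
            continue
        for rhs in ("(L)L", ""):             # the two productions for L
            new = form.replace("L", rhs, 1)  # expand the leftmost L
            terminal_count = len(new.replace("L", ""))
            if terminal_count <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return sentences

print(sorted(sentences_up_to(4), key=len))
```

The pruning is safe because a derivation step never removes a terminal, so a form that is already too long cannot lead to a short sentence.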

  22. Parse Trees The parse tree for some string in some language is defined by the grammar G as follows: • The root is the start symbol of G • The leaves are terminals or λ. When visited from left to right, the leaves form the input string • The interior nodes are non-terminals of G • For every non-terminal A in the tree with children B1 … Bk, there is some production A -> B1 … Bk • If a string is in the given language, a parse tree must exist.

  23. Parse Tree for (())() [Figure: parse tree with root L; nodes as listed in the transcript: L ( L ) L ( L ) L ( L ) L λ λ λ λ]

  24. Ambiguity A grammar is ambiguous if there are at least two parse trees (or leftmost derivations) for some string in the language. E -> E + E E -> E * E E -> 0 | … | 9 [Figure: two parse trees for 2 + 3 * 4, one with E * E at the root, grouping (2 + 3) * 4, and one with E + E at the root, grouping 2 + (3 * 4)]
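The ambiguity can be checked by brute force: enumerate every parse tree the grammar allows for 2 + 3 * 4 and evaluate each one. This is a minimal sketch, assuming single-character tokens and trees represented as nested tuples; `parse_trees` and `evaluate` are illustrative names.

```python
def parse_trees(tokens):
    """Return every parse tree for tokens under E -> E + E | E * E | digit."""
    if len(tokens) == 1 and tokens[0].isdigit():
        return [tokens[0]]                      # E -> 0 | ... | 9
    trees = []
    for i, tok in enumerate(tokens):
        # Try each operator as the root of the tree: E -> E op E.
        if tok in "+*" and 0 < i < len(tokens) - 1:
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tok, right))
    return trees

def evaluate(tree):
    """Compute the value that a given parse tree assigns to the expression."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)

for tree in parse_trees(list("2+3*4")):
    print(tree, "=", evaluate(tree))
```

The two trees give different values, which is why an ambiguous grammar is a problem whenever meaning is attached to the parse tree.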

  25. An Unambiguous Expression Grammar • Grammars can be written that enforce precedence: <expr> -> <expr> + <term> | <term> <term> -> <term> * <c> | <c> <c> -> 0 | 1 | … | 9 [Figure: the parse tree for 2 + 3 * 4 under this grammar, with <expr> + <term> at the root and 3 * 4 grouped under <term>]

  26. Formal Methods of Describing Syntax • Operator associativity can also be indicated by a grammar <expr> -> <expr> + <expr> | const (ambiguous) <expr> -> <expr> + const | const (unambiguous) [Figure: parse tree for const + const + const under the unambiguous grammar, growing down the left side]

  27. EBNF • Extended BNF: shorthand for BNF • Optional parts are placed in brackets ([ ]): <proc_call> -> ident [ ( <expr_list> ) ] • Put alternative parts of RHSs in parentheses and separate them with vertical bars: <term> -> <term> (+ | -) const • Put repetitions (0 or more) in braces ({ }): <ident> -> letter {letter | digit}

  28. BNF and EBNF • BNF: <expr> -> <expr> + <term> | <expr> - <term> | <term> <term> -> <term> * <factor> | <term> / <factor> | <factor> • EBNF: <expr> -> <term> {(+ | -) <term>} <term> -> <factor> {(* | /) <factor>}
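The EBNF form maps naturally onto code: each `{ ... }` repetition becomes a loop. The sketch below is one possible reading, not the lecture's implementation. It assumes `<factor>` is a single digit (the slide does not define it) and evaluates the expression while parsing; `evaluate_expr` and its helpers are illustrative names.

```python
DIGITS = "0123456789"

def evaluate_expr(text):
    """Parse and evaluate text using the EBNF rules for <expr> and <term>."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                       # assumption: <factor> -> 0 | ... | 9
        tok = peek()
        if tok is None or tok not in DIGITS:
            raise SyntaxError(f"digit expected at token {pos}")
        return int(advance())

    def term():                         # <term> -> <factor> {(* | /) <factor>}
        value = factor()
        while peek() in ("*", "/"):
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():                         # <expr> -> <term> {(+ | -) <term>}
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate_expr("2 + 3 * 4"), evaluate_expr("8 - 3 - 2"))
```

Because each loop folds new operands into the running value from the left, the operators come out left-associative, matching the left-recursive BNF rules, and `*` binds tighter than `+` because `term` is called from inside `expr`.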

  29. Lexical and Syntax Analysis • If a string is in a language, a parse tree can be derived for that string • Problem: We need to go from a string of characters (input file) to a legal parse tree to show that a string is in the language. • From introduction: compilers, interpreters, hybrid approaches • Our Focus: Top-Down Parsing

  30. Parsing • Take a sequence of tokens and produce a parse tree • Two general algorithms (methods): top-down, bottom-up • Algorithms are derived from the CFG • Note: We can't always derive an algorithm from a CFG

  31. Top Down • Start symbol: L • String: (())()

  32. Top Down • Parse tree nodes so far: L ( L ) L • String: (())()

  33. Top Down • Parse tree nodes so far: L ( L ) L ( L ) L • String: (())()

  34. Top Down • Parse tree nodes so far: L ( L ) L ( L ) L λ • String: (())()

  35. Top Down • Parse tree nodes so far: L ( L ) L ( L ) L λ λ • String: (())()

  36. Top Down • Parse tree nodes so far: L ( L ) L ( L ) L ( L ) L λ λ • String: (())()

  37. Top Down • Parse tree nodes so far: L ( L ) L ( L ) L ( L ) L λ λ λ • String: (())()

  38. Top Down • Parse tree nodes so far: L ( L ) L ( L ) L ( L ) L λ λ λ λ • String: (())()

  39. Writing a recursive descent parser • Write a procedure for each non-terminal. Use the next token (lookahead) to choose which production for that non-terminal to 'mimic': • for a non-terminal X, call procedure X() • for a terminal X, call match(X) • match(symbol) { if (symbol == lookahead) lookahead = next_token(); else error(); } • Function next_token() gets the next token from the lexical analyzer. It must be called once before parsing starts to obtain the first lookahead.

  40. Simplified RDP Example L -> ( L ) L | λ • L() { if (lookahead == '(') { /* L -> ( L ) L */ match('('); L(); match(')'); L(); } else return; /* L -> λ */ } • main() { lookahead = next_token(); L(); }
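The pseudocode above translates almost line for line into a runnable program. This is a minimal sketch, assuming each input character is one token; the class name `Parser` and the helper `accepts` are illustrative. One addition not on the slide is labelled in the code: after `L()` returns, the sketch also checks that the whole input was consumed.

```python
class Parser:
    """Recursive descent parser for L -> ( L ) L | λ."""

    def __init__(self, text):
        self.tokens = iter(text)              # each character is a token
        self.lookahead = self.next_token()    # prime the first lookahead

    def next_token(self):
        return next(self.tokens, None)        # None marks end of input

    def match(self, symbol):
        if self.lookahead == symbol:
            self.lookahead = self.next_token()
        else:
            raise SyntaxError(f"expected {symbol!r}, found {self.lookahead!r}")

    def L(self):
        if self.lookahead == "(":             # L -> ( L ) L
            self.match("(")
            self.L()
            self.match(")")
            self.L()
        # otherwise: L -> λ, consume nothing

def accepts(text):
    """True if text is a sentence of the language."""
    parser = Parser(text)
    try:
        parser.L()
    except SyntaxError:
        return False
    # Addition to the slide's main(): reject leftover input such as ")".
    return parser.lookahead is None

print(accepts("(())()"), accepts("(()"), accepts(")"))
```

Without the final end-of-input check, a string such as `)` would be silently accepted, because `L()` can always succeed by choosing the λ production.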

  41. Tracing the Recursive Descent Parse • Calls so far: call L() • String: ( ( ) ) ( ) [Figure: parse tree and lookahead marker]

  42. Tracing the Recursive Descent Parse • Calls so far: call L(), call L() • String: ( ( ) ) ( ) [Figure: parse tree and lookahead marker]

  43. Tracing the Recursive Descent Parse • Calls so far: call L(), call L(), call L() • String: ( ( ) ) ( ) [Figure: parse tree and lookahead marker]

  44. Tracing the Recursive Descent Parse • Calls so far: call L(), call L(), call L() - return • String: ( ( ) ) ( ) [Figure: parse tree and lookahead marker]

  45. Tracing the Recursive Descent Parse • Calls so far: call L(), call L(), call L() - return, call L() • String: ( ( ) ) ( ) [Figure: parse tree and lookahead marker]

  46. Tracing the Recursive Descent Parse • Calls so far: call L(), call L(), call L() - return, call L() - return • String: ( ( ) ) ( ) [Figure: parse tree and lookahead marker]

  47. Tracing the Recursive Descent Parse • Calls so far: call L(), call L() - return, call L() - return, call L() - return • String: ( ( ) ) ( ) [Figure: parse tree and lookahead marker]

  48. Tracing the Recursive Descent Parse • Calls so far: call L(), call L() - return, call L() - return, call L() - return, call L() • String: ( ( ) ) ( ) [Figure: parse tree and lookahead marker]

  49. Tracing the Recursive Descent Parse • Calls so far: call L(), call L() - return, call L() - return, call L() - return, call L(), call L() • String: ( ( ) ) ( ) [Figure: parse tree and lookahead marker]

  50. Tracing the Recursive Descent Parse • Calls so far: call L(), call L() - return, call L() - return, call L() - return, call L(), call L() - return • String: ( ( ) ) ( ) [Figure: parse tree and lookahead marker]
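A trace like the one on these slides can be produced by logging each call and return of L(). This is a sketch under a few assumptions: each character is one token, `$` is a made-up marker for end of input, and `trace_parse` is an illustrative name. Indentation in the log shows the nesting depth of the calls.

```python
def trace_parse(text):
    """Parse text with L -> ( L ) L | λ and return a log of calls and returns."""
    log = []
    pos = 0

    def L(depth):
        nonlocal pos
        lookahead = text[pos] if pos < len(text) else "$"   # "$" = end of input
        indent = "  " * depth
        log.append(f"{indent}call L(), lookahead = {lookahead}")
        if lookahead == "(":                 # L -> ( L ) L
            pos += 1                         # match('(')
            L(depth + 1)
            if pos >= len(text) or text[pos] != ")":
                raise SyntaxError(f"expected ')' at position {pos}")
            pos += 1                         # match(')')
            L(depth + 1)
        # otherwise L -> λ: nothing is consumed
        log.append(f"{indent}return")

    L(0)
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return log

print("\n".join(trace_parse("(())()")))
```

There is one call per L node in the parse tree, so the string ( ( ) ) ( ) produces seven calls, four of which choose the λ production.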
