1 / 38

Grammars

Grammars. CPSC 5135. Formal Definitions. A symbol is a character. It represents an abstract entity that has no inherent meaning Examples: a, A, 3, *, - ,=. Formal Definitions. An alphabet is a finite set of symbols. Examples: A = { a, b, c } B = { 0, 1 }.

KeelyKia
Download Presentation

Grammars

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Grammars CPSC 5135

  2. Formal Definitions • A symbol is a character. It represents an abstract entity that has no inherent meaning • Examples: a, A, 3, *, - ,=

  3. Formal Definitions • An alphabet is a finite set of symbols. • Examples: A = { a, b, c } B = { 0, 1 }

  4. Formal Definitions • A string (or word) is a finite sequence of symbols from a given alphabet. • Examples: S = { 0, 1 } is a alphabet 0, 1, 11010, 101, 111 are strings from S A = { a, b, c ,d } is an alphabet bad, cab, dab, d, aaaaa are strings from A

  5. Formal Definitions • A language is a set of strings from an alphabet. • The set can be finite or infinite. • Examples: A = { 0, 1} L1 = { 00, 01, 10, 11 } L2 = { 010, 0110, 01110,011110, …}

  6. Formal Definitions • A grammar is a quadruple G = (V, Σ, R, S) where 1) V is a finite set of variables (non-terminals), 2) Σ is a finite set of terminals, disjoint from V, 3) R is a finite set of rules. The left side of each rule is a string of one or more elements from V U Σ and whose right side is a string of 0 or more elements from V U Σ 4) S is an element of V and is called the start symbol

  7. Formal Definitions • Example grammar: • G = (V, Σ, R, S) V = { S, A } Σ = { a, b } R = { S → aA A → bA A → a }

  8. Derivations R = S → aA A → bA A → a • A derivation is a sequence of replacements , beginning with the start symbol, and replacing a substring matching the left side of a rule with the string on the right side of a rule S → aA →abA → abbA → abba

  9. Derivations • What strings can be generated from the following grammar? S → aBa B → aBa B → b

  10. Formal Definitions • The language generated by a grammar is the set of all strings of terminal symbols which are derivable from S in 0 or more steps. • What is the language generated by this grammar? • S → a S → aB B → aB B → a

  11. Kleene Closure • Let Σ be a set of strings. Σ* is called the Kleene closure of Σ and represents the set of all concatenations of 0 or more strings in Σ. • Examples Σ* = { 1 }* = { ø, 1, 11, 111, 1111, …} Σ* = { 01 }* = { ø, 01, 0101, 010101, …} Σ* = { 0 + 1 }* = set of all possible strings of 0’s and 1’s. (+ means union)

  12. Formal Definitions • A grammar G = (V,Σ, R, S) is right-linear if all rules are of the form: A → xB A → x where A, B ε V and x εΣ*

  13. Right-linear Grammar • G = { V, Σ, R, S } V = { S, B } Σ = { a, b } R = { S → aS , S → B , B → bB , B → ε } What language is generated?

  14. Formal Definitions • A grammar G = (V,Σ, R, S) is left-linear if all rules are of the form: A → Bx A → x where A, B ε V and x εΣ*

  15. Formal Definitions • A regular grammar is one that is either right or left linear. • Let Q be a finite set and let Σ be a finite set of symbols. Also let δ be a function from Q x Σ to Q,   let q0 be a state in Q and let A be a subset of Q. We call each element of Q a state, δ the transition function, q0 the initial state and A the set of accepting states. Then a deterministic finite automaton (DFA) is a 5-tuple < Q , Σ , q0 , δ , A > • Every regular grammar is equivalent to a DFA

  16. Language Definition • Recognition – a machine is constructed that reads a string and pronounces whether the string is in the language or not. (Compiler) • Generation – a device is created to generate strings that belong to the language. (Grammar)

  17. Chomsky Hierarchy • Noam Chomsky (1950’s) described 4 classes of grammars 1) Type 0 – unrestricted grammars 2) Type 1 – Context sensitive grammars 3) Type 2 – Context free grammars 4) Type 3 – Regular grammars

  18. Grammars • Context-free and regular grammars have application in computing • Context-free grammar – each rule or production has a left side consisting of a single non-terminal

  19. Backus-Naur form (BNF) • BNF was used to describe programming language syntax and is similar to Chomsky’s context free grammars • A meta-language is a language used to describe another language • BNF is a meta-language for computer languages

  20. BNF • Consists of nonterminal symbols, terminal symbols (lexemes and tokens), and rules or productions • <if-stmt> → if <logical-expr> then <stmt> • <if-stmt> → if <logical-expr> then <stmt> else <stmt> • <if-stmt> → if <logical-expr> then <stmt> | if <logical-expr> then <stmt> else <stmt>

  21. A Small Grammar <program>  begin <stmt_list> end <stmt_list>  <stmt> | <stmt> ; <stmt_list> <stmt>  <var> = <expression> <var>  A | B | C <expression>  <var> + <var> | <var> - <var> | <var>

  22. A Derivation <program>  begin <stmt_list> end  begin <stmt> end begin <var> = <expression> end begin A = <expression> end begin A = <var> + <var> end begin A = B + <var> end begin A = B + C end

  23. Terms • Each of the strings in a derivation is called a sentential form. • If the leftmost non-terminal is always the one selected for replacement, the derivation is a leftmost derivation. • Derivations can be leftmost, rightmost, or neither • Derivation order has no effect on the language generated by the grammar

  24. Derivations Yield Parse Trees <program>  begin <stmt_list> end  begin <stmt> end begin <var> = <expression> end begin A = <expression> end begin A = <var> + <var> end begin A = B + <var> end begin A = B + C end <Program> begin <stmt_list> end <stmt> <var> = <expression> A <var> + <var> B C

  25. Parse Trees • Parse trees describe the hierarchical structure of the sentences of the language they define. • A grammar that generates a sentence for which there are two or more distinct parse trees is ambiguous.

  26. An Ambiguous Grammar <assign>  <id> = <expr> <id>  A | B | C <expr>  <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>

  27. Two Parse Trees – Same Sentence <assign> <id> = <expr> A <expr> + <expr> <id> <expr> * <expr> B <id> <id> C A <assign> <id> = <expr> A <expr> * <expr> <expr> + <expr> <id> <id> <id> A B C

  28. Derivation 1 <assign>  <id> = <expr>  A = <expr>  A = <expr> + <expr>  A = <id> + <expr>  A = B + <expr>  A = B + <expr> * <expr>  A = B + <id> * <expr>  A = B + C * <expr>  A = B + C * <id>  A = B + C * A

  29. Derivation 2 <assign>  <id> = <expr>  A = <expr>  A = <expr> * <expr>  A = <expr> + <expr> * <expr>  A = <id> + <expr> * <expr>  A = B + <expr> * <expr>  A = B + <id> * <expr>  A = B + C * <expr>  A = B + C * <id>  A = B + C * A

  30. Ambiguity • Parse trees are used to determine the semantics of a sentence • Ambiguous grammars lead to semantic ambiguity - this is intolerable in a computer language • Often, ambiguity in a grammar can be removed

  31. Unambiguous Grammar <assign>  <id> = <expr> <id>  A | B | C <expr>  <expr> + <term> | <term> <term>  <term> * <factor> | <factor> <factor>  ( <expr> ) | <id> • This grammar makes multiplication take precedence over addition

  32. Associativity of Operators <assign> <id> = <expr> A <expr> + <term> <expr> + <term> <factor> <term> <factor> <id> <factor> <id> A <id> C B <assign>  <id> = <expr> <id>  A | B | C <expr>  <expr> + <term> | <term> <term>  <term> * <factor> | <factor> <factor>  ( <expr> ) | <id> Addition operators associate from left to right

  33. BNF • A BNF rule that has its left hand side appearing at the beginning of its right hand side is left recursive . • Left recursion specifies left associativity • Right recursion is usually used for associating exponetiation operators <factor>  <exp> ** <factor> | <exp> <exp>  ( <expr> ) | <id>

  34. Ambiguous If Grammar <stmt>  <if_stmt> <if_stmt>  if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt> • Consider the sentential form: if <logic_expr> then if <logic_expr> then <stmt> else <stmt>

  35. Parse Trees for an If Statement <if_stmt> If <logic_expr> then <stmt> else <stmt> <if_stmt> if <logic_expr> then <stmt> <if_stmt> If <logic_expr> then <stmt> <if_stmt> if <logic_expr> then <stmt> else <stmt>

  36. Unambiguous Grammar for If Statements <stmt>  <matched> | <unmatched> <matched>  if <logic_expr> then <matched> else <matched> | any non-if statement <unmatched>  if <logic_expr> then <stmt> | if <logic_expr> then <matched> else <unmatched>

  37. Extended BNF (EBNF) • Optional part denoted by […] <selection>  if ( <expr> ) <stmt> [ else <stmt> ] • Braces used to indicate the enclosed part can be repeated indefinitely or left out <ident_list>  <identifier> { , <identifier> } • Multiple choice options are put in parentheses and separated by the or operator | <for_stmt>  for <var> := <expr> (to | downto) <expr> do <stmt>

  38. BNF vs EBNF for Expressions BNF: <expr>  <expr> + <term> | <expr> - <term> | <term> <term>  <term> * <factor> | <term> / <factor> | <factor> EBNF: <expr>  <term> { (+ | - ) <term> } <term>  <factor> { ( * | / ) <factor>

More Related