
Discrete Maths

Discrete Maths 241-303, Semester 1 2014-2015. Objectives: to introduce grammars and show their importance for defining programming languages and writing compilers; to show the connection between REs and grammars. 8. Grammars. Overview: 1. Why Grammars? 2. Languages

dean

Presentation Transcript


  1. Discrete Maths 241-303, Semester 1 2014-2015 • Objectives • to introduce grammars and show their importance for defining programming languages and writing compilers; • to show the connection between REs and grammars • 8. Grammars

  2. Overview 1. Why Grammars? 2. Languages 3. Using a Grammar 4. Parse Trees 5. Ambiguous Grammars 6. Top-down and Bottom-up Parsing continued

  3. 7. Building Recursive Descent Parsers 8. Making the Translation Easy 9. Building a Parse Tree 10. Kinds of Grammars 11. From RE to a Grammar 12. Context-free Grammars vs. REs

  4. 1. Why Grammars? • Grammars are the standard way of defining programming languages. • Tools exist for translating grammars into compilers (e.g. JavaCC, lex, yacc, ANTLR) • this saves weeks of work

  5. 2. Languages • We use a natural language to communicate • its grammar rules are very complex • the rules don’t cover important things • We use a formal language to define a programming language • its grammar rules are fairly simple • the rules cover almost everything continued

  6. A formal language is a set of legal strings. • The strings are legal if they correctly use the language’s alphabet and grammar rules. • The alphabet is often called the language’s terminal symbols (or terminals).

  7. Example 1 • Alphabet (terminals) = {1, 2, 3} • Using the grammar rules (not shown here; see later), the language is: L1 = {11, 12, 13, 21, 22, 23, 31, 32, 33} • L1 is the set of strings of length 2 over this alphabet.

  8. Example 2 • Terminals = {1, 2, 3} • Using different grammar rules, the language is: L2 = { 111, 222, 333} • L2 is the set of strings of length 3, where all the terminals are the same.

  9. Example 3 • Terminals = {1, 2, 3} • Using different grammar rules, the language is: L3 = {2, 12, 22, 32, 112, 122, 132, ...} • L3 is the set of strings whose numerical value is divisible by 2.

  10. 3. Using a Grammar
      • A grammar is a notation for defining a language, and is made from 4 parts:
        • the terminal symbols
        • the syntactic categories (nonterminal symbols)
          • e.g. statement, expression, noun, verb
        • the grammar rules (productions)
          • e.g. A => B1 B2 ... Bn
        • the starting nonterminal
          • the top-most syntactic category for this grammar

  11. We define a grammar G as a 4-tuple: G = (T, N, P, S) • T = terminal symbols • N = nonterminal symbols • P = productions • S = starting nonterminal

  12. 3.1. Example 1
      • Consider the grammar:
          T = {0, 1}
          N = {S, R}
          P = { S => 0
                S => 0 R
                R => 1 S }
          S is the starting nonterminal
      • The right-hand sides of productions usually use a mix of terminals and nonterminals.

  13. Is “01010” in the language?
      • Start with an S rule:
          Rule         String Generated
          --           S
          S => 0 R     0 R
          R => 1 S     0 1 S
          S => 0 R     0 1 0 R
          R => 1 S     0 1 0 1 S
          S => 0       0 1 0 1 0
      • No more rules can be applied since there are no more nonterminals left in the string.
      • Yes, it is in the language.
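The derivation on this slide can be replayed mechanically. The following is a minimal sketch, assuming the grammar of section 3.1 is stored as a Python dictionary; the names `P`, `apply_rule` and `derive` are illustrative and do not come from the slides.

```python
# Grammar from section 3.1, written as the 4-tuple G = (T, N, P, S).
T = {"0", "1"}
N = {"S", "R"}
P = {"S": [["0"], ["0", "R"]], "R": [["1", "S"]]}
START = "S"

def apply_rule(form, head, body):
    """Rewrite the leftmost occurrence of nonterminal `head` in `form` as `body`."""
    if body not in P[head]:
        raise ValueError(f"{head} => {' '.join(body)} is not a production")
    i = form.index(head)  # raises ValueError if `head` is not in the string
    return form[:i] + body + form[i + 1:]

def derive(steps):
    """Apply the rules in order, starting from the start symbol; return every string generated."""
    form = [START]
    history = [form]
    for head, body in steps:
        form = apply_rule(form, head, body)
        history.append(form)
    return history

steps = [
    ("S", ["0", "R"]),
    ("R", ["1", "S"]),
    ("S", ["0", "R"]),
    ("R", ["1", "S"]),
    ("S", ["0"]),
]
history = derive(steps)
for form in history:
    print(" ".join(form))
final = "".join(history[-1])
```

Each printed line corresponds to one row of the "String Generated" column; the last line contains only terminals, which is why the derivation stops there.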

  14. Example 2
      • Consider the grammar:
          T = {a, b, c, d, z}
          N = {S, R, U, V}
          P = { S => R U z | z
                R => a | b R
                U => d V U | c
                V => b | c }
          S is the starting nonterminal

  15. The notation: X => Y | Z is shorthand for the two rules:
          X => Y
          X => Z
      • Read ‘|’ as ‘or’.

  16. Is “adbdbcz” in the language?
          Rule           String Generated
          --             S
          S => R U z     R U z
          R => a         a U z
          U => d V U     a d V U z
          V => b         a d b U z
          U => d V U     a d b d V U z
          V => b         a d b d b U z
          U => c         a d b d b c z
      • Yes! This grammar has choices about how to rewrite the string.

  17. Is “abdbcz” in the language? No.
          Rule           String Generated
          --             S
          S => R U z     R U z
          R => a         a U z
          which U rule?
      • U must be replaced by something beginning with a ‘b’, but the only U rules are U => d V U | c, which begin with ‘d’ or ‘c’.
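The two membership questions above can also be answered by a small search over leftmost derivations. This is a sketch under stated assumptions, not a general parser: it relies on every symbol being a single character and on no production having an empty right-hand side (both true for this grammar), so a string that is already longer than the target can be discarded. The function name `generates` is hypothetical.

```python
# Productions of Example 2 (slide 14); every symbol is one character.
P = {
    "S": [["R", "U", "z"], ["z"]],
    "R": [["a"], ["b", "R"]],
    "U": [["d", "V", "U"], ["c"]],
    "V": [["b"], ["c"]],
}

def generates(target, start="S"):
    """Return True if some leftmost derivation from `start` yields `target`."""
    stack = [(start,)]
    seen = set()
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        # Position of the leftmost nonterminal, or None if only terminals remain.
        idx = next((i for i, sym in enumerate(form) if sym in P), None)
        if idx is None:
            if "".join(form) == target:
                return True
            continue
        # Prune: productions never shrink the string, and the terminals to the
        # left of the first nonterminal can no longer change.
        if len(form) > len(target) or "".join(form[:idx]) != target[:idx]:
            continue
        for body in P[form[idx]]:
            stack.append(form[:idx] + tuple(body) + form[idx + 1:])
    return False

print(generates("adbdbcz"))  # slide 16
print(generates("abdbcz"))   # slide 17
```

For “abdbcz” the search dies exactly where the slide says it does: after `a U z`, neither U rule produces a string whose second symbol is ‘b’.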

  18. 3.2. BNF • BNF is a shorthand notation for productions • Backus Normal Form, or • Backus-Naur Form • We have already used ‘|’: X => Y1 | Y2 | ... | Yn continued

  19. X => Y [ Z ] is shorthand for two rules:
          X => Y
          X => Y Z
      • [ Z ] means 0 or 1 occurrences of Z.

  20. X => Y { Z } is shorthand for an infinite number of rules:
          X => Y
          X => Y Z
          X => Y Z Z
          X => Y Z Z Z
            :
      • { Z } means 0 or more occurrences of Z.

  21. 3.3. A Grammar for Expressions
      • Consider the grammar:
          T = { 0, 1, 2, ..., 9, +, -, *, /, (, ) }
          N = { Expr, Number }
          P = { Expr => Number
                Expr => ( Expr )
                Expr => Expr + Expr | Expr - Expr | Expr * Expr | Expr / Expr }
          Expr is the starting nonterminal

  22. Defining Number
      • The RE definition for a number is:
          number = digit digit*
          digit = [0-9]
      • The productions for Number are:
          Number => Digit { Digit }
          Digit => 0 | 1 | 2 | 3 | ... | 9
        or
          Number => Number Digit | Digit
          Digit => 0 | 1 | 2 | 3 | ... | 9
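The RE and the recursive productions for Number describe the same strings. As a rough illustration, the sketch below checks a string both ways: with Python's `re` module for `digit digit*`, and with a hand-written recursive function that mirrors `Number => Number Digit | Digit`. The function names are assumptions made for this example.

```python
import re

# number = digit digit*, digit = [0-9]
NUMBER_RE = re.compile(r"[0-9][0-9]*")

def is_digit(s):
    """Digit => 0 | 1 | ... | 9"""
    return len(s) == 1 and s in "0123456789"

def is_number(s):
    """Number => Number Digit | Digit, checked by peeling off the last digit."""
    if is_digit(s):
        return True
    return len(s) > 1 and is_digit(s[-1]) and is_number(s[:-1])

for s in ["125", "2", "", "12a"]:
    print(repr(s), is_number(s), bool(NUMBER_RE.fullmatch(s)))
```

The recursion peels digits from the right because the production is left-recursive: the last Digit is the one added by the outermost use of `Number => Number Digit`.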

  23. Using Productions
      • Expand Expr into (125-2)*3
          Expr => Expr * Expr
               => ( Expr ) * Expr
               => ( Expr - Expr ) * Expr
               => ( Number - Number ) * Number
                 :
               => ( 125 - 2 ) * 3

  24. Expand Number into 125
          Number => Number Digit
                 => Number Digit Digit
                 => Digit Digit Digit
                 => 1 2 5

  25. 3.4. Grammars are not Unique
      • Two grammars that do the same thing (ε is the empty string):
          Balanced => ε
          Balanced => ( Balanced ) Balanced
        and:
          Balanced => ε
          Balanced => ( Balanced )
          Balanced => Balanced Balanced
      • Both generate the same strings, e.g.: (()(()))   ()   ε   (()())
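One way to gain confidence that the two Balanced grammars agree is to generate every string each can produce up to some length and compare the two sets. The sketch below does this; it only checks strings up to a bound (8 here, chosen arbitrarily), so it is evidence rather than a proof. The empty string is written as `""`, and `closure`, `step1` and `step2` are hypothetical helper names.

```python
def closure(step, max_len):
    """Repeatedly apply the productions until no new string of length <= max_len appears."""
    strings = {""}  # Balanced => (empty string)
    while True:
        grown = strings | {s for s in step(strings) if len(s) <= max_len}
        if grown == strings:
            return strings
        strings = grown

def step1(b):
    # Balanced => ( Balanced ) Balanced
    return {"(" + x + ")" + y for x in b for y in b}

def step2(b):
    # Balanced => ( Balanced )   and   Balanced => Balanced Balanced
    return {"(" + x + ")" for x in b} | {x + y for x in b for y in b}

lang1 = closure(step1, 8)
lang2 = closure(step2, 8)
print(lang1 == lang2)
print(sorted(lang1, key=lambda s: (len(s), s))[:5])
```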

  26. 3.5. Productions for parts of C
      • Control structures:
          Statement => while ( Cond ) Statement
          Statement => if ( Cond ) Statement
          Statement => if ( Cond ) Statement else Statement
      • Testing (conditionals):
          Cond => Expr < Expr | Expr > Expr | ...

  27. Statement blocks:
          Statement => '{' StatList '}'
          StatList => Statement ; StatList | Statement ;

  28. Using the Statement Production
          Statement => while ( Cond ) Statement
                    => while ( Expr < Expr ) Statement
                    => while ( Expr < Expr ) { StatList }
                    => while ( Expr < Expr ) { Statement ; Statement ; }
                      :
                    => while (x < 10) { y++; x++; }
      • This example requires an extra Expr production for variables: Expr => VariableName

  29. 3.6. Generating a Language • For a given grammar, what strings can it generate? • the language is the set of legal strings • Most languages contain an infinite number of strings (e.g. English) • but there is a process for generating them continued

  30. For each production, list the strings that can be derived immediately. • On the 2nd round, put those strings back into the productions to generate more strings. • On the 3rd round, put those strings back... • Continue for as many rounds as you want.

  31. Example
      • Consider the grammar (ε is the empty string):
          T = { w, c, s, '{', '}', ';' }
          N = { S, L }
          P = { S => w c S | '{' L '}' | s ';'
                L => L S | ε }
          S is the starting nonterminal

  32. Strings in First 3 Rounds (new strings per round; ε is the empty string)
                      S                        L
          Round 1:    s;                       ε
          Round 2:    wcs;  {}                 s;
          Round 3:    wcwcs;  wc{}  {s;}       wcs;  {}  s;s;  s;wcs;  s;{}
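The round-by-round process can be sketched in a few lines. The code below is an illustration for this one grammar only: the productions are hard-coded, the empty string is written as `""`, and `rounds` is a hypothetical helper name. On each round it substitutes every string found so far into the right-hand sides and records only the strings that are new.

```python
def rounds(n):
    """Return, for each of the first n rounds, the new strings derived for S and for L."""
    S, L = set(), set()
    history = []
    for _ in range(n):
        # S => w c S | '{' L '}' | s ';'
        new_S = {"wc" + s for s in S} | {"{" + l + "}" for l in L} | {"s;"}
        # L => L S | (empty string)
        new_L = {l + s for l in L for s in S} | {""}
        history.append((new_S - S, new_L - L))
        S, L = S | new_S, L | new_L
    return history

history = rounds(3)
for number, (new_S, new_L) in enumerate(history, start=1):
    print(f"Round {number}: S gains {sorted(new_S)}, L gains {sorted(new_L)}")
```

Running more rounds keeps producing new strings, which matches the earlier remark that most languages are infinite but can still be generated by a process.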

  33. 4. Parse Trees • A parse tree is a graphical way of showing how productions are used to generate a string. • Data structures representing parse trees are used inside compilers to store information about the program being compiled.

  34. Example 1 • Consider the grammar: T = { a, b } N = { S } P = { S => S S | a S b | a b | b a } S is the starting nonterminal

  35. Parse Tree for “aabbba” • The root of the tree is the start symbol S. • Expand the root using S => S S. • Expand the left S using S => a S b.

  36. • Expand the inner S using S => a b. • Expand the right S using S => b a.

  37. • Stop when there are no more nonterminals in leaf positions. • Read off the string by reading the leaves left to right. • Final tree, written with each node followed by its children in brackets: S( S( a S( a b ) b ) S( b a ) )
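A parse tree is easy to hold in a data structure. As a minimal sketch, the tree for “aabbba” is written below as nested tuples, where the first element of each tuple is the node's label and the rest are its children; this encoding and the helper names are assumptions made for the example, not something the slides prescribe.

```python
# Productions of the grammar on slide 34, as (left-hand side, right-hand side) pairs.
PRODUCTIONS = {
    ("S", ("S", "S")),
    ("S", ("a", "S", "b")),
    ("S", ("a", "b")),
    ("S", ("b", "a")),
}

# Parse tree for "aabbba": (label, child, child, ...); a plain string is a leaf.
tree = ("S",
        ("S", "a", ("S", "a", "b"), "b"),
        ("S", "b", "a"))

def leaves(node):
    """Collect the leaves of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result

def uses_valid_productions(node):
    """Check that every interior node and its children match some production."""
    if isinstance(node, str):
        return True
    label, *children = node
    body = tuple(c if isinstance(c, str) else c[0] for c in children)
    return (label, body) in PRODUCTIONS and all(uses_valid_productions(c) for c in children)

print("".join(leaves(tree)))
print(uses_valid_productions(tree))
```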

  38. Example 2
      • Consider the grammar:
          T = { a, +, *, (, ) }
          N = { E, T, F }
          P = { E => T | T + E
                T => F | F * T
                F => a | ( E ) }
          E is the starting nonterminal

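  39. Is “a+a*a” in the Language? • Start from the root E and expand using E => T + E. • Expand the T using T => F.

  40. Continue expansion until every leaf is a terminal: E( T( F( a ) ) + E( T( F( a ) * T( F( a ) ) ) ) ) • The leaves read a + a * a, so the string is in the language.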

  41. 5. Ambiguous Grammars • A grammar is ambiguous when a string can be represented by more than one parse tree • it means that the string has more than one “meaning” in the language • e.g. a variant of the last grammar example: P = { E => E + E | E * E | ( E ) | a }

  42. Parse Trees for “a+a*a” • Left-hand tree: E( E( a ) + E( E( a ) * E( a ) ) ) • Right-hand tree: E( E( E( a ) + E( a ) ) * E( a ) )

  43. The two parse trees allow a string like “5+5*5” to be read in two different ways: • 5 + 25 = 30 (the left-hand tree) • 10 * 5 = 50 (the right-hand tree)
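The two readings can be made concrete by evaluating each tree. In the sketch below the trees are simplified to nested tuples of the form `(operator, left, right)` with integer leaves; that representation and the `evaluate` function are illustrative assumptions.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a tree whose interior nodes are (operator, left, right) tuples."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

left_tree = ("+", 5, ("*", 5, 5))    # 5 + (5 * 5)
right_tree = ("*", ("+", 5, 5), 5)   # (5 + 5) * 5

print(evaluate(left_tree), evaluate(right_tree))
```

The same string of symbols gives two different values depending only on the shape of the tree, which is exactly the problem an ambiguous grammar creates.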

  44. Why is Ambiguity Bad? • In a programming language, a string with more than one meaning means that the compiler and run-time system will not know how to process it. • e.g. in C: x = 5 + 5 * 5; // what is the value in x?

  45. 6. Top-down and Bottom-up Parsing • Top-down parsing creates a parse tree starting from the start symbol and moves down towards the leaves. • used in most compilers • usually implemented as recursive-descent parsing continued

  46. Bottom-up parsing creates a parse tree starting from the leaves, and moves up towards the start symbol. • productions are used in ‘reverse’ • Both kinds of parsing often require “guessing” to decide which productions to use to parse a string.

  47. Example • Consider the grammar: T = { a, +, *, (, ) } N = { E, T, F } P = { E => T | T + E T => F | F * T F => a | ( E ) } E is the starting nonterminal

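  48. Top-down Parse of “a+a*a” • The tree is built from the root E down to the leaves: E( T( F( a ) ) + E( T( F( a ) * T( F( a ) ) ) ) )

  49. Bottom-up Parse of “a+a*a” • The same tree is built from the leaves up to the root E: E( T( F( a ) ) + E( T( F( a ) * T( F( a ) ) ) ) )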
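A top-down parse of this grammar can be sketched as a recursive-descent parser with one function per nonterminal. The code below is one possible version, not the parser developed later in the course: it assumes single-character tokens with no whitespace, and it returns the parse tree as nested tuples, an encoding chosen only for this example.

```python
def parse(text):
    """Top-down parse of `text` using E => T | T + E, T => F | F * T, F => a | ( E )."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def E():
        t = T()
        if peek() == "+":          # choose E => T + E
            expect("+")
            return ("E", t, "+", E())
        return ("E", t)            # choose E => T

    def T():
        f = F()
        if peek() == "*":          # choose T => F * T
            expect("*")
            return ("T", f, "*", T())
        return ("T", f)            # choose T => F

    def F():
        if peek() == "(":          # choose F => ( E )
            expect("(")
            e = E()
            expect(")")
            return ("F", "(", e, ")")
        expect("a")                # choose F => a
        return ("F", "a")

    tree = E()
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at position {pos}")
    return tree

print(parse("a+a*a"))
```

Looking at the next input symbol is enough to pick a production here, so for this particular grammar the parser never has to guess.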

  50. Guessing when Building • Guessing occurs when there are several rules which can apply to the current nonterminal. • Compilers are very bad at guessing, and so programming language designers try to make grammars as simple as possible.
