Discrete Maths

Discrete Maths 242-213, Semester 2, 2013-2014 • 14. Grammars • Objectives • to introduce grammars and show their importance for defining programming languages; • to show the connection between REs and grammars

Overview • Why Grammars? • Languages • Using a Grammar • Parse Trees • Ambiguous Grammars • Kinds of Grammars • More Information

1. Why Grammars? • Grammars are the standard way of defining programming languages. • Tools exist for semi-automatically translating grammars into compilers (e.g. JavaCC, lex, yacc, ANTLR) • this saves weeks of work

2. Languages • We use a natural language to communicate • its grammar rules are very complex • the rules don’t cover important things • We use a formal language to define a programming language • its grammar rules are fairly simple • the rules cover almost everything

A formal language is a set of legal strings. • The strings are legal if they correctly use the language’s alphabet and grammar rules. • The alphabet is often called the language’s terminal symbols (or terminals).

Example 1 • Alphabet (terminals) = {1, 2, 3} • Using the grammar rules (not shown here; see later), the language is: L1 = {11, 12, 13, 21, 22, 23, 31, 32, 33} • L1 is the set of strings of length 2.

Example 2 • Terminals = {1, 2, 3} • Using different grammar rules, the language is: L2 = { 111, 222, 333} • L2 is the set of strings of length 3, where all the terminals are the same.

Example 3 • Terminals = {1, 2, 3} • Using different grammar rules, the language is: L3 = {2, 12, 22, 32, 112, 122, 132, ...} • L3 is the set of strings whose numerical value is divisible by 2.
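The three example languages can be enumerated mechanically. The sketch below is only illustrative: the grammar rules are not given at this point, so each language is described by a Python test standing in for the rules (an assumption), and the infinite language L3 is cut off at strings of 3 symbols.

```python
from itertools import product

TERMINALS = "123"

def strings_up_to(max_len):
    """Yield every non-empty string over TERMINALS, shortest first."""
    for n in range(1, max_len + 1):
        for symbols in product(TERMINALS, repeat=n):
            yield "".join(symbols)

# L1: strings of length 2
L1 = [s for s in strings_up_to(2) if len(s) == 2]

# L2: strings of length 3 where all the terminals are the same
L2 = [s for s in strings_up_to(3) if len(s) == 3 and len(set(s)) == 1]

# L3, truncated: strings whose numerical value is divisible by 2
L3_PREFIX = [s for s in strings_up_to(3) if int(s) % 2 == 0]
```

With these terminals, a string is in L3 exactly when it ends in 2, which is why the listing starts 2, 12, 22, 32, 112, ...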

3. Using a Grammar • A grammar is a notation for defining a language, and is made from 4 parts: • the terminal symbols • the syntactic categories (nonterminal symbols) • e.g. statement, expression, noun, verb • the grammar rules (productions) • e.g. A => B1 B2 ... Bn • the starting nonterminal • the top-most syntactic category for this grammar

We define a grammar G as a 4-tuple: G = (T, N, P, S) • T = terminal symbols • N = nonterminal symbols • P = productions • S = starting nonterminal

3.1. Example 1 • Consider the grammar:
    T = {0, 1}
    N = {S, R}
    P = { S => 0
          S => 0 R
          R => 1 S }
• S is the starting nonterminal • The right-hand sides of productions usually use a mix of terminals and nonterminals.

Is “01010” in the language? • Start with an S rule:
    Rule          String Generated
    --            S
    S => 0 R      0 R
    R => 1 S      0 1 S
    S => 0 R      0 1 0 R
    R => 1 S      0 1 0 1 S
    S => 0        0 1 0 1 0
• No more rules can be applied since there are no more nonterminals left in the string. • Yes, it is in the language.
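The derivation of “01010” can be replayed step by step. This is a minimal sketch, assuming each symbol is a single character so that a sentential form fits in a Python string; the helper names apply_rule and derive are made up for illustration.

```python
# Productions of the grammar in 3.1 Example 1, as (left, right) pairs.
PRODUCTIONS = [("S", "0"), ("S", "0R"), ("R", "1S")]

def apply_rule(form, rule):
    """Rewrite the leftmost occurrence of the rule's nonterminal in form."""
    left, right = rule
    if rule not in PRODUCTIONS:
        raise ValueError(f"unknown production: {left} => {right}")
    if left not in form:
        raise ValueError(f"{left} does not occur in {form!r}")
    return form.replace(left, right, 1)

def derive(start, rules):
    """Return every sentential form produced by applying rules in order."""
    forms = [start]
    for rule in rules:
        forms.append(apply_rule(forms[-1], rule))
    return forms

STEPS = derive("S", [("S", "0R"), ("R", "1S"), ("S", "0R"), ("R", "1S"), ("S", "0")])
```

STEPS holds the "String Generated" column: S, 0R, 01S, 010R, 0101S, 01010.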

Example 2 • Consider the grammar:
    T = {a, b, c, d, z}
    N = {S, R, U, V}
    P = { S => R U z | z
          R => a | b R
          U => d V U | c
          V => b | c }
• S is the starting nonterminal

The notation: X => Y | Z is shorthand for the two rules: X => Y and X => Z • Read ‘|’ as ‘or’.

Is “adbdbcz” in the language?
    Rule            String Generated
    --              S
    S => R U z      R U z
    R => a          a U z
    U => d V U      a d V U z
    V => b          a d b U z
    U => d V U      a d b d V U z
    V => b          a d b d b U z
    U => c          a d b d b c z
• Yes! • This grammar has choices about how to rewrite the string.

Is “abdbcz” in the language? No.
    Rule            String Generated
    --              S
    S => R U z      R U z
    R => a          a U z
    which U rule?
• U must be replaced by something beginning with a ‘b’, but the only U rule is: U => d V U | c
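Because this grammar offers choices, a membership test has to search. The sketch below tries every leftmost derivation breadth-first; it is one possible approach, not a standard parsing algorithm, and it relies on two properties of this particular grammar: no rule shrinks a string, and the terminals to the left of the leftmost nonterminal never change again.

```python
from collections import deque

NONTERMINALS = set("SRUV")
PRODUCTIONS = {
    "S": ["RUz", "z"],
    "R": ["a", "bR"],
    "U": ["dVU", "c"],
    "V": ["b", "c"],
}

def leftmost_nonterminal(form):
    """Index of the first nonterminal in form, or len(form) if there is none."""
    return next((i for i, ch in enumerate(form) if ch in NONTERMINALS), len(form))

def in_language(target, start="S"):
    """Breadth-first search over leftmost derivations of target."""
    queue = deque([start])
    seen = {start}
    while queue:
        form = queue.popleft()
        pos = leftmost_nonterminal(form)
        if pos == len(form):            # only terminals are left
            if form == target:
                return True
            continue
        for body in PRODUCTIONS[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            prefix = new[:leftmost_nonterminal(new)]
            # Prune forms that are too long or whose terminal prefix is wrong.
            if len(new) <= len(target) and target.startswith(prefix) and new not in seen:
                seen.add(new)
                queue.append(new)
    return False
```

For “abdbcz” the search stops at a U z, since both U alternatives give a prefix that disagrees with the target.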

3.2. BNF • BNF is a shorthand notation for productions • Backus Normal Form, or • Backus-Naur Form • We have already used ‘|’: X => Y1 | Y2 | ... | Yn • John Backus (1924 – 2007), Peter Naur (1928 – 2016)

X => Y [Z] is shorthand for two rules: X => Y and X => Y Z • [Z] means 0 or 1 occurrences of Z.

X => Y { Z } is shorthand for an infinite number of rules: X => Y, X => Y Z, X => Y Z Z, X => Y Z Z Z, ... • { Z } means 0 or more occurrences of Z.
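The two shorthands can be expanded back into plain rules. In this small sketch a rule is a (head, body) pair with the body as a list of symbols; that representation and the function names are assumptions. Since { Z } stands for infinitely many rules, the expansion is cut off at max_repeats.

```python
def expand_optional(head, body, optional):
    """X => Y [Z]  becomes  X => Y  and  X => Y Z."""
    return [(head, body), (head, body + optional)]

def expand_repetition(head, body, repeated, max_repeats):
    """X => Y { Z }  becomes  X => Y, X => Y Z, X => Y Z Z, ... (truncated)."""
    return [(head, body + repeated * count) for count in range(max_repeats + 1)]

OPTIONAL_RULES = expand_optional("X", ["Y"], ["Z"])
REPEATED_RULES = expand_repetition("X", ["Y"], ["Z"], 3)
```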

3.3. A Grammar for Expressions • Consider the grammar:
    T = { 0, 1, 2, ..., 9, +, -, *, /, (, ) }
    N = { Expr, Number }
    P = { Expr => Number
          Expr => ( Expr )
          Expr => Expr + Expr | Expr - Expr | Expr * Expr | Expr / Expr }
• Expr is the starting nonterminal

Defining Number • The RE definition for a number is:
    number = digit digit*
    digit = [0-9]
• The productions for Number are:
    Number => Digit { Digit }
    Digit => 0 | 1 | 2 | 3 | ... | 9
or
    Number => Number Digit | Digit
    Digit => 0 | 1 | 2 | 3 | ... | 9
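The RE and the productions describe the same strings, which can be spot-checked. This sketch uses Python's re module for the RE and a hand-written recursive check that follows Number => Number Digit | Digit; both function names are made up for illustration.

```python
import re

DIGITS = "0123456789"
NUMBER_RE = re.compile(r"[0-9][0-9]*")   # number = digit digit*

def is_number_re(text):
    """True if the whole of text matches the RE for number."""
    return NUMBER_RE.fullmatch(text) is not None

def is_number_grammar(text):
    """True if text can be derived from Number => Number Digit | Digit."""
    if len(text) == 0 or text[-1] not in DIGITS:
        return False
    # Either a single Digit, or a Number followed by a Digit.
    return len(text) == 1 or is_number_grammar(text[:-1])
```

Both checks accept "125" and reject "" and "12a".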

Using Productions • Expand Expr into (125-2)*3:
    Expr => Expr * Expr
         => ( Expr ) * Expr
         => ( Expr - Expr ) * Expr
         => ( Number - Number ) * Number
         :
         => ( 125 - 2 ) * 3

Expand Number into 125:
    Number => Number Digit
           => Number Digit Digit
           => Digit Digit Digit
           => 1 2 5

3.4. Grammars are not Unique • Two grammars that do the same thing:
    Balanced => e
    Balanced => ( Balanced ) Balanced
and:
    Balanced => e
    Balanced => ( Balanced )
    Balanced => Balanced Balanced
• Here e stands for the empty string. • Both generate the same strings: (()(())), (), e, (()())
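The claim that both grammars generate the same strings can be checked up to a length bound. The sketch below reads each production as a way of building longer strings from shorter ones and repeats that until nothing new appears. This is a bounded check under that reading, not a proof of equivalence.

```python
def balanced_v1(max_len):
    """Strings of length <= max_len from: Balanced => e | ( Balanced ) Balanced."""
    lang = {""}                                   # Balanced => e
    while True:
        new = {"(" + x + ")" + y for x in lang for y in lang
               if len(x) + len(y) + 2 <= max_len}
        if new <= lang:                           # nothing new: done
            return lang
        lang |= new

def balanced_v2(max_len):
    """Same bound, from: Balanced => e | ( Balanced ) | Balanced Balanced."""
    lang = {""}                                   # Balanced => e
    while True:
        new = {"(" + x + ")" for x in lang if len(x) + 2 <= max_len}
        new |= {x + y for x in lang for y in lang if len(x) + len(y) <= max_len}
        if new <= lang:
            return lang
        lang |= new
```

For max_len = 8 the two functions return the same set, which includes (()(())), (), the empty string, and (()()).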

4. Parse Trees • A parse tree is a graphical way of showing how productions are used to generate a string. • Data structures representing parse trees are used inside compilers to store information about the program being compiled.

Example 1 • Consider the grammar: T = { a, b } N = { S } P = { S => S S | a S b | a b | b a } S is the starting nonterminal

Parse Tree for “aabbba” • The root of the tree is the start symbol S. • Expand the root using S => S S. • Expand the left S using S => a S b. • Expand the new inner S using S => a b. • Expand the right S using S => b a. • Stop when there are no more nonterminals in leaf positions. • Read off the string by reading the leaves left to right: a a b b b a.
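A parse tree is easy to hold in a data structure. In this minimal sketch a node is a (symbol, children) pair, an encoding chosen only for illustration. It builds the tree for “aabbba” and reads the leaves left to right.

```python
def leaf(symbol):
    """A terminal: a node with no children."""
    return (symbol, [])

# Root: S => S S
TREE = ("S", [
    # left S => a S b, with the inner S => a b
    ("S", [leaf("a"), ("S", [leaf("a"), leaf("b")]), leaf("b")]),
    # right S => b a
    ("S", [leaf("b"), leaf("a")]),
])

def leaves(node):
    """Collect the leaf symbols of the tree from left to right."""
    symbol, children = node
    if not children:
        return [symbol]
    result = []
    for child in children:
        result.extend(leaves(child))
    return result

GENERATED = "".join(leaves(TREE))
```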

Example 2 • Consider the grammar:
    T = { a, +, *, (, ) }
    N = { E, T, F }
    P = { E => T | T + E
          T => F | F * T
          F => a | ( E ) }
• E is the starting nonterminal

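Is “a+a*a” in the Language? • Start at the root E and expand using E => T + E. • Expand the T using T => F. • Continue expansion until every leaf is a terminal: F => a, then E => T, T => F * T, F => a, T => F, F => a. • The leaves, read left to right, are a + a * a, so the string is in the language.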
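The expansion for this grammar can be automated with one function per nonterminal (recursive descent). This is a sketch rather than how a compiler tool would do it: it assumes single-character terminals with no spaces, and each function peeks at the next character to choose between the two alternatives of its rule.

```python
def parse_E(text, pos):
    """E => T | T + E"""
    t, pos = parse_T(text, pos)
    if pos < len(text) and text[pos] == "+":
        e, pos = parse_E(text, pos + 1)
        return ("E", [t, "+", e]), pos
    return ("E", [t]), pos

def parse_T(text, pos):
    """T => F | F * T"""
    f, pos = parse_F(text, pos)
    if pos < len(text) and text[pos] == "*":
        t, pos = parse_T(text, pos + 1)
        return ("T", [f, "*", t]), pos
    return ("T", [f]), pos

def parse_F(text, pos):
    """F => a | ( E )"""
    if pos < len(text) and text[pos] == "a":
        return ("F", ["a"]), pos + 1
    if pos < len(text) and text[pos] == "(":
        e, pos = parse_E(text, pos + 1)
        if pos < len(text) and text[pos] == ")":
            return ("F", ["(", e, ")"]), pos + 1
    raise SyntaxError(f"cannot parse at position {pos}")

def parse(text):
    """Return the parse tree for text, or raise SyntaxError."""
    tree, pos = parse_E(text, 0)
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return tree

def in_language(text):
    """True if text can be generated from E."""
    try:
        parse(text)
        return True
    except SyntaxError:
        return False
```

parse("a+a*a") returns a nested tuple with the same shape as the tree built by hand: E over T + E, where the second E leads to F * T.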

5. Ambiguous Grammars • A grammar is ambiguous when a string can be represented by more than one parse tree • it means that the string has more than one “meaning” in the language • e.g. a variant of the last grammar example: P = { E => E + E | E * E | ( E ) | a }

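Parse Trees for “a+a*a” • Left-hand tree: the root expands using E => E + E, and its right E expands using E => E * E, so the string is grouped as a + (a * a). • Right-hand tree: the root expands using E => E * E, and its left E expands using E => E + E, so the string is grouped as (a + a) * a.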

The two parse trees allow a string like “5+5*5” to be read in two different ways: • 5 + 25 (the left hand tree) • 10 * 5 (the right hand tree)
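The two readings give different values, which is easy to confirm. In this sketch each parse tree is reduced to a nested (left, operator, right) tuple, a simplified encoding chosen for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a tree that is either an int or a (left, op, right) tuple."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))

LEFT_TREE = (5, "+", (5, "*", 5))     # root is +, read as 5 + 25
RIGHT_TREE = ((5, "+", 5), "*", 5)    # root is *, read as 10 * 5

LEFT_VALUE = evaluate(LEFT_TREE)
RIGHT_VALUE = evaluate(RIGHT_TREE)
```

LEFT_VALUE is 30 and RIGHT_VALUE is 50, so the same string has two different meanings under the ambiguous grammar.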

Why is Ambiguity Bad? • In a programming language, a string with more than one meaning means that the compiler and run-time system will not know how to process it. • e.g. in C: x = 5 + 5 * 5; // what is the value in x?

6. Kinds of Grammars • There are 4 main kinds of grammar, of increasing expressive power: • regular (type 3) grammars • context-free (type 2) grammars • context-sensitive (type 1) grammars • unrestricted (type 0) grammars • They vary in the kinds of productions they allow. • Avram Noam Chomsky (1928 – )

6.1. Regular Grammars • Example productions: S => wT, T => xT, T => a • Every production is of the form: A => a | a B | e • A, B are nonterminals, a is a terminal, e is the empty string • These are sometimes called right linear rules because if a nonterminal appears in the rule body, then it must appear last. • Regular grammars are equivalent to REs (and also to finite automata).
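The link between regular grammars, automata and REs can be seen on the three example productions. The sketch below treats each nonterminal as an automaton state and each production A => a B as a transition. The RE wx*a is my own reading of what those productions generate (an assumption), and rules of the form A => e are not handled.

```python
import re
from itertools import product

# A => a B is stored as (A, a, B); A => a is stored as (A, a, None).
RULES = [("S", "w", "T"), ("T", "x", "T"), ("T", "a", None)]

def accepts(text, start="S"):
    """Run the grammar as an automaton: nonterminals are the states."""
    states = {start}
    for ch in text:
        states = {nxt for (cur, sym, nxt) in RULES if cur in states and sym == ch}
    return None in states      # None marks a finished derivation

def agrees_with_regex(max_len):
    """Compare the grammar with the RE wx*a on every string up to max_len."""
    pattern = re.compile(r"wx*a")
    for n in range(max_len + 1):
        for symbols in product("wxa", repeat=n):
            text = "".join(symbols)
            if accepts(text) != (pattern.fullmatch(text) is not None):
                return False
    return True
```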

An Equivalence Diagram • Regular grammars, automata, and REs all have the same expressive power.

6.2. Context-Free Grammars • Example productions: A => a, A => aBcd, B => ae • Every production is of the form: A => d • A is a nonterminal, d can be any number of nonterminals or terminals • Most of our examples have been context-free grammars • used widely to define programming languages • they subsume regular grammars

6.3. Context-Sensitive Grammars • Example productions: A => a1, 1A => aB2d, B2 => ae • Every production is of the form: a => d • a, d can contain any number of terminals and nonterminals • a must contain at least 1 nonterminal • size(d) >= size(a) • d cannot be e (the empty string)

Context-sensitive rules allow the grammar to specify a context for a rewrite • e.g. A1a0 => 1b00 • the string 2A1a01 becomes 21b001 • Context-sensitive grammars are more powerful than context-free grammars because of this context ability.

Example • The language: E = {012, 001122, 000111222, ... } or, in brief, E = {0^n 1^n 2^n | n >= 1} cannot be expressed using a context-free grammar, but it can be expressed using a context-sensitive grammar:
    S => 0 A 1 2 | 0 1 2
    A => 0 A 1 C | 0 1 C
    C 1 => 1 C
    C 2 => 2 2

Rewrite S to 001122
    S              =>  0 A 1 2          using S => 0 A 1 2
    0 A 1 2        =>  0 0 1 C 1 2      using A => 0 1 C
    0 0 1 C 1 2    =>  0 0 1 1 C 2      using C 1 => 1 C
    0 0 1 1 C 2    =>  0 0 1 1 2 2      using C 2 => 2 2
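Rewriting with context-sensitive rules can be simulated as substring replacement. The sketch below searches breadth-first through every string reachable from S using the four rules for {0^n 1^n 2^n | n >= 1}. It is a brute-force illustration, and it terminates only because these rules never shorten a string, so anything longer than the target can be discarded.

```python
from collections import deque

# (left, right) pairs; the left side may contain context, e.g. "C1".
RULES = [
    ("S", "0A12"), ("S", "012"),
    ("A", "0A1C"), ("A", "01C"),
    ("C1", "1C"),
    ("C2", "22"),
]

def derivable(target, start="S"):
    """True if target can be reached from start by applying RULES."""
    queue, seen = deque([start]), {start}
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for left, right in RULES:
            i = form.find(left)
            while i != -1:                      # try every occurrence of left
                new = form[:i] + right + form[i + len(left):]
                if len(new) <= len(target) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(left, i + 1)
    return False
```

derivable("001122") follows the same steps as the hand derivation, while a string such as "0112" is never reached.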

6.4. Unrestricted Grammars • Example productions: A => e, 11A => a, B2 => aeA • Every production is of the form: a => d • a, d can contain any number of terminals and nonterminals; a must contain at least 1 nonterminal • no restrictions on size(d) • it may be smaller than size(a) • d can be e • Also called phrase-structure grammars. • More general than context-sensitive grammars.

Example • The language: E = {e, 012, 001122, 000111222, ... } or, in brief, E = {0^n 1^n 2^n | n >= 0} includes the empty string e, so under the definitions above it needs an unrestricted grammar. The new features are the e alternatives:
    S => 0 A 1 2 | e
    A => 0 A 1 C | e
    C 1 => 1 C
    C 2 => 2 2

Rewrite S to 012 • S => 0 A 1 2 • 0 A 1 2 => 0 1 2 • using A => e

6.5. Why so many Grammar Kinds? • More powerful grammars are more expressive, but also harder to implement efficiently • a trade-off between power and implementation

For example, most compilers have two grammar-based components: • the lexical analyzer • uses REs (regular grammars) to recognize basic tokens such as identifiers and numbers • the syntax analyzer • uses (context-free) grammars to deal with complex syntactic categories such as loops and expressions
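The lexical analyzer half of this split can be sketched with REs alone. The token definitions below (C-like identifiers, unsigned integers, a handful of one-character symbols) are assumptions chosen for illustration and are not taken from any particular compiler.

```python
import re

# One named group per kind of token; leading whitespace is skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<number>[0-9]+)"
    r"|(?P<identifier>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<symbol>[-+*/()=;]))"
)

def tokenize(text):
    """Split text into (kind, lexeme) pairs, or raise ValueError."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        kind = match.lastgroup            # name of the group that matched
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens
```

tokenize("x = 5 + 5 * 5;") yields an identifier, then symbols and numbers in source order. A syntax analyzer would then apply context-free rules to that token list rather than to raw characters.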
