Grammars. Chuck Cusack. Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5th edition, by Kenneth Rosen. Alphabets and Languages (Review). Definition: A vocabulary (or alphabet) V is a finite, nonempty set of symbols.
Grammars Chuck Cusack • Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5th edition, by Kenneth Rosen
Alphabets and Languages (Review) • Definition: A vocabulary (or alphabet) V is a finite, nonempty set of symbols. • Definition: A word or sentence over V is a finite string of symbols from V. • Definition: The empty string or null string, denoted by λ, is the string containing no symbols. • Definition: The set of all words over V is denoted by V*. • Definition: A language over V is a subset of V*.
Language Examples (Review) • Let V={0,1} • 00110, 11111, 00, and 11 are words over V • 012, a234, and 222 are not words over V • V*={λ,0,1,00,01,10,11,000,…} • In other words, V* is the set of all binary strings, including the empty string • The set of strings consisting of only 0s is a language over V • {1,10,100,1000,10000,…} is a language over V
Concatenation (Review) • Definition: Let V be a vocabulary, and A and B be subsets of V*. The concatenation of A and B, denoted by AB, is the set of all strings of the form xy, where x ∈ A and y ∈ B. • Example: Let A={0, 10}, and B={1, 12}. Then • AB={01, 012, 101, 1012} • BA={10, 110, 120, 1210} • AA={00, 010, 100, 1010} • AAA=A(AA)={000, 0010, 0100, 01010, 1000, 10010, 10100, 101010}
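The concatenation example above can be reproduced with a few lines of Python. This is only a sketch: the function name `concat` is an illustrative choice, and Python strings stand in for words over V.

```python
def concat(A, B):
    """Return AB = {xy | x in A, y in B} for two sets of strings."""
    return {x + y for x in A for y in B}


A = {"0", "10"}
B = {"1", "12"}

print(sorted(concat(A, B)))             # the strings of AB
print(sorted(concat(B, A)))             # the strings of BA
print(sorted(concat(A, concat(A, A))))  # the strings of AAA = A(AA)
```

Because the result is a Python set, duplicate strings that arise from different pairs (x, y) are collapsed automatically, which matches the set-based definition.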
Concatenation: Aⁿ (Review) • Definition: Let V be a vocabulary, and A a subset of V*. Then A⁰={λ}, and for n>0, we define Aⁿ=Aⁿ⁻¹A. • Example: Let A={0, 10}. Then • A⁰={λ} • A¹=A⁰A={λ}A=A={0, 10} • A²=A¹A={00, 010, 100, 1010} • A³=A²A={000, 0010, 0100, 01010, 1000, 10010, 10100, 101010}
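The recursive definition of the powers of A translates directly into a loop. The sketch below assumes the empty Python string `""` plays the role of λ; the names `concat` and `power` are illustrative.

```python
def concat(A, B):
    """Return AB = {xy | x in A, y in B}."""
    return {x + y for x in A for y in B}


def power(A, n):
    """Return the n-th power of A: {λ} for n = 0, otherwise (previous power) A."""
    result = {""}  # the 0-th power contains only the empty string λ
    for _ in range(n):
        result = concat(result, A)
    return result


A = {"0", "10"}
for n in range(4):
    print(n, sorted(power(A, n)))
```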
Kleene Closure (Review) • Definition: Let V be a vocabulary, and A a subset of V*. The Kleene closure of A, denoted by A*, is the set consisting of concatenations of an arbitrary number of strings from A. That is, A* = A⁰ ∪ A¹ ∪ A² ∪ … • Definition: A⁺ is the set consisting of concatenations of one or more strings from A. In other words, A⁺ = A¹ ∪ A² ∪ A³ ∪ …
Kleene Closure Example (Review) • Example: Let A={0, 1}. Then • A⁰={λ} • A¹={0, 1} • A²={00, 01, 10, 11} • A³={000, 001, 010, 011, 100, 101, 110, 111} • A*={0,1}*={All binary strings} • Example: Let B={111}. Then • B⁰={λ}, B¹={111}, B²={111111} • B³={111111111} • B* is the set of strings consisting of 3n 1s, for every n ≥ 0.
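Since A* is infinite whenever A contains a nonempty string, a program can only enumerate a finite slice of it. The following sketch collects the powers of A from 0 up to a chosen bound; the name `kleene_up_to` and the cutoff parameter `max_power` are assumptions made for illustration.

```python
def kleene_up_to(A, max_power):
    """Return the union of the powers of A from 0 to max_power (a finite part of A*)."""
    level = {""}          # current power of A, starting with {λ}
    result = set(level)
    for _ in range(max_power):
        level = {x + y for x in level for y in A}
        result |= level
    return result


print(sorted(kleene_up_to({"111"}, 3), key=len))
print(len(kleene_up_to({"0", "1"}, 3)))  # number of binary strings of length 0..3
```

Dropping the empty string from the starting `result` would give the corresponding slice of A⁺ instead, provided λ is not itself in A.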
Grammars and Languages • Many languages can be defined by grammars. • We are particularly interested in phrase-structure grammars. • Before we can define phrase-structure grammars, we need to define a few more terms.
Special Symbols • Definition: A nonterminal symbol (or just nonterminal) is a symbol which can be replaced by other symbols. • Definition: A terminal symbol (or just terminal) is a symbol which cannot be replaced by other symbols. • Definition: The start symbol is a special symbol, usually denoted by S. • The set of terminals is denoted by T, and the set of nonterminals by N. • S is a nonterminal.
Productions • Definition: A production (or substitution rule) is a rule which tells how to replace one string from V* with another string. • Productions are denoted by a → b, which denotes that a can be replaced by b. • Example • Let S → A0, A → A1, and A → 0 be productions • Then I can replace S with A0 • Since I can replace A with A1, A0 can become A10 • Since I can replace A with 0, A10 can become 010 • Thus, I can replace S with 010
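The replacement steps in this example can be traced mechanically. The sketch below treats each production as a pair (left side, right side) and always rewrites the leftmost occurrence; that leftmost choice and the helper name `rewrite_once` are assumptions for illustration, since a production may in general be applied at any occurrence.

```python
def rewrite_once(s, lhs, rhs):
    """Replace the leftmost occurrence of lhs in s with rhs."""
    if lhs not in s:
        raise ValueError(f"{lhs!r} does not occur in {s!r}")
    return s.replace(lhs, rhs, 1)


# The productions used in the example, in the order they are applied.
steps = [("S", "A0"), ("A", "A1"), ("A", "0")]

s = "S"
trace = [s]
for lhs, rhs in steps:
    s = rewrite_once(s, lhs, rhs)
    trace.append(s)

print(" => ".join(trace))
```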
Phrase-Structure Grammars • Definition: A phrase-structure grammar is a 4-tuple G=(V,T,S,P), where • V is a vocabulary • T ⊆ V is a set of terminals • S ∈ V is a start symbol • P is a set of productions • N=V-T is the set of nonterminals • Each production contains at least one nonterminal on its left side. • We will always use S as the start symbol.
Direct Derivations • Let G=(V,T,S,P) be a phrase-structure grammar. • Let A=lar and B=lbr, where l, a, b, r ∈ V*. • Let a → b be a production. • Then we can derive B from A. • Thus we say that B is directly derivable from A. • We write this as A ⇒ B
Derivations • Let G=(V,T,S,P) be a phrase-structure grammar • Let A₁, A₂, …, Aₙ ∈ V* be such that A₁ ⇒ A₂ ⇒ … ⇒ Aₙ • Then we say that Aₙ is derivable from A₁. • We write A₁ ⇒* Aₙ • The sequence of productions used is called a derivation.
Generating Languages • Let G=(V,T,S,P) be a grammar • Definition: The language generated by G, denoted L(G), is the set of all strings of terminals that are derivable from S. • Put another way, L(G)={w ∈ T* | S ⇒* w}
Example 1 Let G be the grammar with • V={S,0,1} • T={0,1} • P={S → S0, S → 0} • Clearly S ⇒ 0, so 0 ∈ L(G) • Also, S ⇒ S0 ⇒ 00, so 00 ∈ L(G) • And, S ⇒ S0 ⇒ S00 ⇒ 000, so 000 ∈ L(G) • It is not hard to see that L(G) is the language consisting of all strings with 1 or more 0s.
Example 2 Let G be the grammar with V={S,0,1}, T={0,1}, and P={S → SS, S → 1, S → 0} • Clearly S ⇒ 0, so 0 ∈ L(G) • Also, S ⇒ 1, so 1 ∈ L(G) • Since S ⇒ SS ⇒ S1 ⇒ 01, we have 01 ∈ L(G) • In general, we can get a sequence of Ss, and replace each with either 0 or 1. • Given this fact, it is easy to see that L(G)={0,1}⁺, the set of all non-empty binary strings
Example 3 Let G be the grammar with V={S,A,B,0,1}, T={0,1}, and P={S → AB, B → BB, A → AA, A → 0, B → 1} • Clearly S ⇒ AB ⇒ 0B ⇒ 01, so 01 ∈ L(G) • Also, S ⇒ AB ⇒ AAB ⇒ 0AB ⇒ 00B ⇒ 001, so 001 ∈ L(G) • Similarly, we can get 011, 0011, 0001, etc. • In general, we can get a sequence of n 0s followed by m 1s, where n>0, m>0. • Thus L(G)={0ⁿ1ᵐ | m and n are positive integers}
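The languages claimed in Examples 1 to 3 can be spot-checked by brute force. The sketch below does a breadth-first search over strings derivable from S and keeps the ones made only of terminals. It relies on two assumptions that hold for these three grammars but not for every phrase-structure grammar: every symbol is a single character, and no production shortens a string, so any intermediate string longer than `max_len` can be discarded. The name `generate` and its parameters are illustrative.

```python
from collections import deque


def generate(productions, terminals, start="S", max_len=4):
    """Return all terminal strings of length <= max_len derivable from start.

    Assumes no production shortens a string, so longer intermediate
    strings can never lead back to a word within the length bound.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            words.add(form)  # nothing left to replace
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:  # try the production at every occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words


T = {"0", "1"}
example1 = [("S", "S0"), ("S", "0")]
example2 = [("S", "SS"), ("S", "1"), ("S", "0")]
example3 = [("S", "AB"), ("B", "BB"), ("A", "AA"), ("A", "0"), ("B", "1")]

print(sorted(generate(example1, T), key=len))
print(len(generate(example2, T, max_len=3)))
print(sorted(generate(example3, T)))
```

A check like this only confirms the pattern up to the chosen length; it is evidence for the stated L(G), not a proof.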
Type 0 Grammars • Type 0 grammars have no restrictions on the types of productions that are allowed. • Thus type 0 grammars are just phrase-structure grammars. • This is not too exciting, so we will move on to type 1 grammars.
Type 1 Grammars • In a type 1 grammar, productions are of the form • aXb → acb, where X ∈ N and a, b, c ∈ V* with c ≠ λ • (or S → λ, but ignore this for now) • Thus, a production can only be applied if the symbol X is surrounded by a and b. • In other words, the production can only be applied in a certain context. • This is why type 1 grammars are also called context-sensitive grammars.
Type 2 Grammars • Productions are of the form • X → a, where X ∈ N and a ∈ V*. • Thus, if X is in a string, we can replace X with a no matter what surrounds X. • In other words, the context in which X appears does not matter. • This is why type 2 grammars are called context-free grammars. • Context-free grammars produce context-free languages.
Type 3 Grammars • Productions are of the form • X → a, where X ∈ N and a ∈ T • X → aY, where X, Y ∈ N and a ∈ T • S → λ • Type 3 grammars are called regular grammars. • Regular grammars produce regular languages. • It is easy to see that a type 3 grammar is a type 2 grammar.
Types of Grammars • The following summarizes the relationships between the types of grammars • Type 0: phrase-structure • Type 1: context-sensitive • Type 2: context-free • Type 3: regular
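The production forms for the four types can be turned into a small classifier. This is a sketch under stated assumptions: symbols are single characters, the empty Python string stands for λ, and the function names `production_type` and `grammar_type` are invented here. It reports the most restrictive type whose production form, as written on the slides above, fits every production.

```python
def production_type(lhs, rhs, N, T, start="S"):
    """Return the highest type number (3, 2, 1 or 0) whose form lhs -> rhs fits."""
    if lhs == start and rhs == "":
        return 3  # S -> λ is allowed in a type 3 grammar
    if len(lhs) == 1 and lhs in N:
        if len(rhs) == 1 and rhs in T:
            return 3  # X -> a
        if len(rhs) == 2 and rhs[0] in T and rhs[1] in N:
            return 3  # X -> aY
        return 2      # X -> any string over V
    # Type 1: lhs = aXb and rhs = acb with c nonempty, for some nonterminal X.
    for i, X in enumerate(lhs):
        if X in N:
            a, b = lhs[:i], lhs[i + 1:]
            if len(rhs) > len(a) + len(b) and rhs.startswith(a) and rhs.endswith(b):
                return 1
    return 0


def grammar_type(productions, N, T, start="S"):
    """A grammar is only as restricted as its least restricted production."""
    return min(production_type(l, r, N, T, start) for l, r in productions)


T = {"0", "1"}
example3 = [("S", "AB"), ("B", "BB"), ("A", "AA"), ("A", "0"), ("B", "1")]
print(grammar_type(example3, {"S", "A", "B"}, T))
```

For instance, AB → AC fits the type 1 form (B is rewritten in the context of a preceding A), while AB → BA fits none of the restricted forms and is reported as type 0.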
Regular Grammar Example • Let G be the grammar with • V={S,A,0,1}, • T={0,1}, and • P={S → 0A, A → 0A, A → 1A, A → 1} • We can determine what the language is by constructing a few words. • S ⇒ 0A ⇒ 01 • S ⇒ 0A ⇒ 00A ⇒ 001 • S ⇒ 0A ⇒ 01A ⇒ 011 • S ⇒ 0A ⇒ 00A ⇒ 000A ⇒ 0001 • S ⇒ 0A ⇒ 00A ⇒ 001A ⇒ 0011 • S ⇒ 0A ⇒ 01A ⇒ 010A ⇒ 0101 • S ⇒ 0A ⇒ 01A ⇒ 011A ⇒ 0111 • We can see that in general, L(G) is the set of binary strings beginning with 0 and ending with 1.
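Because every production of this grammar has the form X → a or X → aY, membership can be tested by reading the word left to right and tracking which nonterminal might still be waiting to be replaced. The sketch below does exactly that. It assumes single-character symbols and no S → λ production, and the name `accepts` is illustrative.

```python
def accepts(word, productions, start="S"):
    """Test membership for a regular grammar with productions X -> a or X -> aY.

    `current` holds the nonterminals that could be pending after the symbols
    read so far; None marks a derivation that has just finished.
    """
    current = {start}
    for ch in word:
        nxt = set()
        for lhs, rhs in productions:
            if lhs in current and rhs[0] == ch:
                nxt.add(rhs[1] if len(rhs) == 2 else None)
        current = nxt
    return None in current


P = [("S", "0A"), ("A", "0A"), ("A", "1A"), ("A", "1")]

for w in ["01", "0011", "0101", "0", "10", "010"]:
    print(w, accepts(w, P))
```

Tracking a set of possible nonterminals is needed here because A → 1A and A → 1 both apply when a 1 is read, so the derivation may or may not be finished at that point.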
Limitations • Problem: Find a regular grammar that generates the following language • L={0ⁿ1ⁿ | n=0,1,2,…} • Solution: It cannot be done. • Proof: we will see this later. • Can you describe L with a regular expression? • Can you give a finite-state automaton that recognizes L? • Can you give any grammar that generates L?
Regular Languages and Sets • Theorem: Let A be a subset of V*. Then A is a regular language if and only if A is a regular set. • In other words, a language defined by a regular grammar can also be defined by a regular expression, and vice-versa. • Example: We just saw that the grammar with V={S,A,0,1}, T={0,1}, and P={S → 0A, A → 0A, A → 1A, A → 1} generates the set of binary strings beginning with 0 and ending with 1. • Recall that the regular set defined by 0(0∪1)*1 is also the set of all binary strings beginning with 0 and ending with 1.
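The regular expression on this slide can be tried out with Python's `re` module, where the union 0∪1 is written `0|1`. The sketch below compares the pattern against the plain-English description for every binary string up to length 6; the helper names are illustrative, and the length bound is an arbitrary choice.

```python
import re
from itertools import product

# 0(0∪1)*1 in Python regex syntax.
PATTERN = re.compile(r"0(0|1)*1")


def in_regular_set(word):
    """True if the whole word matches 0(0|1)*1."""
    return PATTERN.fullmatch(word) is not None


def begins_0_ends_1(word):
    """The plain-English description: starts with 0 and ends with 1."""
    return len(word) >= 2 and word[0] == "0" and word[-1] == "1"


mismatches = [
    "".join(bits)
    for n in range(7)
    for bits in product("01", repeat=n)
    if in_regular_set("".join(bits)) != begins_0_ends_1("".join(bits))
]
print(mismatches)  # an empty list means the two descriptions agree up to length 6
```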
Grammars, Expressions, and Automata • Consider the set A={binary strings which start with 0 and end with 1} • We have seen previously that A is recognized by a finite-state automaton. • We just saw that A is generated by the grammar with V={S,A,0,1}, T={0,1}, and P={S → 0A, A → 0A, A → 1A, A → 1} • We also saw that A is defined by the regular expression 0(0∪1)*1 • This is no coincidence, as we will see next.
Grammars, Expressions, and Automata • Theorem: Let L be a language. The following three statements are equivalent • L is a regular set (that is, L is defined by a regular expression) • L is a regular language (that is, L is generated by a regular grammar) • L is recognized by a finite-state automaton • Put another way, L is a regular set if and only if L is a regular language if and only if L is recognized by a finite-state automaton. • In other words, regular sets, regular languages, and languages recognized by finite-state automata are all the same thing.
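To make the third statement concrete for the running example, here is one possible deterministic finite-state automaton for the binary strings that start with 0 and end with 1. The state names and the transition table are assumptions made for this sketch; the automaton referred to earlier in the course may be drawn differently.

```python
# States: "start" (nothing read), "last0" (began with 0, last symbol 0),
# "last1" (began with 0, last symbol 1), "dead" (began with 1).
DELTA = {
    ("start", "0"): "last0", ("start", "1"): "dead",
    ("last0", "0"): "last0", ("last0", "1"): "last1",
    ("last1", "0"): "last0", ("last1", "1"): "last1",
    ("dead", "0"): "dead",   ("dead", "1"): "dead",
}
ACCEPTING = {"last1"}


def dfa_accepts(word):
    """Run the automaton on a binary word and report whether it ends in an accepting state."""
    state = "start"
    for ch in word:
        state = DELTA[(state, ch)]
    return state in ACCEPTING


for w in ["01", "0110", "0011", "1", ""]:
    print(repr(w), dfa_accepts(w))
```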
Grammar Applications • Context-free grammars are used to define the syntax of most programming languages. • Regular grammars are used in several applications, including the following • Searching text for patterns • Lexical analysis (during program compilation) • Efficient algorithms exist to determine if a string is in a context-free or regular language. • This is important for tasks like determining whether or not a program is syntactically valid.
Backus-Naur Form • Backus-Naur form (BNF) is a more compact representation of productions in a type 2 grammar. • All productions with the same left hand side are combined into one production • The symbol → is replaced with ::= • All nonterminals are enclosed in < and > • The right hand sides of the various productions are combined, and separated by |
Backus-Naur Form Example • Consider the set of productions • S → AB • B → BB • A → AA • A → 0 • B → 1 • In BNF, they are represented by • <S> ::= <A><B> • <B> ::= <B><B> | 1 • <A> ::= <A><A> | 0
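The conversion in this example is mechanical enough to script. The sketch below groups productions by left-hand side and wraps nonterminals in angle brackets. It assumes that nonterminals are exactly the uppercase single letters, which holds for this example but is not part of BNF itself; the name `to_bnf` is illustrative.

```python
def to_bnf(productions):
    """Convert (lhs, rhs) productions of a type 2 grammar into BNF lines.

    Assumes each symbol is one character and uppercase letters are nonterminals.
    """
    grouped = {}
    for lhs, rhs in productions:
        grouped.setdefault(lhs, []).append(rhs)  # keeps first-seen order

    lines = []
    for lhs, alternatives in grouped.items():
        rendered = [
            "".join(f"<{sym}>" if sym.isupper() else sym for sym in rhs)
            for rhs in alternatives
        ]
        lines.append(f"<{lhs}> ::= " + " | ".join(rendered))
    return lines


P = [("S", "AB"), ("B", "BB"), ("A", "AA"), ("A", "0"), ("B", "1")]
for line in to_bnf(P):
    print(line)
```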
Backus-Naur Form Example 2 • The Backus Naur form for the production of a signed integer is • <signed integer> ::= <sign><integer> • <sign> ::= + | - • <integer> ::= <digit> | <digit><integer> • <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
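Each BNF rule above can be mirrored by one small function, which gives a simple recognizer for signed integers as this grammar defines them. This is only a sketch with illustrative function names; note that the grammar requires a sign, so an unsigned string such as 42 is rejected.

```python
def is_digit(s):
    # <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    return len(s) == 1 and s in "0123456789"


def is_integer(s):
    # <integer> ::= <digit> | <digit><integer>
    if len(s) == 0:
        return False
    if len(s) == 1:
        return is_digit(s)
    return is_digit(s[0]) and is_integer(s[1:])


def is_sign(s):
    # <sign> ::= + | -
    return s in ("+", "-")


def is_signed_integer(s):
    # <signed integer> ::= <sign><integer>
    return len(s) >= 2 and is_sign(s[0]) and is_integer(s[1:])


for candidate in ["+42", "-7", "42", "+", "+4a"]:
    print(candidate, is_signed_integer(candidate))
```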
Backus-Naur Form Applications • Specifying the syntax for programming languages including • Java • LISP • Specifying database languages • SQL • Specifying markup languages • XML