Properties of Context-Free Languages Juan Carlos Guzmán CS 6413 Theory of Computation Southern Polytechnic State University
Summary • Normal Forms • Pumping Lemma • Closure Properties • Decision Properties
Normal Forms • Recall that many different grammars generate the same language • We would like to restrict the form of the productions of the CFG • Chomsky Normal Form • Greibach Normal Form • Tasks to accomplish • Eliminate useless symbols • Eliminate ε-productions • Eliminate unit productions
Grammar Transformations • We are about to present a series of transformations on grammars • You should consider each of them as a “transformation function” $T: \text{Grammar} \to \text{Grammar}$
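Elimination of Useless Symbols • Let $G=(V,T,P,S)$ • $X \in V$ is useful if there exist $\alpha, \beta \in (V \cup T)^*$ and $w \in T^*$ such that $S \Rightarrow^* \alpha X \beta \Rightarrow^* w$ • Two considerations: • X must generate strings • $X \Rightarrow^* v$ for some $v \in T^*$ • X must be reachable from S • $S \Rightarrow^* \alpha X \beta$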
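Elimination of Non-Generating Symbols ($T_g$) • Let $G=(V,T,P,S)$ be a CFG • $G' = (V' \cup \{S\},T,P',S)$, where • $V'$ is the smallest set such that $V' = \{ A \mid (A \to \alpha) \in P \wedge \alpha \in (T \cup V')^* \}$ • $P' = \{ (A \to \alpha) \mid (A \to \alpha) \in P \wedge A \in V' \wedge \alpha \in (T \cup V')^* \}$ • $G'$ contains only generating symbols (except possibly S, when $L(G) = \emptyset$)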
Example • $G = (\{S,A,B,C\},\{a,b\},P,S)$, where • $P = \{ S \to a \mid A,\ A \to AB \mid BCA \mid a,\ B \to b,\ C \to ACA \mid BCB \}$ • $V' = \{S,A,B\}$ • $G' = (\{S,A,B\},\{a,b\},P',S)$, where • $P' = \{ S \to a \mid A,\ A \to AB \mid a,\ B \to b \}$
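The set $V'$ is a least fixed point, so it can be computed by adding variables until nothing changes. A minimal Python sketch, assuming a hypothetical representation in which a grammar is a dict from each variable to a list of right-hand sides (tuples of symbols), any symbol that is not a key is a terminal, and the empty tuple stands for ε:

```python
# Hypothetical representation: variable -> list of right-hand sides.
# A symbol that is not a key of the dict is treated as a terminal.
Grammar = dict[str, list[tuple[str, ...]]]


def generating(g: Grammar) -> set[str]:
    """Least V' with A in V' whenever A -> alpha is in P and alpha is in (T u V')*."""
    gen: set[str] = set()
    changed = True
    while changed:
        changed = False
        for a, rhss in g.items():
            if a not in gen and any(all(s not in g or s in gen for s in rhs) for rhs in rhss):
                gen.add(a)
                changed = True
    return gen


def remove_non_generating(g: Grammar, start: str) -> Grammar:
    """T_g: keep A -> alpha only if every symbol of alpha is a terminal or in V'."""
    gen = generating(g)
    return {
        a: [rhs for rhs in rhss if all(s not in g or s in gen for s in rhs)]
        for a, rhss in g.items()
        if a in gen or a == start  # the start symbol is always kept
    }


# The example grammar above
G = {
    "S": [("a",), ("A",)],
    "A": [("A", "B"), ("B", "C", "A"), ("a",)],
    "B": [("b",)],
    "C": [("A", "C", "A"), ("B", "C", "B")],
}
print(remove_non_generating(G, "S"))
# {'S': [('a',), ('A',)], 'A': [('A', 'B'), ('a',)], 'B': [('b',)]}
```

As a by-product, $L(G)$ is empty exactly when the start symbol is not generating (`start not in generating(g)`), which is the usual emptiness test for CFL's.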
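Elimination of Non-Reachable Symbols ($T_r$) • Let $G=(V,T,P,S)$ be a CFG • $G' = (V',T,P',S)$, where • $V'$ is the smallest set such that $V' = \{S\} \cup \{ B \in V \mid (A \to \alpha B \beta) \in P \wedge A \in V' \}$ • $P' = \{ A \to \alpha \mid (A \to \alpha) \in P \wedge A \in V' \}$ • $G'$ contains only reachable symbols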
Example • $G = (\{S,A,B,C\},\{a,b\},P,S)$, where • $P = \{ S \to a \mid A,\ A \to AB \mid a,\ B \to b,\ C \to ACA \mid BCB \}$ • $V' = \{S,A,B\}$ • $G' = (\{S,A,B\},\{a,b\},P',S)$, where • $P' = \{ S \to a \mid A,\ A \to AB \mid a,\ B \to b \}$
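Computing the reachable set is a plain graph search starting at S. A minimal Python sketch, assuming the same hypothetical dict-of-tuples grammar representation (keys are variables, everything else is a terminal):

```python
# Hypothetical representation: variable -> list of right-hand sides.
Grammar = dict[str, list[tuple[str, ...]]]


def reachable(g: Grammar, start: str) -> set[str]:
    """Least V' containing S and every variable B in some A -> alpha B beta with A in V'."""
    seen = {start}
    stack = [start]
    while stack:
        a = stack.pop()
        for rhs in g.get(a, []):
            for s in rhs:
                if s in g and s not in seen:  # s is a variable not visited yet
                    seen.add(s)
                    stack.append(s)
    return seen


def remove_non_reachable(g: Grammar, start: str) -> Grammar:
    """T_r: keep only the productions whose left-hand side is reachable from start."""
    r = reachable(g, start)
    return {a: list(rhss) for a, rhss in g.items() if a in r}


# The example grammar above
G = {
    "S": [("a",), ("A",)],
    "A": [("A", "B"), ("a",)],
    "B": [("b",)],
    "C": [("A", "C", "A"), ("B", "C", "B")],
}
print(remove_non_reachable(G, "S"))
# {'S': [('a',), ('A',)], 'A': [('A', 'B'), ('a',)], 'B': [('b',)]}
```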
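Useful Symbols • Remove, in this order • non-generating symbols • non-reachable symbols • The order matters: removing non-generating symbols can leave other symbols unreachable, while removing non-reachable symbols never makes a remaining symbol non-generating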
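Elimination of ε-Productions ($T_\varepsilon$) • Let $G=(V,T,P,S)$ be a CFG • $V_\varepsilon$ (the nullable variables) is the smallest set such that $V_\varepsilon = \{ A \mid (A \to \alpha) \in P \wedge \alpha \in V_\varepsilon^* \}$ • $G' = (V,T,P',S)$, where • $P' = \{ A \to \alpha_0 X_1 \alpha_1 \cdots X_k \alpha_k \mid (A \to \alpha_0 B_1 \alpha_1 \cdots B_k \alpha_k) \in P,\ B_i \in V_\varepsilon \text{ and } X_i \in \{\varepsilon, B_i\} \text{ for all } 1 \le i \le k,\ \alpha_i \in (T \cup (V - V_\varepsilon))^* \text{ for all } 0 \le i \le k,\ |\alpha_0 X_1 \alpha_1 \cdots X_k \alpha_k| > 0 \}$ • $G'$ does not contain ε-productions and $L(G') = L(G) - \{\varepsilon\}$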
Example • $G = (\{S\},\{a,b\},P,S)$, where • $P = \{ S \to aSbS \mid bSaS \mid \varepsilon \}$ • $V_\varepsilon = \{S\}$ • $G' = (\{S\},\{a,b\},P',S)$, where • $P' = \{ S \to aSbS \mid aSb \mid abS \mid ab \mid bSaS \mid bSa \mid baS \mid ba \}$ • Note that $G'$ does not generate ε
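Both steps of $T_\varepsilon$ can be sketched in Python: a fixed point for $V_\varepsilon$, then one choice $X_i \in \{\varepsilon, B_i\}$ per nullable occurrence via `itertools.product`. This assumes the same hypothetical dict-of-tuples representation, with `()` as the ε right-hand side:

```python
from itertools import product

# Hypothetical representation: variable -> list of right-hand sides; () is epsilon.
Grammar = dict[str, list[tuple[str, ...]]]


def nullable(g: Grammar) -> set[str]:
    """V_eps: least set with A in V_eps whenever A -> alpha is in P and alpha is in V_eps*."""
    null: set[str] = set()
    changed = True
    while changed:
        changed = False
        for a, rhss in g.items():
            # all() over an empty RHS is True, so A -> eps makes A nullable
            if a not in null and any(all(s in null for s in rhs) for rhs in rhss):
                null.add(a)
                changed = True
    return null


def remove_epsilon(g: Grammar) -> Grammar:
    """T_eps: each nullable occurrence B_i becomes X_i in {eps, B_i}; empty RHSs are dropped."""
    null = nullable(g)
    out: Grammar = {}
    for a, rhss in g.items():
        new: list[tuple[str, ...]] = []
        for rhs in rhss:
            # a nullable symbol may be kept or erased; any other symbol is always kept
            options = [((s,), ()) if s in null else ((s,),) for s in rhs]
            for pick in product(*options):
                cand = tuple(sym for part in pick for sym in part)
                if cand and cand not in new:  # |alpha_0 X_1 ... X_k alpha_k| > 0
                    new.append(cand)
        out[a] = new
    return out


G = {"S": [("a", "S", "b", "S"), ("b", "S", "a", "S"), ()]}
print(["".join(rhs) for rhs in remove_epsilon(G)["S"]])
# ['aSbS', 'aSb', 'abS', 'ab', 'bSaS', 'bSa', 'baS', 'ba']
```

The output matches $P'$ in the example, including the order in which the eight productions are listed.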
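Elimination of Unit Productions ($T_u$) • Let $G=(V,T,P,S)$ be a CFG • Let $U_p$ be the smallest set such that $U_p = \{ (A,A) \mid A \in V \} \cup \{ (A,C) \mid (A,B) \in U_p \wedge (B \to C) \in P \wedge C \in V \}$ • $G' = (V,T,P',S)$, where • $P' = \{ A \to \alpha \mid (A,B) \in U_p \wedge (B \to \alpha) \in P \wedge \alpha \notin V \}$ • $G'$ does not contain unit productions and $L(G') = L(G)$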
Example • $G = (\{E,T,F\},\{+,*,(,),a\},P,E)$, where • $P = \{ E \to E+T \mid T,\ T \to T*F \mid F,\ F \to a \mid (E) \}$ • $U_p = \{(E,E),(E,T),(E,F),(T,T),(T,F),(F,F)\}$ • $G' = (\{E,T,F\},\{+,*,(,),a\},P',E)$, where • $P' = \{ E \to E+T \mid T*F \mid a \mid (E),\ T \to T*F \mid a \mid (E),\ F \to a \mid (E) \}$
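A minimal Python sketch of $T_u$, assuming the same hypothetical dict-of-tuples representation: first the fixed point for the unit pairs $U_p$, then one copy of every non-unit body of B for each pair (A, B):

```python
# Hypothetical representation: variable -> list of right-hand sides.
Grammar = dict[str, list[tuple[str, ...]]]


def unit_pairs(g: Grammar) -> set[tuple[str, str]]:
    """U_p: (A, A) for every A, plus (A, C) whenever (A, B) is in U_p and B -> C is in P."""
    up = {(a, a) for a in g}
    changed = True
    while changed:
        changed = False
        for a, b in list(up):  # iterate over a snapshot while adding pairs
            for rhs in g[b]:
                if len(rhs) == 1 and rhs[0] in g and (a, rhs[0]) not in up:
                    up.add((a, rhs[0]))
                    changed = True
    return up


def remove_unit(g: Grammar) -> Grammar:
    """T_u: A -> alpha for each (A, B) in U_p and B -> alpha in P, alpha not a single variable."""
    up = unit_pairs(g)
    out: Grammar = {a: [] for a in g}
    for a in g:
        for b in g:  # grammar order keeps the output deterministic
            if (a, b) not in up:
                continue
            for rhs in g[b]:
                is_unit = len(rhs) == 1 and rhs[0] in g
                if not is_unit and rhs not in out[a]:
                    out[a].append(rhs)
    return out


G = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("a",), ("(", "E", ")")],
}
for lhs, rhss in remove_unit(G).items():
    print(lhs, "->", " | ".join("".join(r) for r in rhss))
# E -> E+T | T*F | a | (E)
# T -> T*F | a | (E)
# F -> a | (E)
```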
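Summary of Transformations • Given a CFG G, we can obtain a new grammar G’ with • no ε-productions • no unit productions • no useless symbols • and $L(G') = L(G) - \{\varepsilon\}$ • by applying the composition $G' = (T_r \circ T_g \circ T_u \circ T_\varepsilon)(G)$, i.e., $T_\varepsilon$ first, then $T_u$, then $T_g$, and $T_r$ last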
Results of the Transformations • After the transformations • the grammars do not have useless symbols (or their associated productions) • their productions $A \to \alpha$ are not • ε-productions • unit productions • Therefore, α must satisfy • $|\alpha| > 1$, or • $\alpha \in T$
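Implications for Transformed Grammars • Transformed grammars have some nice properties • No unit productions • No ε-productions • However, they still produce “bushy” parse trees: a node has one child per symbol of its production’s right-hand side, so long right-hand sides give wide nodes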
Chomsky Normal Form • Any CFG whose language does not contain ε can be transformed into an equivalent grammar in which each production is of the form • $A \to BC$, where $A,B,C \in V$ • $A \to a$, where $A \in V$ and $a \in T$ • The idea behind CNF is to obtain grammars whose parse trees are binary trees
Chomsky Normal Form • Productions of grammars not yet in CNF, but already transformed, are of the following forms • $A \to X_1 \cdots X_k$, with $k > 1$ and all $X_i \in T \cup V$, or • $A \to a$, with $a \in T$ • We need to further transform the first kind of productions so that • the right-hand side consists only of variables, and • long right-hand sides are broken into chains of productions
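Chomsky Normal Form • Transformations • For every terminal a that appears on a RHS of length 2 or more • Create a new variable $A_a$ and a production $A_a \to a$ • Replace a with $A_a$ in all such productions • Replace every production $A \to B_1 \cdots B_k$ ($k > 2$) with • $A \to B_1 C_1$ • $C_1 \to B_2 C_2$ • … • $C_{k-2} \to B_{k-1} B_k$, where $C_1, \ldots, C_{k-2}$ are new variables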
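The two CNF steps can be sketched in Python on top of the same hypothetical dict-of-tuples representation. The input is assumed to be already transformed (no ε-productions, unit productions or useless symbols), and the generated names (`"A_" + terminal` and `C1`, `C2`, …) are a hypothetical naming scheme assumed not to clash with existing variables:

```python
# Hypothetical representation: variable -> list of right-hand sides.
Grammar = dict[str, list[tuple[str, ...]]]


def to_cnf(g: Grammar) -> Grammar:
    """Assumes g has no epsilon-productions, unit productions or useless symbols.
    New names ("A_" + terminal, and "C1", "C2", ...) are hypothetical and are
    assumed not to clash with existing variables."""
    out: Grammar = {a: [] for a in g}
    term_var: dict[str, str] = {}  # terminal a -> new variable A_a with A_a -> a
    counter = 0

    def var_for(t: str) -> str:
        if t not in term_var:
            term_var[t] = "A_" + t
            out[term_var[t]] = [(t,)]
        return term_var[t]

    for a, rhss in g.items():
        for rhs in rhss:
            if len(rhs) == 1:  # A -> a is already in CNF
                out[a].append(rhs)
                continue
            # Step 1: replace each terminal of a long RHS by its variable A_a
            syms = [s if s in g else var_for(s) for s in rhs]
            # Step 2: A -> B1 C1, C1 -> B2 C2, ..., C_{k-2} -> B_{k-1} B_k
            lhs = a
            while len(syms) > 2:
                counter += 1
                c = f"C{counter}"
                out[lhs].append((syms[0], c))
                out[c] = []
                lhs, syms = c, syms[1:]
            out[lhs].append(tuple(syms))
    return out


def is_cnf(g: Grammar) -> bool:
    """Every production is A -> BC (two variables) or A -> a (one terminal)."""
    return all(
        (len(r) == 2 and r[0] in g and r[1] in g) or (len(r) == 1 and r[0] not in g)
        for rhss in g.values() for r in rhss
    )


# Unit-free expression grammar (the result of the unit-production example)
G = {
    "E": [("E", "+", "T"), ("T", "*", "F"), ("a",), ("(", "E", ")")],
    "T": [("T", "*", "F"), ("a",), ("(", "E", ")")],
    "F": [("a",), ("(", "E", ")")],
}
cnf = to_cnf(G)
print(cnf["E"], cnf["C1"])
# [('E', 'C1'), ('T', 'C2'), ('a',), ('A_(', 'C3')] [('A_+', 'T')]
```

A fresh chain variable is created for every long production, even when two chains end up identical (here `C3`, `C5` and `C6` all derive `E A_)`); this keeps the construction simple at the cost of a few redundant variables.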
Greibach Normal Form • All productions must be of the form • $A \to a B_1 \cdots B_k$, with $k \ge 0$ • Note that each derivation step is associated with the generation of a terminal • This translates nicely to PDA’s, where each move of the automaton is guided by the recognition of an input character • To convert to GNF • Order the variables ($A_1, \ldots, A_n$) • Modify the production set so that • $A_i \to A_j \alpha$ implies $i \le j$ • remove left recursion, i.e., $A_i \to A_j \alpha$ implies $i < j$ • every production has the form $A_i \to a \alpha$ • and, finally, $A_i \to a \alpha$ with $\alpha \in V^*$ • The algorithm resembles matrix triangularization • It appears in the 1st edition of our book
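Relation Between Height and Yield of a CNF Parse Tree • Note that the nodes of a CNF parse tree are • binary nodes, for productions $A \to BC$ • unary nodes with a single terminal child, for productions $A \to a$ • The yield of a complete CNF parse tree of height n has length at most $2^{n-1}$ • [Figure: parse tree rooted at S; its variable part has height $n-1$, with at most $2^{n-1}$ nodes on the lowest level, each with one terminal child, giving the yield $a_1 a_2 a_3 \cdots a_t$ at height n]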
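The bound follows by induction on the height n, counted in edges, so the smallest tree (a single production $A \to a$) has height 1. Writing $Y(n)$ for the largest possible yield length of a CNF parse tree of height n:

```latex
\begin{aligned}
Y(1) &= 1 = 2^{0} && \text{(the root uses } A \to a\text{)}\\
Y(n) &\le 2\,\max_{h \le n-1} Y(h) && \text{(the root uses } A \to BC\text{; both subtrees have height at most } n-1\text{)}\\
     &\le 2 \cdot 2^{n-2} = 2^{n-1} && (n \ge 2,\ \text{since } Y(h) \le 2^{h-1} \le 2^{n-2} \text{ for } h \le n-1)
\end{aligned}
```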
Pumping Lemma • Let L be a context-free language. Then there exists a constant n (which depends on L) such that for every string z in L with $|z| \ge n$, we can break z into five strings, $z = uvwxy$, such that: • $|vwx| \le n$ • $vx \ne \varepsilon$ • For all $i \ge 0$, the string $uv^iwx^iy$ is also in L
Pumping Lemma • In plain words • For any context-free language • Words of large size contain a substring • somewhere in the word (not necessarily in the middle) • not null, and not too big (at most n symbols) • That substring can itself be broken into three pieces vwx • v is not null or x is not null • v and x can be “pumped” (together) over and over again • The new words are guaranteed to be in the language • How large the words must be in order to be considered “large” depends on the actual language
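The lemma is usually applied in the contrapositive, to show that a language is not context-free. A brute-force Python illustration, assuming $L = \{a^k b^k c^k \mid k \ge 0\}$ and one fixed candidate constant n (the helper names `in_L` and `pumpable` are hypothetical): it tries every split of $z = a^n b^n c^n$ allowed by the lemma and checks pumping only for $i \in \{0, 1, 2\}$. A `False` result shows that this particular n fails for this z; the real argument must cover every n, which is done by reasoning, not by search.

```python
from typing import Callable


def in_L(s: str) -> bool:
    """Membership in L = { a^k b^k c^k | k >= 0 }."""
    k = len(s) // 3
    return s == "a" * k + "b" * k + "c" * k


def pumpable(z: str, n: int, in_lang: Callable[[str], bool]) -> bool:
    """Is there a split z = uvwxy with |vwx| <= n and vx != eps such that
    u v^i w x^i y is in the language for i = 0, 1, 2? (Only these i are checked.)"""
    m = len(z)
    for s1 in range(m + 1):
        for s4 in range(s1, min(s1 + n, m) + 1):  # |vwx| = s4 - s1 <= n
            for s2 in range(s1, s4 + 1):
                for s3 in range(s2, s4 + 1):
                    u, v, w, x, y = z[:s1], z[s1:s2], z[s2:s3], z[s3:s4], z[s4:]
                    if not (v or x):  # the lemma requires vx != eps
                        continue
                    if all(in_lang(u + v * i + w + x * i + y) for i in (0, 1, 2)):
                        return True
    return False


n = 4
print(pumpable("a" * n + "b" * n + "c" * n, n, in_L))  # False: no split survives pumping
```

By contrast, for the context-free language $\{a^k b^k\}$ the word `aaaabbbb` does admit a split, for example v = a and x = b around the middle.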
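Pumping Lemma – Proof • Find a CNF grammar for the language (without ε) • The size of the word relates to the height of the tree: if the grammar has m variables, take $n = 2^m$; since a CNF tree of height h yields at most $2^{h-1}$ symbols, a word with $|z| \ge n$ needs a tree of height at least $m+1$ • [Figure: a longest root-to-leaf path $A_0, A_1, A_2, \ldots, A_k$, ending in a terminal a]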
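Pumping Lemma – Proof • Find a CNF grammar for the language • For large words, a variable must be repeated: if the grammar has m variables and the tree has height at least $m+1$, the longest path carries at least $m+1$ variable nodes, so two of them, $A_i$ and $A_j$ with $i < j$, are the same variable (choosing them among the last $m+1$ variables of the path keeps $|vwx| \le n$) • [Figure: parse tree rooted at S; on the long path $A_i = A_j$ with $i < j$; the subtree of $A_j$ yields w, the subtree of $A_i$ yields vwx, and the whole tree yields uvwxy]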
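Related Strings • The strings • uwy • uvvwxxy • $uv^iwx^iy$ for every $i \ge 0$ • are also in the language: replacing the subtree of $A_i$ by the subtree of $A_j$ gives uwy, and repeatedly replacing the subtree of $A_j$ by a copy of the subtree of $A_i$ pumps v and x up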
How about ε? • If the language contains ε • The transformations remove ε from the grammar • Therefore the resulting grammar generates a different language, $L - \{\varepsilon\}$ • A CNF grammar cannot generate ε • If a language contains ε • A new grammar can be given which generates the same language • ε is generated by a single derivation step from the start symbol • All other productions comply with CNF
Closure Properties • Context-free languages are closed under • Substitution • Regular Operators • Homomorphism • Reversal • Intersection with regular language • Inverse homomorphism
Substitution • A substitution is an operation that replaces each character with a string • The string replacing a character a is drawn from a language $L_a$ associated with a
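Substitution – Formally • Let Σ be an alphabet • Let $L_a$ be a language associated with $a \in \Sigma$ • $s(a) = L_a$ • $s(\varepsilon) = \{\varepsilon\}$ • $s(a_1 a_2 \cdots a_n) = s(a_1)s(a_2)\cdots s(a_n) = L_{a_1} L_{a_2} \cdots L_{a_n}$ • $s(L) = \bigcup_{w \in L} s(w)$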
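For finite languages the definition can be evaluated directly. A minimal Python sketch, assuming every $L_a$ is a finite set of strings so it can be enumerated (the definition itself allows arbitrary languages); `subst_string` and `subst_language` are hypothetical names:

```python
def subst_string(w: str, s: dict[str, set[str]]) -> set[str]:
    """s(a1 a2 ... an) = s(a1) s(a2) ... s(an), with s(eps) = {eps}."""
    result = {""}
    for a in w:
        # concatenate every string built so far with every string of L_a
        result = {p + q for p in result for q in s[a]}
    return result


def subst_language(lang: set[str], s: dict[str, set[str]]) -> set[str]:
    """s(L) is the union of s(w) over all w in L."""
    out: set[str] = set()
    for w in lang:
        out |= subst_string(w, s)
    return out


s = {"0": {"a", "bb"}, "1": {"c"}}
print(sorted(subst_language({"01", "1"}, s)))  # ['ac', 'bbc', 'c']
```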
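Substitution • CFL’s are closed under substitution with CFL’s • Let $G = (V,\Sigma,P,S)$, such that $L(G) = L$ • Let $G_a = (V_a,T_a,P_a,S_a)$, such that $L(G_a) = L_a$, for each $a \in \Sigma$ (renaming variables so that V and all the $V_a$ are pairwise disjoint) • Let $G' = (V',T',P',S)$, where • $V' = V \cup \bigcup_{a \in \Sigma} V_a$ • $T' = \bigcup_{a \in \Sigma} T_a$ • $P' = \left(\bigcup_{a \in \Sigma} P_a\right) \cup P''$, where • $P''$ consists of all productions of P, with each terminal a replaced by the corresponding $S_a$ • $G'$ generates $s(L)$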
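Example • $G = (\{S\},\{0,1\},P,S)$, where • $P = \{ S \to SS \mid 0S1 \mid \varepsilon \}$ • $L_0 = \{(\}$ • $L_1 = \{)\}$ • Or • $L_0 = 0^*$ • $L_1 = 1^*$ • With the first choice, $s(L(G))$ is the language of balanced parentheses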
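The construction of G’ is mechanical. A minimal Python sketch, assuming the hypothetical dict-of-tuples grammar representation (keys are variables, other symbols are terminals, `()` is ε), that the variable names of G and of every $G_a$ are already disjoint, and that the start symbol of the result is that of G; the names `S0` and `S1` in the usage example are hypothetical:

```python
# Hypothetical representation: variable -> list of right-hand sides; () is epsilon.
Grammar = dict[str, list[tuple[str, ...]]]


def substitute(g: Grammar, subs: dict[str, tuple[Grammar, str]]) -> Grammar:
    """G' for s(L(G)): subs maps each terminal a to (G_a, S_a).
    Assumes the variable names of g and of every G_a are pairwise disjoint;
    the start symbol of the result is the start symbol of g."""
    out: Grammar = {}
    for g_a, _ in subs.values():
        out.update({v: list(rhss) for v, rhss in g_a.items()})  # the P_a, unchanged
    for v, rhss in g.items():
        # P'': every terminal t on a right-hand side of P becomes S_t
        out[v] = [tuple(s if s in g else subs[s][1] for s in rhs) for rhs in rhss]
    return out


G = {"S": [("S", "S"), ("0", "S", "1"), ()]}
subs = {"0": ({"S0": [("(",)]}, "S0"), "1": ({"S1": [(")",)]}, "S1")}
print(substitute(G, subs))
# {'S0': [('(',)], 'S1': [(')',)], 'S': [('S', 'S'), ('S0', 'S', 'S1'), ()]}
```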
Closure Under Regular Operators • CFL’s are closed under • Union • Concatenation • Closure (*) and positive closure (+)
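Each operator has a direct grammar construction that adds one fresh start symbol. A minimal Python sketch, assuming the hypothetical dict-of-tuples representation, pairwise disjoint variable sets, and a hypothetical default name `"S"` for the new start symbol:

```python
# Hypothetical representation: variable -> list of right-hand sides; () is epsilon.
Grammar = dict[str, list[tuple[str, ...]]]

# Each function returns a new grammar with a fresh start symbol `new`
# (hypothetical default "S"); variable sets are assumed pairwise disjoint.


def union(g1: Grammar, s1: str, g2: Grammar, s2: str, new: str = "S") -> Grammar:
    """L1 u L2:  S -> S1 | S2."""
    return {new: [(s1,), (s2,)], **g1, **g2}


def concat(g1: Grammar, s1: str, g2: Grammar, s2: str, new: str = "S") -> Grammar:
    """L1 L2:  S -> S1 S2."""
    return {new: [(s1, s2)], **g1, **g2}


def star(g1: Grammar, s1: str, new: str = "S") -> Grammar:
    """L1*:  S -> S1 S | eps."""
    return {new: [(s1, new), ()], **g1}


def plus(g1: Grammar, s1: str, new: str = "S") -> Grammar:
    """L1+:  S -> S1 S | S1."""
    return {new: [(s1, new), (s1,)], **g1}


A = {"A": [("a", "A", "b"), ()]}  # a^n b^n
B = {"B": [("c",), ("c", "B")]}   # c^+
print(union(A, "A", B, "B"))
# {'S': [('A',), ('B',)], 'A': [('a', 'A', 'b'), ()], 'B': [('c',), ('c', 'B')]}
```

The union construction introduces unit productions; that is harmless here, since closure only requires some CFG for the result, not one in normal form.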
Closure Under Homomorphism • CFL’s are closed under homomorphism • This is a special case of substitution • Each character a is substituted by the single string h(a), i.e., $L_a = \{h(a)\}$
Reversal • CFL’s are closed under reversal • Reverse the right-hand side of every production: $A \to \alpha$ becomes $A \to \alpha^R$
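In the hypothetical dict-of-tuples representation, reversal is a one-line rewrite of every right-hand side. A minimal Python sketch:

```python
# Hypothetical representation: variable -> list of right-hand sides; () is epsilon.
Grammar = dict[str, list[tuple[str, ...]]]


def reverse_grammar(g: Grammar) -> Grammar:
    """A -> X1 X2 ... Xk becomes A -> Xk ... X2 X1; the new grammar generates L(g)^R."""
    return {a: [tuple(reversed(rhs)) for rhs in rhss] for a, rhss in g.items()}


G = {"S": [("a", "S", "b", "b"), ()]}  # a^n b^(2n)
print(reverse_grammar(G))  # {'S': [('b', 'b', 'S', 'a'), ()]}, i.e. b^(2n) a^n
```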
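Intersection with a Regular Language • CFL’s are not closed under intersection • e.g., $\{a^n b^n c^m \mid n,m \ge 0\} \cap \{a^m b^n c^n \mid n,m \ge 0\} = \{a^n b^n c^n \mid n \ge 0\}$, which is not context-free • They are closed under intersection with a regular language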
Inverse Homomorphism • CFL’s are closed under inverse homomorphism
Decision Properties of CFL’s • Complexity of converting grammars to PDA’s, and of conversions within PDA’s • Complexity of transformation to CNF • Testing Emptiness of CFL’s • Testing Membership in a CFL
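Membership is commonly tested with the CYK algorithm, which uses dynamic programming on a CNF grammar and runs in time cubic in the length of the word. The slide does not name an algorithm; CYK is swapped in here as one standard choice. A minimal Python sketch, assuming the hypothetical dict-of-tuples representation and a grammar already in CNF:

```python
# Hypothetical representation: variable -> list of right-hand sides (CNF assumed).
Grammar = dict[str, list[tuple[str, ...]]]


def cyk(g: Grammar, start: str, w: str) -> bool:
    """CYK membership test for a grammar in CNF (A -> BC or A -> a).
    A CNF grammar cannot generate eps, so the empty string is rejected."""
    n = len(w)
    if n == 0:
        return False
    # table[i][l - 1] = set of variables that derive the substring w[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        for a, rhss in g.items():
            if (ch,) in rhss:  # production A -> ch
                table[i][0].add(a)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):  # left part w[i:i+split], right part the rest
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for a, rhss in g.items():
                    if any(len(r) == 2 and r[0] in left and r[1] in right for r in rhss):
                        table[i][length - 1].add(a)
    return start in table[0][n - 1]


# S -> AB | AC, C -> SB, A -> a, B -> b generates { a^n b^n | n >= 1 }
G = {
    "S": [("A", "B"), ("A", "C")],
    "C": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}
print([cyk(G, "S", w) for w in ["ab", "aabb", "aab", "abab"]])  # [True, True, False, False]
```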
Undecidable Problems • Is a given CFG G ambiguous? • Is a given CFL L inherently ambiguous? • Is the intersection of two CFL’s empty? • Are two CFL’s the same? • Is a given CFL equal to Σ*, where Σ is the alphabet of the language?