Simplification of Grammars Lecture 17 Naveen Z Quazilbash
Overview • Attendance • Motivation • Simplification of Grammars • Eliminating useless variables • Eliminating null productions • Eliminating unit productions • Quiz result
Motivation for grammar simplification • Parsing Problem • Given a CFG G and a string w, determine whether w ∈ L(G). This is a fundamental problem in compiler design and natural language processing. • If G is in general form, the procedure may be very inefficient, so the grammar is “transformed” into a simpler form to make the parsing problem easier.
Simplification of Grammars • It involves the removal of: • Useless variables • ε-productions • Unit productions
Useless variables: There are two types of useless variables: • Variables that cannot be reached • Variables that do not derive any strings
ε-productions • E.g.: A → ε • Note that if we remove these productions, the language no longer includes the empty string (if it contained it in the first place).
Unit productions: They are of the form A → B or A → A
1) Unreachable Variables • E.g.:
S → BS | B | E
A → DA | D | S
B → CB | C
C → aC | a
D → bD | b
E → cE | c
To find unreachable variables, draw a dependency graph • Dependency Graph: • Vertices of the graph are variables • The graph doesn’t include alphabet symbols, such as “a” or “b” • If there is a production A → …B…, i.e., the left side is A and the right side includes B, then there is an edge from A to B
A variable is reachable if there is a path from S to this variable • S itself is always reachable • After identifying unreachable variables, remove all productions with unreachable left side.
S → BS | B | E
A → DA | D | S
B → CB | C
C → aC | a
D → bD | b
E → cE | c
• Drawing its dependency graph:
[Figure: dependency graph over the vertices S, A, B, C, D, E]
• Reachable: S, B, C, E
Grammar without unreachable variables:
S → BS | B | E
B → CB | C
C → aC | a
E → cE | c
• Ex: Determine its language!!
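The reachability check can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the lecture: a grammar is represented as a dict from a variable to a list of right-hand-side strings, every symbol is a single character, and uppercase letters are variables. The function names `reachable_variables` and `remove_unreachable` are hypothetical.

```python
def reachable_variables(grammar, start="S"):
    """Return the set of variables reachable from `start` in the dependency graph."""
    reachable = {start}          # the start variable is always reachable
    stack = [start]
    while stack:
        var = stack.pop()
        for rhs in grammar.get(var, []):
            for symbol in rhs:
                # Assumption: uppercase single characters are variables.
                if symbol.isupper() and symbol not in reachable:
                    reachable.add(symbol)
                    stack.append(symbol)
    return reachable


def remove_unreachable(grammar, start="S"):
    """Drop every production whose left side is unreachable."""
    keep = reachable_variables(grammar, start)
    return {var: rhss for var, rhss in grammar.items() if var in keep}


example = {
    "S": ["BS", "B", "E"],
    "A": ["DA", "D", "S"],
    "B": ["CB", "C"],
    "C": ["aC", "a"],
    "D": ["bD", "b"],
    "E": ["cE", "c"],
}

print(sorted(reachable_variables(example)))
print(remove_unreachable(example))
```

On the example grammar, the variables A and D are never visited, so their productions are dropped.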
2) Variables that don’t terminate • A variable A terminates if either: • There is a production A → … with no variables on the right, e.g. A → aabc, OR • There is a production A → … where all variables on the right terminate; e.g. A → aBbaC, where B and C terminate. • Note: to find all variables that terminate, keep looking for such productions until you cannot find any new ones.
TASK Example:
S → A | BC | DE
A → aA | bA
B → bB | b
C → EF
D → dD | BD | BA
E → aE | a
F → cFc | c
• Remove all productions that include a variable that doesn’t terminate. • Note: We remove a production if it has such a variable on either side.
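Solution
S → A | BC | DE (terminates, via BC)
A → aA | bA (does not terminate)
B → bB | b (terminates)
C → EF (terminates)
D → dD | BD | BA (does not terminate)
E → aE | a (terminates)
F → cFc | c (terminates)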
S → BC
B → bB | b
C → EF
E → aE | a
F → cFc | c
• Ex: Determine its language.
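The "keep looking until nothing new is found" rule is a fixed-point iteration. The following Python sketch illustrates it under the same assumed representation (dict of variable to right-hand-side strings, single-character symbols, uppercase means variable); the names `terminating_variables` and `remove_nonterminating` are hypothetical.

```python
def terminating_variables(grammar):
    """Return the set of variables that can derive a string of terminals."""
    terminating = set()
    changed = True
    while changed:                      # repeat until no new variable is found
        changed = False
        for var, rhss in grammar.items():
            if var in terminating:
                continue
            for rhs in rhss:
                # A right side qualifies if each symbol is a terminal
                # or a variable already known to terminate.
                if all(not s.isupper() or s in terminating for s in rhs):
                    terminating.add(var)
                    changed = True
                    break
    return terminating


def remove_nonterminating(grammar):
    """Drop productions with a non-terminating variable on either side."""
    good = terminating_variables(grammar)
    return {
        var: [rhs for rhs in rhss
              if all(not s.isupper() or s in good for s in rhs)]
        for var, rhss in grammar.items()
        if var in good
    }


example = {
    "S": ["A", "BC", "DE"],
    "A": ["aA", "bA"],
    "B": ["bB", "b"],
    "C": ["EF"],
    "D": ["dD", "BD", "BA"],
    "E": ["aE", "a"],
    "F": ["cFc", "c"],
}

print(sorted(terminating_variables(example)))
print(remove_nonterminating(example))
```

For the task grammar, A and D never enter the set, so S keeps only the alternative BC.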
3) Eliminating ε-Productions • Nullable variables: A variable is nullable if either: • There is a production A → ε, or • There is a production A → B1B2…Bn (only variables, no symbols), where all variables on the right side are nullable. • Note: to find all nullable variables, keep looking for such productions, until you cannot find any new ones.
TASK
S → SAB | SBC | BC
A → aA | a
B → bB | bC | C
C → cC | ε
• First we find variables that can lead to the empty string:
C => ε
B => C => ε
S => BC => B => C => ε
S → SAB | SBC | BC (nullable)
A → aA | a
B → bB | bC | C (nullable)
C → cC | ε (nullable)
• Thus, S, B, and C can lead to ε; they are called nullable variables
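Finding the nullable variables follows the same fixed-point pattern. A minimal Python sketch, assuming the grammar is a dict of variable to right-hand-side strings with single-character symbols and with the empty string "" standing for ε (the function name `nullable_variables` is hypothetical):

```python
def nullable_variables(grammar):
    """Return the set of variables that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:                      # repeat until no new variable is found
        changed = False
        for var, rhss in grammar.items():
            if var in nullable:
                continue
            # "" (ε) qualifies trivially; otherwise every symbol on the
            # right side must be a variable already known to be nullable.
            if any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(var)
                changed = True
    return nullable


example = {
    "S": ["SAB", "SBC", "BC"],
    "A": ["aA", "a"],
    "B": ["bB", "bC", "C"],
    "C": ["cC", ""],        # "" stands for ε
}

print(sorted(nullable_variables(example)))
```

A right side containing a terminal can never qualify, because terminals are never added to the nullable set.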
For each production that has nullable variables, consider all possible ways to skip some of these variables and add the corresponding productions. • E.g. W → aWXaYZb, suppose that X, Y and Z are nullable; then there are 8 ways to skip some of them (including skipping none). • W → aWab | aWXab | aWaYb | aWaZb | aWXaYb | aWXaZb | aWaYZb | aWXaYZb
Back to our grammar, where S, B and C are nullable:
S → A | AB | SA | SAB | S | B | C | SB | SC | BC | SBC
A → aA | a
B → b | bB | bC | C
C → c | cC | ε
• Now, we can remove the ε-productions without changing the language. • The only possible change is losing the empty string, if it is in the original language.
So our grammar without null productions becomes:
S → A | AB | SA | SAB | S | B | C | SB | SC | BC | SBC
A → aA | a
B → b | bB | bC | C
C → c | cC
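The "skip some of the nullable variables" step can be sketched with `itertools.product`. This is an illustrative sketch, assuming single-character symbols, "" for ε, and a set of nullable variables computed beforehand; the names `expand_rhs` and `eliminate_epsilon` are hypothetical.

```python
from itertools import product


def expand_rhs(rhs, nullable):
    """Return every variant of `rhs` obtained by skipping nullable variables."""
    # Each symbol contributes one option (keep it) or two (keep it or skip it).
    options = [(s, "") if s in nullable else (s,) for s in rhs]
    return {"".join(choice) for choice in product(*options)}


def eliminate_epsilon(grammar, nullable):
    """Add the skipped variants and drop every ε-production."""
    result = {}
    for var, rhss in grammar.items():
        new = []
        for rhs in rhss:
            for alt in sorted(expand_rhs(rhs, nullable), key=lambda r: (len(r), r)):
                if alt and alt not in new:   # "" is ε: never added back
                    new.append(alt)
        result[var] = new
    return result


example = {
    "S": ["SAB", "SBC", "BC"],
    "A": ["aA", "a"],
    "B": ["bB", "bC", "C"],
    "C": ["cC", ""],        # "" stands for ε
}

print(sorted(expand_rhs("aWXaYZb", {"X", "Y", "Z"})))
print(eliminate_epsilon(example, {"S", "B", "C"}))
```

Applied to S → SAB | SBC | BC, the sketch produces eleven alternatives for S, including SC, which comes from skipping B in SBC.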
4) Eliminating Unit Productions
S → Aa | B
A → a | bc | B
B → A | bb | C | cC
C → a | C
• First, for every variable, we find all single variables that can be reached from it: • For S: S => B => A, S => B => C • For A: A => B => C • For B: B => A, B => C • For C: NONE (C itself doesn’t count)
Use Dependency Graph! • Drawing Dependency Graph: • Vertices of the graph are variables. • If there is a unit production A → B, then there is an edge from A to B. • A single variable B is reachable from A if there is a path from A to B.
Dependency Graph:
[Figure: dependency graph of the unit productions over the vertices S, A, B, C]
To construct an equivalent grammar without unit productions: • Remove all unit productions • For each pair A =>* B, where B is a single variable reachable from A, consider all non-unit productions B → p1 | p2 | … | pn, and add the corresponding productions A → p1 | p2 | … | pn. • For example, since A =>* B and B → bb | cC, add the productions A → bb | cC
Original grammar:
S → Aa | B
A → a | bc | B
B → A | bb | C | cC
C → a | C
Old non-unit productions:
S → Aa
A → a | bc
B → bb | cC
C → a
New productions:
S → bb | cC | a | bc
A → bb | cC | a
B → a | bc
• Note that the variable B has become useless and we need to remove it!
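The construction can be sketched in Python. This is a sketch under assumptions rather than the lecture's own code: the grammar is a dict of variable to right-hand-side strings, symbols are single characters, and a unit production is a right side consisting of exactly one uppercase letter. The names `is_unit`, `unit_reachable` and `eliminate_unit` are hypothetical.

```python
def is_unit(rhs):
    """A unit production has a single variable on the right side."""
    return len(rhs) == 1 and rhs.isupper()


def unit_reachable(grammar, var):
    """Variables reachable from `var` through unit productions only."""
    seen = set()
    stack = [var]
    while stack:
        current = stack.pop()
        for rhs in grammar.get(current, []):
            # The variable itself does not count as reachable from itself.
            if is_unit(rhs) and rhs != var and rhs not in seen:
                seen.add(rhs)
                stack.append(rhs)
    return seen


def eliminate_unit(grammar):
    """Remove unit productions and copy in the non-unit ones they led to."""
    result = {}
    for var, rhss in grammar.items():
        new = [rhs for rhs in rhss if not is_unit(rhs)]   # old non-unit productions
        for target in sorted(unit_reachable(grammar, var)):
            for rhs in grammar.get(target, []):
                if not is_unit(rhs) and rhs not in new:   # new productions
                    new.append(rhs)
        result[var] = new
    return result


example = {
    "S": ["Aa", "B"],
    "A": ["a", "bc", "B"],
    "B": ["A", "bb", "C", "cC"],
    "C": ["a", "C"],
}

print(eliminate_unit(example))
```

In the output no right side mentions B any more, so B can no longer be reached from S; a separate pass for unreachable variables would then remove it.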
Summary • Main steps of simplifying a grammar: • Remove useless variables, which cannot be reached or do not terminate. • Remove ε-productions. • Remove unit productions. • Remove useless variables again!