960 likes | 988 Views
Context Free Grammars. Xiaoyin Wang CS 5363 Spring 2019. Context Free Grammars. Xiaoyin Wang CS 5363 Spring 2016. Today’s Class. Derivation Trees Ambiguity Normal Forms CYK Algorithm. Derivation Trees. Illustrate the derivation of a certain sentence from a grammar
E N D
Context Free Grammars Xiaoyin Wang CS 5363 Spring 2019
Context Free Grammars Xiaoyin Wang CS 5363 Spring 2016
Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm
Derivation Trees Illustrate the derivation of a certain sentence from a grammar Different derivation orders may generate same derivation tree
Derivation Order Consider the following example grammar with 5 productions:
Leftmost derivation order of string : At each step, we substitute the leftmost variable
Rightmost derivation order of string : At each step, we substitute the rightmost variable
Leftmost derivation of : Rightmost derivation of :
Derivation Trees Consider the same example grammar: And a derivation of :
Derivation Tree (parse tree) yield
Sometimes, derivation order doesn’t matter Leftmost derivation: Rightmost derivation: Give same derivation tree
Ambiguity A grammar can have multiple derivation tree to derive a certain sentence Classification: inherent ambiguity and non-inherent ambiguity
Grammar for mathematical expressions Example strings: Denotes any number
Another leftmost derivation for
Good Tree Bad Tree Compute expression result using the tree
Two different derivation trees may cause problems in applications which use the derivation trees: • Evaluating expressions • In general, in compilers for programming languages
Ambiguous Grammar: A context-free grammar is ambiguous if there is a string which has: two different derivation trees or two leftmost derivations (Two different derivation trees give two different leftmost derivations and vice-versa)
Example: this grammar is ambiguous since string has two derivation trees
this grammar is ambiguous also because string has two leftmost derivations
Another ambiguous grammar: IF_STMT if EXPR then STMT if EXPR then STMT else STMT Variables Terminals Very common piece of grammar in programming languages
If expr1 then if expr2 then stmt1 else stmt2 IF_STMT if expr1 then STMT if expr2 then stmt1 else stmt2 Two derivation trees IF_STMT if expr1 then STMT else stmt2 if expr2 then stmt1
In general, ambiguity is bad and we want to remove it Sometimes it is possible to find a non-ambiguous grammar for a language But, in general we cannot do so
A successful example: Equivalent Ambiguous Grammar Non-Ambiguous Grammar generates the same language
Unique derivation tree for
An un-successful example: is inherently ambiguous: every grammar that generates this language is ambiguous
Ambiguity: Summary A grammar can have multiple parser tree to derive a certain sentence Inherent ambiguous language All grammars are ambiguous Non-inherent ambiguous language There exist at least one grammar that is not ambiguous Checking ambiguity of a grammar or a language: un-decidable problem
Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm
Normal Forms Chomsky Normal Forms BNF Normal Forms
A context free grammar is said to be in Chomsky Normal Form if all productions are in the following form: A → BC A → α • A, B and C are non terminal symbols • α is a terminal symbol
There are three preliminary simplifications Eliminate Useless Symbols Eliminate ε productions Eliminate unit productions 1 2 3
Eliminate Useless Symbols We need to determine if the symbol is useful by identifying if a symbol is generating and is reachable * • X is generating if X ω for some terminal string ω. • X is reachable if there is a derivation X αXβ for some α and β *
Example: Removing non-generating symbols S → AB | a A → b Initial CFL grammar S→AB | a A→b Identify generating symbols S→a A→b Remove non-generating
Example: Removing non-reachable symbols S→a Eliminate non-reachable S→a A → b Identify reachable symbols
The order is important? Looking first for non-reachable symbols and then for non-generating symbols can still leave some useless symbols. S → AB | a A → b S → a A → b
Finding generating symbols If there is a production A →α, and every symbol of α is already known to be generating. Then A is generating We cannot use S → AB because B has not been established to be generating S → AB | a A → b
Finding reachable symbols S is surely reachable. All symbols in the body of a production with S in the head are reachable. S → AB | a A → b In this example the symbols {S, A, B, a, b} are reachable.
There are three preliminary simplifications Eliminate Useless Symbols Eliminate ε productions Eliminate unit productions 1 2 3
Eliminate ε Productions • In a grammar ε productions are convenient but not essential • If L has a CFG, then L – {ε} has a CFG A ε * Nullable variable
If A is a nullable variable • Whenever A appears on the body of a production A might or might not derive ε S → ASA | aB A → B | S B → b | ε Nullable: {A, B}
Eliminate ε Productions • Create two version of the production, one with the nullable variable and one without it • Eliminate productions with ε bodies S → ASA| aB A → B | S B → b | ε S →ASA| aB | AS | SA | S | a A → B | S | ε B → b | ε
Eliminate ε Productions • Create two version of the production, one with the nullable variable and one without it • Eliminate productions with ε bodies S → ASA |aB A → B | S B → b | ε S → ASA | aB | AS | SA | S | a A → B | S | ε B → b | ε
Eliminate ε Productions • Create two version of the production, one with the nullable variable and one without it • Eliminate productions with ε bodies S → ASA | aB A → B | S B → b | ε S → ASA | aB | AS | SA | S | a A → B | S | ε B → b | ε
There are three preliminary simplifications Eliminate Useless Symbols Eliminate ε productions Eliminate unit productions 1 2 3