5. Context-Free Grammars and Languages CIS 5513 - Automata and Formal Languages – Pei Wang
Languages and grammars Regular expression: constants and operators Grammar: variables and rewriting rules Difference: whether to give a pattern a name Example: Binary palindromes do not form a regular language, but can be specified as P → ɛ | 0 | 1 | 0P0 | 1P1 where ‘P’ is a variable, ‘→’ the production symbol, and ‘|’ for alternatives
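As a minimal sketch of how the palindrome grammar reads as a recursive definition, the function below mirrors the five productions of P one by one. The function name `derives_palindrome` is an illustrative choice, not part of the slides.

```python
def derives_palindrome(s: str) -> bool:
    """Return True if s can be derived from P -> ε | 0 | 1 | 0P0 | 1P1."""
    if any(ch not in "01" for ch in s):
        return False                      # only the terminals 0 and 1 are allowed
    if len(s) <= 1:
        return True                       # P -> ε, P -> 0, P -> 1
    if s[0] == s[-1]:
        return derives_palindrome(s[1:-1])  # P -> 0P0 or P -> 1P1
    return False


print(derives_palindrome("0110"))  # True
print(derives_palindrome("0111"))  # False
```

Each recursive call strips one matching pair of outer symbols, which is exactly the step that a finite automaton cannot track for unbounded lengths.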
Context-free grammar A Context-Free Grammar (CFG) G is defined as G = (V, T, P, S): • V: the set of variables (non-terminals, syntactic categories, each standing for a language) • T: the set of terminal symbols (alphabet) • P: the set of productions (rules), each with a variable as its head and a string of variables and terminals as its body • S: the start symbol (standing for the whole language)
Example of CFG A simple arithmetic expression consists of identifiers connected by ‘+’ and ‘*’ operators
E → I | E + E | E * E | (E)
I → a | b | Ia | Ib | I0 | I1
Formally, each alternative is a separate production; ‘|’ only abbreviates the list. In E → E + E, the three E’s may represent different strings. The effect of the star operator of regular expressions can be achieved by recursion, as in I → Ia
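The 4-tuple definition and the expression grammar can be written down as plain data. The following is only a sketch: the `CFG` class, its field names, and the `is_well_formed` helper are assumptions made for illustration. Every alternative is listed as its own production, so the ten rules appear individually.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    variables: frozenset    # V
    terminals: frozenset    # T
    productions: tuple      # P, as (head, body) pairs; body is a tuple of symbols
    start: str              # S


EXPR = CFG(
    variables=frozenset({"E", "I"}),
    terminals=frozenset("ab01+*()"),
    productions=(
        ("E", ("I",)),
        ("E", ("E", "+", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("(", "E", ")")),
        ("I", ("a",)),
        ("I", ("b",)),
        ("I", ("I", "a")),
        ("I", ("I", "b")),
        ("I", ("I", "0")),
        ("I", ("I", "1")),
    ),
    start="E",
)


def is_well_formed(g: CFG) -> bool:
    """Check that heads are variables and bodies use only known symbols."""
    symbols = g.variables | g.terminals
    return (
        g.start in g.variables
        and not (g.variables & g.terminals)
        and all(
            head in g.variables and all(x in symbols for x in body)
            for head, body in g.productions
        )
    )


print(is_well_formed(EXPR))  # True
```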
Derivation using a CFG A CFG defines a language that consists of the strings of terminals derived from the start symbol using the production rules • Derivation: from the start symbol to the terminals • Recursive inference: from the terminals to the start symbol
Example of derivation Here ‘⇒’ means “derive in one step”. With a ‘*’ above, it means “derive in zero or more steps”; with a ‘G’ below, it means “derive by grammar G”
Leftmost/rightmost derivation A leftmost (or rightmost) derivation restricts each step to replacing the leftmost (or rightmost) variable of the current string
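A leftmost derivation can be replayed mechanically. The sketch below assumes the expression grammar with variables E and I, single-character symbols, and a hypothetical helper `leftmost_derivation`; it applies each chosen production to the leftmost variable and records every sentential form.

```python
VARIABLES = {"E", "I"}


def leftmost_derivation(start, steps):
    """Apply each (head, body) to the leftmost variable; return all sentential forms."""
    form = [start]
    forms = ["".join(form)]
    for head, body in steps:
        # Locate the leftmost variable in the current sentential form.
        pos = next((i for i, sym in enumerate(form) if sym in VARIABLES), None)
        if pos is None or form[pos] != head:
            raise ValueError(f"cannot apply {head} -> {body} to {''.join(form)}")
        form[pos:pos + 1] = list(body)   # replace the variable by the body
        forms.append("".join(form))
    return forms


steps = [
    ("E", "E*E"), ("E", "I"), ("I", "a"),
    ("E", "(E)"), ("E", "E+E"), ("E", "I"),
    ("I", "a"), ("E", "I"), ("I", "b"),
]
print(" => ".join(leftmost_derivation("E", steps)))
# E => E*E => I*E => a*E => a*(E) => a*(E+E) => a*(I+E) => a*(a+E) => a*(a+I) => a*(a+b)
```

A rightmost derivation of the same string would use the same productions in a different order, which is why the existence of several derivations does not by itself indicate several parse trees.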
Context-free language The language L(G) of a context-free grammar G is called a context-free language (CFL). A string of variables and terminals derived from S is a “sentential form”; it is a “left” (or “right”) sentential form if obtained by a leftmost (or rightmost) derivation
CFG and regular language A CFG specifies a regular language if it is in one of the following two forms: • Right-linear: if all of its rules have the form of P → ε, P → a, or P → aQ • Left-linear: if all of its rules have the form of P → ε, P → a, or P → Qa A right-linear grammar maps directly to an ε-NFA whose states are the variables, while a left-linear grammar maps to an ε-NFA for the reverse of the language
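The mapping from a right-linear grammar to an automaton can be sketched as an on-the-fly NFA simulation: each variable is a state, P → aQ is a transition from P to Q on a, P → a leads to an extra accepting state, and P → ε makes P accepting. The function `accepts`, the state name `#final`, and the example grammar (strings whose second-to-last symbol is 1) are assumptions for illustration.

```python
def accepts(rules, start, s):
    """Simulate the NFA of a right-linear grammar.

    rules: list of (head, body), where body is "", "a", or "aQ"
    (one terminal optionally followed by one variable).
    """
    final = "#final"                      # extra accepting state for P -> a
    current = {start}
    for ch in s:
        nxt = set()
        for head, body in rules:
            if head in current and body[:1] == ch:
                nxt.add(body[1:] or final)   # P -> aQ goes to Q, P -> a goes to final
        current = nxt
    nullable = {head for head, body in rules if body == ""}   # P -> ε
    return final in current or bool(current & nullable)


# S -> 0S | 1S | 1A,  A -> 0 | 1
RULES = [("S", "0S"), ("S", "1S"), ("S", "1A"), ("A", "0"), ("A", "1")]
print(accepts(RULES, "S", "0110"))  # True
print(accepts(RULES, "S", "0101"))  # False
```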
Exercises for Section 5.1.1 • 5.1.1(a): define the CFG of { 0^n 1^n | n ≥ 1 } • 5.1.1(b): define the CFG of { a^i b^j c^k | i ≠ j or j ≠ k } Solutions: http://infolab.stanford.edu/~ullman/ialcsols/sol5.html#sol51 Alternative solution of 5.1.1(b):
S → AD | EC
A → ε | aA
B → ε | bB
C → ε | cC
D → bB | cC | bDc
E → aA | bB | aEb
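A proposed grammar such as the alternative solution of 5.1.1(b) can be sanity-checked by brute force up to a length bound. The sketch below is a bounded check, not a proof: it enumerates every terminal string of length at most N that the grammar derives and compares the result with the set definition. The names `generated` and `expected` and the bound N = 8 are illustrative choices.

```python
GRAMMAR = {
    "S": ["AD", "EC"],
    "A": ["", "aA"],
    "B": ["", "bB"],
    "C": ["", "cC"],
    "D": ["bB", "cC", "bDc"],
    "E": ["aA", "bB", "aEb"],
}


def generated(grammar, start, max_len):
    """All terminal strings with at most max_len symbols derivable from start."""
    seen, stack, result = {start}, [start], set()
    while stack:
        form = stack.pop()
        # Expand the leftmost variable; a form without variables is a terminal string.
        pos = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if pos is None:
            result.add(form)
            continue
        for body in grammar[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            terminals = sum(1 for sym in new if sym not in grammar)
            if terminals <= max_len and new not in seen:   # prune overlong forms
                seen.add(new)
                stack.append(new)
    return result


def expected(max_len):
    """{ a^i b^j c^k | i != j or j != k } restricted to length <= max_len."""
    out = set()
    for n in range(max_len + 1):
        for i in range(n + 1):
            for j in range(n - i + 1):
                k = n - i - j
                if i != j or j != k:
                    out.add("a" * i + "b" * j + "c" * k)
    return out


N = 8
print(generated(GRAMMAR, "S", N) == expected(N))  # True
```

The search terminates for this grammar because every production either adds a terminal or removes a variable, apart from the single use of a production for S.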
Exercises for Section 5.1.2 Solutions: http://infolab.stanford.edu/~ullman/ialcsols/sol5.html#sol51
Parse trees A derivation can be represented as a parse tree
Equivalent statements about CFG The sequence of leaves of a parse tree, from left to right, is the yield of the tree, which is the terminal string derived from the start symbol
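The yield of a parse tree can be computed by a left-to-right traversal of its leaves. In this sketch a tree is a nested tuple `(label, child, ...)` and a leaf is a plain string; this encoding and the name `tree_yield` are assumptions for illustration.

```python
def tree_yield(node):
    """Concatenate the leaves of a parse tree from left to right."""
    if isinstance(node, str):        # a leaf: a terminal symbol ("" stands for ε)
        return node
    label, *children = node          # an interior node labelled by a variable
    return "".join(tree_yield(child) for child in children)


# Parse tree of a+b using E -> E + E, E -> I, I -> a, I -> b
tree = ("E",
        ("E", ("I", "a")),
        "+",
        ("E", ("I", "b")))
print(tree_yield(tree))  # a+b
```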
Parsers Parsing or syntactic analysis is the process of analyzing a string of symbols according to the rules of a formal grammar A parser is a program that generates parse trees from input strings according to a given grammar In UNIX, the yacc command takes a CFG as input and outputs C source code for a parser of that grammar, which can be used to build a parse tree
Ambiguity in CFG A CFG is “ambiguous” if some string is the yield of two or more different parse trees For example, the grammar of arithmetic expressions allows E + E * E to be parsed in two ways, depending on which of the two operators is applied first The mere existence of different derivations does not imply ambiguity
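Ambiguity can be observed by counting parse trees. The sketch below assumes the expression grammar E → I | E + E | E * E | (E), I → a | b | Ia | Ib | I0 | I1 and relies on the fact that it has no ε-productions, so every symbol covers at least one character. The name `count_parse_trees` is illustrative.

```python
from functools import lru_cache

PRODUCTIONS = {
    "E": [("I",), ("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")")],
    "I": [("a",), ("b",), ("I", "a"), ("I", "b"), ("I", "0"), ("I", "1")],
}


def count_parse_trees(s, start="E"):
    """Count the parse trees with root `start` whose yield is s."""

    @lru_cache(maxsize=None)
    def trees(sym, i, j):
        # Number of parse trees rooted at sym with yield s[i:j].
        if sym not in PRODUCTIONS:                     # terminal symbol
            return 1 if s[i:j] == sym else 0
        return sum(split(body, i, j) for body in PRODUCTIONS[sym])

    @lru_cache(maxsize=None)
    def split(body, i, j):
        # Number of ways the symbol sequence `body` derives s[i:j].
        if len(body) == 1:
            return trees(body[0], i, j)
        # Leave at least one character for each remaining symbol.
        return sum(trees(body[0], i, k) * split(body[1:], k, j)
                   for k in range(i + 1, j - len(body) + 2))

    return trees(start, 0, len(s))


print(count_parse_trees("a+a*a"))  # 2
print(count_parse_trees("a+a"))    # 1
```

The two trees for `a+a*a` correspond to grouping the string as (a+a)*a or as a+(a*a).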
Removing ambiguity There is no algorithm that can decide whether an arbitrary CFG is ambiguous, and none that removes ambiguity from every CFG Some ambiguity can be removed by revising the CFG, for example by introducing separate variables for terms and factors so that * binds more tightly than + in expressions
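One standard way to separate + from * is the grammar E → T | E + T, T → F | T * F, F → I | (E). This grammar is assumed here; the slide's own revised grammar is not preserved in the text. Because it is left-recursive, the hand-written parser below uses loops that accept the same strings and group them the same way (left-associative). It returns a simplified tree of nested tuples rather than a full parse tree, and the name `parse` is illustrative.

```python
def parse(s):
    """Parse s under E -> T | E+T, T -> F | T*F, F -> I | (E); return a nested tuple."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def expr():                       # E -> T | E + T
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    def term():                       # T -> F | T * F
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def factor():                     # F -> I | (E)
        nonlocal pos
        if peek() == "(":
            pos += 1
            node = expr()
            if peek() != ")":
                raise SyntaxError(f"expected ')' at position {pos}")
            pos += 1
            return node
        begin = pos
        if peek() in ("a", "b"):      # I -> a | b | Ia | Ib | I0 | I1
            pos += 1
            while peek() in ("a", "b", "0", "1"):
                pos += 1
            return s[begin:pos]
        raise SyntaxError(f"unexpected input at position {pos}")

    tree = expr()
    if pos != len(s):
        raise SyntaxError(f"unexpected input at position {pos}")
    return tree


print(parse("a+a*a"))    # ('+', 'a', ('*', 'a', 'a'))
print(parse("(a+a)*a"))  # ('*', ('+', 'a', 'a'), 'a')
```

With this grammar the string a+a*a has only the grouping a+(a*a); the other grouping requires explicit parentheses.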
Unique derivation In an unambiguous grammar, every string in the language has a unique leftmost derivation and a unique rightmost derivation Therefore, although a variable can have more than one production rule, only one of them can lead to the given string at each step of its leftmost derivation For a given CFG, a string has two distinct parse trees if and only if it has two distinct leftmost derivations from the start symbol
Inherent ambiguity A CFL is “inherently ambiguous” if all its grammars are ambiguous Example: L = { a^n b^n c^m d^m } ∪ { a^n b^m c^m d^n } where m and n are positive integers It is easy to get a CFG that generates the two types of strings separately, but it will give the string “aabbccdd” two leftmost derivations, as well as two parse trees
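The overlap between the two parts of L can be checked directly. The sketch below uses hypothetical helpers `in_first` and `in_second` to test membership in each part; strings of the form a^n b^n c^n d^n belong to both, which is where a grammar built from the two parts produces two parse trees.

```python
import re

BLOCKS = re.compile(r"(a+)(b+)(c+)(d+)")   # positive numbers of a, b, c, d in order


def in_first(s):
    """Membership in { a^n b^n c^m d^m | n, m >= 1 }."""
    m = BLOCKS.fullmatch(s)
    return bool(m) and len(m[1]) == len(m[2]) and len(m[3]) == len(m[4])


def in_second(s):
    """Membership in { a^n b^m c^m d^n | n, m >= 1 }."""
    m = BLOCKS.fullmatch(s)
    return bool(m) and len(m[1]) == len(m[4]) and len(m[2]) == len(m[3])


print(in_first("aabbccdd"), in_second("aabbccdd"))  # True True
print(in_first("aabbcd"), in_second("aabbcd"))      # True False
```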
Exercises for Section 5.4 Exercise 5.4.3: Find an unambiguous grammar for the above language Solutions: http://infolab.stanford.edu/~ullman/ialcsols/sol5.html#sol54
Applications of CFG Examples: • Mathematical language • Logical language • Markup language • Programming language • Natural language