CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics
Context-Free Languages
So far …
• Methods for describing regular languages
  • Finite Automata
    • Deterministic
    • Non-deterministic
  • Regular Expressions
• They are all equivalent, and limited
  • They cannot describe some simple languages, such as {0^n 1^n | n is positive}
• Now, we introduce a more powerful method for describing languages
  • Context-free Grammars (CFG)
Are CFGs useful?
• Extremely useful!
• Artificial Intelligence
  • Natural Language Processing
• Programming Languages
  • specification
  • compilation
Example
• This is a CFG which we call G1
    A → 0A1
    A → B
    B → #
• Each line is a substitution rule, also called a production rule
• A and B are called variables or non-terminals
• 0, 1, and # are called terminals
• A is the start variable
Rules
• We use a CFG to describe a language by generating each string of that language
• Write down the start variable
• Pick a variable that is written down and a production rule whose left-hand side is that variable
• Replace that variable with the right-hand side of the production rule
• Repeat until no variables remain
Derivations
• This is a CFG which we call G1
    A → 0A1
    A → B
    B → #
• Derivations with G1
    A ⇒ 0A1 ⇒ 0B1 ⇒ 0#1
    A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
    A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
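The derivations above can be replayed mechanically. The following is a minimal sketch, assuming each symbol is a single character and that we always rewrite the leftmost variable; the function name `derive` and the list-of-choices encoding are illustrative, not part of the slides.

```python
# Rules of G1 as given on the slide: A -> 0A1 | B, B -> #
RULES = {"A": ["0A1", "B"], "B": ["#"]}

def derive(choices, start="A"):
    """Rewrite the leftmost variable at each step.

    choices[i] is the index of the rule (for that variable) used at step i.
    Returns the list of sentential forms, starting with the start variable.
    """
    current = start
    steps = [current]
    for choice in choices:
        # position of the leftmost variable in the current form
        pos = next(i for i, ch in enumerate(current) if ch in RULES)
        var = current[pos]
        current = current[:pos] + RULES[var][choice] + current[pos + 1:]
        steps.append(current)
    return steps

# A -> 0A1 twice, then A -> B, then B -> #
print(" => ".join(derive([0, 0, 1, 0])))
```

The derivation stops producing new forms once no variable remains, which matches the "repeat until no variables remain" rule.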
Parse tree
• Parse tree for 0#1 in G1, corresponding to the derivation A ⇒ 0A1 ⇒ 0B1 ⇒ 0#1
        A
      / | \
     0  A  1
        |
        B
        |
        #
Parse tree
• Parse tree for 00#11 in G1, corresponding to the derivation A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
          A
        / | \
       0  A  1
        / | \
       0  A  1
          |
          B
          |
          #
Context-free languages
• All strings generated by a CFG constitute the language of the grammar
• Example: L(G1) = {0^n # 1^n | n ≥ 0} (the case n = 0 is the string #, obtained by A ⇒ B ⇒ #)
• Any language generated by a context-free grammar is a context-free language
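As a quick sanity check on this description of L(G1), here is a small sketch of a direct membership test. It does not use the grammar at all; it simply checks the shape 0^n # 1^n, and the function name `in_L_G1` is an illustrative choice.

```python
def in_L_G1(s: str) -> bool:
    """True if s has the form 0^n # 1^n for some n >= 0."""
    left, sep, right = s.partition("#")   # split at the first '#'
    return (
        sep == "#"                        # a '#' must be present
        and set(left) <= {"0"}            # only 0s before it
        and set(right) <= {"1"}           # only 1s after it (so no second '#')
        and len(left) == len(right)       # same number of 0s and 1s
    )
```

For instance, `in_L_G1("00#11")` holds while `in_L_G1("00#1")` does not, because the counts differ.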
A useful abbreviation
• Production rules
    A → 0A1
    A → B
    B → #
• Can be written as
    A → 0A1 | B
    B → #
Another example
• CFG G2 describing a fragment of English
    <SENTENCE> → <NOUN-PHRASE><VERB-PHRASE>
    <NOUN-PHRASE> → <CMPLX-NOUN> | <CMPLX-NOUN><PREP-PHRASE>
    <VERB-PHRASE> → <CMPLX-VERB> | <CMPLX-VERB><PREP-PHRASE>
    <PREP-PHRASE> → <PREP><CMPLX-NOUN>
    <CMPLX-NOUN> → <ARTICLE><NOUN>
    <CMPLX-VERB> → <VERB> | <VERB><NOUN-PHRASE>
    <ARTICLE> → a | the
    <NOUN> → boy | girl | flower
    <VERB> → touches | likes | sees
    <PREP> → with
Another example
• Examples of strings belonging to L(G2)
    a boy sees
    the boy sees a flower
    a girl with a flower likes the boy with a flower
Another example
• Derivation of "a boy sees"
    <SENTENCE> ⇒ <NOUN-PHRASE><VERB-PHRASE>
               ⇒ <CMPLX-NOUN><VERB-PHRASE>
               ⇒ <ARTICLE><NOUN><VERB-PHRASE>
               ⇒ a <NOUN><VERB-PHRASE>
               ⇒ a boy <VERB-PHRASE>
               ⇒ a boy <CMPLX-VERB>
               ⇒ a boy <VERB>
               ⇒ a boy sees
Formal definitions
• A context-free grammar is a 4-tuple <V, ∑, R, S> where
  • V is a finite set of variables
  • ∑ is a finite set of terminals, disjoint from V
  • R is a finite set of rules: each rule consists of a variable and a finite string of variables and terminals
  • S ∈ V is the start symbol
Formal definitions
• If
  • u, v, and w are strings of variables and terminals, and
  • A → w is a rule of the grammar,
• then uAv yields uwv, written uAv ⇒ uwv
• We write u ⇒* v if
  • u = v, or
  • there is a sequence u_1, u_2, …, u_k with k ≥ 0 such that u ⇒ u_1 ⇒ u_2 ⇒ … ⇒ u_k ⇒ v
Formal definitions
• The language of grammar G is
    L(G) = {w ∈ ∑* | S ⇒* w}
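To make the 4-tuple and the definition of L(G) concrete, here is a minimal sketch that stores a grammar as a small data class and enumerates the short strings of L(G) by exploring leftmost derivations breadth-first. The class and function names are illustrative. It assumes every symbol is a single character, and the length-based pruning is only exact for grammars without ε-rules (there, sentential forms never shrink), which holds for G1.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # the terminal alphabet
    rules: tuple           # R, as (variable, right-hand side) pairs
    start: str             # S

def language_up_to(g: CFG, max_len: int) -> set:
    """All w in L(G) with len(w) <= max_len (exact if G has no epsilon-rules)."""
    found, seen, queue = set(), {g.start}, deque([g.start])
    while queue:
        form = queue.popleft()
        # index of the leftmost variable, or None if the form is all terminals
        pos = next((i for i, ch in enumerate(form) if ch in g.variables), None)
        if pos is None:
            found.add(form)
            continue
        for head, body in g.rules:
            if head != form[pos]:
                continue
            new = form[:pos] + body + form[pos + 1:]   # one "yields" step
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

G1 = CFG(frozenset("AB"), frozenset("01#"),
         (("A", "0A1"), ("A", "B"), ("B", "#")), "A")
print(sorted(language_up_to(G1, 5), key=len))
```

Each step of the loop applies exactly one rule to one variable, so the set it collects is a bounded slice of {w | S ⇒* w}.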
Example
• Consider G3 = <{S}, {(, )}, R, S> where R is
    S → (S) | SS | ε
• What is the language of G3?
• Examples: (), (()((()))), …
• L(G3) is the set of strings of properly nested parentheses (including the empty string)
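A string of parentheses is properly nested exactly when no prefix closes more parentheses than it has opened and the totals match at the end. The sketch below checks that condition with a counter; it is an independent test of membership, not a parser for the grammar, and the function name is illustrative.

```python
def is_properly_nested(s: str) -> bool:
    """True if s consists only of '(' and ')' and is properly nested."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ')' with no matching '(' before it
                return False
        else:
            return False         # symbol outside the terminal alphabet
    return depth == 0            # every '(' must have been closed
```

The empty string passes the check, matching the rule S → ε.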
Example
• Consider G4 = <{E, T, F}, {a, +, x, (, )}, R, E> where R is
    E → E + T | T
    T → T x F | F
    F → (E) | a
• What is the language of G4?
• Examples: a+a+a, (a+a) x a
• E stands for Expression, T for Term, and F for Factor: so this grammar describes some arithmetic expressions
Ambiguity
• Sometimes a grammar can generate the same string in several different ways!
• Such a string will have several parse trees
• This is a very serious problem
  • Imagine a C program that could have multiple interpretations
• If a grammar has this problem, we say that it is ambiguous
Example
• Consider G5:
    <EXPR> → <EXPR> + <EXPR> | <EXPR> x <EXPR> | (<EXPR>) | a
• G5 is ambiguous because a+axa has two parse trees!
• Parse tree 1: the root uses <EXPR> → <EXPR> x <EXPR>, grouping the string as (a+a) x a
              <EXPR>
            /    |    \
       <EXPR>    x    <EXPR>
      /   |   \          |
 <EXPR>   +   <EXPR>     a
    |           |
    a           a
• Parse tree 2: the root uses <EXPR> → <EXPR> + <EXPR>, grouping the string as a + (a x a)
              <EXPR>
            /    |    \
       <EXPR>    +    <EXPR>
          |          /   |   \
          a     <EXPR>   x   <EXPR>
                   |            |
                   a            a
Formal definition: ambiguity
• A string w is generated ambiguously in CFG G if it has two or more different leftmost derivations!
• A derivation is leftmost if at every step the variable being replaced is the leftmost one
• Grammar G is ambiguous if it generates some string ambiguously
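Since leftmost derivations and parse trees correspond one to one, ambiguity can be observed by counting parse trees. The following sketch counts the parse trees of a string in G5 (<EXPR> → <EXPR> + <EXPR> | <EXPR> x <EXPR> | (<EXPR>) | a) by trying every rule at the root of each substring. The function name and the memoised recursion are illustrative choices, and the counter is written specifically for G5 rather than for arbitrary grammars.

```python
from functools import lru_cache

def count_parse_trees(s: str) -> int:
    """Number of parse trees for s in G5: E -> E+E | ExE | (E) | a."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # number of parse trees with root <EXPR> whose yield is s[i:j]
        if j - i == 1:
            return 1 if s[i] == "a" else 0      # rule E -> a
        if j - i < 3:
            return 0                            # no rule yields 0 or 2 symbols
        total = 0
        if s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)        # rule E -> (E)
        for k in range(i + 1, j - 1):
            if s[k] in "+x":                    # rules E -> E+E and E -> ExE
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s)) if s else 0

print(count_parse_trees("a+axa"))
```

A count of two or more for some string is exactly what the definition above calls "generated ambiguously".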
Chomsky Normal Form (CNF)
• Every rule has the form
    A → BC
    A → a
    S → ε
• where S is the start symbol, a is any terminal, and A, B, and C are any variables, except that B and C may not be the start symbol
Theorem
• Theorem: Any context-free language is generated by a context-free grammar in Chomsky normal form
• How?
  • Add a new start symbol S0
  • Eliminate all rules of the form A → ε
  • Eliminate all "unit" rules of the form A → B
  • Patch up the rules so that the grammar denotes the same language
  • Convert the remaining rules to the proper form
Steps to convert any grammar into CNF
• Step 1
  • Add a new start symbol S0
  • Add the rule S0 → S
Steps to convert any grammar into CNF
• Step 2: eliminate ε-rules
• Repeat
  • Remove some rule of the form A → ε where A is not the start symbol
  • Then, for each occurrence of A on the right-hand side of a rule, add a new rule with that occurrence deleted
  • E.g., if R → uAvAw is a rule, where u, v, and w are strings of variables and terminals, we add the rules R → uvAw, R → uAvw, and R → uvw
  • For R → A, add R → ε, except if R → ε has already been removed
• Until all ε-rules not involving the start symbol have been removed
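The "add a new rule with that occurrence deleted" step amounts to deleting every non-empty subset of the occurrences of A in a right-hand side. A minimal sketch, assuming single-character symbols and using the empty string to stand for ε (the function name `drop_nullable` is hypothetical):

```python
from itertools import product

def drop_nullable(rhs: str, var: str) -> set:
    """Right-hand sides obtained from rhs by deleting one or more occurrences of var."""
    positions = [i for i, ch in enumerate(rhs) if ch == var]
    variants = set()
    # each occurrence is independently kept (True) or deleted (False)
    for keep in product([True, False], repeat=len(positions)):
        if all(keep):
            continue  # deleting nothing just gives back the original rule
        dropped = {p for p, k in zip(positions, keep) if not k}
        variants.add("".join(ch for i, ch in enumerate(rhs) if i not in dropped))
    return variants

print(sorted(drop_nullable("ASA", "A")))
```

For a rule R → ASA and a removed rule A → ε, this yields the right-hand sides SA, AS, and S. The bookkeeping for "except if R → ε has already been removed" is left out of the sketch.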
Steps to convert any grammar into CNF
• Step 3: eliminate unit rules
• Repeat
  • Remove some rule of the form A → B
  • For each rule B → u, add A → u, except if A → u is a unit rule that has already been removed
• Until all unit rules have been removed
Steps to convert any grammar into CNF
• Step 4: convert remaining rules
• Replace each rule A → u_1 u_2 … u_k, where k > 2 and each u_i is a terminal or a variable, with the rules
    A → u_1 A_1
    A_1 → u_2 A_2
    A_2 → u_3 A_3
    …
    A_{k-2} → u_{k-1} u_k
  where A_1, …, A_{k-2} are new variables
• Whenever a resulting rule has two symbols on its right-hand side (including original rules with k = 2), replace any terminal u_i in it with a new variable U_i and add the rule U_i → u_i
Example
• Start with
    S → ASA | aB
    A → B | S
    B → b | ε
Example
• Step 1: add new start symbol and new rule
    S0 → S
    S → ASA | aB
    A → B | S
    B → b | ε
Example
• Step 2: remove ε-rule B → ε
    S0 → S
    S → ASA | aB | a
    A → B | S | ε
    B → b
Example
• Step 2: remove ε-rule A → ε
    S0 → S
    S → ASA | aB | a | SA | AS | S
    A → B | S
    B → b
Example
• Step 3: remove unit rule S → S
    S0 → S
    S → ASA | aB | a | SA | AS
    A → B | S
    B → b
Example
• Step 3: remove unit rule S0 → S
    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → B | S
    B → b
Example
• Step 3: remove unit rule A → B
    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → S | b
    B → b
Example
• Step 3: remove unit rule A → S
    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → b | ASA | aB | a | SA | AS
    B → b
Example
• Step 4: convert remaining rules
    S0 → AA1 | UB | a | SA | AS
    S → AA1 | UB | a | SA | AS
    A → b | AA1 | UB | a | SA | AS
    B → b
    U → a
    A1 → SA
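The result of the conversion can be checked against the definition of Chomsky normal form. Below is a small sketch of such a check. The dictionary encoding (each variable mapped to a list of right-hand sides, each a tuple of symbols, with the empty tuple standing for ε) and the function name `is_cnf` are assumptions made for illustration.

```python
def is_cnf(rules: dict, start: str) -> bool:
    """Check every rule against the three allowed CNF shapes."""
    variables = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0:
                if head != start:          # only the start symbol may derive epsilon
                    return False
            elif len(body) == 1:
                if body[0] in variables:   # a unit rule A -> B is not allowed
                    return False
            elif len(body) == 2:
                # both symbols must be variables other than the start symbol
                if not all(sym in variables and sym != start for sym in body):
                    return False
            else:
                return False               # right-hand side too long
    return True

common = [("A", "A1"), ("U", "B"), ("a",), ("S", "A"), ("A", "S")]
converted = {
    "S0": list(common),
    "S": list(common),
    "A": [("b",)] + common,
    "B": [("b",)],
    "U": [("a",)],
    "A1": [("S", "A")],
}
original = {
    "S": [("A", "S", "A"), ("a", "B")],
    "A": [("B",), ("S",)],
    "B": [("b",), ()],
}
print(is_cnf(converted, "S0"), is_cnf(original, "S"))
```

The check only verifies the form of the rules; it does not verify that the two grammars generate the same language.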
Pushdown automata
• Pushdown automata (PDA) are like nondeterministic finite automata but have an extra component called a stack
• Can push symbols onto the stack
• Can pop them (read them back) later
• The stack is potentially unbounded
[Figure: schematic of a pushdown automaton. A state control reads the input (a a b a) and has access to a stack (holding x, y, z).]
Formal Definition
• A pushdown automaton is a 6-tuple (Q, ∑, S, ξ, q0, F), where
  • Q is a finite set of states
  • ∑ is a finite set of symbols called the input alphabet
  • S is the stack alphabet
  • ξ : Q x ∑ε x Sε → P(Q x Sε) is the transition function, where ∑ε = ∑ ∪ {ε}, Sε = S ∪ {ε}, and P denotes the power set
  • q0 ∈ Q is the start state
  • F ⊆ Q is the set of accept states or final states
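To see how the transition function is used, here is a minimal sketch of a nondeterministic PDA simulator. The example machine is not from the slides: it is a standard textbook PDA for {0^n 1^n | n ≥ 0}, chosen because that language was mentioned earlier as one finite automata cannot describe. The state names q1 to q4, the bottom marker `$`, and the dictionary encoding of the transition function are all assumptions. The search explores configurations breadth-first and terminates for this machine; a PDA that can push on ε-moves forever would need an extra bound.

```python
from collections import deque

EPS = ""  # stands for epsilon

# (state, input symbol or EPS, popped symbol or EPS) -> {(next state, pushed symbol or EPS)}
TRANSITIONS = {
    ("q1", EPS, EPS): {("q2", "$")},   # mark the bottom of the stack
    ("q2", "0", EPS): {("q2", "0")},   # push each 0 that is read
    ("q2", "1", "0"): {("q3", EPS)},   # first 1: pop a 0
    ("q3", "1", "0"): {("q3", EPS)},   # further 1s: pop a 0 each
    ("q3", EPS, "$"): {("q4", EPS)},   # stack back at the marker: done
}
START, ACCEPT = "q1", {"q1", "q4"}

def accepts(w: str) -> bool:
    """True if some computation reads all of w and ends in an accept state."""
    start = (START, 0, "")             # (state, input position, stack; top is last)
    seen, queue = {start}, deque([start])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(w) and state in ACCEPT:
            return True
        reads = [EPS] + ([w[pos]] if pos < len(w) else [])
        pops = [EPS] + ([stack[-1]] if stack else [])
        for a in reads:
            for x in pops:
                for nxt, push in TRANSITIONS.get((state, a, x), ()):
                    new_stack = (stack[:-1] if x else stack) + push
                    config = (nxt, pos + len(a), new_stack)
                    if config not in seen:
                        seen.add(config)
                        queue.append(config)
    return False

print(accepts("0011"), accepts("001"))
```

Each dictionary entry mirrors one value of the transition function: the key is an element of Q x ∑ε x Sε and the value is a set of (state, stack symbol) pairs, which is where the nondeterminism comes from.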