CS 3240 – Chapter 5 Context-Free Languages
Where Are We? CS 3240 - Introduction
Topics • 5.1: Context-Free Grammars • Derivations • Derivation Trees • 5.2: Parsing and Ambiguity • 5.3: CFGs and Programming Languages • Precedence • Associativity • Expression Trees CS 3240 - Context-Free Languages
A Curious Grammar • S ➞ aaSa | λ • It is not right-linear or left-linear • so it is not a “regular grammar” • But it is linear • at most one variable on the right-hand side of each rule • What is its language? CS 3240 - Context-Free Languages
A Grammar for aⁿbⁿ • S ➝ aSb | λ • Deriving aaabbb: S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb CS 3240 - Context-Free Languages
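The derivation above can be reproduced mechanically. The following is a minimal sketch, not course material: the function name `derive_anbn` is hypothetical, and it assumes the sentential form always contains exactly one S, which holds for this grammar.

```python
def derive_anbn(n):
    """Return the sentential forms in a derivation of a^n b^n using S -> aSb | λ."""
    current = "S"
    forms = [current]
    for _ in range(n):
        current = current.replace("S", "aSb")  # apply S -> aSb
        forms.append(current)
    current = current.replace("S", "")         # apply S -> λ to finish
    forms.append(current)
    return forms


print(" ⇒ ".join(derive_anbn(3)))
```

Calling `derive_anbn(3)` walks through the same four steps as the slide: three applications of S ➝ aSb followed by one application of S ➝ λ.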
Context-Free Grammars • Variables • aka “non-terminals” • Letters from some alphabet, Σ • aka “terminals” • Rules (“substitution rules”) • of the form V → s • where s is any string of letters and variables, or λ • Rules are often called productions CS 3240 - Context-Free Languages
Sample CFGs • aⁿcbⁿ • aⁿb²ⁿ • aⁿbᵐ, where 0 ≤ n ≤ m ≤ 2n • aⁿbᵐ, n ≠ m • Palindrome (start with a recursive definition) • Non-Palindrome • Equal • aⁿbⁿaᵐ CS 3240 - Context-Free Languages
A Grammar for Twice: n_b(w) = 2⋅n_a(w) • S → aSbSbS | bSaSbS | bSbSaS | λ • Trace ababbb • When building CFGs, remember that the start variable (S) represents a string in the language. So, for example, if S has twice as many b’s as a’s, then so does aSbSbS, etc. CS 3240 - Pushdown Automata
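The claim that every generated string has twice as many b’s as a’s can be spot-checked by enumerating short derivations. This is a rough sketch under stated assumptions: `PATTERNS` and `generate` are hypothetical names, and the check only covers strings whose rule nesting is at most the given depth. It tests that the grammar generates nothing outside the language; it does not show that every string in the language is generated.

```python
from itertools import product

# Terminals of S -> aSbSbS | bSaSbS | bSbSaS, in order (the λ rule is handled separately)
PATTERNS = [("a", "b", "b"), ("b", "a", "b"), ("b", "b", "a")]


def generate(depth):
    """Return every terminal string derivable with non-λ rules nested at most `depth` deep."""
    if depth == 0:
        return {""}                      # only S -> λ is allowed
    smaller = generate(depth - 1)
    result = {""}
    for t1, t2, t3 in PATTERNS:
        # Each of the three S's on the right-hand side expands to a shallower string.
        for x, y, z in product(smaller, repeat=3):
            result.add(t1 + x + t2 + y + t3 + z)
    return result


strings = generate(2)
print(all(s.count("b") == 2 * s.count("a") for s in strings))
print("ababbb" in strings)
```

The string ababbb appears at depth 2 because it is a·λ·b·(abb)·b·λ, that is, S → aSbSbS with the middle S expanded once more by the same rule.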
Derivations • A derivation is a sequence of applications of grammatical rules, eventually yielding a string in the language • A CFG can have multiple variables on the right-hand side of a rule • Giving a choice of which variable to expand first • By convention, we usually use a leftmost derivation CS 3240 - Context-Free Languages
A Leftmost Derivation • <S> → <NP> <VP> • <NP> → the <N> • <VP> → <V> <NP> • <V> → sings | eats • <N> → cat | song | canary • <S> ⇒ <NP> <VP> ⇒ the <N> <VP> ⇒ the canary <VP> ⇒ the canary <V> <NP> ⇒ the canary sings <NP> ⇒ the canary sings the <N> ⇒ the canary sings the song • Each intermediate string in the derivation is a “sentential form” CS 3240 - Context-Free Languages
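A leftmost derivation always expands the first variable in the sentential form. The sketch below illustrates that with the grammar from this slide; the names `GRAMMAR` and `leftmost_derivation` and the encoding of rule choices as a list of indices are assumptions made for the example.

```python
GRAMMAR = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["the", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<V>": [["sings"], ["eats"]],
    "<N>": [["cat"], ["song"], ["canary"]],
}


def leftmost_derivation(choices, start="<S>"):
    """Expand the leftmost variable repeatedly; `choices` picks among alternatives."""
    form = [start]
    steps = [" ".join(form)]
    choices = iter(choices)
    while True:
        # Find the position of the leftmost variable, if any remain.
        idx = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if idx is None:
            return steps                  # all terminals: derivation complete
        alternatives = GRAMMAR[form[idx]]
        # Only consume a choice when the variable has more than one rule.
        pick = next(choices) if len(alternatives) > 1 else 0
        form[idx:idx + 1] = alternatives[pick]
        steps.append(" ".join(form))


# canary (index 2 of <N>), sings (index 0 of <V>), song (index 1 of <N>)
for step in leftmost_derivation([2, 0, 1]):
    print(step)
```

Each printed line is one sentential form, in the same order as the derivation on the slide.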
Derivation Trees, aka “Parse Trees” • A graphical representation of a derivation • The start symbol is the root • Each symbol in the right-hand side of the rule is a child node at the same level • Continue until the leaves are all terminals CS 3240 - Context-Free Languages
A Derivation Tree CS 3240 - Context-Free Languages
Ambiguity (Section 5.2) • Note how there was only one parse tree for the string “the canary sings the song” • And only one leftmost derivation • This is not true of all grammars! • Some grammars allow choices of distinct rules to generate the same string • Or equivalently, there is more than one parse tree for the same string • Such a grammar is ambiguous • Not easy to process programmatically CS 3240 - Context-Free Languages
An Ambiguous Grammar: Derivation Perspective • <exp> → <exp> + <exp> | <exp> * <exp> | (<exp>) | a | b | c • <exp> ⇒ <exp> + <exp> ⇒ a + <exp> ⇒ a + <exp> * <exp> ⇒ a + b * <exp> ⇒ a + b * c • <exp> ⇒ <exp> * <exp> ⇒ <exp> + <exp> * <exp> ⇒ a + <exp> * <exp> ⇒ a + b * <exp> ⇒ a + b * c CS 3240 - Context-Free Languages
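The two leftmost derivations correspond to two parse trees, and that count can be confirmed by brute force. The following is an illustrative sketch: `count_parse_trees` is a hypothetical helper that assumes single-character tokens and counts the trees for a token span by trying every operator as the root.

```python
from functools import lru_cache


def count_parse_trees(expr):
    """Count parse trees for expr under <exp> -> <exp>+<exp> | <exp>*<exp> | (<exp>) | a | b | c."""
    tokens = tuple(expr.replace(" ", ""))

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from <exp>.
        span = tokens[i:j]
        total = 0
        if len(span) == 1 and span[0] in "abc":
            total += 1                                  # <exp> -> a | b | c
        if len(span) >= 3 and span[0] == "(" and span[-1] == ")":
            total += count(i + 1, j - 1)                # <exp> -> (<exp>)
        for k in range(i + 1, j - 1):
            if tokens[k] in "+*":                       # operator at the root
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("a + b * c"))
```

For a + b * c there is one tree with + at the root and one with * at the root, so the count is 2. Any count above 1 for some string shows the grammar is ambiguous.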
An Ambiguous Grammar: Parse Tree Perspective • Which one is “correct”? CS 3240 - Context-Free Languages
Parsing • The process of determining if a string is generated by a grammar • And often we want the parse tree • So that we know the order of operations • Top-down Parsing • Easiest conceptually • Bottom-up Parsing • Most efficient (used by commercial compilers) • We will use a simple one in Chapter 6 CS 3240 - Context-Free Languages
Top-Down Parsing • Try to match a string, w, to a grammar • If there is a rule S → w, we’re done! • Fat chance :-) • Try to find rules that match the first character • A “look-ahead” strategy • This is what we do “in our heads” anyway • Repeat on the rest of the string… • Very “brute force” CS 3240 - Context-Free Languages
Top-Down Parsing Example • S → SS | aSb | bSa | λ • Parse “aabb”: CS 3240 - Context-Free Languages
Top-Down Parsing Example • S → SS | aSb | bSa | λ • Parse “aabb”: • Candidate rules: 1) S → SS, 2) S → aSb: • SS ⇒ SSS, SS ⇒ aSbS • aSb ⇒ aSSb, aSb ⇒ aaSbb • Answer: S ⇒ aSb ⇒ aaSbb ⇒ aabb (2) • Not a well-defined algorithm (yet)! CS 3240 - Context-Free Languages
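The search on this slide can be sketched as a breadth-first exploration of leftmost sentential forms. This is only an illustration, not a proper parsing algorithm: `brute_force_parse` is a hypothetical name, and the cap `max_len` on the length of a sentential form is an arbitrary assumption added so the search terminates even though S → SS and S → λ let forms grow and shrink. A result of `None` therefore means “not found within the cap”.

```python
from collections import deque

RULES = ["SS", "aSb", "bSa", ""]   # S -> SS | aSb | bSa | λ


def brute_force_parse(w, max_len=None):
    """Breadth-first search for a leftmost derivation of w; returns the forms or None."""
    if max_len is None:
        max_len = 2 * len(w) + 1   # arbitrary cap (assumption), not a proven bound
    queue = deque([("S", ["S"])])
    seen = {"S"}
    while queue:
        form, path = queue.popleft()
        i = form.find("S")         # leftmost variable
        if i == -1:
            if form == w:
                return path
            continue
        for rhs in RULES:
            new = form[:i] + rhs + form[i + 1:]
            terminals = new.replace("S", "")
            prefix = new.split("S", 1)[0]   # terminals before the leftmost variable
            # Prune forms that are too long or whose terminal prefix cannot match w.
            if (len(new) <= max_len and len(terminals) <= len(w)
                    and w.startswith(prefix) and new not in seen):
                seen.add(new)
                queue.append((new, path + [new]))
    return None


print(brute_force_parse("aabb"))
```

The prefix check is the “look-ahead” idea from the previous slide: a form whose leading terminals already disagree with w is discarded without further expansion.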
Parsing by Recursive Descent • A top-down parsing technique • Grammar Requirements: • no ambiguity • no lambdas • no left-recursion (e.g., A -> Ab) • … and some other stuff • Create a function for each variable • Check first character to choose a rule • Start by calling S( ) CS 3240 - Context-Free Languages
Parsing aⁿbⁿ, n > 0, by Recursive Descent • Grammar: S -> aSb | ab • Function S: • if length == 2, check to see if it is “ab” • otherwise, consume outer ‘a’ and ‘b’, then call S on what’s left • See parseanbn.py, parseanbn2.py CS 3240 - Context-Free Languages
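The function S described above might look like the following. This is a sketch written from the bullet points, not the contents of parseanbn.py, and the name `parse_S` is an assumption.

```python
def parse_S(s):
    """Return True if s is derivable from S -> aSb | ab, i.e. s = a^n b^n with n > 0."""
    if len(s) == 2:
        return s == "ab"                       # base rule S -> ab
    if len(s) > 2 and s[0] == "a" and s[-1] == "b":
        return parse_S(s[1:-1])                # rule S -> aSb: strip the outer a and b
    return False                               # too short, or outer characters do not match


print(parse_S("aaabbb"), parse_S("aabbb"))
```

Because each recursive call removes two characters, the recursion depth is at most half the length of the input.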
Parsing b*a by Recursive Descent • Grammar: • A -> BA | a • B -> bB | b • See parsebstara.cpp CS 3240 - Context-Free Languages
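With one function per variable, a recursive-descent parser for this grammar could be sketched as follows. It is written in Python rather than C++ and is not the contents of parsebstara.cpp; the names `parse_b_star_a`, `parse_A`, and `parse_B` are assumptions. Since both B rules start with ‘b’, the sketch peeks one character further to choose between them.

```python
def parse_b_star_a(s):
    """Return True if s is in b*a, using A -> BA | a and B -> bB | b."""
    pos = 0  # index of the next unread character

    def parse_A():
        nonlocal pos
        if pos < len(s) and s[pos] == "b":     # first character 'b': use A -> BA
            return parse_B() and parse_A()
        if pos < len(s) and s[pos] == "a":     # first character 'a': use A -> a
            pos += 1
            return True
        return False

    def parse_B():
        nonlocal pos
        if pos < len(s) and s[pos] == "b":
            pos += 1                           # consume 'b'
            if pos < len(s) and s[pos] == "b":
                return parse_B()               # more b's follow: B -> bB
            return True                        # otherwise: B -> b
        return False

    # Accept only if A succeeds and the whole input was consumed.
    return parse_A() and pos == len(s)


print(parse_b_star_a("bba"), parse_b_star_a("ab"))
```

The final check `pos == len(s)` rejects inputs such as “ab”, where A succeeds on a prefix but characters remain.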
The Problem with λ • Lambda rules can cause sentential forms to shrink • Then they can grow, and shrink again • And grow, and shrink, and grow, and shrink… • How then can we know if the string isn’t in the language? • That is, how do we know when we’re done so we can stop and reject the string? CS 3240 - Context-Free Languages
Another Problem: “Unit Production Rules” • A rule of the form A → B doesn’t increase the size of the sentential form • Once again, we could spend a long time cycling through unit rules before the sentential form reaches length |w| • We prefer a method that always strictly grows to |w|, so we can stop and answer “yes” or “no” efficiently • So, we will remove lambda and unit rules • In Chapter 6 CS 3240 - Context-Free Languages
CFGs and Programming Languages (Section 5.3) • Precedence • Associativity CS 3240 - Context-Free Languages
Fixing Our Expression Grammar: Precedence • It was ambiguous because it treated all operators equally • But multiplication should have higher precedence than addition • So we introduce a new variable for multiplicative expressions • And place it further down in the rules • Because we want it to appear further down in the parse tree CS 3240 - Context-Free Languages
Giving Precedence • <exp> → <exp> + <mulexp> | <mulexp> • <mulexp> → <mulexp> * <rootexp> | <rootexp> • <rootexp> → (<exp>) | a | b | c • Now only one leftmost derivation for a + b * c: • <exp> ⇒ <exp> + <mulexp> ⇒ <mulexp> + <mulexp> ⇒ <rootexp> + <mulexp> ⇒ a + <mulexp> ⇒ a + <mulexp> * <rootexp> ⇒ a + <rootexp> * <rootexp> ⇒ a + b * <rootexp> ⇒ a + b * c CS 3240 - Context-Free Languages
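The effect of the layered grammar can be seen by building an expression tree for a + b * c. The sketch below is an assumption-laden illustration: `parse_expression` is a hypothetical name, tokens are single characters, and because recursive descent cannot handle left-recursive rules directly, each left-recursive rule is replaced by an equivalent loop that builds the same left-leaning tree. The result is an expression tree of nested tuples, not the full parse tree.

```python
def parse_expression(text):
    """Build an expression tree (nested tuples) for the <exp>/<mulexp>/<rootexp> grammar."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def exp():
        # <exp> -> <exp> + <mulexp> | <mulexp>, with the left recursion written as a loop
        tree = mulexp()
        while peek() == "+":
            advance()
            tree = ("+", tree, mulexp())
        return tree

    def mulexp():
        # <mulexp> -> <mulexp> * <rootexp> | <rootexp>, same loop treatment
        tree = rootexp()
        while peek() == "*":
            advance()
            tree = ("*", tree, rootexp())
        return tree

    def rootexp():
        # <rootexp> -> (<exp>) | a | b | c
        tok = peek()
        if tok == "(":
            advance()
            tree = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        if tok is not None and tok in "abc":
            advance()
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expression("a + b * c"))
```

The multiplication ends up deeper in the tree than the addition, which is what gives it higher precedence: it is evaluated first when the tree is evaluated bottom-up.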
Giving Precedence CS 3240 - Context-Free Languages
Associativity • Derive the parse tree for a + b + c … • Note how you get (a + b) + c, in effect • Left-recursion gives left associativity • Analogously for right associativity • Exercise: • Add a right-associative power (exponentiation) operator (^, with variable <powerexp>) to the grammar with the proper precedence CS 3240 - Context-Free Languages