Analyzing Ambiguity of Context-Free Grammars. Claus Brabrand brabrand(at)brics.dk DAIMI, University of Aarhus. Robert Giegerich robert(at)TechFak.Uni-Bielefeld.de University of Bielefeld, Germany. Anders Møller amoeller(at)brics.dk DAIMI, University of Aarhus.
Outline • Introduction (and Motivation) • Characterization of Ambiguity • (aka. "Vertical-" and "Horizontal-" Ambiguity) • Framework (for Analyzing Ambiguity) • Regular Approximation (AMN) • Assessment (Applications and Examples) • Related Work • Conclusion
Motivation (for CFG Ambiguity)
1 Programming Languages (Computer Scientist)
• G: programming language (CFG):
    STM  : EXP ";" | "if" "(" EXP ")" STM | "if" "(" EXP ")" STM "else" STM | "while" "(" EXP ")" "do" STM
    EXP  : EXP "*" TERM | EXP "/" TERM | TERM
    TERM : TERM "+" FACT | TERM "-" FACT | FACT
    FACT : CONST | VAR
• Input program: int f() { if (b) if (c) f(); else y++; }
[Figure: the program is fed to a parser for G; labels: "Unambiguous", P ("what the programmer intended"), "Ambiguous ...", P']
2 Models of Real-World Physical Structures (Engineer)
• G: physical structure model (CFG):
    P : "(" P ")" | "(" O ")"
    O : L P | P R | S P S | H
    L : "." L | "."
    R : "." R | "."
    S : "." S | "."
    H : "." H | "." "." "."
• Input sequence: AACGGAGCGGTGGCATCGGAT CGACTTT
[Figure: the sequence is fed to a parser for G; labels: "Unambiguous", M ("prediction of physical structure"), "Ambiguous", M', "beneficial...", "lethal..."]
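Context-Free Grammar Ambiguity
• However: Undecidable!
• i.e., no one can decide this line:
[Figure: the space of grammars divided by a line into "ambiguous" and "unambiguous"]
• However^2…
• Ambiguity: $\exists \omega \in \Sigma^*$ with multiple derivation trees?
• Ambiguity means there exist derivation trees $T \neq T'$, both rooted at $s$, that derive the same string $\omega$.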
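To make "multiple derivation trees" concrete, the following is a minimal sketch that counts the derivation trees of one given string. The dictionary encoding of the grammar and the helper name `count_trees` are assumptions made for illustration; they are not part of the tool described in these slides. The sketch assumes a grammar with no empty productions and no unit cycles, so that every symbol covers at least one character and the recursion terminates.

```python
from functools import lru_cache

# Hypothetical encoding: nonterminal -> list of productions (tuples of symbols).
# This is the ambiguous expression grammar E : E "+" E | E "*" E | "x".
GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("x",)],
}


def count_trees(grammar, start, text):
    """Count derivation trees of `text` from `start`.

    Assumes no empty productions and no unit cycles.
    """

    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        # Number of ways `symbol` derives text[i:j].
        if symbol not in grammar:  # terminal symbol
            return 1 if text[i:j] == symbol else 0
        return sum(seq(prod, i, j) for prod in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(symbols, i, j):
        # Number of ways the symbol sequence derives text[i:j].
        head, rest = symbols[0], symbols[1:]
        if not rest:
            return sym(head, i, j)
        # head covers text[i:k]; every remaining symbol needs >= 1 character.
        return sum(
            sym(head, i, k) * seq(rest, k, j)
            for k in range(i + 1, j - len(rest) + 1)
        )

    return sym(start, 0, len(text))


if __name__ == "__main__":
    for s in ["x", "x+x", "x*x+x", "x+x+x"]:
        print(s, count_trees(GRAMMAR, "E", s))
```

A count above 1 for some string witnesses ambiguity. A count of 1 for every string tried proves nothing, which is why a static analysis is needed.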
However: Conservative Analysis!
• Use conservative (over-)approximation:
• “Yes!” $\Rightarrow$ “G guaranteed unambiguous!”
• Safely use any GLR parser on G ...and never get two parses at runtime!
• ...just because it’s undecidable doesn’t mean there aren’t (good) conservative approximations! Indeed, the whole area of static analysis works on “side-stepping undecidability”.
[Figure: ambiguous vs. unambiguous grammars; G lies inside the "Yes!" region on the unambiguous side]
Conservative Analysis (cont'd)
• Undecidability means: “there’ll always be a slack”:
[Figure: ambiguous vs. unambiguous grammars, with a "Don't know?" region that contains some unambiguous grammars]
• However, still useful!
• Possible interpretations of “Don't know?”:
• Treat as error (reject grammar):
• “Please redesign your grammar” (as in LR(k))
• Treat as warning:
• “Here are some potential problems”
Problems with Existing Solutions
1 Hard to reason (locally) about ambiguity:
• Intricate structural property of a grammar
2 Are "left-to-right" (or "right-to-left") biased:
• Cannot handle "palindromic grammars" (...a serious problem for RNA analysis)!
3 Error messages:
• e.g., "conflicts: 7 shift/reduce, 9 reduce/reduce"
• Hard to "pin-point ambiguity" (in terms of grammar)
• Also: would like "shortest examples" for debugging (...especially for grammar non-experts)!
Terminology: Context-Free Grammar
• $G = \langle N, \Sigma, s, \pi \rangle$
• $N$: finite set of nonterminals
• $\Sigma$: finite set of terminals
• $s \in N$: start nonterminal
• $\pi: N \to \mathcal{P}(E^*)$: production function, where $E = \Sigma \cup N$
• $L_G: E^* \to \mathcal{P}(\Sigma^*)$: "language-of" operator, with $L(G) = L_G(s)$
• Assume (trivially):
• Reachability (all $n \in N$ reachable from $s$)
• Productivity (all $n \in N$ derive some string)
• Example: EXP : ID | EXP '+' EXP | EXP '*' EXP
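Vertical Unambiguity
• “Vertical unambiguity”: G is vertically unambiguous iff
  $\forall n \in N: \forall \alpha, \alpha' \in \pi(n): \alpha \neq \alpha' \Rightarrow L(\alpha) \cap L(\alpha') = \emptyset$
• Example ("xy"):
    S : 'x' Y | X 'y'
    Y : 'y'
    X : 'x'
• Vertically ambiguous string: xy
• ~ “reduce/reduce conflict” in [Yacc]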
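The vertical condition can be illustrated with a bounded check: enumerate the strings of each production up to a length limit and intersect them pairwise. This is only a sketch for small examples, not a decision procedure (the exact problem is undecidable). The grammar encoding and all function names below are assumptions introduced for illustration.

```python
from itertools import combinations

# Hypothetical encoding of the slide's example grammar.
GRAMMAR = {
    "S": [("x", "Y"), ("X", "y")],
    "Y": [("y",)],
    "X": [("x",)],
}


def lang_of(symbols, grammar, lang, max_len):
    """Strings (length <= max_len) derivable from a sequence of symbols."""
    result = {""}
    for s in symbols:
        part = lang[s] if s in grammar else {s}
        result = {u + v for u in result for v in part if len(u + v) <= max_len}
    return result


def languages(grammar, max_len):
    """Least fixpoint: bounded language of every nonterminal."""
    lang = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, prods in grammar.items():
            for prod in prods:
                new = lang_of(prod, grammar, lang, max_len)
                if not new <= lang[n]:
                    lang[n] |= new
                    changed = True
    return lang


def vertical_conflicts(grammar, max_len):
    """Pairs of productions of one nonterminal that share a bounded string."""
    lang = languages(grammar, max_len)
    conflicts = []
    for n, prods in grammar.items():
        for p, q in combinations(prods, 2):
            common = (lang_of(p, grammar, lang, max_len)
                      & lang_of(q, grammar, lang, max_len))
            if common:
                shortest = min(common, key=lambda w: (len(w), w))
                conflicts.append((n, p, q, shortest))
    return conflicts


if __name__ == "__main__":
    print(vertical_conflicts(GRAMMAR, 4))
```

An empty result only says that no conflict exists up to the chosen length; a conservative analysis instead replaces the exact languages by decidable over-approximations.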
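Horizontal Unambiguity
• “Horizontal unambiguity”: G is horizontally unambiguous iff
  $\forall n \in N: \forall \alpha \in \pi(n): \alpha = \alpha_l \alpha_r \Rightarrow L(\alpha_l) ⩕ L(\alpha_r) = \emptyset$
• where the "overlap" operator $⩕: \mathcal{P}(\Sigma^*) \times \mathcal{P}(\Sigma^*) \to \mathcal{P}(\Sigma^*)$ is given by:
  $X ⩕ Y := \{\, xay \mid x, y \in \Sigma^* \wedge a \in \Sigma^+ \wedge x, xa \in X \wedge y, ay \in Y \,\}$
[Figure: a string xay in which the segment a can be read either as the end of X or as the beginning of Y]
• Example ("xay"):
    S : 'x' V W
    V : 'a' |
    W : 'a' 'y' | 'y'
• Horizontally ambiguous string: xay
• ~ “shift/reduce conflict” in [Yacc]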
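For finite languages the overlap operator can be computed directly from its definition. The sketch below does exactly that; the function name `overlap` and the use of Python sets of strings are assumptions for illustration, and the real analysis works on automata rather than on finite sets.

```python
def overlap(X, Y):
    """All strings xay with a non-empty, x and xa in X, y and ay in Y."""
    result = set()
    for xa in X:
        for cut in range(len(xa)):  # a = xa[cut:] is never empty
            x, a = xa[:cut], xa[cut:]
            if x not in X:
                continue
            for y in Y:
                if a + y in Y:
                    result.add(x + a + y)
    return result


if __name__ == "__main__":
    # Production S : 'x' V W from the slide, split as ('x' V) | (W):
    left = {"x", "xa"}    # L('x' V), since V derives 'a' or the empty string
    right = {"y", "ay"}   # L(W)
    print(overlap(left, right))

    # The other split, ('x') | (V W), has no overlap:
    print(overlap({"x"}, {"y", "ay", "aay"}))
```

The first call yields the horizontally ambiguous string xay from the slide: the middle "a" can belong to either side of the split.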
Characterization of Ambiguity
• Theorem 1 (characterization): G vertically unambiguous $\wedge$ G horizontally unambiguous $\Leftrightarrow$ G unambiguous
• Lemma 1a (“$\Rightarrow$”, aka. "soundness"): G vertically unambiguous $\wedge$ G horizontally unambiguous $\Rightarrow$ G unambiguous
• Lemma 1b (“$\Leftarrow$”, aka. "completeness"): G unambiguous $\Rightarrow$ G vertically unambiguous $\wedge$ G horizontally unambiguous
• The proofs are in the Tech. Report (straightforward induction proofs)
• Note:
• Ambiguity fully characterized
• Still undecidable (...of course)
• Structural problem $\rightarrow$ finite number of linguistic problems
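(Over-)Approximation (A)
• (Over-)Approximation, A: $A_G: E^* \to \mathcal{P}(\Sigma^*)$ such that $\forall \alpha \in E^*: L_G(\alpha) \subseteq A_G(\alpha)$ (recall $L_G: E^* \to \mathcal{P}(\Sigma^*)$)
• Approximated vertical unambiguity (w.r.t. A):
  $\forall n \in N: \forall \alpha, \alpha' \in \pi(n): \alpha \neq \alpha' \Rightarrow A_G(\alpha) \cap A_G(\alpha') = \emptyset$
• Approximated horizontal unambiguity (w.r.t. A):
  $\forall n \in N: \forall \alpha \in \pi(n): \alpha = \alpha_l \alpha_r \Rightarrow A_G(\alpha_l) ⩕ A_G(\alpha_r) = \emptyset$
• A is decidable when emptiness of “$\cap$” and “$⩕$” is decidable (on co-dom(A))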
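Unambiguity Approximation
• Proposition 2 (approximation soundness): G vertically and horizontally unambiguous w.r.t. A $\Rightarrow$ G vertically and horizontally unambiguous, and hence by transitivity (via Theorem 1) G unambiguous
• Proof:
• "Larger sets don't overlap $\Rightarrow$ smaller sets don't overlap" (equivalently: "Conflicts w/ smaller sets $\Rightarrow$ conflicts w/ larger sets"):
  $A(\alpha) \cap A(\alpha') = \emptyset \Rightarrow L(\alpha) \cap L(\alpha') = \emptyset$
  $A(\alpha_l) ⩕ A(\alpha_r) = \emptyset \Rightarrow L(\alpha_l) ⩕ L(\alpha_r) = \emptyset$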
Compositionality (of A's)
• Proposition 3 (compositionality): A, A’ decidable (over-)approximations $\Rightarrow$ $A \cap A'$ decidable (over-)approximation
• Proof:
• Follows from definition [proof omitted]
• Also: “approximations are locally(!) compositional”
[Figure: ambiguous vs. unambiguous grammars with the regions accepted by A, by A’, and by $A \cap A'$]
Are there any Approximations!?!
• Are there any approximations?!?
• YES!; e.g., "The worst approximation":
• $A_{\Sigma^*}(\alpha) := \Sigma^*$ (everything, constant)
• Almost useless:
• “Can only acquit totally trivial grammars as unambiguous”, e.g. N : 'x'
• but safe(!)
[Figure: ambiguous vs. unambiguous grammars, with the small region accepted by the worst approximation]
Regular Approximation ($A_{MN}$)!
• $A_{MN}(\alpha) = [\text{Mohri-Nederhof}]_G(\alpha)$
• CFG $\to$ REG (DFA) (over-)approximation, used as a black box
• “Regular Approximation of Context-Free Grammars through Transformation” [Mohri-Nederhof, 2000]
• Properties of this “black box”:
• Good (over-)approximation!
• Produces regular languages:
• almost everything is decidable (constructively, via automata)!
• Note:
• Works on the language level, L(G), ...
• ...not on the structure level of the grammar, G
Assessment (implementation)
• Java impl.: "grambiguity" (510 lines), using:
• "dk.brics.automaton" [ http://www.brics.dk/automaton/ ]
• "dk.brics.grammar" [ http://www.brics.dk/grammar/ ]
• Java String Analyzer [ http://www.brics.dk/JSA/ ]
• Grammar P:
    /* unambiguous */
    P[aPa]   : "a" P "a" ;
     [a]     | "a" ;
     [empty] | ;
  Output:
    unambiguous grammar!
• Grammar E:
    /* ambiguous */
    E[plus] : E "+" E ;
     [mult] | E "*" E ;
     [x]    | "x" ;
  Output:
    *** (potential) vertical ambiguity detected: 'E[plus]' vs. 'E[mult]'
        shortest ambiguous string: "x*x+x"
    *** (potential) horizontal ambiguity detected: 'E[plus:0..0]' vs. 'E[plus:1..2]'
        shortest ambiguous string: "x+x+x"
    *** (potential) horizontal ambiguity detected: 'E[plus:0..1]' vs. 'E[plus:2..2]'
        shortest ambiguous string: "x+x+x"
    *** (potential) horizontal ambiguity detected: 'E[mult:0..0]' vs. 'E[mult:1..2]'
        shortest ambiguous string: "x*x*x"
    *** (potential) horizontal ambiguity detected: 'E[mult:0..1]' vs. 'E[mult:2..2]'
        shortest ambiguous string: "x*x*x"
    *** (potentially) ambiguous grammar:
        1 (potential) vertical ambiguity
        4 (potential) horizontal ambiguities
Examples: Palindromes and "Anti-palindromes"
• Palindromic examples:
    P : "a" P "a" ; | ;
    unambiguous grammar!

    P : "a" P "a" ; | "a" ; | ;
    unambiguous grammar!

    P : "a" P "a" ; | "b" P "b" ; | "b" ; | "a" ; | ;
    unambiguous grammar!
• "Anti-palindromic" examples:
    R : "a" R "b" ; | "b" R "a" ; | ;
    unambiguous grammar!

    R : "a" R "b" ; | "b" R "a" ; | "a" "b" ; | "b" "a" ;
    unambiguous grammar!
• Note: all are non-LR-Regular grammars!!
...inherent in RNA Analysis!!!
• "Predicting behavior of genes":
• "Complementary base pairs": 'G-C', 'A-U', and 'G-U':
    R : 'G' R 'C' | 'C' R 'G' | 'A' R 'U' | 'U' R 'A' | 'G' R 'U' | 'U' R 'G' |
Examples: RNA Analysis (G1)
• RNA Analysis (G1):
    /* ambiguous */
    S[aa]    : "(" S ")" ;
     [aS]    | "." S ;
     [Sa]    | S "." ;
     [SS]    | S S ;
     [empty] | ;
  Output:
    %> java -jar Grambiguity.jar G1.cfg
    *** (potential) vertical ambiguity detected: 'S[aS]' vs. 'S[Sa]'
        shortest ambiguous string: "."
    *** (potential) vertical ambiguity detected: 'S[aa]' vs. 'S[SS]'
        shortest ambiguous string: "()"
    *** (potential) vertical ambiguity detected: 'S[aS]' vs. 'S[SS]'
        shortest ambiguous string: "."
    *** (potential) vertical ambiguity detected: 'S[Sa]' vs. 'S[SS]'
        shortest ambiguous string: "."
    *** (potential) vertical ambiguity detected: 'S[SS]' vs. 'S[empty]'
        shortest ambiguous string: ""
    *** (potential) horizontal ambiguity detected: 'S[SS:0..0]' vs. 'S[SS:1..1]'
        shortest ambiguous string: "."
    *** (potentially) ambiguous grammar:
        5 (potential) vertical ambiguities
        1 (potential) horizontal ambiguity
Examples: RNA Analysis (G2)
• RNA Analysis (G2):
    /* ambiguous */
    S[aPa]   : "(" P ")" ;
     [aS]    | "." S ;
     [Sa]    | S "." ;
     [SS]    | S S ;
     [empty] | ;
    P[aPa]   : "(" P ")" ;
     [S]     | S ;
  Output:
    *** (potential) vertical ambiguity detected: 'S[aS]' vs. 'S[Sa]'
        shortest ambiguous string: "."
    *** (potential) vertical ambiguity detected: 'S[aPa]' vs. 'S[SS]'
        shortest ambiguous string: "()"
    *** (potential) vertical ambiguity detected: 'S[aS]' vs. 'S[SS]'
        shortest ambiguous string: "."
    *** (potential) vertical ambiguity detected: 'S[Sa]' vs. 'S[SS]'
        shortest ambiguous string: "."
    *** (potential) vertical ambiguity detected: 'S[SS]' vs. 'S[empty]'
        shortest ambiguous string: ""
    *** (potential) vertical ambiguity detected: 'P[aPa]' vs. 'P[S]'
        shortest ambiguous string: "()"
    *** (potential) horizontal ambiguity detected: 'S[SS:0..0]' vs. 'S[SS:1..1]'
        shortest ambiguous string: "."
    *** (potentially) ambiguous grammar:
        6 (potential) vertical ambiguities
        1 (potential) horizontal ambiguity
Examples: RNA Analysis (G3-G6)
• RNA Analysis (G3,G4,G5,G6):
  G3:
    S[aPa]   : "(" P ")" ;
     [aL]    | "." L ;
     [Ra]    | R "." ;
     [LS]    | L S ;
    L[aPa]   : "(" P ")" ;
     [aL]    | "." L ;
    R[Ra]    : R "." ;
     [empty] | ;
    P[aPa]   : "(" P ")" ;
     [aNa]   | "(" N ")" ;
    N[aL]    : "." L ;
     [Ra]    | R "." ;
     [LS]    | L S ;
  G4:
    S[aS]    : "." S ;
     [T]     | T ;
     [empty] | ;
    T[Ta]    : T "." ;
     [aSa]   | "(" S ")" ;
     [TaSa]  | T "(" S ")" ;
  G5:
    S[aS]    : "." S ;
     [aSaS]  | "(" S ")" S ;
     [empty] | ;
  G6:
    S[LS]    : L S ;
     [L]     | L ;
    L[aFa]   : "(" F ")" ;
     [a]     | "." ;
    F[aFa]   : "(" F ")" ;
     [LS]    | L S ;
  Output:
    unambiguous grammar!
• Similarly for 'G7' and 'G8' (using an unfolding trick)
Examples: "voss" & "voss-light"
    P : "(" P ")" ;      // P: Closed structure
      | "(" O ")" ;
    O : L P ;            // O: Open structure
      | P R ;
      | S P S ;
      | H ;
    L : "." L ;          // L: Left bulge
      | "." ;
    R : "." R ;          // R: Right bulge
      | "." ;
    S : "." S ;          // S: Singlestrand
      | "." ;
    H : "." H ;          // H: Hairpin 3+loop
      | "." "." "." ;
  Output:
    unambiguous grammar!
• LR(k):
    LR(1) = 3 r/r conflicts
    LR(3) = 12 r/r conflicts
    LR(5) = 93 r/r conflicts
    LR(7) = 249 r/r conflicts
    LR(9) = 513 r/r conflicts
    ...
Example: Java Expressions
    Exp[assign] : Exp1 "=" Exp ;
       [exp1]   | Exp1 ;
    Exp1[or]    : Exp1 "||" Exp2 ;
        [exp2]  | Exp2 ;
    Exp2[and]   : Exp2 "&&" Exp3 ;
        [exp3]  | Exp3 ;
    Exp3[eq]    : Exp3 "==" Exp4 ;
        [neq]   | Exp3 "!=" Exp4 ;
        [exp4]  | Exp4 ;
    Exp4[lt]    : Exp4 "<" Exp5 ;
        [leq]   | Exp4 "<=" Exp5 ;
        [gt]    | Exp4 ">" Exp5 ;
        [geq]   | Exp4 ">=" Exp5 ;
        [exp5]  | Exp5 ;
    /* -- cont'd -- */
    Exp5[add]   : Exp5 "+" Exp6 ;
        [sub]   | Exp5 "-" Exp6 ;
        [exp6]  | Exp6 ;
    Exp6[mul]   : Exp6 "*" Exp7 ;
        [div]   | Exp6 "/" Exp7 ;
        [exp7]  | Exp7 ;
    Exp7[not]   : "!" Exp7 ;
        [exp8]  | Exp8 ;
    Exp8[par]   : "(" Exp ")" ;
        [con]   | Con ;
    Con[num]    : "0" ;
       [id]     | "x" ;
  Output:
    unambiguous grammar!
Error Messages (Amb. Example)
• Ambiguous Expressions:
    E[plus] : E "+" E ;
     [mult] | E "*" E ;
     [x]    | "x" ;
  Output (with the cause of each message):
    *** (potential) vertical ambiguity detected: 'E[plus]' vs. 'E[mult]'
        shortest ambiguous string: "x*x+x"                    <- precedence "+" vs. "*"
    *** (potential) horizontal ambiguity detected: 'E[plus:0..0]' vs. 'E[plus:1..2]'
        shortest ambiguous string: "x+x+x"                    <- assoc. of "+"
    *** (potential) horizontal ambiguity detected: 'E[plus:0..1]' vs. 'E[plus:2..2]'
        shortest ambiguous string: "x+x+x"                    <- assoc. of "+"
    *** (potential) horizontal ambiguity detected: 'E[mult:0..0]' vs. 'E[mult:1..2]'
        shortest ambiguous string: "x*x*x"                    <- assoc. of "*"
    *** (potential) horizontal ambiguity detected: 'E[mult:0..1]' vs. 'E[mult:2..2]'
        shortest ambiguous string: "x*x*x"                    <- assoc. of "*"
    *** (potentially) ambiguous grammar:
        1 (potential) vertical ambiguity
        4 (potential) horizontal ambiguities
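Benchmark Grammars
[Figure: benchmark grammars placed in nested grammar classes LALR(1), LR(1), LR(2), ..., LR(8), ..., LR(k) inside UNAMBIGUOUS, with AMBIGUOUS outside and the region accepted by the analysis marked [OUR]. Grammars shown: G1 (5V+1H), G2 (6V+1H), G3, G4, G5, G6, G7, G8, Exp, Amb-Exp (1V+4H), O/E, P, R, Base, Voss, Voss-light. The exact placement of each grammar is not recoverable from the extracted text.]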
Related Work (Dynamic)
• Dynamic disambiguation:
• “Disambiguation-by-convention”:
• Longest match, most specific match, …
• Customizable:
• [Bison v. 1.5+]: %dprec, %merge
• [ASF+SDF]: “disambiguation filters”
• Dynamic ambiguity interception:
• GLR ([Tomita], [Earley], [Bison], [ASF+SDF], …)
Related Work (Static)
• Static disambiguation:
• “Disambiguation-by-convention”:
• First match, most specific match, …
• Customizable:
• [Yacc]: %left, %right, %nonassoc, %prec
• Static ambiguity interception:
• Our work goes here
• LL(k), LALR(1), LR(k), LR-regular, …
• Sylvain Schmitz (ICALP 2007): "Conservative Ambiguity Detection in Context-Free Grammars"
• Subsumes LR-regular
• Incomparable to our technique, e.g.:
    S : A A
    A : 'a' A 'a' | 'b'
Conclusion • Advantages (of our approach): • Characterization! • Possible to reason (locally) about ambiguity • (Composable) Analysis Framework • Complete decision procedure for regular grammars • Inherently parallelizable • DFA Counterexamples: • and shortest (possibly) ambiguous string • Not "left-to-right" or "right-to-left" biased: • Can handle palindromic grammars • Well-suited for RNA analysis :)
Conclusion (cont'd) “Analyzing Ambiguity of Context-Free Grammars” It has been known since 1962 that the ambiguity problem for context-free grammars is undecidable. Ambiguity in context-free grammars is a recurring problem in language design and parser generation, as well as in applications where grammars are used as models of real-world physical structures. However, the fact that the problem is undecidable does not mean that there are no useful approximations to the problem. We observe that there is a simple linguistic characterization of the grammar ambiguity problem, and we show how to exploit this to conservatively approximate the problem based on local regular approximations and grammar unfoldings. As an application, we consider grammars that occur in RNA analysis in bioinformatics, and we demonstrate that our static analysis of context-free grammars is sufficiently precise and efficient to be practically useful.
Thank you Questions, please?
Advertisement • Film about teaching/learning: • based on educational research theories: • Freely available on google video + on DVD (subtitles in 7 languages) • Used on all continents for teaching teachers about teaching and learning • 3,500+ DVDs (non-profit) sold in a few months • 17,000+ online views • Features epilogue by Prof. John Biggs [ http://www.daimi.au.dk/~brabrand/short-film/ ]
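Asymptotic (Time) Complexity
• [Mohri-Nederhof]: $O(n^2 v h)$
• Vertical Amb: $O(n^3 v^4 h^4)$
• Horizontal Amb: $O(n^3 v^3 h^5)$
• Total: $O(n^3 v^3 h^4 (v+h)) \subseteq O(g^5)$
• Grammar shape: $N_1 : e_{1,1} \ldots e_{a,1} \mid \ldots \mid e_{1,p} \ldots e_{a,p}$
• $n = |N|$ (number of nonterminals)
• $v = \max \{\, |\pi(N_i)| : N_i \in N \,\}$ (max number of productions per nonterminal)
• $h = \max \{\, |\alpha| : \alpha \in \pi(N_i), N_i \in N \,\}$ (max length of a production)
• $g = nvh = |G|$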
Other (cheaper) approximations
• Use cheaper approximations first:
• e.g.: $\langle F, M, L \rangle$, where F = set of first chars, M = set of middle chars, L = set of last chars
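Example: Odd/Even
• Keeping track of parity (odd/even):
    Start : Even ;
          | Odd ;
    Even  : "(" "(" Even ")" ")" ;
          | ;
    Odd   : "(" "(" Odd ")" ")" ;
          | "(" ")" ;
  Output:
    unambiguous grammar!
• $L(\mathit{Even}) = \{\, (^{2n} \, )^{2n} \mid n \geq 0 \,\}$
• $L(\mathit{Odd}) = \{\, (^{2n+1} \, )^{2n+1} \mid n \geq 0 \,\}$
• $A(\mathit{Even}) = \{\, (^{2n} \, )^{2m} \mid n, m \geq 0 \,\}$
• $A(\mathit{Odd}) = \{\, (^{2n+1} \, )^{2m+1} \mid n, m \geq 0 \,\}$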
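The two approximated languages above are regular, so their disjointness can be spot-checked with ordinary regular expressions. The sketch below encodes A(Even) and A(Odd) as Python regexes (this encoding is an assumption for illustration, not output of the tool) and searches all bracket strings up to a length bound for a string that lies in both.

```python
import re
from itertools import product

# Assumed regex encodings of the approximations stated on the slide:
#   A(Even): an even number of "(" followed by an even number of ")"
#   A(Odd):  an odd number of "(" followed by an odd number of ")"
A_EVEN = re.compile(r"(\(\()*(\)\))*")
A_ODD = re.compile(r"\((\(\()*\)(\)\))*")


def common_strings(max_len):
    """Strings over {(, )} of length <= max_len that lie in both approximations."""
    hits = []
    for n in range(max_len + 1):
        for chars in product("()", repeat=n):
            w = "".join(chars)
            if A_EVEN.fullmatch(w) and A_ODD.fullmatch(w):
                hits.append(w)
    return hits


if __name__ == "__main__":
    print(common_strings(10))                    # no witness found
    print(bool(A_EVEN.fullmatch("(())")))        # L(Even) string lies in A(Even)
    print(bool(A_ODD.fullmatch("((()))")))       # L(Odd) string lies in A(Odd)
```

Because the approximations keep the parity of the bracket counts, the two productions of Start stay separable even though the exact nesting is lost.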
$A_{MN}$ is Decidable!
• $X \cap Y = \emptyset$:
• Constructively decidable (using DFAs): $O(|X_{DFA}| \cdot |Y_{DFA}|)$
• $X ⩕ Y = \emptyset$:
• Constructively decidable (using DFAs): $O(|X_{DFA}| \cdot |Y_{DFA}|)$
• Vertical and horizontal unambiguity w.r.t. $A_{MN}$:
• Constructively decidable
• with potential counterexamples (as DFAs); i.e., we can extract shortest (potentially ambiguous) strings!
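The intersection-emptiness test with a shortest counterexample can be sketched as a breadth-first search over the product of two DFAs, which visits at most |X_DFA| · |Y_DFA| state pairs. The dictionary-based DFA encoding and the function name below are assumptions for illustration; this is not the dk.brics.automaton API.

```python
from collections import deque


def shortest_common_string(dfa_x, dfa_y, alphabet):
    """Return a shortest string accepted by both DFAs, or None if none exists.

    Each DFA is a dict with keys "start", "accept" (a set of states) and
    "delta" (a partial map from (state, char) to state).
    """
    start = (dfa_x["start"], dfa_y["start"])
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (p, q), word = queue.popleft()
        if p in dfa_x["accept"] and q in dfa_y["accept"]:
            return word
        for c in alphabet:
            p2 = dfa_x["delta"].get((p, c))
            q2 = dfa_y["delta"].get((q, c))
            if p2 is None or q2 is None:
                continue  # one automaton rejects: dead product state
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                queue.append(((p2, q2), word + c))
    return None


# X = a*b and Y = ab*: they share the string "ab".
X = {"start": 0, "accept": {1}, "delta": {(0, "a"): 0, (0, "b"): 1}}
Y = {"start": 0, "accept": {1}, "delta": {(0, "a"): 1, (1, "b"): 1}}

# EVEN = (aa)* and ODD = a(aa)*: disjoint languages.
EVEN = {"start": 0, "accept": {0}, "delta": {(0, "a"): 1, (1, "a"): 0}}
ODD = {"start": 0, "accept": {1}, "delta": {(0, "a"): 1, (1, "a"): 0}}

if __name__ == "__main__":
    print(shortest_common_string(X, Y, "ab"))
    print(shortest_common_string(EVEN, ODD, "a"))
```

Breadth-first order is what makes the returned witness a shortest one, matching the "shortest ambiguous string" lines in the tool output.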
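Decision Algorithm for (X ⩕ Y)
• For X, Y regular languages (NFAs):
• All overlappings, “xay” (as DFAs)
• (essentially a variant of the "DFA product-construction")
[Figure: automata X_NFA and Y_NFA and copies X'_NFA and Y'_NFA combined into [X;Y]_NFA; x is read in X, y is read in Y, and the overlapping segment a is read along a shared path]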
Examples: RNA Analysis (G7)
• RNA Analysis (G7,G8):
  G7:
    S[aPa]   : "(" P ")" ;
     [aL]    | "." L ;
     [Ra]    | R "." ;
     [LS]    | L S ;
    L[aPa]   : "(" P ")" ;
     [aL]    | "." L ;
    R[Ra]    : R "." ;
     [empty] | ;
    P[aPa]   : "(" P ")" ;
     [aNa]   | "(" N ")" ;
    N[aL]    : "." L ;
     [Ra]    | R "." ;
     [LS]    | L S ;
  Output:
    *** (potential) vertical ambiguity detected: 'P[aPa]' vs. 'P[aNa]'
        shortest ambiguous string: "(((.)"
    *** (potentially) ambiguous grammar:
        1 (potential) vertical ambiguity
        0 (potential) horizontal ambiguities
  G8:
    S[aS]    : "." S ;
     [T]     | T ;
     [empty] | ;
    T[Ta]    : T "." ;
     [aPa]   | "(" P ")" ;
     [TaPa]  | T "(" P ")" ;
    P[aPa]   : "(" P ")" ;
     [aNa]   | "(" N ")" ;
    N[aS]    : "." S ;
     [Ta]    | T "." ;
     [TaPa]  | T "(" P ")" ;
  Output:
    *** (potential) vertical ambiguity detected: 'P[aPa]' vs. 'P[aNa]'
        shortest ambiguous string: "(((.)"
    *** (potentially) ambiguous grammar:
        1 (potential) vertical ambiguity
        0 (potential) horizontal ambiguities
• Note: these are all spurious errors due to imprecisions in the analysis
Example: Expressions
• Expressions:
    E[term] : T ;
     [plus] | E "+" T ;
    T[x]    : "x" ;
     [par]  | "(" E ")" ;
  Output:
    *** (potential) vertical ambiguity detected: 'E[term]' vs. 'E[plus]'
        shortest ambiguous string: "x+x"
    *** (potential) horizontal ambiguity detected: 'E[plus:0..0]' vs. 'E[plus:1..2]'
        shortest ambiguous string: "x+x+x"
    *** (potential) horizontal ambiguity detected: 'E[plus:0..1]' vs. 'E[plus:2..2]'
        shortest ambiguous string: "x+x+x"
    *** (potentially) ambiguous grammar:
        1 (potential) vertical ambiguity
        2 (potential) horizontal ambiguities
• Note: General problem with non-linear recursive structures
• However, there's a trick...
Examples: Expressions (cont'd)
• Expressions: unfold trick: (inside/outside) parentheses
  G:
    E[term] : T ;
     [plus] | E "+" T ;
    T[x]    : "x" ;
     [par]  | "(" E ")" ;
• Unfold G wrt. '(' and ')' to obtain G_u
[Figure: the AST of x+(x+(x+x)+x)+x under G next to the AST_u of the same string under G_u; G_u repeats the E and T productions of G in several copies, whose distinguishing subscripts are not recoverable from the extracted text]
  Output (for G_u):
    unambiguous grammar!
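Proof (Lemma 1a): “$\Rightarrow$”
• Lemma 1a: G vertically unambiguous $\wedge$ G horizontally unambiguous $\Rightarrow$ G unambiguous
• …contrapositively: G ambiguous $\Rightarrow$ G vertically ambiguous $\vee$ G horizontally ambiguous
• Proof structure:
• Assume G ambiguous (i.e., 2 derivation trees for some string $\omega$)
• Show: G vertically ambiguous $\vee$ G horizontally ambiguous
• by induction in max height of the 2 derivation trees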
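Proof (Lemma 1a): “$\Rightarrow$” (Base)
• Base case (height 1):
• The ambiguity means that there are two derivation trees of height 1 for $\omega$ from a nonterminal $N$, using productions $\alpha$ and $\alpha'$: $N \Rightarrow^1 \omega$ in both cases
• However, this means that: $\alpha = t_0 t_1 \ldots t_{|\omega|-1} = \alpha'$ (i.e., the two trees must be the same); and so the result holds vacuously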