Analyzing Ambiguity of Context-Free Grammars
Claus Brabrand, brabrand(at)itu.dk, IT Uni. of Copenhagen
Robert Giegerich, robert(at)TechFak.Uni-Bielefeld.de, University of Bielefeld, Germany
Anders Møller, amoeller(at)brics.dk, DAIMI, University of Aarhus
Outline • Introduction (and Motivation) • Characterization of Ambiguity • (aka. "Vertical-" and "Horizontal-" Ambiguity) • Framework (for Analyzing Ambiguity) • Regular Approximation (AMN) • Assessment (Applications and Examples) • Related Work • Conclusion
Motivation (for CFG Ambiguity)
1. Programming Languages (grammar G written by a computer scientist):
    STM  : EXP ";"
         | "if" "(" EXP ")" STM
         | "if" "(" EXP ")" STM "else" STM
         | "while" "(" EXP ")" "do" STM
    EXP  : EXP "*" TERM | EXP "/" TERM | TERM
    TERM : TERM "+" FACT | TERM "-" FACT | FACT
    FACT : CONST | VAR
  Program P: int f() { if (b) if (c) f(); else y++; }
  [Figure: P is fed to a parser for the programming language grammar G (CFG). Labels: "Unambiguous": what the programmer intended; "Ambiguous": P' ...]
2. Models of Real-World Physical Structures (grammar G written by an engineer):
    P : "(" P ")" | "(" O ")"
    O : L P | P R | S P S | H
    L : "." L | "."
    R : "." R | "."
    S : "." S | "."
    H : "." H | "." "." "."
  [Figure: the sequence ACGAT… is fed to a parser for the physical structure model G (CFG), giving a prediction M of the physical structure. Labels: "Unambiguous", "Ambiguous", M, M', "beneficial...", "lethal..."]
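Context-Free Grammar Ambiguity
• Ambiguity: multiple derivation trees?
  • Ambiguity means there exist a string $s \in \Sigma^*$ and derivation trees $T \neq T'$ that both derive $s$.
• However: Undecidable!
  • i.e., no one can decide this line: [Figure: the set of grammars divided by a line into "ambiguous" and "unambiguous"]
• However^2…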
However: Conservative Analysis!
• Use conservative (over-)approximation:
  • "Yes!" means "G guaranteed unambiguous!"
  • Safely use any GLR parser on G ...and never get two parses at runtime!
• ...just because it's undecidable doesn't mean there aren't (good) conservative approximations! Indeed, the whole area of static analysis works on "side-stepping undecidability".
[Figure: grammars split into "ambiguous" and "unambiguous"; a grammar G inside the approximated unambiguous region gets the answer "Yes!"]
Conservative Analysis (cont'd)
• Undecidability means: "there'll always be a slack":
  [Figure: between the "ambiguous" and "unambiguous" regions lies a band of grammars answered "Don't know?"]
• However, still useful!
• Possible interpretations of "Don't know?":
  • Treat as error (reject grammar):
    • "Please redesign your grammar" (as in LR(k))
  • Treat as warning:
    • "Here are some potential problems"
Problems with Existing Solutions
1. Hard to reason (locally) about ambiguity:
  • Intricate overall structural property of a grammar
2. Are "left-to-right" (or "right-to-left") biased:
  • Cannot handle "palindromic grammars" (...a serious problem for RNA analysis)!
3. Error messages:
  • Hard to "pin-point ambiguity" (in terms of grammar), e.g.:
      conflicts: 25 shift/reduce, 13 reduce/reduce
  • Also: would like "shortest examples" for debugging (...especially for grammar non-experts)!
Characterization of Ambiguity
• Theorem 1 (characterization):
  G is vertically unambiguous $\wedge$ G is horizontally unambiguous $\iff$ G is unambiguous
• Note:
  • Ambiguity fully characterized
  • Still undecidable (...of course)
  • Structural problem $\Rightarrow$ finite number of linguistic problems
Terminology: Context-Free Grammar
$G = \langle N, \Sigma, s, \pi \rangle$
• $N$: finite set of nonterminals
• $\Sigma$: finite set of terminals
• $s \in N$: start nonterminal
• $\pi : N \to \mathcal{P}(E^*)$: production function, $E = \Sigma \cup N$
Example:
    EXP : ID | EXP '+' EXP | EXP '*' EXP
Assume (trivially):
• Reachability (all $n \in N$ reachable from $s$)
• Productivity (all $n \in N$ derive some string)
$L : E^* \to \mathcal{P}(\Sigma^*)$ is the "language-of" operator; the language of G is $L(s)$.
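The grammar tuple and the two well-formedness assumptions above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the dictionary encoding (nonterminal mapped to a list of productions, every symbol that is not a key counts as a terminal) and the names `reachable`, `productive`, and `EXP_GRAMMAR` are assumptions made here.

```python
def reachable(grammar, start):
    """Nonterminals reachable from `start` (hypothetical helper)."""
    seen, todo = {start}, [start]
    while todo:
        n = todo.pop()
        for production in grammar[n]:
            for symbol in production:
                # Symbols that are keys of `grammar` are nonterminals.
                if symbol in grammar and symbol not in seen:
                    seen.add(symbol)
                    todo.append(symbol)
    return seen


def productive(grammar):
    """Nonterminals that derive at least one terminal string (fixed point)."""
    done = set()
    changed = True
    while changed:
        changed = False
        for n, productions in grammar.items():
            if n in done:
                continue
            # A production is productive if every symbol in it is a
            # terminal or an already-productive nonterminal.
            if any(all(s not in grammar or s in done for s in p)
                   for p in productions):
                done.add(n)
                changed = True
    return done


# The EXP example from the slide; ID, '+', and '*' are terminals here.
EXP_GRAMMAR = {"EXP": [("ID",), ("EXP", "+", "EXP"), ("EXP", "*", "EXP")]}
```

For `EXP_GRAMMAR` both checks return `{"EXP"}`, so the grammar satisfies the two assumptions.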
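Vertical Unambiguity
• "Vertical unambiguity" of G:
  $\forall n \in N : \forall \alpha, \alpha' \in \pi(n) : \alpha \neq \alpha' \Rightarrow L(\alpha) \cap L(\alpha') = \emptyset$
• Example ("xy"):
    S : 'x' Y | X 'y'
    Y : 'y'
    X : 'x'
  Vertically ambiguous string: xy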
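Horizontal Unambiguity
• "Horizontal unambiguity" of G:
  $\forall n \in N : \forall \alpha \in \pi(n) : \alpha = \alpha_l \alpha_r \Rightarrow \mathrm{overlap}(L(\alpha_l), L(\alpha_r)) = \emptyset$
  where the "overlap" operator, $\mathrm{overlap} : \mathcal{P}(\Sigma^*) \times \mathcal{P}(\Sigma^*) \to \mathcal{P}(\Sigma^*)$, is given by:
  $\mathrm{overlap}(X, Y) := \{\, xay \mid x, y \in \Sigma^* \wedge a \in \Sigma^+ \wedge x, xa \in X \wedge y, ay \in Y \,\}$
• Example ("xay"):
    S : X Y
    X : 'x' | 'x' 'a'
    Y : 'a' 'y' | 'y'
  Horizontally ambiguous string: xay
[Figure: the string "xay" split between X and Y in two ways, x|ay and xa|y]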
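For finite languages the overlap set { xay | x, y ∈ Σ*, a ∈ Σ+, x, xa ∈ X, y, ay ∈ Y } can be computed by brute force. The sketch below assumes languages are plain Python sets of strings; the function name `overlap` is chosen here for illustration and is not taken from the authors' tool.

```python
def overlap(X, Y):
    """Strings xay with a non-empty, x and xa in X, y and ay in Y."""
    result = set()
    for x in X:
        for xa in X:
            # xa must extend x by a non-empty middle part a.
            if len(xa) > len(x) and xa.startswith(x):
                a = xa[len(x):]
                for y in Y:
                    if a + y in Y:
                        result.add(x + a + y)
    return result


# The "xay" example: X = {x, xa} and Y = {ay, y}.
ambiguous = overlap({"x", "xa"}, {"ay", "y"})
```

Here `ambiguous` is `{"xay"}`: the middle part "a" can be attributed either to X or to Y, which is exactly the horizontal ambiguity of the production S : X Y.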
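Characterization of Ambiguity
• Theorem 1 (characterization):
  G is vertically unambiguous $\wedge$ G is horizontally unambiguous $\iff$ G is unambiguous
• Lemma 1a ("$\Rightarrow$", aka. "soundness"):
  G is vertically unambiguous $\wedge$ G is horizontally unambiguous $\Rightarrow$ G is unambiguous
• Lemma 1b ("$\Leftarrow$", aka. "completeness"):
  G is unambiguous $\Rightarrow$ G is vertically unambiguous $\wedge$ G is horizontally unambiguous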
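(Over-)Approximation (A)
• (Over-)Approximation: $A : E^* \to \mathcal{P}(\Sigma^*)$ over-approximates $L : E^* \to \mathcal{P}(\Sigma^*)$ when
  $\forall \alpha \in E^* : L(\alpha) \subseteq A(\alpha)$
• Approximated vertical unambiguity (of G under A):
  $\forall n \in N : \forall \alpha, \alpha' \in \pi(n) : \alpha \neq \alpha' \Rightarrow A(\alpha) \cap A(\alpha') = \emptyset$
• Approximated horizontal unambiguity (of G under A):
  $\forall n \in N : \forall \alpha \in \pi(n) : \alpha = \alpha_l \alpha_r \Rightarrow \mathrm{overlap}(A(\alpha_l), A(\alpha_r)) = \emptyset$
• A decidable: emptiness of "$\cap$" and "overlap" is decidable (on co-dom(A))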
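Unambiguity Approximation
• Proposition 2 (approximation soundness):
  G is vertically unambiguous under A $\wedge$ G is horizontally unambiguous under A $\Rightarrow$ G is unambiguous
• Proof:
  • "Larger sets don't overlap $\Rightarrow$ smaller sets don't overlap" (contrapositively: "Smaller sets conflict $\Rightarrow$ larger sets conflict"):
    $A(\alpha) \cap A(\alpha') = \emptyset \Rightarrow L(\alpha) \cap L(\alpha') = \emptyset$
    $\mathrm{overlap}(A(\alpha_l), A(\alpha_r)) = \emptyset \Rightarrow \mathrm{overlap}(L(\alpha_l), L(\alpha_r)) = \emptyset$
  • Hence approximated vertical (horizontal) unambiguity implies vertical (horizontal) unambiguity, and the result follows by transitivity via Theorem 1.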
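The "larger sets don't overlap" step can be spelled out as a short derivation. It is a sketch that uses only the over-approximation property ($L(\alpha) \subseteq A(\alpha)$ for all $\alpha$) and the definition of the overlap set $\{\, xay \mid a \in \Sigma^+,\ x, xa \in X,\ y, ay \in Y \,\}$; writing the operator as $\mathrm{overlap}(X, Y)$ is a notational choice made here.

```latex
\begin{align*}
L(\alpha) \subseteq A(\alpha),\ L(\alpha') \subseteq A(\alpha')
  &\implies L(\alpha) \cap L(\alpha') \subseteq A(\alpha) \cap A(\alpha') \\
A(\alpha) \cap A(\alpha') = \emptyset
  &\implies L(\alpha) \cap L(\alpha') = \emptyset \\[4pt]
X \subseteq X',\ Y \subseteq Y'
  &\implies \mathrm{overlap}(X, Y) \subseteq \mathrm{overlap}(X', Y')
  \quad \text{(since } x, xa \in X \subseteq X' \text{ and } y, ay \in Y \subseteq Y'\text{)} \\
\mathrm{overlap}(A(\alpha_l), A(\alpha_r)) = \emptyset
  &\implies \mathrm{overlap}(L(\alpha_l), L(\alpha_r)) = \emptyset
\end{align*}
```

Both operators are monotone in each argument, which is all the proof needs.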
Compositionality (of A's)
• Proposition 3 (compositionality):
  $A, A'$ decidable (over-)approximations $\Rightarrow$ $A \cap A'$ decidable (over-)approximation, with $(A \cap A')(\alpha) = A(\alpha) \cap A'(\alpha)$
• Proof:
  • Follows from definition [proof omitted]
• Also: "approximations are locally(!) compositional"
[Figure: the ambiguous/unambiguous boundary as drawn by A, by A', and by the tighter combination $A \cap A'$]
Are there any Approximations!?!
• Are there any approximations?!?
• YES!; e.g., "The worst approximation":
  • $A_{\Sigma^*}(\alpha) := \Sigma^*$ (everything; constant)
• Almost useless:
  • "Can only acquit totally trivial grammars as unambiguous", e.g.:
      N : 'x'
  • but safe(!)
[Figure: the worst approximation drawn against the ambiguous/unambiguous boundary]
Regular Approximation ($A_{MN}$)!
• $A_{MN}(\alpha) = [\text{Mohri-Nederhof}]_G(\alpha)$
  • Black-box: CFG $\to$ REG $\to$ DFA (over-)approximation
• Properties of this "$A_{MN}$":
  • Good (over-)approximation!
  • Produces regular languages:
    • almost everything is decidable (constructively, via automata)!
• Note:
  • Works on a language-level, L(G), ...
  • ...not on the structure-level of the grammar, G
• "Regular Approximation of Context-Free Grammars through Transformation" [Mohri-Nederhof, 2000]
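Example: Odd/Even
• Keeping track of parity (odd/even):
    Start : Even ;
          | Odd ;
    Even  : "(" "(" Even ")" ")" ;
          | ;
    Odd   : "(" "(" Odd ")" ")" ;
          | "(" ")" ;
  unambiguous grammar!
• Languages and their regular approximations:
  $L(\mathrm{Even}) = \{\, (^{2n}\, )^{2n} \mid n \geq 0 \,\}$
  $L(\mathrm{Odd}) = \{\, (^{2n+1}\, )^{2n+1} \mid n \geq 0 \,\}$
  $A(\mathrm{Even}) = \{\, (^{2n}\, )^{2m} \mid n, m \geq 0 \,\}$
  $A(\mathrm{Odd}) = \{\, (^{2n+1}\, )^{2m+1} \mid n, m \geq 0 \,\}$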
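The point of the odd/even example is that the two regular approximations, A(Even) (an even number of "(" followed by an even number of ")") and A(Odd) (odd counts of each), are still disjoint, so the vertical check on Start succeeds. A bounded sanity check can be sketched with Python regular expressions. The regexes are transcribed by hand from the sets on the slide, and enumerating strings up to a length bound is only evidence, not the automata-based decision procedure the tool uses; the names `A_EVEN`, `A_ODD`, and `common_strings` are assumptions.

```python
import re
from itertools import product

# A(Even): an even number of "(" followed by an even number of ")".
A_EVEN = re.compile(r"(\(\()*(\)\))*")
# A(Odd): an odd number of "(" followed by an odd number of ")".
A_ODD = re.compile(r"\((\(\()*\)(\)\))*")


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def common_strings(max_len):
    """Strings of bounded length that lie in both approximations."""
    return [s for s in strings_up_to("()", max_len)
            if A_EVEN.fullmatch(s) and A_ODD.fullmatch(s)]
```

Calling `common_strings(10)` returns an empty list, consistent with the approximations being disjoint: the parity of the number of "(" already separates them.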
Assessment (implementation) • Java implementation: • 7,400 lines of code (command-line + GUI interface) [ www.brics.dk/grammar/ ]
Technology Transfer • Integrated in DotVocal's "Grammar Studio": Ambiguity analysis: Grammar Studio provides developers a powerful algorithm to test the vertical and horizontal ambiguities. Erasing any ambiguity in a grammar means to improve the effectiveness and by consequence the recognition too.
Examples: Palindromes and "Anti-palindromes"
• Palindromic examples:
    P : "a" P "a" ;
      | ;
  unambiguous grammar!
    P : "a" P "a" ;
      | "b" P "b" ;
      | "b" ;
      | "a" ;
      | ;
  unambiguous grammar!
    P : "a" P "a" ;
      | "a" ;
      | ;
  unambiguous grammar!
• "Anti-palindromic" examples:
    R : "a" R "b" ;
      | "b" R "a" ;
      | "a" "b" ;
      | "b" "a" ;
  unambiguous grammar!
    R : "a" R "b" ;
      | "b" R "a" ;
      | ;
  unambiguous grammar!
Note: all are non-LR-Regular grammars!!
...inherent in RNA Analysis!!!
"Predicting behavior of genes":
"Complementary base pairs" // 'G-C', 'A-U', and 'G-U':
    R : 'G' R 'C' | 'C' R 'G' | 'A' R 'U' | 'U' R 'A' | 'G' R 'U' | 'U' R 'G' |
Examples: RNA Analysis (G1)
• RNA Analysis (G1):
    /* ambiguous */
    S[aa]    : "(" S ")" ;
     [aS]    | "." S ;
     [Sa]    | S "." ;
     [SS]    | S S ;
     [empty] | ;

    %> java -jar Grambiguity.jar G1.cfg
    *** vertical ambiguity detected: 'S[aS]' vs. 'S[Sa]'
        ambiguous string: "."
    *** vertical ambiguity detected: 'S[aa]' vs. 'S[SS]'
        ambiguous string: "()"
    *** vertical ambiguity detected: 'S[aS]' vs. 'S[SS]'
        ambiguous string: "."
    *** vertical ambiguity detected: 'S[Sa]' vs. 'S[SS]'
        ambiguous string: "."
    *** vertical ambiguity detected: 'S[SS]' vs. 'S[empty]'
        ambiguous string: ""
    *** horizontal ambiguity detected: 'S[SS:0..0]' vs. 'S[SS:1..1]'
        ambiguous string: "."
    *** ambiguous grammar:
        5 vertical ambiguities
        1 horizontal ambiguity
Examples: RNA Analysis (G2)
• RNA Analysis (G2):
    /* ambiguous */
    S[aPa]   : "(" P ")" ;
     [aS]    | "." S ;
     [Sa]    | S "." ;
     [SS]    | S S ;
     [empty] | ;
    P[aPa]   : "(" P ")" ;
     [S]     | S ;

    *** vertical ambiguity detected: 'S[aS]' vs. 'S[Sa]'
        ambiguous string: "."
    *** vertical ambiguity detected: 'S[aPa]' vs. 'S[SS]'
        ambiguous string: "()"
    *** vertical ambiguity detected: 'S[aS]' vs. 'S[SS]'
        ambiguous string: "."
    *** vertical ambiguity detected: 'S[Sa]' vs. 'S[SS]'
        ambiguous string: "."
    *** vertical ambiguity detected: 'S[SS]' vs. 'S[empty]'
        ambiguous string: ""
    *** vertical ambiguity detected: 'P[aPa]' vs. 'P[S]'
        ambiguous string: "()"
    *** horizontal ambiguity detected: 'S[SS:0..0]' vs. 'S[SS:1..1]'
        ambiguous string: "."
    *** ambiguous grammar:
        6 vertical ambiguities
        1 horizontal ambiguity
Examples: RNA Analysis (G3-G6)
• RNA Analysis (G3,G4,G5,G6):
  G4:
    S[aS]    : "." S ;
     [T]     | T ;
     [empty] | ;
    T[Ta]    : T "." ;
     [aSa]   | "(" S ")" ;
     [TaSa]  | T "(" S ")" ;
  G3:
    S[aPa]   : "(" P ")" ;
     [aL]    | "." L ;
     [Ra]    | R "." ;
     [LS]    | L S ;
    L[aPa]   : "(" P ")" ;
     [aL]    | "." L ;
    R[Ra]    : R "." ;
     [empty] | ;
    P[aPa]   : "(" P ")" ;
     [aNa]   | "(" N ")" ;
    N[aL]    : "." L ;
     [Ra]    | R "." ;
     [LS]    | L S ;
  G6:
    S[LS]    : L S ;
     [L]     | L ;
    L[aFa]   : "(" F ")" ;
     [a]     | "." ;
    F[aFa]   : "(" F ")" ;
     [LS]    | L S ;
  unambiguous grammar!
  G5:
    S[aS]    : "." S ;
     [aSaS]  | "(" S ")" S ;
     [empty] | ;
Examples: RNA Analysis (G7+G8)
• RNA Analysis (G7,G8):
  G7:
    S[aPa]   : "(" P ")" ;
     [aL]    | "." L ;
     [Ra]    | R "." ;
     [LS]    | L S ;
    L[aPa]   : "(" P ")" ;
     [aL]    | "." L ;
    R[Ra]    : R "." ;
     [empty] | ;
    P[aPa]   : "(" P ")" ;
     [aNa]   | "(" N ")" ;
    N[aL]    : "." L ;
     [Ra]    | R "." ;
     [LS]    | L S ;

    *** (potential) vertical ambiguity detected: 'P[aPa]' vs. 'P[aNa]'
        shortest ambiguous string: "(((.)"
    *** (potentially) ambiguous grammar:
        1 (potential) vertical ambiguity
        0 (potential) horizontal ambiguities
  G8:
    S[aS]    : "." S ;
     [T]     | T ;
     [empty] | ;
    T[Ta]    : T "." ;
     [aPa]   | "(" P ")" ;
     [TaPa]  | T "(" P ")" ;
    P[aPa]   : "(" P ")" ;
     [aNa]   | "(" N ")" ;
    N[aS]    : "." S ;
     [Ta]    | T "." ;
     [TaPa]  | T "(" P ")" ;

    *** (potential) vertical ambiguity detected: 'P[aPa]' vs. 'P[aNa]'
        shortest ambiguous string: "(((.)"
    *** (potentially) ambiguous grammar:
        1 (potential) vertical ambiguity
        0 (potential) horizontal ambiguities
Note: these are all spurious errors due to imprecisions in the analysis.
Acquitted as unambiguous using unfolding technique!
Examples: "voss" & "voss-light"
    P : "(" P ")" ;      // P: Closed structure
      | "(" O ")" ;
    O : L P ;            // O: Open structure
      | P R ;
      | S P S ;
      | H ;
    L : "." L ;          // L: Left bulge
      | "." ;
    R : "." R ;          // R: Right bulge
      | "." ;
    S : "." S ;          // S: Singlestrand
      | "." ;
    H : "." H ;          // H: Hairpin 3+loop
      | "." "." "." ;
  unambiguous grammar!
LR(k):
• LR(1) = 3 r/r conflicts
• LR(3) = 12 r/r conflicts
• LR(5) = 93 r/r conflicts
• LR(7) = 249 r/r conflicts
• LR(9) = 513 r/r conflicts
• ...
Example: Java Expressions
    Exp[assign] : Exp1 "=" Exp ;
       [exp1]   | Exp1 ;
    Exp1[or]    : Exp1 "||" Exp2 ;
        [exp2]  | Exp2 ;
    Exp2[and]   : Exp2 "&&" Exp3 ;
        [exp3]  | Exp3 ;
    Exp3[eq]    : Exp3 "==" Exp4 ;
        [neq]   | Exp3 "!=" Exp4 ;
        [exp4]  | Exp4 ;
    Exp4[lt]    : Exp4 "<" Exp5 ;
        [leq]   | Exp4 "<=" Exp5 ;
        [gt]    | Exp4 ">" Exp5 ;
        [geq]   | Exp4 ">=" Exp5 ;
        [exp5]  | Exp5 ;
    /* -- cont'd -- */
    Exp5[add]   : Exp5 "+" Exp6 ;
        [sub]   | Exp5 "-" Exp6 ;
        [exp6]  | Exp6 ;
    Exp6[mul]   : Exp6 "*" Exp7 ;
        [div]   | Exp6 "/" Exp7 ;
        [exp7]  | Exp7 ;
    Exp7[not]   : "!" Exp7 ;
        [exp8]  | Exp8 ;
    Exp8[par]   : "(" Exp ")" ;
        [con]   | Con ;
    Con[num]    : "0" ;
       [id]     | "x" ;
  unambiguous grammar!
Error Messages (Amb. Example)
• Ambiguous Expressions:
    E[plus] : E "+" E ;
     [mult] | E "*" E ;
     [x]    | "x" ;

    *** vertical ambiguity detected: 'E[plus]' vs. 'E[mult]'
        ambiguous string: "x*x+x"
    *** horizontal ambiguity detected: 'E[plus:0..0]' vs. 'E[plus:1..2]'
        ambiguous string: "x+x+x"
    *** horizontal ambiguity detected: 'E[plus:0..1]' vs. 'E[plus:2..2]'
        ambiguous string: "x+x+x"
    *** horizontal ambiguity detected: 'E[mult:0..0]' vs. 'E[mult:1..2]'
        ambiguous string: "x*x*x"
    *** horizontal ambiguity detected: 'E[mult:0..1]' vs. 'E[mult:2..2]'
        ambiguous string: "x*x*x"
    *** ambiguous grammar:
        1 vertical ambiguity
        4 horizontal ambiguities
• Annotations on the slide: precedence "+" vs. "*" (the vertical ambiguity); assoc. of "+" and assoc. of "*" (the horizontal ambiguities)
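The reported strings can be cross-checked by counting derivation trees directly. The following is a minimal sketch, independent of the authors' tool: it counts the trees of a string by trying every way to split it among the symbols of a production. It assumes the grammar has no empty productions and no cycles of unit productions (true for the expression grammar above), and the names `count_trees` and `AMB_EXP` are chosen here for illustration.

```python
from functools import lru_cache

# E : E "+" E | E "*" E | "x"   (symbols that are not keys are terminals)
AMB_EXP = {"E": [("E", "+", "E"), ("E", "*", "E"), ("x",)]}


def count_trees(grammar, start, text):
    """Number of derivation trees of `text` from `start`.

    Assumes no empty productions and no unit-production cycles,
    so every symbol consumes at least one character.
    """
    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        if symbol not in grammar:            # terminal symbol
            return 1 if text[i:j] == symbol else 0
        return sum(seq(p, i, j) for p in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(symbols, i, j):
        if not symbols:
            return 1 if i == j else 0
        head, rest = symbols[0], symbols[1:]
        total = 0
        # head takes text[i:k]; leave at least one character per
        # remaining symbol.
        for k in range(i + 1, j - len(rest) + 1):
            left = derive(head, i, k)
            if left:
                total += left * seq(rest, k, j)
        return total

    return derive(start, 0, len(text))
```

Under these assumptions `count_trees(AMB_EXP, "E", "x+x+x")` and `count_trees(AMB_EXP, "E", "x*x+x")` both give 2, matching the associativity and precedence ambiguities in the messages, while `"x"` has exactly one tree.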
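Benchmark Grammars
[Figure: the benchmark grammars placed in nested classes LALR(1), LR(1), LR(2), ..., LR(8), ..., LR(k) inside UNAMBIGUOUS, next to the AMBIGUOUS region and the region accepted by our analysis ([OUR]). Grammars shown: G1 (5V+1H), G2 (6V+1H), G3, G4, G5, G6, G7, G8, Exp, Amb-Exp (1V+4H), O/E, P, R, Base, Voss, Voss-light.]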
Benchmarks (from Schmitz 2007) Unambiguous
Benchmarks (from Schmitz 2007) Ambiguous
Related Work (Dynamic)
• Dynamic disambiguation:
  • "Disambiguation-by-convention":
    • Longest match, most specific match, …
  • Customizable:
    • [Bison v. 1.5+]: %dprec, %merge
    • [ASF+SDF]: "disambiguation filters"
• Dynamic ambiguity interception:
  • GLR ([Tomita], [Earley], [Bison], [ASF+SDF], …)
  • [AMBER]
Related Work (Static)
• Static disambiguation:
  • "Disambiguation-by-convention":
    • First match, most specific match, …
  • Customizable:
    • [Yacc]: %left, %right, %nonassoc, %prec
• Static ambiguity interception:
  • Our work goes here
  • LL(k), LALR(1), LR(k), LR-regular, …
  • Sylvain Schmitz: "Conservative Ambiguity Detection in Context-Free Grammars" (ICALP 2007), "An Experimental Ambiguity Detection Tool" (LDTA 2007)
    • Subsumes LR-regular, incomparable to our technique
    • Example:
        S : A A
        A : 'a' A 'a' | 'b'
Comparative Related Work • "Ambiguity Detection Methods for Context-Free Grammars" • H. J. S. Bas Basten (Master's thesis) • CWI, Universiteit van Amsterdam, Holland • "Ambiguity Detection for Context-Free Grammars in Eli" • Michael Kruse (Master's thesis) • Uni. Paderborn, Germany
Conclusion • Advantages (of our approach): • Characterization! • Possible to reason (locally) about ambiguity • (Composable) Analysis Framework • Complete decision procedure for regular grammars • Inherently parallelizable • DFA Counterexamples: • and shortest (possibly) ambiguous string • Not "left-to-right" or "right-to-left" biased: • Can handle palindromic grammars • Well-suited for RNA analysis :)
Conclusion (cont'd) “Analyzing Ambiguity of Context-Free Grammars” It has been known since 1962 that the ambiguity problem for context-free grammars is undecidable. Ambiguity in context-free grammars is a recurring problem in language design and parser generation, as well as in applications where grammars are used as models of real-world physical structures. However, the fact that the problem is undecidable does not mean that there are no useful approximations to the problem. We observe that there is a simple linguistic characterization of the grammar ambiguity problem, and we show how to exploit this to conservatively approximate the problem based on local regular approximations and grammar unfoldings. As an application, we consider grammars that occur in RNA analysis in bioinformatics, and we demonstrate that our static analysis of context-free grammars is sufficiently precise and efficient to be practically useful.
Thank you Questions, please?
Other Approximation Strategies • The ”EmptyString” Approximation: • The ”MayMust” Approximation: • …
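Asymptotic (Time) Complexity
• [Mohri-Nederhof]: $O(n^2 v h)$
• Vertical Amb: $O(n^3 v^4 h^4)$
• Horizontal Amb: $O(n^3 v^3 h^5)$
• Total: $O(n^3 v^3 h^4 (v + h)) \subseteq O(g^5)$
where:
• $n = |N|$
• $v = \max\{\, |\pi(m)| : m \in N \,\}$
• $h = \max\{\, |\alpha| : \alpha \in \pi(m),\ m \in N \,\}$
• $g = nvh = |G|$
[Figure: a grammar drawn as $n$ nonterminals, each of the shape $N_1 : e_{1,1} \ldots e_{a,1} \mid \ldots \mid e_{1,p} \ldots e_{a,p}$, with $v$ alternatives of width $h$]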
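$A_{MN}$ is Decidable!
• $X \cap Y = \emptyset$:
  • Constructively decidable (using DFAs):
    • $O(|X_{DFA}| \cdot |Y_{DFA}|)$
• $\mathrm{overlap}(X, Y) = \emptyset$:
  • Constructively decidable (using DFAs):
    • $O(|X_{DFA}| \cdot |Y_{DFA}|)$
• Approximated vertical and horizontal unambiguity under $A_{MN}$:
  • Constructively decidable
  • with potential counterexamples (as DFAs); i.e., we can extract shortest (potentially ambiguous) strings!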
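The intersection-emptiness check with a shortest counterexample can be illustrated with the textbook product construction plus breadth-first search, which visits at most |X_DFA| · |Y_DFA| state pairs. This is a sketch under assumptions, not the tool's code: the `DFA` encoding (partial transition dictionary, missing entries reject) and the name `shortest_common_string` are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class DFA(NamedTuple):
    start: int
    accepting: frozenset
    delta: dict  # (state, symbol) -> state; a missing entry rejects


def shortest_common_string(x, y, alphabet):
    """Shortest string accepted by both DFAs, or None if disjoint."""
    first = (x.start, y.start)
    queue = deque([(first, "")])
    seen = {first}
    while queue:
        (p, q), word = queue.popleft()
        if p in x.accepting and q in y.accepting:
            return word  # BFS order makes this a shortest witness
        for symbol in alphabet:
            p2 = x.delta.get((p, symbol))
            q2 = y.delta.get((q, symbol))
            if p2 is None or q2 is None:
                continue
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                queue.append(((p2, q2), word + symbol))
    return None


# Both alternatives of  S : 'x' Y | X 'y'  generate exactly {"xy"}.
XY = DFA(0, frozenset({2}), {(0, "x"): 1, (1, "y"): 2})
# Even versus odd numbers of "a": disjoint languages.
STEP = {(0, "a"): 1, (1, "a"): 0}
EVEN = DFA(0, frozenset({0}), STEP)
ODD = DFA(0, frozenset({1}), STEP)
```

With these definitions `shortest_common_string(XY, XY, "xy")` yields `"xy"`, the vertically ambiguous string of the earlier example, and `shortest_common_string(EVEN, ODD, "a")` yields `None`.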
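Decision Algorithm for overlap(X, Y)
• For X, Y regular languages (NFAs):
  • All overlappings, "xay" (as DFA's)
  • (essentially a variant of "DFA product-construction", '$\times$')
[Figure: the string "xay" split between X and Y in two ways; automata $X_{NFA}$ and $Y_{NFA}$, their variants $X'_{NFA}$ and $Y'_{NFA}$, and the combined automaton $[X;Y]_{NFA}$, with the paths reading x, a, and y marked]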
Example: Expressions
• Expressions:
    E[term] : T ;
     [plus] | E "+" T ;
    T[x]    : "x" ;
     [par]  | "(" E ")" ;

    *** (potential) vertical ambiguity detected: 'E[term]' vs. 'E[plus]'
        shortest ambiguous string: "x+x"
    *** (potential) horizontal ambiguity detected: 'E[plus:0..0]' vs. 'E[plus:1..2]'
        shortest ambiguous string: "x+x+x"
    *** (potential) horizontal ambiguity detected: 'E[plus:0..1]' vs. 'E[plus:2..2]'
        shortest ambiguous string: "x+x+x"
    *** (potentially) ambiguous grammar:
        1 (potential) vertical ambiguity
        2 (potential) horizontal ambiguities
• Note: General problem with non-linear recursive structures
• However, there's a trick...