Automata and Logic Characterization of Floyd Languages • Violetta Lonati DSI - Università degli Studi di Milano • Dino Mandrioli DEI - Politecnico di Milano • Matteo Pradella DEI - Politecnico di Milano
Rather unusual presentation • No outline at the beginning • Only …
1. Short summary of Floyd languages and grammars (they are a little outdated …) • In 1963 R. Floyd introduced Operator Precedence Grammars (OPGs), a subclass of context-free grammars, with the goal of developing efficient parsing techniques. • OPGs (here named FGs after their inventor) are inspired by the structure of arithmetic expressions (and their operators)
The basics of Floyd Grammars (1) • operator form (normal for CF): • No adjacent nonterminals • precedences • balanced letters (A → aBb, …) are equal in precedence (≐) • precedences between letters inspired by arithmetic precedences, e.g. + ⋖ × • adjacent letter precedences determine the syntax tree: • S → A; A → bAc | bc • ⋖ b ⋖ b ≐ c ⋗ c ⋗ • Reduction: ⋖ b ≐ c ⋗ ⇒ A (reverse of A → bc) • ⋖ b A c ⋗ : b ≐ A c (nonterminals are “transparent”) • Reduction: ⋖ b ≐ A c ⋗ ⇒ A (reverse of A → bAc) • Reduction: A ⇒ S (reverse of S → A)
The basics of Floyd Grammars (2) • G’s (conflict-free) Operator Precedence Matrix, OPM, with rows and columns indexed by b and c • b ⋖ L(A) • R(A) ⋗ c • [Figure: the OPM of G and the rule A → b A c annotated with ⋖ and ⋗]
The basics of Floyd Grammars (3) • G1 = {E → E + T | T, T → T × a | a} • G2 = {E → E + T | T, T → T × F | F, F → (E) | a} • G1’s precedences are: • a ⋗ +, a ⋗ × • + ⋗ +, + ⋖ ×, + ⋖ a • × ≐ a • NB: implicitly: ⋖ , ⋗ • [Figure: syntax tree of a G1 sentence built with E → E + T and T → T × a, annotated with precedence relations]
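The precedences listed for G1 can be recomputed mechanically. The following is a minimal sketch, assuming the usual textbook construction: L(A) and R(A) are the sets of terminals that can appear leftmost and rightmost in strings derived from A, and each right-hand side contributes ≐, ⋖ and ⋗ pairs. The function name `precedence_relations` and the encoding of ⋖, ≐, ⋗ as "<", "=", ">" are illustrative choices, not taken from the slides.

```python
from collections import defaultdict

def precedence_relations(rules, nonterminals):
    """Derive (x, rel, y) triples from a grammar in operator form.

    rules: list of (lhs, rhs) pairs, rhs being a list of symbols.
    """
    left = defaultdict(set)    # L(A): leftmost terminals derivable from A
    right = defaultdict(set)   # R(A): rightmost terminals derivable from A
    changed = True
    while changed:             # fixpoint iteration over all rules
        changed = False
        for lhs, rhs in rules:
            new_left, new_right = set(), set()
            if rhs[0] in nonterminals:
                new_left |= left[rhs[0]]
                if len(rhs) > 1:
                    new_left.add(rhs[1])      # terminal, by operator form
            else:
                new_left.add(rhs[0])
            if rhs[-1] in nonterminals:
                new_right |= right[rhs[-1]]
                if len(rhs) > 1:
                    new_right.add(rhs[-2])    # terminal, by operator form
            else:
                new_right.add(rhs[-1])
            if not new_left <= left[lhs] or not new_right <= right[lhs]:
                left[lhs] |= new_left
                right[lhs] |= new_right
                changed = True

    relations = set()
    for _, rhs in rules:
        for i in range(len(rhs) - 1):
            x, y = rhs[i], rhs[i + 1]
            if x not in nonterminals and y not in nonterminals:
                relations.add((x, "=", y))                    # a b   gives a = b
            elif x not in nonterminals:
                relations |= {(x, "<", b) for b in left[y]}   # a B   gives a < L(B)
                if i + 2 < len(rhs):
                    relations.add((x, "=", rhs[i + 2]))       # a B b gives a = b
            else:
                relations |= {(a, ">", y) for a in right[x]}  # B b   gives R(B) > b
    return relations

# G1 from the slide, and the small grammar S -> A; A -> bAc | bc.
G1 = [("E", ["E", "+", "T"]), ("E", ["T"]),
      ("T", ["T", "×", "a"]), ("T", ["a"])]
G = [("S", ["A"]), ("A", ["b", "A", "c"]), ("A", ["b", "c"])]

g1_relations = precedence_relations(G1, {"E", "T"})
g_relations = precedence_relations(G, {"S", "A"})
```

Run on G1, the sketch returns exactly the six relations listed on the slide; on the grammar S → A, A → bAc | bc it returns b ⋖ b, b ≐ c and c ⋗ c. The relations involving the string delimiters are not computed here.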
2. A question raised by a reviewer • “Why studying operator precedence languages now-a-days? just for fun??” • Certainly we had and have fun while investigating FG properties (fun is a subjective feeling, but this should not be an exception, at least within a TCS community …) • However, not just for fun:
2.1 FGs have been abandoned • Unlike more powerful classes (LR) they cannot generate all deterministic CF languages • (but this is more a theoretical than a practical weakness) • They were originally motivated by parsing, and new powerful parsing techniques emerged … though rarely they exhibited the simplicity and efficiency of FG-based ones.
2.2 A more recent and still quite alive and productive result: Model checking (MC) (Remark: Both FGs and MC contributed to granting a Turing award …) • What has MC to do with FGs? • MC is rooted in basic closure properties + decidability of the emptiness problem • These properties are typically enjoyed by regular (finite-state, FS) languages • MC exploits the automata-theoretic and logic (MSO) characterizations of FS languages
2.3 A large amount of literature strove to extend the scope of MC beyond the limits of FS machines • The typical goal is to keep the properties that allow for the application of MC algorithms • Among the various attempts, Visibly Pushdown Languages (VPLs) have certainly been quite successful • VPLs generalize parenthesis languages: • { ( } = Σc , { ) } = Σr , VT = Σi • Calls (open parentheses) and returns (closed ones) are not necessarily matched: • Unmatched returns at the beginning of the string • Unmatched calls at the end (acceptance with non-empty stack)
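The call/return discipline above can be sketched with an explicit stack. This is only an illustration of how the alphabet partition alone dictates the stack moves; the function name `match_calls_returns` and its return format are assumptions made for the example, not part of any VPA library.

```python
def match_calls_returns(word, calls, returns):
    """Pair calls with returns; report what stays unmatched.

    Symbols outside `calls` and `returns` are treated as internal
    and never touch the stack.
    """
    pending = []             # positions of calls still waiting for a return
    matched = []             # (call position, return position) pairs
    unmatched_returns = []   # returns read while no call was pending
    for pos, sym in enumerate(word):
        if sym in calls:
            pending.append(pos)
        elif sym in returns:
            if pending:
                matched.append((pending.pop(), pos))
            else:
                unmatched_returns.append(pos)
    # Calls left in `pending` are unmatched: the stack is non-empty at the end.
    return matched, unmatched_returns, pending

# ")" at position 0 is an unmatched return; "(" at 2 and 6 are unmatched calls.
result = match_calls_returns(")a(b()(", calls={"("}, returns={")"})
```

Here `result` is `([(4, 5)], [0], [2, 6])`: one matched pair, one unmatched return, and two calls left on the stack.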
VPLs inherit the main properties of regular languages: • Closed w.r.t. boolean operations • Closed w.r.t. concatenation, Kleene *, prefix, suffix, … • By keeping the partitioning of Σ unaffected • With some “care” about reversal and homomorphism • Deterministic VPAs are equivalent to nondeterministic ones … • With a typical power-set construction • MSO logic characterization • In summary: they resume and extend the original work by McNaughton and others on tree automata.
2.4 Somewhat surprisingly …(at least for us) • VPLs are a proper subclass of FLs • Crespi-Reghizzi and Mandrioli (JCSS, 2012, # 6) • Precisely, they are all and only those FLs characterized by a • Partitioned Precedence matrix:
2.5 FLs also share the classical closure properties enjoyed by regular languages and VPLs • FLs are closed w.r.t. (Crespi and Mandrioli, 1978 and … 2010) • Boolean operations • Concatenation and Kleene * (more difficult to prove than for other classes of languages) • Prefix and suffix • … • Thus they are perfect candidates to further extend MC techniques to infinite-state machines
2.6 But studying FGs was abandoned a long time ago … • (Somewhat surprisingly) an automata family associated with (accepting all and only) FLs was still lacking • (Less surprisingly) an MSO logic characterization was also lacking: • Two important contributors to the power of MC • So: • Not just for fun • Incidentally: • FLs, unlike general deterministic languages, enjoy a local parsability property which enables parallel and incremental parsing (Barenghi et al., SLE 2012), which “now-a-days” is probably more interesting than 40 years ago
3. Floyd automata (FAs) • The transition function can be seen as the union of two disjoint functions: • push: Q × Σ → 2^Q, flush: Q × Q → 2^Q • Push and mark moves both push the input symbol on the top of the stack, together with the new state computed by push; such moves differ only in the marking of the symbol on top of the stack. • The flush move is more complex: the symbols on the top of the stack are removed until the first marked symbol (included), and the state of the next symbol below them in the stack is updated by flush according to the pair of states that delimit the portion of the stack to be removed.
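The three moves can be sketched as a small deterministic simulator. This is an informal reading of the description above, not the formal definition from the paper. Assumptions made for the example: the move is selected by the precedence relation between the symbol on top of the stack and the next input symbol (⋖ gives a mark move, ≐ a push move, ⋗ a flush move), the current state is the one stored on top of the stack, "#" delimits the input, and relations are encoded as "<", "=", ">". The example automaton (states q0, q1, qf) is hypothetical.

```python
def run_floyd_automaton(word, opm, push, flush, initial, finals, end="#"):
    """Simulate a deterministic Floyd automaton on `word`.

    opm:   dict (stack symbol, input symbol) -> "<", "=" or ">"
    push:  dict (state, input symbol) -> state
    flush: dict (state on top, state below the flushed portion) -> state
    """
    # Stack entries are [symbol, marked, state].
    stack = [[end, False, initial]]
    tokens = list(word) + [end]
    i = 0
    while True:
        top_symbol, _, top_state = stack[-1]
        look = tokens[i]
        if look == end and len(stack) == 1:
            return top_state in finals
        rel = opm.get((top_symbol, look))
        if rel in ("<", "=") and look != end:
            nxt = push.get((top_state, look))
            if nxt is None:
                return False
            stack.append([look, rel == "<", nxt])   # "<" is the mark move
            i += 1
        elif rel == ">":
            # Flush: pop up to and including the first marked symbol.
            marked = False
            while len(stack) > 1 and not marked:
                marked = stack.pop()[1]
            if not marked:
                return False
            below = stack[-1]
            nxt = flush.get((top_state, below[2]))
            if nxt is None:
                return False
            below[2] = nxt                           # update the state below
        else:
            return False                             # no applicable move

# Hypothetical automaton for { b^n c^n | n >= 1 } over the OPM of
# S -> A; A -> bAc | bc, extended with the delimiter "#".
OPM = {("#", "b"): "<", ("b", "b"): "<", ("b", "c"): "=",
       ("c", "c"): ">", ("c", "#"): ">"}
PUSH = {("q0", "b"): "q1", ("q1", "b"): "q1", ("q1", "c"): "q1"}
FLUSH = {("q1", "q1"): "q1", ("q1", "q0"): "qf"}

def accepts(word):
    return run_floyd_automaton(word, OPM, PUSH, FLUSH, "q0", {"qf"})
```

With these tables, `accepts("bbcc")` and `accepts("bc")` hold, while `accepts("bbc")`, `accepts("bcc")` and `accepts("")` do not. Note that the input symbol is not consumed by a flush move, which is why the machine is not real-time.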
An () – language that can be modeled by an FA (but not by a VPA): • the stack management of a simple programming language that is able to handle nested exceptions: • two procedures, called a and b. Calls and returns are denoted by calla, callb, reta, retb, respectively. • During execution, it is possible to install an exception handler hnd. • rst is issued when an exception occurs, or after a correct execution to uninstall the handler. With a rst the stack is “flushed”, restoring the state right before the last hnd.
Deterministic FAs are as powerful as nondeterministic ones • (as it happens for FSMs and VPAs) • proof is based on, but is not just a rephrasing of, the normal power-set construction …
4. The “traditional” MSO characterization • φ := a(x) | x ∈ X | x ≤ y | x ↷ y | x = y + 1 | ¬φ | φ ∨ φ | ∃x.φ | ∃X.φ • The only “novelty” w.r.t. the standard Büchi syntax is the ‘↷’ relation • which somewhat resembles the ‘---->’ relation between two “matching positions” in VPLs.
4.1 Here comes review # 2 (plus others) • “The MSO characterization for a class of languages is an interesting result which adds to a theory, though it is often quite a standard exercise, as it seems to be the case also for FL” • (Fortunately, also: • “Overall, the results are interesting and can be accepted for presentation at ICTCS.” • ) • Side personal trouble: why only for MSO and not for FAs? ….
Indeed, the basic (and most original) construction due to Büchi, which builds an automaton starting from an MSO formula, has been adapted in the subsequent literature to many other automata families, including tree automata, VPAs, …, and it works for FAs too, with a couple of non-trivial technical caveats due to the need to extend precedence relations when changing alphabets.
Indeed “coding” (FS) automata moves in terms of logic formulas is not-a-too-difficult exercise and has been repeated without serious obstacles for other automata (e.g. tree automata). • For VPAs the authors introduced the x ---> y relation between “matching positions” and built a suitable formula to control the correct match when reading the return symbol
4.2 All this easily rephrased for FAs? • Major difference w.r.t. all previous cases (to the best of our knowledge): • The relation x ↷ y (roughly begin-end of a right-hand side) is no longer one-to-one • Equivalently: • There is no one-to-one correspondence <read symbol – automaton transition>, i.e., unlike previous cases FAs are not real-time machines
4.3 Our approach • The fundamental difference between FLs and all other languages studied in this type of literature is that the latter are “explicit structure” or “explicit parentheses” languages (regular and linear ones being very special and simple cases thereof), whereas FLs, as well as other general CF languages, have an implicit syntax structure determined by the OPM: • Recognizing a string of an FL requires real, non-trivial parsing, and this has to be coded by means of suitable MSO formulas.
After a few different tries … • The main idea: • Follow the key of FG parsing, i.e. the look-ahead, look-back induced by the ⋗ and ⋖ relations: • They determine the (not one-to-one) x ↷ y relation • [Figure: positions x and y of a string linked by the relation, with ⋖ and ⋗ marks]
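To see why the relation is not one-to-one, one can record, at every reduction, the positions immediately to the left and to the right of the reduced portion. The sketch below assumes that this pair of positions is what the relation links (so the reduced right-hand side spans x+1 … y-1); this is an informal reading, and the formal definition is in the paper. The function name `chain_pairs`, the "<", "=", ">" encoding, and the delimiter entries added to G1's precedences ("#" below everything on the left, everything above "#" on the right) are assumptions of the example.

```python
def chain_pairs(word, opm, end="#"):
    """Return the (x, y) position pairs produced by precedence parsing.

    Positions are numbered from 0, with the delimiter at both ends.
    """
    tokens = [end] + list(word) + [end]
    last = len(tokens) - 1
    stack = [(0, False)]      # (position, marked)
    pairs = []
    i = 1
    while True:
        top_pos = stack[-1][0]
        if i == last and len(stack) == 1:
            return pairs
        rel = opm.get((tokens[top_pos], tokens[i]))
        if rel in ("<", "=") and i < last:
            stack.append((i, rel == "<"))
            i += 1
        elif rel == ">":
            # Reduce: pop up to and including the first marked position.
            marked = False
            while len(stack) > 1 and not marked:
                marked = stack.pop()[1]
            if not marked:
                raise ValueError("no marked position to close the reduction")
            pairs.append((stack[-1][0], i))
        else:
            raise ValueError(f"no precedence relation at position {i}")

# G1's precedences from the slides, plus assumed delimiter entries.
G1_OPM = {("a", "+"): ">", ("a", "×"): ">", ("+", "+"): ">",
          ("+", "×"): "<", ("+", "a"): "<", ("×", "a"): "=",
          ("#", "a"): "<", ("#", "+"): "<", ("#", "×"): "<",
          ("a", "#"): ">", ("+", "#"): ">"}

pairs = chain_pairs("a+a+a", G1_OPM)
```

For a+a+a (positions 0 to 6, delimiters included) the recorded pairs are (0,2), (2,4), (0,4), (4,6), (0,6): position 0 is related to three different positions, and position 4 is reached from both 0 and 2.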
Obvious? • Perhaps; however, from (another) reviewer, who also claimed “a fairly trivial exercise”: • “page 8: hnd(x+1) and rst(y-1): shouldn't this be hnd(x) and rst(y) for example, if z is 2 then x should be 1 and y should be 3” • Once the new relation is well established, a few more “technicalities” (e.g., the automaton can enter different state (types) in the same position) required several weeks and pages for the authors to come up with a (hopefully) complete proof • Of course simpler, shorter, and quicker (and more “standard”) proofs would be quite welcome • “(I have not checked the cited technical report but I have a rough idea of what should be done)”. • Instead, if you are curious and (not convinced but lazy), or you just want to compare your proof with our own, you can always go to http://arxiv.org/abs/1204.4639
(Very personal) conclusions • FGs, FLs, FAs are a rich mine of theoretical properties (not only those addressed in this contribution) with important practical impact in different fields such as MC and parsing • Worth further investigation, not just for fun: • ω-languages • Local parsability (extensions) • Pairing with semantic analysis • ….