Tree Automata
First: a reminder on automata on words
Typing semistructured data
Finite state automata on words
[Figure: a finite state automaton with its alphabet, states, transitions, initial state and accepting states labelled]
Nondeterministic automaton: example
[Figure: runs of a nondeterministic automaton with states q0, q1, q2 on the words aba and abaab, marked OK (accepting) and KO (not accepting)]
Reminder
• Deterministic: no ε-transition and no alternative transitions (two transitions from the same state on the same letter)
• Determinization: it is possible to obtain an equivalent deterministic automaton
• A state of the new automaton is a set of states of the original one
• Possible exponential blow-up
• Minimization
• Limitations: cannot recognize context-free languages
• Essential tool, e.g., for lexical analysis
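The determinization step can be sketched with the classical subset construction. This is a minimal Python sketch, assuming an automaton without ε-transitions whose transitions are given as a dictionary from (state, letter) to a set of states; the function names `determinize` and `dfa_accepts` and the example automaton are illustrative, not taken from the slides. Each state of the new automaton is a frozenset of original states, and only reachable subsets are built.

```python
from itertools import chain


def determinize(alphabet, delta, initial, accepting):
    """Subset construction for an NFA without epsilon-transitions.

    `delta` maps (state, letter) to a set of successor states.
    Returns the reachable part of the equivalent DFA, whose states are
    frozensets of NFA states.
    """
    start = frozenset([initial])
    dfa_delta = {}
    seen = {start}
    todo = [start]
    while todo:
        current = todo.pop()
        for letter in alphabet:
            # The new state is the union of the successors of every state in `current`.
            target = frozenset(
                chain.from_iterable(delta.get((q, letter), ()) for q in current)
            )
            dfa_delta[(current, letter)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is accepting as soon as it contains an accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return seen, dfa_delta, start, dfa_accepting


def dfa_accepts(dfa_delta, start, accepting, word):
    state = start
    for letter in word:
        state = dfa_delta[(state, letter)]
    return state in accepting


# Hypothetical NFA over {a, b}: accepts words whose second-to-last letter is a.
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
STATES, DELTA, START, ACCEPTING = determinize("ab", NFA_DELTA, 0, {2})
```

Here the construction builds 4 subsets. For the family of languages "the n-th letter from the end is a", the deterministic automaton needs 2^n states, which is the exponential blow-up mentioned above.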
Reminder (2)
• L(A) = set of words accepted by automaton A
• Regular languages
• Can be described by regular expressions, e.g. a(b+c)*d
• Closed under complement
• Closed under union and intersection
• Product automaton with states (s, s') where s is a state of A and s' is a state of A'
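The product automaton for intersection can be sketched as follows, assuming both automata are complete deterministic automata over the same alphabet, with transitions stored in dictionaries keyed by (state, letter). The function names and the two example automata are hypothetical; only reachable pairs (s, s') are built.

```python
def product(alphabet, d1, s1, f1, d2, s2, f2):
    """Product of two complete DFAs A and A' over the same alphabet.

    States are pairs (s, s'); accepting pairs are those where both components
    accept, so the product recognizes the intersection of L(A) and L(A').
    """
    start = (s1, s2)
    delta = {}
    seen = {start}
    todo = [start]
    while todo:
        p, q = todo.pop()
        for letter in alphabet:
            # Both automata read the same letter in lockstep.
            target = (d1[(p, letter)], d2[(q, letter)])
            delta[((p, q), letter)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    accepting = {(p, q) for (p, q) in seen if p in f1 and q in f2}
    return delta, start, accepting


def accepts(delta, start, accepting, word):
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state in accepting


# Hypothetical A: even number of a's.  Hypothetical A': the word ends with b.
D1 = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
D2 = {("x", "a"): "x", ("x", "b"): "y", ("y", "a"): "x", ("y", "b"): "y"}
DELTA, START, ACCEPTING = product("ab", D1, 0, {0}, D2, "x", {"y"})
```

Replacing `p in f1 and q in f2` by `p in f1 or q in f2` gives union; complement is obtained by swapping accepting and non-accepting states of a complete deterministic automaton.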
Automata on words versus trees
[Figure: on words, reading left to right or right to left makes no difference; on trees, reading top-down or bottom-up makes a difference]
Automata on ranked trees
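Binary tree automata
• Parallel evaluation
• For leaves: a rule assigns a state to a leaf according to its label
• For other nodes: a rule assigns a state according to the node's label and the states of its two children
[Figure: bottom-up run on a binary tree labelled a and b, with states q, q', q'', q1, q2]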
Bottom-up tree automata
• Bottom-up: if a node labeled a has its children in states q, q', then the node moves nondeterministically to state r or r'
• Accepts if the root is in some state in F
• Not deterministic if there are alternatives or ε-transitions
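A nondeterministic bottom-up run can be explored by computing, for each node, the set of all states some run can reach there. This minimal Python sketch assumes no ε-transitions, binary trees written as tuples ((label,) for a leaf, (label, left, right) otherwise), and rules stored in dictionaries; the names and the example automaton are hypothetical.

```python
def reachable(tree, leaf_rules, node_rules):
    """Set of states a nondeterministic bottom-up automaton can reach at the root of `tree`.

    Trees are tuples: (label,) for a leaf, (label, left, right) for a binary node.
    leaf_rules: label -> set of states.
    node_rules: (label, q, q') -> set of states (the alternatives r, r', ...).
    """
    label = tree[0]
    if len(tree) == 1:
        return set(leaf_rules.get(label, ()))
    left = reachable(tree[1], leaf_rules, node_rules)
    right = reachable(tree[2], leaf_rules, node_rules)
    result = set()
    for q in left:
        for q2 in right:
            # Collect every alternative allowed for this label and pair of child states.
            result |= node_rules.get((label, q, q2), set())
    return result


def accepts(tree, leaf_rules, node_rules, final):
    # Accept if some run puts the root in a state of F.
    return bool(reachable(tree, leaf_rules, node_rules) & final)


# Hypothetical automaton: "some leaf is labelled b".  A b-leaf nondeterministically
# guesses whether it is the witness ("found") or not ("any").
LEAF_RULES = {"a": {"any"}, "b": {"any", "found"}}
NODE_RULES = {}
for x in "ab":
    NODE_RULES[(x, "any", "any")] = {"any"}
    NODE_RULES[(x, "found", "any")] = {"found"}
    NODE_RULES[(x, "any", "found")] = {"found"}
FINAL = {"found"}
```

The b-leaf rule has two alternatives, which is exactly what makes this automaton nondeterministic.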
Example: Boolean circuit evaluation
[Figure: a tree of Boolean gates over leaves 0 and 1, evaluated bottom-up; the run is accepted (OK)]
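The Boolean circuit example can be replayed with a deterministic bottom-up automaton whose states are the truth values 0 and 1. The sketch assumes binary gates labelled "or" and "and", leaves labelled "0" and "1", and that a run is accepted (OK) when the root evaluates to 1; these labels and the choice F = {1} are assumptions, not read off the figure.

```python
# Trees are tuples: (label,) for a leaf, (label, left, right) for a gate.
# States are the Boolean values 0 and 1; every (label, left, right) has exactly one
# rule, so the automaton is deterministic.
LEAF_RULES = {"0": 0, "1": 1}
NODE_RULES = {}
for x in (0, 1):
    for y in (0, 1):
        NODE_RULES[("or", x, y)] = x | y
        NODE_RULES[("and", x, y)] = x & y
FINAL = {1}  # assumption: a circuit is accepted when it evaluates to 1


def run(tree):
    """Unique state reached at the root, i.e. the value of the circuit."""
    if len(tree) == 1:
        return LEAF_RULES[tree[0]]
    return NODE_RULES[(tree[0], run(tree[1]), run(tree[2]))]


def accepts(tree):
    return run(tree) in FINAL
```

Evaluating the circuit and running the automaton are the same computation: each gate's state is computed from its children's states.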
Regular tree language = set of trees accepted by a bottom-up tree automaton
Regular tree languages
Theorem: the following are equivalent
• L is a regular tree language
• L is accepted by a nondeterministic bottom-up automaton
• L is accepted by a deterministic bottom-up automaton
• L is accepted by a nondeterministic top-down automaton
Deterministic top-down automata are weaker
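Top-down tree automata
• Top-down: if a node labeled a is in state q'', then its left child moves to state q and its right child to state q'
• Accepts if all leaves are in states in F
• Not deterministic if there are alternative transitions (several choices for the same label and state)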
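A nondeterministic top-down run can be sketched by trying every alternative pair of child states at each node. This Python sketch assumes trees written as tuples, an initial state "q0" at the root, and, unlike the simpler "leaves in F" condition, a leaf check that may look at the label as well as the state (the hypothetical set `leaf_final` of (label, state) pairs). The example automaton for the language of the next slide is also hypothetical.

```python
def td_accepts(tree, state, rules, leaf_final):
    """Can a nondeterministic top-down automaton entering `tree` in `state` accept it?

    Trees are tuples: (label,) for a leaf, (label, left, right) for a binary node.
    rules: (label, state) -> set of (left state, right state) alternatives.
    leaf_final: set of (label, state) pairs accepted at a leaf (assumption: the
    leaf check may look at the label as well as the state).
    """
    label = tree[0]
    if len(tree) == 1:
        return (label, state) in leaf_final
    # Nondeterminism: try each alternative pair of child states.
    return any(
        td_accepts(tree[1], ql, rules, leaf_final)
        and td_accepts(tree[2], qr, rules, leaf_final)
        for ql, qr in rules.get((label, state), ())
    )


# Hypothetical automaton for L = { r(a, b), r(b, a) }: at the root, guess the order.
RULES = {("r", "q0"): {("qa", "qb"), ("qb", "qa")}}
LEAF_FINAL = {("a", "qa"), ("b", "qb")}
```

The root rule has two alternatives, so this automaton is not deterministic; the next slide argues that this nondeterminism cannot be avoided for L.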
Why is deterministic top-down weaker?
• Consider the language L = { <r><a/><b/></r>, <r><b/><a/></r> }
• It can be accepted by a bottom-up TA
• Exercise: write a BUTA A such that L = L(A)
• Suppose that B is a deterministic top-down TA that accepts both trees in L
• Exercise: show that B also accepts <r><a/><a/></r>
• A contradiction
Fact: no deterministic top-down tree automaton accepts exactly L
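One possible answer to the first exercise, sketched in Python: a deterministic bottom-up automaton that reads the two leaves first and only then decides at the root. The state names qa, qb, ok and the tuple encoding of trees are illustrative assumptions; a missing rule simply blocks the run.

```python
# Trees are tuples: (label,) for a leaf, (label, left, right) for a binary node.
LEAF_RULES = {"a": "qa", "b": "qb"}
# The root rule sees both children's states at once, so it can require one a and one b.
NODE_RULES = {("r", "qa", "qb"): "ok", ("r", "qb", "qa"): "ok"}
FINAL = {"ok"}


def run(tree):
    """Unique state reached at the root, or None if no rule applies (blocked run)."""
    if len(tree) == 1:
        return LEAF_RULES.get(tree[0])
    left, right = run(tree[1]), run(tree[2])
    if left is None or right is None:
        return None
    return NODE_RULES.get((tree[0], left, right))


def accepts(tree):
    return run(tree) in FINAL
```

By contrast, a deterministic top-down automaton must fix the states of both children at the root before reading them, so what it checks at the left leaf cannot depend on the label of the right leaf.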
Ranked tree automata: properties
• As for words:
• Determinization
• Minimization
• Closed under complement, intersection and union
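Determinization lifts the subset construction from words to bottom-up tree automata: a deterministic state is the set of all states the nondeterministic automaton could be in at a node. This Python sketch assumes binary trees written as tuples and rules stored in dictionaries; the names and the example automaton are hypothetical, and in the worst case the number of subsets is exponential, as for words.

```python
from itertools import product


def determinize(leaf_labels, node_labels, leaf_rules, node_rules, final):
    """Subset construction for a nondeterministic bottom-up automaton on binary trees.

    leaf_rules: label -> set of states; node_rules: (label, q, q') -> set of states.
    The deterministic states are frozensets of original states; only subsets that
    some tree can actually reach are built.
    """
    d_leaf = {a: frozenset(leaf_rules.get(a, ())) for a in leaf_labels}
    states = set(d_leaf.values())
    d_node = {}
    changed = True
    while changed:
        changed = False
        # Snapshot the current states; new subsets are handled in the next pass.
        for a, s1, s2 in product(node_labels, list(states), list(states)):
            if (a, s1, s2) in d_node:
                continue
            target = frozenset(
                r for q1 in s1 for q2 in s2 for r in node_rules.get((a, q1, q2), ())
            )
            d_node[(a, s1, s2)] = target
            if target not in states:
                states.add(target)
                changed = True
    # A subset accepts if it contains at least one final state of the original automaton.
    d_final = {s for s in states if s & final}
    return states, d_leaf, d_node, d_final


def run(tree, d_leaf, d_node):
    """Deterministic run: exactly one state per node."""
    if len(tree) == 1:
        return d_leaf[tree[0]]
    return d_node[(tree[0], run(tree[1], d_leaf, d_node), run(tree[2], d_leaf, d_node))]


# Hypothetical nondeterministic automaton: "some leaf is labelled b".
LEAF_RULES = {"a": {"any"}, "b": {"any", "found"}}
NODE_RULES = {}
for x in "ab":
    NODE_RULES[(x, "any", "any")] = {"any"}
    NODE_RULES[(x, "found", "any")] = {"found"}
    NODE_RULES[(x, "any", "found")] = {"found"}
STATES, D_LEAF, D_NODE, D_FINAL = determinize("ab", "ab", LEAF_RULES, NODE_RULES, {"found"})
```

For this example only two subsets are reachable, {any} and {any, found}, and the second one is final.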
But…
• XML documents are unranked: in book (intro, section*, conclusion), a book node may have any number of section children
Automata on unranked trees
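Unranked tree automata
• Issue: a node may have any number of children, so we must represent an infinite set of transitions
• Solution: describe the allowed sequences of children states by a regular language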
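Unranked tree automata (2)
• Rule: for a label a, a regular language L(Q) over states, given by a word automaton Q, and a set of target states {r1,…,rm}
• Meaning: if the states of the children of some node labeled a form a word in L(Q), this node moves to some state in {r1,…,rm}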
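A minimal Python sketch of such rules, applied to the book (intro, section*, conclusion) example. It assumes each state is a single character, so that the language L(Q) of children states can be written as a Python regular expression and tested with `re.fullmatch`; the rule table and state names are hypothetical. A practical implementation would run the word automaton Q on sets of states rather than enumerate every combination.

```python
import itertools
import re

# Hypothetical rules: (label, regular expression over child states, target states).
# Each state is a single character, so the sequence of children's states is a string.
RULES = [
    ("intro", r"", {"i"}),
    ("section", r"", {"s"}),
    ("conclusion", r"", {"c"}),
    ("book", r"is*c", {"B"}),  # book (intro, section*, conclusion)
]
FINAL = {"B"}


def states_of(tree):
    """All states the automaton can reach at the root of an unranked tree (label, children)."""
    label, children = tree
    child_sets = [states_of(child) for child in children]
    result = set()
    for rule_label, regex, targets in RULES:
        if rule_label != label:
            continue
        # Naive check: try every choice of one state per child and test the word
        # against the rule's regular language (exponential, but fine for a sketch).
        if any(re.fullmatch(regex, "".join(word))
               for word in itertools.product(*child_sets)):
            result |= targets
    return result


def accepts(tree):
    return bool(states_of(tree) & FINAL)
```

A leaf has no children, so its word of children states is empty and matches the rules with the empty expression.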
Building on ranked trees
[Figure: an unranked tree over labels a and b, and its first-child/next-sibling encoding as a ranked (binary) tree]
• Ranked tree: first-child/next-sibling encoding
• F: encoding into a ranked tree
• F is a bijection; F⁻¹: decoding
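The encoding F and its inverse can be sketched as follows, assuming unranked trees are Python tuples (label, list of children) and binary trees are triples (label, first_child, next_sibling) with None for a missing pointer. These representations and function names are illustrative choices.

```python
def encode_forest(forest):
    """First-child/next-sibling encoding of a list of unranked trees (label, children).

    Binary nodes are (label, first_child, next_sibling), with None for an empty pointer.
    """
    if not forest:
        return None
    (label, children), rest = forest[0], forest[1:]
    # Left pointer: the node's children; right pointer: its following siblings.
    return (label, encode_forest(children), encode_forest(rest))


def encode(tree):
    """F: encode a single unranked tree."""
    return encode_forest([tree])


def decode_forest(node):
    """Inverse of encode_forest: walk the next-sibling chain, decoding each first child."""
    forest = []
    while node is not None:
        label, first_child, next_sibling = node
        forest.append((label, decode_forest(first_child)))
        node = next_sibling
    return forest


def decode(binary):
    """F^-1: decode the encoding of a single tree (its root has no next sibling)."""
    (tree,) = decode_forest(binary)
    return tree
```

Since every node keeps its label and gains exactly two pointers, a ranked automaton over the encoded trees can simulate an unranked one, which is the basis of the constructions on the next slide.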
Building on bottom-up ranked trees (2)
• For each unranked TA A, there is a ranked TA accepting F(L(A))
• For each ranked TA A, there is an unranked TA accepting F⁻¹(L(A))
• Both are easy to construct
Consequence: unranked TA are closed under union, intersection and complement
Determinization is also possible, but a bit more tricky