Incremental Validation of XML Databases. Yannis Papakonstantinou and Victor Vianu, Computer Science & Engineering, UCSD
Incremental Validation of XML Databases: updates to an XML database with n nodes are validated against a Document Type Definition (DTD) in O(log n) per update, and against an XML Schema / XQuery type system in O(log² n) per update.
XML As Labeled Ordered Trees. Example: the root cars has children used and new; below them are car elements whose year and model children hold, in document order, (92, Civic), (96, Acura), (Civic, with no year), and (03, Maxima).
Document Type Definitions (DTDs): Abstraction & Example. The DTD for the example is: root: cars; cars → used new; used → car*; new → car*; car → (year | ε) model. The example tree satisfies it.
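Tree Satisfying a DTD, General Case: the root carries the root label, and for every node labeled a whose children are labeled σ1 σ2 … σk, the word σ1 σ2 … σk belongs to the language of the regular expression r in the rule a → r.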
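The condition above can be checked with a simple recursive pass. The sketch below is a non-incremental checker. It assumes each content model is encoded as a Python regular expression over child labels, with each label followed by a space. `DTD`, `satisfies`, `valid`, and the helper `car` are hypothetical names. Text values such as 92 or Civic are left out, and the sample tree assumes two used and two new cars.

```python
import re

# DTD from the slides, encoded (as an assumption) as regexes over "label " tokens.
DTD = {
    "cars": r"used new ",
    "used": r"(car )*",
    "new": r"(car )*",
    "car": r"(year )?model ",   # car -> (year | eps) model
    "year": r"",                # text content omitted in this sketch
    "model": r"",
}
ROOT = "cars"

def satisfies(node):
    """node = (label, children); check the rule label -> r at node and below."""
    label, children = node
    word = "".join(child[0] + " " for child in children)
    if label not in DTD or re.fullmatch(DTD[label], word) is None:
        return False
    return all(satisfies(child) for child in children)

def valid(tree):
    """The root must carry the root label and every node must satisfy its rule."""
    return tree[0] == ROOT and satisfies(tree)

def car(*fields):
    """Hypothetical helper: a car element whose children have the given labels."""
    return ("car", [(f, []) for f in fields])

tree = ("cars", [("used", [car("year", "model"), car("year", "model")]),
                 ("new", [car("model"), car("year", "model")])])
print(valid(tree))  # True
print(valid(("cars", [("used", [car("model", "year")]), ("new", [])])))  # False
```

Each call rescans every child word, so re-checking after an update costs time proportional to the whole tree. That rescan is the cost the incremental structures in the following slides avoid.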
XML Schemas/XQuery Types as Specialized DTDs: each label has a set of types, e.g., car ↦ {carU, carN}, cars ↦ {carsT}, used ↦ {usedT}, …. The rules are stated over types: root: carsT; carsT → usedT newT; usedT → carU*; newT → carN*; carU → yearT modelT; carN → (yearT | ε) modelT. A tree is valid if its nodes can be assigned types, each allowed for the node's label, so that the typed tree satisfies the rules. In this example, used cars must have a year, while new cars need not.
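Tree Automata and Specialized DTDs: specialized DTDs define exactly the tree languages accepted by tree automata. The example tree can be annotated bottom-up with the types each node may take. For instance, a car with both year and model may be typed carU or carN, while a car with only a model can only be carN.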
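The sketch below makes the type-assignment semantics concrete with a minimal bottom-up run in the style of a tree automaton. Each node gets the set of types its subtree admits, and the tree is valid if the root admits carsT. The hand-built NFAs, the dictionaries `TYPES` and `NFA`, and the function names are assumptions for illustration, not the paper's construction.

```python
# Hypothetical encoding of the slides' specialized DTD: TYPES maps each label to its
# types; each type's content model is a hand-built NFA (start state 0) over child types.
TYPES = {"cars": {"carsT"}, "used": {"usedT"}, "new": {"newT"},
         "car": {"carU", "carN"}, "year": {"yearT"}, "model": {"modelT"}}
NFA = {  # type -> ({(state, child_type): next_states}, final_states)
    "carsT": ({(0, "usedT"): {1}, (1, "newT"): {2}}, {2}),          # usedT newT
    "usedT": ({(0, "carU"): {0}}, {0}),                              # carU*
    "newT": ({(0, "carN"): {0}}, {0}),                               # carN*
    "carU": ({(0, "yearT"): {1}, (1, "modelT"): {2}}, {2}),          # yearT modelT
    "carN": ({(0, "yearT"): {1}, (0, "modelT"): {2}, (1, "modelT"): {2}}, {2}),  # (yearT | eps) modelT
    "yearT": ({}, {0}),
    "modelT": ({}, {0}),
}
ROOT_TYPE = "carsT"

def accepts_some(tau, child_sets):
    """True if some choice of one candidate type per child, in order, is accepted by tau's NFA."""
    delta, finals = NFA[tau]
    states = {0}
    for cands in child_sets:
        states = {q2 for q in states for t in cands for q2 in delta.get((q, t), ())}
    return bool(states & finals)

def candidate_types(node):
    """Bottom-up run: the set of types the subtree rooted at node can be assigned."""
    label, children = node
    child_sets = [candidate_types(c) for c in children]
    return {tau for tau in TYPES.get(label, set()) if accepts_some(tau, child_sets)}

def valid(tree):
    return ROOT_TYPE in candidate_types(tree)

def car(*fields):
    """Hypothetical helper: a car element whose children have the given labels."""
    return ("car", [(f, []) for f in fields])

print(sorted(candidate_types(car("year", "model"))))            # ['carN', 'carU']
print(sorted(candidate_types(car("model"))))                    # ['carN']
print(valid(("cars", [("used", [car("model")]), ("new", [])])))  # False: used cars need a year
```

The last tree satisfies the plain DTD, where car → (year | ε) model. It is rejected here because the types distinguish used cars from new ones.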
Incremental Validation Problem Statement: for each valid tree T, maintain an auxiliary structure A(T) so that, given a sequence of update commands, we can
• efficiently decide whether the updated tree T′ is valid, and
• efficiently update A(T) and T.
Types of Updates: Node Renaming u(v, σ′). Node v, the i-th of the k children of its parent, gets the new label σ′; the parent's child word σ1 … σk changes only at position i.
Types of Updates: Deletion d(v). Node v, the i-th child, is removed; the parent's child word becomes σ1 … σi−1 σi+1 … σk.
Types of Updates: Insertion insert_after(vi−1, σi). A new node labeled σi is inserted immediately after sibling vi−1; the parent's child word becomes σ1 … σi−1 σi σi+1 … σk.
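As a baseline, the three updates can be sketched as operations on the label word of one node's children, followed by a full rescan against the content model. The sketch assumes 0-based positions (the slides count from 1) and a regex encoding of content models. `rename`, `delete`, `insert_after`, and `naive_check` are hypothetical names.

```python
import re

# Sketch of the three update types acting on the label word of one node's children.
# Positions are 0-based here (the slides count from 1). Content models are written,
# as an illustrative assumption, as Python regexes over "label " tokens.
def word(labels):
    return "".join(label + " " for label in labels)

def rename(labels, i, new_label):        # u(v_i, sigma'): relabel position i
    return labels[:i] + [new_label] + labels[i + 1:]

def delete(labels, i):                    # d(v_i): drop position i
    return labels[:i] + labels[i + 1:]

def insert_after(labels, i, new_label):  # insert_after(v_i, sigma): new node at i + 1
    return labels[:i + 1] + [new_label] + labels[i + 1:]

def naive_check(content_model, labels):
    """Non-incremental baseline: rescans all k children after every update."""
    return re.fullmatch(content_model, word(labels)) is not None

car_model = r"(year )?model "   # car -> (year | eps) model
kids = ["year", "model"]
print(naive_check(car_model, delete(kids, 0)))                 # True:  "model"
print(naive_check(car_model, rename(kids, 1, "year")))         # False: "year year"
print(naive_check(car_model, insert_after(kids, 1, "model")))  # False: "year model model"
```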
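Validating a Renaming u(i, σ′) on a String Checked by Automaton N: Take One. Precompute Pre(j), the set of states N can reach from q0 on σ1 … σj, and Post(j), the set of states from which N can reach a final state qF on σj … σn. Then u(i, σ′) is valid iff N has a transition on σ′ from some state in Pre(i−1) to some state in Post(i+1). Given Pre and Post, this check takes O(1) time, independent of n. However, applying u(i, σ′) requires recomputing Pre(i), Pre(i+1), … and Post(i), Post(i−1), …, which is O(n) work per update.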
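The Pre/Post check can be sketched as follows. The sketch assumes a small hand-built NFA for the car content model (year | ε) model and uses 0-based positions, so `pre[i]` covers the first i labels and `post[i]` covers label i onward. All names are hypothetical.

```python
# Hypothetical NFA N for (year | eps) model: start state 0, final state 2.
DELTA = {(0, "year"): {1}, (0, "model"): {2}, (1, "model"): {2}}
STATES, Q0, FINALS = {0, 1, 2}, 0, {2}

def step(states, label):
    """States reachable from any state in `states` by reading one label."""
    return {q2 for q in states for q2 in DELTA.get((q, label), ())}

def precompute(w):
    """pre[i]: states reachable from Q0 on w[:i]; post[i]: states that reach FINALS on w[i:]."""
    pre = [{Q0}]
    for label in w:
        pre.append(step(pre[-1], label))
    post = [set() for _ in range(len(w) + 1)]
    post[len(w)] = set(FINALS)
    for i in range(len(w) - 1, -1, -1):
        post[i] = {q for q in STATES if step({q}, w[i]) & post[i + 1]}
    return pre, post

def rename_ok(pre, post, i, new_label):
    """Would renaming position i to new_label keep w valid? Cost depends on N only, not len(w)."""
    return bool(step(pre[i], new_label) & post[i + 1])

w = ["year", "model"]
pre, post = precompute(w)
print(rename_ok(pre, post, 0, "model"))  # False: "model model"
print(rename_ok(pre, post, 1, "year"))   # False: "year year"
# After an accepted update, pre[i+1:] and post[:i+1] must be recomputed: O(len(w)) work.
```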
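Transition Relation Definition: $T_{i,j} = \{(q, q') \mid N \text{ can go from } q \text{ to } q' \text{ reading } \sigma_i \sigma_{i+1} \cdots \sigma_j\}$, and for any split point $i \le m < j$, $T_{i,j} = T_{i,m} \circ T_{m+1,j}$ (relational composition).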
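The split-point identity can be checked on a small example. The sketch below computes T_{i,j} by directly simulating an assumed nondeterministic automaton, then compares the result with the composition at every split point m. `DELTA`, `relation`, and `compose` are illustrative names.

```python
# Hypothetical NFA with a nondeterministic move on "a"; positions are 1-based as on the slide.
DELTA = {(0, "a"): {0, 1}, (1, "b"): {0}}
STATES = {0, 1}

def relation(w, i, j):
    """T_{i,j}: pairs (q, q') such that N can go from q to q' reading w_i ... w_j."""
    rel = set()
    for q in STATES:
        current = {q}
        for label in w[i - 1:j]:
            current = {q2 for p in current for q2 in DELTA.get((p, label), ())}
        rel |= {(q, q2) for q2 in current}
    return rel

def compose(r1, r2):
    """Relational composition: (p, r) whenever (p, q) is in r1 and (q, r) is in r2."""
    return {(p, r) for (p, q) in r1 for (q2, r) in r2 if q == q2}

w = "aabab"
# T_{i,j} = T_{i,m} o T_{m+1,j} for every split point i <= m < j.
print(all(relation(w, i, j) == compose(relation(w, i, m), relation(w, m + 1, j))
          for i in range(1, len(w) + 1)
          for j in range(i + 1, len(w) + 1)
          for m in range(i, j)))  # True
```

The split point does not matter, so any balanced tree of sub-intervals can be used. This is what the transition relation trees exploit.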
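Transition Relation Trees over positions 1 to 8: the root holds T1,8; its children hold T1,4 and T5,8; the next level holds T1,2, T3,4, T5,6, T7,8; and the leaves hold T1,1, T2,2, …, T8,8. Each internal relation is the composition of its two children's relations.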
Maintenance of the Structure and Validation in O(log n): a renaming u(6, σ′) changes the leaf T6,6, so only T6,6, T5,6, T5,8, and T1,8 (the path to the root) are recomputed. If (q0, qF) ∈ T1,8 for some final state qF, the string is valid.
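A fixed-length transition relation tree can be sketched as an array-based segment tree. The sketch assumes that only renamings occur and that the content model is (a b)*, recognized by a hypothetical two-state NFA. Unused leaves are padded with the identity relation, which leaves a composition unchanged. `rename` recomputes only the ancestors of the changed leaf.

```python
# Sketch: an array-based segment tree of transition relations over one node's children,
# for renamings only (fixed length). The NFA for (a b)* below is hypothetical.
DELTA = {(0, "a"): {1}, (1, "b"): {0}}
STATES, Q0, FINALS = {0, 1}, 0, {0}
IDENTITY = frozenset((q, q) for q in STATES)  # relation of the empty word, used as padding

def leaf(label):
    """T_{i,i} for a single label."""
    return frozenset((q, q2) for q in STATES for q2 in DELTA.get((q, label), ()))

def compose(r1, r2):
    return frozenset((p, r) for (p, q) in r1 for (q2, r) in r2 if q == q2)

class TransitionTree:
    def __init__(self, w):
        self.size = 1
        while self.size < len(w):
            self.size *= 2
        self.t = [IDENTITY] * (2 * self.size)   # t[1] is the root T_{1,n}
        for i, label in enumerate(w):
            self.t[self.size + i] = leaf(label)
        for v in range(self.size - 1, 0, -1):
            self.t[v] = compose(self.t[2 * v], self.t[2 * v + 1])

    def rename(self, i, label):
        """u(i, label), 0-based: replace one leaf, then recompute its O(log n) ancestors."""
        v = self.size + i
        self.t[v] = leaf(label)
        v //= 2
        while v >= 1:
            self.t[v] = compose(self.t[2 * v], self.t[2 * v + 1])
            v //= 2

    def valid(self):
        """Valid iff (q0, qF) is in the root relation for some final state qF."""
        return any((Q0, f) in self.t[1] for f in FINALS)

tt = TransitionTree("abab")
print(tt.valid())  # True
tt.rename(2, "b")  # the word becomes "abbb"
print(tt.valid())  # False
```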
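Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions: the leaf relations T1, T2, T3, T5, T6, T7, T9 (for siblings 1, 2, 3, 5, 6, 7, 9) are grouped under internal nodes Ta = T1 T2, Tb = T3 T5 T6, and Tc = T7 T9, where juxtaposition denotes composition. The root stores Ta Tb Tc. If (q0, qF) is in the root relation, the sequence is valid.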
Inserting sibling 8 between 7 and 9: Tc becomes T7 T8 T9, which still fits in a 2-3 node, so only Tc and the root relation are recomputed.
Inserting sibling 4 between 3 and 5: Tb would have four children (T3 T4 T5 T6), one more than a 2-3 node allows.
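Splitting: Tb becomes Td = T3 T4 and Te = T5 T6. The root then has four children (Ta Td Te Tc) and splits as well, into Tf (over Ta Td) and Tg (over Te Tc) under a new root. Only O(1) relations per level are recomputed, so an insertion costs O(log n) compositions.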
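The 2-3 tree version can be sketched as below. Every internal node stores the composition of its children's relations, and an insertion walks one root-to-leaf path. A node that reaches four children splits into two nodes with two children each, which may create a new root. This is a minimal sketch assuming the hypothetical (a b)* automaton. Deletions, which need merging or borrowing, are not shown, and the names are illustrative.

```python
from functools import reduce

# Hypothetical NFA for (a b)*: start and final state 0.
DELTA = {(0, "a"): {1}, (1, "b"): {0}}
STATES, Q0, FINALS = {0, 1}, 0, {0}

def leaf_relation(symbol):
    return frozenset((q, q2) for q in STATES for q2 in DELTA.get((q, symbol), ()))

def compose(r1, r2):
    return frozenset((p, r) for (p, q) in r1 for (q2, r) in r2 if q == q2)

class Node:
    """Leaf (one sibling) or internal 2-3 node storing the composed relation of its span."""
    def __init__(self, children=None, symbol=None):
        self.children, self.symbol = children or [], symbol
        if self.children:
            self.recompute()
        else:
            self.rel, self.size = leaf_relation(symbol), 1

    def recompute(self):
        self.rel = reduce(compose, (c.rel for c in self.children))
        self.size = sum(c.size for c in self.children)

def build(symbols):
    """Bottom-up construction with 2 or 3 children per internal node (needs >= 2 symbols)."""
    level = [Node(symbol=s) for s in symbols]
    while len(level) > 1:
        groups, i = [], 0
        while len(level) - i > 4:
            groups.append(level[i:i + 3])
            i += 3
        rest = level[i:]
        groups += [rest[:2], rest[2:]] if len(rest) == 4 else [rest]
        level = [Node(children=g) for g in groups]
    return level[0]

def _insert(node, pos, leaf):
    """Insert leaf before the pos-th leaf below node; return [node], or two nodes after a split."""
    if not node.children[0].children:            # children are leaves
        node.children.insert(pos, leaf)
    else:
        for k, child in enumerate(node.children):
            if pos <= child.size:
                node.children[k:k + 1] = _insert(child, pos, leaf)
                break
            pos -= child.size
    if len(node.children) == 4:                  # overflow: split into two 2-child nodes
        return [Node(children=node.children[:2]), Node(children=node.children[2:])]
    node.recompute()                             # only nodes on the insertion path change
    return [node]

def insert(root, pos, symbol):
    parts = _insert(root, pos, Node(symbol=symbol))
    return parts[0] if len(parts) == 1 else Node(children=parts)  # root split: new root

def is_valid(root):
    return any((Q0, f) in root.rel for f in FINALS)

def word(node):
    return node.symbol if not node.children else "".join(word(c) for c in node.children)

root = build("abab")
root = insert(root, 4, "a")        # "ababa"
print(word(root), is_valid(root))  # ababa False
root = insert(root, 5, "b")        # "ababab": the bottom internal node splits
print(word(root), is_valid(root))  # ababab True
```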
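Auxiliary Structures for Incremental DTD Validation: each node keeps a transition relation tree over the labels of its children, for the automaton of its content model r. A renaming u(vi, σ′) then updates leaf i of the tree kept at vi's parent in O(log n). In addition, vi's own children must satisfy the content model of σ′.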
Specialized DTD Incremental Validation: Take One. Each node v stores types(v), the set of types its subtree can be assigned. A renaming u(vi, σ′) changes types(vi) = {τi,1, …, τi,n}. The parent's types are then recomputed from types(v1), …, types(vk), and the change propagates toward the root.
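Inefficient for Deep Trees: Apply Divide-and-Conquer in the Vertical Direction. An update may change types along the whole path to the root, so the cost grows with tree depth. Instead, turn the specialized DTD into an NFA that validates a vertical line, and “fuse” the vertical and horizontal directions using a binary tree, splitting the work in both.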
Tree Satisfying a Specialized DTD, Transformed into a Binary Tree Accepted by a Tree Automaton (figure: an example tree with nodes a through k and its binary encoding, in which # marks empty positions).
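One common way to obtain such a binary tree is the first-child / next-sibling encoding. The slides do not state the exact encoding, so the sketch below assumes this one, with "#" marking empty positions. `encode` and `decode` are hypothetical names, and `decode` confirms that no information is lost.

```python
# Assumed encoding (first child / next sibling); "#" marks an empty position.
def encode(node, siblings=()):
    """node = (label, children); returns (label, first_child, next_sibling) in binary form."""
    label, children = node
    left = encode(children[0], children[1:]) if children else "#"
    right = encode(siblings[0], siblings[1:]) if siblings else "#"
    return (label, left, right)

def decode(b):
    """Inverse of encode: the list of trees (a node and its following siblings) that b represents."""
    if b == "#":
        return []
    label, left, right = b
    return [(label, decode(left))] + decode(right)

tree = ("a", [("b", []), ("c", [("d", [])])])
print(encode(tree))
# ('a', ('b', '#', ('c', ('d', '#', '#'), '#')), '#')
print(decode(encode(tree)) == [tree])  # True
```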
Designate Lines in Binary Trees: lines are chosen using subtree-size conditions of the form Size(·) > 2 Size(·) and Size(·) > 4 Size(·).
Example Line Structure (figure: the binary-encoded example tree with its designated lines).
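The exact size conditions for designating lines are not fully recoverable, so the sketch below substitutes heavy-path decomposition, a standard rule with the same flavor. Each node continues its line into the child with the larger subtree, so leaving a line at least halves the subtree size. As a result, a root-to-leaf path crosses at most 1 + log2 n lines. The tree representation and names are assumptions, and `size` is recomputed naively for brevity.

```python
import math

# Stand-in rule (heavy-path decomposition), not necessarily the slides' exact criterion:
# a line continues into the child with the larger subtree; the other child starts a new line.
def size(t):
    """Number of nodes in a binary tree given as (label, left, right), None for empty."""
    return 0 if t is None else 1 + size(t[1]) + size(t[2])

def lines_on_paths(t, lines=1):
    """Yield, for each root-to-leaf path, how many lines the path passes through."""
    _, left, right = t
    kids = [c for c in (left, right) if c is not None]
    if not kids:
        yield lines
        return
    heavy = max(range(len(kids)), key=lambda k: size(kids[k]))  # ties go to the left child
    for k, child in enumerate(kids):
        yield from lines_on_paths(child, lines if k == heavy else lines + 1)

def perfect(h):
    """Hypothetical test input: a perfect binary tree of height h (2**(h+1) - 1 nodes)."""
    return ("x", None, None) if h == 0 else ("x", perfect(h - 1), perfect(h - 1))

t = perfect(3)
print(size(t), max(lines_on_paths(t)))                   # 15 4
print(max(lines_on_paths(t)) <= 1 + math.log2(size(t)))  # True
```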
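From Tree Automaton to Validating Lines with NFA: each node on a line is paired with the transition relation summarizing the subtree that hangs off the line (e.g., b with Tc, d with Tj, f with Tg), so that an NFA can validate the line as a string.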
Incremental Validation of the Line Structure in O(log² |T|). Example: insert m after k. An update touches fewer than 1 + log |T| lines, and updating one line costs O(log |T|), so the total cost is O(log² |T|).
Validating Insertions and Deletions: the Non-Line-Preserving Case (example: an insertion).
Key Complexity Results
• Given m updates on a tree of size n, incrementally validate a DTD in O(m log n)
• For alphabet Σ and maximum regular-expression size d: O(m |Σ| d² log d log n), with a data structure of size O(d² n)
• Specialized DTDs in O(m log² n)
• For a set of types Σ′: O(m |Σ′|² d² (log d + log |Σ′|) log² n), with a data structure of size O(|Σ′|² d² log² n)
• Lower complexity for 1-unambiguous regular expressions
Ongoing and Future Work (with Andrey Balmin)
• Incorporate transition relation trees in a B-tree structure
• Exploit “locality”
• Experimental evaluation on a set of 65 DTDs: in 96% of type definitions, an update can affect only transition relations of length < 4
• The common case is much more efficient than the worst case
• Detect this property and use algorithms that do not build transition relation trees in such cases
• Optimization over multiple updates
• More complex updates and edit operations