
Incremental Validation of XML Databases

Yannis Papakonstantinou, Victor Vianu. Computer Science & Eng, UCSD.


Presentation Transcript


  1. Incremental Validation of XML Databases. Yannis Papakonstantinou, Victor Vianu. Computer Science & Eng, UCSD.

  2. Incremental Validation of XML Databases
     [Figure: updates are applied to an XML database of n nodes. Validation against a Document Type Definition (DTD) takes O(log n); validation against an XML Schema / XQuery type system takes O(log² n).]

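  3. XML As Labeled Ordered Trees
     [Figure: a tree rooted at cars with children used and new, over four car nodes. The car nodes have year and model leaves with values (92, Civic), (96, Acura), (Civic), (03, Maxima); the third car has a model but no year.]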

  4. Document Type Definitions (DTDs): Abstraction & Example
     root: cars
     cars → used new
     used → car*
     new → car*
     car → (year | ε) model
     [Figure: the cars tree, with leaf values 92 Civic, 96 Acura, Civic, 03 Maxima.]
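As a rough illustration of what it means for a tree to satisfy this DTD, here is a minimal Python sketch. It is not the authors' code: the `Node` class, the encoding of content models as regular expressions over space-terminated labels, and the placement of the four cars under `used` and `new` are all assumptions made for the example. It revalidates the whole tree from scratch, which is exactly the cost that incremental validation aims to avoid.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


# Content models written as regexes over child labels.
# Each child label is followed by one space so labels cannot run together.
DTD = {
    "cars": r"used new ",
    "used": r"(car )*",
    "new": r"(car )*",
    "car": r"(year )?model ",  # (year | ε) model
    "year": r"",               # leaves: no element children
    "model": r"",
}


def satisfies(node, dtd):
    """True if the child-label word of every node matches its content model."""
    rule = dtd.get(node.label)
    if rule is None:
        return False
    word = "".join(child.label + " " for child in node.children)
    if re.fullmatch(rule, word) is None:
        return False
    return all(satisfies(child, dtd) for child in node.children)


def is_valid(tree, dtd, root="cars"):
    """Full (non-incremental) validation: root label plus every content model."""
    return tree.label == root and satisfies(tree, dtd)


def car(*labels):
    """Hypothetical helper: a car node with the given leaf children."""
    return Node("car", [Node(label) for label in labels])


# Assumed grouping of the four example cars under used and new.
tree = Node("cars", [
    Node("used", [car("year", "model"), car("year", "model")]),
    Node("new", [car("model"), car("year", "model")]),
])
print(is_valid(tree, DTD))
```

Text values such as 92 or Civic are left out because the DTD abstraction on the slide constrains element labels only.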

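  5. Tree Satisfying DTD, General Case
     [Figure: a node with children 1, 2, …, i-1, i, i+1, …, k-1, k, and a regular expression r for the node's label. The label symbols were lost in extraction. The bullets state that the root of the tree carries the DTD's root label and that the word formed by the labels of children 1 … k matches r.]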

  6. XML Schemas/XQuery Types as Specialized DTDs
     root: carsT
     carsT → usedT newT
     usedT → carU*
     newT → carN*
     carU → yearT modelT
     carN → (yearT | ε) modelT

     LABEL    TYPES
     car      {carU, carN}
     cars     {carsT}
     used     {usedT}
     …

     [Figure: the cars tree shown twice, once with element labels and once with each node replaced by its type (carsT, usedT, newT, carU, carN, yearT, modelT).]
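The specialized DTD above can be checked bottom-up by computing, for each node, the set of types it may take. The sketch below is a naive illustration, not the authors' algorithm: trees are assumed to be `(label, children)` tuples, content models are regexes over space-terminated type names, and child type choices are enumerated by brute force, which can be exponential. A real implementation would run an automaton over sets of types instead.

```python
import re
from itertools import product

# Label -> candidate types, following the LABEL/TYPES table on the slide.
TYPES_OF = {
    "cars": ["carsT"],
    "used": ["usedT"],
    "new": ["newT"],
    "car": ["carU", "carN"],
    "year": ["yearT"],
    "model": ["modelT"],
}

# Type -> content model over child types (each type name followed by a space).
RULES = {
    "carsT": r"usedT newT ",
    "usedT": r"(carU )*",
    "newT": r"(carN )*",
    "carU": r"yearT modelT ",
    "carN": r"(yearT )?modelT ",  # (yearT | ε) modelT
    "yearT": r"",
    "modelT": r"",
}
ROOT_TYPE = "carsT"


def types(node):
    """Set of types the node can take, given the types its children can take."""
    label, children = node
    child_types = [types(child) for child in children]
    result = set()
    for candidate in TYPES_OF.get(label, []):
        # Brute force: try every way of picking one type per child.
        for choice in product(*child_types):
            word = "".join(t + " " for t in choice)
            if re.fullmatch(RULES[candidate], word):
                result.add(candidate)
                break
    return result


def is_valid(tree):
    return ROOT_TYPE in types(tree)


car_full = ("car", [("year", []), ("model", [])])
car_model_only = ("car", [("model", [])])
print(sorted(types(car_full)), sorted(types(car_model_only)))
```

A car with both year and model can be typed either way, while a car with only a model can only be a carN, so it is rejected under `used`.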

  7. Tree Automata and Specialized DTDs
     [Figure: the cars tree with each node annotated by its set of possible types: carsT, usedT, newT, yearT, modelT, and for car nodes either {carU, carN} or {carN}.]

  8. Incremental Validation Problem Statement
     • For each valid tree T, use an auxiliary structure A(T) so that, given a series of update commands, we can
       • efficiently decide if the updated tree T' is valid
       • efficiently update A(T) and T

  9. Types of Updates: Node Renaming u(v, σ)
     [Figure: node v, the i-th of k children of a node with content model r, is given the new label σ; the other children 1 … k are unchanged.]

  10. v i 1 2 … k-1 k Types of Updates: Deletion d(v) r …  … a b c … … 1 2 i-1 i+1 k-1 k

  11. Types of Updates: Insertion insert_after(v_{i-1}, σ_i)
      [Figure: a new node labeled σ_i is inserted between v_{i-1} and v_{i+1} under a node with content model r.]

  12. 2 i-1 1 … 2 i-1 q0 q0 1  i+1 n-1 … n qF Validating a Renaming u(i, )on a Regular String of N : Take One Pre(i-1) Validation of one update in O(1) given precomputed Pre and Post … 2 i-1 i i+1 n-1 n N 1 … u(i, ) requires recomputation of Pre(i), Pre(i+1), … and of Post(i), Post(i-1), … Post(i+1)

  13. q … q’ i i+1 j Ti,j = Ti,m Tm+1,j Transition Relation Definition Ti,j = { (q, q’) | } … 2 i j n-1 n 1 i+1 … m m+1 … …

  14. Transition Relation Trees
      T_{1,8}
      T_{1,4}  T_{5,8}
      T_{1,2}  T_{3,4}  T_{5,6}  T_{7,8}
      T_{1,1}  T_{2,2}  T_{3,3}  T_{4,4}  T_{5,5}  T_{6,6}  T_{7,7}  T_{8,8}
      1  2  3  4  5  6  7  8

  15. Maintenance of the Structure and Validation in O(log n)
      • If (q0, qF) ∈ T_{1,8} then valid
      • The renaming u(6, σ) recomputes only T_{6,6}, T_{5,6}, T_{5,8} and T_{1,8}
      [Figure: the transition relation tree over symbols 1 … 8 with the path from leaf T_{6,6} to the root T_{1,8} highlighted.]
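A transition relation tree can be sketched as an array-based balanced binary tree whose leaves hold single-symbol relations and whose inner nodes hold compositions. The Python below is a hedged illustration, not the authors' implementation: the class name, the NFA for `a (b | c)* d`, the 0-based positions and the padding with identity relations up to a power of two are all assumptions made for the example.

```python
# Hypothetical NFA for the regular language a (b | c)* d.
DELTA = {(0, "a"): {1}, (1, "b"): {1}, (1, "c"): {1}, (1, "d"): {2}}
STATES = {0, 1, 2}
Q0 = 0
FINALS = {2}


def compose(r, s):
    """Relational composition: (p, t) if (p, q) in r and (q, t) in s."""
    return frozenset((p, t) for (p, q) in r for (q2, t) in s if q == q2)


class TransitionRelationTree:
    def __init__(self, delta, states, q0, finals, word):
        self.delta, self.q0, self.finals = delta, q0, finals
        identity = frozenset((q, q) for q in states)
        self.size = 1
        while self.size < len(word):
            self.size *= 2
        # Slot 1 is the root; leaves start at index self.size.
        # Unused leaves hold the identity, which is neutral for compose.
        self.tree = [identity] * (2 * self.size)
        for i, symbol in enumerate(word):
            self.tree[self.size + i] = self._leaf(symbol)
        for k in range(self.size - 1, 0, -1):
            self.tree[k] = compose(self.tree[2 * k], self.tree[2 * k + 1])

    def _leaf(self, symbol):
        """T_{i,i}: all single-step transitions on `symbol`."""
        return frozenset((q, t)
                         for (q, a), targets in self.delta.items()
                         if a == symbol for t in targets)

    def rename(self, i, symbol):
        """Relabel position i and recompute only its O(log n) ancestors."""
        k = self.size + i
        self.tree[k] = self._leaf(symbol)
        k //= 2
        while k >= 1:
            self.tree[k] = compose(self.tree[2 * k], self.tree[2 * k + 1])
            k //= 2

    def is_valid(self):
        """Valid iff the root relation links q0 to some accepting state."""
        return any((self.q0, f) in self.tree[1] for f in self.finals)


trt = TransitionRelationTree(DELTA, STATES, Q0, FINALS, "abbd")
trt.rename(1, "c")          # word becomes acbd
print(trt.is_valid())
```

Each composition costs time that depends only on the number of automaton states, so a renaming costs O(log n) for a fixed automaton. A fixed array like this does not support insertions and deletions, which is the gap the 2-3 tree variant on the next slides addresses.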

  16. Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions
      • If (q0, qF) ∈ Ta ∘ Tb ∘ Tc then valid
      • Ta = T1 ∘ T2
      [Figure: a 2-3 tree whose root holds Ta, Tb, Tc over the leaves T1 T2 T3 T5 T6 T7 T9, for symbols 1 2 3 5 6 7 9.]

  17. Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions
      [Figure: symbol 8 is inserted; leaf T8 is added next to T7 and T9 under Tc. Leaves: T1 T2 T3 T5 T6 T7 T8 T9.]

  18. Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions
      [Figure: symbol 4 is about to be inserted. The root holds Ta, Tb, Tc over the leaf groups T1 T2, T3 T5 T6 and T7 T8 T9.]

  19. Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions
      [Figure: leaf T4 is added next to T3, so one node now has the four leaves T3 T4 T5 T6.]

  20. Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions
      [Figure: the overfull node is split into Td and Te; the top level, now holding Ta Td Te Tc, is split in turn under two new nodes Tf and Tg.]

  21. r … r r … Auxiliary Structures for Incremental DTD Validation u(vi, ) r … …  vi … … 1 2 i-1 i i+1 k-1 k  1 2 … k-1 k i

  22. Specialized DTD Incremental Validation: Take One
      [Figure: every node v carries a set types(v). The renaming u(v_i, σ) is shown at child v_i, whose set types(v_i) contains n types, together with the types(·) sets of the nodes above it up to the root.]

  23. Inefficient for Deep Trees: Apply Divide-And-Conquer in Vertical Direction
      • Turn Specialized DTD into NFA that validates a vertical line …
      • "Fuse" vertical and horizontal directions using binary tree and split work in both …

  24. Tree Satisfying Specialized DTD transformed into Binary Tree Accepted By Tree Automaton
      [Figure: an unranked tree over nodes a, b, c, d, e, f, g, h, i, j, k next to its binary-tree encoding, in which # marks an absent child.]
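The slide does not spell out the encoding. A common choice, assumed here, is the first-child / next-sibling encoding: the left child of a node is its first child, the right child is its next sibling, and `#` stands for "none". The Python sketch below uses a small hypothetical tree, not the one in the figure.

```python
def encode(node, siblings=()):
    """First-child / next-sibling encoding of an unranked tree.

    `node` is a (label, children) pair; the result is a nested
    (label, left, right) triple with "#" for an absent child.
    """
    label, children = node
    left = encode(children[0], children[1:]) if children else "#"
    right = encode(siblings[0], siblings[1:]) if siblings else "#"
    return (label, left, right)


# Hypothetical tree: a has children b and d; b has a single child c.
unranked = ("a", [("b", [("c", [])]), ("d", [])])
print(encode(unranked))
```

Under this encoding a chain of siblings turns into a right-going path, so horizontal child words and vertical paths can both be handled as paths in one binary tree.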

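  25. Designate Lines in Binary Trees
      [Figure: three drawings that compare subtree sizes, labeled Size(·) > 2 Size(·), Size(·) > 2 Size(·) and Size(·) > 4 Size(·). The subtrees being compared survive only in the lost drawings.]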

  26. Example Line Structure
      [Figure: the binary encoding over nodes a … k and # leaves, shown with its nodes grouped into lines.]

  27. From Tree Automaton to Validating Lines with NFA
      [Figure: the line structure over nodes a, b, c, d, e, f, g, h, i, j, k.]

  28. From Tree Automaton to Validating Lines with NFA
      [Figure: the same line structure, where nodes b, d and f now carry the annotations Tc, Tj and Tg.]

  29. Incremental Validation of the Line Structure in O(log² |T|)
      • Insert m after k
      • #updated lines < 1 + log |T|
      • Cost of line update O(log |T|)
      [Figure: the annotated line structure (b, Tc; d, Tj; f, Tg) with the new node m.]

  30. Validating Insertions and Deletions: the Non-Line-Preserving Case
      [Figure: an insertion.]

  31. Key Complexity Results
      • Given m updates on tree of size n, incrementally validate DTD in O(m log n)
        • given alphabet Σ, size of maximum regular expression d: O(m |Σ| d² log d log n)
        • Data structure of size O(d² n)
      • Specialized DTDs in O(m log² n)
        • given set of types Σ': O(m |Σ'|² d² (log d + log |Σ'|) log² n)
        • Data structure of size O(|Σ'|² d² log² n)
      • Lower complexity for 1-unambiguous

  32. Ongoing and Future Work (with Andrey Balmin)
      • Incorporate Transition Relation Trees in B-Tree Structure
      • Exploit "locality"
        • Experimental evaluation on set of 65 DTDs: in 96% of type definitions an update may only affect transition relations of length < 4
        • Common case much more efficient than worst case
        • Detect the property and employ algorithms that do not build transition relation trees in such cases
      • Optimization over multiple updates
      • More complex updates & edit operations
