
Change-Centric Management of Versions in an XML Warehouse

  1. Change-Centric Management of Versions in an XML Warehouse
     Amélie Marian, Columbia University
     Serge Abiteboul, Grégory Cobéna, Laurent Mignet, INRIA-Rocquencourt

  2. Overview
     • The Xyleme Project
     • Change Management
     • Version Management
     • XIDs
     • XML Diff
     • Deltas
     • Storage of XML document versions
     • Implementation and experiments

  3. The Xyleme Project
     • A dynamic XML Data Warehouse with high-level services:
        • User-friendly Query Engine
        • Semantic Data Integration
        • Version Management
        • Query Subscription and Change Monitoring services
     • The Xyleme project is now finished
     • There is a start-up also called Xyleme

  4. Change Management
     • Version Management
     • Learning about Changes
     • Monitoring Changes: Query Subscription
     • Querying the Past: Temporal Queries

  5. Version Management
     Our requirements:
     • Obtain the current version
     • Get the modifications since time t
     • Subscribe to change notifications and query changes
     • Compute temporal queries
     • Rebuild the version $V_i$ of a document at time $t_i$

  6. Getting the Documents
     [Figure: two versions of a Catalog document whose Product (Pr) elements each have a Name (N) and a Price (P). Version 1: Camera 300, TV 100, VCR 200. Version 2: TV 100, DVD 500, VCR 150.]
     • XML documents are fetched from the web
     • We only have snapshots of the documents

  7. XIDs
     • Unique identifiers are needed to track XML nodes through time:
        • Track changes on a specific node (e.g., a product in a catalog)
        • Reconstruct the history of a node
     • But physically adding an ID attribute to each node is expensive storage-wise
     ⇒ XIDs allow persistent IDs to be attached to every node in a storage-efficient manner

  8. XIDs
     [Figure: a 12-node document tree annotated with its XIDs; its XID-map is (1-3,14-15,7-13|16).]
     • XIDs are stored separately as a list (the XID-map)
     • The list gives the node IDs in a postorder traversal of the tree
     • XIDnext gives the next available XID (the number after "|")
     • Compact representation
     • The document itself is not modified
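The XID-map notation can be expanded and re-compressed mechanically. The Python sketch below follows the notation used on these slides: a range is written `a-b`, ranges are separated by commas, and the number after `|` is XIDnext. The slides never show how a single XID is written, so the bare number `n` is an assumption. The function names are hypothetical.

```python
from typing import List, Tuple


def parse_xid_map(text: str) -> Tuple[List[int], int]:
    """Expand e.g. "(1-3,14-15,7-13|16)" into (postorder list of XIDs, XIDnext)."""
    ranges, next_xid = text.strip().strip("()").split("|")
    xids: List[int] = []
    for part in filter(None, ranges.split(",")):
        if "-" in part:
            lo, hi = part.split("-")
            xids.extend(range(int(lo), int(hi) + 1))
        else:  # assumed notation for a single XID
            xids.append(int(part))
    return xids, int(next_xid)


def format_xid_map(xids: List[int], next_xid: int) -> str:
    """Compress a postorder list of XIDs into maximal runs of consecutive integers."""
    parts: List[str] = []
    i = 0
    while i < len(xids):
        j = i
        # extend the run while the next XID is exactly one more than the current one
        while j + 1 < len(xids) and xids[j + 1] == xids[j] + 1:
            j += 1
        parts.append(str(xids[i]) if i == j else f"{xids[i]}-{xids[j]}")
        i = j + 1
    return "(" + ",".join(parts) + f"|{next_xid})"


xids, next_xid = parse_xid_map("(1-3,14-15,7-13|16)")
print(xids)                            # [1, 2, 3, 14, 15, 7, 8, 9, 10, 11, 12, 13]
print(next_xid)                        # 16
print(format_xid_map(xids, next_xid))  # (1-3,14-15,7-13|16)
```

The round trip shows why the representation is compact. The twelve XIDs of the example tree are stored as three ranges, and the document itself is never touched.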

  9. XML Diff
     • We implemented an XML diff algorithm to compute the changes between two versions of a document:
        • Uses the XML structure for matching
        • Content matching
        • Linear in the size of the document
     • The XML diff has two roles:
        • Match nodes
        • Build the delta
     • Ongoing work on improving the XML diff

  10. Node Matching using a Diff Algorithm
     [Figure: Versions 1 and 2 of the Catalog document annotated with XIDs. In Version 1 the Catalog node has XID 16, its Camera, TV and VCR products have XIDs 5, 10 and 15, and the VCR price value "200" has XID 13. In Version 2 the Camera product is deleted, the VCR price becomes 150, and a new DVD 500 product with XIDs 17-21 is inserted in second position.]
     • XID-map of Version 1: (1-16|17)
     • Diff($V_1$, $V_2$): delete(5), update(13,150), insert(16,2,(17-21))
     • New XID-map: (6-10,17-21,11-16|22)
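The XIDs in this example can be checked with a small simulation. In the Python sketch below, the `Node` class and all helper names are hypothetical. Nodes receive XIDs in postorder. The delete is applied before the insert, so the insert position (1-based) refers to Version 2. The sketch only replays the matching shown on the slide; it does not implement the diff, which is what would have to discover these operations.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    xid: int = 0


def product(name: str, price: str) -> Node:
    # Product(Name(text), Price(text)), as in the Catalog example
    return Node("Product", [Node("Name", [Node(name)]), Node("Price", [Node(price)])])


def assign_xids(node: Node, next_xid: int) -> int:
    """Give each node of a new subtree a fresh XID in postorder; return XIDnext."""
    for child in node.children:
        next_xid = assign_xids(child, next_xid)
    node.xid = next_xid
    return next_xid + 1


def find(node: Node, xid: int) -> Optional[Node]:
    if node.xid == xid:
        return node
    for child in node.children:
        hit = find(child, xid)
        if hit is not None:
            return hit
    return None


def find_parent(node: Node, xid: int) -> Optional[Tuple[Node, int]]:
    for i, child in enumerate(node.children):
        if child.xid == xid:
            return node, i
        hit = find_parent(child, xid)
        if hit is not None:
            return hit
    return None


def xid_map(root: Node, next_xid: int) -> str:
    """Postorder XIDs compressed into ranges, followed by |XIDnext."""
    order: List[int] = []

    def walk(n: Node) -> None:
        for c in n.children:
            walk(c)
        order.append(n.xid)

    walk(root)
    runs: List[List[int]] = []
    for x in order:
        if runs and x == runs[-1][1] + 1:
            runs[-1][1] = x
        else:
            runs.append([x, x])
    parts = [f"{a}-{b}" if a != b else str(a) for a, b in runs]
    return "(" + ",".join(parts) + f"|{next_xid})"


# Version 1 of the catalog: XIDs 1..16 in postorder
doc = Node("Catalog", [product("Camera", "300"), product("TV", "100"), product("VCR", "200")])
next_xid = assign_xids(doc, 1)
print(xid_map(doc, next_xid))  # (1-16|17)

# Diff(V1, V2) = delete(5), update(13, 150), insert(16, 2, (17-21))
parent, idx = find_parent(doc, 5)
del parent.children[idx]
find(doc, 13).label = "150"
dvd = product("DVD", "500")
next_xid = assign_xids(dvd, next_xid)      # the new subtree receives XIDs 17-21
find(doc, 16).children.insert(2 - 1, dvd)  # positions are 1-based
print(xid_map(doc, next_xid))  # (6-10,17-21,11-16|22)
```

Surviving nodes keep their XIDs while only the new subtree draws from XIDnext. That is why the new XID-map is a permutation of old ranges plus one new range.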

  11. Edit-Scripts = SEQUENCE
     • Sequences of basic operations over XML trees:
        • Delete(n)
        • Update(n, v)
        • Insert(m, k, T)
        • Move(n, k, m)
     • An Edit Script can be applied to a document D if its operations are consistent with D
     • Applying an Edit Script to a document D results in a unique document D'
     • Several Edit Scripts applied to a document D can result in the same document D'
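To make the sequence semantics concrete, here is a Python sketch that applies edit scripts to the children of a single parent node. The flat list-of-labels model, the tuple encoding of operations and the `apply_script` name are assumptions made for illustration. Positions are 1-based and refer to the document as it stands at each step.

```python
from typing import List, Tuple

# Hypothetical encoding: ("insert", label, k) or ("delete", label, 0)
Op = Tuple[str, str, int]


def apply_script(children: List[str], script: List[Op]) -> List[str]:
    """Apply an edit script, step by step, to the children of one parent node."""
    result = list(children)
    for kind, label, k in script:
        if kind == "insert":
            # consistency check: the position must exist at this step
            if not 1 <= k <= len(result) + 1:
                raise ValueError(f"insert at {k} is not consistent with {result}")
            result.insert(k - 1, label)
        elif kind == "delete":
            if label not in result:
                raise ValueError(f"cannot delete {label!r}: not in {result}")
            result.remove(label)
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return result


d = ["A"]
script_1 = [("insert", "B", 1), ("insert", "C", 1)]
script_2 = [("insert", "C", 1), ("insert", "B", 2)]
print(apply_script(d, script_1))                  # ['C', 'B', 'A']
print(apply_script(d, script_2))                  # ['C', 'B', 'A']
print(apply_script(d, list(reversed(script_1))))  # ['B', 'C', 'A']
```

Both scripts reach the same D' from D. Reordering the steps of the first script changes the result, which is why an edit script is a sequence rather than a set.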

  12. Deltas (Δ) = SET
     • We introduce an alternative way of representing changes: Deltas
     • $\Delta_{i,j}$ (unit delta) contains the set of operations needed to go from $V_i$ to $V_j$, i.e. $\mathrm{Diff}(V_i, V_j)$
     • A Delta ($\Delta$) over a document D is the sequence of unit deltas over D: $\Delta = \{\Delta_{1,2}, \ldots, \Delta_{k-1,k}\}$
     • There is an (almost) unique delta from $V_i$ to $V_j$
     • We represent Deltas as XML documents

  13. Shortcomings of Deltas
     Storage policies:
        a) $V_1, \Delta_{1,2}, \ldots, \Delta_{now-1,now}$
        b) $\Delta_{2,1}, \ldots, \Delta_{now,now-1}, V_{now}$
        c) $V_1, \Delta_{2,1}, \ldots, \Delta_{now,now-1}$
        d) $\Delta_{1,2}, \ldots, \Delta_{now-1,now}, V_{now}$
     • Deltas are not reversible and cannot be composed (information on positions is missing)
     • Only a) and b) are lossless
     • But we would like to have fast access to:
        • $V_{now}$
        • $\Delta_{i,now}$

  14. Completed Deltas (Δ+)
     • Completed deltas contain more information:
        • Delete(m, k, T)
        • Update(n, ov, nv)
        • Insert(m, k, T)
        • Move(n, k, m, p, q)
     • Completed Deltas can be reversed and composed
     • Completed Deltas are similar in spirit to the logs of some database systems
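The following Python sketch shows why the extra information makes completed deltas reversible:
- each Insert becomes a Delete with the same parent and position, and vice versa;
- each Update swaps its old and new values.

The sketch assumes, as in the XML example of a completed delta, that a delete's position refers to the old version and an insert's position to the new one, so positions carry over unchanged. The class names and the string stand-in for subtrees are hypothetical. Move(n, k, m, p, q) is left out because the slides do not define its parameters.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Insert:
    parent: int    # m: XID of the parent node
    position: int  # k: 1-based position under the parent (in the new version)
    subtree: str   # T: stand-in for the inserted subtree


@dataclass(frozen=True)
class Delete:
    parent: int    # m: XID of the parent node
    position: int  # k: 1-based position under the parent (in the old version)
    subtree: str   # T: the deleted subtree, kept so the delete can be undone


@dataclass(frozen=True)
class Update:
    xid: int        # n
    old_value: str  # ov
    new_value: str  # nv


CompletedOp = Union[Insert, Delete, Update]


def invert_op(op: CompletedOp) -> CompletedOp:
    if isinstance(op, Insert):
        return Delete(op.parent, op.position, op.subtree)
    if isinstance(op, Delete):
        return Insert(op.parent, op.position, op.subtree)
    return Update(op.xid, op.new_value, op.old_value)


def invert_delta(delta: List[List[CompletedOp]]) -> List[List[CompletedOp]]:
    """Turn (Δ+_{1,2}, ..., Δ+_{k-1,k}) into (Δ+_{k,k-1}, ..., Δ+_{2,1}).
    A unit delta is a set, so only the sequence of unit deltas is reversed."""
    return [[invert_op(op) for op in unit] for unit in reversed(delta)]


# Δ+_{1,2} from the catalog example
delta = [[Delete(16, 1, "Product(Camera, 300)"),
          Update(13, "200", "150"),
          Insert(16, 2, "Product(DVD, 500)")]]
reversed_delta = invert_delta(delta)
```

Inverting twice gives back the original delta. A plain delta cannot be inverted this way, because its Delete(n) carries neither the removed subtree nor where it was.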

  15. Example of XML Δ+
     <delta>
       <unit_delta> … </unit_delta>
       <unit_delta>
         <time from="1" to="2"/>
         <delete parent="16" position="1" xid-map="(1-5)">
           <Product>
             <Name>Camera</Name>
             <Price>300</Price>
           </Product>
         </delete>
         <update xid="13" new_value="150" old_value="200"/>
         <insert parent="16" position="2" xid-map="(17-21)">
           <Product>
             <Name>DVD</Name>
             <Price>500</Price>
           </Product>
         </insert>
       </unit_delta>
     </delta>
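The XML form of Δ+ can be read with Python's standard `xml.etree.ElementTree`. The sketch below embeds only the complete second `<unit_delta>` from the example, since the first one is elided on the slide, and turns each operation into a tuple. Two parts are hypothetical: the helper name `read_delta`, and labelling each subtree by its product `Name`, which is specific to this catalog example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# The complete unit delta from the slide (the elided first one is omitted).
DELTA_XML = """
<delta>
  <unit_delta>
    <time from="1" to="2"/>
    <delete parent="16" position="1" xid-map="(1-5)">
      <Product><Name>Camera</Name><Price>300</Price></Product>
    </delete>
    <update xid="13" new_value="150" old_value="200"/>
    <insert parent="16" position="2" xid-map="(17-21)">
      <Product><Name>DVD</Name><Price>500</Price></Product>
    </insert>
  </unit_delta>
</delta>
"""


def read_delta(xml_text: str) -> List[Tuple[int, int, list]]:
    """Return one (from, to, operations) triple per <unit_delta>."""
    units = []
    for unit in ET.fromstring(xml_text.strip()).iter("unit_delta"):
        time = unit.find("time")
        ops = []
        for op in unit:
            if op.tag in ("delete", "insert"):
                # the first child is the deleted/inserted subtree; label it by its Name
                ops.append((op.tag, int(op.get("parent")), int(op.get("position")),
                            op.get("xid-map"), op[0].findtext("Name")))
            elif op.tag == "update":
                ops.append(("update", int(op.get("xid")),
                            op.get("old_value"), op.get("new_value")))
        units.append((int(time.get("from")), int(time.get("to")), ops))
    return units


for start, end, ops in read_delta(DELTA_XML):
    print(f"unit delta {start} -> {end}")
    for op in ops:
        print("  ", op)
```

Each operation carries enough attributes to be undone: deletes keep their subtree, updates keep `old_value`, and inserts and deletes record parent, position and XID range.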

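  16. Operations on Deltas
     • Compute with a version: $V_i \circ \Delta^+_{i,j} = V_j$ and $V_i \circ \Delta_{i,j} = V_j$
     • Reverse: $(\Delta^+_{i,j})^{-1} = \Delta^+_{j,i}$
     • Compose: $\Delta^+_{i,j} ; \Delta^+_{j,k} = \Delta^+_{i,k}$
     • Simplify: $\Delta^+_{i,j} \rightarrow \Delta_{i,j}$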
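The three operations above can be combined into the identities below. This uses the convention $V \circ (\Delta ; \Delta') = (V \circ \Delta) \circ \Delta'$ (apply the first delta, then the second), which is an assumption about the notation rather than something the slide states. The last line is what allows an old version to be rebuilt from $V_{now}$ and completed deltas alone.

```latex
\begin{aligned}
V_j \circ \Delta^+_{j,i} &= V_j \circ (\Delta^+_{i,j})^{-1} = V_i \\
V_i \circ \Delta^+_{i,k} &= V_i \circ (\Delta^+_{i,j} ; \Delta^+_{j,k})
  = (V_i \circ \Delta^+_{i,j}) \circ \Delta^+_{j,k} = V_j \circ \Delta^+_{j,k} = V_k \\
V_{now} \circ (\Delta^+_{now,now-1} ; \cdots ; \Delta^+_{i+1,i}) &= V_i,
  \quad \text{where } \Delta^+_{m+1,m} = (\Delta^+_{m,m+1})^{-1}
\end{aligned}
```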

  17. Storage of Versions
     • For a document D (or a query result Q), we store:
        • The current version $V_k$
        • The XID-map (as text) of $V_k$
        • The current $\Delta^+ = \{\Delta^+_{1,2}, \ldots, \Delta^+_{k-1,k}\}$
     • When a new version $k+1$ arrives:
        • Compute the XML diff between versions $k$ and $k+1$, and compute $\Delta^+_{k,k+1}$
        • Replace the current version with $V_{k+1}$
        • Replace the XID-map
        • Append $\Delta^+_{k,k+1}$ to $\Delta^+$
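The update procedure above can be sketched in Python under a deliberately simplified model, not the system described here:
- A version is a flat dictionary from XID to node value, so parents, positions and the XID-map are ignored.
- Nodes are assumed to be already matched by XID.
- `diff`, `apply`, `invert` and `VersionStore` are hypothetical names.

The sketch keeps $V_{now}$ plus the completed deltas (the Full Versioning level) and rebuilds $V_i$ by applying reversed deltas backwards.

```python
from typing import Dict, List, Tuple

Doc = Dict[int, str]  # hypothetical flat model: XID -> node value
Op = Tuple            # ("insert", xid, v) | ("delete", xid, v) | ("update", xid, old, new)


def diff(old: Doc, new: Doc) -> List[Op]:
    """Completed unit delta between two versions whose nodes are matched by XID."""
    ops: List[Op] = []
    for xid in sorted(old.keys() - new.keys()):
        ops.append(("delete", xid, old[xid]))  # keep the old value: completed delta
    for xid in sorted(old.keys() & new.keys()):
        if old[xid] != new[xid]:
            ops.append(("update", xid, old[xid], new[xid]))
    for xid in sorted(new.keys() - old.keys()):
        ops.append(("insert", xid, new[xid]))
    return ops


def apply(doc: Doc, ops: List[Op]) -> Doc:
    out = dict(doc)
    for op in ops:
        if op[0] == "delete":
            del out[op[1]]
        elif op[0] == "insert":
            out[op[1]] = op[2]
        else:  # update
            out[op[1]] = op[3]
    return out


def invert(ops: List[Op]) -> List[Op]:
    swap = {"insert": "delete", "delete": "insert"}
    return [(swap[op[0]], op[1], op[2]) if op[0] in swap
            else ("update", op[1], op[3], op[2]) for op in ops]


class VersionStore:
    """Full versioning: V_now plus the completed deltas Δ+_{1,2}, ..., Δ+_{now-1,now}."""

    def __init__(self, first: Doc) -> None:
        self.current = dict(first)
        self.deltas: List[List[Op]] = []

    def add_version(self, new: Doc) -> None:
        self.deltas.append(diff(self.current, new))  # append Δ+_{k,k+1}
        self.current = dict(new)                     # replace V_k by V_{k+1}

    def version(self, i: int) -> Doc:
        """Rebuild V_i (1-based) by undoing Δ+_{now-1,now}, ..., Δ+_{i,i+1} from V_now."""
        doc = dict(self.current)
        for ops in reversed(self.deltas[i - 1:]):
            doc = apply(doc, invert(ops))
        return doc


# Text nodes of the catalog example, keyed by their XIDs
v1 = {1: "Camera", 3: "300", 6: "TV", 8: "100", 11: "VCR", 13: "200"}
v2 = {6: "TV", 8: "100", 11: "VCR", 13: "150", 17: "DVD", 19: "500"}
store = VersionStore(v1)
store.add_version(v2)
```

Reading the current version costs nothing extra, while an older version costs one inverted delta per step back. This matches the goal of fast access to $V_{now}$.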

  18. Levels of Versioning
     • Full versioning is expensive, so we support different levels of versioning:
        • Full Versioning: $V_{now} + \Delta^+$
        • Partial Versioning: $V_{now} + \Delta$
        • Last Version Update: $V_{now} + \Delta_{now-1,now}$
        • Change Support: $V_{now}$ + the XML diff computed for Query Subscription
        • Not Versioned: $V_{now}$

  19. Implementation
     • Version Manager and XML diff implemented in C++
     • A change simulator was implemented for tests
     • A GUI was implemented

  20. GUI Interface

  21. Delta Statistics
     • Reasonable when there are not many modifications
     • Relatively expensive for small documents
     • Depends on the quality of the diff

  22. Delta Statistics (2)
     [Chart: 30% of modifications on the document; from left to right: Snapshots, Completed Deltas, Deltas, Composed Completed Deltas.]
     • Deltas: composition and previous-version reconstruction are not possible
     • Composed Completed Deltas: the advantages of Completed Deltas, but coarser granularity and higher cost

  23. Conclusion
     • Management of versions based on change representation:
        • Representation as tree data (XML)
        • Study of storage policies
        • Implementation of running prototypes
     • Completed Deltas: a set of modifications
     • Mathematical properties of completed deltas (algebraic group)
     • Current work on Query Subscription, Continuous Queries and Changes over Collections of Documents

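  25. Example: Edit-Scripts vs. Deltas
     [Figure: Version 0 is a node P with one child A; Version 1 is P with children C, B, A.]
     • A possible Edit-Script: Insert(B,1,P), Insert(C,1,P)
     • The Delta: Insert(B,2,P), Insert(C,1,P)
     • In the Delta, positions refer to the final document (Version 1), so the order of its operations does not matter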
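Here is a Python sketch of the delta reading of this example. It assumes, as the example suggests, that delta positions refer to the new version, so applying the inserts in increasing order of position gives Version 1 no matter how the set is listed. The tuple layout mirrors Insert(B,2,P). The single-parent model and the assumption that deletions have already been applied are simplifications, and `apply_delta_inserts` is a hypothetical name.

```python
from typing import List, Tuple

# (node, position k in the NEW version, parent), mirroring Insert(B,2,P)
InsertOp = Tuple[str, int, str]


def apply_delta_inserts(children: List[str], inserts: List[InsertOp]) -> List[str]:
    """Apply a set of inserts under one parent; positions are 1-based and refer to
    the new version, so applying them in increasing position order is enough."""
    result = list(children)
    for node, k, _parent in sorted(inserts, key=lambda op: op[1]):
        result.insert(k - 1, node)
    return result


version_0 = ["A"]                       # P has a single child A
delta = [("B", 2, "P"), ("C", 1, "P")]  # the delta from the slide
print(apply_delta_inserts(version_0, delta))        # ['C', 'B', 'A']
print(apply_delta_inserts(version_0, delta[::-1]))  # ['C', 'B', 'A']
```

In the edit script, by contrast, the second Insert(C,1,P) is only correct because it runs after Insert(B,1,P).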

  26. Example: Missing Information for Delta Composition ($\Delta_{0,2}$)
     [Figure: three versions (0, 1 and 2) of a document rooted at P, built from nodes A, B, C and D.]
     • Deltas do not give information on the parents and positions of deleted elements
     • So the positions of inserted elements in the composition cannot be computed
