Change-Centric Management of Versions in an XML Warehouse
Amélie Marian (Columbia University)
Serge Abiteboul, Grégory Cobéna, Laurent Mignet (INRIA-Rocquencourt)
Overview
• The Xyleme Project
• Change Management
• Version Management
• XIDs
• XML Diff
• Deltas
• Storage of XML document versions
• Implementation and experiments
The Xyleme Project
• A dynamic XML data warehouse with high-level services:
  • User-friendly query engine
  • Semantic data integration
  • Version management
  • Query subscription and change monitoring services
• The Xyleme project is now finished
• There is also a start-up called Xyleme
Change Management
• Version Management
• Learning about Changes
• Monitoring Changes: Query Subscription
• Querying the Past: Temporal Queries
Version Management
Our requirements:
• Obtain the current version
• Get the modifications since time t
• Subscribe to change notifications, query changes
• Compute temporal queries
• Rebuild the version Vi of a document at time ti
Getting the Documents
[Figure: two snapshots of a Catalog document whose Products (Pr) each have a Name (N) and a Price (P). Version 1: Camera 300, TV 100, VCR 200. Version 2: TV 100, DVD 500, VCR 150.]
• XML documents are fetched from the web
• We only have snapshots of the documents
XIDs
• Unique identifiers are needed to track XML nodes through time:
  • Track changes on a specific node (e.g., a product in a catalog)
  • Reconstruct the history of a node
• But physically adding an ID attribute to each node is expensive in storage
• XIDs attach persistent IDs to every node in a storage-efficient manner
XIDs
[Figure: a document tree whose nodes are labeled with their XIDs. XID-map: (1-3,14-15,7-13|16)]
• XIDs are stored separately from the document, as a list (the XID-map)
• The XID-map lists the node IDs in a postorder traversal of the tree
• XIDnext gives the next available XID
• Compact representation
• The document is not modified
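To make the XID-map notation concrete, here is a minimal sketch that expands a map such as (1-3,14-15,7-13|16) into the postorder list of XIDs and compresses it back. The function names `parse_xid_map` and `format_xid_map` are hypothetical, and the reading of the number after "|" as XIDnext is inferred from the slides rather than taken from a specification.

```python
import re

def parse_xid_map(text):
    """Expand '(1-3,14-15,7-13|16)' into (XIDs in postorder, next free XID)."""
    m = re.fullmatch(r"\((.*)\|(\d+)\)", text.strip())
    if m is None:
        raise ValueError(f"not an XID-map: {text!r}")
    xids = []
    for part in m.group(1).split(","):
        if not part:
            continue  # tolerate an empty list of ranges
        lo, _, hi = part.partition("-")
        # A part is either a single XID ("5") or an inclusive range ("7-13").
        xids.extend(range(int(lo), int(hi or lo) + 1))
    return xids, int(m.group(2))

def format_xid_map(xids, xid_next):
    """Compress a postorder XID list into runs of consecutive XIDs."""
    ranges = []
    for x in xids:
        if ranges and x == ranges[-1][1] + 1:
            ranges[-1][1] = x          # extend the current run
        else:
            ranges.append([x, x])      # start a new run
    body = ",".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)
    return f"({body}|{xid_next})"

xids, xid_next = parse_xid_map("(1-3,14-15,7-13|16)")
print(xids, xid_next)
print(format_xid_map(xids, xid_next))
```

Because most nodes of a document keep consecutive XIDs across versions, the run-length form stays short even for large trees, which is presumably what makes the representation compact.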
XML Diff
• We implemented an XML diff algorithm to compute the changes between two versions of a document:
  • Uses the XML structure for matching
  • Content matching
  • Linear in the size of the document
• XML diff has two roles:
  • Match nodes
  • Build the delta
• Ongoing work on improving the XML diff
Node Matching using a Diff Algorithm
[Figure: Version 1 and Version 2 of the Catalog (XID 16) with an XID on every node. The Camera product (XID 5) is marked Delete, the DVD product is marked Insert, and the VCR price (XID 13) is marked Update.]
• XID-map of Version 1: (1-16|17)
• Diff(V1,V2): delete(5), update(13,150), insert(16,2,(17-21))
• New XID-map: (6-10,17-21,11-16|22)
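The new XID-map on this slide can be checked with a small sketch that applies the three operations to an in-memory copy of Version 1 and re-reads the XIDs in postorder. The `Node` class and the helpers `product`, `locate` and `xid_map` are hypothetical names for illustration, and the assignment of XIDs inside each Product subtree assumes plain postorder numbering.

```python
class Node:
    """An XML node carrying a persistent XID; text nodes have no children."""
    def __init__(self, xid, label, children=()):
        self.xid, self.label, self.children = xid, label, list(children)

def product(xids, name, price):
    """Build a Product subtree; `xids` holds its five XIDs in postorder."""
    name_text, name_el, price_text, price_el, prod = xids
    return Node(prod, "Product", [
        Node(name_el, "Name", [Node(name_text, name)]),
        Node(price_el, "Price", [Node(price_text, price)]),
    ])

def locate(node, xid, parent=None):
    """Return (node, parent) for the node with this XID, or None if absent."""
    if node.xid == xid:
        return node, parent
    for child in node.children:
        hit = locate(child, xid, node)
        if hit is not None:
            return hit
    return None

def postorder(node):
    for child in node.children:
        yield from postorder(child)
    yield node.xid

def xid_map(root, xid_next):
    """Render the postorder XIDs as runs of consecutive values."""
    ranges = []
    for x in postorder(root):
        if ranges and x == ranges[-1][1] + 1:
            ranges[-1][1] = x
        else:
            ranges.append([x, x])
    body = ",".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)
    return f"({body}|{xid_next})"

# Version 1 of the catalog, XIDs 1-16 in postorder.
catalog = Node(16, "Catalog", [
    product(range(1, 6), "Camera", "300"),
    product(range(6, 11), "TV", "100"),
    product(range(11, 16), "VCR", "200"),
])
old_map = xid_map(catalog, 17)

# delete(5): remove the Camera product.
node, parent = locate(catalog, 5)
parent.children.remove(node)
# update(13,150): change the VCR price text.
locate(catalog, 13)[0].label = "150"
# insert(16,2,(17-21)): new DVD product as 2nd child of the Catalog.
locate(catalog, 16)[0].children.insert(2 - 1, product(range(17, 22), "DVD", "500"))
new_map = xid_map(catalog, 22)

print(old_map)
print(new_map)
```

The delete is applied before the insert because the insert position refers to the resulting document; with this order the sketch reproduces the map shown on the slide.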
Edit-Scripts = SEQUENCE
• Sequences of basic operations over XML trees:
  • Delete(n)
  • Update(n, v)
  • Insert(m,k,T)
  • Move(n,k,m)
• An Edit Script can be applied to a document D if its operations are consistent with D
• An Edit Script applied to a document D will result in a unique document D’
• Several Edit Scripts applied to a document D can result in the same document D’
Deltas (Δ) = SET
• We introduce an alternative way of representing changes: Deltas
• Δi,j (unit delta) contains the set of operations needed to go from Vi to Vj (Diff(Vi,Vj))
• A Delta (Δ) over a document D is the sequence of unit deltas over D: Δ = {Δ1,2, ..., Δk-1,k}
• There is an (almost) unique delta from Vi to Vj
• We represent Deltas as XML documents
Storage Policies
a) V1, Δ1,2, …, Δnow-1,now
b) Δ2,1, …, Δnow,now-1, Vnow
c) V1, Δ2,1, …, Δnow,now-1
d) Δ1,2, …, Δnow-1,now, Vnow
Shortcomings of Deltas
• Deltas are not reversible and cannot be composed (information on position is missing)
• Only a) and b) are lossless
• But we would like to have fast access to:
  • Vnow
  • Δi,now
Completed Deltas (Δ+)
• Completed deltas contain more information:
  • Delete(m,k,T)
  • Update(n, ov, nv)
  • Insert(m,k,T)
  • Move(n,k,m,p,q)
• Completed Deltas can be reversed and composed
• Completed Deltas are in the spirit of some logs in DB systems
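The extra fields are what make reversal mechanical, as the following minimal sketch suggests. The class and field names are hypothetical; in particular, reading Move(n,k,m,p,q) as "node, new parent and position, old parent and position" is an assumption, since the slides do not spell out the argument order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Delete:
    parent: int
    position: int
    subtree: str      # the deleted subtree is kept, so it can be re-inserted

@dataclass(frozen=True)
class Insert:
    parent: int
    position: int
    subtree: str

@dataclass(frozen=True)
class Update:
    xid: int
    old_value: str    # keeping the old value is what allows undoing
    new_value: str

@dataclass(frozen=True)
class Move:           # field order is an assumption, see the lead-in
    xid: int
    new_parent: int
    new_position: int
    old_parent: int
    old_position: int

def invert(op):
    """Return the completed operation that undoes `op`."""
    if isinstance(op, Delete):
        return Insert(op.parent, op.position, op.subtree)
    if isinstance(op, Insert):
        return Delete(op.parent, op.position, op.subtree)
    if isinstance(op, Update):
        return Update(op.xid, op.new_value, op.old_value)
    if isinstance(op, Move):
        return Move(op.xid, op.old_parent, op.old_position,
                    op.new_parent, op.new_position)
    raise TypeError(f"unknown operation: {op!r}")

def invert_delta(delta):
    """A delta is a set of operations, so its inverse is the set of inverses."""
    return frozenset(invert(op) for op in delta)

# The catalog change from Version 1 to Version 2 as a completed unit delta.
delta_1_2 = frozenset({
    Delete(16, 1, "<Product><Name>Camera</Name><Price>300</Price></Product>"),
    Update(13, "200", "150"),
    Insert(16, 2, "<Product><Name>DVD</Name><Price>500</Price></Product>"),
})
delta_2_1 = invert_delta(delta_1_2)
print(sorted(type(op).__name__ for op in delta_2_1))
```

A plain Delete(n) or Update(n, v) cannot be inverted this way because the deleted subtree, its position and the old value are no longer available.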
Example of XML Δ+
<delta>
  <unit_delta> … </unit_delta>
  <unit_delta>
    <time from="1" to="2"/>
    <delete parent="16" position="1" xid-map="(1-5)">
      <Product>
        <Name>Camera</Name>
        <Price>300</Price>
      </Product>
    </delete>
    <update xid="13" new_value="150" old_value="200"/>
    <insert parent="16" position="2" xid-map="(17-21)">
      <Product>
        <Name>DVD</Name>
        <Price>500</Price>
      </Product>
    </insert>
  </unit_delta>
</delta>
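Since deltas are themselves XML documents, they can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on the unit delta from this slide (the elided first unit delta is left out); the function name `read_unit_deltas` and the tuple layout of the result are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

DELTA_XML = """<delta>
  <unit_delta>
    <time from="1" to="2"/>
    <delete parent="16" position="1" xid-map="(1-5)">
      <Product><Name>Camera</Name><Price>300</Price></Product>
    </delete>
    <update xid="13" new_value="150" old_value="200"/>
    <insert parent="16" position="2" xid-map="(17-21)">
      <Product><Name>DVD</Name><Price>500</Price></Product>
    </insert>
  </unit_delta>
</delta>"""

def read_unit_deltas(xml_text):
    """Return [((from, to), [operations])] for every unit delta in the document."""
    root = ET.fromstring(xml_text)
    result = []
    for unit in root.findall("unit_delta"):
        time = unit.find("time")
        ops = []
        for el in unit:
            if el.tag == "update":
                ops.append(("update", int(el.get("xid")),
                            el.get("old_value"), el.get("new_value")))
            elif el.tag in ("insert", "delete"):
                # The subtree itself is available as the children of `el`.
                ops.append((el.tag, int(el.get("parent")),
                            int(el.get("position")), el.get("xid-map")))
        result.append(((int(time.get("from")), int(time.get("to"))), ops))
    return result

for (v_from, v_to), ops in read_unit_deltas(DELTA_XML):
    print(f"V{v_from} -> V{v_to}")
    for op in ops:
        print("  ", op)
```

Storing deltas as XML means they can be queried with the same tools as the documents they describe.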
Operations on Deltas
• Compute a version:
  • Vi o Δ+i,j = Vj
  • Vi o Δi,j = Vj
• Reverse: (Δ+i,j)^-1 = Δ+j,i
• Compose: Δ+i,j ; Δ+j,k = Δ+i,k
• Simplify: Δ+i,j → Δi,j
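These operations can be illustrated on a deliberately restricted case: completed deltas that contain only updates, modeled as a dictionary from XID to (old value, new value). This is a simplifying assumption, not the general algorithm, since composing inserts, deletes and moves also requires position arithmetic. The contents of Δ+2,3 below are hypothetical.

```python
def reverse(delta):
    """(Δ+i,j)^-1: swap old and new values."""
    return {xid: (new, old) for xid, (old, new) in delta.items()}

def compose(d1, d2):
    """Δ+i,j ; Δ+j,k for update-only deltas: chain old and new values per node."""
    out = dict(d1)
    for xid, (old, new) in d2.items():
        if xid in out:
            first_old, middle = out[xid]
            if middle != old:
                raise ValueError(f"deltas do not chain on XID {xid}")
            out[xid] = (first_old, new)
        else:
            out[xid] = (old, new)
    # An update back to the original value is no change at all.
    return {xid: (o, n) for xid, (o, n) in out.items() if o != n}

def simplify(delta):
    """Δ+ → Δ: drop the old values, keeping only what a plain delta stores."""
    return {xid: new for xid, (_old, new) in delta.items()}

def apply(version, delta):
    """Vi o Δ+i,j = Vj, with a version modeled as {XID: text value}."""
    out = dict(version)
    out.update(simplify(delta))
    return out

delta_1_2 = {13: ("200", "150")}                       # from the catalog example
delta_2_3 = {13: ("150", "180"), 8: ("100", "120")}    # hypothetical next change

delta_1_3 = compose(delta_1_2, delta_2_3)
print(delta_1_3)
print(compose(delta_1_2, reverse(delta_1_2)))
```

Composing a delta with its reverse yields the empty delta, which acts as the identity.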
Storage of Versions
• For a document D (or a query result Q), we store:
  • Current version: Vk
  • XID-map (as text) of Vk
  • Current Δ+ = {Δ+1,2, ..., Δ+k-1,k}
• When a new version k+1 arrives:
  • Compute the XML diff between k and k+1, compute Δ+k,k+1
  • Replace the current version with Vk+1
  • Replace the XID-map
  • Append Δ+k,k+1 to Δ+
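The storage scheme above can be sketched with a toy model in which a version is a flat dictionary from XID to text value, so that the "diff" is a key-by-key comparison. This is an assumption for illustration only: the real system diffs XML trees and must match nodes first. The class name `VersionStore` and its methods are hypothetical.

```python
class VersionStore:
    """Keeps the current version plus the completed deltas that led to it."""

    def __init__(self, first_version):
        self.current = dict(first_version)
        self.deltas = []   # deltas[i] leads from version i+1 to version i+2

    def commit(self, new_version):
        """Store a new version: compute the completed delta, replace the current version."""
        keys = self.current.keys() | new_version.keys()
        # (old, new) pairs; None stands for "node absent in that version".
        delta = {k: (self.current.get(k), new_version.get(k))
                 for k in keys
                 if self.current.get(k) != new_version.get(k)}
        self.deltas.append(delta)
        self.current = dict(new_version)

    def version(self, i):
        """Rebuild version i (1-based) by undoing deltas from the newest backwards."""
        if not 1 <= i <= len(self.deltas) + 1:
            raise IndexError(f"no version {i}")
        doc = dict(self.current)
        for delta in reversed(self.deltas[i - 1:]):
            for k, (old, _new) in delta.items():
                if old is None:
                    doc.pop(k, None)   # the node was inserted: remove it again
                else:
                    doc[k] = old       # restore the previous value
        return doc

# Text nodes of the two catalog versions, keyed by XID.
v1 = {1: "Camera", 3: "300", 6: "TV", 8: "100", 11: "VCR", 13: "200"}
v2 = {6: "TV", 8: "100", 17: "DVD", 19: "500", 11: "VCR", 13: "150"}

store = VersionStore(v1)
store.commit(v2)
print(store.deltas[0][13])
print(store.version(1) == v1)
```

Access to Vnow costs nothing, and older versions are rebuilt only on demand, which matches the access pattern the slides ask for.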
Levels of Versioning
• Full versioning is expensive, so we support different levels of versioning:
  • Full Versioning: Vnow + Δ+
  • Partial Versioning: Vnow + Δ
  • Last Version Update: Vnow + Δnow-1,now
  • Change Support: Vnow + XML diff computed for Query Subscription
  • Not Versioned: Vnow
Implementation
• Version Manager and XML diff implemented in C++
• A change simulator was implemented for tests
• A GUI was implemented
Deltas Statistics
[Figure: chart not preserved]
• Reasonable when there are not many modifications
• Relatively expensive for small documents
• Depends on the quality of the diff
Deltas Statistics (2)
[Figure: chart not preserved]
• 30% of modifications on the document
• From left to right:
  • Snapshots
  • Completed Deltas
  • Deltas: composition and previous version reconstruction are not possible
  • Composed Completed Deltas: the advantages of Completed Deltas, but with coarser granularity and higher cost
Conclusion
• Management of versions based on change representation:
  • Representation in tree data (XML)
  • Study of storage policies
  • Implementation of running prototypes
• Completed Deltas: a set of modifications
• Mathematical properties of completed deltas (algebraic group)
• Current work on Query Subscription, Continuous Queries and Changes over Collections of Documents
Example: Edit-Scripts vs. Deltas
[Figure: Version 0 is P with the single child A; Version 1 is P with the children C, B, A.]
• A possible Edit-Script: Insert(B,1,P) Insert(C,1,P)
• The Delta: Insert(B,2,P) Insert(C,1,P)
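The difference between the two notations can be replayed on the children of P. In this minimal sketch an operation is a hypothetical (label, position) pair with 1-based positions: an edit script is applied in order, each position referring to the document as it stands at that step, while a delta is a set whose positions refer to the final document.

```python
def apply_script(children, script):
    """Apply inserts one after the other; positions refer to the current state."""
    out = list(children)
    for label, position in script:
        out.insert(position - 1, label)
    return out

def apply_delta(children, delta):
    """Apply a set of inserts whose positions refer to the final document.

    Inserting in ascending final position puts every node where it belongs,
    whatever order the set is given in.
    """
    out = list(children)
    for label, position in sorted(delta, key=lambda op: op[1]):
        out.insert(position - 1, label)
    return out

version_0 = ["A"]

script = [("B", 1), ("C", 1)]      # Insert(B,1,P) then Insert(C,1,P)
delta = {("B", 2), ("C", 1)}       # Insert(B,2,P), Insert(C,1,P)

print(apply_script(version_0, script))
print(apply_delta(version_0, delta))
print(apply_script(version_0, list(reversed(script))))
```

Both notations reach Version 1, but swapping the two steps of the script gives a different document, whereas the delta has no order to get wrong.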
Example: Missing Information for Delta Composition (Δ(0,2))
[Figure: three versions of a tree rooted at P: Version 0, Version 1 (children C, B, A) and Version 2.]
• Deltas do not give information on parents and positions of deleted elements
• Positions of inserted elements in composition cannot be computed