University of Crete
Department of Computer Science
ΗΥ-561 Web Data Management
XML Data Archiving
Konstantinos Kouratoras
What is the problem?
• Most research focuses on database content
• Updates usually overwrite the existing state
• Research on database history is needed
• Scientific evidence is lost
• The basis of published findings cannot be verified
XML Data Archiving – Konstantinos Kouratoras
Why is this interesting?
• History of the data
• Scientific research
  • SWISS-PROT (protein sequences)
  • OMIM (human genes and genetic disorders)
• Great deal of manual labour
• Continuous changes
• Access to old versions
First Approach
• Object matching across versions
• Change descriptions
• Archive space
• Efficient history queries
Proposed technique (1/2)
Based on:
• Hierarchical data
• Key-structured databases
• Accretive databases
Proposed technique (2/2)
• Merging versions into one hierarchy
• Elements stored once
• Timestamps
  • Sequence of versions
  • Time intervals
  • Inheritance
• Keys for element identification
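The timestamp bullets above can be illustrated with a minimal sketch. It assumes versions are numbered with consecutive integers and that a node without its own timestamp takes the timestamp of its parent; the names `to_intervals` and `effective_timestamp` are hypothetical and do not come from the slides.

```python
def to_intervals(versions):
    """Compress a set of version numbers into sorted, inclusive intervals."""
    intervals = []
    for v in sorted(set(versions)):
        if intervals and v == intervals[-1][1] + 1:
            intervals[-1][1] = v          # extend the current run
        else:
            intervals.append([v, v])      # start a new run
    return [tuple(i) for i in intervals]


def effective_timestamp(own, parent):
    """Assumed inheritance rule: no own timestamp means 'same as parent'."""
    return parent if own is None else own


# An element present in versions 1-3 and 5-6 (made-up data).
ts_emp = to_intervals({1, 2, 3, 5, 6})
# A child that never changed independently stores nothing of its own.
ts_name = effective_timestamp(None, ts_emp)
```

Storing intervals instead of every version number keeps the timestamp short for elements that exist across long runs of versions, and inheritance avoids repeating it on unchanged descendants.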
Example
XML Model (1/3)
• Node values
  • T-node: data value
  • A-node: attribute name, attribute value
  • E-node (internal node): tag name
    • List of values of its E and T children
    • Set of values of its A children
• Node value equality
  • Two nodes are value-equal if they agree on their value
• Path expression
  • Sequence of node names
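The node value definition above can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under assumptions, not the author's implementation: whitespace-only text is ignored, and the function name `value` is hypothetical.

```python
import xml.etree.ElementTree as ET


def value(elem):
    """Value of an E-node: tag name, set of attribute values,
    ordered list of the values of its E and T children."""
    attrs = frozenset(elem.attrib.items())        # A-children: a set
    children = []                                 # E/T children: a list
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())        # leading T-node
    for child in elem:
        children.append(value(child))             # nested E-node
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())   # T-node after the child
    return (elem.tag, attrs, tuple(children))


# Made-up elements: a and b differ only in attribute order, c in child order.
a = ET.fromstring('<emp id="1" dept="x"><name>Ann</name><tel>123</tel></emp>')
b = ET.fromstring('<emp dept="x" id="1"><name>Ann</name><tel>123</tel></emp>')
c = ET.fromstring('<emp id="1" dept="x"><tel>123</tel><name>Ann</name></emp>')
```

Here `value(a) == value(b)` because attributes form a set, while `value(a) != value(c)` because element and text children form an ordered list.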
XML Model (2/3)
• Key
  • Pair of path expressions (Q, {P1,…,Pk})
  • Q: target set of nodes
  • {P1,…,Pk}: key paths that identify the nodes in Q
• Relative key
  • Defined relative to the key of an ancestor node
  • Weak entities
XML Model (3/3)
• Keys for previous example
  • (/,(db,{}))
    • At most one db element at the root
  • (/db,(address,{}))
    • At most one address under the db node
  • (/db,(emp,{id}))
    • Every employee within a db element is uniquely identified by its id subelement
  • (/db/emp,(name,{})), (/db/emp,(sal,{})), (/db/emp,(tel,{}))
    • At most one name, sal and tel node for each employee
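A minimal sketch of how such keys could be checked against a document, using the standard `xml.etree.ElementTree`. The sample document and the function name `violations` are made up for illustration; the context path is given relative to the db root element, so `"."` stands for /db and `"emp"` for /db/emp.

```python
import xml.etree.ElementTree as ET


def violations(root, context_path, target, key_paths):
    """Return the key tuples shared by more than one `target` child
    inside a single context node selected by `context_path`."""
    bad = []
    contexts = [root] if context_path == "." else root.findall(context_path)
    for ctx in contexts:
        seen = set()
        for node in ctx.findall(target):
            key = tuple(node.findtext(p) for p in key_paths)
            if key in seen:
                bad.append(key)       # a second node with the same key
            seen.add(key)
    return bad


# Hypothetical data: the second employee has two tel elements.
db = ET.fromstring(
    "<db>"
    "<emp><id>1</id><name>Ann</name><sal>50K</sal></emp>"
    "<emp><id>2</id><name>Bob</name><tel>123</tel><tel>456</tel></emp>"
    "<address>Main St</address>"
    "</db>"
)

emp_key = violations(db, ".", "emp", ["id"])     # (/db,(emp,{id}))
name_key = violations(db, "emp", "name", [])     # (/db/emp,(name,{}))
tel_key = violations(db, "emp", "tel", [])       # (/db/emp,(tel,{}))
```

An empty set of key paths yields the empty tuple for every node, so a second node of that kind under the same context is reported, which matches the "at most one" reading of the keys above.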
Components (1/4)
• Archiver components overview
[Diagram of the Archiver; labels: Archive, Keys, New version, Annotate Keys (with timestamps), Nested Merge, New Archive]
Components (2/4)
• Annotate keys
  • Elements are annotated with their key values
  • Uniquely identified nodes
    • Path from root to node
    • Key annotation
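One way to picture key annotation is to label every element with its path from the root, extended by its key values. The sketch below is a simplification under assumptions: keys are looked up by tag name only rather than by full context path, and `KEYS` and `annotate` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumed key table: tag -> key paths. Tags not listed have an empty key.
KEYS = {"emp": ["id"]}


def annotate(elem, prefix=""):
    """Yield (keyed path, element) for elem and all of its descendants."""
    key_paths = KEYS.get(elem.tag, [])
    label = elem.tag
    if key_paths:
        pairs = ",".join(f"{p}={elem.findtext(p)}" for p in key_paths)
        label += "[" + pairs + "]"
    path = f"{prefix}/{label}"
    yield path, elem
    for child in elem:
        yield from annotate(child, path)


# Made-up document with two employees.
doc = ET.fromstring(
    "<db>"
    "<emp><id>1</id><name>Ann</name></emp>"
    "<emp><id>2</id><name>Bob</name></emp>"
    "</db>"
)
paths = [p for p, _ in annotate(doc)]
```

Each resulting path, such as `/db/emp[id=2]/name`, identifies exactly one node, which is what lets elements be matched across versions.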
Components (3/4)
• Nested merge
  • Identify corresponding elements
  • Merge elements
  • Update sets of timestamps
  • Nodes with no corresponding element
    • Simply added
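The merge steps above can be sketched as a short recursive function. This is an illustration under stated assumptions rather than the archiver itself: each version is a nested dict whose keys are already key-annotated labels, text values are modelled as leaf keys, timestamps are stored explicitly as sets (no inheritance), and the name `nested_merge` is hypothetical.

```python
def nested_merge(archive, version, number):
    """Merge one version into the archive.

    Archive nodes have the shape
    {"ts": set of version numbers, "children": {key: node}}.
    """
    archive["ts"].add(number)                     # node exists in this version
    for key, sub in version.items():
        node = archive["children"].get(key)       # corresponding element?
        if node is None:                          # none found: simply add it
            node = {"ts": set(), "children": {}}
            archive["children"][key] = node
        nested_merge(node, sub, number)           # merge below, update timestamps
    return archive


# Made-up versions: the salary changes and a second employee appears.
v1 = {"emp[id=1]": {"name": {"Ann": {}}, "sal": {"50K": {}}}}
v2 = {"emp[id=1]": {"name": {"Ann": {}}, "sal": {"60K": {}}},
      "emp[id=2]": {"name": {"Bob": {}}}}

archive = {"ts": set(), "children": {}}
nested_merge(archive, v1, 1)
nested_merge(archive, v2, 2)
emp1 = archive["children"]["emp[id=1]"]
```

After both merges, the unchanged name is stored once with timestamp {1, 2}, while the two salary values sit side by side with timestamps {1} and {2}. An element missing from a new version is left in place and simply does not receive the new version number.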
Components (4/4)
Experimental Results (1/2)
• Competing techniques
  • Incremental diff
  • Cumulative diff
• Compression methods
  • gzip (text)
  • XMill (XML)
Experimental Results (2/2)
Efficient Retrievals (1/2)
• Version retrieval
  • Binary tree for each node x, with its children as leaves
    • Timestamp
    • Archive offset
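A minimal sketch of version retrieval over such a tree. It assumes that each leaf holds a child's timestamp and archive offset and that each inner node holds the union of the timestamps below it, so whole subtrees can be skipped; the slide does not state what inner nodes store, and `build` and `offsets_for` are hypothetical names.

```python
def build(leaves):
    """Build a binary tree over a non-empty list of
    (timestamps, archive_offset) pairs, one pair per child of x."""
    if len(leaves) == 1:
        ts, offset = leaves[0]
        return {"ts": set(ts), "offset": offset}
    mid = len(leaves) // 2
    left, right = build(leaves[:mid]), build(leaves[mid:])
    # Assumption: an inner node stores the union of its subtrees' timestamps.
    return {"ts": left["ts"] | right["ts"], "left": left, "right": right}


def offsets_for(tree, version):
    """Archive offsets of the children that exist in `version`."""
    if version not in tree["ts"]:
        return []                                 # prune the whole subtree
    if "offset" in tree:
        return [tree["offset"]]                   # leaf: one child of x
    return offsets_for(tree["left"], version) + offsets_for(tree["right"], version)


# Made-up children of one node x: (versions present, byte offset in archive).
tree = build([({1, 2}, 100), ({2}, 180), ({1}, 240), ({3}, 300)])
```

With this data, `offsets_for(tree, 2)` visits only the left half of the tree, because the right half records no child for version 2.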
Efficient Retrievals (2/2)
• Temporal history retrieval
  • Find keyed node x
  • Set of keyed children
  • Archive offset, timestamp offset
  • Sort list
  • Repeat for each keyed node
Conclusion
• Efficient archiving technique
• Meaningful change descriptions
• Space overhead comparable to the diff approach
• OMIM archive for a year
  • Less than 1.12 times the space of the last version
  • Less than 1.08 times the size of the incremental-diff archive
  • 40% compression with an XML compression tool
• Works well with XML compression
• Basic operations in a single pass
• XML output (available for further use)
Xarch (1/2)
• Archiving tool
• Extends the archiving technique
• Sorts elements by key
  • External merge sort
• Query language
  • Version retrieval
  • History tracking
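External merge sort by key can be sketched generically as follows. This is not Xarch's code: it assumes each record is one line of text with the key before the first tab, uses temporary files for the sorted runs, and the names `external_sort` and `record_key` are hypothetical.

```python
import heapq
import tempfile


def record_key(line):
    """Assumed record layout: key, tab, payload."""
    return line.split("\t", 1)[0]


def external_sort(records, chunk_size):
    """Sort records by key using sorted runs on disk and a k-way merge."""
    runs, chunk = [], []

    def flush():
        run = tempfile.TemporaryFile("w+")
        run.writelines(r + "\n" for r in sorted(chunk, key=record_key))
        run.seek(0)                               # rewind for the merge phase
        runs.append(run)
        chunk.clear()

    for rec in records:
        chunk.append(rec)
        if len(chunk) >= chunk_size:              # memory budget reached
            flush()
    if chunk:
        flush()

    # heapq.merge reads the runs lazily, one line per run at a time.
    merged = [line.rstrip("\n") for line in heapq.merge(*runs, key=record_key)]
    for run in runs:
        run.close()
    return merged


# Made-up records, sorted with at most two records in memory per run.
result = external_sort(["3\tCarol", "1\tAnn", "2\tBob", "5\tEve", "4\tDan"], 2)
```

For brevity the merged output is collected into a list; a real out-of-core sort would stream it to the output file instead. Keeping elements sorted by key is what allows a new version to be merged into the archive in a single sequential pass.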
Xarch (2/2)
• Query language example