260 likes | 337 Views
This schema mediates the exchange of evolving temporal XML data, crucial for tracking changes in genomic insights. Validate, query, and manage data, maintaining integrity and provenance. Explore the construction process, validation methods, and snapshot schema principles.
E N D
Schema Mediated Exchange of Temporal XML Data Curtis Dyreson –Washington State University Richard T. Snodgrass –University of Arizona Sabah Currim – University of Arizona Faiz Currim – University of Iowa ER 2006 - Tucson
Scenario • Genomic data from NCBI • Data collection is growing/changing • Want data and data provenance (who, what, when, …)
Request D NCBI D D Obtaining Web Data Overwrite D Write D
Data Evolves • Download XML formatted data (as of January 1) <gene name="TRY4"> <desc>trypsin 4</desc> <ontology ref="MGI" function="unknown"/> </gene> • Download again (as of March 6) <gene name="TRY4"> <desc>trypsin 4, beta-cell receptor</desc> <ontology ref="MGI" function="synthesizes trypsinogen"/> </gene>
Request updates to Dsince time t Copy D Update D NCBI D Dold Change summary of D XMLDiff Refreshing the Data (using SDOs) What about versions between D and Dold? Did I download “valid” data? My DB is pretty big…
Valid Did I Download the Right Data? • Validate against schema Schema Namespace Validating Parser XML Data
Fragment of the Genomic Schema … <element name="gene"> <complexType> <attribute name="name" type="text" use="required"/> <sequence> <element name="desc" type="string"/> <element ref="ontology" minOccurs="0" maxOccurs="unbounded"/> </sequence> </complexType> </element> …
Uses of an XML Schema • Validation • XML editors • Guides query formulation • Query optimization • Provides a web service binding
Request updates to Dsince time t Extend history of D t NCBI now D D Temporal D ΔD[t,now] Temporal Schema Which elements vary over time A Temporal Data Collection • Validate the “delta” with the temporal schema • cost is size of change
Outline • Motivation • tXSchema • Architecture • Summary
Goals for a Temporal Schema • Make it easy to create a schema for temporal data • Identify which data is temporal • Upwards compatibility • Minimal extensions of XML Schema • Reuse off-the-shelf parsers/tools • Support • Valid and transaction time • Data (element) versioning • Schema versioning • Logical/physical independence • Flexible timestamp representation and location
… … Persistent Elements • An item is an element that persists across snapshots. • Item identifier (like a temporally-invariant key) <txs:itemIdentifier> <txs:field path=”@name”/> </txs:itemIdentifier> January snapshot March snapshot
Extend a Snapshot Schema • Specify which elements are temporal • Temporal elements have • Item identifiers • Simple constraints (state/event, existence/content-varying) <element name="gene"> <txs:temporal> <txs:itemIdentifier> <txs:field path="@name"/> </txs:itemIdentifier> <txs:transactionTime kind="state" contentVarying="true" existenceVarying="no gaps"/> </txs:temporal> ….definition of gene from the snapshot schema omitted for space… </element>
Versions • A version is a change in an item. • DOM inequivalence January snapshot March snapshot … …
Temporal Genomic Data <dataTemporal> <data><geneTemporal itemRef="1"/></data> <geneItem itemId="1"> <geneVersion><time start="2005-01-01" end="2005-03-05"/> <gene name="TRY4"> <desc>trypsin 4</desc> <ontologyTemporalitemRef="2"/> </gene> </geneVersion> <geneVersion><time start="2005-03-06" end="now"/> …next version of gene… </geneVersion> </geneItem> <ontologyItem itemId="2"> …ontology item… </dataTemporal>
Outline • Motivation • tXSchema • Architecture • Summary
Construction Process Representational Schema Valid Valid Temporal Data Not valid Validating Temporal Data • Snapshot data validated with a snapshot schema • Construct a representational schema (details in paper) • Can also validate the “delta” Snapshot Schema Namespace Validating Parser XML Data
At time T Validating Parser Snapshot Snapshot Schema Property of a “Good” Construction • Every snapshot must conform to the snapshot schema Temporal Schema (Temporal) Validating Parser Valid Temporal data Valid
Outline • Motivation • tXSchema • Architecture • Summary
Related Work – Temporal XML • Change detection and management • Nguyen, Abiteboul, Cobena, Preda, SIGMOD 2001 • Xyleme’s Alerter, described in Data Engineering Bulletin, 2001 • Dyreson, Lin, Wang WWW 2004 • Leonardi, Bhowmick, ER 2006 • Representing time-varying XML documents (versioning) • Chawathe, Abiteboul, Widom, ICDE 1998 • Dyreson, Böhlen, Jensen, VLDB 1999 • Chien, Tsotras, Zaniolo, VLDB 2000 • Marian, Abiteboul, Cobena, Mignet, VLDB 2001 • Buneman, Khanna, Tajima, Tan, SIGMOD 2002, TODS 2004 • Rosado, Marquez, Gonzalez, ECDM 2006 • XML Versioning Use Cases (W3C)
Related Work – XML Schemas • XML Schema languages • Many, but XML Schema is backed by the W3C • Incremental XML validation • Bouchou & Halfeld-Ferrari, DBPL 2003 • Papkonstantinou & Vianu, ICDT 2003 • Barbosa, Mendelzon, Libkin, Mignet, Arenas, ICDE 2004 • Temporal XML schemas • Currim, Currim, Dyreson, Snodgrass, EDBT 2004 • Dyreson, Snodgrass, Currim, Currim, Joshi, XSDM 2006
Aspect Enhanced .java weaver Cut points javac An Overarching Vision • Aspect-oriented programming • Cross-cutting concerns • Augment behavior without changing the code • Example aspects: logging, garbage collection Program .java Aspect .java
time security reliability Aspects for Data? • What are cross-cutting concerns? • Milieu of metadata • Time is an aspect
imports schema Validation aspect + XML data conventional validating parser imports schema snapshot data snapshot gluer aspect validator Aspects in Schema Design • Schema for aspect + schema for data • Our paper describes the “plumbing” for a temporal aspect data (snapshot) schema aspect schema schema tapestry schema weaver
Our Contributions • Temporal schema specification • What is time-varying • Some simple constraints • Validate temporal data • ΔD[t-now] cost • Upwards compatible with XML Schema • Handle schema evolution (Dyreson et al., XSDM ’06) • Suite of tools • Reuse and extend existing tools • www.cs.arizona.edu/tau
tXSchema Project Tools (Beta) • tVALIDATOR – Validating temporal XML document for conventional and temporal constraints • SQUASH – Generating a temporal document from a sequence of snapshot documents • UNSQUASH – Extracting snapshot documents from a temporal document • RESQUASH – Changing a document representation to be consistent with the new physical annotation.