260 likes | 333 Views
Schema Mediated Exchange of Temporal XML Data. Curtis Dyreson – Washington State University Richard T. Snodgrass – University of Arizona Sabah Currim – University of Arizona Faiz Currim – University of Iowa. Scenario. Genomic data from NCBI
E N D
Schema Mediated Exchange of Temporal XML Data Curtis Dyreson –Washington State University Richard T. Snodgrass –University of Arizona Sabah Currim – University of Arizona Faiz Currim – University of Iowa ER 2006 - Tucson
Scenario • Genomic data from NCBI • Data collection is growing/changing • Want data and data provenance (who, what, when, …)
Request D NCBI D D Obtaining Web Data Overwrite D Write D
Data Evolves • Download XML formatted data (as of January 1) <gene name="TRY4"> <desc>trypsin 4</desc> <ontology ref="MGI" function="unknown"/> </gene> • Download again (as of March 6) <gene name="TRY4"> <desc>trypsin 4, beta-cell receptor</desc> <ontology ref="MGI" function="synthesizes trypsinogen"/> </gene>
Request updates to Dsince time t Copy D Update D NCBI D Dold Change summary of D XMLDiff Refreshing the Data (using SDOs) What about versions between D and Dold? Did I download “valid” data? My DB is pretty big…
Valid Did I Download the Right Data? • Validate against schema Schema Namespace Validating Parser XML Data
Fragment of the Genomic Schema … <element name="gene"> <complexType> <attribute name="name" type="text" use="required"/> <sequence> <element name="desc" type="string"/> <element ref="ontology" minOccurs="0" maxOccurs="unbounded"/> </sequence> </complexType> </element> …
Uses of an XML Schema • Validation • XML editors • Guides query formulation • Query optimization • Provides a web service binding
Request updates to Dsince time t Extend history of D t NCBI now D D Temporal D ΔD[t,now] Temporal Schema Which elements vary over time A Temporal Data Collection • Validate the “delta” with the temporal schema • cost is size of change
Outline • Motivation • tXSchema • Architecture • Summary
Goals for a Temporal Schema • Make it easy to create a schema for temporal data • Identify which data is temporal • Upwards compatibility • Minimal extensions of XML Schema • Reuse off-the-shelf parsers/tools • Support • Valid and transaction time • Data (element) versioning • Schema versioning • Logical/physical independence • Flexible timestamp representation and location
… … Persistent Elements • An item is an element that persists across snapshots. • Item identifier (like a temporally-invariant key) <txs:itemIdentifier> <txs:field path=”@name”/> </txs:itemIdentifier> January snapshot March snapshot
Extend a Snapshot Schema • Specify which elements are temporal • Temporal elements have • Item identifiers • Simple constraints (state/event, existence/content-varying) <element name="gene"> <txs:temporal> <txs:itemIdentifier> <txs:field path="@name"/> </txs:itemIdentifier> <txs:transactionTime kind="state" contentVarying="true" existenceVarying="no gaps"/> </txs:temporal> ….definition of gene from the snapshot schema omitted for space… </element>
Versions • A version is a change in an item. • DOM inequivalence January snapshot March snapshot … …
Temporal Genomic Data <dataTemporal> <data><geneTemporal itemRef="1"/></data> <geneItem itemId="1"> <geneVersion><time start="2005-01-01" end="2005-03-05"/> <gene name="TRY4"> <desc>trypsin 4</desc> <ontologyTemporalitemRef="2"/> </gene> </geneVersion> <geneVersion><time start="2005-03-06" end="now"/> …next version of gene… </geneVersion> </geneItem> <ontologyItem itemId="2"> …ontology item… </dataTemporal>
Outline • Motivation • tXSchema • Architecture • Summary
Construction Process Representational Schema Valid Valid Temporal Data Not valid Validating Temporal Data • Snapshot data validated with a snapshot schema • Construct a representational schema (details in paper) • Can also validate the “delta” Snapshot Schema Namespace Validating Parser XML Data
At time T Validating Parser Snapshot Snapshot Schema Property of a “Good” Construction • Every snapshot must conform to the snapshot schema Temporal Schema (Temporal) Validating Parser Valid Temporal data Valid
Outline • Motivation • tXSchema • Architecture • Summary
Related Work – Temporal XML • Change detection and management • Nguyen, Abiteboul, Cobena, Preda, SIGMOD 2001 • Xyleme’s Alerter, described in Data Engineering Bulletin, 2001 • Dyreson, Lin, Wang WWW 2004 • Leonardi, Bhowmick, ER 2006 • Representing time-varying XML documents (versioning) • Chawathe, Abiteboul, Widom, ICDE 1998 • Dyreson, Böhlen, Jensen, VLDB 1999 • Chien, Tsotras, Zaniolo, VLDB 2000 • Marian, Abiteboul, Cobena, Mignet, VLDB 2001 • Buneman, Khanna, Tajima, Tan, SIGMOD 2002, TODS 2004 • Rosado, Marquez, Gonzalez, ECDM 2006 • XML Versioning Use Cases (W3C)
Related Work – XML Schemas • XML Schema languages • Many, but XML Schema is backed by the W3C • Incremental XML validation • Bouchou & Halfeld-Ferrari, DBPL 2003 • Papkonstantinou & Vianu, ICDT 2003 • Barbosa, Mendelzon, Libkin, Mignet, Arenas, ICDE 2004 • Temporal XML schemas • Currim, Currim, Dyreson, Snodgrass, EDBT 2004 • Dyreson, Snodgrass, Currim, Currim, Joshi, XSDM 2006
Aspect Enhanced .java weaver Cut points javac An Overarching Vision • Aspect-oriented programming • Cross-cutting concerns • Augment behavior without changing the code • Example aspects: logging, garbage collection Program .java Aspect .java
time security reliability Aspects for Data? • What are cross-cutting concerns? • Milieu of metadata • Time is an aspect
imports schema Validation aspect + XML data conventional validating parser imports schema snapshot data snapshot gluer aspect validator Aspects in Schema Design • Schema for aspect + schema for data • Our paper describes the “plumbing” for a temporal aspect data (snapshot) schema aspect schema schema tapestry schema weaver
Our Contributions • Temporal schema specification • What is time-varying • Some simple constraints • Validate temporal data • ΔD[t-now] cost • Upwards compatible with XML Schema • Handle schema evolution (Dyreson et al., XSDM ’06) • Suite of tools • Reuse and extend existing tools • www.cs.arizona.edu/tau
tXSchema Project Tools (Beta) • tVALIDATOR – Validating temporal XML document for conventional and temporal constraints • SQUASH – Generating a temporal document from a sequence of snapshot documents • UNSQUASH – Extracting snapshot documents from a temporal document • RESQUASH – Changing a document representation to be consistent with the new physical annotation.