IVOX: Incremental View Maintenance for Ordered XML
DSRG Talk, WPI, February 20th, 2003
Students: Katica Dimitrova & Maged El Sayed
Advisor: Prof. Elke Rundensteiner
Outline • Motivation • Problem Description • Background • XML Algebra • Order in XML Algebra • The IVOX Approach • Order Encoding • Overall strategy • System Architecture • Related Work • Future Work
Motivation • Views in general: data warehouses, information integration, access control, privacy, etc. • XML views (extra useful): information interoperability; bridging gaps between different data models • Materialized views: speed up data retrieval, query optimization, increased availability [Figure: a view defined by a view definition query over relational (RDB), XML, and other sources]
Maintaining Materialized Views • When sources are updated, a materialized view may become inconsistent. • Methods of view maintenance: • Recomputation: recompute the view from scratch from the base data • Incremental view maintenance: compute changes to the view in response to changes to the base sources • Heuristic: incremental view maintenance is usually cheaper than full recomputation.
The Problem • Previous work: • Relational [GMS93], bag semantics [GL95], [ZGHW95], [PSCP02] • Object-relational [LVM00] • Object-oriented [AFP02] • Semistructured data models [AMRVW98], [ZM98] • XML data model, without handling order [LD00] • Can techniques for other data models be reused for XML?
Is Maintaining XML Views Different? • XML features: • Hierarchical • Optional elements • Self-typed • References • Ordered • Expressiveness of the view definition language • Complex operations: tagging, unnesting, aggregation, ... • Potentially large auxiliary information
Example: List all books that cost less than $60, including their title and price
View Definition Query:
<result>{
  for $b in document("bib.xml")/bib/book
  where $b/price/text() < 60
  return <book>{ $b/title, $b/price }</book>
}</result>
Bib.xml:
<bib>
  <book> <price>65.95</price> <title>Advanced Programming in the Unix environment</title> </book>
  <book> <title>TCP/IP Illustrated</title> </book>
  <book> <price>39.95</price> <title>Data on the Web</title> </book>
</bib>
View Extent:
<result> <book> <title>Data on the Web</title> <price>39.95</price> </book> </result>
Example (continued): Insert element <price>55.48</price> into the second book of Bib.xml (the view definition query is unchanged)
The second book, TCP/IP Illustrated, now has price 55.48 < 60, so it satisfies the where clause and <book> <title>TCP/IP Illustrated</title> <price>55.48</price> </book> must be added to the view extent. To preserve document order it goes before the Data on the Web book:
<result> <book> <title>TCP/IP Illustrated</title> <price>55.48</price> </book> <book> <title>Data on the Web</title> <price>39.95</price> </book> </result>
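The contrast between recomputation and incremental maintenance on this example can be sketched as follows. This is a minimal illustration, not the IVOX implementation: the functions `qualifies`, `view_item`, `recompute` and `maintain` are hypothetical, view entries are simplified to (title, price) pairs, and each entry carries the book's integer document position as its order key. Integer positions suffice here only because no book is inserted or deleted; the order-encoding slides below explain why that breaks down in general.

```python
import bisect
import xml.etree.ElementTree as ET

# bib.xml from the example, before the update
BIB = ("<bib>"
       "<book><price>65.95</price><title>Advanced Programming in the Unix environment</title></book>"
       "<book><title>TCP/IP Illustrated</title></book>"
       "<book><price>39.95</price><title>Data on the Web</title></book>"
       "</bib>")

def qualifies(book):
    """where $b/price/text() < 60; a book with no price does not qualify."""
    price = book.find("price")
    return price is not None and float(price.text) < 60

def view_item(book):
    """return <book>{ $b/title, $b/price }</book>, kept as a (title, price) pair."""
    return (book.findtext("title"), book.findtext("price"))

def recompute(bib):
    """Full recomputation; each entry carries the book's document position as its order key."""
    return [(i, view_item(b)) for i, b in enumerate(bib.findall("book")) if qualifies(b)]

def maintain(view, bib, pos):
    """Incremental maintenance after an update inside the book at position pos:
    drop that book's old entry (if any) and re-evaluate only that book."""
    view = [entry for entry in view if entry[0] != pos]
    book = bib.findall("book")[pos]
    if qualifies(book):
        bisect.insort(view, (pos, view_item(book)))  # insert at its ordered position
    return view

bib = ET.fromstring(BIB)
view = recompute(bib)

# Update: insert <price>55.48</price> into the second book (position 1)
price = ET.Element("price")
price.text = "55.48"
bib.findall("book")[1].insert(0, price)

view = maintain(view, bib, 1)
assert view == recompute(bib)              # same result as recomputing from scratch
print([title for _, (title, _) in view])   # ['TCP/IP Illustrated', 'Data on the Web']
```

Only the updated book is re-evaluated; the remaining view entries are neither recomputed nor renumbered.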
Our Goal • Design an incremental view maintenance strategy for XQuery views that: • Correctly updates the view • Is order-sensitive • Returns the view in proper order • Allows for updates that specify order • Covers at least the "core" of the XQuery language • Minimizes auxiliary information requirements
Basics of the IVOX Approach: Algebraic • Update propagation rules for each algebra operator and each update type [Figure: the XQuery view definition is compiled into an algebra tree over the XML sources. At execution time each operator maps its input D1 to its output D2; at view-maintenance time an update to D1 is propagated through the operator as an update to D2, and so on up to the XML view]
Why Algebraic? • Robust: easily adaptable to changes in operator semantics • Extensible: new operators can be added • Allows reuse of techniques for known operators • Language-independent: unaffected by syntax changes (e.g., to XQuery by the W3C) • Formal: a basis for provable correctness
Background on XML Algebra: XAT • XAT operators: • SQL operators: Select, Project, … • Special operators: Source, FOR, … • XML operators: Navigate, Tagger, … • XAT data model (XAT table): • Order-sensitive table of tuples • Columns denote user-specified or internally generated variable bindings • A cell in a tuple holds an XML node or a sequence of XML nodes [Figure: an example XAT table for the operator $col1, price → $col3]
Order in the XAT Context • Order among the tuples • Order among XML nodes in a single cell [Figure: tuple order in the output of $col1, price → $col3; node order within a sequence cell ( , ) built by Agg $col5]
Order in the XAT Context and View Maintenance • On update, worry about: • Order among the tuples • Order among XML nodes in a single cell [Figure: the $col1, price → $col3 and Agg $col5 examples under update]
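Duplicate Information in the XAT Context • Complex operations require auxiliary information • Auxiliary information can be too large in the XAT context • It may be expensive to maintain [Figure: the same XML nodes stored in several columns ($col1, $col3) of the XAT tables for $col1, price → $col3, i.e., duplicated storage]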
Possible Solutions to Order Preservation (I) • Sequential storage (XPROP approach by Maged, Ling & Luping) • Assume intermediate results are stored sequentially • Inserts and deletes are performed in physical order • No order encoding • Special support is required for secondary storage • May require iterating over many tuples to determine order [Figure: XAT table with columns $b and $col3 for $col1, price → $col3; the new <price>55.48</price> is inserted at its physical position]
Possible Solutions to Order Preservation (II) • Naïve order encoding for tuples and sequences of XML nodes • Assign order numbers to tuples and to XML nodes in a sequence • Requires frequent renumbering on inserts [Figure: an XAT table for $col1, price → $col3 with an Ord column; inserting <price>55.48</price> shifts existing order numbers]
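The renumbering cost of naïve order encoding can be sketched as below, assuming dense 1-based integer order numbers. The function `insert_with_renumbering` is a hypothetical illustration: inserting at position `pos` forces every later item to be renumbered.

```python
def insert_with_renumbering(seq, pos, value):
    """Naive order encoding: seq holds [order_number, value] pairs numbered 1..n.
    Inserting at 0-based position pos shifts the order number of every later item.
    Returns how many existing items had to be renumbered."""
    renumbered = 0
    for item in seq[pos:]:          # every item at or after the insertion point
        item[0] += 1
        renumbered += 1
    seq.insert(pos, [pos + 1, value])
    return renumbered

tuples = [[1, "book A"], [2, "book B"], [3, "book C"]]
count = insert_with_renumbering(tuples, 1, "new book")
print(tuples)  # [[1, 'book A'], [2, 'new book'], [3, 'book B'], [4, 'book C']]
print(count)   # 2
```

In the worst case (insert at the front) all n existing items are touched, which is what the LexKey encoding later avoids.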
Using Node Identity • Idea: Use node identity • Usage: • For encoding order and structure • As a reference to base data
What Encoding For Node Identity? Existing techniques for encoding order in XML: • Global Order (UW) • Local Order (UW) • Dewey Order (UW) • Lexicographical Order (MASS): the winner [Figure: the bib.xml tree (bib with three book elements, each with price and title children) labeled under each scheme: global order numbers the nodes 1 to 10 in document order; local order numbers each node among its siblings; Dewey order gives paths such as 1.2.1; lexicographical order gives LexKeys such as b, b.d, b.d.f]
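The three numeric schemes on this slide can be computed in one preorder traversal, sketched below for the bib tree after the price insert (three books, each with price and title; element content is omitted). The function name `label` is hypothetical. Global order is the preorder position, local order is the position among siblings, and Dewey order is the path of local positions from the root.

```python
import xml.etree.ElementTree as ET

# The bib.xml tree after the price insert; text content is irrelevant here
DOC = ("<bib><book><price/><title/></book>"
       "<book><price/><title/></book>"
       "<book><price/><title/></book></bib>")

def label(root):
    """Return (tag, global, local, dewey) for every node in document (preorder) order."""
    rows = []
    counter = 0
    def visit(node, local, dewey):
        nonlocal counter
        counter += 1                                  # global: preorder position
        rows.append((node.tag, counter, local, dewey))
        for i, child in enumerate(node, start=1):     # local: position among siblings
            visit(child, i, f"{dewey}.{i}")           # Dewey: parent's path + local
    visit(root, 1, "1")
    return rows

rows = label(ET.fromstring(DOC))
for row in rows:
    print(row)
```

Inserting a node shifts the global numbers of all following nodes, and the local and Dewey numbers of its following siblings and their descendants, which is why none of these schemes is stable under updates.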
Lexicographical Keys: LexKeys • What are LexKeys? Multi-level lexicographical keys • Examples: c, ba.c.b • Comparison examples: b < b.c; bab < bd.cc; b.b < b.b.c • Advantages: • All LexKeys form a totally ordered set with respect to < • It is always possible to generate a key between two keys • Deleting a LexKey from a sequence does not affect other LexKeys • Usage: • Reference to XML nodes • Encoding order
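A minimal sketch of both LexKey properties, under stated assumptions: a multi-level key is modeled as a tuple of its levels (Python compares tuples level by level, and a proper prefix sorts first, matching the slide's examples), and keys use the letters a to z and never end in 'a'. The midpoint rule in the hypothetical `between` function is our own choice and does not necessarily reproduce the keys shown on the slides.

```python
from typing import Optional, Tuple

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def lexkey(text: str) -> Tuple[str, ...]:
    """Parse a multi-level LexKey such as 'ba.c.b' into ('ba', 'c', 'b')."""
    return tuple(text.split("."))

def between(lo: str, hi: Optional[str]) -> str:
    """Return a single-level key strictly between lo and hi.
    Assumptions: lo < hi, lo may be '' (no lower neighbour), hi may be None
    (no upper neighbour), and no key ends in 'a', so room always remains below."""
    if hi is not None and not lo < hi:
        raise ValueError("expected lo < hi")
    digits = []
    i = 0
    while True:
        d_lo = ALPHABET.index(lo[i]) if i < len(lo) else 0       # exhausted lo acts as 'a'
        d_hi = len(ALPHABET) if hi is None else ALPHABET.index(hi[i])
        if d_lo == d_hi:                  # shared prefix: copy the letter and move on
            digits.append(ALPHABET[d_lo])
        else:
            mid = (d_lo + d_hi) // 2
            if mid > d_lo:                # a letter fits strictly in between: done
                digits.append(ALPHABET[mid])
                return "".join(digits)
            digits.append(ALPHABET[d_lo])  # adjacent letters: extend lo, no upper bound
            hi = None
        i += 1

assert lexkey("b") < lexkey("b.c")
assert lexkey("bab") < lexkey("bd.cc")
assert lexkey("b.b") < lexkey("b.b.c")
print(between("b", "d"))   # c
print(between("b", "c"))   # bn
print(between("", "b"))    # an
print(between("f", None))  # p
```

Because a new key is always generated from its two neighbours alone, inserting or deleting a key never changes any existing key.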
LexKeys in XAT Tables [Figure: XAT tables before and after navigating $b, price → $col2, with LexKeys stored in the cells]
Order Among XAT Tuples • Notion: assign an order schema to each XAT table • Ordering the tuples by the LexKeys in the order-schema columns yields the correct tuple order [Figure: example tables annotated with the positions (1, 2, 3) of their order-schema columns]
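Ordering tuples by an order schema can be sketched as a sort on the tuple of LexKeys taken from the order-schema columns, compared left to right. Tuples are modeled here as dicts from column name to LexKey string; `order_tuples` and the table contents are hypothetical, including an invented second price under b.d that shows the role of the second column.

```python
from typing import Dict, List, Tuple

def lexkey(text: str) -> Tuple[str, ...]:
    """Split a multi-level LexKey such as 'b.d.b' into its levels."""
    return tuple(text.split("."))

def order_tuples(table: List[Dict[str, str]], order_schema: List[str]) -> List[Dict[str, str]]:
    """Sort XAT tuples by the LexKeys in the order-schema columns, left to right."""
    return sorted(table, key=lambda row: tuple(lexkey(row[col]) for col in order_schema))

# Hypothetical output of navigating $b, price -> $col2, in arbitrary physical order
table = [
    {"$b": "b.f", "$col2": "b.f.cm"},
    {"$b": "b.b", "$col2": "b.b.b"},
    {"$b": "b.d", "$col2": "b.d.d"},
    {"$b": "b.d", "$col2": "b.d.b"},
]
ordered = order_tuples(table, ["$b", "$col2"])
print([row["$col2"] for row in ordered])  # ['b.b.b', 'b.d.b', 'b.d.d', 'b.f.cm']
```

Since order is derived from the keys, tuples may be stored in any physical order.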
Calculating Order Schema • Rules for each operator • Calculated in a postorder traversal of the tree • Sample Rules
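The postorder calculation can be sketched as follows. The slide's sample rules are not preserved here, so the three rules below are illustrative assumptions only, not the IVOX rule set: a Source orders its single tuple by its own column, a Navigate Unnest appends the column it produces, and a Select keeps its input's order schema. The `Op` class and `compute_order_schema` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Op:
    kind: str                        # operator name, e.g. "Source", "NavigateUnnest"
    out_col: Optional[str] = None    # column the operator produces, if any
    children: List["Op"] = field(default_factory=list)
    order_schema: List[str] = field(default_factory=list)

def compute_order_schema(op: Op) -> List[str]:
    """Postorder: compute the children's order schemas first, then apply
    this operator's rule. The three rules are illustrative assumptions."""
    for child in op.children:
        compute_order_schema(child)
    inherited = op.children[0].order_schema if op.children else []
    if op.kind == "Source":
        op.order_schema = [op.out_col]               # a single tuple, ordered by its own key
    elif op.kind == "NavigateUnnest":
        op.order_schema = inherited + [op.out_col]   # refine by the unnested node's key
    elif op.kind == "Select":
        op.order_schema = list(inherited)            # filtering never reorders tuples
    else:
        raise ValueError(f"no order-schema rule for {op.kind}")
    return op.order_schema

# A simplified lower part of the example view's XAT tree
plan = Op("Select", children=[
    Op("NavigateUnnest", "$col2", [
        Op("NavigateUnnest", "$b", [
            Op("NavigateUnnest", "$col1", [Op("Source", "$S1")])])])])
print(compute_order_schema(plan))  # ['$S1', '$col1', '$b', '$col2']
```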
Order Among Tuples: Example [Figure: XAT tables before and after navigating $b, price → $col2, annotated with the positions of their order-schema columns]
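Order in a Collection Within a Cell? [Figure: Agg $col5 collects values from several tuples into one cell holding a sequence { , }; the order of the sequence members must be preserved]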
Smart Keys • What is a SmartKey? A SmartKey has two parts: • Key part: by default also represents order • Order part: optional; represents order only when present • Notation: key(order) • Examples: • b.c.b(h) • b.c.b
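A SmartKey can be sketched as a key with an optional order part, where sorting uses the order part when present and falls back to the key part otherwise. The `SmartKey` class below is hypothetical. The slides do not say how members with and without an order part compare against each other; comparing both as LexKeys is our assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

def lexkey(text: str) -> Tuple[str, ...]:
    """Split a multi-level LexKey such as 'b.c.b' into its levels."""
    return tuple(text.split("."))

@dataclass(frozen=True)
class SmartKey:
    key: str                      # identity part; also the order when no order part
    order: Optional[str] = None   # optional order part, e.g. 'h' in b.c.b(h)

    def sort_key(self) -> Tuple[str, ...]:
        """Order by the order part when present, otherwise by the key part."""
        return lexkey(self.order if self.order is not None else self.key)

    def __str__(self) -> str:
        return self.key if self.order is None else f"{self.key}({self.order})"

members = [SmartKey("b.c.b", "h"), SmartKey("b.c.d", "c"), SmartKey("b.c.f")]
ordered = sorted(members, key=SmartKey.sort_key)
print([str(k) for k in ordered])  # ['b.c.f', 'b.c.d(c)', 'b.c.b(h)']
```

The identity (key part) stays fixed while the order part lets a member take a position different from what its key alone would imply.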
SmartKeys in XAT Tables [Figure: the Agg $col5 example, with the sequence members identified by SmartKeys whose order parts fix their positions]
Order Among XAT Tuples during View Maintenance • Other tuples in the XAT table are not touched • No reordering is ever needed • We gain distributiveness with respect to bag union at the tuple level [Figure: tuples of $col1, price → $col3 with order keys 1, 3, 2]
Order in a Sequence during View Maintenance • Other members of the sequence are not touched • No reordering is ever needed • We gain distributiveness with respect to bag union at the cell level [Figure: a sequence cell { , } built by Agg $col5, with members ordered 2, 1]
Update Propagation Rules • Use distributiveness with respect to bag union • Reuse relational rules for most SQL-like XAT operators [Figure: at execution time an operator maps XAT table 1 to XAT table 2; at view-maintenance time the same operator maps an update to XAT table 1 to an update to XAT table 2]
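Update Propagation Rules Example (Navigate Unnest on Insert Tuple)
$$T_2^{old} = \phi_{\$col,\,path \rightarrow \$col'}(T_1^{old})$$
$$T_1^{new} = T_1^{old} + \Delta T_1$$
$$T_2^{new} = \phi_{\$col,\,path \rightarrow \$col'}(T_1^{old} + \Delta T_1) = \phi_{\$col,\,path \rightarrow \$col'}(T_1^{old}) + \phi_{\$col,\,path \rightarrow \$col'}(\Delta T_1) = T_2^{old} + \Delta T_2$$
where + represents bag union. [Figure: at execution time T1 → φ → T2; at view-maintenance time ΔT1 → φ → ΔT2]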
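The distributive rule above can be checked on a toy Navigate Unnest, sketched under simplifying assumptions: a tuple is a hashable sequence of (column, LexKey) pairs, the source document is a hypothetical `CHILDREN` map from (node key, child tag) to child keys, and bag union is list concatenation. Applying the operator to the delta alone gives the same bag as re-running it on the updated input.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[Tuple[str, str], ...]   # a tuple is a hashable sequence of (column, LexKey) pairs

# Hypothetical source: (node LexKey, child tag) -> child LexKeys in document order
CHILDREN = {
    ("b.b", "price"): ["b.b.b"],
    ("b.d", "price"): ["b.d.b"],
    ("b.f", "price"): ["b.f.cm"],
}

def navigate_unnest(table: List[Row], col: str, path: str, new_col: str) -> List[Row]:
    """phi_{col, path -> new_col}: one output tuple per node reached from row[col] via path."""
    out = []
    for row in table:
        start = dict(row)[col]
        for child in CHILDREN.get((start, path), []):
            out.append(row + ((new_col, child),))
    return out

def bag_union(t1: List[Row], t2: List[Row]) -> List[Row]:
    """Concatenation keeps duplicates, i.e. bag (multiset) union."""
    return t1 + t2

T1_old = [(("$b", "b.b"),), (("$b", "b.f"),)]
delta_T1 = [(("$b", "b.d"),)]                                   # InsertTuple on T1

T2_old = navigate_unnest(T1_old, "$b", "price", "$col2")
delta_T2 = navigate_unnest(delta_T1, "$b", "price", "$col2")    # propagate only the delta
T2_new = navigate_unnest(bag_union(T1_old, delta_T1), "$b", "price", "$col2")

# Compare as bags: tuple order is recovered from LexKeys, not from list positions.
assert Counter(T2_new) == Counter(bag_union(T2_old, delta_T2))
print(delta_T2)  # [(('$b', 'b.d'), ('$col2', 'b.d.b'))]
```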
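Update Propagation Strategy [Figure: an update XQuery against the XML sources is expressed as XML update primitives (xmlup), then as key update primitives (keyup), then as XAT update primitives (xatup) that propagate up to the XML view; components shown: XAT Translator, Storage Manager]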
Update Primitives (the Format of Delta) • XML Update Primitives (xup), applied to the original XML document: • Insert(xmlFragment, path) • Delete(path) • InsertAtt(name, value, path) • DeleteAtt(name, path) • Replace(oldValue, newValue, path) • XML Key Update Primitives (keyup), which express updates on the original XML data in terms of LexKeys: • Insert(el, path) • Delete(path) • Replace(el, pos) • XAT Update Primitives (xatup), applied to XAT tables: • InsertTuple(tuple) • DeleteTuple(tupleId) • ChangeTuple(keyup, columnName, tupleId)
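The three primitive levels can be sketched as small record types, together with a translation of the running example's positional Insert(price, bib[1].book[2]) into its LexKey form Insert(price[b.d.b], bib[b].book[b.d]). Everything here is hypothetical: only the Insert variants are modeled, the class names are ours, the key directory (`ROOT_KEY`, `CHILD_KEYS`) stands in for whatever the storage layer maintains, and the new key b.d.b is passed in rather than generated.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class XmlInsert:        # xup Insert(xmlFragment, path): positional path on the document
    xml_fragment: str
    path: str

@dataclass(frozen=True)
class KeyInsert:        # keyup Insert(el, path): the same update addressed by LexKeys
    el: str
    path: str

@dataclass(frozen=True)
class InsertTuple:      # xatup InsertTuple(tuple), applied to an XAT table
    row: Tuple[str, ...]

@dataclass(frozen=True)
class ChangeTuple:      # xatup ChangeTuple(keyup, columnName, tupleId)
    keyup: KeyInsert
    column_name: str
    tuple_id: str

# Hypothetical key directory: the root's LexKey and each node's child LexKeys in order
ROOT_KEY = "b"
CHILD_KEYS = {"b": ["b.b", "b.d", "b.f"]}

def to_keyup(xup: XmlInsert, new_key: str) -> KeyInsert:
    """Rewrite a positional path such as bib[1].book[2] into LexKey form.
    Assumes the first step is the root and that name[i] is the i-th child overall."""
    steps, key = [], None
    for step in xup.path.split("."):
        match = re.fullmatch(r"(\w+)\[(\d+)\]", step)
        if match is None:
            raise ValueError(f"unsupported path step: {step}")
        name, pos = match.group(1), int(match.group(2))
        key = ROOT_KEY if key is None else CHILD_KEYS[key][pos - 1]
        steps.append(f"{name}[{key}]")
    tag = ET.fromstring(xup.xml_fragment).tag
    return KeyInsert(f"{tag}[{new_key}]", ".".join(steps))

xup = XmlInsert("<price>55.48</price>", "bib[1].book[2]")
keyup = to_keyup(xup, new_key="b.d.b")
print(keyup.el, keyup.path)   # price[b.d.b] bib[b].book[b.d]

# Two XAT-level primitives of the kind that appear in the view-maintenance walkthrough
xatups = [ChangeTuple(keyup, "$col1", "b"), InsertTuple(("b.d.b", "b.d.f"))]
```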
Execution [Figure: the XAT tree for the example view, from the bottom up: Source "bib.xml" → $S1; navigate $S1, bib → $col1; navigate $col1, book → $b; navigate $b, price → $col2; navigate $b, title → $col4; select $col3 < 60; Tagger <book>$col4 $col2</book> → $col5; Agg $col5; Tagger <result>$col5</result> → $col6. The XAT tables store LexKeys (b for bib; b.b, b.d, b.f for the books; keys such as b.b.b, b.b.cd, b.f.cm, b.f.l for their children) that reference XDOM nodes held by the Storage Manager, and constructed elements receive keys such as tb..b.f.l..b.f.cm. After execution, $col5 holds { tb..b.f.l..b.f.cm(b.f.l..b.f.cm) } and $col6 holds the result element tr]
View Maintenance [Figure: propagation of the update Insert(price, bib[1].book[2]) through the same XAT tree. In terms of LexKeys it becomes Insert(price[b.d.b], bib[b].book[b.d]) and is propagated upward through primitives such as ChangeTuple(insert(price[b.d.b], bib[b].book[b.d]), $col1, b), changeTuple(insert(price[b.d.b], book[b.d]), $b, b.d), insertTuple({b.d.b, b.d.f}), insertTuple({tb..b.d.f..b.d.b}), and finally ChangeTuple(insert(tb..b.d.f..b.d.b, result[tr]), $col6, tr). Afterwards $col5 holds { tb..b.f.l..b.f.cm(b.f.l..b.f.cm), tb..b.d.f..b.d.b(..b.d.f..b.d.b) }]
System Architecture [Figure: components labeled Rainbow, XTUP, XML Query Engine, VM Initializer, XML View Maintainer, Update Primitive Generator, Update Propagation Rules Repository, IVOX Executer, Storage Manager, XML Algebra Tree, Materialized XML View, Materialized Auxiliary Views, persistent data storage, and the XML sources. The user's view definition XQuery is processed once (one-time occurrence, execution); each update XQuery triggers view maintenance (on-update occurrence). Legend: process vs. data]
Related Work • A. Gupta, I. S. Mumick. Maintenance of Materialized Views: Problems, Techniques, and Applications. In Bulletin of the Technical Committee on Data Engineering, 1995. • T. Griffin, L. Libkin. Incremental Maintenance of Views with Duplicates. In SIGMOD 1995. • H. Liefke, S. Davidson. View Maintenance for Hierarchical Semistructured Data. In DaWaK 2000. • S. Abiteboul, J. McHugh, M. Rys, V. Vassalos, J. Wiener. Incremental Maintenance for Materialized Views over Semistructured Data. In VLDB 1998.