1 / 14

Propagating Updates Through XML Views Using Lineage Tracing

This paper discusses a methodology for translating XQuery updates over virtual XML views to SQL updates over the base tables using lineage tracing. It proposes a static analysis to prevent and detect view side effects, and explains how to detect exclusive data sources. The paper also demonstrates the process of updating non-EDS columns and checking for side effects at runtime.

aubreyd
Download Presentation

Propagating Updates Through XML Views Using Lineage Tracing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Leonidas Fegaras University of Texas at Arlington Propagating Updates Through XML Views Using Lineage Tracing

  2. Updating XML Views • Problem: Given a virtual XML view over a relational database expressed in XQuery, translate an XQuery update over the XML view to SQL updates over the base tables (embedded into pure XQuery) • Assumptions: • We use the XQuery Update Facility (XUF) • Snapshot semantics: one XQuery with updates = a single transaction • Goals: • prevent most view side-effects statically • ... but detect the rest efficiently at run-time • collectively consider all compatible updates in a transaction • This is far more expressive than considering each update in isolation

  3. Highlights • Prevent many forms of view side-effects by using syntactic restrictions and a static analysis • Allow the full extent of XQuery to define views but restrict the table connections in a view to trees • Use a conservative static analysis to detect exclusive data sources • These are the table columns that can not cause view side-effects when updated • Based on polymorphic type inference and type usage analysis • Detect the rest of the view side-effects using a general (but efficient) run-time analysis • Propagate information about the origins of the updatable data through views and XQuery updates • to be used at the update destinations • Methodology: lineage tracing • The lineage is not just a tuple-id/column-name • an XML element has a context: a chain of tuples used in reaching the element • The table updates are generated by considering all chains in a transaction

  4. Exclusive Data Source (EDS) • An exclusive data source (EDS) is a table column whose values • do not appear in the view output at all ie, they are not updatable • or they appear in the view output only once, but they are not accessed elsewhere in the view code • Exclusive data sources cannot cause view side effects when updated • Our goal is to prevent/detect side effects on non-EDS columns as well

  5. How to Detect Exclusive Data Sources? • Use polymorphic type inference with a simple usage analysis: • each table column = a fresh type variable • if the principal type of a view is polymorphic over a type variable => the view is oblivious to the table column values • in addition, if the type variable appears once in the view principal type => the table column is an exclusive data source • A proof based on the parametricity theorem ('theorem for free') • Problems: • Not all XQuery engines support static validation • May be too conservative • Limitation: the analysis granularity is a table column • Not good for generic mappings, eg, a view that uses a single tag table • The tag table may be used multiple times in the view (once per tag) • Cannot statically deduce disjointness • But may use extra annotations in the view to specify disjointness

  6. Updating non-EDS Columns • Database $DB: Department ( dno, dname, dphone ) Employee ( ename, dno, salary ) • View $view: for $i in $DB/Employee let $d := $DB/Department[dno=$i/dno] return <employee>{ $i/ename, $i/salary, $d/dname, $d/dphone }</employee> • EDS columns: Employee.ename, Employee.dno, Employee.salary • Update on a non-EDS column: replace $view[ename="Smith"]/dname with "CS" View tree: Employee ename dno salary dno dname dphone Department

  7. Example (cont.) • Update on an non-EDS column: replace $view[ename="Smith"]/dname with "CS" • If the CS department exists, can we just connect Smith to CS? • May cause a side-effect on dphone • What if the dphone were not visible in the view? • What if there is an update replace $view[ename="Smith"]/dphone with "x5678" in the same transaction? ename dno salary Smith update EE x1234 CS x5678 dno dname dphone dno dname dphone

  8. Checking for Side-Effects at Run-Time • An XQuery replace update generates database updates of the form: • replace ( T.A, value, tids ) • to replace T.A with a new value • tids is the tuple id list of the tuple chain to reach the update destination tuple • We do not need to start from a root • anchor tuple: the lowest-level tuple in the chain whose foreign key in the chain is EDS • If T.A is EDS, then tids contains the tuple id of the destination only • At the end of each transaction • We collect all updates associated with the same anchor tuple • We construct a shadow tuple chain that contains the updated values • If there is an existing chain in the database with the same visible tuple values • ... we update the anchor foreign key to link it to the existing chain • A visible column is one that appears in the view output (from usage analysis) Otherwise, we try to insert the tuples of the shadow chain • ... which may cause key constraint violation

  9. Example E must be an EDS column

  10. More on Updates • This translation of a replace update will not cause side-effects • But is this what the programmer intended to do? replace $view[dname="EE"]/dname with "CS" • Too expensive to test at run-time • Deletions are handled similarly • We generate replace calls to change T.A to null • If all visible column values in the shadow chain are null, then disconnect the chain from the anchor • Needs garbage collection • Very hard to handle insertions • Generate replace-null(T.A,value,tids) to replace only if T.A is null • Generate insert(T) to insert a new tuple • Hard to generate these calls from a general XQuery insert

  11. Using Lineage Tracing • The replace/replace-null/insert calls are generated at compile-time • but the tuple chain ids (tids) used must be propagated at run-time • At run-time, every XML atomic value (xs:anyAtomicType) is annotated with lineage attributes • tids: sequence of tuple ids • path: sequence of foreign key names T.A • Requires minor extensions to an existing XQuery engine • Constructed when an atomic value is generated from a table in a view • from the view tree • XQuery operations must propagate these values as is • Newly constructed values have empty lineage

  12. Other Contributions • A general algorithm for translating XQuery updates to embedded SQL update calls • Generates pure XQuery with embedded replace/replace-null/insert calls • It is type-guided (based on the update destination type) • Limitation: XQuery insert translations are not complete • requires that there is at least one child in the parent node to copy parts of its lineage • Each XQuery update query is translated to pure XQuery with • embedded SQL code to retrieve data, • calls to replace/replace-null/insert to update data • A general framework for optimizing XQuery with embedded SQL statements • Source-to-source rewrite heuristics to • promote XQuery predicates to SQL code • fuse SQL statements in pairs to form SQL join queries • An implementation in Haskell:http://lambda.uta.edu/HXQ/

  13. Conclusion • Presented a framework for updating virtual XML views • Characteristics: • Hybrid of schema-based and data-driven methods for detecting view side-effects • Allows general XQueries but restricts the table connections in the views • XQuery updates are translated statically to embedded SQL code • ... but the resulting SQL updates use information that is propagated at run-time (through lineage tracing) • Current work: • A source-to-source, type-guided approach to incremental maintenance of XML views

  14. Related Work • L. Wang et al [vldb'06]: HUX • A hybrid of schema-based and data-driven methods • Uses data lineage • Uses the concept of clean extended source • Semi-decidable method to determine translatability of updates • deduced from relationship cardinalities • Barsalou et al [sigmod'91]: Object view updates • View trees from ownership connections • Literal translation to base updates within a dependency island • Braganholo et al [vldb'04]: Map XML views to relational views • B. Choi et al [icde'07]: data-driven with XPath views • Needs to maintain a reachability matrix • Bidirectionalization research in programming languages • Relational lenses: Bohannon et al [pods'06] • Bidirectionalization for free: Voigtlander [popl'09]

More Related