Self Maintenance of materialized XML views with non-cooperative data sources. DBDBD – 2006 Virginie Sans –ETIS/CNRS Laboratory– MIDI Team. Issue and context Pre-requisite The issue Context State of the art Contributions View computation with the XAlgebra
Self Maintenance of materialized XML views with non-cooperative data sources DBDBD – 2006 Virginie Sans –ETIS/CNRS Laboratory– MIDI Team
Summary
• Issue and context
  • Pre-requisite
  • The issue
  • Context
  • State of the art
• Contributions
  • View computation with the XAlgebra
  • Detection and identification of source updates
  • View maintenance
  • Applications and performances
• Conclusion
Mediation architecture
• Introduced by Wiederhold
• The architecture
  • Mediator
  • Wrappers
  • Sources
  • Query language
1.1 Pre-requisite
Mediation architecture
• Mediator
  • Handles the user request: canonization, atomization
  • Sends an atomic request to a source via its wrapper
• Wrappers
  • Translate a query coming from the mediator into a query in the native language of the web source
  • Give the mediator an answer in XML
• Data sources
  • Heterogeneous
  • Distributed
  • In a web context: partially unavailable
[Figure: mediator, wrapper and source; labels: atomic request, XML, SQL, tuples]
What about views?
• Data integration
• Access control, security
• Data warehouses
Why?
• Interoperability
• Heterogeneous data
Materializing views
• Fast access to complex queries
• Better availability
• Request optimization
[Figure: mediator holding views and materialized views, above three wrappers; source labels: RDB, SQL, HTML]
Issue: view maintenance
When data sources are updated, the view must be kept consistent.
Maintenance process
• Recomputation
  • Recompute the whole view from scratch
• Incremental maintenance
  • Compute changes to the view in response to changes to the base sources
[Figure: an update turns Source t into Source t+1; View t+1 is obtained from View t by incremental maintenance, or from Source t+1 by recomputation]
1.2 Issue
Context: semi-structured XML data
<bib>
  <book>
    <price>65.95</price>
    <title>Advanced Programming in the Unix environment</title>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
  </book>
  <book>
    <price>65.95</price>
    <title>Advanced Programming in the Unix environment</title>
  </book>
  <book>
    <price>39.95</price>
    <title>Data on the Web</title>
    <title>Données sur le Web</title>
  </book>
</bib>
• XML views are materialized at the mediator level
• Hierarchical data
• No schema, except the query schema
1.3 Context
Context: XQuery
• XQuery
  • Dedicated to XML data
  • Relational operators (projection, selection, join, union, ...)
  • XML operators (tagging, unnesting, aggregation, ...)
  • FLWOR syntax (pronounced "flower")
FLWOR syntax:
  for $var in forest [, $var in forest]*
  let $var := sub-tree
  where condition
  return result
Example:
  <result>{
    for $b in doc("bib.xml")/bib/book
    let $a := $b/author
    where $b/price/text() < 60
    order by $b/year
    return <cheap_book>{ $b/title }</cheap_book>
  }</result>
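To make the role of each FLWOR clause concrete, here is a minimal Python sketch that evaluates the same "cheap books" query with the standard library's ElementTree. It is not how the mediator evaluates XQuery; the helper name `cheap_books`, the `max_price` parameter and the shortened sample document are assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Sample data adapted from the bib example of the Context slide.
BIB_XML = """<bib>
  <book><price>65.95</price><title>Advanced Programming in the Unix environment</title></book>
  <book><title>TCP/IP Illustrated</title></book>
  <book><price>39.95</price><title>Data on the Web</title><title>Données sur le Web</title></book>
</bib>"""


def cheap_books(xml_text, max_price=60.0):
    """Hypothetical helper mirroring the FLWOR clauses one by one."""
    bib = ET.fromstring(xml_text)
    # for $b in .../bib/book
    books = bib.findall("book")
    # where $b/price < max_price (a book without a price never qualifies)
    kept = [b for b in books
            if any(float(p.text) < max_price for p in b.findall("price"))]
    # order by $b/year (placing books without a year first is a choice of this sketch)
    kept.sort(key=lambda b: b.findtext("year", default=""))
    # return <cheap_book>{ $b/title }</cheap_book>, wrapped in <result>
    result = ET.Element("result")
    for b in kept:
        cheap = ET.SubElement(result, "cheap_book")
        cheap.extend([copy.deepcopy(t) for t in b.findall("title")])
    return result
```

Calling `ET.tostring(cheap_books(BIB_XML), encoding="unicode")` serializes the result; only the book priced 39.95 is kept, with both of its titles.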
Context: other specificities
• Views are computed using XAlgebra
  • See View computation
• Wrappers have limited resources
  • Few computation possibilities
  • A component named logger stores the last modification date and a checksum of the sources
• Non-cooperative web sources
  • No information about their updates
  • Not always available
  • Not enough granularity
State of the art (1/2)
• Relational views
  • Not fit for semi-structured data
• Abiteboul et al.
  • OEM (Object Exchange Model)
  • LOREL language
  • Some operators are missing
• VOX (Rainbow team)
  • Needs to know the exact position in the XML tree where the update was made
1.4 State of the art
State of the art (2/2)
• Cobena et al.
  • XDiff: an algorithm for XML file comparison
  • Needs a copy of the source at the wrapper level
• Bonnet et al. / Papadimos et al.
  • Parachute queries
  • A mutant query plan
What happens when sources are really unavailable?
Our goal:
• Reduce source access to the minimum
• Use the information stored in the view
View maintenance: the process
• View computation
  • An algebraic approach using XAlgebra
  • Extension of the XAlgebra (identifiers)
• Update detection
  • Comparison of the source information with that stored in the logger
• Update identification
  • Recovery process
  • Diff algorithm
• View maintenance
  • Propagation rules for each operator
2.1 View computation
View computation Steps : 2.1 View computation
The XAlgebra data model
• Operators: XSource, XConstruct, XUnion, ...
• Data structures: XRelation, XTuple, XAttributes
XSource operator: step 1
• XQuery analysis
  for $f in doc("informations.xml")/personnes/personne
  let $a := $f/nom
  where $f/age < 27 and $a = "Durand"
  return (<nom>{$a}</nom>, <prenom>{$f/prenom}</prenom>)
• Path extraction
  • Optional
  • Mandatory
  • Hidden
• We obtain
  • A context
  • A set of patterns
XSource operator: steps 2 and 3
• From XML sub-trees to the tabular structure
  • 1 sub-tree => 1 XTuple
  • XRelation = set of XTuples
XSource operator: extending the algebra
• Adding identifiers: XTids
  • An XTid is a set of pairs: {(idsource, idfragment), ...}
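The XSource steps (context and patterns, one XTuple per sub-tree, one XTid per XTuple) can be sketched in Python as follows. This is an illustration under assumptions, not the XAlgebra implementation: the `XTuple` class, the `xsource` signature and the choice of numbering fragments by their position in the source are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class XTuple:
    values: tuple      # one value per pattern, None when the path is absent
    xtid: frozenset    # set of (idsource, idfragment) pairs


def xsource(xml_text, source_id, context, patterns):
    """Turn each sub-tree matching `context` into one XTuple (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    xrelation = []
    # Assumption: a fragment is identified by its 1-based position in the source.
    for fragment_id, subtree in enumerate(root.findall(context), start=1):
        values = tuple(subtree.findtext(p) for p in patterns)
        xrelation.append(XTuple(values, frozenset({(source_id, fragment_id)})))
    return xrelation


# Illustrative data shaped like the informations.xml source of step 1.
PERSONNES = """<personnes>
  <personne><nom>Durand</nom><prenom>Marcel</prenom><age>25</age></personne>
  <personne><nom>Leclerc</nom><prenom>John</prenom></personne>
</personnes>"""

xr1 = xsource(PERSONNES, source_id=1, context="personne",
              patterns=("nom", "prenom", "age"))
```

Each XTuple produced from a source carries an XTid of cardinality 1; larger sets only appear once operators combine XTuples.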
View computation - XOperator • XProject 2.1 View computation
View computation: XOperator
• XJoin
  • XTids propagation: card(XTID) > 1 for some nodes
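One way to picture XTid propagation through a join is the Python sketch below, in which the joined value collects the identifiers of both operands while the other values keep their own. The `XValue` type, the nested-loop strategy and the equality predicate are assumptions of this sketch, not details given on the slide.

```python
from typing import FrozenSet, NamedTuple, Tuple


class XValue(NamedTuple):
    value: str
    xtid: FrozenSet[Tuple[int, int]]   # set of (idsource, idfragment) pairs


def xjoin(xr1, xr2, key1, key2):
    """Nested-loop equi-join on column key1 of xr1 and column key2 of xr2."""
    joined = []
    for t1 in xr1:
        for t2 in xr2:
            if t1[key1].value == t2[key2].value:
                # The join value now comes from both sources: union of the XTids.
                merged = XValue(t1[key1].value, t1[key1].xtid | t2[key2].xtid)
                rest1 = [v for i, v in enumerate(t1) if i != key1]
                rest2 = [v for i, v in enumerate(t2) if i != key2]
                joined.append((merged, *rest1, *rest2))
    return joined
```

With this representation, only the join column ends up with an XTid of cardinality greater than 1.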
Update detection and identification
• Detection: comparison of the source information with that stored in the logger
  • The last modification date
  • The checksum of the source
• Identification
  • Partial recovery of the source information based on XTids
  • Comparison of the recovered XRelation with the updated source
  • Δ computation
2.2 Update detection and identification
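The detection step can be sketched in a few lines of Python. The `LoggerEntry` record, the use of SHA-256 as the checksum and the rule "same date means no update, otherwise compare checksums" are assumptions of this sketch; the slides only state that the logger stores a last modification date and a checksum.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class LoggerEntry:
    last_modified: float   # last modification date seen for the source
    checksum: str          # checksum of the source content at that date


def checksum(content: bytes) -> str:
    # Assumption: any stable digest would do; SHA-256 is used here.
    return hashlib.sha256(content).hexdigest()


def detect_update(entry: LoggerEntry, last_modified: float, content: bytes) -> bool:
    """Return True when the source content differs from what the logger recorded."""
    if last_modified == entry.last_modified:
        return False                      # cheap test: the date did not move
    return checksum(content) != entry.checksum
```

Checking the date first avoids hashing the source when nothing has moved, which matters for wrappers with limited resources.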
XRecover • Step 1 : Project XRv on XR1 patterns 2.2 Update detection and identification
XRecover • Step 2 : filtering XTuples values 2.2 Update detection and identification
XRecover
• Step 3: re-ordering XTuples
  • XTidUnnest: XTuples are unnested depending on their XTids
XRecover
• Step 3: re-ordering XTuples
  • XTidNest: XTuples are nested by their XTids
  • XTuples are re-ordered
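The three XRecover steps can be read as: project the view on the source's patterns, keep only the identifiers of that source, then regroup and re-order by XTid. The Python sketch below follows that reading. It is an interpretation rather than the author's operator: the `XValue` type, the `xrecover` signature and the idea that sorting on (idsource, idfragment) restores the source order are all assumptions.

```python
from typing import FrozenSet, NamedTuple, Tuple


class XValue(NamedTuple):
    value: str
    xtid: FrozenSet[Tuple[int, int]]   # set of (idsource, idfragment) pairs


def xrecover(view, source_id, columns):
    """Rebuild the part of a source's XRelation that is still stored in the view."""
    recovered = {}   # (idsource, idfragment) -> {column: value}
    for xtuple in view:
        for col in columns:                   # step 1: projection on the source patterns
            xv = xtuple[col]
            for pair in xv.xtid:              # XTidUnnest: one entry per XTid pair
                if pair[0] == source_id:      # step 2: keep this source's values only
                    # XTidNest: regroup the values that share the same XTid
                    recovered.setdefault(pair, {})[col] = xv.value
    # Step 3: re-order by fragment identifier.
    return [(tuple(recovered[p].get(c) for c in columns), p)
            for p in sorted(recovered)]
```

The recovery is partial by construction: a value the view did not keep comes back as `None`.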
Update identification: comparison algorithm
• Comparison of XR1_{t+1} with XR_t'
  • XR1_{t+1} is the XRelation obtained by applying XSource to source 1 at t+1
  • XR_t' is the partial recovery of the XRelation of source 1 at t
• Remark: XR1_{t+1} can also be filtered using predicates before comparison
• The Diff algorithm is based on Unix diff (Hunt & McIlroy); the symbol is the XTuple instead of the line
Update identification: Diff algorithm
• Delta with hunks:
  • Insert(pos; XTuple)
  • Delete(pos; XTuple)
  • Replace(pos; XTuple_old, XTuple_new)
Example:
  Insert(2,{Leclerc,Avide,{(1,3)}} {John,Avide,{(1,3)}} }
  Delete(4,{Durand,Avide,{(1,11)}}, {Marcel,Avide,{(1,11)}} {Eric,Avide,{(1,11)}}}
  Etc.
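A delta made of insert, delete and replace hunks over XTuples can be sketched with Python's standard `difflib`. Note the substitution: `difflib.SequenceMatcher` uses its own matching heuristic, not the Hunt and McIlroy algorithm of Unix diff, so this only illustrates the shape of the delta. The `xdiff` name, the tuple layout of the hunks and the 0-based positions in the old relation are assumptions.

```python
from difflib import SequenceMatcher


def xdiff(old, new):
    """Compare two XRelations (lists of hashable XTuples) and return a delta."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            delta.append(("insert", i1, new[j1:j2]))
        elif tag == "delete":
            delta.append(("delete", i1, old[i1:i2]))
        elif tag == "replace":
            delta.append(("replace", i1, old[i1:i2], new[j1:j2]))
        # "equal" runs produce no hunk
    return delta


# Illustrative XTuples: the symbol compared is the whole XTuple, not a line.
old_xr = [("Durand", "Paul"), ("Martin", "Anne"), ("Petit", "Luc")]
new_xr = [("Durand", "Paul"), ("Leclerc", "John"), ("Martin", "Anne")]
delta = xdiff(old_xr, new_xr)
```

Here `delta` contains one insert hunk at position 1 and one delete hunk at position 2.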
Maintenance rules: from delta to view maintenance
• Case of a deletion: delete(pos, xtuple)
An XTuple is associated with an XTid {(x)} such that card = 1. Each XValue of the view carries a set of XTids, noted XTID.
1) From each XValue, we delete the pair x from its XTID whenever x ∈ XTID
   Example: the XTuple whose XTid is x = (1,3) has been deleted. The XValue {Alain}(1,3);(1,4) becomes the XValue {Alain}(1,4)
2) We delete each XValue such that card(XTID) = 0
   Example: if the XValue {Alain}(1,3) becomes the XValue {Alain}, we delete the XValue entirely
3) If the XValue was involved in the predicate, we delete the XTuple
   • Join and restriction case
2.3 View maintenance
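The three deletion rules can be sketched in Python as follows. This is a reading of the slide under assumptions: a deleted XValue is represented by `None`, `predicate_columns` is a hypothetical way to mark the columns used by a join or restriction predicate, and rule 3 is applied when a deleted XValue sits in such a column.

```python
from typing import FrozenSet, NamedTuple, Tuple


class XValue(NamedTuple):
    value: str
    xtid: FrozenSet[Tuple[int, int]]   # set of (idsource, idfragment) pairs


def propagate_delete(view, x, predicate_columns=()):
    """Apply delete(pos, xtuple) to a view, x being the XTid pair of the deleted XTuple."""
    maintained = []
    for xtuple in view:
        new_tuple, drop_tuple = [], False
        for col, xv in enumerate(xtuple):
            if xv is None:                     # already deleted earlier
                new_tuple.append(None)
                continue
            remaining = xv.xtid - {x}          # rule 1: remove the pair x
            if remaining:
                new_tuple.append(XValue(xv.value, remaining))
            else:
                new_tuple.append(None)         # rule 2: card(XTID) = 0
                if col in predicate_columns:
                    drop_tuple = True          # rule 3: the predicate no longer holds
        if not drop_tuple:
            maintained.append(tuple(new_tuple))
    return maintained
```

On the slide's example, deleting x = (1,3) turns {Alain}(1,3);(1,4) into {Alain}(1,4), while a value carrying only (1,3) disappears.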
Maintenance rules: from delta to view maintenance
• Case of an insertion: insert(pos; xtuple)
1) A new XTid is created
   Goal: preserve the XTuple order for a later recovery
2) Depending on the operator, we obtain various maintenance instructions
   • Projection: insertion of the projection of the xtuple
   • Select: if the xtuple satisfies the predicate, insertion
   • Join XR1 * XR2: computation of XT = xtuple * XR2; if XT is not empty, insertion of XT
   • Union and Intersect: duplicates are preserved
     • Union ≡ Select where the predicate is always true
     • Intersect ≡ Join
   Depending on the predicate, we can query either XR2 or its recovery
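The per-operator insertion rules can be illustrated with three small Python functions. They work on plain tuples and leave out XTid creation and position handling, so they only show which XTuples reach the view; the function names and the equi-join predicate are assumptions of this sketch. In `insert_join`, `xr2` may be the second source or its recovery.

```python
def insert_project(view, xtuple, columns):
    """Projection: insert the projection of the new xtuple."""
    view.append(tuple(xtuple[c] for c in columns))


def insert_select(view, xtuple, predicate):
    """Select: insert the new xtuple only if it satisfies the predicate."""
    if predicate(xtuple):
        view.append(xtuple)


def insert_join(view, xtuple, xr2, key1, key2):
    """Join XR1 * XR2: compute XT = xtuple * XR2 and insert it when non-empty."""
    xt = [xtuple + t2 for t2 in xr2 if xtuple[key1] == t2[key2]]
    view.extend(xt)
    return xt
```

Only the new xtuple is joined against `xr2`; the rest of the view is left untouched, which is what keeps the maintenance incremental.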
Maintenance rules: from delta to view maintenance
• Case of a modification: Replace(pos; XTuple_old, XTuple_new)
XTuple modification = XValue modification OR XValue deletion followed by insertion
• Project and Union: modification of the XValues concerned
• Select and Intersect (Intersect ≡ Select): if the modification applies to an XValue that must satisfy the condition, deletion of the XTuple; otherwise, modification of the XValues
• Join: deletion followed by insertion
Maintenance rules: from delta to view maintenance 2.3 View maintenance
Maintenance rules: missing information
• Missing information (join?)
  • Source recovery
  • Multi-view strategy
  • Source request
Goal: limited access to the sources!
Example: View = S1 * S2
• Insertion: x * S2'
• Computation of S2'
• XTuple x is inserted in S1
[Figure: mediator with materialized views above two wrappers; source labels: HTML, SQL]
Applications
• On the web
  • When the necessary sources are unavailable
  • Goal: limited access to them
• With sensors (ANR project)
  • With wireless sensors
  • Goal: preserve power resources
2.4 Applications and performances
Performances • Comparison between XRecover and Recomputation 2.4 Applications and performances
Contributions
• Maintenance process in the context of non-cooperative web sources
• Contribution to the XAlgebra
  • New operators: XRecover, XTidUnnest, XTidNest
  • New data structure: XTids
• Future work
  • Order-sensitive view maintenance
  • A better Diff algorithm
Conclusion