This presentation discusses the challenges of, and strategies for, efficient XML integration: wrapping the query languages of heterogeneous sources (an ODBC server, a Z39.50 server) behind a middleware server.
On Wrapping Query Languages and Efficient XML Integration
V. Christophides, S. Cluet, J. Simeon
Computer Science Department, University of Crete, and Institute for Computer Science - FORTH, Heraklion, Crete
INRIA Rocquencourt, Domaine de Voluceau, Paris, France
Bell Laboratories, Murray Hill, NJ, USA
An Integration Scenario
• A Middleware Server sits on top of an ODBC Server and a Z39.50 Server
• ODBC Server: SQL queries on tables with trading info about artifacts
• Z39.50 Server: full-text queries on well-formed XML docs with descriptive info about artifacts
XML-based Middleware is Cool!
• XML document exported for the Z39.50 Server (Wais-XML Wrapper): <work> <artist>Monet</artist> <title>Nympheas</title> <style>Impressionism</style> <size>21 x 61</size> <crplace>Giverny</crplace> </work>
• Table exported for the ODBC Server (RDBMS-XML Wrapper), columns Title | Creator | Price: Nympheas | Monet | 10M$; Waitress | Manet | 38M$
• Integrated XML view: <artwork> <artifact id='a1'> <artist>Monet</artist> <title>Nympheas</title> <price>10M$</price> <style>Impressionism</style> <dims>21 x 61</dims> <crplace>Giverny</crplace> </artifact> </artwork>
• Query Q on the view: What are the artifacts created in Giverny?
• The Middleware Server holds the view definitions V1=(Q1,Q2), V2= ... and sends Q1 and Q2 to the wrappers of sources S1 and S2, which answer in XML
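But XML is not a Panacea!
• Wrapping queries is hard
• Optimization for XML queries is poor
• What about type information?
• Figure: the Middleware Server must rewrite the XML query Q as Q=Q’(Q1’,Q2’), where Q1’ and Q2’ are expressed in the sources' native languages: select ... from ... where ... through the SQL-XML Wrapper (ODBC Server), and contains word1 or/and … through the Full Text-XML Wrapper (Z39.50 Server), over sources S1 and S2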
The YAT Approach to Efficient XML Integration
• An Algebra for XML
• Generic wrapping of query languages
• New optimization opportunities
• Figure: the YAT Mediator Server rewrites the XML query Q into Q’ and ships the subqueries Q1’ and Q2’ through Generic Wrappers to the ODBC Server and the Z39.50 Server (sources S1 and S2)
Outline
• Brief Recall
• YAT data model (wrappers’ structural metadata)
• YATL integration language (XML view definition)
• The YAT operational model
• XML Algebra
• Generic wrapping of source query capabilities
• Wrappers’ operational metadata
• Optimization opportunities
• Summary and Related work
Wrapping of Source Data Structures
• Relational model: Relation: table * tuple * Symbol: (Int v String v Float v Bool)
• Artifacts schema: Rel_artifacts: table rel_artifacts * tuple [ owner: String, price: Float, creator: String, title: String ]
• XML model: Work and Field trees labelled by Symbol
• Artwork schema: works: docs * work, with String-valued artist, title, style and dims, plus further Field children
Integrating Heterogeneous XML Data with YATL
MAKE collection * Artifact($t,$a) := artifact [ title:$t, artist:$a, price:$p, style:$s, dims:$d, owner:$o, misc:$f ]
MATCH rel_artifacts WITH table * tuple * { title:$t, creator:$c, price:$p, owner:$o }
      works WITH works * work [ artist:$a, title:$t’, style:$s, dims:$d, *($f) ]
WHERE $t = $t’ and $c = $a
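Read operationally, the rule above joins relational tuples with XML work elements on title and on creator = artist, and moves any leftover work fields into misc. A minimal Python sketch of that reading, assuming both sources are already loaded as plain dictionaries; the function name `make_artifacts`, the dictionary layout, the `owner` values and the prices in millions are illustrative assumptions, not part of YAT or YATL:

```python
def make_artifacts(rel_artifacts, works):
    """Join tuples and works on title and creator = artist (the WHERE clause)."""
    known = {"artist", "title", "style", "dims"}
    artifacts = {}
    for tup in rel_artifacts:
        for work in works:
            if tup["title"] == work["title"] and tup["creator"] == work["artist"]:
                # The key plays the role of the identifier Artifact($t,$a).
                key = (tup["title"], work["artist"])
                artifacts[key] = {
                    "title": tup["title"],
                    "artist": work["artist"],
                    "price": tup["price"],
                    "style": work["style"],
                    "dims": work["dims"],
                    "owner": tup["owner"],
                    # *($f): every remaining field of the work goes to misc.
                    "misc": {k: v for k, v in work.items() if k not in known},
                }
    return artifacts


# Sample data modelled on the slides; owner values are made up.
rel_artifacts = [
    {"title": "Nympheas", "creator": "Monet", "price": 10.0, "owner": "unknown"},
    {"title": "Waitress", "creator": "Manet", "price": 38.0, "owner": "unknown"},
]
works = [
    {"artist": "Monet", "title": "Nympheas", "style": "Impressionism",
     "dims": "21 x 61", "crplace": "Giverny"},
]
artwork = make_artifacts(rel_artifacts, works)
```

Only artifacts present in both sources appear in `artwork`, which matches the inner-join reading of the WHERE clause.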
The XML Algebra
• What do we need?
  • capture the query language
  • support optimization
  • wrap source query languages
• Our XML algebra
  • relational operators: Select, Project, Join, …
  • core object operators: Map, Djoin, Group, Sort
  • Standard Relational & Object Rewritings
  • two XML operators: Bind and Tree
  • New XML Rewritings
Bind Operator & Tab Structure
• Bind applies the filter docs * work [ artist:$a, title:$t, style:$s, dims:$d, *($f) ] to works and produces a Tab with one column per variable
• Tab columns: $a | $t | $s | $d | $f
• Row 1: Monet | Nympheas | Impressionism | 21x61 | crplace: "Giverny"
• Row 2: Manet | Waitress | Impressionism | 37.5x51 | theme: "Folies Bergere"
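The Bind step can be sketched as a small pattern matcher that turns tree-shaped data into a flat table of variable bindings. This is only an illustration under simplifying assumptions: work elements are flat Python dictionaries, the filter is a mapping from field names to variable names, and `bind` is a hypothetical helper, not the YAT operator itself.

```python
def bind(works, named, rest_var):
    """Return a Tab: one row of variable bindings per matching work element.

    named maps a field name to the variable it binds; rest_var collects
    all remaining fields, in the spirit of *($f) in the filter.
    """
    tab = []
    for work in works:
        if not all(field in work for field in named):
            continue  # the filter does not match this element
        row = {var: work[field] for field, var in named.items()}
        row[rest_var] = {k: v for k, v in work.items() if k not in named}
        tab.append(row)
    return tab


# Sample data modelled on the slide's figure.
works = [
    {"artist": "Monet", "title": "Nympheas", "style": "Impressionism",
     "dims": "21x61", "crplace": "Giverny"},
    {"artist": "Manet", "title": "Waitress", "style": "Impressionism",
     "dims": "37.5x51", "theme": "Folies Bergere"},
]
filter_fields = {"artist": "$a", "title": "$t", "style": "$s", "dims": "$d"}
tab = bind(works, filter_fields, "$f")
```

Each row of `tab` is a dictionary keyed by variable name, so relational operators such as Select or Join can be applied to it directly.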
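Tree & Restructuring
• Tree is applied to the Tab produced by Bind(works, …) and regroups it by style: * Style($s): $s, with the artists * $a nested under each style
• s1: "Cubism": Pablo Picasso, Georges Braque, ...
• s2: "Impressionism": Edouard Manet, Claude Monet, ...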
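As a rough illustration of the restructuring Tree performs here, the following Python sketch groups a Tab by `$s` and nests the distinct `$a` values under each style. The function name `tree_by_style` and the list-of-dictionaries Tab are assumptions made for the example; the real operator builds XML trees from a general template.

```python
def tree_by_style(tab):
    """Group Tab rows by $s and collect the distinct artists ($a) under each style."""
    styles = {}
    for row in tab:
        artists = styles.setdefault(row["$s"], [])
        if row["$a"] not in artists:  # one node per artist, even with several works
            artists.append(row["$a"])
    return styles


tab = [
    {"$s": "Cubism", "$a": "Pablo Picasso"},
    {"$s": "Cubism", "$a": "Georges Braque"},
    {"$s": "Impressionism", "$a": "Edouard Manet"},
    {"$s": "Impressionism", "$a": "Claude Monet"},
    {"$s": "Impressionism", "$a": "Claude Monet"},  # second work by the same artist
]
result = tree_by_style(tab)
```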
Algebraization of Queries
Tree: artwork := collection * Artifact($t,$a) := artifact [ title:$t, artist:$a, price:$p, style:$s, dims:$d, misc:$f ]
  Join: $t = $t’ and $c = $a
    Bind on rel_artifacts: table * tuple [ title:$t, creator:$c, price:$p, owner:$o ]
    Bind on works: docs * work [ artist:$a, title:$t’, style:$s, dims:$d, *($f) ]
Generic Wrapping of Source Query Capabilities
• An Operation is either a Function or an Algebra operator (..., Bind, Select, Group, ..., Join, Tree); Predicate operations are Basic (..., <, =, ...) or source-specific
• Operators supported by {YAT}: Sig: Yat x FYat → YAT, and Sig: Yat x FYat → Tab
• Select: Supported by: {YAT, Rel, Wais}, Sig: Tab x Pred → Tab
• Refinement for the relational source: Supported by: {Rel}, Sig: Rel x FRel → Tab
• Refinement for the full-text source: Supported by: {Wais}, Sig: Works x FWork → Tab
• Basic predicates (<, =): Supported by: {YAT, Rel}, Sig: Yat x Yat → Bool
• Extension, contains: Supported by: {Wais}, Sig: String x Work → Bool
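One way to picture this operational metadata is as a table of capabilities that the mediator consults before pushing work to a source. The sketch below is illustrative only: the `Capability` class, the `CAPABILITIES` list and `can_push` are hypothetical names, and the entries are limited to the operations whose signatures are legible on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    name: str
    signature: str
    supported_by: frozenset


CAPABILITIES = [
    Capability("Select", "Tab x Pred -> Tab", frozenset({"YAT", "Rel", "Wais"})),
    Capability("=", "Yat x Yat -> Bool", frozenset({"YAT", "Rel"})),
    Capability("<", "Yat x Yat -> Bool", frozenset({"YAT", "Rel"})),
    Capability("contains", "String x Work -> Bool", frozenset({"Wais"})),
]


def can_push(operations, source):
    """True if every operation in the list is declared as supported by the source."""
    by_name = {c.name: c for c in CAPABILITIES}
    return all(
        op in by_name and source in by_name[op].supported_by
        for op in operations
    )


price_filter_on_rel = can_push(["Select", "<"], "Rel")
price_filter_on_wais = can_push(["Select", "<"], "Wais")
contains_on_wais = can_push(["Select", "contains"], "Wais")
```

Under these declarations a selection with `<` can go to the relational source but not to the full-text one, which only accepts selections based on `contains`.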
Query Processing in YAT
• Query: What are the artifacts created in Giverny and sold for less than 10M$?
• Three-phase query optimization:
  • Simplification of algebraic expressions: Bind-Tree rewritings, push selections, projections, ...
  • Pushing operations on external sources: filter simplification, source-supplied equivalences, ...
  • Information passing between sources: reorder join arguments, ...
MAKE * answer [ title: $t, artist: $a, price: $p ]
MATCH artwork WITH collection * artifact [ title: $t, artist: $a, price: $p, misc.crplace: $cp ]
WHERE $cp = "Giverny" and $p < 10
Query Preprocessing
Query:
Tree: * answer [ title:$t, artist:$a, price:$p ]
  Select: $cp = "Giverny" and $p < 10
    Bind: collection * artifact [ title:$t, artist:$a, price:$p, misc [ crplace:$cp ] ]
View (input of the query's Bind):
Tree: artwork: collection * Artifact($t,$a): artifact [ title:$t, artist:$a, price:$p, style:$s, dims:$d, owner:$o, misc:$f ]
  Join: $t = $t’ and $c = $a
    Bind on rel_artifacts: table * tuple [ title:$t, creator:$c, price:$p, owner:$o ]
    Bind on works: docs * work [ artist:$a, title:$t’, style:$s, dims:$d, *($f) ]
Query Optimization: Phase 1
• Before: the query's Bind, collection * artifact [ title:$t, artist:$a, price:$p, misc [ crplace:$cp ] ], is applied to the output of the view's Tree, artwork: collection * Artifact($t,$a): artifact [ title:$t, artist:$a, price:$p, style:$s, dims:$d, owner:$o, misc:$f ]
• After: the Bind-Tree pair is replaced by Project $t, $a, $p, $m:f, followed by a Bind on $m with filter * [ crplace:$cp ], which yields $t, $a, $p, $cp
Query Optimization: Phase 1
Tree: * answer [ title:$t, artist:$a, price:$p ]
  Join: $t = $t’ and $c = $a
    Select: $p < 10
      Project: $t, $c, $p
        Bind on rel_artifacts: table * tuple [ title:$t, creator:$c, price:$p, owner:$o ]
    Select: $cp = "Giverny"
      Bind on $m: * [ crplace:$cp ]
        Project: $t, $a, $m:f
          Bind on works: docs * work [ artist:$a, title:$t, style:$s, dims:$d, *($f) ]
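The selection pushing visible in this plan follows the standard relational rule: a conjunct moves below the Join to the input that binds all of its variables. A minimal Python sketch of that rule, with predicates represented as (text, variables) pairs; `push_selections` and this representation are assumptions for illustration, not the YAT optimizer:

```python
def push_selections(conjuncts, left_vars, right_vars):
    """Route each conjunct to the join input that binds all of its variables."""
    left, right, remaining = [], [], []
    for text, used in conjuncts:
        if used <= left_vars:
            left.append(text)
        elif used <= right_vars:
            right.append(text)
        else:
            remaining.append(text)  # needs both inputs, stays above the join
    return left, right, remaining


conjuncts = [('$cp = "Giverny"', {"$cp"}), ("$p < 10", {"$p"})]
rel_vars = {"$t", "$c", "$p"}        # bound on the rel_artifacts side
works_vars = {"$t'", "$a", "$cp"}    # bound on the works side
left, right, remaining = push_selections(conjuncts, rel_vars, works_vars)
```

Here the price test ends up on the relational branch and the crplace test on the works branch, as in the plan above.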
Query Optimization: Phase 2
Tree: * answer [ title:$t, artist:$a, price:$p ]
  Join: $t = $t’ and $c = $a
    Select: $p < 10
      Bind on rel_artifacts: table * tuple [ price:$p, creator:$c, title:$t ]
    Select: $cp = "Giverny"
      Bind on $w: * [ title:$t, artist:$a, crplace:$cp ]
        Select: contains("Giverny", $w)
          Bind on works: docs * work:$w
• Source-supplied equivalence used: = (X, Work) => contains(X, Work)
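The equivalence = (X, Work) => contains(X, Work) lets the mediator add a full-text pre-filter that the source can evaluate, while keeping the exact equality test for itself. The sketch below illustrates why this is safe, under the assumption that `contains` means "the word occurs in some field of the work"; the helper names and the third sample work are made up for the example.

```python
def contains(word, work):
    """Stand-in for the full-text predicate: the word occurs in some field value."""
    return any(word in str(value) for value in work.values())


def select_crplace(works, place):
    """Exact selection evaluated entirely in the mediator."""
    return [w for w in works if w.get("crplace") == place]


def select_crplace_pushed(works, place):
    """Same selection, with contains() applied first, as the source would do."""
    candidates = [w for w in works if contains(place, w)]        # pushed to the source
    return [w for w in candidates if w.get("crplace") == place]  # kept in the mediator


works = [
    {"artist": "Monet", "title": "Nympheas", "crplace": "Giverny"},
    {"artist": "Manet", "title": "Waitress", "theme": "Folies Bergere"},
    # Made-up work: mentions Giverny in its title but was created elsewhere.
    {"artist": "Example", "title": "View of Giverny", "crplace": "Paris"},
]
direct = select_crplace(works, "Giverny")
pushed = select_crplace_pushed(works, "Giverny")
```

The pre-filter may return false positives, such as the made-up third work, so the equality Select stays in the plan; what the rewriting saves is the transfer of works that cannot match.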
Query Optimization: Phase 3
Tree: * answer [ title:$t, artist:$a, price:$p ]
  DJoin
    Select: $cp = "Giverny"
      Bind on $w: * [ title:$t, artist:$a, crplace:$cp ]
        Select: contains("Giverny", $w)
          Bind on works: docs * work:$w
    Select: $p < 10
      Bind on rel_artifacts: table * tuple [ price:$p, creator:a, title:t ], where a and t are values passed in by the DJoin
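Information passing through a dependent join can be sketched as follows: the outer input is evaluated first and each of its rows parameterises a lookup on the inner source. The sketch assumes the works branch is the outer input, which is how the plan above reads; `djoin`, `rel_lookup` and the in-memory `REL_ARTIFACTS` list are illustrative stand-ins, and the price selection is left out for brevity.

```python
def djoin(outer_rows, inner_lookup):
    """Dependent join: run inner_lookup once per outer row, passing its bindings."""
    result = []
    for outer in outer_rows:
        for inner in inner_lookup(outer):
            result.append({**outer, **inner})
    return result


# In-memory stand-in for the relational source; prices are in millions.
REL_ARTIFACTS = [
    {"title": "Nympheas", "creator": "Monet", "price": 10.0},
    {"title": "Waitress", "creator": "Manet", "price": 38.0},
]


def rel_lookup(outer):
    """Stands in for a parameterised query sent to the relational source."""
    return [
        {"$p": t["price"]}
        for t in REL_ARTIFACTS
        if t["title"] == outer["$t"] and t["creator"] == outer["$a"]
    ]


giverny_works = [{"$t": "Nympheas", "$a": "Monet", "$cp": "Giverny"}]
rows = djoin(giverny_works, rel_lookup)
```

With this strategy the relational source is asked only about titles and artists that survived the Giverny filter, instead of shipping its whole table to the mediator.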
Summary & Related Work
• Wrapping query languages requires understanding the semantics of the QLs
  • Ad hoc solution proposed by Garlic (IBM)
  • Untyped solution proposed by DISCO (INRIA)
  • Query-template-based solution proposed by TSIMMIS (Stanford)
  • Generic solution introduced for the YAT system (INRIA + Bell Labs)
• XML query optimization requires exploiting XML typing information
• YAT relies on a general-purpose algebra that allows
  • reusing optimization techniques proposed in the relational and object contexts (pushing selections, projections, join reordering, …)
  • introducing new ones that take advantage of type information to prune navigation in XML trees, push query evaluation to the sources, etc.
The YAT Architecture
• Mediator (server side: YAT API): Data Information Module, Query Module, Optimizer, View Interface, Evaluator, Structural Information; acts as a YAT API client of the wrappers
• Wrapper (server side: YAT API): Data Conversion, Query Translation, Structural Extraction, Operational Extraction, Data Information Module, Query Module; acts as a client of the Source