
Efficient XML Integration Wrapping Queries

This article discusses the challenges and strategies for efficient XML integration by wrapping queries in middleware servers, ODBC servers, and more.

bwithers




Presentation Transcript


  1. On Wrapping Query Languages and Efficient XML Integration. V. Christophides, S. Cluet, J. Simeon. Computer Science Department, University of Crete; Institute for Computer Science - FORTH, Heraklion, Crete; INRIA Rocquencourt, Domaine de Voluceau, Paris, France; Bell Laboratories, Murray Hill, NJ, USA

  2. An Integration Scenario
     • Middleware Server on top of two sources
     • ODBC Server: SQL queries on tables with trading info about artifacts
     • Z39.50 Server: full-text queries on well-formed XML docs with descriptive info about artifacts

  3. XML-based Middleware is Cool!
     Query Q (XML): What are the artifacts created in Giverny? Views: V1=(Q1,Q2), V2= ...
     [Figure: the Middleware Server sends the subqueries Q1 and Q2 to an RDBMS-XML Wrapper over the ODBC Server and a Wais-XML Wrapper over the Z39.50 Server; both wrappers return XML; sources S1, S2.]
     Relational source:
     Title     Creator  Price
     Nympheas  Monet    10M$
     Waitress  Manet    38M$
     XML source: <work> <artist>Monet</artist> <title>Nympheas</title> <style>Impressionism</style> <size>21 x 61</size> <crplace>Giverny</crplace> </work>
     Integrated XML view: <artwork> <artifact id='a1'> <artist>Monet</artist> <title>Nympheas</title> <price>10M$</price> <style>Impressionism</style> <dims>21 x 61</dims> <crplace>Giverny</crplace> </artifact> </artwork>

  4. But XML is not a Panacea!
     • Wrapping queries is hard
     • Optimization for XML queries is poor
     • What about type information?
     [Figure: the Middleware Server rewrites the XML query Q as Q=Q’(Q1’,Q2’) and sends the subqueries Q1’ and Q2’ to the wrappers: a SQL-XML Wrapper over the ODBC Server (select ... from ... where ...) and a Full Text-XML Wrapper over the Z39.50 Server (contains word1 or/and …); sources S1, S2.]

  5. The YAT Approach to Efficient XML Integration
     • An Algebra for XML
     • Generic wrapping of query languages
     • New optimization opportunities
     [Figure: the YAT Mediator Server rewrites the XML query Q into Q’ and sends the subqueries Q1’ and Q2’ through Generic Wrappers to the ODBC Server and the Z39.50 Server; sources S1, S2.]

  6. Outline • Brief Recall • YAT data model (wrappers’ structural metadata) • YATL integration language (XML view definition) • The YAT operational model • XML Algebra • Generic wrapping of source query capabilities • Wrappers’ operational metadata • Optimization opportunities • Summary and Related work

  7. Wrapping of Source Data Structures
     [Figure: YAT type graphs describing the sources.
     Relational model: Relation: table * tuple * Symbol: (Int v String v Float v Bool).
     Artifacts Schema: Rel_artifacts: table rel_artifacts * tuple with fields owner: String, price: Float, creator: String, title: String.
     XML Artwork: works * work with String fields artist, title, style, dims, followed by further Field elements.]

  8. Integrating Heterogeneous XML Data with YATL
     MAKE collection * Artifact($t,$a) := artifact [ title:$t, artist:$a, price:$p, style:$s, dims:$d, owner:$o, misc:$f ]
     MATCH rel_artifacts WITH table * tuple * { title:$t, creator:$c, price:$p, owner:$o }
           works WITH works * work [ artist:$a, title:$t’, style:$s, dims:$d, *($f) ]
     WHERE $t = $t’ and $c = $a
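To make the rule on slide 8 concrete, here is a minimal Python sketch of what it computes: relational tuples are joined with XML work elements on title and creator/artist, and each match yields one artifact. The sample rows come from slide 3; the helper name `make_artifacts`, the dictionary layout, and the use of `<dims>` in the source document are assumptions made for illustration, not part of YATL or the YAT system.

```python
import xml.etree.ElementTree as ET

# Relational source: rows from the table on slide 3. The owner column is not
# shown on the slides, so None is used as a placeholder.
REL_ARTIFACTS = [
    {"title": "Nympheas", "creator": "Monet", "price": "10M$", "owner": None},
    {"title": "Waitress", "creator": "Manet", "price": "38M$", "owner": None},
]

# XML source: the work element from slide 3, using <dims> as in the rule.
WORKS_XML = """
<works>
  <work>
    <artist>Monet</artist><title>Nympheas</title>
    <style>Impressionism</style><dims>21 x 61</dims>
    <crplace>Giverny</crplace>
  </work>
</works>
"""

NAMED = ("artist", "title", "style", "dims")  # fields named in the MATCH pattern


def make_artifacts(rows, works_xml):
    """Hypothetical helper: evaluate the view with nested loops."""
    artifacts = {}
    for work in ET.fromstring(works_xml).iter("work"):
        fields = {child.tag: (child.text or "").strip() for child in work}
        # *($f): children not named in the pattern are collected under misc.
        misc = {tag: text for tag, text in fields.items() if tag not in NAMED}
        for row in rows:
            # WHERE $t = $t' and $c = $a
            if (row["title"] == fields.get("title")
                    and row["creator"] == fields.get("artist")):
                # The key plays the role of the Skolem function Artifact($t,$a).
                key = (row["title"], fields["artist"])
                artifacts[key] = {
                    "title": row["title"], "artist": fields["artist"],
                    "price": row["price"], "style": fields.get("style"),
                    "dims": fields.get("dims"), "owner": row["owner"],
                    "misc": misc,
                }
    return artifacts


ARTWORK = make_artifacts(REL_ARTIFACTS, WORKS_XML)
```

Only Nympheas appears in both sources, so `ARTWORK` holds a single artifact whose `misc` entry carries the unmatched `crplace` element.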

  9. The XML Algebra
     • What do we need?
       • capture the query language
       • support optimization
       • wrap source query languages
     • Our XML algebra
       • relational operators: Select, Project, Join, ...
       • core object operators: Map, Djoin, Group, Sort
       • Standard Relational & Object Rewritings
       • two XML operators: Bind and Tree
       • New XML Rewritings

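  10. Bind Operator & Tab Structure
     [Figure: Bind applied to works with the filter docs * work [artist $a, title $t, style $s, dims $d, *($f)] produces a Tab with columns $a, $t, $s, $d, $f. Rows shown: (Monet, Nympheas, Impressionism, 21x61) and (Manet, Waitress, Impressionism, 37.5x51); the $f column holds the remaining elements, theme and crplace, with values “Giverny” and “Folies Bergere”.]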
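The Bind operator can be pictured as pattern matching that flattens a tree into a table of variable bindings (the Tab). The sketch below is one possible reading in Python, not the YAT implementation: the function `bind`, its parameters, and the way the `*($f)` remainder is stored are assumptions, and pairing “Folies Bergere” with the theme of Waitress is a guess from the slide layout.

```python
import xml.etree.ElementTree as ET

# Sample documents rebuilt from the values on slide 10 (pairing is assumed).
DOCS = """
<docs>
  <work><artist>Monet</artist><title>Nympheas</title><style>Impressionism</style>
        <dims>21x61</dims><crplace>Giverny</crplace></work>
  <work><artist>Manet</artist><title>Waitress</title><style>Impressionism</style>
        <dims>37.5x51</dims><theme>Folies Bergere</theme></work>
</docs>
"""


def bind(root, element, variables, rest_var):
    """Return one Tab row per <element>; `variables` maps child tag -> variable."""
    tab = []
    for node in root.iter(element):
        row = {var: None for var in variables.values()}
        rest = []
        for child in node:
            if child.tag in variables:
                row[variables[child.tag]] = child.text
            else:
                # *($f): everything the filter does not name is kept as-is.
                rest.append((child.tag, child.text))
        row[rest_var] = rest
        tab.append(row)
    return tab


FILTER = {"artist": "$a", "title": "$t", "style": "$s", "dims": "$d"}
TAB = bind(ET.fromstring(DOCS), "work", FILTER, "$f")
```

Each row of `TAB` is a dictionary from variable name to value, so later operators (Select, Join, Project) can treat it like a relational tuple.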

  11. Tree & Restructuring
     [Figure: Tree applied to the Tab from Bind(works, …) groups rows by style, * Style($s): $s, and nests the artists $a under each style. s1: “Cubism” (Pablo Picasso, Georges Braque); s2: “Impressionism” (Edouard Manet, Claude Monet); ...]
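Tree goes the other way: it turns a Tab back into a nested structure, grouping rows that share the same Skolem key (here Style($s)). A small Python sketch under stated assumptions: the element names `styles`, `style` and `artist` are invented for the example, and the input Tab is typed in by hand from the values on slide 11.

```python
import xml.etree.ElementTree as ET

# Tab rows as they might come out of Bind(works, ...); values from slide 11.
TAB = [
    {"$a": "Pablo Picasso", "$s": "Cubism"},
    {"$a": "Georges Braque", "$s": "Cubism"},
    {"$a": "Edouard Manet", "$s": "Impressionism"},
    {"$a": "Claude Monet", "$s": "Impressionism"},
]


def tree(tab):
    """Group rows by $s and nest each $a under its style element."""
    root = ET.Element("styles")
    by_style = {}  # Style($s) -> element created for that style
    for row in tab:
        style = by_style.get(row["$s"])
        if style is None:  # first row for this style: create its node once
            style = ET.SubElement(root, "style", name=row["$s"])
            by_style[row["$s"]] = style
        ET.SubElement(style, "artist").text = row["$a"]
    return root


RESULT = tree(TAB)
```

Calling `ET.tostring(RESULT)` serializes the restructured document, with one `style` element per distinct value of `$s`.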

  12. Algebraization of Queries
     [Figure: algebraic plan for the view. Tree (artwork := collection * Artifact($t,$a) := artifact with title $t, artist $a, price $p, style $s, dims $d, misc $f) over Join ($t = $t’ and $c = $a) over two Binds: Bind on rel_artifacts (table * tuple: title $t, creator $c, price $p, owner $o) and Bind on works (docs * work: artist $a, title $t’, style $s, dims $d, *($f)).]

  13. Generic Wrapping of Source Query Capabilities
     [Figure: operational metadata. Labels: Operation, Function, Algebra; algebra operators ..., Bind, Select, Group, ..., Join, Tree; Predicate functions: Basic (..., <, =, ...), Extension (contains), Refinement. Annotations as they appear on the slide:]
     • Supported by: {YAT}; Sig: Yat x FYat -> YAT
     • Supported by: {YAT}; Sig: Yat x FYat -> Tab
     • Supported by: {YAT,Rel,Wais}; Sig: Tab x Pred -> Tab
     • Supported by: {Rel}; Sig: Rel x FRel -> Tab
     • Supported by: {Wais}; Sig: Works x FWork -> Tab
     • Supported by: {YAT,Rel}; Sig: Yat x Yat -> Bool
     • contains: Supported by: {Wais}; Sig: String x Work -> Bool
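One way to picture this operational metadata is as a lookup table the mediator consults before delegating an operation. The Python sketch below is an illustration only: the `CAPABILITIES` table encodes my reading of three annotations on the slide (Select supported by YAT, Rel and Wais; the comparisons < and = by YAT and Rel; contains by Wais), and the function `can_push` is hypothetical.

```python
# Which sources support which operator or predicate (reading of slide 13).
CAPABILITIES = {
    "select": {"YAT", "Rel", "Wais"},   # Sig: Tab x Pred -> Tab
    "<": {"YAT", "Rel"},                # Sig: Yat x Yat -> Bool
    "=": {"YAT", "Rel"},                # Sig: Yat x Yat -> Bool
    "contains": {"Wais"},               # Sig: String x Work -> Bool
}


def can_push(source, operator, predicate):
    """True if `source` declares both the operator and the predicate it uses."""
    return (source in CAPABILITIES.get(operator, set())
            and source in CAPABILITIES.get(predicate, set()))


# Select($p < 10) can go to the relational source but not to Wais;
# Select(contains("Giverny", $w)) can go to Wais but not to the relational source.
PUSHABLE = {
    ("Rel", "<"): can_push("Rel", "select", "<"),
    ("Wais", "<"): can_push("Wais", "select", "<"),
    ("Wais", "contains"): can_push("Wais", "select", "contains"),
    ("Rel", "contains"): can_push("Rel", "select", "contains"),
}
```

Anything a source cannot evaluate stays in the mediator, which is listed as supporting the generic YAT operators.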

  14. Query Processing in YAT
     • Query: What are the artifacts created in Giverny and sold for less than 10M$?
       MAKE * answer [title: $t, artist: $a, price: $p]
       MATCH artwork WITH collection * artifact [title: $t, artist: $a, price: $p, misc.crplace: $cp]
       WHERE $cp = “Giverny” and $p < 10
     • Three-phase query optimization:
       • Simplification of algebraic expressions: Bind-Tree rewritings, push selections, projections, ...
       • Pushing operations to external sources: filter simplification, source-supplied equivalences, ...
       • Information passing between sources: reorder join arguments, ...

  15. Query Preprocessing
     [Figure: the query plan stacked on top of the view plan.
     Query: Tree (* answer: title $t, artist $a, price $p) over Select ($cp=“Giverny” and $p<10) over Bind (collection * artifact: title $t, artist $a, price $p, misc crplace $cp).
     View: Tree (artwork collection * artifact Artifact($t,$a): title $t, artist $a, price $p, style $s, dims $d, owner $o, misc $f) over Join ($t = $t’ and $c = $a) over Bind on rel_artifacts (table * tuple: price $p, owner $o, creator $c, title $t) and Bind on works (docs * work: artist $a, style $s, dims $d, title $t’, *($f)).]

  16. Query Optimization: Phase 1
     [Figure: Bind-Tree rewriting. The query's Bind (collection * artifact: title $t, artist $a, price $p, misc crplace $cp) applied to the view's Tree (artwork collection * artifact Artifact($t,$a): title $t, artist $a, price $p, style $s, dims $d, owner $o, misc $f) is rewritten into Project ($t, $a, $p, $m:f) followed by Bind on $m (* m crplace $cp).]

  17. Query Optimization: Phase 1
     [Figure: plan after simplification. Tree (* answer: title $t, artist $a, price $p) over Join ($t = $t’ and $c = $a). Relational branch: Select ($p<10) over Project ($t, $c, $p) over Bind on rel_artifacts (table * tuple: price $p, owner $o, creator $c, title $t). Document branch: Select ($cp=“Giverny”) over Bind (* m crplace $cp) over Project ($t, $a, $m:f) over Bind on works (docs * work: artist $a, title $t’, style $s, dims $d, *($f)).]
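The effect of pushing selections below the join in phase 1 can be checked on a toy Tab. This Python sketch is an illustration, not YAT code: `select` and `join` are hypothetical list-based operators, prices are plain numbers in millions, and the Haystacks row (including its price) is made up so that the query has a non-empty answer.

```python
# Tab from the relational Bind ($t, $c, $p) and from the document Bind ($t', $a, $cp).
REL = [
    {"$t": "Nympheas", "$c": "Monet", "$p": 10},
    {"$t": "Waitress", "$c": "Manet", "$p": 38},
    {"$t": "Haystacks", "$c": "Monet", "$p": 8},   # made-up row
]
WORKS = [
    {"$t'": "Nympheas", "$a": "Monet", "$cp": "Giverny"},
    {"$t'": "Waitress", "$a": "Manet", "$cp": None},
    {"$t'": "Haystacks", "$a": "Monet", "$cp": "Giverny"},  # made-up row
]


def select(tab, pred):
    return [row for row in tab if pred(row)]


def join(left, right, pred):
    return [{**l, **r} for l in left for r in right if pred(l, r)]


def same_work(l, r):
    # Join condition of the view: $t = $t' and $c = $a
    return l["$t"] == r["$t'"] and l["$c"] == r["$a"]


# Before rewriting: one Select on top of the Join.
NAIVE = select(join(REL, WORKS, same_work),
               lambda row: row["$cp"] == "Giverny" and row["$p"] < 10)

# After rewriting: each conjunct is applied to the branch that binds its variable.
PUSHED = join(select(REL, lambda row: row["$p"] < 10),
              select(WORKS, lambda row: row["$cp"] == "Giverny"),
              same_work)
```

Both plans return the same rows, but the pushed-down plan joins one relational tuple with two documents instead of three with three.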

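  18. Query Optimization: Phase 2
     [Figure: plan after pushing operations to the sources. Tree (* answer: title $t, artist $a, price $p) over Join ($t = $t’ and $c = $a). Relational branch: Select ($p<10) over Bind on rel_artifacts (table * tuple: price $p, creator $c, title $t). Document branch: Select ($cp=“Giverny”) over Bind (* w: title $t, artist $a, crplace $cp) over Select (contains(“Giverny”,$w)) over Bind on works (docs * work $w). Source-supplied equivalence used: = (X,Work) => contains(X,Work).]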
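Phase 2 can be sketched in Python as follows. The relational selection is handed to the source as SQL (here an in-memory SQLite table stands in for the ODBC server), and the equality on crplace is preceded by the weaker full-text predicate that the document source can evaluate. Everything named here is an assumption for illustration: the table layout, the helpers `push_select_to_sql`, `contains` and `select_crplace`, the substring semantics of `contains`, and the made-up Haystacks price and "Letter from Giverny" document.

```python
import sqlite3

# Stand-in for the relational source behind the ODBC server.
CONN = sqlite3.connect(":memory:")
CONN.execute("CREATE TABLE rel_artifacts (title TEXT, creator TEXT, price REAL)")
CONN.executemany(
    "INSERT INTO rel_artifacts VALUES (?, ?, ?)",
    [("Nympheas", "Monet", 10.0), ("Waitress", "Manet", 38.0),
     ("Haystacks", "Monet", 8.0)],  # last row is made up
)


def push_select_to_sql(conn, max_price):
    """Select($p < max_price), evaluated by the source instead of the mediator."""
    sql = "SELECT title, creator, price FROM rel_artifacts WHERE price < ?"
    return conn.execute(sql, (max_price,)).fetchall()


# Stand-in for the full-text source: each work is a flat dict of its elements.
WORKS = [
    {"title": "Nympheas", "artist": "Monet", "crplace": "Giverny"},
    {"title": "Letter from Giverny", "artist": "Unknown", "crplace": "Paris"},
    {"title": "Waitress", "artist": "Manet", "theme": "Folies Bergere"},
]


def contains(word, work):
    """Assumed full-text semantics: the word occurs somewhere in the document."""
    return any(word in value for value in work.values())


def select_crplace(works, place):
    # Source side: equality implies contains, so the pushed filter loses nothing.
    candidates = [w for w in works if contains(place, w)]
    # Mediator side: the exact equality is still checked on the candidates.
    return candidates, [w for w in candidates if w.get("crplace") == place]


CHEAP = push_select_to_sql(CONN, 10)
CANDIDATES, GIVERNY = select_crplace(WORKS, "Giverny")
```

The equivalence only runs in one direction, which is why the plan on the slide keeps Select ($cp=“Giverny”) above the pushed contains filter: the source returns a superset and the mediator trims it.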

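  19. Query Optimization: Phase 3
     [Figure: plan after information passing between sources. Tree (* answer: title $t, artist $a, price $p) over DJoin, which replaces the Join of phase 2. Document branch: Select ($cp=“Giverny”) over Bind (* w: title $t, artist $a, crplace $cp) over Select (contains(“Giverny”,$w)) over Bind on works (docs * work $w). Relational branch: Select ($p<10) over Bind on rel_artifacts (table * tuple: price $p, creator, title), annotated with a and t.]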
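A dependent join (DJoin) evaluates one branch first and passes its bindings into the other branch, so the second source only answers narrow, parameterized lookups. The Python sketch below assumes the document branch runs first and hands $t and $a to the relational source, which is my reading of the slide; `djoin` and `query_rel_source` are hypothetical names, and the Haystacks row and its price are made up.

```python
REL_ARTIFACTS = [
    {"title": "Nympheas", "creator": "Monet", "price": 10},
    {"title": "Waitress", "creator": "Manet", "price": 38},
    {"title": "Haystacks", "creator": "Monet", "price": 8},  # made-up row
]

# Output of the document branch: works created in Giverny.
GIVERNY_WORKS = [
    {"$t": "Nympheas", "$a": "Monet"},
    {"$t": "Haystacks", "$a": "Monet"},
]

CALLS = []  # records every lookup sent to the relational source


def query_rel_source(title, creator, max_price):
    """Parameterized lookup: title and creator are supplied by the other branch."""
    CALLS.append((title, creator))
    return [{"$p": row["price"]} for row in REL_ARTIFACTS
            if row["title"] == title and row["creator"] == creator
            and row["price"] < max_price]


def djoin(left, dependent):
    """For each left row, evaluate the dependent branch with that row's bindings."""
    return [{**l, **r} for l in left for r in dependent(l)]


ANSWER = djoin(GIVERNY_WORKS,
               lambda row: query_rel_source(row["$t"], row["$a"], 10))
```

The relational source is queried once per Giverny work rather than scanned in full, which pays off when the document-side selection is highly selective.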

  20. Summary & Related Work
     • Wrapping query languages requires understanding the semantics of the query languages
       • Ad hoc solution proposed by Garlic (IBM)
       • Untyped solution proposed by DISCO (INRIA)
       • Query-template-based solution proposed by TSIMMIS (Stanford)
       • Generic solution introduced for the YAT system (INRIA + Bell Labs)
     • XML query optimization requires exploiting XML typing information
     • YAT relies on a general-purpose algebra that makes it possible
       • to reuse optimization techniques proposed in the relational and object contexts (pushing selections, projections, join reordering, …)
       • to introduce new ones that take advantage of type information in order to prune navigation in XML trees, push query evaluation to the sources, etc.

  21. The YAT Architecture
     [Figure: a Client talks to the MEDIATOR Server through the YAT API; the mediator talks to the WRAPPER Server through the YAT API; the wrapper sits on top of a Source. Both servers are organized into Data, Information and Query Modules.
     Mediator components: Optimizer, View Interface, Evaluator, Structural Information.
     Wrapper components: Data Conversion, Query Translation, Structural Extraction, Operational Extraction.]
