1 / 25

Data Exchange with  Data-Metadata Translations

Paolo  Papotti Università Roma Tre. Wang-Chiew Tan UC  Santa Cruz. Mauricio A.  Hernández IBM Almaden Research Center. Data Exchange with  Data-Metadata Translations. VLDB 2008. August 24 -- Auckland, New Zealand. Data exchange scenarios may involve metadata transformations.

Download Presentation

Data Exchange with  Data-Metadata Translations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Paolo  Papotti Università Roma Tre Wang-Chiew Tan UC  Santa Cruz Mauricio A.  Hernández IBM Almaden Research Center Data Exchange with  Data-Metadata Translations VLDB 2008 August 24 -- Auckland, New Zealand

  2. Data exchange scenarios may involve metadata transformations. E.g., Pivot/Unpivot in spreadsheets. Data-Metadata Translations [example from Miller98] • Mapping systems support Data-to-Data transformations with fixed schemas. • Goal: Extend mapping systems to support Data-Metadata Translations. 2

  3. Mapping Systems GUI Source schema S Target schema T IBM Clio HepTox MS ADO.net Altova MapForce StylusStudio BEA Aqualogic Declarative (internal) representation Executable code (XSLT, XQuery, Java) I J Data exchange

  4. Outline 1. Data and Metadata translations 2. Generation Algorithms Data-to-Data Mapping Generation Metadata-to-Data Query Generation Data-to-Metadata Graphic Design 3. Results & Discussion Experiments Related Work Conclusion Graphic Design

  5. Data-to-Data • Mapping Generation Algorithm: [PVMHF 2002] • Input: Source and Target schemas, and correspondences. • Output: declarative schema mapping • For example: for $s in Source.Sales exists $t in Target.CountrySales, $c in $t.Sales where $t.country = $s.country and $c.style = $s.style and $c.shipdate = $s.shipdate and $c.units = $s.units Source: Rcd Sales: SetOf Rcd country region style shipdate units price Target: Rcd CountrySales: SetOf Rcd country Sales: SetOf Rcd style shipdate units id

  6. Mappings • Query Generation into multiple query languages: • Input: a data to data schema mapping • Output: a query script (XQuery, XSLT, SQL, etc.) for $s in Source.Sales exists $t in Target.CountrySales, $c in $t.Sales where $t.country = $s.country and $c.style = $s.style and $c.shipdate = $s.shipdate and $c.units = $s.units for $x0 in $doc/Source/Sales return ( <CountrySales> <country> { $x0/country/text() } </country> …

  7. m1: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “USA” and $t.units = $s.USA “State-of-the-art” Metadata-to-Data How can we transform the following source data into the corresponding target? Target.Sales month country units Jan USA 120 Jan UK 223 Jan Italy 89 Feb USA 83 Feb UK 168 Feb Italy 56 Source.Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 Schema mapping m1 Source: Rcd Sales: SetOf Rcd month USA UK Italy Target: Rcd Sales: SetOf Rcd month country units “USA” 7

  8. m2: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “UK” and $t.units = $s.UK “State-of-the-art” Metadata-to-Data How can we transform the following source data into the corresponding target? Target.Sales month country units Jan USA 120 Jan UK 223 Jan Italy 89 Feb USA 83 Feb UK 168 Feb Italy 56 Source.Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 m1: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “USA” and $t.units = $s.USA Schema mapping m2 Source: Rcd Sales: SetOf Rcd month USA UK Italy Target: Rcd Sales: SetOf Rcd month country units “UK” 8

  9. m2: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “UK” and $t.units = $s.UK m3: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “Italy” and $t.units = $s.Italy “State-of-the-art” Metadata-to-Data How can we transform the following source data into the corresponding target? Target.Sales month country units Jan USA 120 Jan UK 223 Jan Italy 89 Feb USA 83 Feb UK 168 Feb Italy 56 Source.Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 m1: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “USA” and $t.units = $s.USA Schema mapping m3 Source: Rcd Sales: SetOf Rcd month USA UK Italy Target: Rcd Sales: SetOf Rcd month country units “Italy” 9

  10. Metadata-to-Data: Our solution MetadatA-Data (MAD) mapping: Source: Rcd Sales: SetOf Rcd month USA UK Italy Target: Rcd Sales: SetOf Rcd month country units Target.Sales Jan USA 120 Jan UK 223 Jan Italy 89 Feb USA 83 Feb UK 168 Feb Italy 56 Source.Sales Jan 120 223 89 Feb 83 168 56 countries label value Select the elements to group Placeholder Copy elements’ labels Copy elements’ values for $s in Source.Sales, $c in{“USA”, “UK”, “Italy”} exists $t in Target.Sales where $t.month = $s.month and$t.country = $c and$t.units = $s.($c) Set of labels (strings) Is a label value Dynamic selection of the source element

  11. Data-to-Metadata Now we want to support the opposite operation [example from Miller98] The target schema depends on the source data We define a target template: Nested Dynamic Output Schemas (ndos) Source: Rcd StockTicker: SetOf Rcd time symbol price Target: Rcd Stockquotes: SetOf Rcd time symbols label value Dynamic element Run-time: The dynamic element defines the target instance and the target schema.

  12. Source: Rcd StockTicker: SetOf Rcd time symbol price Target: Rcd Stockquotes: SetOf Rcd time symbols label value Data-to-Metadata: Heterogeneous records Consider this mapping and this source instance: Source Instance There are two possible interpretations for the target ndos: StockTicker (time: 0900, Symbol : MSFT, Price: 27.20 )StockTicker (time: 0900, Symbol : IBM, Price: 120.00 )StockTicker (time: 0905, Symbol : MSFT, Price: 27.30 ) First alternative: Heterogeneous target records Computed Target Schema Computed Target Instance Target: Rcd Stockquotes: SetOf Rcd time symbols:Choice MSFT IBM Stockquotes (time: 0900, MSFT: 27.20 )Stockquotes (time: 0900, IBM: 120.00 )Stockquotes (time: 0905, MSFT: 27.30 )

  13. Source: Rcd StockTicker: SetOf Rcd time symbol price Target: Rcd Stockquotes: SetOf Rcd time symbols label value Data-to-Metadata: Homogenous records Consider this mapping and this source instance: Source Instance There are two possible interpretations for the target ndos: StockTicker (time: 0900, Symbol : MSFT Price: 27.20 )StockTicker (time: 0900, Symbol : IBM Price: 120.00 )StockTicker (time: 0905, Symbol : MSFT Price: 27.30 ) Second alternative: Homogeneous target records Computed Target Schema Computed Target Instance Target: Rcd Stockquotes: SetOf Rcd time MSFT IBM Stockquotes (time: 0900, MSFT: 27.20, IBM: null )Stockquotes (time: 0900, MSFT: null , IBM: 120.00 )Stockquotes (time: 0905, MSFT: 27.30, IBM: null )

  14. Source: Rcd StockTicker: SetOf Rcd time symbol price Target: Rcd Stockquotes: SetOf Rcd time symbols label value Data-to-Metadata: Homogenous records Natural solution for semi-structured data models (XSD, DTD, JSON) Stockquotes(time: 0900, MSFT : 27.20 ) Stockquotes(time: 0900, IBM : 120.00 ) Stockquotes(time: 0905, MSFT : 27.30 ) • Homogeneity Constraint: • “For every pair of tuples t1 and t2, if a is a label in t1, then a is a label in t2” • for $t1 in Target.Stockquotes, $t2 in Target.Stockquotes, • $a in dom ($t1) • exists $a’ in dom ($t2) • where $a = $a’ Stockquotes(time: 0900, MSFT : 27.20, IBM: null )Stockquotes(time: 0900, MSFT : null , IBM: 120.00)Stockquotes(time: 0905, MSFT : 27.30, IBM: null ) Natural solution for the Relational data model

  15. Source: Rcd Sales: SetOf Rcd country region style shipdate units price Target: Rcd ByShipdateCountry: SetOf Choice dates label1 value1 : Rcd countries label2 value2 : SetOfRcd style units price MAD Mapping Generation <ByShipDateCountry> <12-07> <USA> <style>Tee</style><units>11</units><price>1200</price> </USA><USA> <style>Elec.</style><units>12</units><price>3600</price> </USA> </12-07> <01-08> <USA> <style>Tee</style><units>10</units><price>1600</price> </USA> </01-08> <02-08> <UK> <style>Tee</style><units>12</units><price>2000</price> </UK> </02-08> </ByShipDataCountry> Source.Sales country region style shipdate units price USA East Tee 12-07 11 1200 USA East Elec. 12-07 12 3600 USA West Tee 01-08 10 1600 UK West Tee 02-08 12 2000

  16. Source: Rcd Sales: SetOf Rcd country region style shipdate units price Target: Rcd ByShipdateCountry: SetOf Choice dates label1 value1 : Rcd countries label2 value2 : SetOfRcd style units price MAD Mapping Generation • Three Steps: • Modify schemas with dynamic placeholders • Compile mappings • Simplify mapping This is what we get from Clio [PVMHF 02] for $s in Source.Sales exists $t in Target.ByShipdateCountry, $y indates, $u incase $t of $y, $z incountries, $v in $u.($z) where $y = $s.shipdate and $z= $s.country and $v.style = $s.style and $v.units = $s.units and $v.price = $s.price and $u.($z) = SK[$s.shipdate,$s.country] for $s in Source.Sales exists $t in Target.ByShipdateCountry, $u incase $t of $s.shipdate, $v in $u.($s.country) where $v.style = $s.style and $v.units = $s.units and $v.price = $s.price and $u.($s.country) = SK[$s.shipdate,$s.country]

  17. Phase 1: Q1 shreds the source instance I over relational views of the target schema S1 S conforms-to Q1 I Query Generation: two-phase algorithm [PVMHF 02] Phase 2: Q2 assembles the target instance J from the relational views T2 T1 T T3 T4 conforms-to r r J r r Q2

  18. Q1 New Query Generation Phase 2: Q2 assembles the target instance J from the relational views Q3 computes the target schema T Q4 is the optional post - processing Phase 1: Q1 shreds the source instance I over relational views of the target ndos T2 T1 ndos T3 conforms-to T2 T1 S1 T S T3 T4 conforms-to conforms-to Q3 r I r J r r Q2 Q4

  19. MAD Clio vs. Commercial Tools Commercial Tool

  20. MAD Clio vs. Commercial Tools MAD Clio Optimized query 48 source labels (10 MB): naïve 183 s, dynamic 14 s, optimized 10 s

  21. 12 target labels (10 MB): naïve 590 s, optimized 80 s [1 phase: 3 s] MAD Clio Performance

  22. Lots of related work in the relational setting: FIRA/FISQL [Wyss,Robertson 2005] has an excellent survey. SchemaSQL [Lakshmanan,Sadri,Subramanian 1996], FIRA/FISQL [Wyss,Robertson 2005] Extensions to SQL to handle metadata as data Only relational dynamic output schemas Language and semantics, NO transformations from GUI In XML settings HepTox [BCHLP 2005], commercial mapping tools [Altova MapForce, MS ADO.net, StylusStudio, BEA (Oracle) Aqualogic] No dynamic elements in the target Some Related Work

  23. MAD Clio Data exchange with data-metadata support: Data to Data is a special case • New construct to iterate over elements’ labels: placeholder • Target schema can be incomplete: nested dynamic output schema (ndos) GUI Source schema S Target schema T • New constructs for the mapping language • New mapping & query generation algorithms • Including a query to generate the target schema. Declarative (internal) representation Executable code (XSLT, XQuery, Java)

  24. Thank you. Questions? Data Exchange with  Data-Metadata Translations

  25. Metadata-to-Metadata ... <properties name=“price” lang=“en-us” date=“01-01-2008” ... > <pval>48.15</pval> </properties> ... ... <price value=“48.15” lang=“en-us” date=“01-01-2008” ... /> ... Source: Rcd properties: SetOf Rcd @name @lang @date … @format pval Target: Rcd label1 value1: SetOf Rcd @value label2 value2 <<names>> <<attrs>> label value <<elems>> for $x1in Source.properties, $x2in { @lang, @date, …, @format } exists $y1in Target.($x1.@name), where $y1.@value = $x1.pval and $y1.($x2) = $x1.($x2) Metadata to Metadata: placeholder to dynamic element

More Related