Schema Mapping: Experiences and Lessons Learned

Yihong Ding, Data Extraction Group, Brigham Young University. Sponsored by NSF. Schema mapping: a semantic correspondence between two schemas. Significance: data integration, data warehouses, ontology merging, message translation in e-commerce.




Presentation Transcript


  1. Schema Mapping: Experiences and Lessons Learned. Yihong Ding, Data Extraction Group, Brigham Young University. Sponsored by NSF.

  2. Schema Mapping • Definition: a semantic correspondence between two schemas • Significance: data integration, data warehouses, ontology merging, message translation in e-commerce, semantic query processing, etc.

  3. Schema Representation [Figure: two real-estate schemas drawn as graphs. Element labels include House, MLS, Bedrooms, beds, SQFT, Basic_features, Golf course, Water front, location, location_description, Location, Address, Street, City, State, Agent, agent, Name, name, Phone_day, Phone_evening, cell phone, home phone, office phone.]

  4. 1:1 Mapping Cardinality [Figure: the schema diagram from slide 3, illustrating 1:1 correspondences.]

  5. n:1 Mapping Cardinality [Figure: the schema diagram from slide 3, illustrating n:1 correspondences.]

  6. n:m Mapping Cardinality [Figure: the schema diagram from slide 3, illustrating n:m correspondences.]
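The three cardinalities can be told apart purely by how many elements sit on each side of a correspondence. The minimal Python sketch below assumes a correspondence is a pair of element-name sets (a hypothetical representation, not the author's); the names come from later slides, and the phone grouping is illustrative only.

```python
# Hypothetical representation: (source elements, target elements)
Correspondence = tuple[frozenset[str], frozenset[str]]


def cardinality(corr: Correspondence) -> str:
    """Classify a correspondence as 1:1, n:1, 1:n, or n:m by the sizes of its two sides."""
    source, target = corr
    if not source or not target:
        raise ValueError("both sides of a correspondence must be non-empty")
    if len(source) == 1 and len(target) == 1:
        return "1:1"
    if len(target) == 1:
        return "n:1"  # several source elements combine into one target element
    if len(source) == 1:
        return "1:n"  # one source element splits into several target elements
    return "n:m"      # groups on both sides correspond only as clusters


examples = [
    (frozenset({"Agent"}), frozenset({"agent"})),
    (frozenset({"Street", "City", "State"}), frozenset({"location"})),
    (frozenset({"Day Phone", "Evening Phone"}), frozenset({"Office Phone", "Home Phone"})),
]
for corr in examples:
    print(sorted(corr[0]), "->", sorted(corr[1]), cardinality(corr))
```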

  7. Object-Set Matcher (schema-level) • Name-based matcher • string and substring comparison • linguistic methods: stemming, stop words, removing ignorable characters, etc. • thesaurus: WordNet, etc. • 1:1 mapping cardinality • Example: Agent ↔ agent, Name ↔ name
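A name-based matcher of this kind might be sketched as follows. This is a minimal illustration, not the author's matcher: the stop-word list, the tiny synonym table (a hypothetical stand-in for a thesaurus such as WordNet), the crude plural-stripping stemmer, and the 0.8 substring score are all assumptions.

```python
import re

# Assumed stop-word list and a tiny hypothetical synonym table standing in for a thesaurus such as WordNet.
STOP_WORDS = {"a", "an", "the", "of"}
SYNONYMS = {"beds": "bedrooms", "mobile": "cell"}


def stem(token: str) -> str:
    """Crude plural stripping; a stand-in for a real stemmer such as Porter's."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(name: str) -> list[str]:
    """Split a schema-element name into normalized tokens."""
    name = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)  # split camelCase
    tokens = re.split(r"[^a-z0-9]+", name.lower())      # drop ignorable characters such as _ and -
    return [stem(SYNONYMS.get(t, t)) for t in tokens if t and t not in STOP_WORDS]


def name_similarity(a: str, b: str) -> float:
    """Score two names: exact match after normalization, then substring, then token overlap."""
    ta, tb = normalize(a), normalize(b)
    if not ta or not tb:
        return 0.0
    sa, sb = " ".join(ta), " ".join(tb)
    if sa == sb:
        return 1.0
    if sa in sb or sb in sa:
        return 0.8  # assumed score for a substring match
    overlap = set(ta) & set(tb)
    return len(overlap) / len(set(ta) | set(tb))
```

With these assumptions, `Agent`/`agent` and `Bedrooms`/`beds` both score 1.0, while `Golf course`/`Water front` scores 0.0. Each pair is scored independently, which is why such a matcher only proposes 1:1 matches.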

  8. Object-Set Matcher (instance-level) • Data frame • multiple Perl-style regular expressions • or as simple as a list of data values • Data-frame matcher • use: compare recognized data values • benefit: able to recognize disjunctive data-value sets • bias: a data frame may not correspond 100% with the intended semantics • limitation: a needed data frame might not exist • 1:1 mapping cardinality • [Figure: object sets A and B containing values such as Ford, Honda, Chevy, Toyota, both recognized by a Car Model data frame listing Ford, Honda, Chevy, Toyota …]
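The data-frame matcher can be sketched roughly as below: a data frame is modeled as a list of regular expressions, and two object sets are matched when the same frame recognizes most values in both, even if the sets share no values. The `CAR_MODEL` patterns and the 0.8 threshold are hypothetical choices for illustration, not the group's actual data frames.

```python
import re

# Hypothetical Car Model data frame: Perl-style regular expressions (Python's re syntax is close to Perl's).
CAR_MODEL = [re.compile(p, re.IGNORECASE)
             for p in (r"\bford\b", r"\bhonda\b", r"\bchevy\b", r"\btoyota\b")]


def recognition_rate(frame: list[re.Pattern], values: list[str]) -> float:
    """Fraction of an object set's values recognized by at least one expression in the frame."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if any(p.search(v) for p in frame))
    return hits / len(values)


def data_frame_match(frame: list[re.Pattern], values_a: list[str],
                     values_b: list[str], threshold: float = 0.8) -> bool:
    """Propose a 1:1 match when the same data frame recognizes both object sets.

    The two sets may share no values at all (disjunctive value sets)."""
    return (recognition_rate(frame, values_a) >= threshold
            and recognition_rate(frame, values_b) >= threshold)


print(data_frame_match(CAR_MODEL, ["Ford", "Honda"], ["Chevy", "Toyota"]))
```

Comparing recognition by a shared frame, rather than overlap of raw values, is what lets disjoint value sets match; the flip side is the slide's "limitation": without a suitable frame, the matcher has no evidence.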

  9. Extended Data-Frame Matcher (instance-level) • n:1 mapping cardinality • Adds a STRICT_SUBSTRING operation • Works with the help of structural analysis • Example: the Schema 1 location value "120 N. University Ave., Provo, UT" contains the values of the Schema 2 Address components Street, City, and State
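One way to picture the STRICT_SUBSTRING operation is the minimal sketch below. It assumes (hypothetically) that structural analysis has already proposed the candidate components, such as the siblings under Address, and it simply checks whether each component's values appear as proper substrings of the composite values; the function names, the 0.9 threshold, and the sample phone number are illustrative.

```python
def strict_substring(part: str, whole: str) -> bool:
    """STRICT_SUBSTRING: `part` is non-empty, occurs inside `whole`, and is not all of it."""
    return bool(part) and part != whole and part in whole


def substring_support(component_values: list[str], composite_values: list[str]) -> float:
    """Fraction of component values that are strict substrings of some composite value."""
    if not component_values:
        return 0.0
    hits = sum(1 for c in component_values
               if any(strict_substring(c, w) for w in composite_values))
    return hits / len(component_values)


def propose_n_to_1(components: dict[str, list[str]], composite_name: str,
                   composite_values: list[str], threshold: float = 0.9):
    """Return (component names, composite name) if two or more components embed in the composite."""
    parts = sorted(name for name, values in components.items()
                   if substring_support(values, composite_values) >= threshold)
    return (parts, composite_name) if len(parts) > 1 else None


# Hypothetical Schema 2 instances; the phone number is made up for illustration.
schema2 = {"Street": ["120 N. University Ave."], "City": ["Provo"], "State": ["UT"],
           "Phone": ["801-555-0100"]}
print(propose_n_to_1(schema2, "location", ["120 N. University Ave., Provo, UT"]))
# prints (['City', 'State', 'Street'], 'location')
```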

  10. Direct Structure Matcher • Compares the structural similarity of two candidate schemas • 1:1 mapping cardinality • [Figure: agent/Agent subtrees from two schemas, with labels Name, name, Fax, fax, phone_day, phone, Location, Address.]
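The slide does not say how structural similarity is measured. A minimal sketch, assuming (hypothetically) that it is the Jaccard overlap of normalized child labels, could look like this; the subtrees are invented from the figure's labels and may not match the actual diagram.

```python
def norm(label: str) -> str:
    """Lowercase and drop non-alphanumeric characters, so 'Phone_day' and 'phoneday' compare equal."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def structure_similarity(children_a: list[str], children_b: list[str]) -> float:
    """Jaccard overlap of the normalized child labels of two schema nodes."""
    a = {norm(c) for c in children_a}
    b = {norm(c) for c in children_b}
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)


def best_structural_match(children: list[str], candidates: dict[str, list[str]]) -> str:
    """Pick the candidate node in the other schema whose children overlap most."""
    return max(candidates, key=lambda name: structure_similarity(children, candidates[name]))


# Hypothetical subtrees built from the slide's labels.
schema1_agent = ["name", "phone_day", "fax"]
schema2_nodes = {"Agent": ["Name", "phone", "Fax"], "Location": ["Address"]}
print(best_structural_match(schema1_agent, schema2_nodes))  # prints Agent
```

Here `agent` and `Agent` share two of four distinct child labels (similarity 0.5), while `Location` shares none, so the structure alone points to a 1:1 match even before node names are compared.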

  11. Reference Structure Matcher • If A and B both match C, then A matches B. • Able to solve the n:m mapping cardinality problem • Handles 1:1, n:1, and n:m mapping cardinalities • [Figure: phone elements of Schema 1 and Schema 2 (Day Phone, Evening Phone, Cell Phone, Office Phone, Home Phone) matched through a reference Phone structure.]
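The rule "if A and B both match C, then A matches B" can be sketched as grouping the elements of each schema by the reference element they match. In this minimal illustration the element-to-reference assignments are hypothetical; in particular, mapping every phone field to a single reference `Phone` node is an assumption made to show how a cluster (n:m) match falls out.

```python
from collections import defaultdict


def match_via_reference(matches_1: dict[str, str],
                        matches_2: dict[str, str]) -> dict[str, tuple[set[str], set[str]]]:
    """Group elements of two schemas by the reference-structure element they both match.

    If A (schema 1) and B (schema 2) both match reference element C, A and B are matched.
    A group with several elements on each side is an n:m (cluster) mapping."""
    groups: dict[str, tuple[set[str], set[str]]] = defaultdict(lambda: (set(), set()))
    for element, ref in matches_1.items():
        groups[ref][0].add(element)
    for element, ref in matches_2.items():
        groups[ref][1].add(element)
    # Keep only reference elements matched from both schemas.
    return {ref: sides for ref, sides in groups.items() if sides[0] and sides[1]}


# Hypothetical element-to-reference matches.
matches_1 = {"Day Phone": "Phone", "Evening Phone": "Phone", "Name": "Name"}
matches_2 = {"Office Phone": "Phone", "Home Phone": "Phone", "Cell Phone": "Phone",
             "name": "Name", "Fax": "Fax"}
for ref, (side_1, side_2) in match_via_reference(matches_1, matches_2).items():
    print(ref, sorted(side_1), sorted(side_2))
```

The `Phone` group pairs two Schema 1 elements with three Schema 2 elements without having to decide, for example, whether Day Phone corresponds to Office Phone.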

  12. Experiments • Indirect matches: precision 87%, recall 94%, F-measure 90% • Data borrowed from the Univ. of Washington [DDH, SIGMOD01] • Rough comparison with U of W results: Faculty Member, accuracy ~92%; Course Schedule, accuracy ~71%; Real Estate (2 tests), accuracy ~75%
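The reported F-measure is consistent with the balanced F-measure (F1), the harmonic mean of precision and recall. The slide does not name the variant, so F1 is an assumption here. The arithmetic is:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 2 * 0.87 * 0.94 / (0.87 + 0.94) = 1.6356 / 1.81, roughly 0.9036
print(round(f_measure(0.87, 0.94), 2))  # prints 0.9
```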

  13. Lessons Learned • n:1 and n:m matches occur frequently: • 22% = 97/437 [DMD+03] (Course Catalog, Company Profile) • 45% = 287/638 (Car Ads, Cell Phones, Real Estate) • Reference structures provide a way to solve the long-standing, hard problem of cluster mapping (n:m cardinality). • Data frames improve the instance-level matchers. • Combining schema-level and instance-level matchers improves the results.
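The slides do not say how schema-level and instance-level evidence were combined. One simple possibility, sketched here under explicit assumptions, is a weighted average over whichever matchers produced a score, where a matcher abstains (returns `None`) when, for example, no suitable data frame exists. The matcher names, equal weights, and scores are all hypothetical.

```python
from typing import Optional


def combine_scores(scores: dict[str, Optional[float]], weights: dict[str, float]) -> float:
    """Weighted average over matchers that produced evidence.

    None means the matcher abstained (for example, no suitable data frame exists)."""
    used = {m: s for m, s in scores.items() if s is not None}
    total = sum(weights.get(m, 0.0) for m in used)
    if total == 0:
        return 0.0
    return sum(weights.get(m, 0.0) * s for m, s in used.items()) / total


# Hypothetical scores for one candidate pair: the names differ, but the values agree.
scores = {"name": 0.2, "data_frame": 0.9, "structure": None}
weights = {"name": 1.0, "data_frame": 1.0, "structure": 1.0}
print(combine_scores(scores, weights))
```

A pair that a name-based matcher alone would reject can still clear a threshold once instance-level evidence is averaged in, which is one way the combination can improve results.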
