Schema Mapping: Experiences and Lessons Learned
Yihong Ding
Data Extraction Group, Brigham Young University
Sponsored by NSF
Schema Mapping
• Semantic correspondence between two schemas
• Significance
  • data integration
  • data warehouses
  • ontology merging
  • message translation in e-commerce
  • semantic query processing
  • etc.
Schema Representation
[Figure: two real-estate schemas drawn as graphs. Node labels: House, MLS (in both schemas), Bedrooms, beds, SQFT, Basic_features, Golf course, Water front, location, Location, location_description, Address, Street, City, State, Agent, agent, Name, name, Phone_day, Phone_evening, cell phone, home phone, office phone]
1:1 Mapping Cardinality
[Figure: the two schema graphs from the Schema Representation slide, illustrating 1:1 mapping cardinality]

n:1 Mapping Cardinality
[Figure: the same two schema graphs, illustrating n:1 mapping cardinality]

n:m Mapping Cardinality
[Figure: the same two schema graphs, illustrating n:m mapping cardinality]
Object-Set Matcher (schema-level)
• Name-based matcher
  • string and substring comparison
  • linguistic methods: stemming, stop words, removing ignorable characters, etc.
  • thesaurus: WordNet, etc.
• 1:1 mapping cardinality
[Figure: object-set names paired across the two schemas: Agent and agent, Name and name]
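The name-based matcher on this slide can be illustrated with a minimal sketch. It covers only string and substring comparison, a tiny stop-word list, removal of ignorable characters, and a deliberately crude plural-stripping stemmer. The thesaurus step (WordNet) is left out, and the function names, the stop-word list, and the greedy pairing are assumptions made for illustration, not the matcher described in the talk.

```python
import re

# Hypothetical stop-word list; a real matcher would use a fuller one.
STOP_WORDS = {"of", "the", "a", "an"}


def stem(token):
    """Crude stemmer (assumption): strip a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(name):
    """Lowercase, split on ignorable characters, drop stop words, stem."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return "".join(stem(t) for t in tokens if t and t not in STOP_WORDS)


def name_match(a, b):
    """True if the normalized names are equal or one contains the other."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return False
    return na == nb or na in nb or nb in na


def match_names(schema_a, schema_b):
    """Greedy 1:1 pairing: each name in schema_b is used at most once."""
    pairs, used = [], set()
    for a in schema_a:
        for b in schema_b:
            if b not in used and name_match(a, b):
                pairs.append((a, b))
                used.add(b)
                break
    return pairs


pairs = match_names(["Agent", "Name", "Bedrooms"], ["agent", "name", "beds"])
print(pairs)
```

Substring comparison is what lets `Bedrooms` pair with `beds` here (after stemming, `bed` is contained in `bedroom`); it can also produce false matches on short names, which is one reason to combine it with other matchers.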
Object-Set Matcher (instance-level)
• Data Frame
  • multiple regular expressions in Perl style
  • as simple as a list of data values
• Data-frame matcher
  • use: compare recognized data values
  • benefit: able to recognize disjunctive data value sets
  • bias: data frame may not correspond 100% with the semantics
  • limitation: a needed data frame might not exist
• 1:1 mapping cardinality
Example data frame: Car Model: Ford, Honda, Chevy, Toyota …
[Figure: Object-set A and Object-set B, with Car Model values Ford, Honda, Chevy, Toyota]
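A data-frame matcher of the kind outlined above might look like the following sketch. It treats a data frame as a list of regular expressions and declares two object sets a match when the frame recognizes most values in both. The `CAR_FRAME` pattern, the 0.8 threshold, and the function names are assumptions for illustration only.

```python
import re

# Hypothetical data frame: here just a list of known values, written as
# one regular expression (the slide notes a frame can be this simple).
CAR_FRAME = [re.compile(r"\b(?:Ford|Honda|Chevy|Toyota)\b", re.IGNORECASE)]


def recognized_fraction(data_frame, values):
    """Fraction of values recognized by at least one pattern in the frame."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if any(p.search(v) for p in data_frame))
    return hits / len(values)


def data_frame_match(data_frame, values_a, values_b, threshold=0.8):
    """Match two object sets if the frame recognizes most values of both.

    The threshold is an assumed tuning parameter.
    """
    return (recognized_fraction(data_frame, values_a) >= threshold
            and recognized_fraction(data_frame, values_b) >= threshold)


# The two value sets are disjoint, yet both are recognized by the frame.
print(data_frame_match(CAR_FRAME, ["Ford", "Honda"], ["Chevy", "Toyota"]))
print(data_frame_match(CAR_FRAME, ["Ford", "Honda"], ["Provo", "Orem"]))
```

The first call shows the benefit named on the slide: the two object sets share no value, so a direct value comparison would fail, but both are recognized by the same data frame.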
Extended Data-Frame Matcher (instance-level)
• n:1 mapping cardinality
• Add a STRICT_SUBSTRING operation
• With the help of structural analysis
Example: Schema 1: location ("120 N. University Ave., Provo, UT"); Schema 2: Address (Street, City, State)
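One way to read the STRICT_SUBSTRING idea is sketched below: a value in one schema is checked for being a proper substring of a value in the other, and the object sets whose values pass become candidates for the "n" side of an n:1 mapping. The exact semantics of STRICT_SUBSTRING are not given on the slide, so the definition here (non-empty, contained, not equal) is an assumption, as are the function names, the way the address is split across Street, City, and State, and the MLS value.

```python
def strict_substring(part, whole):
    """Assumed semantics: part is non-empty, contained in whole, not equal."""
    part, whole = part.strip().lower(), whole.strip().lower()
    return bool(part) and part != whole and part in whole


def n_to_1_candidates(target_value, source_record):
    """Names of source object sets whose value is a strict substring
    of the single target value."""
    return [name for name, value in source_record.items()
            if strict_substring(value, target_value)]


location = "120 N. University Ave., Provo, UT"
# Hypothetical Schema 2 record; the MLS value is made up for illustration.
record = {
    "Street": "120 N. University Ave.",
    "City": "Provo",
    "State": "UT",
    "MLS": "45321",
}
print(n_to_1_candidates(location, record))
```

In this sketch Street, City, and State are reported as candidates for `location`, while MLS is not. The structural analysis mentioned on the slide (for example, checking that the candidates hang off a common Address node) is not modeled here.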
Direct Structure Matcher
• Comparing structure similarity between two candidate schemas
• 1:1 mapping cardinality
[Figure: two agent structures compared. Node labels: Agent, Name, Fax, Location, phone, agent, name, fax, phone_day, Address]
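The slide does not say how structure similarity is computed. As one plausible stand-in, the sketch below scores a candidate pair of object sets by the Jaccard overlap of their neighbors' names. Both the measure and the neighbor sets used in the example are assumptions, not the talk's method.

```python
def neighbor_similarity(neighbors_a, neighbors_b):
    """Jaccard overlap of case-folded neighbor names (assumed measure)."""
    a = {n.lower() for n in neighbors_a}
    b = {n.lower() for n in neighbors_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical neighbor sets for the Agent / agent object sets.
agent_1 = ["Name", "Fax", "Phone"]
agent_2 = ["name", "fax", "phone_day"]
print(neighbor_similarity(agent_1, agent_2))
```

A fuller version would compare neighbors with a name or instance matcher rather than exact case-folded equality, so that `Phone` and `phone_day` could also count toward the overlap.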
Reference Structure Matcher
• If A and B match C, then A matches B.
• Able to solve n:m mapping cardinality
• 1:1, n:1, and n:m mapping cardinalities
[Figure: a reference structure for Phone (Day Phone, Cell Phone, Evening Phone, Office Phone, Home Phone) shown alongside the phone object sets of Schema 1 and Schema 2]
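The rule "if A and B match C, then A matches B" can be sketched as follows: elements from each schema are matched against the members of a shared reference structure, and all elements that land in the same structure form one cluster, which may be n:m. The `REFERENCE` dictionary mirrors the Phone labels on the slide; the word-set comparison, the function names, and the assignment of phone elements to Schema 1 and Schema 2 are assumptions for illustration.

```python
import re

# Reference structure taken from the slide's labels.
REFERENCE = {
    "Phone": ["Day Phone", "Evening Phone", "Cell Phone",
              "Office Phone", "Home Phone"],
}


def words(name):
    """Order-insensitive set of lowercase words in a name."""
    return frozenset(w for w in re.split(r"[^a-z]+", name.lower()) if w)


def matches_reference(element, members):
    """True if the element has the same words as some reference member."""
    return any(words(element) == words(m) for m in members)


def reference_mappings(schema_1, schema_2, reference=REFERENCE):
    """For each reference structure, cluster the elements of both schemas
    that match it; a cluster with n and m elements is an n:m mapping."""
    mappings = []
    for concept, members in reference.items():
        left = [e for e in schema_1 if matches_reference(e, members)]
        right = [e for e in schema_2 if matches_reference(e, members)]
        if left and right:
            mappings.append((concept, left, right))
    return mappings


# Hypothetical split of the phone elements between the two schemas.
schema_1 = ["Phone_day", "Phone_evening", "MLS"]
schema_2 = ["cell phone", "home phone", "office phone", "beds"]
print(reference_mappings(schema_1, schema_2))
```

With these inputs the result is a single 2:3 cluster under Phone, which no pairwise 1:1 matcher would produce on its own.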
Experiments
• Indirect Matches: precision 87%, recall 94%, F-measure 90%
• Data borrowed from Univ. of Washington [DDH, SIGMOD01]
• Rough Comparison with U of W Results
  • Faculty Member – Accuracy: ~92%
  • Course Schedule – Accuracy: ~71%
  • Real Estate (2 tests) – Accuracy: ~75%
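The reported F-measure is consistent with the standard balanced F-measure, the harmonic mean of precision and recall. The slide does not state which variant was used, so that choice is an assumption; the short check below reproduces the 90% figure from the reported precision and recall.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values for indirect matches: precision 87%, recall 94%.
print(round(f_measure(0.87, 0.94), 2))
```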
Lessons Learned
• n:1 and n:m matches occur frequently.
  • 22% = 97/437 [DMD+03] (Course Catalog, Company Profile)
  • 45% = 287/638 (Car Ads, Cell Phones, Real Estate)
• Reference structures provide a way to solve the long-standing, hard cluster-mapping (n:m cardinality) problem.
• Data frames improve the instance-level matchers.
• The combination of schema-level and instance-level matchers improves the results.