Explore the significance of semantic correspondence between schemas and lessons learned in data extraction, integration, and ontology merging. Learn about techniques like message translation in e-commerce and semantic query processing. Understand the use of schema representation, object-set matching, and various cardinality mappings.
Schema Mapping: Experiences and Lessons Learned
Yihong Ding
Data Extraction Group, Brigham Young University
Sponsored by NSF

Schema Mapping
• Semantic correspondence between two schemas
• Significance
  • data integration
  • data warehouses
  • ontology merging
  • message translation in e-commerce
  • semantic query processing
  • etc.

Schema Representation
[Figure: two real-estate schemas drawn as graphs of object sets. Node labels: Phone_evening, MLS, MLS, Bedrooms, Basic_features, location, House, Agent, SQFT, beds, Name, location_description, agent, Phone_day, Golf course, Water front, Location, Address, name, cell phone, Street, City, State, home phone, office phone.]

1:1 Mapping Cardinality
[Figure: the same two schemas as on the Schema Representation slide.]

n:1 Mapping Cardinality
[Figure: the same two schemas as on the Schema Representation slide.]

n:m Mapping Cardinality
[Figure: the same two schemas as on the Schema Representation slide.]

Object-Set Matcher (schema-level)
• Name-based matcher
  • string and substring comparison
  • linguistic methods: stemming, stop words, removing ignorable characters, etc.
  • thesaurus: WordNet, etc.
• 1:1 mapping cardinality
[Figure: example matches Agent / agent and Name / name.]
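
A name-based matcher of this kind might look roughly like the sketch below. It is only an illustration: the stop-word list, the tiny synonym table (standing in for a thesaurus such as WordNet), the crude plural stemmer, and the scores 1.0 / 0.5 / 0.0 are all assumptions, not details taken from the slides.

```python
import re

# Hypothetical stop words and synonym table; the slide names WordNet as one
# possible thesaurus, which this sketch does not call.
STOP_WORDS = {"of", "the", "a", "an"}
SYNONYMS = {"beds": "bedrooms"}


def normalize(name):
    """Lowercase, split on ignorable characters, drop stop words,
    map synonyms, and strip a trailing plural 's' (a very crude stemmer)."""
    tokens = []
    for token in re.split(r"[^a-z0-9]+", name.lower()):
        if not token or token in STOP_WORDS:
            continue
        token = SYNONYMS.get(token, token)
        if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        tokens.append(token)
    return tokens


def name_score(a, b):
    """Return 1.0 for equal normalized names, 0.5 when one normalized name
    is a substring of the other, and 0.0 otherwise (scores are assumed)."""
    ta, tb = normalize(a), normalize(b)
    if not ta or not tb:
        return 0.0
    if ta == tb:
        return 1.0
    sa, sb = "".join(ta), "".join(tb)
    if sa in sb or sb in sa:
        return 0.5
    return 0.0
```

With these assumptions, `name_score("Agent", "agent")` and `name_score("beds", "Bedrooms")` both score as full matches, while `name_score("Phone_day", "phone")` only scores as a substring match.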

Object-Set Matcher (instance-level)
• Data frame
  • multiple regular expressions in Perl style
  • as simple as a list of data values
• Data-frame matcher
  • use: compare recognized data values
  • benefit: able to recognize disjunctive data value sets
  • bias: the data frame may not correspond 100% with the semantics
  • limitation: a needed data frame might not exist
• 1:1 mapping cardinality
[Figure: Object-set A and Object-set B with Car Model values Ford, Honda, Chevy, Toyota; Car Model data frame: Ford, Honda, Chevy, Toyota, …]
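
As a minimal sketch of the data-frame idea, the code below treats a data frame as a list of regular expressions and declares two object sets a match when the same data frame recognizes most values in each. The frame contents, the function names, and the 0.8 threshold are assumptions made for illustration, not the group's actual data frames.

```python
import re

# Hypothetical data frame: a handful of regular expressions for car makes.
CAR_FRAME = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bford\b", r"\bhonda\b", r"\bchev(y|rolet)\b", r"\btoyota\b")
]


def recognized_fraction(values, frame):
    """Fraction of values in which at least one expression of the frame fires."""
    if not values:
        return 0.0
    hits = sum(1 for value in values if any(rx.search(value) for rx in frame))
    return hits / len(values)


def data_frame_match(values_a, values_b, frame, threshold=0.8):
    """Two object sets match if the same data frame recognizes most of the
    values in each; the threshold is an assumed tuning parameter."""
    return (
        recognized_fraction(values_a, frame) >= threshold
        and recognized_fraction(values_b, frame) >= threshold
    )
```

Note that `data_frame_match(["Ford", "Honda"], ["Chevy", "Toyota"], CAR_FRAME)` succeeds even though the two value lists share no value, which a direct value-overlap comparison would miss.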

Extended Data-Frame Matcher (instance-level)
• n:1 mapping cardinality
• Add a STRICT_SUBSTRING operation
• With the help of structural analysis
[Figure: Schema 1 has location; Schema 2 has Address with Street, City, State. Example value: 120 N. University Ave., Provo, UT]
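
One way to picture the STRICT_SUBSTRING operation is sketched below: an object set in one schema is an n:1 candidate when its values occur as proper substrings of the values of a single object set in the other schema. The slides do not define the operation in detail, so the function names, the "contained in any whole value" rule, and the threshold are assumptions; the `MLS` value is made up for the example.

```python
def strict_substring(part, whole):
    """True when part occurs inside whole but is not the whole string."""
    return part != whole and part in whole


def contained_fraction(part_values, whole_values):
    """Fraction of part values that are strict substrings of some whole value."""
    if not part_values:
        return 0.0
    hits = sum(
        1
        for part in part_values
        if any(strict_substring(part, whole) for whole in whole_values)
    )
    return hits / len(part_values)


def n_to_1_candidates(whole_values, part_columns, threshold=0.8):
    """Names of the object sets whose values mostly sit inside the values of
    the single 'whole' object set (threshold is an assumed parameter)."""
    return [
        name
        for name, values in part_columns.items()
        if contained_fraction(values, whole_values) >= threshold
    ]


location = ["120 N. University Ave., Provo, UT"]
schema_2 = {
    "Street": ["120 N. University Ave."],
    "City": ["Provo"],
    "State": ["UT"],
    "MLS": ["A1234"],  # made-up value, not from the slides
}
candidates = n_to_1_candidates(location, schema_2)
```

A plain substring test is easily fooled by short values (a two-letter state code can occur inside unrelated words), which is presumably one reason the slide pairs the operation with structural analysis.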

Direct Structure Matcher
• Comparing structure similarity between two candidate schemas
• 1:1 mapping cardinality
[Figure: node labels Name, agent, Agent, Fax, Location, name, phone_day, fax, phone, Address.]

Reference Structure Matcher
• If A and B match C, then A matches B.
• Able to solve n:m mapping cardinality
• 1:1, n:1, and n:m mapping cardinalities
[Figure: Schema 1 and Schema 2 phone object sets. Node labels: Phone, Day Phone, Cell Phone, Evening Phone, Office Phone, Home Phone.]
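
The rule "if A and B match C, then A matches B" can be sketched as a grouping step: every object set that matched the same node of the reference structure lands in one cluster, and a cluster with several members on each side is an n:m mapping. The input format and the assignment of phone object sets to the two schemas below are assumptions for illustration; how the matches against the reference structure are found is outside this sketch.

```python
from collections import defaultdict


def match_via_reference(matches_1, matches_2):
    """Each argument maps an object-set name to the reference node it matched.
    Returns {reference node: (schema 1 object sets, schema 2 object sets)} for
    every reference node that was matched from both schemas."""
    side_1 = defaultdict(list)
    side_2 = defaultdict(list)
    for name, ref in matches_1.items():
        side_1[ref].append(name)
    for name, ref in matches_2.items():
        side_2[ref].append(name)
    return {
        ref: (sorted(side_1[ref]), sorted(side_2[ref]))
        for ref in side_1
        if ref in side_2
    }


# Assumed example: which phone object sets belong to which schema is a guess.
schema_1 = {"Phone_day": "Phone", "Phone_evening": "Phone", "MLS": "MLS"}
schema_2 = {"home phone": "Phone", "office phone": "Phone", "cell phone": "Phone"}
clusters = match_via_reference(schema_1, schema_2)
```

Here the `Phone` cluster pairs two object sets from the first schema with three from the second, a 2:3 instance of the n:m case; 1:1 and n:1 mappings fall out of the same grouping when a side has a single member.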

Experiments
• Indirect matches: precision 87%, recall 94%, F-measure 90%
• Data borrowed from Univ. of Washington [DDH, SIGMOD01]
Rough Comparison with U of W Results
* Faculty Member – Accuracy: ~92%
* Course Schedule – Accuracy: ~71%
* Real Estate (2 tests) – Accuracy: ~75%
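
The reported F-measure is consistent with the usual harmonic mean of precision and recall. The slides do not say which F-measure variant was used, so the balanced F1 form below is an assumption.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for indirect matches, as reported on the slide.
f1 = f_measure(0.87, 0.94)
print(f"{f1:.0%}")
```

Under that assumption, 2 × 0.87 × 0.94 / (0.87 + 0.94) ≈ 0.904, which rounds to the 90% shown.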

Lessons Learned
• n:1 and n:m matches occur frequently.
  • 22% = 97/437 [DMD+03] (Course Catalog, Company Profile)
  • 45% = 287/638 (Car Ads, Cell Phones, Real Estate)
• Reference structures provide a way to solve the long-standing, hard cluster-mapping (n:m cardinality) problem.
• Data frames improve the instance-level matchers.
• The combination of schema-level and instance-level matchers improves the results.