
Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration

This article explores techniques for attribute matching using various facets of metadata, including data values, data-dictionary information, structural properties, ontologies, and terminological relationships. It presents a framework for combining multiple facets and iterative matching to achieve accurate attribute matches. The approach is demonstrated using a car attribute matching example.


Presentation Transcript


  1. Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration • Li Xu • David W. Embley • David Jackman

  2. Background • Problem : Attribute matching • Techniques • Data values • Data-dictionary information • Structural properties • Ontologies • Terminological relationships

  3. Approach • Target Schema T • Source Schema S • Framework • Individual Facet Matching; • Combining Multiple Facets; • Iteration.

  4. Example • [Figure: Target Schema T and Source Schema S for the car domain. Names appearing in the diagrams: Car, Year, Make, Model, Style, Feature, Cost, Mileage, Miles, Phone. Relationships are labeled "has" with participation constraints 0:1 and 0:*.]

  5. Individual Facet Matching • Terminological relationships • Data value characteristics • Target-specific, regular-expression matches

  6. Terminological Relationships • Names of Attributes • T : A • S : B • WordNet • C4.5 Decision Tree • Feature selection • f0: Same word • f1: Synonym • f2: Sum of the distances of A and B to a common hypernym root • f3: Number of different common hypernym roots of A and B • f4: Sum of the number of senses of A and B
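The name-based features on slide 6 can be sketched on a toy lexicon. The block below is only an illustration: the `HYPERNYMS` and `SYNONYMS` tables are hypothetical stand-ins for WordNet, f2 is read here as the smallest summed distance to any shared hypernym (the slide does not spell out which common hypernym is meant), and f3 and f4 are left out because the toy lexicon has no word senses.

```python
from collections import deque

# Hypothetical toy lexicon standing in for WordNet: word -> direct hypernyms.
HYPERNYMS = {
    "mileage": ["distance"],
    "miles": ["distance"],
    "distance": ["measure"],
    "cost": ["outlay"],
    "price": ["outlay"],
    "outlay": ["measure"],
    "measure": [],
}
# Hypothetical synonym pairs (unordered).
SYNONYMS = {frozenset(("cost", "price"))}


def ancestor_distances(word):
    """Breadth-first distances from a word to itself and every hypernym above it."""
    dist = {word: 0}
    queue = deque([word])
    while queue:
        cur = queue.popleft()
        for parent in HYPERNYMS.get(cur, []):
            if parent not in dist:
                dist[parent] = dist[cur] + 1
                queue.append(parent)
    return dist


def term_features(a, b):
    """Compute f0, f1 and f2 for target attribute name a and source name b."""
    a, b = a.lower(), b.lower()
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = set(da) & set(db)
    return {
        "f0": a == b,                              # same word
        "f1": frozenset((a, b)) in SYNONYMS,       # synonym
        # Assumption: smallest summed distance to a shared hypernym, None if no shared one.
        "f2": min((da[c] + db[c] for c in common), default=None),
    }


print(term_features("Mileage", "Miles"))
```

In the deck's setup, feature vectors like these would be the input to a C4.5 decision tree; the tree itself is not sketched here.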

  7. WordNet Rule • [Decision-tree figure. Node labels: the number of different common hypernym roots of A and B; the sum of the number of senses of A and B; the sum of distances of A and B to a common hypernym.]

  8. WordNet Confidences

  9. Data-Value Characteristics • C4.5 Decision Tree • Features [LC94] • Numeric data • Mean, variation, coefficient of variation, standard deviation; • Alphanumeric data • String length, numeric ratio, space ratio.
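The value-characteristic features listed on slide 9 are simple column statistics. The following is a minimal sketch, not the authors' feature extractor. It assumes "variation" means population variance, "numeric ratio" means the fraction of characters that are digits, and "space ratio" means the fraction that are spaces. The function names are made up for illustration.

```python
import statistics


def numeric_features(values):
    """Summary statistics for a column of numeric data values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)            # population standard deviation
    return {
        "mean": mean,
        "variance": statistics.pvariance(values),
        "std_dev": std,
        # Coefficient of variation is undefined when the mean is zero.
        "coeff_variation": std / mean if mean else None,
    }


def string_features(values):
    """Character-level statistics for a column of alphanumeric data values."""
    total_chars = sum(len(v) for v in values)
    digits = sum(ch.isdigit() for v in values for ch in v)
    spaces = sum(ch == " " for v in values for ch in v)
    return {
        "avg_length": total_chars / len(values),
        "numeric_ratio": digits / total_chars if total_chars else 0.0,
        "space_ratio": spaces / total_chars if total_chars else 0.0,
    }


print(numeric_features([2, 4, 4, 4, 5, 5, 7, 9]))
print(string_features(["AB 12", "C 3"]))
```

Two attributes whose feature vectors look alike (for example, similar mean and spread for "Mileage" and "Miles") are candidates for a match; the deck leaves that decision to a C4.5 decision tree.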

  10. Value-Characteristics Confidences

  11. Expected Data Values • Target Schema T • Data frame • Source Schema S • Data instances • Hit Ratio(A, B) = N'/N • N': number of B data instances consistent with specifications of A data frame; • N: number of B data instances.
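The hit ratio on slide 11 is a plain fraction and can be sketched in a few lines. This example assumes a data frame can be approximated by one regular expression (the deck elsewhere mentions target-specific regular-expression matches); `YEAR_FRAME` and the sample values are hypothetical.

```python
import re

# Hypothetical stand-in for the data frame of target attribute "Year":
# a four-digit year beginning with 19 or 20.
YEAR_FRAME = re.compile(r"(19|20)\d{2}")


def hit_ratio(frame, instances):
    """Return N'/N: the share of source data instances consistent with the frame."""
    if not instances:
        return 0.0                       # assumption: no instances means no evidence
    hits = sum(1 for value in instances if frame.fullmatch(value.strip()))
    return hits / len(instances)


source_values = ["1998", "2003", "red", "85"]   # hypothetical instances of a source attribute B
print(hit_ratio(YEAR_FRAME, source_values))
```

Here N = 4 and N' = 2, so the ratio is 0.5. A source attribute whose values almost all fit the frame would score close to 1.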

  12. Expected-Values Confidences

  13. Combined Confidences • Threshold: 0.5 • [Table of 0/1 entries for attribute pairs; the row and column layout is not recoverable from the transcript.]
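The 0.5 threshold can be illustrated with a small sketch. The transcript does not say how the facet confidences are combined, so the plain average below is an assumption, as are the strict "greater than" comparison, the function names, and every confidence value in the example.

```python
THRESHOLD = 0.5  # value taken from the slide


def combine(facets):
    """Average the confidences of all facets for each (target, source) pair.

    Assumption: a pair missing from a facet contributes 0.0 for that facet.
    """
    pairs = set()
    for facet in facets:
        pairs.update(facet)
    return {p: sum(f.get(p, 0.0) for f in facets) / len(facets) for p in pairs}


def decide(combined, threshold=THRESHOLD):
    """Turn combined confidences into 0/1 match decisions."""
    return {pair: int(conf > threshold) for pair, conf in combined.items()}


# Hypothetical per-facet confidences for three attribute pairs.
terminology = {("Year", "Year"): 1.0, ("Mileage", "Miles"): 0.6, ("Cost", "Miles"): 0.1}
value_stats = {("Year", "Year"): 0.9, ("Mileage", "Miles"): 0.8, ("Cost", "Miles"): 0.4}

decisions = decide(combine([terminology, value_stats]))
for pair in sorted(decisions):
    print(pair, decisions[pair])
```

With these made-up inputs, the first two pairs average above 0.5 and are marked 1, while ("Cost", "Miles") averages 0.25 and is marked 0.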

  14. Final Confidences

  15. Experimental Results • Matched Attributes • 100% (32 of 32); • Unmatched Attributes • 99.5% (374 of 376); • "Feature" --- "Color"; • "Feature" --- "Body Type". • Figures shown on the slide (labels as given): F1 93.75%, F2 84%, F3 92%; F1 98.9%, F2 97.9%, F3 98.4%.

  16. Future Work • Additional facets of metadata • More sophisticated combinations • Additional application domains • Automating feature selection

  17. Questions?
