
Discovering Direct and Indirect Matches for Schema Elements

Li Xu, Data Extraction Group, Brigham Young University. Sponsored by NSF.


Presentation Transcript


  1. Discovering Direct and Indirect Matches for Schema Elements • Li Xu • Data Extraction Group • Brigham Young University • Sponsored by NSF

  2. Problem [Diagram: Source and Target car schemas. Element labels: Car, Year, Make, Model, Make & Model, Mileage, Miles, Color, Feature, Body Type, Style, Cost, Phone]

  3. Applications • Data Integration • Schema Integration • Message Mapping • Data Translation

  4. Approach • Direct Matches • Indirect Matches • Union • Selection • Composition • Decomposition

  5. Union and Selection [Diagram: Source and Target car schemas. Element labels: Car, Year, Make, Model, Make & Model, Mileage, Miles, Color, Feature, Body Type, Style, Cost, Phone]

  6. Composition and Decomposition [Diagram: Source and Target car schemas. Element labels: Car, Year, Make, Model, Make & Model, Mileage, Miles, Color, Feature, Body Type, Style, Cost, Phone]
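
The four indirect-match operations named on the Approach slide can be illustrated on plain lists of values. This is only a sketch under one plausible reading of the terms: union merges several source columns into one target column, selection keeps a subset of a source column, composition joins two columns row by row, and decomposition splits one column in two. The function names, the `KNOWN_COLORS` set, and all sample values are hypothetical and do not come from the slides.

```python
def union(*columns):
    """Merge the values of several source columns into one target column."""
    merged = []
    for column in columns:
        merged.extend(column)
    return merged


def selection(column, predicate):
    """Keep only the source values that belong to the target element."""
    return [value for value in column if predicate(value)]


def composition(left, right, sep=" "):
    """Join two source columns row by row into one target column."""
    return [f"{a}{sep}{b}" for a, b in zip(left, right)]


def decomposition(column, sep=" "):
    """Split one source column into two target columns at the first separator."""
    heads, tails = [], []
    for value in column:
        head, _, tail = value.partition(sep)
        heads.append(head)
        tails.append(tail)
    return heads, tails


# Hypothetical sample values, not taken from the slides.
KNOWN_COLORS = {"red", "blue", "white"}
feature = ["red", "sunroof", "blue", "cd player"]

colors = selection(feature, lambda v: v in KNOWN_COLORS)
make_model = composition(["Ford", "Honda"], ["Mustang", "Accord"])
makes, models = decomposition(make_model)
phones = union(["555-0100"], ["555-0199"])
```

Splitting at the first separator is a simplification; values such as multi-word makes would need a recognizer-driven split instead.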

  7. Matching Techniques • Terminological Relationships • Value Characteristics • Expected Data Values • Structure

  8. Terminological Relationships • WordNet • Machine-Learned Rules • Example: (Make, Brand) • The number of different common hypernym roots of A and B • The sum of the number of senses of A and B • Sum of distances of A and B to a common hypernym
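
The three features listed on this slide can be computed on any hypernym hierarchy. The sketch below uses a tiny hand-built taxonomy rather than WordNet itself, so the `PARENT` and `SENSES` tables, the sense labels, and the function names are all hypothetical. Taking the minimum over all sense pairs for the distance feature is also an assumption, since the slide does not say how a common hypernym is chosen.

```python
# Toy taxonomy: each node points to its hypernym (parent). Not real WordNet data.
PARENT = {
    "make#1": "kind", "brand#1": "kind", "kind": "category",
    "category": "entity",
    "brand#2": "mark", "mark": "signal", "signal": "abstraction",
}
# Hypothetical senses for each word.
SENSES = {"make": ["make#1"], "brand": ["brand#1", "brand#2"]}


def chain(node):
    """Return {ancestor: distance} for a node, with the node itself at 0."""
    distances, depth = {}, 0
    while node is not None:
        distances[node] = depth
        node = PARENT.get(node)
        depth += 1
    return distances


def features(a, b):
    """Compute the three terminological features for words a and b."""
    chains_a = [chain(s) for s in SENSES[a]]
    chains_b = [chain(s) for s in SENSES[b]]
    # The root of a chain is its farthest ancestor.
    roots_a = {max(c, key=c.get) for c in chains_a}
    roots_b = {max(c, key=c.get) for c in chains_b}
    best = None
    for ca in chains_a:
        for cb in chains_b:
            for hypernym in ca.keys() & cb.keys():
                total = ca[hypernym] + cb[hypernym]
                if best is None or total < best:
                    best = total
    return {
        "common_roots": len(roots_a & roots_b),
        "sense_count": len(SENSES[a]) + len(SENSES[b]),
        "min_distance_sum": best,  # None if no common hypernym exists
    }


result = features("make", "brand")
```

In a learned rule, these three numbers would be the inputs to a classifier that decides whether the pair is a match.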

  9. Value Characteristics • Machine Learning • Features [LC94] • String length, numeric ratio, space ratio.
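
As a rough illustration of the three features named here, the sketch below averages them over a column of values. The exact definitions in [LC94] are not given on the slide, so the per-value formulas (digits divided by length, spaces divided by length) and the name `value_features` are assumptions.

```python
def value_features(values):
    """Average string length, numeric ratio, and space ratio over a column."""
    values = [v for v in values if v]  # skip empty strings
    if not values:
        return {"mean_length": 0.0, "numeric_ratio": 0.0, "space_ratio": 0.0}
    n = len(values)
    return {
        "mean_length": sum(len(v) for v in values) / n,
        # Share of digit characters in each value, averaged over the column.
        "numeric_ratio": sum(
            sum(ch.isdigit() for ch in v) / len(v) for v in values
        ) / n,
        # Share of space characters in each value, averaged over the column.
        "space_ratio": sum(v.count(" ") / len(v) for v in values) / n,
    }


year_features = value_features(["1999", "2001"])
model_features = value_features(["Ford F150"])
```

Columns with similar feature vectors, such as two year columns, become candidates for a match even when their names differ.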

  10. Expected Values • Application Concepts • Data Recognizers • CarMake: “ford”, “honda”, … • CarModel: “accord”, “mustang”, “taurus”, … [Diagram: Make & Model (Ford Mustang, Ford Taurus, Ford F150, …) is recognized as CarMake . CarModel; Brand (Acura, Audi, BMW, …) as CarMake; Model (Legend, Mustang, A4, …) as CarModel; the schemas are labeled Target and Source]
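
A data recognizer of this kind can be approximated with small lexicons compiled into regular expressions. The sketch below is a simplified stand-in, not the recognizers used in the presentation: the lexicons are extended with the diagram's sample values, and the `hit_rate` and `recognize` helpers and the 0.8 threshold are hypothetical.

```python
import re

# Small lexicons built from the values shown on the slide (assumed, not complete).
CAR_MAKE = re.compile(r"\b(ford|honda|acura|audi|bmw)\b", re.IGNORECASE)
CAR_MODEL = re.compile(
    r"\b(accord|mustang|taurus|legend|a4|f150)\b", re.IGNORECASE
)


def hit_rate(pattern, values):
    """Fraction of values in which the recognizer finds an expected value."""
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.search(v)) / len(values)


def recognize(values, threshold=0.8):
    """Return the application concepts recognized in a column of values."""
    concepts = set()
    if hit_rate(CAR_MAKE, values) >= threshold:
        concepts.add("CarMake")
    if hit_rate(CAR_MODEL, values) >= threshold:
        concepts.add("CarModel")
    return concepts


make_and_model = recognize(["Ford Mustang", "Ford Taurus", "Ford F150"])
brand = recognize(["Acura", "Audi", "BMW"])
model = recognize(["Legend", "Mustang", "A4"])
```

A column that triggers both recognizers, while two columns in the other schema each trigger one, suggests a composition or decomposition match rather than a direct one.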

  11. Structure [Diagram: Source and Target purchase-order schema trees. Element labels: PO, POShipTo, POBillTo, POLines, Count, Item, Line, Qty, UoM, PurchaseOrder, DeliverTo, InvoiceTo, Items, ItemCount, ItemNumber, Quantity, UnitOfMeasure, Address, City, Street]

  12. Structure (Cont.) [Diagram: the purchase-order schema trees from slide 11, continued]

  13. Structure (Cont.) [Diagram: the purchase-order schema trees from slide 11, continued]

  14. Structure (Cont.) [Diagram: the purchase-order schema trees from slide 11, continued]

  15. Structure (Cont.) [Diagram: the purchase-order schema trees from slide 11, continued]

  16. Experiments • Methodology • Measures • Precision • Recall • F Measure
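
The three measures can be computed from the set of matches a system proposes and the set of correct matches. The sketch below assumes the balanced F-measure (harmonic mean of precision and recall), which the slide does not specify; the function name and the sample match pairs are hypothetical.

```python
def evaluate(found, correct):
    """Return (precision, recall, F-measure) for a set of proposed matches."""
    found, correct = set(found), set(correct)
    true_positives = len(found & correct)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: three proposed matches, four correct ones.
found = {("Make", "Brand"), ("Year", "Year"), ("Cost", "Phone")}
correct = {
    ("Make", "Brand"), ("Year", "Year"),
    ("Style", "Body Type"), ("Cost", "Cost"),
}
precision, recall, f_measure = evaluate(found, correct)
```

Here two of the three proposed matches are correct and two of the four correct matches are found, so precision is 2/3, recall is 1/2, and the F-measure is 4/7.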

  17. Results • Indirect Matches: 94% (precision, recall, F-measure) • Data borrowed from Univ. of Washington

  18. Contributions • Direct Matches • Indirect Matches • Expected values • Structure • High Precision and High Recall
