Discovering Direct and Indirect Matches for Schema Elements • Li Xu • Data Extraction Group, Brigham Young University • Sponsored by NSF
Problem [Figure: Target and Source car schemas. Element labels: Car, Year, Make, Model, Make & Model, Mileage, Miles, Color, Feature, Body Type, Style, Cost, Phone.]
Applications • Data Integration • Schema Integration • Message Mapping • Data Translation
Approach • Direct Matches • Indirect Matches: Union, Selection, Composition, Decomposition
Union and Selection [Figure: Target and Source car schemas, with the same element labels as the Problem slide.]
Composition and Decomposition [Figure: Target and Source car schemas, with the same element labels as the Problem slide.]
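The four kinds of indirect match named on the slides can be illustrated with a minimal Python sketch. The function names, the sample values, and the `COLORS` set are hypothetical, chosen only to echo the car schemas in the figures; the slides do not specify how each operation is carried out, so this is an illustration of the idea rather than the authors' method.

```python
# Hypothetical lexicon used only for the selection example.
COLORS = {"red", "blue", "white"}

def compose(make, model):
    """Composition: build one value from several elements (Make + Model)."""
    return f"{make} {model}"

def decompose(make_model):
    """Decomposition: split one value into parts (assumes a single-word make)."""
    make, _, model = make_model.partition(" ")
    return make, model

def union(*value_sets):
    """Union: the values of several elements together populate one element."""
    merged = set()
    for values in value_sets:
        merged |= set(values)
    return merged

def select(values, predicate):
    """Selection: keep only the subset of values that belongs to an element."""
    return [v for v in values if predicate(v)]

features = ["red", "sedan", "blue"]
colors_only = select(features, lambda v: v in COLORS)
```

Here `compose("Ford", "Mustang")` and `decompose("Ford Mustang")` are inverses of each other, and `select` pulls the color values out of a mixed feature list.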
Matching Techniques • Terminological Relationships • Value Characteristics • Expected Data Values • Structure
Terminological Relationships • WordNet • Machine-Learned Rules • Example: (Make, Brand) • The number of different common hypernym roots of A and B • The sum of the number of senses of A and B • The sum of the distances of A and B to a common hypernym
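The three quantities listed for a word pair (A, B) can be sketched over a toy hypernym hierarchy. Everything in `SENSES` and `HYPERNYM` below is made-up data standing in for WordNet, and each sense is assumed to have at most one hypernym (real WordNet allows several). The distance feature is taken here as the smallest sum over all sense pairs, which is one possible reading of the slide. How the learned rules combine these numbers is not shown on the slides and is not attempted here.

```python
# Toy stand-in for WordNet (hypothetical data, not real synsets).
SENSES = {"make": ["make.1"], "brand": ["brand.1", "brand.2"]}
HYPERNYM = {
    "make.1": "kind", "brand.1": "kind",
    "kind": "category", "category": "entity",
    "brand.2": "mark", "mark": "symbol", "symbol": "entity",
}

def root(sense):
    """Follow hypernym links up to the top of the hierarchy."""
    while sense in HYPERNYM:
        sense = HYPERNYM[sense]
    return sense

def ancestors(sense):
    """Map the sense and each of its hypernyms to its distance from the sense."""
    dist, d = {}, 0
    while sense is not None:
        dist[sense] = d
        sense = HYPERNYM.get(sense)
        d += 1
    return dist

def features(a, b):
    """Return (common root count, total sense count, smallest distance sum)."""
    common_roots = {root(s) for s in SENSES[a]} & {root(s) for s in SENSES[b]}
    sense_count = len(SENSES[a]) + len(SENSES[b])
    best = None  # stays None when no sense pair shares a hypernym
    for sa in SENSES[a]:
        da = ancestors(sa)
        for sb in SENSES[b]:
            db = ancestors(sb)
            for h in da.keys() & db.keys():
                total = da[h] + db[h]
                if best is None or total < best:
                    best = total
    return len(common_roots), sense_count, best

print(features("make", "brand"))  # (1, 3, 2) for the toy data above
```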
Value Characteristics • Machine Learning • Features [LC94] • String length, numeric ratio, space ratio.
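The three features named on this slide can be computed from a column of values roughly as follows. The exact definitions are assumptions: the slide gives only the feature names, so "numeric ratio" and "space ratio" are taken here as the fraction of characters that are digits and whitespace, and "string length" as the mean length per value. The function name `value_features` is hypothetical.

```python
def value_features(values):
    """Summarize a column of values with three simple character statistics."""
    values = [str(v) for v in values]
    total_chars = sum(len(v) for v in values)
    if not values or total_chars == 0:
        return {"mean_length": 0.0, "numeric_ratio": 0.0, "space_ratio": 0.0}
    digits = sum(c.isdigit() for v in values for c in v)
    spaces = sum(c.isspace() for v in values for c in v)
    return {
        "mean_length": total_chars / len(values),  # average string length
        "numeric_ratio": digits / total_chars,     # share of digit characters
        "space_ratio": spaces / total_chars,       # share of whitespace characters
    }

year_stats = value_features(["1999", "2001"])
model_stats = value_features(["Ford F150"])
```

A column of years comes out as all digits with no spaces, while a make-and-model column mixes letters, digits, and spaces; a learner could use such vectors to compare source and target columns.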
Expected Values • Application Concepts • Data Recognizers • CarMake: “ford”, “honda”, … • CarModel: “accord”, “mustang”, “taurus”, … • Example (Target and Source schemas): Make & Model (Ford Mustang, Ford Taurus, Ford F150, …) is recognized as CarMake . CarModel • Brand (Acura, Audi, BMW, …) is recognized as CarMake • Model (Legend, Mustang, A4, …) is recognized as CarModel
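A data recognizer of the kind this slide describes can be approximated with small lexicons. The sketch below is an assumption-laden illustration: the lexicons are toy sets built from the values visible on the slide, the 0.8 threshold is arbitrary, and the names `RECOGNIZERS` and `best_concept` are hypothetical. It labels a column with the concept whose recognizer accepts the largest share of its values, including the concatenation CarMake . CarModel.

```python
# Toy lexicons (hypothetical; real recognizers would be far richer).
CAR_MAKES = {"ford", "honda", "acura", "audi", "bmw"}
CAR_MODELS = {"accord", "mustang", "taurus", "f150", "legend", "a4"}

def is_make(value):
    return value.strip().lower() in CAR_MAKES

def is_model(value):
    return value.strip().lower() in CAR_MODELS

def is_make_model(value):
    """Recognize a make followed by a model, e.g. 'Ford Mustang'."""
    parts = value.strip().lower().split(None, 1)
    return len(parts) == 2 and parts[0] in CAR_MAKES and parts[1] in CAR_MODELS

RECOGNIZERS = {
    "CarMake": is_make,
    "CarModel": is_model,
    "CarMake . CarModel": is_make_model,
}

def best_concept(values, threshold=0.8):
    """Return the concept that recognizes the most values, or None."""
    if not values:
        return None
    scores = {
        name: sum(1 for v in values if recognize(v)) / len(values)
        for name, recognize in RECOGNIZERS.items()
    }
    name = max(scores, key=scores.get)
    return name if scores[name] >= threshold else None
```

With these lexicons, a column such as `["Ford Mustang", "Ford Taurus", "Ford F150"]` is labeled `CarMake . CarModel`, which is the hint that a composition or decomposition match is needed between the two schemas.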
Structure [Figure: Target and Source purchase-order schema trees. Node labels: PO, PurchaseOrder, Items, POShipTo, POBillTo, POLines, InvoiceTo, DeliverTo, Count, Address, ItemCount, Item, ItemNumber, City, Street, Line, Qty, UoM, Quantity, UnitOfMeasure.]
Structure (Cont.) [Figure sequence: the same Target and Source purchase-order schema trees, shown across four further slides.]
Experiments • Methodology • Measures • Precision • Recall • F Measure
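The three measures can be computed from a set of proposed matches and a set of correct matches. This is a minimal sketch: the slides do not define the F-measure, so the balanced harmonic mean of precision and recall is assumed, and the element pairs in the example are hypothetical.

```python
def evaluate(proposed, correct):
    """Return (precision, recall, F-measure) for proposed vs. correct matches."""
    proposed, correct = set(proposed), set(correct)
    true_positives = len(proposed & correct)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical example: 3 proposed matches, 4 correct ones, 2 in common.
proposed = {("Make", "Brand"), ("Model", "Model"), ("Cost", "Phone")}
correct = {("Make", "Brand"), ("Model", "Model"), ("Year", "Year"), ("Cost", "Price")}
precision, recall, f_measure = evaluate(proposed, correct)
```

In this example precision is 2/3, recall is 1/2, and the F-measure is 4/7.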
Results • Indirect Matches: 94% (precision, recall, F-measure) • Data borrowed from Univ. of Washington
Contributions • Direct Matches • Indirect Matches • Expected values • Structure • High Precision and High Recall