Using Schema Matching to Simplify Heterogeneous Data Translation Tova Milo, Sagit Zohar Tel Aviv University
Introduction • There are large amounts of data available on the Web but the format of the data is not homogeneous. • Most applications can handle only one or a small number of formats. • There is a need to translate data from one format to another.
Introduction • Two approaches to translating data: • A specific program to translate from format A to format B (e.g. LaTeX to HTML). • Data translation languages.
Introduction • The solution – TranScm • A data translation system • Automatically translates a portion (often a large portion) of the desired data • Does not replace data translation languages, but reduces the amount of programming needed in them
TranScm Architecture • Components shown in the diagram: Input Schema, Output Schema, Import/Export Library, GUI, Rule Base, Matching Module, Typing Module
Data Model • Tree (Forest) Model • Similar to OEM • Allows an order on children • Can handle cyclic structures using ids as “pointers”
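Data Model • Example data tree from the figure: Article, with children title, authors, and sections • authors has two author children • Leaf values shown: “Conceptual Concepts”, “Al Gore Ithm”, “G WWW Bush”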
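The tree model can be sketched in a few lines of Python. This is a minimal illustration, not TranScm's actual representation: the `Node` class, its field names, and the `index_ids` helper are assumptions, the assignment of the quoted strings to the title and author nodes is a reading of the figure, and the `see` node is a hypothetical back-reference added only to show how ids can act as "pointers".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One vertex of the data tree (illustrative names, not TranScm's)."""
    label: str
    value: Optional[str] = None       # atomic value, used by leaves
    children: List["Node"] = field(default_factory=list)  # a list keeps child order
    node_id: Optional[str] = None     # optional id that other nodes may point to
    ref: Optional[str] = None         # id of another node, used as a "pointer"


def index_ids(root: Node) -> Dict[str, Node]:
    """Collect every node that carries an id so references can be resolved."""
    found: Dict[str, Node] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_id is not None:
            found[node.node_id] = node
        stack.extend(node.children)
    return found


article = Node("Article", node_id="a1", children=[
    Node("title", value="Conceptual Concepts"),
    Node("authors", children=[
        Node("author", value="Al Gore Ithm"),
        Node("author", value="G WWW Bush"),
    ]),
    # Hypothetical child that points back at the root through its id.
    Node("sections", children=[Node("see", ref="a1")]),
])

ids = index_ids(article)
authors = [child.value for child in article.children[1].children]
back_reference = ids[article.children[2].children[0].ref]
```

Because the cycle is expressed through an id lookup rather than a direct object link, the structure itself stays a tree and can be walked without cycle detection.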
Schema Model • Labeled graphs • Some nodes may be ordered • Each vertex is a schema element (type) • Labels carry information about the node
Schema Model • Example schema graph from the figure, with node labels as shown: Article [3], title [1], authors [0,…,->], author [1], sections [2], ref, string
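A labeled schema graph along these lines might be represented as follows. This is a sketch under stated assumptions: `SchemaElement`, its `labels` dictionary, and the `"kind"` entries are hypothetical and do not reproduce the label notation in the figure, and the edge from `sections` back to `Article` is invented only to show that the traversal tolerates cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SchemaElement:
    """A schema vertex (a type). Field names are illustrative assumptions."""
    name: str
    labels: Dict[str, object] = field(default_factory=dict)  # info carried by the node
    children: List["SchemaElement"] = field(default_factory=list)
    ordered: bool = False  # True when the order of children matters


def element_names(root: SchemaElement) -> Set[str]:
    """Visit each vertex once, so shared or cyclic vertices are safe."""
    seen: Set[int] = set()
    names: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        names.add(node.name)
        stack.extend(node.children)
    return names


string = SchemaElement("string", labels={"kind": "atomic"})
title = SchemaElement("title", children=[string])
author = SchemaElement("author", children=[string])  # shares the string vertex
authors = SchemaElement("authors", labels={"kind": "list"}, children=[author])
sections = SchemaElement("sections")
article = SchemaElement("Article", children=[title, authors, sections], ordered=True)
sections.children.append(article)  # hypothetical cycle, only to exercise the walk

names = element_names(article)
```

Sharing the single `string` vertex between `title` and `author` is what makes this a graph rather than a tree.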
Rules • Rules are the basis of the matching and translation • Rules have an associated priority
Rules • Each rule has two components: • Matching component • Match function • Decendents (sic) function • Translation component • Translation function
Matching • The Match function examines schema labels to determine possible matches. • The Decendents function checks the numbers and types of the children of the current node.
Matching • Figure: two Article schemas being matched; nodes shown include Article, authors, and author
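The rule structure and the matching step described above can be sketched as follows. The slides state only that a rule has a priority, a match function, a descendants function, and a translation function; everything else here is an assumption made for illustration, including the `Schema` and `Rule` classes, the convention that a larger number means higher priority, and the policy of trying rules in priority order and taking the first that accepts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Schema:
    """Minimal schema element: a name plus ordered children (hypothetical)."""
    name: str
    children: Tuple["Schema", ...] = ()


@dataclass
class Rule:
    name: str
    priority: int                                   # assumed: larger wins
    match: Callable[[Schema, Schema], bool]         # compares the two elements
    descendants: Callable[[Schema, Schema], bool]   # compares their children
    translate: Callable[[object], object]           # converts matched data


def same_label(src: Schema, tgt: Schema) -> bool:
    return src.name == tgt.name


def same_child_labels(src: Schema, tgt: Schema) -> bool:
    # Checks both the number and the names of the children.
    return [c.name for c in src.children] == [c.name for c in tgt.children]


def find_rule(rules: List[Rule], src: Schema, tgt: Schema) -> Optional[Rule]:
    """Return the highest-priority rule whose two matching functions accept."""
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.match(src, tgt) and rule.descendants(src, tgt):
            return rule
    return None


rules = [
    Rule("label-only", 1, same_label, lambda s, t: True, lambda v: v),
    Rule("identical", 2, same_label, same_child_labels, lambda v: v),
]

leaf = Schema("string")
src_author = Schema("author", (leaf,))
tgt_author = Schema("author", (leaf,))
src_article = Schema("Article", (Schema("authors"),))
tgt_article = Schema("Article", (Schema("author"),))

best_author = find_rule(rules, src_author, tgt_author)
best_article = find_rule(rules, src_article, tgt_article)
no_rule = find_rule(rules, src_author, tgt_article)
```

In this sketch the two author elements are accepted by the stricter "identical" rule, the two Article elements fall back to "label-only" because their children differ, and an author compared with an Article matches no rule at all.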
When Matching Fails • Matching can fail for two reasons: • Something in the source can’t be matched to something in the target with the current set of rules. • Something in the source matches several items in the target equally well.
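The two failure cases can be told apart mechanically. The sketch below assumes, hypothetically, that the matcher produces a list of scored target candidates for each source element; the `classify` function, the score values, and the element names are illustrative and not taken from TranScm.

```python
from typing import Dict, List, Tuple


def classify(
    candidates: Dict[str, List[Tuple[str, int]]],
) -> Tuple[Dict[str, str], List[str], Dict[str, List[str]]]:
    """Split source elements into matched, unmatched, and ambiguous."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    ambiguous: Dict[str, List[str]] = {}
    for source, scored in candidates.items():
        if not scored:
            # Failure 1: no rule matched this element to anything.
            unmatched.append(source)
            continue
        best = max(score for _, score in scored)
        winners = [target for target, score in scored if score == best]
        if len(winners) == 1:
            matched[source] = winners[0]
        else:
            # Failure 2: several targets match equally well.
            ambiguous[source] = winners
    return matched, unmatched, ambiguous


matched, unmatched, ambiguous = classify({
    "title": [("title", 3)],
    "editor": [],
    "author": [("writer", 2), ("contributor", 2)],
})
```

The `unmatched` and `ambiguous` results are the cases that would be handed to the user for a decision.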
When Matching Fails • Via the GUI, the user can do the following: • Add • Disable • Modify • Override
Translation • Using the mapping generated by the Matching step and the appropriate rules, data is transformed from the input schema to the output schema. • The translation process can make use of data translation languages. • The translation process can perform type checking.
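A translation pass driven by the mapping could look like the sketch below. The representation is an assumption made for brevity: data nodes are `(label, content)` tuples, the mapping sends each source label to a hypothetical target label plus a leaf conversion function, and the type check simply verifies that converted leaves are strings. The target names `Paper`, `heading`, and `writer` are invented for the example.

```python
from typing import Callable, Dict, List, Tuple, Union

# A data node is (label, content); content is a leaf string or a list of nodes.
Tree = Tuple[str, Union[str, List["Tree"]]]
# The mapping sends a source label to (target label, leaf conversion function).
Mapping = Dict[str, Tuple[str, Callable[[str], str]]]


def translate(node: Tree, mapping: Mapping) -> Tree:
    """Rebuild the tree under target labels, converting leaf values."""
    label, content = node
    if label not in mapping:
        raise KeyError(f"no match recorded for source element {label!r}")
    target_label, convert = mapping[label]
    if isinstance(content, list):
        # Inner node: translate the children in their original order.
        return (target_label, [translate(child, mapping) for child in content])
    result = convert(content)
    if not isinstance(result, str):
        # Simple stand-in for type checking of the produced value.
        raise TypeError(f"{target_label!r} expects a string leaf")
    return (target_label, result)


mapping: Mapping = {
    "Article": ("Paper", str),
    "title": ("heading", str.upper),
    "author": ("writer", str),
}

source: Tree = ("Article", [
    ("title", "Conceptual Concepts"),
    ("author", "Al Gore Ithm"),
])

translated = translate(source, mapping)
```

Elements with no entry in the mapping raise an error here; in a fuller system those are the cases where a rule written in a data translation language could take over.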
Conclusion • TranScm • Provides a general mechanism for data translation • Handles common, relatively simple translations automatically • Can use data translation languages for more difficult translations