Schema Matching for Database Systems
Bhavik Doshi
Chair: Prof. Rajendra K. Raj
Reader: Dr. Carol Romanowski
Department of Computer Science, Rochester Institute of Technology
Schema Matching
[Figure: example source and target schemas; labels include Employee, Position, EMPID, Name, Salary, SSN, Position-type, and "has"]
• Given: source and target schemas
• Matching: maps the source schema elements to target schema elements.
Why Schema Matching?
• First step of data integration
• An emerging area of data management research because of its important role in Enterprise Information Integration
• Needed for building data warehouses and data marts
• The manual approach is very laborious
Syntactical Approach (Element-Level Approach)
• Uses the syntax of the names given to database elements.
• Open problem: what if the elements have random names?
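A minimal sketch of name-based matching, to make the idea and its weakness concrete. It uses Python's standard `difflib.SequenceMatcher` as a stand-in string similarity; this is an illustrative assumption, not the measure used by Lingmei et al. The function names, the column name `Emp_Id`, the random name `COL_X17`, and the 0.6 threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and drop separators so 'EMP_ID' and 'EmpId' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized element names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_by_name(source, target, threshold=0.6):
    """Map each source name to its most similar target name, or None
    when even the best candidate scores below the threshold."""
    result = {}
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        result[s] = best if name_similarity(s, best) >= threshold else None
    return result


source = ["EMPID", "Name", "Salary", "COL_X17"]  # COL_X17: a random name
target = ["Emp_Id", "Name", "Salary", "SSN"]
print(match_by_name(source, target))
```

Meaningful names such as EMPID find a partner, while the randomly named column is left unmatched, which is the failure case the slide points at.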
Data Value and Constraint Based Approach (Instance-Based Approach)
• Uses data values, data types, and comparison of ranges.
• Open problem: what if the data values are not appropriate?
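A rough sketch of how data types and value ranges could be compared between two columns. This is not the algorithm of Kang et al.; the scoring rules (range overlap for numeric columns, Jaccard overlap for text columns) and all function names are assumptions chosen for illustration, and columns are assumed to be non-empty.

```python
def infer_type(values):
    """Return 'numeric' if every value is an int or float, else 'text'."""
    return "numeric" if all(isinstance(v, (int, float)) for v in values) else "text"


def range_overlap(a, b):
    """Share of the combined [min, max] interval covered by both columns."""
    lo = max(min(a), min(b))
    hi = min(max(a), max(b))
    span = max(max(a), max(b)) - min(min(a), min(b))
    if span == 0:
        return 1.0  # both columns hold the same single value
    return max(0.0, hi - lo) / span


def instance_similarity(a, b):
    """Score in [0, 1]; columns with different data types never match."""
    if infer_type(a) != infer_type(b):
        return 0.0
    if infer_type(a) == "numeric":
        return range_overlap(a, b)
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)  # Jaccard overlap of distinct values


salaries_src = [40000, 60000]
salaries_tgt = [50000, 70000]
print(instance_similarity(salaries_src, salaries_tgt))
print(instance_similarity(salaries_src, ["Alice", "Bob"]))
```

A sketch like this depends entirely on the stored values being representative, so sparse or placeholder data would give misleading scores.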
Hypothesis
• Relying on a single technique for schema matching may not always succeed.
• Each approach is implemented independently of the others, so the overall result is less effective than it could be.
• Develop an integrated, domain-independent technique for Oracle databases.
Objective
• Develop a generic integrated matching technique.
• Implement two substantially different techniques:
  • Kang et al.
  • Lingmei et al.
• Transform the steps of these two algorithms and use additional techniques for better matching.
• Test the developed technique on about 30 relational datasets and observe the results.
Algorithms
• Instance-based schema matching: Kang et al.
• Element-level schema matching: Lingmei et al.
Route to Success!!
[Diagram labels: Probability Distribution; Syntax; Semantics; Structure; Kang's Approach (Instance Based); Lingmei's Approach (Element Based); Data Type and Range; Integrated Approach]
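One simple way to picture an integrated approach is to combine the scores of an element-based matcher and an instance-based matcher into a single value per candidate pair. The sketch below uses a weighted average; the slides do not say how the two techniques are actually combined, so the weights, the threshold, the score values, and all names here are hypothetical.

```python
def combine(scores, weights):
    """Weighted average of per-technique similarity scores in [0, 1]."""
    total = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total


def best_matches(candidates, weights, threshold=0.5):
    """candidates maps (source, target) pairs to {technique: score}.
    Returns the highest-scoring target per source at or above the threshold."""
    best = {}
    for (src, tgt), scores in candidates.items():
        score = combine(scores, weights)
        if score >= threshold and score > best.get(src, (None, 0.0))[1]:
            best[src] = (tgt, score)
    return best


weights = {"name": 0.5, "instance": 0.5}  # assumed equal weighting
candidates = {
    ("EMPID", "Emp_Id"): {"name": 1.0, "instance": 0.8},
    ("EMPID", "SSN"): {"name": 0.0, "instance": 0.9},
    ("X17", "Salary"): {"name": 0.1, "instance": 0.95},  # random name
}
for src, (tgt, score) in best_matches(candidates, weights).items():
    print(f"{src} -> {tgt} ({score:.3f})")
```

In this toy input the randomly named column X17 still reaches Salary because the instance evidence compensates for the weak name score, while the name evidence keeps EMPID from being matched to SSN.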