Semantic Integration in Heterogeneous Databases Using Neural Networks

Wen-Syan Li, Chris Clifton
Presentation by Jeff Roth

Introduction
• Basic schema matching problem
• GTE’s data integration project included 27,000 data elements
• Manual matching took 4 hours per data element, or 25 full-time employees 2 years to complete
• This method takes 0.1 seconds per data element, 144,000 times faster
• “how to match knowledge is discovered”
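The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes the 0.1 seconds is a per-data-element cost (which is what the 144,000 ratio implies); the constant names are illustrative only.

```python
from fractions import Fraction

# Figures quoted on the slide.
DATA_ELEMENTS = 27_000
MANUAL_HOURS_PER_ELEMENT = 4
STAFF = 25
YEARS = 2
# Assumption: 0.1 seconds is the automated cost per data element.
AUTOMATED_SECONDS_PER_ELEMENT = Fraction(1, 10)

# Speedup per element: 4 hours expressed in seconds, divided by 0.1 seconds.
manual_seconds = MANUAL_HOURS_PER_ELEMENT * 3600
speedup = manual_seconds / AUTOMATED_SECONDS_PER_ELEMENT

# Total manual effort and the working year it implies for 25 people over 2 years.
total_manual_hours = DATA_ELEMENTS * MANUAL_HOURS_PER_ELEMENT
hours_per_person_year = Fraction(total_manual_hours, STAFF * YEARS)

# Total automated time for all elements, in minutes.
total_automated_minutes = DATA_ELEMENTS * AUTOMATED_SECONDS_PER_ELEMENT / 60

print(speedup, total_manual_hours, hours_per_person_year, total_automated_minutes)
```

The two manual figures agree with each other: 108,000 hours spread over 50 person-years is 2,160 hours per person per year, roughly a full-time working year.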

Method Outline
“The end user is able to distinguish between unreasonable and reasonable answers, and exact results aren’t critical. This method allows a user to obtain reasonable answers requiring database integration at a low cost”

Automated semantic integration methods
• Attribute name comparison
  Not used in this paper
• Attribute values and domains comparison
  Typical measures: Equal, Contains, Overlap, Contained-in, and Disjoint
  Used, but not with the above measures
• Field specifications
  Data type, field length, constraints, and others
  Also used in this method

Field Specifications
The following measures are used:
• Data types: each possible data type has a network input; the input for the field's data type has a value of 1 and all the others have a value of 0
• Field length: $\text{Length} = 2\,\bigl(1/(1 + k^{-\text{length}}) - 0.5\bigr)$
• Format specifications: similar to data type
• Constraints (primary key, foreign key, disallowing nulls, access restrictions, etc.): similar to data type
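As a rough illustration of how these measures could become network inputs, the sketch below builds one input vector for a single field. It reads the length formula as 2 * (1/(1 + k^(-length)) - 0.5), which maps any non-negative length into the range 0 to 1. The type vocabulary, the value of `k`, and the choice of two constraint flags are assumptions made for the example, not taken from the slides.

```python
# Hypothetical type vocabulary; a real parser would read it from the DBMS catalog.
DATA_TYPES = ["char", "integer", "float", "date"]


def one_hot(value, vocabulary):
    """One input per possible value: 1 for the matching entry, 0 for the rest."""
    return [1.0 if value == candidate else 0.0 for candidate in vocabulary]


def normalize_length(length, k=1.01):
    """Squash a field length into [0, 1). The value of k is an assumption."""
    return 2.0 * (1.0 / (1.0 + k ** (-length)) - 0.5)


def encode_field(data_type, length, is_primary_key, disallows_null):
    """Concatenate type inputs, normalized length, and two constraint flags."""
    return (
        one_hot(data_type, DATA_TYPES)
        + [normalize_length(length)]
        + [1.0 if is_primary_key else 0.0, 1.0 if disallows_null else 0.0]
    )


vector = encode_field("char", 30, is_primary_key=False, disallows_null=True)
print(vector)
```

Every input ends up between 0 and 1, so no single measure dominates the others purely because of its scale.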

Attribute Values and Domains
Divide measures into character fields and numeric fields
• Patterns for character fields
  1. Ratio of numerical characters
     Address: 146 South 920 West would score 6/18
  2. Ratio of white space
     Address: 146 South 920 West would score 3/18
  3. Length statistics
     Average, variance, and coefficient of variation of the “used” length relative to the maximum length
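The character-field patterns above can be sketched in a few functions. This is a minimal illustration, assuming the length statistics are taken over the fraction of the maximum length that each value uses; the function names are hypothetical.

```python
from statistics import mean, pvariance


def digit_ratio(text):
    """Fraction of characters that are digits."""
    return sum(ch.isdigit() for ch in text) / len(text) if text else 0.0


def whitespace_ratio(text):
    """Fraction of characters that are white space."""
    return sum(ch.isspace() for ch in text) / len(text) if text else 0.0


def length_statistics(values, max_length):
    """Average, variance, and coefficient of variation of used/maximum length."""
    used = [len(value) / max_length for value in values]
    avg = mean(used)
    var = pvariance(used)
    cv = (var ** 0.5) / avg if avg else 0.0
    return avg, var, cv


address = "146 South 920 West"
print(digit_ratio(address))       # 6 digits out of 18 characters
print(whitespace_ratio(address))  # 3 spaces out of 18 characters
print(length_statistics(["ab", "abcd"], max_length=4))
```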

Attribute Values and Domains cont.
• Patterns for numeric fields
  1. Average (mean)
  2. Variance
  3. Coefficient of variation
     Recognizes similarity between values of different units and granularity
     This can also help recognize which fields may need unit conversions
  4. Grouping
     For example: area code, zip code, first three digits of SSN
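A short example shows why the coefficient of variation is useful across units: scaling every value by a constant changes the mean and variance but leaves the coefficient of variation unchanged. The sample heights below are made up for illustration.

```python
from statistics import mean, pvariance


def numeric_profile(values):
    """Return (mean, variance, coefficient of variation) for a numeric field."""
    avg = mean(values)
    var = pvariance(values)
    cv = (var ** 0.5) / abs(avg) if avg else 0.0
    return avg, var, cv


# Illustrative data: the same heights stored in metres and in centimetres.
heights_m = [1.5, 1.75, 2.0]
heights_cm = [150.0, 175.0, 200.0]

mean_m, var_m, cv_m = numeric_profile(heights_m)
mean_cm, var_cm, cv_cm = numeric_profile(heights_cm)

print(cv_m, cv_cm)       # nearly identical despite the different units
print(mean_cm / mean_m)  # the ratio of means hints at the conversion factor
```

Matching coefficients of variation with very different means is the kind of signal that suggests two fields hold the same quantity and need a unit conversion.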

Self-Organizing Grouping algorithm
• N = number of possible discriminators
• M = number of categories; this can be adjusted by the user. “ideally this is |attributes| - |foreign keys|”
• This is unsupervised, i.e. you don’t have to provide a correct classification; it simply groups based on similarity
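To make the idea of unsupervised grouping concrete, here is a minimal sketch that uses a simple distance-threshold ("leader") clustering as a stand-in. It is not the classifier from the paper; the `radius` parameter is an assumption that plays the role of the user's control over the number of categories M.

```python
import math


def group_by_similarity(vectors, radius):
    """Assign each vector to the nearest existing group within `radius`,
    or start a new group. Returns (centres, members)."""
    centres, members = [], []
    for index, vector in enumerate(vectors):
        best, best_distance = None, radius
        for group, centre in enumerate(centres):
            distance = math.dist(vector, centre)
            if distance <= best_distance:
                best, best_distance = group, distance
        if best is None:
            centres.append(list(vector))
            members.append([index])
        else:
            members[best].append(index)
            count = len(members[best])
            # Move the centre to the running mean of its members.
            centres[best] = [
                (old * (count - 1) + new) / count
                for old, new in zip(centres[best], vector)
            ]
    return centres, members


# Four attributes described by N = 2 discriminators (illustrative values).
attributes = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
centres, members = group_by_similarity(attributes, radius=0.3)
print(members)
```

A larger radius merges more attributes into each category and so lowers M; a smaller one raises it.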

Training the Back-Prop Network
• Inputs (N) are identical to the classifier's
• Outputs (M) are trained using back-propagation and the classifier’s results
• Categories are labeled with the attributes they grouped together
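The training step can be sketched with a tiny one-hidden-layer network written in plain Python. This is an illustration under stated assumptions, not the paper's implementation: the hidden-layer size, learning rate, epoch count, sigmoid units, squared-error loss, and the sample data are all choices made for the example. The targets are one-hot vectors over the M categories that a grouping step is assumed to have produced.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Forward pass; a constant 1.0 is appended to each layer's input as a bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0]))) for row in w_out]
    return hidden, output


def train(samples, targets, n_hidden=4, epochs=2000, rate=0.5, seed=0):
    """Per-sample gradient descent with back-propagated squared error."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0]), len(targets[0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    losses = []
    for _ in range(epochs):
        loss = 0.0
        for x, t in zip(samples, targets):
            hidden, output = forward(x, w_hidden, w_out)
            loss += sum((o - y) ** 2 for o, y in zip(output, t))
            # Error terms at the output layer.
            d_out = [(o - y) * o * (1 - o) for o, y in zip(output, t)]
            # Propagate the error back to the hidden layer (before updating w_out).
            d_hid = [
                h * (1 - h) * sum(d_out[k] * w_out[k][j] for k in range(n_out))
                for j, h in enumerate(hidden)
            ]
            for k in range(n_out):
                for j, v in enumerate(hidden + [1.0]):
                    w_out[k][j] -= rate * d_out[k] * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] -= rate * d_hid[j] * v
        losses.append(loss)
    return w_hidden, w_out, losses


# N = 3 discriminators per attribute, M = 2 categories (illustrative data).
samples = [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9], [1.0, 1.0, 0.0], [0.9, 1.0, 0.1]]
targets = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
w_hidden, w_out, losses = train(samples, targets)
print(losses[0], losses[-1])
```

Once trained, `forward` can be applied to the discriminator vector of an attribute from another database; each output then acts as a similarity score for one category.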

Integration Procedure
1. DBMS-specific parser
2. Classify (categorize) training data
3. Train neural network
4. DBMS-specific parser
5. Classification by neural network
6. User checks results
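The last two steps of the procedure could be supported by a small report that turns network outputs into a ranked list for the user to check. The sketch below is one possible way to do this; the threshold, the attribute names, and the output scores are all hypothetical and chosen only for illustration.

```python
def rank_candidates(outputs, category_labels, threshold=0.5):
    """For each attribute, list the categories whose score meets the threshold,
    best first. `outputs` maps attribute name -> network output vector."""
    report = {}
    for attribute, scores in outputs.items():
        matches = [
            (label, score)
            for label, score in zip(category_labels, scores)
            if score >= threshold
        ]
        report[attribute] = sorted(matches, key=lambda pair: pair[1], reverse=True)
    return report


# Hypothetical categories learned from the first database.
category_labels = ["customer.name", "customer.zip"]
# Hypothetical network outputs for attributes of the second database.
outputs = {
    "client.full_name": [0.92, 0.05],
    "client.postcode": [0.10, 0.81],
    "client.notes": [0.20, 0.15],
}

report = rank_candidates(outputs, category_labels)
for attribute, matches in report.items():
    print(attribute, matches if matches else "no candidate, needs manual review")
```

Attributes with no candidate above the threshold are left for the user, which fits the slide's point that the user makes the final check.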

Results

Conclusion and Future Work
• Human effort needed for semantic integration is minimized
• Different systems have different attribute properties available - automated solution
• Extend to automated information integration
• C source code available at eecs.nwu.edu/pub/semint
