Technical University of Liberec, Faculty of Mechatronics
Semantic integration of data in database systems and ontologies
Ing. Petra Šeflová
Integration of data: merging a set of given schemas into a global schema
Semantic integration: part of the concept of data integration, focusing on data exchange between applications in light of their meaning, content and required business rules
Integration of data: example
Figure: a data integration system in the real estate domain. A user query such as "Find houses with four bathrooms and price under $500,000" is posed against a mediated schema, which is connected through wrappers to the source schemas of realestate.com, homeseekers.com and greathomes.com.
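The mediated-schema-plus-wrappers arrangement of the real estate example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the named sites actually work: the `House` record, the three wrapper functions and every source field name (`baths`, `list_price`, `price_k`, ...) are hypothetical. Each wrapper rewrites one source's rows into the mediated schema, so the query is written once, against the mediated form.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class House:
    """A record in a hypothetical, simplified mediated schema."""
    source: str
    bathrooms: int
    price_usd: int


# Each wrapper translates one source's native rows into mediated-schema records.
# The source field names below are invented for illustration.
def wrap_realestate(rows: Iterable[dict]) -> list[House]:
    return [House("realestate.com", r["baths"], r["price"]) for r in rows]


def wrap_homeseekers(rows: Iterable[dict]) -> list[House]:
    return [House("homeseekers.com", r["num_bathrooms"], r["list_price"]) for r in rows]


def wrap_greathomes(rows: Iterable[dict]) -> list[House]:
    # This source is assumed to store prices in thousands of dollars.
    return [House("greathomes.com", r["bath"], r["price_k"] * 1000) for r in rows]


def answer(query: Callable[[House], bool], sources) -> list[House]:
    """Evaluate a query posed over the mediated schema against every source."""
    results = []
    for wrapper, rows in sources:
        results.extend(h for h in wrapper(rows) if query(h))
    return results


sources = [
    (wrap_realestate, [{"baths": 4, "price": 450_000}, {"baths": 2, "price": 300_000}]),
    (wrap_homeseekers, [{"num_bathrooms": 4, "list_price": 650_000}]),
    (wrap_greathomes, [{"bath": 4, "price_k": 499}]),
]
# "Find houses with four bathrooms and price under $500,000"
hits = answer(lambda h: h.bathrooms == 4 and h.price_usd < 500_000, sources)
print([(h.source, h.price_usd) for h in hits])
```

The unit conversion in `wrap_greathomes` is the kind of heterogeneity a wrapper hides, so the query never needs to know that one source counts in thousands.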
Applications
• Catalog integration in B2B applications
• E-commerce
• Bioinformatics
• P2P databases
• Agent communication
• Web services integration
Key commonalities of semantic integration applications
• They use structured representations (e.g., relational schemas and XML DTDs)
• They must resolve heterogeneities with respect to the schemas and their data
• They must enable manipulation of the schemas/ontologies:
  • Merging the schemas
  • Computing differences
• They must enable translation of data and queries across the schemas/ontologies
• Database schema
  • Defines the physical layout of a system (database)
• Ontology
  • A system of knowledge about the world
  • Makes no claim to coherence (there are many partial ontologies)
  • Frequently a purpose-built artefact
  • Gruber's definition: an ontology is a formal, explicit specification of a shared conceptualization.
Problems of semantic integration
• The semantics of elements can be inferred from only a few information sources:
  • Creators of the data
  • Documentation
  • The associated schema and data
• Schema elements are typically matched based on clues in the schema and data
• Schema and data clues are often incomplete
• Matching is often subjective, depending on the application
Matching process
• Takes as input two schemas/ontologies, each consisting of a set of discrete entities, and determines as output the relationships holding between these entities
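The input/output contract of the matching process can be made concrete with a toy matcher. This is a sketch only: the `Correspondence` record, the `normalize` and `match` helpers, and the rule "names equal after lower-casing and dropping `_`/`-`" are hypothetical choices made for illustration; real matchers use the far richer techniques described on the following slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    source: str        # entity of the first schema/ontology
    target: str        # entity of the second schema/ontology
    relation: str      # kind of relationship; this toy only produces "equivalence"
    confidence: float  # 1.0 means the matcher is certain


def normalize(name: str) -> str:
    return name.lower().replace("_", "").replace("-", "")


def match(schema_a: set[str], schema_b: set[str]) -> list[Correspondence]:
    """Toy matcher: two entities correspond when their normalized names are equal."""
    return [
        Correspondence(a, b, "equivalence", 1.0)
        for a in sorted(schema_a)
        for b in sorted(schema_b)
        if normalize(a) == normalize(b)
    ]


for c in match({"Price", "agent_name", "location"}, {"price", "AgentName", "area"}):
    print(c.source, "->", c.target)
```

Keeping the output as explicit records (entities, relation, confidence) leaves room for the richer output forms discussed under the output dimension.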
Example: the schemas of two relational databases, S and T, on house listings, and the semantic correspondences between them (figure: schema S and schema T; table names shown include Houses and Agents)
Matching techniques: two groups
• Rule-based
• Learning-based
Rule-based solutions
• Many early as well as current matching solutions employ hand-crafted rules
• They exploit schema information:
  • Element names
  • Data types
  • Structures
  • Integrity constraints
• They can provide a quick and concise way to capture valuable user knowledge about the domain
Rule-based solutions
• Benefits
  • "Relatively inexpensive"
  • Do not require training
  • Operate only on the schema
• Drawbacks
  • Cannot exploit data instances effectively
  • Cannot exploit previous matching efforts
• Examples: TranScm, DIKE, MOMIS, CUPID
• TranScm: employs rules such as "two elements match if they have the same name (allowing synonyms) and the same number of subelements"
• DIKE: computes the similarity between two schema elements based on the similarity of the elements' characteristics and the similarity of related elements
• MOMIS: computes the similarity of schema elements as a weighted sum of the similarities of name, data type and substructure
• CUPID: employs rules that categorize elements based on names, data types and domains
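Two of the rule styles above can be sketched directly. This is not the code of TranScm or MOMIS; it is a minimal illustration under assumptions: elements are plain dicts with `name`, `type` and `children` keys, the `SYNONYMS` table is hypothetical, and the weights (0.5/0.3/0.2), the `difflib` ratio for names and the Jaccard overlap for substructure are stand-ins chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table used by the name rule.
SYNONYMS = {frozenset({"price", "cost"}), frozenset({"phone", "telephone"})}


def same_name(a: str, b: str) -> bool:
    a, b = a.lower(), b.lower()
    return a == b or frozenset({a, b}) in SYNONYMS


def transcm_style_rule(x: dict, y: dict) -> bool:
    """Match if same name (allowing synonyms) and same number of subelements."""
    return same_name(x["name"], y["name"]) and len(x["children"]) == len(y["children"])


def momis_style_similarity(x: dict, y: dict, w_name=0.5, w_type=0.3, w_struct=0.2) -> float:
    """Weighted sum of name, data-type and substructure similarity (illustrative weights)."""
    if same_name(x["name"], y["name"]):
        name_sim = 1.0
    else:
        name_sim = SequenceMatcher(None, x["name"].lower(), y["name"].lower()).ratio()
    type_sim = 1.0 if x["type"] == y["type"] else 0.0
    kids_x, kids_y = set(x["children"]), set(y["children"])
    union = kids_x | kids_y
    # Jaccard overlap of child names; two leaf elements count as structurally identical.
    struct_sim = len(kids_x & kids_y) / len(union) if union else 1.0
    return w_name * name_sim + w_type * type_sim + w_struct * struct_sim


price = {"name": "Price", "type": "decimal", "children": []}
cost = {"name": "cost", "type": "decimal", "children": []}
print(transcm_style_rule(price, cost), momis_style_similarity(price, cost))
```

The rule returns a yes/no decision, while the weighted sum yields a score that still has to be thresholded; that difference anticipates the "graded vs. all-or-nothing" distinction in the output dimension.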
Learning-based solutions
• Exploit both schema and data information
• Can exploit previous matching efforts
• Examples:
  • SemInt system
  • LSD system
  • iMAP system
  • Autoplex
  • Automatch
• SemInt: uses a neural-network learning approach; it matches schema elements based on attribute specifications and statistics of data content
• LSD: employs Naive Bayes over data instances, and develops a novel learning solution that exploits the hierarchical nature of XML data
• iMAP: matches the schemas of two sources by analyzing the descriptions of objects that are found in both sources
• Autoplex and Automatch: use a Naive Bayes learning approach that exploits data instances to match elements
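The instance-based Naive Bayes idea behind LSD, Autoplex and Automatch can be sketched as follows. This is a generic textbook multinomial Naive Bayes with Laplace smoothing, not the code of any of those systems; the `NaiveBayesMatcher` class, the whitespace/comma tokenizer and the training values are all hypothetical. A source column is assigned to the mediated-schema element whose model best explains its instance values.

```python
import math
from collections import Counter


def tokens(value: str) -> list[str]:
    return value.lower().replace(",", " ").split()


class NaiveBayesMatcher:
    """Multinomial Naive Bayes over the tokens of data instances (Laplace smoothing)."""

    def fit(self, training: dict[str, list[str]]) -> "NaiveBayesMatcher":
        # training maps each mediated-schema element to example instance values
        self.counts = {
            label: Counter(t for v in values for t in tokens(v))
            for label, values in training.items()
        }
        n_values = sum(len(values) for values in training.values())
        self.prior = {label: len(values) / n_values for label, values in training.items()}
        self.vocab = {t for c in self.counts.values() for t in c}
        return self

    def log_score(self, label: str, column: list[str]) -> float:
        c = self.counts[label]
        denom = sum(c.values()) + len(self.vocab)
        score = math.log(self.prior[label])
        for value in column:
            for t in tokens(value):
                score += math.log((c[t] + 1) / denom)
        return score

    def predict(self, column: list[str]) -> str:
        """Assign a source column to the element whose model best explains its values."""
        return max(self.counts, key=lambda label: self.log_score(label, column))


matcher = NaiveBayesMatcher().fit({
    "address": ["12 Main St, Seattle", "5 Oak Ave, Portland", "77 Pine St, Seattle"],
    "description": ["great location near park", "spacious house with garden",
                    "fantastic views near lake"],
    "agent_phone": ["206 555 1234", "503 555 9876"],
})
print(matcher.predict(["9 Elm St, Seattle", "40 Lake Ave, Portland"]))
print(matcher.predict(["charming house near lake"]))
```

Because the model is trained on data rather than names, it can match columns whose headers say nothing useful, which is exactly what rule-based solutions cannot do.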
Matching dimensions
• Input dimension
• Process dimension
• Output dimension
Input dimension
• Concerns the kind of input on which algorithms operate
• First dimension: algorithms depend on the data/conceptual model in which the ontologies or schemas are expressed
• Second dimension: algorithms depend on the kind of data they exploit
  • Different approaches exploit different information in the input data/conceptual models:
    • Schema-level information
    • Instance data
    • Both
Process dimension
• Matching processes can be classified by their general properties
• One such property is the approximate or exact nature of the computation
  • Exact algorithms compute the absolute solution to a problem
  • Approximate algorithms sacrifice exactness for performance
• Three large classes, depending on whether a technique relies on the intrinsic input, on external resources, or on some semantic theory:
  • Syntactic
  • External
  • Semantic
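The exact/approximate trade-off can be illustrated on one common sub-task: choosing a one-to-one set of correspondences that maximizes total similarity. The sketch below is an assumption-laden example, not a method from the slides: the similarity matrix is made up, `exact_one_to_one` brute-forces every assignment (exact, but factorial time, and it assumes no more source entities than target entities), and `greedy_one_to_one` takes the best remaining pair each time (fast, but not always optimal).

```python
from itertools import permutations


def exact_one_to_one(sim: list[list[float]]):
    """Exact: try every injective assignment of rows to columns (O(n!) time).
    Assumes there are no more source entities (rows) than target entities (columns)."""
    n_src, n_tgt = len(sim), len(sim[0])
    best, best_score = [], float("-inf")
    for perm in permutations(range(n_tgt), n_src):
        score = sum(sim[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = list(enumerate(perm)), score
    return best, best_score


def greedy_one_to_one(sim: list[list[float]]):
    """Approximate: repeatedly take the most similar pair whose entities are still free."""
    candidates = sorted(
        ((sim[i][j], i, j) for i in range(len(sim)) for j in range(len(sim[0]))),
        reverse=True,
    )
    used_src, used_tgt, chosen = set(), set(), []
    for _, i, j in candidates:
        if i not in used_src and j not in used_tgt:
            chosen.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(chosen), sum(sim[i][j] for i, j in chosen)


# Rows: source entities; columns: target entities; values: similarity scores.
sim = [[0.9, 0.8],
       [0.8, 0.1]]
print(exact_one_to_one(sim))   # total 1.6
print(greedy_one_to_one(sim))  # total 1.0: greedy grabs the 0.9 pair first
```

On this small matrix the greedy choice of the single best pair (0.9) blocks the better overall assignment, which is precisely the exactness an approximate algorithm gives up for speed.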
Output dimension
• Concerns the form of the result a matcher produces:
  • Is it a one-to-one correspondence between entities?
  • Is any relation suitable?
  • Does it have to be a final mapping element?
• Graded answer: e.g., a correspondence holds with 98% confidence or with 4/5 probability
• All-or-nothing answer: a correspondence either definitely holds or does not
• Some approaches determine correspondences using distance measures
• Kinds of relations between entities a system can provide:
  • Equivalence
  • Subsumption
  • Incompatibility
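The difference between a graded and an all-or-nothing answer can be shown with a small conversion. In this sketch the correspondences, their confidences and the 0.8 threshold are hypothetical; `to_all_or_nothing` is an illustrative helper, not a standard API. The relation field carries the three relation kinds listed above.

```python
# A graded answer: (source entity, target entity, relation, confidence).
# All entries are hypothetical.
graded = [
    ("price", "list-price", "equivalence", 0.98),
    ("Agent", "Person", "subsumption", 0.80),       # 0.80 = 4/5
    ("Houses", "Agents", "incompatibility", 0.90),
    ("location", "agent-address", "equivalence", 0.35),
]


def to_all_or_nothing(graded, threshold=0.8):
    """Turn a graded answer into an all-or-nothing one: keep a correspondence
    (without its grade) if its confidence reaches the threshold, drop it otherwise."""
    return [(src, tgt, rel) for src, tgt, rel, conf in graded if conf >= threshold]


for correspondence in to_all_or_nothing(graded):
    print(correspondence)
```

The threshold is a design choice: raising it trades recall for precision, and the grades themselves are lost once the answer is made all-or-nothing.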
Schema-based matching techniques: classification of elementary schema-based matching approaches (figure)
• Granularity/input interpretation layer, with the basic techniques of each branch:
  • Element-level, syntactic: string-based, language-based, constraint-based
  • Element-level, external: linguistic resources, alignment reuse, upper level formal ontologies
  • Structure-level, syntactic: graph-based, taxonomy-based
  • Structure-level, external: repository of structures
  • Structure-level, semantic: model-based
• Kind of input layer: terminological (string-based and linguistic), structural (internal and relational), semantic
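As an example of an element-level, syntactic, string-based technique from the classification, two element names can be compared by edit distance. This is a minimal sketch: `levenshtein` is the standard dynamic-programming edit distance, while `name_similarity` (normalizing by the longer name, ignoring case) is one common but assumed choice of normalization.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(name_similarity("ListPrice", "list-price"))  # one insertion over 10 characters
```

Being element-level, this looks only at the two names; it cannot tell that `Houses` and `Listings` may denote the same kind of table.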
Element-level vs. structure-level
• Element-level matching techniques compute mapping elements by analyzing entities in isolation, ignoring their relations with other entities
• Structure-level techniques compute mapping elements by analyzing how entities appear together in a structure
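The contrast between the two levels can be sketched with two similarity functions. Both are illustrative assumptions rather than a published algorithm: `element_sim` compares names in isolation using `difflib.SequenceMatcher`, while `structure_sim` scores two nodes by the share of their children that have a counterpart above a hypothetical 0.8 threshold. The table and column names are invented.

```python
from difflib import SequenceMatcher


def element_sim(a: str, b: str) -> float:
    """Element-level: compares two names in isolation."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structure_sim(children_a: list[str], children_b: list[str], threshold: float = 0.8) -> float:
    """Structure-level: share of children on both sides that have a
    sufficiently similar counterpart among the other node's children."""
    if not children_a or not children_b:
        return 0.0
    hits_a = sum(any(element_sim(x, y) >= threshold for y in children_b) for x in children_a)
    hits_b = sum(any(element_sim(x, y) >= threshold for x in children_a) for y in children_b)
    return (hits_a + hits_b) / (len(children_a) + len(children_b))


# Hypothetical tables: different names, but the same kind of columns.
houses = ["location", "price", "agent_id"]
listings = ["Location", "Price", "AgentID"]
print(round(element_sim("Houses", "Listings"), 2))  # low: names look unrelated
print(structure_sim(houses, listings))              # high: their contents line up
```

Here the element-level score alone would reject the pair, while the structure-level score reveals that the two tables hold the same kind of data.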
Internal vs. external techniques
• Internal
  • Exploit only the information that comes with the input schemas/ontologies
  • Syntactic interpretation of the input
  • Semantic interpretation of the input
• External
  • Exploit auxiliary (external) resources of a domain to interpret the input
  • Resources:
    • Human input
    • A thesaurus expressing the relationships between terms
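An external technique consults a resource outside the input schemas. In the sketch below, a tiny hand-made `THESAURUS` dict stands in for a real linguistic resource; its entries and the `external_match` helper are hypothetical. Synonyms are mapped to equivalence and broader terms to subsumption, which lets the matcher return more than plain equivalence.

```python
# Hypothetical, hand-made thesaurus standing in for an external linguistic resource.
# (x, y): "broader" means y is a broader term than x.
THESAURUS = {
    ("house", "dwelling"): "synonym",
    ("house", "building"): "broader",
    ("agent", "person"): "broader",
}


def external_match(a: str, b: str):
    """Return '=' (equivalent), '<' (a is more specific than b),
    '>' (a is more general than b) or None if the thesaurus says nothing."""
    a, b = a.lower(), b.lower()
    if a == b or "synonym" in (THESAURUS.get((a, b)), THESAURUS.get((b, a))):
        return "="
    if THESAURUS.get((a, b)) == "broader":
        return "<"
    if THESAURUS.get((b, a)) == "broader":
        return ">"
    return None


print(external_match("House", "Dwelling"), external_match("Agent", "Person"),
      external_match("Building", "House"), external_match("price", "house"))
```

The names `House` and `Dwelling` share almost no characters, so a purely string-based internal technique would miss a correspondence that the external resource supplies directly.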
Schema matching vs. ontology matching: differences
• Database schemas often do not provide explicit semantics for their data
  • Semantics is usually specified explicitly at design time, but it often does not become part of the database specification
  • Schema matching is therefore usually performed with the help of techniques that try to guess the meaning encoded in the schemas
• Ontologies are logical systems that themselves obey some formal semantics
  • Ontology matching primarily tries to exploit the knowledge explicitly encoded in the ontologies
Schema matching vs. ontology matching: commonalities
• Ontologies and schemas are similar in the sense that they:
  • Provide a vocabulary of terms that describes a domain of interest
  • Constrain the meaning of the terms used in the vocabulary
• Schemas and ontologies are both found in environments such as the Semantic Web
Sources:
• Natalya F. Noy: Semantic Integration: A Survey of Ontology-Based Approaches
• AnHai Doan, Alon Y. Halevy: Semantic Integration Research in the Database Community: A Brief Survey
• P. Shvaiko, J. Euzenat: A Survey of Schema-Based Matching Approaches
• G. Antoniou, F. van Harmelen: A Semantic Web Primer
• R. Araújo, H. Sofia Pinto: Towards Semantics-Based Ontology Similarity
• H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann and S. Hübner: Ontology-Based Integration of Information: A Survey of Existing Approaches
• E. Rahm, P. A. Bernstein: A Survey of Approaches to Automatic Schema Matching