330 likes | 374 Views
Structural and Semantic Heterogeneity in Database Schema Integration. SIXTH Conference of Department of Computing Wednesday 4 May 2005 David George. Presentation Content. Why is Integration necessary? Evolution of Integration Approaches
E N D
Structural and Semantic Heterogeneityin Database Schema Integration SIXTH Conference of Department of Computing Wednesday 4 May 2005 David George
Presentation Content • Why is Integration necessary? • Evolution of Integration Approaches • Barriers to Integration - Structural and Semantic Heterogeneity • New opportunity - Ontology and the Semantic Web
Drivers for Data Integration • Global organisations with distributed data. • Organisations having legacy and new databases. • Organisational change e.g. business re-engineering and acquisitions. • Autonomous departments with disconnected systems requiring interoperability e.g. Financial Services. • Business Intelligence requiring: • decision-support systems. • customer analysis and marketing strategies. • data mining
Schema Integration Local DB schema Global schema integration Queries Global Schema Schema 1 Queryinput Global:Local Schema Mapping Schema 2 Queryoutput Schema n
Evolution in Integrations Global Domain Agreements Knowledge Digital media Visual/Spatial/Temporal Data [Kiosk/Geographic/Flights/Forecasting] Focus – Semantics Domain-specific Information Structured, Semi-structured Text repositories Focus - Syntax of data type, format & Schema constructs Data Structured DBs, Files Focus – Systems & Communications System Schema Integration Common Data Models Virtual Integration Single Ontologies Multiple ontologies, Inter-ontological Local Task Schemas Federated DBS Federated IS (inc Mediators) Information Brokering 1985 1995
Federated DBMS approach External Schema 1.1 External Schema 1.2 External Schema 2.1 Common Data Model Federated Schema 1 Federated Schema 2 Export Schema 1.1 Export Schema 2.1 Export Schema 2.2 etc Component Schema 1 Component Schema 2 Local Schema 1 Local Schema 2 Application: Integration of business databases Component DBS 1 Component DBS 2
FDBMS schema architecture External Schema 1.1 External Schema 1.2 External Schema 2.1 External Schema 2.2 Common Data Model Federated Schema 1 Federated Schema 2 Export Schema 1.1 Export Schema 2.1 Export Schema 2.2 Export Schema 3.1 Component Schema 1 Component Schema 2 Component Schema 3 Local Schema 1 Local Schema 2 Local Schema 3 Component DBS 1 Component DBS 2 Component DBS 3
Mediator/Wrapper (Virtual integration) Network Internet Query Translation Mediator Application: Integrated access to Heterogeneous data
Multiple Worlds Information Mediation Crime Statistics Demographics Property Sales School Rankings Information Brokering Search Query: “Find detached houses for sale under £300k with 2 bathrooms, 3 bedrooms, a local school rated in the upper quartile of govt. league tables, in a district with below-average crime rate and a socio-economically diverse population?”
Barriers to Integration - Structural & Semantic Heterogeneity
Recipe for Heterogeneity and Conflict Conceptualisations of the real world are influenced by the designers view of the Concept and Context to be modelled
Schema Type Conflicts Name Address Title Name Book Topics Publisher Book-Topic Pub-Book Title Title Publication Keyword Code Code Pub-Keyword Research Area Publisher
Incoherence in Cardinality Invoice Invoice Invoice 1 n 1 Inv:Order Inv:Order Inv:Order m 1 m Order Order Order
Generalisation/Specialisation Conflicts Schema 1 Schema 2 Student Student S_Type (ID#,Name,Type,Course) U-graduate Graduate (ID#,Name,Course) (ID#,Name,Course) i.e. U-graduate in schema 2 represented at more general level in schema 1
Specialisation Classification Conflicts Employee Criteria inconsistency Employee Gender Role Sex Service Person Person Degrees inconsistency <30 30-60 >60 <25 25-55 >55 Customer Customer Characterisation inconsistency Child Adult Senior Teen Parent G-Parent
Aggregation Conflicts Aggregation used in schema 1 is represented by a set-of entities in schema 2 Also NB: mapping exists in only one direction Convoy Schema 1 (ID#, Av_Weight, Location) Ship (ID#, Weight, Location, Captain) Schema 2
Aggregation Conflicts (contd) Employee(department) vs. Employee(division(department)) Component class of collection CarType(carMake, carDesign) vs. FamilyType(carMake, saloonSize) Aggregation Specialisation Person(address, tel) vs. Person(street, city, county, tel) Aggregation Composition
Schematic Discrepancies Data:Attribute:Entity conflicts Stock DB1 Value (Date, StockCode, ClosePrice) (stockItem) Stock DB2 (Date, StockItem1, StockItem2, …StockItemn) (ClosePrice) Attribute StockItem1DB3 StockItemnDB3 (Date, ClosePrice) (Date, ClosePrice) ..….. Entity Entity
Ontologies in Computing • Formal vocabulary of a “universe of discourse”. • Ontologies define: • concepts and their attributes • relationships between concepts • constraints on those relationships “An Ontology is a formal, explicit specification of a shared conceptualization” (Gruber, 1993 & Borst, 1997)
Biblio-Thing Document Book Miscellaneous-Publication Proceedings Edited-Book Technical-Manual Cartographic-Map Computer-Program Doctoral-Thesis Newspaper Journal Master-Thesis Magazine Bibliographic Data Ontology (extract) Agent Person Organization Author Publisher University Thesis Periodical-Publication http://www.ksl.stanford.edu/knowledge-sharing/ontologies/html/
Types of “ontologies” • DE BRUIJN, J. (2003) Using Ontologies - Enabling Knowledge Sharing and Reuse on the Semantic Web [online]. Innsbruck, Austria, DERI – Digital Enterprise Research Institute. Available from: http://www.deri.ie/publications/techpapers/documents/DERI-TR-2003-10-29.pdf. [Accessed 15 February 2005]. • Value restrictions: values of properties are restricted (e.g. by a datatype). • General logic constraints: values may be constrained by using values from other properties. • First-order logic constraints: very expressive constraints between relationships such as: • disjoint classes, inverse relationships, part-whole relationships.
Semantic Web DE BRUIJN, J. (2003) Using Ontologies - Enabling Knowledge Sharing and Reuse on the Semantic Web [online]. Innsbruck, Austria, DERI – Digital Enterprise Research Institute. Available from: http://www.deri.ie/publications/techpapers/documents/DERI-TR-2003-10-29.pdf. [Accessed 15 February 2005].
Semantic Web Tower OWL: Clients in S1same-As Customers in S2 RDFS: person X is a LivingPerson RDF: person X is named “Bill". OWL Ontology Language “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation” Tim Berners-Lee et al., 2001
RDF Example Object, Attribute, Value Triple: Predicate, subject, object
Evolution in Interoperability Knowledge Global Domain Understanding comprehensive metadata and ontology approaches Digital media Visual/Spacio-Temporal Modelling Scientific/Engineering I n t e r o p e r a b i l i t y Information Key focus on: Semantics & more domain-specific Multi-modal sys Structured, Semi-structured (HTML etc) Text repositories Understanding use of metadata & schematic heterogeneities Data Key focus on: Syntax – data types/format Structure – schema constructs O-O sys Structured DBs, Files System E-R sys Key focus: Systems & Comms. Common Data Models Schema translation & Integration Schematic & metadata relationships, Wrappers, Single Ontologies Multiple ontologies, Inter-ontological, Metadata standards Local Schema MDBMS / Federated DBS Fed. Inf. Systems / Mediators Mediator / Information Brokering 1985 1995 Architectures