Contrasting typical SW and DB approaches to semantic integration. Arnon Rosenthal. Two versions of a common problem. Schema matching ≈ Align classes/properties in ontology Two meta-models, similar core problem Start with either: Two domain models Two schemas (for systems)
Contrasting typical SW and DB approaches to semantic integration Arnon Rosenthal
Two versions of a common problem
• Schema matching ≈ aligning classes/properties in an ontology: two meta-models, similar core problem
• Start with either:
  • Two domain models
  • Two schemas (for systems)
  • One domain model and one schema
• Goal: identify the relationships between
  • their concepts
  • their instance sets
  • same-as, IS-A, and “usable for” seem the main ones helpful to a “customer”
• May need to transform to make things match
Decades have elapsed!
• Database side: survey of schema matching research (Batini et al., 1986)
  • Target schema may be constructed from inputs
  • Envisioned end product is a SQL view
  • Focus is on “where can we find clues”
• Sem-Web precursors: ISI, MCC – domain model (in logic) plus articulation axioms
  • Constraints are within the logic
  • Reasoning-based; each project had its own formalism
Obvious question: Why no robust products yet?
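To make “envisioned end product is a SQL view” concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two source tables (hr_staff, payroll_emp), their columns, and the target view name (employee) are hypothetical, invented only for illustration; the unit conversion stands in for the kind of transformation that may be needed to make things match.

```python
import sqlite3


def build_integrated_view():
    """Integrate two hypothetical source schemas through one SQL view."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Two made-up source schemas that describe the same real-world concept.
    cur.execute("CREATE TABLE hr_staff (staff_id INTEGER, full_name TEXT, salary_usd REAL)")
    cur.execute("CREATE TABLE payroll_emp (emp_no INTEGER, name TEXT, salary_kusd REAL)")
    cur.execute("INSERT INTO hr_staff VALUES (1, 'Ada', 90000.0)")
    cur.execute("INSERT INTO payroll_emp VALUES (2, 'Bob', 85.5)")

    # The mapping is the view definition: rename columns, then convert
    # thousands of dollars to dollars so both sources agree on units.
    cur.execute(
        """
        CREATE VIEW employee AS
        SELECT staff_id AS id, full_name AS name, salary_usd FROM hr_staff
        UNION ALL
        SELECT emp_no, name, salary_kusd * 1000.0 FROM payroll_emp
        """
    )

    rows = cur.execute("SELECT id, name, salary_usd FROM employee ORDER BY id").fetchall()
    conn.close()
    return rows
```

Calling build_integrated_view() returns one row per source record, already expressed in the target schema. Note how the whole correspondence lives inside one query, which is convenient to execute but bundles many small concept-level relationships into a single definition.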
Leaping ahead to my conclusions (SW competitor)
• For enterprise systems today, lean toward DB and XML tools, unless you really exploit ontologies’ greater expressive power (value taxonomies, IS-A)
• Maturing sem-web environments will (by definition) import knowledge from big data integration products
Correspondence topology
• Direct approaches
• Neutral form approaches (can be multiple)
[Figure label: Domain model]
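One common argument for the neutral-form topology is the number of mappings that must be written and maintained. The sketch below is not from the slides; it assumes each direct mapping is bidirectional (one per unordered pair of systems) and that every system maps once to each neutral form. The function names are made up for this illustration.

```python
def direct_mappings(n_systems: int) -> int:
    """Direct approach: one mapping per unordered pair of systems."""
    return n_systems * (n_systems - 1) // 2


def neutral_mappings(n_systems: int, n_neutral_forms: int = 1) -> int:
    """Neutral-form approach: each system maps once to each neutral form."""
    return n_systems * n_neutral_forms


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(n, direct_mappings(n), neutral_mappings(n))
```

Under these assumptions the direct approach grows quadratically while the neutral approach grows linearly, though the count says nothing about how hard each individual mapping is to write.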
Emerging work – not associated with systems
• Multiple intermediaries
• Which to use when creating? describing?
[Figure labels: Domain model 1, Domain model 2, Domain model 3]
Compare typical DB vs. AI approaches (1): Formalisms to describe concepts & relationships
DB (Schema)
• Basic unit: relation or tree scheme
• Record is a good chunk for storage or display
• Sets are present, implicit
• Describe a system or a physical message
AI (Ontology)
• Basic unit: atomic concept (object or property)
• Small chunks easy to relate & reuse
• Describe a domain model
• Robust for multiple uses
Compare typical DB vs. AI approaches (2): Formalisms to describe concepts & relationships
DB (Schema)
• Direct relationships and flows between systems
• Instant gratification (funding is usually for an application, not for integration)
• Differences in real data lead to improved definitions
• Tools examine the data
• $billion industry: feature-rich, scalable tools
AI (Ontology)
• Relate via neutral definitions
• Reuse is easier
• Will administrators understand “foreign” or abstract concept definitions?
Compare typical AI vs. DB approaches (5)
DB
• Homegrown logic
• Even simple Datalogs won’t interoperate
• Extensible
• Mappings are in popular query languages
• Efficient: parallel, query optimizers
• Deployable, e.g., change management
AI
• OWL has both theory and tool communities
• Extensible
• Execute by inference engine? Not tuned to query processor strengths
Compare typical AI vs. DB approaches (3)
DB
• Relationships among sets, via {informal or formal logic assertions} or query language
• More powerful (data exchange logicians)
• Terminology: TGD = ∀∃ (tuple-generating dependency)
• View definitions are big: hard to edit (and to reuse)
AI
• Relationships among concepts: “Usable_for”
• Relationships use a formalism very similar to ontologies
• IS-A is “native”, i.e., part of the regular model
• IS-A logically merges the ontologies
• OWL is insufficient, rule languages overkill
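For readers unfamiliar with the TGD terminology, the standard textbook shape of a tuple-generating dependency is sketched below, followed by two instances. The predicate names (Manager, Employee, WorksIn) are hypothetical and do not come from the slides. The first instance has no existential variable and reads exactly like an IS-A assertion, which hints at why the two communities' relationship formalisms look so similar.

```latex
% General form of a tuple-generating dependency (TGD):
\forall \bar{x}\,\bigl(\varphi(\bar{x}) \rightarrow \exists \bar{y}\,\psi(\bar{x},\bar{y})\bigr)

% Instance with no existential variables (reads as Manager IS-A Employee):
\forall x\,\bigl(\mathit{Manager}(x) \rightarrow \mathit{Employee}(x)\bigr)

% Instance with an existential variable (every employee works in some department):
\forall x\,\bigl(\mathit{Employee}(x) \rightarrow \exists d\,\mathit{WorksIn}(x,d)\bigr)
```

Here φ and ψ are conjunctions of atoms over the source and target vocabularies, respectively.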
Compare typical AI vs. DB approaches (4)
DB
• Exchange semantics examined from user viewpoint, precise
• Hard to learn or communicate
• Discards tuples unnecessarily?
AI
• Exchange semantics: whatever my engine infers!!!
• Is this tolerable? Why (not)?
How can they combine
• Formalism:
  • OWL ontologies
  • Need a standard construct for “can be used for”: super-property ≈ tuple-generating dependencies
• Direct or via neutral model:
  • Mix and match, share info and infer over both
• Execution environment: DBMS
  • Parallel, query optimization, deployment
  • Already bilingual (SQL, XML), add RDF when it reaches critical mass ($Bs)
Why “Alignment” research is hard to transfer
• Conspicuous lack of widely used products, from either community
• Aligners/matchers automate some work of an integration engineer, but can’t solve 90+% of a major “customer” problem
• Without a robust mediator, there ain’t no customer!
• Lesson: Touch the end users, downstream (someone outside the IT dept)
  • 95% reduction in their work as schemas evolve
  • Generate code for end users 80% faster
Summary
• Two communities addressing similar problems
• More standards, cleaner formalisms on the Sem-Web side
• More pragmatics and richer suites on the DB side
  • Largely formalism-independent, could be imported, esp. “Instant gratification”