This research explores the need for ontology mapping in the Semantic Web and the importance of background knowledge and approximation techniques. The study investigates the impact of context knowledge on mapping and presents experimental results using multiple ontologies.
Ontology mapping needs context & approximation Frank van Harmelen Vrije Universiteit Amsterdam
Or: • How to make ontology-mapping less like data-base integration • and more like a social conversation
Three obvious intuitions • Ontology mapping needs background knowledge • The Semantic Web needs ontology mapping • Ontology mapping needs approximation
Which Semantic Web? • Version 1: "Semantic Web as Web of Data" (TBL) • recipe: expose databases on the web, use RDF, integrate • meta-data from: • expressing DB schema semantics in machine-interpretable ways • enable integration and unexpected re-use
Which Semantic Web? • Version 2: “Enrichment of the current Web” • recipe: annotate, classify, index • meta-data from: • automatically producing markup: named-entity recognition, concept extraction, tagging, etc. • enable personalisation, search, browse, …
Which Semantic Web? • Version 1: “Semantic Web as Web of Data” (data-oriented) • Version 2: “Enrichment of the current Web” (user-oriented) • Different use-cases • Different techniques • Different users
Which Semantic Web? • Version 1: “Semantic Web as Web of Data” (agreement between sources) • Version 2: “Enrichment of the current Web” (agreement between source & user) • But both need ontologies for semantic agreement
Ontology research is almost done… • we know what they are: “consensual, formalised models of a domain” • we know how to make and maintain them (methods, tools, experience) • we know how to deploy them (search, personalisation, data-integration, …) • Main remaining open questions: • Automatic construction (learning) • Automatic mapping (integration)
Three obvious intuitions • The Semantic Web needs ontology mapping • Ontology mapping needs background knowledge (e.g. Ph.D. student ?= AIO) • Ontology mapping needs approximation (e.g. young researcher ?≈ post-doc)
This work with Zharko Aleksovski & Michel Klein
The general idea • anchoring: link terms of the source and the target to a background-knowledge ontology • inference: reason over the background knowledge • mapping: read off the resulting source-to-target correspondences
A realistic example • Two Amsterdam hospitals (OLVG, AMC) • Two Intensive Care Units, different vocabularies • Want to compare quality of care • OLVG-1400: • 1400 terms in a flat list • used in the first 24 hours of stay • some implicit hierarchy (e.g. 6 types of Diabetes Mellitus) • some redundancy (spelling mistakes) • AMC: similar list, but from the other hospital
Context ontology used • DICE: • 2500 concepts (5000 terms), 4500 links • Formalised in DL • five main categories: • tractus (e.g. nervous_system, respiratory_system) • aetiology (e.g. virus, poisoning) • abnormality (e.g. fracture, tumor) • action (e.g. biopsy, observation, removal) • anatomic_location (e.g. lungs, skin)
Baseline: linguistic methods • Combine lexical analysis with hierarchical structure • 313 suggested matches, around 70% correct • 209 suggested matches, around 90% correct • High precision, low recall (“the easy cases”)
Now use background knowledge • source: OLVG (1400 terms, flat) • target: AMC (1400 terms, flat) • background: DICE (2500 concepts, 4500 links) • anchor both term lists in DICE; inference in DICE yields the OLVG-to-AMC mapping
Anchoring strength • Anchoring = substring + trivial morphology
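A minimal sketch of this anchoring step (substring match plus trivial morphology). The normalisation rule (lowercasing, stripping a plural "s") and the toy vocabulary are illustrative assumptions, not the real DICE data:

```python
# Hypothetical anchoring: an ICU term is anchored to a background-knowledge
# concept when one is a substring of the other after trivial morphological
# normalisation. Names and vocabularies below are made up for illustration.

def normalise(term: str) -> str:
    """Lowercase and apply trivial morphology (strip a plural 's')."""
    t = term.lower().strip()
    return t[:-1] if t.endswith("s") else t

def anchors(source_term: str, background_concepts: list[str]) -> list[str]:
    """Return the background concepts the source term anchors to."""
    s = normalise(source_term)
    return [c for c in background_concepts
            if normalise(c) in s or s in normalise(c)]

print(anchors("Diabetes Mellitus type II", ["diabetes mellitus", "fracture"]))
```

Real anchoring would add more morphology (stemming, spelling normalisation), but the substring test above is the core of the "anchoring strength" measured here.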
Experimental results • Source & target = flat lists of ±1400 ICU terms each • Background = DICE (2500 concepts, in DL) • Manual Gold Standard (n=200)
Adding more context • Only lexical • DICE (2500 concepts) • MeSH (22000 concepts) • ICD-10 (11000 concepts) • Anchoring strength: [table not reproduced]
Results with multiple ontologies • Monotonic improvement • Independent of order • Linear increase of cost
Exploiting structure • anchor CRISP and MeSH in FMA; inference in FMA yields the CRISP-to-MeSH mapping • CRISP: 738 concepts, broader-than • MeSH: 1475 concepts, broader-than • FMA: 75,000 concepts, 160 relation-types (we used: is-a & part-of)
Using the structure or not? • inference pattern: (S ⊑ₐ B) ∧ (B ⊑ B′) ∧ (B′ ⊑ₐ T) ⇒ (S ⊑ᵢ T), where ⊑ₐ is an anchoring link, ⊑ a relation in the background knowledge, and ⊑ᵢ the inferred mapping • variants for the middle step B ⊑ B′: • No use of structure • Only stated is-a & part-of • Transitive chains of is-a, and transitive chains of part-of • Transitive chains of is-a and part-of combined • One chain of part-of before one chain of is-a
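The chain-based variants above can be sketched as a reachability check over the stated edges of the background ontology: S maps to T when S anchors to B, T anchors to B′, and B reaches B′ through an allowed chain of relations. The toy FMA-style edges below are made up:

```python
# Illustrative sketch: does a chain of allowed relations connect two anchored
# background concepts? The edge data is invented for the example.
from collections import deque

def reaches(edges: dict, start: str, goal: str, allowed: set) -> bool:
    """BFS: is there a chain from start to goal using only allowed relations?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for rel, nxt in edges.get(node, []):
            if rel in allowed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Toy background knowledge: finger part-of hand, hand part-of arm, arm is-a limb.
edges = {"finger": [("part-of", "hand")],
         "hand":   [("part-of", "arm")],
         "arm":    [("is-a", "limb")]}

print(reaches(edges, "finger", "arm", {"part-of"}))           # pure part-of chain
print(reaches(edges, "finger", "limb", {"part-of", "is-a"}))  # mixed chain
```

Restricting the `allowed` set (only stated edges, only is-a chains, mixed chains, etc.) corresponds to the variants compared in the experiment.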
Matching results (CRISP to MeSH) (Gold Standard, n=30)
Three obvious intuitions • The Semantic Web needs ontology mapping • Ontology mapping needs background knowledge • Ontology mapping needs approximation (e.g. young researcher ?≈ post-doc)
This work with Zharko Aleksovski, Risto Gligorov & Warner ten Kate
Approximating subsumptions (and hence mappings) • query: A ⊑ B ? • decompose B = B1 ⊓ B2 ⊓ B3 • ask instead: A ⊑ B1, A ⊑ B2, A ⊑ B3 ?
Approximating subsumptions • Use “Google distance” to decide which subproblems are reasonable to focus on • Normalised Google Distance: NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log M − min{log f(x), log f(y)}) • where f(x) is the number of Google hits for x, f(x, y) is the number of Google hits for the pair of search terms x and y, and M is the number of web pages indexed by Google • a symmetric, conditional-probability-style measure of co-occurrence • an estimate of semantic distance • an estimate of the “contribution” of each Bi to B1 ⊓ B2 ⊓ B3
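The NGD formula, NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log M − min{log f(x), log f(y)}), can be sketched directly. The hit counts below are hard-coded stand-ins; a real implementation would query a search engine for f(x), f(y) and f(x, y):

```python
# Sketch of Normalised Google Distance over assumed hit counts.
from math import log

def ngd(fx: float, fy: float, fxy: float, M: float) -> float:
    """NGD = (max(log f(x), log f(y)) - log f(x,y))
             / (log M - min(log f(x), log f(y)))"""
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(M) - min(lx, ly))

# Toy numbers: x and y co-occur often, so the distance comes out small.
print(ngd(fx=1_000_000, fy=500_000, fxy=400_000, M=10_000_000_000))
```

Terms that always co-occur get distance 0; terms that never co-occur get a large distance, which is what makes NGD usable as a cheap estimate of semantic relatedness.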
Google distance • [figure: terms plotted by Google distance: animal, plant, sheep, cow, vegetarian, mad cow]
Google for sloppy matching • Algorithm for A ⊑ B (with B = B1 ⊓ B2 ⊓ B3): • determine NGD(B, Bi) = εi for i = 1, 2, 3 • incrementally: • increase the sloppiness threshold ε • allow ignoring subsumptions A ⊑ Bi with εi ≤ ε • match if the remaining A ⊑ Bj all hold
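The sloppy-matching loop can be sketched as follows. Note an assumption: the comparison symbols were lost in this transcript, so the code reads the rule as "ignore conjuncts whose weight εi is at or below the threshold ε", which matches the monotonicity property (ε = 0 classical, ε = 1 trivial). The weights and the `holds` oracle are invented:

```python
# Hedged sketch of sloppy matching: each conjunct Bi of B carries a weight
# eps_i = NGD(B, Bi) (assumed values below); at threshold eps, conjuncts with
# eps_i <= eps are ignored, and A matches B if all remaining A ⊑ Bi hold.

def sloppy_match(weights: dict, holds: dict, eps: float) -> bool:
    """Match if A ⊑ Bi holds for every conjunct not ignored at threshold eps."""
    return all(holds[b] for b, w in weights.items() if w > eps)

weights = {"B1": 0.2, "B2": 0.6, "B3": 0.9}      # eps_i = NGD(B, Bi), assumed
holds   = {"B1": False, "B2": True, "B3": True}  # which A ⊑ Bi actually hold

print(sloppy_match(weights, holds, eps=0.0))  # classical: every conjunct counts
print(sloppy_match(weights, holds, eps=0.3))  # B1 ignored, match succeeds
```

Raising `eps` can only remove conjuncts from consideration, so the set of matches grows monotonically, exactly the property claimed on the next slide.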
Properties of sloppy matching • When the sloppiness threshold ε goes up, the set of matches grows monotonically • ε = 0: classical matching • ε = 1: trivial matching • Ideally: compute εi such that: • desirable matches become true at low ε • undesirable matches become true only at high ε • Use random selection of the Bi to ignore as a baseline
Experiments in music domain • Seven web directories of music genres: ArtistGigs, CD baby, CDNow (Amazon.com), All Music Guide, MusicMoz, Artist Direct Network, Yahoo • sizes range from 96 to 2410 classes, depths from 2 to 7 levels • terms range from very sloppy to good
Experiment (16-05-2006) • Manual Gold Standard, N=50 random pairs • [chart: precision and recall of NGD-based sloppy matching vs. classical matching and the random baseline, for sloppiness thresholds around ε = 0.5]
Three obvious intuitions • Ontology mapping needs background knowledge • The Semantic Web needs ontology mapping • Ontology mapping needs approximation
So that • shared context & approximation make ontology-mapping a bit more like a social conversation
Future: Distributed/P2P setting • the same anchoring / inference / mapping picture over background knowledge, now in a distributed / peer-to-peer setting
Frank.van.Harmelen@cs.vu.nl http://www.cs.vu.nl/~frankh Questions & discussion