Ontology mapping needs context & approximation Frank van Harmelen Vrije Universiteit Amsterdam
Or: • How to make ontology mapping less like database integration • and more like a social conversation
Three obvious intuitions • The Semantic Web needs ontology mapping • Ontology mapping needs background knowledge • Ontology mapping needs approximation
Which Semantic Web? • Version 1: "Semantic Web as Web of Data" (TBL) • recipe: expose databases on the web, use RDF, integrate • meta-data from: expressing DB schema semantics in machine-interpretable ways • enables integration and unexpected re-use
Which Semantic Web? • Version 2: "Enrichment of the current Web" • recipe: annotate, classify, index • meta-data from: automatically producing markup (named-entity recognition, concept extraction, tagging, etc.) • enables personalisation, search, browse, …
Which Semantic Web? • Version 1: "Semantic Web as Web of Data" (data-oriented) • Version 2: "Enrichment of the current Web" (user-oriented) • Different use-cases • Different techniques • Different users
Which Semantic Web? • Version 1: "Semantic Web as Web of Data" • Version 2: "Enrichment of the current Web" • But both need ontologies for semantic agreement: between sources (Version 1), and between source & user (Version 2)
Ontology research is almost done… • we know what they are: "consensual, formalised models of a domain" • we know how to make and maintain them (methods, tools, experience) • we know how to deploy them (search, personalisation, data-integration, …) Main remaining open questions • Automatic construction (learning) • Automatic mapping (integration)
Three obvious intuitions • The Semantic Web needs ontology mapping • Ontology mapping needs background knowledge (e.g. AIO ?= Ph.D. student) • Ontology mapping needs approximation (e.g. post-doc ?≈ young researcher)
This work with Zharko Aleksovski & Michel Klein
The general idea • anchor the source and the target vocabulary in a shared background-knowledge ontology • derive the mapping between source and target by inference over the background knowledge [diagram: source and target anchored into background knowledge; mapping obtained via inference]
A realistic example • Two Amsterdam hospitals (OLVG, AMC) • Two Intensive Care Units, different vocabularies • Want to compare quality of care • OLVG-1400: • 1400 terms in a flat list • used in the first 24 hours of stay • some implicit hierarchy (e.g. 6 types of Diabetes Mellitus) • some redundancy (spelling mistakes) • AMC: similar list, but from a different hospital
Context ontology used • DICE: • 2500 concepts (5000 terms), 4500 links • formalised in DL • five main categories: • tractus (e.g. nervous_system, respiratory_system) • aetiology (e.g. virus, poisoning) • abnormality (e.g. fracture, tumor) • action (e.g. biopsy, observation, removal) • anatomic_location (e.g. lungs, skin)
Baseline: linguistic methods • Combine lexical analysis with hierarchical structure • 313 suggested matches, around 70% correct • 209 suggested matches, around 90% correct • High precision, low recall ("the easy cases")
Now use background knowledge • OLVG (1400 terms, flat) and AMC (1400 terms, flat) are anchored into DICE (2500 concepts, 4500 links) • mapping derived by inference [diagram: the two flat lists anchored into DICE; mapping via inference]
Anchoring strength • Anchoring = substring + trivial morphology
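The anchoring step above (substring matching plus trivial morphology) can be sketched as follows; the normalization rule (lowercasing and stripping a plural "s") and the term lists are illustrative assumptions, not the exact procedure used in the experiments:

```python
def normalize(term):
    """Trivial morphology: lowercase, trim, strip a plural 's' (illustrative only)."""
    term = term.lower().strip()
    return term[:-1] if term.endswith("s") else term

def anchors(source_term, background_terms):
    """Anchor a source term to every background term that is a substring of it
    (or vice versa) after trivial morphological normalization."""
    s = normalize(source_term)
    result = []
    for b in background_terms:
        bn = normalize(b)
        if bn in s or s in bn:
            result.append(b)
    return result

# hypothetical ICU term against hypothetical DICE concepts
print(anchors("diabetes mellitus type II",
              ["Diabetes Mellitus", "Fractures", "Tumor"]))  # → ['Diabetes Mellitus']
```

Such purely lexical anchoring is deliberately cheap; its quality ("anchoring strength") determines how much the background knowledge can contribute.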
Experimental results • Source & target = flat lists of ±1400 ICU terms each • Background = DICE (2300 concepts in DL) • Manual Gold Standard (n=200)
Adding more context • Only lexical • DICE (2500 concepts) • MeSH (22000 concepts) • ICD-10 (11000 concepts) • Anchoring strength: [table not recovered]
Results with multiple ontologies • Monotonic improvement • Independent of order • Linear increase of cost
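The three claims above follow from treating the per-ontology results as sets and taking their union; a minimal sketch, where the per-ontology matcher and the term pairs are made-up placeholders:

```python
def combined_mappings(match_with, background_ontologies):
    """Union the mappings found via each background ontology.
    Set union never removes a mapping (monotonic improvement), is
    order-independent, and costs one matching run per ontology (linear)."""
    result = set()
    for onto in background_ontologies:
        result |= match_with(onto)
    return result

# hypothetical per-ontology results, for illustration only
found = {
    "DICE":   {("cardiac arrest", "heart failure")},
    "MeSH":   {("cardiac arrest", "heart failure"), ("RBC", "erythrocyte")},
    "ICD-10": {("sepsis", "septicaemia")},
}
print(combined_mappings(found.get, ["DICE", "MeSH", "ICD-10"]))
```

Because union is commutative and idempotent, shuffling the ontology order yields the same final mapping set.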
Exploiting structure • CRISP: 738 concepts, broader-than • MeSH: 1475 concepts, broader-than • FMA: 75,000 concepts, 160 relation-types (we used: is-a & part-of) [diagram: CRISP (738) and MeSH (1475) anchored into FMA (75,000); mapping via inference]
Using the structure or not? • (S ⊑a B) & (B ⊑ B′) & (B′ ⊑a T) ⇒ (S ⊑i T), where ⊑a is an anchoring link and ⊑i an inferred mapping • No use of structure • Only stated is-a & part-of • Transitive chains of is-a, and (separately) transitive chains of part-of • Transitive chains mixing is-a and part-of • One chain of part-of before one chain of is-a
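The inference rule above can be sketched as a reachability test in the background ontology. This is a simplified sketch assuming the background links are given as a child-to-parents dictionary and that anchors are precomputed; all concept names are hypothetical:

```python
from collections import deque

def subsumers(concept, links):
    """All background concepts reachable upward from `concept` via the
    stated links (transitive closure)."""
    seen, queue = set(), deque([concept])
    while queue:
        for parent in links.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def inferred_mappings(source_anchors, target_anchors, links):
    """(S anchored below B) & (B transitively below B') & (B' anchored below T)
    => infer the mapping S -> T."""
    mappings = set()
    for s, s_concepts in source_anchors.items():
        for t, t_concepts in target_anchors.items():
            for b in s_concepts:
                # b itself or any of its subsumers may carry the target anchor
                if ({b} | subsumers(b, links)) & set(t_concepts):
                    mappings.add((s, t))
    return mappings

# hypothetical anchors and background links
links = {"Diabetes": ["Endocrine disease"]}
source_anchors = {"DM type II": ["Diabetes"]}
target_anchors = {"Endocrine disorder": ["Endocrine disease"]}
print(inferred_mappings(source_anchors, target_anchors, links))
```

Restricting `links` to stated links only, to is-a chains only, or to mixed is-a/part-of chains reproduces the variants listed above.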
Matching results (CRISP to MeSH) (Gold Standard, n=30) [results table not recovered]
Three obvious intuitions • The Semantic Web needs ontology mapping • Ontology mapping needs background knowledge • Ontology mapping needs approximation (e.g. post-doc ?≈ young researcher)
This work with Zharko Aleksovski, Risto Gligorov & Warner ten Kate
Approximating subsumptions (and hence mappings) • query: A ⊑ B? • B = B1 ⊓ B2 ⊓ B3 • so test A ⊑ B1, A ⊑ B2, A ⊑ B3 separately [diagram: A inside the overlap of B1, B2, B3]
Approximating subsumptions • Use "Google distance" to decide which subproblems are reasonable to focus on • Normalized Google Distance: NGD(x,y) = (max(log f(x), log f(y)) − log f(x,y)) / (log M − min(log f(x), log f(y))), where f(x) is the number of Google hits for x, f(x,y) is the number of Google hits for the tuple of search terms x and y, and M is the number of web pages indexed by Google • symmetric conditional probability of co-occurrence • estimate of semantic distance • estimate of "contribution" to B1 ⊓ B2 ⊓ B3
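Given raw hit counts, the distance is a one-liner; the counts in the example are made up, since real values would come from a search-engine API:

```python
from math import log

def ngd(fx, fy, fxy, M):
    """Normalized Google Distance from hit counts:
    NGD(x,y) = (max(log fx, log fy) - log fxy) / (log M - min(log fx, log fy))."""
    if fxy == 0:
        return float("inf")  # terms never co-occur: maximally distant
    numerator = max(log(fx), log(fy)) - log(fxy)
    denominator = log(M) - min(log(fx), log(fy))
    return numerator / denominator

# hypothetical hit counts: f(x)=10000, f(y)=8000, f(x,y)=4000, M=10^10
print(ngd(10_000, 8_000, 4_000, 10**10))  # ≈ 0.065, i.e. closely related terms
```

Distance 0 means the terms always co-occur; values near or above 1 indicate essentially unrelated terms.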
Google distance [plot: pairwise Google distances among the terms animal, plant, sheep, cow, vegetarian, mad cow]
Google for sloppy matching • Algorithm for A ⊑ B (B = B1 ⊓ B2 ⊓ B3) • determine NGD(B, Bi) = λi, i = 1,2,3 • incrementally: • increase sloppiness threshold λ • allow ignoring the A ⊑ Bi with λi ≤ λ • match if the remaining A ⊑ Bj hold
Properties of sloppy matching • When the sloppiness threshold λ goes up, the set of matches grows monotonically • λ=0: classical matching • λ=1: trivial matching • Ideally: compute λi such that: • desirable matches become true at low λ • undesirable matches become true only at high λ • Use random selection of Bi as baseline
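A minimal sketch of the sloppy-matching test described above; the ignore criterion (λi ≤ λ, per the algorithm slide) is applied to precomputed NGD values, and the λ values and subsumption results below are invented for illustration:

```python
def sloppy_match(lambdas, holds, threshold):
    """Sloppy matching of A against B = B1 ⊓ ... ⊓ Bn.
    lambdas[i] = NGD(B, Bi); holds[i] = whether A ⊑ Bi was established.
    Conjuncts with lambdas[i] <= threshold may be ignored; A matches B
    iff every conjunct that may NOT be ignored actually holds."""
    return all(h for lam, h in zip(lambdas, holds) if lam > threshold)

# hypothetical conjunct distances and subsumption outcomes
lambdas = [0.1, 0.4, 0.8]
holds   = [True, False, True]
print([sloppy_match(lambdas, holds, t) for t in (0.0, 0.5, 1.0)])
# → [False, True, True]: the match set grows monotonically with the threshold
```

At threshold 0 no conjunct may be dropped (classical matching); at threshold 1 every conjunct may be dropped, so everything matches trivially.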
Experiments in music domain • seven real-world music vocabularies, term quality ranging from very sloppy to good: • ArtistGigs: 403 classes, depth 2 levels • CDNow (Amazon.com): 1073 classes, depth 3 levels • All Music Guide: 222 classes, depth 7 levels • MusicMoz: 2410 classes, depth 5 levels • Yahoo: 382 classes, depth 4 levels • CD Baby: 96 classes, depth 2 levels • Artist Direct Network: 465 classes, depth 2 levels
Experiment • Manual Gold Standard, N=50, random pairs [chart: precision and recall of classical matching, NGD-based sloppy matching (λ≈0.5-0.53), and the random baseline]
Three obvious intuitions • Ontology mapping needs background knowledge • The Semantic Web needs ontology mapping • Ontology mapping needs approximation
So that • shared context & approximation make ontology-mapping a bit more like a social conversation
Future: distributed/P2P setting [diagram: source and target anchored into background knowledge, mapping via inference, now between peers]
Frank.van.Harmelen@cs.vu.nl http://www.cs.vu.nl/~frankh Questions & discussion