This research explores the need for ontology mapping in the Semantic Web and the importance of background knowledge and approximation techniques. The study investigates the impact of context knowledge on mapping and presents experimental results using multiple ontologies.
Ontology mapping needs context & approximation Frank van Harmelen Vrije Universiteit Amsterdam
Or: • How to make ontology-mapping less like data-base integration • and more like a social conversation
Three obvious intuitions • Ontology mapping needs background knowledge • The Semantic Web needs ontology mapping • Ontology mapping needs approximation
Which Semantic Web? • Version 1: "Semantic Web as Web of Data" (TBL) • recipe: expose databases on the web, use RDF, integrate • meta-data from: • expressing DB schema semantics in machine-interpretable ways • enable integration and unexpected re-use
Which Semantic Web? • Version 2: “Enrichment of the current Web” • recipe: annotate, classify, index • meta-data from: • automatically producing markup: named-entity recognition, concept extraction, tagging, etc. • enable personalisation, search, browse, …
Which Semantic Web? • Version 1: “Semantic Web as Web of Data” (data-oriented) • Version 2: “Enrichment of the current Web” (user-oriented) • Different use-cases • Different techniques • Different users
Which Semantic Web? • Version 1: “Semantic Web as Web of Data” (agreement between sources) • Version 2: “Enrichment of the current Web” (agreement between source & user) • But both need ontologies for semantic agreement
Ontology research is almost done… • we know what they are: “consensual, formalised models of a domain” • we know how to make and maintain them (methods, tools, experience) • we know how to deploy them (search, personalisation, data-integration, …) • Main remaining open questions: • Automatic construction (learning) • Automatic mapping (integration)
Three obvious intuitions • The Semantic Web needs ontology mapping • Ontology mapping needs background knowledge (e.g. Ph.D. student ?= AIO) • Ontology mapping needs approximation (e.g. young researcher ?≈ post-doc)
This work with Zharko Aleksovski & Michel Klein
The general idea • anchoring: link terms of the source and the target to a background-knowledge ontology • inference: reason over the background knowledge • mapping: read off the resulting source-to-target correspondences
A realistic example • Two Amsterdam hospitals (OLVG, AMC) • Two Intensive Care Units, different vocabularies • Want to compare quality of care • OLVG-1400: • 1400 terms in a flat list • used in the first 24 hours of stay • some implicit hierarchy (e.g. 6 types of Diabetes Mellitus) • some redundancy (spelling mistakes) • AMC: similar list, but from the other hospital
Context ontology used • DICE: • 2500 concepts (5000 terms), 4500 links • Formalised in DL • five main categories: • tractus (e.g. nervous_system, respiratory_system) • aetiology (e.g. virus, poisoning) • abnormality (e.g. fracture, tumor) • action (e.g. biopsy, observation, removal) • anatomic_location (e.g. lungs, skin)
Baseline: linguistic methods • Combine lexical analysis with hierarchical structure • 313 suggested matches, around 70% correct • 209 suggested matches, around 90% correct • High precision, low recall (“the easy cases”)
Now use background knowledge • source: OLVG (1400 terms, flat) • target: AMC (1400 terms, flat) • background: DICE (2500 concepts, 4500 links) • anchor both term lists in DICE; inference in DICE yields the OLVG-to-AMC mapping
Anchoring strength • Anchoring = substring + trivial morphology
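A minimal sketch of this anchoring step (substring match plus trivial morphology). The normalisation rule (lowercasing, stripping a plural "s") and the toy vocabulary are illustrative assumptions, not the real DICE data:

```python
# Hypothetical anchoring: an ICU term is anchored to a background-knowledge
# concept when one is a substring of the other after trivial morphological
# normalisation. Names and vocabularies below are made up for illustration.

def normalise(term: str) -> str:
    """Lowercase and apply trivial morphology (strip a plural 's')."""
    t = term.lower().strip()
    return t[:-1] if t.endswith("s") else t

def anchors(source_term: str, background_concepts: list[str]) -> list[str]:
    """Return the background concepts the source term anchors to."""
    s = normalise(source_term)
    return [c for c in background_concepts
            if normalise(c) in s or s in normalise(c)]

print(anchors("Diabetes Mellitus type II", ["diabetes mellitus", "fracture"]))
```

Real anchoring would add more morphology (stemming, spelling normalisation), but the substring test above is the core of the "anchoring strength" measured here.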
Experimental results • Source & target = flat lists of ±1400 ICU terms each • Background = DICE (2500 concepts, in DL) • Manual Gold Standard (n=200)
Adding more context • Only lexical • DICE (2500 concepts) • MeSH (22000 concepts) • ICD-10 (11000 concepts) • Anchoring strength: [table not reproduced]
Results with multiple ontologies • Monotonic improvement • Independent of order • Linear increase of cost
Exploiting structure • anchor CRISP and MeSH in FMA; inference in FMA yields the CRISP-to-MeSH mapping • CRISP: 738 concepts, broader-than • MeSH: 1475 concepts, broader-than • FMA: 75,000 concepts, 160 relation-types (we used: is-a & part-of)
Using the structure or not? • inference pattern: (S ⊑ₐ B) ∧ (B ⊑ B′) ∧ (B′ ⊑ₐ T) ⇒ (S ⊑ᵢ T), where ⊑ₐ is an anchoring link, ⊑ a relation in the background knowledge, and ⊑ᵢ the inferred mapping • variants for the middle step B ⊑ B′: • No use of structure • Only stated is-a & part-of • Transitive chains of is-a, and transitive chains of part-of • Transitive chains of is-a and part-of combined • One chain of part-of before one chain of is-a
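The chain-based variants above can be sketched as a reachability check over the stated edges of the background ontology: S maps to T when S anchors to B, T anchors to B′, and B reaches B′ through an allowed chain of relations. The toy FMA-style edges below are made up:

```python
# Illustrative sketch: does a chain of allowed relations connect two anchored
# background concepts? The edge data is invented for the example.
from collections import deque

def reaches(edges: dict, start: str, goal: str, allowed: set) -> bool:
    """BFS: is there a chain from start to goal using only allowed relations?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for rel, nxt in edges.get(node, []):
            if rel in allowed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Toy background knowledge: finger part-of hand, hand part-of arm, arm is-a limb.
edges = {"finger": [("part-of", "hand")],
         "hand":   [("part-of", "arm")],
         "arm":    [("is-a", "limb")]}

print(reaches(edges, "finger", "arm", {"part-of"}))           # pure part-of chain
print(reaches(edges, "finger", "limb", {"part-of", "is-a"}))  # mixed chain
```

Restricting the `allowed` set (only stated edges, only is-a chains, mixed chains, etc.) corresponds to the variants compared in the experiment.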
Matching results (CRISP to MeSH) (Gold Standard, n=30)
Three obvious intuitions • The Semantic Web needs ontology mapping • Ontology mapping needs background knowledge • Ontology mapping needs approximation (e.g. young researcher ?≈ post-doc)
This work with Zharko Aleksovski, Risto Gligorov & Warner ten Kate
Approximating subsumptions (and hence mappings) • query: A ⊑ B ? • decompose B = B1 ⊓ B2 ⊓ B3 • ask instead: A ⊑ B1, A ⊑ B2, A ⊑ B3 ?
Approximating subsumptions • Use “Google distance” to decide which subproblems are reasonable to focus on • Normalised Google Distance: NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log M − min{log f(x), log f(y)}) • where f(x) is the number of Google hits for x, f(x, y) is the number of Google hits for the pair of search terms x and y, and M is the number of web pages indexed by Google • a symmetric, conditional-probability-style measure of co-occurrence • an estimate of semantic distance • an estimate of the “contribution” of each Bi to B1 ⊓ B2 ⊓ B3
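The NGD formula, NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log M − min{log f(x), log f(y)}), can be sketched directly. The hit counts below are hard-coded stand-ins; a real implementation would query a search engine for f(x), f(y) and f(x, y):

```python
# Sketch of Normalised Google Distance over assumed hit counts.
from math import log

def ngd(fx: float, fy: float, fxy: float, M: float) -> float:
    """NGD = (max(log f(x), log f(y)) - log f(x,y))
             / (log M - min(log f(x), log f(y)))"""
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(M) - min(lx, ly))

# Toy numbers: x and y co-occur often, so the distance comes out small.
print(ngd(fx=1_000_000, fy=500_000, fxy=400_000, M=10_000_000_000))
```

Terms that always co-occur get distance 0; terms that never co-occur get a large distance, which is what makes NGD usable as a cheap estimate of semantic relatedness.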
Google distance • [figure: terms plotted by Google distance: animal, plant, sheep, cow, vegetarian, mad cow]
Google for sloppy matching • Algorithm for A ⊑ B (with B = B1 ⊓ B2 ⊓ B3): • determine NGD(B, Bi) = εi for i = 1, 2, 3 • incrementally: • increase the sloppiness threshold ε • allow ignoring subsumptions A ⊑ Bi with εi ≤ ε • match if the remaining A ⊑ Bj all hold
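The sloppy-matching loop can be sketched as follows. Note an assumption: the comparison symbols were lost in this transcript, so the code reads the rule as "ignore conjuncts whose weight εi is at or below the threshold ε", which matches the monotonicity property (ε = 0 classical, ε = 1 trivial). The weights and the `holds` oracle are invented:

```python
# Hedged sketch of sloppy matching: each conjunct Bi of B carries a weight
# eps_i = NGD(B, Bi) (assumed values below); at threshold eps, conjuncts with
# eps_i <= eps are ignored, and A matches B if all remaining A ⊑ Bi hold.

def sloppy_match(weights: dict, holds: dict, eps: float) -> bool:
    """Match if A ⊑ Bi holds for every conjunct not ignored at threshold eps."""
    return all(holds[b] for b, w in weights.items() if w > eps)

weights = {"B1": 0.2, "B2": 0.6, "B3": 0.9}      # eps_i = NGD(B, Bi), assumed
holds   = {"B1": False, "B2": True, "B3": True}  # which A ⊑ Bi actually hold

print(sloppy_match(weights, holds, eps=0.0))  # classical: every conjunct counts
print(sloppy_match(weights, holds, eps=0.3))  # B1 ignored, match succeeds
```

Raising `eps` can only remove conjuncts from consideration, so the set of matches grows monotonically, exactly the property claimed on the next slide.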
Properties of sloppy matching • When the sloppiness threshold ε goes up, the set of matches grows monotonically • ε = 0: classical matching • ε = 1: trivial matching • Ideally: compute εi such that: • desirable matches become true at low ε • undesirable matches become true only at high ε • Use random selection of the Bi to ignore as a baseline
Experiments in music domain • Seven web directories of music genres: ArtistGigs, CD baby, CDNow (Amazon.com), All Music Guide, MusicMoz, Artist Direct Network, Yahoo • sizes range from 96 to 2410 classes, depths from 2 to 7 levels • terms range from very sloppy to good
Experiment (16-05-2006) • Manual Gold Standard, N=50 random pairs • [chart: precision and recall of NGD-based sloppy matching vs. classical matching and the random baseline, for sloppiness thresholds around ε = 0.5]
Three obvious intuitions • Ontology mapping needs background knowledge • The Semantic Web needs ontology mapping • Ontology mapping needs approximation
So that • shared context & approximation make ontology-mapping a bit more like a social conversation
Future: Distributed/P2P setting • the same anchoring / inference / mapping picture over background knowledge, now in a distributed / peer-to-peer setting
Frank.van.Harmelen@cs.vu.nl http://www.cs.vu.nl/~frankh Questions & discussion