Terminology WG
John Madden, Duke University
Document to Terminology
• Bottom-up
  • Infer classes/properties from a sample corpus
  • Author as a local vocabulary
  • Approximate to standard vocabularies
  • Lee F.: “good-enough modeling”
• Top-down
  • Subset existing terms from standard vocabularies
  • Specialize terms as needed
  • Apply to new documents
Bottom-up (1): Pharma ontology
• Collect detailed use-case narratives
• Lexical analysis: rank nouns and verbs by frequency
• Select the top 20 semes to represent class candidates
• 20 × 20 grid of classes: each of the 400 cells represents a potential relation/property
• Results bootstrap a custom OWL ontology
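The lexical-analysis and grid steps can be sketched as follows. This is a minimal sketch under stated assumptions: it counts plain word tokens against a small hypothetical stopword list rather than doing real part-of-speech tagging for nouns and verbs, and the sample narratives are invented for illustration.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical stopword list. A real pipeline would use a part-of-speech
# tagger to keep only nouns and verbs; this sketch just drops common words.
STOPWORDS = {"a", "an", "and", "are", "be", "by", "each", "for", "in",
             "is", "of", "on", "the", "to", "with"}


def rank_terms(narratives, top_n=20):
    """Return the top_n most frequent non-stopword tokens (class candidates)."""
    counts = Counter()
    for text in narratives:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return [term for term, _ in counts.most_common(top_n)]


def relation_grid(classes):
    """Each ordered (subject, object) pair is one candidate relation/property cell."""
    return list(product(classes, repeat=2))


if __name__ == "__main__":
    # Invented use-case narratives, for illustration only.
    stories = [
        "The chemist submits a compound to the assay team.",
        "The assay team reports assay results for each compound.",
    ]
    classes = rank_terms(stories, top_n=3)
    print(classes)
    print(len(relation_grid(classes)))  # 3 x 3 = 9 candidate cells
```

With `top_n=20`, `relation_grid` yields the 20 × 20 = 400 cells from the slide, including the diagonal (a class related to itself); each cell still needs a modeler to decide whether a meaningful relation exists.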
Bottom-up (2): Cancer checklists
• The College of American Pathologists (CAP) prescribes best-practice data items on cancer diagnostic reports
• Semi-formal data items → formal document description as XML Schema → XML vocabulary
• XML vocabulary → RDF by definable rules (GRDDL transform)
• What RDF vocabulary to target? Easiest: a local vocabulary that closely matches the XML vocabulary
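A "definable rule" from XML to RDF targeting a local vocabulary might look like the sketch below. GRDDL itself associates an XSLT transform with the source document; this sketch swaps in plain Python (`xml.etree.ElementTree`) to show the same kind of rule. The namespace, element names, and report URI base are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical local vocabulary that mirrors the XML element names one-to-one.
LOCAL_NS = "http://example.org/cap-checklist#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical checklist instance; real CAP element names will differ.
SAMPLE = """<report id="r1">
  <tumorSite>Upper outer quadrant</tumorSite>
  <histologicType>Invasive ductal carcinoma</histologicType>
</report>"""


def literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def xml_to_ntriples(xml_text, base="http://example.org/reports/"):
    """Rule 1: the root element becomes a typed subject.
    Rule 2: each child element becomes a triple whose predicate is the
    element name in the local vocabulary and whose object is its text."""
    root = ET.fromstring(xml_text)
    subject = f"<{base}{root.get('id')}>"
    triples = [f"{subject} <{RDF_TYPE}> <{LOCAL_NS}{root.tag}> ."]
    for child in root:
        value = literal((child.text or "").strip())
        triples.append(f"{subject} <{LOCAL_NS}{child.tag}> {value} .")
    return triples


if __name__ == "__main__":
    print("\n".join(xml_to_ntriples(SAMPLE)))
```

Because predicates are just element names in a local namespace, the triples match the surface syntax of the checklist; mapping them to a standard vocabulary is a separate, later step.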
Bottom-up: challenges
• “Anaphora”
  • e.g. “Tumor site: Upper outer quadrant” (upper outer quadrant of what? The referent is established elsewhere in the report)
Bottom-up: plus and minus
• Advantages
  • Triples match surface syntax
  • Domain-specific “microvocabulary”
  • Document originators control semantics
• Disadvantages
  • Vocabulary needs to “catch on” to be useful to outsiders
  • No “free ride” on knowledge encoded in existing ontologies (without mapping)
Top-down: Terminology today
• Moderate number of large, evolved terminologies
• Adapted for specific business-process contexts
• Each separately, centrally curated
• Typically hierarchical, with various expressivities
• Generally do not have a computable semantics
• Uncommon to mix vocabularies
• Each uses its own identifier format; much existing software is “hard-coded” (or “firm-coded”) to expect/accept a very specific format
Top-down: SKOS advantages
• Standard representation of the core concepts of traditional terminologies
• Abstracts away differences in expressivity
• Treats terms as subject headings
• No need to assume a particular world-model
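To make the "subject heading" view concrete, here is a minimal sketch that renders one terminology entry as a SKOS concept in Turtle. The scheme URI and codes are hypothetical, and labels are assumed to contain no double quotes (no escaping is done).

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def term_to_skos(scheme_uri, code, label, parent_code=None):
    """Render one entry as a skos:Concept in Turtle.

    skos:broader records the hierarchy without asserting rdfs:subClassOf,
    so no particular world-model is implied. Assumes label has no quotes."""
    lines = [
        f"<{scheme_uri}#{code}> a skos:Concept ;",
        f"    skos:inScheme <{scheme_uri}> ;",
        f'    skos:notation "{code}" ;',
        f'    skos:prefLabel "{label}"@en',
    ]
    if parent_code is not None:
        lines[-1] += " ;"
        lines.append(f"    skos:broader <{scheme_uri}#{parent_code}>")
    lines[-1] += " ."  # terminate the Turtle statement
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical scheme URI and codes, for illustration only.
    scheme = "http://example.org/myTerminology"
    print(SKOS_PREFIX)
    print(term_to_skos(scheme, "T100", "Infectious disease"))
    print(term_to_skos(scheme, "T110", "Influenza", parent_code="T100"))
```

The design choice mirrors the slide: `skos:broader` is deliberately weaker than `rdfs:subClassOf`, which is what lets terminologies of very different expressivity share one representation.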
Bottom-up meets Top-down
• How do you link local ontologies (domain models) with global ontologies?
• Desiderata
  • Support upward mappings to multiple global ontologies
  • No single “correct” mapping
  • Support ontology discovery
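One way to honor all three desiderata is a many-to-many mapping table that keeps a relation on every link and is indexed in both directions. This is a sketch under assumptions: the class name and all local/global URIs are hypothetical; the SKOS mapping properties (`closeMatch`, `broadMatch`) are real.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"


class MappingTable:
    """Many-to-many mappings from local terms to global ontology terms.

    Each mapping carries its own relation (e.g. skos:closeMatch), so one
    local term can point at several global ontologies at once and no single
    mapping is treated as the correct one."""

    def __init__(self):
        self._up = defaultdict(set)    # local term  -> {(relation, global term)}
        self._down = defaultdict(set)  # global term -> {(relation, local term)}

    def add(self, local, relation, global_term):
        self._up[local].add((relation, global_term))
        self._down[global_term].add((relation, local))

    def upward(self, local):
        """All global terms a local term maps to."""
        return sorted(self._up.get(local, ()))

    def discover(self, global_term):
        """Discovery: which local models reference this global term?"""
        return sorted(self._down.get(global_term, ()))


if __name__ == "__main__":
    # Hypothetical local and global URIs.
    table = MappingTable()
    local = "http://example.org/local#TumorSite"
    table.add(local, SKOS + "closeMatch", "http://example.org/globalA#AnatomicSite")
    table.add(local, SKOS + "broadMatch", "http://example.org/globalB#BodyLocation")
    print(table.upward(local))
    print(table.discover("http://example.org/globalA#AnatomicSite"))
```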
Prerequisite for mapping: URIs
• Uniform (not “Unique”) Resource Identifier
• It would be nice (in some ways) if it were the unique identifier for a resource…
• … but it isn’t. Pragmatically, it is an identifier for (hopefully) a unique resource.
Do I need a standard URI? Maybe.
• You want to terminology-enable your graphs, but you don’t necessarily need the inferences RIGHT NOW
• “URIs are cheap!”: add hooks for “late semantic binding”
  • just make a blank node (_:snomedInfluenza)
  • make a named node in the xml:base namespace (of the document)
  • explicitly use a stand-in namespace you control (http://john.madden.name/mySNOMED#)
  • or get yourself a PURL domain (http://purl.oclc.org/mine/10/01/24/snomed#)
• You can always assert owl:sameAs, rdfs:seeAlso, or skos:closeMatch later on, when you finally reason
• Limitation: not all datastores have reasoners running
  • Of course, YOU have a reasoner running on YOUR datastore
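Late semantic binding can be sketched with a set of triples: data is written against the stand-in namespace from the slide, and a single `owl:sameAs` triple is added later. The "influenza" local name, the `STANDARD` target URI, and the patient/predicate URIs are hypothetical placeholders, and `smush` is a deliberately naive stand-in for a real reasoner.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Stand-in namespace from the slide; the "influenza" local name is illustrative.
STANDIN = "http://john.madden.name/mySNOMED#"
# Hypothetical placeholder for whatever standard URI is eventually chosen.
STANDARD = "http://example.org/standard-terminology#influenza"


def bind_late(graph, standin_uri, standard_uri):
    """Late semantic binding: add one mapping triple, leave the data untouched."""
    graph.add((standin_uri, OWL_SAME_AS, standard_uri))


def smush(graph):
    """Toy stand-in for a reasoner: replace each node that has a direct
    owl:sameAs link with its target. Transitivity, symmetry, and cycles
    are not handled; a real OWL reasoner does much more."""
    alias = {s: o for s, p, o in graph if p == OWL_SAME_AS}
    return {(alias.get(s, s), p, alias.get(o, o))
            for s, p, o in graph if p != OWL_SAME_AS}


if __name__ == "__main__":
    graph = {("http://example.org/patient/1",
              "http://example.org/hasDiagnosis",
              STANDIN + "influenza")}
    bind_late(graph, STANDIN + "influenza", STANDARD)  # done later, when you reason
    for triple in sorted(smush(graph)):
        print(triple)
```

This also illustrates the limitation on the slide: the mapping only takes effect in a store that runs something like `smush`; other datastores see just the stand-in URI.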
Do I need a standard URI? Maybe.
• You want to terminology-enable your graphs, and you DO need the inferences RIGHT NOW
• Having a standard URI might help, IF the terminology provider serves RDF that you can use (e.g. has a SPARQL endpoint)
• Otherwise, you’re going to have to represent the terminology locally (or find somebody who represents it)
Do I need a standard URI? Maybe.
• You want to make your document “visible” to OTHER datastores that might be using the same standard terminology(ies)
• Here, you’re a bit stuck: you would need a common URI
• Roll-your-own alternatives: get your consortium together and agree on an on-the-fly mapping solution
• Lots of these: integrate with SPARQL solutions (see Lee & Eric); other solutions, other tricks (D2R)
URIs
Can’t we just convince everyone to use just one?
• Absolutely. HL7, an ANSI and ISO standards body, has already established a URI scheme for healthcare terminology:
  • SNOMED CT: urn:oid:2.16.840.1.113883.6.96
• Absolutely. The UMLS has already registered identifiers in the MRSAB table, which can be combined with an appropriate base such as:
  • SNOMED CT: http://www.nlm.nih.gov/research/umls/sab/SCT
• Absolutely. The CAP is (was) clearly the owner and should assign the official SNOMED CT URI:
  • SNOMED CT: http://www.cap.org/snomedct
• Absolutely. IHTSDO is clearly the owner and should assign the official SNOMED CT URI:
  • SNOMED CT: http://ihtsdo.org/snomedct
URIs
• Absolutely. Everyone should use sharedname.org (or sharedterm.org) (or bioportal.org) because it is more better…
  • SNOMED CT: http://sharedname.org/ontology/SCT
  • SNOMED CT: http://bioportal.org/ontology/SCT
• Absolutely. The problem has been solved.
  • SNOMED CT: http://pds.portaldoors.net/snomedct
• …
URIs
Unique URIs are not going to happen any time soon
• Many variants are already in use
• There is still no agreement on formation principles
  • Owner vs. dispatcher (ihtsdo vs. sharednames)
  • Dereferenceable vs. opaque (urn:oid vs. http://...)
(Arguably) the “winner” will be the resource that recognizes this…
SharedTerms
Function 1: Map a URI into a “canonical” URI, f(URI) → URI
Examples:
• http://sharedterm.org/uri/urn:oid:2.16.840.1.113883.6.96 → HTTP 303 (?) → http://ihtsdo.org/snomedct
• http://sharedterm.org/uri/http://www.cap.org/snomedct → HTTP 303 (?) → http://ihtsdo.org/snomedct
• http://sharedterm.org/uri/http://ihtsdo.org/snomedct → HTTP 303 (?) → http://ihtsdo.org/snomedct
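Function 1 can be sketched as an alias table plus a dispatcher that answers with a status code and a target. The slide itself marks the 303 with "(?)", so the status codes below, the 400/404 fallbacks, and the choice of the IHTSDO URI as canonical are illustrative assumptions, not the service's documented behavior. The alias entries are the URI variants listed on these slides; `resolve` and `same_resource` are hypothetical names.

```python
PREFIX = "http://sharedterm.org/uri/"

# Alias table taken from URI variants on these slides. Which URI counts as
# "canonical" is still undecided; picking the IHTSDO URI here is illustrative.
CANONICAL = {
    "urn:oid:2.16.840.1.113883.6.96": "http://ihtsdo.org/snomedct",
    "http://www.nlm.nih.gov/research/umls/sab/SCT": "http://ihtsdo.org/snomedct",
    "http://www.cap.org/snomedct": "http://ihtsdo.org/snomedct",
    "http://ihtsdo.org/snomedct": "http://ihtsdo.org/snomedct",
}


def resolve(request_url):
    """f(URI) -> URI, as the (status, Location) a dispatcher might return."""
    if not request_url.startswith(PREFIX):
        return 400, None
    target = CANONICAL.get(request_url[len(PREFIX):])
    return (303, target) if target else (404, None)


def same_resource(uri_a, uri_b):
    """Identity testing: two variants name the same terminology
    if both are known and share a canonical URI."""
    canon_a = CANONICAL.get(uri_a)
    return canon_a is not None and canon_a == CANONICAL.get(uri_b)


if __name__ == "__main__":
    print(resolve(PREFIX + "urn:oid:2.16.840.1.113883.6.96"))
    print(same_resource("urn:oid:2.16.840.1.113883.6.96",
                        "http://www.cap.org/snomedct"))
```

Note that `same_resource` gives the same answer whichever variant is declared canonical, which is why the choice of canonical URI matters less than having identity testing at all.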
SharedTerms
Deciding what is “canonical” is still an issue, but not actually that important, as…
• http://sharedterm.org/uri/urn:oid:2.16.840.1.113883.6.96
  → http://ihtsdo.org/snomedct
  → http://sharedterm.org/ontology/http://ihtsdo.org/snomedct
  → http://bioportal.org/... (record of snomedct)
… what we really need is
• The resource that describes SNOMED
• Identity testing
The class/entity/“concept code” issue
• A “concept” in SNOMED CT, http://ihtsdo.org/snomedct#12345678, needs to be mapped to a resource entry, http://bioportal.org/4126/12345678
• The trick is separating the code from the namespace
  • “#”, “/”, “:” rules work most of the time
  • but there are still exceptions
    • Embedded slashes
    • Hierarchical identifiers
    • “#” as part of the URI vs. the identifier
    • …
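The "#", "/", ":" rules and one way to handle their exceptions can be sketched as below: a registry of known namespaces is checked first (longest match wins), and the generic rules apply only as a fallback. The registry entry `http://example.org/hierarchical/` is hypothetical; the namespace rewrite uses only the SNOMED CT → BioPortal pair from the slide's example, and the function names are illustrative.

```python
# Hypothetical registry of namespaces whose identifiers contain separators;
# checked before the generic rules (longest namespace wins).
KNOWN_NAMESPACES = ("http://example.org/hierarchical/",)

# Namespace rewrite taken from the slide's example; other pairs would be needed.
NAMESPACE_MAP = {"http://ihtsdo.org/snomedct#": "http://bioportal.org/4126/"}


def split_code(uri, known=KNOWN_NAMESPACES):
    """Split a concept URI into (namespace, code)."""
    for ns in sorted(known, key=len, reverse=True):
        if uri.startswith(ns):
            return ns, uri[len(ns):]
    # Generic rules: split after the last '#', else the last '/', else the last ':'.
    for sep in ("#", "/", ":"):
        idx = uri.rfind(sep)
        if idx != -1:
            return uri[: idx + 1], uri[idx + 1:]
    return "", uri


def to_resource_entry(uri):
    """Rewrite a concept URI into its resource-entry URI, if the namespace is mapped."""
    ns, code = split_code(uri)
    return NAMESPACE_MAP[ns] + code if ns in NAMESPACE_MAP else None


if __name__ == "__main__":
    print(split_code("http://ihtsdo.org/snomedct#12345678"))
    print(to_resource_entry("http://ihtsdo.org/snomedct#12345678"))
    # Embedded slash: with the registry the code is "A01/2" ...
    print(split_code("http://example.org/hierarchical/A01/2"))
    # ... while the generic rule alone would wrongly return code "2".
    print(split_code("http://example.org/hierarchical/A01/2", known=()))
```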
Reference
• http://sharedname.org/w/index.php?title=OntologyAccess