Terminology WG

Explore the fusion of bottom-up and top-down terminology methodologies for efficient ontology modeling. Understand the challenges, benefits, and steps to link local domain models with global ontologies.

vkell
Presentation Transcript


  1. Terminology WG John Madden Duke University

  2. Document to Terminology • Bottom-up • Infer classes/properties from sample corpus • Author as local vocabulary • Approximate to standard vocabularies • Lee F.: “good-enough modeling” • Top-down • Subset existing terms from standard vocabularies • Specialize terms as needed • Apply to new documents

  3. Bottom-up (1): Pharma ontology • Collect detailed use case narratives • Lexical analysis: rank nouns and verbs by frequency • Select top 20 semes to represent class candidates • 20 x 20 grid of classes: each of 400 cells represents potential relation/property • Results bootstrap custom OWL ontology
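
The ranking and grid steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the method actually used for the pharma ontology: it ranks all non-stop-word tokens by frequency instead of running a part-of-speech tagger to isolate nouns and verbs, and the stop-word list, function names, and sample narratives are all hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical stop-word list. A real run would use a POS tagger
# to keep only nouns and verbs, as the slide describes.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "by"}


def class_candidates(narratives, top_n=20):
    """Rank terms by frequency and return the top_n as class candidates."""
    counts = Counter()
    for text in narratives:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_n)]


def relation_grid(classes):
    """Each ordered (subject, object) pair is one cell: a potential property."""
    return list(product(classes, repeat=2))


# Invented sample narratives, for illustration only.
narratives = [
    "The pharmacist dispenses the drug to the patient.",
    "The patient reports a reaction to the drug.",
]
classes = class_candidates(narratives, top_n=3)
grid = relation_grid(classes)
```

With 20 class candidates the grid has 20 x 20 = 400 cells, each of which a modeler would then review by hand as a candidate relation.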

  4. Bottom-up (2): Cancer checklists • College of American Pathologists prescribes best practice data items on cancer diagnostic reports • Semi-formal data items → formal document description as XML Schema → XML vocabulary • XML vocabulary → RDF by definable rules (GRDDL transform) • What RDF vocabulary to target? Easiest: local vocabulary that closely matches the XML vocabulary
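
As a rough sketch of the "XML vocabulary → RDF by definable rules" step, the snippet below applies a single rule: every child element of a report becomes a property in a local vocabulary that mirrors the XML element names. It is a Python stand-in for what a GRDDL (XSLT) transform would do, not a GRDDL implementation. The namespaces, element names, and sample report are assumptions made for illustration and are not taken from the CAP checklists.

```python
import xml.etree.ElementTree as ET

# Hypothetical local vocabulary namespace that mirrors the XML element names.
LOCAL_NS = "http://example.org/checklist#"

# Invented sample document; element names are illustrative only.
SAMPLE = """<report id="r1">
  <tumorSite>Upper outer quadrant</tumorSite>
  <histologicType>Invasive ductal carcinoma</histologicType>
</report>"""


def xml_to_triples(xml_text, base="http://example.org/report/"):
    """Rule: each child element name becomes a property of the report."""
    root = ET.fromstring(xml_text)
    subject = base + root.attrib["id"]
    triples = []
    for child in root:
        predicate = LOCAL_NS + child.tag
        triples.append((subject, predicate, (child.text or "").strip()))
    return triples


triples = xml_to_triples(SAMPLE)
```

Because the local vocabulary matches the XML element names one-to-one, the rule stays trivial. Mapping those local properties to a standard vocabulary is a separate, later step.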

  5. Bottom-up: challenges • “Anaphora” • “Tumor site: Upper outer quadrant”

  6. Bottom-up: plus and minus • Advantages • Triples match surface syntax • Domain-specific “microvocabulary” • Document originators control semantics • Disadvantages • Vocabulary needs to “catch on” to be useful to outsiders • No “free ride” on knowledge encoded in existing ontologies (without mapping)

  7. Top-down: Terminology today • Moderate number of large, evolved terminologies • Adapted for specific business-process contexts • Each separately, centrally curated • Typically hierarchical, various expressivities • Generally do not have a computable semantics • Uncommon to mix vocabularies • Each uses own identifier format, much existing software is “hard-coded” (or “firm-coded”) to expect/accept very specific format

  8. Top-down: SKOS advantages • Standard representation of core concepts of traditional terminologies • Abstracts away differences in expressivity • Treats terms as subject headings • No need to assume a particular world-model

  9. Bottom-up meets Top-down • How do you link local ontologies (domain models) with global ontologies? • Desiderata • Support upwards mappings to multiple global ontologies • No single “correct” mapping • Support ontology discovery

  10. Prerequisite for mapping: URIs • Uniform (not “Unique”) Resource Identifier • It would be nice (in some ways) if it were the unique identifier for a resource… • … but it isn’t. Pragmatically, it is an identifier for (hopefully) a unique resource.

  11. Do I need a standard URI? Maybe. • You want to terminology-enable your graphs, but you don’t necessarily need the inferences RIGHT NOW • “URIs are cheap!”: Add hooks for “late semantic binding” • just make a blank node (_:snomedInfluenza) • make a named node in the xml:base namespace (of the document) • explicitly use a stand-in namespace you control (http://john.madden.name/mySNOMED#) • or get yourself a PURL domain (http://purl.oclc.org/mine/10/01/24/snomed#) • You can always assert owl:sameAs, or rdfs:seeAlso, or skos:closeMatch later on when you finally reason • Limitation: not all datastores have reasoners running • Of course, YOU have a reasoner running on YOUR datastore
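
"Late semantic binding" might look something like the sketch below: the graph is authored against a stand-in URI, and an owl:sameAs link is added only later. Triples are plain Python tuples here, and `resolve` is a naive substitution that stands in for what a reasoner would infer. The local name `Influenza`, the target URI, and the report and property URIs are hypothetical; only the stand-in namespace comes from the slide.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Stand-in namespace from the slide; the local name and the target
# URI are hypothetical placeholders.
STAND_IN = "http://john.madden.name/mySNOMED#Influenza"
TARGET = "http://example.org/standard/snomed#Influenza"

# Graph authored today, before any standard URI has been chosen.
graph = {
    ("http://example.org/report/r1", "http://example.org/vocab#diagnosis", STAND_IN),
}


def bind_late(triples, stand_in, target):
    """Add an owl:sameAs link; the existing triples are left untouched."""
    return triples | {(stand_in, OWL_SAME_AS, target)}


def resolve(triples):
    """Naive substitution standing in for what a reasoner would infer."""
    same = {s: o for s, p, o in triples if p == OWL_SAME_AS}
    return {
        (same.get(s, s), p, same.get(o, o))
        for s, p, o in triples
        if p != OWL_SAME_AS
    }


bound = bind_late(graph, STAND_IN, TARGET)
resolved = resolve(bound)
```

The original triples are never rewritten. The binding is a single extra statement, which is why it only pays off on datastores that actually run a reasoner (or an equivalent rewrite step).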

  12. Do I need a standard URI? Maybe. • You want to terminology-enable your graphs, and you DO need the inferences RIGHT NOW • Having a standard URI might help, IF the terminology provider serves RDF that you can use (e.g. has a SPARQL endpoint) • Otherwise, you’re going to have to represent the terminology locally (or find somebody who represents it)

  13. Do I need a standard URI? Maybe. • You want to make your document “visible” to OTHER datastores that might be using the same standard terminology(ies) • Here, you’re a bit stuck. You would need a common URI • Roll-your own alternatives: get your consortium together and agree on an on-the-fly mapping solution • Lots of these: integrate with SPARQL solutions (see Lee & Eric); other solutions, other tricks (D2R)

  14. URIs: Can’t we just convince everyone to use just one? • Absolutely. HL7, an ANSI and ISO standards body, has already established a URI scheme for healthcare terminology: • SNOMED CT: urn:oid:2.16.840.1.113883.6.96 • Absolutely. The UMLS has already registered identifiers in the MRSAB table, which can be combined with an appropriate base such as: • SNOMED CT: http://www.nlm.nih.gov/research/umls/sab/SCT • Absolutely. The CAP is (was) clearly the owner and should assign the official SNOMED CT URI: • SNOMED CT: http://www.cap.org/snomedct • Absolutely. IHTSDO is clearly the owner and should assign the official SNOMED CT URI: • SNOMED CT: http://ihtsdo.org/snomedct

  15. URIs • Absolutely. Everyone should use sharedname.org (or sharedterm.org) (or bioportal.org) because it is more better… • SNOMED CT: http://sharedname.org/ontology/SCT • SNOMED CT: http://bioportal.org/ontology/SCT • Absolutely. The problem has been solved. • SNOMED CT: http://pds.portaldoors.net/snomedct …

  16. URIs: Unique URIs are not going to happen any time soon • Many variants are already in use • There is still no agreement on formation principles • Owner vs. dispatcher (ihtsdo vs. sharednames) • Dereferenceable vs. opaque (urn:oid vs. http://...) • (Arguably) the “winner” will be the resource that recognizes this…

  17. SharedTerms Function 1: Map a URI into a “canonical” URI: f(URI) → URI • Examples: • http://sharedterm.org/uri/urn:oid:2.16.840.1.113883.6.96 303: (?) → http://ihtsdo.org/snomedct • http://sharedterm.org/uri/http://www.cap.org/snomedct 303: (?) → http://ihtsdo.org/snomedct • http://sharedterm.org/uri/http://ihtsdo.org/snomedct 303: (?) → http://ihtsdo.org/snomedct
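
The mapping f(URI) → URI can be pictured as a simple lookup, sketched below with an in-memory table in place of the HTTP redirect the slide hints at. The variant URIs are the ones listed on the slides. The table, the choice of the ihtsdo.org form as canonical, and the function names are assumptions for illustration, not a description of how sharedterm.org is implemented.

```python
CANONICAL = "http://ihtsdo.org/snomedct"

# Variant URIs taken from the slides; the lookup table itself is an
# illustrative assumption.
VARIANTS = {
    "urn:oid:2.16.840.1.113883.6.96": CANONICAL,
    "http://www.cap.org/snomedct": CANONICAL,
    "http://ihtsdo.org/snomedct": CANONICAL,
}


def canonical_uri(uri):
    """f(URI) -> URI: return the canonical form, or the input if unknown."""
    return VARIANTS.get(uri, uri)


def same_terminology(a, b):
    """Identity test: two URIs match if they share a canonical form."""
    return canonical_uri(a) == canonical_uri(b)
```

With a function like this, the choice of which URI counts as canonical matters less than having some agreed fixed point to compare against, since identity testing then reduces to comparing canonical forms.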

  18. SharedTerms Decision as to what is “canonical” is still an issue, but not actually that important as… http://sharedterm.org/uri/urn:oid:2.16.840.1.113883.6.96 → http://ihtsdo.org/snomedct → http://sharedterm.org/ontology/http://ihtsdo.org/snomedct → http://bioportal.org/... (record of snomedct) … what we really need is • The resource that describes SNOMED • Identity testing

  19. The class/entity/“concept code” issue • A “concept” in SNOMED-CT: http://ihtsdo.org/snomedct#12345678 • Needs to be mapped to a resource entry: http://bioportal.org/4126/12345678 • The trick is separating the code from the namespace • “#”, “/”, “:” rules work most of the time • but there are still exceptions • Embedded slashes • Hierarchical identifiers • “#” as part of the URI vs. identifier • …
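
A minimal sketch of the “#”, “/”, “:” rules, assuming the separators are tried in that order and the split is made at the last occurrence of the first one that leaves a non-empty code. The function name and the precedence order are assumptions; the slide does not specify them.

```python
def split_concept_uri(uri):
    """Split a URI into (namespace, code).

    Tries '#', then '/', then ':' and splits at the last occurrence of
    the first separator that leaves a non-empty code.
    """
    for sep in ("#", "/", ":"):
        head, found, tail = uri.rpartition(sep)
        if found and tail:
            return head + sep, tail
    # No usable separator: treat the whole string as the namespace.
    return uri, ""


namespace, code = split_concept_uri("http://ihtsdo.org/snomedct#12345678")
```

This heuristic fails in exactly the cases the slide lists. For a hierarchical identifier such as a hypothetical `http://example.org/term/A/B`, it returns `B` as the code even if the intended code is `A/B`, so a real service would need per-terminology rules on top of it.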

  20. Reference • http://sharedname.org/w/index.php?title=OntologyAccess