Terminology Curation with the Semantic MediaWiki. Harold Solbrig, Informatics Architect, Apelon, Inc. Terminology and the Semantic MediaWiki, Ecoterm IV – Vienna, 17 – 18 April 2007
The Primary Task Evaluate the roles, categories and organization of the National Cancer Institute (NCI)’s Cancer Thesaurus with respect to: • Upper Level Ontological Principles • ISO TC37 & Related principles As with Ontology construction, it was understood by all parties that this was a process – not a goal.
Approach • Gather appropriate upper level ontologies (BFO, DOLCE, Top Bio, UMLS Semantic Network and OBO Relations Ontology) into a single, readily referenced format • Load NCI Thesaurus into the same format • Multiple parties review, annotate, recommend and categorize • Publish, analyze and evaluate results
Solution By using the Semantic MediaWiki (SMW), we were able to accomplish all of the goals in a (very) reasonable period of time
Discussion We also discovered that, with some extensions, the SMW could be useful for publishing, annotating and cross-referencing other terminological (and other…) resources.
Questions? … just kidding.
Wikis • Community developed • Collaborative • “Organic” – to the very core… • Primary focus (to date) is human consumption • Traceable: provenance automatically recorded, differences, undo and redo.
MediaWiki • http://en.wikipedia.org/wiki/Wiki • Base for Wikipedia and many others… • Key characteristics • Web based editing • Page links • Categories • Templates
MediaWiki • Fully documented using (surprise!) MediaWiki • Rich mechanisms for discussion, curation, export, etc.
Common constructs • [[Train Transport]] – hyperlink to page named “Train_Transport” • ''Italic'', '''Bold''' • * Bullet point • [http://www.w3c.org/ The W3C] – hyperlink • … and much more
Sample Template [Screenshot; callouts: Extension call, Parameter]
Semantic MediaWiki: 3 key extensions to MediaWiki
• Categories == Class
  • PageA … [[Category:X]] → PageA rdf:type category:X
  • Category:Y … [[Category:X]] → category:Y rdfs:subClassOf category:X
• Links == Role
  • PageA … [[PageB]] → PageA … [[hasPart::PageB]]
• Attributes == DataProperty
  • [[population:=32,154,773]]
  • Includes datatypes
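The mapping on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how SMW itself is implemented: the function name `extract_triples` and the tuple output format are assumptions, and only the three annotation forms shown on the slide are handled.

```python
import re

# Matches [[...]] annotations; the inner text is classified below.
ANNOTATION = re.compile(r"\[\[([^\[\]]+)\]\]")


def extract_triples(page, wikitext):
    """Return (subject, predicate, object) tuples for one page's wikitext."""
    is_category_page = page.startswith("Category:")
    triples = []
    for inner in ANNOTATION.findall(wikitext):
        target = inner.split("|", 1)[0].strip()  # drop any display label
        if ":=" in target:
            # Attribute (data property): [[population:=32,154,773]]
            name, value = target.split(":=", 1)
            triples.append((page, name.strip(), value.strip()))
        elif "::" in target:
            # Typed link (role): [[hasPart::PageB]]
            role, obj = target.split("::", 1)
            triples.append((page, role.strip(), obj.strip()))
        elif target.startswith("Category:"):
            # A category on a category page reads as subclassing,
            # on any other page as class membership.
            predicate = "rdfs:subClassOf" if is_category_page else "rdf:type"
            triples.append((page, predicate, target))
        # Plain links such as [[PageB]] carry no typed semantics and are skipped.
    return triples


if __name__ == "__main__":
    text = "In [[Category:X]], has [[hasPart::PageB]], [[population:=32,154,773]]."
    for triple in extract_triples("PageA", text):
        print(triple)
```

An untyped link such as [[PageB]] yields nothing here, which is the point of the slide: the link only becomes a role once it is written as [[hasPart::PageB]].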
Semantic Rendering [Screenshot; callouts: RDF (!), Relation, Attribute, Value, Type (or superClass)]
Templates?
; Gene_Product_Is_Biomarker_Type : The role is used to designate the type of …
Kind: [[:Category:NCI_Kind]]
'''Semantic Type:''' [[NCI_Semantic_Type::Category:SN_Conceptual_Entity|Conceptual Entity]]
Brittle, not readily changed…
Templates?
{{OntylogDescription|ns=NCI|text=“The role is used to designate…”}}
{{Kind|ns=NCI|target=Kind}}
{{ResourceRef|name=Semantic_Type|ns=NCI|target=Conceptual_Entity|targetns=SN}}
Can readily be updated via the template…
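To show why the template form is easier to maintain, here is a toy expander in Python. The template bodies in `TEMPLATES` are hypothetical: the deck does not show the actual definitions of `Kind` or `ResourceRef`, so they are modeled on the raw markup from the previous slide. The `expand` function is likewise an illustration, not MediaWiki's parser.

```python
import re

# Hypothetical template bodies (assumption: the real ones are not shown in the deck).
TEMPLATES = {
    "Kind": "Kind: [[:Category:{ns}_{target}]]",
    "ResourceRef": (
        "'''{label}:''' "
        "[[{ns}_{name}::Category:{targetns}_{target}|{target_label}]]"
    ),
}

# Matches a flat template call such as {{Kind|ns=NCI|target=Kind}}.
CALL = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def expand(wikitext):
    """Expand known template calls; leave unknown ones untouched."""

    def render(match):
        name = match.group(1).strip()
        if name not in TEMPLATES:
            return match.group(0)
        params = dict(
            part.split("=", 1) for part in match.group(2).split("|") if "=" in part
        )
        # Human-readable labels derived from the underscore-separated names.
        params["label"] = params.get("name", "").replace("_", " ")
        params["target_label"] = params.get("target", "").replace("_", " ")
        return TEMPLATES[name].format(**params)

    return CALL.sub(render, wikitext)


if __name__ == "__main__":
    print(expand("{{Kind|ns=NCI|target=Kind}}"))
    print(expand(
        "{{ResourceRef|name=Semantic_Type|ns=NCI"
        "|target=Conceptual_Entity|targetns=SN}}"
    ))
```

Changing one entry in `TEMPLATES` changes every page that calls it, which is the maintenance advantage over hand-written markup repeated on each page.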
[Screenshot; callouts: Link to another NCI comment, Link to external Ontology, Categorization in external Ontology, Commentary]
How is it Working? Very well!
Terminology • Centrally curated • Central to the practice of medicine • Insurance and reporting • Regulatory • Research • Clinical Practice • Information Sharing • ICD-9, CPT-4, SNOMED, …
Clinical Terminology • Quality and content are important • Needs central vetting, integration, QA • Central model doesn’t scale • Need input from (many) experts • Need visible, active feedback loop
Terminology Workflow 1995 [Diagram, steps (1)-(4): Curation; Controlled Terminology; Lists and Tables; Books; PDF; Distribution]
Terminology Workflow 1995 [Same diagram with a second Controlled Terminology ‘B’ added]
Terminology Workflow 2008 [Diagram, steps (1)-(5): Curation; Controlled Terminology; Common Distribution Model; Distribution; Online Services]
Terminology Workflow 2008 [Same diagram with a second Controlled Terminology B added]
Common Distribution Model • LexGrid • (a little bit of…) OWL • NCI Thesaurus & SNOMED CT • Still requires LexGrid-like additions • “Pushing the envelope” • UMLS RRF • Although underspecified as a ‘model’
Online Services
• OMG Terminology Query Services
  • Not heavily used
  • Perceived (incorrectly) as CORBA specific
  • Perceived as too complex
  • Object oriented and stateful
• ANSI Common Terminology Services
  • Being adopted
  • Necessary but not sufficient
  • Stateless
• CTS-2
  • Co-development beginning w/ HL7 & OMG
Online Services
• LexBIG
  • LexGrid for the Bio Informatics Grid
  • Robust query specification
  • Meets many end-user (developer) requirements
  • Not simple to implement – it actually adds value
  • Not a standard - but will be used to guide CTS-2
Workflow and Feedback [Diagram, steps (1)-(5): Curation; Controlled Terminology; Common Distribution Model; Distribution; Online Services]
The Feedback Component [Diagram: Curation]
The Feedback Component [Diagram: Common Distribution Model; Semantic MediaWiki (++); Distribution; Online Services; Annotations and Change Requests; Community Review; Version Staging; Curation]
Issues and Next Steps (1) SHARED Semantics • {{Definition|…}} • {{Synonym|…}} • {{References|…}} • {{DLSome|…}} • {{DLAll|…}} • … ISO 12620 anyone?
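As a rough illustration of what a shared template vocabulary would buy, the sketch below turns a concept record into template calls. Everything about the parameter layout is an assumption: the slide elides the template parameters, so a single positional parameter is used here, and the record keys (`definition`, `synonyms`, `references`) are hypothetical.

```python
def template_call(name, value):
    """Format one single-parameter template call, e.g. {{Synonym|Neoplasm}}."""
    # A literal "|" would be read as a parameter separator, so it is replaced
    # with {{!}}, the usual MediaWiki workaround for pipes in parameters.
    return "{{%s|%s}}" % (name, value.replace("|", "{{!}}"))


def to_template_calls(concept):
    """Render a concept record (a plain dict) as one template call per line."""
    lines = []
    if concept.get("definition"):
        lines.append(template_call("Definition", concept["definition"]))
    for synonym in concept.get("synonyms", []):
        lines.append(template_call("Synonym", synonym))
    for reference in concept.get("references", []):
        lines.append(template_call("References", reference))
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "definition": "An example definition.",
        "synonyms": ["Syn A", "Syn B"],
        "references": [],
    }
    print(to_template_calls(example))
```

If every loader emits the same small set of templates, pages from different terminologies can be queried and restyled uniformly, whatever the source format was.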
Issues and Next Steps (2) Figure out namespaces • NCI:Activity, AgroVoc:Fish, … • NCI_Activity, AgroVoc_Fish • ??? (2a) Identifiers (Activity vs. C12345) (2b) Versions (2c) URIs (vs. URLs) • Internal • External
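The two naming conventions on this slide can be compared with a small Python sketch. The namespace registry and the `http://example.org/` base URIs are placeholders invented for the example; the slide leaves the actual URI scheme open.

```python
# Placeholder registry (assumption): namespace prefix -> base URI.
BASE_URIS = {
    "NCI": "http://example.org/NCI/",
    "AgroVoc": "http://example.org/AgroVoc/",
}


def split_name(page_name):
    """Split 'NCI:Activity' or 'NCI_Activity' into (namespace, local name).

    Returns (None, page_name) when no registered namespace prefix is found.
    """
    for separator in (":", "_"):
        prefix, found, local = page_name.partition(separator)
        if found and local and prefix in BASE_URIS:
            return prefix, local
    return None, page_name


def to_uri(page_name):
    """Build a URI for a namespaced page name using the placeholder registry."""
    namespace, local = split_name(page_name)
    if namespace is None:
        raise ValueError("no known namespace in %r" % page_name)
    return BASE_URIS[namespace] + local


if __name__ == "__main__":
    print(split_name("NCI:Activity"))
    print(split_name("AgroVoc_Fish"))
    print(split_name("Train_Transport"))
```

The underscore form only works with a registry of known prefixes, because ordinary page names such as Train_Transport also contain underscores. The colon form is unambiguous on its face, but a wiki may give colons a meaning of their own in page titles.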
Certification and Sanctioning • Who can edit? • Who can validate? • Who selects updates? • … (see: http://en.citizendium.org/wiki/Main_Page)
Automatic Export • Selecting sets of updates • Formatting update recommendations for target curators, etc…
Synchronization • Changes implemented in terminology • Update wiki pages • Say what changed • What changes are incorporated by value? By reference?
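The "say what changed" step can be sketched as a comparison of two terminology releases. This is a minimal illustration, assuming each release is available as a dict from concept code to some comparable content (here a preferred name); the codes and names in the example are made up.

```python
def diff_versions(old, new):
    """Compare two releases given as {concept_code: content} dicts."""
    old_codes, new_codes = set(old), set(new)
    return {
        "added": sorted(new_codes - old_codes),
        "removed": sorted(old_codes - new_codes),
        "changed": sorted(
            code for code in old_codes & new_codes if old[code] != new[code]
        ),
    }


if __name__ == "__main__":
    # Made-up codes and names, for illustration only.
    release_1 = {"C1": "Activity", "C2": "Fish", "C3": "Gene"}
    release_2 = {"C1": "Activity", "C2": "Fishes", "C4": "Protein"}
    print(diff_versions(release_1, release_2))
```

The three lists correspond to wiki pages that would need to be created, marked as retired, or updated. Whether the new content is copied into the page or merely linked is the by-value versus by-reference question the slide raises.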
Approach and Responsible Parties Shared Semantics • Core set based on LexGrid & OWL • Post on wiki and link on SMW site • Assigned to Apelon, Mayo, NCI, ??? • Extend to OBO, SKOS (?), XMDR… • Connections to ISO 12620
Time Frame and Assignments URIs, namespaces, naming • UK NCR (CancerGrid) – looking at unAPI and servers • (Hopefully) can provide URI resolver service • Short term – use templates / extensions
Content • SNOMED-CT, ICD-9-CM, many, many others are already available via Apelon DTS Services • Available soon • FMA, HL7 Version 3 Terminology, OBO Foundry (GO, PATO, etc.) as time permits • Others as needed (and funded…)
What we’ve got to date • Apelon DTS Server Extension • Includes both defined and classified view (!) • Export in RESTful XML (currently Apelon, soon to be LexGrid) • XMDR Export Format • Protégé (Native and OWL 3.2) prototype • Done by Mayo • Both import and export • Still needs templates
Questions? • This time for real Note: SMW will be made externally available (w/ simple password) once we get contract specific info cleaned up (NCI will probably publish shortly)… contact: hsolbrig@apelon.com for access.