Towards an Ontological Treatment of Disease and Diagnosis. Barry Smith New York State Center of Excellence in Bioinformatics and Life Sciences University at Buffalo.
Towards an Ontological Treatment of Disease and Diagnosis Barry Smith New York State Center of Excellence in Bioinformatics and Life Sciences University at Buffalo http://ontology.buffalo.edu/smith
Anders Grimsmo, “Patients, diagnoses and processes in general practice in the Nordic countries. An attempt to make data from computerised medical records available for comparable statistics”, Scandinavian Journal of Primary Health Care, 2001 • “The major obstacle to extracting more epidemiological data from computerised medical records is caused by information in the databases not being uniquely linked to episodes of care.”
What is to be linked with what? What is information in the databases about? To answer this question (to assign numbers to discrete entities), we need a good ontology of the care domain, including episodes of care on the one hand and entities on the side of the patient on the other.
and we need to take account of context – of multiple diseases – of the patient’s style of life – of the patient’s environment – of specific aspects of the presentation
we do this by paying attention to natural language, but the more we succeed in this, the more difficult it is to aggregate the data: the disease of UMLSitis
Buffalo Longitudinal Cancer Data • Even with the best of intentions, and even if we just use one coding system, results are not always what they seem • Problem of SNOMEDitis • with acknowledgements to NLM: 1R21LM009824-01A1
SNOMED CT: Anaplasma marginale (organism)
infectious agent is_a navigational concept with acknowledgements to Werner Ceusters NLM: 1R21LM009824-01A1
Why does SNOMED change so much? Problems with ‘concept’: no real coherence as to what SNOMED is representing
Why does SNOMED change so much? • No proper hierarchy (of more and less general) • Confusion of disorders (continuants) with etiological and diagnostic processes (occurrents), and of both with information entities (‘findings’) • Confusion of ‘disorders’ with ‘morphological abnormalities’
SNOMED CT • 128477000 Abscess (disorder) • 44132006 Abscess (morphologic abnormality)
Epistemology and Combinatorial Explosion • Epistaxis/nosebleed • Epistaxis (disorder) • Nosebleed/epistaxis symptom (finding) • On examination - epistaxis (disorder) • Has nosebleeds - epistaxis (disorder) • Evidence of recent epistaxis (finding) from Bill Hogan
Epistemology and Combinatorial Explosion • Rash • Cutaneous eruption (morphologic abnormality), with synonym Rash • Eruption of skin (disorder), with synonym Rash • Complaining of a rash (finding) • On examination - a rash (finding) • Dry skin • Dry skin (finding) • Complaining of dry skin (finding) • On examination - dry skin (finding) • Dry skin dermatitis (disorder) from Bill Hogan
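One way to read the lists above: each label fuses an entity on the side of the patient (dry skin, a rash) with the way it came to be known (a complaint, an examination). The following is a minimal Python sketch of pulling the two apart. The labels are copied from the slide; the prefix table and the function names are hypothetical illustrations and are not part of SNOMED CT.

```python
import re
from collections import defaultdict

# Labels copied from the slide above.
LABELS = [
    "Dry skin (finding)",
    "Complaining of dry skin (finding)",
    "On examination - dry skin (finding)",
    "Complaining of a rash (finding)",
    "On examination - a rash (finding)",
]

# Hypothetical table: label prefix -> how the entity came to be known.
CONTEXT_PREFIXES = {
    "complaining of ": "patient report",
    "on examination - ": "clinical examination",
}

# Trailing semantic tag such as "(finding)" or "(disorder)".
TAG = re.compile(r"\s*\(([^)]+)\)$")


def split_label(label):
    """Return (entity, context, semantic_tag) for one label."""
    match = TAG.search(label)
    tag = match.group(1) if match else None
    text = TAG.sub("", label).lower()
    context = None
    for prefix, name in CONTEXT_PREFIXES.items():
        if text.startswith(prefix):
            text, context = text[len(prefix):], name
            break
    if text.startswith("a "):  # drop a leading article
        text = text[2:]
    return text, context, tag


def group_by_entity(labels):
    """Collect the (context, tag) variants recorded for each entity."""
    groups = defaultdict(list)
    for label in labels:
        entity, context, tag = split_label(label)
        groups[entity].append((context, tag))
    return dict(groups)


if __name__ == "__main__":
    for entity, variants in group_by_entity(LABELS).items():
        print(entity, variants)
```

Under these assumptions the five labels reduce to two entities, each with a short list of contexts, which is the factoring that the combinatorial listing obscures.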
An Alternative: Basic Formal Ontology • 360 BC: Aristotle’s Metaphysics • 1879: Invention of modern logic (Boole, Frege) • 1920: The problem of the Unity of Science (Logical Positivism) • 1940: Birth of computing (Turing)
Ontology Timeline • 1970: AI, Robotics (J. McCarthy, P. Hayes) • 1980: KIF: Knowledge Interchange Format • 1990: Description Logics • 2000: Semantic Web (OWL), Protégé • 2007: National Center for Biomedical Ontology (NCBO) Bioportal
Ontology Timeline • 1990: Human Genome Project • 1999: The Gene Ontology (GO) – Model Organism Research • 2005: The Open Biomedical Ontologies (OBO) Foundry • 2010: Ontology for General Medical Science
The GO is a controlled vocabulary for use in annotating data • multi-species, multi-disciplinary, open source • contributing to the cumulativity of scientific results obtained by distinct research communities • compare use of kilograms, meters, seconds … in formulating experimental results
NIH Mandates for Data Sharing Organizations such as the NIH now require use of common standards in a way that will ensure that the results obtained through funded research are more easily accessible to external groups. ODR will be created in such a way that its use will address the new NIH mandates. It will also be designed to allow information presented in its terms to be usable in satisfying other regulatory purposes, such as submissions to FDA.
GO provides answers to three types of questions: for each gene product (protein ...) • in what parts of the cell has it been identified? Cell Constituent Ontology • exercising what types of molecular functions? Molecular Function Ontology • with what types of biological processes? Biological Process Ontology
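The three questions can be pictured as three lookups over one table of annotations. The sketch below is a minimal Python illustration under stated assumptions: the gene product name and the term labels are invented placeholders, not real GO annotations, and the record layout is hypothetical rather than the GO annotation file format.

```python
from dataclasses import dataclass

# The three ontologies as named on the slide above.
ONTOLOGIES = (
    "Cell Constituent Ontology",
    "Molecular Function Ontology",
    "Biological Process Ontology",
)


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # what is being annotated
    ontology: str      # one of ONTOLOGIES
    term: str          # label of the ontology term


# Hypothetical annotations, for illustration only.
ANNOTATIONS = [
    Annotation("example_protein", "Cell Constituent Ontology", "nucleus"),
    Annotation("example_protein", "Molecular Function Ontology", "DNA binding"),
    Annotation("example_protein", "Biological Process Ontology", "transcription"),
    Annotation("other_protein", "Cell Constituent Ontology", "membrane"),
]


def answers(gene_product, annotations):
    """Group the terms recorded for one gene product by ontology."""
    result = {name: [] for name in ONTOLOGIES}
    for annotation in annotations:
        if annotation.gene_product == gene_product:
            result[annotation.ontology].append(annotation.term)
    return result


if __name__ == "__main__":
    for ontology, terms in answers("example_protein", ANNOTATIONS).items():
        print(f"{ontology}: {terms}")
```

Because every annotation points at a term in exactly one of the three ontologies, each question is answered by filtering on that ontology; no free-text interpretation is needed.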
[Figure: fragment of the GO graph with gene product associations; the legend distinguishes part_of edges from subtype_of edges]
$100 mill. invested in literature curation using GO. Over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO ontologies provide the basis for capturing biological theories in computable form. In contrast to terminologies and thesauri – which focus on socially diverse uses of language – the GO method focuses on commonly shared results of basic biological science.
A new kind of biological research based on analysis and comparison of the massive quantities of annotations linking ontology terms to raw data, including genomic data, clinical data, public health data. What 10 years ago took multiple groups of researchers months of data comparison effort can now be performed in milliseconds.
The GO covers only generic (‘normal’) biological entities of three sorts: • cellular components • molecular functions • biological processes It does not provide representations of diseases, symptoms, genetic abnormalities … How to extend the GO methodology to other domains of biology and medicine?
OBO Foundry ontologies all follow the same principles to ensure interoperability • GO Gene Ontology • ChEBI Chemical Ontology • PRO Protein Ontology • CL Cell Ontology • ... • OGMS Ontology for General Medical Science
Basic Formal Ontology: GO at a high level
Basic Formal Ontology (BFO) A simple top-level ontology to support information integration in scientific research • No abstracta • Nothing propositional • Clear hierarchy • No overlap with domain ontologies • No confusion of ontology with epistemology • No confusion of terms with what terms represent in reality
Basic Formal Ontology • Continuant: Independent Continuant, Dependent Continuant • Occurrent (Process, Event) http://ifomis.uni-saarland.de/bfo/
BFO and the 3 Gene Ontologies (GO) • Continuant: Independent Continuant (Cell Component), Dependent Continuant (Molecular Function) • Occurrent: Biological Process • Kumar A, Smith B, Borgelt C. Dependence relationships between Gene Ontology terms based on TIGR gene product annotations. CompuTerm 2004, 31-38. • Bada M, Hunter L. Enrichment of OBO Ontologies. J Biomed Inform. 2006 Jul 26
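The placement of the three GO ontologies under BFO can be written down as a small parent table. This is a minimal Python sketch, assuming the reading of the diagram in which Cell Component falls under Independent Continuant, Molecular Function under Dependent Continuant, and Biological Process under Occurrent; the function names are hypothetical and this is not the BFO OWL file.

```python
# Child -> parent; None marks a top-level category.
PARENT = {
    "continuant": None,
    "occurrent": None,
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "cell component": "independent continuant",
    "molecular function": "dependent continuant",
    "biological process": "occurrent",
}


def ancestors(term):
    """Return the chain of parents from nearest to most general."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(term, candidate):
    """True if term is the candidate category or falls under it."""
    return term == candidate or candidate in ancestors(term)


if __name__ == "__main__":
    for term in ("cell component", "molecular function", "biological process"):
        print(term, "->", " -> ".join(ancestors(term)))
```

With a single parent per category the hierarchy is a tree, so a question such as whether a biological process is a continuant gets one unambiguous answer (it is not).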
Users of BFO • NCI BiomedGT • SNOMED CT • Ontology for General Medical Science (OGMS) • ACGT Clinical Genomics Trials on Cancer – Master Ontology / Formbuilder (Case Report Forms for Cancer Clinical Trials)
Users of BFO • MediCognos / Microsoft Healthvault • Cleveland Clinic Semantic Database in Cardiothoracic Surgery • Major Histocompatibility Complex (MHC) Ontology (NIAID) • Neuroscience Information Framework Standard (NIFSTD) and Constituent Ontologies
Users of BFO • Interdisciplinary Prostate Ontology (IPO) • Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research • Neural Electromagnetic Ontologies (NEMO) • ChemAxiom – Ontology for Chemistry • Ontology for Risks Against Patient Safety (RAPS/REMINE) (EU FP7) • IDO Infectious Disease Ontology (NIAID)
Infectious Disease Ontology Consortium • MITRE, Mount Sinai, UT Southwestern – Influenza • IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus) • Colorado State University – Dengue Fever • Duke University – Tuberculosis, Staph. aureus • Case Western Reserve – Infective Endocarditis • University of Michigan – Brucellosis
The OBO Foundry • GO Gene Ontology • CL Cell Ontology • SO Sequence Ontology • ChEBI Chemical Entities of Biological Interest • PATO Phenotype (Quality) Ontology • FMA Foundational Model of Anatomy • CARO Common Anatomy Reference Ontology • PRO Protein Ontology • Infectious Disease Ontology • Plant Ontology • Environment Ontology • Ontology for Biomedical Investigations • RNA Ontology