The Use of Axiomatically-Rich Ontologies in Biomedical Research. MATHIAS BROCHHAUSEN, SEPTEMBER 02, 2019. ONTOBRAS 2019, Porto Alegre, Brazil.
Disclaimer • Today I will be talking about semantic technologies and biomedical data. • I will mention many (biomedical) informatics tools, such as terminologies, Common Data Models. I will say that they are not sufficient to provide what I understand under semantics (stay tuned for definition), but that does not mean I think they are useless or irrelevant in general. • I will also talk about how I believe ontologies are the answer to the question, but…
EHR Workflows meet Consistency • Core vital signs: Blood Pressure, Height & Weight • Blood Pressure: 113 unique BP names: • 15 have been deleted • 45 are hidden • 52 are available: • 37 are in use (have values) • 29 have been used more than a thousand times • 14 have been used on fewer than 71 patients • 23 have been used on more than 371 patients CHCO Slides from Maggie Massary. Used with permission
BP – 2016 Only CHCO Slides from Maggie Massary. Used with permission
Why an ontology, when we have terminologies? SNOMED ICD 9 NDF-RT LOINC UAMS RxNorm
Terminologies saving the day? • Studies on subsequent coding using SNOMED CT proved that inter- and intra-coder equivalence is never better than 58%. • This means 42% of the data end up irretrievable! • J.E. Andrews, R.L. Richesson, J. Krischer, “Variation of SNOMED CT coding of clinical research concepts among coding experts,” J Am Med Inform Assoc 2007, 14, 4, p. 497-506. • M.F. Chiang, J.C. Hwang, A.C. Yu, D.S. Casper, J.J. Cimino, J. Starren, “Reliability of SNOMED-CT coding by three physicians using two terminology browsers,” AMIA 2006 Symposium Proceedings, 2006, p. 131-135.
Role of Common Data Models (CDMs) • Provide a representation of data typically collected about a domain. • Used to standardize and facilitate the exchange, sharing, and storing of data. • In medicine they greatly support uniformity of data collection and maintenance in multicentric trials. Garza, Maryam, et al. "Evaluating common data models for use with a longitudinal community registry." Journal of biomedical informatics 64 (2016): 333-341.
Selected CDMs (based on Garza et al.) • CDISC SDTM [1] • OMOP [2] • PCORnet Common Data Model [3] • Sentinel [4] • CTSA ACT (Accrual to Clinical Trials) [5]
[1] https://www.cdisc.org/standards/foundational/sdtm
[2] https://www.ohdsi.org/data-standardization/the-common-data-model/
[3] http://pcornet.org/pcornet-common-data-model/
[4] https://www.sentinelinitiative.org/sentinel/data/distributed-database-common-data-model
[5] https://ncats.nih.gov/pubs/features/ctsa-act
Ascribing semantic capabilities to CDMs "While the model predates standards for modeling EHR data such as OMOP, the PCORnet common data model and the ACT data model, it mirrors their structure and semantics closely." Post, et al. Metadata-driven Clinical Data Loading into i2b2 for Clinical and Translational Science Institutes, AMIA Summits on Translational Science Proceedings 2016 (2016), 184-193.
Ascribing semantic capabilities to CDMs "Adopting a common or reference data model lays the groundwork for achieving syntactic and semantic interoperability so that comparable CER [comparative effectiveness research] analyses can be performed across research study sites." Ogunyemi, et al. Identifying appropriate reference data models for comparative effectiveness research (CER) studies based on data from clinical information systems, Medical Care 51 (2013), S45-S52.
Human-readable from: OMOP CDM
Machine-readable Prior F, Sharma A, Almeida J, Bennett W, Bona J, Bosch W, Bremer E, Brochhausen M, Fitzgerald TJ, Kurc T, Nolan T, Smith K, Tarbox L, Saltz J, “Semantic Integration of Non-image Data in TCIA Collections and PRISM Platform,” Abstract and poster presentation, Informatics Technology for Cancer Research (ITCR) Annual Meeting, Bethesda, MD, May 23, 2018.
But why do we need computers to help us? • Because using terminologies or CDMs alone does not ensure that the terms or data elements are used consistently to code or capture the same phenomena. • Because we have no automated way to validate coding or data entry. • Because there is still a plethora of different terminologies, data elements, etc. in use, which hinders data integration. Overcoming this problem requires mapping. • …which raises the question: how can we ensure semantic integrity when creating a mapping?
Semantic Interoperability • “the different systems involved in data exchange can understand and use the exchanged data and information” • Ogunyemi, Omolola I., et al. "Identifying appropriate reference data models for comparative effectiveness research (CER) studies based on data from clinical information systems." Medical care 51 (2013): S45-S52.
When does a system understand? • Ability to categorize the individuals in the domain of discourse into meaningful categories based on the properties of the individuals and the definitions of the categories using an effective method. • An effective method is a method that allows us to compute the answer to a given problem in a finite number of steps and is logically bound to give the right answer (and no wrong answers). • G. Hunter, Metalogic: An introduction to the metatheory of standard first order logic, Univ of California Press, 1973. • Computable semantics using formal languages with effective methods for inference.
DIDEO Evidence Categorization • Drug-drug Interaction and Drug-drug Interaction Evidence Ontology (DIDEO) [1] • Ontology developed to facilitate managing information about potential drug-drug interactions (PDDI) from multiple sources, including clinical trials, case studies, and in vitro experiments. • A Web Ontology Language (OWL) version [2] is freely available.
[1] https://github.com/DIDEO
[2] http://purl.obolibrary.org/obo/dideo.owl
Ontology-assisted Categorization Utecht et al.: Formalizing evidence type definition for drug-drug interaction studies to improve evidence base curation. Stud Health Technol Inform. 2017, 245: 960-4
Defining evidence types Created by Joseph Utecht, used with permission
Conclusions • Machine-interpretable and machine-usable semantics facilitate automated reasoning that can help deal with the data deluge. • CDMs are commonly used to represent clinical data, but they don’t provide these semantics.
What is an ontology? An ONTOLOGY is a representation of reality, its entities and their interrelations in a specified domain of interest. It needs to be implemented in a machine-interpretable way. inspired by: Terry Pratchett: Wintersmith. Doubleday, 2006
Axioms in ontologies [Diagram. Node labels: Proton pump; Chemical substance; Hydrogen potassium ATPase; Omeprazole; Proton-pump inhibitor. Edge labels: is a; inhibits. Class definition shown: Chemical substance & inhibits some Proton pump.]
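To make the role of such an axiom concrete, here is a minimal sketch of the classification it allows. It assumes one plausible reading of the diagram: Hydrogen potassium ATPase is a Proton pump, Omeprazole is a Chemical substance, and Omeprazole inhibits Hydrogen potassium ATPase. The helper names are hypothetical, and the check is a toy closed-world lookup in plain Python, not an OWL reasoner.

```python
# Asserted facts. The specific edges are an ASSUMED reading of the slide's diagram.
IS_A = {
    "Hydrogen potassium ATPase": {"Proton pump"},
    "Omeprazole": {"Chemical substance"},
}
INHIBITS = {"Omeprazole": {"Hydrogen potassium ATPase"}}


def ancestors(cls):
    """Return cls plus every class reachable from it via is-a links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return seen


def is_proton_pump_inhibitor(entity):
    """Check the definition: Chemical substance & inhibits some Proton pump."""
    is_chemical = "Chemical substance" in ancestors(entity)
    inhibits_pump = any(
        "Proton pump" in ancestors(target) for target in INHIBITS.get(entity, ())
    )
    return is_chemical and inhibits_pump


# Nobody asserted "Omeprazole is a Proton-pump inhibitor"; it follows from the axiom.
inferred = is_proton_pump_inhibitor("Omeprazole")
print(inferred)
```

The point of the sketch is that the membership of Omeprazole in the defined class is computed from the definition and the asserted facts rather than entered by hand.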
Open Biomedical Ontologies Foundry • http://www.obofoundry.org • A community that builds and distributes biomedical ontologies that are compliant with the same set of principles. • Among many others, it includes the most widely used ontology, the Gene Ontology. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al: The OBO Foundry: Coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 2007, 25(11):1251–5.
An intra-institutional silo Institution X • edta_plasma • buffy_edta • nacit_plasma • buffy_plasma • plasma • buffy • This biobank doesn't use NaCit, but only EDTA. Biobank A Biobank B
Consistency in biobank data automatic inference amended from: Brochhausen M, Zheng J, Birtwell D, Williams H, Masci AM, Ellis HJ, Stoeckert CJ Jr. OBIB – a novel ontology for biobanking. J Biomed Sem (2016) 7:23
Our vision for biobanks • Developing a community-driven ontology to provide a unified terminology to facilitate sharing data across different biobanks (intra- and inter-institutional) • Facilitating integrating Electronic Health Record data and other clinical data with specimen information • Preventing temporal silos by providing more flexible data management • Schema-free data management • Allowing addition of new parameters without changing the way data is stored • Brochhausen M, Fransson M, Kanaskar N, Eriksson M, Merino-Martinez R, Hall RA, Norlin L, Kjellqvist S, Hortlund M, Topaloglu U, Hogan WR, Litton JE. Providing a semantically rich ontology for sharing biobank data based on Minimum Information About BIobank data Sharing. J Biomed Semantics. 2013 Oct 8;4(1):23. PMID: 24103726. • Brochhausen M, Zheng J, Birtwell D, Williams H, Masci AM, Ellis HJ, Stoeckert CJ Jr. OBIB – a novel ontology for biobanking. J Biomed Semantics. 2016 May 2;7:23. PMCID: PMC4855778
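The "schema-free" point can be illustrated with a small sketch. It assumes specimen data is kept as subject-predicate-value triples; the identifiers, property names, and helper functions below are hypothetical and are not taken from OBIB.

```python
# Hypothetical specimen facts stored as (subject, predicate, value) triples.
store = [
    ("specimen:001", "specimen_type", "plasma"),
    ("specimen:001", "anticoagulant", "EDTA"),
]


def add_fact(triples, subject, predicate, value):
    """Append one fact; no table or column has to be altered first."""
    triples.append((subject, predicate, value))


def describe(triples, subject):
    """Collect everything known about one subject, grouped by predicate."""
    record = {}
    for s, p, v in triples:
        if s == subject:
            record.setdefault(p, []).append(v)
    return record


# A parameter nobody planned for is just one more triple.
add_fact(store, "specimen:001", "storage_temperature_celsius", -80)
print(describe(store, "specimen:001"))
```

In a relational layout the new parameter would typically need a schema change; here the storage shape stays the same and only the set of facts grows.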
OBIB • Ontology for Biobanking • http://purl.obolibrary.org/obo/obib.owl • Lead curators: M. Brochhausen, C. Stoeckert, J. Zheng • Contributors: M. Anderberg, M. Eriksson, M.N. Fransson, W.R. Hogan, S. Kjellqvist, J.E. Litton, R. Merino-Martinez, L. Norlin, A. Nzinga,
Provenance of OBIB classes Figure provided by J Zheng and C.J. Stoeckert Jr. (University of Pennsylvania)
Our vision "Where can I find lung tissue stored at cryogenic temperature with which I can conduct genomics studies?" To enable queries like this, we need to move forward with developing the Informed Consent Ontology to cover use cases of this type.
Informed Consent Ontology • http://purl.obolibrary.org/obo/ico.owl • Uses pre-existing OBO Foundry ontologies (OBI). • Recently started using an ontology modeling rights and obligations (d-acts) and the processes that give rise to them. Lin Y, Harris MR, Manion FJ, Eisenhauer E, Zhao B, Shi W, Karnovsky A, He Y: Development of a BFO-based Informed Consent Ontology (ICO). In: The 5th International Conference on Biomedical Ontologies (ICBO): 2014; Houston, Texas, USA, October 8-9, 2014. CEUR Workshop Proceedings; 2013: Page 84-86. [http://ceur-ws.org/Vol-1327/icbo2014_paper_54.pdf]
Informed Consent Ontology The initial ICO:
In order to address our use cases, it is not enough to track informed consent documents. • We need to be able to represent and track the rights and obligations related to each specific informed consent form. • In order to do so, we need a theory of how these rights and obligations are related to the informed consent process and the forms involved.
The informed consent process [Diagram labels: Participant; signs; informed consent template; bestows with certain rights and obligations]
Document Act Ontology (d-acts) • http://purl.obolibrary.org/obo/iao/d-acts.owl • Associated with the OBO Foundry • OWL implementation of the theory of document acts.
[Diagram: representation of an informed consent using d-acts. Node labels include: document act; permission role; act of consenting; right to perform genomic study; informed consent document. Edge labels: subClassOf; type; is specified output of; part of.]
[Diagram: what the resulting permission covers. Node labels: assay; permission role; genomic experiment; right to perform genomic study; principal investigator role; human being. Edge labels: subClassOf; type; permits; inheres in.]
Effect of using d-acts • Based on the provisions of the data use agreement, protocols, and informed consent template we can annotate which individual's rights and obligations are created once the template is signed. • Once that is done, we can query for all specimens associated with an informed consent allowing genomic studies or both.
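A query of this kind can be sketched over a handful of triples. This is a minimal sketch under stated assumptions: the relation names "is specified output of" and "permits" come from the diagrams, while every instance identifier, the "collected under" relation linking a specimen to a consent act, and the helper functions are hypothetical. A real deployment would run SPARQL against an RDF store instead.

```python
# Toy triple set. Instance names and "collected under" are HYPOTHETICAL.
TRIPLES = {
    ("right_1", "type", "right to perform genomic study"),
    ("right_1", "is specified output of", "consent_act_1"),
    ("right_1", "permits", "genomic experiment"),
    ("right_1", "inheres in", "investigator_1"),
    ("specimen_A", "collected under", "consent_act_1"),
    ("specimen_B", "collected under", "consent_act_2"),  # consent without that right
}


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is in the triple set."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is in the triple set."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def specimens_permitting(study_type):
    """Specimens whose consent act produced a right that permits study_type."""
    consent_acts = set()
    for right in subjects("permits", study_type):
        consent_acts |= objects(right, "is specified output of")
    found = set()
    for act in consent_acts:
        found |= subjects("collected under", act)
    return sorted(found)


print(specimens_permitting("genomic experiment"))
```

The query walks from the permitted study type to the right, from the right to the consent act that created it, and from there to the specimens, which is why the rights have to be represented explicitly rather than only the signed documents.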
Collecting RDF data • Can we build a tool that looks and behaves like a simple questionnaire tool (SurveyMonkey, etc.), but creates RDF data as participants fill in information? • Can we use the data participants provide for a real-time preliminary comparison of their answers to those of others?
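As a rough illustration of the first question, the sketch below turns questionnaire answers into N-Triples lines. It is not how the tool described in this talk works: the namespace, the identifier scheme, and the function names are assumptions made for the example, and identifiers are assumed to be URL-safe already.

```python
BASE = "http://example.org/survey/"  # hypothetical namespace, not a real service


def escape_literal(text):
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def answers_to_ntriples(participant_id, answers):
    """Emit one triple per answered question: participant, question, answer."""
    subject = f"<{BASE}participant/{participant_id}>"
    lines = []
    for question_id, answer in answers.items():
        predicate = f"<{BASE}question/{question_id}>"
        literal = escape_literal(str(answer))
        lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


for line in answers_to_ntriples("p1", {"q1": "yes", "q2": "Level I"}):
    print(line)
```

Because every submission becomes a set of triples as it is entered, a comparison across participants can be run as a query over the accumulated data rather than as a separate export step.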
The CAFÉ project https://cafe-trauma.com