Collaborative development of a proof of concept to bridge clinical practice with research using semantic web specifications. Re-using existing standards and terminologies for wider adoption and buy-in from stakeholders.
Feedback from: • Jyotishman Pathak • Dan Russler • Matt Moores • Alan Ruttenberg • Parsa Mirhaji • Lee Feigenbaum • Ronan Fox • Rachel Richesson • Susie Stephens • Eric Prud’hommeaux
Goal • To demonstrate the value of semantic web (SW) specifications in bridging the divide between clinical practice and clinical research • Collaborative development of a proof of concept (POC) that demonstrates key value propositions of using SW specifications • To get buy-in from a wide variety of stakeholders as a prelude to acceptance and adoption of semantic web specifications • Get buy-in for the use case • Get wide participation and involvement of the key stakeholders in various stages of analysis, design and development of the POC. • Re-use existing standards, terminologies, data and information models of existing communities to increase the probability of adoption.
Methodology The group works in a consensus-driven manner: at each step, opinions are sought from all stakeholders and decisions are based on the resulting consensus. Incorporating the views of the various communities at each step was recognized as a critical success factor.
Decision 1: Use Case • Use Case Development was led by Rachel Richesson • A wide variety of use cases was investigated and discussed • Patient Recruitment • Adverse Event Detection • Tracking Patient Through a Clinical Trial • Decision: Focus on Patient Recruitment • Data was assumed to be in an EMR.
Re-use of existing Information Models • How can we re-use EMR data for Clinical Research? • HL7/RIM/DCM descriptions may be viewed as a “format” for Clinical Research Data. • Typically, clinical data in healthcare delivery systems and applications is represented in or transformed into this “format”. • CDISC/SDTM descriptions may be viewed as a “format” for Clinical Research questions. • Typically, clinical data in clinical trials systems and applications is represented in or transformed into this “format”. • Can we ask questions in one “format” when the data is represented in another “format”? • How can we implement functionality to map across these “formats”? • The mapping module should be flexible enough to incorporate extensions to the “formats” • The mapping module should be flexible enough to “plug and play” with multiple “formats”
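The “plug and play” requirement above could be sketched as a small registry with one translator per pair of “formats”. This is only an illustration under stated assumptions: the class name `MappingRegistry`, its methods and the single table entry are hypothetical and not taken from the POC.

```python
from typing import Callable, Dict, Tuple

# A translator rewrites one data-item name from a source "format" to a target one.
Translator = Callable[[str], str]


class MappingRegistry:
    """Hypothetical registry holding one translator per (source, target) pair."""

    def __init__(self) -> None:
        self._translators: Dict[Tuple[str, str], Translator] = {}

    def register(self, source: str, target: str, translator: Translator) -> None:
        # Plugging in a new "format" only means registering more translators.
        self._translators[(source, target)] = translator

    def translate(self, source: str, target: str, item: str) -> str:
        try:
            translator = self._translators[(source, target)]
        except KeyError:
            raise LookupError(f"no mapping registered from {source} to {target}") from None
        return translator(item)


# Illustrative lookup table; the entry is a placeholder, not a vetted mapping.
SDTM_TO_RIM = {"VSTESTCD=SYSBP": "SystolicBP"}

registry = MappingRegistry()
registry.register("SDTM", "RIM", lambda item: SDTM_TO_RIM[item])
```

Extending a “format” then means adding entries to a table, and supporting a new one means registering another pair, without touching the callers.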
Decision 2: Information Models • A wide variety of Information Models was considered • HL7/RIM • CDISC/SDTM • Detailed Clinical Models from Intermountain Healthcare • Galen • POMR Ontology (Chimezie) • Eligibility Criteria Ontology (Helen) • Healthcare Delivery Encounter-based Meta Model (Parsa Mirhaji) • Conclusions • No one ontology/information model is likely to fit the bill • Align as closely as possible to existing information model and terminology standards • Identify gaps and inadequacies in addressing the use case at hand. • Provide feedback to standards groups: CDISC, HL7/RIM, BRIDG • Decision: Use CDISC/SDTM, Detailed Clinical Models, HL7/RIM as “seed” ontologies to begin with • Iteratively refine them as gaps and inadequacies are discovered
Demonstrate Re-use • Re-use of data from the EMR for Clinical Research • Re-use of existing vocabularies, e.g., NCI Thesaurus, Snomed, MedDRA • Re-use of pre-existing information models, e.g., HL7/RIM/DCM, SDTM • Identify and re-use software components that can enable a wide range of use cases • Patient Recruitment • Adverse Drug Event Detection • Tracking a Patient through a Clinical Trial • Develop the POC based on an implementation of these re-usable components • Components that implement mapping • Components that implement data retrieval • Components that implement wrappers/transformations • Components that implement checking for eligibility criteria, adverse events and other clinical events of significance.
Decision 3 • Decided to implement the POC on a real-world data set as opposed to a synthetically created data set. This raised the following issues: • What would be an appropriate “seed” Information Model/Ontology to describe healthcare data based on current state? • What are the appropriate terminologies (e.g., Snomed, LOINC, RxNorm) that need to be considered to capture coded information in healthcare data based on current state? • Parsa Mirhaji provided the data and his feedback was crucial in identifying the appropriate “seed” model/terminology
Decision 4 • Based on discussions with W3C folks such as Ralph Swick, Karen Myers, Steve Bratt, Eric P. • W3C is interested in working with external standards bodies such as HL7 and CDISC to express their content using Semantic Web specifications such as RDF and OWL (Steve Bratt at the Bio IT World Luncheon) • Implications of W3C being content neutral • Bron Kisler emphasized that since W3C is providing only the languages, a collaboration would be synergistic and would make sense • Is it possible to develop a collaborative interest group with involvement of HL7, CDISC and others? (conversation with Ralph Swick)
One proposed Solution Architecture (diagram components) • Protocol Specification Interface • Mapping Module • RDF Transformation Engine • Eligibility Checking Module • CTMS • EMR System
Decision 5 • Current State Assumptions • Information Models and Vocabularies used in Clinical Trials Context are different from those used in the Healthcare Delivery context • Emphasis on the mapping aspect • Support Plug and Play of different Information Models and Vocabularies • Technology Choices: • SPARQL • N3 rules
Mapping Module • Critical component of the key goal of this effort. • i.e., To gain acceptance from a wide variety of stakeholders in the healthcare and clinical trials space. • HL7/RIM/DCM – seek alignment with healthcare standards • CDISC/SDTM – seek alignment with clinical trials standards • Develop Mappings across these two models • Identify limitations and gaps across these models • Scope: • Focus only on those data items that are required for patient recruitment • Focus only on those data items that are related to diabetes and hypertension • To be driven in some part by “mock” diabetes and hypertension records
Use Case Step Through • Clinical Trial Administrator uses the Protocol Specification Interface to specify the eligibility criteria. The data items are specified using elements from the SDTM model. • The mapping module translates the data items to the appropriate HL7/RIM/DCM representation. • Appropriate queries are made to the Mediator/Gateway module. • The Mediator/Gateway module translates the query into the underlying database query language. The query is executed at the database and the results are sent to the mapping module. • The mapping module retranslates the data into terms from the SDTM model. • The Eligibility Checking Module checks which patients satisfy the eligibility criteria. • The selected patients are returned to the Clinical Trial Administrator. Note: Some eligibility criteria may not be expressible using SPARQL queries and may require rules, etc.
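The step-through above can be traced with a toy pipeline. This is a minimal sketch, assuming an in-memory dictionary stands in for the EMR and the Mediator/Gateway; the function names, the mock patients and the 140 threshold are all invented for illustration and are not part of the POC, which plans to use SPARQL and N3 rules.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Criterion:
    item: str       # data item name on the clinical-trial (SDTM-like) side
    minimum: float  # a patient qualifies when the value is >= minimum


# Hypothetical item-name mapping between the trial side and the care side.
TRIAL_TO_CARE = {"SYSBP": "SystolicBP"}
CARE_TO_TRIAL = {care: trial for trial, care in TRIAL_TO_CARE.items()}

# Mock EMR content keyed by patient id, using care-side item names.
EMR: Dict[str, Dict[str, float]] = {
    "p1": {"SystolicBP": 150.0},
    "p2": {"SystolicBP": 118.0},
}


def to_care_items(criteria: List[Criterion]) -> List[str]:
    """Step 2: translate trial-side data items to the care-side model."""
    return [TRIAL_TO_CARE[c.item] for c in criteria]


def query_emr(items: List[str]) -> Dict[str, Dict[str, float]]:
    """Steps 3-4: stand-in for the Mediator/Gateway querying the database."""
    return {pid: {k: v for k, v in rec.items() if k in items} for pid, rec in EMR.items()}


def to_trial_terms(rows: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Step 5: retranslate the retrieved data into trial-side terms."""
    return {pid: {CARE_TO_TRIAL[k]: v for k, v in rec.items()} for pid, rec in rows.items()}


def eligible(criteria: List[Criterion], rows: Dict[str, Dict[str, float]]) -> List[str]:
    """Step 6: keep the patients that satisfy every criterion."""
    return sorted(
        pid for pid, rec in rows.items()
        if all(c.item in rec and rec[c.item] >= c.minimum for c in criteria)
    )


def recruit(criteria: List[Criterion]) -> List[str]:
    """Steps 1-7 chained: criteria in, selected patient ids out."""
    return eligible(criteria, to_trial_terms(query_emr(to_care_items(criteria))))
```

Calling `recruit([Criterion("SYSBP", 140.0)])` runs the whole chain; each function corresponds to one module in the architecture, so any of them could be swapped out independently.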
Next Steps: Narrow Scope for Implementation • Choose a protocol for implementation #8 (second one) • Limit Scope to Medications, Lab Tests and Vital Signs • Develop Clinical Trials Ontology and Clinical Practice Ontology • Iterative development • Alignment with standards as closely as possible • Implement RDF data store based on data requirements and mock patients • Implement Mapping module using N3 rules • Implement Eligibility checking module using SPARQL • Try to demonstrate another use case for Adverse Drug Event Detection.
Specification of Eligibility Criteria • Assume we will use an ontology or rule-based tool to specify eligibility criteria • Open to NLP/Ontology-based approaches that translate free-text clinical protocol specifications into a structured form • Examples: • Type 1 diabetes and/or history of ketoacidosis • History of long-term therapy with insulin (>30 days) within the last year
Eligibility Criteria Specification • The functional requirements for this need to be identified and spec’ed out, e.g., • Temporal Constraints • Trends on clinical data and values • … • Out of Scope for POC. • May want to see if the CT or HC communities have done some work on standards for specifying eligibility criteria. • Out of Scope for POC
Design Choice: Eligibility Criteria as a “layer” around Data Items • Data Items • Problem: Type 2 Diabetes • History of Problem: Ketoacidosis • History of Therapy • Name: Insulin • Length: X days • Time Period: [Date1, Date2] • Eligibility Criteria: • Rule conditions • Patient has Type 2 Diabetes • Patient has History of Ketoacidosis • Patient has History of Therapy: • Name = Insulin • X > 30 days • Time Period < 1 year
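One way to picture the layering is to keep the data items in plain records and express each rule condition as a separate named predicate over them. The sketch below is illustrative only: the class and function names are hypothetical, “Time Period < 1 year” is read here as “the therapy ended within the last 365 days” (an assumption), and the conditions are reported individually rather than combined, since the slide does not say how they are joined.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List


# --- Data item layer: what the record says, with no trial logic ---
@dataclass
class Therapy:
    name: str
    start: date
    end: date

    @property
    def length_days(self) -> int:
        return (self.end - self.start).days


@dataclass
class PatientRecord:
    problems: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)
    therapies: List[Therapy] = field(default_factory=list)


# --- Eligibility criteria layer: named conditions over the data items ---
Condition = Callable[[PatientRecord, date], bool]


def long_recent_insulin(p: PatientRecord, today: date) -> bool:
    # Assumed reading: insulin for more than 30 days, ended under a year ago.
    return any(
        t.name == "Insulin" and t.length_days > 30 and (today - t.end).days < 365
        for t in p.therapies
    )


CONDITIONS: Dict[str, Condition] = {
    "has Type 2 Diabetes": lambda p, today: "Type 2 Diabetes" in p.problems,
    "has history of Ketoacidosis": lambda p, today: "Ketoacidosis" in p.history,
    "has long recent insulin therapy": long_recent_insulin,
}


def evaluate(p: PatientRecord, today: date) -> Dict[str, bool]:
    """Report which rule conditions hold for one patient record."""
    return {label: cond(p, today) for label, cond in CONDITIONS.items()}
```

Because the conditions live outside the records, the same data items can serve a different protocol simply by swapping the `CONDITIONS` table.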
Mappings: Goal/Methodology • Characterize the various data items required for patient recruitment (modulo scope): see the list of requirements on the Data Content tab • http://spreadsheets.google.com/ccc?key=pINNryLt_vyDiPyHj11WiDg&hl=en_US&pli=1 • For each data item do the following: • Identify the RIM/DCM construct(s) that model that data item (DCM column under Models) • Identify the SDTM construct(s) that model that data item (SDTM column under Models) • Identify the terminologies that model some of the values required (Terminology columns, including Snomed, MedDRA and NCI Thesaurus) • Identify the data types and values that characterize the values of some of the data items (Data Types and Units columns, including those for RIM and SDTM) • We will be considering various constructs of HL7/RIM, Detailed Clinical Models and other models in conjunction
Consider a Data Item Example: • History of Therapy • Name: Insulin • Length of Therapy: 100 days • StartDate: Date • EndDate: Date
Mapping Methodology • Identify Information Model Elements • Therapy => • SubstanceAdministration (HL7/RIM) • effectiveTime • statusCode • Medication (subClass of ManufacturedMaterial, HL7/RIM) • Specific type of Participation called Consumable (HL7/RIM) • Insulin => • Medication.Name (HL7/RIM) • Identify Controlled Vocabularies • Medication.Name => Controlled Vocabulary RxNorm (also known as Terminology Binding) • Identify Data Types • Dates and Times => TS data type in HL7 • Identify Units • Included in the definition of data types … taken from the UCUM standard
Mapping Methodology (Continued) • Mappings between Information Model elements: • SystolicBP <=> VSTEST, VSTESTCD = SYSBP • Mappings between controlled vocabularies: • SystolicBP <=> “Some Snomed Concept” • SYSBP <=> “Some NCI Thesaurus Concept” • “Some Snomed Concept” <=> “Some NCI Thesaurus Concept” • Between Data Types and Units: • HL7:PQ <=> VSRESU
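The three kinds of mapping listed above can each be held in a two-way lookup table. The following is a minimal sketch, assuming every mapping is a simple one-to-one pair; `BiMap` is a hypothetical helper, and the vocabulary concepts are kept as the same placeholders the slide uses because no concrete codes are given.

```python
from typing import Dict, Tuple

# (model or vocabulary, element or concept)
Term = Tuple[str, str]


class BiMap:
    """Hypothetical two-way table for one-to-one mappings."""

    def __init__(self) -> None:
        self._fwd: Dict[Term, Term] = {}
        self._rev: Dict[Term, Term] = {}

    def add(self, left: Term, right: Term) -> None:
        self._fwd[left] = right
        self._rev[right] = left

    def to_right(self, left: Term) -> Term:
        return self._fwd[left]

    def to_left(self, right: Term) -> Term:
        return self._rev[right]


# Information model elements
elements = BiMap()
elements.add(("HL7/RIM/DCM", "SystolicBP"), ("SDTM", "VSTESTCD=SYSBP"))

# Controlled vocabularies (placeholders, as on the slide)
concepts = BiMap()
concepts.add(("Snomed", "<some Snomed concept>"),
             ("NCI Thesaurus", "<some NCI Thesaurus concept>"))

# Data types and units
units = BiMap()
units.add(("HL7", "PQ"), ("SDTM", "VSRESU"))
```

A lookup such as `elements.to_right(("HL7/RIM/DCM", "SystolicBP"))` goes from the care side to the trial side, and `to_left` goes back; this only works while the mappings stay one-to-one.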
Design Choice: Leverage Existing Implementations and Systems • SHER System • Re-use the Reasoner to compute eligibility criteria • Semantic DB System • Re-use the NLP parser (if available) to parse the textual representation of the clinical trials criteria into structured queries, rules, etc.
Technical: Eligibility Criteria in OWL Patient that (hasProblem some DiabetesType1 or hasHistory some Ketoacidosis) and hasTherapy (some Therapy that hasLength all int[>30] and hasTimePeriod all int[< 365]) Just for illustration purposes … a thorough and detailed analysis is needed to get it right.
Technical Design: Eligibility Criteria using Rules IF (the_patient.hasProblem = DiabetesType1 OR the_patient.hasHistory = Ketoacidosis) AND the_patient.hasHistory.name = Insulin AND the_patient.hasHistory.length > 30 AND the_patient.hasHistory.timePeriod < 365 THEN the_patient is eligible for the clinical trial
Mapping Design Issues • Are mappings always 1-1? • Is it always possible to get synonym mappings? • What happens to these mappings when there are changes in the information models? • Are these mappings enough to enable a bi-directional flow through between EMR and Clinical trials data?
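The first and last questions can at least be checked mechanically for a given mapping table. The sketch below flags entries that are not one-to-one, which are exactly the ones that would break a round trip between EMR and clinical trials data. It is an illustration only: the function names are hypothetical, and `SittingSystolicBP` is an invented data item used to show a many-to-one case.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (care-side item, trial-side item)


def analyse(pairs: List[Pair]) -> Dict[str, List[str]]:
    """List the items that take part in a non one-to-one mapping."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for source, target in pairs:
        forward[source].add(target)
        backward[target].add(source)
    return {
        # One care-side item maps to several trial-side items.
        "one_to_many": sorted(s for s, ts in forward.items() if len(ts) > 1),
        # Several care-side items collapse onto one trial-side item.
        "many_to_one": sorted(t for t, ss in backward.items() if len(ss) > 1),
    }


def is_bidirectional(pairs: List[Pair]) -> bool:
    """True when every mapping can be followed unambiguously in both directions."""
    report = analyse(pairs)
    return not report["one_to_many"] and not report["many_to_one"]


# Hypothetical example: two care-side items share one trial-side code.
example = [("SystolicBP", "SYSBP"), ("SittingSystolicBP", "SYSBP")]
```

Running `analyse(example)` reports `SYSBP` under `many_to_one`: translating to the trial side loses the distinction, so the reverse translation is ambiguous. Re-running such a check after each change to the information models is one cheap way to notice when mappings stop being invertible.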