460 likes | 943 Views
Data Normalization. Dr. Stan Huff. Tom Oniki Joey Coyle Craig Parker Yan Heras Cessily Johnson Roberto Rocha Lee Min Lau Alan James Many, many, others…. Acknowledgements. What are detailed clinical models? Why do we need them?. A diagram of a simple clinical model.
E N D
Data Normalization Dr. Stan Huff
Tom Oniki Joey Coyle Craig Parker Yan Heras Cessily Johnson Roberto Rocha Lee Min Lau Alan James Many, many, others… Acknowledgements
A diagram of a simple clinical model Clinical Element Model for Systolic Blood Pressure SystolicBP SystolicBPObs 138 mmHg data quals BodyLocation BodyLocation Right Arm data PatientPosition PatientPosition Sitting data
A stack of coded items is ambiguous (SNOMED CT) Numbness of right arm and left leg Numbness (44077006) Right (24028007) Arm (40983000) Left (7771000) Leg (30021000) Numbness of left arm and right leg Numbness (44077006) Left (7771000) Arm (40983000) Right (24028007) Leg (30021000) Need for a standard model
Estimated Auto Manual What if there is no model? Site #1 70 % 37 Hct, manual: 70 % 35 Hct, auto : Site #2 70 % 37 Hct :
Site 1: OBX|1|CE|4545-0^Hct, manual||37||%| OBX|1|CE|4544-3^Hct, auto||35||%| Site 2: OBX|1|CE|20570-8^Hct||37||%|….|manual| OBX|1|CE|20570-8^Hct||35||%|….|auto| HL7 V2.X Messages
A single name/code and value Hct, manual is 37 % Two names/codes and values Hct is 37 % Method is manual (spun) Too many ways to say the same thing
Pre-coordinated representation <observation> <cd> Hct, manual(LOINC 4545-0 ) </cd> <value> 37 % </value> </observation> Post-coordinated (compositional) representation <observation> <cd> Hct (LOINC 20570-8) </cd> <qualifier> <cd> Method </cd> <value> Manual </value> <qualifier> <value> 37 % </value> </observation> Model fragment in XML
Isosemantic Models Precoordinated Model HematocritManual (LOINC 4545-0) HematocritManualModel 37 % data Post coordinated Model (Storage Model) Hematocrit (LOINC 20570-8) HematocritModel 37 % data quals Hematocrit Method HematocritMethodModel Manual data
Patient Identifier Patient Identifier Date and Time Date and Time Observation Type Observation Type Weight type Observation Value Observation Value Units Units 123456789 123456789 7/4/2005 7/4/2005 Hct Hct, manual manual 37 37 % % 123456789 123456789 7/19/2005 7/19/2005 Hct Hct, auto auto 35 35 % % Relational database implications If the patient’s hematocrit is <= 35 then ….
Signs, symptoms Diagnoses Problem list Family History Use of negation – “No Family Hx of Cancer” Description of a heart murmur Description of breath sounds “Rales in right and left upper lobes” “Rales, rhonchi, and egophony in right lower lobe” More complicated items:
All health care data, including: Allergies Problem lists Laboratory results Medication and diagnostic orders Medication administration Physical exam and clinical measurements Signs, symptoms, diagnoses Clinical documents Procedures Family history, medical history and review of symptoms What do we model?
EMR: data entry screens, flow sheets, reports, ad hoc queries Basis for application access to clinical data Data normalization Creation of maps from models in the local system to the standard model Target for the output of structured data from NLP Validation of data as it is stored in the database Phenotype algorithms (decision logic) Basis for referencing data in phenotype definitions Does NOT dictate physical storage strategy How are the models used?
model BloodPressurePanel is panel { key code(BloodPressurePanel_KEY_ECID); statement SystolicBloodPressureMeas systolicBloodPressureMeas optional systolicBloodPressureMeas.methodDevice.conduct(methodDevice) systolicBloodPressureMeas.bodyLocationPrecoord.conduct(bodyLocationPrecoord) systolicBloodPressureMeas.bodyPosition.conduct(bodyPosition) systolicBloodPressureMeas.relativeTemporalContext.conduct(relativeTemporalContext) systolicBloodPressureMeas.subject.conduct(subject) systolicBloodPressureMeas.observed.conduct(observed) systolicBloodPressureMeas.reportedReceived.conduct(reportedReceived) systolicBloodPressureMeas.verified.conduct(verified); statement DiastolicBloodPressureMeas diastolicBloodPressureMeas optional …. statement MeanArterialPressureMeas meanArterialPressureMeas optional …. qualifier MethodDevice methodDevice optional; md.code.domain(BloodPressureMeasurementDevice_DOMAIN_ECID); qualifier BodyLocationPrecoord bodyLocationPrecoord optional; blp.code.domain(BloodPressureBodyLocationPrecoord_DOMAIN_ECID); modifier Subject subject optional; attribution Observed observed optional; attribution ReportedReceived reportedReceived optional; attribution Verified verified optional; } Model Source Expression (CDL)
HTML Compiler XML Template - .xsd Java Class “In Memory” Form CE Source File SMArt RDF? CE Translator UML? openEHR Archetype? HL7 RIM Static Models? OWL?
Artifacts Used CDL Model Definition CEM XML Schema HL7 Data Source CEM XML Instance
StandardLabObsQuantitative - CDL Definition import StandardLabObs; import ReferenceRangeNar; model StandardLabObsQuantitative is statement extends StandardLabObs { key domain(StandardLabObsQuantitative_KEY_VALUESET_ECID); data PQ primaryPQValue unit.domain (UnitsOfMeasure_VALUESET_ECID) alternate { match CD secondaryCDValue code.domain(LabValue_VALUESET_ECID); match CD altCDValue code.domain(LabValue_VALUESET_ECID); otherwise ST altSTValue; }; qualifier ReferenceRangeNar referenceRangeNar card(0..1); constraint primaryPQValue.isNullReasonCode.domain(LabNullFlavor_VALUESET_ECID); constraint abnormalInterpretation.CD.code.domain (AbnormalInterpretationNumericNom_VALUESET_ECID); constraint deltaFlag.CD.code.domain (DeltaFlagNumericNom_VALUESET_ECID); }
StandardLabObsQuantitative - Schema Snippet <xs:complexType name="StandardLabObsQuantitative"> <xs:sequence> <xs:element name="key" minOccurs="0" maxOccurs="1" type="CD"/> <xs:element name="primaryPQValue" type="PQ"/> <xs:element name="referenceRangeNar" minOccurs="0" maxOccurs="1" type="ReferenceRangeNar"/> <xs:element name="accessionNumber" minOccurs="0" maxOccurs="1" type="AccessionNumber"/> <xs:element name="fillerOrderNumber" minOccurs="0" maxOccurs="1" type="FillerOrderNumber"/> <xs:element name="placerOrderNumber" minOccurs="0" maxOccurs="1" type="PlacerOrderNumber"/> <xs:element name="resultStatus" minOccurs="0" maxOccurs="1" type="ResultStatus"/> <xs:element name="reportingPriority" minOccurs="0" maxOccurs="1" type="ReportingPriority"/> <xs:element name="abnormalInterpretation" minOccurs="0" maxOccurs="1" type="AbnormalInterpretation"/> <xs:element name="ordinalInterpretation" minOccurs="0" maxOccurs="1" type="OrdinalInterpretation"/> <xs:element name="deltaFlag" minOccurs="0" maxOccurs="1" type="DeltaFlag"/> <xs:element name="responsibleObserver" minOccurs="0" maxOccurs="unbounded" type="ResponsibleObserver"/> <xs:element name="performingLaboratory" minOccurs="0" maxOccurs="1" type="PerformingLaboratory"/> <xs:element name="comment" minOccurs="0" maxOccurs="unbounded" type="Comment"/> <xs:element name="subject" minOccurs="0" maxOccurs="1" type="Subject"/> <xs:element name="specimenCollected" minOccurs="0" maxOccurs="1" type="SpecimenCollected"/> <xs:element name="specimenReceivedByLab" minOccurs="0" maxOccurs="1" type="SpecimenReceivedByLab"/> <xs:element name="resulted" minOccurs="0" maxOccurs="1" type="Resulted"/> \ <xs:element name="patientId" minOccurs="0" maxOccurs="1" type="anonymous.2"/> <xs:element name="status" minOccurs="0" maxOccurs="1" type="anonymous"/> <xs:element name="instanceId" minOccurs="0" maxOccurs="1" type="anonymous.2"/> <xs:element name="typeId" minOccurs="0" maxOccurs="1" type="anonymous.3"/> </xs:sequence> <xs:attribute name="class" type="statement.type" default="statement"/> <xs:attribute name="type" type="ecid.type" default="b1ceaebb-dd15-4317-3f99-67ef3af81778"/></xs:complexType>
HL7 Source Instance MSH|^~\&|OADD|153|DADD|XNEPHA|20110208000109||ORU^R01|20110207000036|T|2.2|||| EVN|R01|201102080000| PID||1234567|274382554|007261|WHYLING^KAYLIE^O'TEST||19460413|F||W|||(801)224-1528|(866)772-3150||||21443041|535194412| PV1||O|XNEPHA^XNEPHA^^IM||||28826^Allyson^Josephine^ O'TEST |^||||||||||OP||||||||||||||||||||||||||201102070000|||||||| ORC|RE||F506556|||||||||28826^Allyson^Josephine^ O'TEST ||||^| OBR||^|F506556^|HCT^HEMATOCRIT|R||201102071554|||70011^ROSEN,AUBRY^ O'TEST |||20110207161200|^|28826^Allyson^Josephine^ O'TEST ||||M2415648||||C|F|RFP^RFP|^^^^^R|^~^~^||||||| OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|||R||F|||201102080000|IM^Performed at Inte|58528^ANDERSON^MARK|
LabObsQuantitative - XML Instance Snippet <labObsQuantitative type="b1ceaebb-dd15-4317-3f99-67ef3af81778"> <key> <code> <value>20570-8</value> </code> <codeSystem> <value>LOINC</value> </codeSystem> <originalText>HCT</originalText> </key> <primaryPQValue> <operator> <value>equals</value> </operator> <unit> <value>%</value> </unit> <value>48</value> </primaryPQValue> <referenceRangeNar type="6f422ce6-7bc6-2cc2-8c96-58c137b5c9fc"> … </referenceRangeNar> <abnormalInterpretation type="9a3c3c60-18f7-5a91-c10c-c15532a96303"> … </abnormalInterpretation> </labObsQuantitative>
Different groups use models differently NLP versus EMR Structuring the models to meet more than one use Options for different granularities of models Hematocrit model, model of pneumonia Quantitative lab result model, x-ray finding Terminology integration – use of standards and terminology services Models for “rare” kinds of data Medication being taken by a friend, not recommended by the physician Issues
Data Normalization Dr. Christopher Chute
IHC-Medication, Mayo, IHC LAB to CEM IHC RXNORM resource SharpDb HL7 Initializer Drug CEM CAS Consumer IHC-GCN TO-RXNORM Annotator HL7 (Meds) HL7 Initializer LAB CEM CAS Consumer Generic-LAB- Annotator Mirth HL7 (Labs) Mayo LOINC resource IHC LOINC resource
UIMA Normalization Pipeline • Convert HL7 V2.x Lab / Med Order Messages into CEM XML instances • Load SofA with HL7 message • Create Segment Objects in CAS • Normalize Segments in CAS • Transform Segments into CEM instances
Mayo, IHC LAB to CEM HL7 Pipe Delimited SharpDb One of the new pipelines created to normalize HL7 2.x Lab Messages into CEM instances. Mirth We pre-processed the HL7 messages converting from HL7 pipe syntax into HL7 XML format. Generic-LAB- Annotators HL7 Initializer LAB CEM CAS Consumer Generic-LAB- Annotators Mirth HL7 (XML) Mayo LOINC resource IHC LOINC resource
Mayo, IHC LAB to CEM UIMA Pipeline Flow PID 10109 45373-3 HL7 message CAS (SOFA=HL7-XML) CAS CEM Initialize Parse Normalize Transform PV1 OBX
Normalization AnatomyLab Annotators Date-Time To ISO Format HL7 Segment Parser Syntactic Integrity LOINC lookups IHC codes to LOINC table Mayo codes to LOINC table LexGrid/CTS2 Terminology Services
Architectural Opportunities HL7 2.x HL7 2.x HL7 2.x Mayo CEM format CEM format Mirth CEM format CEM format Time, Syntax Etc. CAS To XML CEM format Mirth Semantic CDA CDA CDA CDA
Tactical Next Step Enhancements • Single CEM for multiple OBX segments • Efficiently utilize terminology services • Incorporate a library for HL7 clean-up routines • Increase scope of vocabulary standardization • Enhancements for the Drug Annotator • Context enhancement issue • Drug name surprises
Additional Vocabularies • Review sources used for normalization opportunities E.g. • In HL7 OBR Segments • Standardize Service ID (Codes) • In HL7 OBX Segments • Standardize Units • Standardize Reference Ranges • Standardize Normal Flags
Drug Name Disambiguity Real patient data, presented a unique case in drug names. “ToDAY” is brand name for: cephapirin sodium. This presents an interesting named entity disambiguation use case.
Where Persistence Fits In… Mayo EDT System 6a 5 1 7 IHC IHC SHARP Mirth Mirth UIMA (Backend CDR NwHIN NwHIN Connect Connect Pipeline 2 4 Systems) 6 Aurion Aurion 8 Gateway Gateway 3 9 10 CEM Instance Database
Persistence Channels • One Channel per model • Data stored as an XML Instance of the model • Fields extracted from XML to use as indices • XML Schema defined for each model • Stored using database transactions
General Channel Design CEM XML Instance Input Message Directory Persistence Store Channel Connector Connector Processed Message Directory Error Message Directory
Patient Demographics • Each message contains patient demographics • Demographics created on first received message based on site patient ID • Internal Patient ID is created and cross mapped to site patient ID • SharpDB is keyed off internally generated Patient ID
Running in a Cloud… • Various images were installed: • NwHIN Gateway provided by Aurion • MIRTH Connect our interface engine • UIMA Pipelines of various sorts • MySQL database for persistence • JBOSS / Drools rules engine All open source, running in a Ubuntu Cloud!
SHARP Hardware Infrastructure Node Server 1 Node Server 2 Cloud Server Admin Client Interface Persistence Storage VPN/ LAN Node Controller Node Controller Cloud Controller Walrus Controller VM VM VM VM VM VM Cluster Controller Storage Controller Image Storage Build/Backup Server To Manage Cloud Node Server 3 Node Server 11 User … VPN/ LAN Node Controller Node Controller Private Switch VM VM VM VM VM VM To Connect To Instances
Data Normalization Summary • Initial “tracer shot” at Data Normalization • Cloud based processing using open source tools • Proof on concept, UIMA for Data Normalization • Move on to new problems / solutions… • Opportunities exist: • Add new annotators (modules) to the pipelines • Widen usage and scope of vocabulary services • Switch to real live flows and add HOSS clean up routines. • Various tweaks in NLP algorithms