Jyotishman Pathak, PhD Assistant Professor of Biomedical Informatics

Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use of EHR Data
Project 3: High-Throughput Phenotyping
Jyotishman Pathak, PhD, Assistant Professor of Biomedical Informatics
June 11, 2012

Project 3: Collaborators & Acknowledgments
• CDISC (Clinical Data Interchange Standards Consortium): Rebecca Kush, Landen Bain
• Centerphase Solutions: Gary Lubin, Jeff Tarlowe
• Group Health Seattle: David Carrell
• Harvard University/MIT: Guergana Savova, Peter Szolovits
• Intermountain Healthcare/University of Utah: Susan Welch, Herman Post, Darin Wilcox, Peter Haug
• Mayo Clinic: Cory Endle, Rick Kiefer, Sahana Murthy, Gopu Shrestha, Dingcheng Li, Gyorgy Simon, Matt Durski, Craig Stancl, Kevin Peterson, Cui Tao, Lacey Hart, Erin Martin, Kent Bailey, Scott Tabor

Phenotyping is still a bottleneck… [Image from Wikipedia]

EHR systems: United States, 2002-2011 [Millwood et al. 2012]

Electronic health record (EHR)-driven phenotyping
• EHRs are becoming increasingly prevalent within the U.S. healthcare system
  • Meaningful Use is one of the major drivers
• Overarching goal
  • To develop high-throughput automated techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings

http://gwas.org

EHR-driven Phenotyping Algorithms - I
• Typical components
  • Billing and diagnosis codes
  • Procedure codes
  • Labs
  • Medications
  • Phenotype-specific covariates (e.g., demographics, vitals, smoking status, CASI scores)
  • Pathology
  • Imaging?
• Organized into inclusion and exclusion criteria
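The organization into inclusion and exclusion criteria can be sketched in a few lines of Python. This is a minimal illustration, not the eMERGE or SHARPn implementation: the `Patient` record, the `is_eligible` helper, and the example codes and drug names are all hypothetical and chosen only to make the example concrete.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Patient:
    # Hypothetical, heavily simplified record; real EHR data is far richer.
    patient_id: str
    diagnosis_codes: Set[str] = field(default_factory=set)
    medications: Set[str] = field(default_factory=set)


Criterion = Callable[[Patient], bool]


def is_eligible(patient: Patient,
                inclusion: List[Criterion],
                exclusion: List[Criterion]) -> bool:
    """Eligible when every inclusion criterion holds and no exclusion criterion holds."""
    return (all(check(patient) for check in inclusion)
            and not any(check(patient) for check in exclusion))


# Illustrative criteria only (assumed for this sketch).
inclusion = [lambda p: "244.9" in p.diagnosis_codes]   # a diagnosis code is present
exclusion = [lambda p: "lithium" in p.medications]     # a disqualifying medication

cohort = [
    Patient("p1", {"244.9"}, {"levothyroxine"}),
    Patient("p2", {"244.9"}, {"lithium"}),
    Patient("p3", {"250.00"}, set()),
]

eligible_ids = [p.patient_id for p in cohort if is_eligible(p, inclusion, exclusion)]
print(eligible_ids)
```

Keeping each criterion as a separate predicate makes it easy to review, reorder, or swap individual criteria during the iterative algorithm design process.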

EHR-driven Phenotyping Algorithms - II [Workflow diagram; box labels: Data, Transform (x2), Mappings, NLP, SQL, Rules, Phenotype Algorithm, Evaluation, Visualization] [eMERGE Network]

Example: Hypothyroidism Algorithm [Denny et al., 2012]
[Flowchart; box labels as extracted]
• No thyroid-altering medications (e.g., phenytoin, lithium)
• 2+ non-acute visits in 3 yrs
• ICD-9s for hypothyroidism
• Abnormal TSH/FT4
• Antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroperoxidase)
• No ICD-9s for hypothyroidism
• No abnormal TSH/FT4
• Thyroid replacement meds
• No thyroid replacement meds
• No antibodies for TTG/TPO
• No secondary causes (e.g., pregnancy, ablation)
• No hx of myasthenia gravis
• Outcomes: Case 1, Case 2, Control

Hypothyroidism Algorithm: Validation [Denny et al., 2012]

[eMERGE Network]

Genotype-Phenotype Association Results [Ritchie et al. 2010]
[Forest plot of published vs. observed odds ratios; x-axis: Odds Ratio, ticks at 0.5, 1.0, 2.0, 5.0]
Markers by disease (marker: gene / region):
• Atrial fibrillation: rs2200733 (Chr. 4q25), rs10033464 (Chr. 4q25)
• Crohn's disease: rs11805303 (IL23R), rs17234657 (Chr. 5), rs1000113 (Chr. 5), rs17221417 (NOD2), rs2542151 (PTPN22)
• Multiple sclerosis: rs3135388 (DRB1*1501), rs2104286 (IL2RA), rs6897932 (IL7RA)
• Rheumatoid arthritis: rs6457617 (Chr. 6), rs6679677 (RSBN1), rs2476601 (PTPN22)
• Type 2 diabetes: rs4506565 (TCF7L2), rs12255372 (TCF7L2), rs12243326 (TCF7L2), rs10811661 (CDKN2B), rs8050136 (FTO), rs5219 (KCNJ11), rs5215 (KCNJ11), rs4402960 (IGF2BP2)

Key lessons learned from eMERGE
• Algorithm design and transportability
  • Non-trivial; requires significant expert involvement
  • Highly iterative process
  • Time-consuming manual chart reviews
  • Representation of “phenotype logic” for transportability is critical
• Standardized data access and representation
  • Importance of unified vocabularies, data elements, and value sets
  • Questionable reliability of ICD & CPT codes (e.g., billing the wrong code since it is easier to find)
  • Natural Language Processing (NLP) is critical

Algorithm Development Process - Modified [Workflow diagram; box labels: Data, Transform (x2), Mappings, NLP, SQL, Rules, Semi-Automatic Execution, Phenotype Algorithm, Evaluation, Visualization] [eMERGE Network]

Algorithm Development Process - Modified [Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]
[Workflow diagram; box labels: Data, Transform (x2), Mappings, NLP, SQL, Rules, Semi-Automatic Execution, Phenotype Algorithm, Evaluation, Visualization]
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools (DRLs)
• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)

The SHARPn “phenotyping funnel” [Diagram; inputs labeled: Intermountain EHR, Mayo Clinic EHR] [Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]

Clinical Element Models: Higher-Order Structured Representations [Stan Huff, IHC]

Pre- and Post-Coordination [Stan Huff, IHC]

CEMs available for patient demographics, medications, lab measurements, procedures etc. [Stan Huff, IHC]

SHARPn data normalization flow - I CEM MySQL database with normalized patient information [Welch et al. 2012]

SHARPn data normalization flow - II CEM MySQL database with normalized patient information

Algorithm Development Process - Modified [Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]
[Workflow diagram; box labels: Data, Transform (x2), Mappings, NLP, SQL, Rules, Semi-Automatic Execution, Phenotype Algorithm, Evaluation, Visualization]
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)

Our task: human readable → machine computable [Thompson et al., submitted 2012]

NQF Quality Data Model (QDM)
• Standard of the National Quality Forum (NQF)
  • A structure and grammar to represent quality measures in a standardized format
• Groups of codes in a code set (ICD-9, etc.)
  • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"
• Supports temporality & sequences
  • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
• Implemented as a set of XML schemas
  • Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm, etc.)
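The two QDM ideas above, value-set membership and a temporal constraint, can be sketched as follows. This is an illustration under stated assumptions, not the QDM XML schema or any NQF tooling. The `ValueSet` and `ClinicalEvent` classes are hypothetical, and the codes are placeholders rather than the published contents of the value set. `starts_before_or_during` is a simplified reading of the temporal operator, comparing an event's start date to a single reference date.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List


@dataclass(frozen=True)
class ValueSet:
    oid: str
    name: str
    codes: FrozenSet[str]  # placeholder codes, NOT the real value set contents


@dataclass(frozen=True)
class ClinicalEvent:
    category: str  # QDM-style category and state, e.g. "Diagnosis, Active"
    code: str
    start: date


def matches(event: ClinicalEvent, category: str, value_set: ValueSet) -> bool:
    """True when the event has the given category and its code is in the value set."""
    return event.category == category and event.code in value_set.codes


def starts_before_or_during(event: ClinicalEvent, reference_end: date) -> bool:
    """Simplified temporal check: the event starts on or before the reference date."""
    return event.start <= reference_end


steroid_dm = ValueSet(
    oid="2.16.840.1.113883.3.464.0001.113",
    name="steroid induced diabetes",
    codes=frozenset({"EXAMPLE-1", "EXAMPLE-2"}),  # hypothetical placeholders
)

events: List[ClinicalEvent] = [
    ClinicalEvent("Diagnosis, Active", "EXAMPLE-1", date(2011, 3, 1)),
    ClinicalEvent("Diagnosis, Active", "EXAMPLE-1", date(2012, 2, 1)),
    ClinicalEvent("Procedure, Performed", "EXAMPLE-2", date(2011, 5, 1)),
]

measurement_end = date(2011, 12, 31)  # assumed measurement end date
hits = [
    e for e in events
    if matches(e, "Diagnosis, Active", steroid_dm)
    and starts_before_or_during(e, measurement_end)
]
print(hits)
```

Only the first event satisfies both conditions: the second starts after the assumed measurement end date, and the third has a different category.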

116 Meaningful Use Phase I Quality Measures

Example: Diabetes & Lipid Mgmt. - I Human readable HTML

Example: Diabetes & Lipid Mgmt. - II Computable XML

NQF Measure Authoring Tool (MAT)


JBoss® open-source Drools rules-based management system (RBMS)
• Represents knowledge with declarative production rules
  • Origins in artificial intelligence expert systems
  • Simple when <pattern> then <action> rules specified in text files
• Separation of data and logic into separate components
• Forward chaining inference model (Rete algorithm)
• Domain specific languages (DSL)

Example Drools rule
    rule "Glucose <= 40, Insulin On"
    when
        $msg : GlucoseMsg(glucoseFinding <= 40, currentInsulinDrip > 0)
    then
        glucoseProtocolResult.setInstruction(GlucoseInstructions.GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG);
    end
[Slide callouts: {Rule Name}, {binding}, {Java Class}, {Class Getter Method}, {Class Setter Method}, Parameter {Java Class}]
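To show the when/then idea outside of Drools, here is a minimal Python sketch of a production rule applied to facts. It is not Drools and does not implement the Rete algorithm; it makes a single naive pass over the facts. The class names mirror the slide's example, but `Rule`, `fire_all`, and `set_instruction` are hypothetical helpers invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GlucoseMsg:
    glucose_finding: float
    current_insulin_drip: float


@dataclass
class ProtocolResult:
    instruction: str = ""


@dataclass
class Rule:
    name: str
    when: Callable[[GlucoseMsg], bool]                   # the pattern (condition)
    then: Callable[[GlucoseMsg, ProtocolResult], None]   # the action


def set_instruction(text: str) -> Callable[[GlucoseMsg, ProtocolResult], None]:
    """Build an action that records an instruction on the result object."""
    def action(msg: GlucoseMsg, result: ProtocolResult) -> None:
        result.instruction = text
    return action


RULES = [
    Rule(
        name="Glucose <= 40, Insulin On",
        when=lambda m: m.glucose_finding <= 40 and m.current_insulin_drip > 0,
        then=set_instruction("GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG"),
    ),
]


def fire_all(facts: List[GlucoseMsg], rules: List[Rule]) -> List[ProtocolResult]:
    """Naive single pass: for each fact, run every rule whose condition matches."""
    results = []
    for fact in facts:
        result = ProtocolResult()
        for rule in rules:
            if rule.when(fact):
                rule.then(fact, result)
        results.append(result)
    return results


results = fire_all([GlucoseMsg(35, 2.0), GlucoseMsg(90, 2.0)], RULES)
print([r.instruction for r in results])
```

The sketch keeps the separation the slides emphasize: facts are plain data, and the logic lives in a rule list that can change without touching the data classes.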

Automatic translation from NQF QDM criteria to Drools [Li et al., submitted 2012]
[Diagram: Measure Authoring Toolkit → Drools Engine, from non-executable to executable]
• Measures (XML-based structured representation) → Drools scripts: converting measures to Drools scripts
• Data Types (XML-based structured representation) → Fact Models: mapping data types and value sets
• Value Sets: saved in XLS files
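One way to picture the step from a structured criterion to a Drools script is template-based text generation. The sketch below is only an illustration of that general idea, not the translator described by Li et al. The `Criterion` structure, the `Diagnosis` fact type, its `status` and `valueSetOid` fields, and the `matches` collection in the action are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    # Hypothetical structured form of one phenotype criterion.
    rule_name: str
    fact_type: str    # assumed name of a fact model class
    constraint: str   # assumed field constraints on that fact
    action: str       # assumed consequence, written as a Java statement


DRL_TEMPLATE = (
    'rule "{rule_name}"\n'
    "when\n"
    "    $fact : {fact_type}({constraint})\n"
    "then\n"
    "    {action}\n"
    "end\n"
)


def to_drl(criterion: Criterion) -> str:
    """Render one criterion as the text of a DRL rule."""
    return DRL_TEMPLATE.format(
        rule_name=criterion.rule_name,
        fact_type=criterion.fact_type,
        constraint=criterion.constraint,
        action=criterion.action,
    )


criterion = Criterion(
    rule_name="Diagnosis, Active: steroid induced diabetes",
    fact_type="Diagnosis",
    constraint='status == "Active", valueSetOid == "2.16.840.1.113883.3.464.0001.113"',
    action="matches.add($fact);",
)

drl_text = to_drl(criterion)
print(drl_text)
```

A real translator would also need to map QDM data types to fact models and resolve value sets, as the slide indicates; this sketch covers only the final text-rendering step.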

Automatic translation from NQF QDM criteria to Drools [Li et al., submitted 2012]

The “executable” Drools flow

Phenotype library and workbench - I http://phenotypeportal.org
• Converts QDM to Drools
• Rule execution by querying the CEM database
• Generates summary reports

Phenotype library and workbench - II http://phenotypeportal.org

Phenotype library and workbench - III http://phenotypeportal.org

Phenotype library and workbench - IV

Additional on-going research efforts - I
• Machine learning and association rule mining
  • Manual creation of algorithms takes time
  • Let computers do the “hard work”
  • Validate against expert-developed ones
[Carroll et al. 2011]

Additional on-going research efforts - I
• Origins from sales data
• Items (columns): co-morbid conditions
• Transactions (rows): patients
• Itemsets: sets of co-morbid conditions
• Goal: find all itemsets (sets of conditions) that frequently co-occur in patients
  • One of those conditions should be DM
• Support: # of transactions the itemset I appeared in
  • Support({TB, DLM, ND}) = 3
• Frequent: an itemset I is frequent if support(I) > minsup
[Figure legend: X = infrequent] [Simon et al. 2012]
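The support and frequency definitions above can be made concrete with a small sketch. The patient data and condition labels below are hypothetical and are not the data behind the slide's example. For brevity the code enumerates candidates by brute force rather than using Apriori-style pruning, so it only suits tiny examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical transactions: one set of co-morbid conditions per patient.
patients: List[Set[str]] = [
    {"DM", "HTN", "OBESITY"},
    {"DM", "HTN"},
    {"DM", "HTN", "OBESITY"},
    {"HTN"},
    {"DM", "OBESITY"},
]


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    """Number of transactions (patients) that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions: List[Set[str]],
                      minsup: int,
                      must_contain: str = "DM") -> Dict[FrozenSet[str], int]:
    """Brute-force search for itemsets with support > minsup that include must_contain."""
    items = sorted(set().union(*transactions))
    found: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if must_contain not in candidate:
                continue
            count = support(candidate, transactions)
            if count > minsup:  # strict inequality, as in the definition above
                found[candidate] = count
    return found


result = frequent_itemsets(patients, minsup=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With `minsup=2`, the itemset {DM, HTN, OBESITY} has support 2 and is therefore not frequent, while {DM, HTN} and {DM, OBESITY} each have support 3 and are.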

Additional on-going research efforts - II

Additional on-going research efforts - II TRALI/TACO sniffer

Active Surveillance for TRALI and TACO
• Of the 88 TRALI cases correctly identified by the CART algorithm, only 11 (12.5%) were reported to the blood bank by the clinical service.
• Of the 45 TACO cases correctly identified by the CART algorithm, only 5 (11.1%) were reported to the blood bank by the clinical service.

Additional on-going research efforts - III
• Phenome-wide association scan (PheWAS)
  • Do a “reverse GWAS” using EHR data
  • Facilitate hypothesis generation
[Pathak et al. submitted 2012]
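The “reverse GWAS” idea, fixing one genetic variant and scanning many phenotypes, can be sketched with a simple odds ratio per phenotype. The counts and phenotype names below are hypothetical. A real PheWAS would typically use regression models with covariates and a multiple-testing correction, which this sketch omits. It also assumes every cell of each 2x2 table is non-zero.

```python
from typing import Dict, List, Tuple

# Hypothetical 2x2 counts per phenotype for ONE variant:
# (carriers with phenotype, carriers without,
#  non-carriers with phenotype, non-carriers without)
counts: Dict[str, Tuple[int, int, int, int]] = {
    "phenotype_A": (30, 70, 20, 180),
    "phenotype_B": (10, 90, 20, 180),
}


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of a 2x2 table: (a / b) / (c / d), assuming no zero cells."""
    return (a * d) / (b * c)


# Scan every phenotype for the same variant, then rank by effect size.
scan: Dict[str, float] = {name: odds_ratio(*cells) for name, cells in counts.items()}
ranked: List[Tuple[str, float]] = sorted(scan.items(), key=lambda kv: kv[1], reverse=True)

for name, value in ranked:
    print(f"{name}: OR = {value:.2f}")
```

Phenotypes near the top of such a ranking are candidates for follow-up, which is the hypothesis-generation role the slide describes.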

Publications to date (conservative)

Mayo projects and collaborations
• Ongoing
  • Transfusion related acute lung injury (Kor)
  • Drug induced liver injury (Talwalkar)
  • Drug induced thrombocytopenia and neutropenia (Al-Kali)
  • Active surveillance for celiac disease (Murray)
  • Warfarin dose response & heart valve replacements (Pereira)
  • Phenotype definition standardization (HCPR/Quality)
• Getting started/planning
  • Pharmacogenomics of systolic heart failure (Bielinski/Pereira)
  • Pharmacogenomics of SSRI (Mrazek/Weinshilboum)
  • Lumbar image reporting with epidemiology (Kallmes)
  • Active clinical trial alerting (CTMS/Cancer Center)

HTP related presentations
• June 11th, 2012
  • Using EHRs for clinical research (Vitaly Herasevich)
  • Association rule mining and T2D risk prediction (Gyorgy Simon)
  • Scenario-based requirements engineering for developing EHR add-ons to support CER in patient care settings (Junfeng Gao)
• June 12th, 2012
  • Exploring patient data in context clinical research studies: Research Data Explorer (Adam Wilcox et al.)
  • Utilizing previous result sets as criteria for new queries with FURTHeR (Dustin Schultz et al.)
  • Semantic search engine for clinical trials (Yugyung Lee)
  • Knowledge-driven workbench for predictive modeling (Peter Haug et al.)
  • Clinical analytics driven care coordination for 30-day readmission – Demonstration from 360 Fresh.com (Ramesh Sairamesh)
