Using EMR Data for Population Registries
Diana Gumas, JHMCIS Senior Director for Research Systems, EPR and EPR2020/Amalga
David Thiemann, Center for Clinical Data Analysis
Potential Data Uses
• Sample Size Estimates (aggregate data without IRB approval)
  • Feasibility, grant applications, statistical planning
• Identifying patients for enrollment/recruitment
  • By diagnosis, pathology, stage, labs, meds
• Identifying/creating matched study controls
• Obtaining current demographics (name, address) for mail solicitation
  • From research list or by clinic, provider, clinical criteria
• Obtaining ongoing clinical + administrative data on a registry panel
  • Labs, visits, procedures, immunizations, CPT/ICD9 codes, resource use
Possible research data sources
• EPR (JHH & JHBMC)
• Sunrise Clinical Manager (JHH, inpatient)
• Meditech (Bayview)
• Casemix Datamart
• GE Centricity (JHCP)
• EPR2020
• Departmental Systems (ED, OR, Anesthesia)
• Clinical Research Management System (CRMS)
• IDX (professional fees)
• Death Registry
Methods for Data Access
• Historical: Researcher negotiates access with clinical system/DBA
  • Logistical nightmare, technical challenge
• Clinical Research Management System (CRMS)
  • Study cohort with real-time links to enterprise data
• Center for Clinical Data Analysis
  • Monthly/quarterly data extracts from designated systems
Clinical Research Management System (CRMS)
• 1,054 Users
• 1,079 Active Studies
• 25,430 Participants
Data Available in CRMS
• eIRB
• EPR (patient demographics)
• Study participants / accruals
• Electronic Case Report Forms (in next 2-3 months)
Clinical Research Management System (CRMS): Ways to extract data
• Canned reports
• Ad hoc querying using SQL
• Automated study-specific data extracts (possible with CCDA support)
EPR2020 Data for Researchers
From EPR Today
4.2M Patients, 23.4M Visits
12.3M Documents, 6.8M Radiology Reports
25.6M Lab Results
1.5M Problems, 2.2M Medications, 140K Allergies
Planned
• Bayview & JHCP data
• ICD9 diagnosis codes and CPT charges (IDX)
Future
• Death Registry
• Blood Product Data for Transfusions
• Eclipsys SCM Order data
• HMED (ED), ORMIS, eADR/Medivision
My Participant’s Lab Data
Reliable. Driven by the CRMS Participant Registry. Exportable.
Registry Cohort Discovery using EPR2020
A JHM investigator wants to find and enroll diabetic patients aged 45-65 years with hemoglobin A1C between 7 and 9% and serum creatinine < 2 mg/dl.
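The cohort criteria above can be expressed as a single query. The following is a minimal sketch using Python's built-in sqlite3 module; the tables (`patient`, `problem`, `lab`), their columns, and the test codes `HBA1C` and `CREAT` are hypothetical and do not reflect the actual EPR2020 schema. It also assumes that "current" lab status means the most recent result per patient and test, which is a design choice the investigator would need to confirm.

```python
import sqlite3

# Hypothetical schema and data: table names, column names, and test codes
# are illustrative assumptions, not the actual EPR2020 structure.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (mrn TEXT PRIMARY KEY, age INTEGER);
CREATE TABLE problem (mrn TEXT, term TEXT);
CREATE TABLE lab (mrn TEXT, test TEXT, value REAL, drawn TEXT);
""")
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [("A1", 50), ("A2", 70), ("A3", 60), ("A4", 55)])
conn.executemany("INSERT INTO problem VALUES (?, ?)",
                 [("A1", "diabetes"), ("A2", "diabetes"),
                  ("A3", "diabetes"), ("A4", "hypertension")])
conn.executemany("INSERT INTO lab VALUES (?, ?, ?, ?)", [
    ("A1", "HBA1C", 10.2, "2009-06-01"),  # older result, superseded below
    ("A1", "HBA1C", 8.1, "2010-01-05"),
    ("A1", "CREAT", 1.1, "2010-01-05"),
    ("A2", "HBA1C", 7.5, "2010-02-01"),   # age 70: outside 45-65
    ("A2", "CREAT", 1.0, "2010-02-01"),
    ("A3", "HBA1C", 8.0, "2010-03-01"),
    ("A3", "CREAT", 2.4, "2010-03-01"),   # creatinine not below 2 mg/dl
    ("A4", "HBA1C", 7.2, "2010-04-01"),   # no diabetes on the problem list
    ("A4", "CREAT", 0.9, "2010-04-01"),
])

COHORT_SQL = """
WITH latest AS (            -- most recent result per patient and test
    SELECT l.mrn, l.test, l.value
    FROM lab l
    WHERE l.drawn = (SELECT MAX(l2.drawn) FROM lab l2
                     WHERE l2.mrn = l.mrn AND l2.test = l.test)
)
SELECT p.mrn
FROM patient p
JOIN latest a ON a.mrn = p.mrn AND a.test = 'HBA1C'
JOIN latest c ON c.mrn = p.mrn AND c.test = 'CREAT'
WHERE p.age BETWEEN 45 AND 65
  AND a.value BETWEEN 7 AND 9
  AND c.value < 2
  AND EXISTS (SELECT 1 FROM problem d
              WHERE d.mrn = p.mrn AND d.term = 'diabetes')
ORDER BY p.mrn
"""

def find_cohort(conn):
    """Return the MRNs that satisfy every cohort criterion."""
    return [row[0] for row in conn.execute(COHORT_SQL)]

print(find_cohort(conn))
```

In a real registry the "diabetes" match would come from mapped research terms rather than a literal string, and the lab values would first need the cleanup described in the transform slides.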
Center for Clinical Data Analysis (CCDA)
Provides periodic (monthly/quarterly) bulk data extracts (delimited/flat files, .xls):
• Preliminary, anonymous data for feasibility, grant applications and statistical sample-size estimates
• IRB-approved case-finding for study enrollment (mailings, phone solicitation), chart review, and cohort/case-control studies
• Research data extracts: monthly/quarterly integrated extracts from EPR, POE, ORMIS, lab/PDS, billing systems, vaccination/transfusion/culture data, etc.
How CCDA works
• Email CCDA@jhmi.edu, cc: dthiema1@jhmi.edu; phone 410-955-65558 (Thiemann)
• For IRB-approved research:
  • Provide full protocol + IRB approval
  • Meet to discuss query methods, format
  • Iterate, then schedule production (email extracts, Jshare)
  • Cost: $100/hour
• For non-IRB projects (exploratory analyses, QI):
  • Same process, cost subsidized by ICTR/JHM
  • Do NOT implicitly morph QI into IRB
The Basics: Getting Clinical Data Into a Registry Database
• Real work, not ad hoc/bootstrap
• Need $$$ and FTE(s)
• Smart analyst(s) who know database technology and understand (or can learn) nuances of the sources and content domain
• Hands-on PI management/guidance
• Statistical liaison early, before database schema and ETL methods are set in stone
The Extract-Transform-Load Process: Getting Clinical Data into Research DB
• Raw clinical/administrative data is useless for research
• Build an intermediate (staging) database
• Don’t do data management in SAS/Stata/Excel
• Data dictionary: derivation for each field
• Templated, tested, documented cleanup scripts/routines
• Intermediate tables: Log each step/modification
• For each batch, be able to re-create data transform from scratch
• Version control, change control and documentation are vital
• Build data versioning into the database
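One way to make these points concrete is a staging table that can be rebuilt per batch, with every transform step written to a log. The sketch below uses Python's built-in sqlite3 module; the table names (`raw_lab`, `staging_lab`, `etl_log`), the columns, and the version label are all hypothetical, and the two cleanup steps stand in for whatever documented routines a real registry would use.

```python
import sqlite3

TRANSFORM_VERSION = "v1"  # hypothetical label; bump when cleanup rules change

# Hypothetical schema: raw data is kept untouched, staging is derived from it.
SCHEMA = """
CREATE TABLE raw_lab (batch_id TEXT, mrn TEXT, test TEXT, drawn TEXT, result TEXT);
CREATE TABLE staging_lab (batch_id TEXT, version TEXT, mrn TEXT, test TEXT,
                          drawn TEXT, result TEXT);
CREATE TABLE etl_log (batch_id TEXT, version TEXT, step TEXT, n_rows INTEGER,
                      logged_at TEXT DEFAULT CURRENT_TIMESTAMP);
"""

def log_step(cur, batch_id, step, n_rows):
    """Append one line to the ETL log; the log is never rewritten."""
    cur.execute(
        "INSERT INTO etl_log (batch_id, version, step, n_rows) VALUES (?, ?, ?, ?)",
        (batch_id, TRANSFORM_VERSION, step, n_rows))

def transform_batch(conn, batch_id):
    """Rebuild the staging rows for one batch from its raw rows."""
    cur = conn.cursor()
    # Start from scratch so a rerun reproduces the same staging content.
    cur.execute("DELETE FROM staging_lab WHERE batch_id = ?", (batch_id,))

    # Step 1: copy raw rows, normalizing whitespace and test-name case.
    cur.execute("""
        INSERT INTO staging_lab (batch_id, version, mrn, test, drawn, result)
        SELECT batch_id, ?, mrn, UPPER(TRIM(test)), drawn, TRIM(result)
        FROM raw_lab WHERE batch_id = ?""", (TRANSFORM_VERSION, batch_id))
    copied = cur.rowcount
    log_step(cur, batch_id, "copy_and_normalize", copied)

    # Step 2: drop exact duplicates, keeping the first copy of each row.
    cur.execute("""
        DELETE FROM staging_lab
        WHERE batch_id = ? AND rowid NOT IN (
            SELECT MIN(rowid) FROM staging_lab
            WHERE batch_id = ? GROUP BY mrn, test, drawn, result)""",
        (batch_id, batch_id))
    removed = cur.rowcount
    log_step(cur, batch_id, "drop_exact_duplicates", removed)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO raw_lab VALUES (?, ?, ?, ?, ?)", [
    ("2010-01", "M1", " hba1c ", "2010-01-05", "7.9"),
    ("2010-01", "M1", "HBA1C", "2010-01-05", "7.9"),   # duplicate after cleanup
    ("2010-01", "M2", "creat", "2010-01-06", "<0.5"),
])
transform_batch(conn, "2010-01")
transform_batch(conn, "2010-01")  # rerun: staging is rebuilt, log keeps growing
```

Because the raw table is never modified and each staging row carries the batch and transform version, a batch can be regenerated after a rule change and compared against the logged row counts from the earlier run.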
Transforming Data
• Raw data typically string (char/text) fields
  • Unanalyzable characters (* < >, comments) still have meaning
  • Put non-numeric data in separate field. Avoid numerical recoding (999)
• ~3% of pts have multiple/non-preferred MRNs
  • Need 1-to-many link table
• Assays/reference ranges/coding changes
  • Avoid using raw codes (CPT/ICD) in research db
  • Map clinical codes to research terms
• Defer analytic assumptions. When recoding data, anticipate problems. Keep options open.
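As an illustration of keeping non-numeric content in separate fields rather than recoding it to a sentinel such as 999, here is a minimal sketch in Python. The `ParsedResult` structure, the `parse_result` function, and the regular expression are hypothetical; real lab feeds will contain formats this pattern does not anticipate, so unparsed strings are preserved rather than discarded.

```python
import re
from typing import NamedTuple, Optional

class ParsedResult(NamedTuple):
    value: Optional[float]  # numeric part, or None when no number was found
    qualifier: str          # "<", ">", "<=", ">=" or "" when absent
    comment: str            # any leftover text, kept verbatim

# Optional comparison qualifier, then a number, then any trailing text.
_PATTERN = re.compile(r"^\s*(<=|>=|<|>)?\s*(-?\d+(?:\.\d+)?)\s*(.*)$")

def parse_result(raw: str) -> ParsedResult:
    """Split a raw result string into number, qualifier, and comment."""
    m = _PATTERN.match(raw)
    if m is None:
        # No leading number: keep the text, leave value empty (not 999).
        return ParsedResult(None, "", raw.strip())
    qualifier, number, rest = m.groups()
    return ParsedResult(float(number), qualifier or "", rest.strip())

for raw in ["<0.5", "7.9 *", "HEMOLYZED", " > 2000 see note"]:
    print(repr(raw), "->", parse_result(raw))
```

Storing the qualifier separately defers the analytic decision (for example, whether "<0.5" is treated as 0.5, 0.25, or missing) to the statistician instead of baking it into the database.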
More Data Transform Challenges
• NEVER trust raw data. Learn business logic of source system.
• CPTs morph annually, internal complexity/redundancy
• Lab assays/reference/terms change
• Parsing is inherently unreliable
• Administrative names/groups change (clinic #s, departments)
• Duplicate-value problems (labs, orders)
• System-attribution source/datetime (POE, lab)
• Always run an aggregate (“group by”) query to identify alternative names (e.g. lab name) and values (number, result) before transform. Otherwise you’ll miss something.
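The aggregate check recommended above might look like the following sketch, again using Python's sqlite3 module. The `raw_lab` table, its columns, and the sample rows are hypothetical. The query counts rows per distinct test name and flags how many results contain anything other than digits and a decimal point.

```python
import sqlite3

# Hypothetical raw extract: names and values exactly as the source sent them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_lab (test_name TEXT, result TEXT)")
conn.executemany("INSERT INTO raw_lab VALUES (?, ?)", [
    ("HbA1c", "7.9"),
    ("HBA1C", "8.1"),
    ("Hemoglobin A1C", "<4.0"),
    ("HbA1c", "PENDING"),
    ("Creatinine", "1.1"),
])

def name_variants(conn):
    """Count rows and non-numeric results for each distinct test name."""
    return conn.execute("""
        SELECT test_name,
               COUNT(*) AS n,
               SUM(CASE WHEN result GLOB '*[^0-9.]*' OR result = ''
                        THEN 1 ELSE 0 END) AS n_non_numeric
        FROM raw_lab
        GROUP BY test_name
        ORDER BY n DESC, test_name
    """).fetchall()

for row in name_variants(conn):
    print(row)
```

Here the listing would reveal three spellings of the same assay and two results that a naive numeric cast would lose, which is exactly the kind of surprise to find before writing the transform.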
Understanding Business Logic
• Trust but verify: Test coding accuracy
  • Providers may habitually use imprecise/inaccurate diagnosis codes (especially in profee data)
  • ICD9 procedure indications often a billing fiction
  • Trained coders may make systematic errors
  • Different content domains may have different standards (inpt vs outpt coders)
• Don’t infer/assume dependencies unless enforced by source system.
• Run min/max queries, aggregates, outer joins
• Confirm date ranges, data ranges, relative proportions by year
• Don’t assume that null rows actually are empty. Maybe the query missed something.
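The verification queries listed above (min/max, counts by year, outer joins) can be sketched as follows with Python's sqlite3 module. The `visit` and `diagnosis` tables and their contents are hypothetical; in particular, nothing here enforces that every diagnosis row points at a real visit, which mirrors the warning about assuming dependencies the source system does not enforce.

```python
import sqlite3

# Hypothetical extract with no foreign-key enforcement between the tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, mrn TEXT, visit_date TEXT);
CREATE TABLE diagnosis (visit_id INTEGER, icd9 TEXT);
INSERT INTO visit VALUES (1, 'M1', '2008-03-01'), (2, 'M2', '2009-07-15'),
                         (3, 'M3', '2009-11-30'), (4, 'M4', '1900-01-01');
INSERT INTO diagnosis VALUES (1, '250.00'), (2, '401.9'), (9, '250.00');
""")

# Min/max: an implausible earliest date suggests a placeholder value.
date_range = conn.execute(
    "SELECT MIN(visit_date), MAX(visit_date) FROM visit").fetchone()

# Relative proportions by year: look for gaps or sudden jumps.
by_year = conn.execute("""
    SELECT substr(visit_date, 1, 4) AS yr, COUNT(*)
    FROM visit GROUP BY yr ORDER BY yr""").fetchall()

# Outer join, visit side: visits with no diagnosis row at all.
visits_without_dx = [r[0] for r in conn.execute("""
    SELECT v.visit_id FROM visit v
    LEFT JOIN diagnosis d ON d.visit_id = v.visit_id
    WHERE d.visit_id IS NULL ORDER BY v.visit_id""")]

# Outer join, diagnosis side: diagnoses pointing at a visit that is missing.
orphan_dx = [r[0] for r in conn.execute("""
    SELECT d.visit_id FROM diagnosis d
    LEFT JOIN visit v ON v.visit_id = d.visit_id
    WHERE v.visit_id IS NULL ORDER BY d.visit_id""")]

print(date_range, by_year, visits_without_dx, orphan_dx)
```

An inner join between these two tables would silently drop both the undiagnosed visits and the orphaned diagnosis, so running the outer joins in both directions is what exposes rows the main query would miss.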
JHM Clinical Data Landscape: Past, Present and Future
Past: Babble of unintegrated systems
• EPR (antiquated technology, VSAM files, DB2) contains text, not queryable, analyzable data
Present: EPR2020 (aka Amalga) - integrated data!!
• Has everything in EPR, plus JHCP, plus gradually adding data from clinical/departmental/administrative systems (IDX CPTs, transfusion medicine, ORMIS, HMED, eADR, death registry, ad infinitum)
Future: ? Epic, ? JHM Data Warehouse
• Epic: One system replacing all major JHM systems
• JHH timeline: 4+ years
JHM Data Sources: Casemix Datamart
• Gold standard for JHM (non-profee) administrative data, including payer/insurance data
• Combines data from Keane (hospital charges), ADT (admission/discharge/transfer), HDM (ICD9 diagnosis + procedure coding), HSCRC (regulatory submissions)
• Not a true data warehouse; meager reconciliation
• Best source for length of stay, resource use, ICD9 diagnoses
• Outpatient ICD9s limited
• Has JHH + BMC + HCGH data
JHM Data Sources: IDX (profee)
• Gold standard for inpatient + outpatient CPT (profee charge) data
• ICD9 diagnosis data problematic
• Limitation: No data from non-faculty providers (private physicians, etc.)
• Difficult to query. Has a data warehouse, limited access.
• Early target for EPR2020/Amalga integration
JHH Data Sources: SCM/POE
• Sunrise Clinical Manager/Provider Order Entry
• Replicated transactional database, difficult to query
• For registry purposes POE has large attribution/process challenges: stutter-step orders, multiple alerts, imputed times
• Great source for inpatient meds, labs, physiologic monitor data
• No codified ICD9/Snomed/RxNorm data
• No outpatient data
JHH Data Sources: SCC/AIM
• Sunrise Critical Care (aka Emtek, Eclipsys). JHH ICUs + stepdown units + oncology
• AIM analytic database contains selected but comprehensive batch extract
• Sunsets as ICUs switch to POE ClinDoc
• Challenging to query. Lots of denormalized fields
JHH + BMC Data Sources: PDS
• PDS = Pathology Data Systems
• Includes lab, transfusion medicine, anatomic pathology, cytopath, John Boitnott’s death registry
• Lab data also available via EPR2020/Amalga and POE
BMC Data Sources: Meditech
• Shrink-wrapped, comprehensive inpatient + outpatient clinical + financial system
• Difficult for ad hoc research queries
• Exports data to Datamart and EPR2020
• BMC-JHH patient linkage doable but difficult, needs caution
JHCP Data Sources: GE Centricity
• All clinical + administrative data for JHCP clinics
• Largely opaque to research query; JHCP sometimes collaborates directly, especially for its physician/investigators
• Early target for EPR2020/Amalga integration
• Challenges linking to BMC and JHH MRNs
JHH Departmental Data: ORMIS + eADR/Medivision
• ORMIS: Operating Room Management Information System
• Mostly transactional scheduling/tracking/administrative data, limited clinical data
• Has diagnoses, procedures, case start/stop times
• eADR/Medivision (anesthesia) still evolving, limited research data access
• Design challenges similar to legacy SCC critical-care system
JHH Departmental Data: HMED (Emergency Department)
• Mostly opaque to research
• Replicated data hosted by Datamart