Potential Data Uses
Sample Size Estimates (aggregate data without IRB approval)
Feasibility, grant applications, statistical planning
Identifying patients for enrollment/recruitment
By diagnosis, pathology, stage, labs, meds
Identifying/creating matched study controls
Obtaining current demographics (name, address) for mail solicitation
From research list or by clinic, provider, clinical criteria
Obtaining ongoing clinical administrative data on a registry panel
Labs, visits, procedures, immu
1. Using EMR Data for Population Registries
Diana Gumas, JHMCIS Senior Director for Research Systems, EPR and EPR2020/Amalga
David Thiemann, Center for Clinical Data Analysis
2. Preliminary anonymous data - How many patients with specific lab results, ICD9s, radiology studies, etc., are available for my study? What are the demographics?
3. Possible research data sources
EPR (JHH & JHBMC)
Sunrise Clinical Manager (JHH – inpatient)
Meditech (Bayview)
Casemix Datamart
GE Centricity (JHCP)
EPR2020
Departmental Systems (ED, OR, Anesthesia)
Clinical Research Management System (CRMS)
IDX (professional fees)
Death Registry
4. Methods for Data Access
Historical: Researcher negotiates access with clinical system/DBA
Logistical nightmare, technical challenge
Clinical Research Management System (CRMS)
Study cohort with real-time links to enterprise data
Center for Clinical Data Analysis
Monthly/quarterly data extracts from designated systems
5. Clinical Research Management System (CRMS)
1,054 Users
1,079 Active Studies
25,430 Participants
Data Available in CRMS
eIRB
EPR (patient demographics)
Study participants / accruals
Electronic Case Report Forms - in next 2-3 months
As of July 1, 2009, must be used on all JH SOM studies which could result in a patient bill
Highlight overall workflow supported by CRMS
- Nightly data pull from eIRB
The Protocol Library houses study-related documents for the conduct of the trial. It includes:
Interfaces to other systems: eIRB, EPR, Protocol Library
6. Clinical Research Management System (CRMS)
Ways to extract data
Canned Reports (click for examples)
Ad-hoc querying using SQL
Possible with CCDA support - automated study-specific data extracts
7. EPR2020 Data for Researchers
4.2M Patients, 23.4M Visits
12.3M Documents, 6.8M Radiology Reports
25.6M Lab Results
1.5M Problems, 2.2M Medications, 140K Allergies
Planned
Bayview & JHCP data
ICD9 diagnosis codes and CPT charges (IDX)
Future
Death Registry
Blood Product Data for Transfusions
Eclipsys SCM Order data
HMED (ED), ORMIS, eADR/Medivision
Data Discovery Tool
Research application – tailored for investigator’s needs
“My participants” and their data
Alert me about key events
Search through de-identified data to answer questions
8. My Participant’s Lab Data
Today research staff laboriously look up patients in EPR one by one, find and transcribe the lab results.
Time-consuming. Error-prone.
With EPR 2020
Researcher immediately sees “their” participants
Grid with lab results they can filter & sort
Exportable to Excel
Status
Positive feedback from 5 study teams
Improving speed
Anticipate widespread availability by end of summer 2010
9. Registry Cohort Discovery using EPR2020
A JHM investigator wants to find and enroll diabetic patients
aged 45-65 years
with hemoglobin A1C between 7 and 9%
serum creatinine < 2 mg/dL
NEAR FUTURE!
Launch a de-identified researcher view
Get list of patients
Use Amalga Filter features to narrow in on those patients who meet eligibility criteria
Once researcher has IRB approval, give view with PHI.
FOCUS ON 3738 IN THE FIRST SLIDE
FILTER SLIDE FOCUS ON THE RIGHT SIDE
LAST SCREEN FOCUS ON NUMBER 8 SOMEHOW WITH LAST THREE COLUMNS OF INFO
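The eligibility criteria in this example can be written down as an explicit filter. The following is only a sketch in Python, not how Amalga filters work: the `PatientRow` fields, the sample rows, and the choice to treat the age and A1C limits as inclusive and to exclude patients with a missing lab are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatientRow:
    """One de-identified row; field names are hypothetical."""
    study_id: str                      # de-identified key, not an MRN
    age_years: int
    has_diabetes: bool
    hba1c_percent: Optional[float]     # most recent value, None if never measured
    creatinine_mg_dl: Optional[float]  # most recent value, None if never measured


def is_eligible(p: PatientRow) -> bool:
    """Diabetic, aged 45-65, A1C 7-9%, serum creatinine < 2 mg/dL."""
    if not p.has_diabetes:
        return False
    if not 45 <= p.age_years <= 65:
        return False
    # Assumption: a missing lab means "cannot confirm", so exclude.
    if p.hba1c_percent is None or not 7.0 <= p.hba1c_percent <= 9.0:
        return False
    if p.creatinine_mg_dl is None or not p.creatinine_mg_dl < 2.0:
        return False
    return True


def find_cohort(rows: List[PatientRow]) -> List[str]:
    """Return the de-identified IDs of rows that meet every criterion."""
    return [p.study_id for p in rows if is_eligible(p)]


# Made-up rows for illustration only.
SAMPLE_ROWS = [
    PatientRow("P001", 50, True, 7.5, 1.1),
    PatientRow("P002", 70, True, 8.0, 1.0),   # too old
    PatientRow("P003", 55, True, 9.6, 1.2),   # A1C too high
    PatientRow("P004", 45, True, 9.0, 1.9),   # on the boundaries
    PatientRow("P005", 60, True, 8.0, None),  # creatinine never measured
    PatientRow("P006", 60, False, 7.5, 1.0),  # not diabetic
    PatientRow("P007", 52, True, 8.1, 2.0),   # creatinine not below 2
]
```

Writing the rule as one function forces the boundary and missing-data decisions to be made explicitly, which is worth settling with the statistician before the PHI view is requested.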
10. Center for Clinical Data Analysis (CCDA)
Provides periodic (monthly/quarterly) bulk data extracts (delimited/flat files, .xls):
Preliminary, anonymous data for feasibility, grant applications and statistical sample-size estimates
IRB-approved case-finding for study enrollment (mailings, phone solicitation), chart review, and cohort/case-control studies
Research data extracts - monthly/quarterly integrated extracts from EPR, POE, ORMIS, lab/PDS, billing systems, vaccination/transfusion/culture data, etc.
11. How CCDA works
Email CCDA@jhmi.edu, cc: dthiema1@jhmi.edu; phone 410-955-65558 (Thiemann)
For IRB-approved research:
Provide full protocol + IRB approval
Meet to discuss query methods, format
Iterate, then schedule prod (email extracts, Jshare)
Cost: $100/hour
For non-IRB projects (exploratory analyses, QI)
Same process, cost subsidized by ICTR/JHM
Do NOT implicitly morph QI into IRB
12. The Basics: Getting Clinical Data Into a Registry Database
Real work, not ad hoc/bootstrap
Need $$$ and FTE(s)
Smart analyst(s) who know database technology and understand (or can learn) nuances of the sources and content domain
Hands-on PI management/guidance
Statistical liaison early, before database schema and ETL methods are set in stone
13. The Extract-Transform-Load process: Getting Clinical Data into Research DB
Raw clinical/administrative data is useless for research
Build an intermediate (staging) database
Don’t do data management in SAS/Stata/Excel
Data dictionary—derivation for each field
Templated, tested, documented cleanup scripts/routines.
Intermediate tables: Log each step/modification
For each batch, be able to re-create data transform from scratch
Version control, change control and documentation are vital
Build data versioning into the database
OK, you have data. Now what? In essence you’re building a mini-data-warehouse, so you need some of the same processes and tools. Need to be able to re-create from scratch if process/criteria change. Most researchers use stat tools or Excel. Madness.
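One way to picture "log each step" and "re-create from scratch" is a small pipeline that never touches the raw batch and records what every step did. This is a minimal Python sketch under stated assumptions: the step names, the field names (`mrn`, `lab_name`, `result`), the version label, and the sample batch are all hypothetical, and a real registry would keep the log in the staging database rather than in memory.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Tuple[str, Callable[[List[Row]], List[Row]]]


def fingerprint(rows: List[Row]) -> str:
    """Short content hash, so two runs of the same step can be compared."""
    payload = json.dumps(rows, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_pipeline(raw_rows: List[Row], steps: List[Step], version: str):
    """Apply each named step to a copy of the raw batch and log it."""
    log = []
    rows = [dict(r) for r in raw_rows]  # work on copies; raw batch stays intact
    for name, step in steps:
        rows_in = len(rows)
        rows = step(rows)
        log.append({
            "version": version,
            "step": name,
            "rows_in": rows_in,
            "rows_out": len(rows),
            "fingerprint": fingerprint(rows),
        })
    return rows, log


def trim_whitespace(rows: List[Row]) -> List[Row]:
    """Strip leading/trailing spaces from every string field."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
            for r in rows]


def drop_missing_mrn(rows: List[Row]) -> List[Row]:
    """Remove rows that cannot be linked to a patient."""
    return [r for r in rows if r.get("mrn")]


STEPS: List[Step] = [
    ("trim_whitespace", trim_whitespace),
    ("drop_missing_mrn", drop_missing_mrn),
]

# Made-up raw batch for illustration only.
RAW_BATCH: List[Row] = [
    {"mrn": " 1001 ", "lab_name": "HbA1c ", "result": "7.2"},
    {"mrn": "", "lab_name": "HbA1c", "result": "8.0"},
    {"mrn": "1002", "lab_name": "Creatinine", "result": "<0.5"},
]
```

Calling `run_pipeline(RAW_BATCH, STEPS, "v1")` twice should give identical rows and identical log fingerprints. If a cleanup rule changes, bump the version label and re-run from the untouched raw batch instead of patching the cleaned table.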
14. Transforming Data
Raw data typically string (char/text) fields
Unanalyzable characters (* < >, comments) still have meaning
Put non-numeric data in separate field. Avoid numerical recoding (999)
~3% of pts have multiple/non-preferred MRNs
Need 1-to-many link table
Assays/reference ranges/coding changes
Avoid using raw codes (CPT/ICD) in research db
Map clinical codes to research terms
Defer analytic assumptions. When recoding data, anticipate problems. Keep options open.
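The advice to keep non-numeric characters in a separate field, rather than recoding them as 999, might look like the sketch below. It is an illustration only: the regular expression, the `ParsedResult` field names, and the sample strings are assumptions, and what a flag such as `*` means depends on the source lab system.

```python
import re
from typing import NamedTuple, Optional


class ParsedResult(NamedTuple):
    raw: str                   # original string, always kept
    value: Optional[float]     # numeric part, None if not parseable
    qualifier: Optional[str]   # leading flag such as "<", ">=", "*"
    comment: Optional[str]     # trailing text, or the whole string if unparsed


# Optional flag, then a plain number, then either nothing or whitespace + text.
_RESULT = re.compile(
    r"^\s*(?P<qual>[<>]=?|\*)?\s*"
    r"(?P<num>[-+]?\d+(?:\.\d+)?)"
    r"(?P<rest>\s.*)?$"
)


def parse_result(raw: str) -> ParsedResult:
    """Split a raw lab string into number, qualifier and comment."""
    match = _RESULT.match(raw)
    if match is None:
        # Not a clean number: keep the text, store no numeric value at all.
        return ParsedResult(raw, None, None, raw.strip() or None)
    rest = (match.group("rest") or "").strip()
    return ParsedResult(
        raw,
        float(match.group("num")),
        match.group("qual"),
        rest or None,
    )
```

With this sketch, `parse_result("<0.5")` keeps 0.5 as the value and `<` as the qualifier, while strings such as `QNS` or `1,200` get no numeric value and stay available for manual review. Whether `<0.5` is later analyzed as 0.5, as 0.25, or as censored is left to the analyst.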
15. More Data Transform Challenges
NEVER trust raw data. Learn business logic of source system.
CPTs morph annually, internal complexity/redundancy
Lab assays/reference/terms change
Parsing is inherently unreliable
Administrative names/groups change (clinic #s, departments).
Duplicate-value problems (labs, orders)
System-attribution source/datetime (POE, lab)
Always run an aggregate (“group by”) query to identify alternative names (e.g., lab name) and values (number, result) before transform. Otherwise you’ll miss something.
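A "group by" profile of this kind can be tried against an in-memory SQLite database. The sketch below uses Python's standard `sqlite3` module; the `raw_labs` table, its columns, and the rows are made up for illustration and do not reflect any JHM schema.

```python
import sqlite3


def load_demo_labs() -> sqlite3.Connection:
    """Create a small in-memory table of made-up raw lab rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_labs (mrn TEXT, lab_name TEXT, result TEXT)")
    conn.executemany(
        "INSERT INTO raw_labs VALUES (?, ?, ?)",
        [
            ("1001", "HbA1c", "7.2"),
            ("1002", "HBA1C", "8.1"),
            ("1003", "Hemoglobin A1c", "<4.0"),
            ("1004", "HbA1c", "QNS"),
            ("1005", "HbA1c", "7.9"),
            ("1006", "Creatinine", "1.1"),
        ],
    )
    return conn


def lab_name_counts(conn: sqlite3.Connection):
    """Every spelling of the lab name, most frequent first."""
    return conn.execute(
        "SELECT lab_name, COUNT(*) AS n FROM raw_labs "
        "GROUP BY lab_name ORDER BY n DESC, lab_name"
    ).fetchall()


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def non_numeric_results(conn: sqlite3.Connection):
    """Distinct result strings that will not convert to a number."""
    rows = conn.execute(
        "SELECT result, COUNT(*) FROM raw_labs GROUP BY result ORDER BY result"
    ).fetchall()
    return [(value, n) for value, n in rows if not _is_number(value)]
```

In this toy table the name profile shows three spellings that probably mean the same assay, and the result profile shows `<4.0` and `QNS`. Both lists would need a mapping decision before any transform is written.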
16. Understanding Business Logic
Trust but verify: Test coding accuracy
Providers may habitually use imprecise/inaccurate diagnosis codes (especially in profee data)
ICD9 procedure indications often a billing fiction
Trained coders may make systematic errors
Different content domains may have different standards (inpt vs outpt coders)
Don’t infer/assume dependencies unless enforced by source system.
Run min/max queries, aggregates, outer joins
Confirm date ranges, data ranges, relative proportions by year
Don’t assume that null rows actually are empty. Maybe the query missed something.
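The min/max, proportion-by-year, and outer-join checks can be sketched with Python's standard `sqlite3` module. The `visits` and `labs` tables and their contents are hypothetical, chosen so that each check has something to find.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with made-up visits and labs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, mrn TEXT, visit_date TEXT);
        CREATE TABLE labs (lab_id INTEGER PRIMARY KEY, visit_id INTEGER, result TEXT);
        INSERT INTO visits VALUES
            (1, '1001', '2007-03-14'),
            (2, '1002', '2008-06-02'),
            (3, '1003', '2008-11-20'),
            (4, '1004', '2009-01-05');
        INSERT INTO labs VALUES
            (10, 1, '7.2'),
            (11, 2, '8.1'),
            (12, 2, '1.1'),
            (13, 99, '6.5');
        """
    )
    return conn


def date_range(conn):
    """Earliest and latest visit date: does the extract cover the study period?"""
    return conn.execute(
        "SELECT MIN(visit_date), MAX(visit_date) FROM visits"
    ).fetchone()


def visits_by_year(conn):
    """Row counts per year: a sudden drop suggests a missed batch."""
    return conn.execute(
        "SELECT substr(visit_date, 1, 4) AS yr, COUNT(*) "
        "FROM visits GROUP BY yr ORDER BY yr"
    ).fetchall()


def visits_without_labs(conn):
    """Outer join: visits with no lab rows (truly none, or a missed join?)."""
    rows = conn.execute(
        "SELECT v.visit_id FROM visits v "
        "LEFT OUTER JOIN labs l ON l.visit_id = v.visit_id "
        "WHERE l.lab_id IS NULL ORDER BY v.visit_id"
    ).fetchall()
    return [r[0] for r in rows]


def labs_without_visits(conn):
    """Labs pointing at a visit that is absent: the dependency is not enforced."""
    rows = conn.execute(
        "SELECT l.lab_id FROM labs l "
        "LEFT OUTER JOIN visits v ON v.visit_id = l.visit_id "
        "WHERE v.visit_id IS NULL ORDER BY l.lab_id"
    ).fetchall()
    return [r[0] for r in rows]
```

In the toy data, lab 13 references a visit that does not exist and visits 3 and 4 have no labs. Neither result proves an error, but each one is a question to take back to the source system before trusting the extract.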
17. JHM Clinical Data Landscape: Past, Present and Future
Past: Babble of unintegrated systems
EPR (antiquated technology, VSAM files, DB2) contains text, not queryable, analyzable data
Present: EPR2020 (aka Amalga) - integrated data!!
Has everything in EPR, plus JHCP, plus gradually adding data from clinical/departmental/administrative systems (IDX CPTs, transfusion medicine, ORMIS, HMED, eADR, death registry, ad infinitum)
Future: ? Epic, ? JHM Data Warehouse
Epic: One system replacing all major JHM systems
JHH timeline: 4+ years
Research queries for “EPR” data actually hit Amalga. Amalga has 3 main roles: Clinical/legacy repository, research repository, staging database for Epic +/- warehouse.
18. JHM Data Sources: Casemix Datamart
Gold standard for JHM (non-profee) administrative data, including payer/insurance data
Combines data from Keane (hospital charges), ADT (admission/discharge/transfer), HDM (ICD9 diagnosis + procedure coding), HSCRC (regulatory submissions)
Not a true data warehouse; meager reconciliation
Best source for length of stay, resource use, ICD9 diagnoses
Outpatient ICD9s limited
Has JHH + BMC + HCGH data
19. JHM Data Sources: IDX (profee)
Gold standard for inpatient + outpatient CPT (profee charge) data
ICD9 diagnosis data problematic
Limitation: No data from non-faculty providers (private physicians, etc.)
Difficult to query. Has a data warehouse, limited access.
Early target for EPR2020/Amalga integration.
20. JHH Data Sources: SCM/POE
Sunrise Clinical Manager/Provider Order Entry
Replicated transactional database, difficult to query
For registry purposes POE has large attribution/process challenges: Stutter-step orders, multiple alerts, imputed times
Great source for inpatient meds, labs, physiologic monitor data
No codified ICD9/SNOMED/RxNorm data
No outpatient data
21. JHH Data Sources: SCC/AIM
Sunrise Critical Care (aka Emtek, Eclipsys). JHH ICUs + stepdown units + oncology
AIM analytic database contains selected but comprehensive batch extract
Sunsets as ICUs switch to POE ClinDoc
Challenging to query. Lots of denormalized fields
22. JHH + BMC Data Sources: PDS
PDS = Pathology Data Systems
Includes lab, transfusion medicine, anatomic pathology, cytopath, John Boitnott’s death registry
Lab data also available via EPR2020/Amalga and POE
23. BMC Data Sources: Meditech
Shrink-wrapped, comprehensive inpatient + outpatient clinical + financial system
Difficult for ad hoc research queries.
Exports data to Datamart and EPR2020
BMC-JHH patient linkage doable but difficult, needs caution
24. JHCP Data Sources: GE Centricity
All clinical + administrative data for JHCP clinics
Largely opaque to research query; JHCP sometimes collaborates directly, especially for its physician/investigators
Early target for EPR2020/Amalga integration
Linkage challenges to BMC and JHH MRNs
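Cross-system linkage is where a 1-to-many link table between one research person ID and that person's MRNs pays off. The sketch below is a generic Python illustration, not a description of how JHM links records: the system labels, MRN formats, and research IDs are invented, and real linkage needs a verified crosswalk rather than a hand-built list.

```python
from typing import Dict, Iterable, Optional, Tuple

# (source_system, mrn) -> research person ID
LinkIndex = Dict[Tuple[str, str], str]


def build_link_index(link_rows: Iterable[Tuple[str, str, str]]) -> LinkIndex:
    """Index rows of (person_id, source_system, mrn).

    One person may own many MRNs, but one (system, MRN) pair must never
    point to two different people, so that case raises an error.
    """
    index: LinkIndex = {}
    for person_id, system, mrn in link_rows:
        key = (system, mrn.strip())
        existing = index.get(key)
        if existing is not None and existing != person_id:
            raise ValueError(f"{key} is linked to both {existing} and {person_id}")
        index[key] = person_id
    return index


def resolve(index: LinkIndex, system: str, mrn: str) -> Optional[str]:
    """Return the research person ID for an MRN, or None if unlinked."""
    return index.get((system, mrn.strip()))


# Made-up link rows for illustration only. MRNs stay strings so that
# leading zeros and letters are preserved.
LINK_ROWS = [
    ("R0001", "JHH", "00123"),   # preferred MRN
    ("R0001", "JHH", "00987"),   # older, non-preferred MRN for the same person
    ("R0001", "BMC", "B-555"),
    ("R0002", "JHCP", "C-42"),
]
```

Returning `None` for an unlinked MRN, instead of guessing from name or birth date, keeps uncertain matches visible so they can be reviewed by hand.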
25. JHH Departmental Data: ORMIS + eADR/Medivision
ORMIS: Operating Room Management Information System
Mostly transactional scheduling/tracking/administrative data, limited clinical data.
Has diagnoses, procedures, case start/stop times
eADR/Medivision (anesthesia) still evolving, limited research data access
Design challenges similar to legacy SCC critical-care system.
26. JHH Departmental Data: HMED (Emergency Department)
Mostly opaque to research
Replicated data hosted by Datamart