Using Electronic Medical Records for Research: Practical Issues and Implementation Hurdles

Prakash M. Nadkarni MD

Benefits of EMRs • Much of the data that you want is often already in the EMR • Sample-size analyses • Cohort identification/recruitment • Detailed data • You can implement many research-related workflows • Appointment scheduling enables interventions at the patient's convenience.

EMRs don't do everything • Even Epic warns you about the need to interoperate with software designed specifically for clinical research (CRIS=Clinical Research Information System). • Even CRISs are sub-specialized: Project management/finance, grant management workflows, federal paperwork (FDA Investigational New Drug applications), general or specialized data capture (e.g., patient diaries, adaptive questionnaires).

Challenge: No Study Calendar • Patients are not all enrolled at the same time. • Specific evaluations or interventions are done at specific time points ("events") relative to the start of participation in the study (or to some arbitrary point, e.g., working backwards from a scheduled MRI scan). • Each time point may have a permissible range or window (e.g., a "6-month follow-up" may occur between 5 and 7 months). • Given a protocol/study calendar, a CRIS will *generate* a provisional patient calendar.
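As a rough illustration of what "generating" a provisional patient calendar means, the sketch below expands a protocol into dated events with windows for one patient. The class name, field names, the 30-day month approximation and the example protocol are all assumptions made for illustration; this is not how any particular CRIS implements it.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ProtocolEvent:
    name: str
    offset_days: int   # target day relative to the patient's study start
    window_days: int   # permissible deviation on either side of the target


def patient_calendar(start, protocol):
    """Return one provisional row per protocol event for a single patient."""
    rows = []
    for ev in protocol:
        target = start + timedelta(days=ev.offset_days)
        rows.append({
            "event": ev.name,
            "target": target,
            "earliest": target - timedelta(days=ev.window_days),
            "latest": target + timedelta(days=ev.window_days),
        })
    return rows


# Hypothetical protocol; months are approximated as 30 days.
PROTOCOL = [
    ProtocolEvent("Baseline", 0, 0),
    ProtocolEvent("6-month follow-up", 180, 30),
]

calendar = patient_calendar(date(2024, 1, 15), PROTOCOL)
for row in calendar:
    print(row["event"], row["earliest"], row["target"], row["latest"])
```

Because every date is computed from the individual start date, patients enrolled at different times get different calendars from the same protocol.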

Study Calendar (2) • The protocol is worked out based on the information yield of each evaluation, the expected rate of change in the parameters evaluated, evaluation cost, and patient risk. An Event-CRF (case report form) Cross-Table enforces consistency. • CRISs use "Unscheduled" events to deal with emergency conditions. • An entire set of reports is calendar-driven, e.g., scheduled events, missing forms, out-of-range visits. • In Epic, the closest thing to Calendar functionality is the Chemotherapy module (Beacon).
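One way to picture an Event-CRF cross-table and a calendar-driven "missing forms" report is the sketch below. The event names, form names and the `missing_forms` helper are hypothetical, and the structure is deliberately minimal.

```python
# Hypothetical Event-CRF cross-table: the forms each event requires.
EVENT_CRFS = {
    "Baseline": {"Demographics", "Vitals", "Labs"},
    "6-month follow-up": {"Vitals", "Labs"},
}


def missing_forms(event_crfs, received):
    """Map each event to the sorted list of required forms not yet received."""
    report = {}
    for event, required in event_crfs.items():
        missing = required - received.get(event, set())
        if missing:
            report[event] = sorted(missing)
    return report


# Forms actually collected so far for one (hypothetical) patient.
received = {"Baseline": {"Demographics", "Vitals"}}
report = missing_forms(EVENT_CRFS, received)
print(report)
```

The same cross-table can drive data-entry screens, so that a form cannot be filed against an event that does not call for it.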

Non-adherence to Standards • If a vendor ignores national/international controlled terminology standards, data pooling in cross-institutional collaborations is difficult. • For procedures, Epic does not use Current Procedural Terminology (CPT). Instead, procedures are identified by idiosyncratic abbreviations created by hurried users, which are hard to interpret except by those users and which vary across institutions.

Standards Challenges (2) • Of the 15,000 laboratory tests in our instance of Epic, only about 8% are currently mapped to the Logical Observation Identifiers Names and Codes (LOINC) vocabulary. • Sometimes the same procedure or lab test is defined more than once in a master table. • The definitions are unhelpful, and one must look at the actual data to determine which ones are used, e.g., a histogram showing the number of tests performed over a period of time, and the maximum and minimum values.
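Looking at the actual data to see which duplicate definitions are in use can be sketched as a simple profile per test identifier. The identifiers and values below are invented for illustration, and a real extract would come from a query rather than an in-memory list.

```python
def profile_results(results):
    """Summarize (test_id, value) pairs as count, min and max per test_id."""
    stats = {}
    for test_id, value in results:
        s = stats.setdefault(test_id, {"n": 0, "min": value, "max": value})
        s["n"] += 1
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
    return stats


# Hypothetical master-table entries that all appear to describe the same test.
DEFINED = ["GLU_A", "GLU_B", "GLU_OLD"]
# Hypothetical result rows pulled for a chosen time period.
ROWS = [("GLU_A", 92.0), ("GLU_A", 140.0), ("GLU_A", 88.0), ("GLU_B", 5.1)]

profile = profile_results(ROWS)
unused = [t for t in DEFINED if t not in profile]
print(profile)
print("never used:", unused)
```

Very different value ranges for two nominally identical definitions (as in the invented data above) often hint at different units or different assays.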

Redundancy and heterogeneity • The data may have been stored more than once, and in different ways, in different parts of the medical record. • BMI is recorded in two different places. • "Uncontrolled" local terminologies • Flowsheets where blood pressure is recorded redundantly as text, e.g., "124/82". (Not at UIHC, fortunately.) • The lists of procedure and lab definitions are also only semi-controlled.
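When blood pressure is stored as text, it has to be split into numbers before it can be analyzed. A minimal sketch follows, assuming the text is always "systolic/diastolic" with optional spaces; anything else is returned as `None` for manual review rather than guessed at. The pattern and function name are illustrative.

```python
import re

# Two or three digits, a slash, two or three digits; surrounding spaces allowed.
BP_PATTERN = re.compile(r"^\s*(\d{2,3})\s*/\s*(\d{2,3})\s*$")


def parse_bp(text):
    """Return (systolic, diastolic) as ints, or None if the text does not match."""
    m = BP_PATTERN.match(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


print(parse_bp("124/82"))            # (124, 82)
print(parse_bp("unable to obtain"))  # None
```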

Duplicate Elements • Pseudo-redundancy: Subtly different data elements that are given the same label in the user interface • Baby's birth weight is recorded both at the time of delivery and at the time of admission to a NICU. The two are not semantically the same: with interventions, the former may be significantly more (or less) than the latter.

“Wrong” structure • Much data (discharge summaries, etc.) is stored as text, requiring human abstraction or natural language processing (NLP). • NLP is not 100% accurate, so sensitivity and specificity must be traded off. It is especially hard with progress notes, which are replete with abbreviations and may have little grammatical structure. • Much of the published NLP work relies on idiosyncrasies of a particular dataset (e.g., the use of Epic templates) to achieve higher accuracy, and is not always generalizable.

The Needle in the Haystack • The Epic schema contains several thousand tables; many are unused or have empty fields. • Documentation is incomplete or out of date. • The first time, one may spend more time locating a particular data element than actually pulling it out. • Persons doing data extraction need to add value by providing signposts and tips to help others who have to do the same task later. • Even with a data warehouse, this problem will recur as long as data definitions are suboptimal.

Real-time cohort identification must be done judiciously • "Best Practice Alerts" can consume system resources and degrade responsiveness. • Do you really need real-time subject identification? Would a 24-hour delay be acceptable? ICU-related clinical studies; transfusion in preemies.

Transforming the Data • The form in which data is recorded in the EMR is not necessarily the form in which it is most conveniently analyzed or reported. • Registries often require creating derived variables. • Converting numerical data into categories, e.g., binning children by birth weight. • Converting numeric values or the presence/absence of data into Yes/No: Is the bilirubin > 5 mg/dL? Did the neonate receive nitric oxide inhalation for pulmonary hypertension?
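The derived variables mentioned above can be sketched in a few lines. The birth-weight cut points (1000, 1500 and 2500 g) are an assumption based on commonly used categories, since the slide names none; the "any value over the threshold" rule and the order name are likewise illustrative choices, not a registry's definition.

```python
from bisect import bisect_right

# Assumed cut points in grams; the source does not specify the bins.
CUTS = [1000, 1500, 2500]
LABELS = ["<1000 g", "1000-1499 g", "1500-2499 g", ">=2500 g"]


def birth_weight_bin(grams):
    """Convert a numeric birth weight into a category label."""
    return LABELS[bisect_right(CUTS, grams)]


def bilirubin_over_5(values_mg_dl):
    """Yes if any recorded bilirubin exceeds 5 mg/dL (assumed rule), else No."""
    return "Yes" if any(v > 5 for v in values_mg_dl) else "No"


def received(orders, treatment):
    """Yes/No from the presence or absence of a treatment in the order list."""
    return "Yes" if treatment in orders else "No"


print(birth_weight_bin(1320))                      # 1000-1499 g
print(bilirubin_over_5([3.2, 6.8]))                # Yes
print(received(["surfactant"], "nitric oxide"))    # No
```

Keeping the cut points and thresholds in named constants makes it easier to document exactly how each derived variable was produced.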

Interfacing with statistical software • Before: sample size, randomization • After: analysis, fitting to models • Some CRISs (e.g., REDCap, TrialDB) will output SAS/SPSS-formatted data files, with definitions for all variables (including enumerations for all categorical variables; SAS provides a procedure called PROC FORMAT for categorical data). EMRs still lag.
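To give a sense of what exporting enumerations for categorical variables involves, the sketch below renders a SAS PROC FORMAT step from a code-to-label mapping. The helper name, the format name `sexfmt` and the codes are hypothetical, and this is not a description of how REDCap or TrialDB build their exports.

```python
def proc_format(name, codes):
    """Render a SAS PROC FORMAT step for a numeric-coded categorical variable."""
    lines = ["proc format;", f"  value {name}"]
    for code, label in sorted(codes.items()):
        escaped = label.replace("'", "''")  # SAS doubles embedded single quotes
        lines.append(f"    {code} = '{escaped}'")
    lines[-1] += ";"  # the VALUE statement ends after the last code/label pair
    lines.append("run;")
    return "\n".join(lines)


# Hypothetical enumeration for one categorical variable.
sas_code = proc_format("sexfmt", {1: "Male", 2: "Female"})
print(sas_code)
```

Generating this text from the same metadata that defines the data-entry form keeps the labels in the statistical package in step with what users actually saw.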

Data Warehouse • A database that is optimized for fast query, preferably by end-users, without interactive updates • Solves some problems, but not others • More homogeneous structure, i.e., a handful of tables rather than thousands. • However, the problem of locating variables of interest doesn't go away. If the variables are poorly documented, the task of hunting for them is merely transferred from the concierge/analyst to the end-user, which may make matters worse.

Special Challenges in EMR Data Interpretation/Reliability • Data entry errors in source data, often a consequence of “copy and paste”. • Coding of categorical variables does not accommodate nuances in the medical history or diagnostic findings. • Depending on the source, billing data may have been up-coded (Humana). • Outcome data may be lacking: the absence of return-visit data may simply mean that the patient failed to improve and went elsewhere.

Special Challenges (2) • Data fragmentation, especially where healthcare is provided by separate institutions. • Data is observational: treatments and exposures are not assigned randomly. • Confounding bias: socioeconomic factors might lead patients to use suboptimal treatments. • Selection/sampling bias: atypical demographic attributes of the cohort whose data you are seeing may limit the inferences that you can make about the general population.

Frontiers: Genetic Data • There are no technical barriers to the incorporation of limited genetic data for an individual (e.g., SNPs or specific mutations) in structured (i.e., readily analyzable) form. • The major current issue is the limited understanding of genetic data and definitions by EMR vendors. • Whole-genome data is still a long way off. A single record would be larger than the bulk of existing non-image EMR data.

Conclusions • None of the challenges are insurmountable, but they take a lot of effort and resources to address • Most of the fixes are long-term, involving: • Manual mapping to controlled vocabulary terms • Change in processes • Maintaining descriptive documentation that must continually be checked for usability and currency.

Further Reading • Masys DR, et al. Technical desiderata for the integration of genomic data into Electronic Health Records. J Biomed Inform. 2012 Jun;45(3):419-22. • Nadkarni, Ohno-Machado and Chapman. Natural Language Processing: A Tutorial. Journal of the American Medical Informatics Association, 2011. PMC3168328. • Hoffman & Podgurski. “Big, bad data.” Journal of Law, Medicine and Ethics, 2013;41(1):56-60. http://www.ncvhs.hhs.gov/130430b6.pdf

Questions?
