IDR Snapshot: Quantitative Assessment Methodology Evaluating Size and Comprehensiveness of an Integrated Data Repository
Vojtech Huser, MD, PhD (a); James J. Cimino, MD (a, b)
(a) Laboratory for Informatics Development, National Institutes of Health Clinical Center
(b) National Library of Medicine

Conclusion, Discussion and Future Work
We have made several modifications to the methodology and the measure set based on input from two IDRs. Whereas version 1 used a simple custom event data model, version 2 provides scripts for the i2b2 schema. Via data transformations already present at several institutions, version 2 can be applied both at institutions that are direct i2b2 adopters and at institutions with an indirect mapping to i2b2, such as NIH HMO Collaboratory sites. We also plan to approach the CTSA Informatics Affinity Group for Data Repository and even to implement the IDR Snapshot as an i2b2 plug-in.

The advantage of our measure set is that it does not place any arbitrary value on individual measure components (e.g., the value of 10 years of claims data vs. 10 years of detailed EHR data), and the methodology is easy to extend to accommodate additional data characteristics based on user feedback. The methodology has several limitations that stem from the heterogeneity of data repositories, such as storage of data elements as self-standing facts or as event attributes, or different data schemas for sub-events (e.g., lab order–lab results event pairs).

References
1. Huser V. A Methodology for Quantitative Measurement of Quality and Comprehensiveness of a Research Data Repository. Proceedings of the 16th Annual HMO Research Network Conference; 2010.
2. Cimino JJ, Ayres EA. The Clinical Research Data Repository of the US National Institutes of Health. Proceedings of Medinfo 2010:1299-1303.

Acknowledgements: This work is supported by intramural research funds from the National Library of Medicine and the NIH Clinical Center.
Contact: vojtech.huser@nih.gov

Abstract
We have developed a methodology for evaluating the size and comprehensiveness of an integrated data repository (IDR). Our method uses an extensible set of measures computed with a standard database query language. It can be applied at multiple institutions to compare institutional repositories, or at a single institution to track a repository's size and data coverage over time.

Introduction: Many institutions have implemented an integrated data repository (IDR), which is defined as a data warehouse with clinical, administrative, clinical trial, and -omics data optimized for research purposes rather than clinical care. For federation of single IDRs into larger research networks (such as DartNet, STRIDE, HMORN, or OMOP), it is necessary to systematically assess the size and comprehensiveness of the data of each contributing data partner. We have previously piloted a set of such quantitative measures [1]. In this study we present the application of this assessment methodology to the IDR at the National Institutes of Health (NIH) and a comparison of the results with an IDR at Marshfield Clinic (MC).

Methods: The IDR assessment methodology consists of computing a set of quantitative measures referred to as the IDR Snapshot. Our pilot study included a set of 8 measures that were specifically designed to be intuitive to interpret and to facilitate continuous monitoring. For example, measure D3 counts the total number of patients with at least one diagnosis, one laboratory result, and one prescription. The IDR Snapshot also includes very basic measures, such as the total number of events (G1) and the total number of patients (G2) within the repository. We used the database of the Biomedical Translational Research Information System (BTRIS) [2] as the key data source evaluated in this study.
BTRIS integrates data from the NIH Clinical Center EHR system (Allscripts Sunrise Clinical Manager) and numerous other systems across several NIH institutes (e.g., the National Cancer Institute's C3D clinical trials data management system). The IDR Snapshot is an open source project (available at http://code.google.com/p/idrsnapshot) that uses Structured Query Language (ANSI SQL:2008) and can be executed on all major database platforms.

Results: We have successfully applied the existing IDR Snapshot measures to the NIH repository. See Figure 1 for a complete list. Comparison of the results from NIH and MC shows similar drops in data comprehensiveness when multiple patient data aspects are considered. NIH and MC had similar total event counts (G1 measure, NIH: 0.48B events; MC: 0.57B); however, the G2 measure (NIH: 0.3M patients; MC: 1.7M) reveals differences in data patterns: based on these rounded counts, NIH holds roughly 1,600 events per patient versus roughly 335 at MC. Lifetime dimension measures within the IDR Snapshot clearly showed differences in the time periods covered. It is important to note that both IDRs are constantly being improved, and the measures may change significantly as more contributing data sources are integrated.

Figure 1: Example of a comparison of 2 IDRs using IDR Snapshot measures. Site B represents the NIH Clinical Center (measures were computed in January 2012). Site A represents the Marshfield Clinic IDR (measures were taken from a prior publication and were computed in 2010).
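
As a minimal sketch of how the G1, G2, and D3 measures described in Methods might be expressed in SQL, the example below runs three queries against a hypothetical single-table event model using Python's built-in sqlite3 module. The table name `event`, its columns, the event type labels, and the sample rows are assumptions made for illustration; they are not taken from the IDR Snapshot scripts or from either repository.

```python
import sqlite3

# Hypothetical event model: one row per clinical event.
# Table name, column names, and event_type labels are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (
    patient_id INTEGER NOT NULL,
    event_type TEXT    NOT NULL,
    event_date TEXT    NOT NULL
);
INSERT INTO event VALUES
    (1, 'diagnosis',    '2009-01-05'),
    (1, 'lab_result',   '2009-01-06'),
    (1, 'prescription', '2009-01-07'),
    (2, 'diagnosis',    '2010-03-01'),
    (2, 'lab_result',   '2010-03-02'),
    (3, 'prescription', '2011-07-15');
""")

MEASURES = {
    # G1: total number of events in the repository
    "G1": "SELECT COUNT(*) FROM event",
    # G2: total number of distinct patients in the repository
    "G2": "SELECT COUNT(DISTINCT patient_id) FROM event",
    # D3: patients with at least one diagnosis, one laboratory result,
    #     and one prescription
    "D3": """
        SELECT COUNT(*) FROM (
            SELECT patient_id
            FROM event
            WHERE event_type IN ('diagnosis', 'lab_result', 'prescription')
            GROUP BY patient_id
            HAVING COUNT(DISTINCT event_type) = 3
        ) AS complete_patients
    """,
}


def snapshot(connection):
    """Run every measure query and return {measure_name: count}."""
    return {
        name: connection.execute(sql).fetchone()[0]
        for name, sql in MEASURES.items()
    }


result = snapshot(conn)
print(result)  # {'G1': 6, 'G2': 3, 'D3': 1}
```

In this toy data only patient 1 has all three event types, so D3 is 1 while G2 is 3, which mirrors the kind of drop in comprehensiveness reported when multiple data aspects are required at once. Keeping each measure as a single query that returns one number is one way to make the set easy to extend: adding a measure means adding one entry to `MEASURES`.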