Loyola University Chicago Health Sciences Division, Stritch School of Medicine (SSOM): The Clinical Research Database (CRDB). January 8, 2014. Speaker: Ron Price, Associate Dean, Office of Information Systems, Loyola University Chicago Stritch School of Medicine, Maywood, Illinois &
Loyola University Chicago Health Sciences Division
Stritch School of Medicine (SSOM)
The Clinical Research Database (CRDB)
January 8, 2014

Speaker: Ron Price
Associate Dean, Office of Information Systems, Loyola University Chicago Stritch School of Medicine, Maywood, Illinois
& Associate Vice President, Informatics and Systems Development, Loyola University Chicago Health Sciences Division
What is the Clinical Research Database (CRDB)?
• Large-scale, de-identified clinical data warehouse structured to support a wide range of clinical analytics
• Operates on advanced Hadoop technology
• CRDB data are accessible via a web-based front-end for casual users (e.g., faculty, housestaff and students) and via a wide range of tools for advanced users (e.g., analysts, bioinformatics staff)
• Initial target data loads for the CRDB are from Epic (1/1/2007-9/30/2013)
Why use Hadoop?
• Hadoop was developed largely at Yahoo in the mid-2000s and is used extensively by "big data" internet companies (and the NSA) to process large amounts (petabytes) of structured and unstructured data
• Hadoop is a data management/processing framework that distributes data storage and processing over clusters of inexpensive computers
• Hadoop's strengths are its ability to scale and to handle unstructured data efficiently (e.g., text reports, images, BLOBs)
• SSOM's Hadoop environment
  • Development and Production environments
  • Production environment: 12-node cluster (2 namenodes, 1 MySQL server, and 9 datanodes)
  • 178 TB of storage (the current core Epic EMR is 4 TB)
Why use Hadoop?
• Hadoop's strengths are its ability to scale and to handle unstructured data efficiently (e.g., text reports, images, BLOBs)
• "Of the 1.2 billion clinical documents produced in the United States each year, approximately 60 percent contain valuable information trapped in unstructured documents that are unavailable for clinical use, quality measurement and data mining."*
• Some estimates put this number closer to 80%
* Health Management Technology, June 2012
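The idea of distributing processing over a cluster can be pictured with a small sketch of the MapReduce pattern that Hadoop is built around. This is plain Python, not Hadoop's API: the lists stand in for data blocks on separate datanodes, and the record fields (`encounter_id`, `type`) and sample values are hypothetical, not CRDB data.

```python
from collections import defaultdict

# Hypothetical de-identified encounter records; values are illustrative only.
ENCOUNTERS = [
    {"encounter_id": 1, "type": "Inpatient"},
    {"encounter_id": 2, "type": "Outpatient"},
    {"encounter_id": 3, "type": "ED"},
    {"encounter_id": 4, "type": "Outpatient"},
    {"encounter_id": 5, "type": "Inpatient"},
    {"encounter_id": 6, "type": "Outpatient"},
]


def split_into_blocks(records, n_blocks):
    """Deal records round-robin into n_blocks lists (stand-ins for datanodes)."""
    blocks = [[] for _ in range(n_blocks)]
    for i, rec in enumerate(records):
        blocks[i % n_blocks].append(rec)
    return blocks


def map_phase(block):
    """Runs independently on each block: emit one (key, 1) pair per record."""
    return [(rec["type"], 1) for rec in block]


def shuffle(mapped_blocks):
    """Group the emitted values by key across all blocks."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values collected for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_type(records, n_blocks=3):
    """Count encounters per type using map, shuffle, and reduce steps."""
    blocks = split_into_blocks(records, n_blocks)
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(mapped))


print(count_by_type(ENCOUNTERS))
```

Because each block is mapped independently, the result does not depend on how many blocks the data are split into, which is what lets the same job scale by adding nodes.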
Why not just use Epic?
• Epic is LUMC's EMR; however, most data originate in local ancillary systems (e.g., Clinical Labs, RIS/PACS, EPS) and are stored there in their native (e.g., granular or structured) formats
• Epic is optimized for healthcare operations, not for research or population studies
  • Activity related to large-scale analytics impacts system performance
  • The "10,000 table" issue (actually 11,964 tables!)
• Systems supporting research and population studies need
  • Flexibility to handle "foreign" (e.g., external, multi-center) data
  • Flexibility to handle unstructured data
  • Ability to scale to "big data" levels
CRDB Version 1.0 (July 2013)
• Current data
  • De-identified, with keys held in the Epic Clarity data warehouse
  • Data source: Epic Clarity (updated nightly)
  • Data period: 1/1/2007 through 9/30/2013
  • Updated quarterly (next update mid-March 2014)
• Data tables
  • Demographics
  • Encounters (inpatient, outpatient, ED, observation and home health)
  • Procedures and clinical lab values
  • Flowsheet measures (vitals, physical findings, etc.)
  • Medications
  • Payor information at encounter level
• CRDB application is widely available on the portal
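The "de-identified, with keys held" arrangement above can be sketched as replacing the real patient identifier with a random surrogate and keeping the mapping separate from the research data. This is only an illustration under stated assumptions: the field names `mrn` and `study_id` and the sample records are hypothetical, and the slides do not describe how the CRDB actually generates or stores its keys.

```python
import secrets


def deidentify(records, id_field="mrn"):
    """Return (deidentified_records, key_map).

    key_map maps each real identifier to a random surrogate. In this sketch it
    is the part that would stay behind in the source warehouse, never in the
    research copy.
    """
    key_map = {}
    deidentified = []
    for rec in records:
        real_id = rec[id_field]
        if real_id not in key_map:
            # Random token, so the surrogate cannot be derived from the real ID.
            key_map[real_id] = secrets.token_hex(8)
        clean = {k: v for k, v in rec.items() if k != id_field}
        clean["study_id"] = key_map[real_id]
        deidentified.append(clean)
    return deidentified, key_map


# Hypothetical source rows; not real patient data.
source_rows = [
    {"mrn": "A100", "encounter_type": "Inpatient"},
    {"mrn": "B200", "encounter_type": "ED"},
    {"mrn": "A100", "encounter_type": "Outpatient"},
]

research_rows, key_map = deidentify(source_rows)
for row in research_rows:
    print(row)
```

The same patient keeps the same surrogate across encounters, so longitudinal analysis still works, while re-identification requires access to the separately held key map.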
CRDB Version 1.0 - Future
• Website development activities
• Request for expedited IRB
• Refinement of "groupings" for ICD9s, CPTs and providers
• Capture of additional data (current calendar year)
  • Microbiology results and other report text blobs
• End-user Query Tool: additional query parameters and analysis modules
  • Enhanced logic functions (January 2014)
  • CPTs (March 2014)
  • Labs (June 2014)
  • Flowsheet measures (August 2014)
  • Units (October 2014)
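One way to picture the "groupings" for ICD9s mentioned above is a lookup that rolls individual ICD-9-CM codes up into broader clinical categories by their three-digit category. The sketch below is an assumption for illustration only: the group names and ranges in `GROUPS` are a small hypothetical subset, not the CRDB's actual groupings.

```python
# Hypothetical groupings keyed by inclusive ranges of three-digit
# ICD-9-CM categories. Illustrative subset only.
GROUPS = {
    "Diabetes mellitus": (250, 250),
    "Hypertensive disease": (401, 405),
    "Heart failure": (428, 428),
}


def icd9_group(code):
    """Map an ICD-9-CM diagnosis code such as '250.00' to a group name."""
    category = code.strip().split(".")[0]
    if not category.isdigit():
        # V and E codes are not numeric categories; left ungrouped here.
        return "Other"
    number = int(category)
    for name, (low, high) in GROUPS.items():
        if low <= number <= high:
            return name
    return "Other"


for code in ["250.00", "401.9", "428.0", "486", "V58.67"]:
    print(code, "->", icd9_group(code))
```

Grouping on the category rather than the full code keeps the table small, at the cost of not distinguishing subtypes within a category.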
Current Usage (July 2013 – Dec 2013)
• Unique CRDB users: 213
• Query Tool CRDB cohort identifications: 302
• CRDB data extracts (since August)
  • 5 large clinical extracts for a recent PCORI grant
  • Large data extract for the Chicago Health Atlas project
  • 2 QI projects
  • 6 medical student/resident clinical research projects