
The Clinical Data Repository at the University of Virginia School of Medicine

Jason Lyman, MD, MS, Associate Professor of Clinical Informatics; Medical Director, UVa Clinical Data Repository. September 2009.


Presentation Transcript


  1. The Clinical Data Repository at the University of Virginia School of Medicine. Jason Lyman, MD, MS, Associate Professor of Clinical Informatics; Medical Director, UVa Clinical Data Repository. September 2009

  2. Outline • Introduction to the Clinical Data Repository • The Role of Secondary Data for Clinical Research • Advantages • Limitations • Issues to Consider When Using the CDR for Research • Demonstration of the CDR

  3. The UVa Clinical Data Repository • An academic data warehouse at UVa consisting of: • An underlying database that integrates information from multiple UVaHS information systems • A custom-built World Wide Web interface that allows you to directly access retrospective, de-identified UVa patient data from a population perspective by creating queries based on diagnoses, procedures, laboratory test results, etc. • A dedicated team available to: • provide one-on-one or group training in using the CDR • work with you to provide data directly in a consultative capacity • provide identifiable patient data for authorized uses with IRB approval

  4. History • Development began in 1996 • System initially called the “Clinical Research Database” • Intent was to develop a web-based system to allow researchers to explore and download de-identified patient data sets to support research • Began with administrative data / lab data, and new data has been added over time

  5. What’s in the CDR? • Over 900,000 patients • Over 12 million encounters • Inpatient and outpatient, 1992 to present • Data domains: Demographics, Diagnoses (ICD9), Procedures (CPT, ICD9), Laboratory Results, Microbiology, Pathology, Inpatient Medications, Heart Center, Death Certificate Data, Financial Data, Billing Data, Utilization Data

  6. Getting the Data Out • Web interface at http://cdr.virginia.edu: display or download de-identified data to your PC • Consulting the CDR team: direct provision of data

  7. Security and Confidentiality • We assign (and display) our own disguised patient and provider identifiers • Authorization required for access to the CDR web site • All uses are tracked and audited • 2nd level authorization required to obtain real identifiers • Allowed for IRB-approved research • Reviews preparatory to research • Quality assessment
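The slide does not say how the CDR generates its disguised identifiers. As a minimal sketch under that caveat, the idea can be shown as a lookup table that assigns each real identifier a stable random disguise, only reveals the real identifier when a second-level authorization flag is set, and logs every reveal attempt. The `Pseudonymizer` class, its methods, and the `P` prefix are hypothetical.

```python
import secrets


class Pseudonymizer:
    """Hypothetical sketch: map real identifiers to stable disguised IDs."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._forward: dict[str, str] = {}   # real ID -> disguised ID
        self._reverse: dict[str, str] = {}   # disguised ID -> real ID (kept private)
        self.audit_log: list[tuple[str, bool]] = []

    def disguise(self, real_id: str) -> str:
        # Reuse the existing disguise so the same patient always gets the same ID
        if real_id not in self._forward:
            while True:
                candidate = f"{self.prefix}{secrets.token_hex(4)}"
                if candidate not in self._reverse:  # avoid collisions
                    break
            self._forward[real_id] = candidate
            self._reverse[candidate] = real_id
        return self._forward[real_id]

    def reveal(self, disguised_id: str, second_level_authorized: bool) -> str:
        # Every re-identification attempt is logged, whether or not it succeeds
        self.audit_log.append((disguised_id, second_level_authorized))
        if not second_level_authorized:
            raise PermissionError("second-level authorization required")
        return self._reverse[disguised_id]
```

Random tokens, rather than a hash of the real identifier, keep a disguised ID from being recomputed by anyone who knows the real one. The trade-off is that the mapping table itself must be protected.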

  8. CDR Account Access • Pre-approved: Medical Directors, Physicians, Registered Nurses, Service Center Administrators (SCA) • With the authorization of the departmental chair or service center director: Data, Outcomes and Systems Managers and Staff; SCA Support Staff • With the authorization of a faculty course director: Students • CDR Access Request Forms are available online

  9. CDR Home Page (after log-in)

  10. Project Menu Create a new project, or choose a prior one.

  11. Intended Use How will you use the information?

  12. Population Menu You can have several populations in your project, defined by a variety of conditions. Populations can be combined, just like in a MEDLINE query.
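Combining populations "just like in a MEDLINE query" amounts to Boolean set operations on the patients in each population. A minimal sketch, treating each population as a set of hypothetical disguised patient IDs (this is not the CDR's internal representation):

```python
# Hypothetical populations, each a set of disguised patient IDs
diabetes = {"P01", "P02", "P03", "P04"}
ckd = {"P03", "P04", "P05"}
transplant = {"P04", "P06"}

# Boolean combinations, analogous to MEDLINE's AND / OR / NOT
both = diabetes & ckd                  # diabetes AND CKD
either = diabetes | ckd                # diabetes OR CKD
no_tx = (diabetes & ckd) - transplant  # (diabetes AND CKD) NOT transplant
```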

  13. Setting Search Conditions You can specify a variety of conditions to define your population

  14. This query will find all encounters in 2004 during which a patient 21 years or older underwent a kidney transplant at UVa

  15. We found 67 cases and 67 patients
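The query on slides 14 and 15 counts both cases (qualifying encounters) and distinct patients. The two numbers diverge whenever one patient has more than one qualifying encounter. A minimal sketch of the same filter over hypothetical encounter records; the `Encounter` fields and the `KIDNEY_TX` label are assumptions standing in for real procedure codes:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Encounter:
    patient_id: str
    admit_date: date
    age_years: int               # patient age at the encounter
    procedures: frozenset[str]   # hypothetical procedure labels


def find_cases(encounters, year, min_age, procedure):
    """Encounters in `year` where a patient aged >= min_age underwent `procedure`."""
    return [
        e for e in encounters
        if e.admit_date.year == year
        and e.age_years >= min_age
        and procedure in e.procedures
    ]


encounters = [
    Encounter("P1", date(2004, 3, 2), 45, frozenset({"KIDNEY_TX"})),
    Encounter("P1", date(2004, 9, 9), 45, frozenset({"KIDNEY_TX"})),  # same patient again
    Encounter("P2", date(2004, 5, 1), 19, frozenset({"KIDNEY_TX"})),  # under 21
    Encounter("P3", date(2003, 7, 4), 60, frozenset({"KIDNEY_TX"})),  # wrong year
    Encounter("P4", date(2004, 1, 8), 33, frozenset({"APPENDECTOMY"})),
]

cases = find_cases(encounters, 2004, 21, "KIDNEY_TX")
n_cases = len(cases)                             # qualifying encounters
n_patients = len({e.patient_id for e in cases})  # distinct patients
```

Here two cases map to one patient. On slide 15 the counts happen to be equal (67 and 67), meaning no patient had two qualifying transplant encounters in 2004.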

  16. A variety of reports are available about each population

  17. Drill-down to individual visits

  18. The CDR is One Local Source of Secondary Data for Clinical Research • Secondary data: data that already exists, having been collected for another purpose • Research • Patient care • Quality assessment • Administrative • Public health

  19. Example Large Datasets • National Health Interview Survey • Medical Expenditures Panel Survey • Behavioral Risk Factor Surveillance System • National Health and Nutrition Examination Survey • National Health Care Surveys • Medicare claims data • Nursing Home Minimum Dataset • Healthcare Cost and Utilization Project • Department of Veterans Affairs Databases • National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER)

  20. Example Local Databases • Medical Center Clinical Information Systems • Carecast, MIS • Cancer Registry • UMA Diabetes Registry • Trauma Registry • Wound Infections Database • Biorepository (Tissue Procurement Facility) • UVa Clinical Data Repository • …

  21. Advantages of Using Existing Data • Time: primary data collection is often very time-consuming and laborious • …and money: secondary datasets are often available for free or are inexpensive

  22. But… you have no control over data collection and no control over data quality • Will the data be sufficiently complete and in a usable format? • Will the data be sufficiently accurate and reliable?

  23. And as a result, studies using existing data are particularly subject to… • Possibility of bias • Selection / sampling bias • Misclassification bias • … • Possibility of confounding (in correlational studies) • Are there unmeasured factors / variables that are determining the outcome of interest instead of / in addition to the factor(s) under study?

  24. Issues / Limitations in Using Secondary Datasets for Research • Data Availability • Is the necessary information in the datasets under consideration? • Data Accuracy • Do the data mean what we think they mean? • Data Format • Is the data in a usable (query-able, analyzable) format?

  25. Data Availability • Are the patients of interest in the dataset (in sufficient number)? • Can you reliably identify the patients of interest (those who meet inclusion / exclusion criteria)? • Is the data you need for analysis available? • Demographics • Clinical data of interest • Other outcomes • Potential confounders • Affects inclusion / exclusion criteria • If your dataset only contains information on VA residents, then that will clearly be one of your inclusion criteria! • What is the currency of the data? (How up-to-date is it?)

  26. Data Accuracy • Each data element can be evaluated as a “test” against the gold standard of whether the relevant condition / event is present or has occurred • E.g., attempting to identify patients with Type II DM using a specific ICD9-CM code • What is the sensitivity of that code for Type II DM? • What is its positive predictive value? • What if we used a test result, e.g. HbA1c values? Or medications? • This applies to all data, and the true performance is difficult to know • It changes over time • It changes across clinical environments • Published data on the accuracy of coded data (e.g. ICD9-CM) can sometimes be relied on
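Treating a code as a "test" means cross-tabulating it against a gold standard, such as chart review, in a 2×2 table. The standard measures then follow directly. A minimal sketch; the counts below are hypothetical, not measured CDR performance:

```python
def code_performance(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy of a diagnosis code judged against a gold standard.

    tp: coded and truly has the condition    fp: coded but does not have it
    fn: not coded but truly has it           tn: not coded and does not have it
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that carry the code
        "specificity": tn / (tn + fp),  # share of non-cases without the code
        "ppv": tp / (tp + fp),          # share of coded patients who are true cases
        "npv": tn / (tn + fn),          # share of uncoded patients who are non-cases
    }


# Hypothetical chart-review results for one ICD9-CM code
result = code_performance(tp=80, fp=10, fn=20, tn=890)
```

With these numbers the code finds 80% of true cases (sensitivity 0.80), and about 89% of coded patients truly have the condition (PPV 80/90).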

  27. Accuracy of CDR Data • Varies according to source • Lab data comes directly from the laboratory’s information system • Diagnosis / procedure data comes from administrative sources (captured for billing purposes) using ICD9-CM and CPT codes • Risk factors and some chronic conditions are undercoded • Medication data comes from billing data sources (reflecting administration, not ordering); it is generally accurate, but no validation studies have been performed to assess its accuracy

  28. Data Format • Data can be coded using a controlled terminology, and these codes may be difficult to use • Sometimes lack useful hierarchical organization / representation of relationships • Makes it harder to browse to find codes of interest • Makes it harder to provide useful summaries of data • Textual data • Pulling useful information out of textual data can be difficult • Synonyms • Misspellings • Negation • Context-sensitive meaning
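Synonyms and negation are the first hurdles in pulling a concept out of free text. The sketch below is deliberately simplified. It is a keyword search with a small synonym list, and it suppresses a mention when a negation cue appears in the few preceding words, loosely in the spirit of rule-based tools such as NegEx but far cruder. The word lists and the window size are hypothetical. Misspellings and context-sensitive meaning are not handled.

```python
import re

# Hypothetical, deliberately small lists; real systems use much richer rules
SYNONYMS = {"pneumonia": ["pneumonia", "pna"]}
NEGATIONS = ["no", "denies", "negative for", "without", "ruled out"]


def mentions_affirmed(text: str, concept: str, window: int = 5) -> bool:
    """True if a synonym of `concept` appears with no negation cue in the
    preceding `window` words."""
    words = re.findall(r"[a-z]+", text.lower())
    for i, word in enumerate(words):
        if word in SYNONYMS[concept]:
            preceding = " ".join(words[max(0, i - window):i])
            negated = any(
                re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in NEGATIONS
            )
            if not negated:
                return True
    return False
```

A fixed word window is crude. In "No fever. Pneumonia present." the "no" belongs to a different sentence, but this sketch would still treat the pneumonia mention as negated.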

  29. Format of CDR Data • Diagnoses coded using ICD9-CM • Searching for a condition requires identifying the right code (one that both accurately represents the condition of interest and is used regularly at UVa) • Each encounter (inpatient or outpatient) has one principal / primary diagnosis but potentially many secondary diagnoses • Procedures coded using ICD9-CM (inpatient) and CPT (outpatient) • Microbiology / pathology data stored as text • Finding a “positive” blood culture or a patient with stage 4 breast cancer is often challenging because it means querying against text • Concepts don’t always match the structure. For example, a query to identify patients who have received immunizations is probably best done using diagnosis codes, not procedure codes; this is where the CDR team can help
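Because each encounter has one principal diagnosis and possibly many secondary ones, the same code search can return different populations depending on which positions it checks. A minimal sketch over hypothetical encounter records with illustrative ICD9-CM codes, matching by code prefix (here "250", the diabetes mellitus category):

```python
# Hypothetical encounters: one principal diagnosis plus any secondary diagnoses
encounters = [
    {"id": "E1", "principal": "250.00", "secondary": ["401.9"]},
    {"id": "E2", "principal": "486", "secondary": ["250.02", "428.0"]},
    {"id": "E3", "principal": "414.01", "secondary": []},
]


def has_code(encounter: dict, prefix: str, principal_only: bool = False) -> bool:
    """Match ICD9-CM codes by prefix, optionally against the principal diagnosis only."""
    codes = [encounter["principal"]]
    if not principal_only:
        codes += encounter["secondary"]
    return any(code.startswith(prefix) for code in codes)


principal_hits = [e["id"] for e in encounters if has_code(e, "250", principal_only=True)]
any_position_hits = [e["id"] for e in encounters if has_code(e, "250")]
```

Restricting to the principal diagnosis finds only E1. Searching every position also finds E2, where diabetes was coded as a secondary condition. Prefix matching is convenient but easy to make too broad: a prefix of "25" would also match unrelated codes.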

  30. But these caveats aside… • Secondary datasets are a common source of information and can provide very useful, meaningful results at multiple stages of the research process. • The CDR is used successfully for dozens of research projects every year. How?

  31. Roadmap for Studying a Topic • Begin with a descriptive study (least expensive, resource-intensive) • Explore the “lay of the land”, describing the distributions of disease, health-related characteristics; typically retrospective or cross-sectional • Follow with analytic study • Begin to discover possible cause-and-effect relationships (case-control / cohort studies) • Follow with an experiment (e.g. randomized controlled trial) • Determine the impact of an intervention (typically most expensive / resource-intensive) Secondary data can be useful for all three phases!

  32. Research Uses of the CDR • Hypothesis generation • Identify a patient population and explore to 1) identify the kinds of data that are available, and 2) identify any initial unexpected / interesting findings • Preliminary data for a grant or a conference abstract • How many patients meet specific search conditions, and what are their demographics / clinical characteristics? • Descriptive studies (one group, no comparison) • What is the prevalence of specific diagnoses and certain types of treatments, and what are the associated outcomes (complications, length of stay, readmission, mortality)? • Comparative studies (to identify associations) • Retrospective before-after • Case-control • Cohort with control group for comparison • Study recruitment (e.g. for an RCT) • Identifying UVa patients who meet inclusion / exclusion criteria

  33. Using the CDR for Research • Access the CDR to explore the data that’s available (we will meet with you individually or in groups for CDR training); this can be done without IRB approval • If you need patient identifiers to decide whether a study is feasible, contact us; IRB approval is not needed as long as the information is not used to conduct research, patients are not contacted, and identifiers are not taken off-grounds • If you will use CDR data for a research protocol, talk to us before submitting your protocol (you will need to specify the kinds of data you will be getting from the CDR)

  34. CDR and Identifiable Data • Research will involve one of: • Fully identified data (MRN, name, SSN, etc.) • IRB approval required (most likely at least ‘expedited’) • Need to contact the CDR team to obtain identifiers • Limited data set (no direct identifiers, but locations or exact dates needed) • IRB approval required (most likely ‘expedited’) • Need to contact the CDR team to obtain dates, geographic details • De-identified (no direct identifiers, detailed locations, or exact dates) • IRB approval recommended (most likely ‘exempt’) • May qualify as “non-human subjects research”
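The three tiers can be read as a simple rule: if any direct identifier is requested, the data is fully identified. Otherwise, if exact dates or detailed locations are requested, it is a limited data set. Otherwise it is de-identified. A minimal sketch of that rule; the field names and category lists are hypothetical simplifications. This is not a substitute for the IRB's determination or for the HIPAA definitions.

```python
# Hypothetical, simplified field categories for a data request
DIRECT_IDENTIFIERS = {"mrn", "name", "ssn", "street_address", "phone"}
LIMITED_DATA_SET_ELEMENTS = {"exact_dates", "city", "zip_code"}


def request_level(fields: list[str]) -> str:
    """Classify a data request into the three tiers on the slide."""
    requested = set(fields)
    if requested & DIRECT_IDENTIFIERS:
        return "fully identified"   # IRB approval required
    if requested & LIMITED_DATA_SET_ELEMENTS:
        return "limited data set"   # IRB approval required
    return "de-identified"          # IRB approval recommended
```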

  35. CDR and Statistical Analysis • Role of CDR team is data access / data provision, and limited processing (transform into “flat-file” format with one record per unit of analysis) • Types of questions • Given two different patient populations (varying by diagnosis / treatment, etc.), do selected outcomes occur at different rates (e.g. inpatient mortality, 7- or 30-day readmissions)? • Does length of stay vary according to specific patient factors? (continuous variable) • Provider effectiveness
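A "flat file with one record per unit of analysis" can be illustrated with readmissions. Here the unit is the admission, and each record carries a flag for whether the same patient was readmitted within 30 days of discharge. A minimal sketch over hypothetical admissions. Every admission is treated as an index admission, which is a simplifying assumption; published readmission measures define index stays more carefully.

```python
from datetime import date

# Hypothetical admissions: (patient_id, admit_date, discharge_date, population)
admissions = [
    ("P1", date(2008, 1, 1), date(2008, 1, 5), "A"),
    ("P1", date(2008, 1, 20), date(2008, 1, 22), "A"),
    ("P2", date(2008, 2, 1), date(2008, 2, 3), "A"),
    ("P3", date(2008, 3, 1), date(2008, 3, 4), "B"),
    ("P3", date(2008, 5, 1), date(2008, 5, 2), "B"),
]


def flat_file(admissions, window_days=30):
    """One record per admission, flagged if the next admission starts within window_days."""
    by_patient = {}
    for adm in sorted(admissions, key=lambda a: (a[0], a[1])):
        by_patient.setdefault(adm[0], []).append(adm)
    rows = []
    for stays in by_patient.values():
        for i, (pid, admit, discharge, pop) in enumerate(stays):
            nxt = stays[i + 1] if i + 1 < len(stays) else None
            readmit = nxt is not None and (nxt[1] - discharge).days <= window_days
            rows.append({"patient_id": pid, "admit": admit,
                         "population": pop, "readmit": readmit})
    return rows


def readmission_rate(rows, population):
    group = [r for r in rows if r["population"] == population]
    return sum(r["readmit"] for r in group) / len(group)


rows = flat_file(admissions)
```

Population A's rate is 1/3 (P1's January stays are 15 days apart) and population B's is 0. Whether such a difference is more than chance is a separate statistical question that the flat file only makes possible to ask.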

  36. SPARC system • Systems and Practice Analysis for Resident Competencies (SPARC) • Database / information system that pulls data from CDR, focused on supporting housestaff training in practice-based learning and improvement • Provides provider-specific measures / reports • Raises a variety of statistical issues / questions

  37. Do my patients do better or worse in terms of breast cancer screening than other patient populations?

  38. How do I compare to my peers? Given panel sizes, patient factors, am I really different or not?
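Panel size matters because the same observed rate is much less certain for a small panel. One common way to show this, swapped in here as an illustration and not necessarily the method SPARC uses, is to compare a provider's rate with control limits around the peer rate that narrow as the panel grows (the idea behind funnel plots). A minimal sketch using the normal approximation. It does no risk adjustment for patient factors, and the rates below are hypothetical.

```python
import math


def compare_to_peers(successes: int, panel_size: int, peer_rate: float,
                     z_crit: float = 1.96):
    """Return (observed rate, lower limit, upper limit, outside limits?) for one
    provider, with limits set at peer_rate +/- z_crit standard errors."""
    rate = successes / panel_size
    se = math.sqrt(peer_rate * (1 - peer_rate) / panel_size)
    lower, upper = peer_rate - z_crit * se, peer_rate + z_crit * se
    return rate, lower, upper, not (lower <= rate <= upper)


small_panel = compare_to_peers(12, 20, peer_rate=0.70)    # 60% screened, 20 patients
large_panel = compare_to_peers(120, 200, peer_rate=0.70)  # 60% screened, 200 patients
```

Both providers screen 60% of patients against a peer rate of 70%. Only the provider with 200 patients falls outside the limits (about 0.64 to 0.76); for 20 patients the limits span roughly 0.50 to 0.90.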

  39. What trends are “real”?
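One standard tool for separating real shifts from month-to-month noise in a rate is the Shewhart p-chart. It flags periods whose proportion falls outside 3-sigma limits around the pooled proportion. The slide does not name a method, so this is offered as one illustrative option; the monthly counts below are hypothetical.

```python
import math


def p_chart(counts: list[int], sizes: list[int]):
    """Return the pooled proportion and a flag per period for points outside
    3-sigma p-chart limits."""
    p_bar = sum(counts) / sum(sizes)  # center line: pooled proportion
    flags = []
    for x, n in zip(counts, sizes):
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - 3 * sigma)
        ucl = min(1.0, p_bar + 3 * sigma)
        flags.append(not (lcl <= x / n <= ucl))
    return p_bar, flags


# Hypothetical monthly counts of an outcome, out of 100 patients each month
p_bar, flags = p_chart([30, 32, 29, 31, 50], [100] * 5)
```

The pooled rate is 0.344 with limits of roughly 0.20 to 0.49, so only the final month (0.50) is flagged. A fuller analysis would also apply run rules to detect sustained drifts that stay within the limits.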

  40. Analysis of Secondary Data in Biomedical Research • The CDR is a local resource allowing access to a wide variety of secondary data on a large patient population • Secondary data offers significant benefits, but presents challenges in terms of data availability, accuracy, and format • Data(base) characteristics affect inclusion / exclusion criteria • Analyses are often fairly simple (cross-sectional, descriptive), but can get more complex for correlational studies • There is increasing interest in using data to assess quality of care and provider effectiveness, which raises additional issues about risk adjustment methods and appropriate use of data. Thank You!
