
Making Large Data Sets Work for You Advantages and Challenges


Presentation Transcript


  1. Making Large Data Sets Work for You: Advantages and Challenges Lesley H Curtis, Soko Setoguchi, Bradley G Hammill

  2. Presenter disclosure information • Lesley H Curtis, "Large Data Sets: An Overview" • FINANCIAL DISCLOSURE: None • UNLABELED/UNAPPROVED USES DISCLOSURE: None

  3. Agenda • Large Data Sets: An Overview • Prescription Drug Data: Advantages, Availability, and Access • Linking Large Data Sets: Why, How, and What Not to Do • Practical Examples

  4. Which large data sets? • Relevant for cardiovascular research • Available to researchers • Potential for linkage • Claims data (federal and commercial) • Inpatient registries • Longitudinal cohort studies

  5. Claims data • Derived from payment of bills • Payor-centric • Examples: Medicare, Medicaid, Thomson-Reuters, United Health Care

  6. Medicare claims data • Inpatient services (Part A) • Outpatient services (Part B) • Physician services (Carrier, Part B) • Durable medical equipment • Home health care • Skilled nursing facilities • Hospice

  7. Medicare claims data elements • What data are available: demographics; service dates; diagnoses; procedures; hospital / physician • What data are not available: physiological measures; test results; times of admission, procedures, etc.; medications
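
As a concrete but hypothetical illustration of these elements, the sketch below shows what a single inpatient claim might look like once loaded for analysis. The field and file names are assumptions for illustration, not the actual CMS variable names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class InpatientClaim:
    """Illustrative layout of one inpatient claim (not actual CMS variable names)."""
    bene_id: str                  # de-identified beneficiary key (demographics link here)
    admission_date: date          # service dates are available
    discharge_date: date
    provider_id: str              # billing hospital
    attending_physician_id: str   # physician identifier
    dx_codes: List[str] = field(default_factory=list)  # ICD diagnosis codes
    px_codes: List[str] = field(default_factory=list)  # ICD procedure codes
    # Note what is absent: lab values, vital signs, admission/procedure times,
    # and inpatient medications are not captured in claims.

claim = InpatientClaim(
    bene_id="B0001",
    admission_date=date(2006, 3, 1),
    discharge_date=date(2006, 3, 6),
    provider_id="330101",
    attending_physician_id="P12345",
    dx_codes=["4280", "25000"],   # e.g., heart failure, diabetes (ICD-9-CM, no decimals)
    px_codes=["3722"],            # e.g., left heart catheterization
)
```

In practice the same information arrives as large flat-file extracts with many more columns; the point is simply which categories of information are, and are not, present.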

  8. Medicare claims data coverage • National scope • What patients will be represented? Patients enrolled in traditional (fee-for-service) Medicare • What patients will not be represented? Patients receiving care through the Veterans Health Administration; patients enrolled in Medicare managed care plans
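
Because only traditional fee-for-service beneficiaries are fully observable in claims, a common first analytic step is restricting the denominator file to people with no managed-care enrollment and full Part A/B coverage in every month. A minimal sketch follows; the file name, column names (hmo_ind_01..12, buyin_01..12), and codings are assumptions, not the actual CMS layout.

```python
import pandas as pd

# Illustrative denominator-style extract; names and codes are assumptions.
denom = pd.read_csv("denominator_2006.csv", dtype=str)

months = [f"{m:02d}" for m in range(1, 13)]

# No managed-care (HMO) enrollment in any month of the year.
no_hmo = (denom[[f"hmo_ind_{m}" for m in months]] == "0").all(axis=1)

# Part A and Part B coverage in every month ("3" assumed to mean A + B).
full_ab = (denom[[f"buyin_{m}" for m in months]] == "3").all(axis=1)

ffs_cohort = denom.loc[no_hmo & full_ab, ["bene_id", "sex", "birth_date"]]
print(f"{len(ffs_cohort):,} of {len(denom):,} beneficiaries meet the FFS criteria")
```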

  9. Medicare claims data quality • Main point: the reliability of specific claims data elements depends on their importance for reimbursement • Good data on: major procedures, hospitalizations, mortality • Inconsistent data on: comorbidities and illness severity, procedures with low reimbursement rates

  10. Acquiring CMS claims data • All requests begin with ResDAC (www.resdac.umn.edu) • Cost • $15K per year of inpatient+denominator data • $20K per year of 5% data across all files • $30K+ per year of data for custom requests • Detailed approval process • Prepare request packet for ResDAC review (4-6 weeks) • Review by CMS privacy board (4 weeks) • Request processed by contractor (6-8 weeks)

  11. Preparing for CMS claims data • Make space • 16 GB for 100% denominator and inpatient files • 57 GB for 5% denominator, inpatient, outpatient, and carrier files • Manage expectations • Time to process files • Transforming raw claims into usable information • Coding algorithms • Coding changes • Learning curve
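
Transforming raw claims into usable information typically means applying a published coding algorithm. As a sketch, the code below flags hospitalizations carrying a heart failure diagnosis (ICD-9-CM 428.x) on an inpatient file; the file and column names are assumed for illustration, and a real study would take its code list and diagnosis-position rules from a validated algorithm.

```python
import pandas as pd

# ICD-9-CM 428.x is the classic heart failure family; a real analysis would
# use a validated, published code list and specify primary vs. any position.
HF_PREFIXES = ("428",)

claims = pd.read_csv("inpatient_2006.csv", dtype=str)
dx_cols = [c for c in claims.columns if c.startswith("dx_code_")]  # assumed naming

def has_hf(row) -> bool:
    """True if any recorded diagnosis code begins with a heart failure prefix."""
    return any(
        str(row[c]).startswith(HF_PREFIXES)
        for c in dx_cols
        if pd.notna(row[c])
    )

claims["hf_hosp"] = claims.apply(has_hf, axis=1)
print(f"{claims['hf_hosp'].sum():,} heart failure hospitalizations flagged")
```

Coding changes matter here as well: code lists must be versioned by service date (for example, across the later ICD-9 to ICD-10 transition), which is part of the learning curve.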

  12. The Learning Curve

  13. Claims data • Derived from payment of bills • Payor-centric • Examples: Medicare, Medicaid, Thomson-Reuters, United Health Care

  14. Commercial claims data elements • What data are typically available: demographics; service dates; diagnoses; procedures; medications; hospital / physician • What data may not be available: physiological measures; test results

  15. Commercial claims data coverage • National scope • What patients will be represented? Individuals who are commercially insured • What patients will not be represented? The uninsured; Medicare managed care?

  16. Commercial claims data quality • Similar to Medicare claims data: the reliability of specific claims data elements depends on their importance for reimbursement • Good data on: major procedures, hospitalizations • Inconsistent data on: mortality, comorbidities and illness severity, procedures with low reimbursement rates

  17. Preparing for commercial claims data • Cost • $25-70K depending on size, scope of data request • Size • 100 GB per year of data • Analysis sample sizes will differ from advertised sample sizes • Manage expectations!
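
One way to manage expectations is to tabulate attrition explicitly as eligibility criteria are applied, so the gap between advertised and analytic sample sizes is visible up front. A minimal sketch, assuming a hypothetical commercial enrollment table with columns such as months_enrolled, has_rx_coverage, and age.

```python
import pandas as pd

# Hypothetical enrollment extract; column names are assumptions for this sketch.
enroll = pd.read_csv("commercial_enrollment_2006.csv")

steps = [("Advertised / raw enrollment file", len(enroll))]

cohort = enroll[enroll["months_enrolled"] == 12]      # continuous enrollment
steps.append(("Enrolled all 12 months", len(cohort)))

cohort = cohort[cohort["has_rx_coverage"] == 1]       # pharmacy benefit required
steps.append(("With pharmacy benefit", len(cohort)))

cohort = cohort[cohort["age"] >= 18]                  # adults only
steps.append(("Age 18 or older", len(cohort)))

for label, n in steps:
    print(f"{label}: {n:,}")
```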

  18. Registry data • Observational cohorts of patients undergoing specific treatments or having specific conditions • Purpose may be to assess: quality of care, provider performance, treatment safety/effectiveness • Of interest today are hospital-based registries

  19. OPTIMIZE-HF registry • Hospital-based quality improvement program and internet-based registry for heart failure • 2002-2005: 50,000 patients at more than 250 hospitals • Transitioned to GWTG-HF in 2005

  20. Registry data coverage • Only patients treated at participating hospitals will be included • All patients at these hospitals are included, regardless of payor • Participating hospitals may not be representative of hospitals nationwide

  21. Registry data quality • Good data on: many of the elements not included in Medicare data, such as labs, medications, treatment timing, process measures, and contraindications (if collected) • Inconsistent data on: post-hospitalization follow-up care; outcomes, particularly long-term

  22. Accessing registry data • Networking and partnering • Many registries require that analyses be performed at selected analytical centers, which may have long queues • Approval process via steering or executive committee

  23. NHLBI longitudinal cohort studies • Atherosclerosis Risk in Communities Study (ARIC) • Cardiovascular Health Study (CHS) • Framingham Heart Study • Jackson Heart Study • Multi-Ethnic Study of Atherosclerosis • Women’s Health Initiative

  24. Cardiovascular Health Study (CHS) • Prospective, observational study of CV disease in the elderly (Washington Co., MD; Forsyth Co., NC; Sacramento Co., CA; and Pittsburgh, PA) • Baseline exams occurred in 1989-90 • Minority cohort added at Year 5 • Annual exams, with ‘major’ exams at year 5 (1992-93) and year 9 (1996-97); the last exam was at year 11 (1998-99) • 5,201 participants at baseline, plus 687 additional minority participants, for a total of 5,888

  25. Cardiovascular Health Study data elements • What data are available: demographics; medical and personal history; physiological measures and test results; quality of life (QOL) and depression; cognitive function • What data are not available: service dates; procedures; hospital/physician

  26. Cardiovascular Health Study data quality • Main point: the data collected are of high quality • Good data regarding: cardiovascular risk factors, cardiovascular endpoints, general health • Limited data on: non-cardiovascular risk factors, non-cardiovascular endpoints

  27. Accessing NHLBI cohort studies • Via the NHLBI data repository • HIPAA identifiers, geography removed • Via Coordinating Center for identifiable data • Size • 20MB per year of data

  28. NHLBI-Medicare linked data sets • CMS linked with… • CHS (1991-2004, 2005-2009 pending) • Framingham (2000-2009 pending) • Jackson Heart Study (2000-2009 pending) • Multi-Ethnic Study of Atherosclerosis (2000-2009 pending) • Atherosclerosis Risk in Communities • Women’s Health Initiative
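
A simplified sketch of the deterministic matching that underlies such linkages follows, assuming hypothetical finder files with hashed identifiers. Actual NHLBI-Medicare linkages are performed under CMS-approved procedures, using identifiers (for example SSN/HIC, date of birth, sex) held by the coordinating center, so everything named here is illustrative.

```python
import pandas as pd

# Hypothetical finder files; file names, columns, and hashing are assumptions.
cohort = pd.read_csv("cohort_finder_file.csv", dtype=str)   # study_id, id_hash, dob, sex
medicare = pd.read_csv("cms_crosswalk.csv", dtype=str)      # bene_id, id_hash, dob, sex

# Deterministic (exact) match on all identifiers at once.
linked = cohort.merge(medicare, on=["id_hash", "dob", "sex"], how="inner")

print(f"Linked {len(linked):,} of {len(cohort):,} participants "
      f"({len(linked) / len(cohort):.1%})")

# Review unmatched participants (transcription errors, never enrolled in
# fee-for-service Medicare, etc.) rather than silently dropping them.
unmatched = cohort[~cohort["study_id"].isin(linked["study_id"])]
print(f"{len(unmatched):,} participants unmatched")
```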

  29. Conclusion • Large data sets abound • Do yourself a favor…manage expectations!

  30. Contact Information Lesley Curtis Lesley.curtis@duke.edu
