Finding and Using Publicly Available Datasets for Secondary Data Analysis Research. KL2 Seminar February 2011. Disclosures and acknowledgements. Disclosures: None Acknowledgements: Alex Smith, Michael McWilliams, Ann Nattinger, SGIM Research Committee. Two shout-outs.
Finding and Using Publicly Available Datasets for Secondary Data Analysis Research KL2 Seminar February 2011
Disclosures and acknowledgements Disclosures: None Acknowledgements: Alex Smith, Michael McWilliams, Ann Nattinger, SGIM Research Committee
Two shout-outs • Comparative Effectiveness Research through CTSI • Smith AK et al, JGIM 2011
Learning objectives • Appreciate key conceptual and practical issues involved in secondary data analysis • Identify and use online tools for locating and learning about publicly available datasets relevant to your research • Focus on what is useful to you
(My) Definition of Secondary Data Data that have been collected but not for you
Types of Secondary Data • Survey (NHIS, NHANES, HRS, BRFSS) • Administrative (Medicare claims) • Discharge (HCUP SID and NIS) • Medical chart / EMR • Disease registries (SEER) • Aggregate (ARF, US Census) • Research databases (SOF) • Combinations and linkages
Key Conceptual Issues • Someone else’s secondary data is your primary data • Treat the data and research plan with the same rigor as you would for a primary data collection study • Research questions should be conceptually driven and interesting a priori • Some exceptions – Warren Browner rule • Know the data as well as if you had collected it yourself • Who is in the cohort? • Strengths and limitations of data collection procedures and instruments
Selecting a Database • Compatibility with research question(s) • Availability and expense • Sample: representativeness, power • Measures of interest present and valid • Messiness and missingness • Local expertise • Linkages
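One way to make the "messiness and missingness" check concrete is to tabulate, before committing to a dataset, how often each variable of interest is missing and how many complete cases remain. The sketch below is illustrative only: the records and variable names (`age`, `sbp`, `income`) are invented, and `None` is assumed to be the missing-value marker. Real datasets often use special codes instead, which the documentation should list.

```python
from typing import Dict, List, Optional

# Hypothetical records; None marks a missing value (an assumption for this sketch).
records: List[Dict[str, Optional[float]]] = [
    {"age": 71, "sbp": 142, "income": None},
    {"age": 65, "sbp": None, "income": None},
    {"age": 80, "sbp": 128, "income": 32000},
    {"age": 77, "sbp": 135, "income": None},
]


def missing_fraction(rows, variables):
    """Fraction of rows in which each variable is missing (None or absent)."""
    n = len(rows)
    if n == 0:
        raise ValueError("no rows to summarize")
    return {v: sum(1 for r in rows if r.get(v) is None) / n for v in variables}


def complete_case_count(rows, variables):
    """Number of rows with a non-missing value for every listed variable."""
    return sum(1 for r in rows if all(r.get(v) is not None for v in variables))


variables = ["age", "sbp", "income"]
fractions = missing_fraction(records, variables)
complete = complete_case_count(records, variables)

for name, frac in fractions.items():
    print(f"{name}: {frac:.0%} missing")
print("complete cases:", complete)
```

A variable that is missing for most of the sample, like the invented `income` field here, is a signal to reconsider the measure or the dataset before investing further effort.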
Resources Needed • Your effort • Computer resources and security • Programmer and/or statistician effort • PhD statistical support – complex sampling or analyses • Coordinator if merging datasets • Realistic timeline / Gantt chart
Cases • Amita is a junior faculty member interested in doing a secondary data analysis project on the association between race/ethnicity and the prevalence and outcomes of atrial fibrillation. No prior experience and limited direct mentorship. • Eric is a junior faculty member with past experience. Wants to find a new dataset around which to write a grant on the association between SES and ADL function in elders.
Amita – Getting Started • Get acquainted with basics • Find dataset and assess merit and feasibility • Find a mentor / get expert help • www.sgim.org/go/datasets
Getting Expert Help • Request a consultation • One-on-one consultation • Clear, defined questions about dataset • “strengths and weaknesses of using XYZ to study patterns of medication use for heart failure”
Eric – Getting Down to Business • Identify datasets relevant to his research interests • Identify health statistics, validated instruments, funding sources • www.sgim.org/go/datasets
Finding Additional Resources • National Information Center on Health Services Research and Health Care Technology (NICHSR) • Inter-University Consortium for Political and Social Research (ICPSR) • Partners in Information Access for the Public Health Workforce • Roadmap K-12 Data Resource Center (UCSF) • List of datasets from the American Sociological Association • Canadian Research Data Centers – Data Sets and Research Tools (Canada) • Directory of Health and Human Services Data Resources • Publicly Available Databases from National Institute on Aging (NIA) • Publicly Available Databases from National Heart, Lung, & Blood Institute (NHLBI) • National Center for Health Statistics (NCHS) Data Warehouse • Medicare Research Data Assistance Center (RESDAC); and Centers for Medicare and Medicaid Services (CMS) Research, Statistics, Data & Systems • Veterans Affairs (VA) data
CELDAC • Comparative Effectiveness Large Dataset Analysis Core • UCSF CTSI • Access to local and national datasets and expertise http://ctsi.ucsf.edu/research/celdac
National Information Center on Health Services Research and Health Care Technology (NICHSR) • Databases, data repositories, health statistics • Fellowship and funding opportunities • Glossaries, research and clinical guidelines • Evidence-based practice and health technology assessment • Specialized PubMed searches on healthcare quality and costs http://www.nlm.nih.gov/hsrinfo/index.html
ISPOR • International Society for Pharmacoeconomics and Outcomes Research http://www.ispor.org/DigestOfIntDB/CountryList.aspx
Inter-University Consortium for Political and Social Research (ICPSR) • World’s largest archive of social science data • Searchable • Many sub-archives relevant to HSR • Health and Medical Care Archive • National Archive of Computerized Data on Aging http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/index.jsp
Questions? • Specific high-value datasets • Causal inference / comparative effectiveness • Which comes first – RQ or dataset? • Evaluating and managing validity of measures • Analyzing complex survey data
EXTRA SLIDES • Additional brief information about specific high-value datasets • VA administrative data • NHANES • NAMCS • NIS
Administrative Data (VA) • VA has multiple high-value administrative databases • Outpatient visit information • Visit date, type of clinic, provider, ICD-9 diagnoses • Inpatient information • Admitting dx(s), discharge dx(s), CPT codes, bed section, meds administered • Lab data • >40 labs • Pharmacy data • All inpatient and outpatient fills • Academic affiliation • etc.
Administrative Data (VA) • Huge bureaucracy and paperwork
Administrative Data (VA) • Messy data • Huge size • 2 TB server • Data analyst
Survey Data (NHANES) • National Health and Nutrition Examination Survey (NHANES) • Nationally representative sample of >10K participants every 2 years • Extensive interview data on clinical history (including diseases, behaviors, psychosocial parameters, etc.) • Physical exam information (e.g., vital signs) • Labs, biomarkers
Survey Data (NHANES) • Free and easy to download • (Relatively) easy to use • Although requires careful reading of documentation • Serial cross-sectional • Disease data self-report • Very limited information about providers and systems of care
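Part of why the documentation needs a careful read is that surveys with complex sampling designs supply sample weights, and ignoring them can bias estimates. The sketch below contrasts an unweighted and a weighted prevalence on invented data. The field names `has_condition` and `weight` are hypothetical, not actual NHANES variable names, and the calculation covers point estimates only. Valid standard errors additionally require the design variables (strata and sampling units) and survey-aware software, which are not shown here.

```python
def unweighted_prevalence(rows, outcome):
    """Share of respondents with the outcome, ignoring the sampling design."""
    return sum(1 for r in rows if r[outcome]) / len(rows)


def weighted_prevalence(rows, outcome, weight):
    """Share of the represented population with the outcome, using sample weights."""
    total_weight = sum(r[weight] for r in rows)
    if total_weight <= 0:
        raise ValueError("sample weights must sum to a positive number")
    return sum(r[weight] for r in rows if r[outcome]) / total_weight


# Invented respondents: the two with the condition come from an oversampled
# group, so each represents fewer people in the population.
respondents = [
    {"has_condition": True, "weight": 1000.0},
    {"has_condition": True, "weight": 1000.0},
    {"has_condition": False, "weight": 4000.0},
    {"has_condition": False, "weight": 4000.0},
]

crude = unweighted_prevalence(respondents, "has_condition")
weighted = weighted_prevalence(respondents, "has_condition", "weight")
print(f"unweighted: {crude:.2f}, weighted: {weighted:.2f}")
```

In this toy example the unweighted figure is 0.50 and the weighted figure is 0.20, which shows how far a naive analysis can drift when a subgroup is oversampled.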
Survey Data (NAMCS) • National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS) • Nationally representative sample of ~70K outpatient and ED visits per year • Physician-completed form about office visit
Survey Data (NAMCS) • Data more from physician perspective (diagnoses, treatments prescribed, etc.) and some info on providers (e.g., clinic organization, use of EMRs, etc.) • Serial cross-sectional • Visit-focused • Not comprehensive; questionable value for chronic diseases
Discharge Data (NIS) • National Inpatient Sample (NIS) • Database of inpatient hospital stays collected from ~20% of US community hospitals by AHRQ • Diagnoses and procedures, severity adjustment elements, payment source, hospital organizational characteristics • Hospital and county identifiers that allow linkage to the American Hospital Association Annual Survey and Area Resource File
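The linkage idea can be sketched as a simple key-based join: each discharge record carries a hospital identifier, and a separate hospital-level file is looked up by that identifier. Everything below is invented for illustration. The field names (`hospital_id`, `teaching`, `beds`) and the records are hypothetical and do not reflect the actual NIS or American Hospital Association file layouts.

```python
def link_records(discharges, hospitals, key="hospital_id"):
    """Attach hospital-level fields to each discharge; also return unmatched discharges."""
    lookup = {h[key]: h for h in hospitals}  # assumes one row per hospital
    linked, unmatched = [], []
    for d in discharges:
        hospital = lookup.get(d[key])
        if hospital is None:
            unmatched.append(d)
        else:
            linked.append({**d, **hospital})
    return linked, unmatched


# Hypothetical discharge-level records (427.31 is the ICD-9 code for atrial fibrillation).
discharges = [
    {"discharge_id": 1, "hospital_id": "H01", "primary_dx": "427.31"},
    {"discharge_id": 2, "hospital_id": "H02", "primary_dx": "428.0"},
    {"discharge_id": 3, "hospital_id": "H99", "primary_dx": "427.31"},
]

# Hypothetical hospital-level characteristics.
hospitals = [
    {"hospital_id": "H01", "teaching": True, "beds": 450},
    {"hospital_id": "H02", "teaching": False, "beds": 120},
]

linked, unmatched = link_records(discharges, hospitals)
print(len(linked), "linked;", len(unmatched), "unmatched")
```

Returning the unmatched records separately is a deliberate choice: the share of discharges that fail to link is worth reporting, since selective non-linkage can affect representativeness.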
Discharge Data (NIS) • Relatively easy to access (DUA, $200/yr) • Relatively easy to use • Though need close attention to documentation • Limited data elements • Huge data files
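Huge data files do not always need to fit in memory. For simple tabulations, one option is to stream the file row by row. The sketch below assumes the data have been exported to CSV with a header row, and the column name `payer` is hypothetical. A small in-memory sample stands in for a real file so the example runs as written.

```python
import csv
import io
from collections import Counter


def count_by_column(handle, column):
    """Tally values of one column while reading the file one row at a time."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[row[column]] += 1
    return counts


# Stand-in for a large file; with real data, pass an opened file instead,
# e.g. open(path, newline="") where path points at the exported CSV.
sample = io.StringIO(
    "discharge_id,payer\n"
    "1,Medicare\n"
    "2,Private\n"
    "3,Medicare\n"
)

payer_counts = count_by_column(sample, "payer")
print(payer_counts.most_common())
```

Only the running tallies are held in memory, so the same function works on files far larger than available RAM, at the cost of one full pass per question asked.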