Longitudinal Analysis of Health Care Data in Large Populations: Data Configurations and Methods
Daniel Gilden
JEN Associates, Inc.
Cambridge, Massachusetts, USA
Data Applications: What Is the Process We Are Supporting?
• The life cycle of applied research:
  • Research
  • Policy development/strategic planning
  • Active program management
  • Evaluation of the impact of policies/therapies
  • Research...
What Do We Measure?
• People
  • personal and family demographics
  • geographic and personal environment
  • economic status
  • diseases
  • disabilities
  • utilization of medical services/therapies
  • healthcare provider affiliations
Where Is the Data?
• Birth and death records
• Public health disease registries
• Personal survey data
• Economic information by local area
• Administrative data
  • cost reports
  • aggregate purchasing information
  • therapy-level payment records
  • enrollment data
How Do We Understand What Is in the Data?
• Turning data into information
  • The poorer the data source, the more statistical manipulation will be required:
    • sample extrapolation
    • insufficiently specified models
  • The denser the source, the easier the analytic problem, but…
    • costly data preparation
    • complex analytic file production
Matching the Data to the Research or the Research to the Data?
• Can the data support the unit of analysis?
  • program
  • population
  • person
• Are the exposure and outcome measures available?
Everybody Has Time, But Can We Handle It?
• Interacting time with the unit of analysis multiplies both the data processing challenge and the analytic opportunities
• Cross-sectional
  • standardized time bucket for simple comparative analyses
• Longitudinal
  • following trajectories over time. Why?
    • to understand the past and predict the future
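A minimal Python sketch of the cross-sectional versus longitudinal distinction, assuming a hypothetical claim layout of (person_id, service_date, paid_amount) and calendar months as the standardized time bucket. The same records are summed once per bucket, pooled across persons, and once per person per bucket, which is what makes trajectories possible. All names are illustrative, not taken from the source.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claim layout: (person_id, service_date, paid_amount).
Claim = tuple[str, date, float]

def month_index(d: date, start: date) -> int:
    """Standardized time bucket: months elapsed since the study start month."""
    return (d.year - start.year) * 12 + (d.month - start.month)

def cross_sectional(claims: list[Claim], start: date) -> dict[int, float]:
    """Total paid per month bucket, pooled over all persons."""
    totals = defaultdict(float)
    for _, d, amount in claims:
        totals[month_index(d, start)] += amount
    return dict(totals)

def trajectories(claims: list[Claim], start: date) -> dict[str, dict[int, float]]:
    """Paid amount per person per month bucket, so each person can be followed over time."""
    paths = defaultdict(lambda: defaultdict(float))
    for pid, d, amount in claims:
        paths[pid][month_index(d, start)] += amount
    return {pid: dict(months) for pid, months in paths.items()}

# Invented example data.
claims = [
    ("A", date(2004, 1, 15), 100.0),
    ("A", date(2004, 3, 2), 50.0),
    ("B", date(2004, 1, 20), 30.0),
]
start = date(2004, 1, 1)
print(cross_sectional(claims, start))  # {0: 130.0, 2: 50.0}
print(trajectories(claims, start))     # {'A': {0: 100.0, 2: 50.0}, 'B': {0: 30.0}}
```

The longitudinal structure keeps one series per person rather than one per bucket, which is where the extra processing load comes from.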
Standardized Snapshot Measure Using Claims, Cost Reports or Pharmacy Purchasing Records
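One way to produce a standardized snapshot from pharmacy records is to fix a single time window and tabulate within it. The sketch below assumes a hypothetical record layout (person_id, fill_date, drug_category, paid_amount) and made-up category labels; it illustrates the idea and is not the method behind the charts in this deck.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical pharmacy record layout: (person_id, fill_date, drug_category, paid_amount).
Rx = tuple[str, date, str, float]

def snapshot(records: list[Rx], start: date, end: date) -> dict[str, tuple[int, int, float]]:
    """Snapshot for the fixed window [start, end):
    per drug category, (number of fills, distinct users, total paid)."""
    fills = Counter()
    users = defaultdict(set)
    paid = defaultdict(float)
    for pid, fill_date, category, amount in records:
        if start <= fill_date < end:  # only records inside the time bucket count
            fills[category] += 1
            users[category].add(pid)
            paid[category] += amount
    return {c: (fills[c], len(users[c]), paid[c]) for c in fills}

# Invented example data; category labels are hypothetical.
records = [
    ("A", date(2005, 1, 5), "antipsychotic", 200.0),
    ("A", date(2005, 2, 5), "antipsychotic", 200.0),
    ("B", date(2005, 1, 9), "antidepressant", 40.0),
    ("C", date(2005, 4, 1), "antipsychotic", 210.0),  # falls outside the window
]
print(snapshot(records, date(2005, 1, 1), date(2005, 4, 1)))
# {'antipsychotic': (2, 1, 400.0), 'antidepressant': (1, 1, 40.0)}
```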
Expenditure Trend from Treatment Claims
50% Increase in Monthly Expenditures Over 3 Years
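The headline figure can be restated as an average monthly growth rate. Assuming smooth compounding over 36 months (the underlying series need not be smooth), a 50% rise corresponds to roughly 1.13% per month. The helper names below are hypothetical.

```python
def percent_change(first: float, last: float) -> float:
    """Relative change from the first to the last period."""
    return (last - first) / first

def implied_monthly_rate(total_growth: float, months: int) -> float:
    """Constant monthly rate r such that (1 + r) ** months == 1 + total_growth."""
    return (1.0 + total_growth) ** (1.0 / months) - 1.0

rate = implied_monthly_rate(0.50, 36)  # 50% over 3 years = 36 months
print(f"{rate:.2%}")                   # 1.13%
```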
Population Trajectories in Costs and Utilization Rates
Non-Schizophrenia Dx: Number of Users and Costs per User Driving Cost Increase
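The subtitle attributes the cost increase to both the number of users and the cost per user. Because total cost equals users × cost per user, the growth ratios multiply and their logarithms add, which gives each factor a share of the growth. The sketch below uses that log-share decomposition, one common choice assumed here rather than taken from the source, on invented numbers.

```python
import math

def decompose(users0: int, total0: float, users1: int, total1: float) -> dict[str, float]:
    """Split the change in total cost (= users x cost per user) between two periods.

    Ratios multiply: total_ratio == users_ratio * cost_per_user_ratio.
    Log shares add to 1, giving each factor a share of the growth.
    Assumes total1 != total0 (otherwise there is no growth to split)."""
    users_ratio = users1 / users0
    cpu_ratio = (total1 / users1) / (total0 / users0)
    total_log = math.log(total1 / total0)
    return {
        "total_ratio": total1 / total0,
        "users_ratio": users_ratio,
        "cost_per_user_ratio": cpu_ratio,
        "users_share": math.log(users_ratio) / total_log,
        "cost_per_user_share": math.log(cpu_ratio) / total_log,
    }

# Invented example: users grow 20%, cost per user grows 25%, total grows 50%.
d = decompose(users0=1_000, total0=100_000.0, users1=1_200, total1=150_000.0)
print({k: round(v, 3) for k, v in d.items()})
# {'total_ratio': 1.5, 'users_ratio': 1.2, 'cost_per_user_ratio': 1.25, 'users_share': 0.45, 'cost_per_user_share': 0.55}
```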
Data System Configurations
• Each example implies a different data infrastructure, from the least to the most complex, but all the profiles are reasonable starting points for research
• Match the question to the data: do not let your eyes become bigger than your resources
• The important point is to start rather than wait for better systems or more detailed data; systems grow organically
Steps to Developing a Research Data Infrastructure
• Data inventory
  • what is currently available
• Access model
  • how many users and how deep the yield
• Analytic method selection
  • analysis type determines resources
• Hardware and software follow the methods
• Design for economy: planned profligacy
• The human element: where's the talent?
Data Structures
• Vertical: Data Archives
  • fixed length
  • a single record per observation
  • research-area-related data fields
• Horizontal: Analytic Records
  • aggregated to the unit of analysis
  • summary variables
  • time-oriented arrays
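A minimal sketch of turning a vertical archive into horizontal analytic records, assuming a hypothetical vertical layout (person_id, month_index, service_type, paid_amount) and a person as the unit of analysis. Each person gets summary variables plus one time-oriented array per service type; the class and field names are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical vertical archive layout: one fixed-length row per observation.
# (person_id, month_index, service_type, paid_amount)
VerticalRecord = tuple[str, int, str, float]

@dataclass
class AnalyticRecord:
    """Horizontal record: one per unit of analysis (here, a person)."""
    person_id: str
    n_records: int = 0       # summary variable
    total_paid: float = 0.0  # summary variable
    # Time-oriented arrays: paid amount by month, one array per service type.
    monthly: dict[str, list[float]] = field(default_factory=dict)

def to_horizontal(records: list[VerticalRecord], n_months: int,
                  service_types: tuple[str, ...]) -> dict[str, AnalyticRecord]:
    """Aggregate vertical records to one horizontal analytic record per person."""
    out: dict[str, AnalyticRecord] = {}
    for pid, month, stype, paid in records:
        rec = out.get(pid)
        if rec is None:
            rec = AnalyticRecord(pid, monthly={s: [0.0] * n_months for s in service_types})
            out[pid] = rec
        rec.n_records += 1
        rec.total_paid += paid
        rec.monthly[stype][month] += paid
    return out

# Invented example data.
archive = [
    ("A", 0, "rx", 20.0),
    ("A", 0, "physician", 80.0),
    ("A", 2, "rx", 20.0),
    ("B", 1, "inpatient", 900.0),
]
people = to_horizontal(archive, n_months=3, service_types=("rx", "physician", "inpatient"))
print(people["A"].total_paid, people["A"].monthly["rx"])  # 120.0 [20.0, 0.0, 20.0]
```

Once the horizontal file exists, most person-level analyses read one record per person instead of re-scanning the full archive.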
System Design Goals
• Rational data structures minimize hardware and software requirements and reduce analysis time
• Reusable data structures and methods
• Select a data analysis strategy and stay with it: re-use data, re-use methods, never reinvent the wheel
• Design a data update and expansion strategy in advance to minimize disruptions and data damage
Sample Configuration: Typical US Source Data
• Large US state, 2.5 million Medicaid beneficiaries
• Three years, 800 million treatment records
• Monthly enrollment denominator
• Integrated and linked Pharmacy, Physician, Inpatient, Post-acute Care, Long-Term and Chronic Care
  • payments, diagnoses, therapies
• Linked to regional economic profiles
• 2.5 million person-level summary records
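The monthly enrollment denominator lets payments be expressed as rates. The sketch assumes person-level monthly enrollment flags and computes a per-member-per-month (PMPM) paid amount, one common use of such a denominator; the input layout and function names are hypothetical.

```python
from typing import Optional

def enrolled_per_month(enrollment: dict[str, list[bool]], n_months: int) -> list[int]:
    """Monthly enrollment denominator: number of persons enrolled in each month."""
    counts = [0] * n_months
    for flags in enrollment.values():
        for month, enrolled in enumerate(flags[:n_months]):
            if enrolled:
                counts[month] += 1
    return counts

def pmpm(paid: list[float], enrolled: list[int]) -> list[Optional[float]]:
    """Per-member-per-month paid amount; None where nobody was enrolled."""
    return [p / n if n else None for p, n in zip(paid, enrolled)]

# Invented example: three persons, three months.
enrollment = {
    "A": [True, True, True],
    "B": [True, False, False],
    "C": [False, True, True],
}
denominator = enrolled_per_month(enrollment, 3)
print(denominator)                             # [2, 2, 2]
print(pmpm([300.0, 100.0, 50.0], denominator))  # [150.0, 50.0, 25.0]
```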
Successful System Configuration
• 1 terabyte disk capacity
• 1 gigabyte RAM
• 1 off-the-shelf Dell PC, Windows XP
• SAS license for large database steps and statistical analysis
• Oracle license for storage of output tables
• Supports three researchers, not necessarily skilled programmers
• Access model is "deep but narrow"
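A back-of-the-envelope check of how the sample configuration fits this hardware. Assuming a decimal terabyte (10^12 bytes) and the 800 million treatment records from the sample configuration, the disk allows at most about 1,250 bytes per record before anything else is stored. The 200-byte record width in the usage line is a hypothetical value, not a figure from the source.

```python
DECIMAL_TB = 10**12       # assumption: decimal terabyte (a binary TiB would be 2**40 bytes)
N_RECORDS = 800_000_000   # treatment records, from the sample configuration

def bytes_per_record_ceiling(disk_bytes: int, n_records: int) -> int:
    """Upper bound on average bytes per record if the disk held only the archive."""
    return disk_bytes // n_records

def archive_share(record_width: int, n_records: int, disk_bytes: int) -> float:
    """Fraction of the disk used by a fixed-length vertical archive."""
    return record_width * n_records / disk_bytes

print(bytes_per_record_ceiling(DECIMAL_TB, N_RECORDS))  # 1250
print(archive_share(200, N_RECORDS, DECIMAL_TB))        # 0.16 (hypothetical 200-byte records)
```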
Results: Performance
• Interaction between vertical and horizontal data structures, four examples revisited:
  • Snapshot of drugs by psychoactive (PA) category: 2 minutes
  • 3 years of PA Rx payments: 3 minutes
  • PA prescribing in the schizophrenia population identified from physician records: 12 minutes
  • Time-relative Rx refills for populations with new schizophrenia: 15 minutes
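The last timing refers to time-relative analysis: each person's fills are placed on a clock that starts at that person's index date rather than on the calendar. A minimal sketch, assuming hypothetical (person_id, date) inputs for diagnoses and fills and a 365-day follow-up window. A real definition of "new" schizophrenia would also need a diagnosis-free lookback period, which this sketch does not check.

```python
from datetime import date

def index_dates(dx_events: list[tuple[str, date]]) -> dict[str, date]:
    """Earliest qualifying diagnosis date per person (the index date).
    Note: no lookback/washout check, so 'new' cases are not verified here."""
    first: dict[str, date] = {}
    for pid, d in dx_events:
        if pid not in first or d < first[pid]:
            first[pid] = d
    return first

def relative_fills(fills: list[tuple[str, date]], index: dict[str, date],
                   window_days: int = 365) -> dict[str, list[int]]:
    """Re-express each Rx fill as days since the person's index date.
    Keeps only persons with an index date and fills in [0, window_days]."""
    out: dict[str, list[int]] = {}
    for pid, d in fills:
        if pid in index:
            offset = (d - index[pid]).days
            if 0 <= offset <= window_days:
                out.setdefault(pid, []).append(offset)
    return {pid: sorted(days) for pid, days in out.items()}

def refill_gaps(days: list[int]) -> list[int]:
    """Days between consecutive fills on the relative time axis."""
    return [b - a for a, b in zip(days, days[1:])]

# Invented example data.
dx = [("A", date(2005, 3, 10)), ("A", date(2005, 1, 5)), ("B", date(2005, 6, 1))]
fills = [("A", date(2005, 1, 5)), ("A", date(2005, 2, 4)),
         ("B", date(2005, 5, 1)), ("B", date(2005, 7, 1)), ("C", date(2005, 1, 1))]
rel = relative_fills(fills, index_dates(dx))
print(rel)                    # {'A': [0, 30], 'B': [30]}
print(refill_gaps(rel["A"]))  # [30]
```

Aligning on the index date is what lets refill patterns from people diagnosed in different calendar months be pooled on one axis.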