After the First Steps: The Evolution of a Longitudinal Survey. National Population Health Survey (NPHS) Douglas Yeo Workshop on Longitudinal Research in Social Science—A Canadian Focus Population Studies Centre, University of Western Ontario London, Oct. 25–27, 1999. NPHS Program.
Objectives • To aid in the development of public policy • To understand the determinants of health • Economic, social, demographic, occupational and environmental correlates of health • To explore the relationship between health status and health care utilisation • To follow a panel of people to reflect the dynamic process of health • To provide means to supplement content or sample • To allow linkage with administrative data
Sample • Sample allocation at the national, provincial and territorial levels • Minimum requirement of 1,200 households for each province and territory • Household component: 20,000 households • Use of the LFS sampling design • Health care institutions: 2,500 residents • The North: 2,400 persons
Description • Longitudinal and cross-sectional • First cycle in 1994, repeated every 2 years • Personal and telephone interviews • Basic information collected from all household members • One household member selected as the health respondent (longitudinal respondent)
Description • General Questionnaire • All household members • Proxy reporting permitted (55% of cases) • Health Questionnaire • One randomly selected respondent in each household • Proxy reporting rarely permitted (4% of cases)
Content—Core (General) • Two-week Disability • Health Care Utilization • Restriction of Activities • Chronic Conditions • Sociodemographic Characteristics • Country of birth, immigration, language • Labour force • Income • Education
Content—Core (Health) • Self-Perceived Health • Blood Pressure • Women’s Health • Height and Weight • Health Status • Physical Activity • Repetitive Strain (1996 and 1998) • Injuries
Content—Core (Health) • Use of Medications • Smoking • Alcohol • Mental Health • Social Support • Sense of Coherence (1994 and 1998) • Alcohol Dependence (1996)
Content—Focus 1994 • Stress • Ongoing problems • Recent Life Events • Childhood and Adult Stressors (“traumas”) • Work Stress • Self-esteem • Mastery
Content—Focus 1996 • Access to Services • Blood pressure • Pap smear test • Mammography • Breast examinations • Breastfeeding • Physical check-up • Flu shots • Dental visits • Eye examination • Emergency services • Insurance coverage
HPS—1996 • Height and Weight • Breast Self-Examination • Breastfeeding • Pregnancy • HIV • Smoking • Alcohol • Sexual Health • Road Safety • Food Insecurity • Separate Release
Content—Focus 1998 • Focus • Self Care • Family Medical History • Diet/Nutrition • Tobacco Alternatives • Food Insecurity supplement (HRDC)
Content—Focus 2000 • Additional chronic conditions • In-depth diabetes questions • Fibromyalgia • Tanning and UV exposure • Stress questions are back • Ongoing Problems • Recent Life Events • Childhood and Adult Stressors (“traumas”) • Work stress • Self-esteem • Mastery • Illicit drug use
File Creation—1994 • Core sample (20,000) • Buy-in sample • N.B., Ont., Man., B.C. • Files produced: Cross-sectional • 1994 General File (all household members) • 1994 Health File (one randomly selected respondent)
File Creation—1996 • 1994 responding panel members • Cross-sectional Files • 1996 General File (all household members) • 1996 Health File (one randomly selected respondent) • Includes buy-in sample • Ontario, Manitoba, Alberta • Longitudinal File (1994–96)
Products—Files • Master Files 1994–95 & 1996–97 (released) • Share Files 1994–95 & 1996–97 (released) • (Health Canada & Provinces) • Public Use Microdata Files • 1994–95 Household, General & Health (rel.) • 1994–95 Institutions, Health (rel.) • 1996–97 Household, General & Health (rel.) • 1996–97 Institutions, Health & Longitudinal (late 1999) • 1996–97 Household, Longitudinal (doubtful)
Products—Access • Master Files • Selected Regional Offices • Deemed employee of Statistics Canada • Remote Access • Internet job submission • Using test master files • Free to clients • DLI, SSHRC
Products—Publications • NPHS Overview Report • 1994–95—self-rated health and income, chronic conditions and pain, depression, use of health care services and alternative medicine • 1996–97—chronic disease incidence, changes in activity limitation status, depression, repetitive strain injuries, smoking, use of health care services • 1998–99—March 2000 issue of Health Reports
Products—Publications • Health Reports—detailed articles: • Depression, chronic pain, immigrants’ health, sense of coherence, smoking, hormone replacement therapy, bicycle helmet use, sample design…
Products—NHRDP • National Health Research and Development Program • Jointly funded by Health Canada and Statistics Canada • Up to $300,000 annually for NPHS research • Cycle 1: 8 grants, papers available • Cycle 2: 7 grants, papers available • Cycle 3: 7 grants, research starting • Cycle 4: Health Canada preparing RFP
1994 Sample Design • Household target population • Based upon Labour Force Survey (LFS) and Enquête sociale et de santé (in Quebec only) • Household residents in all provinces • Exclusions: Indian reserves, Canadian forces bases, remote areas in Ontario and Quebec • Stratified multistage design
1994 Sample Design • 1st stage • Strata formed • Major urban centres, urban towns, rural areas • Further stratified by geography and/or socioeconomic characteristics • Clusters (heterogeneous) formed independently within strata • Clusters selected based upon PPS sampling • 2nd stage • Dwelling lists prepared for each selected cluster • Subsample of households selected within each cluster
Cluster Sampling • Highly cost-effective in terms of listing and data collection • Only selected clusters are listed • Less efficient than SRS • Neighbouring units similar (intracluster correlation) • PPS sampling • Vary the probability with which a unit is selected according to its size • Units do not have same probability of selection (unequal weights)
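The link between PPS selection and unequal weights can be sketched in a few lines. This is a minimal illustration, not the LFS procedure: the cluster sizes are hypothetical, and the inclusion probability is approximated as (number of clusters selected) × (cluster size) / (total size).

```python
def pps_inclusion_probabilities(sizes, n_selected):
    """Approximate inclusion probability of each cluster under PPS sampling.

    Assumption: probability = n_selected * size / total size, which only
    makes sense when no single cluster is large enough to exceed 1.
    """
    total = sum(sizes)
    probs = [n_selected * size / total for size in sizes]
    if any(p > 1.0 for p in probs):
        raise ValueError("a cluster is too large for this sample size")
    return probs


def design_weights(probs):
    """Design weight is the inverse of the inclusion probability."""
    return [1.0 / p for p in probs]


# Hypothetical cluster sizes (e.g., dwelling counts); not NPHS figures.
cluster_sizes = [100, 300, 600]
probs = pps_inclusion_probabilities(cluster_sizes, n_selected=1)
weights = design_weights(probs)

for size, p, w in zip(cluster_sizes, probs, weights):
    print(f"size={size:4d}  probability={p:.2f}  weight={w:.2f}")
```

Smaller clusters are less likely to be selected, so they carry larger weights when they are.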
1994 Sample Design: Rejective Method • One member/hhld selected at random to be longitudinal respondent • Panel would underrepresent persons in large hhlds (parents and children) and overrepresent persons in smaller hhlds (singles and elderly) • Portion of sample pre-identified for screening • If no member < 25 years old then screened out • Increased # hhlds visited by anticipated # screened out
1994 Sample Design: Integration With NLSCY • NLSCY follows ~ 25,000 children • NPHS longitudinal respondents < 12 years of age collected by NLSCY • NPHS children’s sample used in NLSCY estimates and for NPHS • Due to scheduling constraints NPHS kids sample not selected before Q3 and Q4
Sample Design: Subsequent Cycles • Longitudinal respondents recontacted, using contact information from previous cycles • Moved into an institution • Moved to territories • Moved to an Indian reserve => tried to get data • Moved temporarily away • Identified deaths • Hhlds in sample include hhlds where the longitudinal respondent currently lives • Hhld composition may have changed
Sample Design: Subsequent Cycles • Longitudinal respondents’ data used for panel and cross-sectional purposes • Hhld members’ data used for cross-sectional estimates only (General file) • NPHS kids sample now collected by NPHS, not NLSCY • Cross-sectional supplementary samples from previous cycles not followed up
Sample Design: Subsequent Cycles • Top-up of sample every second cycle • First time in 1998 • For cross-sectional purposes only • Account for changing population, panel attrition • To cover population not present in 1994: new births, immigrants
Data Collection • Statistics Canada LFS interviewers • Computer-Assisted Personal or Telephone Interviews (CAPI/CATI) • Built-in edits, mins, maxes • Direct skip patterns • On-screen prompts • Pre-filling of text or data • Average interview time 1 hour
Data Collection • Data collected at 4 points in time • For operational, seasonality reasons • June, August, November, February • Nonresponse: no contact, refusal • Letter sent, second call, senior interviewer follows up • Never replace sample dwellings with others • Resends: follow up nonresponse in subsequent quarters, and in special resend period the following June
Data Collection • Tracing to find longitudinal respondents • Panel member only • Feed back information from previous cycles • Data quality check • Probes for reasons for change • Restriction of activities, chronic conditions, smoking • Some sociodemographic information not re-asked if no change
Processing • Editing • On-line edits in CAPI • Some head office consistency edits • Invalid, inconsistent data set to "not stated" • Coding of write-in information (e.g., drugs) • Creation of derived variables
Response Rates • 1994 Household: 88.7% • Selected respondent: 96.1% • 1996 Longitudinal • General: 93.6% • Health: 92.8% • Only 1.7% not traced • 1996 Cross-sectional Household: 82.5% • Selected Respondent: 95.0%
Analysing Complex Data • Point estimation • Survey weights must be used in calculation of estimates to correctly draw conclusions about pop’n of interest • Weights take stratification, unequal sampling probabilities into account • Variance estimation • Using survey weight only not sufficient • Complex design (and design effect) must be accounted for to avoid serious underestimation of standard errors
Effect of Weighting • Comparison of males and females who reported being in excellent or very good health • Weighted difference 65.3 - 61.6 = 3.7% • Unweighted difference 62.6 - 60.8 = 1.8%
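A toy calculation shows why weighted and unweighted proportions can differ. The four records and their weights below are invented for illustration and have no connection to the NPHS figures on the slide.

```python
def proportion(flags, weights=None):
    """Share of records with flag == 1, optionally using survey weights."""
    if weights is None:
        weights = [1.0] * len(flags)  # unweighted: every record counts once
    flagged = sum(w for f, w in zip(flags, weights) if f)
    return flagged / sum(weights)


# Hypothetical data: 1 = reports excellent or very good health.
flags = [1, 1, 0, 0]
weights = [30.0, 10.0, 10.0, 10.0]  # first respondent represents more people

unweighted = proportion(flags)
weighted = proportion(flags, weights)
print(f"unweighted: {unweighted:.3f}  weighted: {weighted:.3f}")
```

Here the first respondent represents three times as many people as the others, so the weighted share of "excellent or very good" is higher than the raw sample share.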
1998 Weighting Methodology • All panel respondents have a longitudinal weight • Includes moved to institution, dead, etc. • Start with basic weights from 1994 • Derived from LFS or L’enquête sociale et de santé weights • Probability of selecting a dwelling in a selected cluster
1998 Weighting Methodology • Nonresponse adjustment—by weighting classes • To account for potential nonresponse bias • Study whether nonrespondents differ from respondents • Create special weighting classes based on response propensity, using CHAID, to account for these differences properly • Calibrate to 1994 population totals (by province/age/sex)
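The two adjustments on this slide can be sketched as follows. This is a simplified illustration under stated assumptions: the weighting classes are taken as given (CHAID itself is not implemented), calibration is shown as simple post-stratification to group totals, and all record values, class labels and control totals are hypothetical.

```python
from collections import defaultdict


def adjust_for_nonresponse(records):
    """Inflate respondents' weights so each class keeps its full weight total."""
    class_total = defaultdict(float)
    class_resp = defaultdict(float)
    for r in records:
        class_total[r["cls"]] += r["weight"]
        if r["responded"]:
            class_resp[r["cls"]] += r["weight"]
    adjusted = []
    for r in records:
        if r["responded"]:  # nonrespondents drop out; their weight is shared
            factor = class_total[r["cls"]] / class_resp[r["cls"]]
            adjusted.append({**r, "weight": r["weight"] * factor})
    return adjusted


def calibrate(records, control_totals):
    """Scale weights so each group's total matches its control total."""
    group_sum = defaultdict(float)
    for r in records:
        group_sum[r["group"]] += r["weight"]
    return [
        {**r, "weight": r["weight"] * control_totals[r["group"]] / group_sum[r["group"]]}
        for r in records
    ]


# Hypothetical records: "cls" is the weighting class, "group" the calibration cell.
records = [
    {"cls": "A", "group": "F", "weight": 10.0, "responded": True},
    {"cls": "A", "group": "M", "weight": 10.0, "responded": False},
    {"cls": "B", "group": "F", "weight": 5.0, "responded": True},
    {"cls": "B", "group": "M", "weight": 5.0, "responded": True},
    {"cls": "B", "group": "M", "weight": 10.0, "responded": False},
]
adjusted = adjust_for_nonresponse(records)
calibrated = calibrate(adjusted, control_totals={"F": 33.0, "M": 12.0})
```

After the first step the respondents carry the full weight of their class; after the second, the weights in each group sum to the stated population total.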
1998 Weighting Methodology • Three longitudinal weights • WT68LF: “Full”, for respondents who fully completed all components on all occasions • WT68LP: “Partial”, for respondents who fully completed 1994 and 1998 • WT64LS: “Square”, for the entire panel of 17,276, including nonrespondents
Design Effects • Measure of complexity of sample design • Calculate design variance using bootstrap weights • Calculate SRS variance • Deff = design variance / SRS variance • Generally, deffs > 1 for clustered designs, deffs < 1 for stratified designs • Varies (sometimes greatly) by characteristic
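The ratio on this slide is simple to compute once both variances are available. The sketch below uses a proportion, for which the SRS variance is p(1 − p)/n. The design variance is a made-up value standing in for a bootstrap result, and the "effective sample size" n/deff is a common companion quantity that does not appear on the slide.

```python
def srs_variance_of_proportion(p, n):
    """Variance of a proportion under simple random sampling (no fpc)."""
    return p * (1.0 - p) / n


def design_effect(design_variance, srs_variance):
    """Deff = design variance / SRS variance."""
    return design_variance / srs_variance


# Hypothetical inputs, not NPHS results.
p, n = 0.6, 1000
srs_var = srs_variance_of_proportion(p, n)
design_var = 2.0 * srs_var  # pretend the bootstrap variance came out twice as large

deff = design_effect(design_var, srs_var)
effective_n = n / deff  # sample size an SRS would need for the same precision
print(f"deff={deff:.2f}  effective n={effective_n:.0f}")
```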
Variance Estimation • Measuring sampling error for complex sample designs • Simple formulas not available • Most software packages do not incorporate design effect appropriately for variance calculations • Need to provide some measures of data quality to users
NPHS Variance Estimation • Bootstrap resampling method (similar to jackknife) used for all variance estimation • Aggregates, proportions, differences, coefficients from linear and logistic regressions • Variance estimation program written in SAS/SPSS macros • Approximate coefficient of variation (CV) look-up tables also provided with PUMF • For categorical variables, totals, proportions
Bootstrap Weight Method • Variance estimation divided into two phases: • Calculation of bootstrap weights • Calculated only once, by Statistics Canada • Variance estimation using bootstrap weights • Internally and externally • Bootstrap weights available for regional office master files, for share files, in remote access program (dummy files) • No need for design information • Bootstrap weights incorporate design effect implicitly
Bootstrap Weights: Calculation • Resampling method, which divides records into subgroups (replicates) and determines the variation in the estimates from replicate to replicate • Within each stratum, resample within original sample by taking a SRSWR of n-1 of the n clusters in that stratum
Bootstrap Weights: Calculation • Recalculate the weight for each record in that stratum—this is the bootstrap weight • We now have a new bootstrap weight for every record on the file. This set of weights is the first bootstrap replicate. A new point estimate ($\hat{\theta}^*_1$) can be calculated using the weights of this replicate • Repeat B (e.g., B=500) times
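The resampling step can be sketched as below. The slides do not give the weight recalculation formula, so this assumes the standard Rao-Wu rescaling, in which a record's weight is multiplied by n/(n − 1) times the number of times its cluster was drawn. The function name and toy records are hypothetical, and the later adjustments (nonresponse, calibration) that would normally be repeated for each replicate are omitted.

```python
import random
from collections import Counter, defaultdict


def bootstrap_weights(records, n_reps, seed=0):
    """Return n_reps lists of bootstrap weights, one weight per record.

    records: list of (stratum, cluster, weight) tuples.
    Assumes every stratum contains at least two clusters.
    """
    rng = random.Random(seed)
    clusters = defaultdict(set)
    for stratum, cluster, _ in records:
        clusters[stratum].add(cluster)

    replicates = []
    for _ in range(n_reps):
        multiplier = {}
        for stratum, ids in clusters.items():
            ids = sorted(ids)
            n = len(ids)
            # SRSWR of n-1 of the n clusters in this stratum
            draws = Counter(rng.choice(ids) for _ in range(n - 1))
            for cluster in ids:
                # Assumed Rao-Wu rescaling: (times drawn) * n / (n - 1)
                multiplier[(stratum, cluster)] = draws[cluster] * n / (n - 1)
        replicates.append([w * multiplier[(s, c)] for s, c, w in records])
    return replicates


# Hypothetical file: one record per cluster, equal weights within each stratum.
records = [("A", 1, 10.0), ("A", 2, 10.0), ("A", 3, 10.0),
           ("B", 1, 5.0), ("B", 2, 5.0)]
replicates = bootstrap_weights(records, n_reps=500)
```

Clusters that are not drawn in a replicate get a bootstrap weight of zero, and the rescaling compensates by inflating the weights of the clusters that were drawn.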
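Bootstrap Weights: Variance Estimation • To estimate the variance of any estimate ($\hat{\theta}$), first calculate the estimate B times, using the weights from the B bootstrap replicates, giving $\hat{\theta}^*_1, \dots, \hat{\theta}^*_B$ • Then calculate the variance among these B estimates: $v_B(\hat{\theta}) = \frac{1}{B} \sum_{b=1}^{B} (\hat{\theta}^*_b - \bar{\theta}^*)^2$, where $\bar{\theta}^* = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^*_b$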
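The second phase, which analysts run themselves, needs only the replicate weights. The sketch below assumes the variance is the mean squared deviation of the B replicate estimates around their own mean (divisor B). Some implementations divide by B − 1 or centre on the full-sample estimate instead, and for B = 500 the difference is negligible. The estimator and the toy data are hypothetical.

```python
def weighted_proportion(flags, weights):
    """Weighted share of records with flag == 1."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


def bootstrap_variance(estimator, data, replicate_weights):
    """Variance among the estimates computed with each set of bootstrap weights."""
    estimates = [estimator(data, w) for w in replicate_weights]
    mean = sum(estimates) / len(estimates)
    # Assumption: divisor B, centred on the mean of the replicate estimates.
    return sum((e - mean) ** 2 for e in estimates) / len(estimates)


# Hypothetical two-record file with B = 2 replicates, kept tiny for checking by hand.
flags = [1, 0]
replicate_weights = [[1.0, 1.0], [3.0, 1.0]]  # estimates: 0.50 and 0.75

variance = bootstrap_variance(weighted_proportion, flags, replicate_weights)
std_error = variance ** 0.5
print(f"variance={variance:.6f}  standard error={std_error:.3f}")
```

Because the estimator is passed in as a function, the same routine works for totals, differences or regression coefficients, as long as each can be recomputed from a set of weights.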
Bootstrap Weight Method: Advantages • Sets of 500 bootstrap weights can be distributed to analysts • Handles large datasets • Interprovincial migration accounted for correctly in variance estimates • Recommended (over the jackknife) for estimating the variance of nonsmooth functions like quantiles, LICO, Gini index
Variance Estimation Example • Comparison of % of males vs. females who are in excellent or very good health • Weighted difference 65.3 - 61.6 = 3.7% • SAS—scaled weights • Standard error: 0.36 • 95% confidence interval (3.0, 4.4) • Bootstrap • Standard error: 0.70 • 95% confidence interval (2.3, 5.1)
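Both intervals on this slide are consistent with the usual normal approximation, estimate ± 1.96 × standard error. The short check below reproduces them from the slide's figures. The 1.96 multiplier is an assumption, since the slide does not say how the intervals were built.

```python
def normal_ci(estimate, std_error, z=1.96):
    """Two-sided confidence interval under a normal approximation."""
    half_width = z * std_error
    return (estimate - half_width, estimate + half_width)


difference = 3.7  # weighted male-female difference, in percentage points

ci_scaled = normal_ci(difference, 0.36)     # SAS with scaled weights
ci_bootstrap = normal_ci(difference, 0.70)  # bootstrap weights
se_ratio = 0.70 / 0.36

print("scaled weights:", tuple(round(x, 1) for x in ci_scaled))
print("bootstrap:     ", tuple(round(x, 1) for x in ci_bootstrap))
print(f"bootstrap SE is {se_ratio:.2f} times the scaled-weight SE")
```

The bootstrap standard error is nearly twice as large, so ignoring the design here would make the interval look about half as wide as it should be.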
Limitations and Feedback • Some topics could be explored more thoroughly • Data raise more questions than they answer • Sample sizes can become small in a hurry • Often useful to combine with other survey data to explain phenomena • Nice to be able to calculate bootstrap variance which takes design into account
Analytical Findings • Proxy / nonproxy reporting • Handling item nonresponse • Handling data inconsistencies • Study gross flows / changes