Canadian Community Health Survey Cycle 1.1 Overview of methodological issues and more...

    1. Canadian Community Health Survey Cycle 1.1 Overview of methodological issues and more...

    2. Presentation Outline
       - Sample design
         - Target population, sample allocation and frames
         - Sampling strategies, oversampling of sub-populations
       - Data collection, response rates
       - Imputation
       - Weighting
       - Sampling error
         - Sampling variability guidelines
         - Variance estimation: bootstrap re-sampling technique
         - CV look-up tables
       - Analysis examples
         - How to use the Bootvar programs

    3. CCHS - Cycle 1.1: Health Region-level survey
       - Main objective: produce timely cross-sectional estimates for 136 health regions
       - Target population: individuals aged 12 years or older living in private occupied dwellings
       - Exclusions: those living on Indian Reserves and Crown Lands, residents of institutions, full-time members of the Canadian Armed Forces and residents of some remote areas
       - CCHS 1.1 covers ~98% of the Canadian population

    4. CCHS - Sample Allocation to Provinces

       Prov    Pop. size   # of HRs   1st step    2nd step    Total
                                      (500/HR)    (X-prop)    sample
       NFLD         551K          6     *2,780       1,230      4,010
       PEI          135K          2      1,000       1,000      2,000
       NS           909K          6      3,000       2,040      5,040
       NB           738K          7      3,500       1,650      5,150
       QUE        7,139K         16      8,000      16,280     24,280
       ONT       10,714K         37     18,500      23,760     42,260
       MAN        1,114K         11      5,500       2,500      8,000
       SASK         990K         11     *5,400       2,320      7,720
       ALB        2,697K         17     *8,150       6,050     14,200
       BC         3,725K         20     10,000       8,090     18,090
       CAN       29,000K        133     65,830      64,920    130,750

       * The sampling fraction in some small HRs was capped at 1 in 20 households

    5. CCHS - Sample Allocation to Health Regions

       Pop. size   Range                 # of HRs   Mean sample size
       Small       less than 75,000            41                525
       Medium      75,000 - 240,000            60                900
       Large       240,000 - 640,000           25              1,500
       X-Large     640,000 and more             7              2,500

    6. CCHS - Sample Allocation to Territories

       Territory   Population   Sample
       Yukon           25,000      850
       NWT             36,000      900
       Nunavut         22,000      800

    7. CCHS - Sample Frame
       CCHS sample selected from three frames:
       - Area frame (Labour Force Survey structure)
       - RDD frame of telephone numbers (Random Digit Dialling)
       - List frame of telephone numbers
       Three frames are needed for CCHS for the following reasons:
       1. To yield the desired sample sizes in all health regions
       2. To have a telephone data collection structure in place to quickly address provincial/regional requests for buy-in sample and/or content at any point in time
       3. To optimize collection costs

    8. Area frame - Sampling of households
       - 83% of CCHS sampled households
       - Multistage stratified cluster sample design

    9. RDD frame of telephone numbers - Sampling of households
       - Elimination of non-working banks method
       - 7% of CCHS sampled households
       - Telephone bank: area code + first 5 digits of a 7-digit phone number
       1- Keep the banks with at least one valid phone number
       2- Group the banks to encompass as closely as possible the health region areas (RDD strata)
       3- Within each RDD stratum, first select one bank at random and then generate at random one number between 00 and 99
       4- Repeat the process until the required number of telephone numbers within the RDD stratum is reached
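
Steps 3 and 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the example banks and the rejection of duplicate numbers are hypothetical and are not taken from the presentation.

```python
import random


def sample_rdd_numbers(working_banks, n_required, seed=None):
    """Draw telephone numbers from one RDD stratum.

    working_banks: 8-digit strings (area code + first 5 digits of the
    7-digit number) already known to hold at least one valid number.
    """
    banks = sorted(set(working_banks))
    # Assumption: duplicates are rejected, so a bank yields at most 100 numbers.
    if n_required > 100 * len(banks):
        raise ValueError("not enough banks for the requested sample size")
    rng = random.Random(seed)
    selected = set()
    while len(selected) < n_required:   # step 4: repeat until enough numbers
        bank = rng.choice(banks)        # step 3: select one bank at random
        suffix = rng.randint(0, 99)     # step 3: random number between 00 and 99
        selected.add(f"{bank}{suffix:02d}")
    return sorted(selected)


# Hypothetical banks for a single RDD stratum.
banks = ["61355501", "61355502", "61355507"]
numbers = sample_rdd_numbers(banks, 10, seed=1)
```

Each generated string is a full 10-digit number: the 8-digit bank followed by the two random digits.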

    10. List frame of telephone numbers - Sampling of households
        - Simple random sample of telephone numbers
        - 10% of CCHS sampled households
        - Sources: telephone companies’ billing address files and Telephone Infobase (repository of phone directories)
        1- Create a list of phone numbers
        2- Stratify the phone numbers by health region using the residential postal codes
        3- Select phone numbers at random within a health region
        4- Repeat the process until the required number of telephone numbers is reached

    11. CCHS - Sampling of persons
        - Area frame
          - Simple random sample (SRS) of one person aged 12 or older (82% of households)
          - SRS of two persons aged 12 or older (18% of households)
        - RDD / List frames
          - SRS of one person aged 12 or older

    12. CCHS - Sampling of persons (% distribution by age group)

        Age group   1996 Census   LFS sample      * CCHS simulated sample
                                  (all persons)   (only 1 person)
        12-19              13.2            13.7                       8.5
        20-29              16.4            14.4                      14.3
        30-44              30.8            28.7                      29.1
        45-64              25.8            28.0                      27.9
        65 +               13.8            15.2                      20.2

        * averaged distribution over 100 repetitions using the May 99 LFS sample

    13. CCHS - Representativity of sub-populations
        To address users’ needs, two sub-population groups needed larger effective sample sizes:
        - Youths (12-19 years old). Decision: oversample youths by selecting a second person (12-19) in some households, based on household composition
        - Seniors (65 years old and over). Decision: do not oversample; let the general sample selection process address the issue by itself

    14. Sampling strategy based on household composition

        Number of          Number of persons aged 20 or over
        persons 12-19      0    1    2    3    4    5+
        0                  -    A    A    A    A    B
        1                  A    A    C    C    C    B
        2                  A    C    C    C    C    C
        3+                 A    C    C    C    C    C

        A: Simple random sample (SRS) of one person aged 12+
        B: SRS of two persons aged 12+
        C: SRS of one person in the age group 12-19 and SRS of one person 20+
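
The grid above is a simple lookup on two counts. A minimal Python sketch follows; the function and variable names are hypothetical, and the cell values are copied from the slide as transcribed.

```python
# Rows: number of persons aged 12-19 (0, 1, 2, 3+).
# Columns: number of persons aged 20 or over (0, 1, 2, 3, 4, 5+).
STRATEGY = [
    [None, "A", "A", "A", "A", "B"],
    ["A", "A", "C", "C", "C", "B"],
    ["A", "C", "C", "C", "C", "C"],
    ["A", "C", "C", "C", "C", "C"],
]


def selection_strategy(n_youth, n_adult):
    """Return 'A', 'B' or 'C' for a household, or None if nobody is aged 12+."""
    if n_youth < 0 or n_adult < 0:
        raise ValueError("counts must be non-negative")
    # Counts above the last row/column fall into the open-ended 3+ / 5+ cells.
    return STRATEGY[min(n_youth, 3)][min(n_adult, 5)]


print(selection_strategy(2, 1))  # one youth and one adult are selected: C
```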

    15. CCHS - Sample Distribution after Oversampling (% distribution by age group)

        Age group   1996 Census   * CCHS simulated sample   * CCHS simulated sample
                                  (only 1 person)           (some 2 persons)
        12-19              13.2                       8.5                      14.9
        20-29              16.4                      14.3                      13.1
        30-44              30.8                      29.1                      28.1
        45-64              25.8                      27.9                      26.3
        65 +               13.8                      20.2                      17.6

        * averaged distribution over 100 repetitions using the May 99 LFS sample

    16. CCHS - Initial data collection plan
        - 12 monthly samples (09/2000 - 08/2001)
        - 12 collection months + 1 (+ 09/2001)
        - Area frame: CAPI; STC field interviewers; targeted response rate: 90%; anticipated vacancy rate: 13%
        - RDD / List frames: CATI; STC call centres; targeted response rate: 85%; telephone hit rate: 15-60%

    17. CCHS data collection - Observed situation
        - Field interviewers: workload exceeded field staff capacity
        - Call centres: new collection infrastructure; unequal allocation of work among call centres
        - Descriptive paper: “Preventing nonresponse in the Canadian Community Health Survey”, Y. Béland, J. Dufour, and M. Hamel. 2001, Hull, Statistics Canada XVIIIth International Symposium.

    18. CCHS - Final response rates (%)

                Field   Call centres   Total
        NFLD     86.6           89.3    86.8
        PEI      87.7           82.6    84.7
        NS       88.8           89.3    88.8
        NB       88.4           92.4    88.5
        QUE      85.7           84.8    85.6
        ONT      82.8           79.5    82.0
        MAN      90.0           85.0    89.5
        SASK     87.0           85.4    86.8
        ALB      85.2           84.9    85.1
        BC       83.9           86.7    84.7
        YUK      79.3           95.6    82.7
        NWT      89.6           85.4    89.2
        NUN      66.3           34.6    62.5
        CAN      85.1           83.1    84.7

    20. Modules for proxy and non-proxy
        Alcohol; Chronic condition; Exposure to second hand smoke; Food insecurity; General health (Q1, Q2 and Q7); Health care utilization; Health Utility Index (HUI); Height / Weight (Q2 and Q3); Injuries; Restriction of activities; Smoking; Tobacco alternatives; Two-week disability; Household composition & housing; Income; Labour force; Socio-demographic characteristics; Administration; Drug use (optional); Home care (optional)

    21. Modules for non-proxy only
        Alcohol dependence / abuse; Blood pressure check; Breastfeeding; Contacts with mental health professionals; Mammography; Fruit & vegetable consumption; General health (Q3-Q6, Q8-Q10); Height / Weight (Q4 only); PAP smear test; PSA test; Physical activities; Patient Satisfaction**; Breast examinations; Breast self examinations; Changes made to improve health; Depression

    24. CCHS - Weighting and Estimation
        - Estimation relates the sample back to the population
        - Weights MUST be used in the calculation of estimates to correctly draw conclusions about the population of interest
        - The sampling weight is related to the probability of selecting a person in the sample
        - Persons are selected with unequal probabilities and therefore have varying weights

    25. CCHS - Weighting and Estimation
        - Three separate weighting systems: area frame design, RDD frame design, list frame design
        - Several adjustments: non-response (household and person), seasonal factor, etc.
        - Integration of the two weighting systems based on design effects and sample sizes (n / deff)
        - Calibration using a one-dimensional post-stratification adjustment of ten age/sex post-strata within each health region
        - Variance estimation: bootstrap re-sampling approach, with a set of 500 bootstrap weights for each individual

    26. Weighting & Estimation

    27. Weighting & Estimation Initial weight: Inverse of the probability of being selected

    28. Weighting & Estimation Household nonresponse: Distribute the weight of nonresponding households to responding ones, using “nonresponse classes” (such as HR, collection period and urban / rural)
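
A minimal Python sketch of this kind of redistribution, assuming the usual weighting-class adjustment in which each respondent's weight is multiplied by (total weight in the class) / (respondent weight in the class). The slide does not give the exact formula used for CCHS, and the record layout and class labels below are hypothetical.

```python
from collections import defaultdict


def adjust_for_nonresponse(households):
    """Redistribute nonrespondent weight to respondents within each class."""
    total = defaultdict(float)       # weight of all sampled households, by class
    responding = defaultdict(float)  # weight of responding households, by class
    for h in households:
        total[h["cls"]] += h["weight"]
        if h["responded"]:
            responding[h["cls"]] += h["weight"]
    adjusted = []
    for h in households:
        if h["responded"]:
            factor = total[h["cls"]] / responding[h["cls"]]
            adjusted.append({**h, "weight": h["weight"] * factor})
    return adjusted


households = [
    {"id": 1, "cls": "HR1-urban", "weight": 100.0, "responded": True},
    {"id": 2, "cls": "HR1-urban", "weight": 100.0, "responded": False},
    {"id": 3, "cls": "HR1-rural", "weight": 50.0, "responded": True},
]
adjusted = adjust_for_nonresponse(households)
```

In this toy input, household 1 absorbs the weight of household 2, while household 3 is unchanged because its class has no nonrespondents.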

    29. Weighting & Estimation No phone lines: No coverage of hhlds without a phone line. Weights are “boosted” by a certain rate (specific to each HR) Rates of “no phone lines” calculated using area frame data

    30. Weighting & Estimation # of people in hhld: Convert the hhld-level weight into a person-level weight (multiply by the number of people) Depends on the # of people selected (1 or 2), and their age

    31. Weighting & Estimation Person level nonresponse: Redistribute the weight of selected persons who did not respond to those who responded, using classes (age, sex, number of persons selected, collection period, etc.)

    32. Weighting & Estimation Multiple phone lines: More phone lines = higher probability of being selected, so the weight is divided by the number of residential phone lines

    33. Weighting & Estimation Final weight: Each frame’s final weights are representative of the total population. To create a single set of weights, they are combined through “Integration”

    34. Weighting & Estimation Integration: Combine the 2 sets of weights into one single set of weights Based on sample size and design effect of each frame
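
The slides say only that integration is based on n / deff for each frame. One common way to do this, shown here purely as an assumption, is to scale one set of weights by alpha and the other by 1 - alpha, with alpha proportional to the effective sample size n / deff. The function names and numbers below are hypothetical.

```python
def integration_factor(n_area, deff_area, n_tel, deff_tel):
    """Share given to the area frame, proportional to effective sample size."""
    eff_area = n_area / deff_area
    eff_tel = n_tel / deff_tel
    return eff_area / (eff_area + eff_tel)


def integrate_weights(area_weights, tel_weights, alpha):
    """Scale each frame's weights so the combined set represents the population once."""
    return [w * alpha for w in area_weights] + [w * (1 - alpha) for w in tel_weights]


# Hypothetical frames: each set of weights sums to a population of 1,000.
alpha = integration_factor(n_area=800, deff_area=2.0, n_tel=200, deff_tel=1.0)
combined = integrate_weights([250.0] * 4, [500.0] * 2, alpha)
```

Because each input set sums to the population total, the combined weights do too, whatever the value of alpha.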

    35. Weighting & Estimation Seasonal effect: Adjust weights so that each season contains 25% of the total population Based on the collection period (sept-nov / dec-feb / mar-may / june - aug)

    36. Weighting & Estimation Post-stratification: Ensure the sum of weights matches the estimated population projections in each HR, for 10 age-sex groups 12-19, 20-29, 30-44, 45-64 and 65+ crossed with two sexes
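
A minimal Python sketch of a post-stratification adjustment within one health region: each weight is multiplied by (population projection) / (sum of weights) for its age-sex group. The record layout and the population counts are hypothetical.

```python
from collections import defaultdict


def poststratify(records, population_counts):
    """Scale weights so each post-stratum sums to its population count."""
    weight_sums = defaultdict(float)
    for r in records:
        weight_sums[r["poststratum"]] += r["weight"]
    adjusted = []
    for r in records:
        key = r["poststratum"]
        factor = population_counts[key] / weight_sums[key]
        adjusted.append({**r, "weight": r["weight"] * factor})
    return adjusted


# Hypothetical respondents in one health region, keyed by (age group, sex).
records = [
    {"id": 1, "poststratum": ("12-19", "M"), "weight": 40.0},
    {"id": 2, "poststratum": ("12-19", "M"), "weight": 60.0},
    {"id": 3, "poststratum": ("65+", "F"), "weight": 90.0},
]
population_counts = {("12-19", "M"): 120.0, ("65+", "F"): 99.0}
calibrated = poststratify(records, population_counts)
```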

    37. Weighting & Estimation Final CCHS weight: Final weight present on the CCHS master file

    38. CCHS - Special Weights
        For various reasons, many other weights are produced:
        - Quarter 4 special weight
        - PEI special weight
        - Share weights (master, Q4 and PEI special)
        - Link weights (master, Q4 and PEI special)

    39. Sampling Error
        - Difference between estimates obtained from a sample and those obtained from a census
        - The extent of this error depends on four factors: sample size, variability of the characteristic of interest, sample design, estimation method
        - Generally, the sampling error decreases as the size of the sample increases

    40. Sampling Error
        Measures of precision associated with an estimate:
        - Variance
        - Standard deviation (square root of the variance)
        - 95% confidence interval (estimate ± 1.96 x standard deviation)
        - Coefficient of variation (CV): standard deviation of estimate x 100% / estimate itself
        - The CV allows comparison of the precision of estimates with different scales
        Example: 24% of the population are daily smokers, std dev. = 0.003
        - CV = 0.003 / 0.24 x 100% = 1.25%
        - 95% CI: 0.240 ± 1.96 x 0.003 = {0.234 ; 0.246}
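
The smoking example can be reproduced with a short Python sketch; the function names are hypothetical.

```python
def coefficient_of_variation(estimate, std_dev):
    """CV in percent: standard deviation relative to the estimate."""
    return std_dev / estimate * 100.0


def confidence_interval_95(estimate, std_dev):
    """Normal-approximation 95% confidence interval."""
    half_width = 1.96 * std_dev
    return estimate - half_width, estimate + half_width


cv = coefficient_of_variation(0.24, 0.003)
low, high = confidence_interval_95(0.24, 0.003)
print(f"CV = {cv:.2f}%, 95% CI = ({low:.3f} ; {high:.3f})")
# prints: CV = 1.25%, 95% CI = (0.234 ; 0.246)
```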

    41. Sampling Variability Guidelines

        Type of estimate   CV (%)        Guidelines
        Acceptable         0.0 - 16.5    General unrestricted release
        Marginal           16.6 - 33.3   General unrestricted release, but with a warning cautioning users of the high sampling variability. Should be identified by the letter E.
        Unacceptable       > 33.3        No release. Should be flagged with the letter F.
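
A minimal Python sketch of these release rules. It assumes the CV is expressed in percent and rounded to one decimal before it is compared with the cut-offs; the function name and the returned labels are hypothetical.

```python
def release_category(cv_percent):
    """Classify an estimate by its CV, following the guideline cut-offs."""
    if cv_percent < 0:
        raise ValueError("CV cannot be negative")
    cv = round(cv_percent, 1)  # assumption: cut-offs apply to one-decimal CVs
    if cv <= 16.5:
        return "Acceptable"
    if cv <= 33.3:
        return "Marginal (E)"
    return "Unacceptable (F)"


for value in (1.25, 21.7, 48.6):
    print(value, release_category(value))
```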

    42. Sampling Error
        Measuring sampling error for complex sample designs:
        - Simple formulas are not available
        - Most software packages do not incorporate the design effect (and weight adjustments) appropriately in their calculations
        - Solution for CCHS: the bootstrap re-sampling method

    43. Bootstrap method
        - Principle: you want to know how precise your estimate of the number of smokers in Canada is
        - You could draw 500 totally new CCHS samples and compare the 500 estimates you would get from these samples. The variance of these 500 estimates would indicate the precision.
        - Problem: drawing 500 new CCHS samples is $$$
        - Solution: assuming your sample is representative of the population, draw 500 subsamples from it and compute new sampling weights for each subsample

    44. Bootstrap method How CCHS Bootstrap weights are created (the secret is now revealed!!!)

    45. Bootstrap Method
        How are bootstrap replicates built? The “real” recipe:
        1- Subsample clusters (SRS) within a design stratum
        2- Apply the (initial design) weight
        3- Adjust (boost) the weight for the selection of n-1 among n
        4- Apply all standard weight adjustments (nonresponse, integration, share, etc.)
        5- Post-stratify to population counts
        The bootstrap method is intended to mimic the approach used for the sampling and weighting processes

    46. Bootstrap Method
        Sampling weight versus bootstrap weights:
        - Sampling weight: used to compute the estimate of a parameter (e.g. the number of smokers)
        - Bootstrap weights: used to compute the precision of the estimate (e.g. the CV of the estimated number of smokers)
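
The division of labour between the two kinds of weights can be sketched in Python. The variance formula used here (mean squared deviation of the replicate estimates around their own mean) is one common convention and is an assumption; the slides do not state the exact formula implemented in the Bootvar programs. All names and data are hypothetical, and only 2 replicates are used instead of 500 to keep the example readable.

```python
import math


def weighted_total(values, weights):
    """Weighted total of a 0/1 (or numeric) variable."""
    return sum(v * w for v, w in zip(values, weights))


def bootstrap_cv(values, sampling_weights, bootstrap_weight_sets):
    """Return (estimate, standard deviation, CV in percent)."""
    # The sampling weight gives the point estimate.
    estimate = weighted_total(values, sampling_weights)
    # Each set of bootstrap weights gives one replicate estimate.
    replicates = [weighted_total(values, bw) for bw in bootstrap_weight_sets]
    mean_rep = sum(replicates) / len(replicates)
    variance = sum((r - mean_rep) ** 2 for r in replicates) / len(replicates)
    std_dev = math.sqrt(variance)
    return estimate, std_dev, std_dev / estimate * 100.0


smoker = [1, 0, 1]                      # 1 = smoker, 0 = non-smoker
sampling_weights = [10.0, 20.0, 30.0]
bootstrap_weight_sets = [               # one list of weights per replicate
    [12.0, 18.0, 30.0],
    [8.0, 22.0, 30.0],
]
estimate, std_dev, cv = bootstrap_cv(smoker, sampling_weights, bootstrap_weight_sets)
```

Here the estimate is 40 smokers, the replicate totals are 42 and 38, and the resulting CV is 5%.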

    47. Bootstrap Method
        The process of variance estimation is divided into two phases.
        Calculation of bootstrap weights:
        - Needs to be done only once
        - Done by Statistics Canada methodologists

    48. Bootstrap Method
        Variance estimation using bootstrap weights:
        - Done by anyone, internally or externally
        - Bootstrap weights files are distributed with all CCHS files except the Public-Use Microdata File (PUMF)
        - Bootstrap weights are in a separate file (match using IDs)
        - Not provided for the PUMF because bootstrap weights reveal confidential information
        - PUMF users must proceed through remote access to get ‘exact’ variances, or use the CV look-up tables

    49. Bootstrap Method
        Variance estimation using bootstrap weights:
        - SAS and SPSS (beta) macro programs provided to users (BOOTVAR)
        - Allow users to perform a few statistical analyses (totals, proportions, differences of proportions and regression analysis)
        - Fully documented with examples
        - Bootstrap hands-on workshop

    50. How to use the Bootvar program STEP #1 Create your ‘‘analytical file”

    51. How to use the Bootvar program
        Statistical analysis: using the NPHS cycle 3 (1998) cross-sectional dummy data, estimate the number of Ontarians aged 12 or older, by gender, who perceive themselves as being:
        - in poor or fair health,
        - in good health,
        - in very good health,
        - in excellent health.
        Compute a 95% confidence interval for each point estimate using the Bootvar program.

    52. Necessary variables for the analysis
        - Self-perceived health (GHC8DHDI): 0 = poor, 1 = fair, 2 = good, 3 = very good, 4 = excellent, 9 = not stated
        - Age (DHC8_AGE): >= 12
        - Sex (DHC8_SEX): 1 = male, 2 = female
        - Province (PRC8_CUR): 35 = Ontario
        - Sampling weight (WT68)
        - Record identifier for the household (REALUKEY)
        - Number identifying the person in the household (PERSONID)

    53. Basic theoretical notions for estimating a proportion
        Example of a data file:

        ID   Weight   Sex   Asthma   Asthma_id
        A        50   M     YES              1
        B        60   M     NO               0
        C        50   M     NO               0
        D        70   M     YES              1
        E        50   M     NO               0

        Proportion with asthma
          = (WeightA + WeightD) / (WeightA + WeightB + WeightC + WeightD + WeightE) * 100
          = (50 + 70) / (50 + 60 + 50 + 70 + 50) * 100
          = 120 / 280 * 100
          = 43%
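
The same weighted proportion, computed in a short Python sketch on the five records from the slide (the function name is hypothetical):

```python
# (ID, weight, asthma) for the five records shown on the slide.
records = [
    ("A", 50, "YES"),
    ("B", 60, "NO"),
    ("C", 50, "NO"),
    ("D", 70, "YES"),
    ("E", 50, "NO"),
]


def weighted_proportion(rows):
    """Percentage of the weighted population reporting asthma."""
    numerator = sum(weight for _, weight, asthma in rows if asthma == "YES")
    denominator = sum(weight for _, weight, _ in rows)
    return numerator / denominator * 100


pct = weighted_proportion(records)
print(f"{pct:.0f}%")  # prints: 43%
```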

    54. Little trick for the statistical analysis
        Create your univariate dummy variables:
        - men = 1 for men, 0 otherwise
        - good = 1 for good health, 0 otherwise
        - Men in good health: mgood = men * good

        men   good   mgood
          1      0       0
          1      1       1
          0      0       0
          0      1       0
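
A minimal Python sketch of the dummy-variable trick. The codes for sex (1 = male) and self-perceived health (2 = good) follow the variable list in slide 52; the weights, the record layout and the function name are hypothetical.

```python
def add_men_good_dummy(records):
    """Add the 0/1 variables men, good and mgood = men * good to each record."""
    flagged = []
    for r in records:
        men = 1 if r["sex"] == 1 else 0       # 1 = male
        good = 1 if r["health"] == 2 else 0   # 2 = good
        flagged.append({**r, "men": men, "good": good, "mgood": men * good})
    return flagged


sample = [
    {"sex": 1, "health": 3, "weight": 50.0},
    {"sex": 1, "health": 2, "weight": 60.0},
    {"sex": 2, "health": 4, "weight": 50.0},
    {"sex": 2, "health": 2, "weight": 70.0},
]
flagged = add_men_good_dummy(sample)
# Weighted number of men in good health: sum of mgood * weight.
men_good_total = sum(r["mgood"] * r["weight"] for r in flagged)
```

The four records reproduce the four rows of the small table above, and only the second one contributes to the total.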

    55. Results of the statistical analysis
        Self-perceived health of Ontarians aged 12 or older, by gender, in 1998

                          # ('000)   95% CI             %      95% CI
        Men
          Poor / fair          391   (330 ; 452)         8.4   (7.1 ; 9.8)
          Good               1,106   (1,007 ; 1,204)    23.9   (21.7 ; 26.0)
          Very good          1,764   (1,648 ; 1,880)    38.1   (35.6 ; 40.6)
          Excellent          1,373   (1,268 ; 1,479)    29.6   (27.4 ; 31.9)
        Women
          Poor / fair          480   (409 ; 551)         9.9   (8.5 ; 11.4)
          Good               1,258   (1,151 ; 1,364)    26.1   (23.9 ; 28.3)
          Very good          1,846   (1,726 ; 1,965)    38.2   (35.8 ; 40.7)
          Excellent          1,243   (1,138 ; 1,348)    25.8   (23.6 ; 27.9)

    56. Why use the Bootstrap method?
        Other techniques:
        - Taylor: need to define a linear equation for each statistic examined
        - Jackknife: number of replicates depends on the number of strata (a large number of strata makes it impossible to disseminate)

    57. Why use the Bootstrap method?
        BOOTSTRAP:
        - More user-friendly when there is a large number of strata
        - Sets of 500 bootstrap weights can be distributed to data users
        - Recommended (over the jackknife) for estimating the variance of nonsmooth functions like quantiles, LICO
        - Official reference: “Bootstrap Variance Estimation for the National Population Health Survey”, D. Yeo, H. Mantel, and T.-P. Liu. 1999, Baltimore, ASA Conference.

    58. CV Look-up Tables
        - Alternative to the bootstrap
        - Approximate
        - Can only be used for categorical variables, and for estimates of totals and proportions
        - Available for every health region, province and Canada
        - Provided with the PUMF and Share file for some subpopulations

    59. CV Look-up Tables - Example
        National Population Health Survey - 1996/1997
        Approximate Sampling Variability Tables for Ontario Health Area: OTTAWA CARLETON - Selected members

        Numerator of        Estimated percentage
        percentage ('000)   0.1%   1.0%   2.0%   5.0%  10.0%  15.0%  20.0%  25.0%  30.0%  35.0%  40.0%  50.0%  70.0%  90.0%
          1                 ****   48.6   48.4   47.6   46.4   45.0   43.7   42.3   40.9   39.4   37.8   34.5   26.8   15.5
          2                 ****   34.4   34.2   33.7   32.8   31.9   30.9   29.9   28.9   27.9   26.8   24.4   18.9   10.9
          3                 ****   28.1   27.9   27.5   26.8   26.0   25.2   24.4   23.6   22.7   21.9   19.9   15.5    8.9
          4                 ****   24.3   24.2   23.8   23.2   22.5   21.9   21.2   20.4   19.7   18.9   17.3   13.4    7.7
          5                 ****   21.7   21.6   21.3   20.7   20.1   19.5   18.9   18.3   17.6   16.9   15.5   12.0    6.9
          6                 ****   19.8   19.7   19.4   18.9   18.4   17.8   17.3   16.7   16.1   15.5   14.1   10.9    6.3
          7                 ****   18.4   18.3   18.0   17.5   17.0   16.5   16.0   15.5   14.9   14.3   13.1   10.1    5.8
          8                 ****   ****   17.1   16.8   16.4   15.9   15.5   15.0   14.5   13.9   13.4   12.2    9.5    5.5
          9                 ****   ****   16.1   15.9   15.5   15.0   14.6   14.1   13.6   13.1   12.6   11.5    8.9    5.2
         10                 ****   ****   15.3   15.1   14.7   14.2   13.8   13.4   12.9   12.5   12.0   10.9    8.5    4.9
        ...
        300                 ****   ****   ****   ****   ****   ****   ****   ****   ****   ****   ****    2.0    1.5    0.9
        350                 ****   ****   ****   ****   ****   ****   ****   ****   ****   ****   ****    1.8    1.4    0.8
        400                 ****   ****   ****   ****   ****   ****   ****   ****   ****   ****   ****   ****    1.3    0.8
        450                 ****   ****   ****   ****   ****   ****   ****   ****   ****   ****   ****   ****    1.3    0.7
        500                 ****   ****   ****   ****   ****   ****   ****   ****   ****   ****   ****   ****    1.2    0.7

        NOTE: FOR CORRECT USAGE OF THESE TABLES PLEASE REFER TO MICRODATA DOCUMENTATION

    60. Another example using the Bootvar program
        Statistical analysis: using the NPHS cycle 3 (1998) cross-sectional dummy data, determine whether or not the proportion of men aged 12 or older in Ontario who perceive themselves as being in excellent health is statistically different (at level α = 5%) from the proportion of women.

    61. Basic theoretical notions for performing a Z-test
        - M_excel = estimated proportion of men in excellent health
        - F_excel = estimated proportion of women in excellent health
        - Hypothesis test: H0: M_excel = F_excel versus H1: M_excel ≠ F_excel
        - Test statistic: Z = (M_excel - F_excel) / sd(M_excel - F_excel)
        - At level α = 0.05, we do not reject H0 if |z| <= 1.96; we conclude H1 otherwise
        - We use the section “difference of proportions” of the BOOTVAR program to estimate the standard deviation of the difference between the two estimates

    62. Results
        - M_excel = 29.64% ; F_excel = 25.75% ; sd(M_excel - F_excel) = 1.62
        - Z = (M_excel - F_excel) / sd(M_excel - F_excel) = (29.64 - 25.75) / 1.62 = 3.89 / 1.62 = 2.40
        - At the α = 0.05 level, we conclude H1 because z = 2.40 > 1.96
        - We can then conclude that among Ontarians aged 12 or older there is a statistically significant difference between men and women with regard to the characteristic “self-perceived health = excellent”
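
The arithmetic of this test can be checked with a short Python sketch. The function name is hypothetical, and the standard deviation of the difference is taken as given (in the presentation it comes from the BOOTVAR “difference of proportions” section).

```python
def z_test_difference(p1, p2, sd_diff, critical=1.96):
    """Two-sided Z-test for a difference between two estimates."""
    z = (p1 - p2) / sd_diff
    return z, abs(z) > critical


z, reject_h0 = z_test_difference(29.64, 25.75, 1.62)
print(f"z = {z:.2f}, reject H0: {reject_h0}")
# prints: z = 2.40, reject H0: True
```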

    63. CCHS - Data Dissemination Strategy
        - Wide range of users and capacity: 136 health regions; 13 provincial/territorial Ministries of Health; Health Canada and CIHI; internal STC analysts; academics; others
        - Data products:
          - Microdata
          - Analytical products (Health Reports, How Healthy are Canadians, etc.)
          - Tabular statistics (ePubs, Cansim II, community profiles, etc.)
          - Client support (head and regional offices, CCHS website, workshops, etc.)

    64. CCHS - Access to microdata
        - Master file: all records, all variables
          - Statistics Canada
          - university research data centres
          - remote access
        - Share / Link files: respondents who agreed to share / link
          - provincial/territorial Ministries of Health
          - health regions (through the STC third-party share agreement)
        - Public Use Microdata File (PUMF): all records, subset of variables with collapsed response categories
          - free for 136 health regions
          - cost recovery for others

    65. CCHS - Overview of Cycle 1.2
        - Produce provincial cross-sectional estimates from a sample of 30,000 respondents
        - Area frame sample only / one person per household
        - CAPI only
        - 90-minute in-depth interviews on mental health and well-being, based on the WMH2000 questionnaire
        - Scheduled to begin collection in May 2002

    66. CCHS - Future Plans
        - Same two-year cycle approach:
          - health region level survey starting in January 2003
          - provincial level survey starting in January 2004
        - New consultation process with provincial and regional authorities
        - Flexible sample designs (adaptable to regional needs)
        - Development of in-depth nutrition-focused content (Cycle 2.2)

    67. CCHS Web site www.statcan.ca/health_surveys www.statcan.ca/enquetes_santé

    68. Contacts in Methodology Yves Béland: yves.beland@statcan.ca François Brisebois: francois.brisebois@statcan.ca
