Design of Cross-sectional Surveys using Cluster Sampling: an Overview with Australian Case-studies
Jane Hocking¹, John B. Carlin²
¹ Public Health Training Scheme, North Western Health, Victoria.
² Clinical Epidemiology and Biostatistics Unit, Royal Children’s Hospital Research Institute and University of Melbourne Department of Paediatrics, Victoria.
Objectives
• to outline cluster sampling
• to discuss the concepts of design effect and intracluster correlation and their role in sample size estimation for cluster-based surveys
• to analyse a large Melbourne school-based survey
Background
• cross-sectional surveys are important in epidemiological research
• surveys based on simple random samples are easy to analyse, BUT they are:
  • often not feasible, because of problems obtaining a sampling frame that lists all individuals in the population
  • not cost-effective to perform
Cluster Sampling
• identify a sampling frame of clusters of individuals, e.g. school classes, local government areas, community health centres
• two-stage sampling OR multistage sampling
• clusters can be selected by:
  • simple random sampling, OR
  • sampling with probability proportional to size (PPS)
• BUT the exact size of the clusters is often unknown at the time of sampling, so a weighted analysis is needed
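The PPS option can be sketched in Python as follows. This is a minimal illustration using systematic PPS selection, which is one common way to implement PPS and is not necessarily the scheme the authors used; the function names, the cluster labels and the enrolment figures are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, List, Optional


def pps_systematic_sample(sizes: Dict[str, int], n_clusters: int,
                          seed: Optional[int] = None) -> List[str]:
    """Select n_clusters clusters with probability proportional to size.

    Clusters are laid end to end on a line of length sum(sizes); a random
    start and a fixed interval then pick points along that line. A cluster
    larger than the interval can be selected more than once.
    """
    if n_clusters < 1 or not sizes or sum(sizes.values()) <= 0:
        raise ValueError("need at least one cluster with positive size")
    rng = random.Random(seed)
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    interval = cumulative[-1] / n_clusters
    start = rng.random() * interval          # random start in [0, interval)
    selected = []
    for k in range(n_clusters):
        point = start + k * interval
        # Cluster i covers [cumulative[i-1], cumulative[i]); the min() guards
        # against floating-point overshoot at the very end of the line.
        idx = min(bisect_right(cumulative, point), len(names) - 1)
        selected.append(names[idx])
    return selected


def expected_selections(sizes: Dict[str, int], name: str, n_clusters: int) -> float:
    """Expected number of times a cluster is drawn: n * size / total.

    This equals the inclusion probability when no cluster is larger than
    the sampling interval.
    """
    return n_clusters * sizes[name] / sum(sizes.values())


# Hypothetical enrolments for five schools (not data from the survey).
school_sizes = {"A": 100, "B": 300, "C": 50, "D": 250, "E": 300}
sample = pps_systematic_sample(school_sizes, n_clusters=2, seed=42)
```

Because the sizes used for selection are often only approximate, the realised selection probabilities differ from the intended ones, which is why the slide calls for a weighted analysis.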
Sample Size Requirements
• balance between precision and cost
• individuals within a cluster tend to be more alike than those in different clusters, resulting in larger standard errors than a simple random sample of the same size would give
• this loss of precision must be anticipated at the design stage by increasing the sample size
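Design Effect (deff)
• of use when considering the precision of cluster sample surveys
• measures the performance of a particular sampling method against that of a simple random sample: the ratio of the variance of an estimate under the actual design to its variance under simple random sampling with the same number of individuals
• deff may be estimated for measures of association as well as for measures of prevalence (or mean values)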
Intracluster Correlation (ρ)
• deff depends on the size of the clusters and the strength of correlation within clusters
• the intracluster correlation provides a measure of the degree of homogeneity amongst cluster subjects for the particular outcome under investigation
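A common approximation, assumed here rather than taken from the slides, relates the design effect to the average cluster size (written m̄ below) and the intracluster correlation (written ρ below) when clusters are of roughly equal size:

```latex
\mathrm{deff} \approx 1 + (\bar{m} - 1)\,\rho
```

For illustration only, with hypothetical values m̄ = 20 and ρ = 0.05, deff ≈ 1 + 19 × 0.05 = 1.95, so the clustered survey would need nearly twice as many individuals as a simple random sample to reach the same precision.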
Sample Size
• use standard sample size estimation methods to obtain a suitable sample size under simple random sampling, then scale this value up using an estimated deff
• estimating deff requires the average cluster size and the intracluster correlation, which is a feature of the population under study
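The two-step calculation can be sketched in Python as below. The sketch assumes the common approximation deff = 1 + (m - 1) × ICC, where m is the average cluster size and ICC the intracluster correlation; that formula, the function names and all input values are assumptions for illustration, not figures from the survey.

```python
import math
from typing import Tuple


def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect, assuming roughly equal cluster sizes."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def srs_sample_size_proportion(p: float, margin: float, z: float = 1.96) -> int:
    """Individuals needed under simple random sampling to estimate a
    proportion p to within +/- margin (z = 1.96 for 95% confidence)."""
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


def cluster_sample_size(n_srs: int, avg_cluster_size: float,
                        icc: float) -> Tuple[int, int]:
    """Scale an SRS sample size by deff; return (individuals, clusters)."""
    n_individuals = math.ceil(n_srs * design_effect(avg_cluster_size, icc))
    n_clusters = math.ceil(n_individuals / avg_cluster_size)
    return n_individuals, n_clusters


# Hypothetical planning values: prevalence 30%, margin of 5 percentage points,
# 40 children per school, intracluster correlation 0.02.
n_srs = srs_sample_size_proportion(p=0.30, margin=0.05)
n_individuals, n_schools = cluster_sample_size(n_srs, avg_cluster_size=40, icc=0.02)
```

With these hypothetical inputs the simple random sample of 323 individuals grows to 575 individuals across 15 schools. A different intracluster correlation would be needed for each outcome, since it is specific to the outcome under investigation.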
Traffic Exposure Survey
• walking and cycling activity of children aged 6-9 years was surveyed in Melbourne
• two-stage random cluster sampling design
• 72 schools sampled, 3104 students
• outcomes:
  • 1) prevalence measures: the proportion of children walking to school, the number of streets crossed and selected socio-economic variables
  • 2) measures of association: various factors with walking to school
Analysis
• using Stata, which has a family of commands designed for data from survey samples that allow for valid adjustment for clustering, stratification and sampling weights
• Stata provides a direct estimate of deff for each outcome

1. Siddiqui, Hedeker, et al. Intracluster correlation estimates in a school based smoking prevention study. Am J Epidemiol. 1996;144:425-433.
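To show what a direct estimate of deff involves, here is a minimal Python sketch for a single prevalence outcome. It is not Stata's implementation: it assumes equal sampling weights, no stratification and no finite population correction, uses the usual linearised (between-cluster) variance for a ratio estimator, and compares it with the simple random sampling variance p(1 - p)/(n - 1). The function name and input layout are hypothetical.

```python
from typing import Sequence, Tuple


def cluster_proportion_deff(clusters: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Return (proportion, cluster-adjusted standard error, deff).

    `clusters` holds one sequence of 0/1 outcomes per sampled cluster,
    e.g. one list per school with 1 = child walks to school.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters to estimate the variance")
    sizes = [len(c) for c in clusters]
    totals = [sum(c) for c in clusters]
    n = sum(sizes)
    p = sum(totals) / n
    if p in (0.0, 1.0):
        raise ValueError("deff is undefined when the proportion is 0 or 1")
    # Variance driven by how far each cluster total sits from its expected
    # value p * m_i, i.e. by between-cluster variation.
    resid_ss = sum((y - p * m) ** 2 for y, m in zip(totals, sizes))
    var_cluster = (k / (k - 1)) * resid_ss / n ** 2
    # Variance the same n individuals would give under simple random sampling.
    var_srs = p * (1.0 - p) / (n - 1)
    return p, var_cluster ** 0.5, var_cluster / var_srs


# Tiny made-up data set: four clusters whose members all share one outcome.
prevalence, std_err, deff = cluster_proportion_deff(
    [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
)
```

When every cluster has the same proportion the between-cluster term vanishes and the estimated deff is 0; when members of a cluster are all alike, as in the made-up data above, deff is well above 1.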
Conclusion
• cluster sampling methodology is becoming more common
• consideration of sample size requirements and subsequent analysis is needed
• the required sample size depends on the purpose of the survey, i.e. prevalence vs association
• results need to be published