Types of Surveys Cross-sectional • surveys a specific population at a given point in time • will have one or more of the design components • stratification • clustering with multistage sampling • unequal probabilities of selection Longitudinal • surveys a specific population repeatedly over a period of time • panel • rotating samples
Cross-Sectional Surveys: Sampling Design Terminology
Methods of Sample Selection Basic methods • simple random sampling • systematic sampling • unequal probability sampling • stratified random sampling • cluster sampling • two-stage sampling
Simple Random Sampling Why? • basic building block of sampling • sample from a homogeneous group of units How? • physically draw units at random from the population under study • select by computer, e.g. in R or Stata
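A computer draw of a simple random sample can be sketched with Python's standard library. This is a minimal illustration, assuming the frame is held as a list of unit IDs; the function name `simple_random_sample` and the frame are hypothetical, not taken from R or Stata.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame so that every subset of size n
    is equally likely (simple random sampling without replacement)."""
    if not 0 <= n <= len(frame):
        raise ValueError("n must be between 0 and the population size")
    rng = random.Random(seed)          # seeded generator for a reproducible draw
    return rng.sample(list(frame), n)


# Hypothetical frame of 100 unit IDs; draw 10 of them.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=2024)
print(sorted(sample))
```

Passing a seed makes the selection reproducible, which is useful for documenting how a sample was drawn.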
Systematic Sampling Why? • easy • can be very efficient depending on the structure of the population How? • get a random start in the population • sample every kth unit for some chosen number k
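The two steps above (random start, then every kth unit) might look like this in Python. It is a sketch that assumes the frame is an ordered list; `systematic_sample` is a hypothetical name.

```python
import random


def systematic_sample(frame, k, seed=None):
    """Systematic sample: pick a random start r in 1..k, then take the units
    at positions r, r + k, r + 2k, ... (positions counted from 1)."""
    if k < 1:
        raise ValueError("the sampling interval k must be at least 1")
    rng = random.Random(seed)
    start = rng.randint(1, k)          # randint includes both end points
    return frame[start - 1::k]         # shift to 0-based indexing for the slice


# Hypothetical frame of 100 units; k = 10 gives a sample of 10 units.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=7)
print(sample)
```

When the population size is not a multiple of k, the sample size depends on the random start.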
Additional Note Simplifying assumption: • in terms of estimation a systematic sample is often treated as a simple random sample Key assumption: • the order of the units is unrelated to the measurements taken on them
Unequal Probability Sampling Why? • may want to give greater or lesser weight to certain population units • two-stage sampling with probability proportional to size at the first stage and equal sample sizes at the second stage provides a self-weighting design (all units have the same chance of inclusion in the sample) How? • with replacement • without replacement
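The self-weighting claim can be checked with exact arithmetic. This sketch assumes a first stage whose inclusion probabilities are exactly proportional to size, π_i = n·M_i/M (which requires n·M_i ≤ M), followed by a simple random sample of m units in each selected primary unit; the overall probability is then (n·M_i/M)·(m/M_i) = n·m/M for every unit. The PSU sizes are made up.

```python
from fractions import Fraction


def overall_inclusion_probs(psu_sizes, n_psu, m_per_psu):
    """Overall inclusion probability of a unit in each PSU when
    stage 1 selects n_psu PSUs with pi_i = n_psu * M_i / M (pps) and
    stage 2 takes a simple random sample of m_per_psu units in each PSU."""
    total = sum(psu_sizes)                                # M, population size
    probs = []
    for size in psu_sizes:                                # size = M_i
        first = Fraction(n_psu * size, total)             # pps inclusion probability
        if first > 1 or m_per_psu > size:
            raise ValueError("design infeasible for a PSU of size %d" % size)
        second = Fraction(m_per_psu, size)                # SRS of m out of M_i
        probs.append(first * second)                      # = n * m / M for every PSU
    return probs


# Hypothetical PSU sizes summing to M = 400; n = 2 PSUs, m = 10 units each.
probs = overall_inclusion_probs([40, 60, 100, 200], n_psu=2, m_per_psu=10)
print(probs)   # every entry equals 2 * 10 / 400 = 1/20, so every weight is 20
```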
With or Without Replacement? • in practice sampling is usually done without replacement • the variance formulas for without-replacement sampling are difficult to use • the formula for with-replacement sampling at the first stage is often used as an approximation Assumption: the population is large and the sample is small, i.e. the sampling fraction is less than 10%
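The usual with-replacement formula is the Hansen-Hurwitz estimator; naming it here is our reading of the slide, not something the slide states. A minimal sketch, with made-up draws; in a two-stage design y[i] is usually replaced by the estimated total of the i-th selected primary unit.

```python
import math


def hansen_hurwitz(y, p):
    """With-replacement (Hansen-Hurwitz) estimate of a population total.

    y[i] is the value recorded on the i-th draw and p[i] the probability of
    selecting that unit on a single draw."""
    n = len(y)
    if n < 2 or len(p) != n:
        raise ValueError("need at least two draws and one probability per draw")
    z = [yi / pi for yi, pi in zip(y, p)]            # each draw's estimate of the total
    t_hat = sum(z) / n                                # average of the n estimates
    var_hat = sum((zi - t_hat) ** 2 for zi in z) / (n * (n - 1))
    return t_hat, math.sqrt(var_hat)


# Made-up draws: values 2, 6, 5 with single-draw probabilities 1/8, 1/4, 1/2.
t_hat, se = hansen_hurwitz([2, 6, 5], [0.125, 0.25, 0.5])
print(round(t_hat, 3), round(se, 3))
```

The variance only needs the spread of the per-draw estimates, which is why it is easier to use than the without-replacement formulas.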
Stratified Random Sampling Why? • for administrative convenience • to improve efficiency • estimates may be required for each stratum How? • independent simple random samples are chosen within each stratum
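With independent simple random samples in each stratum, the standard stratified estimator of a mean weights each stratum mean by W_h = N_h/N. A sketch, assuming known stratum sizes and at least two sampled units per stratum; the data are hypothetical.

```python
import math
import statistics


def stratified_mean(strata):
    """Stratified estimate of a population mean and its standard error.

    strata maps a stratum label to (N_h, values), where N_h is the stratum
    population size and values come from an independent SRS (n_h >= 2)."""
    N = sum(N_h for N_h, _ in strata.values())
    mean, var = 0.0, 0.0
    for N_h, values in strata.values():
        n_h = len(values)
        W_h = N_h / N                                   # stratum weight
        mean += W_h * statistics.mean(values)           # weighted stratum mean
        fpc = 1 - n_h / N_h                             # finite population correction
        var += W_h ** 2 * fpc * statistics.variance(values) / n_h
    return mean, math.sqrt(var)


# Hypothetical two-stratum example.
strata = {"A": (100, [2, 4, 6]), "B": (300, [10, 12, 14, 16])}
est, se = stratified_mean(strata)
print(round(est, 2), round(se, 3))
```

Because each stratum contributes its own mean and variance, separate estimates for each stratum come out of the same calculation.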
Example: Survey of Youth in Custody • first U.S. survey of youths confined to long-term, state-operated institutions • complemented the existing Children in Custody censuses • companion survey to the Surveys of State Prisons • the data contain information on criminal histories, family situations, drug and alcohol use, and peer group activities • survey carried out in 1989 using stratified systematic sampling
SYC Design strata • type (a): groups of smaller institutions • type (b): individual larger institutions sampling units • type (a) strata • first stage: institutions selected with probability proportional to size • second stage: individual youths in custody • type (b) strata • individual youths in custody • individuals chosen by systematic random sampling
Cluster Sampling Why? • convenience and cost • the frame or list of population units may be defined only for the clusters and not the units How? • take a simple random sample of clusters and measure all units in the cluster
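A sketch of one-stage cluster sampling: select clusters by simple random sampling, measure every unit in them, and estimate the population total from the cluster totals. The frame of clusters is hypothetical and the function names are ours.

```python
import math
import random
import statistics


def select_clusters(clusters, n, seed=None):
    """Simple random sample of n clusters, without replacement."""
    return random.Random(seed).sample(clusters, n)


def estimate_total(sampled_clusters, N_clusters):
    """One-stage cluster sampling: every unit in a selected cluster is measured.
    Returns the unbiased estimate of the population total and its standard error."""
    n = len(sampled_clusters)
    totals = [sum(cluster) for cluster in sampled_clusters]   # cluster totals t_i
    t_hat = N_clusters * statistics.mean(totals)              # N times the mean total
    var_hat = N_clusters ** 2 * (1 - n / N_clusters) * statistics.variance(totals) / n
    return t_hat, math.sqrt(var_hat)


# Hypothetical frame of 10 clusters (e.g. households) listing unit values.
frame = [[1, 2], [3, 4, 5], [6], [2, 2], [7, 1], [4], [3, 3, 3], [5, 0], [1], [2, 6]]
chosen = select_clusters(frame, 3, seed=11)
print(estimate_total(chosen, len(frame)))
```

Only a list of clusters is needed to select the sample, which matches the case where the frame exists for clusters but not for units.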
Two-Stage Sampling Why? • cost and convenience • lack of a complete frame How? • take either a simple random sample or an unequal probability sample of primary units, then within each selected primary unit take a simple random sample of secondary units
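The simple-random-sample version of the two stages can be sketched as follows, assuming the frame maps each primary unit to a list of its secondary units (a hypothetical layout). Each selected unit's weight is the inverse of its inclusion probability, (N/n)·(M_i/m_i).

```python
import random


def two_stage_sample(frame, n_psu, m_per_psu, seed=None):
    """frame maps each primary unit (PSU) to a list of its secondary units.
    Stage 1: SRS of n_psu PSUs. Stage 2: SRS of m_per_psu units within each
    selected PSU (or all of them if the PSU is smaller).
    Returns (psu, unit, weight) triples, weight = (N / n) * (M_i / m_i)."""
    rng = random.Random(seed)
    N = len(frame)
    chosen = rng.sample(sorted(frame), n_psu)      # stage 1
    result = []
    for psu in chosen:
        units = frame[psu]
        m = min(m_per_psu, len(units))
        weight = (N / n_psu) * (len(units) / m)
        for unit in rng.sample(units, m):          # stage 2
            result.append((psu, unit, weight))
    return result


# Hypothetical frame: 6 PSUs with 10 secondary units each (60 units in all).
frame = {f"psu{i}": list(range(10 * i, 10 * i + 10)) for i in range(6)}
sample = two_stage_sample(frame, n_psu=3, m_per_psu=4, seed=3)
print(sum(w for _, _, w in sample))   # weights add up to the 60 units in the frame
```

A full list of secondary units is only needed inside the selected primary units, which is the point of the design when a complete frame is lacking.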
Synthesis to a Complex Design Stratified two-stage cluster sampling Strata • geographical areas First stage units • smaller areas within the larger areas Second stage units • households Clusters • all individuals in the household
Why a Complex Design? • better coverage of the entire region of interest (stratification) • efficient for interviewing: less travel, lower cost Problem: estimation and analysis are more complex
Ontario Health Survey • carried out in 1990 • measured the health status of the population • collected data on the risk factors associated with major causes of morbidity and mortality in Ontario • Statistics Canada surveyed 61,239 persons using a stratified two-stage cluster sample
OHS: Sample Selection • strata: public health units, divided into rural and urban strata • first stage: enumeration areas defined by the 1986 Census of Canada and selected with probability proportional to size (pps) • second stage: dwellings selected by simple random sampling (SRS) • cluster: all persons in the dwelling
Longitudinal Surveys: Sampling Design
British Household Panel Survey Objectives of the survey • to further understanding of social and economic change at the individual and household level in Britain • to identify, model and forecast such changes, their causes and consequences in relation to a range of socio-economic variables.
BHPS: Target Population and Frame Target population • private households in Great Britain Survey frame • small users Postcode Address File (PAF)
BHPS: Panel Sample • designed as an annual survey of each adult (16+) member of a nationally representative sample • approximately 5,000 households • approximately 10,000 individual interviews • the same individuals are re-interviewed in successive waves • if individuals split off from their original households, all adult members of their new households are also interviewed • children are interviewed once they reach the age of 16 • 13 waves of the survey from 1991 to 2004
BHPS: Sampling Design Uses implicit stratification embedded in two-stage sampling • postcode sectors ordered by region • within each region, postcode sectors ordered by socio-economic group (as determined from census data) and then divided into four or five strata Sample selection • systematic sampling of postcode sectors from the ordered list • systematic sampling of delivery points (≈ addresses or households)
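Implicit stratification works because a systematic sample from a sorted list spreads itself across the sort order. The sketch below illustrates that general idea only; it is not the BHPS procedure, and the field names `region` and `seg` are hypothetical. It sorts the frame, then takes a systematic sample with the fractional interval N/n.

```python
import random


def implicit_stratified_sample(frame, n, seed=None):
    """Sort the frame by (region, seg) and take a systematic sample with the
    fractional interval N / n, so that the sample spreads across the ordering.
    The keys 'region' and 'seg' are hypothetical field names."""
    ordered = sorted(frame, key=lambda u: (u["region"], u["seg"]))
    N = len(ordered)
    interval = N / n
    start = random.Random(seed).random() * interval       # random start in [0, interval)
    # min() guards against floating-point round-up past the last position
    return [ordered[min(int(start + j * interval), N - 1)] for j in range(n)]


# Hypothetical frame: 2 regions x 10 socio-economic groups, one sector each.
frame = [{"region": r, "seg": s} for r in ("North", "South") for s in range(1, 11)]
sample = implicit_stratified_sample(frame, 4, seed=5)
print([(u["region"], u["seg"]) for u in sample])
```

Whatever the random start, this example always selects two sectors from each region, so each region is represented in proportion to its size without forming explicit strata.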
Survey Weights: Definitions initial weight • equal to the inverse of the inclusion probability of the unit final weight • initial weight adjusted for nonresponse, poststratification and/or benchmarking • interpreted as the number of units in the population that the sample unit represents
Interpretation • the survey weight for a particular sample unit is the number of units in the population that the unit represents
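Two of the weighting steps defined above, the initial weight 1/π and a poststratification adjustment, can be sketched as follows. The nonresponse adjustment and benchmarking are not shown, and the inclusion probabilities and population counts are made up.

```python
def initial_weights(inclusion_probs):
    """Initial (design) weight of each sampled unit: 1 / inclusion probability."""
    return [1 / p for p in inclusion_probs]


def poststratify(weights, groups, population_counts):
    """Scale the weights within each poststratum so that they add up to the
    known population count for that poststratum."""
    group_sums = {}
    for w, g in zip(weights, groups):
        group_sums[g] = group_sums.get(g, 0.0) + w
    return [w * population_counts[g] / group_sums[g] for w, g in zip(weights, groups)]


# Hypothetical sample of four units and known counts for two poststrata.
w0 = initial_weights([0.1, 0.1, 0.05, 0.05])               # about 10, 10, 20, 20
w1 = poststratify(w0, ["m", "f", "m", "f"], {"m": 36, "f": 24})
print(w1, sum(w1))   # each final weight = number of population units represented
```

Read through the interpretation above, the final weights sum to the population size (here 60), and each one is the number of population units that sample unit stands for.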
Effect of the Weights • Example: age distribution, Survey of Youth in Custody
Observations • the weighted and unweighted histograms are similar in shape but significantly different • the design probably used approximate proportional allocation • the unweighted age distribution tends to be shifted to the right compared with the weighted one • older ages are over-represented in the sample
Survey Data Analysis Issues and Simple Examples from Graphical Methods
Issues iid (independent and identically distributed) assumption • the assumption does not hold in complex surveys because of correlations induced by the sampling design or by the population structure • blindly applying standard programs to survey data can lead to incorrect results
Example: Rank Correlation Coefficient Pay equity survey dispute: Canada Post and PSAC • two job evaluations of the same set of people (using the same information), carried out in 1987 and 1993 • the rank correlation between the two sets of job values obtained from the evaluations was 0.539 • assumption needed for this to be a valid estimate of correlation: the pairs of observations are iid
Scatterplot of Evaluations • Rank correlation is 0.539
A Stratified Design with Distinct Differences Between Strata • the pay level increases with each pay category (four in number) • the job value also generally increases with each pay category • therefore the pairs of observations are not identically distributed across strata, so they are not iid
Correlations within Level Correlations within each pay level • Level 2: –0.293 • Level 3: –0.010 • Level 4: 0.317 • Level 5: 0.496 Only Level 4 is significantly different from 0
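The contrast between the pooled and the within-level correlations can be reproduced on toy data. This sketch computes Spearman's rank correlation as the Pearson correlation of the ranks (ties get average ranks), both pooled and within each stratum. The data are made up and are not the Canada Post figures.

```python
def ranks(values):
    """Ranks 1..n, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1               # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def spearman_by_stratum(x, y, strata):
    """Rank correlation computed separately within each stratum."""
    out = {}
    for s in sorted(set(strata)):
        idx = [i for i, t in enumerate(strata) if t == s]
        out[s] = spearman([x[i] for i in idx], [y[i] for i in idx])
    return out


# Made-up toy data: two strata (levels) with three jobs each.
x = [1, 2, 3, 4, 5, 6]            # first evaluation
y = [3, 1, 2, 6, 4, 5]            # second evaluation
level = [1, 1, 1, 2, 2, 2]
print(round(spearman(x, y), 3))                   # pooled over both strata
print(spearman_by_stratum(x, y, level))           # within each stratum
```

In this toy example the pooled correlation is positive while both within-level correlations are negative: the positive value comes from the differences between strata, not from agreement within them.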
Graphical Displays first rule of data analysis • always try to plot the data to get some initial insights into the analysis common tools • histograms • bar graphs • scatterplots
Histograms unweighted • height of the bar in the ith class is proportional to the number in the class weighted • height of the bar in the ith class is proportional to the sum of the weights in the class
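The only difference between the two histograms is what each observation contributes to its class: 1 when unweighted, its survey weight when weighted. A pure-Python sketch with made-up values and weights follows; in NumPy, `numpy.histogram(values, bins=edges, weights=w)` returns the weighted class sums.

```python
def class_shares(values, edges, weights=None):
    """Share of the total in each class [edges[i], edges[i+1]); the last class
    also includes its right edge. Unweighted: every observation counts 1.
    Weighted: each observation contributes its survey weight."""
    if weights is None:
        weights = [1.0] * len(values)
    heights = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        for i in range(len(edges) - 1):
            last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                heights[i] += w
                break
    total = sum(heights)
    return [h / total for h in heights]


# Made-up values and survey weights.
values = [1, 2, 2, 3]
weights = [4, 1, 1, 2]
print(class_shares(values, [0, 2, 4]))            # unweighted
print(class_shares(values, [0, 2, 4], weights))   # weighted
```

Here the unweighted shares are 0.25 and 0.75 but the weighted shares are 0.5 and 0.5, the same kind of shift seen in the Survey of Youth in Custody age example.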
Body Mass Index measured by • weight in kilograms divided by the square of height in meters • 7.0 < BMI < 45.0 • BMI < 20: associated with health problems such as eating disorders • BMI > 27: associated with health problems such as hypertension and coronary heart disease
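The BMI formula and the cut-offs above translate directly into code. This is a sketch; treating 7.0 < BMI < 45.0 as the plausible range is our assumption about what that bullet means, and the band labels are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by the square of height in meters."""
    return weight_kg / height_m ** 2


def bmi_band(value):
    """Band using the cut-offs on the slide; treating 7.0 < BMI < 45.0 as the
    plausible range is an assumption made for this sketch."""
    if not 7.0 < value < 45.0:
        return "outside 7-45"
    if value < 20:
        return "below 20"
    if value > 27:
        return "above 27"
    return "20 to 27"


print(round(bmi(70, 1.75), 1), bmi_band(bmi(70, 1.75)))
```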
Bar Graphs Same principle as histograms unweighted • size of the ith bar is proportional to the number in the class weighted • size of the ith bar is proportional to the sum of the weights in the class