Sampling weights: an appreciation (Session 19)
Learning Objectives By the end of this session, you will be able to • explain the role of sampling weights in estimating population parameters • calculate sampling weights for very simple sampling designs • appreciate that calculating sampling weights for complex survey designs is non-trivial and requires professional expertise
What is meant by sampling weights? • Real surveys are generally multi-stage • At each stage, the probabilities of selecting units are generally not equal • When a population parameter such as a mean or proportion is to be estimated, results from lower levels need to be scaled up from the sample to the population • This scaling-up factor, applied to each unit in the sample, is called its sampling weight.
A simple example • Suppose, for example, that a simple random sample of 500 HHs in a rural district (having 7349 HHs in total) showed 140 living below the poverty line • Hence the estimated total number of HHs in the population living below the poverty line = (140/500)*7349 ≈ 2058 • The data for each HH is a 0/1 variable, with 1 allocated if the HH is below the poverty line • Multiplying this variable by 7349/500 ≈ 14.7 and summing leads to the same answer • i.e. the sampling weight for each HH ≈ 14.7
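The arithmetic on this slide can be checked with a short sketch. The figures are those quoted above; the variable names and the ordering of the 0/1 values are made up purely for illustration.

```python
# Figures quoted on the slide: SRS of 500 HHs out of 7349, 140 below the poverty line.
N_HOUSEHOLDS = 7349
SAMPLE_SIZE = 500
N_POOR_IN_SAMPLE = 140

# One 0/1 indicator per sampled HH (the ordering is arbitrary, for illustration only).
poor = [1] * N_POOR_IN_SAMPLE + [0] * (SAMPLE_SIZE - N_POOR_IN_SAMPLE)

# Under simple random sampling every HH carries the same weight N/n.
weight = N_HOUSEHOLDS / SAMPLE_SIZE

# Route 1: scale up the sample proportion to the population.
total_from_proportion = (sum(poor) / SAMPLE_SIZE) * N_HOUSEHOLDS

# Route 2: multiply each 0/1 value by its sampling weight and sum.
total_from_weights = sum(weight * y for y in poor)

print(round(weight, 1))                                        # 14.7
print(round(total_from_proportion), round(total_from_weights)) # 2058 2058
```

Both routes give the same estimate because, with equal selection probabilities, weighting each observation is just another way of writing "sample proportion times population size".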
Why are weights needed? • The above was a trivial example with equal probabilities of selection • In general, units in the sample have very different probabilities of selection, i.e. it is rare to get a self-weighting design • To allow for unequal probabilities of selection, each unit is weighted by the reciprocal of its probability of selection • Thus sampling weight = 1/(probability of selection)
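The reciprocal-probability rule can be written compactly. The notation below is not from the slides and is introduced here only for illustration: s is the set of sampled units, π_i the selection probability of unit i, w_i its sampling weight, y_i its observed value and T̂ the estimated population total. This form is commonly known as the Horvitz-Thompson estimator.

```latex
w_i = \frac{1}{\pi_i}, \qquad \hat{T} = \sum_{i \in s} w_i \, y_i
```

In the simple random sample of households, π_i = 500/7349 for every unit, so w_i = 7349/500 ≈ 14.7, which recovers the earlier result.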
Weights in stratified sampling • Consider the “To the Woods” example data set discussed in Session 10 • The mean numbers of large trees per plot were: • 97.875 in region 1, based on n1=8 plots sampled out of 96 • 83.500 in region 2, based on n2=6 plots sampled out of 72 • Hence the total number of large trees in the forest can be estimated as (96*97.875) + (72*83.5) = 15408 • So what are the sampling weights used for each unit (plot)?
Self-weighting again • The sampling weights are the same for all plots (96/8 = 72/6 = 12), whether in region 1 or region 2. Why is this? • What are the probabilities of selection here? • In region 1, each unit is selected with prob=8/96 • In region 2, each unit is selected with prob=6/72 • Recall that a design where the probabilities of selection are equal for all selected units is called a self-weighting design • So regarding the sample as a simple random sample should give us the correct mean.
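A minimal sketch of the stratified calculation, using only the stratum sizes, sample sizes and sample means quoted on these slides (the dictionary layout and names are illustrative assumptions):

```python
# Stratum sizes (N), sample sizes (n) and sample means as quoted on the slides.
strata = {
    "region 1": {"N": 96, "n": 8, "mean": 97.875},
    "region 2": {"N": 72, "n": 6, "mean": 83.5},
}

weights = {}
estimated_total = 0.0
for name, s in strata.items():
    # Sampling weight = 1 / (n/N) = N/n: the number of plots each sampled plot represents.
    weights[name] = s["N"] / s["n"]
    # Sum of the observed tree counts in this stratum, recovered from the mean.
    sample_sum = s["n"] * s["mean"]
    # Each observation contributes weight * value to the estimated total.
    estimated_total += weights[name] * sample_sum

print(weights)          # {'region 1': 12.0, 'region 2': 12.0}
print(estimated_total)  # 15408.0
```

Every plot has weight 12 because the sampling fraction is the same (1/12) in both regions, which is exactly what makes the design self-weighting.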
Results for means • The mean number of large trees, using the formula for stratified sampling, is [(96/168)*97.875] + [(72/168)*83.5] = 91.71 • Treating the 14 observations as if they had been drawn as a simple random sample also gives 91.71 • The results for variances, however, differ • Variance of stratified sample mean = 1.28 • Variance of mean ignoring stratification = 2.18
Results for means • It is important to note that the weights used in computing a mean, i.e. (96/168)*(1/8) = 1/14 for plots in region 1 and (72/168)*(1/6) = 1/14 for plots in region 2, are not sampling weights • Sampling weights refer to the multiplying factor used when estimating a total • Essentially they represent the number of elements in the population that an individual sampling unit represents.
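The distinction between the weights used for a mean and the sampling weights can be checked with exact fractions. This is a sketch based only on the figures quoted on the slides; the names are illustrative. It does not attempt the variance calculations, which need the individual plot counts.

```python
from fractions import Fraction

# Stratum sizes, sample sizes and sample means as quoted on the slides.
N = {"region 1": 96, "region 2": 72}
n = {"region 1": 8, "region 2": 6}
mean = {"region 1": Fraction("97.875"), "region 2": Fraction("83.5")}

N_total = sum(N.values())  # 168 plots in the forest
n_total = sum(n.values())  # 14 plots sampled

# Stratified estimate: stratum means weighted by the stratum shares N_h / N.
stratified_mean = sum(Fraction(N[r], N_total) * mean[r] for r in N)

# Pooled estimate: treat all 14 observations as one simple random sample.
pooled_mean = sum(n[r] * mean[r] for r in N) / n_total

# Weight each observation receives when computing the mean (not a sampling weight).
mean_weight = {r: Fraction(N[r], N_total) * Fraction(1, n[r]) for r in N}

# Sampling weight: number of population plots each sampled plot represents.
sampling_weight = {r: Fraction(N[r], n[r]) for r in N}

print(round(float(stratified_mean), 2), round(float(pooled_mean), 2))  # 91.71 91.71
print(mean_weight["region 1"], sampling_weight["region 1"])            # 1/14 12
```

The per-observation weights for the mean sum to 1 over the 14 plots, whereas the sampling weights sum to the population size of 168 plots.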
Other uses of weights • Weights are also used to deal with non-response and missing values • If measurements are not available on all units for some reason, the sampling weights may be re-computed to allow for this • e.g. in conducting the Household Budget Survey 2000/2001 in Tanzania, not all rural areas planned in the sampling scheme were visited. As a result, sampling weights had to be re-calculated and used in the analysis.
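One simple way to re-compute weights after non-response is to inflate the weights of the responding units within a group so that they also stand in for the units that were missed. The sketch below assumes this simple ratio adjustment and a made-up scenario; it is not the method used in the Tanzanian survey, and the function name is hypothetical.

```python
def adjust_for_nonresponse(base_weight, n_selected, n_responded):
    """Inflate a base sampling weight by the ratio selected/responded.

    Assumes non-respondents within the group resemble respondents,
    which is a strong assumption that needs checking in practice.
    """
    if not 0 < n_responded <= n_selected:
        raise ValueError("need 0 < n_responded <= n_selected")
    return base_weight * n_selected / n_responded


# Hypothetical scenario: 8 plots were selected from a region of 96 plots
# (base weight 96/8 = 12), but only 6 could actually be measured.
adjusted = adjust_for_nonresponse(base_weight=12.0, n_selected=8, n_responded=6)

print(adjusted)      # 16.0
print(adjusted * 6)  # 96.0, so the 6 measured plots still represent all 96 plots
```

The check on the last line is the useful one: after adjustment, the weights of the responding units should still add up to the size of the population they are meant to represent.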
Computation of weights • The general approach is to find the probability of selecting a unit at every stage of the sample selection process • e.g. in a 3-stage design, three sets of probabilities will result • The probability of selecting each final-stage unit is then the product of these three probabilities • The reciprocal of this probability is then the sampling weight
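The general approach can be sketched as follows. The three-stage design and its selection counts are hypothetical and chosen only to make the arithmetic easy to follow; the function name is also an illustrative assumption. Real designs (for example, selection with probability proportional to size) need the actual stage probabilities.

```python
from fractions import Fraction
from math import prod


def sampling_weight(stage_probabilities):
    """Reciprocal of the product of the selection probabilities at each stage."""
    if not stage_probabilities:
        raise ValueError("at least one stage is required")
    if any(not 0 < p <= 1 for p in stage_probabilities):
        raise ValueError("each probability must lie in (0, 1]")
    return 1 / prod(stage_probabilities)


# Hypothetical 3-stage design with equal-probability selection at each stage:
stage_probs = [
    Fraction(5, 20),   # stage 1: 5 of 20 districts selected
    Fraction(4, 40),   # stage 2: 4 of 40 villages in the selected district
    Fraction(10, 50),  # stage 3: 10 of 50 households in the selected village
]

overall_probability = prod(stage_probs)
weight = sampling_weight(stage_probs)

print(overall_probability)  # 1/200
print(weight)               # 200
```

Here each sampled household would represent 200 households in the population. Exact fractions are used so that the product and its reciprocal carry no rounding error.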
Difficulties in computations • Standard methods, as illustrated in textbooks on sampling, often do not apply in real surveys • Complex sampling designs are common • Computing correct probabilities of selection can then be very challenging • Professional assistance is usually needed to determine the correct sampling weights and to use them correctly in the analysis
Software for dealing with weights • When analysing data from complex survey designs, it is important to check that the software can deal with sampling weights • Packages such as Stata, SAS and Epi Info have facilities for dealing with sampling weights • However, care is needed to ensure that the approaches used are appropriate for your own survey design