Estimation and Weighting, Part I
Goal of Estimation
Minimize a survey’s total error
• Sampling error is error arising solely from the sampling process (measure: variance)
  • Mainly a function of sample size
• Surveys are also subject to biases from nonsampling errors such as:
  • Coverage errors and non-probability sampling
  • Response errors
  • Nonresponse
Typical Estimation Steps
The estimation steps for a typical household survey avoid or help control some nonsampling errors
• Editing and Imputation are aimed at controlling response errors
• Basic Weighting based on probabilities of selection produces essentially unbiased estimates when there is 100% response and no response error
• Nonresponse Adjustment helps avoid some obvious biases that arise when nonrespondents are ignored
• Population Controls help minimize some coverage problems
Editing and Imputation
Editing
• deleting or correcting unacceptable data values
• coding/combining data to classify respondents
Imputation – insert values for missing data
• for missing items (imputation is common)
• for missing households (HH) or persons (not used as often)
• modeling methods
• hot deck methods
Item Nonresponse Imputation
When a household is interviewed and a small amount of data is not obtained for a person, imputing the missing data creates a complete data set.
• Hot Deck Method: use answers from another, similar unit (a “nearest neighbor”) to impute an answer for an item nonresponse
• Modeling Method: mathematically impute an answer for an item nonresponse
Example of Imputation
Suppose a woman aged 29 was employed last month. This month, we were not able to obtain her labor force status. Construct a “transition matrix” using records of “similar” persons whose labor force status was coded in both months; here, use females aged 24-45.
Example of Imputation Based on Frequencies, Compute Probabilities
Example of Imputation
• Generate a random number rn between 0 and 1
• If rn = 0.7221, for example, then rn falls in the range [0, 0.9449] and “employed” is imputed for this month
• “Employed” will be imputed 94.49% of the time
• There is no guarantee that the imputed value is right for the particular data item
• The imputed data set is complete and preserves known relationships
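The random-number step above can be sketched in a few lines of Python. This is only an illustration: the 0.9449 probability of staying employed comes from the example, while the split of the remaining 0.0551 between the other two statuses, the status labels, and the function name `impute_status` are assumptions made here for the sake of a runnable sketch.

```python
import random

# Transition probabilities for persons who were employed last month.
# Only the 0.9449 "employed" share comes from the example; the split of the
# remaining 0.0551 between the other two statuses is HYPOTHETICAL.
TRANSITION_FROM_EMPLOYED = [
    ("employed", 0.9449),
    ("unemployed", 0.0200),          # hypothetical
    ("not in labor force", 0.0351),  # hypothetical
]


def impute_status(rn, transition=TRANSITION_FROM_EMPLOYED):
    """Return the status whose cumulative-probability range contains rn."""
    cumulative = 0.0
    for status, prob in transition:
        cumulative += prob
        if rn < cumulative:
            return status
    # Guard against floating-point rounding when rn is very close to 1.
    return transition[-1][0]


print(impute_status(0.7221))           # the random number used in the example
print(impute_status(random.random()))  # a fresh random draw
```

With rn = 0.7221 the first range already contains the draw, so “employed” is returned, matching the slide.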
Example of Imputation
Would you impute a labor force status? Maybe not:
• Usually a determination will be made concerning how much data is required for a response to be accepted by a survey
• For a labor force survey, enough information to determine LF status will probably be required
Purpose of Weighting
Estimate the number of persons each person in a sample household represents
Each person interviewed helps represent
• not-in-sample population of the area (geographic stratum) where the person lives
• sample persons not interviewed
• generally, persons of the same age, race, gender, and ethnic origin as the person interviewed
Basic Weights
• Applied at the household level (all persons in a HH have the same basic weight)
• Inverse of the probability of selection
• In a typical HH sample there are two stages of sampling and two probabilities
  • 1st stage: probability of selecting an enumeration area (EA), EAprob
  • 2nd stage: probability of selecting a HH in that EA, HHprob
  • TOTprob = EAprob * HHprob
  • Baseweight = 1/TOTprob
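As a small illustration of the two-stage calculation, the sketch below computes a base weight from the two selection probabilities. The function name `base_weight` and the probabilities 1/50 and 1/10 are hypothetical values chosen for the example, not figures from the slides.

```python
def base_weight(ea_prob, hh_prob):
    """Base weight = 1 / TOTprob, where TOTprob = EAprob * HHprob."""
    if not (0 < ea_prob <= 1 and 0 < hh_prob <= 1):
        raise ValueError("selection probabilities must be in (0, 1]")
    tot_prob = ea_prob * hh_prob
    return 1.0 / tot_prob


# Hypothetical probabilities: the EA is selected with probability 1/50,
# and a household within that EA with probability 1/10.
w = base_weight(1 / 50, 1 / 10)
print(round(w))  # each sample household represents about 500 households
```

Every person in the household would carry this same base weight.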
Base Weights
• Self-weighting samples are not common
• Primary stratifier for HH surveys is geography, such as state
  • often the base weights in a state are all equal
  • OR nearly the same
• For a self-weighting stratum use N/n, where N is the number of HHs on the frame and n is the number of HHs in the sample
Example of Basic Weighting
• Self-weighting within state
• State A has N = 500,000 and sample n = 2,000
  • baseweight = N/n = 500,000/2,000 = 250
  • An estimate of employment is obtained by multiplying the sample count (EMP = 3,000) by the baseweight
  • 3,000 x 250 = 750,000
• State B has N = 175,000 and sample n = 1,750
  • baseweight = N/n = 175,000/1,750 = 100
  • An estimate of unemployment is obtained by multiplying the sample count (UE = 250) by the baseweight
  • 250 x 100 = 25,000
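The arithmetic for the two states can be checked with a short Python sketch. The figures are the ones given in the example; the function name `self_weighting_base_weight` and the dictionary layout are illustrative assumptions.

```python
def self_weighting_base_weight(frame_hhs, sample_hhs):
    """Base weight N/n for a self-weighting stratum."""
    return frame_hhs / sample_hhs


# N = HHs on the frame, n = HHs in the sample, count = sample count of interest
# (employed persons for State A, unemployed persons for State B).
states = {
    "A": {"N": 500_000, "n": 2_000, "count": 3_000},
    "B": {"N": 175_000, "n": 1_750, "count": 250},
}

for name, s in states.items():
    w = self_weighting_base_weight(s["N"], s["n"])
    estimate = w * s["count"]
    print(name, w, estimate)
# prints:
# A 250.0 750000.0
# B 100.0 25000.0
```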
Simple Weighted Estimates
Estimate x of a Total X
• A Simple Weighted Estimate adds persons using their weights: x = Σ wi xi
• wi is the weight for the ith person
• The sum is taken across all persons in the sample
• xi is a data value for person i
  • for example, xi = 1 for employed, 0 otherwise
Simple Weighted Estimates Example
Continue the previous example for State A
• Simple Weighted Estimate of employment, with xi = 1 for employed, 0 otherwise
• Can restrict the sum to the 3,000 employed, since xi = 0 for the other responding persons
• With wi = 250 for every person, the estimate is 3,000 x 250 = 750,000
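A minimal sketch of the weighted sum for State A follows. The weight of 250 and the 3,000 employed persons come from the example; the number of other responding persons is not given in the slides, so the 1,500 used here is a hypothetical placeholder, as is the function name `weighted_total`.

```python
def weighted_total(weights, values):
    """Simple weighted estimate: sum of wi * xi over all sample persons."""
    return sum(w * x for w, x in zip(weights, values))


BASE_WEIGHT = 250      # State A base weight from the example
N_EMPLOYED = 3_000     # employed sample persons from the example
N_OTHER = 1_500        # HYPOTHETICAL count of other responding persons

# xi = 1 for employed persons, 0 otherwise.
values = [1] * N_EMPLOYED + [0] * N_OTHER
weights = [BASE_WEIGHT] * len(values)

full_sum = weighted_total(weights, values)

# Restricting the sum to employed persons gives the same total,
# because every other person contributes wi * 0 = 0.
restricted_sum = sum(w for w, x in zip(weights, values) if x == 1)

print(full_sum, restricted_sum)  # 750000 750000
```

The result does not depend on the placeholder count, which is why the sum can be restricted to the employed.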