Indirect Sampling Jerilyn Boykin and Zhongxue Chen
Indirect Sampling • Introduction: What is indirect sampling? • Generalized Weight Sharing: A Unified Method • Some specific cases • Cross-Sectional Estimation (Ernst, 1989) • Multiplicity Estimation (Sirken, 1970) • Frames Containing an Unknown Amount of Duplication (Rao, 1968)
Indirect Sampling • Why indirect sampling? • Population $U^B$: a frame is not available; • Population $U^A$: a frame is available; • There is a relationship (link) between these two populations • The Generalized Weight Sharing Method (GWSM) is a unified method for indirect sampling developed by Lavallee (2002)
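GWSM: Notation • Populations: $U^A$ (frame available) and $U^B$ (target) • Number of units: $M^A$ and $M^B$ • Label of units: $j \in U^A$, $i \in U^B$ • Selected sample: $s^A \subset U^A$ • Link matrix: $\Theta^{AB} = [\ell_{j,i}]$, where $\ell_{j,i} = 1$ if unit $j \in U^A$ is linked to unit $i \in U^B$ and $\ell_{j,i} = 0$ otherwise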
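GWSM: Sampling • Select the sample $s^A$ from $U^A$ • $\pi^A_j > 0$ is the selection probability of unit $j$ • For each $j$ in $s^A$, identify the units $i$ in $U^B$ such that $\ell_{j,i} = 1$ • The set of units reached this way is $\Omega^B = \{ i \in U^B : \ell_{j,i} = 1 \text{ for some } j \in s^A \}$ • Want to estimate the total $Y^B = \sum_{i \in U^B} y_i$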
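GWSM: Estimation • Let $L^B_i = \sum_{j \in U^A} \ell_{j,i}$ be the number of links to unit $i \in U^B$ (assumed $L^B_i > 0$) • And $z_j = \sum_{i \in U^B} \ell_{j,i} \, y_i / L^B_i$ for $j \in U^A$ • Then $\sum_{j \in U^A} z_j = \sum_{i \in U^B} \frac{y_i}{L^B_i} \sum_{j \in U^A} \ell_{j,i} = Y^B$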
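GWSM: Estimation • Horvitz-Thompson: • Let $t_j = 1$ if $j \in s^A$ and $t_j = 0$ otherwise • HT-estimator: $\hat{Y}^B = \sum_{j \in U^A} \frac{t_j z_j}{\pi^A_j} = \sum_{j \in s^A} \frac{z_j}{\pi^A_j}$, which is unbiased for $Y^B$ since $E(t_j) = \pi^A_j$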
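A minimal Python sketch of the GWSM Horvitz-Thompson estimator, assuming a small hypothetical link matrix `links` (rows are units $j \in U^A$, columns are units $i \in U^B$) and simple random sampling without replacement (SRSWOR) from $U^A$. It computes $L^B_i$, the number of links to $i$, and $z_j = \sum_i \ell_{j,i} y_i / L^B_i$, then averages the estimate over every possible sample to show it recovers $Y^B$. The function names are illustrative only.

```python
from itertools import combinations


def link_counts(links, n_b):
    """L_i^B: number of units j in U^A linked to unit i in U^B."""
    return [sum(row[i] for row in links) for i in range(n_b)]


def z_values(links, y):
    """z_j = sum_i l_{j,i} * y_i / L_i^B for every unit j in U^A."""
    counts = link_counts(links, len(y))
    if any(c == 0 for c in counts):
        raise ValueError("every unit of U^B needs at least one link to U^A")
    return [sum(l * y_i / c for l, y_i, c in zip(row, y, counts)) for row in links]


def gwsm_ht(sample, pi, links, y):
    """HT estimator of Y^B: sum over j in s^A of z_j / pi_j."""
    z = z_values(links, y)
    return sum(z[j] / pi[j] for j in sample)


# Hypothetical example: 4 units in U^A, 3 units in U^B.
links = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
y = [10.0, 20.0, 30.0]          # Y^B = 60
N_A, n = 4, 2                   # SRSWOR of n units from U^A
pi = [n / N_A] * N_A

samples = list(combinations(range(N_A), n))
mean_estimate = sum(gwsm_ht(s, pi, links, y) for s in samples) / len(samples)
print(sum(z_values(links, y)), mean_estimate)  # 60.0 60.0
```

For simplicity `y` is given for all of $U^B$; in a survey only $y_i$ for $i \in \Omega^B$ is observed, but each $L^B_i$ still requires knowing all links from $U^A$ to $i$.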
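GWSM: Variance • Let $z_j = \sum_{i \in U^B} \ell_{j,i} \, y_i / L^B_i$, so that $\hat{Y}^B = \sum_{j \in s^A} z_j / \pi^A_j$ • Then $\hat{Y}^B$ is a Horvitz-Thompson estimator of $\sum_{j \in U^A} z_j = Y^B$ • Variance: $V(\hat{Y}^B) = \sum_{j \in U^A} \sum_{j' \in U^A} (\pi^A_{jj'} - \pi^A_j \pi^A_{j'}) \frac{z_j z_{j'}}{\pi^A_j \pi^A_{j'}}$ • Where $\pi^A_{jj'}$ is the joint selection probability of units $j$ and $j'$, with $\pi^A_{jj} = \pi^A_j$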
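GWSM: Variance Estimation • Variance estimation (Horvitz-Thompson): $\hat{V}(\hat{Y}^B) = \sum_{j \in s^A} \sum_{j' \in s^A} \frac{\pi^A_{jj'} - \pi^A_j \pi^A_{j'}}{\pi^A_{jj'}} \frac{z_j z_{j'}}{\pi^A_j \pi^A_{j'}}$, which is unbiased provided every $\pi^A_{jj'} > 0$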
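A hedged Python sketch of the GWSM variance and its Horvitz-Thompson estimator, assuming SRSWOR of $n$ units from $N_A$, so $\pi_j = n/N_A$ and $\pi_{jj'} = n(n-1)/(N_A(N_A-1))$ for $j \neq j'$. The link matrix and $y$ values are hypothetical; $z_j = \sum_i \ell_{j,i} y_i / L^B_i$ with $L^B_i$ the number of links to $i$. Enumerating all samples checks that the variance estimator averages to the true variance.

```python
from itertools import combinations


def z_values(links, y):
    """z_j = sum_i l_{j,i} * y_i / L_i^B, with L_i^B the number of links to i."""
    counts = [sum(row[i] for row in links) for i in range(len(y))]
    return [sum(l * y_i / c for l, y_i, c in zip(row, y, counts)) for row in links]


def ht_variance(z, pi, joint):
    """Design variance of sum_{j in s} z_j / pi_j (double sum over all of U^A)."""
    units = range(len(z))
    return sum(
        (joint(j, k) - pi[j] * pi[k]) * z[j] * z[k] / (pi[j] * pi[k])
        for j in units
        for k in units
    )


def ht_variance_estimate(sample, z, pi, joint):
    """HT variance estimator: same terms divided by pi_jk, summed over s^A only."""
    return sum(
        (joint(j, k) - pi[j] * pi[k]) / joint(j, k) * z[j] * z[k] / (pi[j] * pi[k])
        for j in sample
        for k in sample
    )


# Hypothetical example, SRSWOR of n = 2 units from N_A = 4.
links = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
y = [10.0, 20.0, 30.0]
N_A, n = 4, 2
pi = [n / N_A] * N_A
pi_pair = n * (n - 1) / (N_A * (N_A - 1))   # joint probability for j != k


def joint(j, k):
    """SRSWOR joint inclusion probability (pi_jj = pi_j on the diagonal)."""
    return pi[j] if j == k else pi_pair


z = z_values(links, y)
V = ht_variance(z, pi, joint)
samples = list(combinations(range(N_A), n))
mean_V_hat = sum(ht_variance_estimate(s, z, pi, joint) for s in samples) / len(samples)
print(round(V, 6), round(mean_V_hat, 6))  # 266.666667 266.666667
```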
Specific Examples • Monroe G. Sirken (1970), Multiplicity Estimation • J.N.K. Rao (1968), Sampling a Frame with an Unknown Amount of Duplication • L.R. Ernst (1989), Longitudinal Household Surveys
Household Surveys with Multiplicity (Sirken, 1970) • Estimate the number of individuals in the population with a certain attribute • A complete frame is not available • Sample households report information about their own residents as well as other persons who live elsewhere, such as • Relatives • Neighbors
Multiplicity Rule • Other persons are specified by a “multiplicity rule” adopted in the survey • Example: “siblings report each other” • The total number of households in the population reporting an individual is referred to as that individual's multiplicity • Under the sibling rule, the multiplicity of a person is the number of different households in which he or one of his siblings is a resident.
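The sibling rule can be made concrete with a short Python sketch, assuming each person's household and sibling group are known; both dictionaries below are hypothetical. A person's multiplicity is the number of distinct households containing that person or a sibling.

```python
def sibling_rule_multiplicities(household, sibship):
    """Multiplicity of each person under the rule "siblings report each other".

    household[p]: household where person p resides (hypothetical ids)
    sibship[p]:   id of p's sibling group; a person without siblings has a
                  group of their own
    Returns the number of distinct households in which p or a sibling resides.
    """
    households_of_sibship = {}
    for person, group in sibship.items():
        households_of_sibship.setdefault(group, set()).add(household[person])
    return {p: len(households_of_sibship[sibship[p]]) for p in household}


# Hypothetical population: A, B, C are siblings; D has no siblings.
household = {"A": "h1", "B": "h1", "C": "h2", "D": "h3"}
sibship = {"A": 1, "B": 1, "C": 1, "D": 2}
print(sibling_rule_multiplicities(household, sibship))
# {'A': 2, 'B': 2, 'C': 2, 'D': 1}
```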
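Some Notation… • Consider the conventional survey indicator variable: • $\delta_{\alpha j} = 1$ if individual $\alpha$ is a resident of household $j$ • $\delta_{\alpha j} = 0$ otherwise • and the multiplicity indicator: • $\delta'_{\alpha j} = 1$ if $\alpha$ is not a resident of $j$ but is reported by $j$ under the multiplicity rule • $\delta'_{\alpha j} = 0$ otherwise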
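Some Notation… • Number of individuals reported by household $j$ in the conventional survey: $x_j = \sum_{\alpha} \delta_{\alpha j}$ • Weighted number of individuals reported by household $j$ in the survey with multiplicity: $x'_j = \sum_{\alpha} \frac{\delta_{\alpha j} + \delta'_{\alpha j}}{s_\alpha}$ • where $s_\alpha = \sum_{j=1}^{N} (\delta_{\alpha j} + \delta'_{\alpha j})$ is the number of households reporting individual $\alpha$, • or the multiplicity of $\alpha$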
Some Notation… • Notice that the variate $x'_j$ based on the multiplicity survey requires the multiplicity $s_\alpha$ of every individual reported by household $j$.
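The Estimators • Assume a simple random sample $s$ of $n$ households is selected without replacement from the $N$ households in the population; then • $\hat{X} = \frac{N}{n} \sum_{j \in s} x_j$ is the estimate of $X$, the number of individuals with the attribute, derived from the conventional survey, and • $\hat{X}' = \frac{N}{n} \sum_{j \in s} x'_j$ is the estimate from the survey with multiplicity • Both are unbiased, since each individual resides in exactly one household and therefore $\sum_{j=1}^{N} x_j = \sum_{j=1}^{N} x'_j = X$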
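A minimal Python sketch of both estimators, assuming a tiny hypothetical population of $N = 5$ households and $X = 3$ individuals, with `reported_by` listing the non-resident households that report each individual under some multiplicity rule. It builds $x_j$ and $x'_j = \sum_\alpha (\delta_{\alpha j} + \delta'_{\alpha j})/s_\alpha$ with exact fractions, then averages $\hat{X}$ and $\hat{X}'$ over all SRSWOR samples of $n = 2$.

```python
from fractions import Fraction
from itertools import combinations


def conventional_variate(N, resident):
    """x_j: number of individuals with the attribute residing in household j."""
    x = [0] * N
    for home in resident.values():
        x[home] += 1
    return x


def multiplicity_variate(N, resident, reported_by):
    """x'_j: sum over individuals reported by household j of 1 / s_alpha."""
    x_prime = [Fraction(0)] * N
    for person, home in resident.items():
        reporting = {home} | reported_by[person]   # households reporting alpha
        s_alpha = len(reporting)                   # multiplicity of alpha
        for j in reporting:
            x_prime[j] += Fraction(1, s_alpha)
    return x_prime


def estimate(sample, variate, N):
    """(N / n) * sum of the variate over the sampled households."""
    return Fraction(N, len(sample)) * sum(variate[j] for j in sample)


# Hypothetical population of N = 5 households and X = 3 individuals.
N = 5
resident = {"A": 0, "B": 2, "C": 4}
reported_by = {"A": {1}, "B": {3, 4}, "C": set()}   # non-resident reporters

x = conventional_variate(N, resident)
x_prime = multiplicity_variate(N, resident, reported_by)
samples = list(combinations(range(N), 2))            # SRSWOR, n = 2
mean_conv = sum(estimate(s, x, N) for s in samples) / len(samples)
mean_mult = sum(estimate(s, x_prime, N) for s in samples) / len(samples)
print(mean_conv, mean_mult)  # 3 3
```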
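Variance • The variances of $\hat{X}$ and $\hat{X}'$ are $V(\hat{X}) = \frac{N^2 (1 - f)}{n} S^2$ and $V(\hat{X}') = \frac{N^2 (1 - f)}{n} S'^2$, where $f = n/N$ and $S^2$, $S'^2$ are the population variances (divisor $N - 1$) of $x_j$ and $x'_j$ • It follows that $V(\hat{X}') = (1 - G) \, V(\hat{X})$, where • $G = 1 - S'^2 / S^2$ is the relative gain in sampling efficiency resulting from the survey with multiplicity.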
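A hedged Python sketch of the relative gain, assuming SRSWOR and hypothetical household variates for $N = 5$ households in which both $x_j$ and $x'_j$ sum to $X = 3$. It computes $S^2$, $S'^2$, both variances $N^2(1-f)S^2/n$, and $G = 1 - S'^2/S^2$ exactly with fractions.

```python
from fractions import Fraction


def pop_variance(values):
    """Population variance S^2 with divisor N - 1."""
    N = len(values)
    mean = Fraction(sum(values)) / N
    return sum((v - mean) ** 2 for v in values) / (N - 1)


def srswor_variance(values, n):
    """V = N^2 (1 - f) S^2 / n for the estimator (N / n) * sum over the sample."""
    N = len(values)
    f = Fraction(n, N)
    return N ** 2 * (1 - f) * pop_variance(values) / n


# Hypothetical variates for N = 5 households (both total X = 3).
x = [1, 0, 1, 0, 1]
x_prime = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 3), Fraction(1, 3), Fraction(4, 3)]
n = 2
v_conv = srswor_variance(x, n)
v_mult = srswor_variance(x_prime, n)
gain = 1 - pop_variance(x_prime) / pop_variance(x)
print(v_conv, v_mult, gain)  # 9/4 21/16 5/12
```

Here spreading each individual over several reporting households smooths the household variate, so $S'^2 < S^2$ and the gain is positive; a different population or rule could give $G \le 0$.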
Surveys with Multiplicity • Household surveys with multiplicity are applicable whenever multiplicity rules can be devised that produce estimates having smaller MSEs than those from conventional surveys • Non-sampling error may be a problem, since households must report on persons living elsewhere and on their multiplicities
Sampling Theory When the Frame Contains an Unknown Amount of Duplication (Rao, 1968) • The problem arose in connection with a sample survey of beef cattle producers • A beef cattle operation could be run by an individual or by a partnership • A frame of operations was not available • Instead, a list frame of addresses of individuals believed to be beef cattle producers was used
Rao, 1968 • A questionnaire was mailed to a random sample of addresses, and a random subsample of nonrespondents was then followed up • Respondents identified as partnerships were asked to give the names and addresses of the partners and to complete only one questionnaire for the partnership • These names and addresses were used to determine the number of times an operation appeared in the list frame
Some Notation… • $n_1$ is the number of names in the sample of $n$ addresses that respond to the mail questionnaire • $n_2 = n - n_1$ is the number in the nonresponse group • Data are obtained by direct interview for a random subsample of $r$ of the $n_2$ nonrespondents
Some Notation… • $M$ is the unknown number of beef operations covered by the list frame • $Y = \sum_{i=1}^{M} Y_i$ is the population total of a characteristic attached to beef operations • $Y_i$ is the total attached to the $i$-th operation • $N = \sum_{i=1}^{M} m_i$ is the number of addresses on the list frame, and $m_i$ is the number of times the $i$-th operation is listed on the list frame.
Some Notation… • Let $y_a$ and $m_a$ denote the total $Y_i$ and the multiplicity $m_i$ of the operation $i$ contactable via the $a$-th sample address
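The Estimator • Because each operation $i$ is listed $m_i$ times, $Y = \sum_{a=1}^{N} y_a / m_a$, where the sum runs over all addresses on the list frame • Applying the Hansen-Hurwitz estimator to the variate $y_a / m_a$ therefore gives an unbiased estimate of $Y$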
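The Estimator • The unbiased estimator of $Y$ is $\hat{Y} = \frac{N}{n} \left( \sum_{i=1}^{d_1} f_i \frac{Y_i}{m_i} + \frac{n_2}{r} \sum_{i=1}^{d_2} g_i \frac{Y_i}{m_i} \right)$, where $d_1$ and $d_2$ denote the number of distinct operations in the respondent sample and in the subsample of nonrespondents • $f_i$ and $g_i$ are the number of times the $i$-th operation appears in the respondent sample and in the subsample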
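A minimal Python sketch of this estimator under stated assumptions: a hypothetical list frame of six addresses pointing to three operations, a fixed response status for each address (the response/nonresponse strata of the Hansen-Hurwitz setup), SRSWOR at both phases, and a subsample of about half the nonrespondents. It counts $m_i$, $f_i$ and $g_i$ directly, then averages over every first-phase sample and subsample to check that the estimator recovers $Y$.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def rao_estimate(N, n, n2, respondents, subsample, operation_of, Y):
    """Hansen-Hurwitz estimate of Y from a list frame with duplication.

    N, n         : list-frame size and first-phase sample size
    n2           : number of nonrespondents in the first-phase sample
    respondents  : sampled addresses that answered the mail questionnaire
    subsample    : the r nonrespondent addresses interviewed directly
    operation_of : operation_of[a] is the operation reached via address a
    Y            : Y[i] is the total of operation i
    """
    m = Counter(operation_of)                          # m_i: listings of operation i
    f = Counter(operation_of[a] for a in respondents)  # f_i
    g = Counter(operation_of[a] for a in subsample)    # g_i
    first = sum(f_i * Y[i] / m[i] for i, f_i in f.items())
    if n2 == 0:
        return N / n * first
    r = len(subsample)
    if r == 0:
        raise ValueError("need at least one interviewed nonrespondent")
    second = sum(g_i * Y[i] / m[i] for i, g_i in g.items())
    return N / n * (first + n2 / r * second)


# Hypothetical list frame: 6 addresses covering 3 operations (Y = 230).
operation_of = ["P1", "P1", "P2", "P3", "P3", "P3"]
Y = {"P1": 100.0, "P2": 40.0, "P3": 90.0}
responds = [True, False, False, True, True, False]   # fixed response status

print(rao_estimate(6, 4, 2, [0, 3], [5], operation_of, Y))  # 210.0

# Exact expectation: every SRSWOR first-phase sample of n = 4, then every
# subsample of r = ceil(n2 / 2) of its nonrespondents.
outer = []
for s in combinations(range(6), 4):
    resp = [a for a in s if responds[a]]
    nonresp = [a for a in s if not responds[a]]
    subs = list(combinations(nonresp, ceil(len(nonresp) / 2)))
    outer.append(sum(rao_estimate(6, 4, len(nonresp), resp, sub, operation_of, Y)
                     for sub in subs) / len(subs))
print(round(sum(outer) / len(outer), 6))  # 230.0
```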
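Variance • Writing $u_a = y_a / m_a$ and taking $k = n_2 / r$ fixed in advance, the variance of the estimator with multiplicity is given by the Hansen-Hurwitz formula $V(\hat{Y}) = \frac{N^2}{n} \left[ (1 - f) S_u^2 + W_2 (k - 1) S_{u,2}^2 \right]$, where $f = n/N$, $W_2$ is the proportion of list addresses in the nonresponse stratum, and $S_u^2$ and $S_{u,2}^2$ are the variances of $u_a$ over the whole list frame and over the nonresponse stratum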
The Estimator • Estimators that do not depend on $f_i$ and $g_i$ may be obtained using the concept of sufficiency in sampling theory • However, they are very cumbersome to compute for moderate to large sample sizes
Cross-Sectional Estimation from Longitudinal Household Surveys, Ernst (1989) • Longitudinal surveys follow what happens to households and families over time • The composition of households and families can change over time • Question: what weighting procedures should be used to obtain unbiased cross-sectional estimates?
Ernst (1989) • Take a month to be the basic unit of time • $H_t$ denotes the cross-sectional universe of households at month $t$ • $P_t$ is the set of persons residing in a household in $H_t$ • There are several rounds of interviews, every month or every few months • The initial sample is taken at month $t_0$ • The final interview month for the sample panel is month $T$
Ernst (1989) • An individual in a household chosen at month $t_0$ is an “original sample person” • For each month $t$, the sample consists of all original sample people in $P_t$ plus all other people residing with an original sample person • The latter group are “associated sample people”
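Longitudinal Household (LHH) • Each LHH is of the form $(h_{t_0}, h_{t_0+1}, \dots, h_T)$ • where $h_t$ is a given household at month $t$ • The definition has two parts: • For any two households in different months, specify which, if any, can be in the same LHH • Specify what kinds of LHHs can exist in $L$, the set of LHHs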
LHH • This paper considers the restriction that $L$ consists of a cohort of LHHs: • those in existence at month $t_0$, the initial LHHs, and • those formed after month $t_0$, which are generated by the initial LHHs
Obtaining Weights • Let $Y = \sum_{i \in L} y_i$ be the parameter of interest • The $i$-th unit has a known positive probability $p_i$ of being chosen • $Y$ would be estimated by $\hat{Y} = \sum_{i \in s} w_i y_i$, where $w_i = 1 / p_i$
Obtaining Weights • Subsequent LHHs are in sample only if at least one household member is an original sample person • To use the regular estimator we need to know these selection probabilities • Determining them is “operationally impossible”, since one would have to: • Determine the first-round household of each member of the current household • Compute the probability that at least one of these first-round households was selected
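Even in the simplest design, the last step is a combinatorial calculation that needs the first-round household of every current member. A hedged Python sketch, assuming (hypothetically) that first-round households were drawn by SRSWOR of $n$ out of $N$ and that the current members came from $k$ distinct first-round households; real panel designs (stratified, multistage) make this harder.

```python
from fractions import Fraction
from math import comb


def prob_any_selected(N, n, k):
    """P(at least one of k distinct first-round households is in an SRSWOR
    sample of n out of N households) = 1 - C(N - k, n) / C(N, n)."""
    return 1 - Fraction(comb(N - k, n), comb(N, n))


print(prob_any_selected(1000, 100, 1))  # 1/10
print(prob_any_selected(1000, 100, 2))  # 211/1110
```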
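Obtaining Weights • Write the estimator as $\hat{Y} = \sum_{i \in L} w_i y_i$, where $w_i = 0$ if the $i$-th LHH is not in sample • For this estimator to be unbiased it is only necessary that $E(w_i) = 1$ for every $i$ • Let $M$ be the number of individuals in the $i$-th LHH • Let $\pi_k$ denote the probability that the $k$-th individual's household is in sample at month $t_0$ • Their associated weight is $w_{ik} = 1 / \pi_k$ if that household is in sample and $w_{ik} = 0$ otherwise, so $E(w_{ik}) = 1$ whenever $\pi_k > 0$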
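Obtaining Weights • For the $i$-th LHH, associate a set of constants $\alpha_{ik}$, $k = 1, \dots, M$, that do not depend on the sample, with $\sum_{k=1}^{M} \alpha_{ik} = 1$ and $\alpha_{ik} = 0$ whenever $\pi_k = 0$ • The weight of the $i$-th LHH is $w_i = \sum_{k=1}^{M} \alpha_{ik} w_{ik}$, so that $E(w_i) = \sum_{k=1}^{M} \alpha_{ik} = 1$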
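A hedged Python sketch of this weight share, assuming SRSWOR of initial households and using equal shares among members who belonged to the initial universe as one hypothetical choice of the constants $\alpha_{ik}$; members outside the initial universe ($\pi_k = 0$) get $\alpha_{ik} = 0$. Enumerating every initial sample confirms $E(w_i) = 1$.

```python
from fractions import Fraction
from itertools import combinations


def equal_shares(members):
    """One possible (hypothetical) choice of alpha_ik: equal shares for members
    who belonged to the initial universe, 0 for everyone else."""
    eligible = [p for p, home in members.items() if home is not None]
    return {p: Fraction(1, len(eligible)) if home is not None else Fraction(0)
            for p, home in members.items()}


def lhh_weight(members, sampled, pi_household, alpha):
    """w_i = sum_k alpha_ik * w_ik, where w_ik = 1 / pi_k if member k's
    initial household was sampled at month t_0 and 0 otherwise."""
    w = Fraction(0)
    for person, home in members.items():
        if home is not None and home in sampled:
            w += alpha[person] / pi_household[home]
    return w


# Hypothetical LHH: p1 and p2 come from initial households 0 and 1;
# p3 joined later and was not in the initial universe (pi = 0).
members = {"p1": 0, "p2": 1, "p3": None}
N, n = 4, 2                                   # SRSWOR of initial households
pi_household = {h: Fraction(n, N) for h in range(N)}
alpha = equal_shares(members)
samples = [set(s) for s in combinations(range(N), n)]
expected_w = sum(lhh_weight(members, s, pi_household, alpha) for s in samples) / len(samples)
print(expected_w)  # 1
```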
References • Lavallee, P. and Caron, P. Estimation using the generalised weight share method: the case of record linkage. http://www.statcan.ca/english/ads/12-001-XXPB/pdf/27_2_lavallee_e.pdf • Deville, J.-C. and Maumy, M. A new survey methodology for describing tourism activities and expanses. http://www.tourismforum.scb.se/papers/PapersSelected/CS/Paper33FRANCE/Deville_Maumy_article.pdf • Ernst, L.R. (1989). Weighting issues for longitudinal household and family estimates. In Panel Surveys. • Rao, J.N.K. (1968). Some nonresponse sampling theory when the frame contains an unknown amount of duplication. Journal of the American Statistical Association. • Sirken, M.G. (1970). Household surveys with multiplicity. Journal of the American Statistical Association.