Sample Design on Historical Census Projects at the University of Minnesota
Ron Goeken
Sample design issues
• Goal is to sample entire households (or dwellings)
• Every individual has an equal probability of being sampled
• Practicality
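Basic sample design
• Every household is defined as a cluster
• Samples are also stratified
• A household is included in the sample only if its first listed person falls on a sample line
• Probability of selection: each of the household's $n$ members falls on a sample line with probability $p$, so the household receives $np$ sample points on average, and only the $1/n$ share that lands on the first person selects it:
  $P = np \cdot \frac{1}{n} = p$
  • 3-person household: $3 \times 0.01 \times \frac{1}{3} = 0.01$
  • 8-person household: $8 \times 0.01 \times \frac{1}{8} = 0.01$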
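As a check on the arithmetic above, the Python sketch below computes every person's inclusion probability exactly. It assumes, for illustration only, that sample points fall on every k-th manuscript line starting from a uniformly random line in 0..k-1 (the slides do not state the sample-point scheme), and averages over all k possible starts. The function names are hypothetical.

```python
from fractions import Fraction


def slide_formula(n: int, p: Fraction) -> Fraction:
    # n members, each a sample point with probability p (n*p expected hits);
    # only a hit on the first member keeps the household (1/n of the hits).
    return n * p * Fraction(1, n)


def household_starts(sizes: list[int]) -> list[int]:
    # Manuscript line of the first listed person in each household.
    starts, line = [], 0
    for n in sizes:
        starts.append(line)
        line += n
    return starts


def person_inclusion_probabilities(sizes: list[int], k: int) -> list[Fraction]:
    # ASSUMPTION: systematic sample points on every k-th line from a random
    # start; averaging over all k starts gives exact probabilities.
    firsts = household_starts(sizes)
    hits = [0] * sum(sizes)
    for start in range(k):
        for first, n in zip(firsts, sizes):
            if (first - start) % k == 0:  # first person is on a sample line
                for line in range(first, first + n):  # take the whole household
                    hits[line] += 1
    return [Fraction(h, k) for h in hits]
```

For households of 3, 8, 1 and 5 persons with k = 100, each of the 17 persons ends up with probability 1/100, so every individual has the same chance of selection whatever the size of their household, as the slide's formula says.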
Group Quarters
• Not practical to sample large institutions in their entirety
• A better approach is to apply individual-level sampling when household size exceeds a predetermined threshold
Sampling rules – dwellings/households
1. If the dwelling contains 30 or fewer residents:
   a) accept the entire dwelling if the sample point falls on the first listed individual in the dwelling
   b) reject the entire dwelling if the sample point falls on any other dwelling resident
2. If the dwelling contains 31 or more residents and the household contains 30 or fewer persons:
   a) accept the entire household if the sample point falls on the household head
   b) reject the entire household if the sample point falls on any other household member
Sampling rules – group quarters
3. If the household contains 31 or more persons:
   • accept individuals on sample lines
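Rules 1 to 3 can be read as a single decision on each sample point. The Python sketch below encodes them; the function, the `Decision` enum and the boolean arguments describing the person the sample point falls on are a hypothetical interface, not the projects' actual data-entry software.

```python
from enum import Enum

THRESHOLD = 30  # largest unit sampled as a whole cluster, per rules 1-3


class Decision(Enum):
    ACCEPT_DWELLING = "accept entire dwelling"
    ACCEPT_HOUSEHOLD = "accept entire household"
    ACCEPT_INDIVIDUAL = "accept individual on sample line"
    REJECT = "reject"


def apply_sampling_rule(dwelling_size: int, household_size: int,
                        is_first_in_dwelling: bool,
                        is_household_head: bool) -> Decision:
    """Outcome for one sample point (hypothetical interface)."""
    if household_size > dwelling_size:
        raise ValueError("a household cannot be larger than its dwelling")
    if dwelling_size <= THRESHOLD:
        # Rule 1: whole dwelling, keyed on its first listed resident.
        return Decision.ACCEPT_DWELLING if is_first_in_dwelling else Decision.REJECT
    if household_size <= THRESHOLD:
        # Rule 2: large dwelling, ordinary-sized household, keyed on the head.
        return Decision.ACCEPT_HOUSEHOLD if is_household_head else Decision.REJECT
    # Rule 3: group quarters (31+ persons), individual-level sampling.
    return Decision.ACCEPT_INDIVIDUAL
```

Checking the dwelling size before the household size mirrors the order of the rules: a household is only examined on its own once its dwelling is too large to take whole.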
Target and Actual Sample Densities for Completed Historical Samples
Sample Confidence Interval
• Estimating the number of sample clusters in the total population
$$\frac{\text{\# of sampled person records}}{\text{total population}} = \frac{\text{\# of sample clusters}}{\text{total \# of clusters}}$$
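The slide gives the proportion but not the interval itself. The Python sketch below solves the proportion for the total number of clusters and then, as an assumption not stated in the slides, treats the number of sample clusters as binomial (each cluster selected independently with the target density p) and uses a normal approximation to put an interval around its expected value. Function names and the example figures are illustrative only.

```python
import math


def estimate_total_clusters(sampled_persons: int, total_population: int,
                            sample_clusters: int) -> float:
    # Solve sampled/total_population = sample_clusters/total_clusters.
    return sample_clusters * total_population / sampled_persons


def sample_cluster_interval(total_clusters: float, p: float,
                            z: float = 1.96) -> tuple[float, float]:
    # ASSUMPTION: binomial count of selected clusters, normal approximation.
    mean = total_clusters * p
    half_width = z * math.sqrt(total_clusters * p * (1 - p))
    return mean - half_width, mean + half_width
```

With 50,000 sampled persons in 10,000 sample clusters from a population of 5,000,000, the estimated total is 1,000,000 clusters, and at p = 0.01 the interval is roughly 9,805 to 10,195 sample clusters. An interval built on a cluster total estimated from the same sample is only a rough check; a cluster count taken from an independent source would be more informative.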
Sources of under-sampling
• Some enumerator manuscripts were never microfilmed
• Data entry errors
• Processing procedures can delete records but rarely add them
• Ambiguity on enumerator manuscripts
Assigning Weights
• Each sampled individual represents X individuals in the total population
• We have typically assigned weights at the national level
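The weight X is the ratio of the population to the number of sampled persons, and estimates are formed by scaling sample counts by it. The minimal Python sketch below assumes a single national weight applied to everyone; the function names and figures are illustrative, not taken from the projects.

```python
def person_weight(total_population: int, sampled_persons: int) -> float:
    """Number of people each sampled person represents (the slide's X)."""
    if sampled_persons <= 0:
        raise ValueError("sample must contain at least one person")
    return total_population / sampled_persons


def estimate_count(sample_count: int, weight: float) -> float:
    """Scale a count of sampled persons up to a population estimate."""
    return sample_count * weight
```

For example, 50,000 sampled persons from a population of 5,000,000 gives a weight of 100, so 1,200 sampled persons with some characteristic would stand for an estimated 120,000 people.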
County Level or SEA Level Weights
1. Weight at the county level if the county population exceeds 10,000 and either:
   A. all other counties in the SEA (State Economic Area) have populations exceeding 10,000, or
   B. the combined population of the counties with populations under 10,000 is 10,000 or more
2. If neither condition holds, weight at the SEA level
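The county-or-SEA rule is a small per-county decision. In the Python sketch below, an SEA is represented as a hypothetical mapping from county name to population. The slide does not say where a county of exactly 10,000 belongs; the sketch assumes "under 10,000" means "not exceeding 10,000", so that every county falls into one of the two groups.

```python
LIMIT = 10_000


def weighting_level(county: str, sea: dict[str, int], limit: int = LIMIT) -> str:
    """Return "county" or "SEA" for one county in an SEA (hypothetical API)."""
    if sea[county] <= limit:
        return "SEA"  # rule 1 requires the county itself to exceed the limit
    others = [pop for name, pop in sea.items() if name != county]
    # Condition A: every other county in the SEA exceeds the limit.
    all_others_large = all(pop > limit for pop in others)
    # Condition B: the small counties together reach the limit.
    # ASSUMPTION: a county of exactly `limit` is counted as small.
    small_total = sum(pop for pop in sea.values() if pop <= limit)
    if all_others_large or small_total >= limit:
        return "county"
    return "SEA"
```

In an SEA with counties of 25,000, 15,000, 4,000 and 3,000, the small counties total only 7,000, so even the 25,000 county is weighted at the SEA level. If the two small counties held 6,000 and 5,000 instead (11,000 combined), the two large counties would get county-level weights.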
Conclusion
• Sample designs are fairly straightforward in theory, but source materials and processing procedures result in under-sampling bias
• Detailed weights based on county or SEA populations should, in theory, improve precision