COVERAGE AND SAMPLING

COVERAGE AND SAMPLING Damon Burton University of Idaho

What does each of these important sampling terms mean?

ESSENTIAL SAMPLING DEFINITIONS • Survey Population -- consists of all units (i.e., individuals, households, organizations) to which one desires to generalize survey results. • Sample Frame -- the list from which a sample is to be drawn in order to represent the survey population. • Sample -- consists of all units of the population that are drawn for inclusion in the survey.

ESSENTIAL SAMPLING DEFINITIONS • Completed Sample -- consists of all units (i.e., persons) that complete the survey. • Coverage Error -- results when not every unit in the survey population has a known, nonzero chance of being included in the sample. • Sampling Error -- is the result of collecting data from only a subset, rather than all, members of the sampling frame.

COVERAGE CONSIDERATIONS • Telephone coverage • Internet coverage • Mail coverage

TELEPHONE COVERAGE • In 2000, telephones were regarded as the best survey mode for general surveys because of • high coverage (i.e., 90% of Americans had phones), • Random Digit Dialing (RDD) procedures that allowed sampling of most phone users, and • people's amenability to answering survey questions over the phone. • By 2003, half of all US citizens used cell phones, and by 2007, 16% had only cell service. • Today almost 20% of US adults would be excluded by RDD sampling procedures.

INTERNET COVERAGE • The internet is a useful mode for conducting surveys for specific populations who have service (e.g., students, professionals, & businesses), but it has significant coverage gaps with the general population. • As of 2007, only 71% of Americans used the internet at least occasionally. • Only 67% had internet service in their homes. • Only 47% had high-speed home internet service with 23% having dial-up and 29% having no home internet access. • Internet growth seems to be slowing.

PROBLEMS WITH INTERNET FOR POPULATION SURVEYS • No list of all, or most, internet subscribers is available (i.e., sampling frame). • No simple procedure is available for drawing samples in which individuals, or households, have a known, nonzero chance of inclusion. • People’s ability to use the internet varies significantly, even in households with good access. • Because internet providers are private, not public, legal and cultural barriers prevent contacting randomly generated email addresses. • Web surveyors often use self-selected panels of respondents, creating a number of sampling issues.

MAIL COVERAGE • Phone books once were good sources of addresses for mail surveys. • By 1990, 25% of households had unlisted numbers, and cell phone-only households have since risen sharply. • Address-based sampling has become more feasible with US Postal Service Delivery Sequence File (DSF) lists. • The DSF is an electronic file containing all delivery point addresses serviced by the USPS. • Names are provided for addresses except PO boxes. • The DSF can’t tell homes from businesses. • Geocoding is possible for stratified sampling or targeting specific populations. • The DSF is available through vendors, each of which has its own process for managing and updating lists.

MAIL COVERAGE • Missing addresses for multiperson dwellings (e.g., apartments) are problematic. • Initial evaluations have shown DSF mail surveys with a reminder mailing resulted in 4-7% higher response rates than RDD surveys. • RDD and DSF surveys overrepresent white, non-Hispanic individuals with higher education levels who are married. • Other lists sometimes used include licensed drivers, utility users, registered voters, and homeowners. • General lists may be compiled from multiple sources, including: credit card holders, telephone directories, magazine subscribers, bank depositors, organization membership lists, catalog and internet customers, and other sources.

What are the major coverage issues for phone, internet and mail surveys?

REDUCING COVERAGE ERRORS • Many surveys are designed for special populations. • You need to know how a specific list is compiled, maintained and used. • 5 important questions to ask about any potential sampling list.

COVERAGE QUESTION 1 • Does the list contain everyone in the survey population? • If not, determine whether getting the remainder of the people on the list is possible. • Evaluate the consequences of not obtaining excluded names.

COVERAGE QUESTION 2 • Does the list include names of people who are not in the study population? • If so, learning up front exactly who is on the list, and why, lets you ask respondents only the questions appropriate for them. • This targeting strategy saves valuable resources.
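Coverage Questions 1 and 2 can be sketched as a pair of set comparisons. This is only a minimal illustration: it assumes you have some reference roster of the survey population to compare the list against, which is often not the case in practice, and the function name and IDs are hypothetical.

```python
def coverage_report(population_ids, frame_ids):
    """Compare a sampling list (frame) with a reference roster of the population."""
    population = set(population_ids)
    frame = set(frame_ids)
    missing = population - frame      # Question 1: in the population, not on the list
    ineligible = frame - population   # Question 2: on the list, not in the population
    covered = population & frame
    rate = len(covered) / len(population) if population else 0.0
    return {"missing": missing, "ineligible": ineligible, "coverage_rate": rate}


# Hypothetical example: unit "a" is missing from the list, unit "x" does not belong.
report = coverage_report(["a", "b", "c", "d"], ["b", "c", "d", "x"])
print(report)
```

The "missing" set is what you would try to obtain (or whose absence you would evaluate); the "ineligible" set is what you would screen out or route to different questions.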

COVERAGE QUESTION 3 • How is the list maintained and updated? • You may need to check the accuracy of addresses before surveying. • Accuracy depends on continual updating of addresses on list.

COVERAGE QUESTION 4 • Are the same sample units included on the list more than once? • Customers’ names may be added to the list each time they order if a slightly different name or address is given. • Divorced parents often appear on the list twice, whereas married parents appear only once.
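One rough way to catch the duplicates described in Coverage Question 4 is to compare entries after normalizing case, punctuation, and spacing. The sketch below assumes each list entry is a (name, address) pair; the normalization rule and the sample records are hypothetical, and real lists usually need fuzzier matching than this.

```python
import re


def normalize(name, address):
    """Build a comparison key that ignores case, punctuation, and extra spaces."""
    def clean(text):
        text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
        return " ".join(text.split())
    return (clean(name), clean(address))


def deduplicate(records):
    """Keep the first occurrence of each unit, judged by its normalized key."""
    seen = set()
    unique = []
    for name, address in records:
        key = normalize(name, address)
        if key not in seen:
            seen.add(key)
            unique.append((name, address))
    return unique


# Hypothetical list: the first two entries are the same customer typed differently.
frame = [
    ("Pat Lee", "12 Oak St."),
    ("pat  lee", "12 Oak St"),
    ("Sam Roy", "4 Elm Ave"),
]
unique_frame = deduplicate(frame)
print(unique_frame)
```

Removing duplicates matters because a unit listed twice has twice the chance of being drawn, which violates the goal of each unit appearing on the frame only once.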

COVERAGE QUESTION 5 • Does the list contain other information that can be used to improve the survey? • Use mixed modes for different aspects of the survey process. • Age and gender can be used to identify nonresponse error. • What other information would be valuable?

RESPONDENT SELECTION • Samples drawn from phone books in the 1970s-1990s typically produced a higher proportion of male respondents, even when letters requested that a female complete the survey. • Women are more likely to participate in phone surveys because they answer the phone more often. • Surveyors commonly ask for “the adult with the most recent birthday” to randomize respondents in the household. • Other surveys target “the individual who shops for groceries most often,” “who makes the investment decisions,” or “who purchases the computer.”
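The "most recent birthday" rule can be illustrated with a short sketch. In a real survey the household applies the rule itself; the code below only shows the logic, assuming each adult is given as a hypothetical (name, birth month, birth day) tuple and that a February 29 birthday is treated as February 28 in non-leap years.

```python
from datetime import date


def days_since_birthday(month, day, today):
    """Days elapsed since the most recent occurrence of a birthday."""
    def birthday_in(year):
        try:
            return date(year, month, day)
        except ValueError:
            # Assumption: only Feb 29 in a non-leap year lands here.
            return date(year, 2, 28)

    this_year = birthday_in(today.year)
    last = this_year if this_year <= today else birthday_in(today.year - 1)
    return (today - last).days


def most_recent_birthday(adults, today):
    """Return the name of the adult whose birthday passed most recently."""
    return min(adults, key=lambda a: days_since_birthday(a[1], a[2], today))[0]


# Hypothetical household, evaluated on 15 August 2024.
household = [("Ana", 3, 14), ("Ben", 11, 2), ("Cy", 7, 30)]
chosen = most_recent_birthday(household, date(2024, 8, 15))
print(chosen)
```

Because birthdays are unrelated to who tends to answer the phone or open the mail, the rule approximates a random pick within the household without requiring a full roster of its members.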

COVERAGE OUTCOMES • The goal is that every unit in the survey population appears on the sample frame list only once, so the survey population is prepared for actual sampling. • Often researchers must decide what amount of coverage is acceptable. • Do alternatives exist? What is the cost of those alternatives? Can the coverage error be accurately assessed? • Mixed mode surveys are a possibility. For example, most of the survey is conducted on the internet, but hard copy surveys are mailed to the portion of the sample that doesn’t have internet access.

PROBABILITY SAMPLING • Sampling error is the type of error that occurs because information is requested from only a sample of the population rather than the entire population. • The first step in drawing a sample is to understand the number of properly selected respondents necessary for generalizing results to the population and with what degree of accuracy.

How do I calculate the desired sample size for a survey study?

HOW LARGE SHOULD A SAMPLE BE? • The size of the sample, not the proportion sampled, is what affects precision. • The formula takes into account • How much sampling error can be tolerated within a given confidence interval, • The amount of confidence one wishes to have in the estimates, • How varied the population is with respect to the characteristic of interest, and • The size of the population from which the sample is drawn.

SAMPLE SIZE • $N_s = \dfrac{(N_p)(p)(1-p)}{(N_p-1)(B/C)^2 + (p)(1-p)}$ • Formula terms • $N_s$ = the completed sample size needed for the desired level of precision, • $N_p$ = the size of the population, • $p$ = the proportion of the population expected to choose one of the 2 response categories, • $B$ = margin of error (i.e., half of the desired confidence interval width such as 3%), • $C$ = Z score associated with the confidence level (i.e., 1.96 corresponds with a 95% confidence level).
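The sample size formula above can be turned into a small calculator. This is a minimal sketch: the function and parameter names are my own, the margin of error B is assumed to be entered as a proportion (0.03 for 3%), and the defaults of p = 0.5 and C = 1.96 are illustrative choices (p = 0.5 is the most conservative split because it maximizes p(1-p)).

```python
import math


def completed_sample_size(population_size, margin_of_error, p=0.5, z=1.96):
    """Completed sample size for a given population size, B, p, and C (z)."""
    variance = p * (1 - p)
    numerator = population_size * variance
    denominator = (population_size - 1) * (margin_of_error / z) ** 2 + variance
    # Round up, since a fraction of a respondent cannot be surveyed.
    return math.ceil(numerator / denominator)


# Hypothetical populations, all at a 3% margin of error and 95% confidence.
for size in (1_000, 10_000, 1_000_000):
    print(size, completed_sample_size(size, 0.03))
```

Comparing the results for 10,000 and 1,000,000 shows how little the required completed sample grows once the population is large.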

How do I draw a good simple random sample?

5 SAMPLING PREMISES • Relatively few completed questionnaires can provide surprising precision at a high level of confidence. • Among large populations, there is virtually no difference in the completed sample size needed for a given level of precision. • Within small populations, greater proportions of the population need to be surveyed to achieve estimates with a given margin of error. • At larger sample sizes, increasing your sample size yields smaller and smaller reductions in margin of error. • Completed sample sizes must be much larger if one wants to make precise estimates for subgroups of the population.
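The diminishing-returns premise can be checked numerically by solving the sample size formula for the margin of error B, which gives B = C * sqrt(p(1-p)/Ns * (Np-Ns)/(Np-1)). The sketch below assumes that rearrangement; the function name, the defaults (p = 0.5, C = 1.96), and the population of one million are illustrative assumptions.

```python
import math


def margin_of_error(completed, population_size, p=0.5, z=1.96):
    """Margin of error B implied by a completed sample size."""
    variance = p * (1 - p)
    # Finite population correction: shrinks the error as the sample nears the population.
    fpc = (population_size - completed) / (population_size - 1)
    return z * math.sqrt(variance / completed * fpc)


# Quadrupling the sample each time roughly halves the margin of error.
sizes = (100, 400, 1_600)
margins = [margin_of_error(n, 1_000_000) for n in sizes]
for n, b in zip(sizes, margins):
    print(n, round(b, 4))
```

Going from 100 to 400 completes buys a much larger drop in margin of error than going from 400 to 1,600, even though the second step adds four times as many questionnaires.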

DRAWING A SIMPLE RANDOM SAMPLE • Typically numbers are assigned to every member of the sample frame, and the computer randomly selects a certain number of respondents. • Sometimes comparisons require sampling different segments of the population unequally. Comparing employees who have worked for a company more or less than 6 months requires weighted sampling. • Because employees with less than 6 months of service represent only 5% of the workforce, a simple random sample would be expected to contain about 19 veteran employees for every new one. If you need equal numbers for these 2 groups, you’ll need to sample a higher percentage of new than older employees.
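Both approaches on this slide can be sketched with Python's standard `random` module. This is an illustration under stated assumptions, not a prescribed procedure: the function names, the workforce of 1,000 with 5% new employees, and the group size of 40 are all hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def equal_allocation_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Draw the same number of units from each stratum (unequal sampling rates)."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = {}
    for name, units in strata.items():
        k = min(n_per_stratum, len(units))  # cannot draw more than the stratum holds
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical workforce of 1,000: 50 new (< 6 months) and 950 veteran employees.
employees = [(i, "new" if i < 50 else "veteran") for i in range(1000)]
srs = simple_random_sample(employees, 100, seed=1)
by_group = equal_allocation_sample(employees, lambda e: e[1], 40, seed=1)
print(len(srs), {name: len(units) for name, units in by_group.items()})
```

Here 40 of 50 new employees (80%) are sampled but only 40 of 950 veterans (about 4%), so any estimate that pools the two groups would need weights that undo the unequal selection rates.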

The End
