CHALLENGES IN VISITOR-VOLUME ESTIMATION: OVERVIEW

Presented By: Dr. Michael Kaylen, University of Missouri

Outline
• Statistical Background: Estimators, Projection Weights, and Properties of Estimators
• Sources of Error in a Survey: Frame vs. Target Populations, Non-Response Bias
• Travel-Related Measures: Travel Parties, Household Trips, Person Trips, Person Nights, Traveler Expenditures
• A Note on Sample Size
• A Note on Stratification
• Considerations

Statistical Background: Properties of Estimators for Population Parameters
• Population parameters are characteristics of populations.
Example: Missouri is interested in the population of households in the continental U.S. (excluding those in Missouri) at the start of the first quarter of 2008. A parameter of interest is the total number of trips those households took to Missouri during the first quarter of 2008. If there are $N$ households in the population of interest and $y_h$ is the number of trips household $h$ made to Missouri, then the parameter is given by:
$$Y = \sum_{h=1}^{N} y_h$$

Statistical Background: Properties of Estimators for Population Parameters
• Estimators for unknown parameters are functions of the elements of a random sample.
Example: For the Missouri case, suppose the sampling design is such that the probability of household $h$'s inclusion in the sample set $S$ is $\pi_h$. The "design weight" is $w_h = 1/\pi_h$, and an estimator for $Y$ is
$$\hat{Y} = \sum_{h \in S} w_h y_h = \sum_{h \in S} \frac{y_h}{\pi_h}$$
If we randomly sample one out of every 1,000 households in a stratum, the inclusion probability is 1/1,000 and the design weight ("projection weight") for every household in that stratum is 1,000.
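The design-weighted estimator above (the Horvitz-Thompson estimator) can be sketched in a few lines. This is a minimal illustration, not any data provider's implementation; the helper name `ht_total` and the five sampled trip counts are hypothetical.

```python
from typing import Sequence

def ht_total(trips: Sequence[float], inclusion_probs: Sequence[float]) -> float:
    """Design-weighted (Horvitz-Thompson) estimate of a population total.

    trips[i] is y_h for the i-th sampled household and inclusion_probs[i] is its pi_h.
    """
    if len(trips) != len(inclusion_probs):
        raise ValueError("trips and inclusion_probs must have the same length")
    total = 0.0
    for y_h, pi_h in zip(trips, inclusion_probs):
        if not 0.0 < pi_h <= 1.0:
            raise ValueError("each inclusion probability must lie in (0, 1]")
        w_h = 1.0 / pi_h  # design ("projection") weight
        total += w_h * y_h
    return total

# Hypothetical sample of five households, each drawn with probability 1/1,000.
sample_trips = [0, 0, 1, 0, 2]
print(ht_total(sample_trips, [1 / 1000] * len(sample_trips)))  # about 3,000 trips
```

Each sampled household stands in for 1,000 households, so the 3 trips observed in the sample project to roughly 3,000 trips in the population.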

Statistical Background: Properties of Estimators for Population Parameters
• Three properties of estimators:
  • Bias: the expected (average) difference between an estimator and the parameter, $\mathrm{Bias}(\hat{Y}) = E[\hat{Y}] - Y$. When we take a sample and calculate the value of an estimator for that sample, we have an estimate. The difference between that estimate and the true value of the parameter is referred to as sampling error. An estimator is unbiased if its average sampling error is zero.
  • Variance: the expected (average) squared difference between an estimator and its expected (average) value, $\mathrm{Var}(\hat{Y}) = E[(\hat{Y} - E[\hat{Y}])^2]$.
  • Mean Squared Error: the expected (average) squared difference between an estimator and the parameter, $\mathrm{MSE}(\hat{Y}) = E[(\hat{Y} - Y)^2]$.
• Note: $\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Variance}$
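The decomposition $\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Variance}$ can be checked empirically by drawing many samples and treating the resulting estimates as draws of the estimator. The sketch below assumes a hypothetical population of 10,000 households in which roughly 1% took one trip, and uses simple random sampling with weight $N/n$; the helper `bias_variance_mse` is illustrative only.

```python
import random
import statistics

def bias_variance_mse(estimates, true_value):
    """Empirical bias, variance, and MSE of repeated estimates of a known parameter."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_value
    # Population variance (divisor = number of estimates), so MSE = bias**2 + variance exactly.
    variance = statistics.pvariance(estimates, mu=mean_est)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse

# Toy check: four estimates of a parameter whose true value is 10.
print(bias_variance_mse([8, 10, 12, 14], 10))  # (1.0, 5.0, 6.0): 1.0**2 + 5.0 == 6.0

# Hypothetical population of 10,000 households; roughly 1% took one trip (assumed values).
rng = random.Random(2008)
population = [1 if rng.random() < 0.01 else 0 for _ in range(10_000)]
Y = sum(population)  # the parameter: total trips
n = 500
# Repeated simple random samples without replacement; each estimate uses weight N/n = 20.
estimates = [len(population) / n * sum(rng.sample(population, n)) for _ in range(1_000)]
bias, variance, mse = bias_variance_mse(estimates, Y)
print(round(bias, 2), round(variance, 1), round(mse, 1))
```

Because the design-weighted estimator is unbiased under this design, the simulated bias should sit close to zero and nearly all of the MSE comes from variance.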

SOURCES OF ERROR IN A SURVEY
[Figure: diagram built up in stages showing the Target population, the Frame population, the Sample drawn from the frame, and the Sample split into a Response set and a Nonresponse set]

Travel-Related Measures

A Note on Sample Size
Recently found on the Web: "For results based on this sample of 2,679 registered voters, the maximum margin of sampling error is ±2 percentage points."

A Note on Sample Size
• Margin of sampling error = radius of the confidence interval for a statistic from a survey, usually a 95% confidence interval.
• Example: the 95% confidence interval for the percentage favoring Obama is 48% ± 2%.
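The quoted "maximum" margin of error can be reproduced, at least approximately, with the textbook formula for a proportion. This sketch assumes simple random sampling and the normal approximation (z = 1.96 for 95%); the poll's actual method is not stated, and the helper `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of sampling error for an estimated proportion p.

    Assumes simple random sampling and the normal approximation; ignores design
    effects and the finite-population correction.
    """
    return z * math.sqrt(p * (1.0 - p) / n)

# p * (1 - p) is largest at p = 0.5, which gives the "maximum" margin of error.
me_max = margin_of_error(0.5, 2679)
print(round(me_max, 4))  # 0.0189, i.e. about 2 percentage points
```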

A Note on Sample Size
• Why do travel-volume estimates need large samples?
• Answer: the relative margin of error matters.
• If an estimated proportion is p and the margin of error is ME, the relative margin of error is RME = ME/p.
• In the example, RME = 0.02/0.48 ≈ 0.042, so the ME is about 4.2% of p.

A Note on Sample Size
• Travel example: from past studies, we know that about 1% of households in the continental U.S. (excluding MO) visit MO in any given month. To estimate that percentage for a given month, we need a much narrower confidence interval than 1% ± 2%! In this case, RME = 0.02/0.01 = 2, so the ME is 200% of the estimated percentage.
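The two RME calculations above, plus the RME implied by a given sample size, can be sketched as follows. The second helper assumes simple random sampling and the normal approximation; both function names are hypothetical.

```python
import math

def relative_margin_of_error(me: float, p: float) -> float:
    """RME = ME / p."""
    if p <= 0.0:
        raise ValueError("p must be positive")
    return me / p

def rme_for_sample(p: float, n: int, z: float = 1.96) -> float:
    """RME implied by a simple random sample of size n when the true proportion is near p
    (normal approximation): z * sqrt(p(1-p)/n) / p = z * sqrt((1-p)/(p*n))."""
    return z * math.sqrt((1.0 - p) / (p * n))

print(round(relative_margin_of_error(0.02, 0.48), 3))  # 0.042 (poll example)
print(relative_margin_of_error(0.02, 0.01))            # 2.0 (monthly MO visitation example)
# Same n = 2,679, but a proportion near 48% versus near 1%:
print(round(rme_for_sample(0.48, 2679), 3), round(rme_for_sample(0.01, 2679), 3))
```

Under these assumptions the last line shows the same point from the other direction: with 2,679 sampled units, a proportion near 1% carries an RME nearly ten times that of a proportion near 48%.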

A Note on Sample Size
How large an RME can be tolerated? How large a sample is needed to achieve it?
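Setting ME = RME · p in the margin-of-error formula for a proportion and solving for n gives n = z²(1 − p)/(RME² · p). A minimal sketch, assuming simple random sampling of households, the normal approximation, and full response; the target RME values looped over are illustrative choices, and `required_sample_size` is a hypothetical helper.

```python
import math

def required_sample_size(p: float, target_rme: float, z: float = 1.96) -> int:
    """Smallest simple-random-sample size whose 95% margin of error for a proportion
    near p is at most target_rme * p (normal approximation)."""
    if not (0.0 < p < 1.0 and target_rme > 0.0):
        raise ValueError("need 0 < p < 1 and target_rme > 0")
    # ME = z*sqrt(p(1-p)/n) and ME = RME*p  =>  n = z^2 (1-p) / (RME^2 p)
    return math.ceil(z ** 2 * (1.0 - p) / (target_rme ** 2 * p))

# Monthly visitation near 1% of households, for several tolerable RMEs.
for rme in (0.50, 0.25, 0.10):
    print(rme, required_sample_size(0.01, rme))
```

For a 1% monthly visitation rate, an RME of 10% already calls for about 38,000 responding households under these assumptions; non-response and design effects would push the number of households contacted higher still.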

A Note on Stratification
Data providers often use sampling designs based on stratification by demographic variables such as household income, region, and education. There are two issues the user might want to consider.
• Even though the final weights balance the sample for non-response, the increased variance due to non-response may be important. For example, consider a stratum calling for 10 households to be sampled, each representing 1,000 households. If only 5 of the households respond to the survey, we'll end up with 5 respondents, each having a weight of 2,000. The net effect of this smaller sample with larger weights is that we are more likely to get estimates far from the true value. Provided the respondents resemble the nonrespondents within the stratum, there is no bias problem, but there is an increase in variance.
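The variance cost in the stratum example can be quantified with the standard variance formula for an expanded total under simple random sampling without replacement. This sketch assumes the 5 respondents behave like a random subsample of the 10 selected households; the within-stratum variance `S2` is a hypothetical value, and `srs_total_variance` is an illustrative helper.

```python
def srs_total_variance(stratum_size: int, n: int, s2: float) -> float:
    """Variance of the expanded total N * ybar under simple random sampling without
    replacement of n units from a stratum of size N; s2 is the stratum's population
    variance of y (divisor N - 1)."""
    if not 0 < n <= stratum_size:
        raise ValueError("need 0 < n <= stratum_size")
    fpc = 1.0 - n / stratum_size  # finite-population correction
    return stratum_size ** 2 * fpc * s2 / n

N = 10 * 1000  # 10 sampled households, each representing 1,000 households
S2 = 0.05      # hypothetical within-stratum variance of trips per household
full = srs_total_variance(N, 10, S2)  # all 10 respond, weight 1,000 each
half = srs_total_variance(N, 5, S2)   # only 5 respond, weight 2,000 each
print(half / full)  # roughly 2: halving the respondents about doubles the variance
```

A doubled variance means the standard error grows by a factor of about √2, roughly 41%, even though the weighted estimate stays unbiased under the stated assumption.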

A Note on Stratification
• The strata definitions may not coincide with the user's needs. For example, the proportion of households that visit MO over a given time period is much higher for its neighboring states than for the non-neighboring states. If a sample over- or under-represents the neighboring states, we are likely to over- or under-estimate household visitation to MO.
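The direction of the error can be seen with a small made-up example: a sample with too many neighboring-state households, analyzed without reweighting, overstates visitation, while weighting each stratum's rate by its population share (simple post-stratification) removes the imbalance. All shares and counts below are hypothetical, and post-stratification only helps if respondents within each stratum are representative of that stratum.

```python
# Each stratum: (population share, households sampled, sampled households that visited MO).
# All numbers are hypothetical, chosen only to illustrate the direction of the error.
strata = {
    "neighboring": (0.10, 200, 10),  # over-represented: 20% of the sample
    "non-neighboring": (0.90, 800, 4),
}

def unweighted_rate(strata):
    """Pooled visitation rate that ignores how sample shares differ from population shares."""
    sampled = sum(n for _, n, _ in strata.values())
    visited = sum(v for _, _, v in strata.values())
    return visited / sampled

def poststratified_rate(strata):
    """Within-stratum rates weighted by population shares (simple post-stratification)."""
    return sum(share * v / n for share, n, v in strata.values())

print(unweighted_rate(strata))      # 0.014: overestimates visitation
print(poststratified_rate(strata))  # about 0.0095: 0.10 * 0.05 + 0.90 * 0.005
```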

Considerations
• What is the frame population versus the target population?
• Is the sample size adequate? (The relative margin of error is key.)
• What is the response rate?

Considerations
• Are the design (projection) weights consistent with the sampling unit? (e.g., beware of sample designs based on households with weights based on people)
• Are the survey questions consistent with the sampling unit? (e.g., beware of potential double counting from sampling at the household level but asking questions about travel parties)

Considerations
• Is the sample balanced for demographic variables that are likely to covary with the study variables (e.g., household income, location)?
• Do you have access to all of the data (including weights, non-travelers, and non-responders)?

Thank You! Questions, Comments, Slides? www.teri.missouri.edu
