
Ratio estimation under SRS


kioko

Presentation Transcript


  1. Ratio estimation under SRS • Assume • Absence of nonsampling error • SRS of size n from a pop of size N • Ratio estimation is an alternative to $\bar y$ under SRS; it uses “auxiliary” information (X) • Sample data: observe $y_i$ and $x_i$ • Population information • Have $x_i$ on all individual units, or • Have summary statistics from the population distribution of X, such as the population mean or total of X • Ratio estimation is also used to estimate a population parameter called a ratio (B)

  2. Uses • Estimate a ratio • Tree volume or bushels per acre • Per capita income • Liability-to-asset ratio • More precise estimator of population parameters • If X and Y are correlated, can improve upon $\bar y$ • Estimating totals when pop size N is unknown • Avoids need to know N in the formula $\hat t_y = N\bar y$ • Domain estimation • Obtaining estimates for subpopulations • Incorporate known information into estimates • Poststratification • Adjust for nonresponse

  3. Estimating a ratio, B • Population parameter for the ratio: $B = t_y/t_x = \bar y_U/\bar x_U$ • Examples • Number of bushels harvested (y) per acre (x) • Number of children (y) per single-parent household (x) • Total usable weight (y) relative to total shipment weight (x) for chickens

  4. Estimating a ratio • SRS of n observation units • Collect data on y and x for each OU • Natural estimator for B?

  5. Estimating a ratio – 2 • Estimator for B: $\hat B = \bar y/\bar x = \hat t_y/\hat t_x$ • $\hat B$ is a biased estimator for B • $\hat B$ is a ratio of random variables

  6. Bias of $\hat B$

  7. Bias of $\hat B$ – 2 • Bias is small if • Sample size n is large • Sample fraction n/N is large • $\bar x_U$ is large • $S_x$ is small (pop std deviation for x) • High positive correlation between X and Y (see Lohr p. 67)
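The rules of thumb above follow from a first-order (Taylor series) approximation of the bias. A sketch of that standard derivation, writing $f = n/N$, $R$ for the population correlation of X and Y, and using $E[\bar y - B\bar x] = 0$ because $B = \bar y_U/\bar x_U$:

```latex
\begin{aligned}
\hat B - B &= \frac{\bar y - B\bar x}{\bar x}
  \approx \frac{\bar y - B\bar x}{\bar x_U}\left(1 - \frac{\bar x - \bar x_U}{\bar x_U}\right) \\
E[\hat B - B] &\approx -\frac{E\!\left[(\bar y - B\bar x)(\bar x - \bar x_U)\right]}{\bar x_U^{2}}
  = \frac{B\,V(\bar x) - \operatorname{Cov}(\bar x, \bar y)}{\bar x_U^{2}} \\
  &= \frac{1-f}{n\,\bar x_U^{2}}\left(B S_x^{2} - R\,S_x S_y\right)
\end{aligned}
```

The factor $(1-f)/n$ shrinks as n and n/N grow, the $\bar x_U^2$ in the denominator rewards a large $\bar x_U$, and the bracket $S_x(B S_x - R S_y)$ is small when $S_x$ is small or when a strong positive correlation makes $R S_y$ close to $B S_x$.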

  8. Estimated variance of estimator for B • Estimator for $V(\hat B)$ • If $\bar x_U$ is unknown?
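The estimator $\hat B = \bar y/\bar x$ and its estimated variance can be computed directly from paired sample data. The minimal Python sketch below assumes the usual linearization form $\hat V(\hat B) = (1 - n/N)\,s_e^2/(n\,\bar x_U^2)$, with residuals $e_i = y_i - \hat B x_i$ and $s_e^2$ computed with divisor $n - 1$, and substitutes $\bar x$ when $\bar x_U$ is unknown. The function name `ratio_estimate` is hypothetical.

```python
from statistics import mean


def ratio_estimate(y, x, N, xbar_U=None):
    """Estimate B = t_y / t_x from an SRS of paired (y_i, x_i) values.

    Returns (B_hat, var_hat), where (linearization form, assumed here)
        var_hat = (1 - n/N) * s_e^2 / (n * xbar_U^2)
    with e_i = y_i - B_hat * x_i and s_e^2 using divisor n - 1.
    If xbar_U is None (unknown), the sample mean xbar is substituted.
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("need paired samples with n >= 2")
    xbar = mean(x)
    B_hat = mean(y) / xbar  # ratio of two random sample means
    # Residuals about the line through the origin with slope B_hat
    e = [yi - B_hat * xi for yi, xi in zip(y, x)]
    s2_e = sum(ei * ei for ei in e) / (n - 1)
    xbar_ref = xbar if xbar_U is None else xbar_U
    var_hat = (1 - n / N) * s2_e / (n * xbar_ref ** 2)
    return B_hat, var_hat
```

Passing `xbar_U=None` corresponds to the case where $\bar x_U$ is unknown; the residuals `e` are the same $e_i$'s whose sum of squares is needed in the Ag example.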

  9. Variance of $\hat B$ • Variance is small if • Sample size n is large • Sample fraction n/N is large • Deviations about the line $e = y - Bx$ are small • Correlation between X and Y is close to 1 • $\bar x_U$ is large

  10. Ag example – 1 • Frame: 1987 Agricultural Census • Take SRS of 300 counties from 3078 counties to estimate conditions in 1992 • Collect data on y; have data on x for sample • Existing knowledge about the population

  11. Ag example – 2 • $\hat B$ = 0.9866: an estimated 0.9866 farm acres in 1992 per farm acre in 1987

  12. Ag example – 3 • Need to calculate the variance of the $e_i$’s

  13. Ag example – 4 • For each county i, calculate $e_i = y_i - \hat B x_i$ • Coffee Co, AL example • Sum of squares for $e_i$

  14. Ag example – 5

  15. Estimating proportions • If denominator variable is random, use ratio estimator to estimate the proportion p • Example (p. 72) • 10 plots under protected oak trees used to assess effect of feral pigs on native vegetation on Santa Cruz Island, CA • Count live seedlings y and total number of seedlings x per plot • Y and X correlated due to common environmental factors • Estimate proportion of live seedlings to total number of seedlings
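Because the denominator (total seedlings per plot) is random, the proportion is estimated as $\hat p = \sum y_i / \sum x_i$, which is $\hat B$ applied to counts. A minimal Python sketch, assuming the same linearization variance as for $\hat B$ with $\bar x$ standing in for the unknown population mean of x; `ratio_proportion` and the plot counts in the usage are hypothetical, not the p. 72 data.

```python
from statistics import mean


def ratio_proportion(live, total, N=None):
    """Ratio estimate of p = (sum of live) / (sum of total) across plots.

    p_hat = sum(live) / sum(total) = ybar / xbar (a ratio estimator).
    var_hat = fpc * s_e^2 / (n * xbar^2), e_i = live_i - p_hat * total_i,
    with xbar in place of the unknown population mean of total.
    N=None drops the finite population correction (fpc = 1).
    """
    n = len(live)
    if n != len(total) or n < 2:
        raise ValueError("need paired plot counts with n >= 2")
    p_hat = sum(live) / sum(total)
    xbar = mean(total)
    e = [y - p_hat * x for y, x in zip(live, total)]
    s2_e = sum(ei * ei for ei in e) / (n - 1)
    fpc = 1.0 if N is None else 1 - n / N
    var_hat = fpc * s2_e / (n * xbar ** 2)
    return p_hat, var_hat
```

Note that $\hat p$ pools the counts; it is generally not the average of the per-plot proportions (for plots with 1 of 2 and 3 of 4 seedlings alive, $\hat p = 4/6$, while the average of plot proportions is 0.625).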

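  16. Estimating population mean $\bar y_U$ • Estimator for $\bar y_U$: $\hat{\bar y}_r = \hat B\,\bar x_U = \bar y\,(\bar x_U/\bar x)$ • “Adjustment factor” $\bar x_U/\bar x$ for sample mean • A measure of discrepancy between sample and population information, and • Improves precision if X and Y are positively correlated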

  17. Underlying model with B > 0 [Figure: line $y = Bx$ through the origin, y plotted against x] • B is a slope • B > 0 indicates X and Y are positively correlated • Absence of intercept implies line must go through origin (0, 0)

  18. Using population mean of X to adjust sample mean • Discrepancy between sample & pop info for X ($\bar x$ vs. $\bar x_U$) is viewed as evidence that the same relative discrepancy exists between $\bar y$ and $\bar y_U$

  19. Bias of $\hat{\bar y}_r$ • Ratio estimator for the population mean is biased • Rules of thumb for bias of $\hat B$ apply

  20. Estimator for variance of $\hat{\bar y}_r$
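A minimal Python sketch of the ratio estimator of the mean and a variance estimate, assuming the form $\hat V(\hat{\bar y}_r) = (1 - n/N)\,s_e^2/n$ with $e_i = y_i - \hat B x_i$. A variant that also multiplies by $(\bar x_U/\bar x)^2$ is in use, so it is offered here as an option; `ratio_mean` and `scale_by_xbar_ratio` are hypothetical names.

```python
from statistics import mean


def ratio_mean(y, x, xbar_U, N, scale_by_xbar_ratio=False):
    """Ratio estimator of the population mean ybar_U under SRS.

    ybar_r = ybar * (xbar_U / xbar) = B_hat * xbar_U, where xbar_U / xbar
    is the "adjustment factor". Variance estimate (assumed form):
        (1 - n/N) * s_e^2 / n,  e_i = y_i - B_hat * x_i,
    optionally multiplied by (xbar_U / xbar)^2.
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("need paired samples with n >= 2")
    xbar = mean(x)
    B_hat = mean(y) / xbar
    adjustment = xbar_U / xbar  # > 1 if the sample under-represents large x
    ybar_r = mean(y) * adjustment
    e = [yi - B_hat * xi for yi, xi in zip(y, x)]
    s2_e = sum(ei * ei for ei in e) / (n - 1)
    var_hat = (1 - n / N) * s2_e / n
    if scale_by_xbar_ratio:
        var_hat *= adjustment ** 2
    return ybar_r, var_hat
```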

  21. Ag example – 6

  22. Ag example – 8

  23. Ag example – 9 • Expect a linear relationship between X and Y (Figure 3.1) • Note that sample mean is not equal to population mean for X

  24. MSE under ratio estimation • Recall … • MSE = Variance + Bias² • SRS estimators are unbiased, so • MSE = Variance • Ratio estimators are biased, so • MSE > Variance • Use MSE to compare design/estimation strategies • EX: compare sample mean under SRS with ratio estimator for pop mean under SRS

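  25. Sample mean vs. ratio estimator of mean • To first order, $MSE(\hat{\bar y}_r)$ is smaller than $V(\bar y)$ if and only if $R > \frac{CV(x)}{2\,CV(y)}$, where R is the population correlation of X and Y, $CV(x) = S_x/\bar x_U$ and $CV(y) = S_y/\bar y_U$ • For example, if $CV(x) = CV(y)$ and $R > 1/2$, ratio estimation will be better than SRS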

  26. Estimating the MSE • Estimate MSE with sample estimates of bias and variance of estimator • This tends to underestimate MSE • The bias and variance formulas are approximations • Estimated MSE is less biased if • Bias of $\hat B$ is small (see earlier slide) • Large sample size or sampling fraction • High positive correlation for X and Y • $\bar x$ is a precise estimate of $\bar x_U$ (small CV for $\bar x$) • We have a reasonably large sample size (n > 30)

  27. Ag example – 10

  28. Estimating population total $t_y$ • Estimator for $t_y$: $\hat t_{yr} = \hat B\,t_x = N\hat{\bar y}_r$ • Is $\hat t_{yr}$ biased? • Estimator for $V(\hat t_{yr})$
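Since $\hat t_{yr} = t_x\hat B$ and $\hat B$ is biased, the ratio estimator of the total inherits that bias. A minimal Python sketch of the point estimate and a variance estimate, assuming $\hat V(\hat t_{yr}) = N^2\,\hat V(\hat{\bar y}_r) = N^2(1 - n/N)\,s_e^2/n$; the function name is hypothetical.

```python
from statistics import mean


def ratio_total(y, x, t_x, N):
    """Ratio estimator of t_y under SRS: t_yr = B_hat * t_x = N * ybar_r.

    Variance estimate (assumed linearization form):
        var_hat = N^2 * (1 - n/N) * s_e^2 / n,  e_i = y_i - B_hat * x_i.
    The point estimate needs only t_x, not N.
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("need paired samples with n >= 2")
    B_hat = mean(y) / mean(x)
    t_yr = B_hat * t_x
    e = [yi - B_hat * xi for yi, xi in zip(y, x)]
    s2_e = sum(ei * ei for ei in e) / (n - 1)
    var_hat = N ** 2 * (1 - n / N) * s2_e / n
    return t_yr, var_hat
```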

  29. Ag example – 11

  30. Summary of ratio estimation

  31. Summary of ratio estn – 2

  32. Regression estimation [Figure: line with intercept $B_0$ and slope $B_1$, y plotted against x] • What if the relationship between y and x is linear but does NOT pass through the origin? • Better model in this case is $y = B_0 + B_1 x$

  33. Regression estimation – 2 • New estimator is a regression estimator • To estimate $\bar y_U$, $\hat{\bar y}_{reg}$ is the predicted value from the regression of y on x at $x = \bar x_U$ • Adjustment factor for sample mean is additive (linear), rather than multiplicative

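  34. Estimating population mean • Regression estimator: $\hat{\bar y}_{reg} = \hat B_0 + \hat B_1\bar x_U = \bar y + \hat B_1(\bar x_U - \bar x)$ • Estimating regression parameters: $\hat B_1 = s_{xy}/s_x^2 = r\,s_y/s_x$, $\hat B_0 = \bar y - \hat B_1\bar x$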

  35. Estimating pop mean – 2 • Sample variances, correlation, covariance

  36. Bias in regression estimator

  37. Estimating variance • Residuals are $e_i = y_i - (\hat B_0 + \hat B_1 x_i)$ • Note: This is a different residual than in ratio estimation (predicted values differ)

  38. Estimating the MSE • Plugging sample estimates into Lohr, equation 3.13:

  39. Estimating population total t • Is regression estimator for t unbiased?

  40. Tree example • Goal: obtain a precise estimate of number of dead trees in an area • Sample • Select n = 25 out of N = 100 plots • Make field determination of number of dead trees per plot, $y_i$ • Population • For all N = 100 plots, have photo determination on number of dead trees per plot, $x_i$ • Calculate $\bar x_U$ = 11.3 dead trees per plot

  41. Tree example – 2 • Lohr, p. 77-78 • Data • Plot of y vs. x • Output from PROC REG • Components for calculating estimators and estimating the variance of the estimators • We will use PROC SURVEYREG, which will give you the correct output for regression estimators

  42. Tree example – 3 • Estimated mean number of dead trees/plot • Estimated total number of dead trees

  43. Tree example – 4 • Due to small sample size, Lohr uses t-distribution w/ n − 2 degrees of freedom • Half-width for 95% CI • Approx 95% CI for $t_y$ is (1115, 1283) dead trees
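The interval construction can be sketched as: estimate the total as $N\hat{\bar y}_{reg}$, scale the standard error by N, and multiply by a t critical value. In this minimal Python sketch the critical value is passed in by the caller so that only the standard library is needed; the function name and the numbers in the check are hypothetical, not the tree data.

```python
from math import sqrt


def total_ci(ybar_hat, var_ybar_hat, N, t_crit):
    """Approximate CI for t_y from an estimated mean and its variance.

    t_hat = N * ybar_hat,  SE(t_hat) = N * sqrt(var_ybar_hat)
    half-width = t_crit * SE(t_hat); t_crit is supplied by the caller
    (e.g. the 0.975 quantile of a t distribution with n - 2 df).
    """
    t_hat = N * ybar_hat
    se = N * sqrt(var_ybar_hat)
    half = t_crit * se
    return t_hat, half, (t_hat - half, t_hat + half)
```

If SciPy is available, `scipy.stats.t.ppf(0.975, n - 2)` gives the critical value; for the tree example, n − 2 = 23.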

  44. Related estimators [Figure: line with intercept $B_0$ and slope $B_1$, y plotted against x] • Ratio estimator • $B_0 = 0$ → ratio model • Ratio estimator → regression estimator with no intercept • Difference estimation • $B_1 = 1$ → slope is assumed to be 1

  45. Domain estimation under SRS • Usually interested in estimates and inferences for subpopulations, called domains • If we have not used stratification to set the sample size for each domain, then we should use domain estimation • We will assume SRS for this discussion • If we use stratified sampling with strata = domains, then use stratum estimators (Ch 4) • To use stratification, need to know domain assignment for each unit in the sampling frame prior to sampling

  46. Stratification vs. domain estimation • In stratified random sampling • Define sample size in each stratum before collecting data • Sample size in stratum h is fixed, or known • In other words, the sample size $n_h$ is the same for each sample selected under the specified design • In domain estimation • $n_d$ = sample size in domain d is random • Don’t know $n_d$ until after the data have been collected • The value of $n_d$ changes from sample to sample
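That $n_d$ is random under SRS can be seen by simulation: draw repeated SRSs from a population with a known domain split and count the domain members each time. The Python sketch below uses a hypothetical population of 400 units with 120 in domain 1; the function name and all sizes are illustrative assumptions.

```python
import random


def domain_sample_sizes(domain_of, n, reps, domain=1, seed=0):
    """Draw `reps` SRSs of size n; record n_d, the count from `domain`."""
    rng = random.Random(seed)
    N = len(domain_of)
    sizes = []
    for _ in range(reps):
        sample = rng.sample(range(N), n)  # SRS without replacement
        sizes.append(sum(1 for i in sample if domain_of[i] == domain))
    return sizes


# Hypothetical population: N = 400 units, N_1 = 120 of them in domain 1
population = [1] * 120 + [2] * 280
sizes = domain_sample_sizes(population, n=50, reps=200)
print(min(sizes), max(sizes))
```

Under stratified sampling with the domains as strata, every one of these draws would instead contain the same pre-set $n_h$ units from each stratum.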

  47. Population partitioned into domains [Figure: population divided into domains d = 1, 2, …, D] • Recall U = index set for population = {1, 2, …, N} • Domain index set for domain d = 1, 2, …, D • $U_d$ = {1, 2, …, $N_d$}, where $N_d$ = number of OUs in domain d in the population • In sample of size n • $n_d$ = number of sample units from domain d • $S_d$ = index set for sample units belonging to domain d

  48. Boat owner example • Population • N = 400,000 boat owners (currently licensed) • Sample • n = 1,500 owners selected using SRS • Divide universe (population) into 2 domains • d = 1 own open motor boat > 16 ft. (large boat) • d = 2 do not own this type of boat • Of the n = 1500 sample owners: • n1 = 472 owners of open motor boat > 16 ft. • n2 = 1028 owners do not own this kind of boat

  49. New population parameters • Domain mean $\bar y_{U_d} = \frac{1}{N_d}\sum_{i \in U_d} y_i$ • Domain total $t_d = \sum_{i \in U_d} y_i$

  50. Boat owner example – 2 • Estimate population domain mean • Estimate the average number of children for boat owners from domain 1 • Estimate proportion of boat owners from domain 1 who have children • Estimate population domain total • Estimate the total number of children for large boat owners (domain 1)
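A minimal Python sketch of domain estimates from an SRS when $N_d$ is unknown, assuming the approximation $\hat V(\bar y_d) \approx (1 - n/N)\,s_{yd}^2/n_d$ for the domain mean and the total $\hat t_d = (N/n)\sum_{i \in S_d} y_i$ with variance $N^2(1 - n/N)\,s_u^2/n$, where $u_i = y_i$ for domain units and 0 otherwise. The function name and the values in the check are hypothetical.

```python
from statistics import mean, variance


def domain_estimates(y, in_domain, N):
    """Domain mean and total from an SRS of size n (N_d unknown).

    ybar_d  = mean of y over S_d,      var ~= (1 - n/N) * s_yd^2 / n_d
    t_hat_d = (N / n) * sum over S_d,  var  = N^2 * (1 - n/N) * s_u^2 / n
    where u_i = y_i if unit i is in the domain, else 0.
    """
    n = len(y)
    f = n / N
    y_d = [yi for yi, d in zip(y, in_domain) if d]
    n_d = len(y_d)  # random: known only after sampling
    if n_d < 2:
        raise ValueError("need at least two sampled units in the domain")
    ybar_d = mean(y_d)
    var_ybar_d = (1 - f) * variance(y_d) / n_d
    u = [yi if d else 0 for yi, d in zip(y, in_domain)]
    t_hat_d = N * mean(u)  # equals (N / n) * sum of y over S_d
    var_t_hat_d = N ** 2 * (1 - f) * variance(u) / n
    return ybar_d, var_ybar_d, t_hat_d, var_t_hat_d
```

With y as the number of children and `in_domain` flagging large-boat owners, this gives the domain mean and total; with y as a 0/1 "has children" indicator, the domain mean is the estimated proportion.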
