
Continuous Surveys: Statistical Challenges and Opportunities

Carl Schmertmann, Center for Demography & Population Health, Florida State University, schmertmann@fsu.edu



  1. Continuous Surveys: Statistical Challenges and Opportunities. Carl Schmertmann, Center for Demography & Population Health, Florida State University, schmertmann@fsu.edu

  2. Outline
     • CHALLENGES (long)
       - Increased Temporal Complexity
       - Increased Sampling Error
       - New Weighting Problems
     • OPPORTUNITIES (brief, but important)

  3. Sample Size Comparison
     • US CENSUS LONG FORM: 17% / decade
     • ACS ROLLING SURVEY:
       - 2 per 1000 households / month
       - 24 per 1000 households / year
       - 240 per 1000 households / decade = 24% / decade
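The ACS figures on this slide follow from simple accumulation of the monthly rate. A minimal sketch of that arithmetic, assuming a constant monthly rate and no household sampled twice (both assumptions are ours; the function and constant names are illustrative only):

```python
# Rates are taken from the slide; the constant-rate, no-repeat assumption is ours.
ACS_MONTHLY_PER_1000 = 2
LONG_FORM_SHARE_PER_DECADE = 0.17


def acs_cumulative_share(months: int, monthly_per_1000: float = ACS_MONTHLY_PER_1000) -> float:
    """Share of households sampled after `months` months of rolling sampling."""
    return months * monthly_per_1000 / 1000


per_year = acs_cumulative_share(12)     # 24 per 1000 households
per_decade = acs_cumulative_share(120)  # 240 per 1000 households

print(f"ACS per year:         {per_year:.1%}")
print(f"ACS per decade:       {per_decade:.1%}")
print(f"Long form per decade: {LONG_FORM_SHARE_PER_DECADE:.1%}")
```

Over a full decade the rolling survey accumulates more responses than the long form, but any single year holds only about a tenth of that total.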

  4. Sampling Differences over Decade

  5. 1. Temporal Complexity

  6. What is the Population?
     • 1-Day Census
       - Population membership is binary: {0,1}
       - Each individual is IN or OUT
     • Continuous Survey
       - Population membership is fuzzy: anywhere on the scale from 0 to 1
       - Individuals can be MORE IN (more person-days of residence) or MORE OUT (fewer)

  7. [Figure: Residents (in 000s)]

  8. [Figure: Residents (in 000s)] Census Population = 12 000 (83% Type A)

  9. [Figure: Residents (in 000s)] An ACS ‘Data Sandwich’ includes samples from all months

  10. [Figure: Residents (in 000s)] ACS samples from 184 000 person-months. Avg Population: 15 333 (65% Type A)
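The gap between the census count and the average population can be reproduced with a toy calculation. The sketch below is an illustration under stated assumptions, not the data behind the slides: the monthly Type B counts are hypothetical, chosen only so that the totals match the slide figures (12 000 on census day, 184 000 person-months over the year), and the census is assumed to fall in April.

```python
# All counts are in thousands of residents.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical monthly profile: Type A is present year-round,
# Type B is seasonal and mostly absent in the census month.
type_a = [10] * 12
type_b = [10, 10, 8, 2, 2, 2, 2, 2, 2, 6, 8, 10]

CENSUS_MONTH = MONTHS.index("Apr")  # assumed census month

# One-day census: a snapshot of a single month.
census_pop = type_a[CENSUS_MONTH] + type_b[CENSUS_MONTH]
census_share_a = type_a[CENSUS_MONTH] / census_pop

# Continuous survey: every month contributes person-months.
person_months = sum(type_a) + sum(type_b)
average_pop = person_months / len(MONTHS)
average_share_a = sum(type_a) / person_months

print(f"Census population:  {census_pop} thousand ({census_share_a:.0%} Type A)")
print(f"Person-months:      {person_months} thousand")
print(f"Average population: {average_pop:.3f} thousand ({average_share_a:.0%} Type A)")
```

Any seasonal profile with the same totals gives the same two summary numbers; the point is that a snapshot and a twelve-month average describe different populations.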

  11. Characteristics change over the Sampling Period
      • Persons
        - Age
        - Marital Status
        - Employment
        - Education
      • Housing Units
        - Vacancy
        - Number of Occupants
        - $ Value

  12. Rolling ‘Population’
      The population formed by sandwiching monthly samples is the average frame of a film, not a snapshot. Individuals and housing units with changing characteristics are sampled and caught ‘in motion’.

  13. Reference Period Problems
      Many ‘long-form’ questions refer to retrospective periods:
      • Income in last 12 months
      • Place of residence 1 year ago
      • Child born in last 12 months?
      • Etc.

  14. Time Reference Example
      • ‘2004’ data come from 12 monthly samples taken in Jan04…Dec04
      • The question asks about fertility in the 12 months prior to the survey, so there are 12 overlapping periods in ‘2004’ data
        - ‘Jan04’ question covers Jan03-Jan04
        - ‘Feb04’ question covers Feb03-Feb04
        - etc.

  15. [Figure: timeline from Jan 03 to Jan 05 with one row per monthly sample, Jan 2004 through Dec 2004. Each row marks the survey month (●) and the 12 preceding months covered by the question (x). A bottom row counts how many samples cover each calendar month, rising from 1 to 12 and falling back to 1.]
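The overlap among the twelve reference periods can be tallied directly. This is a minimal sketch that assumes each monthly sample's question covers the 12 full calendar months before the interview month (our reading of the timeline; the helper name `month_label` is illustrative):

```python
from collections import Counter


def month_label(index: int) -> str:
    """Convert a month index (0 = Jan 2003) to a 'YYYY-MM' label."""
    year = 2003 + index // 12
    month = index % 12 + 1
    return f"{year}-{month:02d}"


# Survey months Jan 2004 .. Dec 2004 are indices 12 .. 23.
coverage = Counter()
for survey_month in range(12, 24):
    # Assumed reference period: the 12 months before the survey month.
    for reference_month in range(survey_month - 12, survey_month):
        coverage[month_label(reference_month)] += 1

for label in sorted(coverage):
    print(label, coverage[label])
```

Under this assumption the ‘2004’ dataset draws on 23 different calendar months, with December 2003 covered by all 12 samples and January 2003 and November 2004 by only one each.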

  16. Reference Periods for ‘Last 12 Month’ Questions in 1-year ACS Datasets

  17. Temporal Issues Summarized
      ‘Data Sandwiches’ contain:
      • New meaning of ‘population’
      • Units that change over sampling period (moving targets)
      • Multiple reference periods for retrospective questions

  18. 2. Sampling Error

  19. Small Samples
      Continuous sampling yields more data overall, but 1-, 3-, or 5-Year Sandwiches have smaller samples than the single, decennial long-form survey, so published data carry more sampling error.
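A rough sense of the penalty comes from the fact that, under simple random sampling, standard errors scale with one over the square root of the sample size. The sketch below applies that rule to the sampling rates quoted earlier in the talk (17% for the long form, 24 per 1000 households per year for the ACS). It ignores design effects, nonresponse, and finite-population corrections, so it is an approximation under our assumptions rather than an official comparison.

```python
import math

LONG_FORM_RATE = 0.17     # share of households, from the talk
ACS_ANNUAL_RATE = 0.024   # 24 per 1000 households per year, from the talk


def se_ratio(years: int) -> float:
    """Approximate ratio of ACS standard error to long-form standard error.

    Assumes simple random sampling, so SE is proportional to 1 / sqrt(n).
    """
    acs_rate = ACS_ANNUAL_RATE * years
    return math.sqrt(LONG_FORM_RATE / acs_rate)


ratios = {years: se_ratio(years) for years in (1, 3, 5)}

for years, ratio in ratios.items():
    print(f"{years}-year sandwich: SE roughly {ratio:.2f} times the long-form SE")
```

Even the 5-year sandwich falls short of long-form precision under these assumptions, because 12% of households is still less than 17%.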

  20. Small Samples
      The problem is especially acute for
      • small areas
      • narrow age groups
      • rare subpopulations
      e.g., How many unmarried teen births per year in Sevier County, Tennessee? ACS 2006-2008 says 0 ± 161

  21. 2. Sampling Error

  22. C24020. SEX BY OCCUPATION – Key West, Florida. Data Set: 2006-2008 American Community Survey 3-Year Estimates (http://tinyurl.com/acs-alap) … etc.

  23. Temporal Instability: Teenage Birth Rate in a County

  24. Unfortunate Result
      Aggregating over 1+ years of surveys produces datasets that are often
      • Unfamiliar and difficult to understand
      • Still too noisy to be useful for planners and researchers

  25. 3. Weighting for Non-Response 3. Weighting Problems

  26. Weighting
      Weighting from Respondents → Total Population requires Population Control Totals: (Place x Age x Sex x Race x Ethnicity x …)

  27. Decennial Long Form Sample
      • Control Totals
        - Measured from a simultaneous enumeration of the population (Sample & Census on same day)
        - Only 1 set needed
      • Sample and Population defined identically (resident on Census Day)

  28. Continuous Survey
      • Control Totals
        - Must be estimated (no simultaneous census)
        - Many sets needed (2006, 2007, 2006-8, 2007-9, 2008-12, …)
      • Sample and Population defined differently

  29. ACS Control Totals (Persons)
      • ACS responses are weighted to match official intercensal estimates by
        - Year (1 July midpoint snapshot)
        - County (sometimes city)
        - Age
        - Race
        - Sex
        - Hispanic Origin (yes/no)

  30. ACS Control Totals (Persons): Potential Errors
      • Estimates are wrong:
        - Unanticipated internal migration
        - Unanticipated international migration
        - etc.
      • Population definitions don’t match:
        - Seasonal fluctuations
        - Different race/ethnic categories

  31. Census Pop = 12 000 (83% Type A)
      Average Pop = 15 333 (65% Type A)
      If every year looks like this… Intercensal Estimate = 12 000 (83% Type A)

  32. Weighting Error Example
      ACS weighting to estimates produces:
      • Population too small (Census < Avg Pop)
      • Population too “A” (seasonal Bs missed)
      • Overestimates of variables positively correlated with A (e.g., % with college education)
      • Underestimates of variables negatively correlated with A (e.g., % single-parent families)
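The direction of the bias can be checked with a small calculation. In the sketch below the population totals come from the slides (10 000 Type A and 2 000 Type B at the census; 120 000 and 64 000 person-months over the year), while the college-education rates for the two types are hypothetical numbers chosen only to make the variable positively correlated with Type A.

```python
# Totals in thousands. Person-months and census counts follow the slides.
person_months = {"A": 120, "B": 64}   # what the rolling sample actually represents
census_controls = {"A": 10, "B": 2}   # what the control totals say

# Hypothetical rates: college education is more common among Type A.
college_rate = {"A": 0.40, "B": 0.20}


def weighted_rate(totals: dict, rates: dict) -> float:
    """Population-weighted average of group rates."""
    numerator = sum(totals[group] * rates[group] for group in totals)
    return numerator / sum(totals.values())


true_rate = weighted_rate(person_months, college_rate)
controlled_rate = weighted_rate(census_controls, college_rate)

print(f"Rate in the average (person-month) population: {true_rate:.1%}")
print(f"Rate after weighting to census-style controls: {controlled_rate:.1%}")
```

Swapping the two hypothetical rates, so that the variable is negatively correlated with Type A, reverses the sign of the error, which matches the underestimate case on the slide.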

  33. 4. Opportunities

  34. Opportunities
      ACS table cells = millions of “seemingly unrelated” maximum likelihood estimates.
      Statistical models that exploit likely cell relationships (over times, ages, sexes, places, variables …) could, in principle
      • Retain frequency & recency
      • Reduce variance of estimates
      • Recover familiar measures
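As one very simple illustration of how related cells can reduce variance, the sketch below pools several noisy estimates with inverse-variance weights. This is a generic textbook device, not a method proposed in the talk. It assumes the estimates are independent and measure the same underlying quantity, and every number in it is hypothetical.

```python
def pool(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted average and its variance.

    Assumes independent estimates of one common underlying value.
    """
    weights = [1.0 / variance for variance in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    return pooled, 1.0 / total_weight


# Hypothetical: three yearly estimates of the same rate, each with variance 100.
yearly_estimates = [30.0, 50.0, 40.0]
yearly_variances = [100.0, 100.0, 100.0]

pooled_estimate, pooled_variance = pool(yearly_estimates, yearly_variances)

print(f"Pooled estimate: {pooled_estimate:.1f}")
print(f"Pooled variance: {pooled_variance:.1f} (each single year: 100.0)")
```

A realistic model would let the underlying value differ smoothly across years, ages, or places rather than forcing it to be identical, trading a little bias for a large drop in variance.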

  35. Conclusion
      CONTINUOUS SURVEYS like ACS create
      • Big Problems for producers and users
        - Unfamiliar, temporally complex data
        - Potentially high sampling error
        - Technical problems with weighting
      • Big Opportunities, IF we can develop appropriate statistical models and practices

  36. Thanks! ¡Gracias! Obrigado!
