
Some ACS Data Issues and Statistical Significance (MOEs)


jacqui

Presentation Transcript


  1. Some ACS Data Issues and Statistical Significance (MOEs) • Table Release Rules • Statistical Filtering & Collapsing • Disclosure Review Board • Statistical Significance Testing & Margins of Error (MOEs)

  2. Table Release Rules • February 28, 2007

  3. “B” and “C” Tables

  4. Full Table – PASSED FILTERING Statistically too Small

  5. Collapsed Table

  6. Why did we collect all this data if we were not going to release it? • The Census Bureau Story

  7. ACS Data Release Rules • Doug Hillmer • Data Products Area • American Community Survey Office • U.S. Census Bureau • October 11, 2006

  8. The Census Bureau Will Not Release All Available Estimates to the Public • Limitation of Disclosure Risk • The Census Bureau’s Disclosure Review Board (DRB) must clear all data products prior to their release to the public. • Assurance of Statistical Reliability • Data users need to be able to use ACS estimates as official Census Bureau data. Thus, some rules must be in place to ensure minimum reliability of estimates. • Statistical reliability is assured by: • Population size thresholds below which estimates are not released • Data release testing and collapsing of tables that fail

  9. The ACS “Identity Crisis” on Reliability • Ultimately, the 5-year estimates, with no “data release rules”, act as a long-form replacement • Single-year ACS sample is more like a current demographic survey, although much larger in size • Question to answer for single-year estimates: Do we accept less detail in our measures of characteristics, or do we allow more detail but with data release rules in place? Less detail punishes those areas with the diversity to support the detail.

  10. Choices for displaying estimates in ACS data products • No suppression • Publish full detail with no suppression but higher pop threshold (e.g., 500,000) • Publish limited set of estimates for all areas with 65,000+ pop • Publish more detailed estimates for higher pop threshold and limited set for lower threshold • With suppression or warnings • Define a very detailed set of estimates for all geo areas with 65,000+ pop and suppress estimates that fail reliability test • Define a very detailed set of estimates for all geo areas with 65,000+ pop and flag estimates that fail reliability test

  11. Filtering <<Data Release Rules>> • Goal: to identify “weak” tables • Some tables have many zero or “near zero” cells and relatively large standard errors • Filtering <<Data Release>> rule used during 2000-2004 ACS: drop tables if… • Universe is less than 500 (weighted) • Average cell size is less than 2 cases (unweighted) • Filtering <<Data Release>> rule used now: • Accept if median coefficient of variation is less than or equal to 61% • Otherwise, collapse and review again

  12. Why not just use cell suppression as is done for the Economic products? • Advantages • Gets rid of the “bad” estimates • Keeps the “good” estimates (depends on complementary suppression) • Disadvantages • Creates “holes” in distributions • Makes new problems for combined estimates (e.g., in derived products, such as data profiles) • Produces a new set of problems for year-to-year comparisons

  13. Data Release Testing – Step by Step • Compute coefficients of variation • Coefficient of variation = standard error / estimate • Standard error = (upper bound – estimate) / 1.65 • If the estimate = 0 set coefficient of variation = 100% • Ignore total and sub-total lines in base table • Sort coefficients of variation in descending order • Find the middle value (the median) • If the median is greater than 61% the table FAILS (median > 61% means more than half of the cells have a lower bound of 0; i.e., these cells are not statistically different from 0) • If the median is 61% or less the table PASSES
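The step-by-step test on slide 13 can be sketched in a few lines of Python. This is a minimal illustration, not the Census Bureau's implementation: the function names and the sample cells are hypothetical, the caller is assumed to have already removed total and sub-total lines, and for an even number of cells the median is taken as the average of the two middle values, which the slide does not specify.

```python
from statistics import median

Z_90 = 1.65  # divisor used on the slide to back out the standard error


def coefficient_of_variation(estimate, upper_bound):
    """CV = standard error / estimate; a zero estimate is assigned 100%."""
    if estimate == 0:
        return 1.0
    standard_error = (upper_bound - estimate) / Z_90
    return standard_error / estimate


def table_passes(cells, threshold=0.61):
    """cells: (estimate, upper_bound) pairs, totals and sub-totals excluded.

    The table passes when the median CV is 61% or less.
    """
    cvs = [coefficient_of_variation(est, upper) for est, upper in cells]
    return median(cvs) <= threshold


# Hypothetical base table: most cells are zero or near zero.
full_table = [(1200, 1365), (300, 465), (40, 205), (0, 165), (15, 180)]
# Hypothetical collapsed version with the small cells combined.
collapsed_table = [(1200, 1365), (300, 465), (55, 220)]

print(table_passes(full_table))       # median CV is 100%, so the table fails
print(table_passes(collapsed_table))  # median CV is about 33%, so it passes
```

The 61% cutoff follows from the 90% interval: a cell whose CV exceeds roughly 1 / 1.645 has a lower bound at or below zero.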

  14. Collapsing • Goal: release a simplified version of a base table for a geographic area that otherwise would get nothing • Decisions on design of collapsed tables are made by subject-matter experts at the Census Bureau • For operational reasons, only one collapsed version of each base table will be available regardless of geographic area

  15. How the Data Release Rules will Work with Collapsed Versions of Base Tables

  16. More About Collapsing • Collapsed Tables are designed to assure that derived products (profiles, ranking tables, subject tables,…) can still be sourced from the base tables • 2005 Tables: if a table passes filtering and a collapsed version exists, publish both the original version and the collapsed version for that geographic area

  17. Problems to fix in the current implementation of the data release rules • Collapsed versions missing in some cases • Collapsed versions that aren’t working • Poor choices in “sourcing” for derived products (e.g., profiles)

  18. Statistical Significance Testing • Why should I do it? • When should I do it? • How do I do it?

  19. Testing is Important

  20. Statements you might want to make • Estimate X is bigger than Y • Estimate X this year is larger than X last year • Estimate X is smaller than Census 2000 value • State Z has the highest value

  21. How do I do a significance test? 1. Get the Margin of Error (MOE) from ACS 2. Calculate the Standard Error (SE) [SE = MOE / 1.645] 3. Solve for Z = (A - B) / sqrt(SE_A^2 + SE_B^2), where A and B are the two estimates 4. If Z < -1.645 or Z > 1.645, the difference is significant at 90% confidence
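The four steps on slide 21 translate directly into code. The sketch below assumes the Z statistic is the usual one for two independent estimates, the difference divided by the square root of the sum of the squared standard errors; the function names and the sample figures are hypothetical.

```python
import math

Z_CRITICAL_90 = 1.645  # critical value for 90% confidence


def se_from_moe(moe):
    """Step 2: convert a published 90% margin of error to a standard error."""
    return moe / Z_CRITICAL_90


def z_score(a, moe_a, b, moe_b):
    """Step 3: Z for the difference between estimates A and B.

    Assumes the two estimates are independent.
    """
    se_a = se_from_moe(moe_a)
    se_b = se_from_moe(moe_b)
    return (a - b) / math.sqrt(se_a**2 + se_b**2)


def is_significant(a, moe_a, b, moe_b):
    """Step 4: significant at 90% confidence when |Z| exceeds 1.645."""
    return abs(z_score(a, moe_a, b, moe_b)) > Z_CRITICAL_90


# Hypothetical estimates with their published MOEs.
print(round(z_score(12500, 800, 11000, 900), 2))  # about 2.05
print(is_significant(12500, 800, 11000, 900))     # difference is significant
print(is_significant(12500, 800, 12000, 900))     # difference is not significant
```

A statement such as "Estimate X is bigger than Y" is only supported when the test reports a significant difference.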

  22. Obtaining Standard Errors is the Key • Sum or Difference of Estimates • Proportions and Percents • Means and Other Ratios Simple Formulas Where….
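The formulas on slide 22 did not survive in the transcript. As a hedged sketch, the functions below use the approximation formulas commonly given in Census Bureau guidance for ACS derived estimates; treat them as an assumption about what the slide showed. The function names are hypothetical, and means are not covered here.

```python
import math


def se_sum_or_difference(se_a, se_b):
    """Approximate SE of A + B or A - B, ignoring any covariance."""
    return math.sqrt(se_a**2 + se_b**2)


def se_ratio(a, b, se_a, se_b):
    """Approximate SE of the ratio A / B when A is not a subset of B."""
    ratio = a / b
    return math.sqrt(se_a**2 + ratio**2 * se_b**2) / b


def se_proportion(a, b, se_a, se_b):
    """Approximate SE of the proportion A / B when A is a subset of B.

    If the value under the square root is negative, fall back to the
    ratio formula.
    """
    proportion = a / b
    radicand = se_a**2 - proportion**2 * se_b**2
    if radicand < 0:
        return se_ratio(a, b, se_a, se_b)
    return math.sqrt(radicand) / b


print(se_sum_or_difference(3.0, 4.0))            # 5.0
print(round(se_proportion(50, 200, 10, 20), 4))  # about 0.0433
print(round(se_ratio(50, 200, 10, 20), 4))       # about 0.0559
```

For a percent, multiply the proportion and its standard error by 100.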

  23. There is HELP off in the wings

  24. But what if I am using 2000 non-ACS Data? Where are my MOEs?

  25. Let's get to work on the Standard Error • SE = Survey Design Factor × sqrt(5Y × (1 - Y/N)) • N = Size of publication area (population) • Y = Estimate of characteristic

  26. Survey Design Factor • Mode to Work: 1.4, 1.2, 0.9, 0.7 • www.census.gov/prod/cen2000/doc/tablec-xx.pdf (xx = fl)

  27. N = Size of publication area (population = 362,563) • Y = Estimate of characteristic (126,540) • 5Y = 5 × 126,540 = 632,700 • 1 - (Y/N) = 1 - (126,540 / 362,563) = 1 - 0.3490152 = 0.6509848 • SE = sqrt(632,700 × 0.6509848) = 641.7772

  28. Survey Design Factor × SE • SE = 641.777 • 126,540 / 362,563 = 35% • Survey Design Factor = 0.7 • Final Adjusted SE = 0.7 × 641.777 ≈ 450
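The worked example on slides 25 to 28 can be reproduced as follows. This sketch assumes the unadjusted standard error is sqrt(5Y(1 - Y/N)), which is the form that matches the arithmetic on slide 27; the function name is hypothetical, and the design factor of 0.7 is the value read from the slides.

```python
import math


def census2000_se(y, n, design_factor=1.0):
    """Approximate SE of a Census 2000 sample estimate.

    y: estimate of the characteristic
    n: size of the publication area (population)
    design_factor: survey design factor from the published table
    """
    unadjusted = math.sqrt(5 * y * (1 - y / n))
    return design_factor * unadjusted


unadjusted_se = census2000_se(126_540, 362_563)
adjusted_se = census2000_se(126_540, 362_563, design_factor=0.7)

print(round(unadjusted_se, 3))  # 641.777
print(round(adjusted_se, 1))    # 449.2, which the slide rounds to 450
```

With the adjusted SE in hand, a Census 2000 value can be used in the same Z test as an ACS estimate.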

  29. Tempting Green is OK This is NOT

  30. Want to do an exercise on your own?
