Some ACS Data Issues and Statistical Significance (MOEs)

Table Release Rules • Statistical Filtering & Collapsing • Disclosure Review Board • Statistical Significance Testing & Margins of Error (MOEs)

Table Release Rules February 28, 2007

“B” and “C” Tables

[Figure: Full Table (PASSED FILTERING) vs. Collapsed Table (full table statistically too small)]

Why did we collect all this data if we were not going to release it? • The Census Bureau Story

ACS Data Release Rules • Doug Hillmer • Data Products Area • American Community Survey Office • U.S. Census Bureau • October 11, 2006

The Census Bureau Will Not Release All Available Estimates to the Public • Limitation of Disclosure Risk • The Census Bureau’s Disclosure Review Board (DRB) must clear all data products prior to their release to the public. • Assurance of Statistical Reliability • Data users need to be able to use ACS estimates as official Census Bureau data. Thus, some rules must be in place to ensure minimum reliability of estimates. • Statistical reliability is assured by: • Population size thresholds below which estimates are not released • Data release testing and collapsing of tables that fail

The ACS “Identity Crisis” on Reliability • Ultimately, the 5-year estimates, which have no data release rules, act as a long-form replacement • The single-year ACS sample is more like a current demographic survey, although much larger in size • Question to answer for single-year estimates: Do we accept less detail in our measures of characteristics, or do we allow more detail but with data release rules in place? Less detail punishes those areas with the diversity to support the detail.

Choices for displaying estimates in ACS data products • No suppression • Publish full detail with no suppression but a higher population threshold (e.g., 500,000) • Publish a limited set of estimates for all areas with 65,000+ population • Publish more detailed estimates for areas above a higher population threshold and a limited set for areas above the lower threshold • With suppression or warnings • Define a very detailed set of estimates for all geographic areas with 65,000+ population and suppress estimates that fail the reliability test • Define a very detailed set of estimates for all geographic areas with 65,000+ population and flag estimates that fail the reliability test

Filtering (Data Release Rules) • Goal: to identify “weak” tables • Some tables have many zero or near-zero cells and relatively large standard errors • Filtering (data release) rule used during the 2000–2004 ACS: drop tables if… • Universe is less than 500 (weighted) • Average cell size is less than 2 cases (unweighted) • Filtering (data release) rule used now: • Accept if the median coefficient of variation is less than or equal to 61% • Otherwise, collapse and review again
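The 2000–2004 filtering rule can be sketched in a few lines of Python. This is a minimal illustration, not Census Bureau code: the function name `passes_2000_2004_rule` and its inputs are hypothetical, and the sketch assumes a table is dropped when either condition holds.

```python
from typing import Sequence

WEIGHTED_UNIVERSE_MIN = 500   # weighted universe threshold from the 2000-2004 rule
AVG_UNWEIGHTED_CELL_MIN = 2   # minimum average unweighted sample cases per cell


def passes_2000_2004_rule(weighted_universe: float,
                          unweighted_cell_counts: Sequence[int]) -> bool:
    """Return True if a table is kept under the 2000-2004 ACS filtering rule.

    Hypothetical helper. Assumes the table is dropped when EITHER the
    weighted universe is below 500 OR the average unweighted cell size
    is below 2 cases.
    """
    if not unweighted_cell_counts:
        raise ValueError("table has no cells")
    average_cell_size = sum(unweighted_cell_counts) / len(unweighted_cell_counts)
    return (weighted_universe >= WEIGHTED_UNIVERSE_MIN
            and average_cell_size >= AVG_UNWEIGHTED_CELL_MIN)


print(passes_2000_2004_rule(12_000, [5, 3, 0, 1]))  # True: average cell size is 2.25
print(passes_2000_2004_rule(450, [40, 25, 10]))     # False: universe below 500
```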

Why not just use cell suppression as is done for the Economic products? • Advantages • Gets rid of the “bad” estimates • Keeps the “good” estimates (depends on complementary suppression) • Disadvantages • Creates “holes” in distributions • Creates new problems for combined estimates (e.g., in derived products, such as data profiles) • Produces a new set of problems for year-to-year comparisons

Data Release Testing – Step by Step • Compute coefficients of variation • Coefficient of variation = standard error / estimate • Standard error = (upper bound – estimate) / 1.645 • If the estimate = 0, set the coefficient of variation to 100% • Ignore total and subtotal lines in the base table • Sort the coefficients of variation in descending order • Find the middle value (the median) • If the median is greater than 61%, the table FAILS (a coefficient of variation above 1/1.645 ≈ 61% puts the 90% lower bound, estimate – 1.645 × standard error, at or below 0; so a median above 61% means more than half of the cells are not statistically different from 0) • If the median is 61% or less, the table PASSES
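The step-by-step test can be sketched in Python as follows. The names `coefficient_of_variation` and `table_passes` are hypothetical; each cell is assumed to arrive as an (estimate, upper bound) pair with total and subtotal lines already removed. For an even number of cells the slide does not say which middle value to use; this sketch assumes the average of the two middle values, which is what `statistics.median` returns.

```python
import statistics
from typing import Sequence, Tuple

Z_90 = 1.645         # z-value behind the ACS 90% margin of error
CV_THRESHOLD = 61.0  # percent; the table fails if the median CV exceeds this


def coefficient_of_variation(estimate: float, upper_bound: float) -> float:
    """CV in percent for one cell, using SE = (upper bound - estimate) / 1.645."""
    if estimate == 0:
        return 100.0  # rule: a zero estimate is assigned a CV of 100%
    standard_error = (upper_bound - estimate) / Z_90
    return 100.0 * standard_error / estimate


def table_passes(cells: Sequence[Tuple[float, float]]) -> bool:
    """Apply the median-CV test to (estimate, upper_bound) pairs.

    Total and subtotal lines are assumed to have been removed already.
    """
    if not cells:
        raise ValueError("table has no detail cells")
    cvs = [coefficient_of_variation(est, ub) for est, ub in cells]
    # statistics.median sorts internally, so the slide's sort step is implicit.
    return statistics.median(cvs) <= CV_THRESHOLD


print(table_passes([(100, 130), (50, 90), (0, 20)]))  # True: median CV is about 48.6%
print(table_passes([(10, 30), (5, 20), (0, 15)]))     # False: median CV is about 121.6%
```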

Collapsing • Goal: release a simplified version of a base table for a geographic area that otherwise would get nothing • Decisions on design of collapsed tables are made by subject-matter experts at the Census Bureau • For operational reasons, only one collapsed version of each base table will be available regardless of geographic area

How the Data Release Rules will Work with Collapsed Versions of Base Tables

More About Collapsing • Collapsed Tables are designed to assure that derived products (profiles, ranking tables, subject tables,…) can still be sourced from the base tables • 2005 Tables: if a table passes filtering and a collapsed version exists, publish both the original version and the collapsed version for that geographic area
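The publication decision described across the filtering and collapsing slides can be summarized as a small function. This is a sketch under stated assumptions: `tables_to_publish` is a hypothetical name, and it assumes that when the full table fails and its collapsed version is missing or also fails, nothing is released for that geographic area.

```python
from typing import List, Optional


def tables_to_publish(full_passes: bool,
                      collapsed_exists: bool,
                      collapsed_passes: Optional[bool] = None) -> List[str]:
    """Return which versions of a base table to publish for one geographic area.

    Hypothetical helper. Assumes that if the full table fails and the
    collapsed version is missing or also fails, nothing is released.
    """
    if full_passes:
        # 2005 tables: a passing table is published with its collapsed version, if any.
        return ["full", "collapsed"] if collapsed_exists else ["full"]
    if collapsed_exists and collapsed_passes:
        # The full table failed, so the collapsed version is reviewed on its own.
        return ["collapsed"]
    return []


print(tables_to_publish(True, True))          # ['full', 'collapsed']
print(tables_to_publish(False, True, True))   # ['collapsed']
print(tables_to_publish(False, True, False))  # []
```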

Problems to fix in the current implementation of the data release rules • Collapsed versions missing in some cases • Collapsed versions that aren’t working • Poor choices in “sourcing” for derived products (e.g., profiles)

Statistical Significance Testing • Why should I do it? • When should I do it? • How do I do it?

Testing is Important

Statements you might want to make • Estimate X is bigger than Y • Estimate X this year is larger than X last year • Estimate X is smaller than Census 2000 value • State Z has the highest value

How do I do a significance test? 1. Get the Margin of Error (MOE) from the ACS 2. Calculate the Standard Error (SE): SE = MOE / 1.645 3. Solve for Z, where A and B are the two estimates: $Z = \frac{A - B}{\sqrt{SE_A^2 + SE_B^2}}$ 4. If Z < -1.645 or Z > 1.645, the difference is significant at the 90% confidence level
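The four steps translate directly into Python. This is a minimal sketch with hypothetical function names; like the simple formula above, it assumes the two estimates are independent, so no covariance term is included.

```python
import math

Z_90 = 1.645  # critical value for a two-sided test at 90% confidence


def se_from_moe(moe: float) -> float:
    """Convert an ACS 90% margin of error to a standard error."""
    return moe / Z_90


def z_statistic(est_a: float, moe_a: float, est_b: float, moe_b: float) -> float:
    """Z for the difference A - B, assuming A and B are independent."""
    se_a, se_b = se_from_moe(moe_a), se_from_moe(moe_b)
    denominator = math.sqrt(se_a ** 2 + se_b ** 2)
    if denominator == 0:
        raise ValueError("both margins of error are zero")
    return (est_a - est_b) / denominator


def significant_at_90(est_a: float, moe_a: float, est_b: float, moe_b: float) -> bool:
    """True if Z < -1.645 or Z > 1.645."""
    return abs(z_statistic(est_a, moe_a, est_b, moe_b)) > Z_90


print(significant_at_90(1000, 100, 800, 120))  # True: Z is about 2.11
print(significant_at_90(1000, 100, 950, 120))  # False: Z is about 0.53
```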

Obtaining Standard Errors is the Key • Sum or Difference of Estimates • Proportions and Percents • Means and Other Ratios • Simple Formulas, Where…
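The slide's formulas are not reproduced in this transcript. The sketch below instead uses the standard approximations given in the ACS "Accuracy of the Data" documentation, which may differ in detail from what the slide showed. The function names are hypothetical, and all formulas assume the component estimates are independent (no covariance terms). A mean is treated as a ratio (for example, an aggregate divided by a count), so it uses the ratio formula.

```python
import math


def se_sum_or_difference(*standard_errors: float) -> float:
    """SE of a sum or difference of estimates: square root of the summed squared SEs."""
    return math.sqrt(sum(se ** 2 for se in standard_errors))


def se_ratio(x: float, se_x: float, y: float, se_y: float) -> float:
    """SE of R = X / Y (also used for means, e.g., aggregate / count)."""
    r = x / y
    return math.sqrt(se_x ** 2 + r ** 2 * se_y ** 2) / y


def se_proportion(x: float, se_x: float, y: float, se_y: float) -> float:
    """SE of p = X / Y when X is a subset of Y."""
    p = x / y
    radicand = se_x ** 2 - p ** 2 * se_y ** 2
    if radicand < 0:
        # If the value under the square root is negative, use the ratio formula.
        return se_ratio(x, se_x, y, se_y)
    return math.sqrt(radicand) / y


def se_percent(x: float, se_x: float, y: float, se_y: float) -> float:
    """SE of a percent: 100 times the SE of the proportion."""
    return 100.0 * se_proportion(x, se_x, y, se_y)


print(se_sum_or_difference(30, 40))  # 50.0
print(se_proportion(50, 3, 100, 4))  # sqrt(5) / 100, about 0.0224
```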

There is HELP off in the wings

But what if I am using 2000 non-ACS data? Where are my MOEs?

Let’s get to work on the Standard Error • $SE = \text{Survey Design Factor} \times \sqrt{5Y\left(1 - \frac{Y}{N}\right)}$ • N = Size of publication area (population) • Y = Estimate of characteristic

[Table: Survey Design Factors for Mode to Work (values shown: 1.4, 1.2, 0.9, 0.7). Source: www.census.gov/prod/cen2000/doc/tablec-xx.pdf, where xx is the state abbreviation (e.g., xx=fl)]

N = Size of publication area (population) = 362,563 • Y = Estimate of characteristic = 126,540 • $5Y = 5 \times 126{,}540 = 632{,}700$ • $1 - (Y/N) = 1 - 126{,}540 / 362{,}563 = 1 - 0.3490152 = 0.6509848$ • $SE = \sqrt{632{,}700 \times 0.6509848} = 641.7772$

$Y/N = 126{,}540 / 362{,}563 \approx 35\%$ • Survey Design Factor = 0.7 • Final Adjusted SE = Survey Design Factor × SE = $0.7 \times 641.777 \approx 449$ (shown as 450)
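The Census 2000 worked example can be reproduced with a short Python sketch. The function names are hypothetical, and the design factor (0.7 in this example) is passed in by hand; looking it up in the state's design factor table (tablec-xx.pdf) is not modeled.

```python
import math


def census2000_unadjusted_se(y: float, n: float) -> float:
    """Unadjusted SE of a Census 2000 sample estimate: sqrt(5Y(1 - Y/N))."""
    return math.sqrt(5 * y * (1 - y / n))


def census2000_adjusted_se(y: float, n: float, design_factor: float) -> float:
    """Multiply the unadjusted SE by the survey design factor."""
    return design_factor * census2000_unadjusted_se(y, n)


Y = 126_540  # estimate of characteristic (worked example)
N = 362_563  # size of publication area (population)
print(round(census2000_unadjusted_se(Y, N), 3))     # 641.777
print(round(census2000_adjusted_se(Y, N, 0.7), 2))  # 449.24
```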

[Figure: “Tempting”: Green is OK; This is NOT]

Want to do an exercise on your own?
