
Working with the data

jihan
Presentation Transcript


  1. Working with the data

  2. Where to begin? Have you come across any ACS data issues in your work? • Sampling Error (90% Confidence) • Collapsing • Period Estimates • Reliability • Dollar Values • Trend Analysis • Weighting Change • Light Rail • Reweighting • CTPP Issues • Block Group data

  3. Sampling Error: You must do Statistical Significance Tests to avoid false statements like: Commutes increase for all modes. “Based upon data from the 2000 Census (CTPP) and the 2005-2007 ACS, the total number of workers who live in Flagstaff increased along with the number who took transit to work. During the same time, the number of people who worked at home increased along with those who drove alone and carpooled.” The World Gazette

  4. How do you do a Significance Test? It is simpler than it looks and there are a lot of guides. 1. Get the Margin of Error (MOE) from ACS. 2. Calculate the Standard Error (SE): SE = MOE / 1.645. 3. Solve for Z, where A and B are the two estimates: Z = (A - B) / sqrt(SE_A^2 + SE_B^2). 4. If Z < -1.645 or Z > 1.645, the difference is significant at 90% confidence.
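
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the function names and the sample numbers are hypothetical, and the Z formula is assumed to be the usual one from the Census Bureau's ACS guidance, Z = (A - B) / sqrt(SE_A^2 + SE_B^2).

```python
import math

Z_90 = 1.645  # critical value for 90% confidence, as used on the slide


def standard_error(moe: float) -> float:
    """Step 2: convert a 90% ACS margin of error into a standard error."""
    return moe / Z_90


def z_score(est_a: float, moe_a: float, est_b: float, moe_b: float) -> float:
    """Step 3: Z statistic for the difference between two estimates."""
    se_a = standard_error(moe_a)
    se_b = standard_error(moe_b)
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def is_significant(est_a: float, moe_a: float, est_b: float, moe_b: float) -> bool:
    """Step 4: the difference is significant at 90% if |Z| > 1.645."""
    return abs(z_score(est_a, moe_a, est_b, moe_b)) > Z_90


if __name__ == "__main__":
    # Hypothetical estimates and MOEs, for illustration only.
    z = z_score(1200, 300, 900, 250)
    print(f"Z = {z:.2f}, significant: {is_significant(1200, 300, 900, 250)}")
```

Note that this applies only when both estimates come with a published 90% MOE; for 2000 long-form data the standard error has to be estimated first.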

  5. Some things to keep in mind • Obtaining Standard Errors is the key • Formulas vary depending on the comparison • Sum or Difference of Estimates • Proportions and Percents • Means and Other Ratios • Working with 2000 data will be a little more involved. There are resources to help.

  6. The ACS Compass handbooks: A Compass for Understanding and Using ACS Data • Set of user-specific handbooks • Train-the-trainer materials • E-learning ACS Tutorial • Annotated Presentations. See especially Appendix 3. http://www.census.gov/acs/www/guidance_for_data_users/compass_products/

  7. NY State Data Center Calculator http://sdcclearinghouse.wordpress.com/2009/03/03/spreadsheet-to-calculate-acs-margins-of-error-and-statistical-significance-for-sums-proportions-and-ratios/

  8. But what if I am using 2000 non-ACS Data? You will need to Estimate the MOE and know the Survey Design Factor
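
As a rough sketch of what estimating the MOE involves for a 2000 long-form total: the code below assumes the unadjusted standard error formula from the Census 2000 Summary File 3 technical documentation, SE = sqrt(5 * Y * (1 - Y / N)), multiplied by a design factor taken from the published tables. The function names, the design factor, and the numbers are hypothetical; the CUTR guide referenced on the next slide should be treated as the authority.

```python
import math

Z_90 = 1.645  # 90% confidence multiplier


def census2000_se_total(estimate: float, base_population: float,
                        design_factor: float) -> float:
    """Approximate SE of a Census 2000 long-form estimated total.

    Assumed formula: design_factor * sqrt(5 * Y * (1 - Y / N)), where
    Y is the estimate and N is the size of the area it was drawn from.
    """
    if not 0 <= estimate <= base_population:
        raise ValueError("estimate must lie between 0 and base_population")
    unadjusted = math.sqrt(5 * estimate * (1 - estimate / base_population))
    return design_factor * unadjusted


def moe_90(se: float) -> float:
    """Convert a standard error into a 90% margin of error."""
    return Z_90 * se


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 transit commuters among 20,000 workers,
    # with a made-up design factor of 1.2.
    se = census2000_se_total(1000, 20000, 1.2)
    print(f"SE = {se:.1f}, MOE (90%) = {moe_90(se):.1f}")
```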

  9. The CUTR Guide has you covered. There’s a Report (http://www.nctr.usf.edu/pdf/77802.pdf) and a Spreadsheet Calculator (http://www.nctr.usf.edu/spreadsheet/77802.xls). See also http://www.nctr.usf.edu/abstracts/abs77802.htm

  10. Transportation resources http://onlinepubs.trb.org/onlinepubs/nchrp/nchrp_rpt_588.pdf

  11. Understanding the MOE: Using the MOE. Part 1, Profile 1 (Resident data). We know the number of workers has changed, but what is the range of that change? A. 5,744? B. 5,072 to 6,416? C. 3,888 to 7,600?

  12. Another Flagstaff point. Between the reference periods, what has the number of people who took transit to work in Flagstaff done? A. Gone up? B. Gone down? C. No significant change? Which table would you use and why: Part 1, Profile 1 (Resident data) or Part 2, Profile 1 (Workplace data)?

  13. Two types of Collapsing

  14. Collapsed table: full table not available. Sometimes neither table exists, and MOEs are greater than the estimate. Population = 26,566

  15. “B” and “C” Tables B08006 C08006 Means of Transportation

  16. “B” and “C” Tables

  17. Full and collapsed table What do you notice about the Table?

  18. Some things to be aware of What year is the data? Period Estimate

  19. Reliability/Currency What data is more reliable? Which is more current?

  20. Dollar Values and Income tables. ACS asks: What was your income during the last 12 months? Single-Year Estimates: 12 different periods, each adjusted to a single period (Jan to Dec). Multiyear Estimates: each year adjusted to the current year.

  21. About Trend Analysis (overlapping syndrome). If you are doing trend analysis with multi-year estimates, you cannot compare successive period estimates because the middle years overlap. Also, you cannot compare a 3-year estimate with a 5-year estimate.
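
The two rules on this slide can be written as a small check. This is only a sketch: the `Period` type and function names are hypothetical, and the check covers just the two conditions stated here (same period length, no shared years).

```python
from typing import NamedTuple


class Period(NamedTuple):
    """An ACS estimate period, inclusive of both years (e.g. 2005-2007)."""
    start_year: int
    end_year: int

    @property
    def length(self) -> int:
        return self.end_year - self.start_year + 1


def overlaps(a: Period, b: Period) -> bool:
    """True if the two periods share at least one year."""
    return a.start_year <= b.end_year and b.start_year <= a.end_year


def comparable_for_trend(a: Period, b: Period) -> bool:
    """Comparable only if the periods have equal length and do not overlap."""
    return a.length == b.length and not overlaps(a, b)


if __name__ == "__main__":
    print(comparable_for_trend(Period(2005, 2007), Period(2006, 2008)))  # overlapping
    print(comparable_for_trend(Period(2005, 2007), Period(2008, 2010)))  # back to back
    print(comparable_for_trend(Period(2005, 2007), Period(2008, 2012)))  # 3-year vs 5-year
```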

  22. Change in Weighting. In 2009 the weighting changed to use sub-county totals as opposed to just county totals.

  23. Change in Weighting Detroit Example “Detroit is the poster child for odd looking data”

  24. Change in Weighting (Analysis). In 2009 the weighting changed to use sub-county totals as opposed to just county totals.

  25. Light Rail Conundrum Impact of New “Light Rail” systems might not be showing up Source: 2000 CTPP and 2007ACS3, CTPP Data Profile 1

  26. One more thing on Pop Estimates The older estimates get revised every year but the ACS does not get reweighted

  27. Now let’s focus on the CTPP data. But first, a word on Disclosure (3-year tables). DRB said: “Too many variables” crossed with Means of Transportation (Mode) makes for a micro data record, and with a micro data record you could identify an individual.

  28. The Battle Ensued We Said… No, You can’t identify an individual -- Hired a statistical consultant < 0.01% -- Had a hearing with DRB Bosses -- Made every argument possible Census Said… Tough Luck --Compress your Modes and improve your chances of passing our rules -- Chop your cross tabs to 5 variables

  29. What we ended up with – for 3 year Tables Five (5) Variables crossed with Means of Transportation to work (MOT) …and

  30. A boat load of collapsing of the Modes …and

  31. Disclosure Rules for the 5-year CTPP. Rule 7 was the killer • For Worker Flows • Must have 3 unweighted records for each O-D pair • Does not apply to Total Workers or Workers by Mode to Work (means of transportation, all 18 modes)

  32. So What Did We Do? NCHRP Web Report 180 ($550K): Producing Transportation Data Products from the ACS that Comply With Disclosure Rules. Privacy Protection: the 5-year CTPP will have two types of tables • Tables that passed Census Rules • Tables with Perturbation done to them. http://onlinepubs.trb.org/onlinepubs/nchrp/nchrp_w180.pdf

  33. Table Summary using 5-year Table list Tables Using Perturbed Data Set Means of transportation Aggregate Vehicles Used Aggregate Travel Time Mean HH Income Aggregate HH Income Aggregate Carpools Almost all Part 3 Tables

  34. Still left with some Disclosure Rules. For all tables, Regular (A) + Perturbed (B) • All tables rounded • 0 = 0, 1-7 = 4, 8 or more = nearest multiple of 5 • Any number that ends in 5 or 0 stays as is • Aggregate dollar values rounded to nearest 100 • Aggregate minutes to work and aggregate vehicles use standard rounding • Totals rounded independently of cells • Medians or quintiles not subject to rounding • Percentages and rates calculated after rounding • Medians and aggregates must be based on 3 or more values
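
The count-rounding rule above is easy to misread, so here is a minimal sketch of it. The function names are hypothetical, and two points are assumptions the slide does not settle: that the "ends in 5 or 0 stays as is" bullet concerns values of 8 or more (so a count of 5 becomes 4), and that aggregate dollar ties round up.

```python
def round_count(n: int) -> int:
    """Round a table cell: 0 -> 0, 1-7 -> 4, 8 or more -> nearest multiple of 5."""
    if n < 0:
        raise ValueError("counts cannot be negative")
    if n == 0:
        return 0
    if n <= 7:
        return 4  # assumption: applies to 5 as well
    # Integer arithmetic avoids float rounding; multiples of 5 are unchanged.
    return 5 * ((n + 2) // 5)


def round_aggregate_dollars(value: int) -> int:
    """Round an aggregate dollar value to the nearest 100 (ties up, an assumption)."""
    return 100 * ((value + 50) // 100)


if __name__ == "__main__":
    cells = [3, 6, 9]  # hypothetical unrounded cells
    rounded_cells = [round_count(c) for c in cells]
    rounded_total = round_count(sum(cells))
    # Totals are rounded independently, so they need not match the cell sum.
    print(rounded_cells, sum(rounded_cells), rounded_total)
```

Because totals are rounded independently of cells, a rounded table will often fail to add up, which is worth remembering before summing rounded cells by hand.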

  35. Still left with some Disclosure Rules. For Regular (A) Tables only • Cell Suppression: For Tables 101106 (unweighted sample count of the population), 101107 (percent of population in sample), 110101 (total housing units sampled), and 110103 (percent of housing units sampled), there must be either 0 or at least 3 occupied housing units in sample to show the table • Table Suppression: Aggregates and Means must have at least 3 unweighted cases to be shown. The policy of the ACS program is that if any one cell in a table is suppressed, the whole table is suppressed

  36. Ask Again Later: some early issues with the 5-year ACS? Standard Data Products • Some very large MOEs • Block Group data only in download area • Reliability of tract estimates is much lower than the 2000 long form (LF) • NO Workplace Tables! The Census Bureau says: BG data should ONLY be used to build up larger geographic areas because the Margins of Error (MOEs) are too large otherwise (JSM Conference, August 2010). Ken Hodges, Nielsen (Claritas), ACS 5-Year Data: A First Look at the First Release (4.5 MB, ppt) http://www.copafs.org/UserFiles/file/HodgesMarch2011.pptx

  37. Let’s talk about Block Group Data for a moment. Source: Tract Data, Missouri State Data Center (MSDC); Block Group Data, AFF. AFF has all 21 modes; MSDC has all 21 but also collapsed, with Total Commuters added. MSDC put a value to MOEs.

  38. First: Let’s consider MOEs What do you notice? Don’t forget if this was CTPP data it would be Rounded too

  39. Now let’s fill in the table. CB does not give you Total Commuters, but you like that. Can we talk about that for a moment?

  40. Now let’s fill in the table. How would we get Total Commuters and, more importantly, the MOEs? For the estimate totals, just add the relevant estimates. But for MOEs you have some decisions to make.

  41. 488 Now let’s fill in the table. Two different MOE approaches are available: 1. Calculate the 90% margin of error of the sum of more than two estimates. 2. Calculate the 90% margin of error of the sum or difference between two estimated values (What two values would you use?). Approach 1 gives you an MOE of either 245 when including the MOE for ‘Other Means’ or 214 without it. Approach 2 gives you an MOE of 209.
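
Both approaches rest on the same root-sum-of-squares approximation for combining MOEs, which can be sketched as follows. The mode names and MOE values are hypothetical (the slide's table is not reproduced in this transcript), so the output will not match the 245, 214, or 209 quoted above. The choice of two values in approach 2 is likewise only an assumption for illustration.

```python
import math
from typing import Iterable


def moe_of_sum(moes: Iterable[float]) -> float:
    """Approximate 90% MOE of a sum (or difference) of estimates.

    Uses the usual approximation: square root of the sum of squared MOEs.
    """
    return math.sqrt(sum(m * m for m in moes))


if __name__ == "__main__":
    # Hypothetical block group MOEs by commute mode.
    mode_moes = {
        "drove alone": 150.0,
        "carpooled": 90.0,
        "transit": 60.0,
        "walked": 70.0,
        "other means": 120.0,
    }

    # Approach 1: combine the MOEs of every mode being added up.
    print(f"Approach 1: {moe_of_sum(mode_moes.values()):.0f}")

    # Approach 2: derive the total from two published values instead,
    # e.g. total workers minus worked at home (hypothetical MOEs).
    print(f"Approach 2: {moe_of_sum([190.0, 45.0]):.0f}")
```

The more terms that go into the sum, the larger the combined MOE tends to be, which is why the number of components matters as much as the formula.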

  42. What data should I use? Travel times for the six counties in NE Illinois: 1. To compare with 1970, ‘80, ‘90 and 2000 travel times? 2. To compare with my town of 52K people? 3. To validate my 2008 vintage travel demand model? Learn how to do the Coefficient of Variation test.
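
A minimal sketch of the coefficient of variation (CV) calculation, assuming the common definition CV = SE / estimate, expressed as a percent, with SE derived from a 90% MOE as on the earlier significance-test slide. The function name and numbers are hypothetical, and no reliability cutoff is built in because acceptable CV thresholds vary between users.

```python
Z_90 = 1.645  # 90% confidence multiplier used to publish ACS MOEs


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """Return the CV in percent: 100 * (MOE / 1.645) / estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    standard_error = moe / Z_90
    return 100.0 * standard_error / abs(estimate)


if __name__ == "__main__":
    # Hypothetical mean travel time estimates (minutes) with 90% MOEs.
    print(f"Large area: CV = {coefficient_of_variation(31.2, 0.3):.1f}%")
    print(f"Small area: CV = {coefficient_of_variation(27.5, 4.8):.1f}%")
```

A lower CV means the estimate is more reliable relative to its size, which helps when choosing between 1-year, 3-year, and 5-year data for a given geography.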

  43. The Upside: Data Evolution. Once you know all the data issues, it is possible to use the data intelligently. It’s ignorance that kills you.
