2014 SDC and CIC Annual Training Conference: Accessing ACS PUMS Data • Tim Gilbert • U.S. Census Bureau • April 2, 2014
Outline • Fundamentals of PUMS Data • Geography and the PUMS • Accessing PUMS Data • Multiple Vintage Variables in PUMS • Working with multiple vintage variables • Considerations with PUMS • Resources
Summary Data and Microdata: What’s the Difference? • Summary data are predefined tables for specific geographic areas (states, counties, etc.) • In the ACS microdata, the basic unit is an individual housing unit or person
What are PUMS data? • Public Use: anonymized, downloadable • Microdata: records of individual people • Sample: a representative sample of the population
PUMS Overview • The PUMS sample is a subsample of ACS interviews, about one percent of all US households per year • PUMS is a “weighted” sample, so the weighting variables must be used in analysis • Each release is a set of two files: housing units and persons • ACS produces 1-, 3-, and 5-year PUMS files • Available as SAS files and CSV files, or via DataFerrett
Why Use PUMS? • The data are needed for a tabulation or a specific universe not supported by standard ACS tables (e.g., population groups by single year of age) • Statistical analysis is required to understand relationships between economic, demographic, or housing variables (e.g., correlation analysis) • New measures can be created using multiple variables or other people in the household (spouse’s occupation, same-sex couples, number of kids)
Types of PUMS Files Released • We release 3 new PUMS files every year • 1-year PUMS (example: 2012 1-year PUMS), released December 2013 • 3-year PUMS (example: 2010-2012 3-year PUMS), released February 2014 • 5-year PUMS (example: 2008-2012 5-year PUMS), released March 2014
Modifications to Multiyear PUMS • Multiyear PUMS have the same cases and geography as their component 1-year files • How are multiyear PUMS different from single year? • Weights are produced using latest population estimate “vintages” • Dollar amounts are standardized • Why use the multiyear PUMS files? • For studying small groups, where more cases are needed • When analysis is also making use of multiyear summary data
Limited Geographic Detail • Geographic identifiers are region, division, state, and Public Use Microdata Area (PUMA) • PUMAs identify geographic areas with a population of 100,000 or more • PUMS is not designed for statistical analysis of small geographic areas
Public Use Microdata Area (PUMA) • Defined after each census by the states in coordination with the Census Bureau’s Geography Division • http://www.census.gov/geo/puma/puma2010.html • Redefined PUMAs for 2012 PUMS files • DY 2012 multiyear files have dual PUMA vintages • Large enough to meet disclosure avoidance requirements • PUMAs are identified by a five-digit number, unique within each state
PUMA Reference Maps http://www.census.gov/geo/maps-data/maps/reference.html
Interactive PUMA Maps http://tigerweb.geo.census.gov/tigerwebmain/tigerweb_main.html
American FactFinder http://www.census.gov/acs/www/data_documentation/pums_data/
PUMS on FTP site www2.census.gov
DataFerrett http://dataferrett.census.gov/
What are multiple vintage variables? • 2010-2012 3-Year PUMS and 2008-2012 5-Year PUMS contain variables with multiple vintages • Multiple vintage variables have differing sets of values for different years within the same multi-year file
PUMS Documentation http://www.census.gov/acs/www/data_documentation/pums_documentation/ • PUMS ReadMe • List of variables with multiple vintages • PUMS Data Dictionary • Variable names, descriptions, and values • Accuracy of the PUMS • Information about working with multiple vintage variables
Using Multiple Vintage Variables • Verify in the PUMS ReadMe that the variable has multiple vintages (example: Marital History)
Using Multiple Vintage Variables • Look up differences between vintages in data dictionary
Using Multiple Vintage Variables • Recode and combine vintages to create one variable • IF MARHYP05 is less than or equal to 1932 OR MARHYP12 equals 1932, THEN MARHYP (derived variable) equals 1932
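The recode rule above could be sketched in Python as follows. This is a minimal illustration, not Census Bureau code: the function name `combine_marhyp` is hypothetical, and it assumes each record has a value in only one of the two vintage columns, with the other column read in as `None`.

```python
BOTTOM_CODE = 1932  # earliest year kept in the derived variable, per the rule above


def combine_marhyp(marhyp05, marhyp12):
    """Combine the two MARHYP vintages into one derived value.

    Assumption: a column that does not apply to the record is None.
    """
    # Older vintage: collapse every year at or before 1932 into 1932.
    if marhyp05 is not None and marhyp05 <= BOTTOM_CODE:
        return BOTTOM_CODE
    # Newer vintage: 1932 is already the bottom code.
    if marhyp12 == BOTTOM_CODE:
        return BOTTOM_CODE
    # Otherwise keep whichever vintage is populated.
    return marhyp05 if marhyp05 is not None else marhyp12


examples = [(1925, None), (None, 1932), (1980, None), (None, 2005)]
derived = [combine_marhyp(old, new) for old, new in examples]
print(derived)
```

Only the bottom code is harmonized here; any other differences between vintages listed in the data dictionary would need their own rules.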
Using Multiple Vintage PUMAs • Look up PUMA variable vintages in data dictionary
Using Multiple Vintage PUMAs • Look up PUMA in Missouri State Data Center’s MABLE/Geocorr12 at http://mcdc.missouri.edu/websas/geocorr12.html
Using Multiple Vintage PUMAs • Find corresponding PUMAs across vintages
Using Multiple Vintage PUMAs • Combine the PUMA vintages across years • IF ST equals 26 and PUMA00 equals 03806, OR ST equals 26 and PUMA10 equals 03204, THEN PUMA (derived variable) equals XXXXX
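One way to apply this kind of rule to many records is a small lookup table built from the Geocorr results. The sketch below is illustrative only: `CROSSWALK` and `derive_puma` are hypothetical names, the placeholder `"XXXXX"` stands for whatever code the analyst assigns to the combined area, and the unused vintage column is assumed to be `None`. Codes are kept as strings so leading zeros survive.

```python
# (state, vintage column, PUMA code) -> analyst-chosen derived code.
# "XXXXX" is the placeholder from the rule above, not a real code.
CROSSWALK = {
    ("26", "PUMA00", "03806"): "XXXXX",
    ("26", "PUMA10", "03204"): "XXXXX",
}


def derive_puma(st, puma00, puma10):
    """Return the derived PUMA for a record, or None if it is not in the crosswalk."""
    for column, code in (("PUMA00", puma00), ("PUMA10", puma10)):
        if code is not None and (st, column, code) in CROSSWALK:
            return CROSSWALK[(st, column, code)]
    return None


old_vintage = derive_puma("26", "03806", None)   # record from a 2000-vintage year
new_vintage = derive_puma("26", None, "03204")   # record from a 2010-vintage year
unmatched = derive_puma("26", None, "00100")     # PUMA not in the crosswalk
```

Keying on the vintage column as well as the code matters because the same five-digit number can refer to different areas in the two vintages.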
Analyzing PUMS Data • National level files must be concatenated • See PUMS ReadMe • Use SERIALNO variable to merge housing and person records to create complete file • See PUMS ReadMe http://www.census.gov/acs/www/data_documentation/pums_documentation/
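The two steps above (concatenating the pieces of a national file, then joining persons to housing units on SERIALNO) might look like this in plain Python. The records are made up for illustration, and `concatenate` and `merge_on_serialno` are hypothetical helper names; real files would be read from the CSV or SAS downloads.

```python
def concatenate(*parts):
    """Stack the pieces of a national file into one list of records."""
    rows = []
    for part in parts:
        rows.extend(part)
    return rows


def merge_on_serialno(housing_rows, person_rows):
    """Attach each person's housing-unit fields, matching on SERIALNO."""
    by_serial = {unit["SERIALNO"]: unit for unit in housing_rows}
    merged = []
    for person in person_rows:
        unit = by_serial.get(person["SERIALNO"])
        if unit is not None:
            merged.append({**unit, **person})
    return merged


# Made-up records standing in for two parts of a national file.
housing_a = [{"SERIALNO": "0001", "ST": "26", "wgtp": 20}]
housing_b = [{"SERIALNO": "0002", "ST": "06", "wgtp": 35}]
persons_a = [
    {"SERIALNO": "0001", "SPORDER": 1, "pwgtp": 21},
    {"SERIALNO": "0001", "SPORDER": 2, "pwgtp": 19},
]
persons_b = [{"SERIALNO": "0002", "SPORDER": 1, "pwgtp": 35}]

housing = concatenate(housing_a, housing_b)
persons = concatenate(persons_a, persons_b)
complete = merge_on_serialno(housing, persons)
```

The result has one row per person, so housing-unit fields repeat for every member of the household.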
Types of PUMS Weights • PUMS household weights (wgtp) must be used to produce housing unit estimates • PUMS person weights (pwgtp) must be used to produce population estimates • PUMS replicate weights (wgtp1 – wgtp80 and pwgtp1 – pwgtp80) are used for calculating standard errors
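As a rough illustration of how the weights are used, an estimate is the sum of the weights of the records in the universe of interest, not a count of records. The sketch below uses made-up records, and `weighted_total` is a hypothetical helper name.

```python
def weighted_total(rows, weight, include=lambda row: True):
    """Sum the given weight over the records that satisfy `include`."""
    return sum(row[weight] for row in rows if include(row))


# Made-up person and housing records.
persons = [
    {"AGEP": 34, "pwgtp": 21},
    {"AGEP": 8, "pwgtp": 19},
    {"AGEP": 71, "pwgtp": 35},
]
housing = [{"wgtp": 20}, {"wgtp": 35}]

population = weighted_total(persons, "pwgtp")                             # person weight
children = weighted_total(persons, "pwgtp", lambda row: row["AGEP"] < 18)
housing_units = weighted_total(housing, "wgtp")                           # household weight
```

Three sample persons here represent an estimated population of 75, which is why unweighted record counts should not be reported as estimates.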
Estimating Variance with PUMS • Problem: PUMS is not a simple random sample • Stratified samples with complex weighting • Sample drawn at household level (i.e., not a simple random sample of individuals) • Solutions: • Use weighting variable and a “design factor” • Use weighting variable and 80 “replicate weights” • See Accuracy of the PUMS http://www.census.gov/acs/www/data_documentation/pums_documentation/
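The replicate-weight approach can be sketched as follows: compute the estimate once with the full weight and once with each of the 80 replicate weights, then take the square root of 4/80 times the sum of squared differences. This is a minimal sketch with made-up data and hypothetical function names; the formula should be checked against Accuracy of the PUMS before use.

```python
import math


def weighted_total(rows, weight):
    """Sum one weight column over all records."""
    return sum(row[weight] for row in rows)


def replicate_se(rows, base="pwgtp", replicates=80):
    """Standard error of a weighted total from the replicate weights."""
    full = weighted_total(rows, base)
    squared_diffs = sum(
        (weighted_total(rows, f"{base}{r}") - full) ** 2
        for r in range(1, replicates + 1)
    )
    return math.sqrt(4.0 / replicates * squared_diffs)


def make_row(weight):
    """Made-up record: replicate weights alternate 2 above and 2 below the full weight."""
    row = {"pwgtp": weight}
    for r in range(1, 81):
        row[f"pwgtp{r}"] = weight + 2 if r % 2 else weight - 2
    return row


rows = [make_row(20)]
estimate = weighted_total(rows, "pwgtp")
se = replicate_se(rows)
print(estimate, round(se, 6))
```

The same function works for housing-unit estimates by passing `base="wgtp"`, assuming the housing records carry wgtp1 through wgtp80.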
http://www.census.gov/acs/www/data_documentation/pums_documentation/
Contact Information • ACS/PRCS website: www.census.gov/acs • ACS User Support: 301-763-1405, acso.users.support@census.gov • Questions?