
Making Sense of Census Data

Robert Matthews, University of Alabama at Birmingham. Our cohort consists of a 5% sample of the entire U.S. Medicare population from 1999-2006. Zip+4 (9-digit) information is available for 99.9% of beneficiaries and providers.


Presentation Transcript


  1. Making Sense of Census Data. Robert Matthews, University of Alabama at Birmingham

  2. Introduction
     • Our cohort consists of a 5% sample of the entire U.S. Medicare population from 1999-2006
     • Zip+4 (9-digit) information is available for 99.9% of beneficiaries and providers
     • The task was to link our cohort with the census data to obtain demographic variables: educational attainment, median household income, and total population

  3. Hierarchical Relationships of Census Geographic Structures. Source: U.S. Census Bureau, Summary File 3 documentation

  4. Summary Files
     • “Short Form”
         • Summary File 1 (SF 1) – data from the Short Form questions
         • Summary File 2 (SF 2) – data from the Short Form questions, repeated for 249 population groups
         • Redistricting Data – used for congressional and state redistricting

  5. Summary Files
     • “Long Form”: asked of only a sample of the U.S. population (1 in 6 households)
         • Summary File 3 (SF 3) – comprehensive results from the Long Form
         • Summary File 4 (SF 4) – comprehensive results from the Long Form, repeated for 335 population groups

  6. Summary File 3 components
     • 53 sets of files
         • 50 U.S. states
         • District of Columbia (D.C.)
         • Puerto Rico
         • All states combined
     • 53 x 77 = 4,081 files

  7. Linking Census and Medicare data
     • The Census block group is used to link Census and Medicare data
     • The Census block is a 4-character variable, and the block group is identified by the value in its first position
     • We obtained a database of 66 million Zip+4 codes from Melissa Data so that we could get the census tract and block for each Zip+4 code
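
The block-group rule on this slide can be sketched in Python. This is only an illustration, not code from the presentation: the function names and sample values are hypothetical, and the 12-character key layout (2-digit state, 3-digit county, 6-digit tract, 1-digit block group) is an assumption based on the usual Census code widths.

```python
def block_group(block: str) -> str:
    """Return the block group: the first character of the 4-character block code."""
    if len(block) != 4 or not block.isdigit():
        raise ValueError(f"expected a 4-digit census block, got {block!r}")
    return block[0]


def block_group_key(state: str, county: str, tract: str, block: str) -> str:
    """Build a linkage key: state (2) + county (3) + tract (6) + block group (1).

    The field widths are an assumption; zero-padding guards against codes
    that lost their leading zeros when read as numbers.
    """
    return state.zfill(2) + county.zfill(3) + tract.zfill(6) + block_group(block)


# Hypothetical example values, not taken from the presentation.
key = block_group_key("01", "073", "004500", "2013")
print(key)
```

Building the same key on both the Zip+4 lookup side and the census side gives the two data sets a common field to merge on.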

  8. Variables of interest
     • Variable description (table)
         • Educational attainment (Table P37)
         • Median household income (Table P53)
         • Total population (Table P1)
     • Tables mapped to files using the File Segmentation Table
         • P37 → File 04
         • P53 → File 06
         • P1 → File 01
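
One way to keep this mapping at hand is a small lookup structure. The sketch below simply records the table-to-file pairs listed on the slide; the names `TABLE_TO_FILE` and `files_needed` are hypothetical.

```python
# Table number -> data file number, as listed on the slide.
TABLE_TO_FILE = {
    "P37": 4,  # Educational attainment
    "P53": 6,  # Median household income
    "P1": 1,   # Total population
}


def files_needed(tables):
    """Return the sorted, de-duplicated file numbers that hold the given tables."""
    return sorted({TABLE_TO_FILE[table] for table in tables})


print(files_needed(["P37", "P53", "P1"]))  # prints [1, 4, 6]
```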

  9. Summary File 3 components
     • Geographic Identifier file (GeoID)
     • 76 data files containing different sets of variables
     • The GeoID file is linked to each of the 76 data files by a variable named LogRecNo
     • The Summary Level must be selected from the GeoID file to extract the desired stratification level. This is used to identify the specific area being tabulated.
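
The LogRecNo link and the summary-level selection can be sketched as a filtered join. This is a minimal illustration under stated assumptions, not the author's code: the column names (`LOGRECNO`, `SUMLEV`, `TRACT`, `BLKGRP`, `MEDIAN_HH_INCOME`) and the sample rows are made up, summary level "150" is assumed to denote block groups (confirm against the Summary Level Sequence Chart), and the rows are assumed to come from a single state's set of files.

```python
# Assumed summary level code for block groups; verify before use.
BLOCK_GROUP_SUMLEV = "150"


def select_geo(geo_rows, sumlev):
    """Map LogRecNo -> geo row, keeping only rows at the requested summary level."""
    return {row["LOGRECNO"]: row for row in geo_rows if row["SUMLEV"] == sumlev}


def link_data(geo_rows, data_rows, sumlev=BLOCK_GROUP_SUMLEV):
    """Join data rows to geo rows on LogRecNo at one summary level."""
    selected = select_geo(geo_rows, sumlev)
    linked = []
    for row in data_rows:
        geo = selected.get(row["LOGRECNO"])
        if geo is not None:  # drop records at other summary levels
            linked.append({**geo, **row})
    return linked


# Made-up rows for illustration only.
geo_rows = [
    {"LOGRECNO": "0000001", "SUMLEV": "040", "TRACT": "", "BLKGRP": ""},
    {"LOGRECNO": "0000002", "SUMLEV": "150", "TRACT": "004500", "BLKGRP": "2"},
]
data_rows = [
    {"LOGRECNO": "0000001", "MEDIAN_HH_INCOME": 34135},
    {"LOGRECNO": "0000002", "MEDIAN_HH_INCOME": 41250},
]

linked = link_data(geo_rows, data_rows)
print(linked)
```

Filtering the GeoID rows first keeps the join small, since only the records at the chosen summary level are carried into the result.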

  10. Summary Level Sequence Chart (partial listing)

  11. Subject Locator
     • Index designed to quickly identify tables in the summary file for particular subjects or topics of interest
     • Arranged alphabetically by name of subject
     • Each row contains the type of entry and the relevant table number for the data source

  12. Subject Locator Index (partial listing)

  13. Summary of steps for identifying variables and merging with cohort
     • Use the Subject Locator to identify variables of interest and their corresponding table numbers
     • Use the File Segmentation Table to identify the specific data file(s) for each table number
     • Use the Summary Level Sequence Chart to locate the desired stratification level
     • Identify SAS input statements to read each file
     • Merge census variables with existing cohort data
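
The final merge step can be sketched as a left join of cohort records onto census variables. The presentation used SAS; the Python below is only a hedged illustration of the idea, and every name in it (`merge_cohort`, `bg_key`, the field names, and the sample records) is hypothetical.

```python
def merge_cohort(cohort, census_by_key, fields):
    """Left-join census fields onto cohort records by block-group key.

    Every cohort record is kept; records whose key has no census match
    receive None for each census field.
    """
    merged = []
    for person in cohort:
        census = census_by_key.get(person["bg_key"], {})
        merged.append({**person, **{field: census.get(field) for field in fields}})
    return merged


# Made-up records for illustration only.
cohort = [
    {"id": "A", "bg_key": "010730045002"},
    {"id": "B", "bg_key": "999999999999"},  # no census match
]
census_by_key = {
    "010730045002": {"median_hh_income": 41250, "total_pop": 1320},
}

merged = merge_cohort(cohort, census_by_key, ["median_hh_income", "total_pop"])
print(merged)
```

A left join is used so that beneficiaries without a matching block group stay in the cohort and can be counted as unmatched rather than silently dropped.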

  14. Conclusion
     • Working with Census data is a daunting task because of the large volume of data and documentation
     • The data are well organized into a manageable set of distinct components
     • Flexibility comes at the cost of thousands of variables and data files
     • Extracting variables from Census data becomes much easier once all the pieces are in place

  15. Contact information
     Robert Matthews
     University of Alabama at Birmingham, Department of Epidemiology
     1665 University Blvd., RPHB 517, Birmingham, AL 35294-0022
     Email: rsm@uab.edu
     Web: www.epi.soph.uab.edu/rsm/
