Making Sense of Census Data Robert Matthews University of Alabama at Birmingham
Introduction • Our cohort consists of a 5% sample of the entire U.S. Medicare population from 1999-2006 • ZIP+4 (9-digit) information is available for 99.9% of beneficiaries and providers • The task was to link our cohort with the census data to obtain three demographic variables: educational attainment, median household income, and total population
Hierarchical Relationships of Census Geographic Structures • [Figure] Source: U.S. Census Bureau, Summary File 3 documentation
Summary Files • “Short Form” • Summary File 1 (SF 1) – data from the Short Form questions • Summary File 2 (SF 2) – data from the Short Form questions, repeated for 249 population groups • Redistricting Data – used for congressional and state redistricting
Summary Files • “Long Form” • Only asked of a sample of the U.S. population (1 in 6 households) • Summary File 3 (SF 3) – comprehensive results from the Long Form • Summary File 4 (SF 4) – comprehensive results from the Long Form, repeated for 335 population groups
Summary File 3 components • 53 sets of files • 50 U.S. states • District of Columbia (D.C.) • Puerto Rico • All states combined • Each set has 77 files (1 GeoID file + 76 data files) • 53 x 77 = 4,081 files
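The file count above follows from the set list and the per-set composition (one Geographic Identifier file plus 76 data files). A quick check of the arithmetic:

```python
# Summary File 3 file sets, as listed on the slide: 50 states plus three others.
states = 50
other_sets = ["District of Columbia", "Puerto Rico", "All states combined"]
sets = states + len(other_sets)

# Each set holds one Geographic Identifier (GeoID) file plus 76 data files.
files_per_set = 1 + 76

total_files = sets * files_per_set
print(sets, files_per_set, total_files)  # 53 77 4081
```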
Linking Census and Medicare data • The census block group is used to link Census and Medicare data • The census block is a 4-character variable, and the block group is identified by the value in its first position • We obtained a database of 66 million ZIP+4 codes from Melissa Data so that we could get the census tract and block for each ZIP+4 code
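Because the block group is the first character of the 4-character block code, a block group link key can be built from the tract and block returned by the ZIP+4 lookup. The sketch below assumes zero-padded FIPS-style codes (state 2, county 3, tract 6, block 4 characters); the function name and the example codes are hypothetical, not taken from the Melissa Data product.

```python
def block_group_id(state: str, county: str, tract: str, block: str) -> str:
    """Build a 12-character block group key (hypothetical helper).

    Assumes zero-padded numeric codes: state (2), county (3), tract (6), block (4).
    """
    fields = [("state", state, 2), ("county", county, 3),
              ("tract", tract, 6), ("block", block, 4)]
    for name, value, width in fields:
        if len(value) != width or not value.isdigit():
            raise ValueError(f"{name} must be {width} digits, got {value!r}")
    # The block group is identified by the first character of the block code.
    return state + county + tract + block[0]


# Hypothetical codes for illustration only.
print(block_group_id("01", "073", "010100", "2003"))  # 010730101002
```

Validating widths up front catches codes that lost their leading zeros (a common side effect of storing FIPS codes as numbers), which would otherwise produce keys that silently fail to match.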
Variables of interest • Educational attainment: Table P37, File 04 • Median household income: Table P53, File 06 • Total population: Table P1, File 01 • Table numbers are mapped to data files using the File Segmentation Table
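Since several tables can live in the same data file, it helps to group the requested tables by file so each file is read only once. A minimal sketch, using only the three table-to-file mappings listed on the slide (the dictionary and function names are hypothetical):

```python
from collections import defaultdict

# Table -> SF 3 data file, from the File Segmentation Table entries on the slide.
TABLE_TO_FILE = {"P37": "04", "P53": "06", "P1": "01"}


def files_to_read(tables):
    """Group requested tables by the data file that contains them."""
    by_file = defaultdict(list)
    for table in tables:
        by_file[TABLE_TO_FILE[table]].append(table)
    # Sort by file number so files are processed in a predictable order.
    return dict(sorted(by_file.items()))


print(files_to_read(["P37", "P53", "P1"]))
# {'01': ['P1'], '04': ['P37'], '06': ['P53']}
```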
Summary File 3 components • Geographic Identifier file (GeoID) • 76 data files containing different sets of variables • The GeoID file is linked to each of the 76 data files by a variable named LogRecNo (logical record number) • The Summary Level in the GeoID file identifies the type of area each record tabulates (e.g., state, county, tract, or block group), so records must be selected by Summary Level to extract the desired stratification level
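The GeoID/data-file relationship can be sketched as a filter on Summary Level followed by a join on LogRecNo. This assumes the fixed-width files have already been parsed into dictionaries, that the GeoID fields are named SUMLEV, LOGRECNO, STATE, COUNTY, TRACT, and BLKGRP, that block groups use summary level "150", and that P1's total population cell is named P001001; check these against the SF 3 technical documentation. The sample records are invented.

```python
# Assumed SF 3 summary level code for State-County-Tract-Block Group records.
BLOCK_GROUP_SUMLEV = "150"


def block_group_lookup(geo_records):
    """Map LOGRECNO -> 12-character block group ID, keeping block-group rows only."""
    lookup = {}
    for rec in geo_records:
        if rec["SUMLEV"] == BLOCK_GROUP_SUMLEV:
            lookup[rec["LOGRECNO"]] = (
                rec["STATE"] + rec["COUNTY"] + rec["TRACT"] + rec["BLKGRP"]
            )
    return lookup


def attach_geography(data_records, lookup):
    """Keep data rows whose LOGRECNO is a block group and tag each with its ID."""
    return [
        {**row, "BLKGRP_ID": lookup[row["LOGRECNO"]]}
        for row in data_records
        if row["LOGRECNO"] in lookup
    ]


# Hypothetical, already-parsed records: one tract row and one block group row.
geo = [
    {"SUMLEV": "140", "LOGRECNO": "0000100", "STATE": "01",
     "COUNTY": "073", "TRACT": "010100", "BLKGRP": ""},
    {"SUMLEV": "150", "LOGRECNO": "0000101", "STATE": "01",
     "COUNTY": "073", "TRACT": "010100", "BLKGRP": "2"},
]
data = [{"LOGRECNO": "0000100", "P001001": 5000},
        {"LOGRECNO": "0000101", "P001001": 1200}]

rows = attach_geography(data, block_group_lookup(geo))
print(rows)  # [{'LOGRECNO': '0000101', 'P001001': 1200, 'BLKGRP_ID': '010730101002'}]
```

Filtering on the Summary Level before joining matters because the same data file also carries rows for larger areas (the tract row here), which would otherwise be mixed in with block group values.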
Subject Locator • Index designed to quickly identify tables in the summary file for particular subjects or topics of interest. • Arranged alphabetically by name of subject • Each row contains the type of entry and the relevant table number for the data source
Summary of steps for identifying variables and merging with cohort • Use Subject Locator to identify variables of interest and their corresponding table numbers • Use File Segmentation Table to identify specific data file(s) for each table number • Use the Summary Level Sequence Chart to locate the desired stratification level • Identify SAS input statements to read each file • Merge census variables with existing cohort data
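The final step, merging census variables with the cohort, can be sketched as a left join: each beneficiary's ZIP+4 is mapped to a block group through the crosswalk, then census values are attached by block group. The crosswalk layout, field names (zip4, block_group), variable names, and sample values are all hypothetical.

```python
def merge_cohort(cohort, zip4_to_bg, census_by_bg, variables):
    """Left-join census variables onto cohort members via their ZIP+4 code.

    Members whose ZIP+4 or block group has no match keep None for each
    variable, so no one is dropped from the cohort.
    """
    merged = []
    for person in cohort:
        bg = zip4_to_bg.get(person["zip4"])
        census = census_by_bg.get(bg, {})
        row = {**person, "block_group": bg}
        for var in variables:
            row[var] = census.get(var)
        merged.append(row)
    return merged


# Hypothetical inputs: a ZIP+4 crosswalk and block group level census values.
cohort = [{"id": 1, "zip4": "352011234"}, {"id": 2, "zip4": "999999999"}]
zip4_to_bg = {"352011234": "010730101002"}
census_by_bg = {"010730101002": {"P001001": 1200, "P053001": 41000}}

result = merge_cohort(cohort, zip4_to_bg, census_by_bg, ["P001001", "P053001"])
for row in result:
    print(row)
```

A left join keeps unmatched beneficiaries visible (with missing census values) rather than silently shrinking the cohort, which makes the match rate easy to report.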
Conclusion • A daunting task due to the large volume of Census data and documentation • Well organized into a manageable set of distinct components • Flexibility comes at the cost of thousands of variables and data files • Extracting variables from Census data becomes much easier once all the pieces are in place
Contact information Robert Matthews University of Alabama at Birmingham Department of Epidemiology 1665 University Blvd. RPHB 517 Birmingham, AL 35294-0022 Email: rsm@uab.edu Web: www.epi.soph.uab.edu/rsm/