Making Sense of Census Data

Robert Matthews, University of Alabama at Birmingham

Introduction
• Our cohort is a 5% sample of the entire U.S. Medicare population from 1999 to 2006
• ZIP+4 (9-digit) information is available for 99.9% of beneficiaries and providers
• The task was to link our cohort with the census data to obtain demographic variables: educational attainment, median household income, and total population

Hierarchical Relationships of Census Geographic Structures (figure)
Source: U.S. Census Bureau, Summary File 3 documentation

Summary Files: “Short Form”
• Summary File 1 (SF 1) – data from the Short Form questions
• Summary File 2 (SF 2) – data from the Short Form questions, repeated for 249 population groups
• Redistricting Data – used for congressional and state redistricting

Summary Files: “Long Form”
• Asked of only a sample of the U.S. population (1 in 6 households)
• Summary File 3 (SF 3) – comprehensive results from the Long Form
• Summary File 4 (SF 4) – comprehensive results from the Long Form, repeated for 335 population groups

Summary File 3 components
• 53 sets of files
  • 50 U.S. states
  • District of Columbia (D.C.)
  • Puerto Rico
  • All states combined
• Each set has 77 files (1 Geographic Identifier file + 76 data files)
• 53 × 77 = 4,081 files

Linking Census and Medicare data
• The Census block group is used to link Census and Medicare data
• The Census block is a 4-character variable, and the block group is identified by the value in its first position
• We obtained a database of 66 million ZIP+4 codes from Melissa Data so that we could get the census tract and block for each ZIP+4 code
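The block-to-block-group rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function names, the argument layout, and the sample tract and block values are all assumptions. The only facts relied on are that the block code has 4 characters, that its first character is the block group, and that a block group is conventionally keyed by state FIPS (2 digits), county FIPS (3), tract (6), and block group (1).

```python
def block_group_of(block: str) -> str:
    """Return the block group: the first character of a 4-character block code."""
    if len(block) != 4 or not block[0].isdigit():
        raise ValueError(f"expected a 4-character block code, got {block!r}")
    return block[0]


def block_group_key(state_fips: str, county_fips: str, tract: str, block: str) -> str:
    """Build a 12-character key: state (2) + county (3) + tract (6) + block group (1).

    The argument names are hypothetical; a ZIP+4 crosswalk may name or split
    these fields differently.
    """
    return f"{state_fips:0>2}{county_fips:0>3}{tract:0>6}{block_group_of(block)}"


# Made-up example values: Alabama (01), Jefferson County (073),
# an illustrative tract and block.
key = block_group_key("01", "073", "004500", "2013")
print(key)
```

Keeping the key as a zero-padded string avoids losing leading zeros, which would break the match against census identifiers.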

Variables of interest
• Variable description
  • Educational attainment (Table P37)
  • Median household income (Table P53)
  • Total population (Table P1)
• Tables mapped to File Segmentation Table
  • P37 → File 04
  • P53 → File 06
  • P1 → File 01
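The table-to-file mapping on this slide is effectively a lookup table. A small Python sketch of that lookup follows. The mapping values come from the slide; the helper names are hypothetical, and the file naming pattern (state abbreviation, zero-padded segment number, `.uf3` extension) is an assumption that should be checked against the Summary File 3 technical documentation before use.

```python
# Table number -> data file segment, as listed on the slide.
TABLE_TO_SEGMENT = {"P37": 4, "P53": 6, "P1": 1}


def segments_needed(tables):
    """Return the distinct file segments that hold the requested tables."""
    return sorted({TABLE_TO_SEGMENT[table] for table in tables})


def segment_filename(state_abbr, segment):
    """Build a data file name for one state and segment.

    ASSUMPTION: the '<state><5-digit segment>.uf3' pattern is illustrative only.
    """
    return f"{state_abbr.lower()}{segment:05d}.uf3"


for seg in segments_needed(["P37", "P53", "P1"]):
    print(segment_filename("AL", seg))
```

Resolving tables to segments first means each of the large data files is opened only once, even when several variables come from the same file.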

Summary File 3 components
• Geographic Identifier file (GeoID)
• 76 data files containing different sets of variables
• The GeoID file is linked to each of the 76 data files by a variable named LogRecNo
• The Summary Level must be selected from the GeoID file to extract the desired stratification level; it identifies the specific area being tabulated
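The link between the GeoID file and a data file can be sketched as a keyed join followed by a summary-level filter. The Python below is an illustration under stated assumptions: the records are made-up in-memory rows rather than parsed files, the field name `P053001` is illustrative, and the income values are invented. Summary level 150 is the state-county-tract-block group level; 040 and 140 are state and tract.

```python
BLOCK_GROUP_SUMLEV = "150"  # state-county-census tract-block group

# Made-up rows standing in for a parsed GeoID file.
geo_rows = [
    {"LOGRECNO": "0000001", "SUMLEV": "040", "NAME": "State"},
    {"LOGRECNO": "0000017", "SUMLEV": "140", "NAME": "Census Tract"},
    {"LOGRECNO": "0000018", "SUMLEV": "150", "NAME": "Block Group"},
]

# Made-up rows standing in for one parsed data file (values are invented).
data_rows = [
    {"LOGRECNO": "0000001", "P053001": 30000},
    {"LOGRECNO": "0000017", "P053001": 41000},
    {"LOGRECNO": "0000018", "P053001": 52000},
]


def link_by_logrecno(geo_rows, data_rows, sumlev):
    """Keep geo rows at one summary level and attach the matching data row."""
    data_by_rec = {row["LOGRECNO"]: row for row in data_rows}
    linked = []
    for geo in geo_rows:
        if geo["SUMLEV"] != sumlev:
            continue  # wrong stratification level
        data = data_by_rec.get(geo["LOGRECNO"])
        if data is not None:
            linked.append({**geo, **data})
    return linked


linked = link_by_logrecno(geo_rows, data_rows, BLOCK_GROUP_SUMLEV)
print(linked)
```

Filtering on the summary level before joining matters because the same data file carries rows for every geographic level, and only one level should feed the cohort merge.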

Summary Level Sequence Chart (partial listing)

Subject Locator
• Index designed to quickly identify tables in the summary file for particular subjects or topics of interest
• Arranged alphabetically by subject name
• Each row contains the type of entry and the relevant table number for the data source

Subject Locator Index (partial listing)

Summary of steps for identifying variables and merging with cohort
• Use the Subject Locator to identify variables of interest and their corresponding table numbers
• Use the File Segmentation Table to identify the specific data file(s) for each table number
• Use the Summary Level Sequence Chart to locate the desired stratification level
• Identify SAS input statements to read each file
• Merge census variables with existing cohort data
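The final merge step can be sketched as two lookups per cohort member: ZIP+4 to block group, then block group to census variables. The slides describe reading the files with SAS; the Python below is only an illustration of the join logic, and every name in it (`zip9`, `bene_id`, `median_hh_income`, the dictionaries) and every value is hypothetical.

```python
def merge_cohort_with_census(cohort, zip4_to_block_group, census_by_block_group):
    """Left-join cohort rows to census variables through a ZIP+4 crosswalk.

    Cohort members whose ZIP+4 has no match are kept, with block_group set to
    None and no census fields added.
    """
    merged = []
    for person in cohort:
        block_group = zip4_to_block_group.get(person["zip9"])
        census = census_by_block_group.get(block_group, {})
        merged.append({**person, "block_group": block_group, **census})
    return merged


# Made-up example data.
cohort = [
    {"bene_id": "A1", "zip9": "352940022"},
    {"bene_id": "B2", "zip9": "000000000"},  # no crosswalk match
]
zip4_to_block_group = {"352940022": "010730045002"}
census_by_block_group = {
    "010730045002": {"median_hh_income": 52000, "total_population": 1200},
}

merged = merge_cohort_with_census(cohort, zip4_to_block_group, census_by_block_group)
for row in merged:
    print(row)
```

A left join is used here so that unmatched beneficiaries stay in the output and can be counted, rather than silently dropping out of the cohort.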

Conclusion
• Working with Census data is a daunting task because of the large volume of data and documentation
• The data are well organized into a manageable set of distinct components
• Flexibility comes at the cost of thousands of variables and data files
• Extracting variables from Census data becomes much easier once all the pieces are in place

Contact information
Robert Matthews
University of Alabama at Birmingham
Department of Epidemiology
1665 University Blvd., RPHB 517
Birmingham, AL 35294-0022
Email: rsm@uab.edu
Web: www.epi.soph.uab.edu/rsm/
