Learn about the types of new data available in Federal Statistical Research Data Centers and how to access them.
New Data in the Federal Statistical Research Data Centers • Melissa Ruby Banzhaf, PhD • Administrator, ARDC • Center for Economic Studies, U.S. Census Bureau • October 9, 2015
Overview • Background on Federal Statistical RDCs • Types of Data Available in the RDC (Emphasis on New Data) • How to Obtain Access to These New Data (and other data) in the RDCs
What are Federal Statistical Research Data Centers (RDCs)? • Secure computing labs where qualified researchers conduct approved statistical analysis on non-public data. • These data are collected by various government agencies (Census Bureau, NCHS, AHRQ, SSA, and more to come). • Established through an agreement between federal statistical agencies and a local research community. • Managed by the Census Bureau.
The Atlanta Research Data Center • Located in the Federal Reserve Bank of Atlanta • corner of 10th & Peachtree • Consortium Members • Emory University • University of Georgia • Georgia State University • Clemson University • Federal Reserve Bank of Atlanta • University of Alabama at Birmingham • University of Tennessee – Knoxville • Florida State University • Georgia Institute of Technology
Types of Restricted Data Available • Economic Data • Microdata on firms and establishments • Business Register data • Demographic Data • Survey data on individuals and households • Administrative data on individuals • Linked survey and administrative datasets • Employer-Employee Jobs Data (LEHD) • Data on employees linked with data on employers • Health Data • National Center for Health Statistics • Agency for Healthcare Research & Quality
Advantages of Restricted Data • Vast number of business datasets that are not publicly available at the micro level • Census datasets can be linked together • Census datasets can be linked to external data • More detailed level of geographic identifiers • Very little top- or bottom-coding
New Data – Management and Organizational Practices Survey • Supplement to the 2010 Annual Survey of Manufactures • Goal: Collect information on establishments’ use of structured management practices • 36 questions: • 16 Management (monitoring, targets, and incentives) • 13 Organization (who makes decisions, data in decision-making) • 7 Background (number of managers/non-managers, union status) • Permits analysis of the relationship between management practices and key economic outcomes (e.g., productivity)
Demographic Datasets - Survey • Decennial Surveys (1950-2010) • American Community Survey • Current Population Survey • Survey of Income and Program Participation • American Housing Survey • National Survey of College Graduates • National Crime Victimization Survey
New Data - Decennial • 1950 – 1% PUMS sample • Geography: Census tract but lowest level is enumeration district (roughly 600 people) • 1960 – 25% sample (densest ever) • Geography: Census tract and other sub-county geographies (Census place) but lowest level is enumeration district (roughly 600 people) • Harmonized coding across 1950 and 1960
New Data – Current Population Survey • CPS Basic Monthly Data (2000-2014) • CPS Food Security Supplement (2001-2012) • CPS Voting and Registration Supplement (2006, 2008, 2010, 2012) • CPS Fertility Supplement (1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012)
New Data – Current Population Survey • Characteristics of Internal Files: • Geography: Census tract • March CPS is the only file that has PIKs (Protected Identification Keys) • Has a CPS identification key, so it may be possible to link across CPS surveys. • Some limitations on types of analysis permitted by BLS.
New Data – National Crime Victimization Survey • National survey of households (2006-2012) • Collects information on frequency, characteristics, and consequences of criminal victimization (sexual assault, robbery, burglary, motor vehicle theft, etc.) • New: Police-Public Contact Survey (2011) – Collects information on perceptions of police behavior and response during encounters.
New Data – National Survey of College Graduates • Biennial survey collects information (such as occupation, work activities, salary, relationship between degree field and occupation) on college-educated individuals with particular emphasis on those in science and engineering fields. • 2010 currently available • Geography at state level • Currently no PIKs
Demographic Datasets - Administrative • Census Numident File (SSA) • Housing Datasets (HUD): • Public and Indian Housing Information Center Dataset • Tenant Rental Assistance Certification System (TRACS) dataset • Computerized Homes Underwriting Management System (CHUMS)
Demographic - Administrative Continued • Medicare/Medicaid Datasets (CMS): • Medicare Enrollment Database • Medicaid Statistical Information System
Administrative – Census Numident • Data derived from applications for Social Security Numbers • Contains data on: • Birthdate • Town or county of birth • Gender • Race • Citizenship • Date of death • PIKs
Administrative - Housing • Public and Indian Housing Information Center Dataset • Contains information on all members of a household with a participant in a covered program: • Housing Choice Voucher • Public Housing • Indian Housing • Includes age, race, sex, rent, household income, PIK • Geography: block level
Administrative - Housing • Tenant Rental Assistance Certification System (TRACS) dataset • Contains information on all members of a household with a participant in a covered program. • These programs provide rental assistance for participants living in privately owned, subsidized housing. • Includes age, race, sex, rent, household income, PIK • Geography: block level
Administrative - Housing • Computerized Homes Underwriting Management System (CHUMS) • Contains records on approved mortgage applications insured by Federal Housing Administration (FHA) • Contains information on borrowers and co-borrowers including income, housing value, mortgage, demographic characteristics, PIKs • Geography: block level
Administrative - CMS • Medicare Enrollment Database (1999-2014) • Information on all Medicare beneficiaries • Limited to information on people, not claims: eligibility dates and statuses, residence change dates, basic demographic information, PIKs • Geography: block level
Administrative - CMS • Medicaid Statistical Information System (2000-2013) • Information on all Medicaid and CHIP enrollees in each month • Limited to information on people, not claims: eligibility dates and statuses, basic demographic information, PIKs • Geography: zip code level
Demographic Datasets: Linked Survey-Administrative • Current Population Survey - SSA Earnings Files • Survey of Income and Program Participation – SSA Earnings Files • National Longitudinal Mortality Study
Linked: SSA Files with CPS and SIPP • CPS and SIPP survey data matched to SSA earnings files by PIK • SSA records include: • Detailed Earnings Record – earnings from FICA, non-FICA, and self-employment income (1978+) from Master File • Summary Earnings Record – all earnings for each year from 1951 to present • Master Beneficiary Record – contains information (entitlement and payment data) on Social Security recipients (including Disability). • 831 Disability File – records medical eligibility determinations for Disability Insurance and SSI benefits.
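As a rough illustration of what "matched by PIK" means in practice, the sketch below joins a few synthetic survey records to synthetic earnings records on a shared key. Everything here is an assumption for illustration: the field names (`pik`, `age`, `state`, `year`, `earnings`), the values, and the function `link_by_pik` are hypothetical and do not reflect the actual CPS, SIPP, or SSA file layouts.

```python
# Synthetic, made-up records. Field names are hypothetical and are NOT
# the real CPS/SIPP or SSA file layouts.
survey = [
    {"pik": "A001", "age": 34, "state": "GA"},
    {"pik": "A002", "age": 51, "state": "AL"},
    {"pik": None, "age": 27, "state": "TN"},  # respondent with no PIK assigned
]
earnings = [
    {"pik": "A001", "year": 2010, "earnings": 42000},
    {"pik": "A001", "year": 2011, "earnings": 45000},
    {"pik": "A003", "year": 2010, "earnings": 30000},  # not in the survey
]


def link_by_pik(survey_rows, earnings_rows):
    """Return one linked row per (survey person, earnings year) match on PIK."""
    # Index the earnings records by PIK so each lookup is a dictionary access.
    by_pik = {}
    for row in earnings_rows:
        by_pik.setdefault(row["pik"], []).append(row)

    linked = []
    for person in survey_rows:
        pik = person["pik"]
        if pik is None:
            continue  # records without a PIK cannot be linked
        for rec in by_pik.get(pik, []):
            linked.append({**person, "year": rec["year"], "earnings": rec["earnings"]})
    return linked


linked = link_by_pik(survey, earnings)
for row in linked:
    print(row)
```

In this toy example only the respondent with PIK `A001` appears in both sources, so the linked file has two person-year rows. Respondents without a PIK, or without a matching administrative record, drop out of the linked sample, which is worth keeping in mind when interpreting results from linked data.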
Linked: National Longitudinal Mortality Study • Purpose of database: to study the effects of demographic and socio-economic characteristics on mortality • Survey data: March CPS, 1980 Decennial Census (sample) • Administrative data: Death Certificate information from National Death Index (through 2011) • Geography: county level
LEHD • “Tracks” a person based on their place of employment; essentially links employees with employers • Based on unemployment insurance administrative records • Available on a state-by-state basis • Quarterly data starting in 1990 – currently through 2011 • Can link employer to employer data in other Census datasets • Can link employee to data on individuals in other Census datasets • New Variables: Firm age and size, Firm ID that matches Business Register
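The employer-employee link can be pictured as a table of job records, each tying a person to an employer in a given quarter, joined to a table of employer characteristics. The following sketch uses synthetic records; the field names (`pik`, `firm_id`, `quarter`, `firm_age`) and the function `employment_by_firm_quarter` are hypothetical and are not the actual LEHD variable names or structure.

```python
from collections import defaultdict

# Synthetic job records: one row per person-employer-quarter.
# Field names are hypothetical, not actual LEHD variables.
jobs = [
    {"pik": "P1", "firm_id": "F10", "quarter": "2010Q1", "earnings": 9000},
    {"pik": "P2", "firm_id": "F10", "quarter": "2010Q1", "earnings": 12000},
    {"pik": "P1", "firm_id": "F20", "quarter": "2010Q2", "earnings": 9500},
]

# Synthetic employer characteristics keyed by the same firm identifier.
firms = {"F10": {"firm_age": 12}, "F20": {"firm_age": 3}}


def employment_by_firm_quarter(job_rows, firm_table):
    """Count distinct workers per firm and quarter and attach firm age."""
    workers = defaultdict(set)
    for job in job_rows:
        workers[(job["firm_id"], job["quarter"])].add(job["pik"])
    return {
        key: {"employees": len(piks), "firm_age": firm_table[key[0]]["firm_age"]}
        for key, piks in workers.items()
    }


summary = employment_by_firm_quarter(jobs, firms)
for key, stats in sorted(summary.items()):
    print(key, stats)
```

Because the same person identifier appears under different employers over time (here `P1` moves from `F10` to `F20`), the same structure supports following workers across jobs as well as summarizing each employer's workforce.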
New Data – Innovation Measurement Initiative • Goal: Improve measurement of innovation resulting from research grants, a small but important sector of the economy. • How: Integrate university data on federally funded research grants with Census Bureau data on people and businesses. • Specifically link: • Employee, vendor, sub-award transactions to the Census Business Register and LEHD (employee-employer database). • Innovation outcomes: Job placements, start-up activity and business dynamics, vendor characteristics
New Data – Innovation Measurement Initiative • Partnership between Census and Institute on Research in Innovation and Science (IRIS) at the University of Michigan • Member institutions of IRIS provide data to Census and in turn receive: • Individual and collective reports • Underlying tables and graphics for institution’s use • Access to aggregate data for researchers • Input on new product design
New Data – IMI Opportunity • Census is asking for nominations of teams of 2-5 researchers (at least one member with Special Sworn Status (SSS)) to assist in enhancing and documenting data for the IMI project. • What is in it for you? • Opportunity to do research on new data. • $25K in funding support for 1 graduate student. • Initial deadline for nominations: October 16
Health Data in the ARDC • These data are collected by: • National Center for Health Statistics (NCHS) • Agency for Healthcare Research and Quality (AHRQ)
What types of NCHS data? Linked Data Sets • Linked mortality data: NHIS, NHANES, LSOA II, NNHS • Linked Medicare Enrollment and Claims data: NHIS, NHANES, LSOA II • Linked Social Security Administration Data: NHIS, NHANES, LSOA II, NNHS • Linked EPA data
What types of AHRQ Data? • Medical Expenditure Panel Survey (MEPS) files include: • Household Component • Provider Component • Insurance/Employer Component • Nursing Home Component (1996 only) • Area Resource File • Two-year, two-panel file • MEPS-NHIS linked data • Only the Household Component and portions of the Provider Component are publicly available
How to Access the RDC • Develop proposal • Different guidelines for Census data vs. NCHS/AHRQ data • Submit proposal for agency review • Census (and agency sponsors) • NCHS/AHRQ • Obtain Special Sworn Status (SSS) • Pay one-time fee for NCHS/AHRQ data
Timeframe – “Patience is a Virtue” • Census Data • Plan on 6 to 9 months before working in lab • Census approval/ Other Agency Approval • NCHS/AHRQ Data • Timeframe dependent on agency approval process • Census approval NOT required • Special Sworn Status • 3 to 4 months for your security clearance
Working in the ARDC lab • All analysis conducted in the ARDC lab • Data located on server in Maryland • Access data via thin client terminals • No internet access or personal computers allowed in lab • Statistical software available: SAS, Stata, R, Matlab, GIS, Sudaan, etc. • Agency reviews output before releasing • Penalty for disclosure (inadvertent or otherwise) is a fine of up to $250,000 and/or up to 5 years in prison
Upcoming RDC-Related Events • Cornell University Course – INFO 7470 – Understanding Social and Economic Data • Can be connected via distance learning (and get course credit) • Intended for Ph.D. students and faculty who use large-scale restricted-access data from government suppliers • Emphasis on data accessible through the RDC network • Interested? Contact us for more information.
Contact Information • People: • Melissa Ruby Banzhaf, ARDC Administrator: melissa.r.banzhaf@census.gov, 404-498-7538 • Julie L. Hotchkiss, ARDC Executive Director: Julie.l.hotchkiss@atl.frb.org, 404-498-8198 • Resources: • ARDC website: atlantardc.org • Quarterly ARDC Newsletter (email us to get on list)