Lab 1: Background on the IPUMS and SPSS
IPUMS (www.ipums.org): The Integrated Public Use Microdata Series database
Lab 1: Introduction to the datasets
• What is the IPUMS? Who uses IPUMS?
• What research is IPUMS best for?
• Other IPUMS-like datasets
• Getting and using the data
WHAT ARE MICRODATA?
Individual-level data
• every record represents a separate person
• all of that person's individual characteristics are recorded
• users must manipulate (tabulate) the data themselves
Different from aggregate/summary/tabular data, such as
• a disability table from www.factfinder.census.gov
• an occupation table from a published census volume in the library
IPUMS Data Structure
• A household record (shaded in the original figure) is followed by a person record for each member of the household
• For each type of record, specific columns correspond to different variables (the figure labels columns such as age, sex, relationship, race, birthplace, mother's birthplace, and occupation)
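To make the hierarchical layout concrete, here is a minimal Python sketch of reading such a file: each household record is followed by its person records, and every variable occupies a fixed range of columns. It assumes, for illustration only, that column 1 holds a record-type letter ("H" or "P") and that the column positions in `HOUSEHOLD_COLS` and `PERSON_COLS` are as shown; these layouts and the function names are hypothetical, and a real extract's command file defines the actual columns.

```python
from dataclasses import dataclass, field

# Hypothetical column layouts as (start, end) slices, 0-based and end-exclusive.
# These are NOT real IPUMS positions; a real extract's command file gives them.
HOUSEHOLD_COLS = {"serial": (1, 8), "state": (8, 10)}
PERSON_COLS = {"serial": (1, 8), "pernum": (8, 10), "age": (10, 13), "sex": (13, 14)}


@dataclass
class Household:
    fields: dict                                 # variables from the household record
    persons: list = field(default_factory=list)  # one dict per person record


def slice_fields(line: str, layout: dict) -> dict:
    """Cut one fixed-width line into named fields."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


def read_hierarchical(lines) -> list:
    """Attach each person record ('P' in column 1) to the household record ('H') above it."""
    households = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        rectype = line[0]
        if rectype == "H":
            households.append(Household(slice_fields(line, HOUSEHOLD_COLS)))
        elif rectype == "P":
            if not households:
                raise ValueError("person record appears before any household record")
            households[-1].persons.append(slice_fields(line, PERSON_COLS))
        else:
            raise ValueError(f"unknown record type {rectype!r}")
    return households


sample = [
    "H000000127",      # household 0000001, state code 27
    "P0000001010341",  # person 01, age 034, sex 1
    "P0000001020312",  # person 02, age 031, sex 2
    "H000000206",      # household 0000002, state code 06
    "P0000002010702",  # person 01, age 070, sex 2
]
for hh in read_hierarchical(sample):
    print(hh.fields["serial"], [p["age"] for p in hh.persons])
```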
The Advantages of Microdata
• Combination of all of a person's characteristics
• Characteristics of everyone with whom a person lived
• Freedom to make any table you need
• Freedom to build models of multivariate relationships
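The "any table you need" point can be illustrated with a small Python sketch that cross-tabulates any two variables directly from person records. The records, the variable names, and the weight field `perwt` are hypothetical assumptions; the idea is only that the analyst chooses the tabulation instead of accepting one fixed by a published volume.

```python
from collections import Counter

# Hypothetical person records. "perwt" stands in for a person weight, i.e. how many
# people each record represents (100 in a 1-in-100 sample); names and values are illustrative.
persons = [
    {"sex": "male", "occ": "farmer", "perwt": 100},
    {"sex": "male", "occ": "laborer", "perwt": 100},
    {"sex": "female", "occ": "servant", "perwt": 100},
    {"sex": "female", "occ": "farmer", "perwt": 100},
    {"sex": "male", "occ": "farmer", "perwt": 100},
]


def crosstab(records, row, col, weight=None):
    """Count records by (row value, column value).

    With weight=None each record counts once (sample counts); otherwise each record
    contributes the value of its weight field (population estimates).
    """
    table = Counter()
    for r in records:
        table[(r[row], r[col])] += r[weight] if weight else 1
    return table


# Any pair of variables can be crossed; a published table fixes this choice for you.
print(crosstab(persons, "sex", "occ"))
print(crosstab(persons, "sex", "occ", weight="perwt")[("male", "farmer")])
```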
INTEGRATION What the IPUMS actually does to the original census samples
IPUMS Translation Table for RACE (figure): for each original sample, the table shows the column location of the race variable, the original codes for "Black", and the IPUMS-assigned codes
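A translation table of this kind can be pictured as a lookup: for each original sample, where the race field sits and how each original code maps to one IPUMS code. The Python sketch below assumes made-up column positions and codes in a hypothetical `RACE_TRANSLATION` table; it shows the many-to-one mapping idea, not the actual IPUMS codes.

```python
# Hypothetical translation table: sample year -> column slice of the race field in
# that original file, plus a map from that sample's codes to one harmonized code.
# All positions and codes here are invented for illustration.
RACE_TRANSLATION = {
    1880: {"cols": (20, 21), "codes": {"1": 1, "2": 2, "3": 2}},  # two original codes -> one
    1960: {"cols": (34, 35), "codes": {"1": 1, "2": 2}},
    1990: {"cols": (12, 15), "codes": {"001": 1, "002": 2}},
}


def harmonize_race(sample_year: int, raw_line: str) -> int:
    """Read the race code from an original record and return the harmonized code."""
    entry = RACE_TRANSLATION[sample_year]
    start, end = entry["cols"]
    original = raw_line[start:end]
    try:
        return entry["codes"][original]
    except KeyError:
        raise ValueError(f"no translation for code {original!r} in the {sample_year} sample") from None


# The same harmonized value comes out of differently laid-out source files.
print(harmonize_race(1880, " " * 20 + "3"), harmonize_race(1990, " " * 12 + "002"))
```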
Additional ways in which IPUMS improves the original samples
• Additional documentation, including all enumeration forms and instructions
• Consistent occupation/industry classifications
• Consistent metropolitan classifications
• Constructed family variables
• Locator variables for spouse and parents
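The locator variables make it possible to attach a spouse's or parent's characteristics to each person's record. This Python sketch assumes IPUMS-style variables named `PERNUM`, `SPLOC`, and `MOMLOC`, with 0 meaning "not in the household"; the household data and the helper `attach_linked` are hypothetical, so check the IPUMS documentation for the exact variable definitions.

```python
# Hypothetical members of one household, modeled on IPUMS-style locator variables:
# PERNUM numbers people within the household; SPLOC and MOMLOC hold the PERNUM of the
# person's spouse and mother (0 = not present in the household). Values are made up.
household = [
    {"PERNUM": 1, "SPLOC": 2, "MOMLOC": 0, "AGE": 45, "OCC": "miner"},
    {"PERNUM": 2, "SPLOC": 1, "MOMLOC": 0, "AGE": 41, "OCC": "keeping house"},
    {"PERNUM": 3, "SPLOC": 0, "MOMLOC": 2, "AGE": 12, "OCC": "at school"},
]


def attach_linked(members, locator, prefix, keys=("AGE", "OCC")):
    """Copy selected variables of the person pointed to by `locator` onto each member.

    Returns new dicts (the input is not modified); linked fields are None when there
    is no linked person (locator is 0 or points to nobody in the household).
    """
    by_pernum = {m["PERNUM"]: m for m in members}
    result = []
    for m in members:
        linked = by_pernum.get(m[locator]) if m[locator] != 0 else None
        row = dict(m)
        for key in keys:
            row[prefix + key] = linked[key] if linked is not None else None
        result.append(row)
    return result


for row in attach_linked(household, "SPLOC", "SP_"):
    print(row["PERNUM"], row["SP_AGE"], row["SP_OCC"])
```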
Who uses the data? Profile of IPUMS users
• Approximately 9,000 registered users
• About 90% are affiliated with universities
• Among those: 40% are economists, 25% are sociologists
• Most other academics are from the social sciences
• Other main users include journalists and policy-makers
How do people get IPUMS data?
• 15% download complete datasets
--1850-1970 datasets: less than 1 GB each
--1980-2000 datasets: about 5 GB each
--We provide raw data and command files
• 85% make "extracts" using the online interface
--Choose the variables you want
--We provide customized data and command files
• ?? go to data redistributors
--Querylogic (www.querylogic.com)
--PDQ (www.pdq.com)
4 Key Strengths of the Census Microdata Samples
• Large
--Have more cases than any comparable datasets
--Enable study of relatively small populations
• National in scope
--Results aren't subject to local peculiarities
--Moreover, they provide context for local studies
• Long-term
--Provide historical depth
• Microdata
--Can make your own tabulations
--Apply multivariate techniques
Limitations of the Census Microdata Samples
• Geographic detail
--Confidentiality restrictions, 1940-2000
• 1-in-100 samples (1-in-20 for 1970-2000)
--Too small to answer some questions
• Decennial
--Any historical analysis must use 10-year gaps
• Cross-sectional data
--Not longitudinal
• Need knowledge of a statistical package
What type of question is IPUMS best suited for?
• Studies that do not need to identify geographic areas of fewer than 100,000 people after 1940 (e.g., you cannot identify Clemson, SC, but you can identify a group of several counties that includes Clemson).
• Subjects likely to involve at least 10,000 people, preferably more. In a 1-in-100 sample, 10,000 individuals yield about 100 cases; anything less is probably too small a sample for useful analysis.
• Any analysis of a census-related question that cannot be answered from the published census volumes or summary files.
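The sample-size rule of thumb is simple arithmetic: a 1-in-100 sample holds about one record per 100 people, and precision depends on the number of records. The Python sketch below computes expected cases and an approximate relative standard error for an estimated share, assuming simple random sampling (an approximation; household-clustered census samples usually have somewhat larger errors). The function names and the 20% share are illustrative choices, not part of the IPUMS documentation.

```python
import math


def expected_cases(population: int, one_in: int) -> float:
    """Expected sample records for a group of `population` people in a 1-in-`one_in` sample."""
    return population / one_in


def relative_se(n_cases: float, p: float) -> float:
    """Approximate relative standard error of an estimated share p from n_cases records.

    Uses the simple-random-sampling formula sqrt(p(1 - p) / n) / p; census samples drawn
    by household are clustered, so real standard errors are usually somewhat larger.
    """
    return math.sqrt(p * (1 - p) / n_cases) / p


for pop in (10_000, 500_000):
    n = expected_cases(pop, 100)
    # e.g. the share of the group working in one occupation, assumed here to be 20%
    print(f"{pop:>9,} people -> {n:,.0f} cases, relative SE of a 20% share: {relative_se(n, 0.2):.1%}")
```

Under these assumptions, 10,000 people give about 100 cases and a relative standard error of about 20% for a 20% share, while 500,000 people give about 5,000 cases and a relative standard error under 3%.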
An example: Southern migrants in the North, 1870-1970
Published census volumes can tell you
--how many southern-born persons of each race lived in each state in 1900, 1920, 1930, and 1960
--the occupations of all African-Americans in the North
But you're also interested in
--the jobs held by actual migrants
--how their jobs compared to those of people who stayed home
--how their jobs compared to those of northern-born blacks
--how their settlement changed from 1870 onward
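The comparisons listed above come down to classifying each person by birthplace and current residence, then tabulating occupations within each group. This Python sketch assumes hypothetical field names (`birth_state`, `res_state`, `race`, `occ`) and deliberately abbreviated, illustrative lists of southern and northern states; it is not the author's definition of the regions.

```python
from collections import Counter, defaultdict

# Hypothetical, abbreviated region lists (postal codes) for illustration only; a real
# study would need its own full definitions of "South" and "North".
SOUTH = {"AL", "GA", "MS", "SC", "VA"}
NORTH = {"IL", "MI", "NY", "OH", "PA"}


def classify(person):
    """Place a person in a comparison group from birthplace and current residence."""
    born, lives = person["birth_state"], person["res_state"]
    if born in SOUTH and lives in NORTH:
        return "southern migrant"
    if born in SOUTH and lives in SOUTH:
        return "stayed home"
    if born in NORTH and lives in NORTH:
        return "northern-born"
    return "other"


def occupations_by_group(persons, race):
    """Tally occupations within each comparison group, for one race only."""
    groups = defaultdict(Counter)
    for p in persons:
        if p["race"] == race:
            groups[classify(p)][p["occ"]] += 1
    return groups


# Made-up records with hypothetical field names.
people = [
    {"birth_state": "GA", "res_state": "IL", "race": "black", "occ": "laborer"},
    {"birth_state": "MS", "res_state": "MI", "race": "black", "occ": "operative"},
    {"birth_state": "GA", "res_state": "GA", "race": "black", "occ": "farmer"},
    {"birth_state": "PA", "res_state": "PA", "race": "black", "occ": "porter"},
    {"birth_state": "VA", "res_state": "NY", "race": "white", "occ": "clerk"},
]
for group, occs in occupations_by_group(people, "black").items():
    print(group, dict(occs))
```

Tracking how settlement changed from 1870 onward would mean repeating the same classification on each census year's sample separately, since each sample is its own cross-section.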
An example: Why this analysis works
The numbers are very large
--over 500,000 southerners are in the North in every decade from 1870 on
I don't need to know particular towns
--state of residence is available in every census
--a sub-state designation known as State Economic Area (SEA) is even available for every census
Data not available anywhere else
--and so it is worth the trouble
An example: What you can't do with the IPUMS
How did the southerners do in Pittsburgh?
--IPUMS has data on only 90 employed southern black men in Pittsburgh in 1970, and fewer in earlier years.
Were the migrants segregated in the North?
--you don't know their street, tract, or ward
--all you know is their city, and only if it was a fairly large one (>100K for 1940-50 and 1980-90; >250K for 1960-70; >100K in 2000)
Did migrants' jobs improve over time?
--the census samples are cross-sectional databases, not longitudinal ones, so the same people cannot be followed from one census to the next
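The city-size limits listed above can be written as a simple lookup, which makes it easy to check whether a city of interest can be identified in a given year. This Python sketch encodes only the thresholds stated in the text; the function name is hypothetical, and the actual identification rules may differ in detail, so consult each sample's documentation.

```python
# Minimum city size for a city to be identified in the samples, as listed in the
# text (">100K" etc.). A rough rule only: years outside 1940-2000 are not covered,
# and each sample's documentation gives the actual rules.
CITY_THRESHOLDS = {
    1940: 100_000, 1950: 100_000,
    1960: 250_000, 1970: 250_000,
    1980: 100_000, 1990: 100_000,
    2000: 100_000,
}


def city_identifiable(census_year: int, city_population: int) -> bool:
    """True if a city of this size exceeds the stated threshold for that census year."""
    if census_year not in CITY_THRESHOLDS:
        raise ValueError(f"no threshold given for {census_year}")
    return city_population > CITY_THRESHOLDS[census_year]


# A city of 180,000 clears the 1950 threshold but not the 1960 one.
print(city_identifiable(1950, 180_000), city_identifiable(1960, 180_000))
```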
Ongoing data projects at the MPC: New high-density Public Use files
• 1880: 100% data for selected variables; 20% sample for minorities (all variables); 10% sample for entire population (all variables)
• 1900: 10% sample
• 1930: 5% sample
• 1960: 5% sample
Ongoing data projects at the MPC: [Figure: New high-density Public Use files, number of person records in each file (0 to 20,000,000) by census year, 1850-2000, distinguishing existing samples from samples planned and in progress]
Ongoing data projects at the MPC: New harmonized intercensal series
• American Community Survey
--Available for 2001-2002 on the main IPUMS site
--2003 data will be available in the fall of 2004
• March Current Population Survey
--Spans 1962-2003
--Available at http://beta.ipums.org/cps
--Includes special questions on labor markets
Ongoing data projects at the MPC
• IPUMS International
--Currently contains 22 samples from 6 countries
--About 80 variables currently available
• IPUMS Latin America
--15-country project
--Got underway this year
• IPUMS Europe
--18-country project
--Got underway this year