IPUMS: How we make it, how you can get it, and how you can use it Trent Alexander Minnesota Population Center University of Minnesota
Introduction to the IPUMS Project
1. What is the IPUMS?
2. Data entry and coding
3. Harmonization
4. Additional Data Enhancements
5. Users and Access
6. Strengths and Limitations
7. Dissemination
Datasets in IPUMS-USA
Census year | Sample density (%) | Number of persons in dataset
1850 | 1.0 | 198,000
1860 | 1.0 | 354,000
1870 | 1.0 | 428,000
1880 | 1.0 | 503,000
1900 | 1.0 | 846,000
1910 | 1.4 | 1,503,000
1920 | 1.0 | 1,050,000
1930 | 0.5 | 606,000
1940 | 1.0 | 1,351,000
1950 | 1.0 | 1,922,000
1960 | 1.0 | 1,800,000
1970 | 6.0 | 12,180,000
1980 | 9.0 | 20,403,000
1990 | 6.0 | 15,000,000
2000 | 6.0 | 16,884,000
2001-2005 | 0.4-1.0 | 5,700,000
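Because each sample is a known fraction of the population, the density column implies roughly how many people a sample stands for, and how many people each case represents on average. The sketch below assumes density is a percentage, which the 1970, 1980 and 2000 rows bear out. Actual IPUMS weights vary from case to case, so treat this only as a rough consistency check.

```python
def implied_population(persons: int, density_pct: float) -> float:
    """Approximate population represented by a sample of the given density (%).

    Written as persons * 100 / density rather than persons / (density / 100)
    to avoid an extra floating-point rounding step.
    """
    return persons * 100 / density_pct


def persons_per_case(density_pct: float) -> float:
    """Average number of people each sampled person stands for."""
    return 100 / density_pct


# (census year, density in percent, persons in dataset), taken from the table
rows = [
    (1850, 1.0, 198_000),
    (1970, 6.0, 12_180_000),
    (1980, 9.0, 20_403_000),
    (2000, 6.0, 16_884_000),
]
for year, density, persons in rows:
    print(
        f"{year}: about {implied_population(persons, density):,.0f} people; "
        f"each case stands for about {persons_per_case(density):.1f} people"
    )
```

In a 1% sample each case stands for about 100 people. In the 6% samples the figure is closer to 17.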
Datasets in IPUMS-USA: Planned 2007-2010
Census year | Sample density (%) | Number of persons in dataset
1850 | 10.0 | 1,980,000
1860 | 1.0 | 354,000
1870 | 1.0 | 428,000
1880 | 10.0 | 5,030,000
1900 | 6.0 | 4,230,000
1910 | 1.4 | 1,503,000
1920 | 1.0 | 1,050,000
1930 | 5.0 | 6,060,000
1940 | 1.0 | 1,351,000
1950 | 1.0 | 1,922,000
1960 | 6.0 | 10,800,000
1970 | 6.0 | 12,180,000
1980 | 9.0 | 20,403,000
1990 | 6.0 | 15,000,000
2000 | 6.0 | 16,884,000
2001-2005 | 0.4-1.0 | 5,700,000
2006- | 1.0/year | ??
Status of Countries in IPUMS-International
Currently in IPUMS-Intl: Argentina, Belarus, Brazil, Cambodia, Chile, China, Costa Rica, Ecuador, France, Greece, Hungary, Israel, Kenya, Mexico, Palestinian Territories, Philippines, Portugal, Romania, Rwanda, South Africa, Spain, Uganda, United States, Venezuela, Vietnam
Data received or agreement signed:
• Latin America: Bolivia, Dominican Republic, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Paraguay, Peru, Uruguay
• Europe: Austria, Bulgaria, Czech Republic, Germany, Ireland, Netherlands, Slovenia, United Kingdom
• Asia, Africa, Other: Armenia, Canada, Egypt, Fiji, Indonesia, Iraq, Malaysia, Mongolia, Pakistan, Tajikistan, Turkmenistan
Current funding for 44 countries by 2009
Next data release late Winter 2007
Datasets in IPUMS-CPS
Year | Households | Persons
1962 | 31,106 | 71,741
1963 | 24,649 | 55,882
1964 | 23,438 | 54,543
1965 | 23,600 | 54,502
1966 | 48,095 | 110,055
1967 | 28,924 | 68,676
1968 | 46,069 | 150,913
1969 | 47,028 | 151,848
1970 | 44,982 | 145,023
1971 | 45,952 | 146,822
1972 | 44,906 | 140,432
1973 | 44,467 | 136,221
1974 | 44,427 | 133,282
1975 | 43,714 | 130,124
1976 | 46,368 | 135,351
1977 | 68,291 | 160,799
1978 | 67,900 | 155,706
1979 | 68,375 | 154,452
1980 | 80,468 | 181,488
1981 | 81,451 | 181,358
1982 | 73,368 | 162,703
1983 | 73,195 | 162,635
1984 | 73,632 | 161,167
1985 | 74,568 | 161,362
1986 | 74,145 | 157,661
1987 | 73,843 | 155,468
1988 | 74,806 | 155,980
1989 | 70,454 | 144,687
1990 | 75,269 | 158,079
1991 | 75,076 | 158,477
1992 | 74,236 | 155,796
1993 | 73,878 | 155,197
1994 | 73,126 | 150,943
1995 | 72,152 | 149,642
1996 | 63,339 | 130,476
1997 | 64,046 | 131,854
1998 | 64,659 | 131,617
1999 | 65,377 | 132,324
2000 | 64,944 | 133,710
2001 | 64,362 | 128,821
2002 | 98,848 | 217,219
2003 | 99,986 | 216,424
2004 | 98,979 | 213,241
2005 | 98,664 | 210,648
2006 | 98,069 | 209,542
What Are Microdata?
Individual-level data:
• every record represents a separate person
• all of their individual characteristics are recorded
• users must manipulate the data themselves
Different from aggregate/summary/tabular data, such as:
• a disability table from www.factfinder.census.gov
• an occupation table from a published census volume from the library
IPUMS Data Structure
[Figure: sample records with columns for age, birthplace, mother’s birthplace, sex, relationship, race, and occupation]
A household record (shaded) is followed by a person record for each member of the household. For each type of record, columns correspond to specific variables.
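A household-then-persons file can be read in a single pass that attaches each person record to the most recent household record. The sketch below assumes a hypothetical fixed-width layout: the first character gives the record type ('H' or 'P'), and the column positions are invented for illustration. A real extract's codebook gives the actual positions.

```python
from dataclasses import dataclass, field


@dataclass
class Household:
    serial: str
    persons: list = field(default_factory=list)


def parse_hierarchical(lines):
    """Group person records under the household record that precedes them.

    Hypothetical layout (illustration only):
      household: col 0 = 'H', cols 1-4 = household serial number
      person:    col 0 = 'P', cols 1-2 = person number, cols 3-5 = age, col 6 = sex
    """
    households = []
    for line in lines:
        rectype = line[0]
        if rectype == "H":
            households.append(Household(serial=line[1:5]))
        elif rectype == "P":
            if not households:
                raise ValueError("person record appears before any household record")
            households[-1].persons.append(
                {"pernum": int(line[1:3]), "age": int(line[3:6]), "sex": line[6]}
            )
        else:
            raise ValueError(f"unknown record type {rectype!r}")
    return households


# Two invented households: a married couple, then a single woman
sample = ["H0001", "P01045M", "P02043F", "H0002", "P01070F"]
for hh in parse_hierarchical(sample):
    print(hh.serial, [p["age"] for p in hh.persons])
```

Keeping persons nested inside their household preserves who lived with whom. That is what makes household-level variables possible.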
The Advantages of Microdata
• Combination of all of a person’s characteristics
• Characteristics of everyone with whom a person lived
• Freedom to make any table you need
• Freedom to make models examining multivariate relationships
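Because every record keeps a person's characteristics together, any cross-tabulation can be built directly from the records rather than taken from a published table. The records and variable names below are invented for illustration. The counts are unweighted; national estimates would need the person weights discussed in the lab section.

```python
from collections import Counter


def crosstab(records, row_var, col_var):
    """Count records for every (row value, column value) combination."""
    return Counter((r[row_var], r[col_var]) for r in records)


# Hypothetical person records
people = [
    {"sex": "F", "occupation": "Teacher"},
    {"sex": "M", "occupation": "Farmer"},
    {"sex": "F", "occupation": "Farmer"},
    {"sex": "M", "occupation": "Farmer"},
]

table = crosstab(people, "sex", "occupation")
for (sex, occupation), n in sorted(table.items()):
    print(sex, occupation, n)
```

A `Counter` returns 0 for combinations that never occur, so empty cells of the table need no special handling.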
2. Data entry and coding
How a case gets from the manuscript census into the IPUMS: an example from the 1860 census
John C. Breckinridge of Kentucky
• Vice President of the U.S., 1857-1861
• Secretary of War, C.S.A., 1865
• Later charged with treason, fled to Cuba
Checked and coded data, ready for harmonization (ca. 2001)
[Figure: coded records with columns for wealth, occupation, page, year, age, relationship, and industry]
3. Harmonization
Translation Matrix – Marital Status: how we integrate variables across time (and countries)
[Figure: translation matrix for marital status, highlighting in turn:
• the location of the data in the original samples
• the location of the data in the 1960 U.S. Census Bureau file
• the different original codes for “widowed” across the censuses
• the final IPUMS coding scheme for marital status]
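A translation matrix can be represented as one lookup per sample, mapping that sample's original codes onto the shared harmonized scheme. The sketch below is illustrative only: the sample names, original codes, and harmonized codes are hypothetical, not the actual IPUMS marital-status coding. As in the matrix, "widowed" carries a different original code in each sample but ends up with a single harmonized code.

```python
# Hypothetical harmonized scheme (invented for illustration)
HARMONIZED = {1: "Married", 2: "Separated or divorced", 3: "Widowed", 4: "Never married"}

# Hypothetical translation matrix: for each sample, original code -> harmonized code.
# Note that "widowed" is "3", "W", and "5" in the three samples.
TRANSLATION = {
    "sample_A": {"1": 1, "2": 2, "3": 3, "4": 4},
    "sample_B": {"M": 1, "D": 2, "W": 3, "S": 4},
    "sample_C": {"1": 4, "2": 1, "4": 2, "5": 3},
}


def harmonize(sample: str, original_code: str) -> int:
    """Translate one original code from one sample into the harmonized scheme."""
    try:
        return TRANSLATION[sample][original_code]
    except KeyError:
        raise ValueError(
            f"no translation for code {original_code!r} in {sample!r}"
        ) from None


for sample, code in [("sample_A", "3"), ("sample_B", "W"), ("sample_C", "5")]:
    print(sample, code, "->", HARMONIZED[harmonize(sample, code)])
```

Raising an error for unmapped codes, rather than silently passing them through, makes gaps in the matrix visible during harmonization.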
4. Additional Data Enhancements
Additional Improvements to the U.S. PUMS
• Additional documentation, including all enumeration forms and instructions
• Consistent occupation/industry classifications
• Consistent metropolitan classifications
• Missing data allocation
• Constructed family variables
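The list above does not say which method is used for missing-data allocation. A sequential hot deck is one common technique, sketched here only to illustrate the idea. Each missing value is filled from the most recent earlier record in the same imputation cell. The variable names and the cell definition (sex by age group) are hypothetical choices.

```python
def sequential_hot_deck(records, var, cell_vars):
    """Fill missing values of `var` from the most recent earlier record
    in the same imputation cell, and flag which values were allocated."""
    last_value = {}  # imputation cell -> most recently observed value
    result = []
    for rec in records:
        cell = tuple(rec[v] for v in cell_vars)
        new = dict(rec)
        if rec[var] is None:
            donor = last_value.get(cell)
            new[var] = donor  # remains None if no donor has been seen yet
            new[var + "_allocated"] = donor is not None
        else:
            last_value[cell] = rec[var]
            new[var + "_allocated"] = False
        result.append(new)
    return result


# Hypothetical records with a missing occupation
records = [
    {"sex": "F", "agegrp": "20-29", "occ": "Clerk"},
    {"sex": "M", "agegrp": "20-29", "occ": "Farmer"},
    {"sex": "F", "agegrp": "20-29", "occ": None},
    {"sex": "M", "agegrp": "60+", "occ": None},
]
for rec in sequential_hot_deck(records, "occ", ["sex", "agegrp"]):
    print(rec)
```

The flag column lets users tell reported values from allocated ones.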
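IPUMS “Pointer” Variables (simple household)
Each pointer gives the person number (position within the household) of that person’s spouse, mother, or father; 0 means that person is not in the household.
Person | Spouse’s location | Mother’s location | Father’s location
1 | 2 | 0 | 0
2 | 1 | 0 | 0
3 | 0 | 0 | 0
4 | 0 | 2 | 1
5 | 0 | 2 | 1
6 | 0 | 2 | 1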
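Pointers make it easy to attach the characteristics of a spouse or parent to each person's own record. The household below is hypothetical, and its lowercase keys are modeled on the IPUMS pointer variables SPLOC (spouse), MOMLOC (mother) and POPLOC (father). As in the table, a pointer of 0 means the relative is not in the household.

```python
def linked_values(household, pointer, var):
    """For each person, return `var` of the person their pointer refers to,
    or None when the pointer is 0 (relative not present in the household)."""
    by_pernum = {p["pernum"]: p for p in household}
    return [
        by_pernum[p[pointer]][var] if p[pointer] != 0 else None
        for p in household
    ]


# Hypothetical household: a married couple and their child
household = [
    {"pernum": 1, "age": 45, "sploc": 2, "momloc": 0, "poploc": 0},
    {"pernum": 2, "age": 43, "sploc": 1, "momloc": 0, "poploc": 0},
    {"pernum": 3, "age": 12, "sploc": 0, "momloc": 2, "poploc": 1},
]

print(linked_values(household, "sploc", "age"))   # [43, 45, None]
print(linked_values(household, "momloc", "age"))  # [None, None, 43]
```

The same lookup works for any variable. For example, it can attach a mother's birthplace to each child.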
5. Users and Access
[Figure: extract requests per month, 2002-2007]
15,000 users have made 85,000 data extracts.
IPUMS Users’ Disciplines • Economics (36%) • Sociology (16%) • Demography (12%) • Other Academic (19%) • Historians: only 3%!!! • Other Non-academic (15%)
IPUMS Users’ Status • Student (46%) • Faculty (23%) • Academic researcher (12%) • Non-academic researcher (16%) • Support staff (3%)
Number of Countries Selected for Research IPUMS-International • 1 country (39%) • 2 countries (24%) • 3 countries (10%) • 4 countries (6%) • 5 countries (3%) • 6-8 countries (17%)
Other IPUMS Data Sources • PDQ (www.pdq.com) • Fathom (www.keypress.com/fathom)
6. Strengths and Limitations
4 Key Strengths of the Census Microdata Samples
• Large: more cases than any comparable dataset; enable study of relatively small populations
• National in scope: results not subject to local peculiarities; provide context for local studies
• Long-term: provide historical depth
• Microdata: can make your own tabulations; apply multivariate techniques
Limitations of the Microdata Samples
• Geographic detail: confidentiality restrictions
• Samples: too small to answer some questions (especially ACS/CPS)
• Not annual: any historical analysis will have gaps (not if using ACS/CPS!)
• Cross-sectional data: not longitudinal (but we’re working on it!)
• Need knowledge of a statistical package
Limitations of the Different IPUMS Data Series
• IPUMS-USA: geography (1940-present)
• IPUMS-International: user burden (documentation, information overload)
• IPUMS-CPS: sample size (roughly 60,000 to 200,000 persons per year)
7. Dissemination
Lab: Using weights and making tables
1. Register for extract system
2. Weights:
• why you have to use them
• which ones you should use
3. Exercises
What is a weight? • It’s a variable, just like age, sex, race, etc. • Every case in every sample has a weight value • The main weighting variable in IPUMS is called... • Person weight (variable name is PERWT) • The person weight variable tells you how many people nationwide are represented by any given case • If you forget to use it, your analysis could be wrong!!!
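Summing the person weight over a set of cases estimates how many people nationwide those cases represent. Weighting each value by the person weight gives a national mean. The cases below are invented, and the lowercase key `perwt` simply stands in for the PERWT column of an extract.

```python
def weighted_count(records, weight="perwt"):
    """Estimated number of people represented by these cases."""
    return sum(r[weight] for r in records)


def weighted_mean(records, var, weight="perwt"):
    """Mean of `var` with each case counted as many times as its weight."""
    return sum(r[var] * r[weight] for r in records) / weighted_count(records, weight)


# Hypothetical cases with invented ages and weights
cases = [
    {"age": 34, "perwt": 100},
    {"age": 61, "perwt": 100},
    {"age": 8, "perwt": 50},
]

print(weighted_count(cases))        # 250
print(weighted_mean(cases, "age"))  # 39.6
```

The unweighted mean age of these three cases would be about 34.3. Ignoring the weights changes the answer.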
Sample of pets in my neighborhood
Cases in my pet sample: dog, cat, cat, cat, rabbit, rabbit, rabbit, rabbit
8 cases in sample; 50% are rabbits
New estimates that take weights into account
Case in my pet sample | Number of pets in my neighborhood that each sample pet represents (PERWT)
dog | 200
cat | 100
cat | 100
cat | 100
rabbit | 25
rabbit | 25
rabbit | 25
rabbit | 25
8 cases in sample; 50% are rabbits
Weighted: 200 + 3 × 100 + 4 × 25 = 600 pets represented; rabbits account for 100 of 600, about 17%
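The pet example can be checked by computing each animal's share twice: once counting cases, and once counting the pets each case represents. This is a minimal sketch using the weights from the table. The function name `shares` is just an illustrative choice.

```python
from collections import defaultdict


def shares(cases, weighted=True):
    """Share of each category, either by case count or by summed weights."""
    totals = defaultdict(float)
    for kind, weight in cases:
        totals[kind] += weight if weighted else 1
    grand_total = sum(totals.values())
    return {kind: t / grand_total for kind, t in totals.items()}


# (animal, number of neighborhood pets this case represents)
pets = [("dog", 200)] + [("cat", 100)] * 3 + [("rabbit", 25)] * 4

print(shares(pets, weighted=False)["rabbit"])  # 0.5
print(round(shares(pets)["rabbit"], 3))        # 0.167
```

Half the sampled cases are rabbits, but rabbits are only about one sixth of the pets the sample represents. The weights correct for rabbits being oversampled.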