Introduction to CSSCR Archive and Campus Data Tina Tian Data Archivist txtian@u.washington.edu
Topics • Major Sources of CSSCR Data Archive • Finding Data Sets at CSSCR • Other Data Resources at CSSCR • Introduction to Decennial Censuses and American Community Survey
CSSCR Archive • The Center for Social Science Computation and Research (CSSCR) maintains a large electronic data archive related to social science research. • Data sets are available through a web viewer, a network server, or CDROM.
Major Sources of CSSCR Data Archive • Inter-University Consortium for Political and Social Research (ICPSR) • US Census Bureau • Bureau of Labor Statistics • Washington State Data Center
Major Sources of CSSCR Data Archive • Inter-University Consortium for Political and Social Research (ICPSR) http://www.icpsr.umich.edu • Membership-based organization founded in 1962. Provides access to the world’s largest archive of computerized social science data. • Offers training in quantitative social analysis techniques (e.g. the ICPSR Summer Program in Quantitative Methods of Social Research).
Major Sources of CSSCR Data Archive • US Census Bureau http://www.census.gov • 1990, 2000 Decennial Census of Population & Housing • Summary Tape File/Summary File (STF/SF) • Public Use Microdata Sample (PUMS) • American Community Survey (ACS)
Major Sources of CSSCR Data Archive • Bureau of Labor Statistics www.bls.gov/nls • National Longitudinal Survey of Youth 79, 97 (NLSY79, NLSY97) Public-use File (CDs are available at CSSCR, or the files can be downloaded free from the BLS website) • National Longitudinal Survey of Youth 79, 97 Geocode data (confidential data) • Provides geographic variables for the data file • To protect the confidentiality of respondents, an agreement letter must be signed with BLS.
Major Sources of CSSCR Data Archive • National Center for Education Statistics http://nces.ed.gov/surveys/ http://nces.ed.gov/surveys/SurveyGroups.asp?Group=1 • Education Longitudinal Study of 2002/06 • The second follow-up data file of ELS2002 • The restricted-use data file • Room 106 Savery Hall is the secure room for using the restricted data file
Major Sources of CSSCR Data Archive • Washington State Data Center http://www.ofm.wa.gov • WA State Vital Statistics • WA State Population Projections • WA State Population Surveys • Pregnancy & Abortion Data
Other Sources of CSSCR Data Archive • Data Access via DataFerrett http://dataferrett.census.gov • Current Population Survey http://cps.ipums.org/cps/ • Survey of Income and Program Participation • iPOLL databank at The Roper Center for Public Opinion Research is available through the UW library http://roperweb.ropercenter.uconn.edu/cgi-bin/hsrun.exe/Roperweb/iPOLL/iPOLL.htx;start=HS_iPOLL_LoginSetup • National Center for Health Statistics http://www.cdc.gov/nchs/express.htm
Finding Data Sets at CSSCR • Web Site • CDROM • Codebook • All these materials are available at 110 Savery Hall or on the CSSCR web site.
Finding Data Sets through CSSCR web viewers • A complete list of data sets at CSSCR is available on the CSSCR Web page. • Most online data sets at CSSCR can be accessed through a web browser. • The CSSCR archive website address is http://julius.csscr.washington.edu
Finding Data Sets through CSSCR web viewers • The data sets on the CSSCR homepage are divided into several categories: • ICPSR data • CDROM data • Census 2000 • ACS • Census 2010 • Clicking on one of these five icons will bring you to “ICPSR Resource”, “CDROM list”, or “Census 2000, ACS Washington data”.
Finding Data Sets through CSSCR web viewers • In “ICPSR resource”, click on “Archive Browser”, which lets you search the data to find the files you want. • Under each title, information such as data source, codename, abstract and storage medium is displayed.
Types of File • Codebooks & Documentation • Dataset codebook: <file name>.cod • Data dictionary: <file name>.dic or <file name>.doc • File description: <file name>.des • Frequency listing: <file name>.fre • Dataset errata: <file name>.err
Types of File • Data Files • ASCII file: <filename>.dat • SPSS system file: <filename>.sav or <filename>.svf • SPSS portable file: <filename>.por or <filename>.exp • SPSS syntax file: <filename>.spss • SAS data file: <filename>.sas7bdat • SAS catalog file: <filename>.sas7bcat • SAS transport file: <filename>.xpt • SAS syntax file: <filename>.sas • STATA data file: <filename>.dta • STATA syntax file: <filename>.do • STATA dictionary file: <filename>.dct
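The extension conventions on the two "Types of File" slides can be turned into a small lookup. The following is only a sketch in Python: the mapping is copied from the slides, while the function name `describe_file` and the example file names are hypothetical and not part of the CSSCR archive.

```python
from pathlib import Path

# Extension-to-type mapping copied from the "Types of File" slides.
FILE_TYPES = {
    ".cod": "Dataset codebook",
    ".dic": "Data dictionary",
    ".doc": "Data dictionary",
    ".des": "File description",
    ".fre": "Frequency listing",
    ".err": "Dataset errata",
    ".dat": "ASCII file",
    ".sav": "SPSS system file",
    ".svf": "SPSS system file",
    ".por": "SPSS portable file",
    ".exp": "SPSS portable file",
    ".spss": "SPSS syntax file",
    ".sas7bdat": "SAS data file",
    ".sas7bcat": "SAS catalog file",
    ".xpt": "SAS transport file",
    ".sas": "SAS syntax file",
    ".dta": "STATA data file",
    ".do": "STATA syntax file",
    ".dct": "STATA dictionary file",
}


def describe_file(name: str) -> str:
    """Return the archive file type for a file name (hypothetical helper)."""
    suffix = Path(name).suffix.lower()  # case-insensitive match on the extension
    return FILE_TYPES.get(suffix, "Unknown file type")


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    for example in ("da1234.sav", "da1234.cod", "da1234.zip"):
        print(example, "->", describe_file(example))
```

A lookup like this is mainly useful for deciding which statistical package can open a downloaded file before transferring it.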
Economic Data at CSSCR • Economagic: Economic Time Series Page http://www.economagic.com/ Provides web access to U.S. business, economic and trade information • DRI_WEFA Basic Economics Database • Datastream Database
DRI_WEFA Basic Economics Database • A national macroeconomics database that contains about 7000 monthly, quarterly and annual time series, dating back to 1946 where available and ending with the latest available observations. • Includes financial data, construction & housing data, industrial statistics, population counts & estimates, foreign trade & interest rates • Accessible through E-Views in the CSSCR lab. A reference book is available in Room 110 Savery Hall.
DataStream Database • Provides access to various global economic and financial databases (e.g. National Government & OECD Series, International monetary funds, equities, bond indices, interest and exchange rates, company account definitions, etc). • At CSSCR, DataStream is only available through the Archivist at Room 113 Savery Hall.
Seattle Data Viewer • A neighborhood information system. • Provides access to a comprehensive set of information about the city infrastructure and environment. • Lets users organize and print data and maps of the city. • Accessible at the CSSCR lab through “P:\Data\Seattle_Data_viewer”.
Seattle Data Viewer • Neighborhood statistics are grouped into the following units: • Base map • Crimes and public safety • Housing, health, education and civic locations • Land use, value and zoning • Landscape and environmental features • Municipal and district boundaries • Park, recreation and open space • Population and demographics • Streets and transportation • Utilities
Available Census Data at CSSCR • 1980 census data: STF1, STF3 (raw data) • 1990 census data: STF1, STF2, STF3, STF4, 1% PUMS, 5% PUMS • 2000 census data: SF1, SF2, SF3, SF4, 1% PUMS, 5% PUMS • 2005-2008 1-Year ACS: ACS SF, 5% PUMS • 2005-2008 3-Year ACS: ACS SF, 5% PUMS http://julius.csscr.washington.edu/Decennial%20Census.htm http://julius.csscr.washington.edu/american_community_survey.htm
Census CDs (GeoLytics) • Census CD 1960 Long Form • Census CD 1970 Long Form • Census CD 1980 Long Form in 2000 Areas • Census CD 1980 Long Form • Census CD 1990 blocks & Long Form • Census CD 1990-2000 • Census CD 2000 blocks & Long Form • Census 2000 Redistricting • NCDB - Neighborhood Change Database • StreetDVD 2007 http://julius.csscr.washington.edu/Census%20CD.htm • Available in Room 119 Savery Hall
Introduction to Decennial Censuses • Decennial Census of Population & Housing • Summary Tape File/Summary File (STF/SF) • Public Use Microdata Sample (PUMS)
Introduction to Decennial Censuses • What is a Summary Tape File/Summary File (STF/SF)? • The basic unit of analysis is a specific geographic area. • Reports counts of persons or housing units in particular categories. • Also called tabulated summary statistics.
Introduction to Decennial Censuses • The Types of STF/SF • STF/SF 1 and 2 present tabulated data from the Census short-form (100%) questionnaire. • STF/SF 3 and 4 present cross-tabulations of information from the long-form (sample) questionnaire. • Tables in STF/SF 2 and 4 are iterated for many detailed racial groups, as well as American Indian and Alaska Native tribes. In SF4, many data are also tabulated by detailed ancestry groups.
Introduction to Decennial Censuses • 2000 Census short-form questionnaire: • full population • six questions • Household relationship • Sex • Age • Hispanic or Latino origin • Race • Tenure (whether the home is owned or rented)
Introduction to Decennial Censuses • 2000 Census long-form questionnaire: • sent to a sample of 15.8%-17% of the full population • divided into two parts • Population • social and economic characteristics (14 areas) • Housing • physical and financial characteristics (11 areas)
Introduction to Decennial Censuses • In 1980 and 1990 census data (STF1A, STF2B, STF3C, STF4D…): • The letters A, B, C, D indicate different levels of geographic area • A - block groups; B - blocks, zip codes; • C - place, county; D - Congressional district; • In 2000 census data, no letters indicate the level of geographic area • Table indicators: • P - person; H - housing unit; • PCT/PT - person, down to Census tract level; HCT/HT - occupied housing unit, down to Census tract level; http://julius.csscr.washington.edu/Decennial%20Census.htm
Introduction to Decennial Censuses • What is the Public Use Microdata Sample (PUMS)? • The basic unit of analysis is a housing unit or the persons who live in it, with identifiers (such as addresses, names, etc.) removed to protect individual confidentiality. • It is a stratified sample of the population, created by subsampling the full census sample that received census long-form questionnaires.
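The naming scheme on this slide, combined with the earlier note that STF/SF 1 and 2 come from the short form and STF/SF 3 and 4 from the long form, can be decoded mechanically. The sketch below is a hedged illustration in Python: the letter-to-geography mapping is taken verbatim from the slide and may not cover every letter used in the real files, and the function name `parse_summary_file_name` is hypothetical.

```python
import re

# Letter codes as listed on the slide for 1980 and 1990 files.
# Assumption: only A-D are handled; real releases may use other letters.
GEO_LEVELS = {
    "A": "block groups",
    "B": "blocks, zip codes",
    "C": "place, county",
    "D": "Congressional district",
}


def parse_summary_file_name(name: str) -> dict:
    """Split a name such as 'STF3C' or 'SF1' into its parts (hypothetical helper)."""
    match = re.fullmatch(r"(STF|SF)([1-4])([A-D]?)", name.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized summary file name: {name!r}")
    series, number, letter = match.groups()
    # Files 1 and 2 tabulate the short form; files 3 and 4 the long form.
    source = "short form (100%)" if number in "12" else "long form (sample)"
    return {
        "series": series,
        "number": int(number),
        "questionnaire": source,
        # 2000 files carry no letter, so the geography is left as None.
        "geography": GEO_LEVELS.get(letter),
    }


if __name__ == "__main__":
    print(parse_summary_file_name("STF3C"))
    print(parse_summary_file_name("SF1"))
```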
Introduction to Decennial Censuses • The Types of PUMS • 5-percent sample file (PUMS-A file) • 1-percent sample file (PUMS-B file)
Introduction to Decennial Censuses • 5-percent sample file (PUMS-A file) • Provides the user with records for over 14 million people and over 5 million housing units • The Public Use Microdata Area (PUMA) is the lowest level of geographic identifier, with a minimum population threshold of 100,000 • The sample has only been produced since 1980
Introduction to Decennial Censuses • 1-percent sample file (PUMS-B file) • Provides a fuller range of detailed characteristics • Provides the user with records for over 2.8 million people and over 1 million housing units • Each super-PUMA meets a minimum population of 400,000 and is composed of one or more PUMAs delineated on the 5-percent PUMS files
Introduction to Decennial Censuses • Integrated Public Use Microdata Series (IPUMS) http://www.ipums.umn.edu/ • Consists of thirty-eight high-precision samples of the American population drawn from fifteen federal censuses (1850 – 2000) and from the American Community Surveys of 2000-2008 • Is particularly useful for historical research because the data are comparable across time
What is the American Community Survey (ACS)? • A large, continuous demographic survey • Produces annual and multi-year estimates of the characteristics of the population and housing • Will replace the 2010 census long form by collecting detailed information throughout the decade • The short form remains in the 2010 decennial census
ACS Program Schedule • Testing and development: 1994-2004 • Full implementation began in 2005 • Group Quarters data collection began in 2006
What is a Group Quarters (GQ)? • Definition: A living quarter in which unrelated people live or stay other than the usual house, apartment, or mobile home. Examples: • Institutional: Nursing homes, hospitals, prison wards • Non-institutional: College dorms, military barracks, shelters
Full Implementation • Annual national sample of approximately 3 million addresses in every county and American Indian and Alaska Native area in the United States • Provides profiles every year for communities of 65,000 population or more • Provides 3-year accumulations for communities of more than 20,000 population • Provides 5-year accumulations for all communities; the lowest geographic level can be the block group
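The population thresholds above amount to a simple decision rule for which ACS estimates exist for a community. A minimal sketch in Python follows; the function name `available_estimates` is hypothetical, and the 3-year cutoff is treated as "20,000 or more" (the wording used on the later "Available ACS data" slide), which is an assumption because this slide says "more than 20,000".

```python
ONE_YEAR_THRESHOLD = 65_000    # "65,000 population or more"
THREE_YEAR_THRESHOLD = 20_000  # assumption: inclusive, per the later slides


def available_estimates(population: int) -> list:
    """List the ACS estimate types published for a community of this size."""
    estimates = []
    if population >= ONE_YEAR_THRESHOLD:
        estimates.append("1-year")
    if population >= THREE_YEAR_THRESHOLD:
        estimates.append("3-year")
    # 5-year accumulations are provided for all communities.
    estimates.append("5-year")
    return estimates


if __name__ == "__main__":
    # Hypothetical population figures, for illustration only.
    for pop in (5_000, 25_000, 70_000):
        print(pop, available_estimates(pop))
```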
ACS Data Release Schedule • Before the 2004 ACS, the population threshold was 250,000+
ACS file types • ACS Summary File (ACS SF) • ACS Public Use Microdata Sample • One-year PUMS (1-in-100, 1%, national random sample of the population; 1.3 million housing units & 3 million people) • Three-year PUMS (3-in-100, 3%, national random sample of the population; 3.9 million housing units & 9.1 million people) • Five-year PUMS (5-in-100, 5%, national random sample of the population) • The Public Use Microdata Area (PUMA) is the lowest level of geographic identifier, with a minimum population threshold of 100,000
Comparing ACS with the Decennial Census long form questionnaires • Sample rate/size & design • Data collection • Residence rules & reference periods
Sample rate/size & design Comparison • Census sample estimates are based on about 18 million housing units; ACS 5-year estimates are based on about 11 million housing units, and 1-year estimates on about 3 million housing units • ACS samples every year and spreads the sample over 12 months; the census samples once a decade and uses the entire sample at the same time • ACS estimates have higher sampling error than the census long form, but that error is shown as 90% confidence limits or margins of error in every table. Similar sampling error measures have not been provided for census long-form sample estimates.
Data Collection Comparison • ACS nonresponse follow-up uses computer-assisted telephone and computer-assisted personal interviews; past censuses have used only paper questionnaires • ACS data are collected only from household members; census data are often collected from neighbors • ACS has a higher level of overall response and individual item response, so there is less chance of nonresponse bias and lower potential nonsampling error
Residence Rules Comparison • The decennial census is based on the concept of “usual residence” • The place where the person lives and sleeps most of the time • If a person had no usual residence, the person was to be counted where he or she was staying on Census Day • ACS uses a “two-month” rule - A person is a resident of an address if he or she • Lives there year round • Lives there more than 2 months but not year round • Is living there now with no other place to live - A person is not a resident of an address if he or she • Lives there 2 months or less and has another residence • Is away now for more than 2 months • Compare with Caution
Reference Periods Comparison • ACS uses the interview date as the single reference point, or as the end of a reference period, for all data collection • The decennial census always uses Census Day (April 1) as the reference point • Examples: • Income • ACS asks for income for the previous 12 months • Decennial census income data refer to the calendar year preceding Census Day (April 1) • School enrollment • ACS asks if a person attended school during the “last three months” • Census 2000 asks if a person attended school “any time since April 1” • Compare with Caution
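Because ACS tables publish a 90% margin of error (MOE) next to each estimate, the confidence limits and the implied standard error can be recovered with simple arithmetic. The sketch below is in Python and assumes the usual normal-approximation factor of 1.645 for a two-sided 90% interval; the Census Bureau's documentation states the exact factor used for each data year. The function names and the example figures are hypothetical.

```python
Z_90 = 1.645  # assumed critical value for a two-sided 90% confidence interval


def confidence_bounds(estimate: float, moe: float) -> tuple:
    """Lower and upper 90% confidence limits from an estimate and its MOE."""
    return estimate - moe, estimate + moe


def standard_error(moe: float, z: float = Z_90) -> float:
    """Standard error implied by a published 90% margin of error."""
    return moe / z


if __name__ == "__main__":
    # Hypothetical table cell: an estimate of 12,500 with an MOE of 820.
    low, high = confidence_bounds(12_500, 820)
    print("90% confidence limits:", low, "to", high)
    print("implied standard error:", round(standard_error(820), 1))
```

Comparing two ACS estimates without looking at their margins of error can overstate a difference that is within sampling error.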
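The ACS "two-month" rule listed above reads naturally as a short decision procedure. The following Python sketch encodes only the five cases on the slide; it is a simplification, not the Census Bureau's full residence rules, and the function and parameter names are hypothetical.

```python
def is_acs_resident(stay_months: float,
                    has_other_place: bool,
                    away_over_two_months: bool = False) -> bool:
    """Apply the slide's simplified two-month rule to one person and address.

    stay_months: length of the person's stay at the address, in months
                 (use 12 for someone who lives there year round).
    has_other_place: whether the person has another place to live.
    away_over_two_months: whether the person is away now for more than 2 months.
    """
    if away_over_two_months:
        return False  # "Is away now for more than 2 months"
    if not has_other_place:
        return True   # "Is living there now with no other place to live"
    # With another residence, only stays longer than 2 months count.
    return stay_months > 2


if __name__ == "__main__":
    print(is_acs_resident(12, has_other_place=False))  # year-round resident
    print(is_acs_resident(1.5, has_other_place=True))  # short stay, other home
```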
Comparison • Comparing ACS Data to Census 2000 & Other Sources http://www.census.gov/acs/www/guidance_for_data_users/comparing_data/ • When to use 1-year, 3-year, or 5-year estimates http://www.census.gov/acs/www/guidance_for_data_users/estimates/ • Comparing 2008 ACS data http://www.census.gov/acs/www/guidance_for_data_users/comparing_2008/
Available ACS data • 2005 single-year ACS provides household population only for areas with populations of 65,000 or more • 2006, 2007 & 2008 single-year ACS provides household population and group quarters population for areas with populations of 65,000 or more • 2005-2007 & 2006-2008 three-year ACS provides household population and/or group quarters population for areas with populations of 20,000 or more
ACS Data Release Schedule in 2010 • 2009 single-year ACS will be released by the end of September. These estimates will be available for areas with populations of 65,000 or more. • 2005-2009 5-year ACS is planned for release in December. These estimates will be available for all areas regardless of population size, down to the census tract. Early in 2011, the 2005-2009 ACS Summary Files down to the block group will be released, as will the 2005-2009 PUMS files. • 2007-2009 3-year ACS is planned for release in January 2011. These estimates will be available for all geographic areas with populations of 20,000 or more.