SOC 505 Research Seminar in Empirical Investigation Data Resources at Princeton University
Data and Statistical Services Data Resources at Princeton University DSS Computer Lab, A-16-H-3 Research and Instructional Services Firestone Library data@princeton.edu http://dss.princeton.edu
Contacts • Mary George (mwgeorge@princeton.edu), Senior Reference Librarian (data) • Susan White (sbwhite@princeton.edu), Sociology Librarian (literature) • Bobray Bordelon, Data Librarian • Oscar Torres-Reyna, Data & Statistics Consultant (second consultant position vacant), data@princeton.edu
Coverage: • Time lag: data files are often released 2 or more years after the survey is conducted. • Sub-national data: mainly the U.S., plus other large nations (China, India, Canada) where such data are collected. The U.S. has many state-level surveys, some covering large cities (New York and Los Angeles), and some case studies (primarily on crime) for other cities and areas.
Numeric Data Holdings • Micro-data: survey or administrative data about an individual entity (e.g., a person, family, or establishment). • Summary statistics: aggregated counts from survey or administrative data (e.g., the number of persons in an area).
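The difference between the two kinds of holdings can be sketched in a few lines of Python. This is only an illustration: the records, the field names (`person_id`, `county`, `age`) and the helper `count_by_area` are invented for the example and do not come from any DSS data set.

```python
from collections import Counter

# Hypothetical micro-data: one record per person (the unit of observation).
persons = [
    {"person_id": 1, "county": "Mercer", "age": 34},
    {"person_id": 2, "county": "Mercer", "age": 71},
    {"person_id": 3, "county": "Essex", "age": 25},
]


def count_by_area(records, area_key):
    """Aggregate person-level records into a count of persons per area."""
    return dict(Counter(record[area_key] for record in records))


# Summary statistics: the individual records are gone, only counts remain.
summary = count_by_area(persons, "county")
print(summary)
```

Micro-data can always be aggregated into summary statistics like this, but the reverse is not possible, which is why the two are held separately.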
A Few Definitions (adapted from ICPSR) longitudinal or panel study • same group of individuals is interviewed at intervals over a period of time. Note that some cross-sectional studies are done regularly. For instance, the General Social Survey and the Current Population Survey: Annual Demographic File are conducted once a year, but different individuals are surveyed each time. Such a study is not a true longitudinal study. An example of a longitudinal study is the National Longitudinal Survey of Labor Market Experience, in which the same individuals have been followed over time.
A Few Definitions (from ICPSR) cross-sectional study • data from particular subjects are obtained only once. Contrast with longitudinal studies, in which a panel of individuals is interviewed repeatedly over a period of time. Note that questions in a cross-sectional study can apply to previous time periods.
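One rough way to tell a panel from a repeated cross-section is to check whether respondent identifiers recur across waves. The sketch below assumes each wave is simply a list of respondent IDs; the IDs and the helper name `respondent_overlap` are hypothetical, and real panels lose some respondents to attrition, so the overlap is rarely exactly 1.

```python
def respondent_overlap(wave_a, wave_b):
    """Share of wave_a respondent IDs that reappear in wave_b."""
    ids_a, ids_b = set(wave_a), set(wave_b)
    if not ids_a:
        return 0.0
    return len(ids_a & ids_b) / len(ids_a)


# Hypothetical panel: the same individuals are interviewed in both waves.
panel_overlap = respondent_overlap([101, 102, 103], [101, 102, 103])

# Hypothetical repeated cross-section: a fresh sample is drawn each year.
cross_section_overlap = respondent_overlap([101, 102, 103], [201, 202, 203])

print(panel_overlap, cross_section_overlap)
```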
A Few Definitions (from ICPSR) hierarchical file • contains information collected on multiple units of analysis in different record types. For example, the physical housing structure may be 1 unit, and individual persons within the structure are another. An example is the Current Population Survey: Annual Demographic File which has household, family, and person units of analysis. Studies that include data for different units of analysis often link those units to each other so that, for instance, one can analyze the persons as they group in a structure. Such studies are sometimes referred to as having a relational structure.
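As an illustration of how record types nest, the sketch below flattens a small hierarchical file into person-level rows that carry their household's fields. The layout is made up for the example (a leading `H` or `P` record-type code followed by space-separated fields) and is not the actual Current Population Survey format.

```python
# Hypothetical hierarchical file: household (H) records followed by the
# person (P) records that belong to them.
raw_lines = [
    "H 001 NJ",
    "P 001 01 34",
    "P 001 02 36",
    "H 002 NY",
    "P 002 01 58",
]


def flatten(lines):
    """Return one row per person, with household fields copied onto each row."""
    rows = []
    household = None
    for line in lines:
        fields = line.split()
        if fields[0] == "H":
            household = {"hh_id": fields[1], "state": fields[2]}
        elif fields[0] == "P":
            if household is None or fields[1] != household["hh_id"]:
                raise ValueError(f"person record without matching household: {line!r}")
            rows.append({**household, "person_no": fields[2], "age": int(fields[3])})
    return rows


person_rows = flatten(raw_lines)
for row in person_rows:
    print(row)
```

The result is a rectangular, person-level table, which is the shape most statistical packages expect.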
A Few Definitions (from ICPSR) relational structure • A study that includes different units of analysis, particularly when those units are not arranged in a strict hierarchy as they are in a hierarchical file, has a relational structure. Note that the data could be arranged in several different physical structures to handle such a data structure. For instance, each unit of analysis might be stored in a separate rectangular file with identification numbers linking each case to the other units; or the different units of analysis might be stored in one large file with a hierarchical file structure; or the different units could be stored in a special database structure used by a relational database management system. An example of a study with a relational structure is the Survey of Income and Program Participation, which has 8 or more record types; these record types are related to each other but are not all members of a hierarchy of membership. For instance, there are record types for household, family, person, wage and salary job, and general income amounts.
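The first physical arrangement mentioned above, separate rectangular files linked by identification numbers, can be sketched as follows. The two tables, their field names and the helper `link_jobs` are hypothetical and are not taken from the Survey of Income and Program Participation.

```python
from collections import defaultdict

# Hypothetical person file and job file, linked by person_id.
persons = [
    {"person_id": "A", "hh_id": "001"},
    {"person_id": "B", "hh_id": "001"},
]
jobs = [
    {"person_id": "A", "job_no": 1, "wage": 1200},
    {"person_id": "A", "job_no": 2, "wage": 300},
]


def link_jobs(persons, jobs):
    """Map each person ID to the list of job records that share that ID."""
    jobs_by_person = defaultdict(list)
    for job in jobs:
        jobs_by_person[job["person_id"]].append(job)
    return {p["person_id"]: jobs_by_person.get(p["person_id"], []) for p in persons}


linked = link_jobs(persons, jobs)
print(linked)
```

A person may have zero, one, or several job records, which is why jobs do not fit neatly into a single fixed-length person record.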
A Few Definitions (from ICPSR) rectangular file • contains the same number of card images or the same physical record length for each respondent or unit of analysis. Contrast with hierarchical files.
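A quick check that follows directly from this definition is to confirm that every record has the same length. The sketch assumes one record per line of text; the sample lines and the helper name `is_rectangular` are made up for the example.

```python
def is_rectangular(lines):
    """Return True when every record (line) has the same physical length."""
    lengths = {len(line.rstrip("\n")) for line in lines}
    return len(lengths) <= 1


same_length = is_rectangular(["00134NJ\n", "00258NY\n"])
mixed_length = is_rectangular(["H001NJ\n", "P0010134\n"])
print(same_length, mixed_length)
```

Equal record length is necessary but not sufficient: a hierarchical file padded to a fixed width would also pass, so the documentation still has to be read.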
Coverage: • International macro-economic, social, political, & financial indicators. • National surveys, statistics for U.S., many European nations, public opinion surveys from many nations, internationally sponsored surveys dealing with health, fertility, and nutrition.
Major Data Archive Subscriptions • Inter-university Consortium for Political and Social Research • Roper Center for Public Opinion Research • Social Science Electronic Data Archive
More data! • Economic, business and financial data services. http://firestone.princeton.edu/econlib • DSS Subject and Regional Guides. • CPANDA www.cpanda.org
Even more data! • Federal, state, & independent government agencies • Other data archives from around the world • Academic institutions, scholars, think tanks, & private organizations. • Consult the Main Catalog. • Google
Literature about data sets: • A good way to find useful data sets is to look at the literature of the field. • ICPSR’s Bibliography of Data Related Literature • Sociological Abstracts • Annual Review of Sociology
Data Analysis Options • Refer to published statistics • Use on-line analysis tools • Download data and use a statistical program
On-line Analysis Tools • Current Population Survey (Unicon version) • General Social Survey (in SDA format) • ICPSR SDA files
When you download data… • Documentation • Survey – sampling methods • weights • Variables • Format • Size
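Weights matter because an unweighted estimate treats every respondent as representing the same number of people. The sketch below compares an unweighted and a weighted mean; the incomes and weights are invented, and it ignores everything else a survey design requires (strata, clusters, variance estimation).

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical respondents: the first one represents three times as many people.
incomes = [20000, 50000, 90000]
weights = [3.0, 1.0, 1.0]

unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)
print(unweighted, weighted)
```

Here the weighted mean is lower than the unweighted one because the low-income respondent carries the largest weight. Which weight variable to use is spelled out in each study's documentation.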
Statistical Software Extensions • Unformatted files: .raw, .txt, .asc, .dat
Microdata Structure Cross-sectional Hierarchical Time-series Panel
Transforming Data • Adding variables: merge • Adding cases: append • Reshaping data: reshape • Transposing data: xpose
[Figure: append (file + file = more cases) and merge (file + file = more variables)]
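The idea behind appending and merging can be sketched in plain Python, with each data set held as a list of dictionaries. This is a conceptual illustration, not Stata code: the helper names and sample records are hypothetical, and the merge shown is a simplified one-to-one match that keeps only the rows of the first file, whereas Stata's `merge` also records match status in a `_merge` variable.

```python
def append_cases(first, second):
    """Adding cases: stack the rows of the second file under the first."""
    return first + second


def merge_variables(master, using, key):
    """Adding variables: attach the using file's fields to rows with the same key."""
    using_by_key = {row[key]: row for row in using}
    return [{**row, **using_by_key.get(row[key], {})} for row in master]


# Hypothetical files sharing the identifier "id".
wave = [{"id": 1, "age": 30}, {"id": 2, "age": 41}]
more_cases = [{"id": 3, "age": 25}]
income = [{"id": 1, "income": 50}, {"id": 2, "income": 70}]

appended = append_cases(wave, more_cases)        # 3 rows, same variables
merged = merge_variables(wave, income, "id")     # 2 rows, one more variable
print(appended)
print(merged)
```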
[Figure: reshape and transpose]
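Reshaping and transposing are easy to confuse, so here is a small plain-Python sketch of each. It is a conceptual illustration rather than Stata code; the variable stub `inc`, the sample rows and the helper names are assumptions made for the example. Reshaping from wide to long turns one row per person into one row per person-year, while transposing simply swaps rows and columns.

```python
def wide_to_long(rows, id_key, stub):
    """Turn columns named stub+year into one row per (id, year)."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name.startswith(stub):
                year = int(name[len(stub):])
                long_rows.append({id_key: row[id_key], "year": year, stub: value})
    return long_rows


def transpose(matrix):
    """Swap rows and columns of a list-of-lists table."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical wide file: one row per person, one income column per year.
wide = [
    {"id": 1, "inc2001": 50, "inc2002": 55},
    {"id": 2, "inc2001": 70, "inc2002": 72},
]

long_rows = wide_to_long(wide, "id", "inc")
flipped = transpose([[1, 2, 3], [4, 5, 6]])
print(long_rows)
print(flipped)
```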
Using Setup Files: define variables and values • Use setup files: do file (command file) .do; dictionary file .dct; data file .dat, .txt • Read in a few variables • Create your own setup
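What a dictionary file does, conceptually, is tell the software which columns of a raw data file hold which variable. The Python sketch below mimics that idea with a made-up layout. It does not parse real `.dct` syntax, and the variable names, column positions and the helper `read_fixed_width` are hypothetical. The `keep` argument corresponds to reading in only a few variables.

```python
# Hypothetical column layout: variable name -> (start, end), 0-based, end exclusive.
layout = {"id": (0, 3), "age": (3, 5), "state": (5, 7)}


def read_fixed_width(lines, layout, keep=None):
    """Slice each fixed-width record into named fields, optionally a subset."""
    names = list(layout) if keep is None else keep
    records = []
    for line in lines:
        record = {}
        for name in names:
            start, end = layout[name]
            record[name] = line[start:end].strip()
        records.append(record)
    return records


# Hypothetical raw data file contents (.dat or .txt), one record per line.
data_lines = ["00134NJ", "00258NY"]

everything = read_fixed_width(data_lines, layout)
subset = read_fixed_width(data_lines, layout, keep=["id", "age"])
print(everything)
print(subset)
```

Values come back as text; converting them to numbers and attaching value labels is the part a real setup file handles for you.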
How to Access Stata • Locally: Princeton (OIT) cluster computers; Data and Statistical Services Computer Lab, Firestone Library A-16-H-3 • Remotely: through Research Computing (nobel), using Secure Shell.
Stata Remote Access • Register for a server account: http://helpdesk.princeton.edu/kb/display.plx?id=9682 • Download Secure Shell (Windows): http://helpdesk.princeton.edu/kb/display.plx?id=4104 • X terminal (Macintosh)
Using Unix Stata • Accessing files: save them in the “H” drive from Windows; use Secure Shell File Transfer • Unix Stata, interactive: type stata at the command prompt • Unix Stata, background: nohup stata -b do yourdofilename.do & (https://dss.wikidot.com/stata-batch-job) • X Windows: http://dss.wikidot.com/system:page-tags/tag/x-window
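For anyone scripting batch runs, the background command on this slide can be assembled programmatically. The sketch below only builds the argument list and the name of the log file that Stata's batch mode is expected to write (the do-file name with a `.log` extension, in the working directory); it does not launch anything. The helper names are hypothetical, and the trailing `&` is a shell feature rather than a Stata argument, so it is left out of the list.

```python
from pathlib import Path


def stata_batch_command(do_file):
    """Argument list for running a do file in Stata batch mode under nohup."""
    return ["nohup", "stata", "-b", "do", do_file]


def expected_log_name(do_file):
    """Log file name batch mode is expected to produce for this do file."""
    return Path(do_file).with_suffix(".log").name


command = stata_batch_command("yourdofilename.do")
log_name = expected_log_name("yourdofilename.do")
print(" ".join(command), "&")
print(log_name)
```

Passing the list to `subprocess.Popen` on the server would start the job without waiting for it to finish, which plays the role of the `&`.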
Demonstration • Log in using Secure Shell • Save a file (using FileExplorer; using SSH) • Run interactive Stata • Submit a do file
Data and Statistical Services DSS Computer Lab A-16-H-3 Firestone Library data@princeton.edu http://dss.princeton.edu