SDA: a tool for teaching and research with microdata. Laine Ruus laine.ruus@utoronto.ca University of Toronto. Data Library Service 2007/05/17. What this poster covers: Introduction; Demo of main SDA capabilities; Advantages and disadvantages for teaching and research
SDA: a tool for teaching and research with microdata Laine Ruus laine.ruus@utoronto.ca University of Toronto. Data Library Service 2007/05/17
What this poster covers: • Introduction • Demo of main SDA capabilities • Advantages and disadvantages for teaching and research • Common questions about SDA
SDA@UT is brought to you by: • University of California, Berkeley. Computer-assisted Survey Methods Program (CSM) – writes and supports the server-side software • University of Toronto. Centre for Computing in the Humanities and Social Sciences (CHASS) – provides the hardware, buys the software, and provides system support wetware • University of Toronto. Libraries – provides the budget to purchase the data, and care, feeding and user support wetware
Our experience with SDA • CHASS installed SDA in the fall of 2004 • At last count, have 600+ data files in SDA • Some have only the metadata that was generated from the original syntax files (SAS/SPSS/Stata), but a number also have full question text. • Most are microdata, but a few are aggregate statistics (census files) • A number of voracious data users now expect to find the latest microdata released by Stat Can in SDA
Review of main SDA utilities • Frequencies, weighted & unweighted • Crosstabulations • Comparison of means (ANOVA) • Correlations • Regressions • Logit/probit regressions
Tips & tricks • Have we not gotten around to coding the missing values? • Want to include missing values in your cross-tabulation, or other analysis? • Collapsing uniform categories of continuous variables on the fly • Recoding variables on the fly
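The two ideas above, collapsing a continuous variable into ranges and deciding whether missing values count, can be sketched outside SDA. This is a minimal illustration in plain Python, not SDA's own recode syntax; the ages, the `999` missing-value code, and the range boundaries are all invented for the example.

```python
from collections import Counter

# Hypothetical ages; 999 stands in for an uncoded "not stated" value.
ages = [19, 23, 37, 41, 58, 64, 70, 999, 35, 999]
MISSING_CODES = {999}  # assumed missing-value code, not from any real file


def collapse_age(age):
    """Collapse a continuous age into a handful of ranges."""
    if age in MISSING_CODES:
        return "missing"
    if age < 25:
        return "under 25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"


groups = [collapse_age(a) for a in ages]

# Frequencies with the missing category kept in the table ...
with_missing = Counter(groups)
# ... and with it excluded from the table.
without_missing = Counter(g for g in groups if g != "missing")

print(with_missing)
print(without_missing)
```

Comparing the two counters shows how the base of any percentage changes depending on whether missing values are included.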
Tips & tricks (2) • Computing percentages in aggregate data? • Dummy coding variables in regressions • Defining an interaction on the fly
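For readers new to the regression tips above, here is a rough sketch of what dummy coding and an interaction term amount to. It is plain Python for illustration, not SDA syntax; the records, the variable names, and the choice of "East" as the reference category are assumptions.

```python
# Hypothetical records: (region, sex, years_of_schooling)
records = [
    ("East", "F", 12),
    ("West", "M", 16),
    ("North", "F", 10),
    ("West", "F", 14),
]


def dummy_code(value, categories, reference):
    """Return one 0/1 indicator per category, omitting the reference category."""
    return {f"is_{c}": int(value == c) for c in categories if c != reference}


rows = []
for region, sex, schooling in records:
    row = dummy_code(region, ["East", "West", "North"], reference="East")
    row["female"] = int(sex == "F")
    row["schooling"] = schooling
    # Interaction term: the product of the two predictors.
    row["female_x_schooling"] = row["female"] * row["schooling"]
    rows.append(row)

for row in rows:
    print(row)
```

A three-category variable becomes two indicators, and the interaction column is nonzero only for cases where the `female` indicator is 1.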
Advantages for teaching: • Stable environment, 24x7 access • Very easy to explain to novice users • Reduces/eliminates need for computer labs or statistical software • Teach statistics rather than software • Students get their hands on data quickly • Switch easily between weighted and unweighted distributions
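To show students why switching between weighted and unweighted distributions matters, a small sketch like the following can help. It is plain Python with invented respondents and weights, and it says nothing about how SDA computes its tables internally.

```python
from collections import defaultdict

# Hypothetical respondents: (answer, survey_weight)
respondents = [
    ("yes", 150.0),
    ("no", 50.0),
    ("yes", 100.0),
    ("no", 50.0),
    ("no", 50.0),
]


def percentages(rows, weighted):
    """Percentage distribution of answers, counting cases or summing weights."""
    totals = defaultdict(float)
    for answer, weight in rows:
        totals[answer] += weight if weighted else 1.0
    grand_total = sum(totals.values())
    return {answer: 100.0 * t / grand_total for answer, t in totals.items()}


unweighted = percentages(respondents, weighted=False)
weighted = percentages(respondents, weighted=True)

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

In this made-up sample, "yes" is the minority answer among the cases but the majority once each case is scaled by its weight.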
Advantages for teaching (2): • Measures of association and tests of significance comparable to SAS • Design effects, where cluster/sample variables available • Interactive demonstration of statistical concepts • Share recoded variables • Can quickly mount additional data to fulfill your teaching needs
Advantages for research: • Stable environment, 24x7 access • Access to latest available version of the data • Basic exploratory data analysis: eg are there enough cases for my subset? • Download data and import to SAS/SPSS/Stata on own workstation • Share recoded variables • Integrated variable descriptions (selected data files)
Advantages for data management: • Creates metadata from SAS/SPSS/Stata syntax or DDI format xml files • Very easy and fast to import files with good syntax files • Control over what users can and cannot do • Outputs include SAS/SPSS/Stata syntax or DDI format xml files • Overhead: size of uncompressed data + about 50%
Disadvantages of SDA: • Can’t search for variables/values within/between data files (yet) – at least, not at UT/CHASS • Can’t download created/recoded variables – coming in spring 2009 • No random sampling function. See <http://www.chass.utoronto.ca/datalib/caq/sda.htm> • Graphics minimal, eg no stem-and-leaf, box-plots etc • Can only output to Word/Excel from IE, not from Netscape/Mozilla/Firefox • Doesn’t output SAS/SPSS/Stata system/export files • Little support for Study/File level metadata (DDI) • No support for nCubes (DDI 2)
Common questions from researchers & students: • When to weight versus not to weight • Does it only do cross-tabs? • But I want the raw data, not a cross-tabulation! • Why can’t I get a cross-tab of this [eg continuous income] variable? • Differences between syntax, data, and system files.
An application we wouldn’t have tackled without SDA: • Q: I need the average expenditure on eye care in Canada by age group of household head for as long a time-period as possible. • A: Once we explained SDA, the student generated this statistic from each of the FAMEX/SHS files, 1969-2004, in under 30 minutes. (He knew only Stata.)
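The statistic in this example is a comparison of means: average spending within each age group of household head. As a rough sketch of the calculation, assuming a weighted mean is wanted, the same idea looks like this in plain Python. The households, amounts, weights, and age groups are invented and are not drawn from the FAMEX/SHS files.

```python
from collections import defaultdict

# Hypothetical households: (age_group_of_head, eye_care_spending, weight)
households = [
    ("under 35", 100.0, 2.0),
    ("under 35", 200.0, 2.0),
    ("35-64", 300.0, 1.0),
    ("35-64", 200.0, 3.0),
    ("65+", 400.0, 1.0),
]


def weighted_mean_by_group(rows):
    """Weighted mean of spending within each age group."""
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for group, spending, weight in rows:
        weighted_sums[group] += spending * weight
        weight_totals[group] += weight
    return {g: weighted_sums[g] / weight_totals[g] for g in weighted_sums}


means = weighted_mean_by_group(households)
for group, mean in means.items():
    print(f"{group}: {mean:.2f}")
```

Repeating one such table per survey year would produce the time series the student needed.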
Functions we know to be coming in SDA • Within and between file variable searching • Will allow users to load own data files (Archiver in SDA 3.1) – we have not played with this yet
Questions: • Question 1: Where will I find the SDA server at University of Toronto? • Answer 1: The URL is: http://www.chass.utoronto.ca/datalib/ Select ‘Microdata analysis and extraction’
Questions (cont’d): • Question 2: How are files chosen to be mounted on the SDA server at UT? • Answer 2: All significant Canadian microdata files, eg by Statistics Canada as released by DLI; other files based on faculty/student requests
Questions (cont’d): • Question 3: My research is being done collaboratively with a colleague at another Canadian university. Can my colleague get access to SDA? • Answer 3: SDA is available as a subscription service to other Canadian DLI-member universities and colleges. Current subscribers include: U of Victoria, Ryerson U, and Memorial U