SDA: a tool for teaching and research with microdata

Laine Ruus <laine.ruus@utoronto.ca>, University of Toronto, Data Library Service. 2008-12-03, revised 2009-04-14. http://www.chass.utoronto.ca/misc/mun09/sda_intro.ppt

Presentation Transcript


  1. SDA: a tool for teaching and research with microdata Laine Ruus <laine.ruus@utoronto.ca> University of Toronto. Data Library Service 2008-12-03, revised 2009-04-14 http://www.chass.utoronto.ca/misc/mun09/sda_intro.ppt

  2. What this session covers: • Introduction • Demo of main SDA capabilities • Some tips and tricks • Advantages and disadvantages for teaching and research • Common questions about SDA

  3. SDA@UT is brought to you by: • University of California, Berkeley. Computer-assisted Survey Methods Program (CSM) – writes and supports the server-side software • University of Toronto. Centre for Computing in the Humanities and Social Sciences (CHASS) – provides the hardware, buys the software, and provides system support wetware • University of Toronto. Libraries – provides the budget to purchase the data, and care, feeding and user support wetware • And Memorial University Libraries, which subscribes to the service.

  4. Our experience with SDA • CHASS installed SDA in the fall of 2004 • At last count, we have 900+ data files in SDA • Some have only the metadata that was generated from the original syntax files (SAS/SPSS/Stata), but a number also have full question text • Most are microdata, but a few are aggregate statistics (census files) • A number of voracious data users now expect to find the latest microdata released by Stat Can in SDA

  5. Review of main SDA utilities • Frequencies, weighted & unweighted • Crosstabulations • Comparison of means (ANOVA) • Correlations • Regressions • Logit/probit regressions

  6. Tips & tricks • Have we not gotten around to coding the missing values? • Want to include missing values in your cross-tabulation, or other analysis? • Collapsing continuous variables into uniform categories on the fly • Recoding variables on the fly

  7. Problem: in this variable, we have not yet coded value ‘5’ as missing data. Therefore it would be included in analyses.

  8. Solution: specify, after the variable name, only those values you want to include
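
The effect of listing only the values you want can be sketched in Python. This is an illustration of the filtering logic only, not SDA's own syntax or implementation; the sample responses and the helper name `keep_values` are hypothetical.

```python
from collections import Counter


def keep_values(cases, wanted):
    """Return only the cases whose value is in the set of wanted values."""
    return [value for value in cases if value in wanted]


# Hypothetical responses; 5 stands for a category not yet coded as missing.
responses = [1, 2, 2, 3, 5, 4, 5, 1]

# Keep only values 1 to 4, as if they had been listed after the variable name.
valid = keep_values(responses, wanted={1, 2, 3, 4})

# Frequencies are then computed on the retained cases only.
frequencies = Counter(valid)
print(sorted(frequencies.items()))
```

Cases with value 5 never reach the frequency count, which is the same outcome as having coded 5 as missing data.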

  9. Problem: to include values coded as missing in descriptive statistics or analyses. [Callout: This is a missing value. It will not be included in descriptive statistics or analyses.]

  10. Solution 1: specify, after the variable name, the lowest value thru **.

  11. Solution 2: use ‘include missing data values’ under Table options

  12. Solution 3: list the values explicitly after the variable name

  13. Problem: to generate frequencies or a cross-tabulation of a continuous variable

  14. Solution 1: collapse to uniform categories, defining a starting point. c:30000,-30000 means: • collapse to uniform categories • each category should be 30000 in size • begin with value -30000
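
The arithmetic behind a width of 30000 and a starting point of -30000 can be sketched in Python. This is a rough illustration, not SDA's implementation: the function name `uniform_category` is hypothetical, and it assumes integer values with inclusive category bounds. How SDA itself treats values below the starting point is not covered here.

```python
def uniform_category(value, width, start):
    """Return the (low, high) bounds of the fixed-width category holding value.

    Assumes integer data, so a category of size `width` runs from
    `low` to `low + width - 1` inclusive.
    """
    index = (value - start) // width  # how many whole categories above start
    low = start + index * width
    return (low, low + width - 1)


# Categories of size 30000, beginning at -30000.
for income in (-30000, -1, 0, 45000):
    print(income, uniform_category(income, width=30000, start=-30000))
```

Choosing -30000 as the starting point makes the category boundaries fall on round numbers (0, 30000, 60000, ...) while still covering negative values.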

  15. Solution 2: recode to desired categories. Note use of * to denote both lowest and highest values.
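
A recode into unequal categories with open-ended first and last ranges can be sketched as follows. This is a Python illustration of the logic only: the function `recode`, the rule layout, and the cut-off values are hypothetical, and `None` plays the role that * plays in SDA (lowest or highest value).

```python
def recode(value, rules):
    """Return the new code for value, or None if no rule matches.

    Each rule is (new_code, low, high); a bound of None is open-ended,
    standing in for "lowest value" or "highest value".
    """
    for new_code, low, high in rules:
        above_low = low is None or value >= low
        below_high = high is None or value <= high
        if above_low and below_high:
            return new_code
    return None


# Hypothetical income categories: lowest to 19999, 20000 to 49999, 50000 to highest.
rules = [
    (1, None, 19999),
    (2, 20000, 49999),
    (3, 50000, None),
]

print([recode(v, rules) for v in (-500, 20000, 1000000)])
```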

  16. Tips & tricks (cont’d) • Computing percentages in aggregate data • Dummy coding variables in regressions • Defining an interaction on the fly

  17. Problem: given a file of aggregate statistics, list percentages rather than counts. [NB use the Listcase program] [Callout: These are all counts.]

  18. Solution: define percentages in the Listcase program. [Callout: Defines a percentage with v4 in the numerator and v2 in the denominator.]
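
The calculation itself is simple and can be sketched in Python. The variable names v2 and v4 come from the slide; the row values, the helper `percentage`, and the handling of a zero denominator are assumptions made for illustration and say nothing about how Listcase behaves.

```python
def percentage(numerator, denominator):
    """Return numerator as a percentage of denominator, or None if undefined."""
    if denominator == 0:
        return None  # assumption: no percentage is reported for an empty area
    return 100.0 * numerator / denominator


# Hypothetical aggregate rows: v2 is a total count, v4 a count for a subgroup.
rows = [
    {"v2": 200, "v4": 50},
    {"v2": 80, "v4": 20},
    {"v2": 0, "v4": 0},
]

percentages = [percentage(row["v4"], row["v2"]) for row in rows]
print(percentages)
```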

  19. Problem: to use a categorical variable in a regression analysis, it needs to be ‘dummy’-coded (ie ‘1’ and ‘0’).

  20. Solution: dummy-code categorical variables ‘on-the-fly’. Interactions can also be coded on-the-fly, including interactions with dummy-coded variables. [Callouts: Dummy coded: values 10-14 will be coded to ‘1’, all others will be ‘0’. Interaction involving a dummy-coded variable and a continuous variable.]
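
What dummy coding and an interaction term amount to can be sketched in Python. The range 10-14 comes from the slide; the function names, the sample cases, and the choice of the product as the interaction term are illustrative assumptions, not SDA's own syntax.

```python
def dummy(value, low, high):
    """Code value as 1 if it lies in the range low to high inclusive, else 0."""
    return 1 if low <= value <= high else 0


def interaction(dummy_value, continuous_value):
    """Interaction term: the product of a dummy and a continuous variable."""
    return dummy_value * continuous_value


# Hypothetical cases: (categorical code, continuous measure).
cases = [(12, 40.0), (20, 35.5), (10, 18.0)]

for code, measure in cases:
    d = dummy(code, 10, 14)  # values 10-14 become 1, all others 0
    print(code, d, interaction(d, measure))
```

The interaction term equals the continuous value for cases in the dummy-coded group and zero for everyone else, which lets a regression estimate a separate slope for that group.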

  21. Advantages for teaching: • Stable environment, 24x7 access • Very easy to explain to novice users • Reduces/eliminates need for computer labs with statistical software • Allows you to teach statistics rather than software • Students get their hands on data quickly • Switch easily between weighted and unweighted distributions

  22. Advantages for teaching (cont’d): • Measures of association and tests of significance comparable to SAS • Design effects, in files in which cluster and/or stratum variables are available • Interactive demonstration of statistical concepts • Share recoded variables • Can quickly mount additional data to fulfill your teaching needs

  23. Advantages for research: • Stable environment, 24x7 access • Access to latest available version of the data • Basic exploratory data analysis: eg are there enough cases for my subset? • Design effects, where cluster/sample variables available • Download data and import to SAS/SPSS/Stata on own workstation • Share recoded variables • Integrated variable descriptions (selected data files)

  24. Advantages for data management: • Creates metadata from SAS/SPSS/Stata syntax or DDI format xml files • Very easy and fast to import files with good syntax files • Control over what users can and cannot do • Outputs include SAS/SPSS/Stata syntax or DDI format xml files • Overhead: size of uncompressed data + about 50%

  25. Disadvantages of SDA: • Search for variables/values among data files not yet implemented at UT/CHASS • Can’t download created/recoded variables – coming in spring 2009 • Graphics minimal, eg no stem-and-leaf, box-plots etc • Doesn’t output SAS/SPSS/Stata system/export files, only raw data files plus syntax files • Little support for Study/File level metadata (DDI) • No support for nCubes (DDI 2)

  26. How SDA compares to the competition • See table at: http://www.chass.utoronto.ca/datalib/misc/accoleds/2008/sda_compare.htm

  27. Common questions from researchers & students: • When to weight versus not to weight • Does it only do cross-tabs? • But I want the raw data, not a cross-tabulation! • Differences between syntax, data, and system files.

  28. An application we wouldn’t have tackled without SDA: • Q: I need the average expenditure on eye care in Canada by age group of household head for as long a time-period as possible. • A: Once we explained SDA, the student had generated this statistic from each of the FAMEX/SHS files, 1969-2004, in under 30 minutes. (He knew only Stata.)

  29. Functions we know to be coming in SDA • Among-file variable searching – already available but not yet implemented on CHASS • Downloading recoded variables • Will allow users to load own data files (Archiver in SDA 3.1) -- already available but not yet implemented on CHASS

  30. Exercises: • First time SDA user? Try these exercises using the Census 2001 microdata on individuals • Experienced SDA user? Try these exercises using a variety of DLI data files

  31. Questions: • Question 1: Where will I find the SDA server at University of Toronto? • Answer 1: The URL is: http://www.chass.utoronto.ca/datalib/ Select ‘Microdata analysis and extraction’

  32. Questions (cont’d): • Question 2: How are files chosen to be mounted on the SDA server at UT? • Answer 2: All significant Canadian microdata files, eg those from Statistics Canada as released by DLI, plus other files based on your requests.

  33. Questions (cont’d): • Question 3: My research is being done collaboratively with a colleague at another Canadian university. Can my colleague get access to SDA? • Answer 3: SDA is available as a subscription service to other Canadian DLI-member universities and colleges. Current subscribers include: U of Victoria, Ryerson U, and Memorial U.
