Webinar 4: Academic tools of data analysis: Comparing SPSS, Stata and R and engaging with Higher Education institutions
Scottish Civil Society Data Partnership
Academic tools of data analysis: Comparing SPSS, Stata and R and engaging with Higher Education institutions
Paul Lambert, University of Stirling
Presentation to the Scottish Civil Society Data Partnership Project (S-CSDP), Webinar 4
www.thinkdata.org.uk, 11 Mar 2016
Components:
• Academic research and statistical software
• Examples in using SPSS for research
• Examples in using Stata for research
• Examples in using R for research
• HE institutional access and the University of Stirling ‘Affiliate Membership for Third Sector Researchers’ scheme
S-CSDP, 11 Mar 2016
1) Academic research and statistical software
• Academic researchers have used software designed specifically for the statistical analysis of survey and survey-like data since at least the mid-1960s.
• There are hundreds of options (e.g. Lambert et al. 2015).
• A distinction is drawn between ‘general-purpose’ and ‘specialist’ statistical software.
• A recurring theme is ‘documentation for replication’: software is better when it can provide a replicable trail of data analysis and data management activities.
Understanding filestore and software: linking things together
(i) Somewhere on your computer, you typically have a copy of a data file (and its documentation).
(ii) Your next step is ordinarily to open a software package that can read the data and then do things to it.
(iii) Good practice is to use separately saved ‘command files’ to run processes through the software on the data, generating the subsequent outputs.
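The three steps (i) to (iii) can be sketched as a tiny saved command file. Python is used here only as a neutral, freely available illustration (it is not one of the packages compared in this webinar); the CSV format, the file names and the `run_analysis` function are all hypothetical assumptions.

```python
import csv
from pathlib import Path


def run_analysis(data_path: Path, output_path: Path) -> int:
    """Hypothetical 'command file': read a data file, write a summary output.

    Returns the number of cases so the result can be checked.
    """
    # (i) the data file sits somewhere in local filestore
    with data_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        variables = reader.fieldnames or []  # header row = variable names
        cases = list(reader)                 # one dict per case

    # (ii) the software reads the data and does something to it
    n_cases = len(cases)

    # (iii) the script, not manual copying, generates the output
    output_path.write_text(
        f"Variables: {', '.join(variables)}\nNumber of cases: {n_cases}\n",
        encoding="utf-8",
    )
    return n_cases
```

Calling `run_analysis(Path("survey_data.csv"), Path("summary.txt"))` (hypothetical file names) again on the same data file regenerates the same output, which is the point of step (iii).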
…software wars in academic survey research…
• If working with microdata, we ordinarily use specialist statistical software for data management and analysis.
• Individual researchers tend to become quite attached to their favourite package(s).
• See also Lambert et al. (2015), and the ‘lab materials’ at www.staff.stir.ac.uk/paul.lambert/essex_summer_school/
SPSS used to be the leading social science package for survey research in disciplines other than economics. It is still widely available and commonly taught and used.
Stata’s origins are in economics, but it has spread to other disciplines. It supports a very wide range of data management and analysis functionality, and is popular in North American and in Northern and Central European academic survey research.
R is free software with a wide range of capabilities. It is mostly used by statisticians and methodologists.
MLwiN is an example of specialist software designed for a particular analytical purpose (fitting multilevel models).
‘Stat-JR’ is downloadable software that offers integration between different packages, including free software, through locally installed copies (http://www.bristol.ac.uk/cmm/software/statjr/).
Controlling software: Using ‘syntax’
Documentation as replicable ‘workflows’
• Reproducible (for self)
• Replicable (for all)
• Paper trail for the whole lifecycle (cf. Dale 2006; Freese 2007)
• In survey research, this means using clearly annotated syntax files (e.g. Long 2009). Syntax examples: www.dames.org.uk/workshops
Modern computing / data:
• There’s no excuse for not documenting / replicating!
• New opportunities for ‘workflow modelling’
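A paper trail can be partly automated. The sketch below, in Python as a neutral illustration, records which data file was used (by its SHA-256 fingerprint), which software version ran the analysis, when, and an annotated list of steps. The function names and record fields are hypothetical choices, not a standard format.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Fingerprint an input file so others can check they hold identical data."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):  # read in 64 KB chunks
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_path: Path, steps: list[str]) -> dict:
    """Build a minimal 'paper trail' for one run of an analysis script."""
    return {
        "data_file": data_path.name,
        "data_sha256": file_sha256(data_path),
        "python_version": platform.python_version(),
        "run_at_utc": datetime.now(timezone.utc).isoformat(),
        "steps": list(steps),  # annotated, human-readable processing steps
    }


def write_provenance(record: dict, out_path: Path) -> None:
    """Save the record alongside the analysis outputs."""
    out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
```

Saving such a record next to each set of outputs lets a colleague (or your future self) confirm that a re-run used the same data and steps; it complements, rather than replaces, clearly annotated syntax files.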
The tension between ‘simpler’ and ‘more complex’ statistical analysis
‘Complex’ analytical methods
• E.g. statistical models; sampling weights and survey design factors; sensitivity analysis for data permutations; ‘multivariate’ and ‘multiprocess’ systems
• Can be thought of as featuring a substantial element of ‘control’ for other factors relevant to the social mechanisms, e.g. ‘statistical’ models with many parameters expressing the influences of ‘background variables’ and complex data structures
‘Simpler’ analytical methods
• E.g. univariate distributions, bivariate comparisons, accessible graphical summaries and headline percentages
• Can be appealing to communicate, and still have important strengths, e.g. statistically representative patterns
• Introduce risks in summarising social mechanisms: spurious or unduly simplified trends and associations (e.g. ignoring interactions); incorrect point estimates and/or incorrect representation of uncertainty; and encouraging the view that ‘statistics equal lies’
-> Academic software tends to support ‘complex’ methods, whereas many accessible (e.g. online) data analysis tools use ‘simpler’ methods and, moreover, cannot readily be adapted to more complex analytical methods.
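One concrete way a ‘simple’ headline percentage can give an incorrect point estimate is by ignoring sampling weights. The weighted percentage is 100 × sum(w_i × y_i) / sum(w_i), where y_i is 1 if case i has the characteristic and w_i is its sampling weight. The Python sketch below uses invented toy values, not real survey data.

```python
def percentage(values, weights=None):
    """Percentage of cases with value 1, optionally using sampling weights.

    values:  0/1 indicators (e.g. 1 = volunteered in the last year)
    weights: sampling weights; if None, every case counts equally
    """
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return 100 * sum(w * y for w, y in zip(weights, values)) / total


# Invented toy data: the first four cases come from an over-sampled group
# (weight 0.5, three of four volunteer); the last two come from an
# under-sampled group (weight 2.0, neither volunteers).
volunteered = [1, 1, 1, 0, 0, 0]
weights = [0.5, 0.5, 0.5, 0.5, 2.0, 2.0]

print(f"Unweighted: {percentage(volunteered):.1f}%")         # 50.0%
print(f"Weighted:   {percentage(volunteered, weights):.1f}%")  # 25.0%
```

Weighting fixes only the point estimate. Correct standard errors also need the survey design (strata, clusters), which is the kind of ‘complex’ analysis that academic tools support, e.g. Stata’s `svy` prefix or R’s `survey` package.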
2) Examples in using SPSS for research
• Installation comments
• SPSS interface
• Using command syntax
• Applied example: volunteering in the BHPS (British Household Panel Survey)
• Sources of help, e.g. Field 2013; UCLA statistical software: http://www.ats.ucla.edu/stat
Screenshots: using ‘Paste’ as an alternative way to get syntax code; the ‘Syntax’ editor
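The applied example itself is not reproduced in this text. As a hedged illustration of a typical first step, the Python sketch below builds a two-way table of counts with row percentages, roughly what SPSS reports when a CROSSTABS command is run with row percentages requested (the command ‘Paste’ would write into the Syntax editor). The variable names `sex` and `volunteer` and the four cases are hypothetical, not BHPS variables or data.

```python
from collections import Counter


def crosstab_row_percent(rows, row_var, col_var):
    """Two-way table of counts and row percentages.

    rows: list of dicts, one per case (e.g. read with csv.DictReader)
    Returns {row_value: {col_value: (count, row_percent)}}.
    """
    counts = Counter((case[row_var], case[col_var]) for case in rows)
    row_totals = Counter(case[row_var] for case in rows)
    col_values = sorted({case[col_var] for case in rows})

    table = {}
    for r in sorted(row_totals):
        table[r] = {}
        for c in col_values:
            n = counts[(r, c)]  # Counter returns 0 for empty cells
            table[r][c] = (n, 100 * n / row_totals[r])
    return table


# Hypothetical cases for illustration only
cases = [
    {"sex": "female", "volunteer": "yes"},
    {"sex": "female", "volunteer": "no"},
    {"sex": "male", "volunteer": "no"},
    {"sex": "male", "volunteer": "no"},
]
for sex, cells in crosstab_row_percent(cases, "sex", "volunteer").items():
    print(sex, {c: f"{n} ({pct:.0f}%)" for c, (n, pct) in cells.items()})
```

As with SPSS syntax, keeping this in a saved file rather than clicking through menus means the table can be regenerated exactly.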
3) Examples in using Stata for research
• Installation comments
• Stata interface
• Using command syntax
• Applied example: volunteering in the ESS (European Social Survey)
• Sources of help, e.g. Kohler & Kreuter 2012; UCLA statistical software: http://www.ats.ucla.edu/stat
Screenshots: typical Stata output window (results); typical format of a ‘do’ file (‘command’ or ‘syntax’ file)
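A Stata do-file typically opens a log (`log using ..., replace`), runs annotated commands, and ends with `log close`, so that what appears in the results window is also saved to a file. The sketch below shows a rough Python analogue of that pattern, offered only as an illustration. The `log_using` name deliberately echoes the Stata command but is a hypothetical helper, not a Stata interface.

```python
import contextlib
import sys
from pathlib import Path


class Tee:
    """Write text to several streams at once (e.g. screen and log file)."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for s in self.streams:
            s.write(text)
        return len(text)

    def flush(self):
        for s in self.streams:
            s.flush()


@contextlib.contextmanager
def log_using(path: Path):
    """Rough analogue of Stata's 'log using ..., replace' ... 'log close'.

    Everything printed inside the 'with' block appears on screen and is
    also written to the log file, replacing any earlier log of that name.
    """
    with path.open("w", encoding="utf-8") as log_file:
        tee = Tee(sys.stdout, log_file)  # capture the current screen stream
        with contextlib.redirect_stdout(tee):
            yield
    # leaving the block restores stdout and closes the log ('log close')
```

Usage follows the do-file layout: header comments, then `with log_using(Path("volunteering.log")):` (a hypothetical file name) around the analysis commands.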
4) Examples in using R for research
• Installation comments
• R interface
• Using command syntax
• Example: sample from Lambert (2015)
• Sources of help, e.g. Field et al. 2012; Quick-R: http://www.statmethods.net/; UCLA statistical software: http://www.ats.ucla.edu/stat
Screenshots: standard R; RStudio
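The R example drawn from Lambert (2015) is not reproduced in this text. As a hedged illustration of a common data-management task of the kind that chapter discusses, the Python sketch below links two files one-to-one on an identifier and flags which cases matched. It is similar in spirit to R’s `merge(x, y, by = "id", all = TRUE)` or Stata’s `merge 1:1` with its `_merge` indicator, but the function, the flag values and the toy records are all hypothetical.

```python
def merge_one_to_one(left, right, key):
    """Full outer 1:1 merge of two lists of dicts on `key`.

    Each merged record gets a '_merge' flag ('both', 'left_only' or
    'right_only') so unmatched cases are visible rather than silently
    dropped. Where both files hold the same variable, the left (master)
    value is kept.
    """
    def index(records, side):
        out = {}
        for rec in records:
            k = rec[key]
            if k in out:
                raise ValueError(f"duplicate {key}={k!r} in {side} file; not 1:1")
            out[k] = rec
        return out

    left_idx = index(left, "left")
    right_idx = index(right, "right")

    merged = []
    for k in sorted(set(left_idx) | set(right_idx)):
        rec = {**right_idx.get(k, {}), **left_idx.get(k, {})}  # left wins
        if k in left_idx and k in right_idx:
            rec["_merge"] = "both"
        elif k in left_idx:
            rec["_merge"] = "left_only"
        else:
            rec["_merge"] = "right_only"
        merged.append(rec)
    return merged


# Hypothetical toy files: person records and volunteering records
people = [{"pid": 1, "age": 34}, {"pid": 2, "age": 51}]
volunteering = [{"pid": 2, "volunteer": "yes"}, {"pid": 3, "volunteer": "no"}]
result = merge_one_to_one(people, volunteering, "pid")
```

Checking the match flags after every merge is a cheap safeguard: an unexpectedly large number of unmatched cases usually signals a problem with the identifiers.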
5) HE institutional access and the University of Stirling ‘Affiliate Membership for Third Sector Researchers’ scheme
What collaborative opportunities are out there?
‘RCUK’ funding opportunities
• ESRC SDAI (Secondary Data Analysis Initiative), which explicitly promotes impact and collaboration (ESRC 2015)
• Secondary analysis in general appeals to major funders
• Comparative research opportunities
Other HE sector collaboration potential
• Further funded project options
• Unfunded research capacity
• PhD studentship sponsorship / collaborative schemes
• Training enrolments and taught-course projects, e.g. MSc dissertation projects
Routes to HE institutional access…?
• Feedback at previous events highlights barriers to using secondary surveys for research without HE infrastructural support:
  • Filestore
  • Software
  • Library resources
  • Consulting colleagues
• Collaboration with HE staff is often a good solution:
  • A friendly researcher / faculty member
  • A funded post, e.g. a sponsored PhD
• Please see www.thinkdata.org.uk for updates on a prospective new scheme that should help here: the University of Stirling Affiliate Membership scheme for Third Sector Researchers (AM-TSR).
References cited
• Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
• Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics, 4th Edition. London: Sage.
• Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. London: Sage.
• Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 153-171.
• Kohler, U., & Kreuter, F. (2012). Data Analysis Using Stata, Third Edition. College Station, TX: Stata Press.
• Lambert, P. S. (2015). Advances in data management for social survey research. In R. Procter & P. Halfpenny (Eds.), Innovations in Digital Research Methods (pp. 105-122). London: Sage.
• Lambert, P. S., Browne, W. J., & Michaelides, D. T. (2015). Contemporary developments in statistical software for social scientists. In R. Procter & P. Halfpenny (Eds.), Innovations in Digital Research Methods (pp. 143-160). London: Sage.
• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.