Analyzing Surveys

Analyzing Surveys. Marcos Carzolio, Associate Collaborator for LISA and PhD Student, Department of Statistics, VT. Laboratory for Interdisciplinary Statistical Analysis. Outline: Data Cleaning and Preprocessing; Outlier Detection; Missing Value Imputation; Visualizing and Understanding Data.

Presentation Transcript


  1. Analyzing Surveys
     Marcos Carzolio
     Associate Collaborator for LISA
     PhD Student, Department of Statistics, VT
     Laboratory for Interdisciplinary Statistical Analysis

  2. Outline
     • Data Cleaning and Preprocessing
       • Outlier Detection
       • Missing Value Imputation
     • Visualizing and Understanding Data
       • Boxplots, Histograms, and Scatterplots
       • Correlation Matrices
     • Analyzing Data
       • Contingency Tables
       • Analysis of Variance (ANOVA)
       • Regression

  3. Laboratory for Interdisciplinary Statistical Analysis
     LISA helps VT researchers benefit from the use of statistics:
     Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...)
     Our goal is to improve the quality of research and the use of statistics at Virginia Tech.
     www.lisa.stat.vt.edu

  4. How can LISA help?
     • Formulate the research question.
     • Screen data for integrity and unusual observations.
     • Implement graphical techniques to showcase the data: what is the story?
     • Develop and implement an analysis plan to address the research question.
     • Help interpret results.
     • Communicate! Help with writing the report or giving the talk.
     • Identify future research directions.

  5. Laboratory for Interdisciplinary Statistical Analysis
     LISA helps VT researchers benefit from the use of statistics:
     Designing Experiments • Analyzing Data • Interpreting Results • Grant Proposals • Using Software (R, SAS, JMP, Minitab...)
     • Collaboration: From our website, request a meeting for personalized statistical advice.
       Great advice right now: meet with LISA before collecting your data.
     • Walk-In Consulting: Monday-Friday 1-3 pm in 401 Hutcheson; also Tuesdays 1-3 pm in ICTAS Café X and Thursdays 1-3 pm in GLC Video Conf. Room. For questions requiring <30 mins.
     • Short Courses: Designed to help graduate students apply statistics in their research.
     All services are FREE for VT researchers.
     www.lisa.stat.vt.edu

  6. Some Useful Resources
     • R Statistical Computing Software
       • Can be downloaded for free from: http://www.r-project.org/
       • RStudio, a free Integrated Development Environment: http://rstudio.org/
     • For a more interactive and user-friendly experience, try JMP
       • Downloadable from the Virginia Tech software library: http://www2.ita.vt.edu/software/department/products/sas/jmp/index.html
     • Amelia II: A Program for Missing Data
       • Visit: http://gking.harvard.edu/amelia/

  7. Types of Survey Data

  8. Outlier Detection and Handling
     • Outliers are data points that deviate so far from the main body of the data that they arouse suspicion about their origins
     • Visualize your data
       • Boxplots, histograms, and scatterplots
     • Only remove outliers that are verifiable errors
       • Extremeness in observations is not in itself cause for data removal
     • R Package ‘outliers’
     (Figure: plot with one point labeled "Outlier")
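As a rough illustration of screening before any removal, the sketch below flags values outside the usual boxplot fences (1.5 times the interquartile range beyond the quartiles). It is written in Python rather than the R used in the slides, the function name `flag_outliers` and the height values are hypothetical, and the fence rule is one common convention, not necessarily the one behind the slide's figure. In keeping with the slide, it only flags candidates for checking; it does not delete anything.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical heights in inches; 720.0 looks like a keying error for 72.0.
heights = [62.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 720.0]
suspects = flag_outliers(heights)
print(suspects)  # candidates to verify against the original questionnaires
```

A flagged value is a prompt to go back to the source record; only a confirmed error justifies correcting or removing it.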

  9. Missing Value Imputation
     • Imputation is the process of filling in the missing values of a dataset
     • Before considering imputation, try going after respondents for their true answers
     • Imputation can be very tricky (come to LISA for help)
     • If only one or two missing values are present in a vast dataset, use the mean of the available values as a “best guess”
     Source: Honaker, James et al., AMELIA II: A Program for Missing Data
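The "best guess" case in the last bullet can be sketched as follows. This is a minimal Python illustration, not the Amelia II procedure; `mean_impute` and the exercise values are hypothetical, and missing answers are assumed to be coded as `None`.

```python
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical weekly exercise hours; None marks a skipped question.
exercise = [4.0, None, 5.0, 3.0]
completed = mean_impute(exercise)
print(completed)
```

Filling in the mean leaves the column average unchanged but shrinks its variance, which is one reason the slide limits this shortcut to one or two gaps in a large dataset.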

  10. Visualizing Your Data: Boxplots
      Source: SAS/GRAPH(R) 9.2: Statistical Graphics Procedures Guide, Second Edition

  11. Visualizing Your Data: Histograms

  12. Visualizing Your Data: Scatter Plots

  13. Understanding Your Data: Correlation Matrices
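The slide shows a correlation matrix only as an image, so here is a small sketch of how one might be computed. It assumes Pearson correlation on complete cases; the column names echo variables that appear in the later R output (`resp_height`, `dad_height`, `exercise`), but the values and the helper names `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical survey columns with no missing values.
survey = {
    "resp_height": [64.0, 70.0, 68.0, 62.0, 72.0],
    "dad_height": [68.0, 72.0, 70.0, 66.0, 74.0],
    "exercise": [3.0, 5.0, 2.0, 6.0, 4.0],
}
matrix = correlation_matrix(survey)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```

The matrix is symmetric with ones on the diagonal, so only the entries above (or below) the diagonal carry information.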

  14. Contingency Tables
      • Tabulates the number of responses in each category
      • Helps to visualize the distribution of data
      • Use the approximate χ² test for independence

          Pearson's Chi-squared test

      data:  tab
      X-squared = 0.7658, df = 2, p-value = 0.6819

      Warning message:
      In chisq.test(tab) : Chi-squared approximation may be incorrect
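To make the test statistic concrete, the following Python sketch computes the expected counts, Pearson's χ² statistic, and the degrees of freedom for a table of counts. The table itself is hypothetical, since the slide does not show `tab`, and the helper names are illustrative. The sketch also reports whether any expected count falls below 5, which is the condition under which R's `chisq.test` issues the "approximation may be incorrect" warning. For df = 2 the upper-tail probability has the closed form exp(-X²/2), which reproduces the slide's p-value of 0.6819 from X-squared = 0.7658.

```python
import math


def chi_square_independence(table):
    """Return (statistic, df, any_expected_below_5) for a table of counts.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    small = any(e < 5 for row in expected for e in row)
    return statistic, df, small


def p_value_df2(statistic):
    """Upper-tail chi-square probability; this closed form holds only for df = 2."""
    return math.exp(-statistic / 2)


# Hypothetical 2 x 3 table of counts (not the slide's data).
tab = [[3, 8, 9], [4, 6, 11]]
statistic, df, small_expected = chi_square_independence(tab)
print(round(statistic, 4), df, round(p_value_df2(statistic), 4), small_expected)

# Consistency check on the slide's reported output (df = 2).
slide_p = p_value_df2(0.7658)
print(round(slide_p, 4))
```

When expected counts are small, an exact or simulation-based test is a common alternative to the χ² approximation.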

  15. Analysis of Variance
      • Technique used to test the differences between groups
      • Always plot your data before doing analyses

      Call:
         aov(formula = resp_height ~ gender)

      Terms:
                       gender Residuals
      Sum of Squares  297.744   588.567
      Deg. of Freedom       1        39
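The `aov` output above stops at the sums of squares, so the sketch below shows how an F statistic follows from them: each sum of squares is divided by its degrees of freedom, and the ratio of the two mean squares is F. The Python helpers `one_way_anova` and `f_statistic` and the two small groups are hypothetical; the final lines plug in the slide's own numbers, which give an F of roughly 19.7 on 1 and 39 degrees of freedom.

```python
def one_way_anova(groups):
    """Return (ss_between, df_between, ss_within, df_within) for a list of groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return ss_between, len(groups) - 1, ss_within, len(values) - len(groups)


def f_statistic(ss_between, df_between, ss_within, df_within):
    """Ratio of the between-group mean square to the within-group mean square."""
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical heights for two groups (not the slide's data).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
toy_f = f_statistic(*one_way_anova(groups))
print(toy_f)

# F statistic implied by the sums of squares printed on the slide.
slide_f = f_statistic(297.744, 1, 588.567, 39)
print(round(slide_f, 2))
```

In R, calling `summary()` on the fitted `aov` object prints this F value together with its p-value.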

  16. Regression
      • Actually a generalization of ANOVA
      • Again, always plot your data

      Call:
      lm(formula = exercise ~ dad_height)

      Residuals:
          Min      1Q  Median      3Q     Max
      -5.9866 -3.4205 -0.3236  2.6709 14.0949

      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)  -7.8573    10.7968  -0.728    0.471
      dad_height    0.1938     0.1546   1.253    0.218

      Residual standard error: 4.381 on 37 degrees of freedom
        (8 observations deleted due to missingness)
      Multiple R-squared:  0.04073, Adjusted R-squared:  0.0148
      F-statistic: 1.571 on 1 and 37 DF,  p-value: 0.2179
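For readers who want to see what sits behind the `lm` summary, here is a minimal Python sketch of a one-predictor least-squares fit. It drops incomplete pairs first, mirroring the "observations deleted due to missingness" note in the output. The function `fit_line` and its toy data are hypothetical. The last lines check how the slide's reported numbers relate to each other: the t value is the estimate divided by its standard error, the F statistic for a single predictor is the square of that t value, and the adjusted R-squared follows from R-squared with n = 39 (37 residual degrees of freedom plus 2 coefficients).

```python
def fit_line(x, y):
    """Least-squares intercept and slope, using only pairs with no missing value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, n


# Hypothetical data; None marks a missing answer and that pair is dropped.
intercept, slope, n_used = fit_line([1.0, 2.0, None, 4.0], [3.0, 5.0, 7.0, 9.0])
print(intercept, slope, n_used)

# Relationships among the numbers printed on the slide.
t_value = 0.1938 / 0.1546                       # estimate / standard error
f_value = t_value ** 2                          # single predictor: F = t^2
n = 37 + 2                                      # residual df + 2 coefficients
adj_r2 = 1 - (1 - 0.04073) * (n - 1) / (n - 2)  # adjusted R-squared
print(round(t_value, 2), round(f_value, 2), round(adj_r2, 4))
```

Small differences in the third decimal place are expected, because the slide's t value was computed from unrounded estimates.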

  17. Other Useful Resources
      • A PowerPoint on more automated outlier detection techniques:
        http://www.dbs.ifi.lmu.de/~zimek/publications/KDD2010/kdd10-outlier-tutorial.pdf
      • R Package ‘outliers’:
        http://cran.r-project.org/web/packages/outliers/outliers.pdf
      • On multiple imputation:
        http://sites.stat.psu.edu/~jls/mifaq.html#bayes
