The Modern Approach to Data Analysis
The Situation
• New school (post-1970)
  • NHST as usually implemented is used incorrectly (e.g., 402 citations of "Questioning the Indiscriminate Use of Null Hypothesis Significance Tests in Observational Studies")
  • NHST used carefully or not at all; different techniques used altogether when appropriate
  • Tests are not as robust as once thought; always test assumptions and use robust approaches when necessary (bootstrap, resistant methods): more accuracy and power with problematic data, about the same with ideal data
  • Desktop computers can do extremely complicated calculations in the blink of an eye
  • Estimate missing values
  • Reliability is a matter of concern
  • Focus on a whole host of analytical information besides a p-value to draw conclusions: more emphasis on effect sizes and interval estimates
  • Use graphics in a meaningful way to communicate more information
  • Model comparison
  • Thoughtful exploration of data, not just confirmation
• Old school (pre-1970): an approach not currently recommended by any methodologist
  • Confused NHST approach
  • No effect sizes, confidence intervals, etc.
  • Ignore missing values or replace them with the mean
  • Pay little attention to reliability
  • Transform in times of trouble, or simply ignore violations of assumptions, even major ones
  • Make every data situation conform to ANOVA for easier computation
  • The almighty p-value
Parametric vs. Non-parametric
• Dealing with problems:
  • Parametric tests can be used when assumptions are met or violations are "minimal," as they will usually have more power
    • Though non-parametric tests may match that power under certain conditions, and so are viable as a starting point
    • Now try to define "minimal"
  • Non-parametric tests are typically used with small samples or major violations of assumptions, but they can now be applied in any number of situations with minimal if any loss of power under ideal data circumstances, and with a power advantage in less-than-perfect ones (see the sketch below)
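A minimal R sketch of the comparison, using simulated data (all names and values here are illustrative, not from the source): the same two-group question answered with a parametric and a non-parametric test.

```r
# Hypothetical two-group comparison: one well-behaved group, one skewed
# group that violates the normality assumption.
set.seed(42)
g1 <- rnorm(30, mean = 0)   # approximately normal group
g2 <- rexp(30, rate = 1)    # positively skewed group
t.test(g1, g2)              # parametric: Welch two-sample t-test
wilcox.test(g1, g2)         # non-parametric: Wilcoxon rank-sum test
```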
Common Rank tests • For the independent-samples situation: Mann-Whitney U (also called Mann-Whitney-Wilcoxon, Wilcoxon rank-sum, or Wilcoxon-Mann-Whitney); Wilcoxon tests exist for both independent and dependent samples • Kruskal-Wallis and Friedman for more than two groups • Basic procedure (sketched in code below) • Rank the DV and get the sums of the ranks for the groups • Construct a test statistic based on the ranked data • Newer approaches: ANOVA-type statistic • Advantages: • Normality not necessary (distribution-free tests) • Insensitive to outliers • Better for small data sets • Disadvantages: • Ranked data are not in the original units and therefore may be less interpretable • May lack power, particularly when parametric assumptions hold
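A sketch of the basic rank-sum procedure done by hand in R (simulated data), checked against the built-in wilcox.test():

```r
set.seed(1)
y     <- c(rnorm(10, 0), rnorm(10, 1))   # DV for two groups of n = 10
group <- rep(c("A", "B"), each = 10)

r <- rank(y)                   # step 1: rank the DV across all cases
tapply(r, group, sum)          # step 2: sum of ranks per group

# Mann-Whitney U from group A's rank sum: U = R_A - n_A(n_A + 1)/2
U <- sum(r[group == "A"]) - 10 * (10 + 1) / 2
U

wilcox.test(y ~ group)         # built-in test reports the same value as W
```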
Transformation of data • So if I don't like my data I just change it? • Think about what you're studying • Is depression a function of Likert-scale questions? • Is reaction time inherently related to learning? • Tukey: "re-expressions" • Our original numbers are already useful fictions, and if we think of them as such, transforming them into something else may not seem so far-fetched • Common transformations • Logarithmic: positively skewed data • Square root: count data • e.g. y′ = √y • Reciprocal (1/x): when there are very extreme outliers • Arcsine: proportional data • e.g. p′ = arcsin(√p)
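A small sketch of these transformations in R, on simulated data (the simple moment-based skew() helper is illustrative, not a standard library function):

```r
set.seed(7)
y <- rexp(200, rate = 0.5)   # positively skewed outcome
p <- runif(50)               # proportions

log_y  <- log(y)             # logarithmic: positive skew
sqrt_y <- sqrt(y)            # square root: counts
inv_y  <- 1 / y              # reciprocal: extreme outliers
asn_p  <- asin(sqrt(p))      # arcsine: proportions

# Rough skewness check before and after transforming
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
c(raw = skew(y), log = skew(log_y), sqrt = skew(sqrt_y))
```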
When to transform? • Not something to do straight away at any little sign of trouble • Even if your groups are skewed in a similar manner, parametric tests may hold • "Shop around" • Try different transformations to see if one works better for your problem regarding the distribution of values • Not just to get a significant p-value • However, note that transformations will not necessarily solve problems with outliers • Also, if inferences are based on, e.g., the mean of the transformed data, we cannot simply transform the values back to the original scale and act as though the inferences still hold (see the sketch below) • In the end, we'd rather keep our data in the original units, so transformations should be a last resort
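A quick R sketch of the back-transformation pitfall mentioned above, using simulated lognormal data: the back-transformed mean of logged data is the geometric mean, not the arithmetic mean on the original scale.

```r
set.seed(8)
y <- rlnorm(1000, meanlog = 1, sdlog = 1)
mean(y)             # arithmetic mean on the original scale
exp(mean(log(y)))   # back-transformed mean = geometric mean (smaller)
```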
Bootstrapping • Resample from the data to create an empirical sampling distribution, rather than assume a theoretical one that may not be appropriate • About a 35-year-old technique, and at this point the field is going beyond the basic approach
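A minimal bootstrap sketch in R (simulated data standing in for an observed sample): an empirical sampling distribution for the median, with a percentile confidence interval.

```r
set.seed(9)
x <- rexp(40)   # stand-in for some observed data

# Resample with replacement many times, computing the median each time
boot_medians <- replicate(5000, median(sample(x, replace = TRUE)))

hist(boot_medians, main = "Bootstrap distribution of the median")
quantile(boot_medians, c(.025, .975))   # 95% percentile interval
```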
Resistant techniques • A conceptual understanding of the median can get one pretty far in understanding the underlying approaches • While some are mathematically complex, applied researchers are not required to do them by hand, and they can typically be pulled off with little or no extra effort • Applied researchers do not have to know all the ins and outs, just why these techniques are useful and the underlying idea of resistance to outliers
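A sketch of that underlying idea in R, with hypothetical data: a single wild value drags the mean far more than the median or a trimmed mean.

```r
x <- c(2, 3, 3, 4, 4, 5, 5, 6, 6, 100)   # illustrative data with one outlier
mean(x)               # pulled toward the outlier
median(x)             # resistant
mean(x, trim = 0.2)   # 20% trimmed mean, also resistant
```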
Graphical display • Take-home message: make graphs that are (a) interesting and (b) informative • Simple bar and line graphs display very little information • Case-level and distributional information is important to include, and adding a bit to a graph can go a long way in conveying more points of interest • Example: a simple two-group comparison (sketched below)
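One way such a two-group display might look in base R, on simulated data: raw case-level points jittered over boxplots, rather than a bare bar graph of two means.

```r
set.seed(11)
df <- data.frame(score = c(rnorm(30, 10, 2), rnorm(30, 12, 3)),
                 group = rep(c("Control", "Treatment"), each = 30))

# Boxplots give the distributional summary...
boxplot(score ~ group, data = df, outline = FALSE,
        main = "Two-group comparison with case-level data")
# ...and jittered points overlay the individual cases
stripchart(score ~ group, data = df, vertical = TRUE, add = TRUE,
           method = "jitter", pch = 16, col = rgb(0, 0, 0, 0.4))
```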
Reliability • The reliability of the instruments used can make or break a study • Lowering acceptable standards results in nothing but missed effects and inaccurate conclusions about the constructs under study
Effect Sizes • All effect sizes are an expression of the effect (naturally) of one or more variables on one or more other variables • Some are specific to design (d for group comparisons, odds ratios for categorical outcomes) • All studies give enough information to calculate one • Good ones provide that information for you
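A sketch of the group-comparison case in R: computing Cohen's d as the mean difference in pooled standard-deviation units (the cohens_d() helper name and the simulated data are illustrative).

```r
cohens_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  # pooled standard deviation across the two groups
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sp
}

set.seed(13)
cohens_d(rnorm(30, 10, 2), rnorm(30, 9, 2))   # roughly d = 0.5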
Interval estimates • The uncertainty in our estimates is possibly the most important information to convey, and interval estimates do exactly that • Intervals for effect sizes are often strikingly wide; they explicitly illustrate the nature of sampling variability and temper bold claims based on little evidence (small data sets)
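A sketch of an approximate 95% interval for Cohen's d, using the standard large-sample variance for d (the d_ci() helper name is illustrative); note how wide the interval is even at n = 30 per group.

```r
d_ci <- function(d, nx, ny, level = 0.95) {
  # large-sample standard error of d (Hedges & Olkin approximation)
  se <- sqrt((nx + ny) / (nx * ny) + d^2 / (2 * (nx + ny)))
  z  <- qnorm(1 - (1 - level) / 2)
  c(lower = d - z * se, estimate = d, upper = d + z * se)
}

d_ci(d = 0.5, nx = 30, ny = 30)   # interval spans roughly -0.02 to 1.02
```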
Alternative approaches • De-emphasize NHST • Non-parametric • Non-linear • Bayesian
Current state of psychology research • Consider the following: • Modern methods and approaches have been portrayed in easy-to-understand language for the applied researcher in a variety of well-known outlets (American Psychologist, Psychological Bulletin, etc.) • They have already begun to appear in popular textbooks (e.g. Howell) • Modern methods are easily implemented in popular software (SAS, S-Plus/R, Stata) • APA-backed books, journals, and reports are in full agreement with the methodologists espousing a change in our analytical approaches • There are disciplines within psychology willing to assist the applied researcher (Quantitative Psychology, Psychometrics) • Yet psychology in large part (though definitely not entirely) continues to ignore these methods, publish inaccurate results, and draw unsound conclusions from incorrect or poor analyses • At some point this becomes an ethical issue: is the discipline as a whole OK with basing theory and practice on bad analysis? • Regardless, why might it be that psychology is not keeping up with the 'analytical' times?
Why? • It cannot be difficulty of concepts • E.g. a robust regression does not test a different theoretical model than OLS • It cannot be difficulty of implementation • E.g. S-Plus and Stata have a drop-down menu for robust regression; in R it is one line of code (sketched below) • It cannot be due to lack of access • E.g. SAS is common on campuses, and R is free 'as in beer' • It cannot be because these are brand-new approaches that haven't been tested • Though new ones are developed all the time
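For example, that "one line of code" might look like the following in R, via rlm() from the MASS package (which ships with R), shown here on R's built-in stackloss data as an illustration:

```r
library(MASS)

# Robust regression: the one substantive line
fit <- rlm(stack.loss ~ Air.Flow + Water.Temp, data = stackloss)
summary(fit)
```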
Why? • It could be because applied researchers are not reading about these advancements • Unlikely, given that the journals they are published in are popular ones • It could be that they are ignoring them and/or think it doesn't matter • This would be odd given the mountains of simulation and empirical evidence provided over the decades that speak to the contrary • It could be because they use poor software that doesn't implement such developments • Not even SPSS can be used as an excuse, due to the R plugin available from version 16 on • It could be because applied researchers and scientist-practitioners are not typically taught modern data analysis at any level • Very likely for most, and you can't practice what you don't know • "To put it simply, all of the hypothesis testing methods taught in a typical introductory statistics course, and routinely used by applied researchers, are obsolete; there are no exceptions. Hundreds of journal articles and several books point this out, and no published paper has given a counter argument as to why we should continue to be satisfied with standard statistical techniques." • Wilcox, 2002, APS Observer • It could be that researchers simply won't go the extra mile if journals don't require it • That would be a poor excuse
Modern Data Analysis • It is simply a sad fact that in soft psychology theories rise and decline, come and go, more as a function of baffled boredom than anything else. • Meehl 1978 • An investigation of the distributional characteristics of 440 large-sample achievement and psychometric measures found all to be significantly nonnormal at the alpha .01 significance level. • Micceri (Psy Bull 1989) • It is as if the only concern about magnitude in much psychological research is with regard to the statistical test result and its accompanying p-value, not with regard to the phenomenon under study. • Cohen (Psy Bull 1992) • Hundreds of articles in statistical journals have pointed out that standard analysis of variance, Pearson product-moment correlations, and least squares regression can be highly misleading and can have relatively low power even under very small departures from normality • Wilcox (Amer Psych 1998) • Future historians of psychology will be puzzled by an odd ritual, camouflaged as the sine qua non of scientific method, that first appeared in the 1950s and was practiced in the field for the rest of the twentieth century. • Gigerenzer (Beh and Brain Sci 1998) • You should take efforts to assure that the underlying assumptions required for the analysis are reasonable given the data... Always present effect sizes for primary outcomes… Interval estimates should be given for any effect sizes involving principal outcomes. • APA task force (Amer Psych 1999) • To put it simply, all of the hypothesis testing methods taught in a typical introductory statistics course, and routinely used by applied researchers, are obsolete; there are no exceptions. • Wilcox (APS Observer 2002) • An improved quantitative science would emphasize the use of confidence intervals (CIs), and especially CIs for effect sizes. • Thompson (Ed Res 2002) • There is no longer any reason to report a squared multiple correlation, an ANOVA F statistic, or a focused contrast t test without providing information about confidence intervals on standardized effects. • Steiger (Psy Methods 2004) • Most researchers analyze data using outdated methods. We recommend that researchers bypass classic parametric statistics in favor of modern robust methods. • Erceg-Hurn & Mirosevich (Amer Psych 2008)
Doing Better Research • Obviously, there is no point to doing any research if one is not going to do it well. Poor research: • Impedes progress in the discipline • Makes us look bad as a discipline to those outside of it • Relegates our science to be done by other fields: neuroscience, linguistics, sociology, education etc. • Wastes all the time, money (typically taxpayer dollars), and manpower put into the information collection and evaluation process • Can have potentially serious consequences in education, the workforce, the clinic, personal relationships etc. when action is taken on the results of poor science • It could also be construed as the very sham inquiry we were talking about at the beginning of the semester. • Getting output in a stats program in and of itself does not and has not ever qualified as doing research or science, and certainly not good research • Good research requires a thoughtful approach, appropriate responses to each unique research problem, and doing the best you can with your resources.
Bright Side • It's pretty easy to do modern analysis, and it only gets easier with practice • But it will have to be learned, and it will take a little more effort • For example, while it is easy to test assumptions, it will take more mouse clicks to do something rather than nothing • Creating a good graph will take more effort than just going with the default • Rely on those in the know; don't feel you have to become an expert • You all are proof that the modern approach can be learned from the get-go. You did great this semester! • "Getting your hands dirty" and thinking for yourself is a lot more fun and interesting than letting a computer program tell you what effects are important • If one doesn't think so, one shouldn't be doing scientific research • There is a lot of evidence that this change is occurring (albeit very slowly) across a number of psychological sub-disciplines and psych-related fields • This will only bode well for the discipline as a whole, both its science and the applications of that science
Modern Data Analysis • Modern approaches do not solve every problem • They do not necessarily result in 'more significant' findings • They are still liable to misuse and abuse • 'Fancy' stats are not even necessary if assumptions are met, though they remain useful for comparison • CIs and effect sizes are still required regardless • However, the little extra effort it takes to use them should make them common practice, and if that occurs, we can feel much more confident about the understanding we have of ourselves and how we interact with the world around us