Statistics Micro Mini Threats to Your Experiment!

Statistics Micro Mini Threats to Your Experiment! January 7-11, 2008 Beth Ayers

Threats to Your Experiment • Here I am using threats to mean things that will reduce the impact, credibility, or generalizability of the results of your experiment or study, particularly the things that we have control over

Threats to Your Experiment • Validity • Statistical conclusion validity • Internal validity • Construct validity • External validity • etc. • Type I Error • Power

A note of caution • Many of these ideas are inter-related • I expect some of this to be confusing at first; it will help to reread the notes once you’ve seen all the terminology

What is Validity? • In general, validity refers to the degree to which a study supports the intended conclusion drawn from the results • Validity is broken down into many different types • A Google search will yield many pages explaining different types • At a quick glance, the Wikipedia articles seem to cover many of the types and appear to be correct

(Statistical) Conclusion Validity • Conclusion validity is the degree to which conclusions we reach about our data are reasonable • In particular, is there a relationship between the two variables?

Threats to Conclusion Validity • Concluding there is a relationship when in fact there is not one – Type I Error • Concluding there is not a relationship when in fact there is one – Type II Error • Low statistical power • Violation of assumptions • Fishing for results

Controlling Conclusion Validity Threats • When testing multiple hypotheses, adjust the error rate (see Correcting for Multiple Testing) • Take reliable measurements • Increase power (see later definitions) • Use the right statistical test and perform it correctly!!!

Internal Validity • Internal validity is the degree to which we can conclude that the changes in the explanatory variable caused the changes in the response variable • Remember • Association does not imply causation!

Threats to Internal Validity • Confounding – in the form of a lurking variable • Selection Bias – before treatment there are differences between the groups • History – some historical event affects the study outcome • Maturation – if making observations at different time points, the natural aging of the subjects may invalidate results • Instrument change – not using the same measurement tools or methods with each subject • Repeated testing – subjects may remember answers between testing sessions • Experimenter bias – the experimenter treats subjects differently based on treatment type

Controlling Internal Validity Threats • Random assignment of treatment or randomization • Blinding – subjects don’t know which treatment they are receiving • Double blinding – neither subjects nor experimenters know which treatment subjects are receiving
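As a minimal sketch of random assignment, the Python snippet below shuffles a list of subjects and deals them out to treatment groups in turn. The function name `randomize`, the group labels, and the subject IDs 1 to 20 are hypothetical choices for illustration only, not part of the original slides.

```python
import random


def randomize(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)   # a fixed seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)       # random order removes any systematic pattern
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal subjects out to the treatments in turn, like dealing cards.
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


# Hypothetical example: 20 subject IDs split between two conditions.
assignment = randomize(range(1, 21), ["tutor", "control"], seed=2008)
for treatment, members in assignment.items():
    print(treatment, sorted(members))
```

Because chance alone decides who ends up in which group, lurking variables and selection differences are balanced across groups on average, although any single assignment can still be unbalanced.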

Construct Validity • Construct validity refers to the degree to which inferences can be made from the measurements in your study • In particular, are you manipulating what you claim you are manipulating (the causal construct) and are you measuring what you claim you are measuring (the effect construct) • Often defined with multiple subcategories of validity

Threats to Construct Validity • Poor study design • Using new or unreliable methods of measurement • Measurements depend on who is collecting them • Interaction of testing and treatment • Interaction of treatments

Controlling Construct Validity Threats • Carefully design your study • Have others critique your design • Measurements should be reliable and not depend on who is administering your test • If there is a “gold standard” of measurement for your response variable, your measure should have a high correlation with that test
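One way to check a new measure against a "gold standard" is to compute the Pearson correlation between the two sets of scores. The sketch below does this by hand; the scores are made-up numbers for illustration only, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up scores for six subjects, measured both ways (illustration only).
new_measure = [12, 15, 11, 18, 20, 14]
gold_standard = [48, 55, 45, 67, 71, 52]

r = pearson_r(new_measure, gold_standard)
print(f"correlation with gold standard: {r:.3f}")
```

A correlation close to 1 suggests the new measure ranks subjects much as the gold standard does. How high is "high enough" depends on the field and is not something this sketch can decide.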

External Validity • External validity is the extent to which we can generalize the findings to particular target persons, places, or times and to which we can generalize across types of persons, places, or times • Often simply called generalizability

Threats to External Validity • The particular group of people used in your study • The particular place where you performed your study • The particular time at which you performed your study

Controlling External Validity Threats • Make sure to get a representative sample of the population • Do your study in a variety of places, with different people, and at different times

Validity Questions Are Cumulative • Conclusion – is there a relationship between the explanatory and response variables? • Internal – is the relationship causal? • Construct – can we generalize to the construct? • External – can we generalize to other persons, places, times?

Type I Error • Although this was mentioned in conclusion validity, it is worth discussing again • Due to the inherent uncertainties of nature, we can never make definite claims from our experiments

Type I Error • However, we can set a limit on how often we will make a false claim • Done by setting the α-level of a test • Do not “data snoop” • That is, do not perform many tests on your data until you find a significant result • Correct for multiple testing
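To see why data snooping is dangerous, here is a rough sketch of the chance of at least one false positive across several tests. It assumes the tests are independent and that every null hypothesis is true; both are simplifying assumptions made for this illustration, and `familywise_error_rate` is a hypothetical name.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests,
    assuming every null hypothesis is true (a simplifying assumption)."""
    return 1 - (1 - alpha) ** n_tests


for m in (1, 5, 20):
    rate = familywise_error_rate(0.05, m)
    print(f"{m:2d} tests at alpha = 0.05: P(at least one false positive) = {rate:.3f}")
```

Under these assumptions, with 20 tests at the 0.05 level a "significant" result somewhere is more likely than not, even when nothing is going on.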

Correcting for Multiple Testing • When performing hypothesis tests in cases where there is truly no effect, about 100*α% of the tests will result in Type I errors • When doing many tests, need to lower the error rate • Several common methods are: • Bonferroni correction • Benjamini-Hochberg method • False discovery rate
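The two named corrections can be sketched in a few lines of Python. Bonferroni compares each p-value to α divided by the number of tests; the Benjamini-Hochberg step-up procedure, which controls the false discovery rate, compares the sorted p-values to a rising threshold. The p-values below are made up for illustration, and the function names are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Track the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            reject[i] = True  # reject every hypothesis ranked at or below the cutoff
    return reject


# Made-up p-values from five tests (illustration only).
p_values = [0.001, 0.012, 0.025, 0.041, 0.27]
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

On these made-up values Bonferroni rejects only the smallest p-value while Benjamini-Hochberg rejects the three smallest, which illustrates that Bonferroni is the more conservative of the two.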

Statistical Power • The power of an experiment refers to the probability that we will correctly conclude that the treatment caused change in the outcome given that it actually does • Low power leads to Type II Errors • You have some control over the power of your experiments prior to running them • Performing experiments with low power is a waste of time and money!

Statistical Power • The power varies from α to 1 • Typically people agree that 80% power is a minimal value for good power • Poor power is a common problem. It can NOT be fixed with statistical analysis. It must be dealt with before running your experiment.

Statistical Power • Statistical power depends on • the α-level of the test • the size of the difference or the strength of the similarity (that is, the effect size) in the population • the sensitivity or variability of the data • More power to detect larger effects than smaller ones
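As one concrete illustration of these dependencies, consider a simplified setting that is assumed here and not taken from the slides: a two-sided, two-sample z-test with n subjects per group, a known common standard deviation σ, and a true difference in population means δ. With Φ the standard normal CDF and z_{1-α/2} the critical value, the power is:

```latex
\text{Power}
  = \Phi\!\left(\frac{\delta}{\sigma\sqrt{2/n}} - z_{1-\alpha/2}\right)
  + \Phi\!\left(-\frac{\delta}{\sigma\sqrt{2/n}} - z_{1-\alpha/2}\right)
```

When δ = 0 both terms equal α/2, so the power is exactly α. As δ or n grows, or as σ shrinks, the first term approaches 1. In this setting that is why power runs from α to 1 and why it depends on the α-level, the effect size, and the variability.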

Ways to Increase Power • Increase sample size • Reduce variability • Increase the spacing between population means

Increasing Sample Size • Unfortunately, this is not always an option • Not always cost effective

Reducing Variability • Types of variability • Measurement • Environment • Treatment application • Subject-to-subject • There is a trade-off between reduced variability and external validity • If we do too much to control the variability, we lose generalizability

What can power calculations do? • Suppose we’re creating an on-line tutor to help students study for the SATs. Since we’d like to sell the package for $200, we believe that we’ll need a 100 point increase in SAT scores in order for parents to consider the tutor worthwhile. • Given a sample size, an α-level, and an estimate of the variability of SAT scores, we can calculate the power of our experiment • Given a power we’d like to have, an α-level, and an estimate of the variability of SAT scores, we can calculate the sample size needed to obtain that power
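Both calculations can be sketched with a normal approximation. The code below assumes a design not specified in the slides: two equal-sized groups (tutored and untutored students) compared with a two-sided z-test. The standard deviation of 200 points is a placeholder guess for illustration, not a figure from the source, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_two_sample(n_per_group, diff, sd, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes."""
    se = sd * sqrt(2 / n_per_group)              # standard error of the mean difference
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    shift = diff / se
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def sample_size_two_sample(power, diff, sd, alpha=0.05):
    """Approximate per-group sample size needed to reach the requested power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) * sd / diff) ** 2)


ASSUMED_SD = 200   # placeholder guess at the SD of SAT scores, not from the slides
TARGET_DIFF = 100  # the 100 point increase from the example

print("power with 30 students per group:",
      round(power_two_sample(30, TARGET_DIFF, ASSUMED_SD), 3))
print("students per group needed for 80% power:",
      sample_size_two_sample(0.80, TARGET_DIFF, ASSUMED_SD))
```

A real study would more likely use a t-test and a variability estimate taken from pilot data or published score distributions, so treat these numbers as rough planning figures only.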

Notes on Power • Calculating power can be hard… • However, there are many on-line applets to help you calculate power! • As we go through examples I will show you one such applet • Find one you understand and can use!
