By Chris Franck. LISA: Basic principles of experimental design.
Collaboration: Visit our website to request personalized statistical advice and assistance with: Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...). LISA statistical collaborators aim to explain concepts in ways useful for your research. Great advice right now: Meet with LISA before collecting your data. Laboratory for Interdisciplinary Statistical Analysis: LISA helps VT researchers benefit from the use of statistics. LISA also offers: Educational Short Courses: designed to help graduate students apply statistics in their research. Walk-In Consulting: M-F 12-2PM in 401 Hutcheson Hall for questions requiring <30 mins. All services are FREE for VT researchers. We assist with research, not class projects or homework. www.lisa.stat.vt.edu
Laboratory for Interdisciplinary Statistical Analysis • To request a collaboration meeting go to www.lisa.stat.vt.edu • 1. Sign in to the website using your VT PID and password. • 2. Enter your information (email address, college, etc.). • 3. Describe your project (project title, research goals, specific research questions, whether you have already collected data, special requests, etc.). • 4. Wait 0-3 days, then contact the LISA collaborators assigned to your project to schedule an initial meeting.
About LISA • Laboratory for Interdisciplinary Statistical Analysis • www.lisa.stat.vt.edu • FREE services: Collaboration, walk-in consulting, short courses • We are here to help you! Goal is to contribute to good research across the Virginia Tech community.
Chris Franck, Assistant Director • Eric Vance, Director • Tonya Pruitt, Administrative Superstar
Structure of talk • PLEASE SIGN THE SIGN-IN SHEET! • Cover fundamental aspects of experimental design. • Scenarios come from consulting experience. • Highlight LISA services available. • Key words appear in red. • ?? marks interesting questions.
Central messages If you face statistical uncertainty at any stage of your research, please come to LISA. Best time to involve the statisticians: Before the data has even been collected. Speaking of the pre-data collection phase…
Why are we here? • Choices made at the design stage have the potential to drastically impact the results of any study. • Good experimental design gives the researcher an improved chance of a successful experiment. • A poorly considered or implemented design can have a ruinous effect on the investigation.
The plan • We will discuss basic elements of experimental design: randomization, replication, and blocking. • Real world experiments. • Interpretation of experimental results will be compared and contrasted with interpretation of results from observational studies.
A note on examples presented. • The examples chosen present unique challenges and complications; they were selected from hundreds of collaborations. • The magnitude of the challenges in these examples is greater than what is typical for a LISA collaboration. • Worst case scenarios!
Study design: food science • Research question: Among three genetic varieties of sweet potatoes, which type will brown the least when fried? Also take storage time into account. • Measurement of browning done with a machine – beyond the scope of this talk. • The following graphic shows the design layout.
Features of the design • Suppose we conduct this experiment and conclude the third variety browns the most, and the first variety browns the least. • Is this necessarily due to the genetic differences in the potato types? • Is there another plausible explanation?
What about cooking order? • Notice that for a given week, all of the potatoes are cooked in the same oil. • Also, the varieties are always cooked in the same order, making the effect of variety and the effect of cook order inseparable. The effect of variety on browning is confounded with the effect cooking order has on browning.
Why randomize? • Randomization is a fundamental feature of good experimental design. • In this case, randomizing the cooking order will eliminate the known confound between cooking order and potato variety. • Randomization makes groups similar on average, and hence also protects against unknown confounding effects!
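The randomization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variety labels, the number of weeks, and the seed are hypothetical, and the original study may have used a different randomized design.

```python
import random

# Hypothetical labels for the three sweet potato varieties.
VARIETIES = ["variety 1", "variety 2", "variety 3"]


def randomized_cook_orders(n_weeks, seed=None):
    """Return an independently shuffled cooking order for each week."""
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    orders = []
    for _ in range(n_weeks):
        order = list(VARIETIES)  # copy so the master list is never changed
        rng.shuffle(order)       # every ordering is equally likely
        orders.append(order)
    return orders


# Assumed example: four weeks of frying.
for week, order in enumerate(randomized_cook_orders(4, seed=2024), start=1):
    print(f"week {week}: " + " -> ".join(order))
```

Because the order is drawn fresh each week, no variety is systematically fried first or last, so cooking order can no longer masquerade as a variety effect.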
Sweet potato remarks • Since the effect of interest (genetic variety) is confounded with cooking order in the current experiment, the recommendation is to repeat the experiment with a randomized design. • Many randomized designs exist! Perhaps changing the oil more frequently can also improve the project.
Costly • In general, randomization is not difficult to perform. • In this case the cost of repeating the experiment in randomized fashion was moderate (about 12 weeks' time + materials). • Repeating an experiment can be VERY costly (e.g. 3 years + materials for PhD research).
Another randomization example • A professor observes that students who sit in the front of a large lecture class tend to get better grades. Can she conclude that sitting in front causes students to get better grades?
HW problem • A grape researcher is interested in testing the effect of 4 pesticides on the disease rate on his grapes. For his experiment he has 16 total vines arranged in four plots. Each vine has a trunk at the center and two cordons extending from the trunk. Many grape clusters grow on each cordon.
How to assign pesticides? • To administer the pesticides, the researcher randomly assigns one pesticide (labeled A, B, C, and D) to each of the plots. He then sprays the assigned pesticide on all four vines in each plot, walking from north to south in each case. • Call the pesticide the treatment.
How many reps for each treatment? • A) 4 reps/treatment since there are four vines that receive each pesticide. • B) 8 reps/treatment since there are eight cordons that receive a given treatment. • C) Many replicates: depends on the number of grapes that grow, since each grape might or might not have the disease. • D) Something else.
Answer: • D) Something else. Number of experimental replicates: 1 per treatment. • Why!? What went wrong? • The experimental unit is the smallest unit in the experiment to which separate treatment assignments are made. • What was the experimental unit in this experiment? The plot: each pesticide was assigned to exactly one plot.
Definition of replicate • The number of replicates for a given treatment is equal to the number of times the treatment was assigned to the experimental unit.
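The definition above amounts to a simple count over experimental units. The sketch below is illustrative only: which pesticide went to which plot is not stated in the talk, so the assignments shown are hypothetical, and the vine-level layout is a fixed pattern used purely for counting (a real design would randomize it).

```python
from collections import Counter


def replicates_per_treatment(assignment):
    """Count how many experimental units were assigned each treatment.

    `assignment` maps each experimental unit to its treatment label.
    """
    return dict(Counter(assignment.values()))


# Design as run: the plot is the experimental unit (hypothetical assignment).
plot_design = {"plot 1": "A", "plot 2": "B", "plot 3": "C", "plot 4": "D"}

# Alternative: the vine is the experimental unit (16 vines, labels only).
vine_design = {f"vine {k + 1}": "ABCD"[k % 4] for k in range(16)}

print(replicates_per_treatment(plot_design))
print(replicates_per_treatment(vine_design))
```

Counting vines, cordons, or grapes inside a plot does not add replicates, because none of them received a separate treatment assignment.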
Consequences of this design • We cannot perform the usual statistical inference in this experiment. That is, we cannot perform hypothesis tests, construct confidence intervals, etc. • The resulting data might suggest a difference in the treatments, but we can’t quantify the uncertainty of the results with confidence levels, p-values, etc.
Improvements • Instead of using 4 total plots, we might use 8, 12, 16, etc. This would give 2, 3, 4, etc. replicates per treatment. • Instead of randomizing the treatments to the plots, perhaps we can randomize the treatments to the vines themselves?
Randomizing treatments to vines • Now the vine is the experimental unit. • 4 replicates for each treatment instead of 1. • ?? What if our treatment is sprayed on the vines in such a way that adjacent vines get a little bit of the wrong treatment? Windy day? • This is an example of a carryover effect; we can address these advanced issues at a LISA collaboration meeting.
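Randomizing treatments to vines with equal replication can be sketched as follows. This is a minimal illustration of a completely randomized design, assuming 4 plots of 4 vines; the function name, unit labels, and seed are hypothetical.

```python
import random

TREATMENTS = ["A", "B", "C", "D"]


def completely_randomized_design(n_plots=4, vines_per_plot=4, seed=None):
    """Assign treatments to vines at random, ignoring plot membership."""
    units = [(plot, vine)
             for plot in range(1, n_plots + 1)
             for vine in range(1, vines_per_plot + 1)]
    if len(units) % len(TREATMENTS) != 0:
        raise ValueError("number of vines must be a multiple of the number of treatments")
    reps = len(units) // len(TREATMENTS)
    labels = TREATMENTS * reps          # equal replication: 4 copies of each
    random.Random(seed).shuffle(labels)
    return dict(zip(units, labels))


design = completely_randomized_design(seed=7)
for plot in range(1, 5):
    row = [design[(plot, vine)] for vine in range(1, 5)]
    print(f"plot {plot}: {' '.join(row)}")
```

Nothing in this scheme forces each plot to contain every treatment, so a plot may receive the same treatment several times by chance.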
Another replication example • A researcher is developing a compound needed to create some advanced textiles. Four temperature levels are crossed with 4 molecular compounds. Each combination of temperature and compound is randomized to a flask, and five samples are taken from each flask for measurements. • Interaction between temperature and compound is of primary interest.
Umm, Chris, did you really randomize? • Plot 1 has three instances of D! • Not a single plot contains all of the treatments! • What if plot 1 has ideal characteristics? (irrigation, soil quality, sunlight) • ?? Won’t treatment D seem better than it otherwise would since it appears in the best plot three times?
Yes, and Yes • I did randomize, using a completely randomized design. • Yes, if plot 1 has different characteristics than the other plots, and D appears frequently by chance in plot 1, then the three observations on treatment D are a function of both the treatment and the enhanced plot characteristics.
In general • If plots 1, 2, 3, and 4 are not identical in terms of the response (disease rate), then plot-to-plot or inter-plot variability is present. • Maybe some of the plots are on inclines, soil characteristics may be different, etc. • Call the impact of the different plots on the response the plot effect.
How do we handle the plot effect? • We are interested in the disease rates of grapes. We believe the various treatments will affect these disease rates. We also believe that the plots will have an impact on the disease rates. • We don’t really care about the plot effect. Our primary goal is to determine how the treatments affect the response. Plots are simply an extra source of variability.
Treat the plots as blocks • Blocking is a strategy that may be implemented in order to account for known sources of variability in the experimental material. • In our case, the plots may show variability we wish to account for.
To implement blocking • Blocking is implemented during the design phase of the experiment. • We want to assign the treatments into the blocks so that each treatment appears in each block exactly one time. • Assigning treatments to blocks in this fashion is a form of restricted randomization.
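The restricted randomization described above can be sketched like this: within each block, the full set of treatments is shuffled independently. The function name and seed are hypothetical, and the sketch assumes each block has exactly as many units as there are treatments.

```python
import random


def randomized_complete_block_design(treatments, n_blocks, seed=None):
    """Return {block number: treatment order}, each treatment once per block."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)  # every block gets the full treatment set
        rng.shuffle(order)        # randomization is restricted to within the block
        layout[block] = order
    return layout


layout = randomized_complete_block_design(["A", "B", "C", "D"], n_blocks=4, seed=11)
for block, order in layout.items():
    print(f"plot {block}: {' '.join(order)}")
```

Compared with shuffling all 16 labels at once, this guarantees that every plot contributes one observation on every treatment.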
Notes about RCBD (randomized complete block design) • Notice each treatment appears in each block exactly once. • Statistical model for this design: $y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$, where $i = 1, \dots, a$ indexes the treatments ($a = 4$), $j = 1, \dots, b$ indexes the blocks ($b = 4$), and $y_{ij}$ is the response for the $i$th treatment in the $j$th block.
More terms • $\mu$ is the overall mean. • $\alpha_i$ is the treatment effect for the $i$th treatment. • $\beta_j$ is the effect of the $j$th block. • $\varepsilon_{ij}$ is an error term associated with the response at the $i$th treatment and $j$th block.
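To make the terms concrete, the sketch below simulates responses from the additive model (overall mean + treatment effect + block effect + error) and recovers the effects from row and column means. Everything numeric here is an assumption for illustration: the parameter values are invented, the data are simulated rather than real, and the estimates use the common sum-to-zero convention for the treatment and block effects, which the slides do not specify.

```python
import random
from statistics import mean


def simulate_rcbd(mu, alpha, beta, sigma, seed=None):
    """Simulate y[i][j] = mu + alpha[i] + beta[j] + normal error (one value per cell)."""
    rng = random.Random(seed)
    return [[mu + a + b + rng.gauss(0, sigma) for b in beta] for a in alpha]


def estimate_effects(y):
    """Estimate mu, alpha_i, beta_j assuming the effects each sum to zero."""
    grand = mean(v for row in y for v in row)
    alpha_hat = [mean(row) - grand for row in y]        # treatment (row) means
    beta_hat = [mean(col) - grand for col in zip(*y)]   # block (column) means
    return grand, alpha_hat, beta_hat


# Invented parameter values, 4 treatments by 4 blocks.
y = simulate_rcbd(mu=20.0, alpha=[3, 1, -1, -3], beta=[2, 0, -1, -1], sigma=0.5, seed=42)
mu_hat, alpha_hat, beta_hat = estimate_effects(y)
print("overall mean:", round(mu_hat, 2))
print("treatment effects:", [round(a, 2) for a in alpha_hat])
print("block effects:", [round(b, 2) for b in beta_hat])
```

Subtracting the block means is what removes plot-to-plot variability from the treatment comparison.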
Take home message • By implementing the randomized complete block design, we can: • Compare the performance of the four treatments. • Account for variability in the plots that might otherwise obscure the treatment effects. • But suppose I did not randomize the treatments into blocks before collecting data. Can I still use the above technique?
Can I? • Maybe, maybe not. • If you used a completely randomized design (as I did initially), you may not be able to fit the RCBD model! • This is because some of the parameters in the model may be non-estimable depending on how your randomization works out, e.g. in a completely randomized design there may be no information about how treatment A behaves in plot 1. • Come see LISA when designing experiments!
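One way to check a layout before fitting is sketched below. It uses a standard criterion that is brought in here as an assumption, not taken from the talk: in the additive treatment + block model, all treatment comparisons are estimable when the design is connected, meaning every treatment can be linked to every other through shared plots. The second layout is a hypothetical (and unlucky) outcome of complete randomization.

```python
def is_connected(design):
    """Check whether (block, treatment) pairs form one connected design.

    `design` is an iterable of (block, treatment) pairs, one per unit.
    """
    neighbours = {}
    for block, trt in design:
        b, t = ("block", block), ("treatment", trt)
        neighbours.setdefault(b, set()).add(t)
        neighbours.setdefault(t, set()).add(b)
    if not neighbours:
        return False
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:  # depth-first search over the block/treatment graph
        node = stack.pop()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(neighbours)


# RCBD: every treatment in every plot.
rcbd = [(plot, trt) for plot in range(1, 5) for trt in "ABCD"]

# Hypothetical unlucky layout: plots 1-2 hold only A and B, plots 3-4 only C and D.
unlucky = ([(plot, trt) for plot in (1, 2) for trt in "AABB"]
           + [(plot, trt) for plot in (3, 4) for trt in "CCDD"])

print(is_connected(rcbd))
print(is_connected(unlucky))
```

In the unlucky layout, A and B can never be compared with C and D without also comparing plots 1-2 with plots 3-4, so those treatment differences cannot be separated from the plot effect.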
Row effects? • ?? RCBD seems good, but what if I also have a row effect in addition to a plot effect? • E.g. perhaps there is a fertility gradient within each plot. • Treatment C appears three times in the second row in the RCBD plot. Don’t we have the same problem even with RCBD? • Answer:
Another blocking example: hemophilia project • Hemophilia refers to a set of genetic disorders that impair the blood's ability to clot. • Hemophiliacs do not have the ability to produce certain proteins which are needed for blood clotting. • In our project we study Factor 9 (F.IX).