
Experimental Design - The basics

Experimental Design - The basics. Richard Preziosi. How to formulate hypotheses. Where do you start? What is a hypothesis? Stating a hypothesis Generating predictions Statistical hypotheses (different!) Only after completing this process will you be able to decide what data to collect.


Presentation Transcript


  1. Experimental Design - The basics Richard Preziosi

  2. How to formulate hypotheses • Where do you start? • What is a hypothesis? • Stating a hypothesis • Generating predictions • Statistical hypotheses (different!) • Only after completing this process will you be able to decide what data to collect

  3. Hypotheses: Where do you start? • Start by stating your research question • E.g. ‘Why are male and female humans different sizes?’ • Your question may easily produce more than one hypothesis; that’s fine.

  4. Hypothesis: • A hypothesis is a clear statement articulating a plausible candidate explanation for observations • It should be constructed in such a way as to allow gathering of data that can be used to either refute or support the candidate explanation

  5. Stating a Hypothesis: • Phrase your hypothesis as a possible answer to your research question. • E.g. ‘Male and female size differ because males grow faster than females’

  6. Generating predictions • These are the testable statements that follow logically from your hypothesis • E.g. ‘males have a faster growth rate than females’

  7. Statistical hypotheses • Predictions should lead you to testable statistical hypotheses • Note that the hypothesis of interest in statistics is the one where nothing is different (the null hypothesis) • A clearly stated null hypothesis will generally lead you to the correct statistical test • E.g. ‘There is no difference in the growth rate of males and females’
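A null hypothesis such as "there is no difference in the growth rate of males and females" can be tested in several ways. The sketch below uses a permutation test purely as an illustration; the presentation does not prescribe a test. The growth-rate values are made up, and the function name `permutation_test` is hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Under the null hypothesis the group labels are exchangeable, so we
    shuffle the pooled values many times and count how often the shuffled
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up growth rates (cm/year), for illustration only.
male_growth = [5.1, 5.6, 4.9, 5.8, 5.4, 5.7, 5.2, 5.5]
female_growth = [4.6, 4.9, 4.4, 5.0, 4.7, 4.5, 4.8, 4.3]

p_value = permutation_test(male_growth, female_growth)
print(f"p = {p_value:.4f}")
```

A small p-value would be evidence against the null hypothesis of no difference; it does not by itself show why the groups differ.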

  8. Question → Hypothesis → Predictions → Statistical (Null) hypothesis

  9. Pitfalls of generating predictions • Weak tests • Indirect measures • Non-useful outcomes • Your tests must satisfy the devil’s advocate (e.g. reviewers or examiners)

  10. Weak test • Consider the hypothesis: Students enjoy the course in radiation training more than the workshop in experimental design • Prediction: Students will get better grades in radiation training than in experimental design • This is a weak test (prediction) because other explanations are equally likely AND because we have used an indirect measure (grades as a measure of enjoyment)

  11. Non-useful outcomes • These are hypotheses that may well prove interesting if true but are uninformative if false

  12. Satisfying skeptics • Reviewers will look for logical flaws in your experiments. You do not want to finish your paper with: • ‘My results indicate that mechanism A determines apoptosis rates. Although mechanism B could also produce the same response I believe that mechanism A is the important one’ • This will earn you a review of the form • ‘This study provides no clear evidence to distinguish between mechanisms A and B. The authors need to redesign their study and start again. Recommendation, reject this manuscript’

  13. Pilot Studies and Preliminary Data • May be observational or mini-experiments • Ensures sensible questions • Can you observe the phenomenon? • Practice and validate techniques • Minimize training effects on data • Recognize logistical constraints • Standardization across observers • Allows tuning of design and statistics • Assessment of sample sizes (power) • Test run of statistical analysis

  14. Experimental Manipulation Vs. Natural Variation • In manipulation studies you change an aspect of the system and measure effects on traits of interest (the majority of lab studies and agricultural studies) • In correlational studies you measure associations between traits of interest (often assuming one is influencing the other) (many environmental and most human studies) • Consider the hypothesis ‘Long tail streamers seen in many species of birds have evolved to make males more attractive to females’

  15. Correlational study using Natural Variation • In the bird tail length example we could • Measure the tails of males at the beginning of the breeding season • Observe the number of matings each male has • Do statistics to determine if there is a relationship between tail length and number of matings • Results showing a relationship would support our hypothesis • Results not showing a relationship would go against our hypothesis
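The "do statistics" step for the tail length example could be as simple as a correlation coefficient. The sketch below computes Pearson's r by hand; the tail lengths and mating counts are invented for illustration, and a real analysis might prefer a rank correlation or a count-data regression.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up data: tail length (cm) at the start of the season and
# number of matings observed for each male.
tail_length = [38, 42, 45, 47, 50, 52, 55, 58]
matings = [0, 1, 1, 2, 1, 3, 2, 4]

r = pearson_r(tail_length, matings)
print(f"r = {r:.2f}")
```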

  16. Manipulative study • In the bird tail length example we could have 4 groups of birds • Results showing males with artificially long tails had more mates supports our hypothesis • Results showing males with reduced tails had fewer mates also supports our hypothesis • A comparison of group 1 males with the unmanipulated males acts as a control comparison

  17. Arguments for correlational studies • Often less work (but larger sample sizes usually needed) • Deals with real levels of biological variation (manipulations may take things outside naturally occurring limits) • Requires less handling of organisms (important if there are constraints like stress to animals or endangered species) • Manipulative studies may produce unintended effects (e.g. flight ability in the tail length example or epistatic effects in knockouts) • Manipulation may not be possible • May provide a baseline study for manipulative experiments

  18. Arguments for manipulative studies (really, against correlational studies) • Third variables • Reverse causation • These can be BIG problems if they occur

  19. Third Variables • Third variables occur when there is an apparent link between A and B but in fact there is no direct link or mechanism. Instead both A and B depend on C, the third variable. • This means that patterns in correlational studies are just that, correlations. • Remember, correlation does not imply causation

  20. Third Variables - an example • In the bird tail length example let’s say that we do see a correlation between tail length and number of mates • Suppose that females are actually attracted to territories, not males, but that males on better territories can grow larger tails • The third variable here is territory quality; it drives both tail length and number of mates and produces an ‘apparent’ relationship.
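A small simulation can show how a third variable produces an apparent relationship. The sketch below is built on stated assumptions, not on real data: territory quality, tail length and mating success are all simulated as standardized scores, and tail length has no direct effect on mating success. The helper names `pearson_r` and `residuals` are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def residuals(y, x):
    """Residuals of y after a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]


rng = random.Random(1)
n_males = 5000

# Territory quality drives BOTH traits; tail length never affects mating.
territory = [rng.gauss(0, 1) for _ in range(n_males)]
tail = [q + rng.gauss(0, 1) for q in territory]
mates = [q + rng.gauss(0, 1) for q in territory]

raw_r = pearson_r(tail, mates)
# Correlation that remains once territory quality is accounted for.
partial_r = pearson_r(residuals(tail, territory), residuals(mates, territory))

print(f"raw correlation:      {raw_r:.2f}")
print(f"controlling for site: {partial_r:.2f}")
```

In this simulation the raw correlation is clearly positive while the correlation after removing the effect of territory is close to zero. In a real correlational study the third variable is often unmeasured, which is why this adjustment cannot always be made.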

  21. Third Variables - Two famous examples • Fisher suggested that the link between smoking and cancer was correlational not causative and that another factor, perhaps stress, led people both to smoke and develop cancer. • Fewer women postgrads marry than women in the population as a whole. This relationship is presumably due to some other correlated factor (third variable)

  22. Reverse causation • This occurs when A is assumed to cause B but in fact B causes A; it is another way that assuming ‘correlation implies causation’ can mislead • In some cases this can be ruled out based on other data or common sense • In the bird example it is unlikely that the number of mates for a male has any effect on tail length measured at the start of the mating season.

  23. Reverse causation - a famous example • There is a correlation between the number of storks nesting in chimneys and the number of children in a house (old data from Holland) • Although storks bringing babies makes a nice story the causation is likely reversed • Larger families tend to live in larger houses with more chimneys, and hence more opportunities for storks to nest.

  24. Variation, replication and sampling • Variation among individuals • Replication and the experimental unit • Pseudoreplication

  25. Variation among individuals • Variation among individuals is a given for most biological systems • In any experiment we are concerned with variation in the Response or Dependent Variable • Variation in the response variable can be divided into: • Variation explained by experimental factors (IV) • Variation not explained by experimental factors (AKA error variation, random variation, noise) • In most studies we are interested in reducing noise and, hopefully, increasing explained variation
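The split into explained and unexplained variation can be made concrete with sums of squares, as in a one-way analysis of variance. The sketch below is an illustration with made-up response values; the function name `partition_variation` is hypothetical.

```python
def partition_variation(groups):
    """Split the total sum of squares into explained and error parts.

    `groups` maps a treatment name to the list of responses measured
    under that treatment.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Explained: how far each treatment mean sits from the grand mean.
    ss_explained = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Error (noise): scatter of individuals around their own treatment mean.
    ss_error = sum(
        (v - sum(values) / len(values)) ** 2
        for values in groups.values()
        for v in values
    )
    return ss_total, ss_explained, ss_error


# Made-up responses for two treatments.
data = {"control": [10, 12, 11, 13], "treated": [15, 17, 16, 18]}
ss_total, ss_explained, ss_error = partition_variation(data)
print(ss_total, ss_explained, ss_error)
```

The two parts always add up to the total, so reducing noise leaves a larger share of the variation attributable to the experimental factor.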

  26. Variation among individuals • Single measurements from each treatment do not allow us to distinguish between noise and effect • Make sure you have a sufficient number of individuals that experience the same manipulation • These individuals that receive the same manipulation are called replicates • Always ask ‘What is the experimental unit?’

  27. Pseudoreplication • This occurs when there is confusion between treatments, replicates and blocks. • Consider an experiment comparing the effect of a toxicant on fish behaviour. • Let’s say the toxicant is prepared in a batch and drip fed into the treatment tanks (water is drip fed into the control tanks) • Are the replicates: • Each fish in a tank? • Each tank? • Each set of tanks on a common drip? • Each batch of toxicant? • Don’t expect a simple answer; the answer is in the biology, not in statistics
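If, for a given experiment, the biology suggests that the tank is the experimental unit (an assumption made here only for illustration), one common way to avoid treating individual fish as replicates is to summarize each tank before analysis. The data and the function name `tank_means` below are hypothetical.

```python
from statistics import mean

# Hypothetical activity scores: three fish measured in each tank.
measurements = {
    "control": {"tank_1": [12, 14, 13], "tank_2": [15, 13, 14]},
    "toxicant": {"tank_3": [9, 10, 8], "tank_4": [11, 9, 10]},
}


def tank_means(by_treatment):
    """Collapse fish-level measurements to one value per tank."""
    return {
        treatment: [mean(fish) for fish in tanks.values()]
        for treatment, tanks in by_treatment.items()
    }


replicate_values = tank_means(measurements)
for treatment, values in replicate_values.items():
    # n counts tanks, not fish: 2 per treatment here, not 6.
    print(treatment, "n =", len(values), values)
```

With this choice the analysis has two replicates per treatment rather than six. If the tanks shared a drip line or a toxicant batch, even the tank might not be the right unit.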

  28. Common sources of Pseudoreplication • Shared enclosures • Common environments • Relatedness • Pseudoreplicated stimulus • Non-independence of group behaviour • Pseudoreplicated measurements over time • Species comparisons • Sometimes pseudoreplication is unavoidable

  29. Random sampling • Proper random sampling means that each individual has an equal chance of being allocated to each treatment group • The problem with non-random treatment of samples is that any bias in assignment of individuals or systematic pattern to ‘errors’ may bias your results • True random samples almost always require the use of computers or random number tables

  30. Random assignment and treatment • Random means not only random assignment but also random treatment • Let’s say that you are examining the effect of rhizosphere bacteria on plant growth. • Not only should each plant have an equal opportunity of being assigned to the bacterial or non-bacterial (control) group, all other aspects of the process should be random as well. • Plants should be planted in equivalent compost (possibly in random order) • Plants should be randomly allocated to growth chambers and perhaps positions in chambers
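Random assignment of this kind is easy to do with a pseudo-random number generator. The sketch below is one possible approach, assuming equal group sizes; the plant labels, the seeds and the function name `randomize` are invented for illustration.

```python
import random


def randomize(units, treatments, seed=None):
    """Randomly assign units to treatments in equal-sized groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units cannot be split evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    per_group = len(units) // len(treatments)
    return {
        treatment: shuffled[i * per_group:(i + 1) * per_group]
        for i, treatment in enumerate(treatments)
    }


plants = [f"plant_{i:02d}" for i in range(1, 13)]

# Random assignment to the bacterial or control group.
assignment = randomize(plants, ["bacteria", "control"], seed=42)

# A separate shuffle gives a random order for planting or for
# positions in the growth chamber.
planting_order = random.Random(7).sample(plants, k=len(plants))

for treatment, group in assignment.items():
    print(treatment, group)
print("planting order:", planting_order)
```

Recording the seed makes the allocation reproducible, which can help when the procedure has to be documented.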

  31. Haphazard sampling • Haphazard does not mean Random • A haphazard sample is based on personal assignment by the experimenter in a fashion that they believe is random • Often severely biased even if the experimenter is consciously trying to take a random sample • Consider trying to ‘randomly’ select mice from a bucket or ‘randomly’ pipetting out aliquots of a cell culture • True random samples usually involve setting up experimental units BEFORE assigning treatments • BUT this is not always possible; use common sense (or blind assignment)

  32. Self selection • This is a real problem with survey or poll data • The subset of a population that respond to surveys is rarely a random sample and thus may bias your results • By all means use surveys to inform your research BUT be very suspicious of anything but general conclusions

  33. Pitfalls of Random Sampling • Make sure that the randomization procedure you use does what you intend • Randomise the order of collecting data - learning effects • Random samples Vs. Representative samples - don’t let computers do your thinking for you

  34. Sample size - how many replicates • Too few replicates can be a disaster - too many can be a crime! • Always use educated guesswork - i.e. look at similar experiments by previous workers and determine what worked. • Pay attention to differences between the studies • Formal power analysis - do if possible!!! • Requires that you have some guess of variation among replicates • Requires that you have an idea of how big of a treatment effect you can expect (or require) • Requires that you know what statistical test you will use
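The three requirements of a formal power analysis (variation among replicates, expected effect size, and the planned test) can be seen in a simple calculation. The sketch below assumes a two-group comparison of means and uses the standard normal approximation, which slightly underestimates the number needed for a t-test with small samples. The function name `replicates_per_group` and the default values for `alpha` and `power` are choices made here, not taken from the presentation.

```python
from math import ceil
from statistics import NormalDist


def replicates_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate replicates per group for a two-sided, two-group test.

    effect_size: smallest difference in means worth detecting
    sd:          guessed standard deviation among replicates
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / effect_size) ** 2
    return ceil(n)


# Halving the detectable effect roughly quadruples the replicates needed.
print(replicates_per_group(effect_size=1.0, sd=1.0))
print(replicates_per_group(effect_size=0.5, sd=1.0))
```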

  35. Sample size - Resource Equation Model • Can be used for complex studies or when variation among individuals is unknown • Only appropriate for quantitative data • Gives conservative estimates of sample size so more appropriate for: • Large effect size (e.g. lab rather than clinical) • Testing for significant effects rather than estimating parameters • E = N - T - B, where: • N is the total number of individuals minus 1 • T is the number of treatments minus 1 • B is the number of blocks minus 1 • E is the error df (degrees of freedom) and should be between 10 and 20 • In some cases E should be larger (see Festing et al.)
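The resource equation is simple enough to check by hand, but a short helper makes it easy to try different designs. The sketch below follows the definitions on the slide; the function names and the example designs are hypothetical.

```python
def error_df(n_individuals, n_treatments, n_blocks=1):
    """Error degrees of freedom E = N - T - B from the resource equation.

    N, T and B are each the corresponding count minus 1. An unblocked
    design is treated as a single block, so B = 0.
    """
    n = n_individuals - 1
    t = n_treatments - 1
    b = n_blocks - 1
    return n - t - b


def in_recommended_range(e, low=10, high=20):
    """True if E falls in the 10 to 20 range suggested on the slide."""
    return low <= e <= high


# 24 individuals, 4 treatments, no blocking: E = 23 - 3 - 0
e_unblocked = error_df(24, 4)
# 16 individuals, 4 treatments, 2 blocks: E = 15 - 3 - 1
e_blocked = error_df(16, 4, n_blocks=2)

print(e_unblocked, in_recommended_range(e_unblocked))
print(e_blocked, in_recommended_range(e_blocked))
```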

  36. Sample size optimization (Festing et al.)

  37. Controls • This is the reference against which the results of an experimental manipulation can be compared • Thus your control group should be identical to your treatment group in everything except the treatment itself • Simple concept, common mistake • If the predictions and statistical hypotheses have been constructed well then the control group will be obvious • Lack of a control group makes an experiment pointless

  38. Types of Controls • Negative control - unmanipulated • Positive control - manipulated but not treated (vehicle control, sham procedure control) • Concurrent control - run at the same time as the treatment group • Historic control - based on previous data (be certain that individuals are identical except for the treatment)

  39. Blind Procedures • Designed to remove the perception that unconscious bias might taint results • Particularly useful when response variables are measured in a subjective way • Blind Procedure - the person measuring does not know what treatment has been applied • Double Blind - Both the subject and the person measuring do not know the treatment assigned (human studies)

  40. When controls are not needed (or allowed) • In medical or veterinary studies controls may be an ethical issue. Historical controls can be used but give careful consideration to criticisms • When sets of treatments are being compared (e.g. effect of two drugs on rat behaviour)

  41. Factorial experiments • 2 group comparison (t-test) design • Treatment and control compared • 1 factor design • Control and several levels of treatment compared • 2 factor design • More than one treatment considered simultaneously • Allows estimation of both main effects AND the interaction between them
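For a 2 x 2 factorial design, the main effects and the interaction can be read directly from the four cell means. The sketch below uses food and strain as the two factors, with invented cell means. It defines the interaction as a difference of differences, which is one common convention (some texts halve it).

```python
# Made-up mean responses for each food x strain combination.
cell_means = {
    ("low", "A"): 10.0,
    ("high", "A"): 14.0,
    ("low", "B"): 12.0,
    ("high", "B"): 22.0,
}

# Main effect of food: average change from low to high food,
# averaged over strains.
food_effect = (
    (cell_means[("high", "A")] + cell_means[("high", "B")]) / 2
    - (cell_means[("low", "A")] + cell_means[("low", "B")]) / 2
)

# Main effect of strain: average change from strain A to strain B,
# averaged over food levels.
strain_effect = (
    (cell_means[("low", "B")] + cell_means[("high", "B")]) / 2
    - (cell_means[("low", "A")] + cell_means[("high", "A")]) / 2
)

# Interaction: does the food effect differ between strains?
food_effect_in_a = cell_means[("high", "A")] - cell_means[("low", "A")]
food_effect_in_b = cell_means[("high", "B")] - cell_means[("low", "B")]
interaction = food_effect_in_b - food_effect_in_a

print(food_effect, strain_effect, interaction)
```

A non-zero interaction means the effect of food depends on strain, something two separate one-factor experiments could not reveal.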

  42. Main effects and interactions (table with columns Food, Strain, Interact; rows: X - - | X X - | X X X | X X X | X X X)

  43. Main effects and interactions

  44. Completely randomized designs Vs. Blocking • Completely randomized designs are usually simple • Completely randomized designs assume small among-individual variation • If among-individual variation can be attributed to a known factor then you can BLOCK by that factor, reduce error variation and increase your signal-to-noise ratio (= clearer results)

  45. Advantages of blocking

  46. Advantages of blocking • Blocking is commonly used to remove effects of • Space • Time • Individual characters that can be ranked • Continuous characters that affect among-individual variation can be used as covariates to remove effects and improve the signal-to-noise ratio
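When blocking by space or time, randomization is usually done separately within each block. The sketch below shows one way to lay out a randomized complete block design, assuming each block holds exactly one unit per treatment; the shelf and cage labels and the function name `randomized_block_design` are hypothetical.

```python
import random


def randomized_block_design(blocks, treatments, seed=None):
    """Assign every treatment once, in random order, within each block.

    `blocks` maps a block name to the list of units in that block.
    """
    rng = random.Random(seed)
    layout = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"{block} needs exactly one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = dict(zip(units, order))
    return layout


# Hypothetical spatial blocks: three shelves, each holding three cages.
shelves = {
    "shelf_top": ["cage_1", "cage_2", "cage_3"],
    "shelf_mid": ["cage_4", "cage_5", "cage_6"],
    "shelf_low": ["cage_7", "cage_8", "cage_9"],
}
treatments = ["control", "low_dose", "high_dose"]

layout = randomized_block_design(shelves, treatments, seed=3)
for shelf, cages in layout.items():
    print(shelf, cages)
```

Because every treatment appears on every shelf, any shelf effect is shared equally among treatments and can be removed as a block term in the analysis.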

  47. The most common design errors • Ad hoc designs • Inappropriate control/treatment groups • Sample sizes too large or too small • Failure to use blocking • Lab animal studies: failure to use isogenic strains when GxE is unimportant
