This lecture covers the properties of well-designed studies, including the definition and examples of control groups, methods for creating control groups, and concepts of blinding, group stratification, and group randomization. It also discusses strategies for minimizing bias in experiment design and the issue of publication bias. By the end of the lecture, you should be able to understand and apply these concepts in your own research.
Learning Objectives By the end of this lecture, you should be able to: • Define ‘control group’ • Contrast with ‘experimental group’ • Give examples of different methods for creating a control group • Define blinding, group stratification, group randomization • Speculate on possible stratifications for optimal study design As with the previous lecture, this too is not a “numbers” oriented lecture. It’s not difficult, but there is a little bit of terminology involved. You have only ‘gotten it’ when you can describe these terms in your own words with examples. It may take a couple of reviews.
Minimizing Bias in Experiment Design How can we reduce bias? A major objective in research. • Control group: Any good experiment should include a control group. • Blinding: When the subjects and ideally, the researchers as well, do NOT know which individuals received the ‘treatment’ and which individuals were in the control group until the experiment is completed. • Randomize the groups: We will see that it is frequently necessary to stratify your subjects into groups (beyond just the experimental and control groups). This stratification should be done randomly. • The ideal experiment: • A “randomized, double-blind, controlled” trial.
Assume that all studies are biased - it’s just a matter of degree. A reputable journal will only publish studies that demonstrate a significant effort to minimize bias. Publication Bias: And while we’re at it, another bias that has been noted in research fields is publication bias, aka “reporting bias” in which results that are not what the researcher hopes to find are simply not submitted for publication. One example might be a pharmaceutical firm that spends tremendous money and resources on a study to prove the efficacy of a promising new (and very expensive) drug – only to discover at the end that this new drug is no better (or even worse) than the cheaper competitor. They then simply opt not to submit the study for publication. Solutions to this problem? A “research registry” in which journals agree to only publish studies in which the data and/or specific details of the study are provided as the study is progressing.
Comparative Treatment Groups(a.k.a. Control Groups) Experiments are comparative in nature: For example, we might compare the response to a new drug treatment to: • Older / original treatment • No treatment • A placebo • Any combination of the above All of the above comparative treatments are known as controls. A control is a group to which an experimental treatment is NOT administered. It serves as a reference mark for comparison (e.g., a group of subjects that do not receive the “new” drug, or a group of subjects that is given a placebo). A placebo is a fake treatment, such as a sugar pill. This is to test the hypothesis that the response is due to the actual treatment and not to the subject’s belief that they were treated. In many studies, the control group is given a placebo. Without a control group, you should be very, very skeptical about any conclusions drawn as a result of the experiment!
Control Group • Any proper study will always discuss the “controls”. The “control group” refers to the group that was used as a comparison with the “treatment” group. • Key Point: Without a control group, you should be very, very skeptical about any conclusions that come out of the experiment! Example of experimental and control groups: Suppose you are a pharmaceutical company that has come up with what you believe is a breakthrough drug for diabetes. • Experimental Group: In your study, you will give one group your new wonder-drug. This group is called the experimental group. • Control Group: For comparison, any decent study will include a control group. Examples of control groups: • Give this group a placebo (perhaps the most common ‘control’) • Give this group the “older” version of the drug • Give this group NO drug (however you then sacrifice ‘blinding’)
Control Group - Examples • One Control group • 3 Experimental groups Example of how control and treatment groups are often graphed together to highlight differences or lack of differences.
Placebos Can exist in many forms: • In a drug trial, the placebo might be a completely inert drug that looks exactly like the experimental drug and is administered in the same way. • In studies evaluating acupuncture, a great choice for placebo was a needle that felt exactly like an acupuncture needle, but did not actually penetrate the skin. • In a study involving prayer, the experimental group was prayed for, while the placebo group was only told that they were being prayed for.
Blinding Blinded: If the patient doesn’t know whether they are in the experimental group or in the control group, the study is said to be ‘blinded’. Double-Blinded: When both the subject AND the people involved in carrying out the experiment (e.g. researcher, nurses, etc.) don’t know who is in the control group and who is in the experimental group. Double-blinded studies are preferable to single-blinded studies. Example: In clinical drug trials, a patient is sometimes given a bar-code which they wear on a wristband. The medications are not labeled either; they also carry a bar-code. The researcher/nurse giving the medication will scan the wristband and match it with the appropriate medication bar-code. So neither the patient nor the researcher knows whether the patient is getting the treatment or the placebo/control. Only at the end of the study will the patients and researchers find out who was in the “experimental group” and who was in the “control group”.
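To make the bar-code idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular trial system works: the function name `make_blinded_allocation`, the six-digit codes, and the two-arm setup are all assumptions made for the example. The point is that the people running the trial only ever see opaque codes, while the code-to-arm key is held separately until the study ends.

```python
import random

def make_blinded_allocation(patient_ids, seed=None):
    """Return (wristband, sealed_key) for a two-arm blinded trial.

    wristband:  patient id -> opaque code (what patients and nurses see)
    sealed_key: opaque code -> arm (held by a third party until the end)
    All names and the code format are hypothetical.
    """
    rng = random.Random(seed)
    n = len(patient_ids)
    # Balanced list of arms, shuffled so the order is random.
    arm_list = ["treatment" if i % 2 == 0 else "placebo" for i in range(n)]
    rng.shuffle(arm_list)
    # Unique six-digit codes that carry no information about the arm.
    codes = rng.sample(range(100_000, 1_000_000), n)
    wristband = dict(zip(patient_ids, codes))
    sealed_key = dict(zip(codes, arm_list))
    return wristband, sealed_key

wristband, sealed_key = make_blinded_allocation(
    ["P01", "P02", "P03", "P04", "P05", "P06"], seed=7
)
print(wristband)  # only codes are visible during the study
```

Unblinding at the end of the study is then just a lookup of each wristband code in the sealed key.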
Designing “controlled” experiments Sir Ronald Fisher, the “father of statistics”, was sent to Rothamsted Agricultural Station in the United Kingdom to evaluate the success of various fertilizer treatments. Fisher found that the data from experiments that had been going on for decades were basically worthless because of poor experimental design. • Fertilizer had been applied to a field one year and not the following year, in order to compare the yield of grain produced with vs. without the fertilizer. • What are the flaws in this research methodology? • It may have rained more or been sunnier during different years. • The seeds used may have differed between years as well. • In one case, fertilizer was applied to one field and not applied to a nearby field in the same year. • BUT: • The two fields might have had different soil, sun exposure, water, drainage, and farming history (that is, the two fields may have been farmed differently in previous years). • In other words, many factors affecting the results were “uncontrolled.” Any suggestions for a valid control group?
Setting up ‘controls’ • In this particular experiment, you’d like to “control for” the various confounding variables that exist in this experiment: • Different soil • Different sun exposure • Different water drainage • Different farming patterns • etc. (it would be possible to come up with several others) • Fisher came up with a very clever experiment design that did a terrific job of “controlling for” the confounding variables.
Fisher’s (elegant!) solution: • In the same field and same year, apply fertilizer to randomly spaced plots within the field. Analyze plants from similarly treated plots together. • This was a great solution! Both the experimental group (the fertilized areas) and the control group (the non-fertilized areas) were exposed to the same sunlight, weather, drainage, farming patterns, etc. • Note how in this experiment there is: • A control group: The areas that were not fertilized • Randomization: The plots were randomized to either the fertilizer group or the control group.
Randomization Recall how with samples, we randomize so that no one group is over-represented. Similarly, when we place subjects into an experimental or control group, we are careful to do so randomly. (We don’t put our buddy in the experimental group to make sure “he gets the good stuff.”) Key Point: All decent studies will randomize which subjects are in the control group vs which are in the experimental group. For example, if you are comparing a new cancer treatment vs the ‘older’ treatment, which patients get the new treatment and which get the older treatment must be decided at random.
Completely randomized designs Completely randomized experimental designs: Individuals are randomly assigned to groups, then the groups are randomly assigned to treatments. Which of the two groups is the control group? Group 1 is the “experimental group” Group 2 is the “control group”
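A completely randomized design can be sketched in a few lines of Python. This is a minimal illustration, assuming 20 subjects numbered 1 to 20 and two arms; the function name `completely_randomized` is made up for the example. It follows the two steps described above: first individuals are randomly split into groups, then the groups are randomly assigned to treatments.

```python
import random

def completely_randomized(subjects, arms=("experimental", "control"), seed=None):
    """Randomly split subjects into near-equal groups, then randomly
    decide which group receives which arm."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)                      # step 1: random groups
    groups = [shuffled[i::len(arms)] for i in range(len(arms))]
    labels = list(arms)
    rng.shuffle(labels)                        # step 2: random arm per group
    return dict(zip(labels, groups))

assignment = completely_randomized(range(1, 21), seed=42)
for arm, members in assignment.items():
    print(arm, sorted(members))
```

Passing a seed makes the allocation reproducible, which is handy for auditing; the researcher still never hand-picks who goes where.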
Summary of some key principles of experimental design • Bias: Minimize the bias in the collection of data and the way in which the experiment is designed. • Control the effects of lurking variables on the response, by comparing the treatment you are interested in with a second group who either receives a placebo, or a different treatment. • Randomize – use some kind of randomization technique to assign subjects to treatments – in other words, the researcher does not pick who goes in the treatment group and who goes in the control group. • Blind: This is another major factor – particularly in medical trials. Neither the experimenter nor the subjects should be aware which subjects are receiving the experimental treatment and which subjects are receiving the control treatment.
Stratification Individuals (or observations) in a study must be properly stratified (grouped) to try to ensure that no one batch of people/observations is over-represented in the control group or in any of the experimental group(s). Example: Testing a new cancer treatment vs. the old treatment: • Both treatments must be given to patients with similar severity of disease. So you might stratify based on the stage of the disease. • You might suspect that people of different ethnic groups (specifically Northern European ancestry) will respond differently to your medication. So you might stratify based on those from Northern European ancestry and those that are not. • etc. Example: Suppose you suspect that men and women would respond differently to the treatment. What is one change you should make to your study? • Answer: try to ensure that you place about equal numbers of men in each group (control group and each experimental group). Do the same with the women. This process of organizing your subjects into various blocks according to certain categories (age, race, severity of illness, etc.) is called stratification.
Block aka “stratified” designs In a block, or stratified, design, subjects are divided into groups, or blocks, prior to experiments, to test hypotheses (i.e. theories) about differences between the groups. You can stratify based on the treatment, but you can also stratify based on the subjects (e.g. different ages, different races, different stages of disease, etc.). For example, suppose you are evaluating three different acne treatments on a group of teenagers between 14 and 16 years old. You would want to randomize into a minimum of four groups (one group for each treatment, and the control group). Can you spot a potentially major flaw in this study? Gender! At this age, there are all kinds of hormonal changes affecting teenagers, and they affect acne production differently in males vs females. So you would want to stratify based on gender as well. As a result, in order to do this study properly, we would need eight groups! Boys: 3 treatments + control. Girls: 3 treatments + control.
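The acne example can be sketched as a block randomization in Python. This is only an illustration under made-up assumptions: a hypothetical roster of 16 boys and 16 girls, arm names "treatment A/B/C", and a helper called `block_randomize`. Subjects are first separated into blocks by gender, and the random assignment to the four arms then happens inside each block.

```python
import random
from collections import defaultdict

ARMS = ["treatment A", "treatment B", "treatment C", "control"]

def block_randomize(subjects, block_of, arms=ARMS, seed=None):
    """Stratify subjects into blocks, then randomize to arms within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)          # stratify first
    assignment = {}
    for block, members in blocks.items():
        rng.shuffle(members)                   # randomize within the block
        for i, s in enumerate(members):
            assignment[s] = (block, arms[i % len(arms)])
    return assignment

# Hypothetical roster: each subject is a (gender, id) pair.
teens = [("boy", i) for i in range(16)] + [("girl", i) for i in range(16)]
assignment = block_randomize(teens, block_of=lambda s: s[0], seed=1)

group_sizes = defaultdict(int)
for block_and_arm in assignment.values():
    group_sizes[block_and_arm] += 1
print(len(group_sizes))  # 8 groups: 2 genders x (3 treatments + control)
```

Because the shuffle happens separately for boys and for girls, each arm ends up with the same number of boys and the same number of girls, which is the whole purpose of stratifying.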
Stratifying into two blocks of three groups We divide the subjects into groups, or blocks, prior to the experiments. This allows us to test hypotheses about differences between the groups. (Note: There also must be a fourth group for each block, the control. However, it is not shown in this diagram.)
To stratify, or not to stratify… • A researcher wishes to examine the relationship between resting pulse rate and age. A sample of 52 people had their pulse rate measured at rest in the lab. Would you stratify? • Answer: Yes. Fitness level: People who do lots of endurance sports typically have lower resting rates. Similarly, gender: Men and women typically have different resting pulse rates, so this experiment should also be stratified by gender. • A researcher wants to determine if BST, a hormone intended to spur greater milk production, works as advertised. A farming research facility makes available 60 cattle. Can you think of possible stratifications you might need? • Answer: Different breeds of cattle may respond differently to this hormone. As a result, you should consider stratifying by breed.
Weaknesses in experimental design • There is no such thing as the perfect experiment. • When hearing about a study, it is up to you to decide whether any of the limitations in the design are significant enough to limit the validity of the conclusions. • Unfortunately, outside of reputable journals, badly designed experiments are extremely common. • Which is not to say that “reputable” journals do not also allow shoddy research to slip through at times. It most certainly does happen!
Example of a randomized, double-blind controlled trial A major cancer center is excited to hear about a promising new treatment for pancreatic cancer. So: • They contact all of the patients in their files with this condition. • They find 408 patients who agree to be in their trial. • They exclude from their trial 11 patients who say they are moving out of state, since that group cannot be monitored by the center. • They exclude 43 others from the trial because they have other significant medical illnesses which would be confounding. • Stratification: Now they have 354 patients remaining. They suspect that men and women will respond differently to the drug. They also suspect that people will respond differently based on their age. So they stratify based on both of these variables. • Gender: 190 are female and 164 are male. • Age: They use the age groups: 20-40 / 40-60 / 60-80 • Randomization and Control: Within each of these 6 groups (the 3 age groups, each of which is also stratified by gender), the patients are randomly assigned to receive either the usual treatment (the control group) or the new treatment (the experimental group). We now have 12 different groups! But that’s okay, provided that each group is of a reasonable size. • Blinding: The researchers set up the study to be double-blinded. That is, neither the patients nor the physicians know which patient is receiving which treatment. They will not find out until the study has been completed. • Very good! Yet, there are still some flaws in the design of this study…
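The bookkeeping in this example is worth checking by hand, and a short Python sketch makes it explicit. The numbers come straight from the slide; the variable names are just labels chosen for the illustration, and the "average group size" line assumes patients were spread evenly, which a real trial would not guarantee.

```python
from itertools import product

agreed = 408
moving_out_of_state = 11
other_illnesses = 43
remaining = agreed - moving_out_of_state - other_illnesses

females, males = 190, 164
assert females + males == remaining  # the gender split accounts for everyone

genders = ["female", "male"]
age_bands = ["20-40", "40-60", "60-80"]
arms = ["usual treatment (control)", "new treatment (experimental)"]

strata = list(product(genders, age_bands))   # gender x age
groups = list(product(strata, arms))         # each stratum split into 2 arms

print(remaining)                 # 354
print(len(strata), len(groups))  # 6 12
print(remaining / len(groups))   # 29.5 patients per group on average
```

An average of about 30 patients per group is why the slide adds the caveat "provided that each group is of a reasonable size": every extra stratifying variable multiplies the number of groups and shrinks each one.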
Limitations/Flaws in the pancreatic cancer study? • Stage of cancer – Drugs will affect the cancer differently depending on how advanced the disease is when the treatment begins. • Choice of age groups – The choice seems a bit arbitrary, and the boundaries overlap (is a 40-year-old in the first group or the second?). • Lack of placebo control – It’s always great to have a placebo group as one of your controls, but often, you cannot. In this case, there are ethical constraints. Ethics: Why couldn’t we use a placebo as the control? It would not be ethical to take patients with cancer and randomly give one block of them no treatment at all just for the purpose of improving the validity of your experiment.
Thoughts? • Survey: Obtained 36,000 physician office fax numbers, delivered ~16,000 faxes and received ~700 replies. Their respondents were mostly private practice physicians, and mostly mid-career. (Source: http://www.dpmafoundation.org/physician-attitudes-on-medicine.html). • The Doctor Patient Medical Association (DPMA) and the Patient Power Alliance (PPA) work to repeal health care reform and call themselves "a nonpartisan association of doctors and patients dedicated to preserving free choice in medicine." The organization is a member of the National Tea Party Federation and the "American Grassroots Coalition". • Note which magazine published this article - hardly a fly-by-night magazine! • I.e. even legitimate magazines and news sources are frequently guilty of publishing “studies” and polls that are so riddled with flaws as to be essentially worthless in terms of their validity.
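One quick way to size up a survey like this is to compute its response rate. The sketch below uses the approximate figures quoted above (the "~" values), so the results are rough; the variable names are just labels for the illustration.

```python
fax_numbers = 36_000   # numbers obtained
delivered = 16_000     # approximate faxes actually delivered
replies = 700          # approximate replies received

rate_of_delivered = replies / delivered
rate_of_all_numbers = replies / fax_numbers

print(f"Replies as a share of delivered faxes: {rate_of_delivered:.3f}")
print(f"Replies as a share of all fax numbers: {rate_of_all_numbers:.3f}")
```

Roughly 4% of the physicians who received the fax replied (under 2% of the original list). With no randomization and such a small, self-selected set of respondents, there is little reason to believe they represent physicians in general.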
Example – Claudication Study (on web page) • Methods: first thing they mention is IRB approval; Randomized; Design: 3 groups; Location (Northwestern) • Inclusion & Exclusion Criteria: defining the population • Measurement: How they measured the results – sometimes straightforward, sometimes a huge and contentious issue. How do you measure pain symptoms? How do you measure improvement? • Blinding: Obviously could not be double-blinded since patients knew their ‘treatment’. However, researchers were blinded. They just saw the data results. They did not know which patients were in which group as the experiment was going on. • Details: Many other issues and techniques employed by the study are explained in careful detail. • Stratifications (Blocks): Claudication vs No Claudication. • Control group: Nutritional consulting, regular meetings with data-gathering team, etc., but NO exercise. • Outcomes: In particular note the very frequent mention of p-values and confidence intervals. These are very important, and we will be learning about them. • Charts and graphs: • p159: Breakdown of stratifications. Also note the ‘exclusion’ disclaimer at the bottom of the graph. If you’re going to leave people out of your analysis, you’d better explain why. In this case, 4 were left out in the end because they did not respond to follow-up. • Table 1, p.170: A careful breakdown and description of the people in each stratum (block) • Conclusion: A study should at some point summarize the researchers’ recommendations on what the study can tell us. In this study it is in the very last paragraph: “Physicians should recommend supervised treadmill exercise programs for PAD patients regardless of whether they have classic symptoms of intermittent claudication”.