360 likes | 378 Views
Preliminaries Introduction to Statistical Investigations. Preliminaries. There are three parts to this preliminary section An overview of Statistics A quick look at some terms and concepts that are used with quantitative data
E N D
Preliminaries There are three parts to this preliminary section • An overview of Statistics • A quick look at some terms and concepts that are used with quantitative data • An exploration as to what probability is and how we will use it to help us understand Statistics
Statistics vs. Anecdotal Evidence Smoking causes cancer. Seat belts save lives.
Do Vaccines Cause Autism? • Nelson says it wasn't long after her son Parker's shots at 15 months that she noticed something was wrong. • "He had run a slight fever after the vaccinations, but I didn't think anything of it," said Nelson. “… about a week after that he just completely stopped talking.“ • After months of worrying, wondering, and going back and forth with doctors, an official diagnosis was made: autism. • "Gradually, I started piecing it together. He got sick after his vaccinations and about a week later everything changed. He was a completely different little boy then," said Nelson. • Post hoc ergo propter hoc (after this, therefore because of this) or simply called post hoc fallacy. • If a rooster crows immediately before sunrise, does the rooster cause the sun to rise? http://www.wsaz.com/charleston/headlines/19376044.html
Statistics • Scientific conclusions cannot be based on anecdotal evidence. We need evidence from data. • Statistics is the science of producing useful data to address a research question, analyzing the resulting data, and drawing appropriate conclusions from the data.
Example P.1: Organ Donations • While a majority of people approve of organ donation in principle, far less than that actually sign up when getting a driver’s license. • Different states have different recruiting methods. • Do these different methods result in different sign-up rates?
Recruiting Organ Donors Step 1. Ask a Research Question • In general: Is there a method that will increase the likelihood that a person agrees to become an organ donor. • More specifically: Does the default option presented to driver’s license applicants influence the likelihood of someone becoming an organ donor?
Recruiting Organ Donors Step 2:Design a study and collect data • The researchers decided to recruit various participants and ask them to pretend to apply for a new driver’s license. • The participants did not know in advance that different options were given for the donor question, or even that this issue was the main focus of the study.
Recruiting Organ Donors Step 2:Design a study and collect data • Some of the participants were forced to make a choice of becoming a donor or not, without being given a default option (the “neutral” group, Michigan’s current practice). • Other participants were told that the default option was not to be a donor but that they could choose to become a donor if they wished (the “opt-in” group, Michigan’s past practice). • The remaining participants were told that the default option was to be a donor but that they could choose not to become a donor if they wished (the “opt-out” group, some countries use this practice).
Recruiting Organ Donors Step 3:Explore the data. • 23 of 55 (41.8%) participants in the opt-in group agreed to become organ donors • 41 of 50 (82.0%) participants in the opt-out group agreed to become organ donors • 44 of the 56 (78.6%) participants in the neutral group agreed to become organ donors
Recruiting Organ Donors Step 4:Draw inferences beyond the data. • Using methods that you will learn in this course, the researchers analyzed whether the observed differences between the groups was large enough to indicate that the default option had a genuine effect. • In particular, they reported strong evidence that the neutral and opt-out versions do lead to a higher chance of agreeing to become a donor, as compared to the opt-in version currently used in many states. • In fact, they could be quite confident that the neutral version increases the chances that a person agrees to become a donor by between 20 and 54 percentage points, a difference large enough to save thousands of lives per year in the United States.
Recruiting Organ Donors Step 5: Formulate conclusions. • Based on the analysis of the data and the design of the study, the researchers concluded that the neutral version causes an increase in the proportion who agree to become donors over the opt-in. • But because the participants in the study were volunteers recruited from various general interest Internet bulletin boards, generalizing conclusions beyond these participants is only legitimate if they are representative of a larger group of people. (The authors believed their sample included a “broad range of demographics.”)
Recruiting Organ Donors Step 6:Look back and ahead. • One limitation of the study is that participants were asked to imagine how they would respond, which might not mirror how people would actually respond in such a situation. • A new study might look at people’s actual responses to questions about organ donation or could monitor donor rates for states that adopt a new policy. • Researchers could also examine whether presenting educational material on organ donation might increase people’s willingness to donate. • Another improvement would be to include participants from wider demographic groups than these volunteers.
Terminology • The individual entities on which data are recorded are called observational units. • The recorded characteristics of the observational units are the variablesof interest. • Variables can be: • Quantitative • You can add, subtract, etc. with the values. • Height, weight, distance, time… • Categorical • Labels for which arithmetic does not make sense. • Sex, ethnicity, eye color… • What are the observational units and variables in the Organ Donation Study?
More Terminology • The distribution of variable describes the pattern of value/category outcomes. • For the organ donation study the bar chart shown displays the distribution of responses.
Old Faithful (Webcam Link) Example P.2
Old Faithful • How faithful is Old Faithful? • Can the time of the next eruption be accurately predicted?
Old Faithful • Researchers collected data on 222 eruptions taken over a number of days in the summers of 1978 and 1979. • The results are shown in a dotplot.
Old Faithful • What are the observational units and variable in this study? • Is the variable quantitative or categorical? • We can see from the dotplot that Old Faithful is not perfectly predictable. • The time until the next eruption varies from eruption to eruption. • This variability is the most fundamental property in studying Statistics. Without variability, we wouldn’t need statistics.
Old Faithful • Let’s take another look at the dotplot and describe the distribution. • What could be some explanations for the variability?
Old Faithful • One explanation could be the duration of previous eruption (short: < 3.5 min. or long > 3.5 min.)
Old Faithful Summer 2005
Old Faithful • One way to measure the center of a distribution is with the average, also called the mean. • One way to measure variability is with the standard deviation, which is roughly the average distance between a data value in the distribution and the mean of the distribution • What is the standard deviation of the data set {7,7,7,7,7}? • Which data set has the largest standard deviation? • A {1, 3, 3, 3, 3, 3, 7} • B {1, 2, 3, 4, 5, 6, 7} • C {1, 1, 1, 4, 7, 7, 7}
Old Faithful Basic Terminology • Some aspects to look for in a distribution of a quantitative variable are: • Shape: Is the distribution symmetric? Mound-shaped? Are there several peaks or clusters? • Center: Where is the distribution centered? What is a typical value? • Variability: How spread out are the data? Are most within a certain range of values? • Unusual observations: Are there outliers that deviate markedly from the overall pattern of the other data values? Are there other unusual features in the distribution?
Cars or Goats • A popular TV game show, Let’s Make a Deal, featured a game where there were three doors. Behind one was a really nice prize and behind the other two were much less appealing prizes (like goats). • If a contestant picked the correct door, he or she would win the car. • What is the probability that the contestant would win the car under this scenario?
Cars or Goats • What does this probability of 1/3 mean? • If 3 contestants play, will exactly 1/3 of them win? • If 300 contestants play, will exactly 1/3 of them win? • If 3000 contestants play, will exactly 1/3 of them win?
Cars or Goats • The probability of an event is the long-run proportion of times the event would occur if the random process were repeated indefinitely (under identical conditions).
Cars or Goats • After a contestant picked a door, Monty would often open up a different door (to show a goat) and then allow the contestant to switch to the unopened door or keep the one that was chosen. • Should the contestant stick with the original door or switch? • What is the probability of winning if the contestant stays with the original? • What is the probability of winning if the contestant switches doors? • Let’s simulate this.
Cars or Goats • Simulation (artificially re-creating a random process) can be use to estimate a probability. • Using a larger number of repetitions in a simulation generally produces a better estimate of the probability than a smaller number of repetitions. • We will do many simulations in here both by hand (coins and cards) and with computers.
Learning Objectives for Preliminaries • Describe how the six steps of a statistical investigation apply to a particular statistical study. • Identify the observational units and variables in a statistical study. • Classify variables as categorical or quantitative. • Write a paragraph describing a distribution of a quantitative variable as presented in a dotplot, addressing aspects of shape, center, variability, and unusual observations.
Learning Objectives for Preliminaries • Compare centers, variability and shapes between two or more distributions displayed in dotplots. • Realize that long-term relative frequency can be used to estimate probability. • Understand that for a random processes with unknown probabilities, we can approximate probabilities with simulation (artificially re-creating the process a large number of times)