Dive into the world of statistics to learn about data collection methods, sample biases, confounding variables, experimental design, and interpreting results. Explore real-life examples of good and bad samples and understand the difference between populations and samples.
Where do data come from, and why we don’t (always) trust statisticians.
Induction vs. Deduction: the gist of statistics • Deduction: “What is true about the whole must be true about a part.” • Induction: “What is true about the part might be true about the whole.”
Population vs. Sample • Population is the entire group of individuals about which we want information. • Sample is a part of population from which we actually collect information. • We use samples to study population because, often, populations are impossible or impractical to study.
Real Life Example of a Bad Sample • Ann Landers, a famous columnist, collected a sample of 10,000 people who wrote in to answer this question: “If you could do it all over again, would you have children?” • 70% of the respondents said that they would not have children. • When a sample was selected at random, 91% of the people said that they would have children.
Potential problems with sample surveys • Undercoverage occurs when some groups in the population are left out of the process of choosing the sample. • Nonresponse occurs when an individual chosen for the sample cannot be contacted or refuses to respond.
Another Real Life Example of a Bad Sample • In 1936 Literary Digest mailed out 10,000,000 ballots asking whom respondents planned to vote for: A. Landon or F.D. Roosevelt. • 2,300,000 ballots were returned, predicting a strong win (57%) for Landon.
Another Real Life Example of a Bad Sample • George Gallup surveyed 50,000 people chosen randomly. • Comparison of forecasts (Roosevelt’s share of the vote): • Gallup’s prediction for Roosevelt: 56% • Gallup’s prediction of the Digest’s result: 44% • Digest’s prediction for Roosevelt: 43% • Actual vote: 62% • Literary Digest drew its mailing list from its subscription list, phone directories, lists of car owners, and club memberships.
Right and Wrong Ways to Sample • A simple random sample is a sample where (1) each unit of the population has an equal chance of being chosen and (2) all units are chosen independently. • The sample is biased if at least one group of individuals has a greater chance of being selected.
Example of a good sample • You want to study the effects of computers on GPA. You don’t have the resources to study all students. • To select a sample of students for the study you • Get a list of all students, • Select students at random from the list, • Collect information from the students selected, • Compare those who have a computer with those who don’t.
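The selection step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the roster of 500 student IDs, the sample size of 50, and the function name `simple_random_sample` are all hypothetical. It relies on the standard library's `random.sample`, which draws without replacement so that every student on the list is equally likely to be chosen.

```python
import random

def simple_random_sample(roster, n, seed=None):
    """Draw n distinct units from the roster, each equally likely to be picked."""
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(roster, n)   # sampling without replacement

# Hypothetical roster standing in for "a list of all students".
roster = [f"student_{i:03d}" for i in range(500)]

# Select 50 students at random from the list.
sample = simple_random_sample(roster, 50, seed=1)
```

Collecting GPA and computer ownership from the selected students, and comparing the two groups, would follow as separate steps once real data are available.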
Example of a bad sample • You want to study the effects of computers on GPA. You don’t have the resources to study all students. • To select a sample of students for the study you • Use your friends. • Hang an ad in the computer lab. • Post an on-line questionnaire on the WKU site.
Stratified Random Sample • When we know the proportion of each group in the population, a stratified random sample is better than a simple random sample (SRS). • In a stratified sample, the number of people chosen from each group is proportional to the size of that group in the population.
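Proportional allocation, as described on this slide, might look like the sketch below. The groups (class years), their sizes, and the helper name `stratified_sample` are assumptions made up for illustration. Each group contributes a share of the sample equal to its share of the population, and the units within each group are drawn at random.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, seed=None):
    """Proportional stratified sample: each stratum's share of the sample
    matches its share of the population (rounded to whole units)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    chosen = []
    for name in sorted(strata):
        members = strata[name]
        # Rounding can make the total differ slightly from n in general.
        k = round(n * len(members) / len(units))
        chosen.extend(rng.sample(members, k))
    return chosen

# Hypothetical population: 600 freshmen, 300 sophomores, 100 seniors.
units = ([("freshman", i) for i in range(600)]
         + [("sophomore", i) for i in range(300)]
         + [("senior", i) for i in range(100)])

sample = stratified_sample(units, stratum_of=lambda u: u[0], n=50, seed=1)
counts = {g: sum(1 for u in sample if u[0] == g)
          for g in ("freshman", "sophomore", "senior")}
```

With these made-up sizes the allocation works out to whole numbers; with less convenient proportions the rounding step needs more care.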
Confounding • Two explanatory variables are confounded when their effects on the response variable cannot be distinguished from each other. • Confounding is often a problem with a study that uses sample surveys to collect data (even if sampling is done right).
Observation vs. Experiment • An observational study observes individuals and measures variables but does not attempt to influence responses. • An experiment imposes a treatment on individuals to observe their responses.
How to design an Experiment • The purpose of an experiment is to find out how one variable (the response variable) changes in response to a change in another variable (the explanatory variable). • Experiment: Subject → Treatment → Response
Placebo Effect • Placebo effect: a change in behavior due to participation in an experiment. • The placebo effect is a problem when an experiment does not have a control group (a basis for comparison). • To avoid the problem, design a randomized comparative experiment.
How to design a Randomized Comparative Experiment • Randomly split the subjects into two groups: • control group: receives no treatment • treatment group: receives the treatment • Compare the results. • Both groups are expected to be equally affected by the placebo effect, so the difference between the groups shows whether the treatment works.
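The random split into two groups can be sketched as follows. The subject labels and the function name `randomize` are hypothetical. The idea is simply to shuffle the list of subjects and cut it in half, so that chance alone decides who lands in which group.

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into (control, treatment) groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject labels.
subjects = [f"subject_{i:02d}" for i in range(20)]
control, treatment = randomize(subjects, seed=1)
```

Because the assignment ignores everything about the subjects, any systematic difference in outcomes between the groups is hard to attribute to anything other than the treatment.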
How to interpret results of an experiment • Observe outcomes for the treatment and control groups. • If the outcomes differ so much that such a difference would rarely occur by chance, we conclude that the difference is statistically significant.
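One way to make "would rarely occur by chance" concrete is a permutation test. The slides do not name a specific test, so this choice is an assumption, and the outcome scores below are invented purely for illustration. The sketch repeatedly reshuffles the group labels and counts how often a reshuffled difference in means is at least as large as the observed one.

```python
import random

def permutation_p_value(treatment, control, n_shuffles=10_000, seed=0):
    """Estimate how often a difference in group means at least as large as
    the observed one arises when group labels are assigned purely by chance."""
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(treatment) - mean(control))
    pooled = list(treatment) + list(control)
    k = len(treatment)

    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)   # reassign labels at random
        diff = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

# Made-up outcome scores, for illustration only.
treatment_scores = [12, 14, 15, 13, 16, 15, 14, 17]
control_scores = [5, 7, 6, 8, 5, 6, 7, 6]

p_value = permutation_p_value(treatment_scores, control_scores)
```

A small `p_value` means chance alone rarely produces a gap this large, which is what the slide calls a statistically significant difference. The cutoff for "rarely" (0.05 is a common convention) is a choice made by the analyst.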
Population vs. Sample • Population is the entire group of individuals about which we want information. • Sample is a part of population from which we actually collect information. • Based on the sample, we draw conclusions about the whole population.
Parameter vs. Statistic • A parameter is a number that describes the population. • A statistic is a number that describes the sample. • We use statistics to estimate parameters.
Sampling Distribution • The result of your study is a statistic, which can vary from sample to sample. • The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. • Estimate = True Parameter + Sampling Error
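Enumerating all possible samples is rarely feasible, but a simulation can approximate the sampling distribution by drawing many samples. The sketch below assumes a hypothetical population of 10,000 people, 60% of whom would answer "yes" to some question. The numbers and names are invented for illustration.

```python
import random

# Hypothetical population: 1 = "yes", 0 = "no"; the true parameter is 0.6.
population = [1] * 6000 + [0] * 4000
parameter = sum(population) / len(population)

def sample_proportion(rng, n):
    """The statistic: proportion of 'yes' answers in one random sample of size n."""
    return sum(rng.sample(population, n)) / n

rng = random.Random(0)
# Approximate the sampling distribution with 1,000 samples of size 100.
estimates = [sample_proportion(rng, 100) for _ in range(1000)]

# Estimate = True Parameter + Sampling Error, so the error is the difference.
sampling_errors = [est - parameter for est in estimates]
center = sum(estimates) / len(estimates)
```

The individual estimates scatter around the parameter, while `center`, the average of the estimates, sits close to it. That is what an unbiased statistic looks like in a simulation.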
Bias and variability • A statistic is biased if the mean of its sampling distribution is not equal to the true value of the parameter being estimated. • Variability of a statistic is the spread of its sampling distribution. • Bias does not go away with larger samples. • Variability decreases with larger samples.
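The last two bullets can be checked with a small simulation. Everything here is an illustrative assumption: the population is the integers 0 to 999, and the biased design suffers from undercoverage because it can only reach the upper half of the population. For each design and sample size, the sketch estimates the bias (mean of the estimates minus the true mean) and the spread (standard deviation of the estimates).

```python
import random
from statistics import mean, stdev

# Hypothetical population and a sampling frame that misses the lower half.
population = list(range(1000))
true_mean = mean(population)
covered = population[500:]   # undercoverage: only these units can be sampled

def simulate(frame, n, reps=500, seed=0):
    """Return (bias, spread) of the sample mean over many samples of size n."""
    rng = random.Random(seed)
    estimates = [mean(rng.sample(frame, n)) for _ in range(reps)]
    bias = mean(estimates) - true_mean
    spread = stdev(estimates)
    return bias, spread

results = {}
for label, frame in (("SRS", population), ("undercoverage", covered)):
    for n in (10, 100):
        results[(label, n)] = simulate(frame, n)
```

In `results`, the spread shrinks for both designs as `n` grows from 10 to 100, while the bias of the undercoverage design stays roughly where it was. A larger sample from a bad frame is still a bad sample.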