STAT 101 Dr. Kari Lock Morgan 10/9/12 Exam 1 Review • Review of Chapters 1-4
Office Hours This Week • Tuesday • Tracy 5 – 7 pm, Old Chem 211A • Wednesday • Kari 11:30 am – 12:30 pm, Old Chem 216 • Tracy 4:30 – 5:30 pm, Old Chem 211A • Heather 8 – 9 pm, Old Chem 211A • Thursday • Kari 1 – 2:30 pm, Old Chem 216
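The Big Picture • [Diagram: Sampling draws a Sample from the Population; Descriptive statistics summarize the Sample; Statistical Inference uses the Sample to draw conclusions about the Population]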
Cases and Variables • We obtain information about cases or units. A variable is any characteristic that is recorded for each case. • Generally, each case makes up a row in a dataset, and each variable makes up a column.
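The row/column layout of a dataset can be sketched in Python as a list of records. This is a minimal illustration under stated assumptions: the dataset, its variable names, and its values are all hypothetical.

```python
# Hypothetical dataset for illustration: each dict is one case (a row) and
# each key is one variable (a column). All values are made up.
dataset = [
    {"name": "Student A", "year": 1, "hours_sleep": 7.5, "major": "Biology"},
    {"name": "Student B", "year": 3, "hours_sleep": 6.0, "major": "Economics"},
    {"name": "Student C", "year": 2, "hours_sleep": 8.0, "major": "Statistics"},
]

# The variables are the keys recorded for every case.
variables = list(dataset[0].keys())

# One column: the value of a single variable for every case.
hours_sleep = [case["hours_sleep"] for case in dataset]

print("Cases (rows):", len(dataset))
print("Variables (columns):", variables)
print("hours_sleep column:", hours_sleep)
```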
Sampling • [Diagram: a Sample drawn from the Population] • GOAL: Select a sample that is similar to the population, only smaller.
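Random selection is one standard way to get a sample that resembles the population. The sketch below draws a simple random sample, in which each case is chosen at most once. The population of ID numbers is a hypothetical stand-in.

```python
import random

# Hypothetical population: ID numbers 1..10,000 (assumed for illustration).
population = list(range(1, 10_001))

random.seed(101)  # fixed seed so the sketch is reproducible

# Simple random sample of size 100, drawn WITHOUT replacement:
# every subset of 100 IDs is equally likely to be chosen.
sample = random.sample(population, k=100)

print(len(sample), "cases sampled")
print("Population mean ID:", sum(population) / len(population))
print("Sample mean ID:    ", sum(sample) / len(sample))
```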
Observational Studies • There are almost always confounding variables in observational studies. • Observational studies can almost never be used to establish causation.
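A small simulation shows why confounding blocks causal conclusions. In this made-up setup, a hypothetical confounder Z drives both X and Y, and X has no effect on Y. X and Y still come out clearly correlated (around 0.5 in theory). The variable names and the data-generating model are assumptions for illustration only.

```python
import random

random.seed(1)

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    ss_a = sum((ai - mean_a) ** 2 for ai in a)
    ss_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (ss_a * ss_b) ** 0.5

n = 1000
# Hypothetical confounding variable Z (made-up, standardized units).
z = [random.gauss(0, 1) for _ in range(n)]
# X and Y both depend on Z plus their own noise; X has NO effect on Y.
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

r_xy = correlation(x, y)
print(f"Correlation between X and Y: {r_xy:.2f}")
```

An observational study that only saw X and Y could easily mistake this association for a causal effect.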
Randomized Experiments • Because the explanatory variable is randomly assigned, it is not associated with any other variables (apart from random chance). Confounding variables are eliminated! • [Diagram: in a RANDOMIZED EXPERIMENT, the Confounding Variable is not linked to the Explanatory Variable, which is linked to the Response Variable]
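The sketch below contrasts random assignment with self-selection. It is a minimal illustration under assumptions: "age" is a hypothetical confounder with made-up values, and the self-selection rule (older subjects more likely to take the treatment) is invented for contrast. With self-selection the treated group is much older. With random assignment the age gap is close to zero, up to chance.

```python
import random

random.seed(2)

n = 1000
# Hypothetical confounder measured on each subject (made-up ages).
age = [random.uniform(18, 80) for _ in range(n)]

# Self-selection (assumed for contrast): older subjects are more
# likely to choose the treatment themselves.
chosen = [random.random() < (a - 18) / 62 for a in age]

# Randomized experiment: shuffle a half-treatment, half-control list,
# so treatment status ignores age entirely.
assigned = [True] * (n // 2) + [False] * (n - n // 2)
random.shuffle(assigned)

def mean_age(flags, want):
    """Average age among subjects whose treatment flag equals `want`."""
    ages = [a for a, f in zip(age, flags) if f == want]
    return sum(ages) / len(ages)

gap_chosen = mean_age(chosen, True) - mean_age(chosen, False)
gap_random = mean_age(assigned, True) - mean_age(assigned, False)
print(f"Self-selected: treated minus control mean age = {gap_chosen:.1f} years")
print(f"Randomized:    treated minus control mean age = {gap_random:.1f} years")
```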
Data Collection • Was the sample randomly selected? Yes → possible to generalize to the population; No → should not generalize to the population. • Was the explanatory variable randomly assigned? Yes → possible to make conclusions about causality; No → cannot make conclusions about causality.
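The two data-collection questions act as independent checks, so they can be written as a tiny decision function. The function name `scope_of_conclusions` is hypothetical. The sketch simply encodes the yes/no rules listed above.

```python
def scope_of_conclusions(randomly_sampled, randomly_assigned):
    """Return (generalization conclusion, causality conclusion).

    randomly_sampled:  was the sample randomly selected from the population?
    randomly_assigned: was the explanatory variable randomly assigned?
    """
    if randomly_sampled:
        generalize = "Possible to generalize to the population"
    else:
        generalize = "Should not generalize to the population"
    if randomly_assigned:
        causality = "Possible to make conclusions about causality"
    else:
        causality = "Cannot make conclusions about causality"
    return generalize, causality

# Example: a randomized experiment run on volunteers (not a random sample).
for line in scope_of_conclusions(randomly_sampled=False, randomly_assigned=True):
    print(line)
```

The two answers do not depend on each other. For example, a randomized experiment on volunteers can support causal conclusions but should not be generalized to the whole population.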
Statistic vs Parameter • A sample statistic is a number computed from sample data. • A population parameter is a number that describes some aspect of a population.
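To make the distinction concrete, the sketch below builds a hypothetical population of made-up exam scores. It then computes one parameter (the population mean) and one statistic (the mean of a random sample of 50). In practice the parameter is usually unknown. It is computable here only because the population is simulated.

```python
import random

random.seed(3)

# Hypothetical population of 5,000 made-up exam scores.
population = [random.gauss(75, 10) for _ in range(5000)]

# Parameter: describes the whole population (usually unknown in practice).
mu = sum(population) / len(population)

# Statistic: computed from a sample, so it changes from sample to sample.
sample = random.sample(population, k=50)
x_bar = sum(sample) / len(sample)

print(f"Population mean (parameter): {mu:.2f}")
print(f"Sample mean (statistic):     {x_bar:.2f}")
```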
Sampling Distribution • A sampling distribution is the distribution of statistics computed for different samples of the same size taken from the same population. • The spread of the sampling distribution helps us to assess the uncertainty in the sample statistic. • In real life, we rarely get to see the sampling distribution: we usually only have one sample.
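Because the population is simulated, the sketch below can do what real life rarely allows: draw many samples of the same size and look at the resulting sampling distribution of the mean. The population values, the sample size of 50, and the 2,000 repetitions are all assumptions for illustration.

```python
import random
import statistics

random.seed(4)

# Hypothetical population of made-up values; unlike real life, we can see all of it.
population = [random.gauss(75, 10) for _ in range(5000)]

n = 50              # every sample has the same size
num_samples = 2000  # number of different samples drawn

# Sampling distribution of the sample mean: one statistic per sample.
sample_means = []
for _ in range(num_samples):
    sample = random.sample(population, k=n)
    sample_means.append(sum(sample) / n)

# The spread of the sampling distribution reflects uncertainty in the statistic.
spread = statistics.stdev(sample_means)
print(f"Center of sampling distribution: {sum(sample_means) / num_samples:.2f}")
print(f"Spread (SD of the sample means): {spread:.2f}")
```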
Bootstrap • A bootstrap sample is a random sample taken with replacement from the original sample, of the same size as the original sample. • A bootstrap statistic is the statistic computed on the bootstrap sample. • A bootstrap distribution is the distribution of many bootstrap statistics.
[Diagram: Original Sample (with its Sample Statistic) → many Bootstrap Samples → one Bootstrap Statistic from each → Bootstrap Distribution]
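The bootstrap steps above can be sketched in Python for the sample mean. Each bootstrap sample is drawn with replacement and has the same size as the original. The 12 "hours of sleep" values are hypothetical, and 5,000 bootstrap samples is an arbitrary choice.

```python
import random
import statistics

random.seed(5)

# Hypothetical original sample (made-up values), e.g. hours of sleep for 12 students.
original = [6.5, 7.0, 8.0, 5.5, 7.5, 6.0, 9.0, 7.0, 6.5, 8.5, 5.0, 7.5]
n = len(original)

def bootstrap_statistic(sample):
    """Resample WITH replacement (same size as the original) and return its mean."""
    boot_sample = random.choices(sample, k=len(sample))
    return sum(boot_sample) / len(boot_sample)

# Bootstrap distribution: many bootstrap statistics.
boot_dist = [bootstrap_statistic(original) for _ in range(5000)]

print(f"Sample statistic (mean): {sum(original) / n:.3f}")
print(f"Mean of bootstrap distribution: {statistics.mean(boot_dist):.3f}")
```

`random.choices` samples with replacement, which is what distinguishes a bootstrap sample. `random.sample` would sample without replacement and just return a reordering of the original data.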
Confidence Interval • A confidence interval for a parameter is an interval computed from sample data by a method that will capture the parameter for a specified proportion of all samples. • A 95% confidence interval will contain the true parameter for 95% of all samples.
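The "95% of all samples" statement can be checked by simulation when the parameter is known. The sketch below assumes a hypothetical normal population with mean 50. It builds a statistic ± 2 × SE interval from each of 1,000 samples and counts how many capture the true mean; the share should be close to 0.95. As a simplification for speed, SE is estimated with the formula s/√n instead of a bootstrap.

```python
import math
import random
import statistics

random.seed(6)

# Hypothetical population with a KNOWN mean, so we can check whether intervals capture it.
mu, sigma = 50.0, 8.0
n, trials = 40, 1000

captured = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Assumption for speed: SE estimated by s / sqrt(n) rather than by bootstrapping.
    se = statistics.stdev(sample) / math.sqrt(n)
    if x_bar - 2 * se <= mu <= x_bar + 2 * se:
        captured += 1

coverage = captured / trials
print(f"Share of intervals that captured mu: {coverage:.3f}")
```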
Standard Error • The standard error (SE) is the standard deviation of the sample statistic. • The SE can be estimated by the standard deviation of the bootstrap distribution. • For symmetric, bell-shaped distributions, a 95% confidence interval is statistic ± 2 × SE.
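Estimating the SE from a bootstrap distribution and forming statistic ± 2 × SE takes only a few lines. The sketch uses a hypothetical sample of 12 made-up values and the sample mean as the statistic. It assumes the bootstrap distribution is roughly symmetric and bell-shaped, which the ± 2 × SE rule requires.

```python
import random
import statistics

random.seed(7)

# Hypothetical original sample (made-up values).
original = [6.5, 7.0, 8.0, 5.5, 7.5, 6.0, 9.0, 7.0, 6.5, 8.5, 5.0, 7.5]
stat = sum(original) / len(original)  # sample statistic: the mean

# Bootstrap distribution of the mean.
boot_dist = []
for _ in range(5000):
    resample = random.choices(original, k=len(original))
    boot_dist.append(sum(resample) / len(resample))

# SE estimated by the standard deviation of the bootstrap distribution.
se = statistics.stdev(boot_dist)

# For a symmetric, bell-shaped distribution: 95% CI = statistic ± 2 × SE.
lower, upper = stat - 2 * se, stat + 2 * se
print(f"SE ≈ {se:.3f}; 95% CI ≈ ({lower:.2f}, {upper:.2f})")
```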
Percentile Method • If the bootstrap distribution is approximately symmetric, a P% confidence interval can be obtained by taking the middle P% of the bootstrap distribution.
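The percentile method amounts to sorting the bootstrap statistics and trimming (100 − P)/2 percent from each end. The sketch below does this for P = 95 using a hypothetical sample. The helper name `percentile_interval` and its index rounding are illustrative choices, not a standard library routine.

```python
import random

random.seed(8)

# Hypothetical original sample (made-up values).
original = [6.5, 7.0, 8.0, 5.5, 7.5, 6.0, 9.0, 7.0, 6.5, 8.5, 5.0, 7.5]

# Sorted bootstrap distribution of the sample mean.
boot_dist = sorted(
    sum(random.choices(original, k=len(original))) / len(original)
    for _ in range(5000)
)

def percentile_interval(sorted_stats, level=95):
    """Return the endpoints of the middle `level`% of a sorted list of statistics."""
    n = len(sorted_stats)
    tail = (100 - level) / 200          # share cut from EACH end, e.g. 0.025 for 95%
    lo_idx = round(tail * n)            # index of the first value kept
    hi_idx = round((1 - tail) * n) - 1  # index of the last value kept
    return sorted_stats[lo_idx], sorted_stats[hi_idx]

lower, upper = percentile_interval(boot_dist, level=95)
print(f"95% percentile interval: ({lower:.2f}, {upper:.2f})")
```

With 5,000 bootstrap statistics, the 95% interval keeps positions 125 through 4874 of the sorted list, which is exactly the middle 4,750 values.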
Hypothesis Testing • How unusual would it be to get results as extreme as (or more extreme than) those observed, if the null hypothesis is true? • If it would be very unusual, then the null hypothesis is probably not true! • If it would not be very unusual, then the data do not provide evidence against the null hypothesis.
p-value • The p-value is the probability of getting a statistic as extreme as (or more extreme than) the one observed, just by random chance, if the null hypothesis is true. • The p-value measures evidence against the null hypothesis: the smaller the p-value, the stronger the evidence.
Randomization Distribution • A randomization distribution is the distribution of sample statistics we would observe, just by random chance, if the null hypothesis were true. • The p-value is calculated by finding the proportion of statistics in the randomization distribution that are as extreme as (or more extreme than) the observed statistic.
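One common way to build a randomization distribution for a difference in means is to reshuffle the group labels. If H0 (no difference between groups) is true, the labels are arbitrary, so shuffling them shows what chance alone produces. The sketch uses hypothetical two-group data and a two-sided p-value. Label reshuffling is one of several ways to simulate H0 and is assumed here for simplicity.

```python
import random

random.seed(9)

# Hypothetical two-group data (made-up values), e.g. scores under two conditions.
group_a = [12, 15, 14, 16, 18, 13, 17, 15]
group_b = [10, 11, 13, 9, 12, 14, 10, 11]

def diff_in_means(a, b):
    """Statistic: mean of a minus mean of b."""
    return sum(a) / len(a) - sum(b) / len(b)

observed = diff_in_means(group_a, group_b)

# Simulate H0 (group labels don't matter) by reshuffling the pooled values.
pooled = group_a + group_b
n_a = len(group_a)
rand_dist = []
for _ in range(10_000):
    random.shuffle(pooled)
    rand_dist.append(diff_in_means(pooled[:n_a], pooled[n_a:]))

# Two-sided p-value: share of randomization statistics at least as extreme
# as the observed one, in either direction.
p_value = sum(abs(d) >= abs(observed) for d in rand_dist) / len(rand_dist)
print(f"Observed difference: {observed:.2f}; p-value ≈ {p_value:.4f}")
```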
Statistical Conclusions • Strength of evidence against H0 • Formal decision of hypothesis test, based on α = 0.05
Formal Decisions • For a given significance level, α: • p-value < α → Reject H0 • p-value > α → Do not reject H0
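The formal decision rule maps directly onto a small function. The name `formal_decision` is hypothetical. The slide does not say what to do when the p-value equals α exactly, so this sketch assumes "do not reject" in that case.

```python
def formal_decision(p_value, alpha=0.05):
    """Formal decision of a hypothesis test at significance level alpha.

    Assumption: p_value == alpha is treated as "Do not reject H0",
    since the rule only covers p < alpha and p > alpha.
    """
    if p_value < alpha:
        return "Reject H0"
    return "Do not reject H0"

print(formal_decision(0.003))             # small p-value at alpha = 0.05
print(formal_decision(0.20))              # large p-value at alpha = 0.05
print(formal_decision(0.03, alpha=0.01))  # same rule, stricter alpha
```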
Formal Decisions “If the p-value is low, the ho must go”
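Errors (Decision vs. Truth) • TYPE I ERROR: the decision is to reject H0, but in truth H0 is true. • TYPE II ERROR: the decision is to not reject H0, but in truth H0 is false.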
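When H0 is actually true, every rejection is a Type I error. A test at level α should then reject in roughly α of all studies. The sketch below checks this by simulation under made-up assumptions: H0 says a population mean is 0, the data are normal with σ = 1 assumed known, n = 25, and the null distribution of the sample mean is itself simulated.

```python
import bisect
import random

random.seed(10)

n, alpha = 25, 0.05

def sample_mean_under_h0():
    """Mean of n draws from a population where H0 (mu = 0) is TRUE (sigma = 1 assumed)."""
    return sum(random.gauss(0, 1) for _ in range(n)) / n

# Simulated null distribution of |statistic|, sorted for fast lookup.
null_abs = sorted(abs(sample_mean_under_h0()) for _ in range(10_000))

def p_value(observed):
    """Two-sided p-value: share of null statistics at least as extreme as observed."""
    idx = bisect.bisect_left(null_abs, abs(observed))  # count of values < |observed|
    return (len(null_abs) - idx) / len(null_abs)

# Repeat the whole study many times with H0 true; each rejection is a Type I error.
trials = 2000
type1_errors = sum(p_value(sample_mean_under_h0()) < alpha for _ in range(trials))
type1_rate = type1_errors / trials
print(f"Estimated Type I error rate: {type1_rate:.3f} (alpha = {alpha})")
```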