You are about to take the AP Stats test and…
MAKE SURE YOU HAVE… • MULTIPLE PENCILS… • 1 OR 2 CALCULATORS… • EXTRA BATTERIES… • EAT A FULL MEAL BEFOREHAND!
Statistics is about… Models • A model is an attempt to represent reality… • …but we know it’s not perfect.
Models “All models are wrong, but some are useful.” (George Box)
Models Common models: • Regression lines • Simulations AND MORE…
THINK about Models Avoid confusion … • known vs unknowable • samples vs populations • statistics vs parameters
THINK about Models Probability Models: • Normal Model • Geometric & Binomial • t-Models • Chi-Square Models
The calculator… … can’t TELL what it all means. YOU have to do that, too!
Remember… • Answers are not numbers; answers are sentences in context.
The 68, 95, 99.7 Rule • Only works for Normal Distributions.
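The percentages in the rule can be checked numerically. This is a minimal sketch, assuming a standard Normal model; the helper name `normal_within` is made up for illustration.

```python
import math

def normal_within(k: float) -> float:
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    # For a standard Normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {normal_within(k):.4f}")
```

The three proportions round to 68%, 95%, and 99.7%. A skewed or bimodal distribution would generally not give these values.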
When describing Univariate Data (1 variable) • Shape! • Center! • Spread!
Range is… • A way to measure the “spread” of data • It is a single number, such as 40. (not 30-70) • Variance, standard deviation, IQR are other ways to measure the “spread” of data.
The Mean or Median is… • A way to measure the location of data (center) • Each is a single number
Adding a constant to every value in a data set… • Changes the central location of the data (such as mean or median) • Does NOT change the spread of the data (such as standard deviation, variance, IQR)
Multiplying every value in a data set by a constant… • Changes the central location of the data (such as the mean or the median) • DOES CHANGE the spread of the data (standard deviation and IQR are multiplied by the absolute value of the constant; variance is multiplied by its square)
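The two transformation rules can be seen on a small example. This is a sketch only: the data values and the constants 10 and 3 are hypothetical, chosen for illustration.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 9]           # hypothetical data set
shifted = [x + 10 for x in data]    # add a constant to every value
scaled = [3 * x for x in data]      # multiply every value by a constant

for label, values in (("original", data), ("plus 10", shifted), ("times 3", scaled)):
    # Compare center (mean, median) and spread (SD) before and after.
    print(f"{label:>8}: mean={st.mean(values):.2f}  "
          f"median={st.median(values):.2f}  SD={st.stdev(values):.2f}")
```

Adding 10 moves the mean and median up by 10 and leaves the SD alone. Multiplying by 3 triples the mean, the median, and the SD.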
When describing bivariate quantitative data… • Form! • Strength! • Direction! • This is when describing a scatterplot, linear regression, or the like...
A residual is… • The vertical distance from point to LSRL • It is calculated as • Observed Y – Predicted Y • All points ABOVE the LSRL have positive residuals! • All points BELOW the LSRL have negative residuals!
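A short sketch of the residual calculation, assuming a small hypothetical data set. The helper `lsrl` is an illustrative name; it fits the least-squares line with the usual formulas, where the slope is Sxy / Sxx.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

xs = [1, 2, 3, 4, 5]                    # hypothetical x values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]         # hypothetical y values
a, b = lsrl(xs, ys)

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
for x, y, res in zip(xs, ys, residuals):
    side = "above" if res > 0 else "below"
    print(f"x={x}: observed={y}, predicted={a + b * x:.2f}, residual={res:+.2f} ({side} the line)")
```

For a least-squares line the residuals always sum to zero, up to rounding.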
An Exponential Model is best fit when… • Log Y vs X is linear • Variable is in the exponent
A Power Model is best fit when… • Log Y vs. Log X is linear • Variable is in the base
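The two linearity checks can be sketched as follows. The data are generated from made-up formulas (y = 3·2^x and y = 3·x²) purely for illustration, and `corr` is a hand-written correlation helper.

```python
import math

def corr(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6]
exp_y = [3 * 2 ** x for x in xs]    # exponential: variable in the exponent
pow_y = [3 * x ** 2 for x in xs]    # power: variable in the base

log_x = [math.log10(x) for x in xs]
r_exp = corr(xs, [math.log10(y) for y in exp_y])        # log y vs x
r_pow = corr(log_x, [math.log10(y) for y in pow_y])     # log y vs log x
r_wrong = corr(xs, [math.log10(y) for y in pow_y])      # wrong transformation for power data

print(f"exponential data, log y vs x:     r = {r_exp:.4f}")
print(f"power data,       log y vs log x: r = {r_pow:.4f}")
print(f"power data,       log y vs x:     r = {r_wrong:.4f}")
```

With real data neither plot will be perfectly straight, so also check the residual plot of each transformed fit.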
Interpreting Slope… • “For every 1 [unit] increase in [x], we expect a [? unit] increase (or decrease) in [y].”
Interpreting Y Intercept • When [x] is 0 [units], we expect [y] to be [? units]
You will see the word “COMPARE” at least once… • So COMPARE!!! • Don’t list attributes… use comparative words!
Lurking Variable (Common Response) • [Diagram relating x, y, and a lurking variable z]
Confounding Variable • [Diagram relating x, y, and a confounding variable z]
“Correlation does not imply causation!” • Be careful with the word “cause”. The only way to prove causation is a properly designed experiment.
r is called the… • “Correlation Coefficient” • Or just… “Correlation” • It measures the strength and direction of a linear relationship (it has no meaning for nonlinear relationships)
R-squared is called… • “Coefficient of Determination” • And is interpreted as “the % of the variation in [y] that is explained by the linear relationship with [x]” • It can also be thought of as • explained variation / total variation
If r² is 0.64 • Then r = 0.8 OR r = -0.8 • Figure out which by looking at the direction of the scatterplot!
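The link between r, r², and direction can be sketched on a small decreasing data set. The values are hypothetical; r² is computed here as 1 − SSE/SST, the share of variation explained by the line.

```python
import math

xs = [1, 2, 3, 4, 5]                    # hypothetical x values
ys = [9.8, 8.1, 6.2, 3.9, 2.0]          # hypothetical y values (decreasing trend)

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y (SST)

r = sxy / math.sqrt(sxx * syy)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Unexplained variation: sum of squared residuals (SSE).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - sse / syy

# Recover r from r-squared: the size comes from the square root,
# the sign comes from the direction (slope) of the scatterplot.
r_recovered = math.copysign(math.sqrt(r_squared), slope)

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, recovered r = {r_recovered:.4f}")
```

Because the trend is decreasing, the negative root is the right one.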
Simpson’s Paradox is… • When combining the data from 2 groups results in a reversal of direction of the conclusion.
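A small numeric sketch of the reversal. The counts are illustrative numbers for two hypothetical treatments, A and B, each tried on "small" and "large" cases. Each entry is (successes, total).

```python
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    """Success rate as a proportion."""
    return successes / total

def combined_rate(groups):
    """Success rate after pooling all groups together."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return successes / total

for group in ("small", "large"):
    a = rate(*counts["A"][group])
    b = rate(*counts["B"][group])
    print(f"{group:>5}: A = {a:.3f}, B = {b:.3f}")

print(f"combined: A = {combined_rate(counts['A']):.3f}, B = {combined_rate(counts['B']):.3f}")
```

A has the higher success rate within each group, yet B looks better once the groups are combined. The reason is that A was used mostly on the harder "large" cases.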
Placebo Effect • When subjects improve simply because they believe they are being treated, even though the treatment is a dummy (like a sugar pill they are told will make them feel better)
Double Blind • Neither the subjects nor the experimenters know which treatment each subject is receiving
Disjoint • Both events can't occur simultaneously
Mutually Exclusive • Another name for disjoint: the events can't both occur • It does NOT mean that one or the other must occur
Checks for Independence • P(A and B) = P(A)P(B) • Or • P(B) = P(B|A)
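The multiplication check can be tried on a standard 52-card deck. This is a sketch using exact fractions; the event names and the helpers `prob` and `independent` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))      # 52 equally likely cards

def prob(event):
    """Exact probability that a randomly drawn card satisfies `event`."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def independent(a, b):
    """Check P(A and B) == P(A) * P(B)."""
    return prob(lambda card: a(card) and b(card)) == prob(a) * prob(b)

def is_ace(card):
    return card[0] == "A"

def is_heart(card):
    return card[1] == "hearts"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

print("ace and heart independent?", independent(is_ace, is_heart))
print("heart and red independent?", independent(is_heart, is_red))
```

Knowing a card is a heart does not change the chance it is an ace, but it does change the chance it is red.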
Rules for Means and Variances of Discrete Random Variables • P420
Central Limit Theorem • As sample size increases, • the shape of the sampling distribution of the sample mean gets more and more Normal, regardless of the shape of the parent distribution. • The center (mean) of the sampling distribution stays exactly the same. • The variability in the sampling distribution (standard deviation) decreases.
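A simulation sketch of these facts, assuming a strongly skewed parent population (exponential with mean 1). The seed, the sample sizes, and the number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics as st

rng = random.Random(42)     # fixed seed so the sketch is repeatable

def sample_means(n, reps=2000):
    """Means of `reps` random samples of size n from an exponential parent with mean 1."""
    return [st.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

means_n2 = sample_means(2)
means_n50 = sample_means(50)

for n, means in ((2, means_n2), (50, means_n50)):
    # The center stays near the parent mean (1); the spread shrinks as n grows.
    print(f"n={n:>2}: center={st.fmean(means):.3f}  SD={st.stdev(means):.3f}")
```

A histogram of `means_n50` should look far more symmetric and bell-shaped than one of `means_n2`.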
Law of Large Numbers • As sample size increases, the sample mean, x̄, tends to get closer and closer to μ.
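A quick simulation sketch with a fair six-sided die, whose mean is 3.5. The seed and the checkpoints are arbitrary choices for illustration.

```python
import random

rng = random.Random(7)      # fixed seed so the sketch is repeatable
checkpoints = (10, 100, 1_000, 10_000, 100_000)

total = 0
running_means = {}
for i in range(1, checkpoints[-1] + 1):
    total += rng.randint(1, 6)          # one roll of a fair die
    if i in checkpoints:
        running_means[i] = total / i    # sample mean after i rolls

for n_rolls, mean in running_means.items():
    print(f"after {n_rolls:>6} rolls: sample mean = {mean:.4f}")
```

The early means can wander, but the later ones settle close to 3.5.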
DON’T FORGET TO… Use the proper NOTATION. P(X > 2) = ….
Notation is communication • a, b, n, p, q, r, s, t, x, y, z, E, H, P, π, μ, σ all have special meanings… • …and “hats” or “bars” change those meanings. • You are not free to substitute another letter even though it looks like algebra.
4 Requirements of a Binomial Setting are… • Independence • Success/Failure for each trial • Equal probability of success for each trial • Fixed number of trials
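When all four requirements hold, binomial probabilities follow from the standard formula P(X = k) = C(n, k) p^k (1 − p)^(n − k). A minimal sketch, with n = 10 and p = 0.3 as hypothetical values:

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial setting with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3      # hypothetical: 10 trials, 30% chance of success on each
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * prob for k, prob in enumerate(pmf))
print(f"P(X = 3) = {pmf[3]:.4f}")
print(f"probabilities sum to {sum(pmf):.4f}")
print(f"mean = {mean:.4f} (compare n*p = {n * p})")
```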
The only difference between binomial setting and geometric setting is… • Geometric does not have a fixed number of trials, it is waiting for the first success…
The mean of a geometric distribution is… • 1/p • (this is not on your formula sheet, but you should know it) • To calculate a geometric probability, use a tree diagram
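Following one branch of the tree diagram (failures until the first success) gives P(X = k) = (1 − p)^(k − 1) · p. The sketch below uses a hypothetical p = 0.2 and approximates the mean by summing the first 2000 terms.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success occurs on trial k): k - 1 failures, then one success."""
    return (1 - p) ** (k - 1) * p

p = 0.2     # hypothetical success probability

# Approximate the mean by summing k * P(X = k); terms beyond k = 2000 are negligible here.
approx_mean = sum(k * geometric_pmf(k, p) for k in range(1, 2001))

print(f"P(X = 3) = {geometric_pmf(3, p):.3f}")
print(f"approximate mean = {approx_mean:.4f} (compare 1/p = {1 / p})")
```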
Undercoverage Bias • When some groups are systematically left out of the sampling process (like people without phones in a phone survey)
Voluntary Response Bias • When a sample consists of volunteers (like calling in to a radio survey)
Nonresponse Bias • When an individual chosen for the sample can't be contacted or refuses to cooperate
P of your “Phantoms” is “Define the parameter”. So define your parameter (either p or μ) as specifically as possible.
The grader of your test… • Doesn’t know what “PANIC” and “PHANTOMS” are… they are simply for your own organization.
Your hypotheses must… • Be about the PARAMETERS. • Why make a hypothesis about the sample? • (x̄ and p̂ shouldn’t ever be in the hypotheses).
When you do inference… • You hypothesize about what the value of a single number is (that number is usually μ or p).
What IS an Assumption? • An underlying hypothesis about the situation required by the mathematical justification for the statistical method. WE WILL PROBABLY NEVER KNOW IF AN ASSUMPTION IS TRUE.