
Midterm review

Statistics 111 - Lecture 10. Midterm review, Chapters 1-5. Administrative notes: Homework 3 is due Monday; it covers material from Chapter 5, so it is worth doing as practice for the midterm. The exam on Monday starts exactly at 10:40, so get here early. Some topics are not covered on the midterm.

archiedixon

Presentation Transcript


  1. Statistics 111 - Lecture 10 Midterm review Chapters 1-5 Stat 111 - Lecture 10 - Review

  2. Administrative Notes • Homework 3 is due Monday • Covers material from Chapter 5, so worth doing as practice for the midterm! • Exam on Monday • Starts exactly at 10:40 – get here early

  3. Some Topics Not Covered on Midterm • Continuity correction for binomial calculations (chapter 5) • Normal quantile plots (chapter 1)

  4. Experiments • Used to examine the effect of a treatment, e.g. medical trials, education interventions • Different from an observational study, where no treatment is imposed • Observational studies can only examine associations between variables, whereas experiments try to establish causal effects • Experiments can still be biased though! • [Diagram: experimental units drawn from the population are split into a treatment group (treatment, then result) and a control group (no treatment, then result)]

  5. Sampling and Surveys • Just like in experiments, we must be cautious of potential sources of bias in our sampling results • Voluntary response samples, undercoverage, non-response, untrue response, wording of questions • Simple Random Sampling: less biased since each individual in the population has an equal chance of being included in the sample • [Diagram: sampling takes a sample from the population; the sample statistic is used for estimation of, and inference about, the unknown population parameter]

  6. Distributions • A distribution describes what values a variable takes and how frequently these values occur • Boxplots are good for showing center and spread, but don’t indicate the shape of a distribution • Histograms are much more effective at displaying the shape of a distribution

  7. Numerical Measures of Center • Mean: x̄ = (x₁ + x₂ + ... + xₙ)/n • Median: “middle number in distribution” • Mean is more affected by large outliers and asymmetry than the median • Symmetric: Mean ≈ Median • Skewed Left: Mean < Median • Skewed Right: Mean > Median
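To make the mean/median comparison concrete, here is a minimal sketch using Python's standard `statistics` module. Both data sets are hypothetical values invented for illustration; they are not from the lecture.

```python
from statistics import mean, median

# Hypothetical data sets, invented for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [20, 22, 25, 27, 30, 31, 200]  # one large outlier on the right

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    # The outlier drags the mean upward but barely moves the median.
    print(name, "mean =", round(mean(data), 2), "median =", median(data))
```

In the right-skewed list the single large value pulls the mean well above the median, which matches the "Skewed Right: Mean > Median" rule on the slide.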

  8. Numerical Measures of Spread • Variance: average of the squared deviations of each observation from the mean • Standard Deviation = √Variance • Inter-Quartile Range: IQR = Q3 - Q1 • First Quartile (Q1) is the median of the smaller half of data • Third Quartile (Q3) is the median of the larger half of data • With outliers or asymmetry, median and IQR are better, but we will use mean and SD more since most distributions we use (e.g. the normal distribution) are symmetric with no outliers
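The spread measures above can be sketched as follows. The data are hypothetical, and `quartiles` is an illustrative helper (not a library function) that follows the slide's "median of each half" definition. Note one assumption: `statistics.variance` divides by n - 1, the usual sample-variance convention, rather than by n.

```python
from statistics import median, stdev, variance

def quartiles(data):
    """Return (Q1, Q3) as the medians of the smaller and larger halves."""
    xs = sorted(data)
    half = len(xs) // 2  # the middle value is left out when n is odd
    return median(xs[:half]), median(xs[len(xs) - half:])

data = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical observations
q1, q3 = quartiles(data)
iqr = q3 - q1

print("variance =", variance(data))  # divides by n - 1
print("SD =", stdev(data))           # square root of the variance
print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```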

  9. Scatterplots of two variables • Positive association vs negative association • Some associations are not just positive or negative, but also appear to be linear • Correlation is a measure of the strength of linear relationship between variables X and Y • r near 1 or -1 means strong linear relationship • r near 0 means weak linear relationship • Negative r means negative association
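As a rough illustration of how r behaves, the sketch below computes the usual sample correlation coefficient by hand. The function name `correlation` and all data are assumptions made for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration.
r_up = correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])         # perfect positive line
r_down = correlation([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])       # perfect negative line
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])     # y = x^2, not linear

print(r_up, r_down, r_curve)
```

The last case is worth noting: y is completely determined by x, yet r is 0, because correlation only measures the strength of a linear relationship.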

  10. Linear Regression • Best fit line between X and Y: Y = a + b·X • The slope b: average change in the Y variable when the X variable increases by one • The intercept a: average value of the Y variable when the X variable is equal to zero • Regression equation is used to predict the response variable Y for a value of our explanatory variable X
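A minimal sketch of fitting and using the line Y = a + b·X, assuming "best fit" means the ordinary least-squares line. The helper name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx  # the fitted line passes through (mean of X, mean of Y)
    return a, b

# Hypothetical explanatory (xs) and response (ys) values.
xs = [1, 2, 3, 4]
ys = [3.1, 4.9, 7.2, 8.8]

a, b = fit_line(xs, ys)
predicted = a + b * 5  # predicted Y when X = 5
print("intercept a =", a, "slope b =", b, "prediction at X=5:", predicted)
```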

  11. Probability • Random process: outcome not known exactly, but have probability distribution of possible outcomes • Event: outcome of random process with prob. P(A) • Additive Rule for Disjoint Events: P(A or B) = P(A) + P(B) if A and B are disjoint • Multiplication Rule for Independent Events: P(A and B) = P(A) × P(B) if A and B are independent • Need to combine different rules (e.g. Lecture 8)
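The sketch below combines the two rules on a made-up example (not taken from Lecture 8): the probability of exactly one six in two rolls of a fair die. It then checks the answer by listing all 36 equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

p_six = Fraction(1, 6)
p_not_six = 1 - p_six

# Multiplication rule: the two rolls are independent.
p_six_then_other = p_six * p_not_six
p_other_then_six = p_not_six * p_six

# Additive rule: the two orders cannot both happen, so they are disjoint.
p_exactly_one_six = p_six_then_other + p_other_then_six

# Check by counting outcomes among all 36 equally likely pairs of rolls.
rolls = list(product(range(1, 7), repeat=2))
favourable = sum(1 for roll in rolls if roll.count(6) == 1)
p_by_counting = Fraction(favourable, len(rolls))

print(p_exactly_one_six, p_by_counting)
```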

  12. Probability and Random Variables • Conditional Probability: P(A | B) = P(A and B) / P(B) • Random variable: numerical outcome or summary of a random process • A discrete random variable has a finite (or countably infinite) number of distinct values • Continuous random variables can take an uncountable number of values
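A short worked example of conditional probability, using the standard definition P(A | B) = P(A and B) / P(B) on one roll of a fair die. The events and the helper `prob` are assumptions chosen for illustration.

```python
from fractions import Fraction

outcomes = range(1, 7)  # one roll of a fair die, all outcomes equally likely

def prob(event):
    """Probability of an event, given as a true/false function of the outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_even(o):   # event A
    return o % 2 == 0

def is_large(o):  # event B: the roll is greater than 3
    return o > 3

p_b = prob(is_large)
p_a_and_b = prob(lambda o: is_even(o) and is_large(o))
p_a_given_b = p_a_and_b / p_b

print("P(A) =", prob(is_even), "P(A | B) =", p_a_given_b)
```

Here P(A | B) differs from P(A), so these two events are not independent and the multiplication rule for independent events would not apply to them.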

  13. Discrete vs. Continuous RV’s • Probability histogram for distribution of discrete r.v. • Calculate probabilities by adding up bars of histogram • Density curve used for distribution of continuous r.v. • Calculate probabilities by integrating area under curve
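The contrast between "adding up bars" and "integrating area" can be sketched as below. Both distributions are hypothetical examples: the discrete variable is the sum of two fair dice, and the continuous variable has the made-up density f(x) = x/2 on [0, 2]. The `area` helper is a simple midpoint-rule approximation, not a library routine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Discrete: X = sum of two fair dice. Add up the histogram bars for X <= 4.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {total: Fraction(c, 36) for total, c in counts.items()}
p_discrete = sum(p for total, p in pmf.items() if total <= 4)

# Continuous: hypothetical density f(x) = x/2 on [0, 2], zero elsewhere.
def density(x):
    return x / 2 if 0 <= x <= 2 else 0.0

def area(f, lo, hi, steps=1000):
    """Approximate the area under f between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

p_continuous = area(density, 0, 1)  # P(X <= 1)
total_area = area(density, 0, 2)    # a density must have total area 1

print(p_discrete, p_continuous, total_area)
```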

  14. Linear Transformations of Variables • Same rules for both data and random variables: mean(a·X + c) = a·mean(X) + c; variance(a·X + c) = a²·variance(X); SD(a·X + c) = |a|·SD(X) • Adding constants does not change spread measures • Can also do combinations of more than one variable: if X and Y are variables and Z = a·X + b·Y + c, then mean(Z) = a·mean(X) + b·mean(Y) + c • If X and Y are also independent, then variance(Z) = a²·variance(X) + b²·variance(Y)
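These rules can be checked exactly on a small example. The sketch below assumes X and Y are two independent fair dice and picks arbitrary constants a = 2, b = -3, c = 1. The `mean` and `variance` helpers are illustrative; they treat every listed value as equally likely.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """Mean of a list of equally likely values."""
    return Fraction(sum(values), len(values))

def variance(values):
    """Average squared deviation from the mean."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

die = list(range(1, 7))  # X and Y: two independent fair dice (assumed)
a, b, c = 2, -3, 1       # arbitrary constants for illustration

# One variable: a*X + c
single = [a * x + c for x in die]
print(mean(single), a * mean(die) + c)            # the two means agree
print(variance(single), a**2 * variance(die))     # the two variances agree

# Two independent variables: Z = a*X + b*Y + c over all 36 equally likely pairs
z = [a * x + b * y + c for x, y in product(die, die)]
print(mean(z), a * mean(die) + b * mean(die) + c)
print(variance(z), a**2 * variance(die) + b**2 * variance(die))
```

Note that the constant c shifts the mean but never appears in the variance, and that the negative coefficient b still adds to the variance because it is squared.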

  15. The Normal Distribution • The Normal distribution has the shape of a “bell curve” with parameters μ and σ², denoted N(μ, σ²) • Standard Normal: μ = 0 and σ² = 1 • Normal distribution follows the 68-95-99.7 rule: • 68% of observations are between μ - σ and μ + σ • 95% of observations are between μ - 2σ and μ + 2σ • 99.7% of observations are between μ - 3σ and μ + 3σ • Have tables for any probability from the standard normal distribution • [Figure: density curves for N(0,1), N(2,1), N(-1,2) and N(0,2)]
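The 68-95-99.7 rule can be checked numerically with `statistics.NormalDist` from the Python standard library, which takes the mean and the standard deviation (not the variance). The values μ = 100 and σ = 15 are arbitrary choices for this sketch; the rule gives the same percentages for any normal distribution.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical parameters
dist = NormalDist(mu, sigma)

within = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean.
    within[k] = dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    print(f"within {k} SD: {within[k]:.4f}")
```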

  16. Standardization • For non-standard normal probabilities, need to transform to a standard normal distribution • If X has a N(μ, σ²) distribution, then we can convert to Z, which follows a N(0,1) distribution: Z = (X - μ)/σ • Can then calculate P(Z < k) using table • Reverse standardization: converting a standard normal Z into a non-standard normal X: X = σZ + μ • Practice makes perfect!
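A minimal sketch of both directions, with `statistics.NormalDist` standing in for the printed standard normal table. The parameters μ = 70, σ = 10 and the cut-offs 85 and 0.90 are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 70, 10          # hypothetical X ~ N(70, 10^2)
standard = NormalDist(0, 1)  # plays the role of the standard normal table

# Standardization: P(X < 85) = P(Z < (85 - mu) / sigma)
z = (85 - mu) / sigma
p_below_85 = standard.cdf(z)

# Reverse standardization: find x with P(X < x) = 0.90 via X = sigma*Z + mu
z_90 = standard.inv_cdf(0.90)
x_90 = sigma * z_90 + mu

print("z =", z, "P(X < 85) =", round(p_below_85, 4))
print("90th percentile of X =", round(x_90, 2))
```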

  17. Inference for Continuous Data • Continuous data is summarized by the sample mean x̄ • Sample mean is used as our estimate of the population mean, but how does the sample mean vary between samples? • [Diagram: repeated samples 1, 2, 3, ... of size n, drawn from a population with parameters μ and σ², each give a sample mean x̄; what is the distribution of these values?]

  18. Sampling Distribution of Sample Mean • The center of the sampling distribution of the sample mean is the population mean: mean(x̄) = μ • Over all samples, the sample mean will, on average, be equal to the population mean (no guarantees for 1 sample!) • The spread of the sampling distribution of the sample mean is SD(x̄) = σ/√n • As sample size increases, variance of the sample mean decreases! • Central Limit Theorem: if the sample size is large enough, then the sample mean x̄ has an approximately Normal distribution
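A small simulation can illustrate these facts. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and SD 1 (a skewed, clearly non-normal choice made only for this example), the sample size is n = 40, and the seed is fixed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the simulation is repeatable
mu, sigma = 1.0, 1.0    # mean and SD of an exponential population with rate 1
n = 40                  # size of each sample
num_samples = 5000      # number of repeated samples

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(num_samples)
]

center = mean(sample_means)
spread = stdev(sample_means)
print("center of sample means:", round(center, 3), "vs mu =", mu)
print("spread of sample means:", round(spread, 3), "vs sigma/sqrt(n) =",
      round(sigma / sqrt(n), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly right-skewed, which is the Central Limit Theorem at work.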

  19. Inference for Count Data • Goal for count data is to estimate the population proportion p • From a sample of size n, we can calculate two statistics: 1. sample count Y 2. sample proportion p̂ = Y/n • Use sample proportion as our estimate of population proportion p • Sampling Distribution of the Sample Proportion: how does the sample proportion change over different samples? • [Diagram: repeated samples 1, 2, 3, ... of size n, drawn from a population with parameter p, each give a sample proportion; what is the distribution of these values?]

  20. Sampling Distribution for Proportion • For small samples, use the Binomial distribution to calculate probabilities for the sample count or sample proportion • Definition of “small”: n·p < 10 or n·(1-p) < 10 • For large samples, we use the Normal approximation to the Binomial distribution for the sample count or sample proportion
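The small-versus-large decision can be sketched as below. The function names are hypothetical. The Normal approximation uses the standard Binomial mean n·p and standard deviation √(n·p·(1-p)), and it omits the continuity correction, which the lecture lists as not covered on the midterm.

```python
from math import comb, sqrt
from statistics import NormalDist

def is_small_sample(n, p):
    """The slide's definition of 'small': n*p < 10 or n*(1-p) < 10."""
    return n * p < 10 or n * (1 - p) < 10

def binomial_cdf(k, n, p):
    """Exact P(Y <= k) for a Binomial(n, p) sample count Y."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(Y <= k), without continuity correction."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k)

# Hypothetical small sample: n*p = 6 < 10, so use the exact Binomial.
print(is_small_sample(20, 0.3), binomial_cdf(4, 20, 0.3))

# Hypothetical large sample: n*p = n*(1-p) = 50, so the approximation applies.
exact = binomial_cdf(55, 100, 0.5)
approx = normal_approx_cdf(55, 100, 0.5)
print(is_small_sample(100, 0.5), exact, approx)
```

For a sample proportion, the same probabilities apply after converting the cut-off: P(p̂ ≤ 0.55) with n = 100 is P(Y ≤ 55).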

  21. Next Week - Lecture 11 • Chapter 6 • Good luck on midterm next Monday!
