Chapter 2 Scatter plots, Correlation, Linear Regression, Inferences for Regression

By: Tasha Carr, Lyndsay Gentile, Darya Rosikhina, Stacey Zarko

Presentation Transcript


  1. Chapter 2: Scatter plots, Correlation, Linear Regression, Inferences for Regression • By: Tasha Carr, Lyndsay Gentile, Darya Rosikhina, Stacey Zarko

  2. Scatter plots • Show the relationship between two quantitative variables measured on the same individuals • Look at: • Direction: positive, negative, or none • Form: linear (straight) or curved • Strength: little scatter means a strong association; great scatter means a weak association • Outliers: check for points that fall outside the overall pattern

  3. Correlation • Measures the direction and strength of the linear relationship between two quantitative variables • Usually written as r, the correlation coefficient • Not resistant

  4. Correlation • Rules: • r does not change if you switch x and y • Both variables must be quantitative • r does not change when we change units of measurement • Positive r shows positive association, negative r shows negative association • r is always between -1 and 1 • Values near 0 show a weak linear relationship • Strength of the linear relationship increases as r moves toward -1 or 1 (at exactly -1 or 1, all points lie on a straight line) • Not resistant, so outliers can change the value • Bad measure for curved relationships
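
The rules above can be checked numerically. The following is a minimal sketch in plain Python; the function name `correlation` and the small data set are made up for illustration and do not come from the slides.

```python
import math

def correlation(x, y):
    """Pearson correlation r for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (an assumption, not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_swapped = correlation(y, x)                        # switch x and y
r_rescaled = correlation([2.54 * a for a in x], y)   # change the units of x
r_outlier = correlation(x + [6], y + [-10])          # add a single outlier

print(round(r, 4))           # 0.7746
print(round(r_swapped, 4))   # same as r
print(round(r_rescaled, 4))  # same as r
print(r_outlier < 0)         # True: one outlier flips the sign of r
```

Swapping the variables or rescaling the units leaves r unchanged, while one extreme point is enough to turn a positive correlation negative, which is what "not resistant" means in practice.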

  5. Least-Squares Regression • Makes the sum of the squares of the vertical distances of the data points from the line as small as possible (not resistant) • Ŷ = b0 + b1 x • b1 = slope • b1 = r (sy / sx) • Amount by which the predicted y changes when x increases by one unit • b0 = y-intercept • Predicted value of y when x = 0 • b0 = (y-bar) - b1 (x-bar) • Extrapolation: making predictions outside the range of the x-values in the data; often inaccurate
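
As a sketch of how the slope and intercept formulas above fit together, the snippet below computes b1 = r (sy / sx) and b0 = (y-bar) - b1 (x-bar) directly. The helper name `least_squares_line` and the data are hypothetical, chosen only for illustration.

```python
import math

def least_squares_line(x, y):
    """Return (b0, b1) using b1 = r * (sy / sx) and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    r = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up illustrative data (an assumption, not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
predicted_at_mean = b0 + b1 * (sum(x) / len(x))

print(round(b0, 3), round(b1, 3))   # 2.2 0.6
print(round(predicted_at_mean, 3))  # 4.0, the mean of y
```

The last line illustrates a consequence of the intercept formula: the least-squares line always passes through the point (x-bar, y-bar).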

  6. Least-Squares Regression • A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes • Based on correlation • Used to predict the value of y for a given value of x • R2 = coefficient of determination (the square of r) • Interpretation: "In the model, R2 (as a percent) of the variability in the y-variable is accounted for by variation in the x-variable."
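
One way to see what R2 measures is to compute it as 1 minus the ratio of leftover (residual) variation to total variation in y. This is a minimal sketch on made-up data; `r_squared` is a hypothetical helper name.

```python
def r_squared(x, y):
    """Coefficient of determination for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    sst = sum((b - y_bar) ** 2 for b in y)                     # total variation in y
    return 1 - sse / sst

# Made-up illustrative data (an assumption, not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = r_squared(x, y)
print(f"In the model, {r2:.0%} of the variability in y "
      f"is accounted for by variation in x.")
```

For this data the sentence reports 60%, which matches the square of the correlation (r is about 0.7746).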

  7. Residuals • The LSRL minimizes the sum of the squared residuals • Difference between actual and predicted values • Observed – Expected • Actual – Guess • e = Y – Ŷ • Positive residual: the line underestimates the actual value • Negative residual: the line overestimates the actual value
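
The definition e = Y – Ŷ can be sketched as follows. The data are made up, and the line Ŷ = 2.2 + 0.6x is the least-squares line for that particular made-up data set, not a result from the slides.

```python
# Made-up illustrative data and its least-squares line (assumptions).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

predicted = [b0 + b1 * a for a in x]
residuals = [observed - guess for observed, guess in zip(y, predicted)]

for a, e in zip(x, residuals):
    if e > 0:
        verdict = "line underestimates"
    elif e < 0:
        verdict = "line overestimates"
    else:
        verdict = "exact"
    print(f"x={a}  residual={e:+.1f}  ({verdict})")

# For a least-squares line the residuals sum to zero (up to rounding).
print(abs(sum(residuals)) < 1e-9)  # True
```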

  8. Residual Plot • A scatter plot of the regression residuals against the explanatory variable or the predicted values • Shows whether a linear model is appropriate • If there is no apparent shape or pattern and the residuals are randomly scattered, the linear model is a good fit • If there is a curve or horn (fan) shape, or a big change in scatter, the linear model is not a good fit
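
To illustrate the "curve" warning sign without a plotting library, the sketch below fits a line to deliberately curved, made-up data (y = x squared) and prints the residuals in order of x. The helper `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1

# Deliberately curved made-up data (an assumption, not from the slides).
x = [0, 1, 2, 3, 4]
y = [a ** 2 for a in x]

b0, b1 = fit_line(x, y)
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]

# A crude text version of a residual plot: residual against x.
for a, e in zip(x, residuals):
    print(f"x={a}  residual={e:+.1f}")
```

The residuals run positive, negative, negative, negative, positive: a U shape rather than random scatter, which suggests a straight line is not a good model for this data.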

  9. Lurking Variables • A lurking variable has an important effect on the relationship among the variables in a study but is not included among the variables studied • Can make a correlation or regression misleading • Outlier: a point that lies outside the overall pattern of the other observations • Influential point: a point whose removal would markedly change the regression line (often an outlier in the x-direction)
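
The effect of an influential point can be sketched by fitting the line with and without one point that is far out in the x-direction. The data, the extra point (15, 2), and the helper name `slope` are all made up for illustration.

```python
def slope(x, y):
    """Least-squares slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))

# Made-up illustrative data (an assumption, not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope_without = slope(x, y)
slope_with = slope(x + [15], y + [2])  # add one point far out in x

print(round(slope_without, 3))  # 0.6
print(slope_with < 0)           # True: the single point reverses the slope
```

Removing that one point changes the line from downward-sloping to upward-sloping, which is why it counts as influential.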

  10. Causation • An association between an explanatory and a response variable does not show causation (a cause-and-effect relationship), even if the correlation is high • A correlation based on averages is usually higher than a correlation based on data from individuals

  11. Inference for Regression • Used to test whether there is a linear association between two quantitative variables in the population, based on sample data • To test for an association we check β1, the slope of the population regression line • If no linear association exists, β1 should be zero

  12. Inference for Regression • Hypotheses: • H0 : β1 = 0. There is no association. • HA : β1 ≠ 0. There is an association. • Conditions: • Straight Enough: Check for no curves in the scatter plot. • Independence: Data are assumed independent. • Equal Variance: Check the residual plot for changes in spread. • Nearly Normal: Create a histogram or Normal probability plot of the residuals. • If all conditions are met, use a Student's t-model for a test on the slope of a regression model.

  13. Inference for Regression • Mechanics • df = n – 2 • t = (b1 – 0) / SE(b1) • P-value = 2P(tₙ₋₂ > |t|) • [Figure: regression output with b0, b1, SE(b1), and the P-value labeled]

  14. Inference for Regression • Conclusion • If the P-value is less than alpha, reject the null hypothesis • If we reject H0, there is evidence of an association • If the P-value is greater than alpha, we fail to reject the null hypothesis • If we fail to reject H0, there is not enough evidence of an association
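
The mechanics and conclusion steps for the slope test can be sketched end to end in plain Python. Everything here is illustrative: the data are made up, the function names are hypothetical, alpha = 0.05 is an assumed significance level, and the P-value is approximated by integrating the t density with Simpson's rule rather than by a statistics library.

```python
import math

def slope_t_test(x, y):
    """Return (b1, SE(b1), t, df) for testing H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    df = n - 2
    s_e = math.sqrt(sse / df)        # standard deviation of the residuals
    se_b1 = s_e / math.sqrt(sxx)     # standard error of the slope
    t = (b1 - 0) / se_b1
    return b1, se_b1, t, df

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Approximate 2 * P(T > |t|) with Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3     # P(0 < T < |t|)
    return 1 - 2 * central_half

# Made-up illustrative data (an assumption, not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha = 0.05                         # assumed significance level

b1, se_b1, t, df = slope_t_test(x, y)
p_value = two_sided_p_value(t, df)
reject_h0 = p_value < alpha

print(f"b1={b1:.3f}  SE(b1)={se_b1:.3f}  t={t:.3f}  df={df}  P-value={p_value:.3f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

With only five points the P-value comes out near 0.12, so at alpha = 0.05 this sketch fails to reject H0 even though the sample slope is positive: there is not enough evidence of an association.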
