
Regression and Correlation

CSLU 2850.Lo1, Spring 2008. Cameron McInally (mcinally@fordham.edu), Fordham University. May contain work from the Creative Commons.


Presentation Transcript


  1. Regression and Correlation CSLU 2850.Lo1 Spring 2008 Cameron McInally mcinally@fordham.edu Fordham University May contain work from the Creative Commons.

  2. Regression and Correlation • In this lecture, we will cover: • Linear Regression: Finding the one line that best fits a set of data points on a Cartesian plane. • Correlation: Measuring the strength of the relationship between two variables.

  3. Regression and Correlation • Linear Regression • When we plot two variables against each other (i.e. on an XY graph), the points usually do not fall exactly on a straight line. • Linear regression attempts to find the straight line that best estimates the relationship between the two variables.

  4. Regression and Correlation • Linear Regression • There are two important parts: • Fitted Regression Line: The line that best fits the data points on the graph. • Regression Equation: The equation that generates the Fitted Regression Line.

  5. Regression and Correlation • Fitted Regression Line • If the data points fall approximately along a straight line, then we can fit a line to them. • Some points may be above the line and others may be below the line. • In order to find the best-fit line, we use a Regression Equation.

  6. Regression and Correlation • Regression Equation • Slope-Intercept Form: $y = mx + b$ • In grade school we learn Slope-Intercept Form for plotting a line. Here, $m$ is the slope of the line and $b$ is where the line crosses the y-axis.

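  7. Regression and Correlation • Regression Equation • In Statistics and Excel, the regression equation for the fitted regression line will be of the form: $y = a + bx$ • Both forms describe exactly the same line: the intercept is now called $a$ (instead of $b$) and the slope is now called $b$ (instead of $m$).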

  8. Regression and Correlation • Regression Equation • Terms for the equation: • y: the dependent variable; • x: the independent (or predictor) variable; • a: a coefficient called the intercept or constant term; • b: a coefficient called the slope.
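
As a minimal sketch of the two notations, the Python snippet below evaluates the same line in both forms. The coefficient values and the function names `predict` and `slope_intercept` are hypothetical, chosen only for illustration.

```python
def predict(x, a, b):
    """Regression form y = a + b*x (a: intercept, b: slope)."""
    return a + b * x


def slope_intercept(x, m, b):
    """Grade-school form y = m*x + b (m: slope, b: y-intercept)."""
    return m * x + b


# Hypothetical coefficients, chosen only for illustration.
intercept, slope = 2.2, 0.6

for x in [0, 1, 5]:
    # Same line, two notations: both calls return the same value for every x.
    print(x, predict(x, intercept, slope), slope_intercept(x, slope, intercept))
```

Only the letters change between the two forms; the intercept and slope play the same roles in each.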

  9. Regression and Correlation • Residuals • The black vertical lines in the plot represent the error between a particular data point and the line we have chosen. • We call these errors the residuals. Each residual is the difference between an observed value of the dependent variable and the value predicted by the line.
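
A small Python sketch of residuals as observed minus predicted values. The data set and the line (intercept 2.2, slope 0.6) are made up for illustration and are not taken from the lecture.

```python
# Hypothetical data and a hypothetical candidate line y = a + b*x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6


def residuals(xs, ys, a, b):
    """Observed y minus the y predicted by the line, one value per data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


res = residuals(xs, ys, a, b)
for x, y, r in zip(xs, ys, res):
    # A positive residual means the point lies above the line, negative means below.
    print(f"x={x}  observed={y}  residual={r:+.2f}")
```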

  10. Regression and Correlation • Fitting the Regression Line • Assuming our data fits a linear model: $y = \alpha + \beta x + \varepsilon$ • Where, • α is the true intercept; • β is the true slope; • ε is the error term.
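
One way to get a feel for the linear model is to generate data from it. The sketch below assumes normally distributed errors with mean 0; the function name `simulate` and all parameter values are hypothetical.

```python
import random


def simulate(xs, alpha, beta, sigma, seed=0):
    """Draw y = alpha + beta*x + error, with error ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [alpha + beta * x + rng.gauss(0, sigma) for x in xs]


xs = [1, 2, 3, 4, 5]
ys = simulate(xs, alpha=2.0, beta=0.5, sigma=0.3)
for x, y in zip(xs, ys):
    # Each y scatters around the true line 2.0 + 0.5*x.
    print(x, round(y, 3))
```

In real data the true intercept and slope are unknown; here they are known only because the data were generated from them.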

  11. Regression and Correlation • Least Squares Method • When fitting the line, we do not know the true intercept and slope. So, we must estimate them. • To get estimates for α and β, we find the values of a and b that minimize the sum of squared residuals: $\sum_{i=1}^{n} \left(y_i - (a + b x_i)\right)^2$
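
The quantity being minimized can be sketched in a few lines of Python. The data and the two candidate lines are hypothetical; the point is only that a better-fitting line gives a smaller sum.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Sum over all points of (observed y - predicted y) squared."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data and two hypothetical candidate lines.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sse_first = sum_squared_residuals(xs, ys, a=2.2, b=0.6)
sse_second = sum_squared_residuals(xs, ys, a=2.0, b=0.5)

# The line with the smaller sum fits these points better.
print(sse_first, sse_second)
```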

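  12. Regression and Correlation • Least Squares Estimates • To find the estimates for a and b: $b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$, where $\bar{x}$ and $\bar{y}$ are the sample means of x and y.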
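
A minimal Python sketch of the usual closed-form least squares estimates: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept is the mean of y minus the slope times the mean of x. The data and the function name `least_squares` are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b): the intercept and slope that minimize squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

The sketch assumes at least two distinct x values; otherwise `s_xx` is zero and the slope is undefined.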

  13. Regression and Correlation • Coefficient of determination • Also called the $R^2$ value. • Measures the percentage of the variation in the dependent variable that can be explained by changes in the independent variable. • If $R^2 = 0.61803$, then 61.803% of the variation in the dependent variable can be explained by changes in the independent variable.
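
One common way to compute the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that definition; the data and the fitted line (intercept 2.2, slope 0.6) are hypothetical.

```python
def r_squared(xs, ys, a, b):
    """Share of the variation in y explained by the line y = a + b*x."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                    # total
    return 1 - ss_res / ss_tot


# Hypothetical data with its least squares line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys, a=2.2, b=0.6)
print(f"R^2 = {r2:.3f}, i.e. {100 * r2:.1f}% of the variation is explained")
```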

  14. Regression and Correlation • Checking the Regression Model • When we perform a regression on a set of data, we are making 4 important assumptions: • The straight-line model is correct; • The error term, ε, is normally distributed with mean 0; • The errors have constant variance; • The errors are independent of each other;

  15. Regression and Correlation • Straight-line assumption • Plot the observed data points, with the predicted data points, on the same scatter-plot. • Do the observed data points appear to follow the straight line? Do the data points appear curved?

  16. Regression and Correlation • Normal Distribution of the Residuals • To test this assumption, create a normal probability plot of the residuals. • If the residuals follow a normal distribution, the points should fall close to a straight line on the normal probability plot.
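
The coordinates of a normal probability plot can be sketched with Python's standard library. The plotting positions `(i - 0.5) / n` are one common convention and an assumption here; Excel or a textbook may use a slightly different one. The residual values are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with the normal quantile expected at its rank."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Hypothetical residuals from a fitted line.
points = normal_plot_points([-0.8, 0.6, 1.0, -0.6, -0.2])
for q, r in points:
    # Plot r against q; roughly normal residuals give a roughly straight line.
    print(f"{q:+.3f}  {r:+.2f}")
```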

  17. Regression and Correlation • Constant variance assumption • Next, we will look at the assumption of constant variance in the residuals. To view this graphically, plot the residual values against the predicted values. • If the spread of the residuals grows or shrinks at any point along the line, then there is reason to doubt the assumption of constant variance. • Outliers may have a significant impact on this check.
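
As a rough numeric companion to the plot, the sketch below compares the variance of the residuals for the lower half and the upper half of the predicted values. This split is an informal check assumed for illustration, not a formal test and not the lecture's method; the data are made up.

```python
from statistics import pvariance


def spread_by_half(predicted, residuals):
    """Variance of residuals for the lower and upper half of predicted values."""
    pairs = sorted(zip(predicted, residuals))  # order by predicted value
    mid = len(pairs) // 2
    low = [r for _, r in pairs[:mid]]
    high = [r for _, r in pairs[mid:]]
    return pvariance(low), pvariance(high)


# Hypothetical case where the residuals fan out as predictions grow.
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
low_var, high_var = spread_by_half(predicted, resid)

# A large gap between the two values hints at non-constant variance.
print(low_var, high_var)
```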

  18. Regression and Correlation • Independent errors • This final assumption should only be considered when there is a defined order of the observations. • We would like to make sure that the residual of a data point is not influenced by the surrounding data points.
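
One standard numeric check for this, not mentioned on the slide, is the Durbin-Watson statistic: the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order dependence, values well below 2 suggest neighbouring residuals move together, and values well above 2 suggest they alternate. The residual sequences below are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(r ** 2 for r in residuals)
    return numerator / denominator


# Hypothetical residuals that drift together (runs of the same sign).
dw_runs = durbin_watson([1, 1, 1, -1, -1, -1])
# Hypothetical residuals that flip sign at every step.
dw_alternating = durbin_watson([1, -1, 1, -1])

print(dw_runs, dw_alternating)
```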

  19. Regression and Correlation • Correlation and Causation • Correlation: indicates the relationship between two variables without assuming that a change in one causes a change in the other. • Causation: indicates the relationship between two variables where a change in one causes a change in the other.

  20. Regression and Correlation • Correlation • Expresses the strength of a relationship as a number between -1 and 1. • Positive Correlation: indicates that an increase in one variable is associated with an increase in the other variable. • Negative Correlation: indicates that an increase in one variable is associated with a decrease in the other. • Zero Correlation: indicates that there is no linear relationship between the variables.

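  21. Regression and Correlation • Pearson Correlation Coefficient • The most often used measure of correlation: $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$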
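
A minimal Python sketch of the Pearson coefficient in its usual form: the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. The data sets are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: a noisy upward trend and an exact downward line.
r_noisy = pearson([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_line = pearson([1, 2, 3], [6, 4, 2])
print(round(r_noisy, 4), round(r_line, 4))
```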

  22. Regression and Correlation • Pearson Correlation Coefficient • Notice that the sign of this coefficient and the sign of the slope of the line are the same. • The slope can be any real number, but the coefficient must be between -1 and 1. • Perfect Fit: A value of 1 or -1 means that all the data points fall exactly on the line, i.e. all the residuals are 0. • Does not work well with outliers. • Does not work well with a curved relationship.
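
The link between the slope and the coefficient can be sketched by computing both from the same sums: they share a numerator, so they share a sign, but only the coefficient is rescaled into the range -1 to 1. The function name `slope_and_r` and the data are hypothetical.

```python
from math import sqrt


def slope_and_r(xs, ys):
    """Least squares slope and Pearson coefficient from the same three sums."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    # Same numerator s_xy, different (always positive) denominators.
    return s_xy / s_xx, s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # hypothetical data

slope_up, r_up = slope_and_r(xs, ys)
# Multiplying y by 100 changes the slope but not the coefficient.
slope_big, r_big = slope_and_r(xs, [100 * y for y in ys])
# Flipping the sign of y flips the sign of both.
slope_down, r_down = slope_and_r(xs, [-y for y in ys])

print(slope_up, r_up)
print(slope_big, r_big)
print(slope_down, r_down)
```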

  23. Regression and Correlation • Spearman Correlation Coefficient • Works well with outliers. • Works well with a curved (non-linear) relationship, provided it is consistently increasing or consistently decreasing. • We will work with this coefficient in the lab.
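
The Spearman coefficient works on ranks rather than raw values, which is why a single extreme value or a curved but steadily increasing relationship does not hurt it. The sketch below uses the shortcut formula 1 - 6 Σd² / (n(n² - 1)), which assumes there are no tied values; the data are hypothetical, and the lab's Excel procedure may differ.

```python
def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


xs = [1, 2, 3, 4, 5]
rho_curved = spearman(xs, [1, 8, 27, 64, 125])   # curved but always increasing
rho_outlier = spearman(xs, [1, 2, 3, 4, 100])    # one extreme value
rho_reversed = spearman(xs, [5, 4, 3, 2, 1])     # always decreasing
print(rho_curved, rho_outlier, rho_reversed)
```

With tied values the shortcut is no longer exact; a common alternative is to assign average ranks and compute the Pearson coefficient on them.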

  24. Regression and Correlation • Homework (Always Due in One Week) • Read Chapter 8. • Complete Chapter 8, pg. 326: 1, 2, 4
