
Correlation and Regression


megan

Presentation Transcript


  1. Correlation and Regression Basic Concepts

  2. An Example • We can hypothesize that the value of a house increases as its size increases. • Said differently, size and house value “covary” or “co-relate.” • Further, we can hypothesize that the relationship is a simple linear one, i.e., that house value increases by a constant amount for each unit increase in size. • Hence we can use the simple linear equation y = a + bx to describe the relationship.

  3. We Ask Two Questions… • Is there a relationship and how strong is it? • and • What is the relationship? • We answer the first with a new statistic, a “correlation” coefficient. • We answer the second with a linear regression model.

  4. Two Questions • We started with Correlation Monday. • We continue today with Regression.

  5. Terms • Independent and Dependent variables • Scatterplots • Correlation, correlation coefficient, r • Regression, regression coefficient, b • Regression, regression constant, a • Ordinary Least Squares (OLS) equation: y = a + bx + e

  6. Issues • Defining relationships • Nature of the relationship: for the moment, linear • Strength of the relationship (using r) • Direction of the relationship (using r and b) • Calculation of the relationship: y = a + bx + e

  7. Some useful websites • http://davidmlane.com/hyperstat/A60659.html • http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html

  8. Illustration • Case A. x= 2.5, y=2 • Case B. x=8, y = 7
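
One way to read this illustration, assuming the two cases are meant to anchor a straight line, is to solve y = a + bx for the line that passes through both points. The sketch below does that; the variable names are illustrative and not from the slides.

```python
# Cases A and B as given on the slide.
x_a, y_a = 2.5, 2.0
x_b, y_b = 8.0, 7.0

# Slope b: change in y per unit change in x.
b = (y_b - y_a) / (x_b - x_a)

# Constant a: the value of y where the line crosses x = 0.
a = y_a - b * x_a

print(f"y = {a:.3f} + {b:.3f}x")
```

With only two cases the line fits both exactly, so there is no error term to estimate. Once there are more than two cases, a single line generally cannot pass through all of them.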

  9. Linear Trend

  10. What if there are lots of data points?

  11. With more data points, how do we summarize the relationships in the data?

  12. Solution: Least Squares Regression, The Best Linear Fit

  13. Some Theory • Knowing nothing else, the best estimate of a variable is its mean.
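
The claim about the mean can be checked numerically if "best" is taken to mean "smallest sum of squared deviations", which is the criterion least squares uses. The house values below are hypothetical numbers made up for illustration, not data from the presentation.

```python
from statistics import mean

# Hypothetical house values in thousands of dollars (illustrative only).
value = [150, 180, 210, 260, 300]

def sum_squared_deviations(guess, values):
    """Sum of squared deviations of the values from a single guess."""
    return sum((v - guess) ** 2 for v in values)

value_mean = mean(value)

# Compare the mean against a few other single-number guesses.
candidates = [value_mean - 20, value_mean - 5, value_mean,
              value_mean + 5, value_mean + 20]
for guess in candidates:
    print(guess, sum_squared_deviations(guess, value))

best_guess = min(candidates, key=lambda g: sum_squared_deviations(g, value))
```

Among the candidates tried, the mean gives the smallest sum of squared deviations. The regression model improves on this baseline by letting the estimate vary with x.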

  14. The Regression Model does better… • Deviation of a case from the mean of y = yi – ymean

  15. A Regression equation… • Measures the nature of the relationship between x and y using a linear model • Measures the direction of the relationship • An accompanying statistic, for the time being r, measures the strength of the relationship.

  16. Understanding the Improvement, measuring the deviations from the mean

  17. More Terms • Yi – the value of y for a particular case • Y mean – mean value of y • Y hat – the predicted value of y, written as y with a ^ above it, so ŷ • (Yi – Ymean) = total deviation of Yi from Y mean • (Yhat – Ymean) = explained deviation of Yi from Y mean • (Yi – Yhat) = unexplained deviation of Yi from Y mean
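
The three deviations can be computed case by case to see that total = explained + unexplained. The sketch below uses hypothetical house sizes and values (made up for illustration) and the standard closed-form least squares estimates of a and b; all names are illustrative.

```python
# Hypothetical data: size in thousands of square feet, value in
# thousands of dollars (illustrative only).
size = [1.0, 1.5, 2.0, 2.5, 3.0]
value = [150, 180, 210, 260, 300]

n = len(size)
x_mean = sum(size) / n
y_mean = sum(value) / n

# Standard least squares estimates of the slope b and the constant a.
b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(size, value))
     / sum((x - x_mean) ** 2 for x in size))
a = y_mean - b * x_mean

y_hat = [a + b * x for x in size]

total = [y - y_mean for y in value]                    # Yi - Ymean
explained = [yh - y_mean for yh in y_hat]              # Yhat - Ymean
unexplained = [y - yh for y, yh in zip(value, y_hat)]  # Yi - Yhat

for t, e, u in zip(total, explained, unexplained):
    print(f"total {t:7.1f} = explained {e:7.1f} + unexplained {u:7.1f}")

# The same split holds for the sums of squares under least squares.
ss_total = sum(t ** 2 for t in total)
ss_explained = sum(e ** 2 for e in explained)
ss_unexplained = sum(u ** 2 for u in unexplained)
print(ss_total, ss_explained, ss_unexplained)
```

The share ss_explained / ss_total is the proportion of the variation in y that the line accounts for. In bivariate regression it equals r squared.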

  18. Bivariate Regression • Relationships are modeled using the equation, y = a + bx + e • Translation: The values of an interval level dependent variable, y, can be “predicted” or “modeled” by adding a constant, a, to the product of a slope coefficient, b, and the value of the independent variable, x, plus an error term, e.

  19. Estimating the Equation, y = a + bx + e • The regression equation is calculated by finding the equation that minimizes the sum of the squared deviations between the data points, the y’s, and the predicted y’s, also called y hat.
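
A minimal sketch of this idea, assuming the standard closed-form least squares solution (b is the sum of cross-deviations over the sum of squared x deviations, and a = ymean – b·xmean). The data are hypothetical, and `fit_ols` and `sum_squared_errors` are illustrative helper names, not functions from any library.

```python
def fit_ols(x, y):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b

def sum_squared_errors(a, b, x, y):
    """Sum of squared deviations between each y and its predicted y hat."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: size in thousands of square feet, value in
# thousands of dollars (illustrative only).
size = [1.0, 1.5, 2.0, 2.5, 3.0]
value = [150, 180, 210, 260, 300]

a, b = fit_ols(size, value)
fitted_sse = sum_squared_errors(a, b, size, value)
print(f"y hat = {a:.1f} + {b:.1f}x, sum of squared errors = {fitted_sse:.1f}")

# Nudging either coefficient away from the fitted value increases the error.
nudged_sse = [
    sum_squared_errors(a + da, b + db, size, value)
    for da, db in [(5, 0), (-5, 0), (0, 5), (0, -5)]
]
print(nudged_sse)
```

Every nudged line has a larger sum of squared errors than the fitted one, which is what "best linear fit" means under the least squares criterion.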

  20. Correlation Coefficient: r • A measure of the strength of a linear relationship between two interval variables, x and y • Ranges from –1 to +1 • The higher the absolute value of r (i.e., the closer to –1 or +1), the stronger the relationship between x and y

  21. Correlation Coefficient calculation • r = Covariance of x and y divided by the product of the standard deviation of x and the standard deviation of y • Covariance is the sum, over all cases, of the products of each case's deviations from the mean of x and the mean of y, divided by N.
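
The calculation can be followed step by step in a short sketch. It assumes the population form throughout (dividing by N in both the covariance and the standard deviations, so the N's cancel consistently) and uses hypothetical data made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical data: size in thousands of square feet, value in
# thousands of dollars (illustrative only).
size = [1.0, 1.5, 2.0, 2.5, 3.0]
value = [150, 180, 210, 260, 300]

n = len(size)
x_mean = mean(size)
y_mean = mean(value)

# Covariance: sum of the products of the deviations, divided by N.
covariance = sum((x - x_mean) * (y - y_mean)
                 for x, y in zip(size, value)) / n

# pstdev is the population standard deviation (it also divides by N).
r = covariance / (pstdev(size) * pstdev(value))

print(f"covariance = {covariance:.2f}, r = {r:.4f}")
```

Using the sample form instead (dividing by N – 1 in both the covariance and the standard deviations) gives the same r, because the divisors cancel.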

  22. Equations...

  23. Calculating a and b
