Correlation & Linear Regression • Not the same, but are related • Linear regression: • line that best predicts Y from X • Use when one of the variables is controlled • Correlation: • quantifies how X and Y vary together • Use when both X and Y are measured
Correlation & Linear Regression • 3 characteristics of a relationship: • Direction • Positive (+) • Negative (-) • Degree of Association • Between -1 and +1 • Absolute values signify strength • Form • Linear • Non-linear
Linear Regression • If two variables are linearly related, it is possible to develop a simple equation to predict one from the other • The outcome (dependent) variable is designated Y • The predictor (independent) variable is designated X
The Linear Equation • General form: Y = a + bX • Where: • a = intercept • b = slope • X = predictor • Y = outcome • Can use this equation to predict Y for any given value of X • a and b are constants in a given line; X and Y change
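The linear equation above can be sketched in a few lines of code. This is purely illustrative; the constants a = 2.0 and b = 0.5 are made-up values, not from any dataset in the slides.

```python
# A minimal sketch of the linear equation Y = a + bX.
# a (intercept) and b (slope) are made-up constants for illustration;
# they stay fixed for a given line while X and Y vary.

def predict(x, a=2.0, b=0.5):
    """Return the predicted outcome Y for a given predictor X."""
    return a + b * x

print(predict(0))   # at X = 0 the prediction equals the intercept: 2.0
print(predict(10))  # 2.0 + 0.5 * 10 = 7.0
```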
The Linear Equation • Same a, different b’s...
The Linear Equation • Same b, different a’s...
The Linear Equation • Different a’s and b’s...
Slope and Intercept • When there is no linear association (r = 0), the regression line is horizontal (b = 0)
Slope and Intercept • When the correlation is perfect (r = ±1), all the points fall exactly along a straight line
Slope and Intercept • When there is some linear association (0 &lt; |r| &lt; 1), the regression line fits as close to the points as possible
Where did this line come from? • It is a straight line drawn through a scatterplot to summarise the relationship between X and Y • It is the line that minimises the sum of squared deviations, Σ(Y − Y′)² • The deviations (Y − Y′) are called “residuals”
Regression Lines • Minimising the squared vertical distances, or “residuals”
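The least-squares slope and intercept can be computed directly from the deviation sums. A minimal sketch, using made-up data (not from the slides):

```python
# Least-squares fit by hand: b = S_xy / S_xx, a = ȳ − b·x̄,
# where S_xy and S_xx are sums of deviation products.
# xs and ys are made-up illustrative data.

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope: covariation of X and Y over variation of X
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
# intercept: the line passes through the point (x̄, ȳ)
a = mean_y - b * mean_x

print(a, b)  # intercept and slope of the best-fitting line
```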
Regression: Analyzing the “fit” • How well does the regression line describe the data? • Assessing “fit” relies on analysis of residuals • Conduct an ANOVA to test the null hypothesis that an increase in X is not associated with a change (positive or negative) in the value of Y
Regression ANOVA • Need to partition the variability • Total variability of Y = variability explained by the regression line + unexplained variability
Regression ANOVA: Example • Fill in the ANOVA table (Source | SS | df | MS | F):
Regression ANOVA: Example • Fcrit(1, 4) = 7.71 (from table) • Reject H0: an increase in X is associated with a change in Y
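The ANOVA partition (total variability = explained + unexplained) can be computed directly from a fitted line. A sketch with made-up data, not the slide's table:

```python
# Regression ANOVA partition: SS_total = SS_regression + SS_residual,
# with F = MS_reg / MS_res on (1, n − 2) degrees of freedom.
# xs and ys are made-up illustrative data.

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x
preds = [a + b * x for x in xs]

ss_total = sum((y - mean_y) ** 2 for y in ys)          # total variability of Y
ss_reg = sum((p - mean_y) ** 2 for p in preds)         # explained by the line
ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained (residual)

# F-ratio: explained mean square over residual mean square
f = (ss_reg / 1) / (ss_res / (n - 2))
print(ss_total, ss_reg + ss_res, f)
```

The observed F would then be compared to the critical value from an F table with (1, n − 2) degrees of freedom.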
SPSS Linear Regression
Linear Regression • Linear regression uses one or more independent variables in an equation to best predict the value of the dependent variable • From the menus choose: Analyze ▸ Regression ▸ Linear
Linear Regression • Select one dependent variable (numeric) and one or more independent variables (numeric) • The output will compute an ANOVA telling you whether the overall regression is significant • It will also calculate a value for the slope and intercept (coefficients)
Example • Perform a regression analysis with “Years Since PhD” as the independent variable and “Publications” as the dependent variable • What is the equation of the straight line? • Plot the data and draw a regression line through the scatterplot
Results • [SPSS output: coefficients table showing the intercept and slope]
Example • [Scatter plot with regression line]
Multiple Regression – Example As cheese ages, various chemical processes take place that determine the taste of the final product. The dataset “Cheese” contains concentrations of various chemicals in 30 samples of mature cheddar cheese, and a subjective measure of taste for each sample. Use a multiple regression analysis to evaluate the effect of these three chemicals on the taste of cheese.
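Outside SPSS, the same multiple regression can be sketched with ordinary least squares. The column values below are hypothetical stand-ins, not the actual "Cheese" data:

```python
# Sketch of multiple regression via ordinary least squares:
# taste = b0 + b1*x1 + b2*x2 + b3*x3.
# All numbers are hypothetical stand-ins for the chemical
# concentrations and taste scores in the "Cheese" dataset.
import numpy as np

# rows: samples; columns: three chemical concentrations
X = np.array([[4.5, 3.1, 1.0],
              [5.2, 3.9, 1.2],
              [4.9, 5.0, 1.5],
              [5.8, 6.2, 1.7],
              [6.1, 7.5, 2.0]])
y = np.array([12.0, 21.0, 30.0, 39.0, 48.0])  # taste scores

# prepend a column of ones so the first coefficient is the intercept b0
X1 = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(coefs)  # [b0, b1, b2, b3]
```

SPSS reports the same coefficients (plus the ANOVA and significance tests) in its Linear Regression output.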
Correlation • Statistical technique that measures and describes the degree of linear relationship between two variables
Pearson’s r • Value ranging from -1 to +1 • Indicates strength and direction of the linear relationship • Absolute value indicates strength • +/- indicates direction
Pearson’s r • r is an estimate of the population correlation, ρ (rho)
Example • What is the correlation value? • r = 0.866
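Pearson's r is the covariation of X and Y divided by the square root of the product of their separate variations. A sketch with made-up data (not the slide's example, which gives r = 0.866):

```python
# Pearson's r: r = S_xy / sqrt(S_xx * S_yy), where S_xy, S_xx, S_yy
# are sums of deviation products. xs and ys are made-up data.
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)
print(round(r, 4))  # 0.7746: a fairly strong positive linear relationship
```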
Some issues with r • Outliers have strong effects • Restriction of range can suppress or augment r • Correlation is not causation • No linear correlation does not mean no association
Outliers • Outliers can strongly affect the value of r
Restricted Range • The relationship you see between X and Y may depend on the range of X • E.g., the size of a child’s vocabulary has a strong positive association with the child’s age, but if all the children in your data set are in the same grade at school, you may not see much association
Common Causes • Two variables might be associated because they share a common cause • There is a positive correlation between ice cream sales and drownings – This is because they both occur more often in summer, not because one is causing the other
Non-Linearity • Some variables are not linearly related, though a relationship obviously exists
Non-Linearity • Even though we find a significant correlation, the relationship may not be linear • [Figure: four sets of data with the same correlation of 0.816]
r-squared • r2 is the Coefficient of Determination • It is the amount of covariation compared to the amount of total variation • The percent of total variance that is shared variance • E.g. If r = 0.80, then r2 = (0.80)2 = 0.64 • X explains 64% of the variation in Y (and vice versa)
Hypothesis Testing with r • Use a t-test to test whether there is a significant correlation (relationship) between variables
SEr = √((1 − r²) / (n − 2)), t = r / SEr, df = n − 2 Previous Example Is the correlation significant? • r = 0.866 • SEr = 0.25 • t = 3.464 • d.f. = 4 • tcrit = 2.776 (from table) Reject null hypothesis – There is a significant relationship
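The worked example's numbers follow directly from those formulas. A sketch reproducing them (n = 6 is implied by df = 4):

```python
# t-test for a correlation coefficient:
# SE_r = sqrt((1 - r^2) / (n - 2)), t = r / SE_r, df = n - 2.
# r = 0.866 is the slide's example value; n = 6 follows from df = 4.
import math

r = 0.866
n = 6

se_r = math.sqrt((1 - r ** 2) / (n - 2))
t = r / se_r
df = n - 2

print(round(se_r, 2), round(t, 3), df)  # 0.25 3.464 4
```

Since t = 3.464 exceeds tcrit = 2.776, the null hypothesis of no relationship is rejected.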