Correlation & Linear Regression MARE 250 Dr. Jason Turner
Means Tests vs. Associations
Means tests (t-test, ANOVA): test for differences between or among means (responses between or among factors).
Associations: test for relationships between or among variables (responses).
Linear Regression
Linear regression investigates and models the linear relationship between a response (Y) and one or more predictors (X).
Both the response and the predictors are continuous variables ("responses").
Linear regression analysis is used to:
- determine how the response variable changes as a particular predictor variable changes
- predict the value of the response variable for a given value of the predictor variable
Regression vs. Correlation
Linear regression investigates and models the linear relationship between a response (Y) and one or more predictors (X). Both the response and the predictors are continuous variables ("responses").
Correlation coefficient (Pearson): measures the extent of a linear relationship between two continuous variables ("responses").
When Regression vs. Correlation?
Linear regression: used to predict one variable from another, extrapolate data, and quantify how much one variable changes with the other. The relationship has a direction.
Correlation coefficient (Pearson): used to determine whether or not there is a linear relationship.
IF regression, THEN it matters which variable is the response (Y) and which is the predictor (X):
- Y is the dependent variable; X is the independent variable
- X is taken to cause change in Y (the outcome Y depends upon X)
- Y does not cause change in X (X is independent)
Linear Regression
Regression provides the line that "best" fits the data (response and predictor).
The least-squares criterion (the method used to draw this "best" line) requires that the best-fitting regression line is the one with the smallest sum of squared error terms, where each error term is the vertical distance of a point from the line.
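The least-squares criterion above can be sketched in a few lines of plain Python. This is a minimal illustration, not course material: the function names `fit_line` and `sse` and the salinity and urchin numbers are made up for the example. It uses the standard closed-form solution for a single predictor and then checks that nudging the fitted line only increases the sum of squared errors.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared vertical distances of the points from the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, invented for illustration only
salinity = [30.0, 31.0, 32.0, 33.0, 34.0, 35.0]
urchins = [2.0, 3.0, 3.0, 5.0, 4.0, 6.0]

b0, b1 = fit_line(salinity, urchins)
best = sse(salinity, urchins, b0, b1)

# Any other line should have a larger sum of squared errors
print("fitted SSE:", best)
print("steeper slope SSE:", sse(salinity, urchins, b0, b1 + 0.05))
print("shifted intercept SSE:", sse(salinity, urchins, b0 + 0.5, b1))
```

The two comparison lines are arbitrary; the point is only that the fitted line gives the smallest of the three sums.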
Linear Regression
The R2 and adjusted R2 values represent the proportion of variation in the response data explained by the predictors.
Adjusted R2 is a modified R2 that has been adjusted for the number of terms in the model. If you include unnecessary terms, R2 can be artificially high; adjusted R2 penalizes those extra terms.
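As a rough illustration of the adjustment, the sketch below uses the standard textbook formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The function name `adjusted_r2` and the input values are assumptions chosen for the example, not numbers from the course.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R2 for a model with p predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical case: the same R2 of 0.50 from 20 observations,
# reported for a model with 1 predictor and for one with 5 predictors.
one_term = adjusted_r2(0.50, n=20, p=1)
five_terms = adjusted_r2(0.50, n=20, p=5)

print(round(one_term, 3))
print(round(five_terms, 3))
```

With R2 held fixed, the model with more terms gets the lower adjusted value, which is why adjusted R2 is the more conservative measure.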
Is This Them? Are These They?
y = b0 + b1x
y = dependent variable
b0 and b1 are constants
b0 = y-intercept
b1 = slope
x = independent variable
Example: Urchin density = b0 + b1(salinity)
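Effects of Outliers
Outliers may be influential observations. An influential observation is a data point whose removal causes the regression equation (line) to change considerably.
Treat the decision to remove an influential observation as you would for any outlier. If there is no explanation for the point, the decision is up to the researcher.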
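One simple way to see whether a point is influential is to fit the line with and without it and compare the slopes. The sketch below does this with made-up numbers; the helper `fit_line` and the data are hypothetical, and this refit comparison is only an informal check, not a formal influence diagnostic.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical data: five points near a line, plus one point far out in x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]

_, slope_all = fit_line(x, y)                  # all six points
_, slope_without = fit_line(x[:-1], y[:-1])    # last point removed

print("slope with the point:", slope_all)
print("slope without the point:", slope_without)
```

Here a single point is enough to flatten a clearly positive slope, which is the behavior the slide describes.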
Warning on Regression
Regression is based upon the assumption that the data points are scattered about a straight line.
What can we do to determine whether a regression is warranted?
Correlation Coefficient
Correlation coefficient (r) (Pearson): measures the extent of a linear relationship between two continuous variables (responses).
Example output: Pearson correlation of cexa Ant and cexa post = 0.811, P-Value = 0.000
IF p < 0.05 THEN the linear correlation between the two variables is significantly different from 0.
IF p > 0.05 THEN you cannot conclude that there is a linear relationship between the two variables.
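For readers who want to see what sits behind a reported r, here is a minimal sketch in plain Python. The function names and the paired measurements are hypothetical. It computes Pearson's r and the usual test statistic t = r * sqrt((n - 2) / (1 - r^2)); the p-value then comes from comparing t with a t distribution on n - 2 degrees of freedom, a step statistical software performs and this sketch leaves out.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired measurements, invented for illustration
first = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
second = [1.2, 1.9, 3.5, 3.4, 5.1, 5.8]

r = pearson_r(first, second)
t = t_statistic(r, len(first))
print("r =", r)
print("t =", t)
```

A large absolute t corresponds to a small p-value, which is the situation in the first IF statement above.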
"R2 D2 it is you, it is you"
Coefficient of determination (R2): the proportion of the total variability in the response(s) that is attributable to dependence on all of the predictors in the model.
R2 is used for assessing the "goodness of fit" of a regression model.
Use adjusted R2, as it is a more conservative measure.
R2 values range from 0 to 100%. An R2 of 100% means that all of the variability in the response can be explained by the model.
Coefficient Relationships The coefficient of determination (r2) is the square of the linear correlation coefficient (r)
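For a regression with a single predictor this relationship can be checked numerically. For example, the correlation of 0.811 quoted earlier corresponds to r2 = 0.811 × 0.811 ≈ 0.658, or about 66%. The sketch below, using made-up data and hypothetical function names, computes r directly and R2 from a fitted line as 1 - SSE/SST, and the two agree.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """R2 of the least-squares line, computed as 1 - SSE/SST."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained
    sst = sum((b - my) ** 2 for b in y)                        # total
    return 1.0 - sse / sst


# Hypothetical data, invented for illustration
x = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
y = [0.5, 0.2, 1.4, 1.1, 2.3, 1.6]

r = pearson_r(x, y)
print("r squared:", r ** 2)
print("R2 from the fitted line:", r_squared(x, y))
```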
Next Week
Regression Analysis: _ Urchins versus % Rock

The regression equation is
_ Urchins = - 0.557 + 0.0361 % Rock

Predictor      Coef  SE Coef      T      P
Constant    -0.5569   0.3820  -1.46  0.146
% Rock     0.036116   0.0062   5.80  0.000

S = 3.27363   R-Sq = 11.0%   R-Sq(adj) = 10.6%
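As a preview of how such output is read, the sketch below plugs values into the reported equation and recomputes the T value for the constant as Coef divided by SE Coef. The coefficients are taken from the output above; the function name `predict_urchins` and the 50% rock input are assumptions for illustration.

```python
# Coefficients as reported in the regression equation above
B0 = -0.557   # intercept (constant)
B1 = 0.0361   # slope for % Rock


def predict_urchins(percent_rock):
    """Predicted urchin response for a given % Rock, using the reported equation."""
    return B0 + B1 * percent_rock


# Hypothetical input: a site with 50% rock cover
at_50 = predict_urchins(50.0)
print(round(at_50, 3))   # -0.557 + 0.0361 * 50 = 1.248

# T for a coefficient is Coef / SE Coef; check the Constant row
t_constant = -0.5569 / 0.3820
print(round(t_constant, 2))   # matches the reported -1.46
```

Note that the intercept predicts a negative value at 0% rock, which has no biological meaning; together with an R-Sq of only 11.0%, that is a reminder to read the fitted line cautiously.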