SESSION 49 - 52 Last Update 17th June 2011 Regression
Learning Objectives • XY-Scatter Diagrams • Plotting the Regression Line • Coefficient Estimates • Pearson Coefficient of Correlation • Spearman Rank Correlation Coefficient
XY-Scatter Diagram To draw a scatter diagram we need data for two variables. In applications where one variable depends to some degree on the other variable, the dependent variable is labeled Y and the other, called the independent variable, X. The values for X and Y are combined into a single data point using the observations for X and Y as coordinates.
Regression Analysis Regression analysis is used to predict the value of one variable on the basis of one or more other variables. The first-order linear model describes the relationship between the dependent variable Y and the independent variable X. The regression model with a as the y-intercept and b as the slope coefficient is of the form: y = a + bx
Example Temperature - Truck The estimators of the intercept a and slope coefficient b are based on drawing a straight line through the sample data:
Intercept and Slope The intercept a is the y-coordinate of the point where the linear function intersects the y-axis. The slope coefficient b is defined as the change in y for a unit change in x.
Fitted Line With Residuals The line drawn through the points is called the regression line. The vertical distance between each point and the line is that point's residual.
Residuals Squared The regression line, or least squares line, is the line that minimizes the sum of the squared differences between the points and the line.
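As a sketch of what "least squares" means in symbols, using a for the intercept and b for the slope as on the other slides (the index i, the sample size n, and the residual symbol e_i are notation assumed here, not taken from the original slides):

```latex
\min_{a,\,b} \; \sum_{i=1}^{n} e_i^{2}
  \;=\; \sum_{i=1}^{n} \left( y_i - a - b\,x_i \right)^{2}
```

Each residual e_i is the vertical distance between the observed y_i and the value a + b x_i on the line.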
Calculating Coefficients Raw Data (y-variable as dependent and x as independent variable):
Solution Step 1: Calculate the gradient (beta):
Solution Step 2: Calculate the intercept (alpha):
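The two steps can be sketched in Python using the standard least squares formulas, b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = ȳ − b·x̄. The raw data from the slides is not reproduced here, so the function name `fit_line` and the small data set below are illustrative assumptions; the points were chosen to lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (a, b): least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    # Step 1: gradient (slope coefficient b)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Step 2: intercept a, from the means of y and x
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Hypothetical data (NOT the temperature-truck data from the slides)
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

Because the sample points sit exactly on a line, the fitted coefficients recover that line; with real data the points scatter around the line and the residuals are non-zero.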
Interpreting the Coefficients The slope coefficient b may be interpreted as the change in the dependent variable y for a one-unit change in x. In the previous example, a one-unit increase in temperature is associated with b = 0.654 additional truckloads of cool drinks sold. The intercept a is the point at which the regression line and the y-axis intersect. If x = 0 lies far outside the range of sample values of x, the interpretation of the intercept is not straightforward. In the temperature-truck example, x = 0 lies outside the range between the smallest and largest values of x in the sample. Interpreting the intercept literally would imply that at a temperature of x = 0, soft-drink sales fall to negative 3.914 truckloads!
Point Prediction Once the coefficient estimates have been obtained, we can predict the outcome for any x between the minimum and maximum sample observations (point prediction) using the regression function y = a + bx. For example:
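A minimal sketch of a point prediction, using the coefficient estimates quoted in the temperature-truck example (a = -3.914, b = 0.654). The temperature of 30 and the sample range of 10 to 40 are assumptions made for illustration only; the slides' own worked example and the actual sample range are not reproduced here.

```python
A = -3.914  # intercept from the temperature-truck example
B = 0.654   # slope from the temperature-truck example


def predict(x, x_min, x_max, a=A, b=B):
    """Point prediction y = a + b*x, restricted to the sampled x range."""
    if not (x_min <= x <= x_max):
        raise ValueError("x lies outside the sample range; prediction not supported")
    return a + b * x


# Hypothetical inputs: the temperature and the sample range are assumed
temperature = 30.0
truckloads = predict(temperature, x_min=10.0, x_max=40.0)
print(round(truckloads, 3))
```

The range check reflects the point made on the slide: the fitted line is only trusted between the smallest and largest observed x, which is also why the intercept at x = 0 has no sensible interpretation in this example.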
Pearson Coefficient of Correlation The Pearson coefficient of correlation R may be used to test for linear association between two variables, that is, to determine whether or not a linear relationship exists between y and x. Variables may be positively or negatively correlated: R = 1 denotes perfect positive correlation and R = -1 perfect negative correlation. R is defined for: -1 ≤ R ≤ +1
Type of Relationship • Direct linear relationship (small or wide dispersion): positive linear correlation exists, 0 < r < +1 • Inverse linear relationship (small or wide dispersion): negative linear correlation exists, -1 < r < 0 • No linear relationship: no correlation, r = 0
Coefficient of Determination Squaring the Pearson coefficient of correlation gives the coefficient of determination R² in regression. It may be interpreted as the proportion of the variation in the dependent variable y that is explained by the variation in the explanatory variable x. R² is a measure of the strength of the linear relationship between y and x.
Solution Step 3: Calculate R and R²
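Step 3 can be sketched as follows, using the standard definition of the Pearson coefficient as the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the data are illustrative assumptions, not the values from the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson coefficient of correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data (NOT the temperature-truck data from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2  # coefficient of determination
print(f"R = {r:.4f}, R^2 = {r_squared:.4f}")
```

For this made-up sample, R² works out to 0.6, so about 60% of the variation in y would be attributed to the variation in x.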
Spearman Rank Correlation The standard coefficient of correlation allows us to determine whether there is evidence of a linear relationship between two interval variables. Where the variables are ordinal, or where both variables are interval but the normality requirement is not satisfied, a nonparametric statistic called the Spearman rank correlation coefficient may be used instead.
Objective: Comparing 2 Variables Analyzing the relationship between two variables. Data type? • Interval: if the error is normal, or x and y are bivariate normal, use simple linear regression; if x and y are not bivariate normal, use Spearman rank correlation • Ordinal: use Spearman rank correlation • Nominal: use the chi-square test of a contingency table
Example Below is a list of organizational strengths that were ranked independently by management and staff. The managing director wished to know how closely the two sets of assessments were correlated:
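The ranking table from the slides is not reproduced here, so the following is only a sketch of how such a comparison could be computed. It assumes two rankings without ties and uses the common shortcut formula r_s = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each item. The ranks and the function name `spearman_rank` are hypothetical.

```python
def spearman_rank(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings with no tied ranks."""
    n = len(ranks_a)
    sum_d_squared = sum((ra - rb) ** 2 for ra, rb in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks of five organizational strengths (not the slide data)
management = [1, 2, 3, 4, 5]
staff = [2, 1, 4, 3, 5]

r_s = spearman_rank(management, staff)
print(f"Spearman r_s = {r_s:.2f}")
```

A value close to +1 would indicate that management and staff rank the strengths in nearly the same order; a value close to -1 would indicate nearly opposite orderings. If tied ranks occur, the shortcut formula no longer applies exactly and the Pearson formula should be applied to the ranks instead.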