Linear Regression (Lesson - 06/A): Building a Model for the Relationship
Dr. C. Ertuna
Dependent and Independent Variables
A dependent variable is the variable to be predicted or explained in a regression model. This variable is assumed to be functionally related to the independent variable.
An independent variable is a variable related to the dependent variable in a regression equation. The independent variable is used in a regression model to estimate the value of the dependent variable.
Two Variable Relationships
[Scatter plots of Y against X, one per panel: (a) linear, (b) linear, (c) curvilinear, (d) curvilinear, (e) no relationship.]
Correlation
The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables. It ranges from +1.0 to -1.0. A correlation of +1.0 or -1.0 indicates a perfect linear relationship, whereas a correlation of 0 indicates no linear relationship.
Correlation
SAMPLE CORRELATION COEFFICIENT
r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]
where:
r = sample correlation coefficient
n = sample size
x = value of the independent variable
y = value of the dependent variable
Correlation
TEST STATISTIC FOR CORRELATION
t = r / √[(1 - r²) / (n - 2)], with df = n - 2
where:
t = number of standard deviations that r is away from 0
r = simple correlation coefficient
n = sample size
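A minimal sketch (with hypothetical example data, not from the slides) of how r and its t statistic can be computed in Python:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)    # independent variable
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])  # dependent variable

r, p_value = stats.pearsonr(x, y)        # r and two-tailed p-value for H0: rho = 0

# The same test statistic computed by hand from the formula above:
n = len(x)
t = r / np.sqrt((1 - r**2) / (n - 2))    # df = n - 2
print(f"r = {r:.4f}, t = {t:.4f}, p = {p_value:.4f}")
```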
Correlation
Spurious correlation occurs when there is a correlation between two otherwise unrelated variables (in the slide's diagram, A and B appear correlated because both are linked to a third variable, C).
Linear Regression Analysis
Simple linear regression analyzes the linear relationship that exists between a dependent variable and a single independent variable. Multiple linear regression analyzes the linear relationship that exists between a dependent variable and two or more independent variables.
Linear Regression Analysis
SIMPLE LINEAR REGRESSION MODEL (POPULATION MODEL)
y = β₀ + β₁x + ε
where:
y = value of the dependent variable
x = value of the independent variable
β₀ = population's y-intercept
β₁ = slope of the population regression line
ε = error term, or residual
Linear Regression Analysis
The linear regression model has four assumptions:
• The means of the dependent variable (y), for all specified values of the independent variable, can be connected by a straight line (linear) called the population regression model.
• The error terms, εᵢ, are statistically independent of one another (relevant for time-series data).
• The distribution of the error terms, ε, is normal.
• The distributions of possible εᵢ values have equal variances for all values of x.
Linear Regression Analysis
REGRESSION COEFFICIENTS
In the simple regression model there are two coefficients: the intercept and the slope. In the multiple regression model there are more than two coefficients: the intercept and a regression coefficient for each independent variable.
The interpretation of a regression coefficient is that it gives the average change in the dependent variable for a unit change in the independent variable. The slope coefficient may be positive or negative, depending on the relationship between the dependent variable and the particular independent variable.
A residual is the difference between the actual value of the dependent variable and the value predicted by the regression model.
The least squares criterion is used for determining the regression line that minimizes the sum of squared residuals (SSE).
Linear Regression Analysis
[Scatter plot: Sales in Thousands (Y) against Years with Company (X). At X = 4, the actual value is 312 and the value predicted by the regression line is 390, so Residual = 312 - 390 = -78.]
Linear Regression Analysis
ESTIMATED REGRESSION MODEL (SAMPLE MODEL)
ŷ = b₀ + b₁x
where:
ŷ = estimated, or predicted, y value
b₀ = unbiased estimate of the regression intercept
b₁ = unbiased estimate of the regression slope
x = value of the independent variable
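A minimal sketch (same hypothetical data as above): estimating b₀ and b₁ by least squares and verifying two of the properties listed on the next slide.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()        # the line passes through (x_bar, y_bar)
y_hat = b0 + b1 * x                  # predicted values
residuals = y - y_hat                # actual minus predicted

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"sum of residuals = {residuals.sum():.2e}")   # 0 up to rounding error
```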
Least Squares Regression Properties
• The sum of the residuals from the least squares regression line is 0.
• The sum of the squared residuals is a minimum.
• The simple regression line always passes through the mean of the y variable and the mean of the x variable.
• The least squares coefficients are unbiased estimates of β₀ and β₁.
Linear Regression Analysis
SUM OF RESIDUALS: Σ(yᵢ - ŷᵢ) = 0
SUM OF SQUARED RESIDUALS: SSE = Σ(yᵢ - ŷᵢ)²
Linear Regression Analysis
TOTAL SUM OF SQUARES: TSS = Σ(yᵢ - ȳ)²
where: TSS = total sum of squares; n = sample size; y = values of the dependent variable; ȳ = average value of the dependent variable.
SUM OF SQUARED ERROR (RESIDUALS): SSE = Σ(yᵢ - ŷᵢ)²
where: SSE = sum of squared error; ŷ = estimated value of y for the given x value.
SUM OF SQUARES REGRESSION: SSR = Σ(ŷᵢ - ȳ)²
where: SSR = sum of squares regression.
Linear Regression Analysis
SUMS OF SQUARES
TSS = SSR + SSE, i.e., Σ(yᵢ - ȳ)² = Σ(ŷᵢ - ȳ)² + Σ(yᵢ - ŷᵢ)²
Linear Regression Analysis
The coefficient of determination is the proportion of the total variation in the dependent variable that is explained by its relationship with the independent variable. The coefficient of determination is also called R-squared and is denoted R².
Linear Regression Analysis
COEFFICIENT OF DETERMINATION (R²)
R² = SSR / TSS = 1 - SSE / TSS
Regression Analysis
COEFFICIENT OF DETERMINATION, SINGLE INDEPENDENT VARIABLE CASE
R² = r²
where:
R² = coefficient of determination
r = simple correlation coefficient
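A minimal sketch (hypothetical data) of the sum-of-squares decomposition, showing numerically that TSS = SSR + SSE and that R² = r² in the single-variable case:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

b1, b0 = np.polyfit(x, y, 1)           # least squares slope and intercept
y_hat = b0 + b1 * x

TSS = np.sum((y - y.mean())**2)        # total sum of squares
SSE = np.sum((y - y_hat)**2)           # sum of squared residuals
SSR = np.sum((y_hat - y.mean())**2)    # sum of squares regression

print(f"TSS = {TSS:.3f}, SSR + SSE = {SSR + SSE:.3f}")       # equal
print(f"R^2 = {SSR / TSS:.4f}, r^2 = {np.corrcoef(x, y)[0, 1]**2:.4f}")
```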
Linear Regression Analysis
MEAN SQUARE REGRESSION
MSR = SSR / k
where:
SSR = sum of squares regression
k = number of independent variables in the model
Linear Regression Analysis
MEAN SQUARE ERROR
MSE = SSE / (n - k - 1)
where:
SSE = sum of squares error
n = sample size
k = number of independent variables in the model
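Continuing the sums-of-squares sketch above (SSR, SSE, and x are reused): the mean squares combine into the overall F test of the regression, F = MSR / MSE with (k, n - k - 1) degrees of freedom.

```python
from scipy import stats

n, k = len(x), 1                  # one independent variable in the simple model
MSR = SSR / k                     # mean square regression
MSE = SSE / (n - k - 1)           # mean square error
F = MSR / MSE
p = stats.f.sf(F, k, n - k - 1)   # upper-tail p-value
print(f"F = {F:.3f}, p = {p:.4f}")
```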
Regression Steps
• Develop a scatter plot of y against each of the x's; check for linearity.
• Compute the least squares regression line for the sample data (save the residuals).
• Run independence (if necessary), normality, and equality-of-variance tests on the residuals.
• Check the significance of the coefficients.
• Check the significance of the overall regression.
• Check the size (importance) of the coefficient of determination.
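A minimal sketch of these steps in Python with statsmodels (the course itself uses SPSS; the data here are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

model = sm.OLS(y, sm.add_constant(x)).fit()   # step 2: least squares fit
resid = model.resid                           # saved residuals for step 3
print(model.summary())                        # steps 4-6: t tests on the
                                              # coefficients, overall F test,
                                              # and R-squared in one table
```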
Running Regression on SPSS
Analyze / Regression / Linear
• Method: Stepwise
• Statistics: Estimates; Model Fit; Collinearity; Covariance Matrix; Part and Partial Correlations; Descriptives; Casewise Diagnostics (3)
• Save: Unstandardized Residuals; Unstandardized Predicted Values; Cook's Distances; Standardized DfBetas
Residual Analysis
Before using a regression model for description or prediction, check whether the assumptions concerning the normal distribution, independence, and constant variance of the error terms have been satisfied.
Checking the Assumptions
The following assumptions need to be met to accept the results of a regression analysis and use the model for future decision making:
• Linearity,
• Independence of errors (no autocorrelation),
• Normality of errors,
• Constant variance of errors.
Tests for Linearity
Linearity:
• Plot the dependent variable against each of the independent variables separately.
• Decide whether linear regression is a "reasonable" description of the tendency in the data:
• Consider a curvilinear pattern,
• Consider undue influence of one data point on the regression line, etc.
Tests for Independence
Independence of Errors (valid only for time-series data):
• Ljung-Box test on the residuals: Graphs / Time Series / Autocorrelations
• If there are no spikes at any lag of the partial autocorrelation function (PACF), then the errors are independent.
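An assumed Python stand-in for the SPSS procedure, reusing the residuals saved in the statsmodels sketch above: the Ljung-Box test checks for autocorrelation among the residuals.

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

lb = acorr_ljungbox(resid, lags=[4])   # few lags: the toy sample is small
print(lb)   # lb_pvalue > alpha suggests the errors are independent
```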
Tests for Normality
Normal Distribution of Errors:
• Shapiro-Wilk test on the residuals: Analyze / Descriptive Statistics / Explore / Plots (check: Normality Plots with Tests)
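An assumed Python stand-in for the SPSS menu path, reusing the same residuals: the Shapiro-Wilk test for normality of the errors.

```python
from scipy import stats

W, p = stats.shapiro(resid)
print(f"W = {W:.4f}, p = {p:.4f}")   # p > alpha: do not reject normality
```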
Tests for Constant Variance
Checking constant variance of errors in SPSS:
1. Create a grouping variable for the residuals: Transform / Categorize Variables / copy the residuals into the "Create Categories for" pane; in the "Number of Categories" box insert 2 / OK. The new grouping variable has the same name as the residuals variable with an "n" attached in front of it.
2. Analyze / Descriptive Statistics / Explore / copy the residuals into the "Dependent List" pane; copy nresiduals into the "Factor List" pane; click "Plots", check "Untransformed" / OK.
3. Check the significance (based on means) in the table "Test of Homogeneity of Variance".
4. If p-value > α, conclude equality of variance.
Tests for Constant Variance
Checking constant variance of errors in Excel:
1. Copy the residuals into Excel.
2. Divide the residuals into two equal groups.
3. Compute the standard deviation and the number of observations for each group.
4. Name the group with the largest standard deviation Group-1.
5. Run a 2-sample, 2-tailed variance test: PHStat / Two Sample Tests / F-test for Differences in Two Variances.
6. If p-value > α, conclude equality of variance.
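An assumed Python stand-in for the two-group procedure above, again reusing the saved residuals: Levene's test is used here in place of the plain F test, since it is less sensitive to non-normality.

```python
import numpy as np
from scipy import stats

r = np.asarray(resid)                       # residuals from the earlier sketch
g1, g2 = r[:len(r) // 2], r[len(r) // 2:]   # two equal groups

stat, p = stats.levene(g1, g2, center='mean')
print(f"Levene statistic = {stat:.4f}, p = {p:.4f}")  # p > alpha: equal variances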
Regression Results
What is the regression model? Do the independent variables have a significant effect on the dependent variable? Do the independent variables exhibit collinearity? Which independent variable has more influence on the dependent variable?
Data: Levine-K-B; Advertise
Regression Results
Is the overall regression model significant?
Regression Results
How good is the explanatory power of the independent variables?
Correlation Table
(In the Venn-diagram notation of this slide, a is the variation in y explained uniquely by x1, b uniquely by x2, c jointly by both, and e the unexplained variation.)
• Coefficient of Determination: R² = (a+b+c)/(a+b+c+e)
• Squared Part (semipartial) Correlation: r²yx1(x2) = a/(a+b+c+e)
• Squared Partial Correlation: r²yx1·x2 = a/(a+e)
• Squared Zero-Order Correlation: r²yx1 = (a+c)/(a+b+c+e)
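A minimal sketch (hypothetical simulated data): recovering the squared part and partial correlations of x1 with y, controlling for x2, from the R-squares of a full and a reduced regression.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.5 * x1 + rng.normal(size=100)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=100)

R2_full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().rsquared
R2_red  = sm.OLS(y, sm.add_constant(x2)).fit().rsquared    # x1 dropped

part2    = R2_full - R2_red        # a/(a+b+c+e): x1's unique share of y
partial2 = part2 / (1 - R2_red)    # a/(a+e)
print(f"part^2 = {part2:.4f}, partial^2 = {partial2:.4f}")
```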
Next Lesson (Lesson - 06/B): Multiple Linear Regression