Chapter 12 Multiple Regression Analysis and Model Building
Chapter 12 - Chapter Outcomes After studying the material in this chapter, you should be able to: Understand the general concepts behind model building using multiple regression analysis. Apply multiple regression analysis to business decision-making situations. Analyze the computer output for a multiple regression model and test the significance of the independent variables in the model.
Chapter 12 - Chapter Outcomes (continued) After studying the material in this chapter, you should be able to: Recognize potential problems when using multiple regression analysis and take steps to correct them. Incorporate qualitative variables into the regression model by using dummy variables.
Multiple Regression Analysis SIMPLE LINEAR REGRESSION MODEL (POPULATION MODEL) $y = \beta_0 + \beta_1 x + \varepsilon$ where: y = Value of the dependent variable; x = Value of the independent variable; $\beta_0$ = Population’s y-intercept; $\beta_1$ = Slope of the population regression line; $\varepsilon$ = Error term, or residual
Multiple Regression Analysis ESTIMATED SIMPLE LINEAR REGRESSION MODEL $\hat{y} = b_0 + b_1 x$ where: $b_0$ = Estimated y-intercept; $b_1$ = Estimated slope coefficient
Multiple Regression Analysis A residual or prediction error is the difference between the actual value of y and the predicted value of y.
Multiple Regression Analysis The standard error of the estimate refers to the standard deviation of the model errors. The standard error measures the dispersion of the actual values of the dependent variable around the fitted regression plane.
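Multiple Regression Analysis MULTIPLE REGRESSION MODEL (POPULATION MODEL) $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$ where: $\beta_0$ = Population’s regression constant; $\beta_j$ = Population’s regression coefficient for variable $x_j$; j = 1, 2, …, k; k = Number of independent variables; $\varepsilon$ = Model error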
Multiple Regression Analysis ESTIMATED MULTIPLE REGRESSION MODEL $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
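As an illustration of how the estimated coefficients b0, b1, ..., bk can be obtained, here is a minimal sketch that solves the least squares normal equations in plain Python. The function names `fit_ols` and `predict` and the small data set are hypothetical and are not taken from the chapter; statistical software would normally do this step.

```python
def fit_ols(rows, y):
    """Return least squares estimates [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for b0
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coefs, row):
    """Point estimate y-hat = b0 + b1*x1 + ... + bk*xk."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term)
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefs = fit_ols(rows, y)
print(coefs)
```

Because the hypothetical data contain no error, the fitted coefficients should recover the generating values up to floating-point rounding.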
Multiple Regression Analysis A model is a representation of an actual system using either a physical or mathematical portrayal.
Model Specification • Decide what you want to do and select the dependent variable. • List the potential independent variables for your model. • Gather the sample data (observations) for all variables.
Multiple Regression Analysis The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables. The correlation coefficient, r, ranges between -1.0 and +1.0.
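Multiple Regression Analysis CORRELATION COEFFICIENT One x variable with y: $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$ or $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$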
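Multiple Regression Analysis CORRELATION COEFFICIENT One x variable with another x: $r = \dfrac{\sum (x_i - \bar{x}_i)(x_j - \bar{x}_j)}{\sqrt{\sum (x_i - \bar{x}_i)^2 \sum (x_j - \bar{x}_j)^2}}$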
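The correlation coefficient can be computed the same way whether the pair is an x variable with y or one x variable with another x. The sketch below uses the deviation-from-the-mean form; the function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(a, b):
    """Correlation coefficient between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


# Hypothetical values: a perfect positive and a perfect negative linear relationship
r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_pos, r_neg)
```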
Multiple Regression Analysis(Example) Multiple Regression Model: House Characteristics: x1 = Square feet = 2,100; x2 = Age = 15; x3 = Number of Bedrooms = 4; x4 = Number of baths = 3; x5 = Size of garage = 2 Point Estimate for Sale Price:
Coefficient of Determination MULTIPLE COEFFICIENT OF DETERMINATION The percentage of variation in the dependent variable explained by the independent variables in the regression model: $R^2 = \dfrac{SSR}{SST}$ where: SSR = Sum of squares regression; SST = Total sum of squares
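A small sketch of the calculation, assuming the fitted values come from a least squares model with an intercept (so that the explained sum of squares equals the total sum of squares minus the error sum of squares). The function name `r_squared` and the data are hypothetical.

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ssr = sst - sse                                        # explained variation
    return ssr / sst


# Hypothetical data; y_hat is the least squares line 2.2 + 0.6x for x = 1..5
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(y, y_hat)
print(round(r2, 4))
```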
Model Diagnosis • Is the overall model significant? • Are the individual variables significant? • Is the standard deviation of the model error too large to provide meaningful results? • Is multicollinearity a problem?
Is the Model Significant? $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$; $H_A$: At least one $\beta_i \neq 0$. If the null hypothesis is true, the overall regression model is not useful for predictive purposes.
Is the Model Significant? F-TEST STATISTIC $F = \dfrac{SSR/k}{SSE/(n-k-1)}$ where: SSR = Sum of squares regression; SSE = Sum of squares error; n = Number of data points; k = Number of independent variables; Degrees of freedom = $D_1 = k$ and $D_2 = n - k - 1$
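The F-test statistic is the ratio of the regression mean square to the error mean square. A minimal sketch follows; the function name `f_statistic` and the input values are hypothetical.

```python
def f_statistic(ssr, sse, n, k):
    """F = (SSR / k) / (SSE / (n - k - 1)) with D1 = k and D2 = n - k - 1."""
    return (ssr / k) / (sse / (n - k - 1))


# Hypothetical sums of squares: n = 25 data points, k = 4 independent variables
f_value = f_statistic(ssr=80.0, sse=20.0, n=25, k=4)
print(f_value)
```

The resulting value would then be compared with the critical F-value for D1 = 4 and D2 = 20 degrees of freedom at the chosen significance level.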
Is the Model Significant? ADJUSTED R-SQUARED A measure of the percentage of explained variation in the dependent variable that takes into account the relationship between the number of cases and the number of independent variables in the regression model. $R_A^2 = 1 - (1 - R^2)\left(\dfrac{n-1}{n-k-1}\right)$ where: n = Number of data points; k = Number of independent variables
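The adjustment shrinks R-squared by the factor (n - 1)/(n - k - 1), so adding variables that explain little can lower the adjusted value. A short sketch with hypothetical inputs (the name `adjusted_r_squared` is illustrative only):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical model: R^2 = 0.80 with n = 25 data points and k = 4 variables
adj = adjusted_r_squared(0.80, n=25, k=4)
print(round(adj, 4))
```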
Are the Individual Variables Significant? t-TEST FOR SIGNIFICANCE OF EACH REGRESSION COEFFICIENT $t = \dfrac{b_i - 0}{s_{b_i}}$ where: $b_i$ = Sample slope coefficient for the ith independent variable; $s_{b_i}$ = Estimate of the standard error for the ith sample slope coefficient; n - k - 1 = Degrees of freedom
Are the Individual Variables Significant? (From Figure 12-7) $\alpha/2 = 0.01$ in each tail. Decision Rule: If $-2.364 \le t \le 2.364$, accept H0; otherwise, reject H0
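The decision rule can be sketched in a few lines. The critical value 2.364 is the one shown on the slide; the coefficient and standard error values, and the function names, are hypothetical.

```python
def t_statistic(b_i, s_b_i):
    """t = (b_i - 0) / s_b_i for testing H0: beta_i = 0."""
    return (b_i - 0) / s_b_i


def reject_h0(t, critical=2.364):
    """Two-tailed rule: reject H0 when t falls outside [-critical, +critical]."""
    return abs(t) > critical


# Hypothetical coefficients and standard errors
t_large = t_statistic(10.0, 2.0)   # far from zero relative to its standard error
t_small = t_statistic(1.0, 2.0)    # close to zero relative to its standard error
print(t_large, reject_h0(t_large))
print(t_small, reject_h0(t_small))
```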
Are the Individual Variables Significant? (From Figure 12-7)
Is the Standard Deviation of the Regression Model Too Large? ESTIMATE FOR THE STANDARD DEVIATION OF THE MODEL $s_\varepsilon = \sqrt{\dfrac{SSE}{n-k-1}}$ where: SSE = Sum of squares error; n = Sample size; k = Number of independent variables
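A minimal sketch of this estimate, using hypothetical values for SSE, n, and k (the function name is illustrative):

```python
import math


def model_std_error(sse, n, k):
    """Estimated standard deviation of the model errors: sqrt(SSE / (n - k - 1))."""
    return math.sqrt(sse / (n - k - 1))


# Hypothetical inputs: SSE = 20 with n = 25 observations and k = 4 variables
s_e = model_std_error(20.0, n=25, k=4)
print(s_e)
```

Whether the result is "too large" depends on the units of the dependent variable and on how precise the predictions need to be.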
Is Multicollinearity A Problem? Multicollinearity refers to the situation when high correlation exists between two independent variables. This means the two variables contribute redundant information to the multiple regression model. When highly correlated independent variables are included in the regression model, they can adversely affect the regression results.
Some Indications of Severe Multicollinearity • Incorrect signs on the coefficients. • A sizable change in the values of the previous coefficients when a new variable is added to the model. • A variable that was previously significant in the model becomes insignificant when a new independent variable is added. • The estimate of the standard deviation of the model increases when a variable is added to the model.
Is Multicollinearity A Problem? The variance inflation factor is a measure of how much the variance of an estimated regression coefficient increases if the independent variables are correlated. A VIF equal to one for a given independent variable indicates that this independent variable is not correlated with the remaining independent variables in the model. The greater the multicollinearity, the larger the VIF will be.
Is Multicollinearity A Problem? VARIANCE INFLATION FACTOR $VIF = \dfrac{1}{1 - R_j^2}$ where: $R_j^2$ = Coefficient of determination when the jth independent variable is regressed against the remaining k - 1 independent variables.
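A short sketch of the variance inflation factor as a function of the auxiliary R-squared. The R-squared values below are hypothetical; in practice each comes from regressing one independent variable on the others.

```python
def vif(r2_j):
    """Variance inflation factor: 1 / (1 - R_j^2)."""
    return 1.0 / (1.0 - r2_j)


# Hypothetical auxiliary R-squared values
vif_uncorrelated = vif(0.0)   # variable unrelated to the other x variables
vif_correlated = vif(0.8)     # variable largely explained by the other x variables
print(vif_uncorrelated, round(vif_correlated, 4))
```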
Multiple Regression Analysis CONFIDENCE INTERVAL FOR THE REGRESSION COEFFICIENT $b_i \pm t_{\alpha/2}\, s_{b_i}$ where: $b_i$ = Point estimate for the regression coefficient of $x_i$; $t_{\alpha/2}$ = Critical t-value for a $1 - \alpha$ confidence interval; $s_{b_i}$ = The standard error of the ith regression coefficient
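The interval is the point estimate plus and minus the critical t-value times the standard error. The sketch below assumes the critical value has already been looked up in a t-table; all numbers and the function name are hypothetical and are not the values from the chapter's example.

```python
def coefficient_interval(b_i, s_b_i, t_crit):
    """Return (lower, upper) limits of b_i +/- t_crit * s_b_i."""
    margin = t_crit * s_b_i
    return b_i - margin, b_i + margin


# Hypothetical estimate, standard error, and critical t-value
lower, upper = coefficient_interval(b_i=10.0, s_b_i=2.0, t_crit=2.0)
print(lower, upper)
```

An interval that excludes zero leads to the same conclusion as rejecting H0 in the corresponding two-tailed t-test.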
Multiple Regression Analysis (Example from Figure 12-9) Interval limits: $55.16 to $70.97
Using Qualitative Independent Variables A dummy variable is a variable that is assigned a value equal to 0 or 1 depending on whether the observation possesses a given characteristic or not.
Using Qualitative Independent Variables (Example 12-2) Dummy Variable: Estimated Regression:
Using Qualitative Independent Variables (Example 12-2) If No MBA: If MBA:
Using Qualitative Independent Variables (Figure 12-11) [Figure: MBAs versus Non-MBAs] b2 = 35,236 = Regression coefficient on the dummy variable
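The effect of a dummy variable can be sketched as a shift in the intercept. Only b2 = 35,236 comes from the slide; the values of b0 and b1, the meaning of x1, and the coding of 1 for MBA are assumptions made for illustration.

```python
B0 = 50_000   # hypothetical regression constant
B1 = 1_000    # hypothetical coefficient on a quantitative variable x1
B2 = 35_236   # coefficient on the dummy variable (from the slide)


def encode_mba(has_mba):
    """Dummy variable: 1 if the observation has the characteristic, else 0 (assumed coding)."""
    return 1 if has_mba else 0


def estimate(x1, has_mba):
    """y-hat = b0 + b1*x1 + b2*x2, where x2 is the dummy variable."""
    return B0 + B1 * x1 + B2 * encode_mba(has_mba)


# Same x1 for both observations, so the estimates differ only by b2
gap = estimate(10, True) - estimate(10, False)
print(gap)
```

The two groups share the same slope on x1, so the fitted lines are parallel and separated vertically by b2.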
Nonlinear Relationships [Figure: Exponential relationship of increased demand for electricity versus population growth; axes: Electricity Demand, Population]
Nonlinear Relationships [Figure: Diminishing returns relationship of advertising versus sales; axes: Sales, Advertising]
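Nonlinear Relationships POLYNOMIAL POPULATION REGRESSION MODEL $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p + \varepsilon$ where: $\beta_0$ = Population’s regression constant; $\beta_j$ = Population’s regression coefficient for variable $x^j$; j = 1, 2, …, p; p = Order of the polynomial; $\varepsilon$ = Model error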
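A polynomial model is still linear in its coefficients, so it can be fitted by treating each power of x as its own independent variable. The sketch below builds those columns; the function name `polynomial_terms` and the data are hypothetical.

```python
def polynomial_terms(x, p):
    """Return [x, x^2, ..., x^p], the regressors for a p-th order polynomial."""
    return [x ** j for j in range(1, p + 1)]


# Hypothetical x values expanded for a second-order (quadratic) model
x_values = [1, 2, 3]
design_rows = [polynomial_terms(x, 2) for x in x_values]
print(design_rows)
```

Each row can then be passed to any multiple regression routine in place of the original single x value.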
Nonlinear Relationships (Figure 12-21) $R^2 = 0.7272$
Nonlinear Relationships Interaction refers to the case in which one independent variable (such as x2) affects the relationship between another independent variable (x1) and a dependent variable (y).
Nonlinear Relationships A composite model is the model that contains both the basic terms and the interactive terms.
Nonlinear Relationships A Composite Model Basic Terms Interactive Terms
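One way to picture basic and interactive terms is to build the regressors for a single observation. This sketch assumes the simplest case, with x1 and x2 as the basic terms and their product as the only interactive term; the chapter's own composite model may include additional terms.

```python
def composite_terms(x1, x2):
    """Return basic terms followed by one interactive term (assumed form)."""
    basic = [x1, x2]
    interactive = [x1 * x2]   # lets the effect of x1 on y depend on x2
    return basic + interactive


# Hypothetical observation
row = composite_terms(3, 4)
print(row)
```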
Stepwise Regression Stepwise regression refers to a method that develops the least squares regression equation in steps, through forward selection, backward elimination, or standard stepwise regression.
Stepwise Regression The coefficient of partial determination is the measure of the marginal contribution of each independent variable, given that other independent variables are in the model.
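One common way to compute this measure, shown here as a sketch, is the proportional drop in the error sum of squares when the variable is added to a model that already contains the other variables. The function name and the SSE values are hypothetical.

```python
def partial_determination(sse_without, sse_with):
    """Share of previously unexplained variation that the added variable explains."""
    return (sse_without - sse_with) / sse_without


# Hypothetical error sums of squares before and after adding the variable
partial_r2 = partial_determination(sse_without=50.0, sse_with=20.0)
print(partial_r2)
```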
Best Subsets Regression Cp STATISTIC $C_p = \dfrac{(1 - R_p^2)(n - T)}{1 - R_T^2} - (n - 2p)$ where: p = k + 1, with k = Number of independent variables in the model; T = 1 + The total number of independent variables to be considered for inclusion in the model; $R_p^2$ = Coefficient of multiple determination for the model with p = k + 1 parameters; $R_T^2$ = Coefficient of multiple determination for the model that contains all T parameters; n = Number of observations
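A sketch of the Cp calculation in its commonly used form, (1 - Rp²)(n - T)/(1 - RT²) - (n - 2p), with hypothetical inputs. A useful check is that the full model, where p = T and the two R-squared values coincide, gives Cp equal to p.

```python
def cp_statistic(r2_p, r2_t, n, p, t):
    """Cp = (1 - R_p^2)(n - T) / (1 - R_T^2) - (n - 2p)."""
    return (1 - r2_p) * (n - t) / (1 - r2_t) - (n - 2 * p)


# Hypothetical candidate: k = 2 variables (p = 3) out of 4 considered (T = 5), n = 30
cp_subset = cp_statistic(r2_p=0.85, r2_t=0.90, n=30, p=3, t=5)

# Full model: p = T and R_p^2 = R_T^2, so Cp should equal p
cp_full = cp_statistic(r2_p=0.90, r2_t=0.90, n=30, p=5, t=5)
print(round(cp_subset, 2), round(cp_full, 2))
```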
Analysis of Residuals The following problems can be inferred through graphical analysis of residuals: • The regression function is not linear. • The model errors do not have a constant variance. • The model errors are not independent. • The model errors are not normally distributed.
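Analysis of Residuals RESIDUAL $e_i = y_i - \hat{y}_i$ where: $e_i$ = Residual for the ith observation; $y_i$ = Actual value of y; $\hat{y}_i$ = Predicted value of y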
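Residuals are simply actual minus predicted values, and they are what gets plotted in a graphical residual analysis. The data below are hypothetical; the fitted values are the least squares line 2.2 + 0.6x for x = 1 to 5.

```python
def residuals(y, y_hat):
    """Residual for each observation: actual y minus predicted y."""
    return [yi - fi for yi, fi in zip(y, y_hat)]


# Hypothetical actual values and least squares fitted values
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
e = residuals(y, y_hat)
print([round(v, 2) for v in e])
```

For a least squares fit that includes an intercept, the residuals sum to zero up to rounding, so a plot of them should scatter around the zero line.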