Quantitative Business Analysis for Decision Making: Multiple Linear Regression Analysis
Outline
• Multiple Regression Model
• Estimation
• Testing Significance of Predictors
• Multicollinearity
• Selection of Predictors
• Diagnostic Plots
403.8
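Multiple Regression Model
Multiple linear regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
• $\beta_1, \beta_2, \dots, \beta_k$ are the slope coefficients of $X_1, X_2, \dots, X_k$.
• $\beta_i$ quantifies the amount of change in the response $Y$ for a unit change in $X_i$ when all other predictors are held fixed.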
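Multiple Regression Model (cont'd)
In the model, $\mu_{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$ is the mean of $Y$ for the given predictor values.
• $\varepsilon$ contributes to the variation in $Y$ values from their mean $\mu_{Y}$, and
• $\varepsilon$ is assumed to be normally distributed with mean 0 and standard deviation $\sigma$.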
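To make the roles of the mean and the error term concrete, here is a minimal simulation sketch. The coefficient values, the predictor values, and the function name `simulate_response` are hypothetical and chosen only for illustration.

```python
import random


def simulate_response(x, betas, sigma, rng):
    """Draw one Y = beta0 + beta1*x1 + ... + betak*xk + eps, with eps ~ N(0, sigma)."""
    # Mean of Y for the given predictor values.
    mean_y = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    # The error term moves each observed Y away from that mean.
    return mean_y + rng.gauss(0.0, sigma)


rng = random.Random(0)            # seeded so repeated runs give the same draws
betas = [10.0, 2.0, -1.0]         # hypothetical beta0, beta1, beta2
x = [3.0, 4.0]                    # hypothetical values of X1 and X2
draws = [simulate_response(x, betas, sigma=1.5, rng=rng) for _ in range(5)]
print(draws)                      # values scattered around the mean 12.0
```

With `sigma=0` every draw equals the mean exactly, which is one way to see that all variation around the mean comes from the error term.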
Sampling
A random sample of n units is taken. Then, for each unit, k+1 measurements are made: $Y, X_1, X_2, \dots, X_k$
Estimated Model
The estimated multiple regression model is: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
Expressions for the $b_i$ are cumbersome to write. $b_i$ is an estimate of $\beta_i$.
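Although the coefficient formulas are cumbersome by hand, they are easy to compute numerically. The sketch below assumes the estimates are ordinary least squares estimates (the slides do not name the method) and solves the normal equations with plain Gaussian elimination. The function name `fit_least_squares` and the data are hypothetical.

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals.

    X is a list of n rows, each holding k predictor values; y holds n responses.
    Solves the normal equations (A^T A) b = A^T y by Gaussian elimination,
    where A is X with a leading column of ones for the intercept.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])  # k + 1 coefficients
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(M[i][j] * b[j] for j in range(i + 1, p))
        b[i] = (M[i][p] - tail) / M[i][i]
    return b


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no noise),
# so the fit should recover coefficients close to 1, 2 and 3.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
b = fit_least_squares(X, y)
print([round(v, 6) for v in b])
```

In practice a statistics package would be used instead. The hand-rolled solver is here only to keep the example self-contained.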
Standard Error
The sample standard deviation around the mean (the estimated regression model) is: $s = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n-k-1}}$
It is an estimate of $\sigma$.
The standard error of $\hat{Y}$ (for specified values of the predictors) is denoted by $s_{\hat{Y}}$.
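As an illustration, the sample standard deviation around the fitted model can be computed from the observed and fitted values, dividing the sum of squared residuals by n-k-1. The function name and the numbers below are hypothetical.

```python
import math


def residual_std_error(y, y_hat, k):
    """Estimate sigma as sqrt(SSE / (n - k - 1)) for a model with k predictors."""
    n = len(y)
    if n - k - 1 <= 0:
        raise ValueError("need more observations than coefficients")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    return math.sqrt(sse / (n - k - 1))


# Hypothetical observed and fitted values from a model with k = 1 predictor.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
s = residual_std_error(y, y_hat, k=1)
print(round(s, 4))
```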
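Testing Significance of a Predictor
For comparing $\beta_i$ with a reference value $\beta_{i0}$, the test statistic is: $t = \dfrac{b_i - \beta_{i0}}{s_{b_i}}$, where $s_{b_i}$ is the standard error of $b_i$,
and for estimating $\beta_i$ by a confidence interval, compute: $b_i \pm t_{\alpha/2,\,n-k-1}\, s_{b_i}$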
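A small sketch of both calculations follows. It assumes the estimate and its standard error are already available (for example from regression software) and that the critical t value is looked up in a t table with n-k-1 degrees of freedom. The function names and input numbers are hypothetical.

```python
def t_statistic(b_i, se_b_i, beta_ref=0.0):
    """t = (b_i - reference value) / standard error of b_i."""
    return (b_i - beta_ref) / se_b_i


def confidence_interval(b_i, se_b_i, t_crit):
    """b_i +/- t_crit * standard error; t_crit comes from a t table."""
    margin = t_crit * se_b_i
    return (b_i - margin, b_i + margin)


# Hypothetical estimate from a sample with n = 20 and k = 2, so n-k-1 = 17.
b_i, se_b_i = 2.5, 0.5
t_crit = 2.110  # two-sided 95% table value for 17 degrees of freedom

print(t_statistic(b_i, se_b_i))            # compare |t| with t_crit
print(confidence_interval(b_i, se_b_i, t_crit))
```

Here the statistic (5.0) exceeds the critical value, and the interval excludes 0, so both views agree that this hypothetical predictor is significant at the 5% level.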
Coefficient of Determination
The coefficient of determination $R^2$ quantifies the percentage of variation in the Y-distribution that is accounted for by the predictors in the model.
• If $R^2 = 80\%$, then 20% of the variation in the Y-distribution is due to factors other than those in the model.
• $R^2$ never decreases as predictors are added to the model, so a higher $R^2$ can come at the cost of a more complicated model.
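For illustration, $R^2$ can be computed as one minus the ratio of unexplained variation (SSE) to total variation (SST). The function name and the data are hypothetical.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST: the share of variation in Y explained by the model."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    return 1.0 - sse / sst


# Hypothetical observed and fitted values.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(r_squared(y, y_hat), 4))  # about 0.99, i.e. roughly 99% explained
```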
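Testing the Model for Significance
Null hypothesis: the predictors in the relationship have no predictive power to explain the variation in the Y-distribution, i.e. $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$.
Test statistic: $F = \dfrac{SSR/k}{SSE/(n-k-1)}$, where SSR is the regression sum of squares and SSE is the error (residual) sum of squares.
It has an F-distribution with k and (n-k-1) degrees of freedom for the numerator and denominator, respectively.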
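Because the explained share of variation is $R^2$ and the unexplained share is $1 - R^2$, the overall F statistic can equivalently be written as $F = \dfrac{R^2/k}{(1-R^2)/(n-k-1)}$. The sketch below uses that form; the function name and the input values are hypothetical.

```python
def f_statistic(r2, n, k):
    """Overall F statistic from R^2, sample size n and number of predictors k."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Hypothetical model: R^2 = 0.80 with n = 25 observations and k = 4 predictors.
f_value = f_statistic(0.80, n=25, k=4)
print(round(f_value, 2))  # compare with an F table value for 4 and 20 d.f.
```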
Multicollinearity and Selection of Predictors
• Multicollinearity occurs when predictors are highly correlated among themselves. In its presence $R^2$ may be high, but the individual coefficient estimates are less reliable.
• A screening process (e.g. stepwise regression) can reduce multicollinearity by selecting only those predictors that are not strongly correlated among themselves.
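One simple first check for multicollinearity is to inspect the pairwise correlations between predictors. The sketch below is not stepwise regression. It only flags predictor pairs whose correlation exceeds a cutoff. The 0.9 cutoff, the function names, and the data are assumptions made for illustration.

```python
import math


def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def highly_correlated_pairs(predictors, threshold=0.9):
    """Return (name1, name2, r) for predictor pairs with |r| >= threshold."""
    names = list(predictors)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(predictors[first], predictors[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs


# Hypothetical predictors: "price" and "cost" move almost together.
predictors = {
    "price": [1, 2, 3, 4, 5],
    "cost": [2, 4, 6, 8, 11],
    "ads": [5, 1, 4, 2, 3],
}
flagged = highly_correlated_pairs(predictors)
print(flagged)  # only the price/cost pair is flagged
```

Pairwise correlations can miss collinearity that involves three or more predictors jointly, so this is a screening aid rather than a complete diagnosis.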
Diagnostic Plots
• Residuals are used to diagnose the validity of the model assumptions.
• A scatter plot of the residuals against the predicted values can serve as a diagnostic tool.
• A diagnostic plot can identify outliers, unequal variability, and the need for a transformation to achieve homogeneity of variance.
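As a numerical companion to the residual plot, the sketch below computes the residuals and lists the (predicted value, residual) points that lie unusually far from zero. The rule of flagging residuals larger than 2 standard deviations is a common rule of thumb assumed here, not something stated in the slides. The function name and the data are hypothetical.

```python
import math


def flag_large_residuals(y, y_hat, k, cutoff=2.0):
    """Return (predicted, residual) pairs with |residual| > cutoff * s."""
    n = len(y)
    residuals = [yi - fi for yi, fi in zip(y, y_hat)]
    # s estimates sigma using n - k - 1 degrees of freedom.
    s = math.sqrt(sum(e * e for e in residuals) / (n - k - 1))
    return [(fi, e) for fi, e in zip(y_hat, residuals) if abs(e) > cutoff * s]


# Hypothetical fitted values and observations from a model with k = 2 predictors;
# the sixth observation sits far above its predicted value.
y_hat = [10, 12, 14, 16, 18, 20, 22, 24]
y = [10.5, 11.6, 14.3, 15.4, 18.2, 26.0, 21.7, 24.4]
outliers = flag_large_residuals(y, y_hat, k=2)
print(outliers)  # the point with predicted value 20 is flagged
```

Plotting all (predicted, residual) pairs remains the better tool for spotting unequal variability or curvature. The flagging above only covers the outlier check.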
Indicator Variables
• Indicator variables (also called dummy variables) are numerical codes that are used to represent qualitative variables.
• For example, 0 for men and 1 for women.
• For a qualitative variable with c categories, (c-1) indicator variables need to be defined.
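The (c-1) rule can be sketched as follows: one category is chosen as the baseline and gets no column, and every other category gets its own 0/1 column. The function name, the choice of baseline, and the data are hypothetical.

```python
def indicator_variables(values, baseline=None):
    """Encode a qualitative variable with c categories as c-1 columns of 0/1.

    The baseline category is represented by all indicators being 0.
    If no baseline is given, the alphabetically first category is used.
    """
    categories = sorted(set(values))
    if baseline is None:
        baseline = categories[0]
    if baseline not in categories:
        raise ValueError("baseline must be one of the observed categories")
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
        if category != baseline
    }


# Hypothetical qualitative variable with c = 3 categories, so 2 indicators.
regions = ["East", "West", "North", "East"]
coded = indicator_variables(regions)
print(coded)
```

With "East" as the baseline, a row with both indicators equal to 0 denotes East, and each coefficient on an indicator measures the difference in mean response relative to that baseline.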