
Multiple Regression

frayne

Presentation Transcript


    1. Multiple Regression Involves the use of more than one independent variable. (Multivariate analysis, by contrast, involves more than one dependent variable - OMS 633.) Adding more variables will help us explain more variance - the trick becomes: are the additional variables significant, and do they improve the overall model? Additionally, the added independent variables should not be too highly correlated with each other!

    2. Multiple Regression A sample data set: Sales= hundreds of gallons Price = price per gallon Advertising = hundreds of dollars

    3. Analyzing the output Evaluate for multicollinearity State and interpret the equation Interpret Adjusted R2 Interpret Syx Are the independent variables significant? Is the model significant? Forecast and develop a prediction interval Examine the error terms Calculate MAD, MSE, MAPE, MPE
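
The checklist above presumes a fitted model. As a minimal sketch of the fitting step itself (the data rows below are hypothetical placeholders, not the course data set), a two-predictor model can be fit by ordinary least squares by solving the normal equations (X'X)b = X'y:

```python
# Minimal OLS sketch for Sales = b0 + b1*Price + b2*Adv, solving the
# normal equations (X'X)b = X'y with plain Gaussian elimination.
# The data values here are hypothetical, not the course data set.

def fit_ols(X, y):
    """X: rows of [1, price, adv]; returns [b0, b1, b2]."""
    n, k = len(X), len(X[0])
    # Build X'X and X'y
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(A[r][c] * coef[c] for c in range(r + 1, k))
        coef[r] = (b[r] - s) / A[r][r]
    return coef

# Hypothetical rows: [intercept, price per gallon, adv in hundreds]
X = [[1.0, 1.00, 10], [1.0, 1.10, 8], [1.0, 0.95, 12], [1.0, 1.20, 6]]
y = [14.05, 12.045, 15.6425, 10.04]   # constructed to lie on one plane
b0, b1, b2 = fit_ols(X, y)            # recovers about (16.4, -8.25, 0.59)
```

In practice a statistics package produces these coefficients along with the ANOVA table; the sketch only shows where the numbers come from.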

    4. Correlation Matrix Simple correlation for each combination of variables (independents vs. independents; independents vs. dependent)
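
The matrix is just the simple (Pearson) correlation computed for every pair of variables. A minimal sketch, with hypothetical variable names and values:

```python
import math

def pearson(x, y):
    """Simple (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: dependent (sales) plus two independents
series = {
    "sales": [14.1, 12.0, 15.6, 10.0],
    "price": [1.00, 1.10, 0.95, 1.20],
    "adv":   [10.0, 8.0, 12.0, 6.0],
}
names = list(series)
matrix = {(a, b): pearson(series[a], series[b])
          for a in names for b in names}
```

The diagonal is always 1, and the matrix is symmetric, so only the pairs above the diagonal need to be inspected.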

    5. Multicollinearity It's possible that the independent variables are related to one another. If they are highly related, this condition is called multicollinearity. Problems: A regression coefficient that is positive in sign in a two-variable model may change to a negative sign. Estimates of the regression coefficients change greatly from sample to sample because the standard errors of the coefficients are large. Highly interrelated independent variables can explain some of the same variance in the dependent variable - so there is no added benefit, even though the R-square has increased. We would throw one variable out when the correlation between two independent variables is high (roughly .7 or above).
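
A minimal screening sketch for the .7 rule of thumb above, flagging any pair of independent variables whose correlation exceeds the threshold (the variable names and data are hypothetical):

```python
import math

def pearson(x, y):
    """Simple correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

def flag_collinear(predictors, threshold=0.7):
    """Return (name_i, name_j, r) for predictor pairs with |r| > threshold."""
    names = list(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(predictors[a], predictors[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged

predictors = {
    "price":  [1.0, 2.0, 3.0, 4.0],
    "cost":   [2.0, 4.0, 6.1, 8.0],   # nearly a multiple of price
    "season": [5.0, 1.0, 4.0, 2.0],
}
pairs = flag_collinear(predictors)    # only (price, cost) is flagged
```

For each flagged pair, we would keep the variable more strongly related to the dependent variable and drop the other before re-running the model.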

    6. Multiple Regression Equation Gallon Sales = 16.4 - 8.2476 (Price) + .59 (Adv)

    7. Regression Coefficients b0 is the Y-intercept - the value of sales when X1 and X2 are 0. b1 and b2 are net regression coefficients: the change in Y per unit change in the relevant independent variable, holding the other independent variables constant.

    8. Regression Coefficients For each unit increase ($1.00) in price, sales will decrease 8.25 hundred gallons, holding advertising constant. For each unit increase ($100, represented as 1) in advertising, sales will increase .59 hundred gallons, holding price constant. Be very careful about the units! A 10 in the advertising variable indicates $1,000 because advertising is in hundreds. Gallons = 16.4 - 8.2476 (1.00) + .59 (10) = 14.05, or about 1,405 gallons

    9. Regression Coefficients How does a one-cent increase in price affect sales (holding advertising at $1,000)? 16.4 - 8.25 (1.01) + .59 (10) = 13.9675. If price stays at $1.00 and advertising increases $100, from $1,000 to $1,100: 16.4 - 8.25 (1.00) + .59 (11) = 14.64
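
Because the units trip people up, a small sketch of the slide's fitted equation with the dollar-to-hundreds conversion made explicit (the function name is ours, not from the slides):

```python
# The slide's fitted equation with unit conversions spelled out:
# price in dollars, advertising entered in dollars and converted to
# the hundreds the model expects; sales come back in hundreds of gallons.

def predict_sales(price_dollars, adv_dollars):
    adv_hundreds = adv_dollars / 100.0
    return 16.4 - 8.2476 * price_dollars + 0.59 * adv_hundreds

base = predict_sales(1.00, 1000)     # about 14.05 hundred gallons
cent_up = predict_sales(1.01, 1000)  # one-cent price increase: sales fall
adv_up = predict_sales(1.00, 1100)   # $100 more advertising: sales rise .59
```

Feeding the function raw dollar amounts and letting it do the conversion avoids the "10 means $1,000" mistake entirely.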

    10. Regression Statistics Standard error of the estimate R2 and Adjusted R2

    11. R2 and Adjusted R2 Same formulas as simple regression: R2 = SSR/SST (this is the UNADJUSTED R2). Adjusted R2 from the ANOVA table = 1 - MSE/(SST/(n-1)). 91% of the variance in gallons sold is explained by price per gallon and advertising.
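
Both measures fall out of the ANOVA sums of squares. A sketch with hypothetical ANOVA values (n observations, k independent variables):

```python
# R-squared and adjusted R-squared from ANOVA sums of squares.
# SSE = error sum of squares, SST = total; n = sample size,
# k = number of independent variables.

def r_squared(sst, sse, n, k):
    r2 = 1 - sse / sst                                # unadjusted
    adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))   # = 1 - MSE/(SST/(n-1))
    return r2, adj

# Hypothetical ANOVA values: SST = 100, SSE = 9, n = 13, two predictors.
r2, adj = r_squared(100.0, 9.0, 13, 2)   # r2 = 0.91, adj = 0.892
```

Adjusted R2 is always at or below the unadjusted value, since it charges the model for each extra independent variable.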

    12. Standard Error of the Estimate Measures the standard amount that the actual values (Y) differ from the estimated values. No change in formula, except that in this example k = 3 (two slope coefficients plus the intercept). Can still use the square root of MSE.
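
As a sketch (residuals hypothetical), the standard error of the estimate is sqrt(SSE / (n - k)), where k counts all estimated coefficients including the intercept:

```python
import math

# Standard error of the estimate: sqrt(SSE / (n - k)), with k counting
# every estimated coefficient (here two slopes plus the intercept, k = 3).
# The residuals below are hypothetical.

def std_error_estimate(residuals, k):
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (n - k))    # equivalently, sqrt(MSE)

syx = std_error_estimate([1.0, -1.0, 2.0, -2.0, 1.0, -1.0], k=3)  # -> 2.0
```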

    13. Evaluate the Independent Variables Ho: The regression coefficient is not significantly different from zero. HA: The regression coefficient is significantly different from zero. Use the t-stat and the p-value to evaluate EACH independent variable. If an independent variable is NOT significant, we remove it from the model and re-run!

    14. Evaluate the Model Ho: The model is NOT valid and there is NOT a statistical relationship between the dependent and independent variables. HA: The model is valid; there is a statistical relationship between the dependent and independent variables. If F from the ANOVA is greater than the F from the F-table, reject Ho: the model is valid. We can also look at the p-values: if the p-value is less than our chosen α level, we REJECT Ho.
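
The F statistic being compared to the table comes straight from the ANOVA decomposition. A sketch with hypothetical values:

```python
# Overall model F test from the ANOVA table: F = MSR / MSE,
# with k independent variables and n observations (values hypothetical).

def f_statistic(sst, sse, n, k):
    ssr = sst - sse          # regression sum of squares
    msr = ssr / k            # mean square regression
    mse = sse / (n - k - 1)  # mean square error
    return msr / mse

f = f_statistic(sst=100.0, sse=9.0, n=13, k=2)  # compare to the F-table value
```

A large F (relative to the table value at the chosen α) rejects Ho and says the model as a whole is valid.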

    15. Forecast and Prediction Interval Same as simple regression - however, many times we will not have the correction factor (the formula under the square root). It is acceptable to use the standard error of the estimate provided in the computer output.

    16. Examining the Errors Heteroscedasticity exists when the residuals do not have a constant variance across an entire range of values. Run an autocorrelation on the error terms to determine if the errors are random. If the errors are not random, the model needs to be re-evaluated. More on this in Chapter 9. Evaluate with MAD, MAPE, MPE, MSE
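
A sketch of the four error measures named above, computed from actual and fitted values (the data here are hypothetical):

```python
# MAD, MSE, MAPE, and MPE from actual vs. fitted values.
# MPE keeps the sign of each error, so it measures bias;
# the other three measure magnitude.

def forecast_errors(actual, fitted):
    errs = [a - f for a, f in zip(actual, fitted)]
    n = len(errs)
    mad = sum(abs(e) for e in errs) / n                   # mean absolute deviation
    mse = sum(e * e for e in errs) / n                    # mean squared error
    mape = 100 * sum(abs(e / a) for e, a in zip(errs, actual)) / n  # mean abs % error
    mpe = 100 * sum(e / a for e, a in zip(errs, actual)) / n        # mean % error
    return mad, mse, mape, mpe

mad, mse, mape, mpe = forecast_errors([10.0, 20.0], [9.0, 22.0])
```

An MPE near zero with a sizable MAPE says the model's errors are large but roughly balanced above and below the actual values.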

    17. Dummy Variables Used to determine the relationship between qualitative independent variables and a dependent variable. Differences based on gender Effect of training/no-training on performance Seasonal data- quarters We use 0 and 1 to indicate off or on. For example, code males as 1 and females as 0.

    18. Dummy Variables The data indicate job performance rating based on achievement test score and gender, coded female (0) and male (1). How do males and females differ in their job performance?

    19. Dummy Variables The regression equation: Job performance = -1.96 + .12 (test score) - 2.18 (gender). Holding gender constant, a one-unit increase in test score increases the job performance rating by .12 points. Holding test score constant, males experience a 2.18-point lower performance rating than females. Or, stated differently, females have a 2.18-point higher job performance rating than males, holding test scores constant.
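
A sketch of the slide's dummy-variable equation showing that the gender coefficient is a constant shift between the two groups (the function name and test score are ours):

```python
# The slide's fitted equation; gender is the dummy: 1 = male, 0 = female.

def job_performance(test_score, male):
    return -1.96 + 0.12 * test_score - 2.18 * male

# At any fixed test score, males rate 2.18 points lower than females:
gap = job_performance(80, 1) - job_performance(80, 0)
# and each extra test point adds .12 regardless of gender:
slope = job_performance(81, 0) - job_performance(80, 0)
```

Geometrically, the dummy produces two parallel lines: same slope on test score, intercepts 2.18 points apart.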

    20. Dummy Variable Analysis Evaluate for multicollinearity State and interpret the equation Interpret Adjusted R2 Interpret Syx Are the independent variables significant? Is the model significant? Forecast and develop a prediction interval Examine the error terms Calculate MAD, MSE, MAPE, MPE

    21. Model Evaluation If the variables indicate multicollinearity, run the model and interpret it, but then re-run the best model (i.e., throw out one of the highly correlated variables). If one of the independent variables is NOT significant (whether a dummy variable or other), throw it out and re-run the model. If the overall model is not significant - back to the drawing board - we need to gather better predictor variables (maybe an elective course!)

    22. Stepwise Regression Sometimes we will have a great number of variables - running a correlation matrix will help determine whether any variables should NOT be in the model (low correlation with the dependent variable). We can also run different types of regression, such as stepwise regression.

    23. Stepwise Regression Adds one variable at a time - one step at a time - based on explained variance (and highest correlation with the dependent variable). The independent variable that explains the most variance in the dependent variable is entered into the model first. A partial F-test then determines whether a new variable stays or is eliminated.
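
The procedure can be sketched as a greedy forward selection: at each step, try each remaining candidate, add the one that most reduces SSE, and keep it only if its partial F statistic clears an entry threshold. This is a simplified sketch, not a full stepwise implementation (it never removes a variable once entered); the threshold f_in = 4.0 and the data are hypothetical.

```python
import numpy as np

def forward_select(X, y, names, f_in=4.0):
    """Greedy forward selection on the columns of X, stopping when the
    best remaining candidate's partial F falls below f_in (hypothetical)."""
    n = len(y)

    def sse(cols):
        # SSE of the OLS fit using an intercept plus the given columns.
        A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        return float(resid @ resid)

    chosen, remaining = [], list(range(X.shape[1]))
    current = sse(chosen)
    while remaining:
        best = min(remaining, key=lambda c: sse(chosen + [c]))
        new = sse(chosen + [best])
        df = n - len(chosen) - 2          # residual df with the new variable in
        f = (current - new) / (new / df)  # partial F for the added variable
        if f < f_in:
            break                         # no remaining variable is significant
        chosen.append(best)
        remaining.remove(best)
        current = new
    return [names[c] for c in chosen]

# Hypothetical data: y depends on x1; x2 is unrelated noise.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([5.0, 3.0, 6.0, 2.0, 4.0, 1.0])
y = 3.0 + 2.0 * x1 + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
selected = forward_select(np.column_stack([x1, x2]), y, ["x1", "x2"])
```

As in the slides, the strongest predictor enters first, and the procedure stops as soon as no candidate passes the partial F-test.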

    24. Start with the correlation Matrix

    25. Stepwise Regression

    26. Stepwise Regression The equation at Step 1: Sales = -100.85 + 6.97 (age) The equation at Step 2: Sales = -86.79 + 5.93 (age) + .200 (test score) No other variables are significant; the model stops.
