
Managerial Statistics, A Case Base Approach by: Klibanoff, Sandroni, Moselle, Saraniti



Presentation Transcript


    1. Summer 2009. Clayton State University, School of Business. Dr. Reza Kheirandish. Managerial Statistics, A Case Base Approach by: Klibanoff, Sandroni, Moselle, Saraniti

    2. Chapter Seven. Objective: To identify and understand multicollinearity and its impact on multiple regression analysis. Topics: Multicollinearity; Generalized F-test; Hidden Extrapolation; Omitted Variable Bias. Case: The Hotdog Case.

    3. Hot Dog: Background. Your company: Dubuque. Ball Park: a leading brand. Ball Park may reduce its hot dog prices. Problem: the impact on Dubuque’s market share. Some argue that the impact will be small because Oscar Mayer is Dubuque’s leading competitor.

    4. Hot Dog: More Background. Ball Park produces two hot dogs: regular and all-beef. Current prices: Ball Park 1.79 (regular) and 1.89 (all-beef); Dubuque 1.49; Oscar Mayer 1.69. Ball Park new pricing: regular 1.45, all-beef 1.55.

    5. Hot Dog: Questions. How does Dubuque’s own price affect Dubuque’s market share? How does Oscar Mayer’s price affect Dubuque’s market share? How does Ball Park’s price affect Dubuque’s market share? Who is Dubuque’s leading competitor, Ball Park or Oscar Mayer?

    6. Hot Dog: Further Questions. What will happen to Dubuque’s market share if Dubuque does not respond to Ball Park’s new campaign? How much should Dubuque charge for its hot dog?

    7. Hot Dog: Regression

    8. Ball Park’s P-values

    9. P-values. Do the high p-values indicate that neither of Ball Park’s prices has a significant relationship to Dubuque’s share? At first, you might think so, BUT the answer is NO! This regression is suffering from a serious multicollinearity problem.

    10. Multicollinearity. Multicollinearity is the term used to describe the correlation among the independent variables. A multicollinearity problem occurs when this correlation is high.

    11. Correlations. Kstat can generate this correlations table. Note the high correlation (0.97938) between the two Ball Park prices. Ball Park seems to be coordinating the pricing of its two products. The following scatterplot shows this to be true.

    12. Ball Park Prices
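As a rough illustration of the correlation check described above, the sketch below computes a Pearson correlation on synthetic prices. The data, the seed, and the helper name `pearson` are assumptions made for this example. They are not the case data, and this is not Kstat's output.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic prices in cents (assumed for illustration, NOT the case data).
rng = random.Random(0)
pbpreg = [rng.uniform(140, 200) for _ in range(100)]
# The all-beef price tracks the regular price closely, as in the case.
pbpbeef = [p + 10 + rng.uniform(-5, 5) for p in pbpreg]
# A price that is set independently of Ball Park's.
pdub = [rng.uniform(130, 180) for _ in range(100)]

r_bp = pearson(pbpreg, pbpbeef)
r_other = pearson(pbpreg, pdub)
print(f"corr(pbpreg, pbpbeef) = {r_bp:.3f}")
print(f"corr(pbpreg, pdub)    = {r_other:.3f}")
```

Two prices that move together produce a correlation near 1, while an independently set price produces one near 0.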

    13. P-values revisited. Individually, Ball Park’s hot dog prices are not significant. This may suggest dropping them from the regression. This is NOT necessarily correct. The high correlation makes us unable to separate the two “Ball Park effects” on our market share. To decide, we must test whether the Ball Park prices, taken together, are significant. We can do this with an F-test.

    14. The F-test. Ho: βbpreg = βbpbeef = 0. Ha: At least one of the coefficients βbpreg and βbpbeef (or both) is not equal to zero. The t-tests run by Kstat ask whether each coefficient is significantly different from zero in isolation. This F-test asks: Are they jointly significant?

    15. The F-test. First, click Statistics>Analysis of variance. You will obtain the following dialog box.

    16. The F-test

    17. The F-test. The p-value is small (0.0000003). We reject the null hypothesis Ho: βbpreg = βbpbeef = 0. We accept the alternative hypothesis and conclude that at least one of the Ball Park hot dog prices has an effect on Dubuque’s market share.
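The joint test above can be sketched as a comparison of two regressions: one with both Ball Park prices and one without them. The code below is a minimal illustration on synthetic data, not Kstat's procedure. The coefficients, seed, and helper names (`solve`, `sse`, `partial_f`) are assumptions made for the example. It reports only the F statistic, since the p-value requires the F distribution with (q, df) degrees of freedom, which the Python standard library does not provide. For 2 and 95 degrees of freedom the 5% critical value is roughly 3.1.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def sse(columns, y):
    """Sum of squared residuals from regressing y on an intercept plus columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def partial_f(y, kept, tested):
    """F statistic for Ho: all coefficients on the `tested` columns are zero."""
    q = len(tested)
    df = len(y) - (len(kept) + q + 1)
    sse_restricted = sse(kept, y)
    sse_full = sse(kept + tested, y)
    return ((sse_restricted - sse_full) / q) / (sse_full / df), q, df

# Synthetic data in cents and share points (assumed, NOT the case data).
rng = random.Random(1)
n = 100
pdub = [rng.uniform(130, 170) for _ in range(n)]
pom = [rng.uniform(150, 190) for _ in range(n)]
pbpreg = [rng.uniform(140, 200) for _ in range(n)]
pbpbeef = [p + rng.uniform(0, 10) for p in pbpreg]  # highly correlated pair
share = [8 - 0.08 * d + 0.02 * o + 0.02 * r + 0.02 * b + rng.gauss(0, 0.5)
         for d, o, r, b in zip(pdub, pom, pbpreg, pbpbeef)]

F, q, df = partial_f(share, [pdub, pom], [pbpreg, pbpbeef])
print(f"F({q}, {df}) = {F:.1f}")
```

A large F statistic says that dropping both Ball Park prices at once makes the fit much worse, even when each price looks insignificant on its own.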

    18. How to Detect Multicollinearity. Suppose that the correlation between two independent variables is 0.65 or 0.75. Is that a multicollinearity problem? The variance inflation factor (VIF) is an indicator of a multicollinearity problem. If the VIF is above 10, then there is a serious multicollinearity problem. Kstat computes VIF automatically for you in the Model Analysis menu.
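To connect the correlation figures to the VIF threshold, the sketch below uses the standard definition VIF = 1 / (1 − R²), where R² comes from regressing one independent variable on the others. With only one other variable, R² is the squared pairwise correlation, so the function gives a lower bound on the VIF that a full regression would report. The function name `vif_from_pairwise_r` is an assumption made for the example, and this is not Kstat's implementation.

```python
def vif_from_pairwise_r(r):
    """VIF implied by a single pairwise correlation r: 1 / (1 - r^2).

    With more regressors in the model, the actual VIF can only be larger,
    because R^2 cannot fall when regressors are added.
    """
    return 1.0 / (1.0 - r ** 2)

# 0.65 and 0.75 are the slide's hypothetical values; 0.97938 is the
# correlation between the two Ball Park prices reported in the case.
for r in (0.65, 0.75, 0.97938):
    print(f"r = {r:.5f}  ->  VIF >= {vif_from_pairwise_r(r):.2f}")
```

By this measure, correlations of 0.65 or 0.75 alone imply a VIF well under 10, while the Ball Park correlation implies a VIF of about 24.5 or more.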

    19. What Next? Let’s run a multiple regression without the Ball Park (all-beef) hot dog. Just an experiment. NOT the final model.

    20. Regression without BP Beef

    21. What to do… Once we remove pbpbeef from the regression, the t-ratio of pbpreg skyrockets. The same would happen to the t-ratio of pbpbeef if we removed pbpreg. Keep the regression with BOTH hot dogs, but interpret the p-values and coefficients with care.

    22. Conclusion (even more). Ball Park is Dubuque’s leading competitor. Dubuque’s market share falls by an estimated 0.045% for each cent of decrease in both Ball Park’s prices. Dubuque’s market share is expected to fall by 1.5% = 0.045% x 34, where 34 cents is the planned cut in each Ball Park price (1.79 to 1.45 and 1.89 to 1.55). To maintain market share, Dubuque reduced its price by 20 cents (0.076% x 20 = 1.5%).
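The arithmetic in this conclusion can be retraced in a few lines. The sketch below uses only figures quoted on the slides (prices in cents, share effects in percentage points per cent). The variable names are assumptions made for the example.

```python
# Ball Park's planned price cuts, from the current and new prices on the slides.
bp_regular_cut = 179 - 145   # cents
bp_beef_cut = 189 - 155      # cents

# Estimated effects quoted on the slide (percentage points of share per cent).
share_loss_per_bp_cent = 0.045    # when both Ball Park prices fall by one cent
share_gain_per_own_cent = 0.076   # when Dubuque cuts its own price by one cent

# Both Ball Park prices fall by the same amount, so one multiplication suffices.
expected_loss = share_loss_per_bp_cent * bp_regular_cut

# Own-price cut that would offset the expected loss.
required_own_cut = expected_loss / share_gain_per_own_cent

print(f"Ball Park cuts: {bp_regular_cut} and {bp_beef_cut} cents")
print(f"Expected share loss: {expected_loss:.2f} points")
print(f"Offsetting Dubuque price cut: {required_own_cut:.1f} cents")
```

The offsetting cut works out to about 20 cents, matching the figure on the slide.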

    23. Summary. Ball Park’s prices are highly correlated. This creates a multicollinearity problem. We cannot accurately estimate separate effects for the two Ball Park prices. But jointly they do have an effect.

    24. Other Issues. The case also asks us to consider a new pricing strategy for Ball Park. They might be planning on charging 1.45 for the regular hot dogs and 1.95 for the all-beef version. Can we just plug both of these values into Kstat’s prediction menu and make a forecast for our market share? Nope.

    25. Extrapolation. We might want to check these new prices to see whether or not we are extrapolating. That is, does our data set include prices like these within it, or are we making estimates beyond the domain of our existing experience? Let’s start by looking at the univariate statistics in Kstat.

    26. Univariate Statistics. This looks okay. 145 falls between the min/max values for pbpreg. 195 falls between the min/max values for pbpbeef. Prices of 145 and 195 are nothing new to us.

    27. But Wait! Consider the Pair of Prices

    28. Hidden Extrapolation. The X values we are considering are jointly far from the data set that we are using. We just won’t see it by looking at them one at a time. Ball Park’s prices have always been within 10 cents of one another. They have never been 50 cents apart in the data set that we are using. Be aware of this possibility when testing sets of variables that are highly correlated.
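One simple way to catch hidden extrapolation with two correlated prices is to check the gap between them as well as each price on its own. The sketch below does this on a deterministic synthetic price history built so that the two prices are never more than 10 cents apart, as the slide describes. The data and the helper name `in_range` are assumptions made for the example. With more than two variables, a joint measure such as leverage or Mahalanobis distance would be the more general tool.

```python
def in_range(value, observed):
    """True if value lies between the smallest and largest observed values."""
    return min(observed) <= value <= max(observed)

# Deterministic synthetic history in cents (assumed, NOT the case data).
# The all-beef price is always 0 to 10 cents above the regular price.
pbpreg = [140 + 0.6 * i for i in range(101)]
pbpbeef = [p + (7 * i) % 11 for i, p in enumerate(pbpreg)]
gaps = [b - r for r, b in zip(pbpreg, pbpbeef)]

# The proposed pair of prices from the case.
candidate_reg, candidate_beef = 145, 195

# Each price, taken alone, has been seen before.
univariate_ok = in_range(candidate_reg, pbpreg) and in_range(candidate_beef, pbpbeef)

# The pair, taken together, has not: the gap is far outside past experience.
joint_ok = in_range(candidate_beef - candidate_reg, gaps)

print(f"Each price within its own range: {univariate_ok}")
print(f"Gap within the historical range: {joint_ok}")
```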

    29. Omitted Variable Bias. Multicollinearity is a problem created by the variables that are INCLUDED in the regression: it makes the individual coefficient estimates imprecise rather than biased. Omitted variable bias is caused by the variables LEFT OUT of the regression.

    30. Strike Outs

    31. Strike Outs: OVB. Do more strike-outs lead to higher salaries? No. But something is up. This isn’t spurious. On average, players with more strike-outs DO make more money than others… They also hit more home runs, play more games, get more hits, and … Omitting other variables biases the coefficient on strike-outs.

    32. Strike-outs with Home runs added

    33. Two Interpretations of Two Coefficients. The first regression (without home runs) answers the following question: On average, how much does salary change for every strike-out? The second regression, with both variables, measures the effect of strike-outs on salary holding the number of home runs fixed, i.e., on average, for a player with a certain number of home runs, how much does salary change for every strike-out?

    34. Influence Diagram. Strike-outs have a direct (negative) correlation with salary. Strike-outs also have a positive correlation with home runs, which have a positive correlation with salary. This creates a second (indirect) effect, which in this case dominates the direct one.
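The sign flip described in these slides can be reproduced on simulated data. In the sketch below, salary is built with a negative direct effect of strike-outs and a positive effect of home runs, and strike-outs rise with home runs. All numbers, the seed, and the helper names are assumptions made for the example, not the baseball data from the slides. The coefficient that holds home runs fixed is obtained by first removing the part of each variable explained by home runs and then regressing the leftovers on each other, which gives the same strike-out coefficient as the two-variable regression.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals from a simple regression of y on x (with intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Simulated players (assumed numbers, NOT the data behind the slides).
rng = random.Random(2)
n = 300
home_runs = [rng.uniform(0, 40) for _ in range(n)]
strike_outs = [2 * h + rng.uniform(0, 40) for h in home_runs]
# True direct effects: +10 per home run, -1 per strike-out.
salary = [100 + 10 * h - 1 * s + rng.gauss(0, 20)
          for h, s in zip(home_runs, strike_outs)]

# Regression without home runs: strike-outs carry the indirect effect too.
coef_short = slope(strike_outs, salary)

# Regression holding home runs fixed: only the direct effect remains.
coef_long = slope(residuals(home_runs, strike_outs),
                  residuals(home_runs, salary))

print(f"Strike-out coefficient, home runs omitted:  {coef_short:+.2f}")
print(f"Strike-out coefficient, home runs included: {coef_long:+.2f}")
```

With home runs omitted the strike-out coefficient comes out positive. Once home runs are held fixed it turns negative, in line with the direct effect built into the simulation.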

    35. Conclusions. Omitted Variable Bias can distort coefficients. Leaving out correlated variables forces the variables that are present to carry the weight of both direct and indirect effects. Including all variables isolates their effects, so coefficients only measure their direct effect. Building models is hard. You can’t include everything, and so there will always be some OVB. Including some highly correlated variables can create other problems like multicollinearity.
