Determining Factors of Market Success

DMD #4. David Kopcso and Richard Cleary, Babson College, F. W. Olin Graduate School of Business. Learning Objectives: determine the strength of (linear) relationships; describe a regression model with one or more explanatory variables.

Presentation Transcript


  1. Determining Factors of Market Success • DMD #4 • David Kopcso and Richard Cleary • Babson College, F. W. Olin Graduate School of Business

  2. Learning Objectives • Determine the strength of (linear) relationships • Describe a regression model with one or more explanatory variables • Interpret regression coefficients • Evaluate the model in a business context

  3. Modeling Relationships • If we believe that two (or more) random variables are related, then we would like to model and exploit the relationship. • Hopefully, this helps to: • Make more accurate predictions • Show the direction and strength of relationship • Reduce the amount of uncertainty.

  4. Approach • Investigate variables individually and jointly. • Individually: numerically with standard statistics; graphically with histograms and box plots. • Jointly: numerically with correlation; graphically with scatter plots.

  5. Scatter Plots • Positive Linear Relationship • Nonlinear Relationship • Negative Linear Relationship • No Relationship

  6. Correlation Coefficient • Unit free: ranges between -1 and 1 • The closer to –1, the stronger the negative linear relationship • The closer to 1, the stronger the positive linear relationship • The closer to 0, the weaker any linear relationship • Note: Correlation does not deal with cause and effect; it only measures strength of linear dependence

  7. Correlation Coefficient r • [Figure: four scatter plots of Y against X, illustrating r = +1, r = -1, r = +0.9, and r = 0]

  8. Correlation Measures Only Linear Dependence!

       X    Y = exp(X)
       1    3
       2    7
       3    20
       4    55
       5    148
       6    403
       7    1097
       8    2981
       9    8103
      10    22026
      11    59874
      12    162755
      13    442413
      14    1202604
      15    3269017
      16    8886111
      17    24154953
      18    65659969
      19    178482301
      20    485165195

     • X and Y are perfectly related. However, the correlation of X and Y (where Y = exp(X)) is only 0.539.
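The 0.539 figure on this slide can be checked directly. The following is a minimal sketch, assuming Y is the unrounded exp(X) for X = 1 to 20; the helper name `pearson_r` is ours and does not come from the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = list(range(1, 21))
ys = [math.exp(x) for x in xs]

# Correlation on the raw scale: far from 1 although Y is an exact function of X.
r_raw = pearson_r(xs, ys)

# Correlation after taking logs: log(exp(X)) = X, so the relationship is linear.
r_log = pearson_r(xs, [math.log(y) for y in ys])

print(round(r_raw, 3), round(r_log, 3))
```

On the raw scale the result rounds to the slide's 0.539, while the log-transformed values are perfectly linear in X. This suggests that a low correlation may signal a nonlinear relationship rather than no relationship.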

  9. Linear Regression Model • Assume that the relationship between the variables is linear: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ • $Y_i$: dependent (response) variable • $X_i$: independent (explanatory) variable • $\beta_0$: Y-intercept • $\beta_1$: slope • $\varepsilon_i$: error

  10. Model • Do you think knowing the size of a house helps “explain” the variation in house prices? • Population Model: Price = $\beta_0$ + $\beta_1$ Sq. Footage + $\varepsilon$ • Estimated Equation: Est. Price = b0 + b1 Sq. Footage, or equivalently $\widehat{\text{Price}}$ = b0 + b1 Sq. Footage

  11. Linear Regression Model • Observed values: $Y_i = b_0 + b_1 X_i + e_i$, where $e_i$ is the residual • Fitted line: $\hat{Y}_i = b_0 + b_1 X_i$ • [Figure: scatter plot of Y against X with the fitted line, marking a residual and an unsampled observation]
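To make the estimated coefficients and residuals concrete, here is a minimal sketch of an ordinary least squares fit for one explanatory variable. The slides do not say how their software computes b0 and b1, so the closed-form formulas below are an assumption (they are the standard ones), and the square footage and price values are invented purely for illustration.

```python
def fit_line(xs, ys):
    """Least squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1


# Hypothetical (square footage, price) pairs, NOT the data behind the slides.
sqft = [1200, 1500, 1800, 2400, 3000]
price = [310_000, 390_000, 420_000, 540_000, 630_000]

b0, b1 = fit_line(sqft, price)
fitted = [b0 + b1 * x for x in sqft]                 # Y-hat for each observation
residuals = [y - f for y, f in zip(price, fitted)]   # e_i = Y_i - Y-hat_i

print(f"b0 = {b0:.0f}, b1 = {b1:.1f}")
```

With an intercept in the model, the residuals of a least squares fit sum to zero up to rounding, which is a quick sanity check on any fitted line.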

  12. Estimated Model Est. Price = b0 + b1 Sq. Footage Est. Price = 117,663 + 173 Sq. Footage

  13. Model Interpretation • b1: The average marginal increase/decrease in Price for a unit increase in Sq. Footage. • Price will increase by $173 on average for each additional square foot. • b0: The average Price when Square Footage equals zero. • Average value of Price is $117,663 when there is no Square Footage. • Does this statement make sense? Does this result have managerial significance?

  14. Hypothesis Test: No Linear Relationship • Tests whether there is a (linear) relationship between X & Y • Hypotheses • H0: $\beta_1 = 0$ (No Linear Relationship) • H1: $\beta_1 \neq 0$ (Linear Relationship) • Compare p-value to $\alpha$ • Interpretation • If p-value is less than $\alpha$, we have enough information to conclude that Square Footage is linearly related to Price and we can interpret the slope.

  15. Quality of Model • We would like to know how well our model fits the facts (data). The better the fit, the more we believe in the model’s accuracy. • We have two measures of fit: R-squared and S (aka SEE).

  16. The Famed R2 • Coefficient of determination: $R^2 = \frac{\text{Explained Variance}}{\text{Total Variance}}$ • The closer the R2 is to 1, the better the “fit” • R2 is the percentage of variation of the Y variable that is explained by (accounted for by or reduced by) knowing the X variable (i.e., by using the regression to predict the response rather than the average response value). • Square Footage explains 92% of the variation in Price.

  17. Accuracy: Standard Error of Estimate • Standard error of the estimate: S (or SEE) • The smaller the S, the better the “fit” • The units of S are the same as the units of the Y variable. • When using our regression model for predicting home prices, we would be off on average plus/minus $46,631.
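Both measures of fit can be computed from the residuals. The sketch below assumes the common definitions R2 = 1 - SSE/SST (which equals explained over total variation for a least squares fit with an intercept) and S = sqrt(SSE / (n - k - 1)) with k explanatory variables; the slides do not spell these formulas out, and the five data points are invented for illustration.

```python
import math


def fit_quality(ys, fitted, n_explanatory=1):
    """Return (R-squared, S) for observed values and their fitted values."""
    n = len(ys)
    mean_y = sum(ys) / n
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)             # total variation
    r_squared = 1 - sse / sst
    s = math.sqrt(sse / (n - n_explanatory - 1))         # standard error of estimate
    return r_squared, s


# Hypothetical data; the least squares line for it is y-hat = 2.2 + 0.6 * x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * x for x in xs]

r_squared, s = fit_quality(ys, fitted)
print(round(r_squared, 3), round(s, 3))
```

Note that S is in the units of Y, so it can be read directly as a typical prediction error, whereas R2 is unit free.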

  18. In-Class Activity: • Investigate models of Salary from the file Salary_handout.xls using only one variable as the explanatory (independent) variable. • Interpret the following in the context of the model: • Slope and intercept • Strength of linear relationship (R) • The usefulness of the slope (p-value) • Graph of relationship • Evaluation of the model, i.e., R2 and S. • Use the model to predict Salary for a fictitious employee.

  19. Multiple Linear Regression (MLR) • We assume that the relationship between variables is linear: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$

  20. Model Building • Before running any regressions or even any data analysis, determine which of your variables you believe are good predictors. • Generally, you want at least 10 observations per variable selected if possible.

  21. Variable Investigation • Next, investigate the relationship between the response or dependent (Y) variable and each of the explanatory or independent (X) variables. Use the correlation matrix and scatter plots. • To avoid ‘problems’, also make sure that the correlation among the explanatory (independent) variables is not too high. As a rule of thumb, anything above 0.90 in absolute value can cause trouble.
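The 0.90 rule of thumb can be applied mechanically by scanning every pair of explanatory variables. This is a minimal sketch; the function names and the three short columns `x1`, `x2`, `x3` are hypothetical, chosen only so that one pair is nearly collinear.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def high_correlation_pairs(columns, threshold=0.90):
    """List (name_a, name_b, r) for explanatory pairs with |r| above threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical explanatory variables; x2 is almost a multiple of x1.
explanatory = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],
    "x3": [5, 1, 4, 2, 3],
}

flagged = high_correlation_pairs(explanatory)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b}: r = {r:.3f}")
```

A flagged pair does not have to be dropped automatically; it is a prompt to consider keeping only one of the two variables.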

  22. Correlation Matrix

  23. Multiple Regression Output • Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms

  24. Slopes in MLR • Be careful when interpreting the slopes in a multiple linear regression as it is necessary to hold all other variables constant. • Price will increase on average by $111 for each additional square foot when holding all other explanatory variables constant. If the p-value is not less than alpha, then you cannot interpret the slope.

  25. p-values in MLR • Each explanatory (independent) variable has its own p-value. • When looked at individually, is the variable’s slope statistically different from zero? • If yes (p-value < $\alpha$), then that variable is a good predictor within the context of the model and the slope can be interpreted. • If no (p-value > $\alpha$), then that variable is not a good predictor within the context of the model and the slope cannot be interpreted. • Some variables are confirmatory and may remain in the model even though their p-value > $\alpha$.

  26. Coefficient of Determination R2 • R2 is still the percentage of variation of the Y variable explained by knowing all the X variables. The focus is on explaining the variation in Price, not on explaining the data.

  27. Output • Knowing the square footage, the number of bedrooms and the number of bathrooms of a house, explains 97% of the variation in house prices.

  28. Standard Error of Estimate in MLR • The interpretation of S (aka SEE) is the same in multiple regression as it is in simple. • Thus, we expect to be off on average plus or minus $28,765 when predicting house prices using the square footage, the number of bedrooms and the number of bathrooms in the house.

  29. Variation Reduction • How do we know if the S (SEE) is low or high? Is it small enough to make the predictions from the regressions useful? • Compare it to the standard deviation of the response (dependent) variable. • S (SEE): $28,765 vs. St Dev(Price): $161,666
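The comparison on this slide reduces to a single ratio. The sketch below uses the two figures quoted on the slide; expressing the result as a "reduction" percentage is our framing, not the slides'.

```python
see = 28_765            # standard error of estimate from the multiple regression
st_dev_price = 161_666  # standard deviation of Price on its own

ratio = see / st_dev_price   # typical error with the model vs. without it
reduction = 1 - ratio        # share of the typical error removed by the model

print(f"S is {ratio:.1%} of St Dev(Price), a reduction of {reduction:.1%}")
```

A ratio well below 1 indicates that predicting with the regression is considerably more precise than simply predicting the average price.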

  30. Predicting Using Regression • Recall we have: Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms • Assume this is a good equation. • Use it to predict the expected selling price of a home with 2000 sq. ft. of living space, 4 bedrooms, and 2 baths.
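The prediction asked for on this slide is a direct substitution into the estimated equation. A minimal sketch follows, with the coefficients taken from the slide; the function name `est_price` is ours.

```python
def est_price(sqft, bedrooms, bathrooms):
    """Estimated price from the slide's multiple regression equation."""
    return 44_392 + 111 * sqft + 85_345 * bedrooms + 572 * bathrooms


# Home with 2000 sq. ft. of living space, 4 bedrooms, and 2 baths.
prediction = est_price(sqft=2000, bedrooms=4, bathrooms=2)
print(f"Est Price = ${prediction:,}")
```

This evaluates to $608,916: 44,392 + 222,000 for square footage + 341,380 for bedrooms + 1,144 for bathrooms.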

  31. How Confident Should You Be about Your Estimate? • About two-thirds (68%) of the data should fall within +/- SEE of the value determined by the regression equation. Similarly, about 95% should fall within 2*SEE. • Therefore, a 95% interval for the prediction of a specific house at 533 Main St., which has 2000 sq. ft., 4 bedrooms, & 2 baths, can be computed as Est Price +/- 2*SEE. • That is, we are 95% confident that this specific house’s price is between these two values. • Since this is about a specific house, the interval is called a prediction interval, not a confidence interval.
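The approximate 95% prediction interval can be worked out from numbers already on the slides. The sketch below takes the estimate for the 2000 sq. ft., 4 bedroom, 2 bath house from the regression equation on slide 23 and the SEE of $28,765 from slide 28; the function name `prediction_interval` is ours.

```python
def prediction_interval(estimate, see):
    """Approximate 95% prediction interval: estimate +/- 2 * SEE."""
    margin = 2 * see
    return estimate - margin, estimate + margin


# Est Price for 2000 sq. ft., 4 bedrooms, 2 baths, from the slide's equation.
estimate = 44_392 + 111 * 2000 + 85_345 * 4 + 572 * 2
see = 28_765

pi_low, pi_high = prediction_interval(estimate, see)
print(f"95% prediction interval: ${pi_low:,} to ${pi_high:,}")
```

Under the slide's rule of thumb, the specific house is expected to sell for between $551,386 and $666,446.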

  32. How Confident Should You Be about the Average Price of Such a House? • A 95% confidence interval for the average price of a 2000 sq. ft., 4 bed, 2 bath house can be computed as: • Est Price +/- 2*SEE/SQRT(n), where n is the number of observations used in the regression. • In words, based on our regression, we are 95% confident that the average 2000 sq. ft., 4 bed, 2 bath house price is between these two values. • Since this is about an average of all such houses, the interval is called a confidence interval, not a prediction interval.
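The confidence interval for the average price uses the same ingredients plus the sample size. The slides do not report n, so the value n = 100 below is purely hypothetical and only shows how the interval narrows; the function name `confidence_interval` is also ours.

```python
import math


def confidence_interval(estimate, see, n):
    """Approximate 95% confidence interval: estimate +/- 2 * SEE / sqrt(n)."""
    margin = 2 * see / math.sqrt(n)
    return estimate - margin, estimate + margin


estimate = 608_916   # Est Price for 2000 sq. ft., 4 bedrooms, 2 baths
see = 28_765         # standard error of estimate from the multiple regression
n = 100              # HYPOTHETICAL sample size; not given in the slides

ci_low, ci_high = confidence_interval(estimate, see, n)
print(f"95% confidence interval: ${ci_low:,.0f} to ${ci_high:,.0f}")
```

Dividing by SQRT(n) makes this interval much narrower than the prediction interval for a single house, since an average over all such houses is less variable than any one sale.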

  33. In-Class Activity: • Investigate models of Salary from the file Salary_handout.xls using any set of variables you wish as the explanatory (independent) variables. • Interpret the following in the context of the model: • Slopes and intercept. • Strength of linear relationship (R) • The usefulness of the slope (p-value). • Graphs of relationship. • Evaluation of the model, i.e., R2 and SEE. • Use the model to predict Salary for a fictitious employee and build Prediction and Confidence intervals for this prediction.
