Determining Factors of Market Success

Determining Factors of Market Success • DMD #4 • David Kopcso and Richard Cleary • Babson College, F. W. Olin Graduate School of Business

Learning Objectives • Determine the strength of (linear) relationships • Describe a regression model with one or more explanatory variables • Interpret regression coefficients • Evaluate the model in a business context

Modeling Relationships • If we believe that two (or more) random variables are related, then we would like to model and exploit the relationship. • Hopefully, this helps to: • Make more accurate predictions • Show the direction and strength of relationship • Reduce the amount of uncertainty.

Approach • Investigate variables individually and jointly. • Numerically: Standard Stats (individually); Correlation (jointly) • Graphically: Histogram and Box Plot (individually); Scatter Plot (jointly)

Scatter Plots • Positive Linear Relationship • Nonlinear Relationship • Negative Linear Relationship • No Relationship

Correlation Coefficient • Unit free: ranges between -1 and 1 • The closer to –1, the stronger the negative linear relationship • The closer to 1, the stronger the positive linear relationship • The closer to 0, the weaker any linear relationship • Note: Correlation does not deal with cause and effect; it only measures strength of linear dependence

Correlation Coefficient r • [Figure: four scatter plots of Y against X, illustrating r = +1, r = -1, r = +0.9, and r = 0]

Correlation Measures Only Linear Dependence!

 X    Y = exp(X)
 1    3
 2    7
 3    20
 4    55
 5    148
 6    403
 7    1097
 8    2981
 9    8103
10    22026
11    59874
12    162755
13    442413
14    1202604
15    3269017
16    8886111
17    24154953
18    65659969
19    178482301
20    485165195

• X and Y are perfectly related. However, the correlation of X and Y (where Y = exp(X)) is 0.539.
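
The 0.539 figure can be checked directly. The sketch below is a minimal illustration, not the authors' workbook: the helper name `pearson_r` is an assumption introduced here, and it uses the textbook Pearson formula. It also correlates X with log(Y), a step the slide does not take, only to show that the relationship is exact once it is made linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# X = 1..20 and Y = exp(X), as in the table on the slide
xs = list(range(1, 21))
ys = [math.exp(x) for x in xs]

r_raw = pearson_r(xs, ys)                          # linear correlation of X and exp(X)
r_log = pearson_r(xs, [math.log(y) for y in ys])   # correlation after taking logs

print(f"corr(X, exp(X))      = {r_raw:.3f}")  # about 0.539
print(f"corr(X, log(exp(X))) = {r_log:.3f}")  # about 1.000
```

A low r therefore does not rule out a strong relationship; it only says the relationship is not well described by a straight line.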

Linear Regression Model • Assume that the relationship between the variables is linear: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ • $Y_i$: Dependent (Response) Variable • $X_i$: Independent (Explanatory) Variable • $\beta_0$: Y-Intercept • $\beta_1$: Slope • $\varepsilon_i$: Error

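Model • Do you think knowing the size of a house helps “explain” the variation in house prices? • Population Model: $\text{Price} = \beta_0 + \beta_1 \cdot \text{Sq. Footage} + \varepsilon$ • Estimated Equation: $\text{Est. Price} = b_0 + b_1 \cdot \text{Sq. Footage}$, or equivalently $\widehat{\text{Price}} = b_0 + b_1 \cdot \text{Sq. Footage}$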

Linear Regression Model • Observed value: $Y_i = b_0 + b_1 X_i + e_i$, where $e_i$ = Residual • Fitted value: $\hat{Y}_i = b_0 + b_1 X_i$ • [Figure: scatter plot of Y against X showing the fitted line, a residual, and an unsampled observation]
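
To make the fitted line and the residuals concrete, here is a minimal sketch that estimates b0 and b1 by ordinary least squares. The slides do not name the fitting method, so least squares is an assumption here (it is what spreadsheet regression tools typically use). The function name `fit_line` and the five data points are invented for illustration and are not the data set behind the slides.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical observations (invented for this sketch)
sq_footage = [1000, 1500, 2000, 2500, 3000]
price = [290_000, 380_000, 460_000, 555_000, 635_000]

b0, b1 = fit_line(sq_footage, price)
fitted = [b0 + b1 * x for x in sq_footage]           # Y-hat for each observation
residuals = [y - f for y, f in zip(price, fitted)]   # e_i = Y_i - Y-hat_i

print(f"b0 = {b0:,.0f}, b1 = {b1:,.0f}")
print("residuals:", [round(e) for e in residuals])
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any fit.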

Estimated Model Est. Price = b0 + b1 Sq. Footage Est. Price = 117,663 + 173 Sq. Footage

Model Interpretation • b1: The average marginal increase/decrease in Price for a unit increase in Sq. Footage. • Price will increase by $173 on average for each additional square foot. • b0: The average Price when Square Footage equals zero. • Average value of Price is $117,663 when there is no Square Footage. • Does this statement make sense? Does this result have managerial significance?
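
The two interpretations above can be read straight off the estimated equation. A small sketch, where the function name `est_price` is an assumption and the coefficients are the ones shown on the Estimated Model slide:

```python
def est_price(sq_footage):
    """Estimated equation from the slide: Est. Price = 117,663 + 173 * Sq. Footage."""
    return 117_663 + 173 * sq_footage

# Slope: one extra square foot changes the estimate by b1
print(est_price(2001) - est_price(2000))  # 173

# Intercept: the estimate at zero square feet is b0
print(est_price(0))  # 117663
```

The intercept is a mathematical anchor for the line; a house with zero square footage lies outside any sensible data range, which is why the slide questions its managerial significance.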

Hypothesis Test: No Linear Relationship • Tests whether there is a (linear) relationship between X & Y • Hypotheses • H0: $\beta_1 = 0$ (No Linear Relationship) • H1: $\beta_1 \neq 0$ (Linear Relationship) • Compare p-value to $\alpha$ • Interpretation • If p-value is less than $\alpha$, we have enough information to conclude that Square Footage is linearly related to Price and we can interpret the slope.

Quality of Model • We would like to know how well our model fits the facts (data). The better the fit, the more we believe in the model’s accuracy. • We have two measures of fit: R-squared and S (aka SEE).

The Famed $R^2$ • Coefficient of determination: $R^2 = \frac{\text{Explained Variance}}{\text{Total Variance}}$ • The closer the $R^2$ is to 1, the better the “fit” • $R^2$ is the percentage of variation of the Y variable that is explained by (accounted for by or reduced by) knowing the X variable (i.e., by using the regression to predict the response rather than the average response value). • Square Footage explains 92% of the variation in Price.

Accuracy: Standard Error of Estimate • Standard error of the estimate: S (or SEE) • The smaller the S, the better the “fit” • The units of S are the same as the units of the Y variable. • When using our regression model for predicting home prices, we would be off on average plus/minus $46,631.
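
Both measures of fit can be computed from the observed and fitted values. The sketch below assumes the usual definitions, which the slides do not spell out: R-squared as 1 - SSE/SST (equal to explained over total variation for a least-squares fit with an intercept) and S as the square root of SSE divided by n - k - 1, where k is the number of explanatory variables. The function name `fit_quality` and the numbers are invented for illustration.

```python
import math

def fit_quality(actual, fitted, num_explanatory=1):
    """Return (R-squared, S) for a fitted regression.

    Assumes R-squared = 1 - SSE/SST and S = sqrt(SSE / (n - k - 1)).
    """
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in actual)             # total variation
    r_squared = 1 - sse / sst
    s = math.sqrt(sse / (n - num_explanatory - 1))
    return r_squared, s

# Hypothetical prices and fitted values (invented for this sketch)
price = [290_000, 380_000, 460_000, 555_000, 635_000]
fitted = [291_000, 377_500, 464_000, 550_500, 637_000]

r_squared, s = fit_quality(price, fitted)
print(f"R-squared = {r_squared:.4f}")
print(f"S = ${s:,.0f} (same units as Price)")
```

R-squared is unit free, while S is in dollars, so the two answer different questions: how much of the variation is explained, and how far off a typical prediction is.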

In-Class Activity: • Investigate models of Salary from the file Salary_handout.xls using only one variable as the explanatory (independent) variable. • Interpret the following in the context of the model: • Slope and intercept • Strength of linear relationship (R) • The usefulness of the slope (p-value) • Graph of relationship • Evaluation of the model, i.e., $R^2$ and S. • Use the model to predict Salary for a fictitious employee.

Multiple Linear Regression (MLR) • We assume that the relationship between variables is linear: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$

Model Building • Before running any regressions or even any data analysis, determine which of your variables you believe are good predictors. • Generally, you want at least 10 observations per variable selected if possible.

Variable Investigation • Next, investigate the relationship between the response or dependent (Y) variable and each of the explanatory or independent (X) variables. Use the correlation matrix and scatter plots. • To avoid ‘problems’, also make sure that the correlation among the explanatory (independent) variables is not too high. As a rule of thumb, anything above 0.90 in absolute value can cause trouble.
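
The 0.90 rule of thumb can be applied mechanically to every pair of explanatory variables. This is a minimal sketch under stated assumptions: the helper names `pearson_r` and `flag_high_correlations` are introduced here, and the three columns are invented data, not the housing data set from the slides.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def flag_high_correlations(columns, threshold=0.90):
    """List pairs of explanatory variables whose |r| exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical explanatory variables (invented for this sketch)
explanatory = {
    "SqFt": [1000, 1500, 2000, 2500, 3000, 3500],
    "Bedrooms": [2, 3, 3, 4, 4, 5],
    "Age": [30, 5, 22, 12, 40, 18],
}

for name_a, name_b, r in flag_high_correlations(explanatory):
    print(f"{name_a} and {name_b}: r = {r:.2f} (above 0.90 in absolute value)")
```

In this made-up example only the SqFt and Bedrooms pair is flagged, which would be a prompt to consider whether both variables belong in the same model.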

Correlation Matrix

Multiple Regression Output • Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms

Slopes in MLR • Be careful when interpreting the slopes in a multiple linear regression as it is necessary to hold all other variables constant. • Price will increase on average by $111 for each additional square foot when holding all other explanatory variables constant. If the p-value is not less than alpha, then you cannot interpret the slope.

p-values in MLR • Each explanatory (independent) variable has its own p-value. • When looked at individually, is the variable’s slope statistically different from zero? • If yes (p-value < $\alpha$), then that variable is a good predictor within the context of the model and the slope can be interpreted. • If no (p-value > $\alpha$), then that variable is not a good predictor within the context of the model and the slope cannot be interpreted. • Some variables are confirmatory and may remain in the model even though their p-value > $\alpha$.

Coefficient of Determination $R^2$ • $R^2$ is still the percentage of variation of the Y variable explained by knowing all the X variables. The focus is on explaining the variation in Price, not on explaining the data.

Output • Knowing the square footage, the number of bedrooms, and the number of bathrooms of a house explains 97% of the variation in house prices.

Standard Error of Estimate in MLR • The interpretation of S (aka SEE) is the same in multiple regression as it is in simple. • Thus, we expect to be off on average plus or minus $28,765 when predicting house prices using the square footage, the number of bedrooms and the number of bathrooms in the house.

Variation Reduction • How do we know if the S (SEE) is low or high? Is it small enough to make the predictions from the regression useful? • Compare it to the standard deviation of the response (dependent) variable. • S (SEE): $28,765 vs. St Dev(Price): $161,666
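
One way to make the comparison concrete is to express S as a fraction of the standard deviation of Price. The ratio is not on the slide; it is simply arithmetic on the two figures shown there, and the variable names are assumptions.

```python
see = 28_765            # S (SEE) from the multiple regression
st_dev_price = 161_666  # standard deviation of Price

ratio = see / st_dev_price
print(f"S is {ratio:.1%} of St Dev(Price)")
print(f"Typical error is reduced by {1 - ratio:.1%} relative to using the average price")
```

Predicting with the regression rather than with the average price shrinks the typical error to under a fifth of its original size.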

Predicting Using Regression • Recall we have: Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms • Assume this is a good equation. • Use it to predict the expected selling price of a home with 2000 sq. ft. of living space, 4 bedrooms, and 2 baths.
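
The prediction is a direct substitution into the estimated equation. A short sketch, where the function name `est_price` is an assumption and the coefficients are those on the slide:

```python
def est_price(sq_ft, bedrooms, bathrooms):
    """Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms."""
    return 44_392 + 111 * sq_ft + 85_345 * bedrooms + 572 * bathrooms

prediction = est_price(sq_ft=2000, bedrooms=4, bathrooms=2)
print(f"Est Price = ${prediction:,}")  # Est Price = $608,916
```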

How Confident Should You Be about Your Estimate? • About two-thirds (68%) of the data should fall within +/- SEE of the value determined by the regression equation. Similarly, about 95% should fall within +/- 2*SEE. • Therefore, a 95% interval for the prediction of a specific house at 533 Main St., which has 2000 sq. ft., 4 bedrooms, and 2 baths, can be computed as Est Price +/- 2*SEE. • That is, we are 95% confident that this specific house’s price is between these two values. • Since this is about a specific house, the interval is called a prediction interval, not a confidence interval.
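
Applying the Est Price +/- 2*SEE rule to the house at 533 Main St. gives the interval below. This sketch uses the coefficients and the SEE of $28,765 reported on the earlier slides; the function and variable names are assumptions.

```python
def est_price(sq_ft, bedrooms, bathrooms):
    """Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms."""
    return 44_392 + 111 * sq_ft + 85_345 * bedrooms + 572 * bathrooms

SEE = 28_765  # standard error of estimate from the multiple regression

estimate = est_price(2000, 4, 2)
pi_lower = estimate - 2 * SEE
pi_upper = estimate + 2 * SEE

print(f"Estimate: ${estimate:,}")
print(f"Approximate 95% prediction interval: ${pi_lower:,} to ${pi_upper:,}")
# Approximate 95% prediction interval: $551,386 to $666,446
```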

How Confident Should You Be about the Average Price of Such a House? • A 95% confidence interval for the average price of a 2000 sq. ft., 4 bed, 2 bath house can be computed as: • Est Price +/- 2*SEE/SQRT(n), where n is the number of observations used to fit the regression. • In words, based on our regression, we are 95% confident that the average 2000 sq. ft., 4 bed, 2 bath house price is between these two values. • Since this is about an average of all such houses, the interval is called a confidence interval, not a prediction interval.
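
The same rule with SEE divided by SQRT(n) gives the narrower interval for the average price. The slides do not report the sample size, so n = 100 below is purely a placeholder assumption; substitute the real number of observations. The function names are also assumptions.

```python
import math

def est_price(sq_ft, bedrooms, bathrooms):
    """Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms."""
    return 44_392 + 111 * sq_ft + 85_345 * bedrooms + 572 * bathrooms

def confidence_interval(estimate, see, n):
    """Approximate 95% interval for the average: estimate +/- 2*SEE/sqrt(n)."""
    margin = 2 * see / math.sqrt(n)
    return estimate - margin, estimate + margin

SEE = 28_765  # standard error of estimate from the multiple regression
n = 100       # ASSUMED sample size; not given in the slides

ci_lower, ci_upper = confidence_interval(est_price(2000, 4, 2), SEE, n)
print(f"Approximate 95% confidence interval (n = {n}): ${ci_lower:,.0f} to ${ci_upper:,.0f}")
```

Because the margin shrinks with SQRT(n), the confidence interval for the average is always tighter than the prediction interval for a single house.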

In-Class Activity: • Investigate models of Salary from the file Salary_handout.xls using any set of variables you wish as the explanatory (independent) variables. • Interpret the following in the context of the model: • Slopes and intercept. • Strength of linear relationship (R) • The usefulness of the slope (p-value). • Graphs of relationship. • Evaluation of the model, i.e., $R^2$ and SEE. • Use the model to predict Salary for a fictitious employee and build Prediction and Confidence intervals for this prediction.
