Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building
Learning Objectives • Explain the Linear Multiple Regression Model • Describe Inference About Individual Parameters • Test Overall Significance • Explain Estimation and Prediction • Describe Various Types of Models • Describe Model Building • Explain Residual Analysis • Describe Regression Pitfalls
Types of Regression Models • 1 explanatory variable: simple regression (linear or non-linear) • 2+ explanatory variables: multiple regression (linear or non-linear)
Multiple Regression Model • General form: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$ • k independent variables • $x_1, x_2, \dots, x_k$ may be functions of variables • e.g. $x_2 = (x_1)^2$
Regression Modeling Steps • Hypothesize deterministic component • Estimate unknown model parameters • Specify probability distribution of random error term • Estimate standard deviation of error • Evaluate model • Use model for prediction and estimation
Assumptions for Probability Distribution of ε • Mean is 0 • Constant variance, $\sigma^2$ • Normally distributed • Errors are independent
Types of Regression Models • 1 quantitative explanatory variable: 1st-order, 2nd-order, or 3rd-order model • 2 or more quantitative explanatory variables: 1st-order, interaction, or 2nd-order model • 1 qualitative explanatory variable: dummy variable model
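First-Order Multiple Regression Model • Relationship between 1 dependent and 2 or more independent variables is a linear function • Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$ • $y$: dependent (response) variable • $\beta_0$: population y-intercept • $\beta_1, \dots, \beta_k$: population slopes • $x_1, \dots, x_k$: independent (explanatory) variables • $\varepsilon$: random error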
First-Order Model With 2 Independent Variables • Relationship between 1 dependent and 2 independent variables is a linear function • Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ • Assumes no interaction between $x_1$ and $x_2$ • Effect of $x_1$ on $E(y)$ is the same regardless of $x_2$ values
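Population Multiple Regression Model • Bivariate model: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$ (observed y) • Response plane: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ • [Figure: response plane over the $x_1$ and $x_2$ axes with intercept $\beta_0$; the observed y at $(x_{1i}, x_{2i})$ lies off the plane by $\varepsilon_i$]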
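Sample Multiple Regression Model • Bivariate model: $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \hat{\varepsilon}_i$ (observed y) • Response plane: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$ • [Figure: fitted response plane over the $x_1$ and $x_2$ axes with intercept $\hat{\beta}_0$; the observed y at $(x_{1i}, x_{2i})$ lies off the plane by $\hat{\varepsilon}_i$]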
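No Interaction • $E(y) = 1 + 2x_1 + 3x_2$ • $x_2 = 3$: $E(y) = 1 + 2x_1 + 3(3) = 10 + 2x_1$ • $x_2 = 2$: $E(y) = 1 + 2x_1 + 3(2) = 7 + 2x_1$ • $x_2 = 1$: $E(y) = 1 + 2x_1 + 3(1) = 4 + 2x_1$ • $x_2 = 0$: $E(y) = 1 + 2x_1 + 3(0) = 1 + 2x_1$ • Effect (slope) of $x_1$ on $E(y)$ does not depend on $x_2$ value • [Figure: $E(y)$ plotted against $x_1$ as four parallel lines, one for each $x_2$ value]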
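The no-interaction property can be checked numerically. The following is a minimal sketch using the slide's deterministic component $E(y) = 1 + 2x_1 + 3x_2$; the function names `expected_y` and `slope_in_x1` are illustrative and not part of the original slides.

```python
# Deterministic component from the slide: E(y) = 1 + 2*x1 + 3*x2
def expected_y(x1, x2):
    return 1 + 2 * x1 + 3 * x2


def slope_in_x1(x2, step=1.0):
    # Change in E(y) per one-unit increase in x1, with x2 held fixed
    return (expected_y(step, x2) - expected_y(0.0, x2)) / step


x2_values = (0, 1, 2, 3)

# The slope in x1 is the same for every x2 (no interaction) ...
slopes = {x2: slope_in_x1(x2) for x2 in x2_values}

# ... while x2 only shifts the intercept of each line
intercepts = {x2: expected_y(0, x2) for x2 in x2_values}

print("slopes:", slopes)
print("intercepts:", intercepts)
```

Changing $x_2$ moves each line up or down but leaves its slope alone, which is why the lines in the plot are parallel.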
First-Order Model Worksheet
Case, i   yi   x1i   x2i
1         1    1     3
2         4    8     5
3         1    3     2
4         3    5     6
:         :    :     :
Run regression with y, x1, x2
Multiple Linear Regression Equations Too complicated by hand! Ouch!
Interpretation of Estimated Coefficients • Slope ($\hat{\beta}_k$): estimated y changes by $\hat{\beta}_k$ for each 1-unit increase in $x_k$, holding all other variables constant • Example: if $\hat{\beta}_1 = 2$, then sales (y) is expected to increase by 2 for each 1-unit increase in advertising ($x_1$), given the number of sales reps ($x_2$) • Y-intercept ($\hat{\beta}_0$): average value of y when every $x_k = 0$
1st Order Model Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) and newspaper circulation (000) on the number of ad responses (00). Estimate the unknown parameters. You’ve collected the following data:
Resp (y)   Size (x1)   Circ (x2)
1          1           2
4          8           8
1          3           1
3          5           7
2          6           4
4          10          6
Parameter Estimation Computer Output
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP   1    0.0640               0.2599           0.246               0.8214
ADSIZE     1    0.2049               0.0588           3.656               0.0399
CIRC       1    0.2805               0.0686           4.089               0.0264
Estimates: $\hat{\beta}_0 = 0.0640$ (INTERCEP), $\hat{\beta}_1 = 0.2049$ (ADSIZE), $\hat{\beta}_2 = 0.2805$ (CIRC)
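The parameter estimates in the computer output can be reproduced from the six data points of the example. The sketch below is one way to do it for exactly two predictors: it solves the two centered normal equations by Cramer's rule. The helper names (`mean`, `centered_cross`, `fit_two_predictors`) are illustrative and do not come from the slides, and statistical software would normally use a general matrix routine instead.

```python
# Data from the example: responses (00), ad size (sq. in.), circulation (000)
resp = [1, 4, 1, 3, 2, 4]
size = [1, 8, 3, 5, 6, 10]
circ = [2, 8, 1, 7, 4, 6]


def mean(values):
    return sum(values) / len(values)


def centered_cross(u, v):
    # Sum of cross-products of deviations from the means
    ubar, vbar = mean(u), mean(v)
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


def fit_two_predictors(y, x1, x2):
    # Least squares for y = b0 + b1*x1 + b2*x2 via the centered normal equations
    s11 = centered_cross(x1, x1)
    s22 = centered_cross(x2, x2)
    s12 = centered_cross(x1, x2)
    s1y = centered_cross(x1, y)
    s2y = centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2          # zero only if x1 and x2 are collinear
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(resp, size, circ)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

The printed values should agree with the INTERCEP, ADSIZE, and CIRC estimates in the output to four decimal places.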
Interpretation of Coefficients Solution • Slope ($\hat{\beta}_1$): number of responses to ad is expected to increase by .2049 hundred (20.49 responses) for each 1 sq. in. increase in ad size, holding circulation constant • Slope ($\hat{\beta}_2$): number of responses to ad is expected to increase by .2805 hundred (28.05 responses) for each 1-unit (1,000) increase in circulation, holding ad size constant
Estimation of $\sigma^2$ • For a model with k independent variables: $s^2 = \dfrac{SSE}{n - (k + 1)}$ • $s = \sqrt{s^2}$
Calculating $s^2$ and $s$ Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find SSE, $s^2$, and $s$.
Analysis of Variance Computer Output
Analysis of Variance
Source           DF   SS         MS         F       P
Regression       2    9.249736   4.624868   55.44   .0043
Residual Error   3    .250264    .083421
Total            5    9.5
SSE = .250264 (Residual Error SS); $s^2$ = .083421 (Residual Error MS)
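The error variance estimate follows directly from the ANOVA table: divide SSE by its degrees of freedom, $n - (k + 1)$. A minimal sketch, taking SSE, n, and k from the example (the variable names are illustrative):

```python
import math

sse = 0.250264   # Residual Error SS from the ANOVA output
n = 6            # number of observations in the example
k = 2            # number of independent variables (ad size, circulation)

df_error = n - (k + 1)   # error degrees of freedom
s2 = sse / df_error      # estimate of sigma^2 (the Residual Error MS)
s = math.sqrt(s2)        # estimate of sigma

print(f"df = {df_error}, s^2 = {s2:.6f}, s = {s:.4f}")
```

The value of `s2` should match the Residual Error MS entry in the table.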
Evaluating Multiple Regression Model Steps • Examine variation measures • Test parameter significance • Individual coefficients • Overall model • Do residual analysis
Multiple Coefficient of Determination • Proportion of variation in y ‘explained’ by all x variables taken together • $R^2 = \dfrac{SS_{yy} - SSE}{SS_{yy}} = 1 - \dfrac{SSE}{SS_{yy}}$ • Never decreases when a new x variable is added to the model: only the y values determine $SS_{yy}$, and SSE cannot increase • This is a disadvantage when comparing models
Adjusted Multiple Coefficient of Determination • Takes into account n and number of parameters • $R_a^2 = 1 - \left[\dfrac{n - 1}{n - (k + 1)}\right](1 - R^2)$ • Similar interpretation to $R^2$
Estimation of $R^2$ and $R_a^2$ Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find $R^2$ and $R_a^2$.
Excel Computer Output Solution • [Figure: Excel regression output with $R^2$ and $R_a^2$ marked]
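Both coefficients can be computed from the sums of squares in the ANOVA output for this example (Total SS = 9.5, Residual Error SS = .250264). The sketch below assumes the standard definitions $R^2 = 1 - SSE/SS_{yy}$ and $R_a^2 = 1 - \frac{n-1}{n-(k+1)}(1 - R^2)$; the variable names are illustrative.

```python
ss_yy = 9.5      # Total SS from the ANOVA output
sse = 0.250264   # Residual Error SS from the ANOVA output
n, k = 6, 2      # observations and independent variables in the example

# Proportion of the variation in y explained by x1 and x2 together
r2 = 1 - sse / ss_yy

# Adjusted version: penalizes the number of parameters relative to n
r2_adj = 1 - (n - 1) / (n - (k + 1)) * (1 - r2)

print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
```

The adjusted value is always at most $R^2$, and the gap widens as k grows relative to n.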
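Inference for an Individual β Parameter • Confidence interval: $\hat{\beta}_i \pm t_{\alpha/2}\, s_{\hat{\beta}_i}$ • Hypothesis test: $H_0: \beta_i = 0$ vs. $H_a: \beta_i \neq 0$ (or < or >) • Test statistic: $t = \hat{\beta}_i / s_{\hat{\beta}_i}$ • df = n – (k + 1)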
Confidence Interval Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find a 95% confidence interval for β1.
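One way to work this example is to apply $\hat{\beta}_1 \pm t_{\alpha/2}\, s_{\hat{\beta}_1}$ to the ADSIZE row of the parameter estimates output. In the sketch below the estimate and standard error come from that output; the critical value $t_{.025} = 3.182$ for 3 degrees of freedom is taken from a standard t table and does not appear in the slides. The function name `confidence_interval` is illustrative.

```python
def confidence_interval(estimate, std_error, t_crit):
    # estimate +/- t_crit * std_error
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


beta1_hat = 0.2049   # ADSIZE estimate from the computer output
se_beta1 = 0.0588    # its standard error from the computer output
t_025_df3 = 3.182    # assumed t-table value: upper .025 point, df = 6 - 3 = 3

lower, upper = confidence_interval(beta1_hat, se_beta1, t_025_df3)
print(f"95% CI for beta1: ({lower:.4f}, {upper:.4f})")
```

Because the interval lies entirely above 0, it agrees with the two-sided p-value of .0399 reported for ADSIZE.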
Hypothesis Test Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Test the hypothesis that the mean ad response increases as circulation increases (ad size constant). Use α = .05.
Hypothesis Test Solution • H0: $\beta_2 = 0$ • Ha: $\beta_2 > 0$ • $\alpha = .05$ • df = 6 - 3 = 3 • Critical value: t = 2.353 (reject H0 if t > 2.353) • Test statistic: $t = \hat{\beta}_2 / s_{\hat{\beta}_2} = .2805 / .0686 \approx 4.089$ (the CIRC row of the computer output) • Decision: Reject H0 at $\alpha = .05$ • Conclusion: There is evidence the mean ad response increases as circulation increases
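The decision rule can be written out as a short calculation. This is a sketch that takes the CIRC estimate and standard error from the parameter estimates output and the critical value 2.353 from the solution slide; the variable names are illustrative.

```python
beta2_hat = 0.2805   # CIRC estimate from the computer output
se_beta2 = 0.0686    # its standard error from the computer output
t_crit = 2.353       # upper .05 point of t with df = 6 - 3 = 3 (from the slide)

# Test statistic for H0: beta2 = 0 against Ha: beta2 > 0
t_stat = beta2_hat / se_beta2

# One-tailed test: reject H0 when t falls in the upper rejection region
reject_h0 = t_stat > t_crit

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The test is one-tailed because the alternative is that responses increase with circulation, so only large positive values of t count as evidence against H0.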