Statistics for Business and Economics. Chapter 11 Multiple Regression and Model Building. Learning Objectives. Explain the Linear Multiple Regression Model Describe Inference About Individual Parameters Test Overall Significance Explain Estimation and Prediction
Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building
Learning Objectives • Explain the Linear Multiple Regression Model • Describe Inference About Individual Parameters • Test Overall Significance • Explain Estimation and Prediction • Describe Various Types of Models • Describe Model Building • Explain Residual Analysis • Describe Regression Pitfalls
Types of Regression Models • Simple (1 explanatory variable): linear or non-linear • Multiple (2+ explanatory variables): linear or non-linear
Multiple Regression Model • General form: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$ • k independent variables • x1, x2, …, xk may be functions of variables • e.g. x2 = (x1)²
Regression Modeling Steps • Hypothesize deterministic component • Estimate unknown model parameters • Specify probability distribution of random error term • Estimate standard deviation of error • Evaluate model • Use model for prediction and estimation
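First-Order Multiple Regression Model • Relationship between 1 dependent and 2 or more independent variables is a linear function • $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$ • y: dependent (response) variable • β0: population Y-intercept • β1, …, βk: population slopes • x1, …, xk: independent (explanatory) variables • ε: random error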
First-Order Model With 2 Independent Variables • Relationship between 1 dependent and 2 independent variables is a linear function • Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ • Assumes no interaction between x1 and x2 • Effect of x1 on E(y) is the same regardless of x2 values
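Population Multiple Regression Model • Bivariate model: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$ • [Figure: response plane over the (x1, x2) axes, showing the observed y at (x1i, x2i), the intercept β0, and the error εi]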
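Sample Multiple Regression Model • Bivariate model: $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \hat{\varepsilon}_i$ • [Figure: fitted response plane over the (x1, x2) axes, showing the observed y at (x1i, x2i), the estimated intercept, and the residual]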
Multiple Linear Regression Equations Too complicated by hand! Ouch!
1st Order Model Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) and newspaper circulation (000) on the number of ad responses (00). Estimate the unknown parameters. You’ve collected the following data:
Resp (y)   Size (x1)   Circ (x2)
1          1           2
4          8           8
1          3           1
3          5           7
2          6           4
4          10          6
See ResponsesVsAdsizeAndCirculationData.jmp
[Figure: regression output with the estimates $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ marked]
Interpretation of Coefficients Solution • Slope ($\hat{\beta}_1$): Number of responses to ad is expected to increase by .2049 (20.49) for each 1 sq. in. increase in ad size, holding circulation constant • Slope ($\hat{\beta}_2$): Number of responses to ad is expected to increase by .2805 (28.05) for each 1 unit (1,000) increase in circulation, holding ad size constant
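The two slopes above can be reproduced from the six data rows. The following is a minimal sketch, assuming ordinary least squares solved through the normal equations (X'X)b = X'y; the helper names `solve` and `fit_ols` are illustrative and are not part of the original deck, which uses JMP.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        # partial pivoting: bring the largest entry in this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Data from the example: responses (00), ad size (sq. in.), circulation (000)
resp = [1, 4, 1, 3, 2, 4]
size = [1, 8, 3, 5, 6, 10]
circ = [2, 8, 1, 7, 4, 6]

# Each design row is [1, x1, x2]; the leading 1 carries the intercept
X = [[1.0, s, c] for s, c in zip(size, circ)]
b0, b1, b2 = fit_ols(X, resp)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

The printed slopes should agree with the .2049 and .2805 quoted in the interpretation, up to rounding.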
Estimation of σ² • For a model with k predictors (k + 1 parameters): $s^2 = \dfrac{SSE}{n - (k + 1)}$, where $SSE = \sum (y_i - \hat{y}_i)^2$
More About JMP Output • SSE • s² (also called “mean squared error”) • s (also called “standard error of the regression”)
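As a small worked check of the estimator, the sketch below plugs in the figures for the ad-response example: SSE = 0.2503 is the Error sum of squares from the deck's Analysis of Variance output, with n = 6 observations and k = 2 predictors. The variable names are illustrative.

```python
import math

sse = 0.2503   # Error sum of squares from the ANOVA output
n = 6          # observations
k = 2          # predictors (ad size, circulation)

df_error = n - (k + 1)   # one degree of freedom lost per estimated parameter
s2 = sse / df_error      # mean squared error, the estimate of sigma^2
s = math.sqrt(s2)        # standard error of the regression

print(f"df = {df_error}, s^2 = {s2:.4f}, s = {s:.4f}")
```

The value of s² matches the 0.0834 shown in the Mean Square column for Error.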
Evaluating Multiple Regression Model Steps • Examine variation measures • Test parameter significance • Individual coefficients • Overall model • Do residual analysis
Inference for an Individual β Parameter • Confidence Interval (rarely used in regression): $\hat{\beta}_i \pm t_{\alpha/2}\, s_{\hat{\beta}_i}$ • Hypothesis Test (used all the time!): H0: βi = 0, Ha: βi ≠ 0 (or < or >) • Test Statistic (how far is the sample slope from zero?): $t = \hat{\beta}_i / s_{\hat{\beta}_i}$ • df = n – (k + 1)
Easy way: Just examine p-values • Both coefficients significant! • Reject H0 for both tests
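Without the JMP p-values at hand, the same conclusion can be reached by comparing each t statistic with a tabled critical value. This is a sketch under stated assumptions: a two-sided test at α = .05 (the deck does not state α for this step), the critical value t.025 = 3.182 for 3 df taken from a standard t table, and illustrative helper names `solve` and `ols`. Standard errors come from s² times the diagonal of (X'X)⁻¹.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Return (coefficients, standard errors, error df) for a least-squares fit."""
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    s2 = sse / (n - p)  # p = k + 1 parameters
    # j-th diagonal entry of (X'X)^-1: solve against the j-th unit vector
    diag = [solve(xtx, [1.0 if i == j else 0.0 for i in range(p)])[j]
            for j in range(p)]
    se = [math.sqrt(s2 * d) for d in diag]
    return beta, se, n - p


resp = [1, 4, 1, 3, 2, 4]
size = [1, 8, 3, 5, 6, 10]
circ = [2, 8, 1, 7, 4, 6]
X = [[1.0, s, c] for s, c in zip(size, circ)]

beta, se, df = ols(X, resp)
t_stats = [b / e for b, e in zip(beta, se)]

T_CRIT = 3.182  # t_.025 with 3 df, from a t table (assumes alpha = .05, two-sided)
for name, t in zip(("intercept", "ad size", "circulation"), t_stats):
    print(f"{name}: t = {t:.3f}, reject H0: {abs(t) > T_CRIT}")
```

Both slope statistics exceed the critical value, which is the same decision the p-values give.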
Testing Overall Significance • Shows if there is a linear relationship between all x variables together and y • Hypotheses • H0: β1 = β2 = ... = βk = 0 • No linear relationship • Ha: At least one coefficient is not 0 • At least one x variable affects y
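Testing Overall Significance • Test Statistic: $F = \dfrac{MS(\text{Model})}{MS(\text{Error})} = \dfrac{SS(\text{Model})/k}{SSE/[n - (k + 1)]}$ • Degrees of Freedom: ν1 = k, ν2 = n – (k + 1) • k = Number of independent variables • n = Sample size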
Testing Overall Significance Computer Output
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Value   Prob>F
Model      2    9.2497           4.6249        55.440    0.0043
Error      3    0.2503           0.0834
C Total    5    9.5000
• DF column: k (Model), n – (k + 1) (Error) • Mean Square column: MS(Model), MS(Error) • Prob>F: P-value
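The F value in this output can be rebuilt from the two sums of squares. The sketch below uses the rounded table entries, so the result differs from the printed 55.440 in the second decimal place. The critical value F.05 = 9.55 for (2, 3) df comes from a standard F table, and α = .05 is an assumption, since the deck does not state α for this test.

```python
ss_model = 9.2497   # Model sum of squares from the ANOVA table
ss_error = 0.2503   # Error sum of squares from the ANOVA table
n, k = 6, 2         # sample size and number of independent variables

df_model = k
df_error = n - (k + 1)

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_stat = ms_model / ms_error

F_CRIT = 9.55  # F_.05 with (2, 3) df, from an F table (assumed alpha = .05)

print(f"MS(Model) = {ms_model:.4f}, MS(Error) = {ms_error:.4f}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")
```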
Types of Regression Models • 1 quantitative explanatory variable: 1st-order, 2nd-order, or 3rd-order model • 2 or more quantitative explanatory variables: 1st-order, interaction, or 2nd-order model • 1 qualitative explanatory variable: dummy-variable model
Interaction Model With 2 Independent Variables • Hypothesizes interaction between pairs of x variables • Response to one x variable varies at different levels of another x variable • Contains two-way cross product terms: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$ • Can be combined with other models • Example: dummy-variable model
Interaction Model Relationships • E(y) = 1 + 2x1 + 3x2 + 4x1x2 • At x2 = 1: E(y) = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1 • At x2 = 0: E(y) = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1 • [Figure: the two lines plotted as E(y) against x1] • Effect (slope) of x1 on E(y) depends on x2 value
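The dependence of the x1 slope on x2 can be checked numerically. This small sketch evaluates the slide's example equation E(y) = 1 + 2x1 + 3x2 + 4x1x2 at x2 = 0 and x2 = 1; the function names are illustrative.

```python
def expected_y(x1, x2):
    """E(y) for the slide's interaction example."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_in_x1(x2):
    """Change in E(y) per unit increase in x1, holding x2 fixed."""
    return expected_y(1, x2) - expected_y(0, x2)


for x2 in (0, 1):
    intercept = expected_y(0, x2)
    print(f"x2 = {x2}: E(y) = {intercept} + {slope_in_x1(x2)} x1")
```

With no cross-product term the two slopes would be equal and the lines parallel, which is the no-interaction assumption of the first-order model.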
Interaction Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct a test for interaction. Use α = .05.
Adding Interactions in JMP is Easy
1. Analyze >> Fit Model
2. Click on the response variable and click the Y button
3. Highlight the two X variables and click on the Add button
4. While the two X variables are highlighted, click on the Cross button
5. Run Model
You can also combine steps 3 and 4 into one step: highlight the two X variables and, from the “Macros” pull-down menu, choose “Factorial to Degree.” The default for degree is 2, so you will get all two-factor interactions in the model.
JMP Interaction Output Interaction not important: p-value > .05
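The interaction test can also be sketched outside JMP by adding an x1·x2 column to the design matrix and examining the t statistic of its coefficient. Assumptions in this sketch: least squares through the normal equations, illustrative helper names `solve` and `ols`, and the two-sided critical value t.025 = 4.303 for n – (k + 1) = 6 – 4 = 2 df taken from a t table, in place of the p-value JMP reports.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Return (coefficients, standard errors, error df) for a least-squares fit."""
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    s2 = sse / (n - p)
    diag = [solve(xtx, [1.0 if i == j else 0.0 for i in range(p)])[j]
            for j in range(p)]
    return beta, [math.sqrt(s2 * d) for d in diag], n - p


resp = [1, 4, 1, 3, 2, 4]
size = [1, 8, 3, 5, 6, 10]
circ = [2, 8, 1, 7, 4, 6]

# Design row: [1, x1, x2, x1*x2]; the last column is the cross product
X = [[1.0, s, c, s * c] for s, c in zip(size, circ)]
beta, se, df = ols(X, resp)

t_interaction = beta[3] / se[3]
T_CRIT = 4.303  # t_.025 with 2 df, from a t table (alpha = .05, two-sided)

print(f"b3 = {beta[3]:.4f}, t = {t_interaction:.3f}, df = {df}")
print("interaction significant:", abs(t_interaction) > T_CRIT)
```

The statistic falls short of the critical value, in line with the slide's conclusion that the interaction is not important at α = .05.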
Second-Order Model With 1 Independent Variable • Relationship between 1 dependent and 1 independent variable is a quadratic function • Useful 1st model if non-linear relationship suspected • Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$ • $\beta_1 x_1$: linear effect • $\beta_2 x_1^2$: curvilinear effect
Second-Order Model Relationships • [Figure: four panels of y against x1; two with β2 > 0 (curve opens upward) and two with β2 < 0 (curve opens downward)]
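Types of Regression Models • Linear (First order): $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ • Quadratic (Second order): $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 X_i^2$ • Cubic (Third order): $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 X_i^2 + \hat{\beta}_3 X_i^3$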
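Each of these polynomial models is still a multiple regression, because the powers of X act as separate columns of the design matrix. The sketch below fits a quadratic this way. The data are hypothetical, generated exactly from y = 20 – 5x + 0.5x² purely so the recovered coefficients can be checked; they are not the assembly-line data of the deck's example, and the helper names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def poly_row(x, order):
    """Design row [1, x, x^2, ..., x^order] for a polynomial model."""
    return [float(x) ** p for p in range(order + 1)]


# Hypothetical, noise-free data from a known quadratic (illustration only)
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [20 - 5 * x + 0.5 * x * x for x in xs]

rows = [poly_row(x, 2) for x in xs]   # order 2 = quadratic model
b0, b1, b2 = fit_ols(rows, ys)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Changing the `order` argument to 1 or 3 gives the linear or cubic design instead.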
2nd Order Model Example The data shows the number of weeks employed and the number of errors made per day for a sample of assembly line workers. Find a 2nd order model, conduct the global F–test, and test if β2 ≠ 0. Use α = .05 for all tests.
Analyze >> Fit Y by X • From hot spot menu choose: Fit Polynomial >> 2, quadratic • Could also use: Analyze >> Fit Model, select Y, then highlight X and, from the “Macros” pull-down menu, choose “Polynomial to Degree.” The default for degree is 2, so you will get the quadratic (2nd order) polynomial. But from Fit Model, you won’t get the cool fitted line plot.
Second-Order (Response Surface) Model With 2 Independent Variables • Relationship between 1 dependent and 2 independent variables is a quadratic function • Useful 1st model if non-linear relationship suspected • Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
Second-Order Model Relationships • [Figure: three response surfaces of y over the (x1, x2) axes, labeled β4 + β5 > 0, β4 + β5 < 0, and β3² > 4β4β5]
From JMP: To specify the model, all you need to do is: Analyze >> Fit Model • Highlight the X variables • From the “Macros” pull-down menu, choose “Response Surface.” The default for degree is 2, so you will get the full second-order model having all squared terms and all cross products.
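To make "all squared terms and all cross products" concrete, the sketch below builds one design row of a full second-order model from raw x values. The function name `response_surface_row` is illustrative; the column order (intercept, linear terms, cross products, squares) is a choice made here, not something taken from JMP.

```python
from itertools import combinations


def response_surface_row(xs):
    """Design row: [1, x1..xk, every xi*xj with i < j, every xi^2]."""
    linear = list(xs)
    cross = [a * b for a, b in combinations(xs, 2)]   # two-way cross products
    squares = [x * x for x in xs]
    return [1.0] + linear + cross + squares


# Two variables: columns are 1, x1, x2, x1*x2, x1^2, x2^2
print(response_surface_row([2, 3]))

# Three variables: 1 + 3 linear + 3 cross products + 3 squares = 10 columns
print(len(response_surface_row([1, 2, 3])))
```

Rows built this way can be passed to any least-squares routine, since the model remains linear in its parameters.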
Qualitative-Variable Model • Involves categorical x variable with 2 levels • e.g., male-female; college-no college • Variable levels coded 0 and 1 • Number of dummy variables is 1 less than number of levels of variable • May be combined with quantitative variable (1st order or 2nd order model)
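The 0/1 coding rule can be sketched as follows. The helper `dummy_code` and the sample values are hypothetical; the base level is the one coded as all zeros, so a variable with m levels yields m – 1 dummy columns.

```python
def dummy_code(values, base):
    """Return (non-base levels, rows of 0/1 indicators), one column per non-base level."""
    levels = sorted(set(values))
    others = [lv for lv in levels if lv != base]
    rows = [[1 if v == lv else 0 for lv in others] for v in values]
    return others, rows


# Two levels -> one dummy variable (hypothetical sample values)
cols, rows = dummy_code(["male", "female", "female", "male"], base="male")
print(cols, rows)

# Three levels -> two dummy variables
cols3, rows3 = dummy_code(["low", "mid", "high", "mid"], base="low")
print(cols3, rows3)
```

Each dummy column can then sit beside quantitative x columns in the design matrix, which is how the qualitative variable is combined with a 1st-order or 2nd-order model.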