Too complicated by hand! Department of Business Administration Chapter 4: Regression Analysis FALL 2011-2012
Outline: What You Will Learn • Purpose of regression analysis • Simple linear regression model • Overall significance concept: F-test • Individual significance concept: t-test • Coefficient of determination and correlation coefficient • Confidence interval • Multiple regression model • Compare and contrast simple linear regression analysis and multiple regression analysis
Purpose of Regression Analysis • Regression analysis is used primarily to model causality and provide prediction • Predict the values of a dependent (response) variable based on the values of at least one independent (explanatory) variable • Explain the effect of the independent variables on the dependent variable • The relationship between X and Y can be shown on a scatter diagram
Scatter Diagram • A scatter diagram is a two-dimensional graph of plotted points in which the vertical axis represents values of the dependent variable and the horizontal axis represents values of the independent (explanatory) variable. • The pattern of the plotted points shows graphically how the variables are related. • Scatter diagrams are mostly used to prove or disprove a cause-and-effect relationship. The following example shows the relationship between a firm's advertising expenditure and its sales revenue.
Scatter Diagram: Example [Figure: scatter diagram of advertising expenditure (X) against sales revenue (Y)]
Scatter Diagram • The scatter diagram shows a positive relationship between the two variables, and the relationship is approximately linear. • This gives us a rough estimate of the linear relationship between the variables in the form of an equation such as • Y = a + b X
Regression Analysis • In the equation, a is the vertical intercept of the estimated linear relationship and gives the value of Y when X = 0, while b is the slope of the line and gives an estimate of the increase in Y resulting from each unit increase in X. • The difficulty with the scatter diagram is that different researchers fitting a line by eye would probably obtain different results, even if they use the same data points. The solution is to use regression analysis.
Regression Analysis • Regression analysis is a statistical technique for obtaining the line that best fits the data points, so that all researchers reach the same results. • Regression line: the line of best fit. • The regression line minimizes the sum of the squared vertical deviations (et) of each point from the line. • This method is called Ordinary Least Squares (OLS).
Regression Analysis • In the table, Y1 refers to the actual or observed sales revenue of $44 mn associated with the advertising expenditure of $10 mn in the first year for which data were collected. • In the following graph, Y^1 is the corresponding sales revenue of the firm estimated from the regression line for the advertising expenditure of $10 mn in the first year. • The symbol e1 is the corresponding vertical deviation, or error, of the actual sales revenue from the sales revenue estimated from the regression line in the first year. This can be expressed as e1 = Y1 - Y^1.
Regression Analysis • Since there are 10 observation points, we have 10 vertical deviations or errors (i.e., e1 to e10). The regression line obtained is the line that best fits the data points in the sense that the sum of the squared (vertical) deviations from the line is a minimum. This means that each of the 10 e values is first squared and then summed.
Simple Regression Analysis • Now we are in a position to calculate the value of a (the vertical intercept) and the value of b (the slope coefficient) of the regression line. • Conduct tests of significance of the parameter estimates. • Construct a confidence interval for the true parameter. • Test the overall explanatory power of the regression.
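Simple Linear Regression Model • The regression line is a straight line that describes the dependence of the average value of one variable on the other. • Model: Y = a + b X + e, where Y is the dependent (response) variable, a is the Y intercept, b is the slope coefficient, X is the independent (explanatory) variable, and e is the random error. • Regression line: the straight-line part of the model, a + b X.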
Ordinary Least Squares (OLS) Model: Yt = a + b Xt + et, with fitted values Y^t = a^ + b^ Xt and errors et = Yt - Y^t
Ordinary Least Squares (OLS) Objective: Determine the slope and intercept that minimize the sum of the squared errors.
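Written as a formula, the objective can be sketched as follows. The notation is an assumption that follows the rest of the chapter: \hat{a} and \hat{b} are the estimated intercept and slope, e_t is the vertical deviation of observation t, and n is the number of observations.

```latex
\min_{\hat{a},\,\hat{b}} \; \sum_{t=1}^{n} e_t^{2}
  = \sum_{t=1}^{n} \left( Y_t - \hat{Y}_t \right)^{2}
  = \sum_{t=1}^{n} \left( Y_t - \hat{a} - \hat{b} X_t \right)^{2}
```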
Ordinary Least Squares (OLS) Estimation Procedure
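For a simple regression, the standard textbook OLS estimators are given below. This is a sketch in assumed notation, where \bar{X} and \bar{Y} denote the sample means of X and Y; the slide's own formulas may be laid out differently.

```latex
\hat{b} = \frac{\sum_{t=1}^{n} (X_t - \bar{X})(Y_t - \bar{Y})}
               {\sum_{t=1}^{n} (X_t - \bar{X})^{2}},
\qquad
\hat{a} = \bar{Y} - \hat{b}\,\bar{X}
```

The slope is computed first, because the intercept formula depends on it.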
Ordinary Least Squares (OLS) Estimation Example
The Equation of the Regression Line • The equation of the regression line can be constructed as follows: • Y^t = 7.60 + 3.53 Xt • When X = 0 (zero advertising expenditure), the expected sales revenue of the firm is $7.60 mn. In the first year, when X = $10 mn, Y^1 = $42.90 mn. • Strictly speaking, the regression line should be used only to estimate the sales revenues resulting from advertising expenditures that are within the range of the observed data.
Tests of Significance: Standard Error • To test the hypothesis that b is statistically significant (i.e., that advertising positively affects sales), we first need to calculate the standard error (deviation) of b^. • The standard error can be calculated with the following expression:
Tests of Significance Standard Error of the Slope Estimate
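A common textbook form of the standard error of the slope is sketched below. The notation is assumed: n is the number of observations and k is the number of estimated coefficients (k = 2 in a simple regression, which matches the n - k degrees of freedom used in the t-test).

```latex
s_{\hat{b}}
  = \sqrt{ \frac{\sum_{t=1}^{n} \left( Y_t - \hat{Y}_t \right)^{2}}
                {(n-k)\,\sum_{t=1}^{n} \left( X_t - \bar{X} \right)^{2}} }
  = \sqrt{ \frac{\sum e_t^{2}}{(n-k)\,\sum \left( X_t - \bar{X} \right)^{2}} }
```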
Tests of Significance: Example Calculation • Y^1 = 7.60 + 3.53 X1 = 7.60 + 3.53(10) = 42.90
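Combining this fitted value with the observed first-year sales revenue of $44 mn gives the first error term and its square. This is a worked step using only those two figures:

```latex
e_1 = Y_1 - \hat{Y}_1 = 44 - 42.90 = 1.10,
\qquad
e_1^{2} = (1.10)^{2} = 1.21
```

Repeating the step for all 10 observations and summing the squared errors gives the numerator of the standard-error expression.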
Tests of Significance: Calculation of the t Statistic • Degrees of freedom = (n - k) = (10 - 2) = 8 • Critical value (tabulated) at the 5% level = 2.306
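The t-test itself can be sketched in a few lines. The slope estimate (3.53), its standard error (0.52), and the critical value (2.306) are the chapter's figures; the function and variable names are illustrative assumptions.

```python
def t_statistic(estimate, standard_error):
    """Ratio of a coefficient estimate to its standard error."""
    return estimate / standard_error


# Figures quoted in the chapter's advertising example.
b_hat = 3.53            # estimated slope
s_b = 0.52              # standard error of the slope
t_critical = 2.306      # tabulated value, 5% level, 8 degrees of freedom

t_value = t_statistic(b_hat, s_b)

# The slope is judged significant when |t| exceeds the tabulated value.
is_significant = abs(t_value) > t_critical

print(f"t = {t_value:.2f}, significant at 5%: {is_significant}")
```

Because the computed t is well above 2.306, the hypothesis that advertising has no effect on sales is rejected at the 5% level.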
Confidence Interval • We can also construct a confidence interval for the true parameter from the estimated coefficient. • Having accepted the alternative hypothesis that there is a relationship between X and Y, we use the tabulated value t = 2.306 for the 5% level and 8 df. • b = b^ +/- 2.306 (sb^) = 3.53 +/- 2.306 (0.52) • In our example, we are 95% confident that the true value of b lies between 2.33 and 4.73.
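The interval arithmetic can be checked with a short sketch. The inputs are the chapter's figures; the function name is a hypothetical helper, not part of any library.

```python
def slope_confidence_interval(b_hat, s_b, t_critical):
    """Return (lower, upper) limits: estimate +/- t_critical * standard error."""
    margin = t_critical * s_b
    return b_hat - margin, b_hat + margin


# Slope estimate, its standard error, and the tabulated t value (5%, 8 df).
lower, upper = slope_confidence_interval(3.53, 0.52, 2.306)

print(f"95% confidence interval for b: {lower:.2f} to {upper:.2f}")
```

The interval does not contain zero, which agrees with the conclusion of the t-test.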
Tests of Significance Decomposition of Sum of Squares Total Variation = Explained Variation + Unexplained Variation
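In symbols, the decomposition can be written as below. The notation is assumed: \bar{Y} is the sample mean of Y, \hat{Y}_t is the value estimated from the regression line, and the sums run over all observations.

```latex
\underbrace{\sum_{t=1}^{n} \left( Y_t - \bar{Y} \right)^{2}}_{\text{Total Variation}}
  =
\underbrace{\sum_{t=1}^{n} \left( \hat{Y}_t - \bar{Y} \right)^{2}}_{\text{Explained Variation}}
  +
\underbrace{\sum_{t=1}^{n} \left( Y_t - \hat{Y}_t \right)^{2}}_{\text{Unexplained Variation}}
```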
Coefficient of Determination (R2) • The coefficient of determination is defined as the proportion of the total variation or dispersion in the dependent variable that is explained by the variation in the explanatory variables in the regression. • In our example, the coefficient of determination measures how much of the variation in the firm's sales is explained by the variation in its advertising expenditures.
Tests of Significance Coefficient of Determination
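Following the definition as a proportion of total variation, the coefficient of determination can be written as below. The notation is assumed and matches the explained and unexplained sums of squares.

```latex
R^{2}
  = \frac{\text{Explained Variation}}{\text{Total Variation}}
  = \frac{\sum_{t=1}^{n} \left( \hat{Y}_t - \bar{Y} \right)^{2}}
         {\sum_{t=1}^{n} \left( Y_t - \bar{Y} \right)^{2}}
  = 1 - \frac{\sum_{t=1}^{n} e_t^{2}}
             {\sum_{t=1}^{n} \left( Y_t - \bar{Y} \right)^{2}}
```

R2 therefore lies between 0 (the regression explains none of the variation in Y) and 1 (every point lies on the regression line).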
Coefficient of Correlation (r) • The coefficient of correlation (r) is the square root of the coefficient of determination. • It is simply a measure of the degree of association or co-variation that exists between variables X and Y. • In our example, this means that variables X and Y vary together 92% of the time. • The sign of r is always the same as the sign of the estimated slope coefficient b^.
Tests of Significance Coefficient of Correlation
Simple Linear Regression: Example 1 The general manager of a building materials production plant feels that the demand for plasterboard shipments may be related to the number of construction permits issued in a town during the previous quarter. The manager has collected the data shown in the table.
No.   Construction Permits   Plasterboard Shipments
1     15                     6
2     9                      4
3     40                     16
4     20                     6
5     25                     13
6     25                     9
7     15                     10
8     35                     16
Questions a. Derive a regression forecasting equation. b. Determine a point estimate for plasterboard shipments when the number of construction permits is 30. c. Compute the standard error of the regression. d. Compute the upper and lower limits for shipments when the number of construction permits is 30. e. Compute the correlation coefficient and the coefficient of determination. Briefly interpret both. f. Use a t-test to determine whether the estimated b is significant at the 5% level of significance.
Answers c-d: There is a 95% probability that shipments for 30 permits will fall between 12.84 and 13.13; the remaining 5% will fall outside these limits.
Simple Linear Regression: Example 2 You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.
Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Scatter Diagram: Example Excel Output
Simple Linear Regression Equation: Example From Excel Printout:
Graph of the Simple Linear Regression Equation: Example Yi = 1636.415 +1.487Xi
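The equation can be reproduced from the seven store observations with the usual OLS formulas. The sketch below uses plain Python rather than Excel; the function and variable names are illustrative assumptions.

```python
# Store data from Example 2: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-products and sum of squares of the deviations from the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_simple_ols(square_feet, annual_sales)
print(f"Estimated line: Y = {intercept:.3f} + {slope:.3f} X")
```

The printed coefficients should agree with the Excel figures up to rounding.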
Interpretation of Results: Example The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units. Because sales are measured in thousands of dollars, the equation estimates that for each increase of 1 square foot in the size of the store, expected annual sales increase by $1,487.
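The chapter's significance tools can also be applied to the store data. The sketch below computes the coefficient of determination, the correlation coefficient, and the t statistic of the slope. The critical value 2.571 (5% level, 7 - 2 = 5 degrees of freedom) is taken from a standard t table and is not part of the original slides; all names are illustrative assumptions.

```python
import math

# Store data from Example 2: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

T_CRITICAL_5DF = 2.571  # assumed tabulated value: 5% level, 5 degrees of freedom


def regression_diagnostics(xs, ys, k=2):
    """Fit Y = a + bX by OLS and return fit statistics in a dict."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    fitted = [intercept + slope * x for x in xs]
    total = sum((y - y_mean) ** 2 for y in ys)                  # total variation
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # sum of squared errors
    r_squared = (total - unexplained) / total

    # r takes the sign of the slope.
    r = math.copysign(math.sqrt(r_squared), slope)

    # Standard error of the slope with n - k degrees of freedom.
    s_b = math.sqrt(unexplained / ((n - k) * sxx))
    return {"slope": slope, "r_squared": r_squared, "r": r,
            "s_b": s_b, "t_stat": slope / s_b}


results = regression_diagnostics(square_feet, annual_sales)
slope_is_significant = abs(results["t_stat"]) > T_CRITICAL_5DF

print(f"R2 = {results['r_squared']:.3f}, r = {results['r']:.3f}")
print(f"t = {results['t_stat']:.2f}, significant at 5%: {slope_is_significant}")
```

With only seven stores the test has few degrees of freedom, so the critical value is larger than the 2.306 used in the 10-observation advertising example.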
Crucial Assumptions • Error term is normally distributed. • Error term has zero expected value or mean. • Error term has constant variance in each time period and for all values of X. • Error term’s value in one time period is unrelated to its value in any other period.
Multiple Regression Analysis • The relationship between one dependent variable and two or more independent variables is a linear function. • Model: Y = a + b1 X1 + b2 X2 + ... + bk Xk + e, where Y is the dependent (response) variable, a is the Y-intercept, b1, ..., bk are the slopes, X1, ..., Xk are the independent (explanatory) variables, and e is the random error.
Multiple Regression Model: Example Develop a model for estimating heating oil used for a single family home in the month of January, based on average temperature and amount of insulation in inches.
Multiple Regression Model: Example (Excel Output) • For each one-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant. • For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
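Because the model is linear, the two slope coefficients can be combined to estimate the effect of changing both variables at once. The sketch below uses only the two slopes quoted above; the intercept is not needed for a change in the prediction, and the function name and the scenario are hypothetical.

```python
# Slope coefficients quoted from the Excel output (gallons per unit change).
TEMPERATURE_COEF = -5.437    # per one-degree increase in average temperature
INSULATION_COEF = -20.012    # per one-inch increase in insulation


def predicted_change_in_oil(delta_temperature, delta_insulation):
    """Estimated change in average heating oil use, in gallons.

    Each term is a partial effect: it holds the other variable constant.
    """
    return (TEMPERATURE_COEF * delta_temperature
            + INSULATION_COEF * delta_insulation)


# Hypothetical scenario: a month that is 10 degrees warmer, in a home
# with 2 more inches of insulation.
change = predicted_change_in_oil(10, 2)
print(f"Estimated change in heating oil use: {change:.3f} gallons")
```

A negative result means the model predicts lower oil consumption under the scenario.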