Doing Statistics for Business: Data, Inference, and Decision Making. Marilyn K. Pelosi, Theresa M. Sandifer. Chapter 11: Regression Analysis
Chapter 11 Objectives
• Find the linear regression equation for a dependent variable Y as a function of a single independent variable X
• Determine whether a relationship between X and Y exists
• Analyze the results of a regression analysis to determine whether the simple linear model is appropriate
Figure 11.1 Deterministic Relationship Between Total Order Cost and Number of Items Ordered
Figure 11.2 Statistical Relationship Between Revenue and Advertising Expenditures
TRY IT NOW! Increasing Capacity: Plotting Data to Look at the Relationship. An oil company is trying to determine how the number of refining sites available for refining crude oil relates to the overall refining capacity. It would use this information to determine whether expansion will provide the increase in capacity that it wants or whether other steps to increase capacity will be necessary. The company collects data on competing companies and finds the following:
TRY IT NOW! Increasing Capacity: Plotting Data to Look at the Relationship (cont'd)
TRY IT NOW! Increasing Capacity: Plotting Data to Look at the Relationship (cont'd). Use a grid to create a scatter plot of the data. Do you think that a linear model is a good one?
The true relationship between the variables X and Y, the Simple Linear Regression Model, can be described by the equation $y = \beta_0 + \beta_1 x + \varepsilon$.
Figure 11.3 The True Regression Model Showing How Y Varies for a Given Value of X
Figure 11.4 Straight Line Approximating the Relationship Between Advertising and Revenue
Figure 11.5 A Single Criterion Can Produce Many Different Lines
The distance between the predicted value of Y, $\hat{y}$, and the actual value of Y, $y$, is called the deviation or error.
Figure 11.6 Deviations Between the Data Points and the Line
The technique that finds the equation of the line that minimizes the sum of the squared deviations between the actual data points and the line is called the least squares method.
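The least squares criterion and its solution can be written compactly. The slide's own formulas did not survive in this transcript, so the symbols $b_0$ and $b_1$ (sample estimates of the intercept and slope) are assumed notation; the formulas themselves are the standard computational forms.

$$
\text{minimize } \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = b_0 + b_1 x_i
$$

$$
b_1 = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}, \qquad b_0 = \bar{y} - b_1 \bar{x}
$$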
TRY IT NOW! Increasing Capacity: Finding the Equation of the Least-Squares Regression Line. The oil company that is looking at increasing refining capacity has decided that a linear relationship is appropriate. Fill in the table shown on the following slide, or use some other means, to find the equation of the least-squares line:
TRY IT NOW! Increasing Capacity: Finding the Equation of the Least-Squares Regression Line (cont'd)
TRY IT NOW! Increasing Capacity: Finding the Equation of the Least-Squares Regression Line (cont'd). Interpret the meaning of the estimate of the slope of the line. Does the y intercept make sense for these data?
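The data table for this exercise is not preserved here, but the summary sums quoted in the chapter's correlation exercise (n = 10) are enough to compute the line. The following is a minimal sketch under that assumption; the function name `least_squares_from_sums` is hypothetical.

```python
# Summary sums quoted in the chapter's correlation exercise (n = 10 companies).
n = 10
sum_x = 78.0       # sum of x (number of refining sites)
sum_y = 463.654    # sum of y (refining capacity)
sum_xy = 4121.86   # sum of x * y
sum_x2 = 714.0     # sum of x squared


def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (b0, b1), the intercept and slope of the least-squares line."""
    ss_xy = sum_xy - sum_x * sum_y / n   # corrected sum of cross products
    ss_x = sum_x2 - sum_x ** 2 / n       # corrected sum of squares of x
    b1 = ss_xy / ss_x
    b0 = sum_y / n - b1 * sum_x / n      # the line passes through (x-bar, y-bar)
    return b0, b1


b0, b1 = least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2)
print(f"y-hat = {b0:.3f} + {b1:.3f} x")
```

The slope is read as the estimated change in mean refining capacity per additional refining site; the intercept corresponds to X = 0 sites, which is worth checking against the observed range of X before interpreting it.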
The value of $\hat{y}$ that we find is really a prediction of the mean value of Y for a given value of X.
Using the equation to predict values of Y within the range of the X data is called interpolation. Predicting values of Y for values of X outside the observed range is called extrapolation.
TRY IT NOW! Increasing Capacity: Using the Regression Equation to Predict the Value of Y. Use the equation of the regression line you found earlier to predict the refining capacity for each of the observed values of X, the number of sites.
TRY IT NOW! Increasing Capacity: Using the Regression Equation to Predict the Value of Y (cont'd)
The difference between the observed value of Y, $y_i$, and the predicted value of Y from the regression equation, $\hat{y}_i$, for a value of X = $x_i$, is called the ith residual, $e_i = y_i - \hat{y}_i$.
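A small sketch of the residual calculation may help. The five data points below are hypothetical, chosen only so the arithmetic is easy to follow; they are not the oil company data.

```python
# Hypothetical data, for illustration only (not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

# Predicted values and residuals: e_i = y_i - y-hat_i.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

for xi, yi, yh, ei in zip(x, y, y_hat, residuals):
    print(f"x={xi}  y={yi}  y_hat={yh:.2f}  e={ei:+.2f}")
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), which is a quick check on a hand-filled table.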
TRY IT NOW! Increasing Capacity: Calculating the Residuals. The oil company that is looking at the relationship between refining capacity and the number of refining sites wants to get a better idea of how the regression line relates to the actual data. It decides to calculate the residuals for each observed value of X, the number of sites. Find the residuals and fill in the table found on the following slide:
TRY IT NOW! Increasing Capacity: Calculating the Residuals (cont'd)
TRY IT NOW! Increasing Capacity: Calculating the Residuals (cont'd). To get a picture of how the residuals and the regression line fit together, the company also decides to graph the regression line on a plot of the data. Graph the regression line on the data plot. How well do you think the line represents the data?
TRY IT NOW! Increasing Capacity: Calculating the Residuals (cont'd)
The standard error of the estimate, $s_{yx}$, is a measure of how much the data vary around the regression line.
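The formula is not shown in this transcript; in the usual form for simple linear regression it is the square root of the sum of squared residuals divided by $n - 2$. The abbreviation SSE (error sum of squares) is assumed notation here.

$$
s_{yx} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{SSE}{n - 2}}
$$

The divisor is $n - 2$ rather than $n$ because two parameters, the intercept and the slope, are estimated from the data.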
Figure 11.8 Computer Output Showing the Standard Error of the Estimate (Excel and Minitab output)
Figure 11.9 (a) Line with non-zero slope (b) Line with zero slope
Figure 11.10 t-test Portion of Computer Output (Minitab and Excel)
TRY IT NOW! Increasing Capacity: Testing for Significance of the Regression Model. The oil company that is looking at increasing capacity wants to determine whether the relationship between refining capacity and number of refining sites that it calculated is significant. Write down the hypotheses that the company needs to test.
TRY IT NOW! Increasing Capacity: Testing for Significance of the Regression Model (cont'd). The company decides to use a 0.01 level of significance for the test. Find the critical values for the test. It used a computer software package to run the analysis and obtained the following output:
TRY IT NOW! Increasing Capacity: Testing for Significance of the Regression Model (cont'd). From the computer output, find the slope of the regression line, the standard error of the slope, and the value of the t statistic. Perform the hypothesis test and make a decision about the regression line.
TRY IT NOW! Increasing Capacity: Testing for Significance of the Regression Model (cont'd). Find the p value of the test from the output and explain how you could use the p value on the output to make the same decision.
Once we have determined that the relationship between X and Y is significant, we can perform some additional analyses to see if the predictions we obtain are useful for the purposes of decision making and to determine the strength of the relationship.
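The computer output for this exercise is not preserved, so the sketch below rebuilds the t statistic for the slope from the summary sums quoted in the chapter's correlation exercise. It assumes the usual two-tailed test of a zero slope against a non-zero slope; the critical value 3.355 is the t-table value for a 0.01 level of significance with n - 2 = 8 degrees of freedom.

```python
import math

# Summary sums quoted in the chapter's correlation exercise (n = 10).
n = 10
sum_x, sum_y = 78.0, 463.654
sum_xy, sum_x2, sum_y2 = 4121.86, 714.0, 25359.3224

# Corrected sums of squares and cross products.
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

b1 = ss_xy / ss_x                    # estimated slope
sse = ss_y - b1 * ss_xy              # error sum of squares
s_yx = math.sqrt(sse / (n - 2))      # standard error of the estimate
s_b1 = s_yx / math.sqrt(ss_x)        # standard error of the slope
t_stat = b1 / s_b1                   # test statistic for H0: slope = 0

T_CRIT = 3.355  # t-table value: two-tailed, alpha = 0.01, 8 degrees of freedom
reject_h0 = abs(t_stat) > T_CRIT

print(f"s_yx = {s_yx:.2f}, s_b1 = {s_b1:.3f}, t = {t_stat:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The computed standard error of the estimate agrees with the value of 13.43 quoted in the interval exercises, which suggests the sums and the formulas are consistent. The p value itself requires the t distribution and is not computed in this sketch.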
Figure 11.11 Components of the Variation in y Value
Figure 11.12 Computer ANOVA Output for Regression Analysis (Excel and Minitab output)
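The figures themselves are not reproduced here. As a reminder of what an ANOVA table for simple regression summarizes, the total variation in y splits into a part explained by the regression and a part left in the residuals. The abbreviations SST, SSR, and SSE are assumed notation and may differ from the labels used in the Excel and Minitab output.

$$
\underbrace{\sum (y_i - \bar{y})^2}_{SST} = \underbrace{\sum (\hat{y}_i - \bar{y})^2}_{SSR} + \underbrace{\sum (y_i - \hat{y}_i)^2}_{SSE}
$$

$$
F = \frac{SSR / 1}{SSE / (n - 2)}
$$

In simple linear regression this F statistic equals the square of the t statistic for the slope.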
A Confidence Interval provides an estimate for the mean value of Y ($\mu_{yx}$) at a particular value of X.
Figure 11.13 Confidence Interval for the Mean Estimate
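The interval formula is not shown in this transcript. In the standard form, with $\hat{y}$ the predicted value at the chosen x and $t_{\alpha/2,\,n-2}$ the t-table value (assumed notation), it is:

$$
\hat{y} \pm t_{\alpha/2,\,n-2}\; s_{yx} \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum x^2 - \dfrac{(\sum x)^2}{n}}}
$$

The interval is narrowest at $x = \bar{x}$ and widens as x moves away from the mean of the observed X values.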
TRY IT NOW! Increasing Capacity: Finding Confidence Intervals for the Mean Predicted Value. After calculating the regression model and deciding that the model is significant, the analysts at the oil company would like to know about the accuracy of the estimates from the model. They decide to calculate 95% confidence intervals for X = 8 and X = 13 sites. They know from previous work that for the set of 10 observations in the model, $s_{yx} = 13.43$, $\sum x = 78$, and $\sum x^2 = 714$.
TRY IT NOW! Increasing Capacity: Finding Confidence Intervals for the Mean Predicted Value (cont'd). Find 95% confidence intervals for the mean estimates. Do you think that these estimates would be useful for planning purposes? Why or why not?
A Prediction Interval gives an estimate for an individual value of Y at a particular value of X.
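In the standard form (notation assumed, since the slide's formula is not preserved), the prediction interval has the same center as the interval for the mean of Y, with one extra term under the square root:

$$
\hat{y} \pm t_{\alpha/2,\,n-2}\; s_{yx} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum x^2 - \dfrac{(\sum x)^2}{n}}}
$$

The added 1 accounts for the variation of an individual observation around the mean, so a prediction interval is always wider than the interval for the mean at the same x.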
TRY IT NOW! Increasing Capacity: Calculating Prediction Intervals for Regression Estimates. The oil company analysts decide to calculate 95% prediction intervals for the two X values that they are interested in. The relevant values from the set of 10 observations are $s_{yx} = 13.43$, $\sum x = 78$, and $\sum x^2 = 714$. Find 95% prediction intervals for X = 8 and X = 13 refining sites.
TRY IT NOW! Increasing Capacity: Calculating Prediction Intervals for Regression Estimates (cont'd). Do you think that confidence intervals or prediction intervals would be more appropriate for the oil company's purpose?
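To compare the two kinds of interval side by side, the sketch below computes both at X = 8 and X = 13. It uses the values given in the exercise plus the sums of y and xy quoted in the correlation exercise to obtain the regression line; the helper name `interval` is hypothetical, and 2.306 is the t-table value for 95% confidence with 8 degrees of freedom.

```python
import math

# Values given in the exercises (n = 10 observations).
n = 10
sum_x, sum_x2 = 78.0, 714.0
s_yx = 13.43                          # standard error of the estimate, as given
sum_y, sum_xy = 463.654, 4121.86      # from the correlation exercise
T_CRIT = 2.306                        # t-table value: 95% two-sided, 8 degrees of freedom

x_bar = sum_x / n
ss_x = sum_x2 - sum_x ** 2 / n
b1 = (sum_xy - sum_x * sum_y / n) / ss_x
b0 = sum_y / n - b1 * x_bar


def interval(x0, individual=False):
    """95% interval at x0: for the mean of Y by default,
    for a single new value of Y when individual=True."""
    y_hat = b0 + b1 * x0
    extra = 1.0 if individual else 0.0   # the extra 1 applies to prediction intervals only
    half_width = T_CRIT * s_yx * math.sqrt(extra + 1.0 / n + (x0 - x_bar) ** 2 / ss_x)
    return y_hat - half_width, y_hat + half_width


for x0 in (8, 13):
    ci_low, ci_high = interval(x0)
    pi_low, pi_high = interval(x0, individual=True)
    print(f"X = {x0}: mean ({ci_low:.2f}, {ci_high:.2f}); "
          f"individual ({pi_low:.2f}, {pi_high:.2f})")
```

Both intervals are wider at X = 13 than at X = 8 because 13 is farther from the mean of the observed X values (7.8), and the prediction interval is wider than the confidence interval at each point.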
The Correlation Coefficient is used as a measure of the strength of a linear relationship. A correlation of -1 corresponds to a perfect negative relationship, a correlation of 0 corresponds to no linear relationship, and a correlation of +1 corresponds to a perfect positive relationship.
Figure 11.14 3 Types of Relationships: Perfect Negative, No Relationship, and Perfect Positive
TRY IT NOW! Increasing Capacity: Calculating the Correlation Coefficient. The relevant data to calculate the correlation coefficient for the oil company problem are $n = 10$, $\sum x = 78$, $\sum y = 463.654$, $\sum xy = 4121.86$, $\sum x^2 = 714$, and $\sum y^2 = 25{,}359.3224$. Find the correlation coefficient for the data.
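One way to carry out this calculation is the computational formula for the correlation coefficient, which needs only the sums listed in the exercise. The sketch below is a minimal version; the variable names are illustrative.

```python
import math

# Sums given in the exercise.
n = 10
sum_x, sum_y = 78.0, 463.654
sum_xy = 4121.86
sum_x2, sum_y2 = 714.0, 25359.3224

# Corrected sums of squares and cross products.
ss_xy = sum_xy - sum_x * sum_y / n
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n

# Correlation coefficient: r = SSxy / sqrt(SSx * SSy).
r = ss_xy / math.sqrt(ss_x * ss_y)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

The sign of r matches the sign of the regression slope, and r squared is the share of the variation in refining capacity accounted for by the linear relationship with the number of sites.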
Figure 11.15 Examples of Residual Plots
Figure 11.16 Histograms of Residuals
A Normal Probability Plot is a plot of the ordered data against their expected values under a normal distribution. When data are normally distributed, the plot will be a straight line.
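The plotting coordinates can be sketched with the Python standard library. The residuals below are hypothetical, and the expected normal values are approximated with Blom's plotting positions, (i - 0.375) / (n + 0.25); statistical packages may use a slightly different approximation, so this is an illustration rather than a reproduction of the Excel or Minitab plots.

```python
from statistics import NormalDist


def normal_scores(n):
    """Approximate expected values of n ordered standard-normal observations,
    using Blom's plotting positions (i - 0.375) / (n + 0.25)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


# Hypothetical residuals, for illustration only.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

ordered = sorted(residuals)
scores = normal_scores(len(ordered))

# Each (score, residual) pair is one point on the normal probability plot.
for z, e in zip(scores, ordered):
    print(f"expected normal score {z:+.3f}   ordered residual {e:+.2f}")
```

If the points fall close to a straight line, the normality assumption for the residuals looks reasonable; strong curvature or isolated points far from the line suggest otherwise.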
Figure 11.17 Regression Diagnostic Plots