BCOR 1020 Business Statistics. Lecture 24 – April 17, 2008. Overview. Chapter 12 – Linear Regression: Visual Displays and Correlation Analysis, Bivariate Regression, Regression Terminology. Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.
BCOR 1020 Business Statistics
Lecture 24 – April 17, 2008
Overview
Chapter 12 – Linear Regression
• Visual Displays and Correlation Analysis
• Bivariate Regression
• Regression Terminology
Chapter 12 – Visual Displays
Visual Displays:
Begin the analysis of bivariate data (i.e., two variables) with a scatter plot. A scatter plot
- displays each observed data pair $(x_i, y_i)$ as a dot on an X/Y grid.
- indicates visually the strength of the relationship between the two variables.
Chapter 12 – Visual Displays
Visual Displays:
[Figure: scatter plot of Regular Unleaded and Diesel prices]
The price of Regular Unleaded appears to have a positively sloped linear relationship with the price of Diesel. These variables appear to be correlated.
Chapter 12 – Correlation Analysis
Correlation Analysis:
The sample correlation coefficient ($r$) measures the degree of linearity in the relationship between X and Y.
$-1 \le r \le +1$
$r = 0$ indicates no linear relationship. Values of $r$ near $-1$ indicate a strong negative relationship; values near $+1$ indicate a strong positive relationship.
Correlation functions are available in Excel, MegaStat and on your calculators.
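Chapter 12 – Correlation Analysis
Correlation Analysis (Computing r):
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
This value can be calculated on your calculator or using a software package like Excel or MegaStat.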
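For readers who want to see the computation spelled out, here is a minimal Python sketch of the sample correlation coefficient. It assumes the standard definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations). The function name `sample_correlation` and the data values are hypothetical; they are not the CallWait data from problem 12.3.

```python
import math


def sample_correlation(x, y):
    """Return the sample correlation coefficient r for paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    # Undefined (division by zero) if either variable has no variation
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Hypothetical illustration data (NOT the CallWait data set)
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 2]
r = sample_correlation(x, y)
print(f"r = {r:.3f}")
```

The result should agree with Excel's CORREL function applied to the same two columns.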
Chapter 12 – Correlation Analysis
Example:
• Data Set for problem 12.3 (“CallWait”)…
• Y = Hold time (minutes) for concert tickets
• X = number of operators
• There appears to be “some” negative correlation between the variables. Does this make sense?
• We can calculate the sample correlation coefficient… $r = -0.733$ (overhead)
Chapter 12 – Correlation Analysis
[Figure: scatter plot panels labeled Strong Positive Correlation, Weak Positive Correlation, Weak Negative Correlation, Strong Negative Correlation, No Correlation, and Nonlinear Relation]
Chapter 12 – Correlation Analysis
Tests for Significance:
$r$ is an estimate of the population correlation coefficient $\rho$ (rho). To test the hypothesis $H_0: \rho = 0$, the test statistic is:
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
The critical value $t_\alpha$ is obtained from Appendix D using $\nu = n - 2$ degrees of freedom for any $\alpha$. We can bound the p-value for this test using the t table, or we can find it exactly using Excel or MegaStat.
Chapter 12 – Correlation Analysis
Tests for Significance:
Equivalently, you can calculate the critical value for the correlation coefficient using
$$r_\alpha = \frac{t_\alpha}{\sqrt{t_\alpha^2 + n - 2}}$$
This method gives a benchmark for the correlation coefficient. However, it yields no p-value and is inflexible if you change your mind about $\alpha$.
Chapter 12 – Correlation Analysis
Steps in Testing if $\rho = 0$:
• Step 1: State the Hypotheses. Determine whether you are using a one- or two-tailed test and the level of significance ($\alpha$). For a two-tailed test: $H_0: \rho = 0$, $H_1: \rho \ne 0$
• Step 2: Calculate the Critical Value. For degrees of freedom $\nu = n - 2$, look up the critical value $t_\alpha$ in Appendix D, then calculate $r_\alpha = t_\alpha / \sqrt{t_\alpha^2 + n - 2}$
• Step 3: Make the Decision. For a two-tailed test, if $|r|$ exceeds the critical value $r_\alpha$, then reject $H_0$.
• If using the t statistic method, reject $H_0$ if $|t| > t_\alpha$ or if the p-value $< \alpha$.
Chapter 12 – Correlation Analysis
Example:
• In our earlier example on the data set “CallWait”, we calculated the sample correlation, $r = -0.733$, based on $n = 5$ data points.
• Calculate the Critical Value, $r_\alpha$, to test the hypothesis $H_0: \rho = 0$ vs. $H_1: \rho \ne 0$ at the 10% level of significance.
• Since $|r|$ is not greater than $r_\alpha$, we cannot reject $H_0$. There is not a significant correlation between these variables at the 10% level of significance.
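The critical value in this example can be checked with a short sketch. It assumes the benchmark formula $r_\alpha = t_\alpha / \sqrt{t_\alpha^2 + n - 2}$ and a t-table value of 2.353 (two-tailed, 10% level, 3 degrees of freedom). That table value is taken from a standard t table, not from the slides, and the function name `critical_r` is illustrative.

```python
import math


def critical_r(t_alpha, n):
    """Critical value for the sample correlation, given the t critical value."""
    return t_alpha / math.sqrt(t_alpha ** 2 + n - 2)


n = 5             # sample size from the CallWait example
r = -0.733        # sample correlation from the CallWait example
t_alpha = 2.353   # assumption: standard t-table value, two-tailed alpha = 0.10, 3 d.f.

r_alpha = critical_r(t_alpha, n)
reject = abs(r) > r_alpha   # two-tailed test compares |r| with r_alpha
print(f"r_alpha = {r_alpha:.3f}, reject H0: {reject}")
# prints: r_alpha = 0.805, reject H0: False
```

Because $|r| = 0.733$ falls short of roughly 0.805, the sketch reaches the same decision as the slide: do not reject $H_0$ at the 10% level.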
Clickers
For our example on the data set “CallWait”, we calculated the sample correlation, $r = -0.733$, based on $n = 5$ data points. Instead of calculating the Critical Value, $r_\alpha$, to test the hypothesis $H_0: \rho = 0$ vs. $H_1: \rho \ne 0$, we could have calculated the test statistic $t = r\sqrt{(n-2)/(1-r^2)}$, which follows a t distribution with $\nu = n - 2$ d.f. under $H_0$.
What are the bounds for the p-value on this test statistic?
(A) 0.10 < p-value < 0.20
(B) 0.025 < p-value < 0.05
(C) 0.05 < p-value < 0.10
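To work the clicker question by hand, the test statistic has to be computed first. The sketch below assumes the usual formula $t = r\sqrt{(n-2)/(1-r^2)}$; the function name `correlation_t_statistic` is illustrative.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    # Undefined when |r| = 1 (division by zero)
    return r * math.sqrt((n - 2) / (1 - r ** 2))


t = correlation_t_statistic(-0.733, 5)
print(f"t = {t:.3f} with {5 - 2} d.f.")
```

Locating $|t|$ between adjacent entries in the t-table row for 3 d.f. brackets the tail area. For the two-tailed alternative, use the two-tailed column headings, or double the one-tailed areas.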
Chapter 12 – Correlation Analysis
Role of Sample Size:
As sample size increases, the critical value of $r$ becomes smaller. This makes it easier for smaller values of the sample correlation coefficient to be considered significant. A larger sample does not mean that the correlation is stronger, nor does its significance imply importance.
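A small sketch can make the shrinking critical value concrete. It assumes a two-tailed test at the 5% level and uses t critical values copied from a standard t table. These table values and the chosen sample sizes are illustrative assumptions, not figures from the lecture.

```python
import math

# Assumption: two-tailed alpha = 0.05 critical t values from a standard t table,
# keyed by sample size n (degrees of freedom = n - 2).
T_CRITICAL_05 = {5: 3.182, 10: 2.306, 30: 2.048, 100: 1.984}


def critical_r(t_alpha, n):
    """Critical value for the sample correlation, given the t critical value."""
    return t_alpha / math.sqrt(t_alpha ** 2 + n - 2)


critical_values = {n: critical_r(t, n) for n, t in T_CRITICAL_05.items()}
for n, r_alpha in critical_values.items():
    print(f"n = {n:3d}: reject H0 if |r| > {r_alpha:.3f}")
```

With only 5 observations, a correlation must be very strong to be significant, while with 100 observations a fairly weak correlation already clears the benchmark.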
Chapter 12 – Bivariate Regression
What is Bivariate Regression?
Bivariate Regression analyzes the relationship between two variables. It specifies one dependent (response) variable and one independent (predictor) variable. This hypothesized relationship may be linear, quadratic, or some other form.
Chapter 12 – Bivariate Regression
Some Model Forms:
Chapter 12 – Bivariate Regression
Prediction Using Regression:
The intercept and slope of a fitted regression can provide useful information. For example, consider the fitted regression model…
Sales (Y) = 268 + 7.37 Ads (X)
• Each extra $1 million of advertising will generate $7.37 million of sales on average.
• The firm would average $268 million of sales with zero advertising. However, the intercept may not be meaningful because Ads = 0 may be outside the range of the observed data.
Chapter 12 – Bivariate Regression
Prediction Using Regression:
One of the main uses of regression is to make predictions. Once we have a fitted regression equation that shows the estimated relationship between X and Y, we can plug in any value of X to make a prediction for Y. Consider our example…
Sales (Y) = 268 + 7.37 Ads (X)
If the firm spends $10 million on advertising, its expected sales would be…
Sales (Y) = 268 + 7.37(10) = $341.7 million.
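The prediction step is a single line of arithmetic, sketched below in Python. The function name `predict` is illustrative; the coefficients are the ones from the Sales/Ads example, with both variables measured in millions of dollars.

```python
def predict(intercept, slope, x):
    """Predicted Y from a fitted bivariate regression: intercept + slope * x."""
    return intercept + slope * x


# Fitted model from the example: Sales = 268 + 7.37 * Ads (both in $ millions)
sales = predict(268, 7.37, 10)
print(f"Expected sales: ${sales:.1f} million")
# prints: Expected sales: $341.7 million
```

The same function applies to any fitted line, but predictions are only trustworthy for X values inside the range of the observed data.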
Chapter 12 – Regression Terminology
Models and Parameters:
Unknown parameters that we will estimate are
$\beta_0$ = Intercept
$\beta_1$ = Slope
The assumed model for a linear relationship is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for all observations ($i = 1, 2, \ldots, n$).
The error term $\varepsilon_i$ is not observable, but is assumed normally distributed with mean of 0 and standard deviation $\sigma$.
Chapter 12 – Regression Terminology
Models and Parameters:
The fitted model used to predict the expected value of Y for a given value of X is $\hat{y}_i = b_0 + b_1 x_i$
• The fitted coefficients are $b_0$, the estimated intercept, and $b_1$, the estimated slope.
• The residual is $e_i = y_i - \hat{y}_i$.
• Residuals may be used to estimate $\sigma$, the standard deviation of the errors.
• We will discuss how $b_0$ and $b_1$ are found next lecture.
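The relationship between fitted values and residuals can be sketched as follows. The coefficients and data here are hypothetical and chosen only for illustration. The estimate of $\sigma$ uses the common standard-error formula $\sqrt{\sum e_i^2 / (n - 2)}$, which is an assumption, since the lecture does not state a formula.

```python
import math


def fitted_values(b0, b1, xs):
    """Fitted values y-hat_i = b0 + b1 * x_i."""
    return [b0 + b1 * x for x in xs]


def residuals(b0, b1, xs, ys):
    """Residuals e_i = y_i - y-hat_i."""
    return [y - y_hat for y, y_hat in zip(ys, fitted_values(b0, b1, xs))]


# Hypothetical coefficients and data (not taken from the lecture)
b0, b1 = 2.0, 3.0
xs = [1, 2, 3]
ys = [5.5, 7.0, 11.5]

e = residuals(b0, b1, xs, ys)
sse = sum(ei ** 2 for ei in e)            # sum of squared residuals
std_error = math.sqrt(sse / (len(xs) - 2))  # assumed estimator of sigma
print(e, round(std_error, 4))
```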
Chapter 12 – Regression Terminology
Fitting a Regression on a Scatter Plot in Excel:
Step 1:
- Highlight the data columns.
- Click on the Chart Wizard and choose Scatter Plot.
- In the completed graph, click once on the points in the scatter plot to select the data.
- Right-click and choose Add Trendline.
- Choose Options and check Display Equation.
Chapter 12 – Regression Terminology
Example:
• Data Set for problem 12.3 (“CallWait”)…
• Y = Hold time (minutes) for concert tickets
• X = number of operators
From this output, we have the linear model: $y = 458 - 18.5x$, with $b_0 = 458$ and $b_1 = -18.5$.
Discussion…
Clickers
For our example on the data set “CallWait”, we have now calculated the regression model:
Wait time (Y) = 458 – 18.5 Operators (X)
If there are 7 operators, what is the expected wait time?
(A) 458
(B) 129.5
(C) 328.5
(D) 587.5
Chapter 12 – Regression Terminology
Regression Caveats:
• The “fit” of the regression does not depend on the sign of its slope. The sign of the fitted slope merely tells whether X has a positive or negative association with Y.
• View the intercept with skepticism unless X = 0 is logically possible and was actually observed in the data set.
• Be wary of extrapolating the model beyond the observed range in the data.
• Regression does not demonstrate cause-and-effect between X and Y. A good fit shows that X and Y vary together. Both could be affected by another variable or by the way the data are defined.