
Section 12.1

Section 12.1. Scatter Plots and Correlation. With the quality added value you’ve come to expect from D.R.S., University of Cordele. HAWKES LEARNING SYSTEMS math courseware specialists. Regression, Inference, and Model Building 12.1 Scatter Plots and Correlation.

levana

Presentation Transcript


  1. Section 12.1 Scatter Plots and Correlation With the quality added value you’ve come to expect from D.R.S., University of Cordele

  2. HAWKES LEARNING SYSTEMS math courseware specialists. Regression, Inference, and Model Building. 12.1 Scatter Plots and Correlation. Plot (x, y) data points and think about whether x and y are somehow related. Types of Relationships: • Strong Linear Relationship • Weak Linear Relationship • No Relationship • Non-Linear Relationship

  3. Table More

  4. Table (cont.) Source: Yahoo! Sports. “NFL - Statistics by Position.” http://sports.yahoo.com/nfl/stats/byposition?pos=QB&conference=NFL&year=season_2011&sort=49&timeframe=All (20 May 2012). Source: Spotrac.com. “NFL Player Contracts, Salaries, and Transactions.” http://www.spotrac.com/nfl/ (2 Oct. 2012).

  5. Example 12.1: Creating a Scatter Plot to Identify Trends in Data. Use the data from Table 12.2 to produce a scatter plot that shows the relationship between the base salary of an NFL quarterback and the number of touchdowns the quarterback has thrown in one season. Solution: We might expect the number of touchdowns a quarterback throws in one season to influence his salary. Taking this into consideration, we will place the number of touchdowns on the x-axis and the base salary on the y-axis.

  6. Scatter Plot of (touchdowns, salary) on TI-84. Touchdowns in list L1, Salary in list L2. Y=: old algebra plots should be cleared out of there. 2ND STAT PLOT: all should be “Off” to start with. 1:Plot 1: On, choose Type, Lists L1 and L2, Mark •. Remember 2ND 1, 2ND 2 to put in list names? ZOOM 9:ZoomStat. If unexplainable error, 2ND MEM 7 1 2 to clear all and then retype the lists of data. TRACE and Left Arrow and Right Arrow to explore it.

  7. Example 12.1: Creating a Scatter Plot to Identify Trends in Data (cont.)

  8. Example 12.1: Creating a Scatter Plot to Identify Trends in Data (cont.) Looking at this scatter plot, we do not see a linear pattern. Actually, no pattern is evident. This probably indicates that these two variables do not have a relationship after all.

  9. Example 12.2: Creating a Scatter Plot to Identify Trends in Data Use the data in Table 12.2 to produce a scatter plot that shows the relationship between the number of touchdowns thrown in one season and the corresponding quarterback rating for the given sample of NFL quarterbacks. Solution In this case, we would expect that the number of touchdowns thrown by a quarterback does influence that quarterback’s rating, since number of touchdowns is one of many factors used to determine the quarterback rating.

  10. Scatter Plot of (touchdowns, rating) on TI-84. Ratings in list L3, Touchdowns still in L1, Salary in L2. 2ND STAT PLOT, 1:Plot 1: Change to Lists L1 and L3. ZOOM 9:ZoomStat. TRACE and Left Arrow and Right Arrow to explore it.

  11. Example 12.2: Creating a Scatter Plot to Identify Trends in Data (cont.) Hence, the logical way to label the axes is to place the number of passing touchdowns on the x-axis and the quarterback rating on the y-axis.

  12. Example 12.2: Creating a Scatter Plot to Identify Trends in Data (cont.) Notice that the points tend to go up from left to right, and fall close to a straight line. This pattern can be described as a linear pattern with a positive slope.

  13. Example 12.3: Determining Whether a Scatter Plot Would Have a Positive Slope, Negative Slope, or Not Follow a Straight-Line Pattern Determine whether the points in a scatter plot for the two variables are likely to have a positive slope, negative slope, or not follow a straight-line pattern. a. The number of hours you study for an exam and the score you make on that exam b. The price of a used car and the number of miles on the odometer c. The pressure on a gas pedal and the speed of the car d. Shoe size and IQ for adults

  14. Example 12.3: Determining Whether a Scatter Plot Would Have a Positive Slope, Negative Slope, or Not Follow a Straight-Line Pattern (cont.) Solution a. As the number of hours you study for an exam increases, the score you receive on that exam is usually higher. Thus, the scatter plot would have a positive slope. b. As the number of miles on the odometer of a used car increases, the price usually decreases. Thus, the scatter plot would have a negative slope.

  15. Example 12.3: Determining Whether a Scatter Plot Would Have a Positive Slope, Negative Slope, or Not Follow a Straight-Line Pattern (cont.) c. The more you push on the gas pedal, the faster the car will go. Thus, the scatter plot would have a positive slope. d. Common sense suggests that there is not a relationship, linear or otherwise, between a person’s IQ and his or her shoe size. Thus, the scatter plot would not follow a straight-line pattern.

  16. Scatter Plots and Correlation. The Pearson correlation coefficient, ρ, is the parameter that measures the strength of a linear relationship between two quantitative variables in a population. The correlation coefficient for a sample is denoted by r. It always takes a value between −1 and 1, inclusive.

  17. Question: “Are x and y related?” • ρ (Greek letter rho) is the population parameter for the Correlation Coefficient. • r (our alphabet’s letter r) is the sample statistic for the Correlation Coefficient. • We use our sample r to estimate the population’s parameter ρ.

  18. 12.1 Scatter Plots and Correlation • –1 ≤ r ≤ 1 • Close to –1 means a strong negative correlation. • Close to 0 means little or no linear correlation. • Close to 1 means a strong positive correlation.

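  19. Scatter Plots and Correlation: Pearson Correlation Coefficient. The Pearson correlation coefficient for paired data from a sample is given by $$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$ where n is the number of data pairs in the sample, x_i is the ith value of the explanatory variable, and y_i is the ith value of the response variable.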
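The sample correlation coefficient can also be computed directly from its sums-of-products formula. The following is a minimal Python sketch, not part of the original slides; the function name `pearson_r` and the five data pairs are made up for illustration and are not the values from Table 12.2.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient from the sums-of-products formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two data pairs are required")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator

# Hypothetical data pairs, for illustration only (NOT from Table 12.2).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # about 0.7746
```

The result should agree with the r that the calculator reports for the same lists, up to rounding.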

  20. Example 12.4: Calculating the Correlation Coefficient Using a TI-83/84 Plus Calculator Calculate the correlation coefficient, r, for the data from Table 12.2 relating touchdowns thrown and base salaries. Solution The data we need from Table 12.2 are reproduced in the following table. But we will not dig into the details of that awful formula! The TI-84 has built-in goodies.

  21. Example 12.4: Calculating the Correlation Coefficient Using a TI-83/84 Plus Calculator (cont.)

  22. Use TI-84 LinRegTTest. The next few slides describe the use of LinRegTTest. It’s STAT, TESTS, ALPHA F (ALPHA E on the 83/Plus). This description is about the full hypothesis test to determine “Is the relationship significant?” The outputs include the value of r, the correlation coefficient, which is of greatest interest at this early point in our study. The Hawkes materials talk about the LinReg feature but I’m recommending the LinRegTTest instead because you get more information for about the same effort.

  23. Hypothesis Test for significant ρ. Null Hypothesis: “No relationship.” Alternative: “But there IS a significant relationship!” There’s some level of significance, α, specified in advance. It involves calculating a t value and finding “what is the p-value of this t?” And if p-value < α, reject the null hypothesis. • If so, then we say “Yes, significant relationship!”

  24. Hypothesis Test for significant ρ. Usually we do this two-tailed test: • Null Hypothesis H0: ρ = 0, “No relationship” • Alternative Hypothesis Ha: ρ ≠ 0, “There is a significant linear relationship.” Be aware of a couple of one-tailed variations: • Test for significant POSITIVE correlation only: using H0: ρ ≤ 0 and Ha: ρ > 0 • Test for significant NEGATIVE correlation only: using H0: ρ ≥ 0 and Ha: ρ < 0

  25. LinRegTTest inputs (not identical to the quarterback example!) • Here are the inputs: • Xlist and Ylist – where you put the data • Shortcut: 2ND 2 puts L2 • Freq: 1 (unless…) • β & ρ: ≠ 0 • This is the Alternative Hypothesis • RegEq: VARS, right arrow to Y-VARS, 1, 1 • Just put it in for later • Highlight “Calculate” • Press ENTER

  26. LinRegTTest Outputs, first screen (from a different problem) • t = the t statistic value for this test (the formula is in the book) • p = the p-value for this t test statistic • df = n − 2 in this kind of a test • a: later – for regression

  27. LinRegTTest Outputs, second screen (from a different problem) • b: later, for Regression • s: much later, for advanced Regression • r² = how much of the variation in the output variable (weight) is explained by the input variable (girth) • r = the correlation coefficient for the sample • Close to 1 – strong positive relationship • Or close to –1 – strong negative

  28. Testing the Correlation Coefficient for Significance Using Critical Values of the Pearson Correlation Coefficient to Determine the Significance of a Linear Relationship. A sample correlation coefficient, r, is statistically significant if $|r| > r_\alpha$. (Why is this discussion here? Sometimes they give you a shred of a problem that gives some summary results and you have to use a printed table to make the determination. That’s the only time you’ll need to do this, for a few of those kinds of problems. In “real life”, in large problems, the LinRegTTest p-value is compared to alpha.)

  29. Example 12.6: Using a Table of Critical Values to Determine Significance of a Linear Relationship. Use the critical values in Table I to determine if the correlation between the number of passing touchdowns and base salary from Example 12.4 is statistically significant. Use a 0.05 level of significance. Solution: Begin by finding the critical value for α = 0.05 with n = 10 in Table I. Find the value in the table where the row for n = 10 intersects the column for α = 0.05.

  30. Example 12.6: Using a Table of Critical Values to Determine Significance of a Linear Relationship (cont.) INTERPRETATION: “If the absolute value of my sample’s correlation coefficient, |r|, is at least as big as the value you look up in this table, then YES, significant linear relationship. Otherwise, no, no significant linear relationship.”

  31. Example 12.6: Using a Table of Critical Values to Determine Significance of a Linear Relationship (cont.) Thus, $r_\alpha = 0.632$. Comparing this critical value to the absolute value of the correlation coefficient we found for the data in Example 12.4, we have 0.251 < 0.632, and thus $|r| < r_\alpha$. Therefore, the linear relationship between the variables is not statistically significant at the 0.05 level of significance. Thus, we do not have sufficient evidence, at the 0.05 level of significance, to conclude that a linear relationship exists between the number of passing touchdowns during the 2011–2012 season and the 2012 base salary of an NFL quarterback.
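The table lookup in Example 12.6 amounts to one comparison, sketched below in Python with the two numbers stated in the example (|r| = 0.251 and the Table I critical value 0.632). The helper names are hypothetical. The second function is an assumption that goes beyond the slides: it solves the t test statistic formula for r to show where a table entry can come from, using the standard two-tailed t critical value 2.306 for α = 0.05 with 8 degrees of freedom.

```python
import math

def is_significant(r, r_crit):
    """Critical-value rule: significant when |r| exceeds the table value."""
    return abs(r) > r_crit

def critical_r_from_t(t_crit, n):
    """Critical r implied by a t critical value with n - 2 degrees of freedom.

    Obtained by solving t = r / sqrt((1 - r^2) / (n - 2)) for r.
    """
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))

# Values stated in Example 12.6.
r_sample = 0.251   # |r| for touchdowns vs. base salary
r_table = 0.632    # Table I entry for n = 10, alpha = 0.05

significant = is_significant(r_sample, r_table)
print(significant)  # False: not statistically significant at the 0.05 level

# Assumption: 2.306 is the two-tailed t critical value for alpha = 0.05, df = 8.
r_from_t = critical_r_from_t(2.306, 10)
print(round(r_from_t, 3))  # 0.632
```

Taking the absolute value first matters: a strongly negative r can be just as significant as a strongly positive one.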

  32. Testing the Correlation Coefficient for Significance Using Hypothesis Testing. Testing Linear Relationships for Significance. Significant Linear Relationship (Two-Tailed Test) H0: ρ = 0 (Implies that there is no significant linear relationship) Ha: ρ ≠ 0 (Implies that there is a significant linear relationship) (Now they’re getting into the Hypothesis Testing we saw a brief preview of earlier in this set of slides.)

  33. Testing the Correlation Coefficient for Significance Using Hypothesis Testing. Testing Linear Relationships for Significance (cont.) Significant Negative Linear Relationship (Left-Tailed Test) H0: ρ ≥ 0 (Implies that there is no significant negative linear relationship) Ha: ρ < 0 (Implies that there is a significant negative linear relationship)

  34. Testing the Correlation Coefficient for Significance Using Hypothesis Testing. Testing Linear Relationships for Significance (cont.) Significant Positive Linear Relationship (Right-Tailed Test) H0: ρ ≤ 0 (Implies that there is no significant positive linear relationship) Ha: ρ > 0 (Implies that there is a significant positive linear relationship)

  35. Testing the Correlation Coefficient for Significance Using Hypothesis Testing. Test Statistic for a Hypothesis Test for a Correlation Coefficient. The test statistic for testing the significance of the correlation coefficient is given by $$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$ TI-84 LinRegTTest will calculate this value for us.

  36. Testing the Correlation Coefficient for Significance Using Hypothesis Testing. Test Statistic for a Hypothesis Test for a Correlation Coefficient (cont.) where r is the sample correlation coefficient and n is the number of data pairs in the sample. The number of degrees of freedom for the t-distribution of the test statistic is given by n − 2.
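As a rough check on what LinRegTTest reports, the test statistic and its degrees of freedom can be computed from r and n alone. This is a minimal Python sketch with a hypothetical function name; the inputs r ≈ −0.586619 and n = 15 are the values reported in Example 12.7 later in this section.

```python
import math

def correlation_t_statistic(r, n):
    """Return (t, df) for testing the significance of a sample correlation r."""
    if n <= 2:
        raise ValueError("need more than two data pairs (df = n - 2 must be positive)")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1 for t to be finite")
    df = n - 2
    t = r / math.sqrt((1 - r ** 2) / df)
    return t, df

# r and n as reported in Example 12.7 (parking tickets vs. GPA).
t_value, df = correlation_t_statistic(-0.586619, 15)
print(round(t_value, 3), df)  # -2.612 13
```

The sign of t always matches the sign of r, which is why the two-tailed test compares |t| with the critical value.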

  37. Testing the Correlation Coefficient for Significance Using Hypothesis Testing. Rejection Regions for Testing Linear Relationships. Significant Linear Relationship (Two-Tailed Test): Reject the null hypothesis, H0, if $|t| \ge t_{\alpha/2}$. Significant Negative Linear Relationship (Left-Tailed Test): Reject the null hypothesis, H0, if $t \le -t_{\alpha}$. Significant Positive Linear Relationship (Right-Tailed Test): Reject the null hypothesis, H0, if $t \ge t_{\alpha}$.

  38. Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant Use a hypothesis test to determine if the linear relationship between the number of parking tickets a student receives during a semester and his or her GPA during the same semester is statistically significant at the 0.05 level of significance. Refer to the data presented in the following table.

  39. Example 12.7 Use the TI-84 LinRegTTest to perform the hypothesis test. Use the p-value method: The LinRegTTest gives you a p-value. If the p-value is < the given Level of Significance α = 0.05, then REJECT the null hypothesis; conclude that there IS a significant linear relationship. Otherwise, Fail To Reject – no significant relationship. And you can disregard most or all of the by-hand detail that follows in the next few slides.
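For readers without the calculator, the p-value method can be sketched in plain Python. The block below is an illustration only, not how the TI-84 computes its result: it approximates the two-tailed p-value by integrating the Student t density with Simpson's rule, and the function names are hypothetical. The inputs t ≈ −2.612 and df = 13 are the values the by-hand solution to Example 12.7 arrives at.

```python
import math

def t_pdf(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:          # Simpson's rule needs an even number of intervals
        steps += 1
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The density is symmetric, so both tails together are 1 - 2 * area.
    return max(0.0, 1 - 2 * area_0_to_t)

def decide(p_value, alpha):
    """p-value method: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Test statistic and degrees of freedom from Example 12.7.
p = two_tailed_p_value(-2.612, 13)
decision = decide(p, 0.05)
print(decision)  # reject H0 (the p-value lies between 0.02 and 0.05)
```

Because the p-value is below α = 0.05, this route reaches the same conclusion as the rejection-region approach used in the by-hand solution.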

  40. Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Solution Step 1: State the null and alternative hypotheses. We wish to test the claim that a significant linear relationship exists between the number of parking tickets a student receives during a semester and his or her GPA during the same semester. Thus, the hypotheses are stated as follows. H0: ρ = 0, Ha: ρ ≠ 0

  41. Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Step 2: Determine which distribution to use for the test statistic, and state the level of significance. We will use the t-test statistic presented previously in this section along with a significance level of α = 0.05 to perform this hypothesis test. Step 3: Gather data and calculate the necessary sample statistics. We need to begin by calculating the correlation coefficient, r.

  42. Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Since it is possible to argue for either of these two variables affecting the other, let’s assign the number of tickets to be our explanatory variable (x), and thus the GPA as the response variable (y). Using a TI-83/84 Plus calculator, enter the values for the numbers of tickets (x) in L1 and the values for the GPAs (y) in L2. Then press STAT and choose CALC and option 4:LinReg(ax+b). Press ENTER twice. We get r ≈ −0.586619 from the calculator and we know that n = 15.

  43. Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Note that we rounded r to six decimal places, rather than three decimal places, to avoid additional rounding error in the following calculation of the test statistic. Substituting these values into the formula for the t-test statistic yields the following. $$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{-0.586619}{\sqrt{\dfrac{1 - (-0.586619)^2}{15 - 2}}} \approx -2.612$$

  44. Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Step 4: Draw a conclusion and interpret the decision. We will use rejection regions in this example to draw the conclusion. Since the sample size for this example is 15, the number of degrees of freedom is n − 2 = 15 − 2 = 13. Using the t‑distribution table or appropriate technology, we find the critical value for this test, $t_{\alpha/2} = t_{0.025} = 2.160$. So we will reject the null hypothesis, H0, if |t| ≥ 2.160.

  45. Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Since |t| ≈ 2.612 and 2.612 ≥ 2.160, the test statistic falls in the rejection region. Thus, we reject the null hypothesis. Therefore, there is sufficient evidence at the 0.05 level of significance to support the claim that there is a significant linear relationship between the number of parking tickets a student receives during a semester and his or her GPA during the same semester.
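The rejection-region decision for Example 12.7 can be written as a small Python check. This is a sketch with a hypothetical function name; it assumes the caller supplies the positive critical value appropriate to the chosen test (the α/2 value for a two-tailed test, the α value for a one-tailed test). The numbers are the ones stated in the example: t ≈ −2.612 and critical value 2.160.

```python
def in_rejection_region(t, t_crit, tail="two"):
    """Return True when the test statistic t falls in the rejection region.

    t_crit is the positive critical value for the chosen test; tail is
    "two", "left", or "right".
    """
    if t_crit <= 0:
        raise ValueError("t_crit must be the positive critical value")
    if tail == "two":
        return abs(t) >= t_crit
    if tail == "left":
        return t <= -t_crit
    if tail == "right":
        return t >= t_crit
    raise ValueError("tail must be 'two', 'left', or 'right'")

# Example 12.7: two-tailed test, df = 13, alpha = 0.05.
reject = in_rejection_region(-2.612, 2.160, tail="two")
print(reject)  # True: reject H0, the linear relationship is significant
```

A right-tailed test with the same negative statistic would not reject, which is why the direction of the alternative hypothesis has to be fixed before looking at the data.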

  46. Example 12.8: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant An online retailer wants to research the effectiveness of its mail-out catalogs. The company collects data from its eight largest markets with respect to the number of catalogs (in thousands) that were mailed out one fiscal year versus sales (in thousands of dollars) for that year. The results are as follows.

  47. Example 12.8: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Use a hypothesis test to determine if the linear relationship between the number of catalogs mailed out and sales is statistically significant at the 0.01 level of significance. Solution Step 1: State the null and alternative hypotheses. We wish to test the claim that a significant linear relationship exists between the number of catalogs mailed out and the corresponding sales for that area.

  48. Example 12.8: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Step 2: Determine which distribution to use for the test statistic and state the level of significance. We will use the t-test statistic with the given level of significance, α = 0.01.

  49. Example 12.8: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Step 3: Gather data and calculate the necessary sample statistics. We first need to calculate the correlation coefficient, r. It is possible to infer that mailing a larger number of catalogs to a region will influence the number of sales in that region. Thus, the explanatory variable (x) will be the number of catalogs and the response variable (y) will be the sales. Using a TI-83/84 Plus calculator, enter the values for the numbers of catalogs mailed (x) in L1 and the sales values (y) in L2.

  50. Example 12.8 And use the TI-84 LinRegTTest to complete the hypothesis test. You can then disregard most or all of the by-hand detail that follows in the next few slides. Disregard also their use of LinReg; the LinRegTTest is bigger and better.
