1 / 52

Finding Areas with Calc 1. Shade Norm

Finding Areas with Calc 1. Shade Norm. Consider WISC data from before: N (100, 15). Suppose we want to find % of children whose scores are above 125 Specify window: X[55,145]12 and Y[-.008, .028].01 Press 2nd VARS (DISTR), then choose Draw and 1: ShadeNorm(.

semah
Download Presentation

Finding Areas with Calc 1. Shade Norm

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Finding Areas with Calc 1. Shade Norm • Consider WISC data from before: N (100, 15). Suppose we want to find % of children whose scores are above 125 • Specify window: X[55,145]12 and Y[-.008, .028].01 • Press 2nd VARS (DISTR), then choose Draw and 1: ShadeNorm(. • Compete command ShadeNorm(125, 1E99, 100, 15) or (0,85, 100,15). • If using Z scores instead of raw scores, the mean 0 and SD 1 will be understood so ShadeNorm (1,2) will give you area from z-score of 1.0 to 2.0. How does this compare to the 68-95-99.7 rule?

  2. 2. NormalCDF • Advantage- quicker, disadvantage- no picture. • 2nd Vars (DISTR) choose 2: normalcdf(. • Complete command (125, 1E99, 100, 15) and press enter. You get .0477 or approx. 5%. • If you have Z scores: normalcdf (-1, 1) = .6827 aka 68%...like our rule!

  3. InvNorm • InvNorm calculates the raw or Z score value corresponding to a known area under the curve. • 2nd Vars (DISTR), choose 3: invNorm(. • Complete the command invNorm (.9, 100, 15) and press Enter. You get 119.223, so this is the score corresponding to 90th percentile. • Compare this with command invNorm (.9) you get 1.28. This is the Z score.

  4. Describing Bivariate Relationships • Chapter 3 Summary • YMS3e • AP Stats at CSHNYC • Ms. Namad

  5. Bivariate Relationships • What is Bivariate data? • When exploring/describing a bivariate (x,y) relationship: • Determine the Explanatory and Response variables • Plot the data in a scatterplot • Note the Strength, Direction, and Form • Note the mean and standard deviation of x and the mean and standard deviation of y • Calculate and Interpret the Correlation, r • Calculate and Interpret the Least Squares Regression Line in context. • Assess the appropriateness of the LSRL by constructing a Residual Plot.

  6. 3.1 Response Vs. Explanatory Variables • Response variable measures an outcome of a study, explanatory variable helps explain or influences changes in a response variable (like independent vs. dependent). • Calling one variable explanatory and the other response doesn’t necessarily mean that changes in one CAUSE changes in the other. • Ex: Alcohol and Body temp: One effect of Alcohol is a drop in body temp. To test this, researches give several amounts of alcohol to mice and measure each mouse’s body temp change. What are the explanatory and response variables?

  7. Scatterplots • Scatterplot shows the relationship between two quantitative variables measured on the same individuals. • Explanatory variables along X axis, Response variables along Y. • Each individual in data appears as the point in the plot fixed by the values of both variables for that individual. • Example:

  8. Interpreting Scatterplots • Direction: in previous example, the overall pattern moves from upper left to lower right. We call this a negative association. • Form: The form is slightly curved and there are two distinct clusters. What explains the clusters? (ACT States) • Strength: The strength is determined by how closely the points follow a clear form. The example is only moderately strong. • Outliers: Do we see any deviations from the pattern? (Yes, West Virginia, where 20% of HS seniors take the SAT but the mean math score is only 511).

  9. Association

  10. Introducing Categorical Variables

  11. Calculator Scatterplot • Enter the Beer consumption in L1 and the BAC values in L2 • Next specify scatterplot in Statplot menu (first graph). X list L1 Y List L2 (explanatory and response) • Use ZoomStat. • Notice that their are no scales on the axes and they aren’t labeled. If you are copying your graph to your paper, make sure you scale and label the Axis (use Trace)

  12. Correlation • Caution- our eyes can be fooled! Our eyes are not good judges of how strong a linear relationship is. The 2 scatterplots depict the same data but drawn with a different scale. Because of this we need a numerical measure to supplement the graph.

  13. r • The Correlation measures the direction and strength of the linear relationship between 2 variables. • Formula- (don’t need to memorize or use): r = • In Calc: Go to Catalog (2nd, zero button), go to DiagnosticOn, enter, enter. You only have to do this ONCE! Once this is done: • Enter data in L1 and L2 (you can do calc-2 var stats if you want the mean and sd of each) • Calc, LinReg (A + Bx) enter

  14. Interpreting r • The absolute value of r tells you the strength of the association (0 means no association, 1 is a strong association) • The sign tells you whether it’s a positive or a negative association. So r ranges from -1 to +1 • Note- it makes no difference which variable you call x and which you call y when calculating correlation, but stay consistent! • Because r uses standardized values of the observations, r does not change when we change the units of measurement of x, y, or both. (Ex: Measuring height in inches vs. ft. won’t change correlation with weight) • values of -1 and +1 occur ONLY in the case of a perfect linear relationship , when the variables lie exactly along a straight line.

  15. Examples 1. Correlation requires that both variables be quantitative 2. Correlation measures the strength of only LINEAR relationships, not curved...no matter how strong they are! 3. Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations. Use r with caution when outliers appear in the scatterplot 4. Correlation is not a complete summary of two-variable data, even when the relationship is linear- always give the means and standard deviations of both x and y along with the correlation.

  16. 3.2- least squares regression Text The slope here B = .00344 tells us that fat gained goes down by .00344 kg for each added calorie of NEA according to this linear model. Our regression equation is the predicted RATE OF CHANGE in the response y as the explanatory variable x changes. The Y intercept a = 3.505kg is the fat gain estimated by this model if NEA does not change when a person overeats.

  17. Prediction • We can use a regression line to predict the response y for a specific value of the explanatory variable x.

  18. LSRL • In most cases, no line will pass exactly through all the points in a scatter plot and different people will draw different regression lines by eye. • Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatter plot • A good regression line makes the vertical distances of the points from the line as small as possible • Error: Observed response - predicted response

  19. LSRL Cont.

  20. Equation of LSRL • Example 3.36: The Sanchez household is about to install solar panels to reduce the cost of heating their house. In order to know how much the panels help, they record their consumption of natural gas before the panels are installed. Gas consumption is higher in cold weather, so the relationship between outside temp and gas consumption is important.

  21. Describe the direction, form, and strength of the relationship • Positive, linear, and very strong • About how much gas does the regression line predict that the family will use in a month that averages 20 degree-days per day? • 500 cubic feet per day • How well does the least-squares line fit the data?

  22. Residuals • The error of our predictions, or vertical distance from predicted Y to observed Y, are called residuals because they are “left-over” variation in the response. One subject’s NEA rose by 135 calories. That subject gained 2.7 KG of fat. The predicted gain for 135 calories is Y hat = 3.505- .00344(135) = 3.04 kg The residual for this subject is y - yhat = 2.7 - 3.04 = -.34 kg

  23. Residual Plot • The sum of the least-squares residuals is always zero. • The mean of the residuals is always zero, the horizontal line at zero in the figure helps orient us. This “residual = 0” line corresponds to the regression line

  24. Residuals List on Calc • If you want to get all your residuals listed in L3 highlight L3 (the name of the list, on the top) and go to 2nd- stat- RESID then hit enter and enter and the list that pops out is your resid for each individual in the corresponding L1 and L2.  (if you were to create a normal scatter plot using this list as your y list, so x list: L1 and Y list L3 you would get the exact same thing as if you did a residual plot defining x list as L1 and Y list as RESID as we had been doing).   This is a helpful list to have to check your work when asked to calculate an individuals residual.  

  25. Examining Residual Plot • Residual plot should show no obvious pattern. A curved pattern shows that the relationship is not linear and a straight line may not be the best model. • Residuals should be relatively small in size. A regression line in a model that fits the data well should come close” to most of the points. • A commonly used measure of this is the standard deviation of the residuals, given by: For the NEA and fat gain data, S =

  26. Residual Plot on Calc • Produce Scatterplot and Regression line from data (lets use BAC if still in there) • Turn all plots off • Create new scatterplot with X list as your explanatory variable and Y list as residuals (2nd stat, resid) • Zoom Sta

  27. R squared- Coefficient of determination If all the points fall directly on the least-squares line, r squared = 1. Then all the variation in y is explained by the linear relationship with x. So, if r squared = .606, that means that 61% of the variation in y among individual subjects is due to the influence of the other variable. The other 39% is “not explained”. r squared is a measure of how successful the regression was in explaining the response

  28. Facts about Least-Squares regression • The distinction between explanatory and response variables is essential in regression. If we reverse the roles, we get a different least-squares regression line. • There is a close connection between corelation and the slope of the LSRL. Slope is r times Sy/Sx. This says that a change of one standard deviation in x corresponds to a change of 4 standard deviations in y. When the variables are perfectly correlated (4 = +/- 1), the change in the predicted response y hat is the same (in standard deviation units) as the change in x. • The LSRL will always pass through the point (X bar, Y Bar) • r squared is the fraction of variation in values of y explained by the x variable

  29. 3.3 Influences • Correlation r is not resistant. Extrapolation is not very reliable. One unusual point in the scatterplot greatly affects the value of r. LSRL also not resistant. • A point extreme in the x direction with no other points near it pulls the line toward itself. This point is influential.

  30. Lurking Variables- Beware! • Example: A college board study of HS grads found a strong correlation between math minority students took in high school and their later success in college. News articles quoted the College Board saying that “math is the gatekeeper for success in college”. • But, Minority students from middle-class homes with educated parents no doubt take more high school math courses. They are also more likely to have a stable family, parents who emphasize education, and can pay for college etc. These students would likely succeed in college even if they took fewer math courses. The family background of students is a lurking variable that probably explains much of the relationship between math courses and college success.

  31. Beware correlations based on averages • Correlations based on averages are usually too high when applied to individuals. • Example: if we plot the average height of young children against their age in months, we will see a very strong positive association with correlation near 1. But individual children of the same age vary a great deal in height. A plot of height against age for individual children will show much more scatter and lower correlation than the plot of average height against age.

  32. Chapter Example:Corrosion and Strength • Consider the following data from the article, “The Carbonation of Concrete Structures in the Tropical Environment of Singapore” (Magazine of Concrete Research (1996):293-300 which discusses how the corrosion of steel(caused by carbonation) is the biggest problem affecting concrete strength: • x= carbonation depth in concrete (mm) • y= strength of concrete (Mpa) • Define the Explanatory and Response Variables. • Plot the data and describe the relationship.

  33. Strength (Mpa) Depth (mm) Corrosion and Strength There is a strong, negative, linear relationship between depth of corrosion and concrete strength. As the depth increases, the strength decreases at a constant rate.

  34. Strength (Mpa) Depth (mm) Corrosion and Strength The mean depth of corrosion is 35.89mm with a standard deviation of 18.53mm. The mean strength is 14.58 Mpa with a standard deviation of 5.29 Mpa.

  35. Strength (Mpa) Depth (mm) Corrosion and Strength Find the equation of the Least Squares Regression Line (LSRL) that models the relationship between corrosion and strength. y=24.52+(-0.28)x strength=24.52+(-0.28)depth r=-0.96

  36. Strength (Mpa) Depth (mm) Corrosion and Strength y=24.52+(-0.28)x strength=24.52+(-0.28)depth r=-0.96 What does “r” tell us? There is a Strong, Negative, LINEAR relationship between depth of corrosion and strength of concrete. What does “b=-0.28” tell us? For every increase of 1mm in depth of corrosion, we predict a 0.28 Mpa decrease in strength of the concrete.

  37. Corrosion and Strength • Use the prediction model (LSRL) to determine the following: • What is the predicted strength of concrete with a corrosion depth of 25mm? • strength=24.52+(-0.28)depth • strength=24.52+(-0.28)(25) • strength=17.59 Mpa • What is the predicted strength of concrete with a corrosion depth of 40mm? • strength=24.52+(-0.28)(40) • strength=13.44 Mpa • How does this prediction compare with the observed strength at a corrosion depth of 40mm?

  38. Residuals • Note, the predicted strength when corrosion=40mm is: • predicted strength=13.44 Mpa • The observed strength when corrosion=40mm is: • observed strength=12.4mm • The prediction did not match the observation. • That is, there was an “error” or “residual” between our prediction and the actual observation. • RESIDUAL = Observed y - Predicted y • The residual when corrosion=40mm is: • residual = 12.4 - 13.44 • residual = -1.04

  39. Assessing the Model • Is the LSRL the most appropriate prediction model for strength? r suggests it will provide strong predictions...can we do better? • To determine this, we need to study the residuals generated by the LSRL. • Make a residual plot. • Look for a pattern. • If no pattern exists, the LSRL may be our best bet for predictions. • If a pattern exists, a better prediction model may exist...

  40. residuals depth(mm) Residual Plot • Construct a Residual Plot for the (depth,strength) LSRL. • There appears to be no pattern to the residual plot...therefore, the LSRL may be our best prediction model.

  41. Strength (Mpa) Depth (mm) Coefficient of Determination We know what “r” tells us about the relationship between depth and strength....what about r2? 93.75% of the variability in predicted strength can be explained by the LSRL on depth.

  42. Summary • When exploring a bivariate relationship: • Make and interpret a scatterplot: • Strength, Direction, Form • Describe x and y: • Mean and Standard Deviation in Context • Find the Least Squares Regression Line. • Write in context. • Construct and Interpret a Residual Plot. • Interpret r and r2 in context. • Use the LSRL to make predictions...

  43. Examining Relationships Regression Review

  44. Regression Basics • When describing a Bivariate Relationship: • Make a Scatterplot • Strength, Direction, Form • Model: y-hat=a+bx • Interpret slope in context • Make Predictions • Residual = Observed-Predicted • Assess the Model • Interpret “r” • Residual Plot

  45. Reading Minitab Output Regression equations aren’t always as easy to spot as they are on your TI-84. Can you find the slope and intercept above?

  46. Influential? Outliers/Influential Points Does the age of a child’s first word predict his/her mental ability? Consider the following data on (age of first word, Gesell Adaptive Score) for 21 children. Does the highlighted point markedly affect the equation of the LSRL? If so, it is “influential”. Test by removing the point and finding the new LSRL.

  47. Explanatory vs. Response • The Distinction Between Explanatory and Response variables is essential in regression. • Switching the distinction results in a different least-squares regression line. • Note: The correlation value, r, does NOT depend on the distinction between Explanatory and Response.

  48. Correlation • The correlation, r, describes the strength of the straight-line relationship between x and y. • Ex: There is a strong, positive, LINEAR relationship between # of beers and BAC. • There is a weak, positive, linear relationship between x and y. However, there is a strong nonlinear relationship. • r measures the strength of linearity...

  49. Coefficient of Determination • The coefficient of determination, r2, describes the percent of variability in y that is explained by the linear regression on x. • 71% of the variability in death rates due to heart disease can be explained by the LSRL on alcohol consumption. • That is, alcohol consumption provides us with a fairly good prediction of death rate due to heart disease, but other factors contribute to this rate, so our prediction will be off somewhat.

  50. Cautions • Correlation and Regression are NOT RESISTANT to outliers and Influential Points! • Correlations based on “averaged data” tend to be higher than correlations based on all raw data. • Extrapolating beyond the observed data can result in predictions that are unreliable.

More Related