
Relationships Between Measurement Variables


takara

Presentation Transcript


  1. Relationships Between Measurement Variables. Statistics Lecture 4

  2. Thought Question 1 There is a positive correlation between SAT score and GPA. For used cars, there is a negative correlation between age of the car and selling price. What does that mean?

  3. Thought Question 2 If you had a scatter plot comparing the heights of a number of fathers and their adult sons, how could you use it to predict the adult height of a child?

  4. Thought Question 3 Would these pairs of variables have a positive correlation, a negative correlation, or no correlation? • Calories eaten per day and weight • Calories eaten per day and IQ • Wine consumed and driving ability • Number of priests and amount of liquor sold in Portuguese cities • Heights of husbands and heights of wives

  5. Goals for this lecture • Get the idea of a statistical relationship and statistical significance • Understand the meaning of correlation between two measurement variables • Learn how to use the linear relationship between two variables to predict one value, given the other

  6. Relationships • Deterministic: You can predict one variable exactly from another (example: distance traveled at a constant speed, given time) • Statistical: You can describe a relationship between variables, but it isn’t exact because of natural variability (example: the average relationship between height and weight)

  7. Remember How to Build a Scatter Plot?

  8. Relationship between Height and Weight

  9. Statistical Significance Often we must use a sample to tell us about a population. We want to know if any relationships observed in the sample are “real” and not just chance.

  10. Rule of Thumb A statistical relationship is considered significant if it is stronger than 95% of the relationships we’d expect to see by chance.
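
The rule of thumb can be made concrete with a permutation test, one common way (not necessarily the method used in this course) to see what "relationships by chance" look like: shuffle the y values so any real link to x is broken, recompute the correlation many times, and count how often a shuffled correlation is at least as strong as the observed one. The height and weight numbers below are made up for illustration, and `pearson_r`, `chance_fraction`, and `is_significant` are hypothetical helper names.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def chance_fraction(xs, ys, trials=10_000, seed=0):
    """Fraction of random re-pairings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    as_strong = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            as_strong += 1
    return as_strong / trials


def is_significant(xs, ys, level=0.05):
    # "Stronger than 95% of chance relationships" means fewer than 5%
    # of the shuffled pairings are at least as strong as the real one.
    return chance_fraction(xs, ys) < level


# Hypothetical data, not the lecture's data set.
heights = [62, 64, 66, 68, 70, 72, 74, 75]
weights = [120, 135, 130, 150, 155, 170, 165, 185]
print(is_significant(heights, weights))
```

A pairing with no visible pattern, such as x = 1..8 against y = 5, 3, 8, 1, 7, 2, 6, 4, is matched or beaten by most shuffles and so fails the rule of thumb.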

  11. Be aware of sample size Statistical significance is affected by sample size: • It’s easy to rule out chance if you have lots of observations (but the relationship may still not be strong or useful). • On the other hand, even a strong relationship may not achieve statistical significance if the sample is small.
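
A quick simulation illustrates the point. Under a hypothetical setup where x and y are generated independently from normal noise (so any correlation is pure chance), the sketch below estimates the |r| that only 5% of chance samples exceed. That cutoff shrinks as the sample grows, so the same r can be significant with 100 observations but not with 5. The function names and the normal-noise assumption are illustrative choices, not part of the lecture.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def chance_cutoff(n, trials=2000, seed=1):
    """Estimate the |r| exceeded by only 5% of size-n samples with no real relationship."""
    rng = random.Random(seed)
    strengths = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        strengths.append(abs(pearson_r(xs, ys)))
    strengths.sort()
    return strengths[int(0.95 * trials)]  # empirical 95th percentile


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(f"n = {n:3d}: chance alone reaches |r| = {chance_cutoff(n):.2f} about 5% of the time")
```

The printed cutoffs should land close to the usual critical values for a two-sided 5% test (about 0.88 for n = 5, 0.44 for n = 20, and 0.20 for n = 100), so an r of .5 would count as significant with 100 observations but not with 5.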

  12. Relationship between Height and Weight

  13. Relationship between Height and Weight

  14. Strength of Relationship? Correlation (also called the correlation coefficient or Pearson’s r) is the measure of strength of the linear relationship between two variables. Think of strength as how closely the data points come to falling on a line drawn through the data.

  15. Features of Correlation • Correlation ranges from -1 to +1 • Positive correlation: As one variable increases, the other tends to increase • Negative correlation: As one variable increases, the other tends to decrease • Zero correlation means the best-fit line through the data is horizontal • Correlation isn’t affected by the units of measurement
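
The correlation can be computed directly from its usual definition, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The sketch below uses made-up height and weight numbers (not the lecture's data) to check three of the features listed: r stays between -1 and +1, reversing the direction of one variable flips the sign, and converting inches to centimeters and pounds to kilograms leaves r unchanged.

```python
def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data for illustration only.
heights_in = [62, 64, 66, 68, 70, 72, 74, 75]
weights_lb = [120, 135, 130, 150, 155, 170, 165, 185]

# Same people, different units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]

r_us = pearson_r(heights_in, weights_lb)
r_metric = pearson_r(heights_cm, weights_kg)
r_flipped = pearson_r(heights_in, [-w for w in weights_lb])  # y now falls as x rises

print(f"inches/pounds: r = {r_us:.3f}")
print(f"cm/kg:         r = {r_metric:.3f}")
print(f"flipped y:     r = {r_flipped:.3f}")
```

For these numbers r comes out at about 0.96 in both unit systems and about -0.96 once y is flipped.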

  16. Positive Correlations (example scatter plots with r = +1, +.8, +.4, and +.1)

  17. Negative Correlations (example scatter plots with r = -.1, -.4, -.8, and -1)

  18. Zero Correlation (two example scatter plots, both with r = 0)

  19. Zero correlation

  20. Number of Points Doesn’t Matter (two scatter plots with different numbers of points, both with r = .8)
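
One hedged way to check that r does not simply grow with the number of points: repeating every observation doubles or triples the sample size but describes exactly the same pattern, and the computed r does not change. This is an illustration of the idea on made-up data, not a reproduction of the two plots on the slide.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

r_original = pearson_r(xs, ys)
r_doubled = pearson_r(xs * 2, ys * 2)  # every point appears twice (n = 12)
r_tripled = pearson_r(xs * 3, ys * 3)  # every point appears three times (n = 18)

print(round(r_original, 4), round(r_doubled, 4), round(r_tripled, 4))
```

All three values equal 29/35, about .83. More points can make a relationship easier to call significant, but they do not by themselves make r larger.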

  21. Important! Correlation does not imply causation.

  22. Linear Regression In addition to figuring the strength of the relationship, we can create a simple equation that describes the best-fit line (also called the “least-squares” line) through the data. This equation will help us predict one variable, given the other.

  23. Best-fit (“least-squares”) Line

  24. Best-fit Line??? (much variance)

  25. Best-fit Line? (less variance)

  26. Best-fit Line! (least variance)
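
The "least variance" idea can be checked numerically. The least-squares line is the one that makes the sum of squared vertical distances from the points to the line as small as possible; its slope is Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and its intercept is ȳ - slope · x̄. The sketch below, using made-up data and hypothetical helper names, computes that line and confirms that nudging the slope or intercept in any direction increases the squared error.

```python
def least_squares(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept


def squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to a candidate line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data for illustration only.
heights = [62, 64, 66, 68, 70, 72, 74, 75]
weights = [120, 135, 130, 150, 155, 170, 165, 185]

m, b = least_squares(heights, weights)
best = squared_error(heights, weights, m, b)

# Nearby lines, like the "???" and "?" slides, always do worse.
for dm, db in [(0.5, 0), (-0.5, 0), (0, 5), (0, -5), (0.2, -10)]:
    worse = squared_error(heights, weights, m + dm, b + db)
    print(f"slope {m + dm:6.2f}, intercept {b + db:8.2f}: error {worse:8.1f} (best {best:.1f})")
```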

  27. Remember 9th Grade Algebra? • x = horizontal axis • y = vertical axis • Equation for a line: y = slope * x + intercept, often written as y = mx + b

  28. Don’t panic! You won’t have to calculate the least-squares line equation yourself. Instead, you can use functions built into common computer programs like Microsoft Excel or even many pocket calculators. (But you do need to know how to use the regression line equation.)

  29. Excel Regression Output of Height vs. Weight

  30. Plotting the regression line

  31. Using the Regression Equation to Predict Y for a Given X • b: intercept = -123 • m: coefficient of height (x) = 4 • y = mx + b, so weight = (4 * height) + (-123) • “Predicted” weight for 68 inches: weight = (4 * 68) - 123 = 149 pounds

  32. Predict Weight for a Given Height: weight = (4 * height) - 123 • 62 inches: (4 * 62) - 123 = 125 lbs. • 75 inches: (4 * 75) - 123 = 177 lbs. • 70 inches: (4 * 70) - 123 = 157 lbs.
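
The same arithmetic as a tiny function, using the rounded coefficients from the slides (slope 4 pounds per inch, intercept -123 pounds); `predict_weight` is just an illustrative name:

```python
SLOPE = 4         # pounds per inch (rounded, from the slides' Excel output)
INTERCEPT = -123  # pounds


def predict_weight(height_in):
    """Predicted weight in pounds from the regression line weight = 4 * height - 123."""
    return SLOPE * height_in + INTERCEPT


for height in (62, 68, 70, 75):
    print(f"{height} inches -> {predict_weight(height)} lbs.")
```

These are predicted values for people of a given height; actual individuals scatter above and below them.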

  33. What’s the point? • Regression shows what a dependent (y) variable is “predicted” to be, given a value for the independent (x) variable. • Definition: The residual is the amount by which an actual dependent (y) value differs from the “predicted” value (actual minus predicted) • Definition: R-squared is the proportion (often stated as a percentage) of the variation of y around its mean that is explained by the independent (x) variable; for a simple linear regression it equals the square of the correlation r
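
Both definitions are short computations. The sketch below, on made-up data with hypothetical helper names, computes residuals as actual minus predicted y, and R-squared as 1 minus the ratio of squared residuals to the squared deviations of y from its mean. For a least-squares line with one x variable, R-squared equals r squared, which the second printed line lets you compare.

```python
def fit_line(xs, ys):
    """Least-squares slope, intercept, and Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / (sxx * syy) ** 0.5


def residuals(xs, ys, slope, intercept):
    """Actual y minus the y predicted by the line, for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def r_squared(xs, ys, slope, intercept):
    """Share of the variation of y around its mean that the line explains."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum(e ** 2 for e in residuals(xs, ys, slope, intercept))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Hypothetical data for illustration only.
heights = [62, 64, 66, 68, 70, 72, 74, 75]
weights = [120, 135, 130, 150, 155, 170, 165, 185]

m, b, r = fit_line(heights, weights)
print("residuals:", [round(e, 1) for e in residuals(heights, weights, m, b)])
print(f"R-squared = {r_squared(heights, weights, m, b):.3f}, r squared = {r * r:.3f}")

# With the slides' equation, a hypothetical 68-inch person weighing 160 lbs.
# has a residual of 160 - 149 = 11 lbs.
print(residuals([68], [160], 4, -123))
```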

  34. Excel Regression Output of Height vs. Weight

  35. Demo

  36. Regression in CAR (computer-assisted reporting) • School test scores • Cheating on school tests • Tenure of white vs. black coaches in the NBA • Racial profiling in traffic stops • Miami criminal justice

  37. Extrapolation? Beware! Don’t use your regression equation far outside the range of your data, because the relationship may not hold there. • Words vs. age (r = .993 for ages 2-6): Words = 562 * Age - 764 • Age 1: 562 * 1 - 764 = -202 words???
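
One simple safeguard is to refuse predictions outside the range of x values the equation was fitted on. A minimal sketch using the slide's words-versus-age equation, which was fitted on ages 2 to 6; `predict_within_range` is a hypothetical helper, not a standard function:

```python
def words_for_age(age):
    """Vocabulary predicted by the slide's equation: Words = 562 * Age - 764."""
    return 562 * age - 764


def predict_within_range(predict, x, x_min, x_max):
    """Apply a fitted equation only inside the range of the data it came from."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the data range [{x_min}, {x_max}]; "
            "the fitted relationship may not hold there"
        )
    return predict(x)


print(predict_within_range(words_for_age, 4, 2, 6))  # inside ages 2-6
print(words_for_age(1))  # unchecked extrapolation gives a negative vocabulary

try:
    predict_within_range(words_for_age, 1, 2, 6)
except ValueError as err:
    print("refused:", err)
```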

  38. Negative Weight? (figure: the regression line extended beyond the data area)

  39. Mark Twain and the length of the Mississippi River • From “Life on the Mississippi” (1883) • In 176 years, the river was shortened by 403 kilometers, or about 2.3 kilometers per year • A million years ago, the Mississippi must have been 2.2 million kilometers long • In 742 years, it will be only 2.9 kilometers long, and Cairo, Illinois, and New Orleans will have joined • Twain: “There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.”

  40. Questions?
