Modules 11 to 13, REGRESSION ANALYSIS…

sandra_john

Presentation Transcript


  1. Modules 11 to 13, REGRESSION ANALYSIS… • Examining Relationships: Quantitative Data • OLI, Concepts in Statistics

  2. Before we formally learn about analysis of bivariate data... What do you see?

  3. ... What do you see?

  4. ... What do you see?

  5. Remember...It starts with a topic, followed by a question... • Dr. Gould, UCLA

  6. Consider two variables & come up with a question relating those two variables... • Dr. Gould, UCLA

  7. Do you believe there is a relationship between... • Time spent studying and GPA? • # of cigarettes smoked daily and life expectancy? • Salary and education level? • Age and height? • How could we find out? The data cycle!

  8. Relationships • When data come in pairs (two variables measured on each individual), the data are referred to as bivariate data. Much of the bivariate data we will examine is numeric. • There may or may not be a relationship (an association) between the two variables. • Does one variable influence the other, or vice versa? Do the two variables just ‘go together’ by chance? Or is the relationship influenced by another variable(s) that we are unaware of? • Does one variable ‘cause’ the other? Caution!

  9. Bivariate Data • Proceed as we did with univariate distributions… • (Review: What is univariate data? Which graphical models do we typically use with univariate numerical data?)

  10. Bivariate Data • Like we were saying... proceed as we did with univariate distributions • With bivariate data, we still graph (use visual models to describe the data: scatterplot; Least Squares Regression Line (LSRL)) • With bivariate data, we still look at overall patterns and deviations from those patterns (DOFS: Direction, Outlier(s), Form, Strength). Review: How did we look for patterns in univariate numeric data? What did we use? • With bivariate data, we still analyze numerical summaries/descriptive statistics (what are these?)

  11. Bivariate Distributions • The explanatory variable, x, the ‘factor,’ may help predict or explain changes in the response variable; the explanatory variable usually goes on the horizontal axis • The response variable, y, measures an outcome of a study; it usually goes on the vertical axis

  12. Bivariate Data Distributions • For example... Alcohol (explanatory) and body temperature (response). Generally, the more alcohol consumed, the greater the drop in body temperature. Still use caution with ‘cause.’ • Sometimes we don’t have variables that are clearly explanatory and response. • Sometimes there could be two ‘explanatory’ variables, such as ACT scores and SAT scores, or activity level and physical fitness. • Discuss with a partner for 1 minute; come up with a situation where two variables are related but neither is clearly explanatory or response.

  13. Graphical models… • Many graphical models display univariate numeric data only (review). • The main graphical representations used to display bivariate data (two quantitative variables) are the scatterplot and the least squares regression line (LSRL).

  14. Scatterplots • Scatterplots show the relationship between two quantitative variables measured on the same individuals or objects. • Each individual/object in the data appears as a point (x, y) on the scatterplot. • Plot the explanatory variable (if there is one) on the horizontal axis. If there is no distinction between explanatory and response, either can be plotted on the horizontal axis. • Label both axes. Scale both axes with uniform intervals (the two scales don’t have to match), and the axes don’t have to start at zero; this is not considered misleading with scatterplots.

  15. Variables: Clearly Explanatory and Response? Practice: Trends?

  16. Creating & Interpreting Scatterplots • Let’s collect some data: your age in years and the number of states you have visited in your lifetime. • Input the data into Stat Crunch & create a scatterplot; which is our explanatory variable and which is our response variable? • Let’s do some predicting... to the best of our ability...

  17. Interpreting Scatterplots • Look for overall patterns (DOFS), including: • direction: up or down; positive or negative association? • outliers/deviations: individual value(s) that fall outside the overall pattern; there is no outlier rule for bivariate data, unlike univariate data • form: linear? curved? clusters? gaps? • strength: how closely do the points follow a clear form? Strong, moderate, weak?

  18. Measuring Linear Association • Scatterplots (bivariate data) show the direction, outliers/deviations, form, and strength of the relationship between two quantitative variables • Linear relationships are important because they are a common, simple pattern; linear relationships are our focus in this course • A linear relationship is strong if the points lie close to a straight line and weak if they are widely scattered about it • Other relationships (quadratic, logarithmic, etc.) also occur

  19. Linear relationships

  20. Non-linear relationships

  21. Let’s go back to the previous scatterplots... • With a partner, look at one of the previous scatterplots (your choice) and analyze it through DOFS (direction, outlier(s), form, strength) • Three minutes... then report out in groups (groups that chose the same scatterplot) • Be ready to make predictions based on the scatterplot

  22. Creating & Interpreting Scatterplots • Go to my website and download the COC Math 140 Survey Data Fall 2015 OR Spring 2016. Copy & paste the columns ‘Height’ and ‘Weight’ • Is the data messy? Does it need to be ‘fixed’? ... Hint: scan for ordered pairs (this is bivariate data); each and every point must be an ordered pair. • Graph it; do we need to evaluate any points (any possible inaccuracies)?
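A minimal Python sketch of the ‘is the data messy?’ step, assuming the survey download is a CSV file with columns named Height and Weight. The rows below are made up for illustration (they are not the actual Math 140 survey data), and the helper name `clean_pairs` is hypothetical. Only rows that form a complete, numeric ordered pair are kept.

```python
import csv
import io

# Hypothetical survey extract (made-up rows, not the real Math 140 data).
raw = """Height,Weight
65,140
70,
,155
68,170
72,abc
63,120
"""


def clean_pairs(text, x_col="Height", y_col="Weight"):
    """Hypothetical helper: keep only rows where both values are present and numeric."""
    pairs = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            x = float(row[x_col])
            y = float(row[y_col])
        except (ValueError, TypeError):
            # Blank or non-numeric entry: not a usable ordered pair, so skip it.
            continue
        pairs.append((x, y))
    return pairs


pairs = clean_pairs(raw)
print(pairs)
```

Dropping incomplete rows is only one choice; in class we would also look at suspicious values (typos, impossible heights) before deciding what to remove.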

  23. Creating & Interpreting Scatterplots • ‘Height’ & ‘Weight’ • Create a scatterplot of the data. Analyze (DOFS) • Let’s do some predictions... • Is it sometimes difficult to make predictions? We will get back to this with a ‘better’ model...

  24. How strong are these relationships? Which one is stronger?

  25. Measuring Linear Association: Correlation or “r” • Sometimes our eyes are not a good judge • Need to specify just how strong or weak a linear relationship is with bivariate data • Need a numeric measure • Correlation or ‘r’

  26. Measuring Linear Association: Correlation or “r” • Correlation (r) is a numeric measure of the direction and strength of a linear relationship between two quantitative variables • Correlation (r) is always between -1 and 1 • Correlation (r) is not resistant (look at the formula; it is based on means) • r doesn’t tell us about individual data points, but rather about trends in the data • Never calculate by formula; use Stat Crunch (requires the raw data)

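  27. Calculating Correlation “r” • $n$ = number of (x, y) pairs; $x_1, x_2, \ldots$ with mean $\bar{x}$; $y_1, y_2, \ldots$ with mean $\bar{y}$; $s_x, s_y$ = standard deviations of x and y • $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$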
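To see what the correlation formula is doing, here is a Python sketch on a small made-up data set. The function name `correlation` and the data are hypothetical, for illustration only. It standardizes each value, multiplies the paired z-scores, and divides the sum by n - 1, using sample standard deviations for s_x and s_y. In class we still use Stat Crunch; this is only a check on the formula.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Hypothetical helper: r = (1/(n-1)) * sum of z_x * z_y (sample standard deviations)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Standardize each value, multiply the paired z-scores, then divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))  # prints 0.7746
```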

  28. Measuring Linear Association: Correlation or “r” • r ≈ 0 → no strong linear relationship • r close to 1 → strong positive linear relationship • r close to -1 → strong negative linear relationship • Go back to our height/weight data & calculate ‘r,’ the correlation • PRACTICE: Go to my website, data sets, Cereal Data from Lock 5, and copy/paste the Calories and Fat columns into Stat Crunch; create a scatterplot; calculate ‘r’; make some observations and some predictions

  29. Correlation; ‘r’

  30. Guess the correlation: www.rossmanchance.com/applets (also Stat Crunch) • ‘March Madness’ bracket-style Guess the Correlation tournament • Playing cards; match up head-to-head competition/rounds • Look at a scatterplot, make your guess • Student who is closest survives until the next round

  31. Correlation & regression applet partner activity • Go to www.whfreeman.com/tps5e • Go to applets • Go to Correlation & Regression • Now download (from my website, under ‘articles, assignments, and activities’) Correlation Partner Activity & follow the directions. • Partner up with someone you have not partnered with yet; this should take no more than 15-20 minutes, including the write-up; print out & turn in with both your names on it.

  32. Caution… interpreting correlation • Note: be careful when addressing form in scatterplots • Strong positive linear relationship ► correlation ≈ 1 • But correlation ≈ 1 does not necessarily mean the relationship is linear; always plot the data!
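A quick numerical version of this caution: the points below lie exactly on the curve y = x² (clearly not a line), yet r comes out close to 1. This Python sketch uses a hypothetical helper, `correlation`, that implements the usual sample correlation formula; the data are made up.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Hypothetical helper: sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


xs = list(range(1, 11))       # x = 1, 2, ..., 10
ys = [x ** 2 for x in xs]     # y = x^2: a curved (not linear) relationship
r = correlation(xs, ys)
print(round(r, 3))            # prints 0.975
```

A value this close to 1 would tempt us to call the form linear; only the scatterplot shows the curve.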

  33. r ≈ 0.816 for each of these
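The value 0.816 matches Anscombe’s quartet (F. J. Anscombe, 1973): four small data sets with nearly identical correlations but very different scatterplots. Assuming that is what this slide shows, the Python sketch below recomputes r for each set with a hypothetical `correlation` helper implementing the usual sample formula. All four values come out at about 0.816, even though set IV is a vertical stack of points plus one influential point.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Hypothetical helper: sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


# Anscombe's quartet (Anscombe, 1973); sets I-III share the same x values.
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

rs = {name: correlation(x, y) for name, (x, y) in quartet.items()}
for name, r in rs.items():
    print(name, round(r, 4))
```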

  34. Facts about Correlation • Correlation doesn’t care which variable is considered explanatory and which is considered response; you can switch x & y and still get the same correlation (r) value • Try with the height & weight Math 140 data; try with the cereal calories and fat data • CAUTION! Switching x & y WILL change your scatterplot (try with our data sets!)… it just won’t change ‘r’
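A small Python check of this fact, using made-up height and weight values (not the Math 140 data) and a hypothetical `correlation` helper. Swapping which variable plays x and which plays y leaves r unchanged, because the formula treats the two standardized variables symmetrically.

```python
from math import isclose
from statistics import mean, stdev


def correlation(xs, ys):
    """Hypothetical helper: sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical heights (inches) and weights (pounds), for illustration only.
heights = [62, 65, 67, 70, 72, 74]
weights = [120, 140, 150, 165, 180, 190]

r_xy = correlation(heights, weights)  # height as x, weight as y
r_yx = correlation(weights, heights)  # roles swapped
print(isclose(r_xy, r_yx))            # prints True
```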

  35. Facts about Correlation • r is computed from standardized values (z-scores), so it has no units and doesn’t change if the units of measurement are changed • If we change from yards to feet, or years to months, or gallons to liters... r is not affected • Positive r: positive association • Negative r: negative association
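A Python sketch showing that a change of units does not affect r. Converting inches to centimeters and pounds to kilograms multiplies each variable by a positive constant, which leaves every z-score, and therefore r, the same. The data and the `correlation` helper are hypothetical, for illustration only.

```python
from math import isclose
from statistics import mean, stdev


def correlation(xs, ys):
    """Hypothetical helper: sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


heights_in = [62, 65, 67, 70, 72, 74]         # hypothetical heights, inches
weights_lb = [120, 140, 150, 165, 180, 190]   # hypothetical weights, pounds

heights_cm = [h * 2.54 for h in heights_in]         # 1 inch = 2.54 cm
weights_kg = [w * 0.45359237 for w in weights_lb]   # 1 lb = 0.45359237 kg

r_imperial = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)
print(isclose(r_imperial, r_metric))   # prints True
```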

  36. Facts about Correlation • Correlation is always between -1 & 1 • It makes no sense for r = 13 or r = -5 • r = 0 means no linear relationship (there may still be a nonlinear one) • r = 1 or -1 means a perfect linear association: all points lie exactly on a line
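A Python sketch of the boundary cases, with made-up data and a hypothetical `correlation` helper. When every point lies exactly on a line with positive slope, r comes out as 1; on a line with negative slope, it comes out as -1.

```python
from math import isclose
from statistics import mean, stdev


def correlation(xs, ys):
    """Hypothetical helper: sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


xs = [1, 2, 3, 4, 5, 6]
up = [3 + 2 * x for x in xs]   # every point exactly on y = 3 + 2x
down = [10 - x for x in xs]    # every point exactly on y = 10 - x

r_up = correlation(xs, up)
r_down = correlation(xs, down)
print(isclose(r_up, 1.0), isclose(r_down, -1.0))   # prints True True
```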

  37. Facts about Correlation • Both variables must be quantitative, numerical. Doesn’t make any sense to discuss r for qualitative or categorical data • Correlation is not resistant (like mean and SD). Be careful using r when outliers are present (think of the formula, think of our partner activity)
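To see that r is not resistant, this Python sketch computes r for six points that lie close to a line, then adds a single outlier. The data and the `correlation` helper are hypothetical. That one point pulls r down from nearly 1 to roughly 0.24.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Hypothetical helper: sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical points that follow a line closely.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
r_clean = correlation(xs, ys)

# Add one hypothetical outlier far below the pattern.
r_outlier = correlation(xs + [7], ys + [0.0])
print(round(r_clean, 3), round(r_outlier, 3))
```

Because r is built from means and standard deviations, one extreme point shifts all of them at once.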

  38. Facts about Correlation • r isn’t enough! If we consider only r, it could be misleading; we must also consider each variable’s mean and standard deviation, a graphical representation, etc. • Correlation does not imply causation; e.g., # of ice cream sales in a given week and # of pool accidents

  39. Absurd examples… correlation does not imply causation… • Did you know that eating chocolate makes winning a Nobel Prize more likely? The correlation between per capita chocolate consumption and the number of Nobel laureates per 10 million people for 23 selected countries is r = 0.791 • Did you know that statistics is causing global warming? As the number of statistics courses offered has grown over the years, so has the average global temperature!

  40. Least Squares Regression • Last section: scatterplots of two quantitative variables • r measures the strength and direction of the linear relationship shown in a scatterplot

  41. What would we expect the sodium level to be in a hot dog that has 170 calories?

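  42. Least Squares Regression • A BETTER model: summarize the overall pattern by drawing a line on the scatterplot • Not just any line; we want the best-fit line for the scatterplot • ‘Least squares’: the line that makes the sum of the squared vertical distances (residuals) from the points to the line as small as possible • Least Squares Regression Line (LSRL), or Regression Line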
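What ‘least squares’ means can be checked numerically. This Python sketch uses made-up data, and the helper names `least_squares_line` and `sse` are hypothetical. It computes the line with the standard formulas b = Sxy / Sxx and a = ȳ - b·x̄ (Sxy and Sxx are sums of products of deviations from the means), then confirms that nudging the intercept or slope only increases the sum of squared residuals.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Hypothetical helper: intercept a and slope b minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def sse(xs, ys, a, b):
    """Hypothetical helper: sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
best = sse(xs, ys, a, b)

# Any other line (nudged intercept and/or slope) has a larger sum of squared residuals.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05), (0.3, -0.1)]:
    assert sse(xs, ys, a + da, b + db) > best

print(round(a, 2), round(b, 2), round(best, 2))
```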

  43. Least-Squares Regression Line

  44. Let’s do some predicting by using the LSRL... • About how much would a home cost if it were: • 2,000 square feet? • 2,600 square feet? • 1,600 square feet?

  45. Let’s do some predicting by using the LSRL... • About how large would a home be if it were worth: • $450,000? • $350,000? • $220,000? • Also, let’s discuss where the x and y axes start...

  46. Least Squares Regression equation to predict values • LSRL model: $\hat{y} = a + bx$ • $\hat{y}$ is the predicted value of the response variable • a is the y-intercept of the LSRL • b is the slope of the LSRL; the slope is the predicted (expected) rate of change • x is the explanatory variable
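A Python sketch of using the LSRL model for prediction. It assumes the standard textbook formulas for the coefficients, b = r·s_y/s_x and a = ȳ - b·x̄, which are not shown on this slide; the data and the helper names `correlation`, `lsrl`, and `predict` are hypothetical.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Hypothetical helper: sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


def lsrl(xs, ys):
    """Hypothetical helper: (a, b) for y-hat = a + b*x, via b = r*s_y/s_x and a = y_bar - b*x_bar."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


def predict(a, b, x):
    """Hypothetical helper: predicted response y-hat for explanatory value x."""
    return a + b * x


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = lsrl(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")   # prints y-hat = 2.20 + 0.60x
print(round(predict(a, b, 6), 2))      # prints 5.8
```

Here b = 0.6, so each one-unit increase in x raises the predicted y by 0.6. That is a predicted (average) change, not a guarantee about any individual point.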

  47. Least Squares Regression Equation • You will typically be asked to interpret the slope & y-intercept of the equation of the LSRL, in context • Caution: Interpret the slope of the LSRL as the predicted (average, or expected) change in the response variable for a one-unit change in the explanatory variable • NOT the actual change in y for a unit change in x; the LSRL is a model, and models are not perfect

  48. Interpret slope & y-intercept... • Notice the context embedded in the equation of the LSRL

  49. LSRL: Our Data • Go back to our data (age & # states visited; height and weight data from Math 140; calories & fat cereal data). • Create a scatterplot; then put the LSRL on the scatterplot; also determine the equation of the LSRL • Stat Crunch: stat, regression, simple linear, x variable, y variable, graphs, fitted line plot
