
Bivariate data Correlation Coefficient of Determination Regression One-way Analysis of Variance (ANOVA)


osgood

Presentation Transcript


  1. Lecture #9 Bivariate data Correlation Coefficient of Determination Regression One-way Analysis of Variance (ANOVA)

  2. Bivariate Data • Bivariate data are just what they sound like – data with measurements on two variables; let’s call them X and Y • Here, we will look at two continuous variables • Want to explore the relationship between the two variables • Example: Fasting blood glucose and ventricular shortening velocity

  3. Scatterplot • We can graphically summarize a bivariate data set with a scatterplot (also sometimes called a scatter diagram) • Plots values of one variable on the horizontal axis and values of the other on the vertical axis • Can be used to see how values of 2 variables tend to move with each other (i.e. how the variables are associated)

  4. Scatterplot: positive correlation

  5. Scatterplot: negative correlation

  6. Scatterplot: real data example

  7. Numerical Summary • Typically, a bivariate data set is summarized numerically with 5 summary statistics • These provide a fair summary for scatterplots with the same general shape as we just saw, like an oval or an ellipse • We can summarize each variable separately: X mean, X SD; Y mean, Y SD • But these numbers don’t tell us how the values of X and Y vary together

  8. Pearson’s Correlation Coefficient “r” • “r” indicates… • strength of relationship (strong, weak, or none) • direction of relationship • positive (direct) – variables move in same direction • negative (inverse) – variables move in opposite directions • r ranges in value from –1.0 to +1.0 • Scale: –1.0 = strong negative; 0.0 = no relationship; +1.0 = strong positive
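As a minimal sketch of how r can be computed by hand, the snippet below uses the standard deviation-from-the-mean formula. The function name `pearson_r` and both data sets are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # tends to rise with x
y_down = [10, 8, 6, 4, 2]   # falls exactly linearly with x

r_up = pearson_r(x, y_up)      # positive, but below +1
r_down = pearson_r(x, y_down)  # perfect negative relationship
print(r_up, r_down)
```

The first data set gives a positive r below +1 because the points rise only loosely; the second gives r = –1 because the points fall on a single descending line.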

  9. Correlation (cont) Correlation is the relationship between two variables.

  10. What r is... • r is a measure of LINEAR ASSOCIATION • The closer r is to –1 or 1, the more tightly the points on the scatterplot are clustered around a line • The sign of r (+ or -) is the same as the sign of the slope of the line • When r = 0, the points are not LINEARLY ASSOCIATED – this does NOT mean there is NO ASSOCIATION

  11. ...and what r is not • r is a measure of LINEAR ASSOCIATION • r does NOT tell us if Y is a function of X • r does NOT tell us if X causes Y • r does NOT tell us if Y causes X • r does NOT tell us what the scatterplot looks like
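To illustrate that r = 0 does not mean "no association", here is a small sketch in which Y is an exact function of X and yet r is zero. The helper `pearson_r` and the data are hypothetical and not part of the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends on x exactly, but along a curve, not a line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curved = pearson_r(x, y)
print(r_curved)  # zero: the rising and falling halves cancel out
```

Because the x values are symmetric around zero, the positive and negative cross-products cancel, so r reports no linear association even though Y is completely determined by X.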

  12. r ≈ 0: curved relation

  13. r ≈ 0: outliers

  14. r ≈ 0: parallel lines

  15. r ≈ 0: different linear trends

  16. r ≈ 0: random scatter

  17. Correlation is NOT causation • You cannot infer that, because X and Y are highly correlated (r close to –1 or 1), X is causing a change in Y • Y could be causing X • X and Y could both be varying along with a third, possibly unknown factor (either causal or not)

  18. Correlation matrix

  19. Reading Correlation Matrix • r = -.904 • p = .013: the probability of getting a correlation this size by sheer chance. Reject Ho if p ≤ .05 • The matrix also reports the sample size • Reported as: r(4) = -.904, p < .05
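The slide does not say how the p value is obtained. A common approach, assumed here, is a t test on r with n – 2 degrees of freedom, which is the usual reading of the 4 in r(4). The sketch below computes that t statistic from the slide's numbers and compares it with the two-tailed .05 critical value for 4 degrees of freedom, taken from a standard t table.

```python
import math

r = -0.904   # correlation reported on the slide
df = 4       # degrees of freedom from r(4); assumed to be n - 2, so n = 6

# Test statistic for H0: no linear correlation in the population.
t = r * math.sqrt(df / (1 - r ** 2))

# Two-tailed critical value of Student's t for df = 4 at alpha = .05
# (value from a standard t table).
T_CRIT = 2.776

reject_h0 = abs(t) > T_CRIT
print(round(t, 3), reject_h0)
```

The statistic comes out near –4.23, well beyond the critical value, which is consistent with the reported p = .013 and the decision to reject Ho.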

  20. Interpretation of Correlation Correlations • from 0 to 0.25 (0 to -0.25) = little or no relationship; • from 0.25 to 0.50 (-0.25 to -0.50) = fair degree of relationship; • from 0.50 to 0.75 (-0.50 to -0.75) = moderate to good relationship; • greater than 0.75 (or below -0.75) = very good to excellent relationship.
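These rules of thumb can be sketched as a small lookup. The function name `describe_correlation` is hypothetical, and the slide does not say which label applies exactly at 0.25, 0.50, or 0.75, so assigning each boundary to the lower band is an assumption.

```python
def describe_correlation(r):
    """Map a correlation to the verbal labels used on the slide."""
    size = abs(r)  # the labels depend on magnitude only, not on sign
    # Assumption: a value exactly on a boundary falls in the lower band.
    if size <= 0.25:
        return "little or no relationship"
    if size <= 0.50:
        return "fair degree of relationship"
    if size <= 0.75:
        return "moderate to good relationship"
    return "very good to excellent relationship"

print(describe_correlation(-0.904))
print(describe_correlation(0.30))
```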

  21. Limitations of Correlation • linearity: • can’t describe non-linear relationships • e.g., relation between anxiety & performance • truncation of range: • underestimate strength of relationship if you can’t see full range of x values • no proof of causation • third variable problem: • could be 3rd variable causing change in both variables • directionality: can’t be sure which way causality “flows”

  22. Coefficient of Determination r² • The square of the correlation, r², is the proportion of variation in the values of y that is explained by the regression model with x. • Amount of variance accounted for in y by x • Percentage increase in accuracy you gain by using the regression line to make predictions • 0 ≤ r² ≤ 1. • The larger r², the stronger the linear relationship. • The closer r² is to 1, the more confident we are in our prediction.

  23. Age vs. Height: r² = 0.9888.

  24. Age vs. Height: r² = 0.849.

  25. Linear Regression • Correlation measures the direction and strength of the linear relationship between two quantitative variables • A regression line • summarizes the relationship between two variables if the form of the relationship is linear. • describes how a response variable y changes as an explanatory variable x changes. • is often used as a mathematical model to predict the value of a response variable y based on a value of an explanatory variable x.

  26. (Simple) Linear Regression • Refers to drawing a (particular, special) line through a scatterplot • Used for 2 broad purposes: • Estimation • Prediction

  27. Formula for Linear Regression • y = bx + a • b = slope, or the change in y for every unit change in x • a = y-intercept, or the value of y when x = 0 • Y variable plotted on vertical axis • X variable plotted on horizontal axis

  28. Interpretation of parameters • The regression slope is the average change in Y when X increases by 1 unit • The intercept is the predicted value for Y when X = 0 • If the slope = 0, then X does not help in predicting Y (linearly)

  29. Which line? • There are many possible lines that could be drawn through the cloud of points in the scatterplot:

  30. Least Squares • Q: Where does this equation come from? • A: It is the line that is ‘best’ in the sense that it minimizes the sum of the squared errors in the vertical (Y) direction • (Figure: scatterplot of Y against X with the vertical errors between the points and the line marked)
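A minimal sketch of the least-squares idea, using the standard closed-form slope and intercept and then checking that nudging the line in either direction increases the sum of squared vertical errors. The function names and the data are hypothetical, not from the lecture.

```python
def least_squares_line(xs, ys):
    """Return (b, a) for the least-squares line y = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return b, a

def sum_squared_errors(xs, ys, b, a):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, a = least_squares_line(x, y)
best = sum_squared_errors(x, y, b, a)

# Any other line does worse, e.g. a slightly steeper or a slightly higher one.
worse_slope = sum_squared_errors(x, y, b + 0.1, a)
worse_intercept = sum_squared_errors(x, y, b, a + 0.1)
print(b, a, best, worse_slope, worse_intercept)
```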

  31. Linear Regression • Question: What is the relationship between U.K. and U.S. stock returns? • U.K. monthly return is the y variable • U.S. monthly return is the x variable

  32. Correlation tells the strength of relationship between x and y. Relationship may not be linear.

  33. Linear Regression • A regression creates a model of the relationship between x and y • It fits a line to the scatter plot by minimizing the sum of squared vertical distances between y and the line • If the correlation is significant, then run a regression analysis.

  34. Linear Regression The slope is calculated as: Tells you the change in the dependent variable for every unit change in the independent variable.
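The slope formula itself is not preserved in the transcript. The standard least-squares expressions are given below as a reference sketch; the notation is an assumption and may differ from the original slide: x̄ and ȳ are the sample means, s_x and s_y the sample standard deviations, and r the correlation.

```latex
b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
  = r \, \frac{s_y}{s_x},
\qquad
a = \bar{y} - b\,\bar{x}
```

The second form shows why the sign of the slope always matches the sign of r: both standard deviations are positive.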

  35. The coefficient of determination or R-square measures the variation explained by the best-fit line as a percent of the total variation:
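The formula that followed this sentence is not preserved in the transcript. A standard way to write it, offered as a sketch in assumed notation (y′ is the value predicted by the line and ȳ the mean of y), is:

```latex
R^2 = \frac{\sum_i (y'_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
    = 1 - \frac{\sum_i (y_i - y'_i)^2}{\sum_i (y_i - \bar{y})^2}
```

The numerator of the first fraction is the variation explained by the line and the denominator is the total variation. For a least-squares line with one predictor, this equals the square of the correlation, r².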

  36. Regression Graphic – Regression Line • if x = 18 then y’ = 47 • if x = 24 then y’ = 20

  37. Regression Equation • y’ = bx + a • y’ = predicted value of y • b = slope of the line • x = value of x that you plug in • a = y-intercept (where the line crosses the y-axis) • In this case…. • y’ = -4.263(x) + 125.401 • So if the distance is 20 feet • y’ = -4.263(20) + 125.401 • y’ = -85.26 + 125.401 • y’ = 40.141
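The worked example above can be reproduced in a few lines. The slope and intercept are the values from the slide; the function name `predict` is hypothetical.

```python
SLOPE = -4.263       # b, from the slide
INTERCEPT = 125.401  # a, from the slide

def predict(x):
    """Predicted y' for a given x, using y' = bx + a."""
    return SLOPE * x + INTERCEPT

# Rounded to three decimals to match the precision shown on the slide.
y_at_20 = round(predict(20), 3)
print(y_at_20)  # 40.141
```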

  38. SPSS Regression Set-up • “Criterion,” • y-axis variable, • what you’re trying to predict • “Predictor,” • x-axis variable, • what you’re basing the prediction on

  39. Getting Regression Info from SPSS • y’ = b(x) + a • y’ = -4.263(20) + 125.401 • b = -4.263 and a = 125.401 are read from the SPSS output

  40. Extrapolation • Interpolation: Using a model to estimate Y for an X value within the range on which the model was based. • Extrapolation: Estimating based on an X value outside the range. • Interpolation Good, Extrapolation Bad.
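One way to make this distinction concrete is to label every prediction according to whether its x value lies inside the range the model was fitted on. In the sketch below the slope and intercept are the values from the regression equation slide, while the observed range `X_MIN` to `X_MAX` and the function name are hypothetical, since the lecture does not state the range of the data.

```python
SLOPE = -4.263       # b, from the regression equation slide
INTERCEPT = 125.401  # a, from the regression equation slide

# Hypothetical range of x values in the fitted data (not given in the lecture).
X_MIN, X_MAX = 10, 30

def predict_checked(x):
    """Return (prediction, kind), flagging predictions outside the data range."""
    kind = "interpolation" if X_MIN <= x <= X_MAX else "extrapolation"
    return SLOPE * x + INTERCEPT, kind

inside = predict_checked(20)   # within the assumed range
outside = predict_checked(40)  # beyond the assumed range
print(inside, outside)
```

The arithmetic is identical in both cases. What differs is whether the data give any evidence that the straight-line pattern still holds at that x.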

  41. Nixon’s Graph: Economic Growth

  42. Nixon’s Graph: Economic Growth (marked: Start of Nixon Adm.)

  43. Nixon’s Graph: Economic Growth (marked: Start of Nixon Adm., Now)

  44. Nixon’s Graph: Economic Growth (marked: Start of Nixon Adm., Now, Projection)

  45. Conditions for regression • “Straight enough” condition (linearity) • Errors are mostly independent of X • Errors are mostly independent of anything else you can think of • Errors are more-or-less normally distributed

  46. General ANOVA Setting: Comparisons of 2 or more means • Investigator controls one or more independent variables • Called factors (or treatment variables) • Each factor contains two or more levels (or groups or categories/classifications) • Observe effects on the dependent variable • Response to levels of independent variable • Experimental design: the plan used to collect the data

  47. Logic of ANOVA • Each observation is different from the Grand (total sample) Mean by some amount • There are two sources of variance from the mean: • 1) That due to the treatment or independent variable • 2) That which is unexplained by our treatment
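The two sources of variance can be shown numerically: the total sum of squares around the grand mean splits exactly into a between-groups part (treatment) and a within-groups part (unexplained). The groups and their values below are hypothetical, for illustration only.

```python
# Hypothetical observations for three treatment groups.
groups = {
    "A": [4, 5, 6],
    "B": [7, 8, 9],
    "C": [1, 2, 3],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = sum(all_values) / len(all_values)

# Total variation: every observation against the grand mean.
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# Variation due to the treatment: group means against the grand mean.
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)

# Unexplained variation: observations against their own group mean.
ss_within = sum(
    (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
)

print(ss_total, ss_between, ss_within)  # the last two add up to the first
```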

  48. One-Way Analysis of Variance • Evaluate the difference among the means of two or more groups • Examples: accident rates for 1st, 2nd, and 3rd shift; expected mileage for five brands of tires • Assumptions • Populations are normally distributed • Populations have equal variances • Samples are randomly and independently drawn
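A minimal sketch of the one-way ANOVA test statistic, using the standard F ratio of between-group to within-group mean squares. The function name and the data are hypothetical; the critical value is the .05 entry for 2 and 6 degrees of freedom from a standard F table.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n_total - k
    # F is the ratio of the two mean squares (sum of squares / degrees of freedom).
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three groups of three observations each.
samples = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]
f_stat, df_b, df_w = one_way_anova_f(samples)

F_CRIT = 5.14  # F(2, 6) at alpha = .05, from a standard F table
print(f_stat, df_b, df_w, f_stat > F_CRIT)
```

A large F means the group means differ by more than the within-group scatter would suggest. The comparison with the table value is only meaningful if the assumptions listed on the slide hold.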
