Exploring relationships between variables Unit 2 Scatterplots, Associations, and Correlations
Scatterplots • Show patterns • Show trends • Show relationships between variables • Reveal outlier values • Can show change over time when the x-variable is time
Scatterplots • The association can be positive or negative • Show the relationship between 2 variables • Can be examined in more depth through the z-scores of both variables (zx, zy)
Z-scores • zx = (x - mean of x) / sx, where sx is the standard deviation of x • zy = (y - mean of y) / sy, where sy is the standard deviation of y • The standard deviations are calculated in the same way as before.
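The z-score calculation can be sketched in Python as follows. This is a minimal illustration, assuming the sample standard deviation (dividing by n-1); the data values and the helper name `z_scores` are made up for the example and are not from the slides.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return the z-score of each value: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]

# Made-up illustrative data, not from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

zx = z_scores(x)
zy = z_scores(y)
```

A set of z-scores always has mean 0 and standard deviation 1, which is a quick way to check the calculation.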
Ratio • Correlation coefficient (r) • r = [sum of (zx * zy)] / (n - 1), where zx and zy are the z-scores of x and y • Correlation measures the strength of the linear association between 2 variables
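As a rough sketch, the correlation coefficient can be computed as the sum of the products of paired z-scores, divided by n-1. The function name `correlation` and the data are assumptions chosen for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r: sum of (zx * zy) divided by (n - 1)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (n - 1)

# Made-up illustrative data, not from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)  # positive here, since y tends to rise with x
```

The result always lies between -1 and 1; values near either end indicate a strong linear association.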
Variables • Explanatory variable: x • Response variable: y
Least-Squares Line • y = a + bx • a = y-intercept • b = slope • a = ybar - b * xbar • b = SSxy / SSx • SSx = sum of squares of x
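The slope and intercept can be sketched in Python using b = SSxy / SSx and a = ybar - b * xbar. The helper name `least_squares` and the data are hypothetical, and SSx and SSxy are computed here directly from deviations about the means.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = ss_xy / ss_x          # slope
    a = y_bar - b * x_bar     # y-intercept
    return a, b

# Made-up illustrative data, not from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
predicted = a + b * 3.5  # predicted y at x = 3.5
```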
SSx • SSx = (sum of each squared x) - (sum of x)^2 / n • That is, add up the squares of the x values, then subtract the square of the sum of x divided by n • On a calculator, you can get SSx by squaring the standard deviation of x and then multiplying it by (n-1)
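Both routes to SSx can be checked against each other with a short Python sketch. The data and variable names are made up for illustration.

```python
from statistics import stdev

# Made-up illustrative data, not from the slides
x = [1, 2, 3, 4, 5]
n = len(x)

# Shortcut formula: sum of squared x minus (sum of x)^2 / n
ss_x_shortcut = sum(v ** 2 for v in x) - sum(x) ** 2 / n

# Calculator route: sample standard deviation squared, times (n - 1)
ss_x_from_sd = stdev(x) ** 2 * (n - 1)
```

The two values agree up to floating-point rounding.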
SSxy • Sum of products of x and y • Take the sum of each x value times its paired y value • Then subtract from that total (sum of x) * (sum of y) / n
SSxy • The formula SSxy = (sum of xy) - (sum of x) * (sum of y) / n is a more efficient way of computing • the sum of each (x - xbar) * (y - ybar)
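A quick Python sketch shows that the shortcut formula for SSxy and the definition using deviations from the means give the same number. The data and variable names are assumptions for the example.

```python
from statistics import mean

# Made-up illustrative data, not from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Shortcut: sum of x*y minus (sum of x)(sum of y) / n
ss_xy_shortcut = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Definition: sum of (x - xbar)(y - ybar)
x_bar, y_bar = mean(x), mean(y)
ss_xy_definition = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
```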
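Standard Error of Estimate • Se = square root of [ sum of (y - yp)^2 / (n - 2) ], where yp is the predicted value of y • Computational shortcut: Se = square root of [ (SSy - b * SSxy) / (n - 2) ], where SSy is the sum of squares of y, calculated in the same way as SSx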
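The standard error of estimate can be sketched two ways in Python: from the residuals directly, and from the shortcut (SSy - b * SSxy) / (n - 2). The data and names are illustrative assumptions; SSy is taken to be the sum of squares of y, computed like SSx.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data, not from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = ss_xy / ss_x       # slope
a = y_bar - b * x_bar  # intercept

# Definition: root of the summed squared residuals over (n - 2)
se_definition = sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

# Shortcut: no need to compute each predicted value
se_shortcut = sqrt((ss_y - b * ss_xy) / (n - 2))
```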
Residuals • You can graph the residuals of the regression to check whether a linear model fits the data; a good fit shows no pattern • A residual is the difference between the observed value and the predicted value • Residual = observed - predicted
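Computing the residuals is a short step once the line is fitted. The sketch below uses a hypothetical helper `residuals` and made-up data; plotting the results against x is left to whatever graphing tool is at hand.

```python
from statistics import mean

def residuals(xs, ys):
    """Return observed - predicted for each point, using the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up illustrative data, not from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

res = residuals(x, y)
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), which makes a handy sanity check.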
Confidence Intervals • yp - E < y < yp + E • yp = predicted value of y • E = margin of error
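The slides do not give a formula for E. The sketch below assumes the common textbook margin of error for predicting a single y at x0, namely E = t * Se * sqrt(1 + 1/n + (x0 - xbar)^2 / SSx), with the critical value t looked up in a t-table for n - 2 degrees of freedom. The function name, the data, and the choice of formula are all assumptions, not taken from the source.

```python
from math import sqrt
from statistics import mean

def prediction_interval(xs, ys, x0, t_crit):
    """Return (yp - E, yp + E) for a single predicted y at x0.

    Assumes E = t_crit * Se * sqrt(1 + 1/n + (x0 - xbar)^2 / SSx).
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = ss_xy / ss_x
    a = y_bar - b * x_bar
    se = sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    yp = a + b * x0
    margin = t_crit * se * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)
    return yp - margin, yp + margin

# Made-up illustrative data, not from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# 3.182 is the two-tailed 95% t-table value for n - 2 = 3 degrees of freedom
low, high = prediction_interval(x, y, x0=3, t_crit=3.182)
```

With so few points the interval is wide, which is expected: the t value is large and Se is estimated from only 3 degrees of freedom.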
Types of data • Outlier • Leverage • Influential Point • Lurking Variable
Outlier • Any data point that stands away from the others
Leverage • Data points with x-values that are far from the mean of x • Can alter the least-squares regression line
Influential Point • Omitting this point can drastically alter the regression model
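One way to see leverage and influence is to fit the line with and without a suspect point and compare the slopes. In this sketch the data are made up, with the last point (x = 15) placed far from the other x-values on purpose; the helper name `slope` is hypothetical.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope b = SSxy / SSx."""
    x_bar, y_bar = mean(xs), mean(ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    return ss_xy / ss_x

# Made-up illustrative data; the last point has an x-value far from the rest
x = [1, 2, 3, 4, 5, 15]
y = [2, 4, 5, 4, 5, 3]

b_all = slope(x, y)                 # fit using every point
b_without = slope(x[:-1], y[:-1])   # fit with the high-leverage point omitted
```

Here the slope is positive without the last point and slightly negative with it, so that single point is influential.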
Lurking Variable • A variable that is hidden from the analysis • It is not explicitly part of the model but affects how the variables in the model appear to be related