Correlation and Regression Statistics 2126
Introduction • Means etc. are of course useful • We might also wonder, “how do variables go together?” • IQ is a great example • It goes together with so much stuff
A scatterplot • You tend to put the predictor on the x axis and the predicted variable on the y axis, though this is not a hard and fast rule • A scatterplot is a pretty good EDA tool too, eh • Pick an appropriate scale for your axes • Plot the (x,y) pairs
So what does it mean • If, as one variable increases, the other variable increases we have a positive association • If, as one goes up, the other goes down, we have a negative association • There could be no association at all
Linear relationships • BTW, I am only talking about straight line relationships • Not curvilinear • Say like the Yerkes-Dodson law: as far as the stuff we will talk about is concerned, there is no relationship, yet we know there is
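To see how a real curvilinear relationship can show up as "no relationship", here is a minimal sketch. The inverted-U data are made up for illustration (they are not from the slides), and `pearson_r` is a hypothetical helper written out by hand.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up inverted-U data: performance peaks at a moderate arousal level.
arousal = list(range(11))                        # 0, 1, ..., 10
performance = [25 - (a - 5) ** 2 for a in arousal]

# The rise on the left cancels the fall on the right, so r comes out
# at (essentially) zero even though y is completely determined by x.
r_curve = pearson_r(arousal, performance)
print(round(r_curve, 3))
```

A scatterplot of these points would show the relationship immediately, which is one reason to plot before computing r.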
The strength is important too • The more the points cluster around a line, the stronger the relationship is • Height and weight vs height in cm vs height in inches • We need something that ignores the units though, so if I did IQ and your income in real money or IQ and your income in that worthless stuff they use across the river, the numbers would be the same
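The point about units can be checked directly. This is a sketch with made-up heights and weights (assumed values, not data from the course): converting height from centimetres to inches should leave the correlation with weight unchanged.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Illustrative values only.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weight_kg = [52.0, 58.0, 69.0, 77.0, 91.0]

# Same heights, different unit (1 inch = 2.54 cm).
height_in = [h / 2.54 for h in height_cm]

r_cm = pearson_r(height_cm, weight_kg)
r_in = pearson_r(height_in, weight_kg)

# The two values agree up to floating-point rounding.
print(abs(r_cm - r_in) < 1e-9)
```

Rescaling a variable by a positive constant rescales its deviations and its spread by the same factor, so the factor cancels in the ratio that defines r.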
Properties of r • -1.00 <= r <= +1.00 • The sign indicates ONLY the direction (think of it as going uphill or downhill) • |r| indicates the strength • So, r = -.77 is a stronger correlation than r = .40
Check these out.. • All of these have the same correlation • r = .7 in each case • Note the problem of outliers • Note the problem of two subpopulations
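The outlier problem can be sketched numerically. The data below are invented for illustration and are not the data behind the plots on the slide: eight points with no linear trend, then the same points plus one extreme observation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data with no linear trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 3, 6, 4, 5, 3, 6, 4]

r_without = pearson_r(xs, ys)

# Add a single point far from the rest of the cloud.
r_with = pearson_r(xs + [30], ys + [30])

print(round(r_without, 2), round(r_with, 2))
```

One point is enough to move r from roughly zero to a value that looks like a strong relationship, so the number alone cannot tell you what the scatterplot looks like.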
Remember this • Correlation is not causation • I said, correlation is not causation • Let me say it again, correlation is not causation • Birth control and the toaster method
Wouldn’t it be nice • If we could predict y from x • You know, like an equation • Remember that in school, you would get an equation, plug in the x and get the y • Well surprise surprise, there is a method like this in statistics
If we are going to predict with a line • Well, we will make mistakes • We will want to minimize those mistakes
There is a problem, a common problem • Those prediction errors or residuals (e) sum to 0 • Damn • Though guess what we could do… • Why, square them of course • So we get a line that minimizes squared residuals
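Here is a small sketch of why the plain sum of residuals is no help and the squared sum is. The data are made up, and `fit_line` and `residuals` are hypothetical helpers using the standard least-squares formulas. Any line through the point of means has residuals that sum to zero, so the sum cannot pick out the best line, but the sum of squares can.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Observed y minus predicted y for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Illustrative values only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
e_fit = residuals(xs, ys, a, b)

# A deliberately worse line that still passes through (mean x, mean y).
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
b_other = b + 0.5
a_other = mean_y - b_other * mean_x
e_other = residuals(xs, ys, a_other, b_other)

# Both sets of residuals sum to about zero ...
print(abs(sum(e_fit)) < 1e-9, abs(sum(e_other)) < 1e-9)

# ... but the sums of squared residuals differ, and the fitted line wins.
sse_fit = sum(e ** 2 for e in e_fit)
sse_other = sum(e ** 2 for e in e_other)
print(sse_fit < sse_other)
```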
In general the equation of the line is $\hat{y} = a + bx$ • $\hat{y}$ (y hat) is the predicted y • $a$ is the y intercept • $b$ is the slope
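For reference, the line that minimizes the squared residuals has a standard closed form. The notation is assumed here rather than taken from the slides: $a$ is the y intercept, $b$ is the slope, $r$ is the correlation, $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the standard deviations of x and y.

```latex
b = r \, \frac{s_y}{s_x}
  = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b \, \bar{x}
```

The second formula says the line always passes through the point of means $(\bar{x}, \bar{y})$, which is why its residuals sum to zero.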
So…. • With a regression line you can predict y from x • Just because it says that some value = a linear combination of numbers, it does not mean that there is necessarily a causal link • Don’t predict outside the range of x values you actually observed • Linear only
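One way to make "stay inside the range" hard to forget is to build it into the prediction step. This is only a sketch: `predict_in_range` is a hypothetical helper, and the intercept, slope and range below are made-up numbers standing in for a line you have already fitted.

```python
def predict_in_range(x, intercept, slope, x_min, x_max):
    """Predict y from x, refusing to extrapolate beyond the observed x range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]"
        )
    return intercept + slope * x

# Assumed values for illustration: a fitted line y-hat = 0.05 + 1.99x,
# estimated from x values between 1 and 5.
y_hat = predict_in_range(3.0, intercept=0.05, slope=1.99, x_min=1.0, x_max=5.0)
print(round(y_hat, 2))

# Asking for x = 10 raises ValueError instead of quietly extrapolating.
```

Raising an error is a design choice for the sketch. A warning would also work if occasional extrapolation is acceptable and you just want it flagged.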