Correlation • I have two variables of practically "equal" standing (traditionally denoted X and Y). I ask whether they are independent and, if they are "correlated", how strongly.
(Pearson) Correlation coefficient • $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$ • If positive deviations from the mean in X go with positive deviations in Y, and negative ones with negative ones, then the sum of products in the numerator is positive. • r is a dimensionless number (the covariance standardized by the standard deviations of the two variables): -1 means deterministic negative dependence, +1 deterministic positive dependence.
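The definition above can be sketched in a few lines of plain Python. This is only an illustration: the function name `pearson_r` and the sample data are assumptions made for the example, not part of the slides.

```python
import math

def pearson_r(x, y):
    """Pearson r: sum of products of deviations, standardized by both spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Positive when deviations in X and Y tend to share the same sign.
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

Because the same standardization appears in numerator and denominator, rescaling either variable (for example changing units) leaves r unchanged.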
We assume a linear relationship, or a bivariate normal distribution.
Even here r ≈ 0, although the values are not independent. But note that Y does not have a normal distribution for this X.
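A small sketch of how r can be zero despite complete dependence. The parabola used here is an assumed example (the original figure is not available), and `pearson_r` is a hypothetical helper defined only for this illustration.

```python
import math

def pearson_r(x, y):
    """Compact Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Y is fully determined by X, but the relation is symmetric, not linear.
x = list(range(-5, 6))      # -5, ..., 5, symmetric around 0
y = [xi ** 2 for xi in x]   # assumed example: a parabola

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

The positive and negative products of deviations cancel, so r says nothing about how strongly Y depends on X in this case.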
[Figure panels: r = +0.99, r = -0.99, r = -0.83, r = +0.83, r = -0.45, r = +0.45]
Test of the null hypothesis H0: ρ = 0 • r is an estimate of the population parameter ρ. • The test again reduces to a t-test: $t = r \sqrt{\frac{n - 2}{1 - r^2}}$ with n - 2 degrees of freedom. • Again, both one- and two-tailed tests can be used. • It is also possible to test the null hypothesis that ρ equals some non-zero value, but the procedure is more complicated.
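As a sketch of the t-test for H0: ρ = 0, the statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom can be computed directly. The function name, the example values r = 0.70 and n = 10, and the critical value 2.306 (two-tailed, α = 0.05, 8 degrees of freedom, taken from a standard t table) are assumptions for illustration.

```python
import math

def correlation_t(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, df

# Hypothetical example: r = 0.70 observed on n = 10 pairs.
t, df = correlation_t(0.70, 10)

# Assumed critical value from a t table (two-tailed, alpha = 0.05, df = 8).
T_CRITICAL = 2.306
significant = abs(t) > T_CRITICAL
print(f"t = {t:.3f}, df = {df}, reject H0: {significant}")
```

For a one-tailed test, the same t would be compared against the one-tailed critical value for the chosen direction.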
There are also tabulated critical values of r (for different sample sizes).
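Such tables can be reproduced from critical values of t, since inverting the t-test gives r_crit = t_crit / sqrt(t_crit² + df) with df = n − 2. The sketch below assumes two-tailed critical t values at α = 0.05 copied from a standard t table; the function name and the choice of sample sizes are illustrative.

```python
import math

def critical_r(t_critical, df):
    """Smallest |r| that is significant, given the critical t for df = n - 2."""
    return t_critical / math.sqrt(t_critical ** 2 + df)

# Assumed two-tailed critical t values for alpha = 0.05 (standard t table).
T_TABLE = {3: 3.182, 8: 2.306, 18: 2.101, 28: 2.048}

for df, t_crit in T_TABLE.items():
    n = df + 2
    print(f"n = {n:2d}: |r| must exceed {critical_r(t_crit, df):.3f}")
```

The critical value shrinks as n grows, which is why even a weak correlation becomes significant in a large sample.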
Comparison with regression • The coefficient of determination in regression (R²) is the square of the correlation coefficient computed from the same two variables. • The p-value of the significance test of independence is exactly the same for the regression and for the correlation coefficient.
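Both statements can be checked numerically. The sketch below fits a simple least-squares line and compares R² with r², and the t statistic of the slope with the t statistic of the correlation coefficient (equal t on the same degrees of freedom implies an equal p-value). The function name and data are hypothetical.

```python
import math

def regression_vs_correlation(x, y):
    """Return (r, R^2, t of the regression slope, t of the correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

    # Least-squares line y = intercept + slope * x.
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

    r_squared = 1 - ss_res / syy                    # coefficient of determination
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)    # standard error of the slope
    t_slope = slope / se_slope

    r = sxy / math.sqrt(sxx * syy)
    t_corr = r * math.sqrt((n - 2) / (1 - r * r))
    return r, r_squared, t_slope, t_corr

# Hypothetical illustrative data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 2.9, 4.1, 4.0, 5.8, 6.1, 6.4, 8.2]
r, r_squared, t_slope, t_corr = regression_vs_correlation(x, y)
print(f"r^2 = {r * r:.6f}, R^2 = {r_squared:.6f}")
print(f"t(slope) = {t_slope:.6f}, t(r) = {t_corr:.6f}")
```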
Power of test • The regression is significant exactly when the correlation coefficient is significant. • The power of the test increases (in both cases) with the strength of the relationship and with the number of observations. • To estimate how many observations I need, I must have an idea of how tight the relationship is (how high R² or ρ is in the population).
Power of test: critical values of r • From them I can look up how many observations I need to have a ~50% chance of rejecting H0 at a given significance level (for a known ρ). • More precise calculations are possible, but in any case I need an idea of what the correlation in the population is.
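One way to get a rough number is the Fisher z approximation, which is a substitute technique and not the table lookup described above: atanh(r) is treated as approximately normal with standard deviation 1/sqrt(n − 3). The function name and defaults below are assumptions, and the result should be read as an estimate only.

```python
import math
from statistics import NormalDist

def approx_sample_size(rho, alpha=0.05, power=0.5):
    """Approximate n for a two-tailed test of H0: rho = 0 (Fisher z approximation)."""
    if not 0 < abs(rho) < 1:
        raise ValueError("rho must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical z
    z_power = NormalDist().inv_cdf(power)           # 0 for power = 0.5
    n = ((z_alpha + z_power) / math.atanh(abs(rho))) ** 2 + 3
    return math.ceil(n)

# Assumed population correlations, chosen only for illustration.
for rho in (0.3, 0.5, 0.632):
    print(f"rho = {rho}: n ~ {approx_sample_size(rho)} for 50% power, "
          f"n ~ {approx_sample_size(rho, power=0.8)} for 80% power")
```

With power = 0.5 the estimate is the n at which the critical value of r is roughly equal to ρ, which matches the table-based reasoning; asking for higher power requires noticeably more observations.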
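Coefficient of rank correlation (Spearman) [there is also Kendall's] • I replace the values of each variable with their ranks and compute the correlation coefficient from the ranks. • For larger samples, the critical values for the ordinary (Pearson) correlation coefficient can be used. • We can use the formula $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the two ranks of the i-th observation (exact when there are no ties).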
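Both routes to the Spearman coefficient (Pearson r computed on ranks, and the shortcut based on rank differences d) can be sketched as follows. The helper names and data are hypothetical; ties receive average ranks, and the d-based shortcut is exact only without ties.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_from_ranks(x, y):
    """Spearman coefficient as Pearson r of the ranks (also valid with ties)."""
    return pearson_r(average_ranks(x), average_ranks(y))

def spearman_from_d(x, y):
    """Shortcut 1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical data without ties.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
print(spearman_from_ranks(x, y), spearman_from_d(x, y))
```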
But the Spearman coefficient will also be 0 in this case. • We can say that the Pearson correlation coefficient is a measure of linear dependence, whereas the Spearman coefficient is a measure of monotonic dependence.
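The contrast between linear and monotonic dependence can be illustrated with two assumed data sets: an exponential curve (monotonic but not linear) and a symmetric parabola (neither). The helper functions are hypothetical and written only for this sketch.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks: (number of smaller values) + (number of equal values + 1) / 2."""
    return [sum(u < w for u in values) + (sum(u == w for u in values) + 1) / 2
            for w in values]

def spearman_r(x, y):
    return pearson_r(ranks(x), ranks(y))

# Monotonic but strongly non-linear: Spearman is 1, Pearson is clearly lower.
x_mono = list(range(10))
y_mono = [math.exp(v) for v in x_mono]
pearson_mono = pearson_r(x_mono, y_mono)
spearman_mono = spearman_r(x_mono, y_mono)

# Symmetric parabola: not monotonic, so both coefficients are 0.
x_par = list(range(-5, 6))
y_par = [v ** 2 for v in x_par]
pearson_par = pearson_r(x_par, y_par)
spearman_par = spearman_r(x_par, y_par)

print(f"exponential: Pearson {pearson_mono:.3f}, Spearman {spearman_mono:.3f}")
print(f"parabola:    Pearson {pearson_par:.3f}, Spearman {spearman_par:.3f}")
```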
Another possibility is to use a permutation test • I randomly permute the values of the independent variable and count how many times the resulting dependence is at least as "nice" as in our data.
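A minimal sketch of such a permutation test, under some assumptions of this example: |r| is used as the measure of how "nice" the dependence is (a two-tailed test), 1999 random permutations are drawn, and the observed arrangement is counted as one of the permutations. The function names, seed, and data are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_permutations=1999, seed=0):
    """Share of arrangements with |r| at least as large as the observed |r|."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    observed = abs(pearson_r(x, y))
    shuffled = list(x)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)           # breaks any link between X and Y
        if abs(pearson_r(shuffled, y)) >= observed - 1e-12:
            hits += 1
    # +1 in numerator and denominator counts the observed data itself.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical illustrative data with a strong positive relation.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.4, 8.9, 10.2]
p_value = permutation_p_value(x, y)
print(f"permutation p-value = {p_value:.4f}")
```

The test makes no assumption about normality; its resolution is limited by the number of permutations, since the smallest attainable p-value is 1/(n_permutations + 1).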