
Lecture 12 Correlation and linear regression

The least squares method of Carl Friedrich Gauß. OLRy: y = ax + b, Δy², Δy. Covariance. Variance. Correlation coefficient. Slope a and coefficient of correlation r are zero if the covariance is zero.

phyre


Presentation Transcript


  1. Lecture 12: Correlation and linear regression. The least squares method of Carl Friedrich Gauß. OLRy (ordinary least squares regression of y on x): y = ax + b. Δy², Δy.

  2. Covariance. Variance. Correlation coefficient. Slope a and coefficient of correlation r are zero if the covariance is zero. Coefficient of determination.
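
The quantities named on this slide can be tied together in a minimal sketch. It assumes the usual sample definitions (division by n - 1) and the standard relations a = cov(x, y) / var(x) and r = cov(x, y) / (s_x s_y); the function names and the five data points are illustrative and not taken from the lecture.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    # Sample covariance: mean cross-product of deviations, divided by n - 1.
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def variance(x):
    # The variance is the covariance of a variable with itself.
    return covariance(x, x)

def ols_y_on_x(x, y):
    # OLRy: slope a, intercept b, correlation r and determination r^2.
    a = covariance(x, y) / variance(x)
    b = mean(y) - a * mean(x)
    r = covariance(x, y) / (variance(x) * variance(y)) ** 0.5
    return a, b, r, r * r

# Illustrative data only (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b, r, r2 = ols_y_on_x(x, y)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}, r2 = {r2:.3f}")
```

Because the covariance sits in the numerator of both a and r, both vanish together when the covariance is zero.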

  3. Relationships between macropterous, dimorphic and brachypterous ground beetles on 17 Mazurian lake islands. Macropterous vs. brachypterous species: positive correlation; r = 0.41 (r² = 0.17). The regression is weak. Macropterous species richness explains only 17% of the variance in brachypterous species richness. We have some islands without brachypterous species. We really don't know which is the independent variable. There is no clear-cut logical connection. Macropterous vs. dimorphic species: positive correlation; r = 0.67 (r² = 0.45). The regression is moderate. Macropterous species richness explains only 45% of the variance in dimorphic species richness. The relationship appears to be non-linear. Log-transformation is indicated (no zero counts). We really don't know which is the independent variable. There is no clear-cut logical connection.

  4. Negativecorrelation; r =r2= -0.48 The regression is weak. Island isolation explains only 23% of the variance in brachypterous species richness. We have two apparent outliers. Without them the whole relationship would vanish, it est R20. Outliers have to be eliminated fom regression analysis. We have a clear hypothesis about the logical relationships. Isolation should be the predictor of species richness. No correlation; r =r2= 0.06 The regression slope is nearly zero. Area explains less than 1% of the variance in brachypterous species richness. We have a clear hypothesis about the logical relationships. Area should be the predictor of species richness.

  5. The matrix perspective. X is not square, so it does not possess an inverse.
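
Since X itself cannot be inverted, the standard workaround is to multiply both sides by the transpose of X, which gives a square matrix that can be inverted when the predictors are not redundant. The derivation below is the textbook normal-equation route; the symbol b for the coefficient vector is an assumption, as the slide's own notation did not survive in the transcript.

```latex
\begin{aligned}
Y &\approx X\,b \\
X^{\mathsf T} Y &= X^{\mathsf T} X\, b \\
b &= \left(X^{\mathsf T} X\right)^{-1} X^{\mathsf T} Y
\end{aligned}
```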

  6. Variance Covariance

  7. Covariances. Variances. The covariance matrix is square and symmetric.
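
A small sketch of how such a covariance matrix can be built, with variances on the diagonal and covariances off the diagonal. The function name and the three data columns are illustrative assumptions, not the lecture's data.

```python
def covariance_matrix(columns):
    # columns: list of equally long variables; returns a k x k nested list.
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    k = len(columns)
    return [
        [sum(p * q for p, q in zip(centered[i], centered[j])) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]

# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
z = [5.0, 3.0, 4.0, 1.0, 2.0]
sigma = covariance_matrix([x, y, z])
for row in sigma:
    print(row)
```

Entry (i, j) equals entry (j, i) because the cross-product of deviations does not depend on the order of the two variables.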

  8. Non-linear relationships. Ground beetles on Mazurian lake islands. Linear function. Logarithmic function. Power function. The species-individuals relationship is obviously non-linear. The power function has the highest R² and therefore explains most of the variance in species richness. The coefficient of determination is a measure of goodness of fit. Slope. Intercept.
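
The comparison of linear, logarithmic and power functions can be sketched with ordinary least squares on transformed variables. This is one common approach and an assumption here: the slide does not say how its curves were fitted. The data are synthetic (an exact power law with exponent 0.5), and R² is evaluated on the original scale for all three models so that the values are comparable.

```python
import math

def ols(x, y):
    # Ordinary least squares of y on x; returns (slope, intercept).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(y, fitted):
    # Coefficient of determination: 1 - residual SS / total SS.
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic data: species = 3 * individuals ** 0.5 (hypothetical values).
individuals = [1.0, 2.0, 4.0, 8.0, 16.0]
species = [3.0 * n ** 0.5 for n in individuals]
ln_n = [math.log(n) for n in individuals]

# Linear function: S = a * N + b
a_lin, b_lin = ols(individuals, species)
r2_lin = r_squared(species, [a_lin * n + b_lin for n in individuals])

# Logarithmic function: S = a * ln(N) + b
a_log, b_log = ols(ln_n, species)
r2_log = r_squared(species, [a_log * v + b_log for v in ln_n])

# Power function: S = c * N ** z, fitted as ln(S) = ln(c) + z * ln(N)
z, ln_c = ols(ln_n, [math.log(s) for s in species])
c = math.exp(ln_c)
r2_pow = r_squared(species, [c * n ** z for n in individuals])

print(f"linear R2 = {r2_lin:.3f}, log R2 = {r2_log:.3f}, power R2 = {r2_pow:.3f}")
```

In the power fit the slope of the log-log regression is the exponent z and the exponentiated intercept is the factor c.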

  9. Having more than one predictor. Describe species richness as a function of the number of individuals, area, and isolation of islands. We need a clear hypothesis about dependent and independent variables. Use a block diagram (Individuals, Area, Isolation, Species).

  10. Predictors are not independent: the number of individuals depends on area and degree of isolation. We need linear relationships. Collinearity. (Block diagram: Individuals, Area, Isolation, Species.) We use ln-transformed variables of species, area, and individuals. Check for multicollinearity using a correlation matrix. We check for non-linearities using plots. The correlation between area and individuals is highly significant: the probability of H0 is 0.004. Of the predictors, area and individuals are highly correlated. In linear regression analysis, correlations between predictors below 0.7 are acceptable.
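
The multicollinearity check can be sketched as a pairwise correlation matrix plus a filter at the 0.7 limit mentioned on the slide. The variable names, the helper functions and the numbers are hypothetical; only the 0.7 threshold comes from the lecture.

```python
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

def correlation_matrix(variables):
    # variables: dict mapping a predictor name to its list of values.
    names = list(variables)
    matrix = [[pearson_r(variables[p], variables[q]) for q in names] for p in names]
    return names, matrix

def collinear_pairs(variables, threshold=0.7):
    # Report predictor pairs whose absolute correlation reaches the threshold.
    names, matrix = correlation_matrix(variables)
    return [
        (names[i], names[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if abs(matrix[i][j]) >= threshold
    ]

# Hypothetical predictor values for six islands.
predictors = {
    "ln_area": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "ln_individuals": [2.1, 2.9, 4.2, 4.8, 6.1, 6.9],
    "isolation": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
flagged = collinear_pairs(predictors)
print(flagged)
```

A flagged pair suggests that one of the two predictors should be dropped or the two combined before fitting the multiple regression.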

  11. The final data for our analysis. The predictor variables have to contain different information. If X is singular, no inverse exists. The matrix X contains the effect (predictor) variables. The vector Y contains the response variable. Multiple linear regression. The model.
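
A minimal sketch of multiple linear regression through the normal equations, solving (XᵀX) b = XᵀY by Gaussian elimination rather than forming an explicit inverse. This is the standard textbook route and only an assumption about what the slide's lost formula showed; the function names and the data (constructed so that y = 1 + 2·x1 - 3·x2 exactly) are illustrative.

```python
def solve_linear_system(a, rhs):
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Redundant predictors make the matrix singular: no inverse exists.
            raise ValueError("singular matrix: predictors carry redundant information")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (m[i][n] - known) / m[i][i]
    return solution

def fit_multiple_regression(predictor_rows, response):
    # Prepend a column of ones so the first coefficient is the intercept.
    x = [[1.0] + list(row) for row in predictor_rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Illustrative data: y = 1 + 2 * x1 - 3 * x2 without noise.
rows = [(0.0, 1.0), (1.0, 0.0), (2.0, 2.0), (3.0, 1.0), (4.0, 3.0)]
y = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in rows]
coefficients = fit_multiple_regression(rows, y)
print(coefficients)  # intercept, then one coefficient per predictor
```

If two predictor columns were identical, the elimination would hit a zero pivot and raise the error, which is the numerical face of the singular case named on the slide.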

  12. The probability that R² is zero is only 0.01%. With more than 99.9% probability R² > 0, and R² is hence statistically significant. The model explains 78.6% of the variance in species richness; 21.4% of the variance remains unexplained. The probabilities that the coefficients deviate from zero. Isolation is not a significant predictor.

  13. What distance to minimize? OLRy: Δy². OLRx: Δx². Model I regression.

  14. RMA: Δx, Δy. The reduced major axis regression slope is the geometric average of the slopes a_OLRy and a_OLRx. Model II regression.
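
The geometric-average relation can be checked numerically in a short sketch. It assumes that a_OLRx is expressed as a slope in the same y-versus-x plane (that is, the reciprocal of the slope obtained when x is regressed on y), in which case the geometric mean reduces to s_y / s_x with the sign of the covariance. The function name and the data are illustrative.

```python
import math

def rma_slopes(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    a_ols_y = sxy / sxx   # OLRy: minimizes squared vertical deviations
    a_ols_x = syy / sxy   # OLRx, rewritten as a slope of y against x
    # Geometric mean of both slopes, carrying the sign of the covariance.
    a_rma = math.copysign(math.sqrt(a_ols_y * a_ols_x), sxy)
    b_rma = my - a_rma * mx   # the line passes through the point of means
    return a_ols_y, a_ols_x, a_rma, b_rma

# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a_ols_y, a_ols_x, a_rma, b_rma = rma_slopes(x, y)
print(f"OLRy = {a_ols_y:.3f}, OLRx = {a_ols_x:.3f}, RMA = {a_rma:.3f}")
```

The sketch does not handle a covariance of exactly zero, where the OLRx slope is undefined.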

  15. Past standard output of linear regression. Reduced major axis. Parameters and standard errors. Parametric probability for r = 0. Permutation test for statistical significance. Both tests indicate that Brach and Macro are not significantly correlated. The RMA regression slope is insignificant. We don't have a clear hypothesis about the causal relationships. In this case RMA is indicated.

  16. Permutation test for statistical significance. Randomize x or y 1000 times. Calculate r each time. Plot the statistical distribution and calculate the lower and upper confidence limits. Calculating confidence limits: rank all 1000 coefficients of correlation and take the values at rank positions 25 and 975. [Figure: distribution of randomized r with the observed r, lower CL and upper CL marked; ΣN2.5 = 25 values in each tail; panel labels "g > 0" and "m > 0".]
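
The procedure on this slide can be sketched as follows: shuffle y 1000 times, compute r each time, sort the values, and read the confidence limits at rank positions 25 and 975. The function names, the fixed random seed and the twelve data points are assumptions made for the sake of a reproducible example.

```python
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_limits(x, y, n_perm=1000, seed=0):
    # Randomize y n_perm times and collect the r expected under no association.
    rng = random.Random(seed)
    shuffled = list(y)
    null_r = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_r.append(pearson_r(x, shuffled))
    null_r.sort()
    # Rank positions 25 and 975 of 1000 (1-based), scaled for other n_perm.
    lower = null_r[n_perm * 25 // 1000 - 1]
    upper = null_r[n_perm * 975 // 1000 - 1]
    return null_r, lower, upper

# Illustrative data only: y rises almost linearly with x.
x = [float(i) for i in range(1, 13)]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.2, 9.1, 9.7, 11.3, 12.0]
observed_r = pearson_r(x, y)
null_r, lower, upper = permutation_limits(x, y)
significant = observed_r < lower or observed_r > upper
print(f"observed r = {observed_r:.3f}, limits = ({lower:.3f}, {upper:.3f}), significant = {significant}")
```

An observed r that falls between the two limits would be treated as not significantly different from zero at the 5% level.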

  17. The RMA regression has a much steeper slope. This slope is often intuitively better. The coefficient of correlation is independent of the regression method. In OLRy regression, insignificance of the slope also means insignificance of r and R². The 95% confidence limits of the regression slope mark the range within which the regression slope lies with 95% probability. The lower CL is negative, hence the zero slope is within the 95% CL. (Figure labels: upper CL, lower CL.)

  18. Outliers have a disproportionate influence on correlation and regression (OLRy: Δy², Δy). Outliers should be eliminated from regression analysis. Instead of the Pearson coefficient of correlation, use Spearman's rank order correlation, which is the normal correlation on ranked data. r_Pearson = 0.79; r_Spearman = 0.77.
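
Spearman's coefficient as "normal correlation on ranked data" can be sketched by ranking both variables and applying the Pearson formula to the ranks. Tied values receive their average rank, which is the usual convention and an assumption here. The data are illustrative, with one deliberately extreme y value to show how differently the two coefficients react.

```python
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    # Rank from 1 upward; tied values share the mean of their rank positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_r(x, y):
    # Pearson correlation computed on the ranks of x and y.
    return pearson_r(average_ranks(x), average_ranks(y))

# Illustrative data: monotone relationship with one extreme value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 60.0]
r_pearson = pearson_r(x, y)
r_spearman = spearman_r(x, y)
print(f"Pearson = {r_pearson:.3f}, Spearman = {r_spearman:.3f}")
```

Ranking removes the leverage of the extreme value, so the rank correlation reflects only the monotone ordering of the points.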

  19. Homework and literature. Refresh: • Coefficient of correlation • Pearson correlation • Spearman correlation • Linear regression • Non-linear regression • Model I and model II regression • RMA regression. Prepare for the next lecture: • F-test • F-distribution • Variance. Literature: Łomnicki: Statystyka dla biologów; http://statsoft.com/textbook/
