Correlation

reed-zamora

Presentation Transcript


  1. Correlation

  2. (Irises, Vincent van Gogh, 1889)

  3. Iris data: setosa, virginica, versicolor

  4. Fisher’s iris data

          S.Length  S.Width  P.Length  P.Width  Species
       1       5.1      3.5       1.4      0.2  setosa
       2       4.9      3.0       1.4      0.2  setosa
     ……………….
      49       5.3      3.7       1.5      0.2  setosa
      50       5.0      3.3       1.4      0.2  setosa
      51       7.0      3.2       4.7      1.4  versicolor
      52       6.4      3.2       4.5      1.5  versicolor
     ………………….
      99       6.2      2.9       4.3      1.3  versicolor
     100       5.7      2.8       4.1      1.3  versicolor
     101       6.3      3.3       6.0      2.5  virginica
     …………………
     150       5.9      3.0       5.1      1.8  virginica

  5. Scatter-plot matrix of Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species

  6. Scatter plot (by group) and trendlines: Sepal.Width against Sepal.Length for setosa, versicolor, and virginica

  7. Scatterplot for setosa of iris data: Sepal.Width against Sepal.Length

  8. Three scatter plots: no apparent relationship, negative relationship, positive relationship. How to quantify the relationship?

  9. Three scatter plots: count pairs

  10. count positive pairs

  11. count negative pairs

  12. Two scatter plots, one with x from 300 to 900 and one with x from 30 to 90. Scale matters and needs to be considered.

  13. First list: -10, -2, 3, 5, 7, 9. Second list: 5, -7, 10, -3, 8, 5. Pair the values so as to maximize the sum of the products of the pairs.

  14. Positively matched: -10, -2, 3, 5, 7, 9 with -7, -3, 5, 5, 8, 10. Negatively matched: -10, -2, 3, 5, 7, 9 with 10, 8, 5, 5, -3, -7.
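The matching on slides 13 and 14 can be checked by brute force. The sketch below uses the two lists from the slides; the function name `sum_of_products` is made up for illustration. It tries every pairing and compares the extremes with the sorted pairings.

```python
from itertools import permutations

# The two lists from slide 13.
x = [-10, -2, 3, 5, 7, 9]
y = [5, -7, 10, -3, 8, 5]


def sum_of_products(a, b):
    """Sum of the products of values paired by position (hypothetical helper)."""
    return sum(p * q for p, q in zip(a, b))


xs = sorted(x)

# Positively matched: both lists in increasing order.
same_order = sum_of_products(xs, sorted(y))
# Negatively matched: one list increasing, the other decreasing.
opposite_order = sum_of_products(xs, sorted(y, reverse=True))

# Brute force over all 720 orderings of the second list.
all_sums = [sum_of_products(xs, perm) for perm in permutations(y)]
best, worst = max(all_sums), min(all_sums)

print(same_order, best)       # 262 262
print(opposite_order, worst)  # -160 -160
```

Sorting both lists the same way gives the largest sum of products, and sorting them in opposite directions gives the smallest, which is why the sum of products can serve as a raw measure of positive or negative association.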

  15. Three scatter plots labelled -, 0, and +

  16. Need to consider scale

  17. Cauchy-Schwarz inequality: the measure lies between -1 and +1. -1: (very strong) negative linear relationship. +1: (very strong) positive linear relationship.
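The formula on this slide did not survive in the transcript. As a sketch in assumed notation (the symbols a_i, b_i, and r are not taken from the slides), the Cauchy-Schwarz inequality applied to the centered values gives the bound between -1 and +1:

```latex
% Assumed notation: a_i = x_i - \bar{x}, \quad b_i = y_i - \bar{y}
\[
\Bigl(\sum_{i=1}^{n} a_i b_i\Bigr)^{2}
\le
\Bigl(\sum_{i=1}^{n} a_i^{2}\Bigr)\Bigl(\sum_{i=1}^{n} b_i^{2}\Bigr)
\quad\Longrightarrow\quad
-1 \le r = \frac{\sum_{i=1}^{n} a_i b_i}
                {\sqrt{\sum_{i=1}^{n} a_i^{2}}\,\sqrt{\sum_{i=1}^{n} b_i^{2}}} \le 1
\]
```

Equality holds exactly when the centered values are proportional, that is, when all points lie on a straight line.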

  18. Exercise

  19. Sample version and population version
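The formulas on this slide are missing from the transcript. A minimal sketch of the sample version follows, assuming the usual n - 1 divisor; the function names are hypothetical. The data are only the four setosa rows printed on slide 4, not the full sample of 50, so the result is illustrative only.

```python
from math import sqrt


def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor (an assumed convention)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    return sample_covariance(x, y) / sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )


# Rows 1, 2, 49, and 50 of the iris table on slide 4 (setosa only).
sepal_length = [5.1, 4.9, 5.3, 5.0]
sepal_width = [3.5, 3.0, 3.7, 3.3]

r = sample_correlation(sepal_length, sepal_width)
print(round(r, 2))
```

The population version replaces the sample averages with expectations. The divisor cancels in the correlation, so the choice between n and n - 1 affects the covariance but not the correlation.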

  20. Population covariance

  21. Population covariance: exercise

  22. Covariance is a measure of linear association between two variables. Covariance is not a measure of curved association. (Scatter plot of y against x.)
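A small made-up example illustrates the point. The values below are not the data from the slide: y is an exact quadratic function of x, so the two variables are perfectly related, yet the covariance is zero because the relationship is symmetric about the mean of x.

```python
def covariance(x, y):
    """Sample covariance with the n - 1 divisor (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Made-up data: x runs from 20 to 100 in steps of 10, with mean 60.
x = list(range(20, 101, 10))
y_curved = [(v - 60) ** 2 for v in x]  # U-shaped around the mean of x
y_linear = [2 * v + 5 for v in x]      # straight line, for comparison

cov_curved = covariance(x, y_curved)
cov_linear = covariance(x, y_linear)

print(cov_curved)  # zero up to floating-point rounding
print(cov_linear)  # 1500.0
```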

  23. Covariance may take any real value, but correlation takes values only in [-1, 1]. Covariance is affected by the scales of the variables, but correlation is not, except for the sign of the scale factor.
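The effect of scale can be checked numerically. This is a sketch on made-up values (not the data behind the plots in the slides), with hypothetical helper names: multiplying x by 10 multiplies the covariance by 10 and leaves the correlation unchanged, while negating x flips the sign of both. Adding a constant changes neither.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance with the n - 1 divisor (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Made-up data for illustration only.
x = [30, 45, 50, 65, 80, 90]
y = [35, 40, 60, 55, 85, 75]

x_times_10 = [10 * v for v in x]   # change of units on x
x_negated = [-v for v in x]        # sign flip on x
y_shifted = [v + 30 for v in y]    # shift of origin on y

print(covariance(x, y), covariance(x_times_10, y))
print(correlation(x, y), correlation(x_times_10, y))
print(covariance(x_negated, y_shifted), correlation(x_negated, y_shifted))
```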

  24. Correlation is the covariance for standardized variables
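This identity can be verified directly. The sketch below uses made-up values and hypothetical helper names, and assumes the n - 1 divisor for both the covariance and the standard deviation used in standardizing.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance with the n - 1 divisor (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = sum(values) / len(values)
    sd = sqrt(covariance(values, values))
    return [(v - mean) / sd for v in values]


# Made-up data for illustration only.
x = [30, 45, 50, 65, 80, 90]
y = [35, 40, 60, 55, 85, 75]

correlation_xy = covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))
covariance_of_z = covariance(standardize(x), standardize(y))

print(correlation_xy, covariance_of_z)  # the two values agree up to rounding
```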

  25. Two scatter plots, one with x from 300 to 900 and one with x from 30 to 90. For one plot, covariance = 189 and correlation = 0.78; for the other, covariance = ? and correlation = ?

  26. Two scatter plots. First plot (x and y ticks from 30 to 90): covariance = 189, correlation = 0.78. Second plot (x ticks from -90 to -60, y ticks from 60 to 120): covariance = ?, correlation = ?

  27. Grouped by zip code: pooling groups that each have a positive correlation does not necessarily give a positive correlation.
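A made-up example shows how this can happen. The three groups below are hypothetical stand-ins for zip codes, not data from the slide: within each group y rises with x, but the groups with larger x sit at lower y, so the pooled correlation is negative.

```python
from math import sqrt


def correlation(x, y):
    """Sample correlation coefficient (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical groups: (x values, y values) for each made-up zip code.
groups = {
    "zip_A": ([1, 2, 3], [10, 11, 12]),
    "zip_B": ([4, 5, 6], [6, 7, 8]),
    "zip_C": ([7, 8, 9], [2, 3, 4]),
}

# Correlation within each group: all positive.
within = {name: correlation(gx, gy) for name, (gx, gy) in groups.items()}

# Correlation after pooling all groups into one sample: negative.
pooled_x = [v for gx, _ in groups.values() for v in gx]
pooled_y = [v for _, gy in groups.values() for v in gy]
pooled = correlation(pooled_x, pooled_y)

print(within)
print(pooled)
```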

  28. Correlation is a measure of linear association, not of causation. High correlation does not mean that one variable is the cause of the other.

  29. Correlation and causality: APT prices in Seoul. The more Starbucks stores, the higher the APT price! Will APT prices rise as more STBK stores open?

  30. STBK: number of Starbucks stores. APT price: average APT price per 1 m². The more Starbucks stores, the deeper the financial crisis!

  31. Thank you !!
