
Lecture 10 CORRELATION & REGRESSION

Lecture 10 CORRELATION & REGRESSION. Xiaojin Yu, Department of Epidemiology and Biostatistics, School of Public Health, Southeast University. Review: comparison of means (t-test); comparison of proportions (chi-square test); comparison of medians (rank sum test). Review of the rank sum test.


Presentation Transcript


  1. Lecture 10 CORRELATION & REGRESSION. Xiaojin Yu, Department of Epidemiology and Biostatistics, School of Public Health, Southeast University

  2. Review • Comparison of means: t-test • Comparison of proportions: chi-square test • Comparison of medians: rank sum test

  3. Review of the rank sum test • Raw data and ranks (cardinal vs. ordinal numbers) • Rank sum tests: methods based on ranks • Two independent groups: Wilcoxon rank sum test • Two paired groups: Wilcoxon signed-rank test

  4. Solution to the height comparison between females (F) and males (M) [figure: blue = male, red = female]

  5. EXAMPLE 9.1. Table 9.1: Survival times of cats and rabbits without oxygen

          Cats                 Rabbits
     Minutes   Rank       Minutes   Rank
        25      9.5          14       1
        34     13            15       2
        44     15            16       3
        46     16            17       4
        46     17            19       5
        48     18            21       6.5
        49     19            21       6.5
        50     20            23       8
                             25       9.5
                             28      11
                             30      12
                             35      14
     n1 = 8, T1 = 127.5    n2 = 12, T2 = 82.5

  6. Solution to Example 9.1 • H0: M1 = M2, the population locations (medians) of survival time of cats and rabbits are equal; H1: M1 ≠ M2, the population locations of survival time of cats and rabbits are not equal; α = 0.05. • Sort and rank all observations together, then calculate the rank sum of each group. Take the rank sum of the group with the smaller sample size as T: n1 = 8 < n2 = 12, so T = T1 = 127.5. • The critical interval is T0.05 = 58-110. T = 127.5 lies outside this interval, so P ≤ α. Given α = 0.05, P < 0.05 and H0 is rejected: the survival times of cats and rabbits in an oxygen-free environment may differ.
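The ranking and rank-sum steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the lecture's own procedure: the helper name `average_ranks` is hypothetical, tied values receive the average of the ranks they span (a standard convention, so two equal values at positions 16 and 17 would each get 16.5 without changing the rank sum), and the critical interval 58-110 is copied from the slide rather than computed.

```python
def average_ranks(values):
    """Hypothetical helper: rank values from smallest (rank 1) upward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Table 9.1: survival times (minutes) without oxygen.
cats = [25, 34, 44, 46, 46, 48, 49, 50]
rabbits = [14, 15, 16, 17, 19, 21, 21, 23, 25, 28, 30, 35]

ranks = average_ranks(cats + rabbits)  # rank the pooled sample
T1 = sum(ranks[:len(cats)])            # rank sum of cats, n1 = 8
T2 = sum(ranks[len(cats):])            # rank sum of rabbits, n2 = 12

# T is the rank sum of the group with the smaller sample size.
T = T1 if len(cats) <= len(rabbits) else T2

# Two-sided critical interval for alpha = 0.05, copied from the slide (not computed here).
lower, upper = 58, 110
reject_h0 = not (lower < T < upper)  # on or outside the bounds -> P <= alpha

print(T1, T2, reject_h0)  # 127.5 82.5 True
```

As a sanity check, the two rank sums always add up to N(N + 1)/2; here 127.5 + 82.5 = 210 = 20 × 21 / 2.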

  7. Basic logic of scientific research • To find differences • To find correlations

  8. Contents • Linear correlation • Rank correlation • Simple linear regression

  9. Correlations in medicine • Drinking a glass of red wine per day may decrease your chances of a heart attack. • Taking one aspirin per day may decrease your chances of a stroke or a heart attack. • Eating lots of certain kinds of fish may improve your health and make you smarter. • Pregnant women who smoke tend to have low-birthweight babies. • Taller people tend to weigh more. • Animals with large brains tend to be more intelligent. • The more you study for an exam, the higher the score you are likely to receive.

  10. Model types: relationships between variables • Deterministic model: an equation that allows us to fully determine the value of the dependent variable from the values of the independent variables, e.g. S = R × R. • Probabilistic model: a method used to capture the randomness that is part of a real-life process, e.g. weight (Y, kg) vs. height (X, cm); for 18-year-olds, Y = 0.8X - 69.
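A small simulation can make the contrast between the two model types concrete. This sketch assumes a Gaussian error term with a standard deviation of 5 kg around the slide's line Y = 0.8X - 69; that error model, the seed, and the function names `square_area` and `simulate_weight` are hypothetical illustrations, not part of the lecture.

```python
import random

def square_area(side):
    """Deterministic model S = R * R: the same input always gives the same output."""
    return side * side

def simulate_weight(height_cm, rng, sigma=5.0):
    """Probabilistic model Y = 0.8X - 69 + error; the N(0, sigma^2) error in kg is a hypothetical assumption."""
    return 0.8 * height_cm - 69 + rng.gauss(0, sigma)

rng = random.Random(2024)  # seeded so the run is reproducible

print(square_area(3), square_area(3))  # 9 9

# Repeated draws at the same height scatter around the line's value, 0.8 * 170 - 69 = 67 kg.
weights = [simulate_weight(170, rng) for _ in range(10_000)]
mean_weight = sum(weights) / len(weights)
print(round(min(weights), 1), round(max(weights), 1), round(mean_weight, 1))
```

The deterministic function returns one value per input, while the probabilistic model returns a spread of weights whose average sits near the value given by the equation.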

  11. Correlation & regression • If we are interested only in determining whether a relationship exists, we use correlation analysis. • If we are interested in predicting the value of one variable (the dependent variable) from other variables (the independent variables), we use regression analysis. • Dependent variable: denoted Y • Independent variables: denoted X1, X2, …, Xk

  12. Ex. 1: Height at age 2 and at age 20

  13. Scatter plot [figure: Y = height at age 20 (inch), X = height at age 2 (inch)]

  14. Correlation • Concept • Calculating r • Statistical inference for the population correlation coefficient: hypothesis test

  15. Measure of correlation • Pearson's linear correlation coefficient, denoted by r, measures the strength and direction of the linear association between two variables. • r always lies between -1 and 1 inclusive, i.e. in [-1, 1]. • Population correlation coefficient: ρ • Sample correlation coefficient: r

  16. Different patterns of correlation [figure panels: positive (0 < r < 1); negative (-1 < r < 0); null (r = 0, two panels); completely positive (r = 1); completely negative (r = -1); null (r = 0, two panels)]

  17. How high must a correlation be to be considered meaningful?

  18. Magnitude & direction • The larger the absolute value of the correlation coefficient, the stronger the correlation. • If the sign is positive (+), the two variables vary in the same direction; if the sign is negative (-), they vary in opposite directions.

  19. Calculation of correlation coefficient

  20. Calculation of correlation coefficient
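The slides' formulas did not survive extraction. As a sketch, the usual textbook form of Pearson's coefficient is r = l_XY / sqrt(l_XX · l_YY), where l_XX = Σ(X - X̄)², l_YY = Σ(Y - Ȳ)² and l_XY = Σ(X - X̄)(Y - Ȳ). The function name `pearson_r` and the five data points below are hypothetical toy values, not the height data from Ex. 1.

```python
import math

def pearson_r(x, y):
    """Hypothetical helper: Pearson's r = l_xy / sqrt(l_xx * l_yy) for paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    l_xx = sum((xi - mean_x) ** 2 for xi in x)  # sum of squares of X
    l_yy = sum((yi - mean_y) ** 2 for yi in y)  # sum of squares of Y
    l_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # sum of cross-products
    if l_xx == 0 or l_yy == 0:
        raise ValueError("r is undefined when X or Y has no variation")
    return l_xy / math.sqrt(l_xx * l_yy)

# Hypothetical toy data: l_xx = 10, l_yy = 6, l_xy = 6.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Perfectly linear data give r = 1 (or -1 for a falling line), which matches the "completely positive" and "completely negative" patterns.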

  21. Hypothesis test for ρ • H0: ρ = 0, there is no linear relationship between X and Y; • H1: ρ ≠ 0, there is a linear relationship between X and Y • Test methods: ① t-test ② look up a critical-value table

  22. t-test for ρ • H0: ρ = 0, there is no linear relationship between the two variables; H1: ρ ≠ 0, there is a linear relationship between the two variables; α = 0.05. • ν = 8 - 2 = 6. • Comparing t with the critical value gives P < 0.05, so H0 is rejected and H1 accepted: there is a linear relationship between height at age 2 and adult height.
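The t statistic for H0: ρ = 0 is commonly computed as t = r·sqrt(n - 2) / sqrt(1 - r²) with ν = n - 2. The sketch below assumes that form; the function name `t_for_r` and the toy value r = sqrt(0.6) with n = 5 are hypothetical, and the critical value 3.182 (two-sided, α = 0.05, ν = 3) is taken from a standard t table. For the lecture's ν = 6, the corresponding two-sided critical value is 2.447.

```python
import math

def t_for_r(r, n):
    """Hypothetical helper: t statistic for H0: rho = 0, t = r * sqrt(n - 2) / sqrt(1 - r**2)."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical example: r = sqrt(0.6) from n = 5 paired observations.
t, df = t_for_r(math.sqrt(0.6), 5)
t_crit = 3.182  # two-sided critical value, alpha = 0.05, df = 3 (from a t table)
print(round(t, 4), df, abs(t) >= t_crit)  # 2.1213 3 False
```

With only five points, even r ≈ 0.77 does not reach significance, which is why the sample size matters when judging whether a correlation is meaningful.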

  23. Caution: correlation does not necessarily imply causation. • If X is correlated with Y, there are five possible explanations: • X causes Y • Y causes X • X causes Y and Y causes X • Some third variable Z causes both X and Y • The correlation is a coincidence; there is no causal relationship between X and Y.

  24. Some examples of correlations with implied causation • The more firemen fighting a fire, the bigger the fire is. • Children who sleep with the light on are more likely to develop nearsightedness later in life. • Women who take hormone replacement therapy (HRT) are less likely to have coronary heart disease. • As ice cream sales increase, the rate of drowning deaths increases.

  25. Regression analysis • Correlation analysis tells us how close the relationship between two variables is. • Regression analysis describes the relationship between two variables, i.e. how one changes with the other, and can be used to predict one from the other. • How can we predict adult height from height at age 2?

  26. Why 'regression'? Francis Galton (1822-1911) • The term "regression" comes from the British biologist F. Galton • Like father, like son (Chinese proverb)

  27. Regression towards Mediocrity.

  28. Scatter plot [figure: Y = adult height (inch), X = height at age 2 (inch)]

  29. Definition of variables • Y: dependent variable (response variable, outcome variable) • X: independent variable (explanatory variable, predictor variable) • Ŷ (Y-hat): the average of Y for a given value of X

  30. Regression equation: Ŷ = a + bX • a, the intercept: the value of Ŷ when X = 0; • b, the slope, also called the regression coefficient: the average change in Y (in units of Y) when X increases by 1 unit.

  31. Steps of regression analysis • 1. Draw a scatter plot to check for a linear trend • 2. Estimate the slope and intercept • 3. Draw the regression line • 4. Test the significance of b

  32. Least squares estimation (LSE) • Principle: choose the intercept and slope that minimize the sum of squared differences between the observed Y values and their estimates on the regression line (the residuals); these are the least squares estimates of the regression parameters.
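Minimizing the sum of squared residuals gives the closed-form estimates b = l_XY / l_XX and a = Ȳ - bX̄. A minimal sketch, using hypothetical toy data rather than the lecture's height data; the names `least_squares` and `sse` are illustrative only.

```python
def least_squares(x, y):
    """Hypothetical helper: least squares intercept a and slope b for Y-hat = a + b * X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    l_xx = sum((xi - mean_x) ** 2 for xi in x)
    l_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = l_xy / l_xx          # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

def sse(x, y, a, b):
    """Sum of squared residuals (Y - Y-hat)^2 for the line a + b * X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data, not the lecture's height data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
print(round(a, 4), round(b, 4), round(sse(x, y, a, b), 4))  # 2.2 0.6 2.4
```

Moving either estimate away from these values makes the residual sum of squares larger, which is exactly what "least squares" means.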

  33. Least squares estimation [figure: Y = adult height (inch), X = height at age 2]

  34. Data

  35. Relationship between 2 heights variables

  36. Draw the regression line • Choose two X values that are not too close together and are easy to plot, and calculate their estimates from the equation. • For example, • X = 30: Ŷ = 35.1776 + 0.9286 × 30 = 63.0356 • X = 36: Ŷ = 35.1776 + 0.9286 × 36 = 68.6072

  37. Significance test for β • If the population regression coefficient β = 0, there is no regression relationship. • H0: β = 0, there is no regression relationship between X and Y; • H1: β ≠ 0, there is a regression relationship between X and Y; • α = 0.05.

  38. t-test for the regression coefficient • H0: β = 0, there is no regression relationship between X and Y; H1: β ≠ 0, there is a regression relationship between X and Y; α = 0.05. • Residual standard deviation of Y: the variation in Y that remains after removing the part that can be explained by X.

  39. t-test for the regression coefficient • H0: β = 0, there is no regression relationship between X and Y; H1: β ≠ 0, there is a regression relationship between X and Y; α = 0.05. • ν = 8 - 2 = 6. • From the t critical value, P < 0.01, so H0 is rejected and H1 accepted at the α = 0.05 significance level: there is a regression relationship between the two height variables.
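A sketch of the t-test for b, assuming the usual form t = b / s_b with standard error s_b = s_Y·X / sqrt(l_XX) and residual standard deviation s_Y·X = sqrt(SS_residual / (n - 2)). The toy data and the function name `t_for_slope` are hypothetical.

```python
import math

def t_for_slope(x, y):
    """Hypothetical helper: t = b / s_b for H0: beta = 0, with df = n - 2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    l_xx = sum((xi - mean_x) ** 2 for xi in x)
    l_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = l_xy / l_xx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    df = n - 2
    s_yx = math.sqrt(ss_res / df)  # residual standard deviation of Y
    s_b = s_yx / math.sqrt(l_xx)   # standard error of b
    return b / s_b, df

# Hypothetical toy data, not the lecture's height data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_b, df = t_for_slope(x, y)
print(round(t_b, 4), df)  # 2.1213 3
```

The resulting |t| is compared with the two-sided critical value for ν = n - 2, exactly as in the correlation test.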

  40. t-tests for regression and correlation • For the same dataset, the t-test for b and the t-test for r give identical results (t_b = t_r), so the result of the test of the correlation coefficient can be used for the test of the regression coefficient.
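The equality t_b = t_r can be checked numerically. This sketch (hypothetical helper names, made-up data sets) computes both statistics from the same sums of squares; algebraically, the residual sum of squares l_YY - b·l_XY equals l_YY(1 - r²), which is why the two expressions reduce to the same number.

```python
import math

def sums_of_squares(x, y):
    """Hypothetical helper: return n, l_xx, l_yy, l_xy for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    l_xx = sum((xi - mean_x) ** 2 for xi in x)
    l_yy = sum((yi - mean_y) ** 2 for yi in y)
    l_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return n, l_xx, l_yy, l_xy

def t_correlation(x, y):
    """t for H0: rho = 0, t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    n, l_xx, l_yy, l_xy = sums_of_squares(x, y)
    r = l_xy / math.sqrt(l_xx * l_yy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_regression(x, y):
    """t for H0: beta = 0, t = b / s_b."""
    n, l_xx, l_yy, l_xy = sums_of_squares(x, y)
    b = l_xy / l_xx
    ss_res = l_yy - b * l_xy  # residual sum of squares
    s_b = math.sqrt(ss_res / (n - 2)) / math.sqrt(l_xx)
    return b / s_b

# Hypothetical data sets; any data with 0 < |r| < 1 would do.
datasets = [
    ([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]),
    ([2.0, 3.5, 4.1, 5.0, 6.3, 7.7], [1.1, 2.0, 2.9, 3.2, 4.8, 5.1]),
]
for x, y in datasets:
    print(round(t_correlation(x, y), 6), round(t_regression(x, y), 6))
```

Each printed pair agrees up to floating-point rounding.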

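  41. Linear correlation vs. linear regression • Concept: • correlation (r) shows the degree of covariation between two variables • regression (b) describes the functional relationship between a dependent variable Y and one or more independent variables • Range: • r lies in [-1, 1]; b can take any real value • Unit: • b has units (units of Y per unit of X), while r has no unit • b and r have the same sign (direction)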

  42. Thank you for your attention!
