
Chapter 9




  1. Chapter 9: Linear Regression and Correlation

  2. Bivariate quantitative data. Population: a finite or infinite set of paired variable values. Sample: n observations on the explanatory variable X and the response variable Y, randomly sampled from the population: (X1,Y1), (X2,Y2), …, (Xn,Yn). Objective: study the quantitative relationship between X and Y. Method: regression and correlation. The simplest and most basic forms are linear regression and linear correlation.

  3. Contents: (1) Linear regression; (2) Linear correlation; (3) Rank correlation; (4) Curve fitting.

  4. Historical background: in the 19th century, the British anthropologist F. Galton and the statistician Karl Pearson, whose names are tied to correlation and the coefficient of correlation, found a linear relationship between the height of sons (X, inches) and the height of their fathers (Y, inches).

  5. That is to say, the sons of tall fathers are not certain to be taller still; they may be shorter than their fathers. Likewise, the sons of short fathers are not certain to be shorter still; they may be taller than their fathers. Galton called this tendency of heights to stay stable across generations "regression".

  6. Today "regression" has become the statistical term for the quantitative dependency between variables, and it has given rise to new statistical concepts such as the "regression equation" and the "regression coefficient". Examples: studying the relationship between blood sugar and insulin level; studying the relationship between the age and the weight of children.

  7. 9.1 Linear regression

  8. 9.1.1 Concepts. Objective: study the dependency of the dependent variable Y on the independent variable X. Feature: the relationship is statistical, that is, a relationship between X and the mean of Y. It differs from the exact functional relationship between X and Y used in general mathematics.

  9. Example 9-1: An endemic disease institute measured the urine creatinine concentration (mmol/24h) of eight healthy children; the data are given in Table 9-1. Estimate the regression equation of urine creatinine concentration (Y) on age (X).

  10. Table 9-1. Age X (years) and urine creatinine content Y (mmol/24h) in eight healthy children

  11. Figure 9-1. Scatter plot of urine creatinine content (mmol/24h, vertical axis) versus age (years, horizontal axis X) in eight children

  12. To describe the quantitative dependency of urine creatinine content on age, we take age as the independent variable, denoted X, and urine creatinine concentration as the dependent variable, denoted Y.

  13. Figure 9-1 shows that urine creatinine content Y increases roughly linearly with age X. This differs from a strict linear functional relationship between two variables, because the eight points do not all lie exactly on a line. We therefore call this phenomenon linear regression, and we call the equation a linear regression model to distinguish it from a strict linear equation. Bivariate linear regression is the simplest and most basic form of regression, so it is also called simple regression.

  14. Linear regression model: $\hat{Y} = a + bX$ (9-1), where $\hat{Y}$ is the estimate of the mean of Y corresponding to X.

  15. (1) a is the intercept of the regression line on the Y axis. a > 0: the line crosses the Y axis above the origin. a < 0: the line crosses the Y axis below the origin. a = 0: the line passes through the origin.

  16. (2) b is the regression coefficient, the slope of the line. b > 0: Y increases as X increases. b < 0: Y decreases as X increases. b = 0: there is no linear relationship between the two variables. Statistical meaning of b: when X changes by one unit, Y changes by b units on average.
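The meaning of the slope can be checked numerically. The sketch below is only an illustration: the slope `b = 0.1392` is the sample value quoted in Example 9-3, while the intercept `a = 1.0` and the helper name `predict` are hypothetical, since the transcript does not preserve the fitted intercept.

```python
import math


def predict(x, a, b):
    """Predicted mean of Y at X = x for the line Y_hat = a + b * X."""
    return a + b * x


b = 0.1392  # sample slope quoted in Example 9-3
a = 1.0     # hypothetical intercept, for illustration only

# Raising X by one unit changes the predicted mean of Y by b units,
# whatever the starting value of X and whatever the intercept.
diff = predict(11, a, b) - predict(10, a, b)
print(math.isclose(diff, b))
```

The same difference is obtained for any other pair of X values one unit apart, which is exactly what "b units on average" means.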

  17. Formula (9-1) is the sample regression model. It estimates the linear relationship between the two variables in the population. Based on the scatter plot, we can assume that the mean of the response Y corresponding to each X lies on the line (Figure 9-2).

  18. Figure 9-2. Linear regression concepts

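  19. 9.1.2 Calculating the linear regression equation. Residual: $Y - \hat{Y}$, the vertical distance between an observed point (X, Y) and the regression line. Calculating a and b amounts to finding the line that best represents the trend of the data. Principle: least squares, that is, choose a and b so that the sum of squared residuals $\sum (Y - \hat{Y})^2$ is as small as possible.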
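The least-squares principle leads to closed-form estimates. The transcript does not preserve the slide's own formulas, so the sketch below assumes the standard textbook result, b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and a = Ȳ − bX̄. The function name `fit_line` and the data are hypothetical (they are not the values of Table 9-1).

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y_hat = a + b * X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of X, and sum of cross-products of deviations.
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if l_xx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b = l_xy / l_xx          # regression coefficient (slope)
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data for illustration only (not Table 9-1).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")  # prints "a = 0.05, b = 1.99"
```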


  21. Besides the linear relationship between the two variables shown in the figure, we assume that for each X the population of Y values is normally distributed, that these normal distributions have equal variances, and that the observations are independent. In formula (9-1), $\hat{Y}$ is the sample estimate of the population mean of Y corresponding to X and is also the predicted value from the regression equation, while a and b are the estimates of α and β, respectively.

  22. Example 9-1: An endemic disease institute measured the urine creatinine content (mmol/24h) of eight healthy children; the data are given in Table 9-1. Estimate the regression equation of urine creatinine content (Y) on age (X).

  23. Table 9-1. Age X (years) and urine creatinine content Y (mmol/24h) in healthy children

  24. Steps of the solution. (1) The original data and the scatter plot (Figure 9-1) show a linear tendency between the two variables, so we can proceed with the calculation. (2) Calculate the intermediate quantities needed for a and b.


  26. (4) Calculate the regression coefficient b and the intercept a.

  27. The line always passes through the point $(\bar{X}, \bar{Y})$ and crosses the Y axis at the intercept a. If the axes of the scatter plot do not start at the origin, we can still draw the regression line by connecting $(\bar{X}, \bar{Y})$ with a second point on the line that is far from it, easy to read, and within the range of the independent variable X.
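Why the fitted line passes through the point of means can be seen in one step. The derivation assumes the standard least-squares intercept, $a = \bar{Y} - b\bar{X}$, which the transcript itself does not preserve:

```latex
\hat{Y}\,\big|_{X=\bar{X}}
  = a + b\bar{X}
  = (\bar{Y} - b\bar{X}) + b\bar{X}
  = \bar{Y}
```

So substituting $X = \bar{X}$ into the regression equation always returns $\bar{Y}$, whatever the value of b.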

  28. Figure 9-1. Scatter plot of urine creatinine content (vertical axis) versus age (years, horizontal axis) of the children

  29. 9.1.3 Statistical inference in linear regression

  30. 1. Hypothesis test for the regression equation. We build the sample regression equation not only to describe the relationship between the two variables but also to infer whether a linear regression relationship exists in the population, that is to say, whether β ≠ 0.


  32. If β = 0, there is no linear relationship between X and Y. If the sample b ≠ 0, how large is the difference between b and 0, and could it be due to sampling error alone? We answer this question with ANOVA and the t-test.

  33. 1. ANOVA. To understand the basic idea of ANOVA, we decompose the total sum of squares of Y (the sum of squared deviations from the mean), $\sum (Y - \bar{Y})^2$:


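  35. In Figure 9-4, the deviation of each point from the mean splits into two parts: $Y - \bar{Y} = (\hat{Y} - \bar{Y}) + (Y - \hat{Y})$. It can be proved that the sums of squares split in the same way: $\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$.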

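  36. This can be written as $SS_{\text{total}} = SS_{\text{regression}} + SS_{\text{residual}}$. In the formula: $SS_{\text{regression}} = \sum (\hat{Y} - \bar{Y})^2$ and $SS_{\text{residual}} = \sum (Y - \hat{Y})^2$.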

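  37. The three degrees of freedom are related by $\nu_{\text{total}} = \nu_{\text{regression}} + \nu_{\text{residual}}$, where $\nu_{\text{total}} = n - 1$, $\nu_{\text{regression}} = 1$, and $\nu_{\text{residual}} = n - 2$. To judge whether the contribution of the regression is much larger than the random error, we calculate the F value and test its statistical significance: $F = MS_{\text{regression}} / MS_{\text{residual}}$.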

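  38. In the formula: $MS_{\text{regression}} = SS_{\text{regression}} / \nu_{\text{regression}}$ and $MS_{\text{residual}} = SS_{\text{residual}} / \nu_{\text{residual}}$.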
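The ANOVA decomposition can be traced numerically. The sketch below is an illustration under stated assumptions: it uses the standard least-squares fit, the function name `regression_anova` is hypothetical, and the data are made up (they are not Table 9-1). It returns the three sums of squares and the F value with 1 and n − 2 degrees of freedom.

```python
def regression_anova(xs, ys):
    """Sums of squares and F value for the simple linear regression of ys on xs."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = l_xy / l_xx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]

    ss_total = sum((y - y_bar) ** 2 for y in ys)             # total variation of Y
    ss_reg = sum((f - y_bar) ** 2 for f in fitted)           # explained by regression
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # residual (random error)
    if ss_res == 0:
        raise ValueError("perfect fit: F is undefined")

    df_reg, df_res = 1, n - 2
    f_value = (ss_reg / df_reg) / (ss_res / df_res)
    return {"ss_total": ss_total, "ss_reg": ss_reg, "ss_res": ss_res,
            "df_reg": df_reg, "df_res": df_res, "F": f_value}


# Hypothetical data for illustration only (not Table 9-1).
result = regression_anova([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
# The two parts add up to the total, apart from floating-point rounding.
print(abs(result["ss_total"] - (result["ss_reg"] + result["ss_res"])) < 1e-9)
```

The F value is then compared with the F distribution for (1, n − 2) degrees of freedom, as done in Example 9-2.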

  39. 2. t-test. Question: is β = 0?
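The transcript does not preserve the test statistic. A common textbook form, given here as an assumption rather than as the slide's own formula, divides the sample slope by its standard error $s_b$:

```latex
t = \frac{b - 0}{s_b}, \qquad \nu = n - 2
```

In simple linear regression this statistic satisfies $t^2 = F$, so the t-test and the ANOVA lead to the same conclusion. With b = 0.1392 and the standard error 0.0304 that appears in Example 9-3, this form gives t ≈ 4.58 on 6 degrees of freedom.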

  40. Example 9-2: test the linear regression equation fitted to the data in Table 9-1.

  41. (1) ANOVA

  42. The results are shown in the ANOVA table (Table 9-2). With ν1 = 1 and ν2 = 6, the F distribution table gives P < 0.05. At the α = 0.05 level we reject H0 and accept H1, and conclude that there is a linear relationship between urine creatinine content and age.

  43. (2) t-test. With ν = 6, the t distribution table gives 0.002 < P < 0.005. At the α = 0.05 level we reject H0 and accept H1, and conclude that there is a linear relationship between urine creatinine content and age.

  44. Confidence interval of the population regression coefficient β. The 1-α CI of β is $b \pm t_{\alpha/2,\nu}\, s_b$, where $s_b$ is the standard error of b and ν = n - 2.

  45. Example 9-3: estimate the two-sided 95% CI of the population regression coefficient, given b = 0.1392 from the sample in Example 9-1.

  46. (0.1392 - 2.447 × 0.0304, 0.1392 + 2.447 × 0.0304) = (0.0648, 0.2136)
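The arithmetic of Example 9-3 can be reproduced directly. All three numbers come from the slide; the variable names (`s_b` for the standard error, `t_crit` for the two-sided 5% critical value on 6 degrees of freedom) are labels chosen here for illustration.

```python
b = 0.1392      # sample regression coefficient
s_b = 0.0304    # standard error of b, as used on the slide
t_crit = 2.447  # critical value used on the slide (two-sided 0.05, 6 degrees of freedom)

margin = t_crit * s_b
lower, upper = b - margin, b + margin
print(f"({lower:.4f}, {upper:.4f})")  # prints "(0.0648, 0.2136)"
```

The interval does not contain 0, which agrees with the rejection of H0: β = 0 in Example 9-2.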

  47. (3) Estimation and prediction. 1. Confidence interval of the population mean of Y. When X = X0, the standard error of the estimated mean is given by formula (9-14), and the 1-α CI of the population mean of Y is given by formula (9-15).

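  48. 2. Prediction interval for an individual value of Y. When X = X0, the standard deviation of the predicted individual value is given by formula (9-16), and the 1-α prediction interval of Y is given by formula (9-17).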
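Formulas (9-14) to (9-17) are not preserved in the transcript. The sketch below therefore assumes the standard textbook forms: the standard error of the estimated mean is s·sqrt(1/n + (X0 − X̄)²/Σ(X − X̄)²), and the standard deviation for an individual prediction adds 1 under the square root, where s is the residual standard deviation. The function name `regression_intervals`, the data, and the choice to pass the critical t value as a parameter are all illustrative.

```python
import math


def regression_intervals(xs, ys, x0, t_crit):
    """Return (y0_hat, CI for the mean of Y at x0, prediction interval for Y at x0)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    if l_xx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = l_xy / l_xx
    a = y_bar - b * x_bar

    # Residual standard deviation with n - 2 degrees of freedom.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(ss_res / (n - 2))

    y0_hat = a + b * x0
    spread = 1 / n + (x0 - x_bar) ** 2 / l_xx
    se_mean = s_yx * math.sqrt(spread)       # assumed form of (9-14)
    sd_pred = s_yx * math.sqrt(1 + spread)   # assumed form of (9-16)

    mean_ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
    pred_int = (y0_hat - t_crit * sd_pred, y0_hat + t_crit * sd_pred)
    return y0_hat, mean_ci, pred_int


# Hypothetical data (not Table 9-1); 3.182 is the two-sided 5% critical
# value of t with n - 2 = 3 degrees of freedom.
y0_hat, mean_ci, pred_int = regression_intervals(
    [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], x0=3, t_crit=3.182)
print(mean_ci, pred_int)
```

Both intervals are centered on the same predicted value, and the prediction interval is always the wider of the two because it also covers the scatter of individual values around the mean. Both are narrowest at X0 = X̄.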


  50. Example 9-4: when X0 = 12, calculate the 95% CI of the population mean of Y and the 95% prediction interval of Y, using the linear regression equation from Example 9-1.
