
Chapter 10

Chapter 10. Linear regression and correlation. Relationship between variables: age and blood pressure; nutrient level and growth of cells; height and weight.

austin

Presentation Transcript


  1. Chapter 10 Linear regression and correlation Relationship between variables

  2. Relationship between variables • Age and blood pressure • Nutrient level and growth of cells • Height and weight • Goal: to determine the strength of the relationship between two variables and to test whether it is statistically significant

  3. Two-sample t test vs. regression

  4. Difference, variation and association analysis • Difference: Group (0, 1), a two-level category • Variation: Group (A, B, C), a multi-level category • Association: a quantity

  5. Sir Francis Galton (16 February 1822 – 17 January 1911) Polymath: Meteorology (the anti-cyclone and the first popular weather maps); Psychology (synaesthesia); Biology (the nature and mechanism of heredity); Eugenicist; Criminology (fingerprints); Statistics (regression and correlation).

  6. Related but different • Regression analysis: one of the variables (e.g., blood pressure) is dependent on (caused by) the other, which is fixed and measured without error (e.g., age). • Correlation analysis: both variables are experimental and measured with error (e.g., height and weight).

  7. Regression analysis • The experimental data: repeated experiments

  8. Correlation analysis • The experimental data: more individuals measured

  9. Regression analysis • Equation for a straight line: $\hat{Y} = a + bX$ • If you know a and b, you can predict Y from X; this is the goal of regression.

  10. Regression vs correlation

  11. Concepts • Simple linear regression • Simple linear correlation • Correlation analysis based on ranks

  12. Example • Consider the growth rate of a yeast colony and the nutrient level. • If you increase the nutrient level, the growth rate increases. • Growth rate is dependent on nutrient level, but nutrient level is NOT dependent on growth rate.

  13. Variables in regression • Growth rate is called the dependent variable and is given the symbol Y. • Nutrient level (the causal factor) is called the independent variable and is given the symbol X. (Figure: growth rate Y plotted against nutrient level X.)

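  14. Simple linear model assumptions: $Y_i = \alpha + \beta X_i + \varepsilon_i$ • The X’s are fixed and measured without error • α, β: constant real numbers, β ≠ 0 • The errors $\varepsilon_i$ are independent and identically distributed • Homoscedastic: the variance of Y about the line, $\sigma^2$, is the same at every X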
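A minimal sketch of the model above, assuming normally distributed errors (the slide states only that they are i.i.d. and homoscedastic; normality is what the later t and F tests rely on). The function name `simulate_simple_linear` and the parameter values are hypothetical, chosen only for illustration.

```python
import random


def simulate_simple_linear(xs, alpha, beta, sigma, seed=0):
    """Draw one Y for each fixed X from Y = alpha + beta * X + eps.

    Assumption (illustration only): eps ~ N(0, sigma^2), drawn independently.
    Using the same sigma at every X is the homoscedasticity assumption.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical values: X is fixed by the experimenter (e.g., day number).
days = [1, 2, 3, 4, 5, 6]
areas = simulate_simple_linear(days, alpha=1.0, beta=0.5, sigma=0.2)
print([round(y, 2) for y in areas])
```

X is passed in as a plain list and never perturbed, which mirrors the assumption that the X’s are fixed and measured without error; all randomness enters through the error term.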

  15. General steps for simple linear regression analysis • Graphing the data • Fitting the best straight line • Testing whether the linear relationship is statistically significant or not

  16. Graphing the data and fitting the best straight line • Panels: no relationship; relationship but not a straight line; negative linear relationship; positive linear relationship • Several lines could be drawn through the same data. Which one? We need a criterion.

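  17. Example: area of a yeast colony (y) on successive days (x). The best-fit line has intercept a (the value of y at x = 0) and slope b = H/L, the rise H over a horizontal run L. (Figure: area (y) vs. time in days (x), both axes starting at 0.)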

  18. Problem: how to estimate a and b? (Figure: area (y) vs. time in days (x).)

  19. Method: fitting to the data • Total sum of squares for Y: $SS_{Total} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

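  20. Method: fitting to the data • a and b should minimize the residual error • Residual error sum of squares: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - a - bX_i)^2$ (Figure: area (y) vs. time in days (x).)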
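To make the two sums of squares concrete, here is a small sketch on a made-up data set (x = 1..5); the data and the two candidate lines are hypothetical. $SS_{Total}$ depends only on Y, while SSE depends on the chosen a and b, so comparing SSE across candidate lines is the criterion for "best fit".

```python
def ss_total(ys):
    """Total sum of squares for Y: sum of (Y_i - Ybar)^2."""
    ybar = sum(ys) / len(ys)
    return sum((y - ybar) ** 2 for y in ys)


def sse(xs, ys, a, b):
    """Residual (error) sum of squares for the candidate line Y-hat = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up for illustration: day (x) and colony area (y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(ss_total(ys))            # 6.0
print(sse(xs, ys, 1.0, 1.0))   # 4.0
print(sse(xs, ys, 2.2, 0.6))   # about 2.4: this line fits better
```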

  21. Method

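  22. $\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$, because the cross-product term $2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) = 0$ for the least-squares line • That is, sum of squares total ($SS_{Total}$) = sum of squares due to regression ($SSR$) + sum of squares residual or error ($SSE$) • Since $SS_{Total}$ is fixed by the data, the best line maximizes SSR and minimizes SSE.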
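The partition can be checked numerically. This sketch fits the least-squares line to a small made-up data set (hypothetical values) and confirms that $SS_{Total} = SSR + SSE$ and that the cross-product term vanishes; the identity relies on the line being the least-squares fit, so it would not hold for an arbitrary a and b.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def partition(xs, ys):
    """Return SSTotal, SSR, SSE and the cross-product term for the LS line."""
    a, b = fit_least_squares(xs, ys)
    ybar = sum(ys) / len(ys)
    fitted = [a + b * x for x in xs]
    ss_total = sum((y - ybar) ** 2 for y in ys)
    ssr = sum((f - ybar) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    cross = sum((f - ybar) * (y - f) for y, f in zip(ys, fitted))
    return ss_total, ssr, sse, cross


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
ss_total, ssr, sse, cross = partition(xs, ys)
print(round(ss_total, 6), round(ssr, 6), round(sse, 6), round(cross, 6))
```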

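  23. Least-squares regression equation • Minimize SSE by setting its partial derivatives to zero: $\frac{\partial SSE}{\partial a} = -2\sum (Y_i - a - bX_i) = 0$ and $\frac{\partial SSE}{\partial b} = -2\sum X_i (Y_i - a - bX_i) = 0$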

  24. Least-squares regression equation • The normal equations: $\sum Y_i = na + b\sum X_i$ and $\sum X_i Y_i = a\sum X_i + b\sum X_i^2$

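  25. Result • Least-squares regression line: $b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$, $a = \bar{Y} - b\bar{X}$, so $\hat{Y} = a + bX = \bar{Y} + b(X - \bar{X})$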
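A minimal sketch of the closed-form estimates, on a small made-up data set (hypothetical values). It also plugs the estimates back into both normal equations to show they are satisfied, and uses the fitted line to predict Y at an X inside the observed range. The helper names are illustrative, not from the source.

```python
def least_squares_line(xs, ys):
    """Closed-form least-squares estimates: b = Sxy / Sxx, a = Ybar - b * Xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def normal_equation_residuals(xs, ys, a, b):
    """Left minus right side of both normal equations; both are ~0 at the LS fit."""
    n = len(xs)
    eq1 = sum(ys) - (n * a + b * sum(xs))
    eq2 = sum(x * y for x, y in zip(xs, ys)) - (a * sum(xs) + b * sum(x * x for x in xs))
    return eq1, eq2


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(round(a, 3), round(b, 3))   # 2.2 0.6
print(round(a + b * 3.5, 3))      # predicted Y at X = 3.5: 4.3
```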

  26. Simple Linear Regression Analysis • A global test for regression (ANOVA) • A test for regression coefficient (Student t test)

  27. Hypothesis • H0: The variation in Y is not explained by a linear model, i.e., β=0 • Ha: A significant portion of the variation in Y is explained by a linear model i.e., β≠0

  28. Partitioning the Sum of Squares

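  29. The ANOVA table for a regression analysis • Regression: SS = SSR, df = 1, MS = $MSR = SSR/1$, E(MS) = $\sigma^2 + \beta^2 \sum (X_i - \bar{X})^2$, F = $MSR/MSE$, c.v.: see Table C.7 • Error: SS = SSE, df = n − 2, MS = $MSE = SSE/(n-2)$, E(MS) = $\sigma^2$ • Total: SS = $SS_{Total}$, df = n − 1 • Test statistic: $F = MSR/MSE$ with 1 and n − 2 df. If H0 is true (β = 0), both mean squares estimate $\sigma^2$, so F ≈ 1; if Ha is true (β ≠ 0), $E(MSR) > \sigma^2$ and F tends to exceed 1.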
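A sketch of how the entries of this table can be computed, assuming a small made-up data set (hypothetical values). It uses the shortcut $SSR = b \sum (X_i - \bar{X})(Y_i - \bar{Y})$, valid for the least-squares line, and obtains SSE by subtraction from $SS_{Total}$.

```python
def regression_anova(xs, ys):
    """Entries of the regression ANOVA table (SS, df, MS) and the F statistic."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    ss_total = sum((y - ybar) ** 2 for y in ys)
    ssr = b * sxy                  # SSR = b * Sxy = b^2 * Sxx for the LS line
    sse = ss_total - ssr           # from SSTotal = SSR + SSE
    msr = ssr / 1                  # regression has 1 df
    mse = sse / (n - 2)            # error has n - 2 df
    return {
        "Regression": (ssr, 1, msr),
        "Error": (sse, n - 2, mse),
        "Total": (ss_total, n - 1),
        "F": msr / mse,
    }


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
table = regression_anova(xs, ys)
for source in ("Regression", "Error", "Total"):
    print(source, [round(v, 3) for v in table[source]])
print("F =", round(table["F"], 3))   # F = 4.5
```

For these made-up numbers F = 4.5 on 1 and 3 df, below the 5% critical value of F(1, 3) (about 10.13), so H0: β = 0 would not be rejected.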

  30. Coefficient of determination, $r^2 = SSR / SS_{Total}$: a measure of the proportion of the variability in Y that is explained by its dependence on X.
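A short sketch of $r^2 = SSR / SS_{Total}$, assuming a small made-up data set (hypothetical values). The fitted values come from the least-squares line, and SSR is the spread of those fitted values around $\bar{Y}$.

```python
def r_squared(xs, ys):
    """Coefficient of determination r^2 = SSR / SSTotal for the LS line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    ss_total = sum((y - ybar) ** 2 for y in ys)
    ssr = sum((a + b * x - ybar) ** 2 for x in xs)
    return ssr / ss_total


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
print(round(r_squared(xs, ys), 3))   # 0.6: 60% of the variability in Y is explained by X
```

Points lying exactly on a line, such as X = 1, 2, 3 with Y = 3, 5, 7, give r² = 1.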

  31. Simple Linear Regression Analysis • A global test for regression (ANOVA) • A test for regression coefficient (Student t test)

  32. Hypothesis • H0: The variation in Y is not explained by a linear model, i.e., β=0 • Ha: A significant portion of the variation in Y is explained by a linear model i.e., β≠0

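  33. t test statistic • Variance of b: $\sigma_b^2 = \sigma^2 / \sum (X_i - \bar{X})^2$ • Its estimate: $s_b^2 = MSE / \sum (X_i - \bar{X})^2$ • Standard error of b: $s_b = \sqrt{MSE / \sum (X_i - \bar{X})^2}$ • Test statistic: $t = (b - 0)/s_b$, with n − 2 df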
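A minimal sketch of the slope t test, assuming a small made-up data set (hypothetical values). MSE is used as the estimate of $\sigma^2$; the p-value is not computed here, and t is meant to be compared with a tabled critical value for n − 2 df.

```python
import math


def slope_t_test(xs, ys):
    """Return b, its standard error s_b, t = b / s_b, and the df (n - 2)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)              # estimate of sigma^2
    s_b = math.sqrt(mse / sxx)       # standard error of b
    return b, s_b, b / s_b, n - 2


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
b, s_b, t, df = slope_t_test(xs, ys)
print(round(b, 3), round(s_b, 4), round(t, 3), df)   # 0.6 0.2828 2.121 3
```

Here t ≈ 2.121 is below $t_{0.975}$ with 3 df (3.182), so for these made-up numbers β = 0 is not rejected at the 5% level.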

  34. $F_{(1,\,n-2)} = t^2_{(n-2)}$
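The identity follows from $t^2 = b^2 / (MSE/\sum (X_i - \bar{X})^2) = b^2 \sum (X_i - \bar{X})^2 / MSE = SSR/MSE = F$. This sketch computes both statistics independently on a small made-up data set (hypothetical values) and compares them.

```python
import math


def f_and_t(xs, ys):
    """Compute the regression F statistic and the slope t statistic separately."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    ss_total = sum((y - ybar) ** 2 for y in ys)
    ssr = b * sxy
    mse = (ss_total - ssr) / (n - 2)
    f = (ssr / 1) / mse               # ANOVA route
    t = b / math.sqrt(mse / sxx)      # t-test route
    return f, t


f, t = f_and_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # hypothetical data
print(round(f, 3), round(t ** 2, 3))   # 4.5 4.5
```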

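  35. Confidence interval for β • $(b - \beta)/s_b$, where $s_b$ is the standard error of b, follows Student’s t distribution with n − 2 df • 95% confidence interval: $b \pm t_{0.975,\,(n-2)}\, s_b$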
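A sketch of the interval $b \pm t\, s_b$ on a small made-up data set (hypothetical values). The critical value is passed in rather than computed, because the standard library has no t quantile function; 3.182 is $t_{0.975}$ with 3 df, from a t table.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """b +/- t_crit * s_b; t_crit must be looked up for n - 2 df (not computed here)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    mse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    s_b = math.sqrt(mse / sxx)
    return b - t_crit * s_b, b + t_crit * s_b


xs = [1, 2, 3, 4, 5]   # hypothetical data; n - 2 = 3 df
ys = [2, 4, 5, 4, 5]
low, high = slope_confidence_interval(xs, ys, t_crit=3.182)
print(round(low, 3), round(high, 3))   # -0.3 1.5
```

The interval contains 0, which agrees with not rejecting H0: β = 0 at the 5% level for these made-up numbers.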

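  36. Sampling error: confidence interval for $\mu_{Y|X}$ • Since $\hat{Y} = \bar{Y} + b(X - \bar{X})$ • And $\bar{Y}$ and b are independent, with variances $\sigma^2/n$ and $\sigma^2/\sum (X_i - \bar{X})^2$ • Standard error of $\hat{Y}$ is: $s_{\hat{Y}} = \sqrt{MSE\left[\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]}$

  37. Confidence interval for $\mu_{Y|X}$ • $(\hat{Y} - \mu_{Y|X})/s_{\hat{Y}}$ follows Student’s t distribution with n − 2 df • 95% confidence interval: $L_1 = \hat{Y} - t_{0.975,\,(n-2)}\, s_{\hat{Y}}$, $L_2 = \hat{Y} + t_{0.975,\,(n-2)}\, s_{\hat{Y}}$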
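A sketch of the confidence limits $L_1, L_2$ for the mean response at a chosen X, on a small made-up data set (hypothetical values). As in the slope interval, the t critical value (3.182, $t_{0.975}$ with 3 df) is supplied from a table. Evaluating at $\bar{X}$ and at a point farther away shows the interval widening as X moves from $\bar{X}$.

```python
import math


def mean_response_ci(xs, ys, x0, t_crit):
    """Confidence limits (L1, L2) for mu_{Y|X} at X = x0.

    s_Yhat = sqrt(MSE * (1/n + (x0 - Xbar)^2 / Sxx)).
    t_crit must be looked up for n - 2 df; it is not computed here.
    """
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    mse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    y_hat = a + b * x0
    s_y_hat = math.sqrt(mse * (1 / n + (x0 - xbar) ** 2 / sxx))
    return y_hat - t_crit * s_y_hat, y_hat + t_crit * s_y_hat


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
for x0 in (3, 5):      # X-bar = 3, and a point away from it
    low, high = mean_response_ci(xs, ys, x0, t_crit=3.182)   # t_0.975 with 3 df
    print(x0, round(low, 3), round(high, 3))
```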

  39. Understanding regression analysis through an example

  40. Example 1: Yield of tomato varieties • Summarized data • Totals

  41. A. Student’s t test • There is no difference between the two variances: accept H0 • There is a difference between the two means: reject H0

  42. B. ANOVA Conclusion: Reject H0

  43. Compare ANOVA with the t test • t was 2.16 with 18 df, 0.05 > P > 0.01; F was 4.67 with 1 and 18 df, 0.05 > P > 0.01 • In fact, F = t² (i.e., 4.67 ≈ 2.16²). Why? Because with t we are dealing with differences, while with F we are dealing with variances (differences squared).
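The tomato data are not reproduced in this transcript, so this sketch uses two hypothetical groups of made-up yields to illustrate the same relationship. It computes the pooled two-sample t, the one-way ANOVA F, and the slope t from regressing Y on a 0/1 group indicator X; for two groups F = t², and the dummy-variable regression reproduces the two-sample t, which is one way to see how the t test, ANOVA and regression are connected.

```python
import math


def pooled_t(g1, g2):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
    sp2 = ss / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def one_way_f(groups):
    """One-way ANOVA F = MS_between / MS_within."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def dummy_slope_t(g1, g2):
    """t for the slope when Y is regressed on a 0/1 group indicator X."""
    xs = [0] * len(g1) + [1] * len(g2)
    ys = list(g1) + list(g2)
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # equals mean(g2) - mean(g1)
    a = ybar - b * xbar
    mse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return b / math.sqrt(mse / sxx)


group_a = [1.0, 2.0, 3.0]   # hypothetical yields, made up for illustration
group_b = [4.0, 5.0, 6.0]
t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
print(round(t, 3), round(t ** 2, 3), round(f, 3), round(dummy_slope_t(group_a, group_b), 3))
# 3.674 13.5 13.5 3.674
```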

  44. C. Regression

  45. Calculations

  46. Estimation • Regression coefficient • Intercept at x • Intercept • Regression equation

  47. Testing the significance: ANOVA • H0: no linear relation between y and x, i.e., β = 0 • Ha: the variation in y is linearly explained by the variation in x, i.e., β ≠ 0
