
Introduction to Biostatistics: Regression and Correlation Fundamentals

This course provides an overview of regression and correlation techniques in biostatistics, covering linear and non-linear relationships between variables. Learn to assess assumptions, estimate lines, and understand correlation coefficients. Practical examples using Python are included to enhance learning.

alamar

Presentation Transcript


  1. Introduction to Biostatistics and Bioinformatics: Regression and Correlation

  2. Learning Objectives • Regression – estimation of the relationship between variables • Linear regression • Assessing the assumptions • Non-linear regression

  3. Learning Objectives • Regression – estimation of the relationship between variables • Linear regression • Assessing the assumptions • Non-linear regression • Correlation • Correlation coefficient quantifies the association strength • Sensitivity to the distribution

  4. Relationships • Relationship • No Relationship

  5. Relationships • Linear Relationships • Non-Linear Relationship

  6. Relationships • Linear, Strong • Linear, Weak

  7. Linear Regression • Linear, Strong • Linear, Weak • Non-Linear

  8. Linear Regression - Residuals • Linear, Strong • Linear, Weak • Non-Linear (each panel is shown with its residual plot)

  9. Linear Regression Model (equation labels: Dependent Variable, Intercept, Slope, Independent Variable, Random Error; the intercept and slope terms form the linear component, the error term forms the random error component)
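
The labels on this slide appear to annotate the standard simple linear regression equation. The equation itself did not survive in the transcript, so the following is a hedged reconstruction; the symbols $\beta_0$, $\beta_1$, and $\varepsilon_i$ are assumed names, not taken from the slide:

```latex
Y_i = \underbrace{\beta_0 + \beta_1 X_i}_{\text{linear component}}
      + \underbrace{\varepsilon_i}_{\text{random error component}}
```

Here $Y_i$ is the dependent variable, $X_i$ the independent variable, $\beta_0$ the intercept, $\beta_1$ the slope, and $\varepsilon_i$ the random error.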

  10. Linear Regression Assumptions The relationship between the variables is linear.

  11. Linear Regression Assumptions The relationship between the variables is linear. Errors are independent, normally distributed with mean zero and constant variance.

  12. Linear Regression Assumptions • Linear • Non-Linear (each panel is shown with its residual plot)

  13. Linear Regression Assumptions • Constant Variance • Variable Variance (each panel is shown with its residual plot)
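
The assumptions on slides 10 to 13 can be checked numerically as well as by eye. The sketch below is one possible approach, not the lecture's own method: the helper name `residual_diagnostics`, the split-in-half spread comparison, and the simulated data are all assumptions made for illustration.

```python
import numpy as np
import scipy.stats as stats


def residual_diagnostics(x, y):
    """Fit a least-squares line and summarize its residuals (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = stats.linregress(x, y)
    residuals = y - (fit.slope * x + fit.intercept)

    # Compare the residual spread in the lower and upper halves of x;
    # a ratio far from 1 hints at non-constant variance.
    order = np.argsort(x)
    half = len(x) // 2
    low, high = residuals[order[:half]], residuals[order[half:]]
    spread_ratio = np.std(high, ddof=1) / np.std(low, ddof=1)

    # Shapiro-Wilk test: a small p-value suggests non-normal residuals.
    _, shapiro_p = stats.shapiro(residuals)
    return {"mean": residuals.mean(),
            "spread_ratio": spread_ratio,
            "shapiro_p": shapiro_p}


rng = np.random.default_rng(0)
x = np.linspace(0.1, 1, 200)
y_constant = x + 0.1 * rng.normal(size=x.size)      # constant variance
y_variable = x + 0.3 * x * rng.normal(size=x.size)  # noise grows with x

diag_constant = residual_diagnostics(x, y_constant)
diag_variable = residual_diagnostics(x, y_variable)
print(diag_constant)
print(diag_variable)
```

A spread ratio near 1 is consistent with constant variance; the second data set, where the noise scales with x, should give a clearly larger ratio.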

  14. Linear Regression Model (equation labels: Dependent Variable, Intercept, Slope, Independent Variable, Random Error; the intercept and slope terms form the linear component, the error term forms the random error component)

  15. Linear Regression – Estimating the Line (equation labels: Estimated Value, Estimated Intercept, Estimated Slope, Independent Variable)

  16. Least Squares Method Given measurements (Xi, Yi), i = 1..N, find the slope and intercept that minimize the sum of the squares of the residuals.

  17. Least Squares Method Given measurements (Xi, Yi), i = 1..N, find the slope and intercept that minimize the sum of the squares of the residuals.

  18. Least Squares Method Given measurements (Xi, Yi), i = 1..N, find the slope and intercept that minimize the sum of the squares of the residuals.

  19. Least Squares Method Given measurements (Xi, Yi), i = 1..N, find the slope and intercept that minimize the sum of the squares of the residuals.
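
For a single predictor, the least-squares slope and intercept have a well-known closed form. The sketch below computes them directly and compares the result with `scipy.stats.linregress`; the function name `least_squares_line` and the five data points are hypothetical, chosen only for illustration.

```python
import numpy as np
import scipy.stats as stats


def least_squares_line(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line always passes through (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical measurements
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = least_squares_line(x, y)
reference = stats.linregress(x, y)
print(f"closed form: slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"linregress:  slope = {reference.slope:.3f}, intercept = {reference.intercept:.3f}")
```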

  20. Linear Regression in Python

      import scipy.stats as stats

      # x and y are 1-D arrays of paired measurements
      slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

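  21. Linear Regression Example • Linear, Strong

      import numpy as np
      import matplotlib.pyplot as plt
      import scipy.stats as stats

      points = 50  # assumed sample size; the slide does not define `points`

      # Strong linear relationship: y = x plus small Gaussian noise
      x = np.linspace(-1, 1, points)
      y = x + 0.1 * np.random.normal(size=points)

      # Fit the least-squares line
      slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
      y_line = slope * x + intercept

      # Scatter plot of the data with the fitted line
      fig, ax1 = plt.subplots(1, figsize=(4, 4))
      ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
      ax1.set_xlim([-1.5, 1.5])
      ax1.set_ylim([-1.5, 1.5])
      ax1.plot(x, y_line, color='red', lw=2)
      fig.savefig('linear.png')

      # Residuals: observed minus fitted values
      fig, ax1 = plt.subplots(1, figsize=(4, 4))
      ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
      ax1.set_xlim([-1.5, 1.5])
      ax1.set_ylim([-1.5, 1.5])
      fig.savefig('linear-residuals.png')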

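  22. Linear Regression Example • Linear, Weak

      import numpy as np
      import matplotlib.pyplot as plt
      import scipy.stats as stats

      points = 50  # assumed sample size; the slide does not define `points`

      # Weak linear relationship: y = x plus larger Gaussian noise
      x = np.linspace(-1, 1, points)
      y = x + 0.4 * np.random.normal(size=points)

      # Fit the least-squares line
      slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
      y_line = slope * x + intercept

      # Scatter plot of the data with the fitted line
      fig, ax1 = plt.subplots(1, figsize=(4, 4))
      ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
      ax1.set_xlim([-1.5, 1.5])
      ax1.set_ylim([-1.5, 1.5])
      ax1.plot(x, y_line, color='red', lw=2)
      fig.savefig('linear-weak.png')

      # Residuals: observed minus fitted values
      fig, ax1 = plt.subplots(1, figsize=(4, 4))
      ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
      ax1.set_xlim([-1.5, 1.5])
      ax1.set_ylim([-1.5, 1.5])
      fig.savefig('linear-weak-residuals.png')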

  23. Linear Regression Example • Outlier
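
The slide's figure is not in the transcript, so the sketch below uses simulated data to illustrate how a single outlier can pull a least-squares fit. The data, the seed, and the outlier position (1.0, -3.0) are assumptions made for illustration.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
y = x + 0.1 * rng.normal(size=x.size)

# Append one hypothetical outlier far below the trend.
x_out = np.append(x, 1.0)
y_out = np.append(y, -3.0)

clean = stats.linregress(x, y)
with_outlier = stats.linregress(x_out, y_out)

print(f"without outlier: slope = {clean.slope:.2f}, r = {clean.rvalue:.2f}")
print(f"with outlier:    slope = {with_outlier.slope:.2f}, r = {with_outlier.rvalue:.2f}")
```

Because least squares penalizes squared residuals, one distant point carries disproportionate weight; both the slope and the correlation coefficient drop noticeably here.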

  24. Regression – Non-linear data Solution 1: Transformation Solution 2: Non-linear Regression
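
Both solutions can be sketched on simulated exponential data. This is an illustration under stated assumptions, not the lecture's own example: the model y = a·exp(b·x), the true values a = 2.0 and b = 1.5, and the multiplicative noise are all made up for the demonstration.

```python
import numpy as np
import scipy.stats as stats
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
x = np.linspace(0, 2, 40)
# Hypothetical data: y = 2.0 * exp(1.5 * x) with multiplicative noise
y = 2.0 * np.exp(1.5 * x) * np.exp(0.05 * rng.normal(size=x.size))

# Solution 1: transformation. log(y) = log(a) + b * x is linear in x.
log_fit = stats.linregress(x, np.log(y))
a_transform, b_transform = np.exp(log_fit.intercept), log_fit.slope


# Solution 2: non-linear regression. Fit the exponential model directly.
def exponential(x, a, b):
    return a * np.exp(b * x)


(a_direct, b_direct), _ = curve_fit(exponential, x, y, p0=(1.0, 1.0))

print(f"transformation: a = {a_transform:.2f}, b = {b_transform:.2f}")
print(f"non-linear fit: a = {a_direct:.2f}, b = {b_direct:.2f}")
```

The two approaches weight the data differently (the log transform treats relative errors equally, while the direct fit emphasizes large y values), so their estimates generally agree only approximately.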

  25. Correlation Coefficient • A measure of the correlation between the two variables • Quantifies the association strength • Pearson correlation coefficient:
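
The formula that followed the colon on this slide is missing from the transcript. The sketch below uses the standard definition of the Pearson coefficient and checks it against `scipy.stats.pearsonr`; the function name `pearson_r` and the data are hypothetical.

```python
import numpy as np
import scipy.stats as stats


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    # r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


# Hypothetical paired measurements
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

r_manual = pearson_r(x, y)
r_scipy, p_value = stats.pearsonr(x, y)
print(f"manual r = {r_manual:.4f}, scipy r = {r_scipy:.4f}")
```

The coefficient ranges from -1 to 1, and flipping the sign of one variable flips the sign of r.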

  26. Correlation Coefficient

  27. Correlation Coefficient

  28. Correlation Coefficient

  29. Correlation Coefficient

  30. Correlation Coefficient

  31. Correlation Coefficient Source: Wikipedia

  32. Coefficient of Variation (equation labels: Sample Mean, Variance, Coefficient of Variation (CV))
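
The slide's equations are not in the transcript. A minimal sketch, assuming the usual definition of the CV as the sample standard deviation divided by the sample mean, with hypothetical data:

```python
import numpy as np


def coefficient_of_variation(values):
    """CV = sample standard deviation / sample mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    variance = values.var(ddof=1)  # sample variance, N - 1 in the denominator
    return np.sqrt(variance) / mean


# Hypothetical replicate measurements
measurements = np.array([9.8, 10.1, 10.0, 10.3, 9.8])
cv = coefficient_of_variation(measurements)
print(f"CV = {cv:.4f}")
```

The CV is dimensionless, which makes it convenient for comparing the spread of measurements on different scales; it is only meaningful when the mean is well away from zero.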

  33. Correlation Coefficient and CV Uniform distribution

  34. Correlation Coefficient and CV Uniform distribution Normal distribution Lognormal distribution

  35. Correlation Coefficient - Outliers • Outlier

  36. Correlation Coefficient – Non-linear • Solutions: • Transformation • Rank correlation (Spearman, r=0.93)
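
Both solutions can be compared on simulated data. The data set below is an assumption made for illustration, so its coefficients will not match the r = 0.93 quoted on the slide.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(3)
x = np.linspace(0, 5, 50)
# Hypothetical monotonic but strongly non-linear relationship
y = np.exp(x) * np.exp(0.2 * rng.normal(size=x.size))

pearson_raw, _ = stats.pearsonr(x, y)          # untransformed data
pearson_log, _ = stats.pearsonr(x, np.log(y))  # solution 1: transformation
spearman, _ = stats.spearmanr(x, y)            # solution 2: rank correlation

print(f"Pearson (raw):   {pearson_raw:.2f}")
print(f"Pearson (log y): {pearson_log:.2f}")
print(f"Spearman:        {spearman:.2f}")
```

Spearman's coefficient depends only on the ranks, so any monotonic transformation of either variable leaves it unchanged.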

  37. Correlation Coefficient and p-value Hypothesis: Is there a correlation? (each panel is labeled with its r and p values)
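
`scipy.stats.pearsonr` returns the p-value for the null hypothesis of no correlation alongside r. The sketch below shows how the same weak underlying relationship yields very different p-values at different sample sizes; the helper `correlation_test`, the noise level, and the seed are assumptions made for illustration.

```python
import numpy as np
import scipy.stats as stats


def correlation_test(n, noise=3.0, seed=0):
    """Simulate n weakly correlated pairs; return Pearson r and its p-value."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = x + noise * rng.normal(size=n)
    r, p = stats.pearsonr(x, y)
    return r, p


for n in (10, 100, 1000):
    r, p = correlation_test(n)
    print(f"n = {n:4d}: r = {r:.2f}, p = {p:.3g}")
```

A small p-value indicates that a correlation is unlikely to be zero, not that it is strong: with enough data, even a weak correlation becomes statistically significant.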

  38. Application: Analytical Measurements (plot of measured concentration against theoretical concentration)

  39. A Few Characteristics of Analytical Measurements • Accuracy: Closeness of agreement between a test result and an accepted reference value. • Precision: Closeness of agreement between independent test results. • Robustness: Test precision given small, deliberate changes in test conditions (preanalytic delays, variations in storage temperature). • Lower limit of detection: The lowest amount of analyte that is statistically distinguishable from background or a negative control. • Limit of quantification: Lowest and highest concentrations of analyte that can be quantitatively determined with suitable precision and accuracy. • Linearity: The ability of the test to return values that are directly proportional to the concentration of the analyte in the sample.
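
Some of these characteristics can be estimated from replicate measurements. The sketch below is illustrative only: the function names, the data, and the "mean of blanks plus three standard deviations" rule for the detection limit are assumptions (a common convention, not something stated on the slide).

```python
import numpy as np


def accuracy_and_precision(replicates, reference):
    """Return (bias, CV): bias reflects accuracy, CV reflects precision."""
    replicates = np.asarray(replicates, dtype=float)
    bias = replicates.mean() - reference              # distance from the reference value
    cv = replicates.std(ddof=1) / replicates.mean()   # spread among replicates
    return bias, cv


def detection_limit(blanks, k=3.0):
    """Assumed convention: mean of blank readings plus k standard deviations."""
    blanks = np.asarray(blanks, dtype=float)
    return blanks.mean() + k * blanks.std(ddof=1)


# Hypothetical replicate readings of a sample with reference value 10.5
bias, cv = accuracy_and_precision([9.8, 10.1, 10.0, 10.3, 9.8], reference=10.5)
# Hypothetical readings of blank (negative control) samples
lod = detection_limit([0.1, 0.2, 0.15, 0.05, 0.25])

print(f"bias = {bias:.2f}, CV = {cv:.3f}, detection limit = {lod:.3f}")
```

In this example the replicates agree closely with each other (good precision) but sit below the reference value (a systematic bias, so poorer accuracy).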

  40. Limit of Detection and Linearity (plot of measured concentration against theoretical concentration)
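
Linearity can be assessed by regressing measured on theoretical concentration, tying this application back to the regression slides. The dilution series below is hypothetical, and judging linearity from the slope, intercept, and r alone is a simplification.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical dilution series (arbitrary concentration units)
theoretical = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0])
measured = np.array([1.1, 1.9, 5.2, 9.7, 20.5, 49.0])

fit = stats.linregress(theoretical, measured)
residuals = measured - (fit.slope * theoretical + fit.intercept)

print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}, r = {fit.rvalue:.4f}")
print("residuals:", np.round(residuals, 2))
```

A slope near 1 and an intercept near 0 are consistent with measured values proportional to the true concentration; a systematic pattern in the residuals would point to non-linearity even when r is high.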

  41. Precision and Accuracy (two plots of measured concentration against theoretical concentration)

  42. Summary - Regression Source: http://xkcdsw.com/content/img/2274.png

  43. Summary - Correlation

  44. Next Lecture: Experimental Design & Analysis (image: "Experimental Design" by Christine Ambrosino, www.hawaii.edu/fishlab/Nearside.htm)
