This course provides an overview of regression and correlation techniques in biostatistics, covering linear and non-linear relationships between variables. Learn to assess assumptions, estimate lines, and understand correlation coefficients. Practical examples using Python are included to enhance learning.
Introduction to Biostatistics and Bioinformatics: Regression and Correlation
Learning Objectives • Regression – estimation of the relationship between variables • Linear regression • Assessing the assumptions • Non-linear regression • Correlation • Correlation coefficient quantifies the association strength • Sensitivity to the distribution
Relationships • Relationship • No Relationship
Relationships • Linear Relationships • Non-Linear Relationship
Relationships • Linear, Strong • Linear, Weak
Linear Regression • Linear, Strong • Linear, Weak • Non-Linear
Linear Regression - Residuals • Linear, Strong • Linear, Weak • Non-Linear [Figure: residual plot for each of the three cases]
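Linear Regression Model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $Y_i$ is the dependent variable, $X_i$ the independent variable, $\beta_0$ the intercept, $\beta_1$ the slope, and $\varepsilon_i$ the random error. $\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component.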
Linear Regression Assumptions The relationship between the variables is linear. Errors are independent, normally distributed with mean zero and constant variance.
Linear Regression Assumptions • Linear • Non-Linear [Figure: residual plots for a linear and a non-linear relationship]
Linear Regression Assumptions • Constant Variance • Variable Variance [Figure: residual plots with constant and with variable variance]
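The assumptions above can also be checked numerically on the residuals. The following is a minimal sketch, not part of the original slides: the synthetic data, the fixed seed, the Shapiro-Wilk test for normality, and the half-versus-half spread comparison are all choices made here for illustration.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)          # fixed seed (assumed) so the run is reproducible
x = np.linspace(-1, 1, 100)
y = x + 0.1 * rng.normal(size=x.size)   # synthetic linear data with constant-variance noise

fit = stats.linregress(x, y)
residuals = y - (fit.slope * x + fit.intercept)

# With an intercept in the model, least-squares residuals average to zero.
print("mean residual:", residuals.mean())

# Normality of the errors: a small Shapiro-Wilk p-value suggests non-normal residuals.
shapiro_stat, shapiro_p = stats.shapiro(residuals)
print("Shapiro-Wilk p-value:", shapiro_p)

# Constant variance: compare the residual spread in the lower and upper half of x.
half = x.size // 2
spread_ratio = residuals[half:].std(ddof=1) / residuals[:half].std(ddof=1)
print("spread ratio (upper/lower half):", spread_ratio)
```

A spread ratio far from 1, or a clear pattern in a plot of residuals against x, would point to variable variance or a non-linear relationship.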
Linear Regression – Estimating the Line $\hat{Y}_i = b_0 + b_1 X_i$, where $\hat{Y}_i$ is the estimated value, $b_0$ the estimated intercept, $b_1$ the estimated slope, and $X_i$ the independent variable.
Least Squares Method Given measurements $(X_i, Y_i)$, $i = 1, \dots, N$, find the slope $b_1$ and intercept $b_0$ that minimize the sum of the squares of the residuals, $\sum_{i=1}^{N} (Y_i - b_0 - b_1 X_i)^2$.
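For a single independent variable the least squares problem has a closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared deviations of X, and the intercept follows from the means. A minimal sketch, where the helper name `least_squares_line` and the synthetic data are assumptions introduced here:

```python
import numpy as np
import scipy.stats as stats

def least_squares_line(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

rng = np.random.default_rng(1)          # fixed seed (assumed)
x = np.linspace(-1, 1, 50)
y = x + 0.1 * rng.normal(size=x.size)   # synthetic data, same form as the slide examples

slope, intercept = least_squares_line(x, y)
ref = stats.linregress(x, y)            # reference result from SciPy

print("closed form:", slope, intercept)
print("linregress: ", ref.slope, ref.intercept)
```

Both lines of output should agree to floating-point precision, since `stats.linregress` solves the same minimization.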
Linear Regression in Python
    import scipy.stats as stats
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
Linear Regression Example • Linear, Strong
    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.stats as stats

    points = 50  # not defined on the slide; assumed value
    x = np.linspace(-1, 1, points)
    y = x + 0.1 * np.random.normal(size=points)  # small noise: strong linear relationship
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    y_line = slope * x + intercept

    # Data with the fitted line
    fig, ax1 = plt.subplots(1, figsize=(4, 4))
    ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
    ax1.set_xlim([-1.5, 1.5])
    ax1.set_ylim([-1.5, 1.5])
    ax1.plot(x, y_line, color='red', lw=2)
    fig.savefig('linear.png')

    # Residuals: observed minus fitted values
    fig, ax1 = plt.subplots(1, figsize=(4, 4))
    ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
    ax1.set_xlim([-1.5, 1.5])
    ax1.set_ylim([-1.5, 1.5])
    fig.savefig('linear-residuals.png')
Linear Regression Example • Linear, Weak
    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.stats as stats

    points = 50  # not defined on the slide; assumed value
    x = np.linspace(-1, 1, points)
    y = x + 0.4 * np.random.normal(size=points)  # larger noise: weak linear relationship
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    y_line = slope * x + intercept

    # Data with the fitted line
    fig, ax1 = plt.subplots(1, figsize=(4, 4))
    ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
    ax1.set_xlim([-1.5, 1.5])
    ax1.set_ylim([-1.5, 1.5])
    ax1.plot(x, y_line, color='red', lw=2)
    fig.savefig('linear-weak.png')

    # Residuals: observed minus fitted values
    fig, ax1 = plt.subplots(1, figsize=(4, 4))
    ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
    ax1.set_xlim([-1.5, 1.5])
    ax1.set_ylim([-1.5, 1.5])
    fig.savefig('linear-weak-residuals.png')
Linear Regression Example • Outlier
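The effect of a single outlier on the fitted line can be illustrated with a short experiment. This is a sketch under assumptions made here: the data are synthetic, the seed is fixed, and the outlier is created by overwriting the last point with an arbitrary extreme value.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)          # fixed seed (assumed)
x = np.linspace(-1, 1, 20)
y = x + 0.1 * rng.normal(size=x.size)   # synthetic strong linear relationship

clean = stats.linregress(x, y)

# Overwrite the last point with an extreme value (hypothetical outlier).
y_outlier = y.copy()
y_outlier[-1] = -5.0
with_outlier = stats.linregress(x, y_outlier)

print("slope without outlier:", clean.slope)
print("slope with outlier:   ", with_outlier.slope)
print("r without outlier:    ", clean.rvalue)
print("r with outlier:       ", with_outlier.rvalue)
```

Because least squares penalizes squared residuals, one extreme point at the edge of the x range pulls the slope strongly toward itself.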
Regression – Non-linear data Solution 1: Transformation Solution 2: Non-linear Regression
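Both solutions can be sketched on the same data set. The exponential model, its parameter values, the noise level, and the starting guess `p0` are assumptions chosen here for illustration; `scipy.optimize.curve_fit` is used as one possible non-linear regression routine.

```python
import numpy as np
import scipy.stats as stats
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)          # fixed seed (assumed)
x = np.linspace(0, 2, 40)
# Synthetic data from an assumed model y = a * exp(b * x) with multiplicative noise.
y = 2.0 * np.exp(1.5 * x) * np.exp(0.05 * rng.normal(size=x.size))

# Solution 1: transformation. log(y) = log(a) + b * x is linear in x.
log_fit = stats.linregress(x, np.log(y))
a_transform, b_transform = np.exp(log_fit.intercept), log_fit.slope

# Solution 2: non-linear regression on the original scale.
def exponential(x, a, b):
    return a * np.exp(b * x)

(a_nonlinear, b_nonlinear), _ = curve_fit(exponential, x, y, p0=(1.0, 1.0))

print("transformation:", a_transform, b_transform)
print("non-linear fit:", a_nonlinear, b_nonlinear)
```

The two approaches weight the points differently: the transformation minimizes squared errors in log(y), while the non-linear fit minimizes them in y, so the estimates are similar but not identical.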
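Correlation Coefficient • A measure of the correlation between two variables • Quantifies the association strength • Pearson correlation coefficient: $r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$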
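The Pearson coefficient can be computed directly from its definition and compared with SciPy. A minimal sketch; the helper name `pearson_r` and the synthetic data are assumptions introduced here.

```python
import numpy as np
import scipy.stats as stats

def pearson_r(x, y):
    """Sum of cross-deviations divided by the root of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

rng = np.random.default_rng(4)          # fixed seed (assumed)
x = np.linspace(-1, 1, 50)
y = x + 0.4 * rng.normal(size=x.size)   # synthetic weak linear relationship

r_manual = pearson_r(x, y)
r_scipy, p_value = stats.pearsonr(x, y)

print("from the definition:", r_manual)
print("scipy.stats.pearsonr:", r_scipy)
```

The coefficient ranges from -1 (perfect decreasing line) to +1 (perfect increasing line), with values near 0 indicating no linear association.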
Correlation Coefficient Source: Wikipedia
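Coefficient of Variation • Sample mean: $\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$ • Variance: $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})^2$ • Coefficient of variation (CV): $CV = s / \bar{X}$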
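As a small worked example, the CV is the standard deviation expressed as a fraction of the mean. The sketch below assumes the sample standard deviation with an N-1 denominator (`ddof=1`); the helper name and the values are hypothetical.

```python
import numpy as np

def coefficient_of_variation(values):
    """Sample standard deviation (N-1 denominator, assumed) divided by the sample mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

values = [9.8, 10.1, 10.0, 10.3, 9.8]   # hypothetical measurements
cv = coefficient_of_variation(values)
print(f"mean = {np.mean(values):.2f}, CV = {cv:.2%}")
```

Because the CV is dimensionless, it allows the spread of quantities measured on different scales to be compared.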
Correlation Coefficient and CV Uniform distribution
Correlation Coefficient and CV Uniform distribution Normal distribution Lognormal distribution
Correlation Coefficient - Outliers • Outlier
Correlation Coefficient – Non-linear • Solutions: • Transformation • Rank correlation (Spearman, r=0.93)
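Both solutions can be tried on a monotonic but non-linear relationship. This sketch uses noise-free exponential data chosen here for illustration, so the values it produces are not the ones shown on the slide.

```python
import numpy as np
import scipy.stats as stats

x = np.linspace(0, 5, 30)
y = np.exp(x)   # monotonic but strongly non-linear (assumed example)

# Pearson on the raw data underestimates the strength of the association.
pearson_raw, _ = stats.pearsonr(x, y)

# Solution: transformation. log(y) is exactly linear in x for this data.
pearson_log, _ = stats.pearsonr(x, np.log(y))

# Solution: rank correlation. Spearman depends only on the ordering of the values.
spearman, _ = stats.spearmanr(x, y)

print("Pearson (raw):       ", pearson_raw)
print("Pearson (log y):     ", pearson_log)
print("Spearman (rank):     ", spearman)
```

Spearman's coefficient is the Pearson coefficient of the ranks, so any strictly increasing relationship gives a value of 1 regardless of its shape.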
Correlation Coefficient and p-value Hypothesis: Is there a correlation? [Figure: three panels, each annotated with its r and p values]
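The p-value answers the question under the null hypothesis of no correlation. One standard way to obtain it, sketched here on synthetic data with an assumed seed and noise level, is to convert r into a t statistic with N-2 degrees of freedom and compare the result with the p-value reported by `stats.pearsonr`.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(5)          # fixed seed (assumed)
n = 20
x = np.linspace(-1, 1, n)
y = x + 1.0 * rng.normal(size=n)        # synthetic noisy data

r, p_scipy = stats.pearsonr(x, y)

# Under the null hypothesis, t = r * sqrt((N - 2) / (1 - r^2)) follows a
# t distribution with N - 2 degrees of freedom (two-sided test).
t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"r = {r:.3f}")
print(f"p (t statistic) = {p_manual:.4g}")
print(f"p (pearsonr)    = {p_scipy:.4g}")
```

The same r gives a smaller p-value as N grows, so a weak correlation can be highly significant in a large sample and a strong one non-significant in a small sample.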
Application: Analytical Measurements [Figure: measured concentration vs. theoretical concentration]
A Few Characteristics of Analytical Measurements
• Accuracy: Closeness of agreement between a test result and an accepted reference value.
• Precision: Closeness of agreement between independent test results.
• Robustness: Test precision given small, deliberate changes in test conditions (preanalytic delays, variations in storage temperature).
• Lower limit of detection: The lowest amount of analyte that is statistically distinguishable from background or a negative control.
• Limit of quantification: Lowest and highest concentrations of analyte that can be quantitatively determined with suitable precision and accuracy.
• Linearity: The ability of the test to return values that are directly proportional to the concentration of the analyte in the sample.
Limit of Detection and Linearity [Figure: measured concentration vs. theoretical concentration]
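Linearity can be examined by regressing measured on theoretical concentration, and a detection limit can be estimated from blank readings. The sketch below uses hypothetical numbers, and the "mean of blanks plus three standard deviations" rule is one common convention assumed here, not a definition taken from the slides.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical calibration data (arbitrary concentration units).
theoretical = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0])
measured = np.array([1.1, 1.9, 5.2, 9.8, 20.5, 49.0])

# Linearity: a slope near 1, an intercept near 0 and a high r^2 indicate
# measured values proportional to the theoretical concentration.
fit = stats.linregress(theoretical, measured)
r_squared = fit.rvalue ** 2
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}, r^2 = {r_squared:.4f}")

# Hypothetical readings from a negative control (blank).
blanks = np.array([0.10, 0.14, 0.08, 0.12, 0.11])

# Assumed convention: detection limit = mean of blanks + 3 standard deviations.
detection_limit = blanks.mean() + 3 * blanks.std(ddof=1)
print(f"estimated detection limit = {detection_limit:.3f}")
```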
Precision and Accuracy [Figure: two panels of measured concentration vs. theoretical concentration]
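Accuracy and precision can be summarized from replicate measurements of a sample with a known reference value. This is a sketch with hypothetical numbers; expressing accuracy as bias and precision as the CV of the replicates is a common choice assumed here.

```python
import numpy as np

reference = 10.0                                        # accepted reference value (hypothetical)
replicates = np.array([10.4, 10.6, 10.5, 10.3, 10.7])   # hypothetical repeated measurements

# Accuracy: how far the average result is from the reference value.
bias = replicates.mean() - reference
relative_bias = bias / reference

# Precision: how closely the replicates agree with each other (CV, N-1 denominator).
cv = replicates.std(ddof=1) / replicates.mean()

print(f"bias = {bias:.2f} ({relative_bias:.1%} of reference)")
print(f"CV   = {cv:.2%}")
```

In this example the replicates agree closely with each other but sit above the reference, which is the pattern of a precise but inaccurate measurement.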
Summary - Regression Source: http://xkcdsw.com/content/img/2274.png
Next Lecture: Experimental Design & Analysis Experimental Design by Christine Ambrosino www.hawaii.edu/fishlab/Nearside.htm