The relationship between two continuous variables is explored through correlation and linear regression. Correlation, denoted by the symbol "r," measures the strength of the relationship, ranging from -1 to +1. The statistical significance of the correlation is given by the p-value. Parametric correlation assumes a normal distribution of both variables, while non-parametric correlation relies on ranks. Regression analysis approximates the relationship between variables with a fitted line, using the least-squares method to determine the line's intercept and slope. Regression assumes normally distributed residuals whose variance does not depend on the predictor. Results are presented with the equation, the standard error of the slope, and an interpretation of the relationship's strength. Transformations and the handling of outliers are also important in regression analysis.
Relationship between two continuous variables: correlations and linear regression (both variables continuous). Correlation: larger values of one variable correspond to larger (or smaller) values of the other variable. r measures the strength, from minus one to plus one: zero means no relationship; plus or minus one means the points lie on a straight line. p measures statistical significance, i.e. the significance of r differing from zero. Parametric or Pearson correlation assumes a normal distribution of both variables.
We start calculating Pearson's r from the covariance: cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1), which has inconvenient units, so we scale it by the two standard deviations: r = cov(x, y) / (sₓ sᵧ), and the result lies between -1 and +1.
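The two steps above (covariance, then scaling by both standard deviations) can be sketched in plain Python; the data here are hypothetical, chosen to show a perfect positive and a perfect negative relationship:

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance: mean co-deviation from the means (n - 1 in the denominator).
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    return cov / (sx * sy)  # always falls between -1 and +1

# Points exactly on a rising line give r = 1; on a falling line, r = -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

The scaling is what makes r dimensionless: the covariance alone would change if we switched, say, from kilograms to grams, but r would not.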
[Six example scatterplots, labelled r = -0.96, r = 0.96, r = -0.83, r = 0.53, r = 0.43, r = -1]
Non-parametric correlation relies on ranks, so single observations far from the rest do not disturb it. Usually Spearman's (rank) correlation is used. Its power is lower, but there are also real differences in what it measures: what should we make of a non-linear relationship? It also suits ordinal variables. A philosophical aspect: we can describe the same thing differently in mathematical terms!
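Spearman's rank correlation can be sketched with the classic no-ties formula, rs = 1 − 6·Σd² / (n(n² − 1)), where d is the difference between the two ranks of each observation. The data below are hypothetical: a monotonic but non-linear relationship, which ranks treat as perfect:

```python
def spearman_rs(x, y):
    """Spearman's rank correlation, no-ties formula:
    rs = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    def ranks(values):
        # Rank of each value (1 = smallest); assumes no tied values.
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# y = x**2 is curved, so Pearson's r < 1, yet the ranks agree exactly:
print(spearman_rs([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # -> 1.0
```

This illustrates the philosophical point above: the same data yield different "strength" numbers depending on the mathematical description we choose.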
We report the result: "between … there was a correlation (r = , N = , p = )" or, if non-parametric, "… (rs = ; N = , p = )". Correlation is symmetrical and dimensionless. To approximate the relationship by a function, we use regression. The least-squares method minimizes the residuals when predicting: residual = observed value − predicted value. The fitted line has two parameters: intercept and slope (b). The slope has a unit; its value depends on the units of the axes.
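The least-squares fit of the two parameters can be sketched directly from the textbook formulas: slope b = cov(x, y) / var(x), intercept a = ȳ − b·x̄. The data are hypothetical:

```python
def least_squares(x, y):
    """Fit y = a + b*x by least squares.
    Slope b = cov(x, y) / var(x); intercept a = mean(y) - b * mean(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Points exactly on y = 1 + 2x recover intercept 1 and slope 2.
a, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Note the asymmetry mentioned above: swapping x and y does not simply invert the slope, because the residuals are measured vertically (in y) only.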
[Example scatterplot: eggs laid vs. weight (kg), fitted line y = 2.04x − 1.2]
[Example scatterplot: wool production (kg) vs. hours basked, fitted line y = −0.195x + 7.1]
The test follows the path of ANOVA: F = MS_model / MS_error, with SS_total = SS_model + SS_error and R² = SS_model / SS_total, i.e. the model accounts for … % of the variance. There are two ways to express strength: the slope and R²; p does not measure the strength of the relationship.
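The ANOVA decomposition above can be sketched for a simple regression with hypothetical data; note how the two sums of squares add up to the total, and how R² and F are then just ratios of these pieces:

```python
def regression_anova(x, y):
    """Decompose SS_total = SS_model + SS_error for simple regression,
    then R^2 = SS_model / SS_total and F = MS_model / MS_error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    pred = [a + b * xi for xi in x]
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_model = sum((p - my) ** 2 for p in pred)
    ss_error = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    r2 = ss_model / ss_total
    # df_model = 1 (one slope), df_error = n - 2 (two fitted parameters).
    f = (ss_model / 1) / (ss_error / (n - 2))
    return ss_total, ss_model, ss_error, r2, f

ss_t, ss_m, ss_e, r2, f = regression_anova([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
print(f"R^2 = {r2:.2f}, F = {f:.2f}")
```

A large F (relative to its degrees of freedom) gives a small p, but only R² and the slope say how strong the relationship actually is.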
Presenting results: "weight depended on length (b = …, R² = …, df = …, F = …, p < 0.001)". Equation: length = 3.78*temperature + 47.6. Report the standard error of the slope. If the intercept is zero, the relationship is proportional: if x changes k times, then y also changes k times. Regression is not symmetrical!
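The standard error of the slope that should accompany the equation can be sketched from the usual formula se(b) = sqrt(MS_error / Σ(x − x̄)²); the data here are hypothetical:

```python
import math

def slope_se(x, y):
    """Standard error of the slope: sqrt(MS_error / sum((x - mean_x)^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ss_error = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt((ss_error / (n - 2)) / sxx)

print(slope_se([1, 2, 3, 4, 5], [2, 3, 5, 4, 6]))
```

The formula shows why a wide spread of x values (large Σ(x − x̄)²) pins the slope down more precisely.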
Assumptions of regression analysis are as follows:
- residuals should be normally distributed;
- the variance of the residuals must be independent of the values of x, otherwise the data are heteroscedastic;
- no other dependence on x.
The distribution of the x variable itself is not important. Transformations may be used, but do not forget them when writing the equation. Regression through the origin is also possible.
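The warning about transformations can be made concrete with a sketch on hypothetical, exponential-looking data: the line is fitted to ln(y), so the reported equation must be written for the transformed variable (or back-transformed), not for y itself:

```python
import math

# Hypothetical data growing roughly like e**x: fit a line to ln(y), not y.
x = [1, 2, 3, 4, 5]
y = [2.7, 7.4, 20.1, 54.6, 148.4]

ly = [math.log(v) for v in y]
n = len(x)
mx, mly = sum(x) / n, sum(ly) / n
b = sum((xi - li_m) * 0 for xi, li_m in []) or \
    sum((xi - mx) * (li - mly) for xi, li in zip(x, ly)) / \
    sum((xi - mx) ** 2 for xi in x)
a = mly - b * mx
# The fitted equation is for the TRANSFORMED variable:
#   ln(y) = a + b * x,  equivalently  y = exp(a) * exp(b * x)
print(f"ln(y) = {a:.2f} + {b:.2f} * x")
```

Writing "y = a + b·x" here would be wrong: the straight-line relationship holds on the log scale only.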