Understanding Correlations and Linear Regression in Quantitative Analysis

Relationship between two continuous variables: correlation and linear regression, where both variables are continuous. Correlation: larger values of one variable correspond to larger (positive correlation) or smaller (negative correlation) values of the other variable. The coefficient r measures the strength of the relationship. It ranges from plus one to minus one: zero means no linear relationship; plus or minus one means all points lie on a straight line. p measures statistical significance, that is, the significance of r differing from zero. Parametric (Pearson) correlation assumes a normal distribution of both variables.

We start calculating Pearson’s r by calculating the covariance: ... which is not convenient, so let’s scale it: .... and the result is between –1 and +1.
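The formulas themselves are elided in these notes. As a minimal sketch, assuming the usual textbook definitions (sample covariance with n - 1 in the denominator, then division by the product of the two standard deviations), the calculation could look like this. The function names and the data are hypothetical and serve only as an illustration. The sketch also shows one reading of why raw covariance is "not convenient": its value changes with the units of measurement, while r does not.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance: summed products of deviations from the means, divided by n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Covariance scaled by the product of the two standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data: the same lengths in cm and in mm, and weights in g
length_cm = [10.0, 12.0, 15.0, 17.0, 20.0]
length_mm = [v * 10 for v in length_cm]
weight_g = [21.0, 26.0, 29.0, 37.0, 41.0]

cov_cm = covariance(length_cm, weight_g)
cov_mm = covariance(length_mm, weight_g)  # ten times cov_cm: covariance depends on units
r_cm = pearson_r(length_cm, weight_g)
r_mm = pearson_r(length_mm, weight_g)     # same as r_cm: r is dimensionless
```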

[Figure: six scatterplots illustrating r = -0.96, r = 0.96, r = -0.83, r = 0.53, r = 0.43 and r = -1]

Non-parametric correlation relies on ranks, so single observations far away from the rest do not disturb the result. Usually Spearman’s (rank) correlation is used. Its power is lower, but there are also real differences between the two: what should we think about a non-linear relationship? It is also suitable for ordinal variables. Philosophical aspect: we can describe the same thing differently in mathematical terms!
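A minimal sketch of the rank-based idea, assuming Spearman's rs is computed as Pearson's r on the ranks, with tied values sharing the mean of their positions. The helper names and the data are hypothetical. The data rise monotonically but not linearly and end with one extreme value, so the two coefficients describe the same data differently.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from sums of squared and cross deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: monotonic, non-linear, with one far-away observation
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 1000]

r = pearson_r(x, y)     # pulled well below 1 by the extreme value
rs = spearman_rs(x, y)  # the ranks agree perfectly
```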

We report the result: “between …. there was a correlation (r= , N=, p= )” or, if non-parametric, then “..... (rs= ; N=, p= )”. Correlation is symmetrical and dimensionless. To approximate the relationship by a function we use regression. Least-squares method: the line is chosen so that the sum of squared residuals is as small as possible, where a residual is the observed value minus the predicted value. The fitted line has two parameters: intercept and slope (b). The slope has a unit, and its value depends on the units of the axes.
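The least-squares fit can be sketched as follows, assuming the standard closed-form solution for a single predictor (slope = sum of cross deviations divided by the sum of squared x deviations). The function name and the data are hypothetical. The second fit shows that the slope changes when the x axis is expressed in other units, while the intercept stays the same.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx  # the fitted line passes through the point of means
    return a, b

# Hypothetical data
weight_kg = [1.0, 1.5, 2.0, 2.5, 3.0]
eggs = [1, 2, 2, 4, 5]

a, b = fit_line(weight_kg, eggs)  # b is in eggs per kg
predicted = [a + b * xi for xi in weight_kg]
residuals = [yi - pi for yi, pi in zip(eggs, predicted)]  # observed minus predicted

# Same data with weight in grams: the slope becomes eggs per g
weight_g = [w * 1000 for w in weight_kg]
a_g, b_g = fit_line(weight_g, eggs)
```

With an intercept in the model, the residuals sum to zero, which is a quick sanity check on any fit of this kind.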

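[Figure: scatterplot with axes "eggs laid" and "weight, kg"; fitted line y = 2.04x - 1.2]

[Figure: scatterplot with axes "wool production, kg" and "hours basked"; fitted line y = -0.195x + 7.1]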

The test follows the path of ANOVA: F = MSmodel/MSerror, SStotal = SSmodel + SSerror, R2 = SSmodel/SStotal; the model accounts for ... % of the variance. There are two ways to express strength: slope and R2. p does not measure the strength of the relationship.
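A minimal sketch of this decomposition for a single predictor, assuming 1 degree of freedom for the model and n - 2 for the error. The function name, the dictionary keys and the data are hypothetical. The p value is left out because it requires the F distribution, which is not in the Python standard library.

```python
def regression_anova(x, y):
    """Sums of squares, R2 and F for a least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    predicted = [a + b * xi for xi in x]

    ss_total = sum((yi - my) ** 2 for yi in y)                      # around the mean of y
    ss_model = sum((pi - my) ** 2 for pi in predicted)              # explained by the line
    ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # residual

    df_model, df_error = 1, n - 2
    f = (ss_model / df_model) / (ss_error / df_error)  # F = MSmodel / MSerror
    r2 = ss_model / ss_total
    return {"ss_total": ss_total, "ss_model": ss_model, "ss_error": ss_error,
            "df_error": df_error, "F": f, "R2": r2}

# Hypothetical data
temperature = [10, 12, 14, 16, 18, 20]
length = [85, 94, 99, 110, 113, 125]

result = regression_anova(temperature, length)
```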

Presenting results: “weight depended on length (b=..., R2= ....., df=....., F= ..., p<0.001)”; equation: length = 3.78*temperature + 47.6. The slope also has a standard error. If the intercept is zero, the relationship is proportional: if x changes k times, then y also changes k times. Regression is not symmetrical!
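The asymmetry can be illustrated with a short sketch on hypothetical data: the slope from regressing y on x is not the reciprocal of the slope from regressing x on y, unless all points lie exactly on a line. The product of the two slopes equals r squared, which follows from the standard least-squares formulas.

```python
def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data
temperature = [10, 12, 14, 16, 18, 20]
length = [85, 94, 99, 110, 113, 125]

b_length_on_temp = slope(temperature, length)  # length predicted from temperature
b_temp_on_length = slope(length, temperature)  # temperature predicted from length

# The two fitted lines differ: inverting one slope does not give the other
inverse_of_first = 1 / b_length_on_temp
r_squared = b_length_on_temp * b_temp_on_length
```

Swapping the variables in a correlation leaves r unchanged, so this difference is specific to regression.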

Assumptions of regression analysis are as follows:
- residuals should be normally distributed;
- the variance of residuals must be independent of the values of x, otherwise the data are heteroscedastic;
- there should be no other dependence of the residuals on x.
The distribution of the x variable is not important. Transformations can help, but do not forget them when writing the equation. Regression through the origin (intercept fixed at zero) is a special case.
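Regression through the origin can be sketched as follows, assuming the standard least-squares solution for a line without an intercept (slope = sum of x*y divided by the sum of x squared). The function name and the data are hypothetical. The last lines illustrate proportionality: multiplying x by k multiplies the predicted y by k.

```python
def fit_through_origin(x, y):
    """Least-squares slope b for the line y = b*x, with the intercept fixed at zero."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Hypothetical data that are roughly proportional
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

b = fit_through_origin(x, y)  # about 1.99

# Proportionality: a k-fold change in x gives a k-fold change in predicted y
k = 3
x0 = 2.5
pred_at_x0 = b * x0
pred_at_kx0 = b * (k * x0)
```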
