Computing in Archaeology Session 11. Correlation and regression analysis © Richard Haddlesey www.medievalarchitecture.net
Lecture aims • To introduce correlation and regression techniques
The scattergram • In correlation, we are always dealing with paired scores, and so values of the two variables taken together will be used to make a scattergram
Example • Quantities of New Forest pottery recovered from sites at varying distances from the kilns
Negative correlation Here we can see that the quantity of pottery decreases as distance from the source increases
Positive correlation Here we see that the taller a pot, the wider the rim
Curvilinear monotonic relation Again, the further from the source, the smaller the quantity of artefacts, but the points follow a curve rather than a straight line
Arched relationship (non-monotonic) Here we see the first molar increases with age and is then worn down as the animal gets older
scattergram • This shows us that scattergrams are the most important means of studying relationships between two variables
REGRESSION • Regression differs from other techniques we have looked at so far in that it is concerned not just with whether or not a relationship exists, or the strength of that relationship, but with its nature • In regression analysis we use an independent variable to estimate (or predict) the values of a dependent variable
Regression equation y = f(x) • y = y axis (in this case the dependent variable) • f = function (of x) • x = x axis (the independent variable)
y = f(x) • y = x • y = 2x • y = x²
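The three example functions can be tabulated to see how y responds to the same x values. This is a minimal Python sketch; the function names and the x values are illustrative assumptions, not part of the lecture.

```python
# Illustrative helpers (names are assumptions) for the three example functions.
def identity(x):
    return x          # y = x

def double(x):
    return 2 * x      # y = 2x

def square(x):
    return x ** 2     # y = x squared

xs = [0, 1, 2, 3]     # arbitrary x values chosen for illustration
table = {f.__name__: [f(x) for x in xs] for f in (identity, double, square)}

for name, ys in table.items():
    print(name, ys)
```

The first two functions give straight lines when plotted; the third gives a curve.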
General linear equations • y = a + bx • Where y is the dependent variable, x is the independent variable, and the coefficients a and b are constants, i.e. they are fixed for a given data set
Therefore: • If x = 0 then the equation reduces to y = a, so a represents the point where the regression line crosses the y axis (the intercept) • The b constant defines the slope or gradient of the regression line • Thus for the pottery quantity in relation to distance from source, b represents the amount by which pottery quantity decreases for each unit of distance from the source
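To make a and b concrete, the sketch below estimates them by ordinary least squares, which is the usual way of fitting y = a + bx although the slides do not name the method at this point. The function name `fit_line` and the distance and quantity figures are hypothetical, invented for illustration, and are not the New Forest data.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: how much y changes for each unit increase in x.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the value of y where the line crosses x = 0.
    a = (sum_y - b * sum_x) / n
    return a, b

# Hypothetical data: distance from the kilns and quantity of pottery found.
distance = [0, 10, 20, 30, 40]
quantity = [100, 80, 60, 40, 20]

a, b = fit_line(distance, quantity)
print(a, b)  # a = 100.0 (intercept), b = -2.0 (slope)
```

With these made-up figures the negative b says that quantity falls by 2 for every extra unit of distance, and a is the quantity predicted at the source itself.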
CORRELATION • 1 correlation coefficient: r, ranging from -1 to +1 • 2 significance
Levels of measurement: • nominal – in name only • ordinal – forming a sequence • interval – a sequence with fixed distances • ratio – fixed distances with a datum point
Levels of measurement and the appropriate coefficient: • nominal • ordinal: Spearman’s Rank Correlation Coefficient • interval and ratio: Product-Moment Correlation Coefficient
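Spearman's coefficient is not worked through on these slides, so the following is only a sketch of the standard formula rs = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each item. It assumes there are no tied values; the function names and the scores are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman's rank correlation coefficient for untied paired data."""
    n = len(xs)
    # Sum of squared differences between the paired ranks.
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical ordinal scores for five items on two scales.
scale_a = [1, 2, 3, 4, 5]
scale_b = [2, 1, 4, 3, 5]

rs = spearman_rs(scale_a, scale_b)
print(round(rs, 2))  # 0.8
```

Because only the ranks are used, the result depends on the ordering of the values and not on the distances between them, which is why it suits ordinal data.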
The Product-Moment Correlation Coefficient
Sample: 20 bronze spearheads, each measured for length (cm) and width (cm); n = 20
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$$
For the spearhead sample (n = 20), with x = length (cm) and y = width (cm): r = +0.67
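The formula translates directly into code. The sketch below follows it term by term; the function name `product_moment_r` and the five measurements are hypothetical, since the 20 spearhead measurements themselves are not reproduced here.

```python
from math import sqrt

def product_moment_r(xs, ys):
    """Product-moment correlation coefficient, computed from the sums in the formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical lengths and widths (cm) for five spearheads, for illustration only.
lengths = [10.0, 12.0, 14.0, 16.0, 18.0]
widths = [2.0, 2.5, 2.4, 3.1, 3.0]

r = product_moment_r(lengths, widths)
print(round(r, 2))
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or no linear relationship.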
Test of product moment correlation coefficient • H0 : true correlation coefficient = 0 • H1 : true correlation coefficient ≠ 0 • Assumptions: both variables approximately random • Sample statistics needed: n and r
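The lecture may intend r to be compared with a table of critical values for the given n. An equivalent approach, shown here as an assumption and not as the lecture's own procedure, converts r to a t statistic with n - 2 degrees of freedom, t = r√(n - 2) / √(1 - r²). The sketch uses the spearhead values n = 20 and r = +0.67; the critical value 2.101 is the standard two-tailed 5% figure for 18 degrees of freedom from t tables, and the names are hypothetical.

```python
from math import sqrt

def t_statistic(r, n):
    """Convert a correlation coefficient r from n pairs into a t statistic (n - 2 df)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

n, r = 20, 0.67                      # sample statistics from the spearhead example
t = t_statistic(r, n)

# Two-tailed 5% critical value of t for 18 degrees of freedom (standard tables).
CRITICAL_T_5PCT_DF18 = 2.101
reject_h0 = abs(t) > CRITICAL_T_5PCT_DF18

print(round(t, 2), reject_h0)        # 3.83 True
```

Since 3.83 exceeds 2.101, H0 would be rejected at the 5% level under this approach, suggesting that the correlation between spearhead length and width is unlikely to be zero.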