STAT131 Week 3 Lecture 1b: Correlation
Anne Porter
Email: alp@uow.edu.au
Phone: 42214058
Statistical Research Process
To come
Exploring & Describing Data: Tools for Looking at Variation in Data Structures
Questions thus far
• What is the shape of the data set (i.e. what is its distribution)?
• What is the centre of the data set?
• What is the spread of the data set?
• Are there any outliers (unusual data) in the data set?
• Do we need to transform the data in some manner?
New Questions
• Is there a relationship between cholesterol and cardiovascular disease?
• Is there a correlation between intelligence and performance in exams?
Questions about relationship: How are the variables measured?
• Is there a relationship between cholesterol and cardiovascular disease?
• Is there a correlation between intelligence and performance in exams?
For correlation, the two variables are measured on a quantitative scale: either an interval or a ratio scale.
Plot the height of a plant 1 to 6 weeks after planting
[Figure: scatterplot of plant height in cm (0 to 3) against week (1 to 6)]
Does one variable cause the other?
Causality
Did time cause the plants to grow? No. Water, nutrients, sunshine, talking to the plants... may all have caused the plants to grow.
• Relationships (correlations) do not provide strong evidence of causality
• Strong evidence of causality is provided through well-designed experiments where there are different treatments
Pearson’s Correlation r: Properties
• Measures the strength and direction of a straight-line relationship
• Maximum value of r is 1: all the points fall on a straight line, and an increase in one score is matched by an increase in the other
• Minimum value of r is -1: all the points fall on a straight line, and an increase in one score is matched by a decrease in the other
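The two extreme values can be checked numerically. The sketch below is illustrative only: the helper name pearson_r and the two made-up lines (y = 2x + 1 and y = 10 - 3x) are assumptions, not part of the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
# Made-up data: every point lies exactly on a rising line, then on a falling line.
r_up = pearson_r(xs, [2 * x + 1 for x in xs])
r_down = pearson_r(xs, [10 - 3 * x for x in xs])
print(round(r_up, 6), round(r_down, 6))
```

Any rising straight line gives r = 1 and any falling straight line gives r = -1, whatever the slope, because r measures how tightly the points follow a line rather than how steep the line is.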
Take care with the domain when measuring relationships
[Figure: Y plotted against X, with the X axis divided into regions A, B and C]
Just looking at B might suggest no relationship.
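A small numeric sketch of the same point, using invented data (not the data behind the slide's figure): Y rises in regions A and C but is roughly flat in region B. The helper pearson_r and all values are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Invented data: region A is x = 0..2, region B is x = 3..6, region C is x = 7..9.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 1, 2, 3.1, 2.9, 2.9, 3.1, 4, 5, 6]

r_all = pearson_r(xs, ys)            # whole domain
r_b = pearson_r(xs[3:7], ys[3:7])    # region B only
print(round(r_all, 3), round(r_b, 3))
```

With these values the correlation over the whole domain is strong and positive, while the correlation inside region B alone is essentially zero, so the conclusion depends on which part of the X range was sampled.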
Pearson’s Correlation r: Properties
• r = 0 means no linear relationship
• r = 0 does not mean there is no relationship, only that any relationship present is not linear
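A quick way to see that r = 0 can hide a perfect non-linear relationship is to take points on a parabola. This is a sketch on made-up data (y = x² for x from -2 to 2); the helper name pearson_r is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up data: y is completely determined by x, but the relationship is a curve.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(round(r_curve, 6))
```

Here Y is an exact function of X, yet r comes out as zero because the rising and falling halves of the curve cancel. This is why the scatterplot should be inspected before r is interpreted.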
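Method 1: Calculating r

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$$

where the sums of squares are

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}, \qquad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$$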
An example:
• Given the (x,y) pairs (0,0),(1,2),(2,4),(3,6),(4,4)
• Is there a relationship between X and Y?
• Step 1: Plot the data and identify
  (1) if there are any unusual data points
  (2) if the relationship appears to be linear
  (3) the approximate strength and direction of the relationship
• Step 2: If there are outliers
  (1) look to see what happens if the outliers are removed
  (2) look to see what happens if a transformation is used
• Step 3: Calculate r
Step 1: Plot the relationship
• Given the (x,y) pairs (0,0),(1,2),(2,4),(3,6),(4,4)
• What is the nature of the relationship between X and Y?
[Figure: scatterplot of the five points, Y (0 to 6) against X (0 to 4)]
• r is positive, with Y increasing as X increases
• r is between 0 and 1
• Linear with one outlier, OR a curve in the data
• Artificial: too few points!
Step 2: No outliers
• Go on to calculate r
• Using the Method 1 formula for r, what do we need to find?
Step 3: Calculate r

        x     x²    y     y²    xy
        0     0     0     0     0
        1     1     2     4     2
        2     4     4     16    8
        3     9     6     36    18
        4     16    4     16    16
Sum     10    30    16    72    44
Step 3: Calculate r
Sums so far: $\sum x = 10$, $\sum x^2 = 30$, $\sum y = 16$, $\sum y^2 = 72$, $\sum xy = 44$
What else is needed to put into the formula?
• $n = 5$
• $(\sum x)^2 = (10)^2 = 100$
• $(\sum y)^2 = (16)^2 = 256$
• $\sum x \sum y = 10 \times 16 = 160$
Step 3: Calculate r

$$r = \frac{12}{\sqrt{10 \times 20.8}} = \frac{12}{\sqrt{208}} = 0.832$$
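The hand calculation can be checked with a few lines of Python. This is a sketch that assumes the sums-of-squares form implied by the worked numbers (the variable names s_xx, s_yy and s_xy are labels chosen here, not the lecture's notation).

```python
import math

# The five (x, y) pairs from the example.
xs = [0, 1, 2, 3, 4]
ys = [0, 2, 4, 6, 4]
n = len(xs)

# Column sums, as tabulated in Step 3.
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x ** 2 / n      # 30 - 100/5
s_yy = sum_y2 - sum_y ** 2 / n      # 72 - 256/5
s_xy = sum_xy - sum_x * sum_y / n   # 44 - 160/5

r = s_xy / math.sqrt(s_xx * s_yy)
print(sum_x, sum_x2, sum_y, sum_y2, sum_xy)
print(round(r, 3))
```

The intermediate values 10, 20.8 and 12 match the numbers under the square root and in the numerator of the slide's final fraction.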
Method 2: SPSS correlation matrix (r, n, p)
When you know how to work it out by calculator, do it by SPSS. Do your assignment using SPSS.
Method 2: Reading output (r, n, p)
• r: Pearson’s r
• n: number of (x,y) pairs
• p: probability of getting an r of that size or more, for sample size 5, if indeed there were no correlation in the population
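To get a feel for what p means, one option is a permutation check: shuffle the y values in every possible order and see how often the shuffled data give an r at least as extreme as the observed one. This is a swapped-in illustration, not the calculation SPSS performs, so its value need not match SPSS output. The helper pearson_r, the two-sided comparison on |r| and the small tolerance are all assumptions made here.

```python
import math
from itertools import permutations

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [0, 1, 2, 3, 4]
ys = [0, 2, 4, 6, 4]
r_obs = pearson_r(xs, ys)

# Pair x with every ordering of y (5! = 120 orderings). If X and Y were
# unrelated, each ordering would be equally likely.
count = 0
total = 0
for shuffled in permutations(ys):
    total += 1
    # Two-sided comparison; the tolerance guards against rounding error.
    if abs(pearson_r(xs, shuffled)) >= abs(r_obs) - 1e-12:
        count += 1

p_value = count / total
print(total, count, p_value)
```

A small p_value would say that an r this large rarely arises by chance pairing alone. With only five pairs there are few possible orderings, which is one reason small samples give weak evidence.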
Method 3: Calculating r where and
Method 4: Calculating r where
Questions about Relationships
• Scatterplot
• Approximate direction and strength
• Linear or non-linear
• Outliers
• Transformations if necessary to make linear
• Calculate r to provide a measure
• Comment as to the nature of the relationship found
Video Clip
• Decisions through Data, Tape 3, Unit 13: examines correlation as a measure of similarity
• Decisions through Data, Tape 3, Unit 14: examines correlation as a means of providing measures for other things that are difficult to measure
• Decisions through Data, Tape 4, Unit 16: the question of causation
Fitting a line through the points on the scatterplot
• Correlation provides a measure of the strength of a relationship
• When we want to describe the form of the relationship, we determine the best equation for a line through the data points
• Next lecture: regression
• Finding the least squares regression line