Correlation and regression
Today´s programme
Lecture
• Correlation
• Regression
Exercise
• Group tasks on correlation and regression
• Free experiment supervision/help
Summary from last week
• Last week we covered four types of non-parametric statistical tests
• They make no assumptions about the data's characteristics
• Use them if any of the three properties below is true:
• (a) the data are not normally distributed (e.g. skewed);
• (b) the data show heterogeneity of variance;
• (c) the data are measurements on an ordinal scale (can be ranked).
Non-parametric tests
• Non-parametric tests make few assumptions about the distribution of the data being analyzed
• They get around this by not using the raw scores, but by ranking them: the lowest score gets rank 1, the next lowest rank 2, and so on
• How the ranking is carried out differs from test to test, but the principle is the same
• The analysis is carried out on the ranks, not the raw data
• Ranking the data means we lose information – we no longer know the distance between the ranks
• This means that non-parametric tests are less powerful than parametric tests: they are less likely to detect an effect that really is in our data (increased chance of a Type II error)
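As a concrete illustration of the ranking step (a minimal Python sketch using scipy, not from the original slides – the course itself uses SPSS):

```python
# Rank-transforming raw scores, as non-parametric tests do internally.
from scipy.stats import rankdata

scores = [12, 5, 19, 5, 8]   # hypothetical raw scores
ranks = rankdata(scores)     # lowest score gets rank 1; ties share the average rank
print(ranks)                 # [4.  1.5 5.  1.5 3. ]
```

Note how the tied scores (the two 5s) share the average of ranks 1 and 2, and how the distances between the raw scores are lost once they are ranked.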
Non-parametric tests
Examples of parametric tests and their non-parametric equivalents (parametric test → non-parametric counterpart):
• Pearson correlation → Spearman's correlation
• (No equivalent test) → Chi-square test
• Independent-means t-test → Mann-Whitney test
• Dependent-means t-test → Wilcoxon test
• One-way independent-measures analysis of variance (ANOVA) → Kruskal-Wallis test
• One-way repeated-measures ANOVA → Friedman's test
Non-parametric tests
• Just as with parametric tests, which non-parametric test to use depends on the experimental design (repeated measures or independent groups) and on the number of IVs and their levels
Summary: When to use which test? • Mann-Whitney: Two conditions, two groups, each participant one score • Wilcoxon: Two conditions, one group, each participant two scores (one per condition) • Kruskal-Wallis: 3+ conditions, different people in all conditions, each participant one score • Friedman´s ANOVA: 3+ conditions, one group, each participant 3+ scores
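This decision rule is small enough to write down directly; a hypothetical helper (names and structure are my own, not from the slides):

```python
def choose_nonparametric_test(n_conditions: int, same_participants: bool) -> str:
    """Encode the summary above: pick a test from the design."""
    if n_conditions == 2:
        return "Wilcoxon" if same_participants else "Mann-Whitney"
    return "Friedman's ANOVA" if same_participants else "Kruskal-Wallis"

# e.g. three age groups, a different child in each group, one score per child:
print(choose_nonparametric_test(3, same_participants=False))  # Kruskal-Wallis
```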
Which non-parametric test?
• Differences in fear ratings for 3, 5 and 7-year-olds in response to sinister noises from under their bed
• Effects of cheese, Brussels sprouts, wine and curry on the vividness of a person's dreams
• Number of people spearing their eardrums after enforced listening to Britney Spears, Beyonce, Robbie Williams and Boyzone
• Pedestrians rate the aggressiveness of owners of different types of car. Group A rate Micra owners; group B rate 4x4 owners; group C rate Subaru owners; group D rate Mondeo owners.
Consider: How many groups? How many levels of IV/conditions?
Which non-parametric test?
• Differences in fear ratings for 3, 5 and 7-year-olds in response to sinister noises from under their bed [3 groups, each participant one score, 3 conditions – Kruskal-Wallis]
• Effects of cheese, Brussels sprouts, wine and curry on the vividness of a person's dreams [one group, each participant 4 scores, 4 conditions – Friedman´s ANOVA]
• Number of people spearing their eardrums after enforced listening to Britney Spears, Beyonce, Robbie Williams and Boyzone [one group, each participant 4 scores, 4 conditions – Friedman´s ANOVA]
• Pedestrians rate the aggressiveness of owners of different types of car. Group A rate Micra owners; group B rate 4x4 owners; group C rate Subaru owners; group D rate Mondeo owners. [4 groups, each participant one score – Kruskal-Wallis]
Correlation • We often want to know if there is a relationship between two variables • Do people who drive fast cars get into accidents more often? • Do students who give the teacher red apples get higher grades? • Do blondes have more fun? • Etc.
Correlation • Correlation coefficient: A succinct measure of the strength of the relationship between two variables (e.g. height and weight, age and reaction time, IQ and exam score).
Correlation
• A correlation is a measure of the linear relationship between variables
• Two variables can be related in different ways:
• 1) positively related: the faster the car, the more accidents
• 2) not related: the speed of the car has no bearing on the number of accidents
• 3) negatively related: the faster the car, the fewer accidents
Correlation
• We describe the relationship between variables statistically by looking at two measures:
• Covariance
• Correlation coefficient
• We represent relationships graphically using scatterplots
• The simplest way to decide whether two variables are associated is to evaluate whether they covary
• Recall: the variance of one variable is the average amount the scores in the sample vary from the mean – if variance is high, the scores in the sample are very different from the mean
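In symbols, the sample variance the slide refers to is (standard formula, stated here for reference):

$$s^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N - 1}$$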
Correlation • Low and high variance around the mean of a sample
Correlation
• If we want to know whether two variables are related, we want to know whether changes in the scores on one variable are met with similar changes in the other variable
• Therefore, when one variable deviates from its mean, we would expect the scores on the other variable to deviate from its mean in a similar way
• Example: we take 5 people, show them a commercial and measure how many packets of sweets they buy the week after
• If the number of times the commercial was seen is related to the number of packets of sweets bought, the scores should vary around the means of the two samples in a similar way
Results of the commercial on sweets-buying – looks like a relationship exists
Correlation
• How do we calculate the exact similarity between the patterns of deviation in the two variables (samples)?
• We calculate the covariance
• Step 1: for each participant, multiply the deviation from the mean in one sample by the corresponding deviation in the other sample
• Note that if both deviations are positive or both are negative, we get a positive value (+ × + = + and − × − = +)
• If one deviation is positive and the other negative, we get a negative value (+ × − = −)
Correlation
• Step 2: divide by the number of observations (scores) minus 1: N − 1
• This is the same equation as for calculating variance
• Except that we multiply each deviation by the corresponding deviation of the score in the other sample, rather than squaring the deviations within one sample
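Putting the two steps together gives the sample covariance (standard formula, matching the verbal description above):

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$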
Correlation
• A positive covariance indicates that as one variable deviates from the mean, the other deviates in the same direction
• Faster cars are associated with more accidents
• A negative covariance indicates that as one variable deviates from the mean, the other deviates in the opposite direction
• Faster cars are associated with fewer accidents
Correlation
• Covariance, however, depends on the scale of measurement used – it is not a scale-independent measure
• To overcome this problem, we standardize the covariance – so that covariances are comparable across all experiments, no matter what type of measure we use
Correlation
• We do this by converting differences between scores and means into standard deviations
• Recall: any score can be expressed in terms of how many SDs it is away from the mean (the z-score)
• We therefore divide the covariance by the SDs of both samples
• With two samples, we need the SD of each of them to standardize the covariance
Correlation
• This standardized covariance is known as the correlation coefficient
• It is also called Pearson's correlation coefficient and is one of the most important formulas in statistics
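In symbols (standard formula):

$$r = \frac{\operatorname{cov}(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{(N-1)\, s_x s_y}$$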
Correlation
• When we standardize the covariance we end up with a value that lies between -1 and +1
• If r = +1, we have a perfect positive relationship
+1 (perfect positive correlation: as X increases, so does Y):
Correlation
• When we standardize the covariance we end up with a value that lies between -1 and +1
• If r = -1, we have a perfect negative relationship
Perfect negative correlation: As X increases, Y decreases, or vice versa
Correlation
• If r = 0 there is no correlation between the two samples – changes in sample X are not associated with systematic changes in sample Y, or vice versa
• Recall that we can use the correlation coefficient as a measure of effect size
• An r of ±0.1 is a small effect, ±0.3 a medium effect and ±0.5 a large effect
Scatterplots
• Before performing a correlational analysis, we plot a scatterplot to get an idea of how the variables covary
• A scatterplot is a graph of the scores of one sample (variable) vs. the scores of another sample
• Further variables can be included in a 3D plot.
A scatterplot shows:
• Whether there is a relationship between the variables
• What kind of relationship it is
• Whether any cases (scores) are markedly different – outliers – these cause problems
• We normally plot the IV on the x-axis and the DV on the y-axis
Scatterplots • A 2D scatterplot
Scatterplots • A 3D scatterplot
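For comparison outside SPSS, a scatterplot takes only a few lines in Python (a minimal sketch with made-up data; matplotlib assumed available):

```python
# Minimal scatterplot: IV on the x-axis, DV on the y-axis.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical IV scores
y = [2, 3, 5, 4, 6, 7, 9, 8]   # hypothetical DV scores

plt.scatter(x, y)
plt.xlabel("IV")
plt.ylabel("DV")
plt.show()
```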
Using SPSS to obtain scatterplots: (a) simple scatterplot: Graphs > Legacy Dialogs > Scatter/Dot...
Using SPSS to obtain scatterplots: (a) simple scatterplot: Graphs > Chart Builder
1. Pick Scatter/Dot
2. Drag the "Simple scatter" icon into the chart preview window
3. Drag the X and Y variables into the x-axis and y-axis boxes in the chart preview window
Using SPSS to obtain scatterplots: (b) scatterplot with regression line: Analyze > Regression > Curve Estimation...
"Constant" is the intercept with the y-axis; "b1" is the slope
Correlation II
• Having visually inspected the data, we can conduct a correlation analysis in SPSS
• The procedure is on page 123 in chapter 4 of Field´s book in the compendium
• Note: there are two types of correlation: bivariate and partial
• Bivariate correlation is the correlation between two variables
• Partial correlation is the same, but controlling for one or more additional variables
Using SPSS to obtain correlations: Analyze > Correlate > Bivariate...
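The same bivariate correlation can be cross-checked outside SPSS; a minimal sketch with made-up data (scipy assumed available):

```python
# Bivariate Pearson correlation with a two-tailed p-value.
from scipy.stats import pearsonr

x = [2, 4, 4, 5, 7, 8, 9, 10, 11, 12]          # hypothetical variable X
y = [20, 30, 35, 40, 50, 55, 60, 65, 70, 80]   # hypothetical variable Y

r, p = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.4f}")
```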
Correlations II
• There are various types of correlation coefficient, for different purposes:
• Pearson's "r": used when both X and Y variables are (a) continuous; (b) (ideally) measurements on interval or ratio scales; (c) normally distributed – e.g. height, weight, I.Q.
• Spearman's rho: used in the same circumstances as above, except that the data need only be on an ordinal scale – e.g. attitudes, personality scores.
Correlations II • r is a parametric test: the data have to have certain characteristics (parameters) before it can be used. • rho is a non-parametric test - less fussy about the nature of the data on which it is performed. • Both are dead easy to calculate in SPSS
Pearson's r
• Calculating Pearson's r: a worked example: is there a relationship between the number of parties a person gives each month and the amount of flour they purchase from Møller-Mogens?
Pearson's r
• Our formula for the correlation coefficient from before, slightly modified into computational form:
$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$

Using our values (from the bottom row of the table):

$$r = \frac{29832 - \frac{382 \times 776}{10}}{\sqrt{\left(14876 - \frac{382^2}{10}\right)\left(60444 - \frac{776^2}{10}\right)}} = \frac{29832 - 29643.2}{\sqrt{283.6 \times 226.4}} = \frac{188.8}{253.39} \approx 0.75$$

r is 0.75. This is a positive correlation: people who buy a lot of flour from Møller-Mogens also hold a lot of parties (and vice versa).
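The arithmetic can be checked directly from the summary statistics given above (the raw scores themselves are not reproduced here):

```python
# Re-deriving the worked example from the slide's summary statistics.
from math import sqrt

N = 10
sum_x, sum_y = 382, 776          # sum of X (parties) and Y (flour) scores
sum_x2, sum_y2 = 14876, 60444    # sums of squared scores
sum_xy = 29832                   # sum of the cross-products

numerator = sum_xy - (sum_x * sum_y) / N
denominator = sqrt((sum_x2 - sum_x**2 / N) * (sum_y2 - sum_y**2 / N))
print(round(numerator / denominator, 3))   # 0.745, i.e. r ≈ 0.75
```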
Pearson's r
How to interpret the size of a correlation:
• r² (r × r, "r-squared") is the "coefficient of determination": it tells us what proportion of the variation in the Y scores is associated with changes in X
• E.g., if r is 0.2, r² is 4% (0.2 × 0.2 = 0.04 = 4%)
• Only 4% of the variation in Y scores is attributable to Y's relationship with X
• Thus, knowing a person's Y score tells you essentially nothing about what their X score might be!
Pearson's r
• Our correlation of 0.75 gives an r² of 56%
• An r of 0.9 gives an r² of (0.9 × 0.9 = 0.81) = 81%
• Note that correlations become much stronger the closer they get to 1 (or -1)
• Correlations of .6 or -.6 (r² = 36%) are four times as strong as correlations of .3 or -.3 (r² = 9%), not merely twice as strong!
Spearman´s rho
• We use Spearman´s correlation coefficient when the data have violated parametric assumptions (e.g. a non-normal distribution)
• Spearman´s correlation coefficient works by ranking the data in the samples, just like the other non-parametric tests
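In fact, Spearman's rho is simply Pearson's r computed on the ranks; a minimal sketch demonstrating this with made-up data (scipy assumed available):

```python
# Spearman's rho equals Pearson's r applied to the ranked data.
from scipy.stats import pearsonr, rankdata, spearmanr

x = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical raw scores
y = [2, 7, 1, 8, 2, 8, 1, 8]

rho, _ = spearmanr(x, y)
r_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))
print(rho, r_on_ranks)         # the two values agree
```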