
Correlation and regression


Presentation Transcript


  1. Correlation and regression

  2. Today's programme Lecture: • Correlation • Regression Exercise: • Group tasks on correlation and regression • Free experiment supervision/help

  3. Summary from last week • Last week we covered four types of non-parametric statistical tests • They make few assumptions about the data's characteristics • Use them if any of the three properties below are true: • (a) the data are not normally distributed (e.g. skewed); • (b) the data show inhomogeneity of variance; • (c) the data are measurements on an ordinal scale (i.e. they can be ranked).

  4. Non-parametric tests • Non-parametric tests make few assumptions about the distribution of the data being analyzed • They get around this by not using raw scores, but by ranking them: the lowest score gets rank 1, the next lowest rank 2, etc. • How the ranking is carried out differs from test to test, but the principle is the same • The analysis is carried out on the ranks, not the raw data • Ranking data means we lose information – we do not know the distance between the ranks • This means that non-parametric tests are less powerful than parametric tests, and less likely to discover an effect in our data (increased chance of a Type II error)
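Not part of the slides: a minimal Python sketch of the ranking idea using scipy.stats.rankdata, showing how ranks discard distance information (the scores are made up).

# Sketch (not from the slides): ranking with scipy.stats.rankdata.
# The two hypothetical samples have very different spacing between scores,
# but identical ranks - the distance information is lost.
from scipy.stats import rankdata

scores_a = [1.0, 1.1, 1.2, 9.9]   # one extreme score
scores_b = [1.0, 2.0, 3.0, 4.0]   # evenly spaced scores

print(rankdata(scores_a))  # [1. 2. 3. 4.]
print(rankdata(scores_b))  # [1. 2. 3. 4.]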

  5. Non-parametric tests Examples of parametric tests and their non-parametric counterparts: • Pearson correlation – Spearman's correlation • (No equivalent test) – Chi-square test • Independent-means t-test – Mann-Whitney test • Dependent-means t-test – Wilcoxon test • One-way independent-measures analysis of variance (ANOVA) – Kruskal-Wallis test • One-way repeated-measures ANOVA – Friedman's test

  6. Non-parametric tests • Just as with parametric tests, which non-parametric test to use depends on the experimental design (repeated measures or independent groups) and on the number of IVs and their levels

  7. Summary: When to use which test? • Mann-Whitney: Two conditions, two groups, each participant one score • Wilcoxon: Two conditions, one group, each participant two scores (one per condition) • Kruskal-Wallis: 3+ conditions, different people in all conditions, each participant one score • Friedman's ANOVA: 3+ conditions, one group, each participant 3+ scores
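As an aside (not from the slides), all four tests are available in scipy.stats; a minimal sketch with made-up scores, following the group/condition layout above:

# Sketch (not from the slides): the four tests in scipy.stats, applied to
# small made-up samples laid out as described on slide 7.
from scipy.stats import mannwhitneyu, wilcoxon, kruskal, friedmanchisquare

g1 = [3, 5, 4, 6]   # hypothetical scores, condition/group 1
g2 = [7, 8, 6, 9]   # hypothetical scores, condition/group 2
g3 = [2, 4, 3, 5]   # hypothetical scores, condition/group 3

print(mannwhitneyu(g1, g2))           # two conditions, two groups, one score each
print(wilcoxon(g1, g2))               # two conditions, one group, two scores each
print(kruskal(g1, g2, g3))            # 3+ conditions, different people in each
print(friedmanchisquare(g1, g2, g3))  # 3+ conditions, one group, 3+ scores each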

  8. Which non-parametric test? • Differences in fear ratings for 3, 5 and 7-year-olds in response to sinister noises from under their bed • Effects of cheese, Brussels sprouts, wine and curry on the vividness of a person's dreams • Number of people spearing their eardrums after enforced listening to Britney Spears, Beyonce, Robbie Williams and Boyzone • Pedestrians rate the aggressiveness of owners of different types of car: group A rate Micra owners; group B rate 4x4 owners; group C rate Subaru owners; group D rate Mondeo owners. Consider: How many groups? How many levels of the IV/conditions?

  9. Which non-parametric test? • Differences in fear ratings for 3, 5 and 7-year-olds in response to sinister noises from under their bed [3 groups, one score each, 3 conditions – Kruskal-Wallis] • Effects of cheese, Brussels sprouts, wine and curry on the vividness of a person's dreams [one group, 4 scores each, 4 conditions – Friedman's ANOVA] • Number of people spearing their eardrums after enforced listening to Britney Spears, Beyonce, Robbie Williams and Boyzone [one group, 4 scores each, 4 conditions – Friedman's ANOVA] • Pedestrians rate the aggressiveness of owners of different types of car: group A rate Micra owners; group B rate 4x4 owners; group C rate Subaru owners; group D rate Mondeo owners [4 groups, one score each – Kruskal-Wallis]

  10. Correlation

  11. Correlation • We often want to know if there is a relationship between two variables • Do people who drive fast cars get into accidents more often? • Do students who give the teacher red apples get higher grades? • Do blondes have more fun? • Etc.

  12. Correlation • Correlation coefficient: A succinct measure of the strength of the relationship between two variables (e.g. height and weight, age and reaction time, IQ and exam score).

  13. Correlation • A correlation is a measure of the linear relationship between variables • Two variables can be related in different ways: • 1) positively related: the faster the car, the more accidents • 2) not related: the speed of the car has no bearing on the number of accidents • 3) negatively related: the faster the car, the fewer accidents

  14. Correlation • We describe the relationship between variables statistically by looking at two measures: • Covariance • Correlation coefficient • We represent relationships graphically using scatterplots • The simplest way to decide if two variables are associated is to evaluate if they covary • Recall: Variance of one variable is the average amount the scores in the sample vary from the mean – if variance is high, the scores in the sample are very different from the mean
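In formula form, the sample variance recalled here (with an N − 1 denominator, matching the covariance step on slide 19) is:

$$s^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N - 1}$$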

  15. Correlation • Low and high variance around the mean of a sample

  16. Correlation • If we want to know whether two variables are related, we want to know if changes in the scores of one variable are met with similar changes in the other variable • Therefore, when one variable deviates from its mean, we would expect the scores of the other variable to deviate from its mean in a similar way • Example: We take 5 people, show them a commercial and measure how many packets of sweets they buy the week after • If the number of times the commercial was seen is related to how many packets of sweets were bought, the scores should vary around the means of the two samples in a similar way

  17. Results of the commercial on sweets buying – it looks like a relationship exists

  18. Correlation • How do we calculate the exact similarity between the patterns of deviation in the two variables (samples)? • We calculate the covariance • Step 1: Multiply the difference between each score and the mean in one sample by the corresponding difference in the other sample • Note that if both differences are positive or both are negative, we get a positive value (+ * + = + and - * - = +) • If one difference is positive and the other negative, we get a negative value (+ * - = -)

  19. Correlation • Step 2: Divide by the number of observations (scores) minus 1: N - 1 • This is the same equation as for calculating variance • Except that we multiply each difference by the corresponding difference of the score in the other sample, rather than squaring the differences within one sample
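Putting steps 1 and 2 together, the covariance being computed is:

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$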

  20. Correlation • Positive covariance indicates that as one variable deviates from the mean, so does the other, in the same direction • Faster cars lead to more accidents • Negative covariance indicates that as one variable deviates from the mean, the other deviates in the opposite direction • Faster cars lead to fewer accidents

  21. Correlation • Covariance, however, depends on the scale of measurement used – it is not a scale-independent measure • To overcome this problem, we standardize the covariance – so that covariance is comparable across all experiments, no matter what type of measure we use

  22. Correlation • We do this by converting differences between scores and means into standard deviations • Recall: any score can be expressed in terms of how many SDs it is away from the mean (the z-score) • We therefore divide the covariance by the SDs of both samples • With two samples, we need the SD from both of them to standardize the covariance

  23. Correlation • This standardized covariance is known as the correlation coefficient • It is also called Pearson's correlation coefficient and is one of the most important formulas in statistics
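Written out, dividing the covariance by both samples' standard deviations gives:

$$r = \frac{\mathrm{cov}(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$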

  24. Correlation • When we standardize the covariance we end up with a value that lies between -1 and +1 • If r = +1, we have a perfect positive relationship

  25. +1 (perfect positive correlation: as X increases, so does Y):

  26. Correlation • When we standardize the covariance we end up with a value that lies between -1 and +1 • If r = -1, we have a perfect negative relationship

  27. Perfect negative correlation: As X increases, Y decreases, or vice versa

  28. Correlation • If r = 0 there is no correlation between the two samples – changes in sample X are not associated with systematic changes in sample Y, or vice versa • Recall that we can use the correlation coefficient as a measure of effect size • An r of +/- 0.1 is a small effect, 0.3 a medium effect and 0.5 a large effect

  29. Scatterplots

  30. Scatterplots • Before performing correlational analysis we plot a scatterplot to get an idea about how the variables covary • A scatterplot is a graph of the scores of one sample (variable) vs. the scores of another sample • Further variables can be included in a 3D plot.

  31. A scatterplot shows: • Whether there is a relationship between the variables • What kind of relationship it is • Whether any cases (scores) are markedly different from the rest – outliers – as these cause problems • We normally plot the IV on the x-axis and the DV on the y-axis

  32. Scatterplots • A 2D scatterplot

  33. Scatterplots • A 3D scatterplot

  34. Using SPSS to obtain scatterplots: (a) simple scatterplot: Graphs > Legacy Dialogs > Scatter/Dot...

  35. Using SPSS to obtain scatterplots: (a) simple scatterplot: Graphs > Chart Builder 1. Pick Scatter/Dot 2. Drag the "Simple scatter" icon into the chart preview window 3. Drag the X and Y variables into the x-axis and y-axis boxes in the chart preview window

  36. Using SPSS to obtain scatterplots: (b) scatterplot with regression line: Analyze > Regression > Curve Estimation... ("Constant" is the intercept with the y-axis, "b1" is the slope)
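For readers working outside SPSS, a rough equivalent (not the slide's procedure) in Python with numpy and matplotlib – the intercept plays the role of "Constant" and the slope of "b1"; the data below are made up:

# Sketch (not the SPSS procedure): scatterplot with a least-squares line.
# x and y are hypothetical IV/DV scores.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 5.9, 7.8, 8.2])

slope, intercept = np.polyfit(x, y, 1)   # slope ~ "b1", intercept ~ "Constant"

plt.scatter(x, y)                        # IV on the x-axis, DV on the y-axis
plt.plot(x, intercept + slope * x)       # fitted regression line
plt.xlabel("IV")
plt.ylabel("DV")
plt.show()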

  37. Correlation II • Having visually inspected the data, we can conduct a correlation analysis in SPSS • The procedure is on page 123 in chapter 4 of Field's book in the compendium • Note: there are two types of correlation: bivariate and partial • Bivariate correlation is correlation between two variables • Partial correlation is the same, but controlling for one or more additional variables

  38. Using SPSS to obtain correlations: Analyze > Correlate > Bivariate...

  39. Correlations II • There are various types of correlation coefficient, for different purposes: • Pearson's "r": used when both X and Y variables are (a) continuous; (b) (ideally) measurements on interval or ratio scales; (c) normally distributed – e.g. height, weight, IQ • Spearman's rho: used in the same circumstances as Pearson's r, except that the data need only be on an ordinal scale – e.g. attitudes, personality scores.

  40. Correlations II • r is a parametric test: the data have to have certain characteristics (parameters) before it can be used. • rho is a non-parametric test - less fussy about the nature of the data on which it is performed. • Both are dead easy to calculate in SPSS
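A small illustration of that last point (not from the slides): both coefficients are one-liners in scipy.stats; the paired scores below are hypothetical.

# Sketch (not from the slides): Pearson's r and Spearman's rho in scipy.stats,
# computed on hypothetical paired scores.
from scipy.stats import pearsonr, spearmanr

x = [160, 165, 170, 175, 180, 185]   # hypothetical scores, variable X
y = [55, 60, 63, 70, 72, 80]         # hypothetical scores, variable Y

r, p_r = pearsonr(x, y)        # parametric: interval/ratio, normally distributed data
rho, p_rho = spearmanr(x, y)   # non-parametric: data need only be rankable

print(f"r = {r:.2f} (p = {p_r:.3f}), rho = {rho:.2f} (p = {p_rho:.3f})")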

  41. Pearson's r

  42. Pearson's r • Calculating Pearson's r: a worked example: Is there a relationship between the number of parties a person gives each month and the amount of flour they purchase from Møller-Mogens?

  43. Pearson's r • Our formula for the correlation coefficient from before, slightly modified:
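Presumably the computational ("raw-score") form is meant here, since the numbers on the next two slides fit it:

$$r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{N}\right)\left(\sum y^2 - \frac{(\sum y)^2}{N}\right)}}$$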

  44. Using our values (from the bottom row of the table):

$$r = \frac{29832 - \frac{382 \times 776}{10}}{\sqrt{\left(14876 - \frac{382^2}{10}\right)\left(60444 - \frac{776^2}{10}\right)}}$$

  45. $$r = \frac{29832 - 29643.20}{\sqrt{(14876 - 14592.40)(60444 - 60217.60)}} = \frac{188.80}{\sqrt{283.60 \times 226.40}} = \frac{188.80}{253.39} = 0.745$$

r is 0.75. This is a positive correlation: people who buy a lot of flour from Møller-Mogens also hold a lot of parties (and vice versa).
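A quick arithmetic check (not on the slides), using only the sums quoted above (Σx = 382, Σy = 776, Σxy = 29832, Σx² = 14876, Σy² = 60444, N = 10):

# Check of the worked example's arithmetic from the quoted sums (N = 10).
from math import sqrt

n = 10
sum_x, sum_y = 382, 776
sum_xy, sum_x2, sum_y2 = 29832, 14876, 60444

numerator = sum_xy - sum_x * sum_y / n                                  # 188.80
denominator = sqrt((sum_x2 - sum_x**2 / n) * (sum_y2 - sum_y**2 / n))   # ~253.39
print(numerator / denominator)                                          # ~0.745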

  46. Pearson's r How to interpret the size of a correlation: • r² (r * r, "r squared") is the "coefficient of determination": it tells us what proportion of the variation in the Y scores is associated with changes in X • E.g., if r is 0.2, r² is 4% (0.2 * 0.2 = 0.04 = 4%) • Only 4% of the variation in Y scores is attributable to Y's relationship with X • Thus, knowing a person's X score tells you essentially nothing about what their Y score might be!

  47. Pearson's r • Our correlation of 0.75 gives an r² of 56% • An r of 0.9 gives an r² of (0.9 * 0.9 = 0.81) = 81% • Note that correlations become much stronger the closer they are to 1 (or -1) • Correlations of .6 or -.6 (r² = 36%) are much better than correlations of .3 or -.3 (r² = 9%) – four times as much variance explained, not merely twice as strong!

  48. Spearman's rho

  49. Spearman's rho • We use Spearman's correlation coefficient when the data have violated parametric assumptions (e.g. a non-normal distribution) • Spearman's correlation coefficient works by ranking the data in the samples, just like the other non-parametric tests
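As a sketch of this point (not from the slides), ranking both samples and then computing Pearson's r on the ranks reproduces the value scipy reports for Spearman's rho; the data are made up:

# Sketch (not from the slides): Spearman's rho equals Pearson's r on the ranks.
from scipy.stats import pearsonr, rankdata, spearmanr

x = [1, 2, 3, 4, 100]   # hypothetical skewed scores
y = [2, 1, 4, 3, 90]    # hypothetical skewed scores

r_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))
rho, _ = spearmanr(x, y)
print(r_on_ranks, rho)  # both print 0.8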
