Exploring Relationships: Correlations & Multiple Linear Regression Developing Study Skills and Research Methods (HL20107) Dr James Betts
Lecture Outline: • Correlation Coefficients • Coefficient of Determination • Prediction & Regression • Multiple Linear Regression • Assessment Details
Statistics • Descriptive (organising, summarising & describing data) • Inferential (generalising) • Correlational (relationships) • Significance
Correlation
• A measure of the relationship (correlation) between interval/ratio level-of-measurement (LOM) variables taken from the same set of subjects
• A ratio which indicates the amount of concomitant variation between two sets of scores
• This ratio is expressed as a correlation coefficient (r), which runs from -1 to +1:

  r               Relationship
  -1              Perfect negative
  -1 to -0.7      Strong negative
  -0.7 to -0.3    Moderate negative
  -0.3 to -0.1    Weak negative
  0               No relationship
  +0.1 to +0.3    Weak positive
  +0.3 to +0.7    Moderate positive
  +0.7 to +1      Strong positive
  +1              Perfect positive
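The bands on the scale can be turned into a small labelling helper. This is only a sketch: the function name `describe_r` is hypothetical, and two choices are assumptions rather than anything stated on the slide, namely that a boundary value such as 0.7 falls in the stronger band and that anything with a magnitude below 0.1 is reported as no relationship.

```python
def describe_r(r: float) -> str:
    """Label a correlation coefficient using the strong/moderate/weak bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    # Assumption: boundary values (0.1, 0.3, 0.7) belong to the stronger band.
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        # Assumption: magnitudes below 0.1 are treated as no relationship.
        return "no relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_r(0.8))    # strong positive
print(describe_r(-0.5))   # moderate negative
```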
Correlation Coefficient & Scatterplots: Direction
[Figure: two scatterplots against Variable X (e.g. VO2max); one with Variable Y (e.g. Exercise Capacity), one with Variable Y (e.g. 10 km run time)]
Correlation Coefficient & Scatterplots: Form
[Figure: two scatterplots; Variable Y (e.g. Exercise Capacity) against Variable X (e.g. VO2max), and Variable Y (e.g. Strength) against Variable X (e.g. Age)]
Correlation Coefficient & Scatterplots: Significance
[Figure: two scatterplots against Variable X (e.g. VO2max); one with Variable Y (e.g. Exercise Capacity), one with Variable Y (e.g. 100 m sprint time)]
Correlation Coefficient & Scatterplots: Significance
[Figure: two scatterplots against Variable X (e.g. VO2max); one with Variable Y (e.g. 100 m sprint time), one with Variable Y (e.g. Exercise Capacity)]
Methods of Calculating r
• Any method of calculating r requires:
  - Homoscedasticity (i.e. equal scattering)
  - Linear data (curvilinear data requires eta, η)
• Parametric data (i.e. raw data above ordinal LOM and either a normal distribution or a large sample) permit the use of ‘Pearson’s Product-Moment Correlation’
• If the raw data violate these assumptions then use ‘Spearman’s Rank Order Correlation’ instead.
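Pearson’s Product-Moment Correlation

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

where n is the number of paired scores.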
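The raw-score formula for Pearson's r translates almost term for term into code. The sketch below uses only the standard library; the function name `pearson_r` and the toy scores are made up for illustration and are not data from the lecture.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from the raw-score formula (sums of X, Y, XY, X^2, Y^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 scores")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Invented scores in which Y rises (or falls) in exact step with X.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # a perfect positive relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))   # a perfect negative relationship
```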
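Spearman’s Rank-Order Correlation

$$r = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$

where D is the difference between each subject's two ranks and n is the number of subjects.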
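Spearman's formula can be sketched in the same way: rank each variable, take the difference in ranks (D) for each subject, and substitute into the formula. The helper names `ranks` and `spearman_r` are hypothetical, the scores are invented, and the sketch assumes there are no tied scores, since this shortcut formula is only exact in that case.

```python
def ranks(values):
    """Rank scores from 1 (smallest) upwards; assumes no tied scores."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_r(x, y):
    """Spearman's rank-order correlation: 1 - 6*sum(D^2) / (n(n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 scores")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this shortcut formula assumes no tied scores")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Invented scores: the ranks of Y are 1, 3, 2, 5, 4, so sum(D^2) = 4.
print(spearman_r([10, 20, 30, 40, 50], [1, 3, 2, 5, 4]))
```

With five subjects and a sum of squared rank differences of 4, the formula gives 1 - 24/120 = 0.8.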
Coefficient of Determination (r² × 100)
• AKA ‘variance explained’, this figure denotes how much of the variance in Y can be explained/predicted by X
• e.g. to predict long jump distance (Y) from maximum sprint speed (X): r = 0.8, so r² = 0.64, i.e. 64%
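Written out as arithmetic, the long jump example works as follows. The 36% remainder is not given on the slide; it simply follows from subtracting the explained variance from 100%.

$$r^2 \times 100 = 0.8^2 \times 100 = 0.64 \times 100 = 64\%$$

$$100\% - 64\% = 36\% \text{ of the variance in } Y \text{ is not explained by } X$$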
Correlation versus Regression
• By attempting to predict one variable using another, we are now moving away from simple correlation and moving into the concept of regression
[Figure: diagrams labelled "Correlation" and "Regression"]
Linear Regression
• The equation for a linear relationship can be expressed as: Y = a + bX, where a = the Y intercept and b = the gradient
[Figure: scatterplot of Variable Y (e.g. Exercise Capacity) against Variable X (e.g. VO2max)]
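To see where a and b might come from in practice, the sketch below estimates them by ordinary least squares. The slide does not say which fitting method is used, so least squares is an assumption here, as are the function names `fit_line` and `predict` and the invented scores.

```python
def fit_line(x, y):
    """Estimate the intercept (a) and gradient (b) of Y = a + bX by least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 scores")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    spread_x = n * sum_x2 - sum_x ** 2
    if spread_x == 0:
        raise ValueError("X must vary for a gradient to be estimated")
    b = (n * sum_xy - sum_x * sum_y) / spread_x
    a = (sum_y - b * sum_x) / n
    return a, b


def predict(a, b, x_value):
    """Predict Y for a given X from the fitted equation."""
    return a + b * x_value


# Invented scores that lie exactly on the line Y = 1 + 2X.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)                  # intercept 1.0, gradient 2.0
print(predict(a, b, 2.5))    # predicted Y for X = 2.5
```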
Extrapolation versus Interpolation
• Remember that the accuracy of your equation depends upon the linear relationship you observed
[Figure: scatterplot of Variable Y (e.g. Exercise Capacity) against Variable X (e.g. VO2max), with regions labelled "Interpolation", "Extrapolation" and "?"]
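Under the usual definitions, interpolation means predicting Y for an X value inside the range that was actually observed, and extrapolation means predicting outside that range, where the observed linear relationship may no longer hold. A minimal sketch of that check follows; the function name `prediction_type` and the VO2max-style values are hypothetical.

```python
def prediction_type(x_value, observed_x):
    """Say whether predicting at x_value is interpolation or extrapolation."""
    lowest, highest = min(observed_x), max(observed_x)
    if lowest <= x_value <= highest:
        return "interpolation"
    return "extrapolation"


# Invented X scores spanning 35 to 60.
observed = [35, 42, 48, 55, 60]
print(prediction_type(50, observed))   # interpolation
print(prediction_type(75, observed))   # extrapolation
```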
Multiple Linear Regression
• We saw earlier how maximum sprint speed (X) can predict/explain 64% of the variance in long jump distance (Y), i.e. r² = 64%
• …but can Y be predicted any more effectively using more than one independent variable (i.e. X1, X2, X3, etc.)?
Multiple Linear Regression
• However, we can often predict Y effectively just using a specific subset of X variables (i.e. a reduced model)
[Figure: Y with X1 and X2; label "Event Experience"]
Multiple Linear Regression
• ‘Best Subset Selection Methods’ involve calculation of r for every possible combination of IVs
• Stepwise regression methods involve gradually either adding or removing variables and monitoring the impact of each action on r:
  - Standard methods add and remove variables
  - Forward selection methods begin with 1 IV and add more
  - Backwards elimination methods begin with all IVs and remove
• The order in which IVs are added/removed is critical, as the variance explained solely by any one will be entirely dependent upon the presence of others.
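Forward selection can be sketched in plain Python as follows. Several things here are assumptions rather than the lecture's method: the fit is by ordinary least squares, the quantity monitored is the multiple R² (variance explained) rather than r itself, a variable is only added if it raises R² by at least a hypothetical `min_gain` threshold, and the data are invented so that Y depends exactly on X1 and X2 but not on X3.

```python
def solve(matrix, vector):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [value] for row, value in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def r_squared(predictors, y):
    """Fit Y = a + b1*X1 + b2*X2 + ... by least squares and return R^2."""
    rows = [[1.0] + [column[i] for column in predictors] for i in range(len(y))]
    size = len(rows[0])
    # Normal equations: (X'X) coefficients = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(size)]
    coefficients = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefficients, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_residual = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_total = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_residual / ss_total


def forward_selection(candidates, y, min_gain=0.01):
    """Add one IV at a time, keeping the one that raises R^2 the most."""
    chosen, history, best_so_far = [], [], 0.0
    remaining = dict(candidates)
    while remaining:
        scores = {
            name: r_squared([candidates[c] for c in chosen] + [values], y)
            for name, values in remaining.items()
        }
        best_name = max(scores, key=scores.get)
        if scores[best_name] - best_so_far < min_gain:
            break  # no remaining IV improves the prediction enough
        chosen.append(best_name)
        best_so_far = scores[best_name]
        history.append((best_name, best_so_far))
        del remaining[best_name]
    return history


# Invented data: Y = 2 + 3*X1 + X2 exactly; X3 plays no part.
ivs = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2, 1, 4, 3, 6, 5],
    "X3": [5, 3, 6, 2, 4, 1],
}
y = [7, 9, 15, 17, 23, 25]
for name, r2 in forward_selection(ivs, y):
    print(name, round(r2, 3))
```

On these invented scores the procedure adds X1 first, then X2, and stops before X3 because it no longer improves R². Backwards elimination would run the same comparison in reverse, starting from all IVs.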
Summary: Exploring Relationships • The relationship between two variables can be expressed as a correlation coefficient (r) • The coefficient of determination (r²) denotes the % of the variance in one variable that is explained by another • Linear regression can provide an equation with which to predict one variable from another • Multiple linear regression can potentially improve this prediction using multiple predictor variables.
Coursework Project (40% of overall grade)
• Your coursework will require you to address 2 out of 3 research scenarios that are available on the unit webpage
• For each of the 2 scenarios you will need to:
  - Perform a literature search in order to provide a comprehensive introduction to the research area
  - Identify the variables of interest and evaluate the research design which was adopted
  - Formulate and state appropriate hypotheses…
Cont’d…
  - Summarise descriptive statistics in an appropriate and well presented manner
  - Select the most appropriate statistical test with justification for your decision
  - Transfer the output of your inferential statistics into your Word document
  - Interpret your results and discuss the validity and reliability of the study
  - Draw a meaningful conclusion (state whether hypotheses are accepted or rejected).
Coursework Details (see unit outline)
• 2000 words maximum (i.e. 1000 for each)
• Any supporting SPSS data/outputs to be appended
• To be submitted on Thursday 6th May
• Assessment Weighting:
  - Evaluation & Analysis (30%)
  - Reading & Research (20%)
  - Communication & Presentation (20%)
  - Knowledge (30%)
Coursework Details
• All information relating to your coursework (including the relevant data files) is accessible via the module web page: http://people.bath.ac.uk/jb335/Y2%20Research%20Skills%20(FH200107).html
• Web address also referenced on shared area
• Electronic copy to be included with submission
• Any further questions/problems can be raised in the CW revision lecture/labs after Easter
Timed Practical Computing Exercise (20% of overall grade)
• This test will involve analysis/interpretation of the resultant data, assessed via short answer questions
• Practice session Wednesday 14th April
• Duration = 80 min (2 groups)
• I will email specific details after Easter.
Start Here
• Looking for differences between categories/frequencies? (i.e. nominal LOM)
  - 1 observed frequency: Goodness of Fit χ²
  - >1 observed frequency: Contingency χ²
• Looking for differences within the same group of subjects? (i.e. paired data)
  - 2 observations: Paired t-test (non-parametric: Wilcoxon test)
  - >2 observations: 1-way paired ANOVA (non-parametric: Friedman’s test)
• Looking for differences between 2 separate groups of subjects? (i.e. unpaired data)
  - 2 groups: Independent t-test (non-parametric: Mann-Whitney test)
  - >2 groups: 1-way unpaired ANOVA (non-parametric: Kruskal Wallis test)
• Looking for relationships?
  - 2 variables: Pearson’s r (non-parametric: Spearman’s r)
  - >2 variables: Multiple Linear Regression
• Looking for differences with >1 independent variable?
  - Both IVs paired: 2-way paired ANOVA
  - Both IVs unpaired: 2-way unpaired ANOVA
  - 1 IV paired, 1 IV unpaired: 2-way mixed model ANOVA
• If multiple DVs are involved then use MANOVA
• Post-Hoc Tests
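Part of the chart can be expressed as a simple lookup. The sketch below covers only the branches that have a non-parametric alternative, and the pairings are an assumption based on reading the chart's layout (paired t-test with Wilcoxon, and so on). The names `TESTS` and `choose_test` and the key labels are hypothetical.

```python
# (design, number of observations/groups/variables) ->
#     (parametric test, non-parametric alternative)
# Assumption: these pairings follow the layout of the chart.
TESTS = {
    ("paired", "2"): ("Paired t-test", "Wilcoxon test"),
    ("paired", ">2"): ("1-way paired ANOVA", "Friedman's test"),
    ("unpaired", "2"): ("Independent t-test", "Mann-Whitney test"),
    ("unpaired", ">2"): ("1-way unpaired ANOVA", "Kruskal Wallis test"),
    ("relationship", "2"): ("Pearson's r", "Spearman's r"),
}


def choose_test(design, levels, parametric=True):
    """Return the test for a design, or its non-parametric alternative."""
    parametric_test, alternative = TESTS[(design, levels)]
    return parametric_test if parametric else alternative


print(choose_test("paired", "2"))                         # Paired t-test
print(choose_test("relationship", "2", parametric=False)) # Spearman's r
```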