PSYM021 Introduction to Methods Statistics

1. PSYM021 Introduction to Methods & Statistics Week Five: Statistical techniques III

2. Web support. Simple regression: a reminder. Multiple regression: an introduction. Reporting regression analyses. Choosing regressors (predictor variables). Choosing a regression model. Model checking: residuals.

3. Simple Regression. Establish equation for the best-fit line: y = bx + a

4. Establish equation for the best-fit line: y = b1x1 + b2x2 + b3x3 + a

5. For multiple regression, R2 will get larger every time another independent variable (regressor/predictor) is added to the model (strictly, it can never get smaller). Add "work stress" to model? New regressor may only provide a tiny improvement in the amount of variance in the data explained by the model. Need to establish the "added value" of each additional regressor in predicting the DV.

6. Adjusted R2 (R2adj) takes into account the number of regressors in the model. Calculated as: R2adj = 1 - (1-R2)(N-1)/(N-n-1), where N = number of data points and n = number of regressors. You don't need to memorise this equation, but note that R2adj will always be smaller than R2.
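The formula on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name `adjusted_r2` is illustrative. The example values come from the IQ analysis reported later in these slides (R2 = 21.7%, F3,33), with N = 37 inferred from the residual degrees of freedom (N - n - 1 = 33) rather than stated in the slides.

```python
def adjusted_r2(r2: float, n_points: int, n_regressors: int) -> float:
    """R2adj = 1 - (1 - R2)(N - 1)/(N - n - 1)."""
    df_resid = n_points - n_regressors - 1
    if df_resid <= 0:
        raise ValueError("need more data points than regressors + 1")
    return 1 - (1 - r2) * (n_points - 1) / df_resid


# IQ example: R2 = 0.217, n = 3 regressors, N = 37 (inferred, see above)
print(round(adjusted_r2(0.217, 37, 3), 3))  # 0.146, i.e. the 14.6% on slide 21
```

The penalty grows with the number of regressors, which is why R2adj can fall when a weak regressor is added even though R2 itself does not.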

7. "Effectiveness" vs "Efficiency". Effectiveness: maximises R2, ie: maximises proportion of variance explained by model. Efficiency: maximises increase in R2adj upon adding another regressor, ie: if new regressor doesn't add much to the variance explained, it is not worth adding.

8. Effectiveness (R2 and R2adj). 0 - 25%: very poor and likely to be unacceptable; 25 - 50%: poor, but may be acceptable; 50 - 75%: good; 75 - 90%: very good; 90% +: likely that there is something wrong with your analysis.

9. Analysis of Variance test checks to see if model, as a whole, has a significant relationship with the DV. Part of the predictive "value" of each regressor may be shared by one or more of the other regressors in the model, so the model must be considered as a whole (i.e. all regressors/IVs together). Read off ANOVA table in SPSS output, and report as you did in week 3/4 assignments.
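The overall F-ratio in the ANOVA table is tied to R2 by a standard identity, F = (R2/n) / ((1-R2)/(N-n-1)), with n and N-n-1 degrees of freedom. The sketch below uses that identity rather than SPSS's sums of squares; the function name `f_ratio` is illustrative, and N = 37 is inferred from the F3,33 reported in the IQ example later in these slides.

```python
def f_ratio(r2: float, n_points: int, n_regressors: int):
    """Return (F, df_model, df_residual) for the model as a whole."""
    df_model = n_regressors
    df_resid = n_points - n_regressors - 1
    f = (r2 / df_model) / ((1 - r2) / df_resid)
    return f, df_model, df_resid


f, df1, df2 = f_ratio(0.217, 37, 3)
print(f"F{df1},{df2} = {f:.2f}")  # F3,33 = 3.05
```

The p-value still has to be read from the SPSS output (or an F table); this only shows where the F-ratio comes from.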

10. SPSS output table entitled Coefficients. Column headed Unstandardised coefficients - B gives the regression coefficient for each regressor variable (IV), "with all the other variables held constant". Units of coefficient are units of the DV per one unit of the regressor (IV).

11. Units of coefficient are DV units per unit of the regressor, eg: dependent variable: score on video game (in points); regressor: time of day (in hours). B coefficient for time = 844.57 (points per hour). score = (B coefficient x time) + constant, so score = (844.57 x time) - 4239.6. This means that for every increase of one hour in the variable time, we would predict that a person's score will increase by 844.57 points.

12. Dependent variable: score on video game; regressor: gender. Gender coded so that: 1 = male, 2 = female. Let B coefficient for gender = 100.00. So, score = (100.00 x gender) + constant. Adding "1" to the variable gender means that we go from male to female. This means that females would be expected to score 100.00 points more than males. Remember that the B coefficient is calculated on the basis that 1=male and 2=female (different coding will give a different coefficient).
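How the coding of a dichotomous regressor feeds into B can be seen with a small least-squares fit. This is a sketch on made-up scores (three males averaging 1000 points, three females averaging 1100); `fit_line` is an illustrative helper, not an SPSS function. In this example, shifting the codes (0/1 instead of 1/2) changes only the constant, while reversing them flips the sign of B.

```python
def fit_line(x, y):
    """Least-squares slope b and constant a for y = b*x + a."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return b, mean_y - b * mean_x


scores = [900, 1100, 1000, 1000, 1200, 1100]      # hypothetical data
gender_12 = [1, 1, 1, 2, 2, 2]                    # 1 = male, 2 = female
gender_01 = [g - 1 for g in gender_12]            # 0 = male, 1 = female
gender_reversed = [3 - g for g in gender_12]      # 1 = female, 2 = male

print(fit_line(gender_12, scores))        # (100.0, 900.0)
print(fit_line(gender_01, scores))        # (100.0, 1000.0)
print(fit_line(gender_reversed, scores))  # (-100.0, 1200.0)
```

In every coding the predicted gap between the two groups is 100 points; only the way the equation expresses it changes.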

13. Units for each regression coefficient are different, so we must standardise them if we want to compare one with another. Column headed Standardised coefficients - Beta. Can compare the Beta weights for each regressor variable to compare the effects of each on the dependent variable. Larger Beta weight (in absolute size) indicates stronger effect of regressor on values of DV.
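A Beta weight is the B coefficient rescaled by the standard deviations of the regressor and the DV: Beta = B x sd(regressor) / sd(DV). The sketch below applies that rescaling; `beta_weight` is an illustrative name, and the data and B values are invented purely to show two coefficients in very different units being put on a common scale.

```python
from statistics import stdev


def beta_weight(b: float, regressor_values, dv_values) -> float:
    """Standardised coefficient: B scaled by sd(regressor) / sd(DV)."""
    return b * stdev(regressor_values) / stdev(dv_values)


# Hypothetical data and hypothetical B coefficients
score = [12, 25, 31, 44, 52]
hours = [1, 2, 3, 4, 5]                          # B in points per hour
income = [20000, 22000, 30000, 31000, 40000]     # B in points per pound

print(beta_weight(8.0, hours, score))
print(beta_weight(0.0004, income, score))
```

The raw B values differ by several orders of magnitude only because of the units; the Beta weights can be compared directly.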

14. Significance of each regression coefficient is assessed using a t-test. Check values in columns headed t and Sig. If regression coefficient is negative, then t-value will also be negative (the sign does not matter; it is the size of t that is important).

15. How should I report a regression analysis?

16. Describe the characteristics of the model before you describe the significance of the relationship So: 1. R2, R2adj - how well does the model fit the data? 2. Fm,n - is the relationship significant? 3. Regression equation - how to calculate values of DV from known values of IVs? 4. Describe results in plain English

17. We want to predict IQ score using brain size (MRI), height and gender as regressors

20. Regression equation: y = b1x1 + b2x2 + b3x3 + a. IQ = (1.824 x 10^-4 x MRI) - (0.316 x height) + (2.426 x gender) + (-6.411) = (0.0001824 x MRI) - (0.316 x height) + (2.426 x gender) + (-6.411) = (0.0002 x MRI) - (0.316 x height) + (2.426 x gender) + (-6.411)
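The fitted equation can be used directly to calculate a predicted IQ. This is a sketch only: the coefficients are those on the slide, but the input values are invented, and the gender coding (1 = male, 2 = female) is an assumption carried over from the earlier video-game example, since the slides do not state the coding used for this data set.

```python
def predict_iq(mri_pixels: float, height_cm: float, gender_code: int) -> float:
    """IQ = 1.824e-4*MRI - 0.316*height + 2.426*gender - 6.411."""
    return (1.824e-4 * mri_pixels
            - 0.316 * height_cm
            + 2.426 * gender_code
            - 6.411)


# Hypothetical person: 900,000-pixel scan, 170 cm tall, gender code 2
print(round(predict_iq(900_000, 170, 2), 1))  # 108.9

# Changing only the gender code from 1 to 2 shifts the prediction by B
print(round(predict_iq(900_000, 170, 2) - predict_iq(900_000, 170, 1), 3))  # 2.426
```

Using the rounded coefficient 0.0002 for MRI instead of 0.0001824 would change this prediction by about 16 points, so the rounded form is for reporting, not for calculation.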

21. "The regression was a poor fit, describing only 21.7% of the variance in IQ (R2adj = 14.6%), but the overall relationship was statistically significant (F3,33 = 3.05, p<0.05)." "With other variables held constant, IQ scores were negatively related to height, decreasing by 0.32 IQ points for every extra centimetre in height, and positively related to brain size, increasing by 0.0002 IQ points for every extra pixel of the scan. Women tended to have higher scores than men, by 2.43 IQ points. However, the effect of brain size (MRI) was the only significant effect (t33 = 2.75, p = 0.01)."

22. Break. Five minutes: please be back promptly.

23. What do we want of a regressor? To have "a significant effect" on the dependent variable. Ability to "discriminate" between values of the dependent variable.

24. Dichotomous variable (eg: gender) Compare using t-test If significant, then possible regressor predicts differences in dependent variable

25. Continuous variable (eg: Height) Compare using correlation If significant, then possible regressor predicts differences in dependent variable
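The two screening checks above (t-test for a dichotomous candidate, correlation for a continuous one) can be sketched as follows. The helper names and data are hypothetical, the t-test shown is the equal-variance (pooled) form, and only the test statistics are computed: the p-values would still come from SPSS or from tables.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between a continuous regressor and the DV."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pooled_t(group_a, group_b):
    """Independent-samples t statistic for a dichotomous regressor."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical DV scores split by gender, and DV against height
print(pooled_t([98, 102, 105, 110], [101, 108, 112, 115]))
print(pearson_r([160, 165, 170, 175, 180], [104, 101, 107, 99, 103]))
```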

26. Some of the "discriminatory value" in a regressor may be accounted for by regressors present in the model already, eg: gender, income, height; age, experience, value of property. "In the presence of all regressors": adding a regressor may not add as much to the model's predictive value as you might have anticipated.

27. Same number of regressors: choose model with highest value of R2adj. This gives "best value" per regressor. Will also have the highest value of R2 and F. Different number of regressors: highest value of R2adj (more regressors) or highest value of F (fewer regressors).

28. Effective: highest R2 ("most complete"); will have more regressors; will be effective, but not efficient. Efficient: highest F-ratio ("most significant"); will have fewer regressors; will be efficient, but not particularly effective. Compromise: largest increase in R2adj (best of both worlds); will contain only the "best" regressors available; manageable number of regressors and reasonably effective.

29. Best subsets regression (Minitab BREG) tries every possible combination of available regressors (up to maximum of 20), eg: 20 regressors give over 1,000,000 different models. Command, with the dependent variable in column 10 and independent variables in columns 1 to 6: BREG C10 C1-C6. You will not be required to carry out this type of analysis in the exam, but you need to be able to interpret the output.
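The "over 1,000,000 models" figure follows from counting every non-empty subset of the regressors, which is 2^k - 1 for k regressors. A short sketch (function names are illustrative; the three regressor names are borrowed from the earlier slide on shared discriminatory value):

```python
from itertools import combinations
from math import comb


def count_models(k: int) -> int:
    """Number of non-empty subsets of k candidate regressors."""
    return sum(comb(k, size) for size in range(1, k + 1))


def all_subsets(names):
    """Yield every non-empty combination of regressor names."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


print(count_models(20))  # 1048575
for subset in all_subsets(["gender", "income", "height"]):
    print(subset)        # 7 candidate models for 3 regressors
```

A best-subsets procedure fits a regression for each of these candidates and reports only the top few for each model size.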

30. MTB > BREG C13 C1-C12
Best Subsets Regression
Response is prodebt
304 cases used; 160 cases contain missing values.
[The vertical column headings naming the 12 candidate regressors, and the X marks showing which regressors are in each model, are not recoverable from this transcript. Each row had as many X marks as its Vars value.]
Vars   R-Sq   Adj. R-Sq   C-p   s
7      19.3   17.4        7.3   0.65539
7      19.1   17.2        7.8   0.65602
8      19.9   17.7        6.9   0.65388
8      19.5   17.4        8.2   0.65536
9      20.2   17.8        7.8   0.65375
9      20.1   17.6        8.3   0.65434
10     20.4   17.6        9.3   0.65427

31. Best two models for each possible number of regressors are displayed in output Compare R2adj values directly Select best model(s) Run normal regression in SPSS for each selected model Compare F-ratio values

32. Identify best subset of regressors from BREG output. Must then run ordinary regression procedure, which calculates the F-ratio and the individual coefficients and their significance. Highest R2adj values result in significant F-ratios; if F-ratio is not significant, check data and procedure. BUT: advisable to try two or three models, as the number of respondents contributing to each analysis may not be the same between Minitab and SPSS.

33. Choose procedure by selecting the appropriate option in the drop-down menu. "Enter" procedure: adds all regressors to model simultaneously; calculates F-ratio and R2adj for all regressors. "Stepwise" procedure: adds regressors one at a time; calculates F-ratio and R2adj for each set of regressors; considers taking regressors out at each stage.

34. Frequently have values missing from data set: missed out questions; couldn't understand question; couldn't collect data for some reason. Must specify missing values in SPSS in "Define Variable" window. Differences in R2adj or F-ratio values are most likely to be due to missing values, which lead to a different "n" in each analysis.

35. Residuals (general). Unusual observations: "outliers".

36. Predicted value for "y" (dependent variable): y = b1x1 + b2x2 + ... + a. Actual (observed) value for "y". Residual: actual (observed) value minus predicted (calculated) value.
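The residual calculation can be written out directly. This is a sketch with made-up coefficients and cases (two regressors, b1 = 2.0, b2 = -0.5, constant a = 10.0); `predict` and `residual` are illustrative names.

```python
def predict(coefficients, constant, regressor_values):
    """Predicted y = b1*x1 + b2*x2 + ... + a."""
    return sum(b * x for b, x in zip(coefficients, regressor_values)) + constant


def residual(observed, predicted):
    """Residual = actual (observed) value minus predicted value."""
    return observed - predicted


coefficients = [2.0, -0.5]   # hypothetical b1, b2
constant = 10.0              # hypothetical a
cases = [((3.0, 4.0), 15.0), ((1.0, 2.0), 10.5), ((5.0, 0.0), 19.0)]

for xs, observed_y in cases:
    predicted_y = predict(coefficients, constant, xs)
    print(xs, observed_y, predicted_y, residual(observed_y, predicted_y))
# residuals: 1.0, -0.5, -1.0
```

A positive residual means the model under-predicted that case; a negative one means it over-predicted.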

38. Residuals should be: normally distributed (some big, some small, most average-sized); independent of one another (no constant covariation with one another); almost identical in terms of variance, regardless of the values of the IVs or DVs.

39. Outliers. Linear regression would work quite well for this data, except for the presence of three outlier points.

40. Dealing with outliers. Run regression analysis. Plot data on a scattergram. Remove outliers by deleting the rows in SPSS. Run regression analysis again. Note any qualitative differences: if there are qualitative differences, then check data, and if there are no errors, report both analyses; if there are only quantitative differences, then leave outliers in the analysis, noting their presence.
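The run, remove, re-run comparison above can be sketched in Python. Note the substitution: the slide identifies outliers by eye from a scattergram, whereas this sketch flags any case whose residual is more than two standard deviations of the residuals from zero, which is only a rough rule of thumb. The data, the helper names and the threshold are all assumptions for illustration.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Least-squares slope b and constant a for y = b*x + a."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return b, my - b * mx


def flag_outliers(x, y, threshold=2.0):
    """Row indices whose residual exceeds threshold * sd of the residuals."""
    b, a = fit_line(x, y)
    residuals = [yi - (b * xi + a) for xi, yi in zip(x, y)]
    spread = stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Hypothetical data: y = 2x + 1 except for one aberrant row
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 5, 7, 30, 11, 13, 15, 17]

flagged = flag_outliers(x, y)
kept = [i for i in range(len(x)) if i not in flagged]
print("with outliers:   ", fit_line(x, y))
print("flagged rows:    ", flagged)
print("without outliers:", fit_line([x[i] for i in kept], [y[i] for i in kept]))
```

Here the slope moves from 1.75 to 2 once the flagged row is dropped: a quantitative rather than a qualitative change, so by the rule on this slide the outlier would stay in the analysis and be noted.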

41. Removing outliers. Plotting data may indicate that some participants belong to a separate sub-sample, eg: people with an exam phobia?

42. DV vs IV. Differences between actual and predicted values (ie: residual values) should show a normal distribution. Some large positive

43. DV vs IV. If our best-fit line does not fit too well, this will be revealed in the distribution of the residuals.

44. Final assignment due in Friday midday. Next week: Alex Haslam's "Uncertainty Management". Thank you and goodnight!
