
Handling Categorical Data



  1. Handling Categorical Data

  2. Learning Outcomes • At the end of this session and with additional reading you will be able to: • Understand when and how to analyse frequency counts

  3. Analysing categorical variables • Frequencies • The number of observations within a given category

  4. Assumptions of Chi squared • Each observation contributes to only one cell of the contingency table • The expected frequencies should be greater than 5

  5. Chi Squared II • Pearson's chi-squared • assesses the difference between the observed frequencies and the expected frequencies in each cell • this is achieved by calculating the expected value for each cell • Model = (RT × CT) / N, where RT is the row total, CT is the column total and N is the total number of observations • a worked sketch follows below
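A minimal sketch of this expected-value calculation in Python; the 2 × 2 table of counts below is hypothetical, not the case-study data:

```python
import numpy as np

# Hypothetical 2 x 2 contingency table of observed counts
observed = np.array([[30, 18],
                     [45, 55]])

row_totals = observed.sum(axis=1, keepdims=True)  # RT, shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # CT, shape (1, 2)
n = observed.sum()                                # N

# Expected value for each cell: E = (RT x CT) / N
expected = row_totals @ col_totals / n

# Pearson's chi-squared: sum of (O - E)^2 / E over all cells
chi2 = ((observed - expected) ** 2 / expected).sum()
print(expected)
print(chi2)
```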

  6. Chi Squared III • Likelihood ratio • a comparison of the observed frequencies with those predicted by the model (expected); sketched alongside Pearson's statistic below • Yates' correction • with a 2 × 2 contingency table, Pearson's chi-squared can produce a Type I error • the correction subtracts 0.5 from the absolute deviation |observed − expected| before squaring it • this makes the test more conservative (less likely to be significant)
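All three statistics can be sketched with scipy's chi2_contingency, which exposes the likelihood ratio via lambda_ and Yates' correction via correction; the table is again hypothetical:

```python
from scipy.stats import chi2_contingency

table = [[30, 18], [45, 55]]  # hypothetical counts

# Pearson's chi-squared (no correction)
chi2, p, dof, expected = chi2_contingency(table, correction=False)

# Likelihood ratio (G statistic)
g, p_g, _, _ = chi2_contingency(table, correction=False,
                                lambda_="log-likelihood")

# Yates' continuity correction (applied to 2 x 2 tables
# when correction=True)
chi2_y, p_y, _, _ = chi2_contingency(table, correction=True)

print(f"Pearson: {chi2:.3f} (p = {p:.3f})")
print(f"Likelihood ratio: {g:.3f} (p = {p_g:.3f})")
print(f"Yates-corrected: {chi2_y:.3f} (p = {p_y:.3f})")
```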

  7. The contingency table I • Using my case study on stop and search, suppose we wanted to ascertain whether black males were stopped more often in one month than white males • One variable • ethnicity (black or white male) • What does this tell us?

  8. One-way Chi Squared • In a simple one-way chi-squared, if we had 148 people we would expect them to be evenly split between white and black males, so • the expected value for each group would be 74 • a sketch follows below
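As a sketch, the one-way test can be run with scipy.stats.chisquare, which defaults to an even split across categories; the observed counts below are hypothetical:

```python
from scipy.stats import chisquare

# 148 people split across two categories (hypothetical counts);
# under the null hypothesis we expect 74 in each group
observed = [92, 56]

result = chisquare(observed)  # expected defaults to an even split
print(result.statistic, result.pvalue)
```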

  9. One-way Chi Squared

  10. SPSS output

  11. The contingency table II • It would be more useful to look at an additional variable, let's say age • Two variables • Males • black/white • Age • under 18/over 18

  12. The contingency table II

  13. Example • Now, using the formula, calculate the expected values for the contingency table • Model = (RT × CT) / N • a worked sketch follows below
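A sketch of the exercise with hypothetical counts; scipy's chi2_contingency returns the same expected matrix that the hand formula gives:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical ethnicity x age table: rows = black/white,
# columns = under 18 / over 18
observed = np.array([[40, 52],
                     [21, 35]])

# By hand: E = (RT x CT) / N
expected_by_hand = (observed.sum(axis=1, keepdims=True)
                    @ observed.sum(axis=0, keepdims=True)) / observed.sum()

# Same matrix from scipy
_, _, _, expected = chi2_contingency(observed, correction=False)

print(expected_by_hand)
print(np.allclose(expected_by_hand, expected))  # True
```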

  14. SPSS output

  15. SPSS output

  16. Effect size
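The transcript loses this slide's body, so as an assumption about what it covered: a common effect-size measure for a chi-squared test is Cramér's V, sketched below with hypothetical counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 18], [45, 55]])  # hypothetical counts

chi2, _, _, _ = chi2_contingency(table, correction=False)
n = table.sum()

# Cramer's V = sqrt(chi2 / (N * (min(rows, cols) - 1)))
v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"Cramer's V = {v:.3f}")
```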

  17. Odds ratio • The ratio of the odds of an event occurring in one group to the odds of it occurring in another group • sketched below
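A minimal sketch with a hypothetical 2 × 2 table: the odds ratio is simply one group's odds divided by the other's:

```python
# Hypothetical counts: (stopped, not stopped) for each group
black = (30, 18)
white = (45, 55)

odds_black = black[0] / black[1]   # odds of being stopped if black
odds_white = white[0] / white[1]   # odds of being stopped if white

odds_ratio = odds_black / odds_white
print(f"Odds ratio = {odds_ratio:.2f}")
```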

  18. Loglinear analysis • Loglinear analysis works by backward elimination of a model • the saturated model is fitted first, then predictors are removed • just like an ANOVA, a loglinear analysis assesses the relationships between all variables and describes the outcomes in terms of interactions

  19. Loglinear analysis II • With our previous example we had two variables • ethnicity and age • If we now added reason for stop and search, a loglinear analysis would first assess the three-way interaction and then the various two-way interactions • a sketch follows below
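The slides assume SPSS for this analysis; as a sketch of the same idea in Python, a loglinear model can be fitted as a Poisson GLM on the cell counts with statsmodels (all data and names below are hypothetical):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical 2 x 2 x 2 table of cell counts, one row per cell
cells = pd.DataFrame({
    "ethnicity": ["black", "black", "white", "white"] * 2,
    "age":       ["under18"] * 4 + ["over18"] * 4,
    "reason":    ["drugs", "weapon"] * 4,
    "count":     [12, 8, 15, 9, 20, 14, 18, 11],
})

# Saturated model: all main effects plus two- and three-way interactions
saturated = smf.glm("count ~ ethnicity * age * reason", data=cells,
                    family=sm.families.Poisson()).fit()

# Drop the three-way interaction, keeping all two-way terms
reduced = smf.glm("count ~ (ethnicity + age + reason) ** 2", data=cells,
                  family=sm.families.Poisson()).fit()

# Likelihood-ratio test of the removed three-way term: a non-significant
# result means the simpler model fits about as well
lr = 2 * (saturated.llf - reduced.llf)
print(f"LR statistic for the 3-way interaction: {lr:.3f}")
```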

  20. Assumptions of loglinear analysis • Similar to those of chi-squared • observations should fall into one category alone • no more than 20% of cells should have expected frequencies less than 5 • all cells must have expected frequencies greater than 1 • if you don't meet this assumption you need to decide whether to proceed with the analysis or collapse the data across variables

  21. Output I • The number of cases should equal the total number of observations • Number of factors (variables) • Number of levels (subdivisions within each variable) • Saturated model: the maximum interaction possible with the observed frequencies • Goodness-of-fit and likelihood ratio statistics • these test whether the expected frequencies are significantly different from the observed • they should be non-significant if the model is a good fit

  22. Output II • The goodness-of-fit statistic is preferred for large samples • the likelihood ratio is preferred for small samples • "K-way and higher order effects" asks • if you remove the highest-order interaction, will the fit of the model be affected? • the next k-way effect asks whether removing the highest order followed by the next order will affect the fit of the model • and so on until all effects are removed

  23. Output III • "K-way effects are zero" asks the opposite • that is, whether removing the main effects will have an effect on the model • the final step is the backward elimination • the analysis keeps going until it has eliminated all effects it can, and reports the generating class of the best-fitting model

  24. Now let's try one
