Handling Categorical Data
Learning Outcomes • At the end of this session and with additional reading you will be able to: • Understand when and how to analyse frequency counts
Analysing categorical variables • Frequencies • The number of observations within a given category
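As a minimal illustration, frequency counts can be tabulated with pandas; the column name and values below are hypothetical, not from the case study:

```python
# A minimal sketch of frequency counts with pandas.
# The variable name "ethnicity" and its values are hypothetical.
import pandas as pd

data = pd.Series(["black", "white", "white", "black", "white"], name="ethnicity")
print(data.value_counts())  # white appears 3 times, black 2 times
```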
Assumptions of Chi squared • Each observation contributes to only one cell of the contingency table • The expected frequencies should be greater than 5
Chi Squared II • Pearson's chi squared • Assesses the difference between observed frequencies and expected frequencies in each cell • This is achieved by calculating the expected value for each cell • Model (expected frequency) = (RT × CT) / N, where RT is the row total, CT is the column total and N is the grand total
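A sketch of this calculation by hand with numpy; the observed counts are hypothetical:

```python
# Expected frequencies via (row total x column total) / grand total,
# then Pearson's chi-squared. The observed counts are hypothetical.
import numpy as np

observed = np.array([[30, 20],
                     [10, 40]])

row_totals = observed.sum(axis=1, keepdims=True)  # RT, shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # CT, shape (1, 2)
n = observed.sum()                                # N (grand total)

expected = row_totals * col_totals / n            # (RT x CT) / N
chi2 = ((observed - expected) ** 2 / expected).sum()
print(expected)  # [[20. 30.] [20. 30.]]
print(chi2)
```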
Chi Squared III • Likelihood ratio • compares the observed frequencies with those predicted by the model (the expected frequencies) • Yates' correction • with a 2 × 2 contingency table Pearson's chi squared can produce a Type I error, so subtract 0.5 from the absolute value of each deviation (observed minus expected) before squaring it • this makes the result less significant, i.e. more conservative
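Both statistics, with and without Yates' correction, are available through scipy's `chi2_contingency`; a sketch with a hypothetical 2 × 2 table:

```python
# Pearson's chi-squared (with and without Yates' continuity correction)
# and the likelihood ratio (G) statistic. The table is hypothetical.
from scipy.stats import chi2_contingency

table = [[30, 20],
         [10, 40]]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
chi2_yates, p_yates, _, _ = chi2_contingency(table, correction=True)
g, p_g, _, _ = chi2_contingency(table, correction=False, lambda_="log-likelihood")

print(f"Pearson: {chi2:.3f}, with Yates: {chi2_yates:.3f}, G: {g:.3f}")
```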
The contingency table I • Using my case study on stop and search, suppose we wanted to ascertain whether black males were stopped more often in one month than white males • One variable • ethnicity (black or white male) • What does this tell us?
One-way Chi Squared • In a simple one-way chi squared we would expect that if we had 148 people they would be evenly split between white and black males, so • the expected value for each group would be 74
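A sketch of this one-way test with scipy; the observed split of the 148 people is hypothetical:

```python
# One-way chi-squared: 148 people expected to split evenly (74 each).
# The observed counts (90 vs 58) are hypothetical.
from scipy.stats import chisquare

observed = [90, 58]                       # sums to 148
result = chisquare(observed, f_exp=[74, 74])
print(result.statistic, result.pvalue)
```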
The contingency table II • It would be more useful to look at an additional variable, let's say age • Two variables • Males • Black/white • Age • Under 18/over 18
Example • Now, using the formula, calculate the expected values for the contingency table • Model (expected frequency) = (RT × CT) / N
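For checking a hand calculation, scipy's `expected_freq` implements the same (RT × CT) / N formula; the counts below are hypothetical:

```python
# Expected frequencies for the two-variable (ethnicity x age) example.
# All cell counts are hypothetical.
import numpy as np
from scipy.stats.contingency import expected_freq

#                     under 18  over 18
observed = np.array([[40, 35],   # black males
                     [28, 45]])  # white males

print(expected_freq(observed))   # same result as (RT x CT) / N by hand
```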
Odds ratio • The ratio of the odds of an event occurring in one group to the odds of it occurring in another group • an odds ratio of 1 means the event is equally likely in both groups
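A minimal sketch with hypothetical counts:

```python
# Odds ratio for a hypothetical 2 x 2 table: the odds of the event in
# group 1 divided by the odds of the event in group 2.
a, b = 40, 35   # group 1: event / no event (hypothetical)
c, d = 28, 45   # group 2: event / no event (hypothetical)

odds_ratio = (a / b) / (c / d)   # equivalently (a * d) / (b * c)
print(odds_ratio)                # > 1: event more likely in group 1
```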
Loglinear analysis • Loglinear analysis works by backward elimination of a model • the saturated model is fitted first, then predictors are removed one at a time • just like an ANOVA, a loglinear analysis assesses the relationship between all variables and describes the outcomes in terms of interactions
Loglinear analysis II • In our previous example we had two variables • ethnicity and age • If we now add reason for stop and search, a loglinear analysis will first assess the 3-way interaction and then assess the various two-way interactions
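A loglinear model can be sketched in Python as a Poisson GLM on the cell counts (one common formulation, not necessarily the SPSS procedure these slides describe); the variable names and counts are hypothetical:

```python
# A saturated loglinear model fitted as a Poisson GLM with statsmodels.
# Variable names and cell counts are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

cells = pd.DataFrame({
    "ethnicity": ["black", "black", "white", "white"],
    "age":       ["under18", "over18", "under18", "over18"],
    "count":     [40, 35, 28, 45],
})

# Saturated model: both main effects plus the ethnicity x age interaction.
model = smf.glm("count ~ ethnicity * age", data=cells,
                family=sm.families.Poisson()).fit()
print(model.summary())
```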
Assumptions of loglinear analysis • Similar to those of chi squared • observations should fall into one category alone • no more than 20% of cells should have expected frequencies less than 5 • all cells must have expected frequencies greater than 1 • if you don't meet these assumptions you need to decide whether to proceed with the analysis or collapse the data across variables
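A sketch of checking these assumptions on a table of expected frequencies (the values are hypothetical):

```python
# Check the loglinear assumptions: all cells > 1, and no more than 20%
# of cells with expected frequency below 5. Values are hypothetical.
import numpy as np

expected = np.array([[34.5, 40.5],
                     [33.5, 39.5]])

assert (expected > 1).all(), "some cells have expected frequency <= 1"
pct_small = (expected < 5).mean() * 100
print(f"{pct_small:.0f}% of cells have expected frequency below 5")
```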
Output I • The number of cases should equal the total number of observations • Number of factors (variables) • Number of levels (sub-divisions within each variable) • Saturated model: the maximum interaction possible with the observed frequencies • Goodness-of-fit and likelihood ratio statistics • these test whether the expected frequencies are significantly different from the observed • they should be non-significant if the model is a good fit
Output II • Goodness of fit is preferred for large samples • The likelihood ratio is preferred for small samples • The K-way and higher order effects test asks • if you remove the highest-order interaction, will the fit of the model be affected? • the next k-way effect asks whether removing the highest order followed by the next order will affect the fit of the model • and so on until all effects are removed
Output III • The K-way effects are zero test asks the opposite • that is, whether removing the main effects will have an effect on the model • the final step is the backward elimination • the analysis keeps going until it has eliminated all non-significant effects and reports the generating class of the best-fitting model
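One backward-elimination step can be sketched as a likelihood ratio comparison of nested Poisson GLMs (a hand-rolled analogue of what the SPSS output reports, not the SPSS procedure itself); counts and variable names are hypothetical:

```python
# One backward-elimination step: compare the saturated model with the
# model that drops the highest-order interaction, via a likelihood
# ratio test. Variable names and counts are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2

cells = pd.DataFrame({
    "ethnicity": ["black", "black", "white", "white"],
    "age":       ["under18", "over18", "under18", "over18"],
    "count":     [40, 35, 28, 45],
})

saturated = smf.glm("count ~ ethnicity * age", data=cells,
                    family=sm.families.Poisson()).fit()
reduced = smf.glm("count ~ ethnicity + age", data=cells,
                  family=sm.families.Poisson()).fit()

lr = 2 * (saturated.llf - reduced.llf)           # likelihood ratio statistic
df = int(reduced.df_resid - saturated.df_resid)  # change in degrees of freedom
p = chi2.sf(lr, df)
print(f"LR = {lr:.3f}, df = {df}, p = {p:.3f}")  # non-significant: drop the term
```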