
Analysis of Categorical Data

Analysis of Categorical Data. Dr Siti Azrin Binti Ab Hamid, Unit Biostatistics and Research Methodology. Outline. Types of categorical analysis. Steps to analysis. Overview univariable analysis. Introduction.





Presentation Transcript


  1. Analysis of Categorical Data Dr Siti Azrin Binti Ab Hamid, Unit Biostatistics and Research Methodology

  2. Outline • Types of categorical analysis • Steps to analysis

  3. Overview univariable analysis

  4. Introduction • Categorical data analysis deals with discrete data that can be organized into categories. • The data are organized into a contingency table.

  5. Types of categorical data analysis

  6. Hypothesis testing

  7. Contingency table • Consists of two columns and two rows. • Cells are labeled A through D. • Columns and rows are added for labels. • Row: independent variable / exposure / risk factors • Column: dependent variable / outcome

  8. Example of contingency table

  9. Pearson Chi-square • To test the association between two categorical variables • Independent sample • Result of test: • Not significant: no association • Significant: an association

  10. Research Question • Is estrogen receptor associated with breast cancer status? • Data: Breast cancer.sav

  11. Step 1: State the hypothesis • H0: There is no association between estrogen receptor and breast cancer status. • HA: There is an association between estrogen receptor and breast cancer status.

  12. Step 2: Set the significance level • α = 0.05

  13. Step 3: Check the assumption • The two samples are independent • Both variables are categorical • Cells with an expected count < 5: - more than 20% of cells: Fisher's exact test - fewer than 20% of cells: Pearson Chi-square • Expected count = (Row total × Column total) / Grand total
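The expected-count rule can be checked mechanically. The following is a minimal sketch, not the course's own procedure: the function names `expected_counts` and `choose_test` are hypothetical, the two tables are made-up counts for illustration only, and a share of exactly 20% (which the slide does not cover) is assumed to fall on the Pearson side.

```python
def expected_counts(table):
    """Expected count per cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def choose_test(table, threshold=5, max_share=0.20):
    """Apply the 20% rule: too many small expected counts -> Fisher's exact test."""
    cells = [e for row in expected_counts(table) for e in row]
    share_small = sum(e < threshold for e in cells) / len(cells)
    # Assumption: a share of exactly 20% is treated as acceptable for Pearson.
    return "Fisher's exact test" if share_small > max_share else "Pearson Chi-square"


# Hypothetical counts, for illustration only (not from the course data sets).
large_sample = [[30, 20], [15, 35]]
small_sample = [[3, 2], [1, 6]]

print(expected_counts(large_sample))
print(choose_test(large_sample))
print(choose_test(small_sample))
```

In a 2 × 2 table the share of small cells can only be 0%, 25%, 50%, 75% or 100%, so a single cell with an expected count below 5 is already enough to switch to Fisher's exact test.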

  14. Step 3: Check the assumption

  15. Step 4: Statistical test • Calculate the Chi-square value: $\chi^2 = \sum \frac{(O - E)^2}{E} = 5.897$ • $df = (R-1)(C-1) = (2-1)(2-1) = 1$ • From the Chi-square table, the p value lies between 0.01 and 0.02
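The calculation on this slide can be sketched in a few lines. This is an illustrative sketch only: the observed counts behind 5.897 are not given in the transcript, so the table below is hypothetical, and the function names are assumptions. For 1 degree of freedom the upper-tail p value of a Chi-square statistic equals `erfc(sqrt(x / 2))`, which avoids a table lookup.

```python
import math


def pearson_chi_square(table):
    """Return (statistic, df) for an R x C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p value of a Chi-square statistic with df = 1."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts, for illustration only.
stat, df = pearson_chi_square([[30, 20], [15, 35]])
print(stat, df)

# The statistic reported on the slide: p falls between 0.01 and 0.02.
print(p_value_df1(5.897))
```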

  16. Step 4: Statistical test [Figure: screenshot with numbered steps 1 to 10]

  17. Step 5: Interpretation p value = 0.016 < 0.05 – reject H0, accept HA

  18. Step 6: Conclusion • There is a significant association between estrogen receptor and breast cancer status using Pearson Chi-square test (p = 0.016).

  19. Fisher’s Exact Test • To test the association between two categorical variables • Independent sample • Sample sizes are small

  20. Research Question • Is gender associated with coronary heart disease? • Data: CHD data.sav

  21. Step 1: State the hypothesis • H0: There is no association between gender and coronary heart disease. • HA: There is an association between gender and coronary heart disease.

  22. Step 2: Set the significance level • α = 0.05

  23. Step 3: Check the assumption • The two samples are independent • Both variables are categorical • Cells with an expected count < 5: - more than 20% of cells: Fisher's exact test - fewer than 20% of cells: Pearson Chi-square • Expected count = (Row total × Column total) / Grand total

  24. Step 3: Check the assumption • 2 cells (50%) have an expected count < 5, which is more than 20% of cells, so Fisher's exact test is used
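When too many cells have small expected counts, Fisher's exact test replaces the Chi-square approximation with an exact hypergeometric calculation. The sketch below shows one common two-sided definition (sum the probabilities of all tables, with the same margins, that are no more likely than the observed one). The function name is hypothetical, the counts are made up because the CHD table is not reproduced in the transcript, and statistical software may use a slightly different two-sided rule.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts, for illustration only (not the CHD data).
print(fisher_exact_two_sided(3, 2, 1, 6))
```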

  25. Step 4: Statistical test • Calculate the Chi-square value: $\chi^2 = \sum \frac{(O - E)^2}{E} = 3.0968$ • $df = (R-1)(C-1) = (2-1)(2-1) = 1$ • From the Chi-square table, the p value lies between 0.05 and 0.1

  26. Step 4: Statistical test [Figure: screenshot with numbered steps 1 to 10]

  27. Step 5: Interpretation p value = 0.140 > 0.05 – fail to reject H0

  28. Step 6: Conclusion • There is no significant association between gender and coronary heart disease using Fisher’s Exact test (p = 0.140).

  29. McNemar Test • Categorical data • Dependent sample - Matched sample - Cross over design - Before & after (same subject) • To determine whether the row and column marginal frequencies are equal (marginal homogeneity)

  30. Hypotheses • The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same • $H_0: P_B = P_C$ • $H_A: P_B \neq P_C$ • Cells A and D = concordant pairs • Cells B and C = discordant pairs • A discordant pair is a pair with different outcomes

  31. Research Question • Is type of mastectomy associated with 5-year survival proportion in patients with breast cancer? • The sample consisted of breast cancer patients - matched for age (same decade of age) - same clinical condition • Data: breast ca.sav

  32. Step 1: State the hypothesis • H0: There is no association between type of mastectomy and 5-year survival proportion in patients with breast cancer. • HA: There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer.

  33. Step 2: Set the significance level • α = 0.05

  34. Step 3: Check the assumption • The two samples are dependent (matched pairs) • Both variables are categorical

  35. Step 4: Statistical test • $\chi^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|0 - 8| - 1)^2}{0 + 8} = 6.125$ • $df = (R-1)(C-1) = (2-1)(2-1) = 1$ • Calculated $\chi^2$ > tabulated $\chi^2$ • * $\chi^2 = \frac{(|b - c| - 0.5)^2}{b + c}$
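The McNemar statistic depends only on the discordant pairs, so it is easy to reproduce with b = 0 and c = 8 from the slide. The sketch below uses hypothetical function names. The exact binomial p value is an addition that the slide does not show; it is included because, with only 8 discordant pairs, it is a common alternative to the Chi-square approximation, and its result rounds to the p = 0.008 reported on the interpretation slide (the transcript does not say how that p value was obtained).

```python
from math import comb


def mcnemar_chi_square(b, c, correction=1.0):
    """Continuity-corrected McNemar statistic from the discordant counts b and c."""
    return (abs(b - c) - correction) ** 2 / (b + c)


def mcnemar_exact_p(b, c):
    """Two-sided exact p value: binomial test of b against c with probability 0.5."""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(mcnemar_chi_square(0, 8))                  # 6.125, as on the slide
print(mcnemar_chi_square(0, 8, correction=0.5))  # 7.03125, the starred variant
print(mcnemar_exact_p(0, 8))                     # 0.0078125
```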

  36. Step 4: Statistical test [Figure: screenshot with numbered steps 1 to 9]

  37. Step 5: Interpretation p value = 0.008 < 0.05 – reject H0, accept HA

  38. Step 6: Conclusion • There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer using McNemar test (p = 0.008).

  39. Cochran Mantel-Haenszel Test • A method to compare the probability of an event among independent groups in stratified samples. • The stratification factor can be study center, gender, race, age group, obesity status or disease severity. • Gives a stratified statistical analysis of the relationship between exposure and disease, after controlling for a confounder (the strata variable). • The data are arranged in a series of associated 2 × 2 contingency tables.

  40. Research Question • Is the type of treatment associated with response to treatment among migraine patients after controlling for gender? • Confounder: gender

  41. Step 1: 2x2 contingency table

  42. Step 2: Check the assumption • Random sampling • Stratified sampling

  43. Step 3: State the hypothesis • H0: There is no association between type of treatment and response to treatment among female and male migraine patients. • HA: There is an association between type of treatment and response to treatment among female and male migraine patients.

  44. Step 4: Statistical test • Compute the expected frequency for each stratum: $e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$ • Compute the variance for each stratum: $v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$ • Compute the Mantel-Haenszel statistic: $\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i}$

  45. Step 4: Statistical test • Compute the expected frequency for each stratum: $e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$ • $e_1 = \frac{(16 + 11)(16 + 5)}{52} = 10.9038$ • $e_2 = \frac{(12 + 16)(12 + 7)}{54} = 9.8519$

  46. Step 4: Statistical test • Compute the variance for each stratum: $v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$ • $v_1 = \frac{(16 + 11)(5 + 20)(16 + 5)(11 + 20)}{52^2 (52 - 1)} = 3.1865$ • $v_2 = \frac{(12 + 16)(7 + 19)(12 + 7)(16 + 19)}{54^2 (54 - 1)} = 3.1325$

  47. Step 4: Statistical test • Compute the Mantel-Haenszel statistic: $\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i} = \frac{((16 + 12) - (10.9038 + 9.8519))^2}{3.1865 + 3.1325} = 8.3051 \approx 8.31$

  48. Step 4: Statistical test • Compute the odds ratio: $OR_{MH} = \frac{\sum (a_i d_i / n_i)}{\sum (b_i c_i / n_i)} = \frac{(16 \times 20 / 52) + (12 \times 19 / 54)}{(11 \times 5 / 52) + (16 \times 7 / 54)} = 3.313$
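The hand calculation for the two strata can be reproduced with a short script. This is a sketch under stated assumptions: the function name `mantel_haenszel` is hypothetical, each stratum is passed as a tuple (a, b, c, d) with the cell values read off the worked formulas on the slides, and, as on the slides, no continuity correction is applied. The transcript does not say which stratum is female and which is male, so the strata are left unlabeled.

```python
def mantel_haenszel(strata):
    """Return (chi-square statistic, pooled odds ratio) for 2 x 2 strata.

    Each stratum is a tuple (a, b, c, d) of cell counts.
    """
    sum_a = sum_e = sum_v = 0.0
    or_numerator = or_denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        # Expected count and variance of cell a in this stratum.
        sum_e += (a + b) * (a + c) / n
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
        or_numerator += a * d / n
        or_denominator += b * c / n
    chi_square = (sum_a - sum_e) ** 2 / sum_v
    return chi_square, or_numerator / or_denominator


# Cell counts taken from the worked formulas (n1 = 52, n2 = 54).
strata = [(16, 11, 5, 20), (12, 16, 7, 19)]
chi_square, odds_ratio = mantel_haenszel(strata)
print(round(chi_square, 2), round(odds_ratio, 3))  # 8.31 3.313
```

With 1 degree of freedom and α = 0.05, the tabulated Chi-square value is 3.841, so a statistic of 8.31 leads to rejecting the null hypothesis.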

  49. Step 4: Statistical test • Data: Migraine.sav [Figure: screenshot with numbered steps 1 to 6]

  50. Step 5: Interpretation • Mantel-Haenszel statistic: $\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i} = \frac{((16 + 12) - (10.9038 + 9.8519))^2}{3.1865 + 3.1325} = 8.3051 \approx 8.31$ • Calculated value > tabulated value • Reject H0
