Introduction to Categorical Descriptive Statistics
Overview • Contingency tables • Notation • Descriptive statistics • Difference in proportions • Relative risk • Odds ratio • SPSS
Contingency Tables • Two dimensional tables • Let X and Y be categorical variables • X has I levels and Y has J levels • A contingency or cross-classification table is a tabular representation of the frequency counts for each pair of variable levels
Notation • Cell Counts
Cell Proportions • Often less interested in absolute counts within cells than in the relationship between the cell proportions • To analyze cell proportions properly, need to know the experimental design and the relationship between the variables • All variables can be considered response variables • One (or more) response variable and one (or more) explanatory variable • Prospective study • Retrospective study
Two Variables • Proportion notation • {pij} gives the joint distribution • {pi+} and {p+j} represent the marginals • {pj|i} is the conditional distribution of Y given level i of X
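As a minimal sketch of the joint, marginal, and conditional distributions described above, the Python below computes all three from a 2x2 table of counts. The counts are the ones that appear in this deck's odds example (31 and 1597, 65 and 4475); the variable names and the generic "event / no event" column labels are illustrative assumptions.

```python
# Rows: age at first child (25 or older, before 25).
# Columns: event, no event (labels are an assumption for illustration).
counts = [[31, 1597],
          [65, 4475]]

n = sum(sum(row) for row in counts)  # total sample size

# Joint distribution: each cell count divided by the grand total.
joint = [[c / n for c in row] for row in counts]

# Marginal distributions: row sums and column sums of the joint distribution.
row_marginal = [sum(row) for row in joint]
col_marginal = [sum(joint[i][j] for i in range(len(joint)))
                for j in range(len(joint[0]))]

# Conditional distribution of the column variable given each row level:
# joint proportion divided by the row marginal.
conditional = [[joint[i][j] / row_marginal[i] for j in range(len(joint[0]))]
               for i in range(len(joint))]

for i, row in enumerate(conditional):
    print(f"row {i + 1}: " + ", ".join(f"{p:.4f}" for p in row))
```

The joint distribution sums to 1 over the whole table, while each row of `conditional` sums to 1 on its own.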
Example • 1982 General Social Survey report on attitudes about death penalty and gun registration • Calculate joint, marginal and conditional distributions
Prospective Study • Subjects either select or are selected for treatment groups and then response is studied • Experimental • Subjects are randomly allocated to treatment groups • Observational • Subjects self-select treatment group • Principal aim is to compare conditional distribution of response for different levels of explanatory variable(s)
Example • Findings from the Aspirin Component of the Ongoing Physicians’ Health Study • Calculate conditional distribution
Retrospective Study • Given response, look back at levels of possible explanatory variables • Observational studies • Typically “over-sample” for response level of interest • If the overall population proportion in each response level is known, Bayes' theorem can be used to calculate the conditional distribution in the direction of interest
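The Bayes' theorem step mentioned above can be sketched as follows. All numbers here are hypothetical and chosen only to show the arithmetic; they are not from any study in this deck, and the function name `reverse_conditional` is an assumption.

```python
def reverse_conditional(p_x_given_y, p_y):
    """Bayes' theorem for one fixed explanatory level x.

    p_x_given_y[j] = P(X = x | Y = j), as estimated in a retrospective study.
    p_y[j]         = P(Y = j), the known population proportion of response j.
    Returns the list of P(Y = j | X = x).
    """
    joint = [pxy * py for pxy, py in zip(p_x_given_y, p_y)]
    total = sum(joint)  # P(X = x)
    return [j / total for j in joint]

# Hypothetical inputs: exposure rate among cases and among controls,
# and the population share of cases and non-cases.
p_exposed_given_response = [0.40, 0.20]
p_response = [0.01, 0.99]

p_response_given_exposed = reverse_conditional(p_exposed_given_response, p_response)
print([round(p, 4) for p in p_response_given_exposed])
```

Without the population proportions in `p_response`, the retrospective sample alone does not determine these reversed probabilities.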
Example • England-Wales 1968-1972 study on heart attacks and oral contraceptive use • Calculate appropriate conditional distribution
Descriptive Statistics • Comparing proportions for binary responses • Difference of proportions • Relative risk • Odds ratios • Independence • X and Y response: pij = pi+p+j, for all i,j • That is, pj|i = p+j, for all i,j • X explanatory, Y response: pj|i = pj|h, for each j, for all i,h
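One way to examine the independence condition descriptively is to measure how far each observed joint proportion is from the product of its marginals. The sketch below does that; the helper name `max_independence_gap` and the first example table are assumptions made for illustration, while the second table uses the counts from this deck's odds example.

```python
def max_independence_gap(counts):
    """Largest absolute difference between a joint proportion and the
    product of its row and column marginal proportions."""
    n = sum(sum(row) for row in counts)
    n_rows, n_cols = len(counts), len(counts[0])
    row = [sum(counts[i]) / n for i in range(n_rows)]
    col = [sum(counts[i][j] for i in range(n_rows)) / n for j in range(n_cols)]
    return max(abs(counts[i][j] / n - row[i] * col[j])
               for i in range(n_rows) for j in range(n_cols))

# Hypothetical table whose rows are proportional, so the gap is zero
# up to floating-point error.
independent_table = [[10, 20], [30, 60]]

# Counts from the deck's odds example; the gap is small but not zero.
deck_table = [[31, 1597], [65, 4475]]

print(max_independence_gap(independent_table))
print(max_independence_gap(deck_table))
```

This is a descriptive summary of sample proportions only; it is not a significance test.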
Descriptive Statistics • I x J tables • No completely satisfactory way to summarize association • Pairs of odds ratios • Concentration coefficient • Uncertainty coefficient
Difference of Proportions • Binary response variable • Generally, compare response for different explanatory levels • p1|i - p1|h • Difference lies between -1 and 1 • Independence when difference equals 0 for all i,h and response levels j • Reasonable measure when absolute difference in proportions is relevant • Also can compare differences between columns
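A minimal sketch of the difference of proportions, assuming the counts from this deck's odds example (31 events among 31 + 1597 subjects in one row, 65 among 65 + 4475 in the other). The function name is illustrative.

```python
def difference_of_proportions(events_1, total_1, events_2, total_2):
    """Proportion with the response in group 1 minus the same proportion
    in group 2. The result always lies between -1 and 1."""
    return events_1 / total_1 - events_2 / total_2

diff = difference_of_proportions(31, 31 + 1597, 65, 65 + 4475)
print(f"{diff:.4f}")  # 0.0047
```

A difference this close to 0 looks negligible in absolute terms, which is why the relative measures are often reported alongside it when the proportions themselves are small.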
Difference of Proportions • Example
Difference of Proportions • Example
Difference of Proportions • Example
Relative Risk • Used when relative difference between proportions is more relevant than absolute difference • p1|1 / p1|2 • Relative risk of 1 corresponds to independence • Relative risk computed on the second response level, p2|1 / p2|2, generally gives a different value • Usually cannot be calculated directly from retrospective studies
Example • Risk for women having first child at 25 or older = .019 or 1.9% • Risk for women having first child before 25 = .0143 or 1.43% • Relative risk = .019/.0143 = 1.33 • Increased risk = 33%
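The arithmetic in this example can be reproduced from counts. The sketch below assumes the group sizes implied by the deck's odds example (31 of 1628 and 65 of 4540); the function name is illustrative.

```python
def relative_risk(events_1, total_1, events_2, total_2):
    """Risk in group 1 divided by risk in group 2."""
    return (events_1 / total_1) / (events_2 / total_2)

risk_older = 31 / (31 + 1597)    # first child at 25 or older
risk_younger = 65 / (65 + 4475)  # first child before 25

rr = relative_risk(31, 31 + 1597, 65, 65 + 4475)
increased_risk_pct = (rr - 1) * 100

print(f"risks: {risk_older:.4f}, {risk_younger:.4f}")
print(f"relative risk: {rr:.2f}, increased risk: {increased_risk_pct:.0f}%")
```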
Relative Risk • Example
Odds Ratio • For a 2x2 table • In row 1, odds of being in column 1 instead of column 2: O1 = p1|1 / p2|1 • In row 2, odds of being in column 1 instead of column 2: O2 = p1|2 / p2|2 • Odds ratio: O1/O2
Odds Ratio • Takes values > 0 • Sometimes look at log odds ratio • Invariant to interchanging rows and columns • Unnecessary to specify response variable • Unlike difference of proportions, and relative risk
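The invariance to interchanging rows and columns can be checked numerically. This sketch assumes the counts from the deck's odds example and uses the cross-product form of the odds ratio for a 2x2 table; the helper names are illustrative.

```python
import math

def odds_ratio(table):
    """Cross-product ratio (n11 * n22) / (n12 * n21) of a 2x2 table."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

def transpose(table):
    """Swap the roles of the row variable and the column variable."""
    return [list(col) for col in zip(*table)]

table = [[31, 1597],
         [65, 4475]]

or_original = odds_ratio(table)
or_transposed = odds_ratio(transpose(table))

# Reversing the order of the two rows inverts the odds ratio,
# so the log odds ratio only changes sign.
or_rows_reversed = odds_ratio([table[1], table[0]])

print(or_original, or_transposed)
print(math.log(or_original), math.log(or_rows_reversed))
```

The sign symmetry of the log odds ratio is one reason it is sometimes preferred to the odds ratio itself.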
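Odds Ratio • Multiplicative invariance: unchanged when the counts within a given row or column are multiplied by a nonzero constant • Difference of proportions and relative risk share this invariance within rows but not within columns • Equally valid for retrospective, prospective and cross-sectional studies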
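Multiplicative invariance can be illustrated by scaling one row or one column of a table and recomputing the odds ratio. The sketch below assumes the counts from the deck's odds example; the scaling factor of 10 and the helper names are arbitrary choices for illustration.

```python
import math

def odds_ratio(table):
    """Cross-product ratio (n11 * n22) / (n12 * n21) of a 2x2 table."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

def scale_row(table, i, k):
    """Multiply every count in row i by k."""
    return [[k * x for x in row] if r == i else list(row)
            for r, row in enumerate(table)]

def scale_col(table, j, k):
    """Multiply every count in column j by k."""
    return [[k * x if c == j else x for c, x in enumerate(row)]
            for row in table]

table = [[31, 1597],
         [65, 4475]]

base = odds_ratio(table)
row_scaled = odds_ratio(scale_row(table, 0, 10))
col_scaled = odds_ratio(scale_col(table, 0, 10))

print(base, row_scaled, col_scaled)
```

Scaling a column mimics over-sampling one response level, as a retrospective design does, which is why the odds ratio remains usable across study designs.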
Example • Odds for women having first child at 25 or older = 31/1597 = .019/.981 = .0194 • Odds for women having first child before 25 = 65/4475 = .0143/.9857 = .0145 • Odds ratio = .0194/.0145 = 1.34
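The two routes to the odds used in this example (counts, or risk divided by one minus risk) can be sketched as below. The counts come from the example; the function name is illustrative.

```python
def odds_from_risk(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1 - p)

# Odds directly from counts: events divided by non-events.
odds_older = 31 / 1597    # first child at 25 or older
odds_younger = 65 / 4475  # first child before 25

# The same odds computed from the risks.
odds_older_from_risk = odds_from_risk(31 / (31 + 1597))
odds_younger_from_risk = odds_from_risk(65 / (65 + 4475))

odds_ratio = odds_older / odds_younger

print(f"odds: {odds_older:.4f}, {odds_younger:.4f}")
print(f"odds ratio: {odds_ratio:.2f}")
```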
Relationship Between Odds Ratio and Relative Risk • Odds ratio = Relative risk × (1 - p1|2)/(1 - p1|1) • When the probability of the outcome of interest is small, regardless of row condition, the odds ratio can be used as an estimate of relative risk
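The identity follows from writing each odds as risk / (1 - risk): the ratio of the two odds factors into the ratio of the risks times (1 - p1|2)/(1 - p1|1). The sketch below checks this numerically, assuming the counts from the deck's first-child example; variable names are illustrative.

```python
import math

p1 = 31 / (31 + 1597)  # risk in row 1
p2 = 65 / (65 + 4475)  # risk in row 2

relative_risk = p1 / p2
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))

# Odds ratio rebuilt from the relative risk and the correction factor.
odds_ratio_from_rr = relative_risk * (1 - p2) / (1 - p1)

print(f"relative risk:      {relative_risk:.4f}")
print(f"odds ratio:         {odds_ratio:.4f}")
print(f"correction factor:  {(1 - p2) / (1 - p1):.4f}")
```

Because both risks are under 2%, the correction factor is close to 1 and the odds ratio and relative risk nearly coincide.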
Interpreting Risks and Odds • Assess baseline risk • Example: Men who drink 16 ounces of beer a day are three times more likely to develop rectal cancer • Know time period of risk • Risks accumulate with time • Example: 1 in 9 women will develop breast cancer over their lifetime. But annual risk for women in their 30’s is 1 in 3700 and for women in their 70’s is 1 in 235 • Investigate confounding factors • Example: Older cars are almost 6 times as likely to be stolen as newer cars
Simpson’s Paradox • Survival rates for a standard and a new treatment at two hospitals
Relative Risk • Hospital A: • Risk of dying with standard treatment = 95/100 = .95 • Risk of dying with new treatment = 900/1000 = .90 • Relative risk = .95/.90 = 1.06
Relative Risk • Hospital B: • Risk of dying with standard treatment = 500/1000 = .5 • Risk of dying with new treatment = 5/100 = .05 • Relative risk = .5/.05 = 10.0
Combined Data • Group data from both hospitals • Risk of dying with standard treatment = 595/1100 = .54 • Risk of dying with new treatment = 905/1100 = .82 • Relative risk = .54/.82 = .66
What is Going On? • When the data are combined, we lose the information that the patients in Hospital A had BOTH a higher overall death rate AND a higher likelihood of receiving the new treatment • Summarizing information over groups can be misleading, especially if subjects were not randomly assigned to groups
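The reversal can be reproduced directly from the counts on the preceding slides. In this sketch the dictionary layout and function names are illustrative assumptions; the numbers are the deaths and group sizes given for Hospitals A and B.

```python
# (deaths, patients) for each treatment at each hospital.
data = {
    "A": {"standard": (95, 100), "new": (900, 1000)},
    "B": {"standard": (500, 1000), "new": (5, 100)},
}

def risk(deaths, patients):
    return deaths / patients

def relative_risk(groups):
    """Risk of dying with standard treatment relative to new treatment."""
    return risk(*groups["standard"]) / risk(*groups["new"])

rr_a = relative_risk(data["A"])
rr_b = relative_risk(data["B"])

# Pool the two hospitals by adding deaths and patients per treatment.
combined = {
    t: (sum(data[h][t][0] for h in data), sum(data[h][t][1] for h in data))
    for t in ("standard", "new")
}
rr_combined = relative_risk(combined)

# Share of each hospital's patients who received the new treatment.
share_new = {h: data[h]["new"][1] / (data[h]["standard"][1] + data[h]["new"][1])
             for h in data}

print(f"Hospital A: {rr_a:.2f}, Hospital B: {rr_b:.2f}, combined: {rr_combined:.2f}")
print(share_new)
```

Within each hospital the standard treatment carries the higher risk (relative risk above 1), yet the pooled relative risk falls below 1 because most new-treatment patients were at Hospital A, where death rates were high under both treatments.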
Confounding Variables • Television ownership versus movie attendance
Confounding Variables • Control for income
More Examples • Discrimination in college admission • Racial bias in death penalty sentences
College Admission Bias • Over a given number of years the University of California, Berkeley admitted 44% of all men who applied to any one of six graduate programs and only 30% of women who applied • Is there evidence of discrimination in graduate admissions at Berkeley?
Death Penalty Sentences • Results of a 1981 Florida study of whether the race of a homicide defendant affects the likelihood that the defendant would receive the death penalty
Question • Based on data, is race a factor in whether the death penalty is received and if so how is race a factor?