Chapter 5: Two-Way Tables. Associations Between Categorical Variables
Associations between variables • Quantitative variables → correlation [Ch 3] & regression [Ch 4] • Categorical variables → two-way tables of frequency counts [Ch 5]
Two-Way Table of Counts (R-by-C tables) • EDUCATION variable = row variable (4 levels) • AGE variable = column variable (3 levels) • This is a 4-by-3 table
Marginal Distributions • Row variable (EDUCATION) marginal totals: 27,858; 58,077; 44,465; 44,828 • Column variable (AGE) marginal totals: 37,786; 81,435; 56,008
Marginal Percents • Relative frequencies (%s) for each variable separately • Descriptive purposes only; they do not address association • Illustrative Example (distribution of education level) • State: Describe the distribution of education levels in the population • Plan: Calculate marginal percents for the row variable “EDUCATION”
Marginal Percents (Example) Step 3: “Solve” • Marginal percent for each education level = row total ÷ table total × 100%
Marginal Percents (Example) Step 4: “Conclude” • 16% did not complete high school • 33% completed high school • 25% completed 1 to 3 years of college • 26% completed 4+ years of college • These are merely descriptive statements
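The “Solve” and “Conclude” steps above can be sketched in a few lines of Python. This is only an illustration: the row totals are the EDUCATION marginal totals shown on the Marginal Distributions slide, the table total is assumed here to be their sum, and the name `marginal_percents` is hypothetical.

```python
def marginal_percents(totals):
    """Return each marginal total as a percent of the table total."""
    # Assumption: the table total is taken as the sum of the marginal totals.
    table_total = sum(totals.values())
    return {level: 100 * count / table_total for level, count in totals.items()}


# Row (EDUCATION) marginal totals from the slides, in the order listed there.
education_totals = {
    "Did not complete high school": 27_858,
    "Completed high school": 58_077,
    "1 to 3 years of college": 44_465,
    "4+ years of college": 44_828,
}

education_percents = marginal_percents(education_totals)
for level, pct in education_percents.items():
    # Rounded to whole percents, as on the "Conclude" slide.
    print(f"{level}: {pct:.0f}%")
```

Rounded to whole percents, the results agree with the 16%, 33%, 25%, and 26% reported in Step 4.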
Association • Use conditional proportions to determine associations • If the row variable is the explanatory variable → compare conditional row proportions • If the column variable is the explanatory variable → compare conditional column proportions
Example: Association between AGE & EDUCATION • State: Is AGE associated with EDUCATION level? • Plan: Since AGE is the explanatory variable, calculate conditional column proportions. We do not need to calculate every conditional proportion. (Be selective.) Let us calculate the proportion completing 4+ years of college by AGE.
Example: “Solve” & “Conclude” • Conclude: As age goes up, the % completing college goes down • Negative association between age and college completion
Direction of association • No association: conditional percents are nearly equal at all levels of the explanatory variable • Positive association: as the explanatory variable rises, conditional percents increase • Negative association: as the explanatory variable rises, conditional percents decrease
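A rough sketch of how these rules could be applied in Python follows. The cell counts and age-group labels below are HYPOTHETICAL (they are not the data from the slides), the function names are illustrative, and the `tolerance` used to decide what counts as “nearly equal” is an arbitrary assumption.

```python
def conditional_column_percents(counts, row):
    """Percent of each column (explanatory level) that falls in `row`."""
    result = {}
    for col in counts[row]:
        column_total = sum(counts[r][col] for r in counts)
        result[col] = 100 * counts[row][col] / column_total
    return result


def direction(percents, tolerance=1.0):
    """Classify the trend across ordered levels of the explanatory variable."""
    values = list(percents.values())
    # Assumption: "nearly equal" means a spread of at most `tolerance` points.
    if max(values) - min(values) <= tolerance:
        return "no association"
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(d > 0 for d in diffs):
        return "positive"
    if all(d < 0 for d in diffs):
        return "negative"
    return "no consistent direction"


# HYPOTHETICAL counts: rows = response levels, columns = ordered age groups.
counts = {
    "4+ years of college": {"youngest": 30, "middle": 25, "oldest": 15},
    "Less than 4 years": {"youngest": 70, "middle": 75, "oldest": 85},
}

college_by_age = conditional_column_percents(counts, "4+ years of college")
print(college_by_age)
print(direction(college_by_age))
```

Only one row of conditional percents is computed, in keeping with the “be selective” advice; the columns must be listed in the natural order of the explanatory variable for the direction to be meaningful.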
Example: Gender bias? • State: Is ACCEPTANCE into UC Berkeley graduate school (response variable) associated with GENDER (explanatory variable)? • Plan: Since GENDER is the explanatory variable, calculate row percents (acceptance “rates” by gender) and compare % accepted by GENDER
Example: “Gender bias?” Step 3: Solve • Conclude: Positive association between “maleness” and acceptance
Simpson’s Paradox • Simpson’s Paradox ≡ a lurking variable reverses the direction of the association • Lurking variable: MAJOR applied to • Business school major (240 applicants) • Art school major (320 applicants) • State: Does the lurking variable explain the association between maleness and acceptance? • Plan: Subdivide (“stratify”) the data into subgroups according to the lurking variable MAJOR, then calculate acceptance rates by gender within subgroups
Business School Applicants • Conclude: Negative association with maleness
Art School Applicants • Conclude: Negative association with maleness
Gender Bias Example Conclusion • Overall: higher acceptance rate for men • Within Business school: higher acceptance rate for women • Within Art school: higher acceptance rate for women • Therefore, the lurking variable (MAJOR) reversed the direction of the association (Simpson’s Paradox) • Acceptance to grad school at UC Berkeley favored women after “controlling for” MAJOR
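The reversal summarized above can be reproduced with a small Python sketch. The counts below are HYPOTHETICAL: they were chosen only to match the applicant totals given in the slides (240 Business, 320 Art) and the direction of each conclusion, and they are not the actual admissions figures. The helper names are illustrative.

```python
# HYPOTHETICAL (accepted, applicants) counts per major and gender.
applicants = {
    "Business": {"Men": (160, 200), "Women": (36, 40)},
    "Art": {"Men": (16, 80), "Women": (72, 240)},
}


def rate(accepted, total):
    """Acceptance rate as a percent."""
    return 100 * accepted / total


def overall_rate(data, gender):
    """Acceptance rate for one gender, pooling all majors together."""
    accepted = sum(data[major][gender][0] for major in data)
    total = sum(data[major][gender][1] for major in data)
    return rate(accepted, total)


# Pooled comparison (ignores the lurking variable MAJOR).
overall = {g: overall_rate(applicants, g) for g in ("Men", "Women")}

# Stratified comparison (acceptance rates by gender within each major).
within = {
    major: {g: rate(*groups[g]) for g in groups}
    for major, groups in applicants.items()
}

print("Overall:", {g: round(p, 1) for g, p in overall.items()})
for major, rates in within.items():
    print(major, {g: round(p, 1) for g, p in rates.items()})
```

With these made-up counts, men have the higher pooled rate while women have the higher rate within each major. The reversal arises because most men applied to the major with the high acceptance rate and most women applied to the major with the low one.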
HIV vaccine boost (Exercise 5.6) • State: Do the data support that vaccine delivered by EP results in a higher proportion responding? • Plan = ? • Solution = ? • Conclusion = ?
Kidney Stones (Exercise 5.7) • (a) Find the % of kidney stones, combining the data for small and large stones, that were successfully removed for each of the two procedures. Which procedure had the higher overall success rate? • (b) What % of all small kidney stones were successfully removed? What % of all large kidney stones…? Which type of kidney stone is easier to treat?
Helicopter Evacuation: Lurking Variable / Simpson’s Paradox • X = Helicopter or Road • Y = Survived or Died • Z = Accident Severity