250 likes | 499 Views
Cross-Tabulation Tables. Tables in R and Computing Chi Square. Kinds of Data. Nominal or Ordinal (few categories) Interval if it is grouped Some tests ignore the ordering of the categories (e.g. Chi square ) In R this means we are working with factors. Kinds of Tables.
E N D
Cross-Tabulation Tables Tables in R and Computing Chi Square
Kinds of Data • Nominal or Ordinal (few categories) • Interval if it is grouped • Some tests ignore the ordering of the categories (e.g. Chi square) • In R this means we are working with factors
Kinds of Tables • One line per observation, e.g. data on Ernest Witte where each row is a single individual - table() and Rcmdr() • One line per cell with a column of numbers representing the count for that cell – xtabs()
Kinds of Tables • A row for each category of the first variable and a column for each category of the second variable with counts at the intersection of a row and column – Rcmdr (Enter table directly)
Type 1 > EWG2[sample(rownames(EWG2), 6),c("Age", "Goods")] Age Goods 159 Middle Adult Absent 126 Child Present 075 Child Absent 156 Old Adult Present 095 Adult Absent 157 Old Adult Absent
Type 2 Age Goods Freq Child Absent 18 Adult Absent 51 Child Present 19 Adult Present 55
Type 3 Absent Present Child 18 19 Adult 51 55
Factors in R • Factors use integers to code for categorical data • Each integer code is associated with a label, e.g. 1 could stand for “Absent” and 2 for “Present” • Usually R creates factors from any character data columns
Factors • Regular factors are either equal or not equal (nominal) • Ordered factors can be >, ==, and < • Rcmdr makes is easy to convert a numeric variable to a factor, to change the factor labels, to change the order of the factor levels, and to make the factor ordered
Tables in R • Tables are basically matrices with labeling • Transferring between data.frames and tables is possible but can lead to unexpected results • Rcmdr does not recognize tables.
Key table commands in R • table() – create one and multi-way tables • xtabs() - uses formulas (and optionally weights/counts) • addmargins() – add row and column totals • prop.table() – create table of proportions
Key commands (cont.) • ftable() – flatten a multidimensional table – but does not work with xtable() • print(xtable(), type=“html”) – print an html version of the table.
# Use Rcmdr to load ErnestWitte and create EWG2 # EWG2 <- subset(ErnestWitte, subset=Group==2) table(EWG2$Age) EWG2$Age <- factor(EWG2$Age) Table1 <- table(EWG2$Age, EWG2$Goods, dnn=c("Age", "Goods")) Table1 str(Table1) Table2 <- xtabs(~Age+Goods, data=EWG2) Table2 str(Table2) DF1 <- data.frame(Table1) DF1 names(DF1) <- c("Age", "Goods","Freq") DF
Table3 <- xtabs(Freq~Age+Goods, data=DF1) Table3 addmargins(Table1) prop.table(Table1) prop.table(Table1, 1) prop.table(addmargins(Table1, 1), 1) # Included in Rcmdr rowPercents(Table1) colPercents(Table1)
Table4 <- xtabs(~Adult+Goods+Pathology, data=EWG2) Table4 str(Table4) ftable(Table4, row.vars=c(1, 2), col.vars=3) ftable(Table4, row.vars=c(3, 2), col.vars=1) # tohtml() puts html code for table into Windows # clipboard or a file # named “clipboard” in Mac OsX or Linux tohtml <- function(x) print(xtable(x), type="html", file="clipboard") tohtml(Table1) # Paste clipboard into Microsoft Excel
Null Hypothesis • The usual null hypothesis is that the row and column variables are independent of one another – knowing one does not help us predict the other • If the null hypothesis is false, the cell values will deviate from expected values
E.g. Coin Flipping • If I flip a coin twice, the chance that the first flip comes up heads is .5 • The chance that the second flip comes up heads is .5 as well • But what if the chance of getting a head changed depending on the first toss? The probabilities would be conditional
Expected Probabilities • Under the null hypothesis the expected value for a cell is • (Row sum * Column sum)/Total count • Deviations of the actual counts from the expected values is measured as • (Observed – Expected)2/Expected • Summing the deviations over all cells gives us a statistic with a chi-square distribution
Chi-Square Test • Compares observed counts to expected counts based on independence • Rcmdr constructs the tables and computes the test, BUT deletes the results
Two Options • chisq.test() • Saves results in multiple tables • Performs Chi Square and simulation for p value • CrossTable() and crosstab() in descr • SAS, SPSS style output with xtable() • More formatting options • Mosaic plot with crosstab()
Results <- chisq.test(xtabs(~Age+Pathology, data=EWG2), simulate.p.value=TRUE) Pearson's Chi-squared test with simulated p-value (based on 2000 replicates) data: xtabs(~Age + Pathology, data = EWG2) X-squared = 31.2876, df = NA, p-value = 0.0004998 str(Results) Results$expected Results$residuals fisher.test(xtabs(~Sex+Goods, data=EWG2))
with(EWG2, CrossTable(Age, Pathology)) with(EWG2, CrossTable(Age, Pathology, prop.c=FALSE, prop.t=FALSE)) with(EWG2, crosstab(Age, Pathology)) with(EWG2, crosstab(Age, Pathology, expected=TRUE, resid=TRUE)) with(EWG2, crosstab(Sex, Goods))