Cross-Tabulation Tables

seoras
Presentation Transcript


  1. Cross-Tabulation Tables: Tables in R and Computing Chi Square

  2. Kinds of Data • Nominal or Ordinal (few categories) • Interval if it is grouped • Some tests ignore the ordering of the categories (e.g. Chi square) • In R this means we are working with factors

  3. Kinds of Tables • One line per observation, e.g. data on Ernest Witte where each row is a single individual: table() and Rcmdr • One line per cell with a column of numbers representing the count for that cell: xtabs()

  4. Kinds of Tables • A row for each category of the first variable and a column for each category of the second variable with counts at the intersection of a row and column – Rcmdr (Enter table directly)

  5. Type 1
     > EWG2[sample(rownames(EWG2), 6), c("Age", "Goods")]
                  Age   Goods
     159 Middle Adult  Absent
     126        Child Present
     075        Child  Absent
     156    Old Adult Present
     095        Adult  Absent
     157    Old Adult  Absent

  6. Type 2
       Age   Goods Freq
     Child  Absent   18
     Adult  Absent   51
     Child Present   19
     Adult Present   55

  7. Type 3
           Absent Present
     Child     18      19
     Adult     51      55
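The three layouts hold the same information, so one can be rebuilt from another. The following is a minimal sketch using only the four counts shown on the Type 2 slide; the object names (`cells`, `type1`, `type3`, and so on) are made up for illustration and are not part of the original presentation.

```r
# Type 2: one line per cell, using the counts from the slides
cells <- data.frame(
  Age   = factor(c("Child", "Adult", "Child", "Adult"),
                 levels = c("Child", "Adult")),
  Goods = factor(c("Absent", "Absent", "Present", "Present")),
  Freq  = c(18, 51, 19, 55)
)

# Type 2 -> Type 3: put the count column on the left side of the formula
type3 <- xtabs(Freq ~ Age + Goods, data = cells)
print(type3)

# Type 2 -> Type 1: repeat each cell's row Freq times (one line per observation)
type1 <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Age", "Goods")]
rownames(type1) <- NULL

# Type 1 -> Type 3: table() counts the observations in each combination
type3_again <- table(type1$Age, type1$Goods, dnn = c("Age", "Goods"))

# Type 3 -> Type 2: as.data.frame() on a table gives one row per cell
back_to_cells <- as.data.frame(type3_again)
print(back_to_cells)
```

Note that `xtabs()` needs `Freq` on the left of `~` only when the data already hold counts; with one line per observation the left side is left empty.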

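  8. Factors in R • Factors use integers to code for categorical data • Each integer code is associated with a label, e.g. 1 could stand for “Absent” and 2 for “Present” • Before R 4.0.0, data.frame() and read.table() converted character columns to factors by default; from R 4.0.0 on, request this with stringsAsFactors=TRUE or convert the column with factor()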

  9. Factors • Regular factors are either equal or not equal (nominal) • Ordered factors can be >, ==, and < • Rcmdr makes it easy to convert a numeric variable to a factor, to change the factor labels, to change the order of the factor levels, and to make the factor ordered
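A short sketch of what these two slides describe, in base R rather than through the Rcmdr menus. The labels come from the slides, but the ordering of the age categories below is an assumption made for illustration only.

```r
# A regular (nominal) factor: integer codes plus labels
goods <- factor(c("Absent", "Present", "Present", "Absent"))
print(levels(goods))       # labels, sorted alphabetically by default
print(as.integer(goods))   # the underlying integer codes

# An ordered factor: the level order is given explicitly
# (this particular order is assumed, not taken from the data)
age <- factor(c("Child", "Old Adult", "Adult", "Middle Adult"),
              levels  = c("Child", "Adult", "Middle Adult", "Old Adult"),
              ordered = TRUE)

# Ordered factors support <, ==, and >
younger <- age < "Middle Adult"
print(younger)
```

Equality tests (`==`) work on both kinds of factor; `<` and `>` are only meaningful once the factor is ordered.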

  10. Tables in R • Tables are basically matrices with labeling • Transferring between data.frames and tables is possible but can lead to unexpected results • Rcmdr does not recognize tables.

  11. Key table commands in R • table(): create one-way and multi-way tables • xtabs(): uses formulas (and optionally weights/counts) • addmargins(): add row and column totals • prop.table(): create table of proportions
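To see what `addmargins()` and `prop.table()` return, here is a small sketch on the 2 x 2 counts from the Type 3 slide. The table is typed in directly, and the names `tab`, `with_totals`, `overall`, `by_row`, and `by_col` are illustrative.

```r
# Counts from the Type 3 slide (matrix() fills column by column)
tab <- as.table(matrix(c(18, 51, 19, 55), nrow = 2,
                       dimnames = list(Age   = c("Child", "Adult"),
                                       Goods = c("Absent", "Present"))))

with_totals <- addmargins(tab)      # adds a "Sum" row and a "Sum" column
overall     <- prop.table(tab)      # each cell divided by the grand total
by_row      <- prop.table(tab, 1)   # each cell divided by its row total
by_col      <- prop.table(tab, 2)   # each cell divided by its column total

print(with_totals)
print(round(by_row, 3))
```

The second argument of `prop.table()` picks the margin that sums to 1: `1` for rows, `2` for columns, and nothing for the whole table.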

  12. Key commands (cont.) • ftable(): flatten a multidimensional table (does not work with xtable()) • print(xtable(), type="html"): print an HTML version of the table

  13. # Use Rcmdr to load ErnestWitte and create EWG2
      # EWG2 <- subset(ErnestWitte, subset=Group==2)
      table(EWG2$Age)
      EWG2$Age <- factor(EWG2$Age)   # re-create the factor (drops any unused levels)
      Table1 <- table(EWG2$Age, EWG2$Goods, dnn=c("Age", "Goods"))
      Table1
      str(Table1)
      Table2 <- xtabs(~Age+Goods, data=EWG2)
      Table2
      str(Table2)
      DF1 <- data.frame(Table1)
      DF1
      names(DF1) <- c("Age", "Goods", "Freq")
      DF1

  14. Table3 <- xtabs(Freq~Age+Goods, data=DF1)
      Table3
      addmargins(Table1)
      prop.table(Table1)
      prop.table(Table1, 1)
      prop.table(addmargins(Table1, 1), 1)
      # Included in Rcmdr
      rowPercents(Table1)
      colPercents(Table1)

  15. Table4 <- xtabs(~Adult+Goods+Pathology, data=EWG2)
      Table4
      str(Table4)
      ftable(Table4, row.vars=c(1, 2), col.vars=3)
      ftable(Table4, row.vars=c(3, 2), col.vars=1)
      # tohtml() puts html code for table into Windows clipboard
      # or a file named "clipboard" in Mac OS X or Linux
      library(xtable)
      tohtml <- function(x) print(xtable(x), type="html", file="clipboard")
      tohtml(Table1)
      # Paste clipboard into Microsoft Excel

  16. Null Hypothesis • The usual null hypothesis is that the row and column variables are independent of one another – knowing one does not help us predict the other • If the null hypothesis is false, the cell values will deviate from expected values

  17. E.g. Coin Flipping • If I flip a coin twice, the chance that the first flip comes up heads is .5 • The chance that the second flip comes up heads is .5 as well • But what if the chance of getting a head changed depending on the first toss? The probabilities would be conditional
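The difference between independent and conditional probabilities can be written out as two small joint-probability tables. This is only a sketch: the "dependent" coin, which repeats the first result with probability 0.8, is a hypothetical example and does not come from the slides.

```r
p_first <- c(Heads = 0.5, Tails = 0.5)

# Independent flips: each joint probability is the product of the marginals
independent <- outer(p_first, p_first)
print(independent)

# Hypothetical dependent coin: P(second | first), one row per first result
cond_second <- rbind(Heads = c(Heads = 0.8, Tails = 0.2),
                     Tails = c(Heads = 0.2, Tails = 0.8))

# Joint probability = P(first) * P(second | first); row i is scaled by p_first[i]
dependent <- p_first * cond_second
print(dependent)

# Both tables give the second flip a 0.5 chance of heads overall,
# but only the independent one equals the product of its margins
print(colSums(dependent))
```

In the dependent table the margins look like a fair coin, yet knowing the first flip changes the prediction for the second, which is exactly what the null hypothesis of independence rules out.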

  18. Expected Probabilities • Under the null hypothesis the expected value for a cell is (Row sum * Column sum)/Total count • The deviation of each actual count from its expected value is measured as (Observed - Expected)^2/Expected • Summing the deviations over all cells gives a statistic that approximately follows a chi-square distribution with (rows - 1) * (columns - 1) degrees of freedom
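The calculation on this slide can be done by hand in a few lines. The sketch below uses the 2 x 2 counts from the Type 3 slide and compares the result with `chisq.test()`; the variable names are illustrative, and `correct = FALSE` is needed because `chisq.test()` applies a continuity correction to 2 x 2 tables by default.

```r
# Observed counts from the Type 3 slide
observed <- matrix(c(18, 51, 19, 55), nrow = 2,
                   dimnames = list(Age   = c("Child", "Adult"),
                                   Goods = c("Absent", "Present")))

# Expected count per cell: (row sum * column sum) / total count
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum over cells of (Observed - Expected)^2 / Expected
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom and upper-tail p-value
df      <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Same test from the built-in function, without the continuity correction
builtin <- chisq.test(observed, correct = FALSE)

print(round(expected, 2))
print(c(by_hand = chi_sq, builtin = unname(builtin$statistic)))
```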

  19. Chi-Square Test • Compares observed counts to expected counts based on independence • Rcmdr constructs the tables and computes the test, BUT deletes the results

  20. Two Options • chisq.test(): saves results in multiple tables; performs Chi Square and simulation for p value • CrossTable() and crosstab() in descr: SAS, SPSS style output with xtable(); more formatting options; mosaic plot with crosstab()

  21. Results <- chisq.test(xtabs(~Age+Pathology, data=EWG2), simulate.p.value=TRUE)
      Results
      #   Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)
      #   data:  xtabs(~Age + Pathology, data = EWG2)
      #   X-squared = 31.2876, df = NA, p-value = 0.0004998
      str(Results)
      Results$expected
      Results$residuals
      fisher.test(xtabs(~Sex+Goods, data=EWG2))
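Because the EWG2 data are not reproduced in the transcript, the following sketch runs the same calls on the 2 x 2 Age by Goods counts from the Type 3 slide to show what the saved result contains. The seed value is arbitrary and is set only so that the simulated p-value is repeatable.

```r
tab <- as.table(matrix(c(18, 51, 19, 55), nrow = 2,
                       dimnames = list(Age   = c("Child", "Adult"),
                                       Goods = c("Absent", "Present"))))

set.seed(1)  # arbitrary seed, only for a repeatable simulation
res <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

print(names(res))        # components stored in the result
print(res$expected)      # expected counts under independence
print(res$residuals)     # Pearson residuals: (observed - expected) / sqrt(expected)
print(res$parameter)     # df is NA when the p-value is simulated
print(res$p.value)

# Exact alternative for small tables
exact <- fisher.test(tab)
print(exact$p.value)
```

With B replicates the simulated p-value is (1 + number of simulated statistics at least as large as the observed one) / (B + 1), so the smallest value it can take with 2000 replicates is 1/2001, about 0.0004998, which matches the p-value printed on the slide.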

  22. library(descr)
      with(EWG2, CrossTable(Age, Pathology))
      with(EWG2, CrossTable(Age, Pathology, prop.c=FALSE, prop.t=FALSE))
      with(EWG2, crosstab(Age, Pathology))
      with(EWG2, crosstab(Age, Pathology, expected=TRUE, resid=TRUE))
      with(EWG2, crosstab(Sex, Goods))
