Overview of some tests

Thomas INGICCO

J.L.T. Géricault, Le Radeau de La Méduse (The Raft of the Medusa)

Chi-square test

Aim: Comparison of observed counts Oij with theoretical (expected) counts Eij. Are the rows and columns of a contingency table independent? Independence means that belonging to a class of the first variable has no influence on belonging to a class of the second variable.

Measured variable: A qualitative variable with k classes

Conditions of use:
The classes of the variables must be mutually exclusive.
Cochran's rule must be respected: Eij ≥ 5 in each class. Some classes with 1 ≤ Eij < 5 are tolerated if at least 80% of all classes have Eij ≥ 5.

Test hypotheses:
H0: πi = Ptheo i. The theoretical proportions Ptheo i are the real proportions in the observed population.
H1 (two-sided): At least one of the theoretical proportions is not the real proportion in the observed population.

The statistic is: $\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
In R: sum((Oij - Eij)^2/Eij)
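As a minimal sketch of the quantities named above, the following R code builds the expected counts from the margins of a small table, checks Cochran's rule, and computes the statistic and its p-value. The table `obs` and its counts are made up for illustration; they are not taken from the Ceramics data.

```r
# Hypothetical 2 x 3 contingency table (made-up counts, not the Ceramics data)
obs <- matrix(c(12, 18, 10,
                 8, 22, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"),
                              Type  = c("x", "y", "z")))

# Expected counts under independence: Eij = (row total i * column total j) / n
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Cochran's rule: every Eij >= 1 and at least 80% of the Eij >= 5
cochran.ok <- all(expected >= 1) && mean(expected >= 5) >= 0.8

# Chi-square statistic, degrees of freedom and p-value
chi2 <- sum((obs - expected)^2 / expected)
nu   <- (nrow(obs) - 1) * (ncol(obs) - 1)
p    <- pchisq(chi2, nu, lower.tail = FALSE)
```

Building `expected` with `outer()` makes the independence assumption explicit: each expected count depends only on its row total and its column total.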

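Chi-square test, in detail:

# Read the data and keep the two qualitative variables (columns 10 and 11)
Ceram <- read.table("K:/Cours/Philippines/Statistics-210/Data/Ceramics.txt", header=TRUE)
obs1 <- data.frame(Ceram[, 10:11])
obs2 <- na.omit(obs1)
for(i in 1:length(obs2)){obs2[, i] <- factor(obs2[, i])}

# Contingency table of observed counts, with margins
obs3 <- table(obs2)
addmargins(obs3)

# Mosaic plot of the observed counts
graphics.off()
par(cex.lab=1.5, xpd=NA, font=2)
mosaicplot(t(obs3), main=NULL, cex.axis=1.1)

# Theoretical (expected) counts under independence
obs3theo <- suppressWarnings(chisq.test(obs3)$expected)
addmargins(obs3theo)

# Chi-square statistic
nij <- obs3
tij <- obs3theo
chi2.calc <- sum((nij-tij)^2/tij)
chi2.calc

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
k <- dim(obs3)[1]
c <- dim(obs3)[2]
nu <- (k-1)*(c-1)
nu

# p-value
pchisq(chi2.calc, nu, lower.tail=FALSE)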

Chi-square test, in detail:

# Test in R
chisq.test(obs3)
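One point to keep in mind when comparing the built-in test with a statistic computed by hand: for a 2 x 2 table, `chisq.test()` applies Yates' continuity correction by default (`correct = TRUE`), so the two values differ unless the correction is switched off. The sketch below illustrates this on a made-up table `tab`; whether it matters for `obs3` depends on the dimensions of that table, which the slides do not show.

```r
# Hypothetical 2 x 2 table (made-up counts)
tab <- matrix(c(20, 10,
                15, 25), nrow = 2, byrow = TRUE)

# Statistic computed by hand from the expected counts
expected    <- outer(rowSums(tab), colSums(tab)) / sum(tab)
chi2.manual <- sum((tab - expected)^2 / expected)

# Default call: Yates' continuity correction is applied to 2 x 2 tables
chi2.yates <- unname(chisq.test(tab)$statistic)

# correct = FALSE reproduces the uncorrected statistic computed by hand
chi2.plain <- unname(chisq.test(tab, correct = FALSE)$statistic)

c(manual = chi2.manual, uncorrected = chi2.plain, yates = chi2.yates)
```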

Fisher test

Aim: Comparison of the observed counts of G and F (independence of two qualitative variables), or equivalently of the observed proportions PG1/F1 and PG1/F2 (equality of the proportions)

Measured variable: Two qualitative variables F and G with 2 classes each

Conditions of use:
The classes of the variables must be mutually exclusive.
The qualitative variables are nominal.

Test hypotheses:
H0: πG1/F1 = πG1/F2. Proportions are identical in the target population.
H1 (two-sided): πG1/F1 ≠ πG1/F2. Proportions are different in the target population.
H1 (one-sided, right): πG1/F1 > πG1/F2. Proportion πG1/F1 is strictly greater than πG1/F2 in the target population.
H1 (one-sided, left): πG1/F1 < πG1/F2. Proportion πG1/F1 is strictly smaller than πG1/F2 in the target population.

The statistic is: the observed count n11 in the first cell of the table, which follows a hypergeometric distribution under H0 when the margins are fixed.
In R: obs3[1, 1]
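For reference, the hypergeometric probability behind the test can be written with the notation of the R code in this section ($n_{1\cdot}$ for the first row total, $n_{\cdot 1}$ for the first column total, $n$ for the grand total). This is the standard form of the distribution, given here as a reading aid rather than as a transcription of the original slide.

```latex
P(N_{11} = n_{11}) =
  \frac{\binom{n_{1\cdot}}{n_{11}} \binom{n - n_{1\cdot}}{n_{\cdot 1} - n_{11}}}
       {\binom{n}{n_{\cdot 1}}},
\qquad
p_{\text{right}} = \sum_{x \ge n_{11}} P(N_{11} = x),
\qquad
p_{\text{left}} = \sum_{x \le n_{11}} P(N_{11} = x)
```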

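Fisher test, in detail:

# Read the data and keep the two qualitative variables (columns 12 and 9)
Ceram <- read.table("K:/Cours/Philippines/Statistics-210/Lecture-4/Ceramics.txt", header=TRUE)
obs1 <- Ceram[, c(12, 9)]
obs2 <- na.omit(obs1)
for(i in 1:length(obs2)){obs2[, i] <- factor(obs2[, i])}

# 2 x 2 contingency table of observed counts
obs3 <- table(obs2)
# Optional: transpose the table
# obs3<-t(obs3)
# Optional: swap the two columns (and their names)
# obs4<-obs3 ; obs4[,1]<-obs3[,2] ; obs4[,2]<-obs3[,1] ; dimnames(obs4)[[2]][1] <- dimnames(obs3)[[2]][2] ; dimnames(obs4)[[2]][2] <- dimnames(obs3)[[2]][1] ; obs3<-obs4
addmargins(obs3)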

Fisher test, in detail:

# Proportion of G1 within each class of F
graphics.off()
pG1.Fi <- obs3[, 1] / margin.table(obs3, 1)

# Bar plot of the two proportions
par(mar=c(5.1, 5.1, 4.1, 2.1))
barplot(pG1.Fi,
        xlab = paste(labels(dimnames(obs3))[2], " / ", labels(dimnames(obs3))[1]),
        xaxt = "n", ylab="Proportion", ylim=range(0, max(pG1.Fi) + 0.15),
        cex.lab=2, cex.axis=1.8)
position.labels <- barplot(pG1.Fi, plot = FALSE)
axis(side=1, at = position.labels,
     labels = c(paste(colnames(obs3)[1], " / ", rownames(obs3)[1]),
                paste(colnames(obs3)[1], " / ", rownames(obs3)[2])),
     cex.axis=1.8)

# Mosaic plot in a second graphics window
# (windows() only exists on Windows; dev.new() is the portable equivalent)
windows()
par(cex.lab=2, xpd=NA, font=2)
mosaicplot(t(obs3), main=NULL, cex.axis=1.5)

Fisher test, in detail:

# Counts and row totals, then the two proportions being compared
n11 <- obs3[1, 1]
n1. <- margin.table(obs3, 1)[1]
n21 <- obs3[2, 1]
n2. <- margin.table(obs3, 1)[2]
pG1.F1 <- n11/n1.
pG1.F2 <- n21/n2.
t(data.frame(pG1.F1, pG1.F2))

# Remaining counts, column totals and conditional proportions
n12 <- obs3[1, 2]
n22 <- obs3[2, 2]
n.1 <- margin.table(obs3, 2)[1]
n.2 <- margin.table(obs3, 2)[2]
pG2.F1 <- n12/n1.
pG2.F2 <- n22/n2.
pF1.G1 <- n11/n.1
pF1.G2 <- n12/n.2
pF2.G1 <- n21/n.1
pF2.G2 <- n22/n.2
t(data.frame(pG2.F1, pG2.F2, pF1.G1, pF1.G2, pF2.G1, pF2.G2))

# Statistic of the test: the observed count in the first cell
n11 <- obs3[1, 1]
NFE.calc <- n11
NFE.calc

Fisher test, in detail:

# Margins of the table
n1. <- margin.table(obs3, 1)[1]
n <- margin.table(obs3)
n.1 <- margin.table(obs3, 2)[1]

# One-sided p-values from the hypergeometric distribution
p.right <- phyper(NFE.calc-1, n1., n-n1., n.1, lower.tail=FALSE)
p.left <- phyper(NFE.calc, n1., n-n1., n.1)
p.right
p.left

# Two-sided p-value: add to the smaller tail the probability of the values
# in the opposite tail that are no more probable than the observed one
if(p.right < p.left) {
  p.value1 <- p.right
  NFE.left <- NFE.calc
  d.NFE.calc <- round(dhyper(NFE.calc, n1., n-n1., n.1), 12)
  d.NFE.left <- Inf
  # Move left until a value at most as probable as the observed one is found
  while(NFE.left >= 0 && d.NFE.left > d.NFE.calc) {
    NFE.left <- NFE.left - 1
    d.NFE.left <- round(dhyper(NFE.left, n1., n-n1., n.1), 12)
  }
  if(d.NFE.left > d.NFE.calc){p.value2 <- 0} else{p.value2 <- phyper(NFE.left, n1., n-n1., n.1)}
} else {
  p.value1 <- p.left
  NFE.right <- NFE.calc
  d.NFE.calc <- round(dhyper(NFE.calc, n1., n-n1., n.1), 12)
  d.NFE.right <- Inf
  # Move right until a value at most as probable as the observed one is found
  while(d.NFE.right > d.NFE.calc){
    NFE.right <- NFE.right + 1
    d.NFE.right <- round(dhyper(NFE.right, n1., n-n1., n.1), 12)
  }
  p.value2 <- phyper(NFE.right-1, n1., n-n1., n.1, lower.tail=FALSE)
}
p.value <- p.value1 + p.value2
p.value
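The two-sided p-value above can also be obtained more compactly by summing the hypergeometric probabilities of all possible values of n11 that are at most as probable as the observed one. The sketch below is one way to write it; the function name `fisher.two.sided`, the table `tab` and its counts are hypothetical, and a small relative tolerance is used instead of the rounding to 12 decimals found in the code above.

```r
# Two-sided Fisher p-value: sum the hypergeometric probabilities of every
# table (margins fixed) that is at most as probable as the observed one.
fisher.two.sided <- function(tab) {
  n11 <- tab[1, 1]
  n1. <- sum(tab[1, ])   # first row total
  n.1 <- sum(tab[, 1])   # first column total
  n   <- sum(tab)        # grand total
  # Possible values of n11 given the margins
  support <- max(0, n.1 - (n - n1.)):min(n1., n.1)
  d     <- dhyper(support, n1., n - n1., n.1)
  d.obs <- dhyper(n11, n1., n - n1., n.1)
  # Small relative tolerance so that ties are not lost to rounding error
  sum(d[d <= d.obs * (1 + 1e-7)])
}

# Hypothetical 2 x 2 table (made-up counts)
tab <- matrix(c(8, 2,
                1, 5), nrow = 2, byrow = TRUE)
p.manual  <- fisher.two.sided(tab)
p.builtin <- fisher.test(tab)$p.value
c(manual = p.manual, builtin = p.builtin)
```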

Fisher test, in detail:

# Probability of the observed table, by hand and with dhyper()
n11 <- obs3[1, 1]
Pn11 <- choose(n1., n11)*choose(n-n1., n.1-n11)/choose(n, n.1)
Pn11
dhyper(n11, n1., n-n1., n.1)

# Test in R
fisher.test(obs3)

Student t test

Aim: Comparison of two observed means m1 and m2

Measured variable: A quantitative variable and a qualitative variable with two classes

Conditions of use:
The quantitative variable must follow a normal distribution.
The quantitative variable may be continuous or discrete.

Test hypotheses:
H0: μ1 = μ2. Means are identical in the target population.
H1 (two-sided): μ1 ≠ μ2. Means are different in the target population.
H1 (one-sided, right): μ1 > μ2. Mean μ1 is strictly greater than mean μ2 in the target population.
H1 (one-sided, left): μ1 < μ2. Mean μ1 is strictly smaller than mean μ2 in the target population.

The statistic is: $t = \frac{m_1 - m_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
with: $s^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
In R: (m1-m2)/(s2*(1/n1+1/n2))^0.5, where s2 is the pooled variance $s^2$

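Student t test, in detail:

# Keep the ceramics with a round or flat base (quantitative variable in column 2, Base in column 13)
# Assumes the data frame is available as Ceramics (it is read in as Ceram in the other sections)
obs1 <- data.frame(Ceramics[which(Ceramics$Base=="Round" | Ceramics$Base=="Flat"), c(2, 13)])
obs2 <- na.omit(obs1)
obs2[, 2] <- factor(obs2[, 2])   # keep only the two classes actually present

# One column per class, padded with NA to the same length
nc.max <- max(table(obs2[, 2]))
nb.na <- nc.max - table(obs2[, 2])
tempo <- split(obs2[, 1], obs2[, 2])
for(i in 1:length(tempo)) {tempo[[i]] <- append(tempo[[i]], rep(NA, nb.na[i]))}
obs3 <- data.frame(tempo)
obs3

# Size, mean and standard deviation of each class (the NA padding is removed in both columns)
n1 <- length(na.omit(obs3[, 1]))
n2 <- length(na.omit(obs3[, 2]))
m1 <- mean(na.omit(obs3[, 1]))
m2 <- mean(na.omit(obs3[, 2]))
s.1 <- sd(na.omit(obs3[, 1]))
s.2 <- sd(na.omit(obs3[, 2]))
param <- data.frame(c(n1, n2), c(m1, m2), c(s.1, s.2))
names(param) <- c("Size", "Mean", "Standard deviation")
row.names(param) <- levels(obs2[, 2])
param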

Student t test, in detail:

# Pooled variance and t statistic
s2 <- ((n1-1)*s.1^2 + (n2-1)*s.2^2)/(n1+n2-2)
t.calc <- (m1-m2)/(s2*(1/n1+1/n2))^0.5
t.calc

# Degrees of freedom
nu <- n1+n2-2
nu

# Two-sided p-value
min(pt(t.calc, nu, lower.tail=FALSE), pt(t.calc, nu))*2

# Test in R
t.test(obs3[, 1], obs3[, 2], var.equal=TRUE)
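The same computation can be tried without the Ceramics file. The sketch below uses two made-up samples `x1` and `x2` (hypothetical values) and checks that the statistic and p-value computed by hand agree with `t.test(..., var.equal = TRUE)`.

```r
# Hypothetical measurements for two groups (made-up values)
x1 <- c(12.1, 13.4, 11.8, 14.0, 12.7)
x2 <- c(10.9, 11.5, 12.2, 10.4, 11.1, 11.8)

n1 <- length(x1); n2 <- length(x2)
m1 <- mean(x1);   m2 <- mean(x2)

# Pooled variance, t statistic, degrees of freedom, two-sided p-value
s2     <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
t.calc <- (m1 - m2) / sqrt(s2 * (1 / n1 + 1 / n2))
nu     <- n1 + n2 - 2
p.calc <- 2 * pt(abs(t.calc), nu, lower.tail = FALSE)

# Same test with the built-in function
res <- t.test(x1, x2, var.equal = TRUE)
c(t.manual = t.calc, t.builtin = unname(res$statistic),
  p.manual = p.calc, p.builtin = res$p.value)
```

Writing the p-value as `2 * pt(abs(t.calc), ...)` is equivalent to taking twice the smaller of the two tails, as done above.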

Analysis of variance (ANOVA)

Aim: Comparison of at least two observed means

Measured variable: A quantitative variable and a qualitative variable with k classes

Conditions of use:
The quantitative variable must follow a normal distribution.
The variances of the quantitative variable in each class of the qualitative variable must be equal.
-> If these conditions are not fulfilled, see the Kruskal-Wallis test

Test hypotheses:
H0: μ1 = μ2 = ... = μk. Means are identical in the target population.
H1: At least one of the means is different in the target population.

The statistic is: $F = \frac{\sum_{c=1}^{k} n_c (m_c - m)^2 / (k - 1)}{\sum_{c=1}^{k} (n_c - 1) s_c^2 / (n - k)}$
with: $n_c$, $m_c$ and $s_c^2$ the size, mean and variance of class c, $m$ the overall mean and $n$ the total number of observations.
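The slides name the conditions but not how to check them. One common way, given here as an assumption rather than as the author's procedure, is a Shapiro-Wilk test within each class for normality and Bartlett's test for equality of variances, with `kruskal.test()` as the fallback mentioned above. The data frame `dat` is simulated for illustration.

```r
# Hypothetical data: one quantitative variable, one factor with 3 classes
set.seed(1)
dat <- data.frame(
  value = c(rnorm(10, mean = 10), rnorm(10, mean = 11), rnorm(10, mean = 12)),
  group = factor(rep(c("a", "b", "c"), each = 10))
)

# Normality within each class (Shapiro-Wilk), one p-value per class
p.normal <- sapply(split(dat$value, dat$group),
                   function(v) shapiro.test(v)$p.value)

# Equality of variances across classes (Bartlett's test)
p.var <- bartlett.test(value ~ group, data = dat)$p.value

# Fallback when the conditions are doubtful: Kruskal-Wallis rank test
p.kw <- kruskal.test(value ~ group, data = dat)$p.value

p.normal
c(bartlett = p.var, kruskal.wallis = p.kw)
```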

Analysis of variance (ANOVA), in detail:

# Read the data: quantitative variable in column 7, qualitative variable in column 10
Ceram <- read.table("K:/Cours/Philippines/Statistics210/Data/Ceramics.txt", header=TRUE)
obs1 <- Ceram[, c(7, 10)]
obs1[, 2] <- factor(obs1[, 2])
obs2 <- na.omit(obs1)

# One column per class, padded with NA to the same length
nc.max <- max(table(obs2[, 2]))
nb.na <- nc.max - table(obs2[, 2])
tempo <- split(obs2[, 1], obs2[, 2])
for(i in 1:length(tempo)) {tempo[[i]] <- append(tempo[[i]], rep(NA, nb.na[i]))}
obs3 <- data.frame(tempo)
obs3

Analysis of variance (ANOVA), in detail:

# Strip chart of the observations, one row per class
graphics.off()
k <- nlevels(obs2[, 2])
stripchart(obs2[, 1]~obs2[, 2], method="jitter", jitter=0.1, vertical=FALSE,
           ylim=range(0.5, k+0.5), group.names=levels(obs2[, 2]),
           xlab=names(obs2)[1], ylab=names(obs2)[2], pch=16, cex=1.2)

# Mean of each class, drawn as a grey vertical segment
mc <- sapply(split(obs2[, 1], obs2[, 2]), mean)
for(i in 1:k){segments(mc[i], i-0.25, mc[i], i+0.25, lwd=3, col=gray(0.5))}
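The ANOVA walkthrough stops at the strip chart. As a hedged sketch of the step that would usually follow, the code below computes the one-way ANOVA F statistic by hand (between-class mean square over within-class mean square) and compares it with the built-in functions. The vectors `value` and `group` are made up for illustration and are not the Ceramics data.

```r
# Hypothetical data: 3 classes of 4 observations each (made-up values)
value <- c(5.1, 4.8, 5.6, 5.0,
           6.2, 6.0, 6.7, 5.9,
           7.1, 6.8, 7.4, 7.0)
group <- factor(rep(c("a", "b", "c"), each = 4))

n  <- length(value)
k  <- nlevels(group)
mc <- tapply(value, group, mean)     # class means
nc <- tapply(value, group, length)   # class sizes

# Sums of squares between and within classes
ss.between <- sum(nc * (mc - mean(value))^2)
ss.within  <- sum((value - ave(value, group))^2)   # ave() returns each class mean

# F statistic and its p-value
F.calc <- (ss.between / (k - 1)) / (ss.within / (n - k))
p.calc <- pf(F.calc, k - 1, n - k, lower.tail = FALSE)

# Same test with the built-in functions
tab.anova <- anova(lm(value ~ group))
tab.anova
```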
