Unit 31: A Unified Perspective for Visual Display of Data (A work in progress)
• Give you more experience (lecture/lab) with display of data in R
• One of R’s biggest strengths, but complex as a result
• Sample code for common designs
• Think about what you want to display
• Understand what you can and can’t conclude from graphs
• Emphasis on raw and sampling distributions; error bars and envelopes
• This just scratches the surface
• No clear consensus, and our field is out of date
Some Options in R
• Base package
• ggplot2
• effects
General Principles
• Visually present our effect, the parameter estimate
• Display information about the sampling distribution for that parameter estimate
• Display information about the distribution of raw scores
• Case analysis: outliers and influence
• Reader understands what you are presenting
• Consistent across typical designs and measurement strategies
The Examples
• Between-subjects designs
  • Two group, equal N
  • Two group, unequal N
  • Three group
  • Three group with covariates
  • One quantitative IV
• Mixed and within designs
  • Two conditions
  • Three conditions
  • 2 IV: quantitative and two conditions
Two Group Equal N
some(d)
    X        Y
007 A 63.75985
034 A 29.04556
052 B 47.33331
091 B 65.85238
tapply(d$Y, d$X, 'length')
 A  B
50 50
tapply(d$Y, d$X, 'mean')
       A        B
41.61268 51.37365
tapply(d$Y, d$X, 'sd')
       A        B
13.85891 15.04196
tapply(d$Y, d$X, 'se')
       A        B
1.959946 2.127254
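The call tapply(d$Y, d$X, 'se') presupposes a function named se, which base R does not provide. A minimal sketch follows, assuming se is meant to be the standard error of the mean (sd divided by the square root of n). The helper and the simulated data frame are illustrative stand-ins, not the course's own code or data.

```r
# Hypothetical helper (assumption): standard error of the mean.
se <- function(x) sd(x) / sqrt(length(x))

# Simulated stand-in for the slide's data frame d (illustrative values only).
set.seed(1)
d <- data.frame(X = factor(rep(c('A', 'B'), each = 50)),
                Y = c(rnorm(50, 42, 14), rnorm(50, 51, 15)))

# One table with n, mean, sd, and se for each group.
stats <- list(n = length, mean = mean, sd = sd, se = se)
desc <- sapply(stats, function(f) tapply(d$Y, d$X, f))
print(round(desc, 3))
```

With the slide's values, sd / se is about 7.07 in both groups, which is the square root of 50.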
contrasts(d$X) = varContrasts(d$X, Type = 'POC', POCList = list(c(-1,1)))
  POC1
A -0.5
B  0.5
m = lm(Y ~ X, data = d)
modelSummary(m)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   46.493      1.446  32.147  < 2e-16 ***
XPOC1          9.761      2.893   3.375  0.00106 **

contrasts(d$X) = varContrasts(d$X, Type = 'dummy', RefLevel = 1)
  B_v_A
A     0
B     1
m = lm(Y ~ X, data = d)
modelSummary(m)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   41.613      2.045  20.345  < 2e-16 ***
XB_v_A         9.761      2.893   3.375  0.00106 **

contrasts(d$X) = varContrasts(d$X, Type = 'dummy', RefLevel = 2)
  A_v_B
A     1
B     0
m = lm(Y ~ X, data = d)
modelSummary(m)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   51.374      2.045  25.118  < 2e-16 ***
XA_v_B        -9.761      2.893  -3.375  0.00106 **
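The three codings change what the intercept estimates but not the group contrast. The sketch below reproduces that pattern with base R contrast matrices on simulated data. It does not use varContrasts or modelSummary; the helper fit_with and the data are assumptions made for illustration.

```r
# Simulated two-group data (illustrative only).
set.seed(2)
d <- data.frame(X = factor(rep(c('A', 'B'), each = 50)),
                Y = c(rnorm(50, 42, 14), rnorm(50, 51, 15)))

# Hypothetical helper: fit Y ~ X under a given contrast matrix.
fit_with <- function(data, cmat) {
  contrasts(data$X) <- cmat
  coef(summary(lm(Y ~ X, data = data)))
}

lev <- c('A', 'B')
poc     <- fit_with(d, matrix(c(-0.5, 0.5), ncol = 1, dimnames = list(lev, 'POC1')))
dummy_a <- fit_with(d, matrix(c(0, 1), ncol = 1, dimnames = list(lev, 'B_v_A')))
dummy_b <- fit_with(d, matrix(c(1, 0), ncol = 1, dimnames = list(lev, 'A_v_B')))

means <- sapply(split(d$Y, d$X), mean)

# Intercepts: unweighted mean of group means, mean of A, mean of B.
print(c(poc = poc[1, 1], dummy_a = dummy_a[1, 1], dummy_b = dummy_b[1, 1]))
# Contrast estimate and its SE: same size under every coding (sign flips for A_v_B).
print(rbind(poc = poc[2, 1:2], dummy_a = dummy_a[2, 1:2], dummy_b = dummy_b[2, 1:2]))
```

This is why the later question about error bars matters: the standard error attached to the intercept depends on the coding, while the standard error of the contrast does not.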
p = data.frame(X = c('A', 'B'))
  X
1 A
2 B
p = modelPredictions(m, p)
  Predicted      lwr      upr       se X
1  41.61268 39.56737 43.65799 2.045311 A
2  51.37365 49.32833 53.41896 2.045311 B
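In the output above, lwr and upr equal Predicted minus and plus se (for example, 41.61268 - 2.045311 = 39.56737). A minimal base R sketch of the same kind of table follows, using predict() with se.fit = TRUE on simulated data. It is an approximation of what modelPredictions reports here, not its implementation.

```r
# Simulated two-group data (illustrative only).
set.seed(3)
d <- data.frame(X = factor(rep(c('A', 'B'), each = 50)),
                Y = c(rnorm(50, 42, 14), rnorm(50, 51, 15)))
m <- lm(Y ~ X, data = d)

# One row per group to predict for.
p <- data.frame(X = factor(c('A', 'B'), levels = levels(d$X)))
pr <- predict(m, newdata = p, se.fit = TRUE)

p$Predicted <- as.numeric(pr$fit)      # model-based group means
p$se        <- as.numeric(pr$se.fit)   # SE of each point estimate
p$lwr       <- p$Predicted - p$se      # lower bound: estimate - 1 SE
p$upr       <- p$Predicted + p$se      # upper bound: estimate + 1 SE
print(p)
```

With equal group sizes the two standard errors are identical, because the model pools the residual variance across groups.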
library(gplots)   # barplot2()
library(Hmisc)    # errbar()
windows()         # new graphics window (Windows only; dev.new() is the portable call)
par(lwd=3, cex = 1.5, font=2, cex.axis=1, font.axis=2, cex.lab =1.5, font.lab=2)
# White bars at the predicted group means; axes and labels are added by hand below
barplot2(p$Predicted, beside = TRUE, ylim = c(0,100), xlab = '', ylab = '',
         plot.ci =FALSE, axes=FALSE, col= 'white')
axis(2, at=seq(0,100,by=25),lwd=3)
mtext('Dependent Measure (units)', side=2, line=2, cex=1.5)
mtext('Group', side=1, line=3, cex=1.5)
# Raw scores as jittered gray points centered on each bar (bar midpoints are 0.7 and 1.9)
x = jitter(rep(0,sum(d$X=='A')),2) + 0.7
points(x, d$Y[d$X=='A'], pch=20, cex = .5, col = 'gray')
x = jitter(rep(0,sum(d$X=='B')),2) + 1.9
points(x, d$Y[d$X=='B'], pch=20, cex = .5, col = 'gray')
errbar(x=c(0.7, 1.9), y=p$Predicted, p$CIHi, p$CILo, pch=NA_integer_, lwd=3, cap= .05, add=TRUE )
lines(x=c(0.7,1.9),y=c(75,75), lwd=2)
text(x=1.3, y=78,'**', cex=1.5)
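The x positions 0.7 and 1.9 used for the points, error bars, and significance line are the bar midpoints. Base R's barplot() returns those midpoints, so they need not be hard-coded. The sketch below uses only base graphics (barplot and arrows, rather than barplot2 and errbar) with the predicted means and standard errors reported above; pdf(NULL) is used only so the code runs without opening a window.

```r
pdf(NULL)  # off-screen device; replace with your usual device to see the figure

means <- c(A = 41.61268, B = 51.37365)  # predicted group means from the slides
ses   <- c(2.045311, 2.045311)          # SEs of those point estimates

# barplot() returns the x coordinate of each bar's midpoint.
mids <- as.numeric(barplot(means, ylim = c(0, 100), col = 'white',
                           ylab = 'Dependent Measure (units)', xlab = 'Group'))

# Error bars of +/- 1 SE drawn as double-headed flat arrows.
arrows(mids, means - ses, mids, means + ses, angle = 90, code = 3, length = 0.05)

# Significance line and marker placed from the midpoints.
lines(x = mids, y = c(75, 75), lwd = 2)
text(x = mean(mids), y = 78, '**', cex = 1.5)

print(mids)
invisible(dev.off())
```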
Figure Caption: Bars represent sample means of the dependent measure by group. Error bars (±1 standard error of the point estimates from the GLM) indicate the precision of the point estimates of the population group means. Raw scores on the dependent measure are presented by group as gray points. The horizontal line indicates a significant contrast between group means (** p < .01).
What other error bars might you have put on the graph instead of the standard error of the point estimates?
POC
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   46.493      1.446  32.147  < 2e-16 ***
XPOC1          9.761      2.893   3.375  0.00106 **

Dummy (A as reference)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   41.613      2.045  20.345  < 2e-16 ***
XB_v_A         9.761      2.893   3.375  0.00106 **

Dummy (B as reference)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   51.374      2.045  25.118  < 2e-16 ***
XA_v_B        -9.761      2.893  -3.375  0.00106 **
t = summary(m)
t$coefficients
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   46.493      1.446  32.147  < 2e-16 ***
XPOC1          9.761      2.893   3.375  0.00106 **
errbar(x=c(0.7, 1.9), y=p$Predicted, p$Predicted + t$coefficients[2,2], p$Predicted - t$coefficients[2,2], pch=NA_integer_, lwd=3, cap= .05, add=TRUE )
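To compare the candidate error bars side by side, the sketch below computes three half-widths from the numbers reported above: the standard error of a group's point estimate, the standard error of the group contrast, and a 95% confidence interval half-width for a group mean. The 95% interval is one possible answer to the question, not necessarily the one intended; the error df of 98 assumes 100 observations and two estimated parameters.

```r
se_point <- 2.045311   # SE of each predicted group mean (from the predictions table)
se_diff  <- 2.893      # SE of the B vs. A contrast (from the coefficient tables)
df_error <- 98         # assumption: N = 100 minus 2 parameters

half_width <- c(
  se_point = se_point,                         # +/- 1 SE of the point estimate
  se_diff  = se_diff,                          # +/- 1 SE of the contrast
  ci95     = qt(0.975, df_error) * se_point    # 95% CI for a group mean
)
print(round(half_width, 3))

# With equal n, the contrast SE is sqrt(2) times the point-estimate SE.
print(sqrt(2) * se_point)
```

Whichever bar is drawn, the caption has to name it, because the same figure supports different visual inferences under each choice.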
Two Group Unequal N
describeBy(d,d$X) #an alternative to tapply used earlier
group: A
   var  n  mean    sd median trimmed   mad min   max range skew kurtosis   se
X*   1 25  1.00  0.00   1.00    1.00  0.00   1  1.00  0.00  NaN      NaN 0.00
Y    2 25 37.72 15.51  34.87   36.77 10.77  10 81.11 71.12 0.88     0.62 3.10
-----------------------------------------------------------------------------------
group: B
   var  n  mean    sd median trimmed   mad   min   max range skew kurtosis   se
X*   1 75  2.00  0.00   2.00     2.0  0.00  2.00  2.00  0.00  NaN      NaN 0.00
Y    2 75 48.84 13.38  47.52    48.8 14.79 18.85 81.98 63.13 0.11    -0.38 1.55
POC
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   43.279      1.609  26.894  < 2e-16 ***
XPOC1         11.122      3.218   3.456 0.000813 ***

Dummy (A as reference)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   37.718      2.787  13.532  < 2e-16 ***
XB_v_A        11.122      3.218   3.456 0.000813 ***

Dummy (B as reference)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   48.840      1.609  30.350  < 2e-16 ***
XA_v_B       -11.122      3.218  -3.456 0.000813 ***
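With unequal N the intercept standard errors differ by group (2.787 for A, 1.609 for B) even though the model uses one pooled residual SD. The sketch below recovers the reported standard errors from the group sizes and SDs in the describeBy output. Inputs are rounded to two decimals there, so agreement is only to about the third decimal.

```r
n    <- c(A = 25, B = 75)         # group sizes from describeBy
sd_g <- c(A = 15.51, B = 13.38)   # group SDs from describeBy

# Pooled residual SD used by the linear model.
s_pooled <- sqrt(sum((n - 1) * sd_g^2) / (sum(n) - 2))

se_mean <- s_pooled / sqrt(n)                 # SE of each group mean (dummy-coded intercepts)
se_diff <- s_pooled * sqrt(sum(1 / n))        # SE of the B vs. A contrast
se_poc0 <- 0.5 * s_pooled * sqrt(sum(1 / n))  # SE of the unweighted mean (POC intercept)

print(round(c(se_mean, diff = se_diff, poc_intercept = se_poc0), 3))
```

The smaller group has the wider error bar, so bars of ±1 standard error will be visibly asymmetric across groups in this design.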
Three Group Equal N
tapply(d$Y, d$X, 'length')
 A  B  C
50 50 50
tapply(d$Y, d$X, 'mean')
       A        B        C
40.56729 44.34783 55.38463
tapply(d$Y, d$X, 'sd')
       A        B        C
13.67520 16.14149 13.78084
tapply(d$Y, d$X, 'se')
       A        B        C
1.933965 2.282752 1.948905
POC
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   46.767      1.190  39.293  < 2e-16 ***
XPOC1          3.781      2.915   1.297    0.197
XPOC2         12.927      2.525   5.120 9.43e-07 ***

Dummy (A as reference)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   40.567      2.062  19.678  < 2e-16 ***
XB_v_A         3.781      2.915   1.297    0.197
XC_v_A        14.817      2.915   5.082 1.12e-06 ***

Dummy (B as reference)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   44.348      2.062  21.512  < 2e-16 ***
XA_v_B        -3.781      2.915  -1.297 0.196752
XC_v_B        11.037      2.915   3.786 0.000223 ***

Dummy (C as reference)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   55.385      2.062  26.866  < 2e-16 ***
XA_v_C       -14.817      2.915  -5.082 1.12e-06 ***
XB_v_C       -11.037      2.915  -3.786 0.000223 ***
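The POC estimates can be checked by hand from the group means: XPOC1 matches B minus A, and XPOC2 matches C minus the average of A and B. The sketch below does that arithmetic and also recovers the standard errors from the pooled SD, using the general formula SE = s * sqrt(sum(w^2 / n)) for a contrast with weights w. The weight vectors are inferred from the reported estimates, and contrast_se is a hypothetical helper.

```r
means <- c(A = 40.56729, B = 44.34783, C = 55.38463)  # group means from tapply
n     <- 50                                           # per-group n
s     <- 2.062 * sqrt(n)   # pooled SD implied by the dummy-coded intercept SE

# Hypothetical helper: SE of a linear contrast of independent group means.
contrast_se <- function(w, s, n) s * sqrt(sum(w^2 / n))

w_int  <- c(1, 1, 1) / 3      # unweighted grand mean (POC intercept)
w_poc1 <- c(-1, 1, 0)         # B vs. A
w_poc2 <- c(-0.5, -0.5, 1)    # C vs. mean of A and B

est <- c(intercept = sum(w_int * means),
         POC1      = sum(w_poc1 * means),
         POC2      = sum(w_poc2 * means))
se  <- c(intercept = contrast_se(w_int, s, n),
         POC1      = contrast_se(w_poc1, s, n),
         POC2      = contrast_se(w_poc2, s, n))
print(round(rbind(estimate = est, se = se), 3))
```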
p = data.frame(X = c('A', 'B', 'C'))
  X
1 A
2 B
3 C
p = modelPredictions(m, p)
  Predicted      lwr      upr       se
1  40.56729 38.50579 42.62880 2.061505
2  44.34783 42.28632 46.40933 2.061505
3  55.38463 53.32312 57.44613 2.061505
windows()         # new graphics window (Windows only; dev.new() is the portable call)
par(lwd=3, cex = 1.5, font=2, cex.axis=1, font.axis=2, cex.lab =1.5, font.lab=2)
# White bars at the predicted group means; axes and labels are added by hand below
barplot2(p$Predicted, beside = TRUE, ylim = c(0,100), xlab = '', ylab = '',
         plot.ci =FALSE, axes=FALSE, col= 'white')
axis(2, at=seq(0,100,by=25),lwd=3)
mtext('Dependent Measure (units)', side=2, line=2, cex=1.5)
mtext('Group', side=1, line=3, cex=1.5)
# Raw scores as jittered gray points centered on each bar (bar midpoints are 0.7, 1.9, 3.1)
x = jitter(rep(0,sum(d$X=='A')),2) + 0.7
points(x, d$Y[d$X=='A'], pch=20, cex = .5, col = 'gray')
x = jitter(rep(0,sum(d$X=='B')),2) + 1.9
points(x, d$Y[d$X=='B'], pch=20, cex = .5, col = 'gray')
x = jitter(rep(0,sum(d$X=='C')),2) + 3.1
points(x, d$Y[d$X=='C'], pch=20, cex = .5, col = 'gray')
errbar(x=c(0.7,1.9,3.1), y=p$Predicted, p$CIHi, p$CILo, pch=NA_integer_, lwd=3, cap= .05, add=TRUE )
lines(x=c(0.7, 3.1),y=c(70,70), lwd=2)
text(x=1.9, y=73,'***', cex=1.5)
lines(x=c(1.9,3.1),y=c(80,80), lwd=2)
text(x=2.5, y=83,'***', cex=1.5)
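As the number of groups grows, repeating the jitter and points calls per group becomes error-prone. One alternative is to loop over factor levels and take the x positions from the value barplot() returns. This is a minimal sketch under those assumptions, using base graphics and simulated data rather than the course's barplot2 code; the jitter amount of 0.05 is an arbitrary choice.

```r
pdf(NULL)  # off-screen device; replace with your usual device to see the figure

# Simulated three-group data (illustrative only).
set.seed(4)
d <- data.frame(X = factor(rep(c('A', 'B', 'C'), each = 50)),
                Y = c(rnorm(50, 41, 14), rnorm(50, 44, 16), rnorm(50, 55, 14)))
means <- sapply(split(d$Y, d$X), mean)

# Bar midpoints come back from barplot(); no hard-coded x positions.
mids <- as.numeric(barplot(means, ylim = c(0, 100), col = 'white', axes = FALSE))
axis(2, at = seq(0, 100, by = 25))

# One pass per group: jitter around that group's bar midpoint.
for (i in seq_along(levels(d$X))) {
  y <- d$Y[d$X == levels(d$X)[i]]
  x <- jitter(rep(mids[i], length(y)), amount = 0.05)
  points(x, y, pch = 20, cex = 0.5, col = 'gray')
}

print(mids)
invisible(dev.off())
```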
Figure Caption: Bars represent sample means of the dependent measure by group. Error bars (±1 standard error of the point estimates from the GLM) indicate the precision of the point estimates of the population group means. Raw scores on the dependent measure are presented by group as gray points. Horizontal lines indicate significant contrasts between group means (*** p < .001).
What other error bars might you have put on the graph instead of the standard error of the point estimates?
POC
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   46.767      1.190  39.293  < 2e-16 ***
XPOC1          3.781      2.915   1.297    0.197
XPOC2         12.927      2.525   5.120 9.43e-07 ***

Dummy (A as reference)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   40.567      2.062  19.678  < 2e-16 ***
XB_v_A         3.781      2.915   1.297    0.197
XC_v_A        14.817      2.915   5.082 1.12e-06 ***

Dummy (B as reference)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   44.348      2.062  21.512  < 2e-16 ***
XA_v_B        -3.781      2.915  -1.297 0.196752
XC_v_B        11.037      2.915   3.786 0.000223 ***

Dummy (C as reference)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   55.385      2.062  26.866  < 2e-16 ***
XA_v_C       -14.817      2.915  -5.082 1.12e-06 ***
XB_v_C       -11.037      2.915  -3.786 0.000223 ***
Three Group with Two Covariates
some(d)
    X        CQ CD        Y
007 A 100.35521  1 143.1154
029 A  98.86410  1 121.2442
057 B  98.99465  0 130.4698
070 B 109.54382  0 166.6895
132 C 105.21880  0 146.7784
tapply(d$Y, d$X, 'mean')
       A        B        C
124.3091 131.0999 143.5009
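The model for this design is not shown above. One common approach, sketched here as an assumption rather than as the course's method, is to fit the group factor together with both covariates and then predict each group's mean with the covariates held at their sample means. The data are simulated, with CQ as a quantitative covariate and CD as a 0/1 covariate to mirror the columns above.

```r
# Simulated data with one quantitative (CQ) and one dichotomous (CD) covariate.
set.seed(5)
n <- 50
d <- data.frame(X  = factor(rep(c('A', 'B', 'C'), each = n)),
                CQ = rnorm(3 * n, 100, 5),
                CD = rbinom(3 * n, 1, 0.5))
group_effect <- c(0, 7, 19)[as.integer(d$X)]
d$Y <- 25 + group_effect + 1 * d$CQ + 5 * d$CD + rnorm(3 * n, 0, 12)

# Assumed model: group plus both covariates, no interactions.
m <- lm(Y ~ X + CQ + CD, data = d)

# Covariate-adjusted group means: covariates fixed at their sample means.
p <- data.frame(X  = factor(levels(d$X), levels = levels(d$X)),
                CQ = mean(d$CQ),
                CD = mean(d$CD))
pr <- predict(m, newdata = p, se.fit = TRUE)
p$Predicted <- as.numeric(pr$fit)
p$se        <- as.numeric(pr$se.fit)
print(p)
```

Bars drawn from p$Predicted then show adjusted means rather than the raw means from tapply, which is a distinction the figure caption should state.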
One Quantitative IV
str(d)
'data.frame': 100 obs. of 2 variables:
 $ X: num 46.7 35.3 33 57 36.1 ...
 $ Y: num 32.1 19.8 26.7 63.1 53.9 ...
varDescribe(d,1)
    n  mean    sd   min   max
X 100 51.24  9.85 32.98 80.97
Y 100 45.90 15.82 11.13 93.24
m = lm(Y ~ X, data = d)
modelSummary(m)
Coefficients
            Estimate     SE t-statistic Pr(>|t|)
(Intercept)  17.3968 7.9405       2.191 0.030829 *
X             0.5563 0.1522       3.655 0.000416 ***
---
Sum of squared errors (SSE): 21795.3, Error df: 98
R-squared: 0.1200
p = data.frame(X = seq(33,80,.01))
p = modelPredictions(m,p)
plot(x=c(30,90),y=c(0,100), type='n', xlab = '', ylab = '', axes=FALSE, frame.plot=FALSE)
axis(1, lwd=3, at=seq(30,90, by=10), cex.axis=1)
mtext('Variable X', side=1, line=3, cex=1.5)
axis(2, lwd=3, at=seq(0,100, by=25), cex.axis=1)
mtext(expression(bold(paste('Variable Y (', mu, 'V)', sep=''))), side=2, line=2, cex=1.5)
points(d$X,d$Y, cex=.5)
#Draw new polygon shaded confidence bands with transparency. NOTE: Bands drawn before prediction lines in case of overlap
polygon(c(p$X, rev(p$X)), c(p$CILo, rev(p$Predicted)),col = (rgb(1, 0, 0,.25)), border = NA)
polygon(c(p$X, rev(p$X)), c(p$CIHi, rev(p$Predicted)),col = (rgb(1, 0, 0,.25)), border = NA)
#Draw confidence bands as lines instead of region. NOTE: Bands drawn before prediction lines in case of overlap
#lines(x=p$X,y=p$CILo, type='l', lty=1, col='gray', lwd=1)
#lines(x=p$X,y=p$CIHi, type='l', lty=1, col='gray', lwd=1)
lines(x=p$X,y=p$Predicted, type='l', lty=1, col='black', lwd=3)

Coefficients
            Estimate     SE t-statistic Pr(>|t|)
(Intercept)  17.3968 7.9405       2.191 0.030829 *
X             0.5563 0.1522       3.655 0.000416 ***
---
Sum of squared errors (SSE): 21795.3, Error df: 98
R-squared: 0.1200
points(d$X,d$Y, cex=.5)
polygon(c(p$X, rev(p$X)), c(p$CILo, rev(p$Predicted)), col = (rgb(1, 0, 0,.25)), border = NA)
polygon(c(p$X, rev(p$X)), c(p$CIHi, rev(p$Predicted)), col = (rgb(1, 0, 0,.25)), border = NA)
lines(p$X,p$Predicted, type='l', lty=1, col='black', lwd=3)
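The same envelope can be drawn with base R alone. The sketch below uses predict() with se.fit = TRUE and a single polygon running from the lower bound out and back along the upper bound. The band is ±1 standard error of the fitted line, following the earlier figures; whether the CILo and CIHi columns above use the same width is an assumption. The data are simulated, and pdf(NULL) only keeps the code from opening a window.

```r
pdf(NULL)  # off-screen device; replace with your usual device to see the figure

# Simulated data loosely resembling the slide's X and Y (illustrative only).
set.seed(6)
d <- data.frame(X = rnorm(100, 51, 10))
d$Y <- 17 + 0.55 * d$X + rnorm(100, 0, 15)
m <- lm(Y ~ X, data = d)

# Predictions over a grid spanning the observed X range.
p  <- data.frame(X = seq(min(d$X), max(d$X), length.out = 200))
pr <- predict(m, newdata = p, se.fit = TRUE)
p$Predicted <- as.numeric(pr$fit)
p$lo <- p$Predicted - as.numeric(pr$se.fit)   # lower envelope: fit - 1 SE
p$hi <- p$Predicted + as.numeric(pr$se.fit)   # upper envelope: fit + 1 SE

plot(d$X, d$Y, cex = 0.5, xlab = 'Variable X', ylab = 'Variable Y')
# One polygon: along the lower bound, then back along the upper bound.
polygon(c(p$X, rev(p$X)), c(p$lo, rev(p$hi)), col = rgb(1, 0, 0, 0.25), border = NA)
lines(p$X, p$Predicted, lwd = 3)
invisible(dev.off())
```

The envelope is narrowest near the mean of X and widens toward the extremes, which is the visual cue that the regression line is estimated least precisely where data are sparse.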