Bubble Plots as a Model-Free Graphical Tool for Three Continuous Variables and a Flexible R Function to Plot Them
Keith A. Markus and Wen Gu
John Jay College of Criminal Justice, CUNY
Overview • Goal: Model-free graphs for 3 continuous variables. • Some alternative graphs & design issues. • The R function: bp3way(). • An empirical study. • Tentative conclusions & future directions.
The Goal • The goal is to provide a useful graphical representation of the association between 3 continuous variables. • Often: 2 IVs and 1 DV. • Model free: • Exploratory data analysis. • Not a summary of a statistical model.
Why Model Free? • If the statistical model is correct: model based graphs can be very efficient. • If the statistical model is incorrect: model based graphs can be very misleading. • E.g., Multiple y~x regression lines for values of z. Misleading if... • y~x relationship is not linear. • Variance in y varies with x or z. • Regression lines extrapolate beyond data.
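The first failure mode above (a y~x relationship that is not linear) can be illustrated with a small simulation. This is only a sketch: the data are simulated for illustration and the quadratic form is an assumption, not something taken from the authors' materials.

```r
# Simulated data (an assumption for illustration): y depends on x quadratically.
set.seed(1)
x <- runif(100, -2, 2)
y <- x^2 + rnorm(100, sd = 0.3)

# A straight-line model versus a model that allows curvature.
fit_linear <- lm(y ~ x)
fit_quad   <- lm(y ~ x + I(x^2))

r2_linear <- summary(fit_linear)$r.squared
r2_quad   <- summary(fit_quad)$r.squared

# Plot the raw data with the straight regression line on top of it.
plot(x, y, main = "Linear fit to a nonlinear relationship")
abline(fit_linear, lty = 2)
```

A graph that showed only the dashed line would hide the curvature that the raw points make obvious, which is the argument for plotting the data rather than the model.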
Some Non-Options • Scatterplot matrix. • y~x regression lines for fixed z values. • Factorial design type line plots. • All good plots for other applications. • But not good plots for present purpose.
Scatterplot matrix • Does not attempt to represent 3-way distributions. • Same data used for all graphs (N = 100)
y~x regression lines for fixed values of z: • Model dependent: plots model not data. • Not clear where data leaves off.
Factorial-design type plots for categorized IVs: • Model dependent (interpolation). • Arbitrary cuts (quartiles plotted here). • Loss of information through categorization.
Some Options • 3D Scatterplots. • R Package scatterplot3d: scatterplot3d() • Co-plots. • R base installation: coplot() • 3-way Bubbleplots. • Available from authors: bp3way()
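The two publicly available options can be tried roughly as follows. The data frame `d` and its column names are simulated stand-ins, not the authors' data; bp3way() is omitted because it is distributed by the authors rather than through a package.

```r
# Simulated data for illustration only (not the authors' data set).
set.seed(2)
d <- data.frame(x = rnorm(100), z = rnorm(100))
d$y <- d$x * d$z + rnorm(100, sd = 0.5)

# Co-plot from base R: y against x, conditioned on four bands of z.
coplot(y ~ x | z, data = d, number = 4)

# 3D scatterplot, drawn only if the scatterplot3d package is installed.
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  scatterplot3d::scatterplot3d(d$x, d$z, d$y,
                               xlab = "x", ylab = "z", zlab = "y")
}
```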
3D scatterplot: • Natural extension of 2D scatter plot. • Relies on 3D illusion: some ambiguity.
Co-plot • Well suited to perceptual process. • Relies on banding of z values.
3-Way Bubble Plot • 2D representation of 3D data. • People tend to underestimate area. • No literature.
Some Design Features of the 3-Way Bubble Plot • Grid designed to make it easier to compare circle sizes across the plot surface. • Shading designed to accentuate bubbles. • Limited number of cases plotted avoids overly dense plots (in this case all 100 are plotted). • Margins avoid bubbles extending outside plot region.
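The design features listed above (grid, shading, margins) can be approximated in base R with symbols(). This is a rough sketch and not the bp3way() implementation: the radius scaling (`r_min`, `r_max`) and the `pad()` helper are hypothetical choices made for illustration, and the data are simulated.

```r
# Simulated data for illustration only (not the authors' data set).
set.seed(3)
d <- data.frame(x = rnorm(100), b = rnorm(100), y = rnorm(100))

# Map the bubble variable to a radius between r_min and r_max (hypothetical
# scaling; the slides do not show how bp3way() applies rad.ex and rad.min).
r_min <- 0.03
r_max <- 0.15
b01 <- (d$b - min(d$b)) / (max(d$b) - min(d$b))
radius <- r_min + b01 * (r_max - r_min)

# Pad the axis limits so bubbles near the edge stay inside the plot region.
pad <- function(v, margin = 0.1) range(v) + c(-1, 1) * margin * diff(range(v))

plot(d$x, d$y, type = "n", xlim = pad(d$x), ylim = pad(d$y),
     xlab = "X", ylab = "Y")
grid(col = "grey70")  # reference grid for comparing bubble sizes
symbols(d$x, d$y, circles = radius, inches = FALSE,
        fg = "black", bg = "grey90", add = TRUE)
```

With inches = FALSE the radii are in x-axis units, so the bubble sizes stay tied to the plotted scale instead of being rescaled by the device.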
bp3way() function
Usage
bp3way(x)
bp3way(x, xc=1, bc=2, yc=3, proportion=1, random=TRUE, x.margin=.1, y.margin=.1, rad.ex=1, rad.min=NULL, names=c('X', 'B', 'Y'), std=FALSE, fg='black', bg='grey90', tacit=TRUE, ...)
Data Parameters x is a data frame with at least 1 column. xc, yc, and bc identify the columns used to plot the x axis, y axis, and bubbles respectively. names is a vector of variable names used in the plot. • Easy to switch variables without changing the data. • User can use same column more than once. • Out of bounds values return an error.
Parameters with data sensitive defaults: • rad.ex: Radius expansion rate. • rad.min: Minimum bubble radius. • proportion: % of data plotted. • margins and grid. • Other user-specified options include: • Plotting a random sample or first % of cases. • Standardization of X and Y variables. • labels and colors.
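The case-selection and column-checking behavior described on these slides could look something like the sketch below. Both helpers (`select_cases()` and `check_columns()`) are hypothetical names written for illustration. They are not part of bp3way(), whose internals are not shown in the slides.

```r
# Hypothetical helper (not part of bp3way()): keep a proportion of the rows,
# either as a random sample or as the first rows of the data frame.
select_cases <- function(data, proportion = 1, random = TRUE) {
  stopifnot(is.data.frame(data), proportion > 0, proportion <= 1)
  n_keep <- max(1, floor(nrow(data) * proportion))
  rows <- if (random) sort(sample(nrow(data), n_keep)) else seq_len(n_keep)
  data[rows, , drop = FALSE]
}

# Hypothetical helper: stop early when a column index is out of bounds.
check_columns <- function(data, xc = 1, bc = 2, yc = 3) {
  cols <- c(xc, bc, yc)
  if (any(cols < 1 | cols > ncol(data))) stop("column index out of bounds")
  invisible(TRUE)
}

# Usage on a toy data frame: first 25% of cases, then a random 50%.
toy <- data.frame(a = 1:100, b = 1:100, c = 1:100)
first_quarter <- select_cases(toy, proportion = 0.25, random = FALSE)
random_half   <- select_cases(toy, proportion = 0.5)
check_columns(toy, xc = 1, bc = 1, yc = 3)  # the same column may be reused
```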
Empirical Study • 3 Plots (Bubbleplot, 3D Scatterplot, Coplot). • Between subjects. • Within group n = 36. • 6 Data sets. • Within subjects. • N of subjects = 108. • N of observations = 108 x 6 = 648.
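The design arithmetic above can be checked by laying out the design as a table. The graph labels and column names here are illustrative placeholders, not the study's variable names.

```r
# Between-subjects factor: 3 graph types with 36 participants each.
subjects <- data.frame(
  subject = 1:108,
  graph   = rep(c("bubble", "scatter3d", "coplot"), each = 36)
)

# Within-subjects factor: every participant sees all 6 data sets.
# merge() with no shared column names returns the Cartesian product.
design <- merge(subjects, data.frame(dataset = 1:6))

nrow(design)           # number of observations
table(subjects$graph)  # participants per graph type
```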
Four DVs • Accuracy of interpretation of graphs • 0-3 questions answered correctly. • Confidence in interpretation • 1-5, the average of three 1-5 Likert scale items. • Perceived clarity • 1-5 Likert scale item. • Perceived ease of use • 1-5 Likert scale item.
Univariate Summary • No floor or ceiling effects, variability in DVs.
Correlations Between Outcomes • Above Diagonal: N = 648 observations. • Below Diagonal: N = 108 participants.
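Multivariate model fit first
$$y^* = \alpha_0 + \alpha_1' \mathrm{Data} + \alpha_2' (\mathrm{Data} \cdot \mathrm{Graph}) + u_1 \quad \text{(Level 1)}$$
$$\alpha_0 = \beta_0 + \beta_1' \mathrm{Graph} + u_2 \quad \text{(Level 2)}$$
$$y = \begin{cases} 0 & \text{if } y^* \le \tau_1 \\ 1 & \text{if } \tau_1 < y^* \le \tau_2 \\ \vdots & \\ k & \text{if } \tau_{k-1} < y^* \le \tau_k \end{cases} \quad \text{(Threshold model)}$$
• Third equation not used for confidence DV. • Full model: Mplus • Confidence also fit in R using lme() function. • Nearly identical estimates with R or Mplus. • Story in interactions, not main effects.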
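For the confidence DV, where the threshold equation is not used, the two-level model can be sketched with lme() from the nlme package. The data below are simulated purely so the call runs and are not the study's results. The variable names and the exact model specification are assumptions, since the slides do not show the authors' code.

```r
library(nlme)

# Simulated data for illustration only; these are NOT the study's results.
set.seed(4)
subjects <- data.frame(
  subject = factor(1:108),
  graph   = factor(rep(c("bubble", "scatter3d", "coplot"), each = 36))
)
d <- merge(subjects, data.frame(dataset = factor(1:6)))  # 108 x 6 rows
u2 <- rnorm(108, sd = 0.5)                               # subject intercepts
d$confidence <- 3 + u2[as.integer(d$subject)] + rnorm(nrow(d), sd = 0.7)

# Random intercept per participant (Level 2); fixed effects for graph,
# data set, and the data-by-graph interaction.
fit <- lme(confidence ~ graph * dataset, random = ~ 1 | subject, data = d)
anova(fit)
```

The random intercept plays the role of u2 in the Level 2 equation, and the residual plays the role of u1 at Level 1.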
Follow-up: Simple Effects • Shift focus to simple effects because we cannot usefully interpret interactions. • Protected Wilcoxon-Mann-Whitney exact tests used for Accuracy, Clarity, and Ease of Use DVs. • Protected t tests used for Confidence DV. • No one graph consistently better. • Mostly a story about accuracy.
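A single pairwise follow-up within one data set might be run as sketched here. The data are simulated and the variable names are placeholders. The sketch uses base R's wilcox.test() with the normal approximation (exact = FALSE) because the simulated 0-3 scores are heavily tied. That is a substitution for the exact tests reported on the slide, which could be run with, for example, the coin package's wilcox_test() using distribution = "exact". The protection step (running follow-ups only after a significant omnibus test) is not shown.

```r
# Simulated scores for one data set, illustration only (not the study's data).
set.seed(5)
one_set <- data.frame(
  graph      = factor(rep(c("bubble", "scatter3d", "coplot"), each = 36)),
  confidence = rnorm(108, mean = 3, sd = 0.8),
  accuracy   = sample(0:3, 108, replace = TRUE)
)

# Compare two graph types at a time.
pair <- droplevels(subset(one_set, graph %in% c("bubble", "coplot")))

# Confidence DV: two-sample t test.
t_conf <- t.test(confidence ~ graph, data = pair)

# Accuracy DV: rank-based Wilcoxon-Mann-Whitney test, normal approximation.
w_acc <- wilcox.test(accuracy ~ graph, data = pair, exact = FALSE)
```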
Tentative Conclusions • Much remains to be learned about the cognition of these 3 graph types. • Coplot may have a slight edge over the other two. • But optimal plot seems data dependent. • Study included a limited range of data and graph conditions. • More detailed perceptual theory is needed to optimize graph design.
Recommendation for exploratory analysis: • Use 2 or more graph types. • Cannot predict ahead of time which will work best. • Probably useful to look at data more than one way even if one graph were consistently best.
Recommendation for reporting results: • Use model based graphs. • If you understand your data well enough to fit a good model. • If not, try different model-free graphs and see which seems to work best.
Future Directions • Identify factors that impact which graph works best. • Identify design factors that maximize effectiveness of all 3 graph types. • Increase statistical power: • Identify individual difference covariates that account for within condition variance. • More sensitive outcome measures.