Bubble Plots as a Model-Free Graphical Tool for Three Continuous Variables and a Flexible R Function to Plot Them. Keith A. Markus and Wen Gu, John Jay College of Criminal Justice, CUNY.


  1. Bubble Plots as a Model-Free Graphical Tool for Three Continuous Variables and a Flexible R Function to Plot Them • Keith A. Markus and Wen Gu • John Jay College of Criminal Justice, CUNY

  2. Overview • Goal: Model-free graphs for 3 continuous variables. • Some alternative graphs & design issues. • The R function: bp3way(). • An empirical study. • Tentative conclusions & future directions.

  3. The Goal • The goal is to provide a useful graphical representation of the association between 3 continuous variables. • Often: 2 IVs and 1 DV. • Model free: • Exploratory data analysis. • Not a summary of a statistical model.

  4. Why Model Free? • If the statistical model is correct: model-based graphs can be very efficient. • If the statistical model is incorrect: model-based graphs can be very misleading. • E.g., multiple y~x regression lines for values of z. Misleading if... • y~x relationship is not linear. • Variance in y varies with x or z. • Regression lines extrapolate beyond data.
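The first "misleading if" case can be illustrated with a small simulation. This is only a sketch on invented data (the curve, noise level, and variable names are assumptions, not the authors' example): y depends strongly on x, but through a curve, so the fitted straight line is nearly flat.

```r
# Simulated data (an assumption, not the authors' data): y depends on x
# through a curve, so a straight regression line summarizes it poorly.
set.seed(1)
x <- seq(-2, 2, length.out = 100)
y <- x^2 + rnorm(100, sd = 0.3)

linear_fit <- lm(y ~ x)
slope <- coef(linear_fit)[["x"]]  # near zero despite a strong y~x relationship

# Plot the raw data with the fitted line on top for comparison.
plot(x, y, main = "Linear fit to a curved relationship")
abline(linear_fit, lty = 2)
```

A reader who saw only the dashed line would conclude that x and y are unrelated, which is the kind of error a model-free plot of the data helps avoid.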

  5. Some Non-Options • Scatterplot matrix. • y~x regression lines for fixed z values. • Factorial design type line plots. • All good plots for other applications. • But not good plots for present purpose.

  6. Scatterplot matrix • Does not attempt to represent 3-way distributions. • Same data used for all graphs (N = 100)

  7. y~x regression lines for fixed values of z: • Model dependent: plots model not data. • Not clear where data leaves off.

  8. Factorial-design type plots for categorized IVs: • Model dependent (interpolation). • Arbitrary cuts (quartiles plotted here). • Loss of information through categorization.
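A rough sketch of what the quartile categorization does to the data, using base R only. The simulated variables and the helper name `quartile_cut` are assumptions for illustration, not part of the presentation.

```r
# Simulated stand-in data: two continuous IVs (x, z) and one DV (y).
set.seed(2)
d <- data.frame(x = rnorm(100), z = rnorm(100))
d$y <- d$x * d$z + rnorm(100, sd = 0.5)

# Hypothetical helper: cut a continuous variable at its quartiles.
quartile_cut <- function(v) {
  cut(v, breaks = quantile(v, probs = seq(0, 1, by = 0.25)),
      include.lowest = TRUE, labels = paste0("Q", 1:4))
}
d$x_q <- quartile_cut(d$x)
d$z_q <- quartile_cut(d$z)

# 100 distinct values per IV collapse to 4 levels; only cell means are plotted.
cell_means <- tapply(d$y, list(d$x_q, d$z_q), mean)
interaction.plot(d$x_q, d$z_q, d$y,
                 xlab = "x quartile", ylab = "mean y", trace.label = "z quartile")
```

The lines drawn between quartile means are interpolation, and a different choice of cut points would give a different picture of the same data.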

  9. Some Options • 3D Scatterplots. • R package scatterplot3d: scatterplot3d() • Co-plots. • R base installation: coplot() • 3-way Bubble plots. • Available from authors: bp3way()

  10. 3D scatterplot: • Natural extension of 2D scatter plot. • Relies on 3D illusion: some ambiguity.

  11. Co-plot • Well suited to perceptual process. • Relies on banding of z values.
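A minimal co-plot sketch using base R's `coplot()`. The data are simulated, and the choice of four bands with 25% overlap is an assumption made for this illustration. `co.intervals()` shows the z bands that the panels are conditioned on.

```r
# Simulated stand-in data: the y~x slope changes with z.
set.seed(42)
d <- data.frame(x = rnorm(100), z = rnorm(100))
d$y <- d$x * d$z + rnorm(100, sd = 0.5)

# One y~x scatterplot per band of z.
coplot(y ~ x | z, data = d, number = 4, overlap = 0.25)

# The same banding, computed explicitly: one row per band (lower, upper).
bands <- co.intervals(d$z, number = 4, overlap = 0.25)
bands
```

Changing `number` or `overlap` changes which cases share a panel, which is the dependence on banding noted on the slide.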

  12. 3-Way Bubble Plot • 2D representation of 3D data. • People tend to underestimate area. • No literature.

  13. Some Design Features of the 3-Way Bubble Plot • Grid designed to make it easier to compare circle sizes across the plot surface. • Shading designed to accentuate bubbles. • Limited number of cases plotted avoids overly dense plots (in this case all 100 are plotted). • Margins avoid bubbles extending outside plot region.
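The design features listed above can be approximated with base R's `symbols()`. This is not the authors' `bp3way()` code; it is a sketch under stated assumptions: simulated data, a linear mapping from the bubble variable to radius, and a 10% margin on each axis.

```r
# Simulated stand-in data: x and y give position, b gives bubble size.
set.seed(7)
d <- data.frame(x = rnorm(100), b = rnorm(100), y = rnorm(100))

# Assumed mapping: rescale b to [0, 1], then to a radius with a floor,
# so the smallest bubble is still visible.
b01 <- (d$b - min(d$b)) / diff(range(d$b))
radius <- 0.05 + 0.15 * b01

# Margins: widen each axis by 10% of its range to reduce clipped bubbles.
xlim <- range(d$x) + c(-1, 1) * 0.1 * diff(range(d$x))
ylim <- range(d$y) + c(-1, 1) * 0.1 * diff(range(d$y))

plot(d$x, d$y, type = "n", xlim = xlim, ylim = ylim, xlab = "X", ylab = "Y")
grid()  # reference grid for comparing circle sizes across the plot
symbols(d$x, d$y, circles = radius, inches = FALSE, add = TRUE,
        fg = "black", bg = "grey90")  # light fill accentuates the bubbles
```

Because viewers tend to underestimate area, whether size should scale with radius or with area is a design decision; the linear-radius choice here is only one option.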

  14. bp3way() function • Usage:
      bp3way(x)
      bp3way(x, xc=1, bc=2, yc=3, proportion=1, random=TRUE, x.margin=.1, y.margin=.1, rad.ex=1, rad.min=NULL, names=c('X', 'B', 'Y'), std=FALSE, fg='black', bg='grey90', tacit=TRUE, ...)

  15. Data Parameters • x is a data frame with at least 1 column. • xc, yc, and bc identify the columns used to plot the x axis, y axis, and bubbles, respectively. • names is a vector of variable names used in the plot. • Easy to switch variables without changing the data. • User can use the same column more than once. • Out-of-bounds values return an error.
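How column selection of this kind might behave can be sketched as follows. The helper `pick_columns()` and its `labels` argument are hypothetical; they only mirror the behavior described on the slide and are not taken from the `bp3way()` source.

```r
# Hypothetical helper (not the bp3way() internals): select the x-axis,
# bubble, and y-axis columns by index and label them for plotting.
pick_columns <- function(x, xc = 1, bc = 2, yc = 3, labels = c("X", "B", "Y")) {
  stopifnot(is.data.frame(x), length(labels) == 3)
  cols <- c(xc, bc, yc)
  if (any(cols < 1 | cols > ncol(x))) {
    stop("column index out of bounds: data frame has ", ncol(x), " column(s)")
  }
  setNames(data.frame(x[[xc]], x[[bc]], x[[yc]]), labels)
}

# Switching variables only changes the indices, not the data.
d <- data.frame(a = 1:3, b = c(2.5, 1.0, 4.0), c = c(9, 8, 7))
pick_columns(d)                          # a on X, b as bubbles, c on Y
pick_columns(d, xc = 3, bc = 2, yc = 1)  # axes swapped

# The same column may be used more than once, so one column is enough.
one_col <- data.frame(v = c(1, 2, 3))
pick_columns(one_col, xc = 1, bc = 1, yc = 1)
```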

  16. Data-sensitive Defaults Help Avoid Bad Plots

  17. Parameters with data-sensitive defaults: • rad.ex: Radius expansion rate. • rad.min: Minimum bubble radius. • proportion: Proportion of data plotted. • Margins and grid. • Other user-specified options include: • Plotting a random sample or first % of cases. • Standardization of X and Y variables. • Labels and colors.
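Two of the user-specified options, plotting a subset of cases and standardizing X and Y, might look like the sketch below. The helper `select_cases()` is hypothetical and the rounding rule is an assumption; the actual `bp3way()` behavior may differ.

```r
# Hypothetical helper: keep a proportion of the cases, either a random
# sample or the first rows of the data frame.
select_cases <- function(x, proportion = 1, random = TRUE) {
  stopifnot(proportion > 0, proportion <= 1)
  n_keep <- max(1, round(proportion * nrow(x)))
  rows <- if (random) sort(sample.int(nrow(x), n_keep)) else seq_len(n_keep)
  x[rows, , drop = FALSE]
}

set.seed(5)
d <- data.frame(X = rnorm(100, 50, 10), B = rnorm(100), Y = rnorm(100, 5, 2))

first_half  <- select_cases(d, proportion = 0.5, random = FALSE)  # rows 1..50
random_half <- select_cases(d, proportion = 0.5, random = TRUE)   # 50 sampled rows

# Standardize X and Y (mean 0, SD 1); the bubble variable is left as is.
d_std <- d
d_std[c("X", "Y")] <- lapply(d[c("X", "Y")], function(v) as.numeric(scale(v)))
```

Plotting fewer cases trades completeness for legibility when the full data would produce an overly dense plot.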

  18. Empirical Study • 3 Plots (Bubbleplot, 3D Scatterplot, Coplot). • Between subjects. • Within group n = 36. • 6 Data sets. • Within subjects. • N of subjects = 108. • N of observations = 108 x 6 = 648.
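The design counts can be checked by laying out the design as a data frame. The column names and the order in which participants are assigned to graph types are assumptions for illustration.

```r
# Graph type varies between subjects: 3 groups of 36 participants.
graph_types <- c("bubble plot", "3D scatterplot", "coplot")
participants <- data.frame(id = 1:108, graph = rep(graph_types, each = 36))

# Data set varies within subjects: every participant sees all 6 data sets.
# merge() with by = NULL forms the full cross of the two tables.
design <- merge(participants, data.frame(dataset = 1:6), by = NULL)

nrow(design)               # 108 participants x 6 data sets
table(participants$graph)  # group sizes
```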

  19. Four DVs • Accuracy of interpretation of graphs • 0-3 questions answered correctly. • Confidence in interpretation • 1-5, average of three 1-5 Likert-scale items. • Perceived clarity • One 1-5 Likert-scale item. • Perceived ease of use • One 1-5 Likert-scale item.

  20. Univariate Summary • No floor or ceiling effects, variability in DVs.

  21. Correlations Between Outcomes • Above Diagonal: N = 648 observations. • Below Diagonal: N = 108 participants.

  22. Multivariate model fit first • Level 1: $y^* = \alpha_0 + \alpha_1'\,\mathrm{Data} + \alpha_2'\,(\mathrm{Data} \cdot \mathrm{Graph}) + u_1$ • Level 2: $\alpha_0 = \beta_0 + \beta_1'\,\mathrm{Graph} + u_2$ • Threshold model: $y = 0$ if $y^* \le \tau_1$; $y = 1$ if $\tau_1 < y^* \le \tau_2$; ...; $y = k$ if $\tau_{k-1} < y^* \le \tau_k$ • Third equation not used for confidence DV. • Full model: Mplus • Confidence also fit in R using lme() function. • Nearly identical estimates with R or Mplus. • Story in interactions, not main effects.
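The threshold step, which turns a continuous latent response y* into an ordered category, can be sketched with base R's `findInterval()`. The threshold values and the simulated latent scores are assumptions; this illustrates only the mapping, not the fitted Mplus model. With three thresholds the categories run from 0 to 3, matching the range of the accuracy DV.

```r
# Hypothetical thresholds tau_1 < tau_2 < tau_3.
tau <- c(-1, 0, 1)

# left.open = TRUE gives intervals of the form tau_j < y* <= tau_{j+1},
# so y = 0 when y* <= tau_1 and y = 3 when y* > tau_3.
to_category <- function(y_star) findInterval(y_star, tau, left.open = TRUE)

# Boundary check: values equal to a threshold fall in the lower category.
check <- to_category(c(-1.5, -1, -0.5, 0, 0.5, 1, 1.5))
check  # 0 0 1 1 2 2 3

# Simulated latent scores for 648 observations, mapped to 0-3.
set.seed(11)
y <- to_category(rnorm(648))
table(y)
```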

  23. Follow-up: Simple Effects • Shift focus to simple effects because we cannot usefully interpret interactions. • Protected Wilcoxon-Mann-Whitney exact tests used for Accuracy, Clarity, and Ease of Use DVs. • Protected t tests used for Confidence DV. • No one graph consistently better. • Mostly a story about accuracy.
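A pairwise follow-up comparison of two graph types might look like the sketch below, on simulated scores (the data, group labels, and column names are assumptions). One caveat: base R's `wilcox.test()` does not compute exact p-values when there are ties, which 0-3 scores always produce, so this sketch uses the normal approximation. The slides do not say which software produced the exact tests.

```r
# Simulated scores for two of the three graph groups (36 participants each).
set.seed(3)
scores <- data.frame(
  graph      = factor(rep(c("bubble", "coplot"), each = 36)),
  correct    = sample(0:3, 72, replace = TRUE),  # ordinal accuracy DV
  confidence = round(runif(72, 1, 5), 2)         # averaged Likert DV
)

# Rank-based comparison for the ordinal DV (normal approximation, see above).
rank_test <- wilcox.test(correct ~ graph, data = scores, exact = FALSE)

# t test for the approximately continuous confidence DV.
t_result <- t.test(confidence ~ graph, data = scores)

c(wilcoxon_p = rank_test$p.value, t_p = t_result$p.value)
```

In a protected procedure, comparisons like these would be run only after the corresponding omnibus effect was significant.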

  24. Accuracy Results

  25. Accuracy Results

  26. Confidence Results

  27. Confidence Results

  28. Perceived Clarity Results

  29. Perceived Clarity Results

  30. Perceived Ease of Use Results

  31. Perceived Ease of Use Results

  32. Tentative Conclusions • Much remains to be learned about the cognition of these 3 graph types. • Coplot may have a slight edge over the other two. • But optimal plot seems data dependent. • Study included a limited range of data and graph conditions. • More detailed perceptual theory is needed to optimize graph design.

  33. Recommendation for exploratory analysis: • Use 2 or more graph types. • Cannot predict ahead of time which will work best. • Probably useful to look at data in more than one way even if one graph were consistently best.

  34. Recommendation for reporting results: • Use model-based graphs if you understand your data well enough to fit a good model. • If not, try different model-free graphs and see which seems to work best.

  35. Future Directions • Identify factors that impact which graph works best. • Identify design factors that maximize effectiveness of all 3 graph types. • Increase statistical power: • Identify individual difference covariates that account for within condition variance. • More sensitive outcome measures.
