how to speak ggplot2 like a native
Harlan D. Harris, PhD (harlan@harris.name)
DC R Meetup, Predictive Analytics World, October 19th, 2010

ggplot's philosophy
• Graphics are (should be!) created by combining a specification with data. (Wilkinson, 2005)
• The specification is not the name of the visual form (bar graph, scatterplot, histogram).
• The specification is a collection of rules that together describe how to build a graph: a Grammar of Graphics.

graphics as grammar
[Diagram: data with columns date, ct, sz, z, combined with a specification (x=date, y=ct/sz, bars, group by z)]

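The specification in the diagram can be written down directly in ggplot2. This is a minimal sketch, assuming a made-up data frame `d` that borrows only the column names from the slide (date, ct, sz, z). Mapping `fill` to z so the groups are visible is an assumption, not something the slide states.

```r
library(ggplot2)

# Hypothetical data; only the column names (date, ct, sz, z) come from the slide.
d <- data.frame(
  date = rep(as.Date("2010-10-01") + 0:2, times = 2),
  ct   = c(5, 8, 6, 3, 4, 9),
  sz   = c(10, 10, 12, 10, 10, 12),
  z    = rep(c("a", "b"), each = 3)
)

# Specification: x = date, y = ct/sz, drawn as bars, grouped by z.
p <- ggplot(d, aes(x = date, y = ct / sz, group = z, fill = z)) +
  geom_bar(stat = "identity", position = "dodge")
```

Printing `p` draws the graph. Swapping in another data frame with the same columns reuses the specification unchanged.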
advantages
• Flexible
  • can define new graph types by changing specifications
  • can combine many forms into single graphs
• Smart
  • compact: rules have useful defaults
  • graphs always have meaning
• Reusable
  • can plug new data into old specification
  • can explore many types of plots from a set of data

ggplot2
• Hadley Wickham (Rice Univ.)
  • also: reshape2, plyr, etc.
• Extends & implements The Grammar of Graphics (Wilkinson, 1999, 2005)
• Focus on layers; based on grid
• Specification as R objects constructed by functions
• Large library of components with good defaults
• ggplot2: Elegant Graphics for Data Analysis (Wickham, 2009)

my gripes
• Specification is a hierarchical structure; grammar is a left-to-right R expression; graph is spatial
• Can't see the structure (usefully)
• Abuses both notation and R semantics
• Deep Magic with lazy evaluation, proto objects
• Existing tutorials lead to conceptual confusion and require relearning of fundamentals
• Start with the structure, not with the shortcuts

data to plot

ggplot likes “long” data

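As an illustration of what "long" means here, the sketch below reshapes a made-up wide table (one column per condition) into one row per observation. The column names Parameter, Errors, and Condition follow the talk's later code; the values and the condition names A and B are hypothetical. Base R's `reshape()` is used to keep the example dependency-free; `melt()` from reshape2 does the same job.

```r
# Hypothetical wide data: one Errors column per condition.
wide <- data.frame(
  Parameter = 1:3,
  A = c(10, 8, 6),
  B = c(12, 9, 7)
)

# Long form: one row per (Parameter, Condition) pair, values in Errors.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("A", "B"),   # columns to stack
  v.names   = "Errors",      # name of the stacked value column
  timevar   = "Condition",   # name of the column recording the source column
  times     = c("A", "B"),
  idvar     = "Parameter"
)
```

In the long form each aesthetic (x, y, color) can be mapped to exactly one column.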
will plot model vs. empirical

simplest plot
• aes = "aesthetics" = "create mapping"

structure (you don't need to know this!)
[Diagram: a ggplot object with slots data, layers, mapping, scales, coords, facets, options. The plot mapping is x=Parameter, y=Errors, color=Condition. layer[1] has slots data, mapping (copy), geom (line), stat (identity), geom_params, stat_params.]
ggplot(data=d.long.EI, mapping=aes(x=Parameter, y=Errors, color=Condition)) + layer(geom="line")
• structure(p), str(p) Ø

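For readers trying this on a current ggplot2 release: `layer()` now requires the stat and position to be given explicitly, so the one-argument call from the slide stops with an error. The sketch below spells them out, using a hypothetical stand-in for `d.long.EI` (only its column names come from the talk).

```r
library(ggplot2)

# Hypothetical stand-in for the talk's d.long.EI data frame.
d.long.EI <- data.frame(
  Parameter = rep(1:3, times = 2),
  Errors    = c(10, 8, 6, 12, 9, 7),
  Condition = rep(c("A", "B"), each = 3)
)

# Explicit form: the core object plus one fully specified layer.
p <- ggplot(data = d.long.EI,
            mapping = aes(x = Parameter, y = Errors, color = Condition)) +
  layer(geom = "line", stat = "identity", position = "identity")

# The same plot through the geom shortcut, which fills in stat and position.
p_short <- ggplot(d.long.EI, aes(x = Parameter, y = Errors, color = Condition)) +
  geom_line()
```

`layer_data(p, 1)` returns the data frame the layer actually draws, which is often a more readable view than `str(p)`.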
add empirical data and chance

structure so far (you don't need to know this!)
[Diagram: the ggplot object (data, layers, mapping, scales, coords, facets, options; mapping x=Parameter, y=Errors, color=Condition) now with four layers, each with slots data, mapping, geom, stat, geom_params, stat_params. Labels recoverable from the figure, in order: (copy); line, identity; (U); point, identity, size=3; (K); hline, hline, yint=Errs, size=2; color="black", linetype=2, size=.5; hline, hline, yint=[64].]

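A rough sketch of how layers stack up on one core object, in the spirit of this slide. It is not a reconstruction of the talk's plot: both data frames are made up, and only the chance level of 64, the point size of 3, and the dashed black line are taken from the labels in the diagram.

```r
library(ggplot2)

# Hypothetical model predictions (long format) and empirical observations.
model <- data.frame(
  Parameter = rep(1:3, times = 2),
  Errors    = c(70, 55, 40, 80, 60, 50),
  Condition = rep(c("A", "B"), each = 3)
)
empirical <- data.frame(
  Parameter = c(2, 2),
  Errors    = c(52, 63),
  Condition = c("A", "B")
)

p <- ggplot(model, aes(x = Parameter, y = Errors, color = Condition)) +
  geom_line() +                                    # layer 1: model curves
  geom_point(data = empirical, size = 3) +         # layer 2: own data, inherited mapping
  geom_hline(yintercept = 64, color = "black",     # layer 3: chance level
             linetype = 2)
```

Each layer can bring its own data while inheriting the plot-level mapping, which is why the empirical points pick up the same x, y, and color mapping as the lines.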
scales

coordinates & scales
• coordinates affect display of axes
  • cartesian, polar, map, etc.
• scales affect data mapping
  • colors, shapes, lines
• source of confusion
  • set axis ticks/breaks and labels with scale_x_continuous() or scale_y_discrete(), but
  • restrict DATA range with scale_*(limits=c(1,10))
  • restrict AXIS (plotted) range with coord_cartesian(xlim=c(1,10))

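The difference between the two ways of restricting a range shows up in the built plot data. A small sketch with made-up data: scale limits turn out-of-range values into missing values before anything is drawn, while `coord_cartesian()` keeps every row and only zooms the view.

```r
library(ggplot2)

# Hypothetical data spanning x = 1..20.
d <- data.frame(x = 1:20, y = (1:20)^2)
base <- ggplot(d, aes(x, y)) + geom_point()

# Restrict the DATA range: x values outside [1, 10] are censored.
clipped <- base + scale_x_continuous(limits = c(1, 10))

# Restrict the AXIS range: all data are kept, only the window changes.
zoomed <- base + coord_cartesian(xlim = c(1, 10))

# Count the usable x values each plot actually works with.
n_clipped <- sum(!is.na(layer_data(clipped, 1)$x))
n_zoomed  <- sum(!is.na(layer_data(zoomed, 1)$x))
```

This matters most with stats such as smoothers or summaries, which are computed only from the data that survive the scale limits.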
options

shortcuts
• All those layer() calls are tedious!
• geom_*() creates a layer with a specific geom (and various defaults, including a stat)
• stat_*() creates a layer with a specific stat (and various defaults, including a geom)
• qplot() creates a ggplot and a layer

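To make the equivalence concrete, here is a sketch that builds the same point layer three ways on made-up data: through `layer()`, through a geom shortcut, and through a stat shortcut. `qplot()` is left out because recent ggplot2 releases mark it as deprecated.

```r
library(ggplot2)

# Hypothetical data.
d <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))

# Long form: geom, stat, and position all spelled out.
long_form <- ggplot(d, aes(x, y)) +
  layer(geom = "point", stat = "identity", position = "identity")

# geom shortcut: fixes the geom, defaults the stat to "identity".
via_geom <- ggplot(d, aes(x, y)) + geom_point()

# stat shortcut: fixes the stat, with the geom chosen here as "point".
via_stat <- ggplot(d, aes(x, y)) + stat_identity(geom = "point")
```

All three produce the same layer; the shortcuts differ only in which half of the geom/stat pair is named and which is left to a default.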
quick note on stats
• stat="identity"
• stat="lm"
  • fit y=f(x) with lm(), generate new data to be plotted by geom_line(), CIs with geom_ribbon()
• stat="smooth"
  • fit y=f(x) with loess()
• stat="summary"
  • y=f(x) with arbitrary f()
• stat="bin"
  • histograms

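The sketch below shows what a few of these stats hand to their geoms, using `layer_data()` to inspect the computed values. Two things are assumptions about current ggplot2 rather than part of the talk: there is no stat literally named "lm", so the linear fit goes through `stat_smooth(method = "lm")`, and `stat_summary()` takes its function through the `fun` argument. The small data frame is made up; `mtcars` ships with R.

```r
library(ggplot2)

# Hypothetical, roughly linear data.
d <- data.frame(
  x = 1:10,
  y = c(3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1, 19.2, 20.9)
)

# Linear fit: the stat generates new x/y rows plus a confidence band.
p_lm <- ggplot(d, aes(x, y)) + stat_smooth(method = "lm", formula = y ~ x)
fit  <- layer_data(p_lm, 1)   # columns include x, y, ymin, ymax

# Summary: collapse y to one value per x with an arbitrary function.
p_sum <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  stat_summary(fun = mean, geom = "point")
means <- layer_data(p_sum, 1)

# Binning: count observations per bin, as a histogram does.
p_bin <- ggplot(mtcars, aes(mpg)) + stat_bin(bins = 5)
bins  <- layer_data(p_bin, 1)
```

In each case the geom never sees the raw data, only the rows the stat computed.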
simplest faceted plot

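The slide's own plot is not recoverable from the text, so the following is only a generic sketch of faceting: one panel per level of a variable, here with made-up data that reuses the column names Parameter, Errors, and Condition.

```r
library(ggplot2)

# Hypothetical long-format data.
d <- data.frame(
  Parameter = rep(1:3, times = 2),
  Errors    = c(70, 55, 40, 80, 60, 50),
  Condition = rep(c("A", "B"), each = 3)
)

# One panel per Condition; the rest of the specification is unchanged.
p <- ggplot(d, aes(x = Parameter, y = Errors)) +
  geom_line() +
  facet_wrap(~ Condition)
```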
everything else (+alpha)

other things I find useful
• scale_x_continuous(breaks=seq(1,9,2), labels=c("one", "", "five", "", "nine"))
• geom_text(aes(x=.., y=.., label=..))
• annotate(geom="text", x=14, y=19, label="outlier!")
• geom_density()
• stat_summary(fun.data="mean_cl_boot", geom="crossbar")
• geom_jitter(position=position_jitter(width=.5))

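Two of these in a runnable form, as a sketch on made-up data: custom axis breaks with hand-written labels, and a one-off text annotation. The annotation text is passed by name as `label`, since an unnamed extra argument to `annotate()` would be matched positionally to a different parameter.

```r
library(ggplot2)

# Hypothetical data with one unusually high value at x = 5.
d <- data.frame(x = 1:9, y = c(3, 5, 4, 6, 19, 7, 6, 8, 7))

p <- ggplot(d, aes(x, y)) +
  geom_point() +
  # Ticks at 1, 3, 5, 7, 9; only three of them get visible labels.
  scale_x_continuous(breaks = seq(1, 9, 2),
                     labels = c("one", "", "five", "", "nine")) +
  # A single label placed in data coordinates, nudged right of the point.
  annotate(geom = "text", x = 5, y = 19, label = "outlier!", hjust = -0.2)
```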
“fizzy bubbly” plot
• rated.movies <- subset(movies, mpaa!="")
• rated.movies$mpaa <- factor(rated.movies$mpaa)
• p <- ggplot(rated.movies, aes(mpaa, rating)) + geom_jitter(alpha=.5) + stat_summary(fun.data="mean_sdl", geom="crossbar", color="red", size=1)
• ggsave("movies.png", p, dpi=150)

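A self-contained variant for current installations, offered as a sketch rather than the talk's exact plot. The `movies` data set no longer ships with ggplot2 (it moved to the ggplot2movies package) and `mean_sdl` needs Hmisc, so this version swaps in simulated ratings and a hand-written summary function, `mean_sd2`, that returns the mean plus or minus two standard deviations. Both the data and `mean_sd2` are hypothetical stand-ins.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for rated.movies: an mpaa factor and a numeric rating.
rated <- data.frame(
  mpaa   = factor(rep(c("PG", "PG-13", "R"), each = 50)),
  rating = c(rnorm(50, 6, 1), rnorm(50, 5.5, 1.2), rnorm(50, 5, 1.5))
)

# Stand-in for mean_sdl: mean with a band of two standard deviations.
mean_sd2 <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - 2 * sd(x), ymax = mean(x) + 2 * sd(x))
}

p <- ggplot(rated, aes(mpaa, rating)) +
  geom_jitter(alpha = .5) +                       # semi-transparent raw points
  stat_summary(fun.data = mean_sd2,               # one crossbar per rating class
               geom = "crossbar", color = "red")
```

The `ggsave("movies.png", p, dpi=150)` call from the slide applies to this `p` unchanged.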
takehomes
• a ggplot graph is generated by a specification + data
• ggplot specifications are a core object plus layers
• mappings among data, x/y, scales, and other attributes are fundamental
• geom and stat shortcuts allow smart/compact construction of graphs
• ggplot encourages good graphs, with facets, good use of color, minimal chartjunk

resources
• Wickham, H. (2009) ggplot2: Elegant Graphics for Data Analysis. Springer.
• http://had.co.nz/ggplot2/
• http://groups.google.com/group/ggplot2
• http://stackoverflow.com/questions/tagged/r
• http://github.com/hadley/ggplot2/wiki

thanks!