Learn the basics of ggplot2, a powerful data visualization package in R. This session covers basic plotting techniques, saving plots, data preprocessing, and formatting options. Use ggplot2 to construct plots with different elements such as points, lines, and areas. Explore the logic and structure of ggplot2, and discover helpful resources for further exploration.
R topics 2 Second session Markus Bonsch and Peter Paul Pichler 12.03.2014
Topics of today You have the following libraries: ggplot2, reshape2, luplot, lusweave • Basic plotting with ggplot2 • Saving plots • Data preprocessing
ggplot2 • ggplot2 offers great plotting tools • The logic is fundamentally different from “normal” R plotting • It is structured in an additive way: You can construct your basic plot structure and then add different elements (points, lines, areas,…) • Formatting (nice colors, text formatting, …) can be done at any time by “adding” the formatting options
ggplot2 help • Important page: http://www.cookbook-r.com/Graphs/ • Google: searching for "ggplot [question]" is very helpful. It will often lead you to Stack Overflow, which is very good.
ggplot2 general procedure • Get your input data into the appropriate format (data frame) • Define the basic structure of the plot (aesthetics) • Add elements like lines, points, bars, … (geoms) • Format everything in a nice way (scales, theme, facets)
ggplot2 a complete example • Load prepared input data:
load("dat1.rda")
str(dat1)
'data.frame': 48 obs. of 4 variables:
 $ Region  : Factor w/ 4 levels "Africa","Latin_America",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ Year    : Factor w/ 6 levels "2000","2010",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ Scenario: Factor w/ 2 levels "Scenario1","Scenario2": 1 1 1 1 1 1 1 1 1 1 ...
 $ Value   : int 1 2 3 4 5 6 7 8 9 10 ...
ggplot2 a complete example
#basic structure
ggplot(dat1,aes(x=Year,y=Value))
#add points, colored by region
ggplot(dat1,aes(x=Year,y=Value,colour=Region)) + geom_point()
#one panel per scenario
ggplot(dat1,aes(x=Year,y=Value,colour=Region)) + geom_point() + facet_grid(.~Scenario)
#set the colors manually
ggplot(dat1,aes(x=Year,y=Value,colour=Region)) + geom_point() + facet_grid(.~Scenario) + scale_color_manual(values=c("red","green","blue","black"))
#format the text
ggplot(dat1,aes(x=Year,y=Value,colour=Region)) + geom_point() + facet_grid(.~Scenario) + scale_color_manual(values=c("red","green","blue","black")) + theme(text=element_text(color="red", size=20))
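The file dat1.rda is not part of this transcript, so the example can be tried on a stand-in data frame instead. The sketch below builds one that matches the str() output of dat1; the names RegionC and RegionD and the years after 2010 are placeholders (assumptions), because the str() output truncates those factor levels.

```r
library(ggplot2)

# Stand-in for dat1: 4 regions x 6 years x 2 scenarios = 48 rows.
# "RegionC", "RegionD" and the years after 2010 are invented placeholders.
dat1_demo <- expand.grid(
  Region   = c("Africa", "Latin_America", "RegionC", "RegionD"),
  Year     = factor(seq(2000, 2050, by = 10)),
  Scenario = c("Scenario1", "Scenario2")
)
dat1_demo$Value <- seq_len(nrow(dat1_demo))
str(dat1_demo)

# Same build-up as on the slide: aesthetics, geom, facets, scale, theme
p <- ggplot(dat1_demo, aes(x = Year, y = Value, colour = Region)) +
  geom_point() +
  facet_grid(. ~ Scenario) +
  scale_color_manual(values = c("red", "green", "blue", "black")) +
  theme(text = element_text(color = "red", size = 20))
print(p)
```

expand.grid() varies its first argument fastest, which reproduces the row order visible in the str() output (Region cycling 1 2 3 4 within each Year).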
ggplot2 data preparation
Sometimes data is provided as an array:
load("dat1_array.rda")
str(dat1_array)
The reshape2 library helps to convert it to a data frame:
library(reshape2)
dat1_new<-melt(dat1_array)
str(dat1_new)
str(dat1)
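A minimal sketch of the array case, using an invented array in place of dat1_array (its real layout is not shown in the slides). It assumes the array carries named dimnames, which melt() uses as column names; value.name is set here only so the result resembles dat1.

```r
library(reshape2)

# Invented 3-dimensional array with named dimensions (stand-in for dat1_array)
arr <- array(
  1:48,
  dim = c(4, 6, 2),
  dimnames = list(
    Region   = c("Africa", "Latin_America", "RegionC", "RegionD"),
    Year     = seq(2000, 2050, by = 10),
    Scenario = c("Scenario1", "Scenario2")
  )
)

# One row per array cell; the names of the dimnames become the column names
long <- melt(arr, value.name = "Value")

# Numeric-looking dimnames come back as numbers; convert if a factor is wanted
long$Year <- factor(long$Year)
str(long)
```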
ggplot2 data preparation
Data may be provided as a data frame with an inconvenient structure:
load("dat1_df.rda")
str(dat1_df)
How to convert that to a structure similar to dat1?
dat1_new<-melt(dat1_df,id.vars=c("Region","Year"),variable.name="Scenario")
str(dat1_new)
str(dat1)
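The layout of dat1_df is not shown in the slides. A plausible reading is a "wide" data frame with one column per scenario; the sketch below assumes exactly that, with invented values, and shows how melt() stacks those columns into a long format.

```r
library(reshape2)

# Invented wide layout: one value column per scenario (assumed shape of dat1_df)
wide <- data.frame(
  Region    = rep(c("Africa", "Latin_America"), times = 2),
  Year      = rep(c("2000", "2010"), each = 2),
  Scenario1 = 1:4,
  Scenario2 = 5:8
)

# id.vars stay as they are; all other columns are stacked into
# a Scenario column (former column name) and a Value column (former cell)
long <- melt(wide, id.vars = c("Region", "Year"),
             variable.name = "Scenario", value.name = "Value")
str(long)
```

Without value.name, the value column is called "value" (lower case), which differs from the "Value" column of dat1.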
ggplot2 setting up the plot: ggplot()
General structure: ggplot(data,aes(…))
Aesthetics (aes) map properties of the plot (x axis, y axis, color, size, …) to variables in the data.
ggplot(dat1,aes(x=Year,y=Value)) + geom_point()
ggplot(dat1,aes(x=Region,y=Value)) + geom_point()
ggplot(dat1,aes(x=Year,y=Value,color=Region,size=Scenario)) + geom_point()
ggplot2 setting up the plot: ggplot()
Grouping
Grouping of data entries is important, especially for line plots.
#Not meaningful
ggplot(dat1,aes(x=Year,y=Value)) + geom_point()+geom_line()
#group to tell ggplot which points belong together
ggplot(subset(dat1,Scenario=="Scenario1"),aes(x=Year,y=Value,group=Region,color=Region))+geom_line()
The interaction command allows grouping by more than one dimension:
ggplot(dat1,aes(x=Year,y=Value,color=Region,linetype=Scenario,group=interaction(Region,Scenario))) + geom_point()+geom_line()
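To see what interaction() hands to the group aesthetic, it can be called on its own. A small base R sketch with made-up factor values (not taken from dat1):

```r
# Made-up example values
region   <- factor(c("Africa", "Africa", "Asia", "Asia"))
scenario <- factor(c("Scenario1", "Scenario2", "Scenario1", "Scenario2"))

# interaction() builds one combined factor level per Region/Scenario pair
grp <- interaction(region, scenario)
print(grp)
print(nlevels(grp))
```

Each combined level forms its own group, so geom_line() draws one line per Region and Scenario combination.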
ggplot2 aesthetics
Different geoms may accept and need different aesthetics.
Aesthetics can also be set in the geom:
ggplot(dat1,aes(x=Year,y=Value)) + geom_point(aes(color=Region,size=Scenario))
Aesthetics that take one fixed value for all data points are specified outside the aes argument in the call to the geom:
ggplot(dat1,aes(x=Year,y=Value,color=Region)) + geom_point(size=4,shape=3)
Common aesthetics (there are many more)
• x
• y
• color: color of lines and points
• fill: color of areas
• shape: shape of points
• linetype: appearance of lines
ggplot2 storing
Plots can be stored in an object and modified afterwards.
#create and store a plot
plot1<-ggplot(dat1,aes(x=Year,y=Value))
#modify it (this displays the result, plot1 itself is unchanged)
plot1+geom_point()
#store the modified plot
plot1<-plot1+geom_point()
#display using the print command
print(plot1)
ggplot2 geoms
geoms are the way to add different graphical representations of the data to the plot.
There is a large variety of geoms available: http://sape.inf.usi.ch/quick-reference/ggplot2/geom
In the following, we will show some examples based on two datasets: dat1 and dat2 (both available at the course website)
load("dat1.rda")
load("dat2.rda")
str(dat2)
#create basic plots
plot2<-ggplot(subset(dat1,Scenario=="Scenario1"),aes(x=Year,y=Value,color=Region,fill=Region,group=Region))
plot3<-ggplot(dat2)
ggplot2 geoms
geom_point()
plot2+geom_point()
plot3+geom_point(aes(x=precipitation,y=temperature))
geom_line()
plot2+geom_line(size=2)
plot3+geom_line(aes(x=year,y=temperature,group=area,color=area))
geom_bar()
#histogram (precipitation is a column of dat2, so start from plot3; current ggplot2 versions bin continuous data with geom_histogram())
plot3+geom_bar(aes(x=precipitation))
#bar plot
plot1+geom_bar(stat="identity",position="dodge")
ggplot2 geoms
geom_area()
plot2+geom_area()
geom_smooth()
plot3+geom_smooth(aes(x=year,y=temperature,group=area,color=area),method="lm")
geom_density()
plot3+geom_density(aes(x=precipitation,group=area,color=area,fill=area),alpha=0.7)
facets
Facets allow you to split the plot into different panels.
#Above each other (year, precipitation and area are columns of dat2, so start from plot3)
plot3+geom_point(aes(x=year,y=precipitation))+facet_grid(area~.)
#Beside each other
plot3+geom_point(aes(x=year,y=precipitation))+facet_grid(.~area)
#Splitting in two dimensions
plot1+geom_point()+facet_grid(Region~Scenario)
ggplot2 formatting Scales Scales are used to format plot properties that have been assigned in the aesthetics. They can be used to set limits, change the name of the legend, change colors, set log scales, and change axis titles.
ggplot2 formatting
Color:
plot3 + geom_line(aes(x=year,y=precipitation,color=area),size=2) + scale_color_manual(values=c("red","blue","green"),name="test")
Many more ways to specify colors:
scale_color_discrete
scale_color_continuous
scale_color_brewer (uses predefined color palettes, see http://colorbrewer2.org)
ggplot2 formatting Formatting x and y axes can be done via the following commands: • scale_x_continuous • scale_x_discrete • scale_x_date • scale_x_datetime • scale_x_log10 • scale_x_reverse • scale_x_sqrt Same for y
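Which of these functions applies depends on the type of the mapped variable: a factor (such as Year in dat1) needs the discrete variant, numeric data the continuous ones. A small sketch with invented values that combines a discrete x axis with a log-scaled y axis:

```r
library(ggplot2)

# Invented data: Year is a factor, Value is numeric
df <- data.frame(
  Year  = factor(c("2000", "2010", "2020")),
  Value = c(1, 10, 100)
)

p <- ggplot(df, aes(x = Year, y = Value)) +
  geom_point() +
  # factor on the x axis: discrete scale, one label per level
  scale_x_discrete(name = "time", labels = c("start", "middle", "end")) +
  # numeric y axis: log10 transformation
  scale_y_log10(name = "value (log scale)")
print(p)
```

Using scale_x_continuous on a factor column stops with an error about a discrete value being supplied to a continuous scale, which is a common stumbling block.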
ggplot2 formatting
#construct test plot
dat2$test<-dat2$precipitation^rnorm(n=10,mean=1,sd=10)
plot4<-ggplot(dat2,aes(x=year,y=test,color=area))+geom_point()
#Setting axis titles
plot4+scale_x_continuous(name="time")
plot4+labs(x="time",y="rain")
#setting limits
plot4+scale_x_continuous(limits=c(1900,1950))
#defining breaks and labels
plot4+scale_x_continuous(breaks=c(1900,1950,2000),labels=c("early","middle","late"))
ggplot2 formatting
# set log scale
dat2$test<-2^(dat2$year-1899)
plot5<-ggplot(dat2,aes(x=year,y=test,color=area))+geom_point()
plot5+scale_y_log10()
ggplot2 formatting theme
The "theme" item is used for formatting text, background and legend. A list of all possible entries can be found here: http://docs.ggplot2.org/0.9.2.1/theme.html
The structure is: theme(formatting1, formatting2,...)
Formatting happens using element_text() for text and element_rect() for areas. element_blank() removes the element.
ggplot2 formatting theme
#larger text and bold
plot4<-plot4+theme(text=element_text(size=15,face="bold"))
#white background
plot4<-plot4+theme(panel.background=element_rect(fill="white",color="black"))
#remove axis text
plot4 + theme(axis.text=element_blank())
#Move legend to bottom and remove legend title
plot4 + theme(legend.position="bottom", legend.title=element_blank())
ggplot2 saving plots
Plots can be saved as pdf, png, eps, … using ggsave
#save as pdf
ggsave(filename="plotit.pdf",plot=plot4)
#save as eps
ggsave(filename="plotit.eps",plot=plot4)
#many more possible
#make smaller keeping aspect ratio
ggsave(filename="plotit.pdf",plot=plot4,scale=0.5)
#determine width and height manually
ggsave(filename="plotit.pdf",plot=plot4,width=10,height=3)
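A self-contained sketch of ggsave on a stand-in plot (plot4 depends on dat2, which is not available here). The file name and the sizes are arbitrary choices for illustration; width and height are interpreted in inches unless units is given.

```r
library(ggplot2)

# Small stand-in plot built from invented data
p <- ggplot(data.frame(x = 1:3, y = c(2, 4, 8)), aes(x = x, y = y)) +
  geom_line()

# Write into the session's temporary directory;
# the .pdf extension selects the output device
out <- file.path(tempdir(), "plotit_demo.pdf")
ggsave(filename = out, plot = p, width = 12, height = 8, units = "cm")

# Check that the file was written
print(file.exists(out))
```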
ggplot2 saving plots
At PIK we have developed a useful tool for displaying multiple plots in one pdf.
#open the connection
sw<-swopen(outfile="test.pdf")
#add first plot
swfigure(sw,print,plot4,tex_caption="example plot number one")
#add more plots
swfigure(sw,print,plot1+geom_point(),tex_caption="example plot number two",fig.orientation="landscape")
#create the pdf
swclose(sw)
ggplot2 saving plots lusweave More elements can be added: • Tables • LaTeX code (sections, equations, …) The library can be obtained via SVN. See: http://redmine.pik-potsdam.de/projects/mo/wiki/Installing_landuse_library
ggplot2 maps
#A spatially explicit test dataset (0.5 degree)
load("dat3.rda")
str(dat3)
#Create basic map plot
plot6<-ggplot(dat3,aes(x=long,y=lat))+geom_raster(aes(fill=Value))
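dat3.rda is not available with this transcript, so here is a sketch on an invented 0.5 degree grid. The column names long, lat and Value follow the slide; the grid extent, the values, and the added coord_fixed() and fill scale are assumptions for illustration.

```r
library(ggplot2)

# Invented 0.5 degree grid over a small window (stand-in for dat3)
grid <- expand.grid(
  long = seq(0.25, 9.75, by = 0.5),
  lat  = seq(40.25, 49.75, by = 0.5)
)
grid$Value <- sin(grid$long / 3) + cos(grid$lat / 3)

# geom_raster draws one equally sized tile per grid cell
map <- ggplot(grid, aes(x = long, y = lat)) +
  geom_raster(aes(fill = Value)) +
  coord_fixed() +  # same plotted length for one degree on both axes
  scale_fill_gradient(low = "white", high = "darkgreen")
print(map)
```

geom_raster expects regularly spaced cells, which a fixed-resolution grid like this one satisfies.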
ggplot2 That’s it, thanks