
R Graphics

Dr. Yan Liu, Department of Biomedical, Industrial and Human Factors Engineering, Wright State University



  1. R Graphics
     Dr. Yan Liu, Department of Biomedical, Industrial and Human Factors Engineering, Wright State University

  2. Introduction to R
     • What is R?
        • A free, open-source system for statistical computation and graphics
        • Consists of a language (also called R) plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files
     • Influenced by the S language, developed by Becker, Chambers, and Wilks at Bell Laboratories
        • S is a very high-level language and an environment for data analysis and graphics
        • S-PLUS is a commercial implementation of S
     • Initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland, New Zealand
     • Users can interface R with procedures written in C, C++, or FORTRAN for efficiency
     • R is distributed through CRAN
        • http://cran.r-project.org/ (downloads for Linux, macOS, and Windows)

  3. Start Up
     [Screenshot: the R GUI, showing the Script File window with the "run line or selection" button, and the Command window]
     • Two alternatives to run commands in R
        • Command window (R Console window): type commands at the > prompt
        • Script file (File >> New script): highlight the commands in the script file window and click the "run line or selection" button

  4. Read Files
     • Read in data from an external file with read.table( )
     • Parameters
        • file = "…": the name (and path) of the file from which the data are to be read
        • header = TRUE: the first row of the file contains the attribute names
        • sep = "…": the field separator character ("\t" means tab-separated; other common separators include ",", ";", and " ")
        • na.strings = "…": the string(s) to be treated as missing values; the default is "NA"
     • read.csv( ) is identical to read.table( ) except for its defaults (header = TRUE, sep = ","); it is intended for reading comma-separated value (.csv) files
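A minimal sketch of the call described above. The file contents and column names are invented for illustration and written to a temporary file so the example runs anywhere; "?" is used as the missing-value marker only to show na.strings.

```r
# Hypothetical file contents, written to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("mpg\thorsepower\torigin",
             "18\t130\tUSA",
             "15\t165\tUSA",
             "26\t?\tJapan"), path)

# header = TRUE : the first row holds the attribute names
# sep = "\t"    : fields are separated by tabs
# na.strings    : "?" marks a missing value (the default is "NA")
auto <- read.table(file = path, header = TRUE, sep = "\t", na.strings = "?")
str(auto)

# read.csv() is read.table() with header = TRUE and sep = "," as defaults
csv_path <- tempfile(fileext = ".csv")
write.csv(auto, csv_path, row.names = FALSE)   # missing values are written as NA
auto2 <- read.csv(csv_path)
```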

  5. Characteristics of Dataset
     • > names(auto): returns the attribute (column) names of the "auto" dataset
     • > str(auto): displays the structure of the "auto" dataset: the number of records and attributes, plus the type and first few values of each attribute
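The course's auto dataset does not ship with R, so this sketch uses the built-in mtcars data frame as a stand-in; the same two calls work on any data frame.

```r
names(mtcars)        # the 11 column (attribute) names
names(mtcars)[1:4]   # "mpg" "cyl" "disp" "hp"
str(mtcars)          # 32 obs. of 11 variables, with each column's type and first values
```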

  6. Basic Attribute Types
     • Numeric
        • Real numbers (type double)
        • Integers
     • Logical
        • Binary: TRUE or FALSE
     • Character/strings
        • e.g. "red", "green"
     • Factor
        • A categorical attribute whose values are stored as a vector of integers in the range [1 … k], where k is the number of unique values (levels) of the nominal variable
        • e.g. in attribute country: 1 – USA, 2 – Europe, 3 – Japan
        • An ordered factor is used to represent an ordinal variable
        • e.g. in attribute size: 1 – small, 2 – medium, 3 – large
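A short sketch of a nominal and an ordered factor built from the example values above; the levels are listed explicitly so the integer codes match those shown on the slide.

```r
# Nominal variable: levels fixed so that USA = 1, Europe = 2, Japan = 3
country <- factor(c("USA", "Japan", "Europe", "USA"),
                  levels = c("USA", "Europe", "Japan"))
levels(country)        # "USA" "Europe" "Japan"
as.integer(country)    # 1 3 2 1: the stored integer codes

# Ordinal variable: an ordered factor, so comparisons respect the level order
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)
size < "large"         # TRUE FALSE TRUE
```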

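  7. Convert Attribute Type
     • as.numeric(x): convert an attribute to numeric
     • as.integer(x): convert an attribute to integer
     • as.factor(x): convert an attribute to a factor
     • as.character(x): convert an attribute to characters/strings, element by element
     • toString(x, width = …): collapse x into a single comma-separated string (optionally truncated to a maximum width)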
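A few conversions on made-up values, including a common pitfall: as.numeric() applied to a factor returns the level codes, not the labels, so the factor is converted to character first.

```r
x <- c("3", "7", "12")
as.numeric(x)                  # 3 7 12
as.integer("42")               # 42, stored as an integer

f <- factor(c("10", "20", "10"))
as.numeric(f)                  # 1 2 1: the level codes, not the values
as.numeric(as.character(f))    # 10 20 10: the intended values

as.character(c(1.5, 2))        # "1.5" "2": one string per element
toString(c(1.5, 2))            # "1.5, 2": a single comma-separated string
```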

  8. R Objects
     • Scalar
        • A single value (in R, a vector of length 1)
     • Vector
        • A one-dimensional array of arbitrary length
          > c(2, 3, 5, 2, 7, 1)
          > 3:10
          > c("Canberra", "Sydney", "Newcastle")
        • All elements of the vector must be of the same type (e.g. numeric, character)
        • Subsets of the vector may be referenced
          > x <- c(2, 3, 5, 2, 7, 1)
          > x[c(2, 4)]    # extract elements 2 and 4 of x
          > x[-c(2, 4)]   # extract all elements of x except elements 2 and 4
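The subsetting commands above, run as-is. The logical-index line and the mixed-type line are extra illustrations, not from the slide.

```r
x <- c(2, 3, 5, 2, 7, 1)
x[c(2, 4)]    # 3 2      (elements 2 and 4)
x[-c(2, 4)]   # 2 5 7 1  (all but elements 2 and 4)
x[x > 2]      # 3 5 7    (logical indexing)

3:10          # 3 4 5 6 7 8 9 10
c(1, "a")     # "1" "a": mixed types are coerced to a common type
```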

  9. R Objects (Cont.)
     [Figure: matrix( ) example annotated with byrow (TRUE: fill by rows; FALSE: fill by columns), ncol (number of columns), nrow (number of rows), and two subsetting examples: the first two elements of the 1st row, and the elements of the first two columns]
     • Matrix
        • A two-dimensional array with an arbitrary number of rows and columns
        • All elements of the matrix must be of the same type
        • Subsets of the matrix may be referenced
        • Individual rows and columns of the matrix may be handled as vectors
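The slide's own matrix example survives only as annotations. This sketch, assuming a 2 x 3 matrix holding the values 1 to 6, shows the arguments and the two subsetting cases they describe.

```r
# byrow = TRUE fills row by row; the default (FALSE) fills column by column
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

m[1, 1:2]   # 1 2: the first two elements of the 1st row
m[, 1:2]    # the elements of the first two columns (a 2 x 2 matrix)
m[2, ]      # 4 5 6: a row handled as a vector

m_cols <- matrix(1:6, nrow = 2, ncol = 3)   # byrow = FALSE: first row is 1 3 5
```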

  10. R Objects (Cont.)
     • Array
        • Like a matrix, but of arbitrary dimension
     • Data frame
        • A dataset with rows (representing data records) and columns (representing attributes)
        • Unlike a matrix, different columns may hold different types
        • May be handled similarly to a matrix
        • Individual columns of the data frame may be handled as vectors
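A sketch of an array and a data frame; the dimensions and values are arbitrary.

```r
# A 3-dimensional array: 2 rows, 3 columns, 4 "layers", filled column by column
a <- array(1:24, dim = c(2, 3, 4))
a[2, 3, 1]    # 6
dim(a)        # 2 3 4

# A data frame with a numeric and a character column
df <- data.frame(mpg = c(18, 15, 26),
                 origin = c("USA", "USA", "Japan"))
df[2, ]             # the second record, using matrix-style indexing
df[df$mpg > 16, ]   # records with mpg above 16
mean(df$mpg)        # a column handled as a vector
```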

  11. R Objects (Cont.)
     • Quit function: q()
        • On quitting, R offers the option of saving the workspace image in the file .RData in the working directory
     • Remove-object function: rm()
        • Removes objects that are no longer needed
     • Function
        • R has a vast number of built-in functions, e.g. mean( ), plot( ), var( )
        • Users can write their own functions
     • List
        • An arbitrary collection of other R objects (which may include other lists)
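A sketch of a user-written function, a nested list, and rm(). The function name my_cv and the list contents are hypothetical, chosen only for illustration.

```r
# A user-defined function (hypothetical name): coefficient of variation
my_cv <- function(x) {
  sd(x) / mean(x)
}
cv_value <- my_cv(c(2, 4, 6))   # 0.5

# A list may hold objects of different types, including other lists
info <- list(name = "auto",
             vars = c("mpg", "horsepower"),
             meta = list(note = "made-up example"))
info$vars                # "mpg" "horsepower"
info[["meta"]]$note      # "made-up example"

rm(my_cv)                # remove an object that is no longer needed
exists("my_cv")          # FALSE
```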

  12. A Simple Scatterplot
     • plot(auto$mpg, auto$horsepower) produces a scatterplot of the auto dataset with mpg on the horizontal axis and horsepower on the vertical axis
     • text(40, 200, "Plot of mpg vs. horsepower") adds the label, centered at the coordinates (40, 200) in the units of the plot's axes
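The same two calls, using R's built-in mtcars data (columns mpg and hp) in place of the course's auto dataset; the label is therefore placed at (25, 300) so it falls inside that data's range. The plot is written to a temporary PDF so the code also runs non-interactively; remove the pdf() and dev.off() lines to draw on screen.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)                                     # draw into a PDF file
plot(mtcars$mpg, mtcars$hp)                  # x axis: mpg, y axis: hp
text(25, 300, "Plot of mpg vs. horsepower")  # centered at (25, 300) in axis units
dev.off()
```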

  13. Overview of R Graphics
     • Graphics functions
        • High-level functions that produce complete plots, e.g. plot( )
           • Some flexibility in the way the data to be plotted can be specified
        • Low-level functions that add output to existing plots, e.g. text( )
        • Functions for working interactively with graphical output
     • "Painter's model"
        • Graphics output occurs in steps, with later output obscuring any earlier output that it overlaps
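The painter's model in a few lines: one high-level call creates the plot, and each low-level call then draws on top of whatever is already there. The shapes and coordinates are arbitrary.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(1:10, 1:10, pch = 19)          # high-level: a complete plot
rect(3, 3, 7, 7, col = "grey")      # low-level: covers the points inside it
text(5, 5, "drawn last, on top")    # low-level: drawn over the rectangle
dev.off()
```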

  14. Traditional Standard Plots

  15. Trellis Plots
     [Figure: a Trellis display]
     • Provided through the package lattice
     • Embody a number of design principles proposed by Bill Cleveland (1987, 2004) that aim to ensure effective visualization

  16. When there are many overlapping points, we can make the points semi-transparent to mitigate overplotting
     • The color is specified as "#RRGGBBAA", where the AA portion gives the opacity (00 = fully transparent, FF = fully opaque)
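A sketch with simulated data (5,000 points from rnorm(), an arbitrary choice). The color "#0000FF20" is blue with alpha 0x20 = 32 out of 255, about 12.5% opacity; rgb() with its alpha argument builds the same string. Not every graphics device supports semi-transparency; pdf() does.

```r
set.seed(1)
x <- rnorm(5000)
y <- x + rnorm(5000)

col_hex <- "#0000FF20"                                          # RR=00, GG=00, BB=FF, AA=20
col_rgb <- rgb(0, 0, 255, alpha = 32, maxColorValue = 255)      # same color via rgb()

out <- tempfile(fileext = ".pdf")
pdf(out)
plot(x, y, pch = 19, col = col_hex)   # dense regions show up darker
dev.off()
```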

  17. Special-Purpose Plots
     • R provides a set of functions for producing graphical output primitives (e.g. lines, text, rectangles, polygons), which users can combine to create plots for special purposes
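A sketch of a plot assembled entirely from primitives: plot.new() and plot.window() set up an empty coordinate system, then low-level calls draw each element. All shapes and coordinates are arbitrary.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)
plot.new()                                          # start an empty plot
plot.window(xlim = c(0, 10), ylim = c(0, 10))       # set the coordinate system
rect(1, 1, 4, 4, col = "lightblue")                 # rectangle
polygon(c(6, 9, 7.5), c(1, 1, 4), col = "salmon")   # triangle
lines(c(1, 9), c(6, 9), lwd = 2)                    # line segment
text(5, 8, "custom plot from primitives")           # text
box()                                               # frame around the plot region
dev.off()
```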

  18. Graphical Output Formats
     • When using R interactively, the result is a plot drawn on screen
        • It can be saved as a PDF, PostScript, or image file: File > Save as > PostScript…/PDF…/Png… (the desired format)
     • R can also write a plot directly to a file
        • Output is directed to a particular output device, which determines the output format: postscript( ) for Adobe PostScript files, pdf( ) for Adobe PDF files, pictex( ) for LaTeX PicTeX files, png( ) for PNG bitmap files, jpeg( ) for JPEG bitmap files, bmp( ) for Windows BMP files
        • Close a device with dev.off( )
     • [Code comment from the slide's example: a PDF file of the plot will be saved in the working directory of the R session]
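A sketch of directing output to file devices. Temporary file names are used here; passing a plain name such as "myplot.pdf" instead writes the file to the working directory (see getwd()). png(), jpeg(), and bmp() follow the same open, plot, dev.off() pattern, with width and height given in pixels.

```r
f_pdf <- tempfile(fileext = ".pdf")
pdf(f_pdf, width = 6, height = 4)     # open a PDF device; size in inches
plot(mpg ~ hp, data = mtcars)
dev.off()                             # close the device to finish the file

f_ps <- tempfile(fileext = ".ps")
postscript(f_ps, width = 6, height = 4, horizontal = FALSE)
hist(mtcars$mpg)
dev.off()
```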

  19. Structure of the R Graphics System
     [Figure: Structure of the R graphics system, showing the main packages that provide graphics functions in R; arrows indicate where one package builds on the functions in another package]
     • Core graphics systems
        • graphics (traditional graphics)
        • grid
           • The lattice package is built on grid
     • Graphics engine and devices
        • The grDevices package provides the graphics devices (e.g. pdf( ), png( )) and functions that support handling colors and fonts

  20. Traditional versus Grid Graphics Systems
     • High-level functions
        • The traditional system (the graphics package), together with packages built on top of it, provides the majority of the high-level functions currently available in R
        • The lattice package, built on the grid system, also provides high-level functions
     • Low-level functions
        • Both systems provide many low-level functions
     • Functions for interaction
        • The traditional system provides very limited interaction
        • The grid system provides functions for interacting with graphical output: editing, extracting, and deleting parts of an image
     • Graphics design
        • Trellis plots have a better design in terms of visually encoding information (based on research on human visual perception)

  21. Lattice Graphics Model
     • Lattice plot types
        • A number of standard plot types (like those in the traditional graphics system)
        • More modern and specialized plots
        • A table comparing the plot functions of the lattice and traditional graphics systems can be downloaded from the course website
     • A lattice graphics function produces an object of class "trellis", which contains a description of the plot
        • [Figure: two sets of commands, (1) and (2), that produce the same plot]
        • The trellis object can be modified with the update( ) method for trellis objects
     • Loading lattice into R (e.g. with library(lattice))
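A sketch of the trellis-object workflow using the built-in mtcars data (an assumption; the slide's own commands are not preserved). Creating the object draws nothing; the plot appears only when the object is printed.

```r
library(lattice)

p <- xyplot(mpg ~ hp, data = mtcars)   # a "trellis" object; nothing drawn yet
class(p)                               # "trellis"

# update() returns a modified copy of the trellis object
p2 <- update(p, main = "mpg vs. horsepower", xlab = "horsepower")

out <- tempfile(fileext = ".pdf")
pdf(out)
print(p2)   # at the console, auto-printing draws it; in scripts, call print()
dev.off()
```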

  22. Trellis Display: xyplot
     • xyplot(y ~ x | g1 * g2 * …, data, …) produces scatterplots of y (on the vertical axis) versus x (on the horizontal axis), with one panel for each combination of the conditioning variables g1, g2, …
     • Create shingles for conditioning variables with continuous values
        • A shingle is a data structure consisting of a numeric vector along with a set of possibly overlapping intervals
        • equal.count(x, number, overlap) creates a shingle whose intervals contain (almost) the same number of data records
           • x: the variable to be shingled
           • number: the number of intervals
           • overlap: the fraction of overlap between successive intervals (relative to the number of records in each interval)
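A sketch using mtcars as stand-in data: the continuous weight variable wt becomes a shingle via equal.count() and serves as the conditioning variable. The choice of 3 intervals and 10% overlap is arbitrary.

```r
library(lattice)

# 3 intervals of wt with (almost) equal counts and 10% overlap between neighbours
wt_sh <- equal.count(mtcars$wt, number = 3, overlap = 0.1)
print(wt_sh)   # shows the intervals and how many records fall in each

# One panel of mpg vs. hp per weight interval
p <- xyplot(mpg ~ hp | wt_sh, data = mtcars)

out <- tempfile(fileext = ".pdf")
pdf(out)
print(p)
dev.off()
```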

  23. Trellis Display: 3D Scatterplot
     • cloud(z ~ x * y | g1 * g2 * …, data, …) produces a 3D scatterplot of z (on the vertical axis) versus x and y (on the horizontal base grid), conditioning on g1, g2, …
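A sketch of cloud() on the built-in mtcars data, conditioning on the number of cylinders. The screen argument sets the viewing angle, which is one way to rotate the plot; the angles used are arbitrary.

```r
library(lattice)

# mpg against wt and hp, one panel per cylinder count;
# screen gives rotations (in degrees) about the z and then the x axis
p <- cloud(mpg ~ wt * hp | factor(cyl), data = mtcars,
           screen = list(z = 30, x = -60))

out <- tempfile(fileext = ".pdf")
pdf(out)
print(p)
dev.off()
```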

  24. Parallel Coordinates
     • parallelplot(~x, data, …) produces a parallel coordinates plot of data frame x (in older versions of lattice this function was called parallel( ))

  25. Rotate Plot

  26. parallelplot(~x | g1 * g2 * …, data, …) produces a parallel coordinates plot of data frame x, conditioning on g1, g2, …
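A sketch of both forms on the built-in iris data, following the pattern used in the lattice documentation: the first four columns are the numeric measurements and Species is the conditioning variable. It assumes a lattice version that provides parallelplot().

```r
library(lattice)

p1 <- parallelplot(~ iris[1:4], data = iris)             # all 150 flowers
p2 <- parallelplot(~ iris[1:4] | Species, data = iris)   # one panel per species

out <- tempfile(fileext = ".pdf")
pdf(out)
print(p1)
print(p2)
dev.off()
```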

  27. R Formula
     • The first argument to the lattice plotting functions is usually an R formula
     • Common types
        • y ~ x: plots variable y (on the vertical axis) against variable x (on the horizontal axis)
        • ~ x: used in plots of a single variable x, or in parallel coordinates plots of a data frame (matrix) x
        • z ~ x * y: plots variable z against x and y (which span the base grid)
        • y1 + y2 ~ x: plots both y1 and y2 against x
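One call per formula type, on a small made-up data frame and the built-in mtcars data. Only the last call needs extra arguments: type = "l" draws lines and auto.key = TRUE adds a legend distinguishing y1 from y2.

```r
library(lattice)

d <- data.frame(x = 1:10, y1 = (1:10)^2, y2 = 10 * (1:10))   # made-up curves

p_xy  <- xyplot(y1 ~ x, data = d)                     # y ~ x
p_one <- densityplot(~ mpg, data = mtcars)            # ~ x: one variable
p_3d  <- cloud(mpg ~ wt * hp, data = mtcars)          # z ~ x * y
p_two <- xyplot(y1 + y2 ~ x, data = d, type = "l",    # y1 + y2 ~ x
                auto.key = TRUE)
```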

  28. Arranging Lattice Plots
     "The aspect ratio is vital because it has a large impact on our ability to judge rate of change. A number of studies in visual perception have shown that our ability to judge the relative slopes of line segments on a graph is maximized when the absolute values of the orientations of the segments are centered on 45 degrees." Bill Cleveland (http://stat.bell-labs.com/project/trellis/interview.html)
     • Arrangement of panels and strips in a single lattice plot
        • layout argument: a numeric vector c(columns, rows) or c(columns, rows, pages) specifying the number of columns, rows, and pages of panels
        • aspect argument: specifies the aspect ratio (height divided by width) of the panels
           • aspect = "fill" (the default) makes the panels fill the available space
           • aspect = "xy" computes the aspect ratio from the data using "banking to 45°"
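A sketch of the layout and aspect arguments on mtcars, with one panel per number of cylinders; the panel arrangement and page size are arbitrary choices.

```r
library(lattice)

p <- xyplot(mpg ~ hp | factor(cyl), data = mtcars,
            layout = c(3, 1),   # 3 columns x 1 row of panels
            aspect = "xy")      # aspect ratio chosen by banking to 45 degrees

out <- tempfile(fileext = ".pdf")
pdf(out, width = 9, height = 4)
print(p)
dev.off()
```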

  29. Arrangement of Several Lattice Plots on a Single Page
     • First, create a trellis object for each lattice plot
     • Then, call print( ) on each object, supplying the split or position argument to specify where it goes on the page (and more = TRUE for every plot except the last)
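A sketch placing two lattice plots side by side with print(). With split = c(x, y, nx, ny), the page is divided into an nx-by-ny grid and the plot goes into column x, row y; more = TRUE keeps the page open for the next plot. position = c(xmin, ymin, xmax, ymax), with values between 0 and 1, is the alternative way to place a plot.

```r
library(lattice)

p1 <- xyplot(mpg ~ hp, data = mtcars)
p2 <- histogram(~ mpg, data = mtcars)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)
print(p1, split = c(1, 1, 2, 1), more = TRUE)    # left half of the page
print(p2, split = c(2, 1, 2, 1), more = FALSE)   # right half; page complete
dev.off()
```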

  30. Traditional: Plots of One or Two Variables plot( ) produces scatterplots

  31. Traditional: Plots of One or Two Variables (Cont.) Specifying the data to be plotted in plot( )
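A sketch of several ways to pass data to plot(), using mtcars. plot() is generic, so the kind of plot it draws depends on the type of the data.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(mpg ~ hp, data = mtcars)                          # formula interface
plot(mtcars$hp, mtcars$mpg, xlab = "hp", ylab = "mpg") # two vectors
plot(factor(mtcars$cyl))                               # a factor: bar plot of counts
plot(factor(mtcars$cyl), mtcars$mpg)                   # factor vs. numeric: box plots
dev.off()
```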

  32. Traditional: 3D Plots
     • persp(x, y, z, …) draws a 3D perspective plot of a surface: x and y give the coordinates of the base grid, and z is a matrix of heights, with z[i, j] the value of the surface at (x[i], y[j])
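A sketch of a surface plot. The function (a Gaussian bump) and the grid size are arbitrary; outer() evaluates it at every (x[i], y[j]) pair to build the z matrix that persp() expects, and theta and phi set the viewing direction.

```r
x <- seq(-3, 3, length.out = 40)
y <- seq(-3, 3, length.out = 40)
z <- outer(x, y, function(x, y) exp(-(x^2 + y^2) / 2))   # z[i, j] = f(x[i], y[j])

out <- tempfile(fileext = ".pdf")
pdf(out)
persp(x, y, z, theta = 30, phi = 25, col = "lightblue", shade = 0.5)
dev.off()
```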

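  33. Traditional: 3D Plots (Cont.)
     • symbols(x, y, circles, squares, rectangles, stars, thermometers, boxplots, …) draws a symbol at each (x, y) location, using one of six symbol types to represent a third variable (or more variables, for rectangles, stars, thermometers, and boxplots)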
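A sketch using circles to encode horsepower on a scatterplot of mpg against weight (mtcars). circles gives the radii, so passing sqrt(hp) makes circle area proportional to hp; inches sets the size of the largest circle.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)
symbols(mtcars$wt, mtcars$mpg, circles = sqrt(mtcars$hp),
        inches = 0.25, bg = "grey80", xlab = "wt", ylab = "mpg")
dev.off()
```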

  34. Traditional: Multivariate Plots pairs(x, …) produces a scatterplot matrix of x (a matrix or data frame)
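A scatterplot matrix of the four iris measurements, colored by species; the col and pch arguments are passed on to the individual panels.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)
pairs(iris[1:4], col = as.integer(iris$Species), pch = 19)
dev.off()
```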

  35. Traditional: Multivariate Plots (Cont.) stars(x, …) produces a star plot of x
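Star plots of the first nine cars in mtcars on four variables, an arbitrary selection. By default stars() rescales each column to [0, 1], so ray lengths are comparable across variables.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)
stars(mtcars[1:9, c("mpg", "hp", "wt", "qsec")],
      main = "Star plot of four mtcars variables")   # one star per car
dev.off()
```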

  36. Getting Help
     • Every R function and dataset has built-in help associated with it, accessed with help( ) (or the ? shortcut, e.g. ?plot)
     • help(help) gives instructions on how to use help( )
