Getting Started with the SGPLOT Procedure: A Hands-On Workshop

About the Presenter
Josh Horstman is an independent statistical programming consultant and trainer based in Indianapolis with 20 years’ experience using SAS in the life sciences industry. He specializes in analyzing clinical trial data, and his clients have included major pharmaceutical corporations, biotech companies, and research organizations. Josh is a SAS Certified Advanced Programmer who loves coding as well as talking about coding at SAS Global Forum and other SAS User Group meetings.

WUSS 2018 - Hands-On Workshop
Josh Horstman

INTRODUCTION TO SGPLOT

Overview: The Output Delivery System (ODS)
• Prior to ODS, SAS was limited to text-based “SAS listing” output.
• ODS output makes use of colors, fonts, graphics, and more!
• ODS provides the ability to produce output in various formats: … and more!
• ODS has been part of the Base SAS product since version 7 (no separate license required).

Overview: ODS Statistical Graphics
• An extension to ODS used to create analytical graphs
• Introduced in SAS 9.2 as part of SAS/GRAPH (experimental in v9.1)
• Moved into the Base SAS product in version 9.3
• Based on the Graph Template Language (GTL)

ODS Statistical Graphics – Components
• Graph Template Language (GTL) – comprehensive language for creating statistical graphics
• ODS Graphics procedures – provide a procedural interface to the most common features of GTL
• ODS GRAPHICS statement – controls various graphics-related settings and options
• ODS Graphics Editor – interactive tool for modifying graphs
• ODS Graphics Designer – graphical interface for designing graphs

ODS Statistical Graphics – Procedures
• SGPLOT – single-cell plots
• SGPANEL – multiple-panel plots
• SGSCATTER – advanced scatter plots
• SGRENDER – render graphs written in GTL
• SGDESIGN – used with ODS Graphics Designer

Statistical Graphics vs. Legacy SAS/GRAPH
SG Procedures:
• SGPLOT, SGPANEL, SGSCATTER, etc.
• Based on templates
• Create image files
• Use ODS GRAPHICS statement to control environment
• Visual properties are set within the procedure
SAS/GRAPH:
• GPLOT, GCHART, GSLIDE, GBARLINE, GCONTOUR, etc.
• Based on device drivers
• Create catalog entries
• Use GOPTIONS statement to control environment
• Many properties set with global statements such as AXIS, LEGEND, SYMBOL, etc.

About ODS Destinations
• To create ODS graphs, a valid ODS destination must be open.
• Build an “ODS sandwich” around your graph code.
• For example, to output a graph to the PDF destination:
    ods pdf file="c:\example.pdf";
    <SG procedure code goes here...>;
    ods pdf close;
• Similar syntax applies for ODS HTML, ODS RTF, etc.
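
As a fuller illustration of the “ODS sandwich”, here is a minimal sketch that sends one graph to the RTF destination. The file path and the image dimensions are placeholders chosen for this example, not values from the workshop; the ODS GRAPHICS statement is optional.

```sas
/* Optional: reset graphics settings and set an image size (arbitrary values) */
ods graphics / reset width=6in height=4in;

/* Open the RTF destination; the file path is a placeholder */
ods rtf file="c:\example.rtf";

/* Any SG procedure code can go between the open and close statements */
proc sgplot data=sashelp.class;
    scatter x=height y=weight;
run;

/* Close the destination so the file is completely written */
ods rtf close;
```

The graph only appears in the output file after the destination is closed.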

Example Datasets
• Datasets in the SASHELP library are included with SAS – you already have them!
• SASHELP.CLASS (demographics on 19 students)
• SASHELP.CARS (data about 428 car models)

More Example Datasets
• SASHELP.HEART (5,209 patients from a heart study)
• SASHELP.STOCKS (stock prices of IBM, Intel, and Microsoft)
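
Before plotting, it can help to look at the variables in these datasets. The following is a small sketch using standard Base SAS procedures; the choice of SASHELP.CLASS and of five observations is arbitrary.

```sas
/* List the variables and their attributes */
proc contents data=sashelp.class;
run;

/* Print the first five observations */
proc print data=sashelp.class(obs=5);
run;
```

The same two steps work for SASHELP.CARS, SASHELP.HEART, and SASHELP.STOCKS by changing the dataset name.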

Basic SGPLOT Syntax
    proc sgplot data=<input-data-set> <options>;
        <one or more plot requests>
        <other optional statements>
    run;
• There are dozens of plot request statements available – SCATTER, SERIES, VBOX, VBAR, HIGHLOW, BUBBLE, etc.
• Other optional statements control specific graph features – XAXIS, YAXIS, REFLINE, INSET, KEYLEGEND, etc.

EXERCISES 1-11: SIMPLE PLOTS

The SCATTER Statement
• Creates a scatter plot.
    proc sgplot data=<input-data-set> <options>;
        scatter x=variable y=variable < / options>;
    run;
• X and Y are required arguments that specify the variables to plot.
• Include a slash before specifying one or more options.

Exercise #1: Basic Scatter Plot
• Goal: Create a scatter plot of WEIGHT vs. HEIGHT.
• Input: SASHELP.CLASS
• Syntax:
    • SCATTER statement
    • X= argument
    • Y= argument

Exercise #1: Basic Scatter Plot
    proc sgplot data=sashelp.class;
        scatter x=height y=weight;
    run;

The GROUP= Option
    proc sgplot data=<input-data-set> <options>;
        scatter x=variable y=variable / group=variable <more options>;
    run;
• GROUP= specifies a variable used to group the data.
• Plot elements for each group value are automatically distinguished by different visual attributes.
• The GROUP= option is available on almost every plot type.

Exercise #2: Grouped Scatter Plot
• Goal: Create a scatter plot of WEIGHT vs. HEIGHT, grouped by SEX.
• Input: SASHELP.CLASS
• Syntax:
    • SCATTER statement
    • X= argument
    • Y= argument
    • GROUP= option

Exercise #2: Grouped Scatter Plot
    proc sgplot data=sashelp.class;
        scatter x=height y=weight / group=sex;
    run;
• GROUP= specifies a grouping variable.
• The legend is generated automatically.
• Alternative: Use a BY statement to get a separate graph for each value.
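
The BY statement alternative can be sketched as follows. BY-group processing expects the input to be sorted by the BY variable, so this example first sorts a copy of the data; the output dataset name WORK.CLASS_SORTED is made up for the illustration.

```sas
/* Sort a copy of the data by the BY variable */
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

/* One separate scatter plot is produced for each value of SEX */
proc sgplot data=work.class_sorted;
    by sex;
    scatter x=height y=weight;
run;
```

Unlike GROUP=, which overlays the groups in a single graph with a legend, BY produces one graph per group.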

Exercise #2: Grouped Scatter Plot - BONUS
    proc sgplot data=sashelp.class;
        scatter x=height y=weight / group=sex datalabel=name;
    run;
• DATALABEL= specifies a variable used to label each data point.

The BUBBLE Statement
• Creates a bubble plot.
    proc sgplot data=<input-data-set> <options>;
        bubble x=variable y=variable size=variable < / options>;
    run;
• X and Y are required arguments that specify the variables to plot.
• SIZE is a required argument that specifies a variable that controls the size of the bubbles.

Exercise #3: Grouped Bubble Plot
• Goal: Create a bubble plot of WEIGHT vs. HEIGHT, grouped by SEX, with bubbles sized by AGE.
• Input: SASHELP.CLASS
• Syntax:
    • BUBBLE statement
    • X= argument
    • Y= argument
    • SIZE= argument
    • GROUP= option

Exercise #3: Grouped Bubble Plot
    proc sgplot data=sashelp.class;
        bubble x=height y=weight size=age / group=sex;
    run;

The SERIES Statement
• Creates a line plot.
    proc sgplot data=<input-data-set> <options>;
        series x=variable y=variable < / options>;
    run;
• X and Y are required arguments that specify the variables to plot.
• By default, only lines are shown, not the points themselves.
• To add markers to points, use the MARKERS option.

Exercise #4: Grouped Series Plot
• Goal: Create a series plot of closing price (CLOSE) by date (DATE), grouped by company (STOCK). Add a title to your plot.
• Input: SASHELP.STOCKS
• Syntax:
    • SERIES statement
    • X= argument
    • Y= argument
    • GROUP= option
    • TITLE statement

Exercise #4: Grouped Series Plot
    proc sgplot data=sashelp.stocks;
        title "Stock Prices 1986-2005";
        series x=date y=close / group=stock;
    run;

The HIGHLOW Statement
• Creates floating vertical or horizontal lines representing high and low values.
    proc sgplot data=<input-data-set> <options>;
        highlow x=variable | y=variable high=variable low=variable < / options>;
    run;
• Use either X or Y to specify the values to plot along the X or Y axis.
• Use both HIGH and LOW to specify the upper and lower values for the floating lines.
• Add the CLOSE= option to specify a variable for a closing tick mark.

Exercise #5: High-Low Plot
• Goal: Create a high-low plot of monthly stock prices, with closing ticks, for the stock IBM during the year 2005.
• Input: SASHELP.STOCKS
• Syntax:
    • HIGHLOW statement
    • X= argument
    • HIGH= argument
    • LOW= argument
    • CLOSE= option
    • WHERE statement

Exercise #5: High-Low Plot
    proc sgplot data=sashelp.stocks;
        where stock='IBM' and date >= "01jan2005"d;
        highlow x=date high=high low=low / close=close;
    run;
• HIGH and LOW specify the endpoints of each bar.
• Use X= for vertical bars or Y= for horizontal bars, but not both!
• The CLOSE variable determines the locations of the closing ticks.

The HBOX Statement
• Creates a horizontal box plot.
    proc sgplot data=<input-data-set> <options>;
        hbox variable < / options>;
    run;
• The analysis variable must be numeric!
• Use the CATEGORY= option to create a box for each distinct value of a category variable. (Can be combined with GROUPing.)
• The VBOX statement is analogous for vertical box plots.

Anatomy of a Box Plot
[Figure: annotated box plot with the following labels]
• Q1 and Q3 – edges of the box; the distance between them is the Inter-Quartile Range (IQR)
• Median
• Mean
• Minimum value above lower fence
• Maximum value beneath upper fence
• Outlier – values outside the fences are considered outliers
• Lower Fence = Q1 – 1.5*IQR
• Upper Fence = Q3 + 1.5*IQR
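
The fence formulas above can be checked numerically. The sketch below computes Q1, Q3, the IQR, and both fences for MSRP in SASHELP.CARS; the dataset and variable names under WORK (MSRP_QUARTILES, MSRP_FENCES, IQR, LOWER_FENCE, UPPER_FENCE) are invented for this example, and it assumes PROC MEANS and the box plot use the same quartile definition.

```sas
/* Compute the first and third quartiles of MSRP */
proc means data=sashelp.cars noprint;
    var msrp;
    output out=work.msrp_quartiles q1=q1 q3=q3;
run;

/* Derive the IQR and the fences from the quartiles */
data work.msrp_fences;
    set work.msrp_quartiles;
    iqr         = q3 - q1;
    lower_fence = q1 - 1.5*iqr;
    upper_fence = q3 + 1.5*iqr;
run;

/* Display the results */
proc print data=work.msrp_fences noobs;
    var q1 q3 iqr lower_fence upper_fence;
run;
```

Any MSRP value below LOWER_FENCE or above UPPER_FENCE would be drawn as an outlier in an ungrouped box plot of MSRP.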

Exercise #6: Horizontal Box Plot
• Goal: Create a horizontal box plot of vehicle price (MSRP) by vehicle type (TYPE).
• Input: SASHELP.CARS
• Syntax:
    • HBOX statement
    • Numeric analysis variable
    • CATEGORY= option

Exercise #6: Horizontal Box Plot
    proc sgplot data=sashelp.cars;
        title "Price by Car Type";
        hbox msrp / category=type;
    run;
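
For comparison, here is a sketch of the vertical counterpart using VBOX, with CATEGORY= combined with GROUP=. Grouping by ORIGIN and the title text are choices made for this illustration and are not part of the exercise.

```sas
proc sgplot data=sashelp.cars;
    title "Price by Car Type and Origin";
    /* One box per TYPE, split into side-by-side boxes by ORIGIN */
    vbox msrp / category=type group=origin;
run;
```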

The VBAR Statement
• Creates a vertical bar chart.
    proc sgplot data=<input-data-set> <options>;
        vbar categorical-variable < / options>;
    run;
• The RESPONSE= option specifies a response variable to control the length of the bars. (Otherwise, bars represent frequency counts.)
• The STAT= option specifies the statistic for the length of the bars. (Default is SUM when a RESPONSE variable is included, FREQ otherwise.)
• The HBAR statement is analogous for horizontal bar charts.

Exercise #7: Vertical Bar Chart
• Goal: Create a vertical bar chart of mean engine size (ENGINESIZE) by vehicle origin (ORIGIN).
• Input: SASHELP.CARS
• Syntax:
    • VBAR statement
    • Categorical variable
    • RESPONSE= option
    • STAT= option

Exercise #7: Vertical Bar Chart
    proc sgplot data=sashelp.cars;
        title "Mean Engine Size by Origin";
        vbar origin / response=enginesize stat=mean;
    run;

Exercise #7: Vertical Bar Chart - BONUS
    proc sgplot data=sashelp.cars;
        title "Mean Engine Size by Origin";
        vbar origin / response=enginesize stat=mean limits=both;
    run;
• The LIMITS= option adds upper limits, lower limits, or both.
• The LIMITSTAT= option specifies the statistic used for the limits (default is confidence limits).
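
To illustrate LIMITSTAT=, the sketch below repeats the bonus chart but requests standard-error limits instead of the default confidence limits. The choice of STDERR and the title text are assumptions made for this example.

```sas
proc sgplot data=sashelp.cars;
    title "Mean Engine Size by Origin (with Standard Error Limits)";
    /* LIMITSTAT=STDERR draws limits based on the standard error of the mean */
    vbar origin / response=enginesize stat=mean
                  limits=both limitstat=stderr;
run;
```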

The GROUP= Option
    proc sgplot data=<input-data-set> <options>;
        vbar categorical-variable / group=variable <more options>;
    run;
• The GROUP= option creates a bar for each distinct value of a grouping variable, within each category.
• Use GROUPDISPLAY= to specify how bars are grouped (CLUSTER or STACK).

Exercise #8: Grouped Vertical Bar Chart
• Goal: Create a vertical bar chart of mean engine size (ENGINESIZE) by vehicle type (TYPE), grouped into clusters by vehicle origin (ORIGIN).
• Input: SASHELP.CARS
• Syntax:
    • VBAR statement
    • Categorical variable
    • RESPONSE= option
    • STAT= option
    • GROUP= option
    • GROUPDISPLAY= option

Exercise #8: Grouped Vertical Bar Chart
    proc sgplot data=sashelp.cars;
        title "Mean Engine Size by Type and Origin";
        vbar type / response=enginesize stat=mean group=origin groupdisplay=cluster;
    run;

Exercise #8: Grouped Vertical Bar Chart - BONUS
    proc sgplot data=sashelp.cars;
        title "Mean Engine Size by Type and Origin";
        vbar type / response=enginesize stat=mean group=origin groupdisplay=stack;
    run;
• To stack the bars, use the GROUPDISPLAY= option with a value of STACK instead of CLUSTER.

The HEATMAP Statement
• Color-codes rectangles based on two-dimensional binning of data.
    proc sgplot data=<input-data-set> <options>;
        heatmap x=variable y=variable < / options>;
    run;
• X and Y are required arguments that specify the variables to plot.
• Options are available to control the size and/or number of bins in each dimension as well as the colors used.

Exercise #9: Heat Map
• Goal: Create a heat map of CHOLESTEROL vs. WEIGHT.
• Input: SASHELP.HEART
• Syntax:
    • HEATMAP statement
    • X= argument
    • Y= argument

Exercise #9: Heat Map
    proc sgplot data=sashelp.heart;
        heatmap x=weight y=cholesterol;
    run;
• Data are grouped into bins in two dimensions.
• Bins are colored according to frequency count.

Exercise #9: Heat Map - BONUS
    proc sgplot data=sashelp.heart;
        heatmap x=weight y=cholesterol / nxbins=50 nybins=50;
    run;
• NXBINS=50 and NYBINS=50 specify 50 bins in the X dimension and 50 bins in the Y dimension (2,500 total rectangles).
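
Bin size and colors can also be set directly rather than through bin counts. The following is a sketch only: the bin widths (10 for WEIGHT, 20 for CHOLESTEROL) and the two-color ramp are arbitrary values picked for this illustration, and it assumes a SAS release in which the HEATMAP statement supports the XBINSIZE=, YBINSIZE=, and COLORMODEL= options.

```sas
proc sgplot data=sashelp.heart;
    /* Fix the bin widths instead of the number of bins,
       and shade bins from white (low count) to red (high count) */
    heatmap x=weight y=cholesterol / xbinsize=10 ybinsize=20
                                     colormodel=(white red);
run;
```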

The VLINE Statement
• Creates a vertical line chart (line is horizontal).
    proc sgplot data=<input-data-set> <options>;
        vline categorical-variable < / options>;
    run;
• VLINE plots statistics; SERIES plots raw data points.
• The RESPONSE= and STAT= options are similar to those of VBAR.
• The HLINE statement is analogous for horizontal line charts.

Exercise #10: Grouped Vertical Line Chart
• Goal: Create a vertical line chart of mean HEIGHT by AGE, grouped by SEX, and include plot markers.
• Input: SASHELP.CLASS
• Syntax:
    • VLINE statement
    • Categorical variable
    • RESPONSE= option
    • STAT= option
    • GROUP= option
    • MARKERS option

Exercise #10: Grouped Vertical Line Chart
    proc sgplot data=sashelp.class;
        title "Height by Age and Sex";
        vline age / response=height stat=mean markers group=sex;
    run;

Exercise #10: Grouped Vertical Line Chart - BONUS
    proc sgplot data=sashelp.class;
        title "Height by Age and Sex";
        vline age / response=height stat=mean markers group=sex limits=both;
    run;
• LIMITS=BOTH adds confidence limits.

The REG Statement
• Fits a regression line or curve.
    proc sgplot data=<input-data-set> <options>;
        reg x=variable y=variable < / options>;
    run;
• X and Y are required arguments that specify the variables to plot.
• Includes both plot markers and line by default.
• Remove markers with the NOMARKERS option.
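
As an illustration of the REG statement, here is a minimal sketch that fits a regression line of WEIGHT on HEIGHT in SASHELP.CLASS. The title text and the addition of the CLM option (confidence limits for the mean predicted values) are choices made for this example rather than part of the slides.

```sas
proc sgplot data=sashelp.class;
    title "Weight vs. Height with Fitted Regression Line";
    /* Markers and the fitted line are both drawn by default;
       CLM adds a confidence band around the line */
    reg x=height y=weight / clm;
run;
```

Adding NOMARKERS after the slash would suppress the data points and leave only the fitted line and band.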