400 likes | 559 Views
Creating a Compact Columnar Output with PROC REPORT. Walter R. Young Principal Clinical Programmer Analyst Wyeth. Why create a columnar output from a data set. For own use. To show others at a meeting. To put in a standardized regulatory report. To put in a publication.
E N D
Creating a Compact Columnar Output with PROC REPORT Walter R. Young Principal Clinical Programmer Analyst Wyeth
Why create a columnar output from a data set. • For own use. • To show others at a meeting. • To put in a standardized regulatory report. • To put in a publication. The above are ranked in approximately increasing effort required to create the report. Author’sOpinion: It is the neatest, most heuristic way to present a data set.
Solution1: Default PROC PRINT Advantages: • Easy and Neat for Narrow Data Set • Can Use ID and VAR Statements. Disadvantages: • Virtually No Beautification Options • Doesn’t Work for a Wide Data Set • Wraps Neatly but One Can’t Control Wrapping with the exception of increasing the page size. • Wide Columns are Truncated.
Solution2: Use a Default PROC REPORT with NOWD (Required in Batch) Option Advantages • Output is columnar. • Small number of variables fit on a page. Disadvantages • Rows are not identified for multiple pages. • Spacing between columns is uneven. • Column labels split unattractively. • Column order isn’t optimum. • Wide columns cause program to bomb.
PROC REPORT Defaults and Properties • PROC was designed to run interactively. • Labels are used as column headers. • Variables are in position order. • Spacing is 2 including before the first column. Actual spacing rules are explainable but messy. • WRAP with NAMED options create messy output. • If all variables are numeric and none are specified as DISPLAY, they are summed instead of listed. • MISSING option is needed to print all data rows. • If a variable name not in the data set, is listed in both the COLUMN and DEFINE statements, no error will result.
PROC REPORT Justification Rules • Default is right for numeric - left for character. • Numerical values are right justified within formats which are justified within the width. • Character values are justified in width with leading blanks retained and trailing blanks eliminated. • Rules apply simultaneously to labels and values.
PROC REPORT SPLIT Character • The SPLIT character splits both labels and FLOW variables. • Choose a printable character(e.g., “~”, “|”, “\”) other than the default “/” which is common in many entered texts. • If a word’s length in a flowed variable is greater than the variable’s width, the word will split at that width. • To indent flowed text, insert a split character plus the desired number of spaces and one at the end of the text. • If there is a split character in the flowed variable, words at the end of the field will split randomly due to a SAS bug which will be fixed in a future SAS version. To fix this one either widens the field to eliminate non-indented flow or writes a macro to insert split characters where desired. • If unprintable printer control characters exist in the flowed variable, they must be removed. This is an uncommon problem which can happen if data is coming from many sources.
Solution3: Use ID Statement (introduced in 6.12) With PROC REPORT • Observations are identified. • However, All Other Default Problems Exist. • Won’t Work if the Width of any Variable • Exceeds the Inherent PROC REPORT limit. • Plus the Width of the ID variables Plus the Spaces Between Columns Exceeds Line Size. (In this case the FLOW option must be used.)
Solution4: PROC REPORT with Minimal Options for a Narrow Data Set • Must Use a COLUMN Statement (Analogous to a VAR statement in PROC PRINT). • Use a BREAK Statement for spaces between lines. • Use HEADLINE, HEADSKIP, BREAK Statement or underline to separate labels from the observations. • Specify a Constant Spacing Between Columns. • Customize Labels in the DEFINE statements or Use Variable Names (System NOLABEL option). • Possibly use PANEL option to minimize paper use. • The above gives you most of the features of using a PUT statement formatting (DATA _NULL_).
Solution5: Use PROC REPORT With All Applicable Options for a Wide Data Set If the width of a data set won’t fit within the line size, one should make the output compact.In a compact output, the maximum number of observations of the variables should be made to either fit on the width of a single page or on the width of a minimum number of pages.Author’sOpinion: Presenting the data in columns on a single page width, neatly and informatively, is more heuristic than presenting it on multiple pages.
Tomakethereportcompact • Make every reasonable effort to limit width to a single page. • Reduce the space between columns to one. • Drop space before 1st column (SPACING=0 in DEFINE). • Drop unnecessary variables from COLUMN statement. • Drop variables having the same value for all observations and consider putting them in a title, footnote or legend. • Sort the data by sensible variables having a fair number of rows for each combination in the BY statement and use the BY in PROC REPORT. • For data sets wider than a single page, pick the minimum # of ID variables to adequately identify all observations. Balance the width of the non-ID variables across pages. • Use PROC FREQ to determine whether long variables can be coded and describe the code in a legend.
Tomakethereportcompact • Don’t use the FLOW option unless necessary as it increases the number of lines per observation. Also, consider not using the SPLIT character in its label. • Don’t alter any variable if proofreading. • Eliminate variables which have a one to one relationship with other variables. • Sensibly condense character variables. • Edit variables without eliminating their meaning. • Transfer meaning from a variable to its label. • Since formats can alter variable widths, apply them prior to calculating column widths. • Use the STYLE attribute, some of the 6 font parameters and ODS. While good for publications, this doesn’t support a standardized line size and appearance.
Foralphanumericvariables • Determine their maximum width in the data set. • If a format increases this width, use that width. • Consider removing any invariant prefixes or suffixes • If the FLOW parameter is required, consider the line size constraint, calculate the width plus spacing of all other variables and: • For a single FLOW variable, use its maximum width. • For multiple FLOW variables, determine how to best allocate their widths to minimize lines per observation. • See if other data can be put on the added line(s) per observation (e.g., concatenate visit date, SPLIT character and visit name and use the FLOW option).
Fornumericvariables • Determine their range, maximum value and whether they’re integer and then specify an appropriate format (not the default BEST) and decimal point. • For date time variables, specify an appropriate compact format (e.g., MMDDYY6.). Separate date and time with DATEPART. If time is missing for all observations remove it from the report. • If it has a format which transforms it into an alphanumeric variable, apply the format and treat it as though it were an alphanumeric variable.
Toincreasetheattractivenessofacompactreport • Appropriately order the COLUMN statement variables. • Appropriately specify ORDER variables and the BREAK statement (e.g., blank line between ID variables). • Use informative labels neatly spanned in the COLUMN statement and appropriately split in the DEFINE. • Use tricks (e.g., unprintable character at end of label, SPLIT character and blank at beginning of label, spacing =0) to separately justify labels and values. • Use neat and informative titles, footnotes and/or legends. If necessary, expand a label’s meaning in a legend.
General ECGTEST Compact Report • Decide what variables to always exclude. • Decide constant variables for title. • Decide what 1 to 1 variables to include. • Count and remove applicable leading zeroes. • Determine which data condensing tricks work. • If not condensable, they must be output in full • Predetermine variables that need FLOW. • Determine which variables have a fixed width. • Calculate width of all remaining variables. • Use BY variables and the ID option for date.
Changes Made to Produce Final Listing • Width of all variables was minimized. • Leading zeroes were stripped from subject; it was concatenated with age sex and FLOW added. • Century was eliminated and date was output after subject and made an ORDER variable. • Body system was coded into footnotes. • Verbatim label indentation was corrected. • Labels were beautified. “STUDY DAY” was centered. • Width of indented column was maximized to eliminate FLOW of the concatenated variables. • DAI was put in data set and FLOW added. The above reduced the output from 21 to 11 pages.
Creating an Automated AE Listing • Find variables needed for other projects and: • Find if their width is variable and calculate it. • Make attractive labels with SPLIT characters. • Exclude them if they are blank (e.g., time). • Use minimum possible width. • 2 lines per observation: Thus use FLOW? • Maximize width of the verbatim variable. • User should specify variables and their order. • Change footnotes to an automated legend. • Add options for the BY variables.