This paper presents design guidelines for creating reliable and effective macro tools in SAS, including macro usage, design choices, and documentation.
Building the Better Macro: Best Practices for the Design of Reliable, Effective Tools
Frank DiIorio
CodeCrafters, Inc.
Philadelphia, PA
Premise and Scope
• Macro language is a powerful adjunct to the SAS System
• Its flexibility is an asset
• Its flexibility is a liability
• Some self-imposed structure is needed
    • To do the "right thing"
    • To simplify coding and debugging
This paper presents a collection of design guidelines. They [1] are not exhaustive and [2] are "arguable" (e.g., "What?! No mention of function-style macros??!!" Well … no).
Reasons to Care (1 of 2)
Your macro:
    %do i = 1 %to &nDatasets.;
        %let dset = %scan(&dsList., &i.);
        %dsetN(data=&dset., count=nobs)
        %if &nobs. > 0 %then %do;
            … ODS, PROC REPORT statements …
        %end;
    %end;
Seems harmless, but it loops, always displaying the first dataset in the list. We trace the problem to %dsetN, which sets &I to 1. Easily fixed (make the scope of I in %dsetN local), but irritating that we have to fix it.
Reasons to Care (2 of 2)
Your program:
    %missVars(data=clin.ae, out=ae_miss, keep=usubjid)
    proc print data=ae_miss;
        title "Variables in CLIN.AE that had all missing values";
    run;
%missVars is supposed to create dataset AE_MISS, which contains a list of the variables in CLIN.AE that had no non-missing values. Let's see what happened …
Reasons to Care (2 of 2)
Before execution:
• Global macro variables DSLIST and DSLIST_N
• WORK datasets REPORT, ATTRIBS
After execution:
• Global macro variables DSLIST, DSLIST_N, and NOBS
• WORK datasets REPORT, ATTRIBS, AE_MISS and _TEMP_
The macro ran correctly, but left a temporary dataset (_TEMP_) and a macro variable (NOBS) in the SAS session. It isn't the end of the world, but it is sloppy coding.
Reasons to Care, In General Any set of Best Practices provides insight into a macro’s design, usage, and implementation. Best Practices will not result in perfection, but will bring you closer to a more modest and achievable design goal: “good design [does not intend] to achieve perfection but to minimize imperfection and render it acceptable, unimportant, negligible, unnoticeable.” Henry Petroski “Small Things Considered”
1. Know When a Macro Is and Is Not Necessary
Macros are helpful and often essential. Even well-designed and well-documented macros can be complex and difficult to maintain. Don't assume the macro language is the only solution. It is just one tool (albeit a real big one) in the SAS programmer's toolbox.
Other tools:
• CALL EXECUTE
• SCL
• DATA step using hashing or arrays
• SQL, to programmatically generate code
• Macro variables, functions used in open code
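As one illustration of the non-macro tools listed above, here is a minimal sketch that uses CALL EXECUTE to print the first rows of every non-empty WORK dataset, a task that might otherwise be written as a macro %DO loop. The use of SASHELP.VTABLE as the driver and the OBS=10 limit are assumptions made for this example, not part of the original slides.

```sas
/* Sketch only: drive PROC PRINT from a list of datasets without a macro. */
/* SASHELP.VTABLE is the view form of DICTIONARY.TABLES.                  */
data _null_;
    set sashelp.vtable(where=(libname = 'WORK' and memtype = 'DATA' and nobs > 0));
    /* Each CALL EXECUTE queues code that runs after this DATA step ends */
    call execute(cats('proc print data=work.', memname, '(obs=10);'));
    call execute(cats('title "WORK.', memname, '";'));
    call execute('run;');
    call execute('title;');   /* clear the title after each report */
run;
```

The DATA step supplies the looping and the conditional logic (the WHERE clause), so no macro variables are created and there is nothing to clean up afterwards.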
2. Conceptually Separate Utilities, Applications
Differences are: scope / granularity; type and amount of error-trapping; development mindset. Size (number of lines, number of statements) is irrelevant.
Utilities:
• Narrow focus
• Name completely describes functionality (countObs, quoteList, etc.)
• Imperative to "play well with others"
Applications:
• Broad scope
• Highly parameterized
• Make liberal use of utilities to reduce code volume, improve reliability
• If standalone/batch program, can have less rigorous cleanup
3. Clearly Document the Macro
Two audiences:
• Users
• Programmers
User documentation can contain:
• Description
• Inputs
• Outputs
• Processing
• Execution
• Limitations on use
• Examples
3. Clearly Document the Macro (cont.)
Sample documentation block:
    /*
    Name:        ISOcharToNum
    Description: Converts a character string containing an ISO 8601 extended
                 date or dateTime variable to a numeric variable
    Type:        ISO
    Inputs:      Parameters (not case-sensitive unless otherwise noted)
                 VAR  Name of the character variable holding the ISO 8601
                      variable. Assume length is 19. REQUIRED. No default.
                 OUT  Name of the output numeric variable. See "Notes"
                      section, below, for details. REQUIRED. No default.
    Outputs:     Variable specified by OUT parameter.
    Execution:   Execute in DATA step
                 Run using SAS V 9.1.3 or later
    Notes:       If length is 16, pad seconds position (17-19) with :00
                 Create OUT only if length of input variable is 10 (indicates
                 complete date) or 19 (indicates complete date-time)
                 Assigns format DATETIME16 to variable specified by the OUT
                 parameter.
                 Creates, then drops, variables beginning with _c2ISO_
    See Also:    ISO, ISOdur
    History:     Programmer  Date        Brief Description
                 FCD         2009-05-26  Initial program
    */
3. Clearly Document the Macro (cont.)
Programmer documentation can contain comments that describe or identify:
• Start of major processing blocks / checkpoints
• Discussions of code that was problematic during development
• Revision codes (addressed at length later)
    %do i = 1 %to &dsetN.;
        %let dataset = %scan(&datasetList., &i.);
        /* Use ANYMISS to identify obs with all missing values */
        %anyMiss(data=&dataset., missing=allMissing)
        %if &allMissing. ^= %then %do;   /* [U03] Add test for ALLMISSING */
            … write to PDF …
            /* Important! PDF hyperlink workaround per track # 5607646 (2007/11/10) */
        %end;
    %end;
4. Use Keyword Parameters Consistently
You always have a choice, but which would you rather code?
    %xpt(ae cm dm, 100, , delete)
    %xpt(data=ae cm dm, delete=yes, split=100)
Keyword parameters:
• Give clues to content
• Relieve the need for consistent order
Use consistent naming and values.
• Inconsistent (DATASETS= v. DATA=, MSG=yes v. MSG=t):
    %createSpec(datasets=ae cm, output=aecmspec.pdf, sortBy=pos, msg=yes)
    %xpt(data=ae cm, msg=t, rpt=aeCMxpt)
• Consistent:
    %createSpec(data=ae cm, output=aecmspec, sortBy=pos, msg=yes)
    %xpt(data=ae cm, msg=yes, rpt=aeCMxpt)
5. Use Consistent Program Structure
Programmers benefit from consistent structure: locations of similar tasks (error-trapping, program cleanup, etc.) become familiar. This helps development, debugging, enhancement.
Generalized sections (importance / need vary with program complexity and size):
• Header documentation (described earlier, in Section 3)
• Initialization
• "Core" processing
• Termination
5. Use Consistent Program Structure (cont.)
Initialization section can include:
• Verify correct execution environment
• Check parameters and resources (datasets, etc.)
• Capture option values that will be re-set
• Define local and global macro variables
Coding technique for capturing multiple errors:
    %local bad;
    %if <condition1> %then %do;
        %let bad = t;
        %put <message1>;
    %end;
    %if <conditionN> %then %do;
        %let bad = t;
        %put <messageN>;
    %end;
    %if &bad. = t %then %do;
        %put Terminating for reason(s) noted above;
        %goto <terminationSection>;
    %end;
5. Use Consistent Program Structure (cont.)
"Core processing" section:
• Non-housekeeping / initialization
• Code containing the stated purpose of the macro
• Either branches to termination section if an error condition arises or drops into the termination section as part of successful execution
• Can / should make liberal use of utility macros
5. Use Consistent Program Structure (cont.)
Termination section is vital! It ensures that no unexpected and unwanted "artifacts" remain once the macro terminates (recall "Reasons to Care" #2).
• Should always be executed (so … no %return; or similar statements)
• Use %symdel, PROC DELETE, other tools to re-set or remove anything created by the macro that wasn't identified in the header comment's "outputs" section.
    /* Initialization section: capture, then change, option values */
    %let opts = %sysfunc(getoption(center))
                %sysfunc(getoption(date))
                ;
    options nocenter nodate;

    /* Termination section: restore options, remove artifacts */
    %bottom:
    options &opts.;
    %if %symExist(t1) %then %symdel t1;
    %if %sysFunc(exist(part1)) %then %do;
        proc delete data=part1;
        run;
    %end;
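The initialization, core, and termination sections described above can be sketched together in one small macro. This is an illustrative skeleton only: the macro name %skeleton, its DATA= and OUT= parameters, and the temporary dataset _skel_tmp_ are hypothetical names chosen for the example.

```sas
%macro skeleton(data=, out=);
    /* ---------- Initialization ---------- */
    %local opts bad;
    /* Capture option values that will be changed */
    %let opts = %sysfunc(getoption(center)) %sysfunc(getoption(date));
    options nocenter nodate;

    /* Check every parameter so all problems are reported in one run */
    %if %length(&data.) = 0 %then %do;
        %let bad = t;
        %put skeleton-> DATA cannot be null;
    %end;
    %else %if not %sysfunc(exist(&data.)) %then %do;
        %let bad = t;
        %put skeleton-> Dataset [&data.] not found;
    %end;
    %if %length(&out.) = 0 %then %do;
        %let bad = t;
        %put skeleton-> OUT cannot be null;
    %end;
    %if &bad. = t %then %do;
        %put skeleton-> Terminating for reason(s) noted above;
        %goto bottom;
    %end;

    /* ---------- Core processing ---------- */
    proc sort data=&data. out=_skel_tmp_;   /* hypothetical temporary dataset */
        by _all_;
    run;
    data &out.;
        set _skel_tmp_;
    run;

    /* ---------- Termination: reached on success and on error ---------- */
    %bottom:
    %if %sysfunc(exist(_skel_tmp_)) %then %do;
        proc delete data=_skel_tmp_;
        run;
    %end;
    options &opts.;
%mend skeleton;
```

Because the error path uses %goto rather than %return, the options are restored and the temporary dataset is removed no matter how the macro ends.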
6. Build Means for User, Program Communication
The macro may be a "black box," but it shouldn't be mysterious – communication with the user and other programs is essential.
Communicating with the User
Display: check points, parameters, anything that communicates what's happening during processing. Remember the programmer is a user as well: diagnostics can include debugging-level information (name-value pairs for macro variables, list of datasets present when the program fails, etc.)
6. Build Means for User, Program Communication
Communicating with the User
You can toggle non-essential (i.e., non-error/warning) messages as follows:
    %local prt;
    %if %upcase(&msg.) = NO %then %let prt = *;
    %&prt.put test-> Dataset contains &dsnCount. obs.;
    data _null_;
        set rpt;
        &prt.put dsn= vname= status=;
    run;
Run the macro with MSG=no: &prt = *, so SAS executes:
    %*put test-> Dataset contains &dsnCount. obs.;
Run the macro with MSG^=no: &prt is null, so SAS executes:
    %put test-> Dataset contains &dsnCount. obs.;
6. Build Means for User, Program Communication
Communicating with the User
Some messages may be unconditional. Others can be timed:
    %if %sysfunc(juldate("&sysdate9."d)) < %sysfunc(juldate("01apr2012"d)) %then %do;
        %put;
        %put Version 4.00 effective March 1, 2012;
        %put;
        %put Changes from Version 3.x:;
        %put list of changes goes here;
        %put See documentation location for details;
        %put;
    %end;
6. Build Means for User, Program Communication
Communicating with the User
One of my favorites … take the time to make messages informative. Would you rather see this in your Log:
    2 dsns
    AE
    00021 aeondt
    DM
or this?
    After filtering, process 2 eligible data sets:
    1 of 2: AE
    USUBJID 00021 had missing data for required variable AEONDT
    2 of 2: DM
Messages to the Log "tell the story" of the macro's execution. Take the time to make it a readable story.
6. Build Means for User, Program Communication
Communicating with other Programs
Give the utility macro the chance to communicate not just what it did, but how well it was done – i.e., return codes.
One implementation method: global macro variable = 0 if successful, otherwise indicates warning / error conditions. The values are described in the program header:
    Return Codes:
    0 = Data set located, observations counted successfully
    1 = Data set located, but could not count observations
    2 = Data set could not be located
    3 = Parameter errors or incorrect calling environment
Usage could be implemented in the calling program as:
    %obsCount(data=mast.rand, rc=randRC)
    %if &randRC. ^= 0 %then %do;
        <error-handling code goes here>
    %end;
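The slides show only the header and the call to %obsCount, not its body. The following is one possible sketch of a macro that sets those return codes, assuming a COUNT= parameter for the result and default variable names _count_ and _rc_ (all three are assumptions for illustration). It covers only the parameter-error part of code 3, not the calling-environment check.

```sas
%macro obsCount(data=, count=_count_, rc=_rc_);
    %local dsid;
    /* Create the result variables as globals unless the caller already has them */
    %if not %symexist(&rc.)    %then %global &rc.;
    %if not %symexist(&count.) %then %global &count.;
    %let &count. = ;

    %if %length(&data.) = 0 %then %let &rc. = 3;            /* parameter error      */
    %else %if not %sysfunc(exist(&data.)) %then %let &rc. = 2;  /* not located      */
    %else %do;
        %let dsid = %sysfunc(open(&data.));
        %if &dsid. = 0 %then %let &rc. = 1;                 /* located, not opened  */
        %else %do;
            /* NLOBS returns -1 when the count is not available */
            %let &count. = %sysfunc(attrn(&dsid., nlobs));
            %let dsid = %sysfunc(close(&dsid.));
            %if &&&count. < 0 %then %let &rc. = 1;
            %else %let &rc. = 0;
        %end;
    %end;
%mend obsCount;
```

The %symexist tests matter when the caller is itself a macro that has declared the return-code variable %local: issuing %global for a name that already exists in a local symbol table is an error, so the sketch simply assigns to the existing variable in that case.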
7. Control Macro Variable Scope
Recall Scenario 1 – a macro variable was defined in a macro, then overwritten in a called macro. The error was due to lack of attention to the macro variable's scope – local v. global. Let's look at another example:
    %macro outer(list=);
        %let upper = %sysfunc(countw(&list., %str( )));
        %do i = 1 %to &upper.;            /* i = 1, 2, 3, … */
            %let token = %scan(&list., &i.);
            %print(data=&token.)
        %end;                             /* i now replaced by value in %print */
    %mend;
    %macro print(data=);
        %let i = %index(&data., .);       /* 0 <= i <= 9 */
        %if &i. = 0 %then %let data = work.&data.;
        … other statements not shown …
        proc print data=&data.;
            title "&data.";
        run;
    %mend;
7. Control Macro Variable Scope (cont.)
Solution 1: Make local copies of I – use the %local statement
    %macro outer(list=);
        %local i;
        %let upper = %sysfunc(countw(&list., %str( )));
        %do i = 1 %to &upper.;            /* i = 1, 2, 3, … */
            %let token = %scan(&list., &i.);
            %print(data=&token.)
        %end;                             /* i unaffected by activity in %print */
    %mend;
    %macro print(data=);
        %local i;
        %let i = %index(&data., .);       /* 0 <= i <= 9 */
        %if &i. = 0 %then %let data = work.&data.;
        … other statements not shown …
        proc print data=&data.;
            title "&data.";
        run;
    %mend;
7. Control Macro Variable Scope (cont.)
Solution 2: Choose Unique Variable Names
    %macro outer(list=);
        %local OUTi;
        %let upper = %sysfunc(countw(&list., %str( )));
        %do OUTi = 1 %to &upper.;         /* OUTi = 1, 2, … */
            %let token = %scan(&list., &OUTi.);
            %print(data=&token.)
        %end;
    %mend;
    %macro print(data=);
        %local PRi;
        %let PRi = %index(&data., .);     /* 0 <= PRi <= 9 */
        %if &PRi. = 0 %then %let data = work.&data.;
        … other statements not shown …
        proc print data=&data.;
            title "&data.";
        run;
    %mend;
Naming conventions + %local = no surprises
8. Implement Diagnostic & Debugging Code
Fixes and enhancements are easier when you build in debugging parameters and use revision codes.
Debugging parameters
Allow display of extra messages and/or procedure output. Useful during development and when the macro misbehaves.
    %macro rpt(debug=0, other parameters);
        %if &debug. > 0 %then %do;
            %put Global macro variables at start of execution:;
            %put _global_;
        %end;
        %if &debug. >= 2 %then %do;
            proc freq data=_TMP_2;
                tables grp1-grp&n.;
                title "Grouping variables from transposed master data";
            run;
        %end;
8. Implement Diagnostic & Debugging Code
Revision Codes
Maintenance and enhancements need to be tracked. Revision codes are a way to do this. They identify the nature of the change and where it was made. Consider the "revision codeless" approach in this header comment:
    History: 2007/11/23 FCD Initial version in autocall library
             2007/12/22 FCD Add handling for empty datasets
8. Implement Diagnostic & Debugging Code
Revision Codes (cont.)
Nice try, but where, exactly, was the change made? Do two things. First, add a revision code number to the comment:
    History: 2007/11/23 FCD Initial version in autocall library
             2007/12/22 FCD [U01] Add handling for empty datasets
Second, add comments in the program where we made the changes:
    %dsetCount(data=pass1, count=np1)
    %if &np1. < 1 %then %do;   /* [U01] */
        %put First filtered pass resulted in an empty data set;
        %put Execution terminating.;
        %goto term;
    %end;                      /* [U01] end */
9. Use Built-In Macro Tools
Don't reinvent the macro "wheel". Be aware of:
• Automatic macro variables
• Autocall libraries
• Macro functions
• Macro-related options
9. Use Built-In Macro Tools (cont.)
Automatic macro variables
Reduce code volume:
    /* Without automatic variables */
    data _null_;
        x = today();
        call symput('date', put(x, date9.));
    run;
    footnote "Run date: &date.";

    /* With an automatic variable */
    footnote "Run date: &sysdate9.";
Make the difficult possible:
    %macro iso(date=, out=);
        … statements not shown …
        array dtParts&sysindex.(6);
        … statements not shown …
    %mend;
    data ae;
        set clin.ae;
        %iso(date=onset)
        %iso(date=term)
    run;
9. Use Built-In Macro Tools (cont.)
Autocall Libraries
Eliminate clumsy %INCLUDEs, and provide a clean, powerful way to make macro libraries available.
    options sasautos=('path1', 'path2', sasautos) mautosource mrecall;
Functions
As in the rest of SAS (or any) software, functions reduce code volume, and are more efficient than manually-coded solutions. Use autocall'ed functions and those available via %SYSFUNC. For example:
    %if %sysfunc(exist(master.DM)) %then %do;
        … DATA step, REPORT …
    %end;
    %else %put Dataset MASTER.DM not found;
9. Use Built-In Macro Tools (cont.)
Macro-related options
Especially useful for debugging (MPRINT, MAUTOLOCDISPLAY, etc.)
Can be used as part of a macro's debugging option:
    %macro rpt(debug=0, other parameters);
        %local opts;
        %if &debug. > 0 %then %do;
            %let opts = options %sysfunc(getoption(mprint))
                                %sysfunc(getoption(mautolocdisplay))
                                ;
            options mprint mautolocdisplay;
            … other DEBUG > 0 actions …
        %end;
        /* termination section: re-set */
        &opts.;
10. Build the Other Tools You Need
Lots of options, lots of functions, but not many built-in diagnostic tools. So build your own …
Consider the output of %put _global_; : it has no obvious or coherent order.
Write a short macro, %printMacvars, that uses Dictionary Tables and PUT statements to clearly and logically display the variables. Code (bare-bones, without error-checking or comments) begins on next slide.
10. Build the Other Tools You Need (cont.)
    %macro printMacvars;
        %local _opts;
        %let _opts = %sysfunc(getoption(mprint)) %sysfunc(getoption(notes));
        options nomprint nonotes;
        proc sql noprint;
            create table _macvars_ as
                select *
                from dictionary.macros
                where offset=0 and scope='GLOBAL'
                order by name;
        quit;
        %if &SQLobs. = 0 %then %do;
            %put printMacvars-> No global macro variables;
            %goto bottom;
        %end;
        data _null_;
            set _macvars_ end=eof;
            if _n_ = 1 then put / 'Macro Variable' @34 'First 50 Characters'
                                / 32*'=' +1 50*'=' ;
            put name $33. value $char50.;
            if eof then put 32*'=' +1 50*'='
                            / "# of vars = " _n_
                            / 83*'=' ;
        run;
        %bottom:
        proc delete data=_macvars_;
        run;
        options &_opts.;
    %mend printMacvars;
10. Build the Other Tools You Need (cont.)
Run the macro and get the following output in the SAS Log:
    Macro Variable                   First 50 Characters
    ================================ ============================================
    GLOBAL1                          G1
    NYLIST                           NO YES
    NYLISTN                          2
    NYLISTQ                          "NO" "YES"
    TESTMACVAR                       tmv
    ================================ ============================================
    # of vars = 5
    =============================================================================
Not compelling stuff when we have only five variables, but consider the benefits when you have many!
11. Adopt the Software Development Mindset
Macro-based utilities and applications expand in scope over time: add new parameters, add new users, react to new data (variables, datasets) and different storage (Oracle v. MDB). You've become a software developer.
Among the many aspects of this mindset:
• Separation of Development and Production code
• Validation
• Versioning
12. Know When It’s Time For A Rewrite Sometimes, especially with legacy code, you end up feeling like this: Your best option may be to rewrite the macro. Let’s look at some of the rewrite scenarios ...
12. Know When It’s Time For A Rewrite (cont.)
Signs that it’s time for a rewrite:
• Persistent cussing
• Proliferation of temporary (internal) macros
• Use of deeply-nested variable references (e.g., &&&&temp)
• Excessive size relative to functionality
• Lack of calls to utility macros
• Unreasonable/increasing number of hard codes
• High enhancement coding effort relative to desired functionality
• Feelings of despair and dread when the macro opens in your text editor
An Extended Example (1 of 7)
%attrDiff – identify like-named variables with conflicting attributes.
    /* attrDiff
    Function: Identify conflicting attributes of like-named variables
              in a library
    Input:    Parameters (not case-sensitive)
              LIB      LIBNAME of library to examine.
                       REQUIRED. No default.
              COMPARE  Attributes to compare. Specify any or all of these
                       variable attributes:
                           T - type
                           S - length
                           L - label
                       OPTIONAL. Default is tsl
              OUT      One or two-level name of output dataset. Cannot be in
                       same library as specified by LIB parameter.
                       REQUIRED. No default.
              MSG      Write messages to Log? YES or NO
                       OPTIONAL. Default=YES
Brief description of what the macro does.
For each parameter: description, permitted values, and defaults.
An Extended Example (2 of 7)
    Output:   Dataset specified by OUT parameter. Sort order is upper-cased
              name, dataSet. Variables:
                  dataSet    $ 32  Dataset name
                  name       $ 32  Variable name (upper-cased)
                  type       $ 4   Type (if COMPARE contained t)
                  typeFlag   8     Differing TYPE (0 or 1)
                  length     8     Length (if COMPARE contained s)
                  lengthFlag 8     Differing LENGTH (0 or 1)
                  label      $255  Label (if COMPARE contained l)
                  labelFlag  8     Differing LABEL (0 or 1)
              The dataset is created with 0 observations if there are no
              attribute conflicts. The dataset is NOT created if there are
              parameter errors.
    Execution: Run in open code
    Example:  %attrDiff(lib=clinical, compare=ts, out=probs)
              Compare type and length for datasets in library CLINICAL.
              Write output to dataset WORK.PROBS
    History:  2007-10-08 FCD Initial program
    */
Complete description of output dataset variables, sort order.
Saying Open Code means we have to test for it (see next slide).
Revision history (including revision codes) will go here.
An Extended Example (3 of 7)
    %macro attrDiff(lib=, compare=tsl, msg=yes, out=);
        /* ---------- Be sure we're running in open code ---------- */
        %if &sysprocname. NE %then %do;
            %put attrDiff-> Must run in open code. Execution terminating.;
            %goto lastStmt; /* <<<< <<< << < <<<< <<< << < <<<< <<< << < */
        %end;
        /* ---------- Housekeeping and initial messages ---------- */
        %local opts star;
        %let opts = %sysfunc(getoption(mprint)) %sysfunc(getoption(notes));
        %if %upcase(&msg.) = NO %then %let star = *;
        options nomprint nonotes;
        %&star.put;
        %&star.put attrDiff-> Begin. Examine library [&lib.] compare [&compare.] create [&out.];
        /* ---------- Upper case some parameters ---------- */
        %let lib     = %upcase(&lib.);
        %let compare = %upcase(&compare.);
        %let msg     = %upcase(&msg.);
Keyword parameters!
If not in open code, write message and terminate.
&STAR is either null or *. This influences behavior of %&star.put throughout the program (toggles messages).
Initialization: %local variables, get/change option values, write message to Log.
An Extended Example (4 of 7)
        /* ---------- Check for parameter errors ---------- */
        %local ok outLib;
        %if &lib. = %then %do;
            %let ok = f;
            %put attrDiff-> LIB cannot be null;
        %end;
        %else %if %sysfunc(libref(&lib.)) ^= 0 %then %do;
            %let ok = f;
            %put attrDiff-> Input LIBNAME [&lib.] not found.;
        %end;
        %if &out. = %then %do;
            %let ok = f;
            %put attrDiff-> OUT cannot be null;
        %end;
        %else %do;
            %if %index(&out., .) %then %let outLIB = %upcase(%scan(&out., 1, .));
            %else %let outLIB = WORK;
            %if &outLIB. = &lib. %then %do;
                %let ok = f;
                %put attrDiff-> OUT and LIB libraries cannot be identical;
            %end;
            %else %if %sysfunc(libref(&outLIB.)) ^= 0 %then %do;
                %let ok = f;
                %put attrDiff-> Output LIBNAME [&outLIB.] not found.;
            %end;
        %end;
Set local variable OK to f if any error condition is true.
An Extended Example (5 of 7)
        %if &compare. = %then %do;
            %let ok = f;
            %put attrDiff-> COMPARE cannot be null;
        %end;
        %else %if %sysfunc(verify(&compare., TSL)) > 0 %then %do;
            %let ok = f;
            %put attrDiff-> COMPARE can only have T S or L;
        %end;
        %if %sysfunc(indexW(NO YES, &msg.)) = 0 %then %do;
            %let ok = f;
            %put attrDiff-> MSG is only YES or NO. Found [&msg.];
        %end;
        /* If anything was amiss, print a message and branch to bottom */
        %if &ok. = f %then %do;
            %put attrDiff-> Execution terminating due to error(s) noted above;
            %put attrDiff-> Output dataset [&out.] will NOT be created;
            %goto bottom; /* <<<< <<< << < <<<< <<< << < <<<< <<< << < */
        %end;
        /* ---------- Create SQL statement fragments based on COMPARE value */
        %local sumOps tf sf lf;
        %if %index(&compare., T) %then %do;
            %let tf = type, count(distinct type) > 1 as typeFlag, ;
            %let sumOps = , typeFlag;
        %end;
If any errors, print message and go to termination section.
Based on COMPARE parameter value, build pieces of SQL statement used to select obs from dictionary table.
An Extended Example (6 of 7)
        %if %index(&compare., S) %then %do;
            %let sf = length, count(distinct length) > 1 as lengthFlag, ;
            %let sumOps = &sumOps., lengthFlag;
        %end;
        %if %index(&compare., L) %then %do;
            %let lf = label, (count(distinct label) > 1 |
                             (count(distinct label) = 1 &
                              sum(missing(label) > 0))) as labelFlag;
            %let sumOps = &sumOps., labelFlag;
        %end;
        /* ---------- Build the dataset ---------- */
        proc sql noprint;
            create table &out. as
                select &tf. &sf. &lf.,
                       upcase(name) as name, memname as dataSet
                from dictionary.columns
                where catt(libname, memType) = "&lib.DATA"
                group by name
                having sum(0 &sumOps.) > 0
                order by name, dataSet
                ;
            %&star.put attrDiff-> &SQLobs. variables with mismatches.;
        quit;
Based on COMPARE parameter value, build pieces of SQL statement used to select obs from dictionary table.
Create the output dataset and report how many variables were problematic.
An Extended Example (7 of 7)
        %bottom:
        %&star.put attrDiff-> Done.;
        %&star.put;
        /* ---------- Revert to original MPRINT and NOTES values */
        options &opts.;
        %lastStmt:
    %mend attrDiff;
Termination section may produce a message and will always reset options altered at top of the program.
Thanks for Coming! Address questions and comments to: Frank DiIorio Frank@CodeCraftersInc.com This paper is available at www.CodeCraftersInc.com