500 likes | 716 Views
SAS Programming Techniques for Decoding Variables on the Database Level By Chris Speck PAREXEL International RTSUG – Wednesday, March 23, 2011. Think “Library” Not Just “Dataset”. Libraries can be viewed as discrete units.
E N D
SAS Programming Techniques for Decoding Variables on the Database Level By Chris Speck PAREXEL International RTSUG – Wednesday, March 23, 2011
Think “Library” Not Just “Dataset” Libraries can be viewed as discrete units. May require a change in perspective for the beginning and intermediate programmer. Crucial now that programmers are under greater pressure to apply regulatory standards to clinical research data. Programmers must now deal with metadata on the library level, which can be especially difficult with legacy data. One tool for this is the Meta-Engine
The Set-Up You’re given a library of 55 datasets of varying quality from 2001 which you must convert to submission ready data. Many datasets have variables with different kinds of non-native SAS formats. • New variables equivalent to the decode of these formatted variables must be made. • All non-native formats need to be stripped. What are you going to do?
The Set-Up One solution is to make one program per dataset. Another is to create one massive program that updates datasets one at a time. Some of the obvious flaws to this approach include • Specific only to one project • Involves much unnecessary rework • Disorganized • Difficult to debug
What is the Meta-Engine? SAS program that manipulates metadata of entire libraries. • Portable • Streamlined • Easy to understand and debug Loops through a library one dataset at a time to make quick and uniform changes. Relies much on the • SAS macro facility • Dictionary tables • Proc format
Meta-Engine Overview Decode( ) Macro Input: A WORK dataset with a formatted variable Output: A WORK dataset with the decode of this variable Meta-Engine Structure
Meta-Engine Overview Decode( ) Macro Input: A WORK dataset with a formatted variable Output: A WORK dataset with the decode of this variable %DO Looping Mechanism Uses PROC SQL and Dictionary Tables. Meta-Engine Structure %DO Looping Mechanism
Meta-Engine Overview Decode( ) Macro Input: A WORK dataset with a formatted variable Output: A WORK dataset with the decode of this variable %DO Looping Mechanism Uses PROC SQL and Dictionary Tables. Meta-Engine Structure Code to Adjust Dataset labels %DO Looping Mechanism
Meta-Engine Overview Decode( ) Macro Input: A WORK dataset with a formatted variable Output: A WORK dataset with the decode of this variable %DO Looping Mechanism Uses PROC SQL and Dictionary Tables. Meta-Engine Structure Code to Adjust Dataset labels Code Calling the Decode() macro Does this once for every variable needing decoding. %DO Looping Mechanism
Meta-Engine Overview Decode( ) Macro Input: A WORK dataset with a formatted variable Output: A WORK dataset with the decode of this variable %DO Looping Mechanism Uses PROC SQL and Dictionary Tables. Meta-Engine Structure Code to Adjust Dataset labels Code Calling the Decode() macro Does this once for every variable needing decoding. Code to make further changes %DO Looping Mechanism
Meta-Engine Overview • The Meta-Engine macro asks for two library names: The one that contains the existing database, and the one that will contain the corrected, submission ready database. • For example: %macro MetaEngine(lib=, outlib=); %mend MetaEngine; %MetaEngine(lib=MYLIB, outlib=MYNEWLIB); << All of programming performed >>
%DO Looping Mechanism proc sql noprint; select memname into :dsnames separated by '~' from dictionary.columns where libname="&LIB" and varnum=1; quit;
%DO Looping Mechanism proc sql noprint; select memname into :dsnames separated by '~' from dictionary.columns where libname="&LIB" and varnum=1; quit; Using Dictionary Tables to produce a list of datasets in the library separated by tildes (~)
%DO Looping Mechanism proc sql noprint; select memname into :dsnames separated by '~' from dictionary.columns where libname="&LIB" and varnum=1; quit; %let i = %eval(1); %do %while (%scan(&dsnames,&i,~) ne ); %let thisds =%scan(&dsnames,%eval(&i),~); %let i = %eval(&i+1); %end; Using Dictionary Tables to produce a list of datasets in the library separated by tildes (~)
%DO Looping Mechanism proc sql noprint; select memname into :dsnames separated by '~' from dictionary.columns where libname="&LIB" and varnum=1; quit; %let i = %eval(1); %do %while (%scan(&dsnames,&i,~) ne ); %let thisds =%scan(&dsnames,%eval(&i),~); %let i = %eval(&i+1); %end; Using Dictionary Tables to produce a list of datasets in the library separated by tildes (~) Macro variable THISDS will represent a dataset name for every loop iteration
%DO Looping Mechanism proc sql noprint; select memname into :dsnames separated by '~' from dictionary.columns where libname="&LIB" and varnum=1; quit; %let i = %eval(1); %do %while (%scan(&dsnames,&i,~) ne ); %let thisds =%scan(&dsnames,%eval(&i),~); %let i = %eval(&i+1); %end; Using Dictionary Tables to produce a list of datasets in the library separated by tildes (~) Macro variable THISDS will represent a dataset name for every loop iteration Loops till you run out of tildes
%DO Looping Mechanism proc sql noprint; select memname into :dsnames separated by '~' from dictionary.columns where libname="&LIB" and varnum=1; quit; %let i = %eval(1); %do %while (%scan(&dsnames,&i,~) ne ); %let thisds =%scan(&dsnames,%eval(&i),~); %let i = %eval(&i+1); %end; Using Dictionary Tables to produce a list of datasets in the library separated by tildes (~) << Bulk of programming >> Macro variable THISDS will represent a dataset name for every loop iteration Loops till you run out of tildes
Adjusting Dataset Labels %let dslabel=; proc sql noprint; select memlabel into :dslabel from dictionary.tables where libname="&lib" and memname="&thisds"; quit; %let dslabel=%trim(&dslabel);
Adjusting Dataset Labels Data Steps won’t save them %let dslabel=; proc sql noprint; select memlabel into :dslabel from dictionary.tables where libname="&lib" and memname="&thisds"; quit; %let dslabel=%trim(&dslabel);
Adjusting Dataset Labels Data Steps won’t save them %let dslabel=; proc sql noprint; select memlabel into :dslabel from dictionary.tables where libname="&lib" and memname="&thisds"; quit; %let dslabel=%trim(&dslabel); Resetting label macro variable before each loop iteration
Adjusting Dataset Labels Data Steps won’t save them %let dslabel=; proc sql noprint; select memlabel into :dslabel from dictionary.tables where libname="&lib" and memname="&thisds"; quit; %let dslabel=%trim(&dslabel); Resetting label macro variable before each loop iteration Dictionary Table assigns label of THISDS to macro variable DSLABEL. Used to assign dataset label to final dataset.
Adjusting Dataset Labels Data Steps won’t save them %let dslabel=; proc sql noprint; select memlabel into :dslabel from dictionary.tables where libname="&lib" and memname="&thisds"; quit; %let dslabel=%trim(&dslabel); %if &thisds=DATA1 %then %let dslabel=Label for DATA1; %else %if &thisds=DATA2 %then %let dslabel=Label for DATA2; Resetting label macro variable before each loop iteration Dictionary Table assigns label of THISDS to macro variable DSLABEL. Used to assign dataset label to final dataset. In case you want to manually adjust dataset labels (not in paper)
Calling the Decode Macro data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end;
Calling the Decode Macro Base dataset equal to THISDS data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end;
Calling the Decode Macro Base dataset equal to THISDS data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end; Will equal list of all decoded variables for later processing
Calling the Decode Macro Base dataset equal to THISDS data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end; Will equal list of all decoded variables for later processing DECODE MACRO Parameters:
Calling the Decode Macro Base dataset equal to THISDS data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end; Will equal list of all decoded variables for later processing DECODE MACRO Parameters: Format library
Calling the Decode Macro Base dataset equal to THISDS data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end; Will equal list of all decoded variables for later processing DECODE MACRO Parameters: Format library Input dataset
Calling the Decode Macro Base dataset equal to THISDS data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end; Will equal list of all decoded variables for later processing DECODE MACRO Parameters: Format library Input dataset Output dataset (with 1 new decode variable)
Calling the Decode Macro Base dataset equal to THISDS data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end; Will equal list of all decoded variables for later processing DECODE MACRO Parameters: Format library Input dataset Output dataset (with 1 new decode variable) Variable to be decoded
Calling the Decode Macro Base dataset equal to THISDS data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end; Will equal list of all decoded variables for later processing DECODE MACRO Parameters: Format library Input dataset Output dataset (with 1 new decode variable) Variable to be decoded Decode variable name
Calling the Decode Macro Base dataset equal to THISDS data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end; Will equal list of all decoded variables for later processing DECODE MACRO Parameters: Format library Input dataset Output dataset (with 1 new decode variable) Variable to be decoded Decode variable name Builds decode list
Calling the Decode Macro Base dataset equal to THISDS data ds0; set &lib..&thisds; run; %let fv=; %if &thisds=DATA1 %then %do; %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C); %let fv=&fv D1VAR1; %end; Will equal list of all decoded variables for later processing DECODE MACRO Parameters: Format library Input dataset Output dataset (with 1 new decode variable) Variable to be decoded Decode variable name Builds decode list Used DS0_00 numbering scheme with NEWDS parameter because it will be the parameter DS in the next macro call, producing DS0_001. Final product should be DS1.
Calling the Decode Macro How it would appear in real code
Decode Macro 1. Finds variable’s format using SAS Dictionary table COLUMNS and assigns it to the FMT macro variable. proc sql noprint; select format into :fmt from dictionary.columns where libname="WORK" and memname=%upcase("&ds") and name="&var"; quit;
Decode Macro 1. Finds variable’s format using SAS Dictionary table COLUMNS and assigns it to the FMT macro variable. proc sql noprint; select format into :fmt from dictionary.columns where libname="WORK" and memname=%upcase("&ds") and name="&var"; quit; proc format noprint cntlout=fmt (keep=length) library=%upcase(&fmtlib) fmtlib; select %substr(&fmt,1,%length(&fmt)-1); run; 2. Gets full range of &FMT format using FMTLIB option. Saves to dataset FMT. LENGTH is max length of format value.
Decode Macro 1. Finds variable’s format using SAS Dictionary table COLUMNS and assigns it to the FMT macro variable. proc sql noprint; select format into :fmt from dictionary.columns where libname="WORK" and memname=%upcase("&ds") and name="&var"; quit; proc format noprint cntlout=fmt (keep=length) library=%upcase(&fmtlib) fmtlib; select %substr(&fmt,1,%length(&fmt)-1); run; Why do we use a substring function here? Gets full range of &FMT format using FMTLIB option. Saves to dataset FMT. 2. Gets full range of &FMT format using FMTLIB option. Saves to dataset FMT. LENGTH is max length of format value.
Decode Macro 1. Finds variable’s format using SAS Dictionary table COLUMNS and assigns it to the FMT macro variable. proc sql noprint; select format into :fmt from dictionary.columns where libname="WORK" and memname=%upcase("&ds") and name="&var"; quit; proc format noprint cntlout=fmt (keep=length) library=%upcase(&fmtlib) fmtlib; select %substr(&fmt,1,%length(&fmt)-1); run; Why do we use a substring function here? To remove the trailing period Gets full range of &FMT format using FMTLIB option. Saves to dataset FMT. 2. Gets full range of &FMT format using FMTLIB option. Saves to dataset FMT. LENGTH is max length of format value.
Decode Macro data _null_; set fmt; if _n_=1 then call symput('len',cats(put(length,best.))); run; 2.5. Assigns max length of format to &LEN to prevent truncation.
Decode Macro data _null_; set &ds; if _n_=1 then do; if length(vlabel(&var))<=38 then call symput('newlabel',cats(vlabel(&var))||"-C"); else if length(vlabel(&var))=39 then call symput('newlabel',cats(vlabel(&var))||"C"); else if length(vlabel(&var))>=40 then call symput('newlabel',substr(cats(vlabel(&var)), 1,length(vlabel(&var))-1)||"C"); end; run; 3. Retrieves variable label with VLABEL
Decode Macro data _null_; set &ds; if _n_=1 then do; if length(vlabel(&var))<=38 then call symput('newlabel',cats(vlabel(&var))||"-C"); else if length(vlabel(&var))=39 then call symput('newlabel',cats(vlabel(&var))||"C"); else if length(vlabel(&var))>=40 then call symput('newlabel',substr(cats(vlabel(&var)), 1,length(vlabel(&var))-1)||"C"); end; run; 3. Retrieves variable label with VLABEL Adjusts label so decode variables will have unique labels. Truncates label if >40 characters. Assigns to macro variable &NEWLABEL
Decode Macro data &newds; length &newvar $&len; set &ds; &newvar=put(&var,&fmt); label &newvar="&newlabel"; run; 4. Creates output dataset (&NEWDS). Derives decode variable (&NEWVAR) with variable format (&FMT). Assigns it a length (&LEN) and a label (&NEWLABEL).
Decode Macro data &newds; length &newvar $&len; set &ds; &newvar=put(&var,&fmt); label &newvar="&newlabel"; run; proc datasets nolist lib=work memtype=data; delete fmt &ds; run; quit; 4. Creates output dataset (&NEWDS). Derives decode variable (&NEWVAR) with variable format (&FMT). Assigns it a length (&LEN) and a label (&NEWLABEL). 5. Garbage collection
Further Adjustments data ds2; set ds1; %if &thisds=DATA3 %then %do; label D3VAR1C="Trunc. decode label which was too long-C"; %end; run; • For further information see my previous paper SAS Programming Techniques for Adjusting Metadata on the Database Level.
Completing Database Loop data &outlib..&thisds %if %length(&dslabel)>0 %then (label="&dslabel");; set ds2; format &fv; run; • Creates final dataset in output library • Assigns dataset label • DS2 could be DS3 or any number depending on the adjustments performed. • Strips formats off of decoded variables. • Process repeats for every dataset in library.
Other Possibilities • Possible to automate variable decoding. • PROC SQL produces list of variables with non-native formats. • List informs inner loop calling %DECODE() once for each variable. • A gain in automation, a loss in adaptability. • Not all formats exist in the same catalog. • Not all variable names may be 8 characters long. • Not all formatted variables may require decodes.
Other Possibilities • The Meta-Engine can be tweaked depending on the task. Some ideas include: • Testing libraries for SAS 5 compliance • Excluding certain datasets • Renaming datasets • Splitting a dataset into two if it takes up too much memory. • Adjusting dataset and variable metadata. • See my previous paper SAS Programming Techniques for Adjusting Metadata on the Database Level.
Conclusion The Meta-Engine offers a quick and streamlined approach for a programmer to begin thinking about metadata on the library or database level. Programmers can begin to manipulate whole libraries intuitively as if they were datasets. The Meta-Engine in its entirety plus further information can be found in my paper SAS Programming Techniques for Decoding Variables on the Database Level.
Contact Information Chris Speck, Senior Programmer PAREXEL International 2520 Meridian Parkway, Suite 200 Durham, NC 27713 Work Phone: 919 294 5018 Fax: 919 544 3698 chris.speck@parexel.com