Using Proc IML
Statistical Computing
Spring 2014
What is IML?
• SAS vs. R
  • SAS: procedures (PROCs) and datasets
  • R: functions/operations and matrices/vectors
• PROC IML
  • IML = Interactive Matrix Language
  • R-like programming inside of SAS
  • Pros: more flexible
  • Cons: programs are not validated
• Applications
  • Simulate data
  • Matrix algebra (e.g. contrasts, algorithms)
  • Many things you could normally only do in R
  • Graphics
The Matrix
• A matrix is a collection of numbers ordered by rows and columns.
• Matrices are characterized by their number of rows and columns.
• The elements of a matrix are referred to first by their row, then by their column.
Special Matrices
• A 1 x 1 matrix is also known as a scalar
• r x 1 or 1 x c matrices are known as vectors
• A diagonal matrix is a square matrix where the off-diagonal elements are zero
• An identity matrix is a diagonal matrix where the diagonal elements are 1. These are also denoted by I_c, where c is the dimension of the matrix
Creating Matrices in IML
    PROC IML;
    A = 1;                  /* CREATE A SCALAR */
    B = {1 2 3};            /* CREATE A ROW VECTOR OF LENGTH 3 */
    C = {4, 5, 6};          /* CREATE A COLUMN VECTOR OF LENGTH 3 */
    D = {1 2, 3 4, 5 .};    /* CREATE A 3 BY 2 MATRIX WHERE THE 3,2 ELEMENT IS MISSING */
    PRINT A B C D;          /* DISPLAY THE MATRICES IN THE OUTPUT */
    QUIT;
* Can assign characters instead of numbers, but matrix algebra won't work on character matrices
Manipulating Matrices
• Using brackets inside the specification allows you to request repeats
  • A={[2] 'Yes', [2] 'No'} is equivalent to A={'Yes' 'Yes', 'No' 'No'}
  • SAS: {[# repeats] value}, R: rep(value, number of times)
• Select a single element
  • A={1 2, 3 4}
  • To select the number 3: A2=A[2,1]
• Select a row or column
  • To select the first row: A3=A[1, ]
  • To select the first column: A4=A[ ,1]
• Select a submatrix
  • B={1 2 0 0, 3 4 0 0}
  • To select the A matrix from within B: A_new=B[1:2,1:2] or A_new=B[ ,{1 2}]
Manipulating Matrices (cont.)
• To define row and column labels, first create a vector with the labels
  • PRINT B[ROWNAME=name_label_vector]
  • Can also use COLNAME, FORMAT, and LABEL in this way
  • To assign them permanently use MATTRIB matrix ROWNAME= COLNAME=
  • This then allows you to index using the matrix attributes (e.g. A["True",])
• Selecting elements with logical arguments
  • Instead of listing the specific elements, use a logical argument
  • A={1 2 3 4}; B=A[loc(A>2)] gives B={3 4}
• Replace elements
  • Option 1: reassign specific elements
    • A[2]=7 will yield A={1 7 3 4}
  • Option 2: reassign by a rule
    • A[loc(A>2)]=0 will yield A={1 2 0 0}
Manipulating Matrices in IML
    PROC IML;
    REPEAT_O1={[2] "YES" [2] "NO"};     /* USING REPETITION FACTORS TO FILL THE MATRIX */
    REPEAT_O2={"YES" "YES" "NO" "NO"};  /* REPEATING ELEMENTS MANUALLY */
    PRINT REPEAT_O1 REPEAT_O2;
    A={1 2, 3 4};                       /* DEFINE MATRIX */
    A1=A[2,1];                          /* SELECT THE ELEMENT IN THE 2ND ROW, FIRST COLUMN: A1 SHOULD EQUAL 3 */
    A2=A[1,];                           /* SELECT THE FIRST ROW: A2 SHOULD EQUAL THE 1 X 2 ROW VECTOR {1 2} */
    A3=A[,1];                           /* SELECT THE FIRST COLUMN: A3 SHOULD EQUAL THE 2 X 1 COLUMN VECTOR {1, 3} */
    B={1 2 0 0, 3 4 0 0};               /* DEFINE A MATRIX B WITH TWO SUBMATRICES: A AND A 2 X 2 NULL MATRIX */
    A_NEW=B[1:2,1:2];                   /* RECOVER THE A MATRIX FROM B */
    A_NEW2=B[,{1 2}];                   /* RECOVER THE A MATRIX FROM B, ANOTHER WAY TO WRITE IT */
    C_ROWNM={M F};                      /* SET ROW NAMES FOR MATRIX C */
    C_COLNM={TRUE FALSE};               /* SET COL NAMES FOR MATRIX C */
    C={10 25, 9 18};
    PRINT A A1 A2 A3 B A_NEW
          C[ROWNAME=C_ROWNM COLNAME=C_COLNM FORMAT=6.1 LABEL="MY MATRIX"];  /* MODIFYING PRINTED OUTPUT FOR MATRIX C */
Manipulating Matrices in IML (cont.)
    C_NEW=C;                            /* CREATING A DUPLICATE MATRIX */
    MATTRIB C_NEW ROWNAME=C_ROWNM COLNAME=C_COLNM FORMAT=6.1 LABEL="MY MATRIX";  /* PERMANENTLY CHANGING OUTPUT FORMAT */
    PRINT C C_NEW;                      /* COMPARING DIFFERENT APPROACHES */
    D=A[LOC(A>1)];                      /* SELECTING ONLY ELEMENTS THAT MEET RULE, NOTE THAT MATRIX STRUCTURE NOT RETAINED */
    PRINT A D;
    E_TEMP=A;                           /* CREATING A DUPLICATE MATRIX */
    E_TEMP[1,1]=25;                     /* CHANGING A SINGLE ELEMENT */
    PRINT E_TEMP;
    E_TEMP[LOC(E_TEMP>5)]=.;            /* SETTING ALL ELEMENTS MEETING RULE TO MISSING */
    PRINT E_TEMP;
    QUIT;
Creating Special Matrices
• Identity matrix
  • I(r): identity matrix of size r
• Dummy (constant) matrix
  • j(nrow,ncol,x)
  • nrow = number of rows, ncol = number of columns, x = fill value
• Diagonal matrix
  • diag(vector)
  • diag(matrix)
  • Note you can also accomplish this by using a Kronecker product (@) to multiply the desired matrix by an identity matrix
Creating Special Matrices (cont.)
• Block diagonal matrix
  • block(M1, M2, …)
• repeat(matrix,nrow,ncol)
  • Repeats the specified matrix for the number of rows and columns given
• shape(vector,nrow,ncol)
  • Repeats the given vector row-wise for the number of rows and number of columns given. Note that the number of cells to fill must be a multiple of the vector length
• Generate a sequence
  • do(start,finish,by) creates a vector using the specified skip pattern. For example, do(-1,0,0.5) would return {-1 -0.5 0}.
  • In R you can use seq(start,finish,by)
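The functions on the two slides above can be tried together in one short session. This is a minimal sketch; the matrix M and all variable names are illustrative choices, not taken from the slides.

```sas
proc iml;
  id3  = I(3);                  /* 3 x 3 identity matrix */
  ones = j(2, 3, 1);            /* 2 x 3 matrix filled with the value 1 */
  dvec = diag({1 2 3});         /* diagonal matrix with 1, 2, 3 on the diagonal */
  m    = {1 2, 3 4};            /* illustrative 2 x 2 matrix */
  dmat = diag(m);               /* keeps the diagonal of m, zeros elsewhere */
  blk  = block(m, id3);         /* 5 x 5 block-diagonal matrix */
  kron = I(2) @ m;              /* Kronecker product: m repeated along the diagonal */
  rep2 = repeat(m, 2, 1);       /* m stacked twice vertically: 4 x 2 */
  shp  = shape({1 2 3}, 2, 3);  /* 2 x 3 matrix filled row-wise, recycling 1 2 3 */
  sq   = do(-1, 0, 0.5);        /* row vector {-1 -0.5 0} */
  print id3, ones, dvec, dmat, blk, kron, rep2, shp, sq;
quit;
```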
Matrix Addition and Subtraction
• To add or subtract two matrices, they both must have the same number of rows and columns.
• The addition or subtraction is element-wise.
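A small worked example of element-wise addition and subtraction. The two 2 x 2 matrices are illustrative values chosen for this sketch:

```latex
\[
A = \begin{pmatrix} 1 & 3 \\ 2 & 5 \end{pmatrix},\qquad
B = \begin{pmatrix} -5 & 2 \\ 7 & 0 \end{pmatrix}
\]
\[
A + B = \begin{pmatrix} 1-5 & 3+2 \\ 2+7 & 5+0 \end{pmatrix}
      = \begin{pmatrix} -4 & 5 \\ 9 & 5 \end{pmatrix},\qquad
A - B = \begin{pmatrix} 1+5 & 3-2 \\ 2-7 & 5-0 \end{pmatrix}
      = \begin{pmatrix} 6 & 1 \\ -5 & 5 \end{pmatrix}
\]
```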
Matrix Multiplication and Division
• Scalar-by-matrix multiplication and division is an element-wise operation and is commutative.
• Multiplication of vectors and matrices
  • Not commutative (AB ≠ BA)
  • Requires that the number of columns in A equals the number of rows in B
  • The resulting matrix R will have as many rows as A and as many columns as B
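To see why the order of multiplication matters, here is a worked 2 x 2 example. The matrices are illustrative values chosen for this sketch; each entry of the product is a row of the left matrix times a column of the right matrix.

```latex
\[
A = \begin{pmatrix} 2 & 3 \\ 4 & 5 \end{pmatrix},\qquad
B = \begin{pmatrix} 1 & 6 \\ 2 & 0 \end{pmatrix}
\]
\[
AB = \begin{pmatrix} 2\cdot 1 + 3\cdot 2 & 2\cdot 6 + 3\cdot 0 \\
                     4\cdot 1 + 5\cdot 2 & 4\cdot 6 + 5\cdot 0 \end{pmatrix}
   = \begin{pmatrix} 8 & 12 \\ 14 & 24 \end{pmatrix},\qquad
BA = \begin{pmatrix} 1\cdot 2 + 6\cdot 4 & 1\cdot 3 + 6\cdot 5 \\
                     2\cdot 2 + 0\cdot 4 & 2\cdot 3 + 0\cdot 5 \end{pmatrix}
   = \begin{pmatrix} 26 & 33 \\ 4 & 6 \end{pmatrix}
\]
```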
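Special Properties
• Transpose: A' = (a_ji), i.e. the rows of A become the columns of A'
• Inverse (indicated with a -1 superscript): just as the inverse of a number is the number which, when multiplied by the original number, gives a product of 1, the inverse of a matrix A is the matrix which, when multiplied by A, gives the identity matrix
  • A must be a square matrix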
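Both properties can be checked quickly in IML. This is a minimal sketch with an illustrative 2 x 2 matrix; the variable names are arbitrary.

```sas
proc iml;
  a    = {2 3, 4 5};   /* illustrative square matrix, det(a) = -2 */
  at   = t(a);         /* transpose: rows become columns */
  ainv = inv(a);       /* inverse exists because det(a) is nonzero */
  chk  = a * ainv;     /* should be the 2 x 2 identity, up to rounding */
  print at ainv chk;
quit;
```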
Matrix Algebra in IML
    PROC IML;
    /* MATRIX ADDITION */
    A={1 3, 2 5};        /* DEFINE MATRIX */
    B={-5 2, 7 0};       /* DEFINE MATRIX */
    C=A+B;               /* ADD A AND B */
    PRINT C;
    /* MATRIX MULTIPLICATION */
    A={2 3, 4 5};        /* DEFINE MATRIX */
    B={1 6, 2 0};        /* DEFINE MATRIX */
    AB=A*B;              /* MULTIPLY A BY B */
    BA=B*A;              /* MULTIPLY B BY A */
    PRINT A B AB BA;     /* NOTE THAT MULTIPLICATION IS NOT COMMUTATIVE, AB DOESN'T EQUAL BA */
    QUIT;
Matrix Operators: Comparison
• Element-wise comparison of matrices; the result is a matrix of 0 (false) and 1 (true)
• Comparisons
  • Less than (<), less than or equal to (<=)
  • Greater than (>), greater than or equal to (>=)
  • Equal to (=), not equal to (^=)
• Can create compound arguments using logical operators
  • And (&)
  • Or (|)
  • Not (^)
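A short sketch of the comparison and logical operators listed above. The matrices and variable names are illustrative, and the expected 0/1 results are noted in the comments.

```sas
proc iml;
  a = {1 5, 7 2};
  b = {3 5, 4 9};
  isLess  = (a < b);             /* {1 0, 0 1} */
  isEqual = (a = b);             /* {0 1, 0 0} */
  inRange = (a > 1) & (a < 7);   /* {0 1, 0 1} */
  anyHit  = (a < 2) | (b > 8);   /* {1 0, 0 1} */
  notSame = ^(a = b);            /* {1 0, 1 1} */
  print isLess isEqual inRange anyHit notSame;
quit;
```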
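Solving Systems of Equations
• Solve the following system of equations:
    3x + 2y -  4z = 11
    5x - 4y       =  9
         3y + 10z = 42
• Rewritten in matrix form, the problem is A*x = b, where A = {3 2 -4, 5 -4 0, 0 3 10}, x = {x, y, z}, and b = {11, 9, 42}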
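Solving Systems of Equations (cont.)
• To solve, we can rearrange A*x = b to x = inv(A)*b
    PROC IML;
    A={3 2 -4, 5 -4 0, 0 3 10};   /* COEFFICIENT MATRIX */
    B={11, 9, 42};                /* RIGHT-HAND SIDE */
    OPT1=SOLVE(A,B);              /* SOLVE THE SYSTEM DIRECTLY */
    OPT2=INV(A)*B;                /* SOLVE BY MULTIPLYING BY THE INVERSE */
    PRINT OPT1 OPT2;
    QUIT;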
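As a hand check on the code above, reading the coefficient matrix as rows (3, 2, -4), (5, -4, 0), (0, 3, 10) with right-hand side (11, 9, 42), substitution gives the solution that both OPT1 and OPT2 should print:

```latex
\[
x = \frac{9 + 4y}{5},\qquad z = \frac{42 - 3y}{10}
\;\Longrightarrow\;
3\cdot\frac{9 + 4y}{5} + 2y - 4\cdot\frac{42 - 3y}{10} = 11
\;\Longrightarrow\; 56y = 224
\]
\[
y = 4,\quad x = 5,\quad z = 3
\qquad\text{check: } 3(5) + 2(4) - 4(3) = 11,\;\; 5(5) - 4(4) = 9,\;\; 3(4) + 10(3) = 42
\]
```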
Opening a SAS Dataset
• Before you can access a SAS dataset, you must first submit a command to open it.
• To simply read from an existing dataset, submit a USE statement.
  • USE SAS-data-set <VAR variable-names> <WHERE(expression)>;
• To read and write to an existing dataset, use the EDIT statement.
  • In addition to READ, you can also EDIT, DELETE, and PURGE observations from a dataset that has been opened using EDIT
• Each dataset must only be opened once
Reading in Datasets
• Create matrices from a SAS dataset
  • Create a vector for each variable
  • Create a matrix containing multiple variables
  • Select all observations or a subset
• To transfer data from a SAS dataset to a matrix
  • SETIN
    • Specifies an open dataset as the current input dataset
  • READ
    • Transfers the dataset into a matrix
    READ <range> VAR operand <WHERE(expression)> INTO name;
    READ ALL VAR {VAR1} WHERE(VAR1>80) INTO MYMAT;
Sorting SAS Datasets
• First close the dataset
• SORT dataset OUT=new_dataset BY var_name;
• Put the keyword DESCENDING before a variable name to reverse the sort order for that variable
Creating Datasets from Matrices
• When you create a dataset
  • Columns become variables
  • Rows become observations
• CREATE
  • Opens a new SAS dataset for I/O
• APPEND
  • Writes to the dataset
• From a matrix: CREATE SAS-data-set FROM matrix <[COLNAME=column-name ROWNAME=row-name]>; APPEND FROM matrix-name;
• From existing vectors: CREATE SAS-data-set VAR variable-names; APPEND;
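The CREATE, APPEND, and SORT steps from the last two slides can be chained as follows. This is a sketch only: the dataset names Scores and Sorted, the variable names, and the values are all hypothetical.

```sas
proc iml;
  /* hypothetical data: column 1 = id, column 2 = score */
  m = {1 82, 2 95, 3 78};

  create Scores from m[colname={"id" "score"}];  /* open a new dataset */
  append from m;                                 /* write the rows of m */
  close Scores;                                  /* close before sorting */

  sort Scores out=Sorted by descending score;    /* highest score first */

  use Sorted;                                    /* read the sorted data back */
  read all var {id score} into s;
  close Sorted;
  print s[colname={"id" "score"}];
quit;
```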
Reading in SAS data with IML
    /* CREATING A SAS DATASET TO WORK WITH */
    DATA MYDATA;
      SET SASHELP.CARS;
    RUN;

    PROC IML;
    USE MYDATA VAR {MSRP MPG_CITY MPG_HIGHWAY};            /* OPEN DATASET */
    READ ALL VAR _ALL_ WHERE (MSRP<12000) INTO CAR_MAT;    /* READ DATASET */
    CLOSE MYDATA;                                          /* CLOSE DATASET WHEN DONE READING */
    Z=NROW(CAR_MAT);                                       /* FIGURE OUT HOW MANY ROWS */
    PRINT Z CAR_MAT[COLNAME={MSRP CITY HWY}];              /* LOOK AT DATA */
    QUIT;
Subscript Operations
• Commands that can be applied to obtain summary statistics on matrices
  • Select a single element, row, column, or submatrix
  • Similar to the apply function in R
• SUMMARY produces summary statistics on the numeric variables of a SAS dataset. If you want them by subgroup, use the CLASS option.
  • SUMMARY VAR {variable list} <CLASS {by variables}> STAT {desired stats} <OPT {SAVE}>;
• Reduction operators
  • Addition +
  • Multiplication #
  • Mean :
  • Sum of squares ##
  • Maximum <>
  • Minimum ><
  • Index of maximum <:>
  • Index of minimum >:<
• Additional operators
  • Concatenation: horizontal ||, vertical //
  • Number of rows: nrow(matrix), number of columns: ncol(matrix)
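The reduction operators go inside the subscript, in the row or column position that should be collapsed. A minimal sketch with an illustrative 2 x 2 matrix (variable names are arbitrary):

```sas
proc iml;
  a = {1 2, 3 4};
  colSum  = a[+, ];    /* {4 6}: sum down each column */
  rowMean = a[, :];    /* {1.5, 3.5}: mean across each row */
  total   = a[+];      /* 10: sum of all elements */
  ssq     = a[##];     /* 30: sum of squares of all elements */
  big     = a[<>];     /* 4: largest element */
  bigIdx  = a[<:>];    /* 4: position of the maximum, counting row by row */
  wide    = a || a;    /* 2 x 4: horizontal concatenation */
  tall    = a // a;    /* 4 x 2: vertical concatenation */
  print colSum rowMean total ssq big bigIdx, wide tall;
quit;
```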
Types of Statements
• Control statements
  • Direct the flow of execution
  • E.g. IF-THEN/ELSE statement
• Functions and CALL statements
  • Perform special tasks or user-defined operations
• Command statements
  • Perform special processing such as setting options, displaying windows, and handling input and output
IF-THEN/ELSE statements
• IF expression THEN statement-one; ELSE statement-two;
• IML evaluates the expression and uses the result to decide whether statement-one or statement-two is executed.
• You may also nest IF-THEN/ELSE statements
    PROC IML;
    A={12 22 33};
    IF MAX(A)<20 THEN P=1;
    ELSE P=0;
    PRINT P;
    QUIT;
DO groups
• Several statements can be grouped together into a compound statement to be executed as a unit.
  • DO; statements; END;
• You can combine DO groups with IF/ELSE
  • IF (X<Y) THEN DO; Z=X+Y; END;
  • ELSE DO; Z=X-Y; END;
• The iterative DO <WHILE/UNTIL expression> repeats a set of statements a number of times defined by the index.
  • If DO WHILE is used, the expression is evaluated at the beginning of each loop, with iterations continuing until the expression is false. If the expression begins false, the loop does not run.
  • If DO UNTIL is used, the expression is evaluated at the end of the loop. This means that the loop will always execute at least once.
    PROC IML;
    Y=0;
    DO I=1 TO 3;
      Y=Y+1;
      PRINT Y;
    END;
    QUIT;

    PROC IML;
    COUNT=1;
    DO WHILE(COUNT<3);
      COUNT=COUNT+1;
      PRINT "WHILE";
    END;
    COUNT=1;
    DO UNTIL(COUNT>3);
      COUNT=COUNT+1;
      PRINT "UNTIL";
    END;
    QUIT;
Interacting with Procs
• Option One
  • Write the data to a SAS dataset by using the CREATE and APPEND statements
  • Use the SUBMIT statement to call a SAS procedure that analyzes the data
  • Read the results of the analysis into IML matrices using the USE and READ statements
• Option Two
  • Do what can only be done in IML
  • Write the data back out to a SAS dataset
  • Call PROCs normally
• ODS TRACE ON; / ODS TRACE OFF;
  • Placed before and after a PROC, these print the names of the PROC's output tables to the log.
  • Useful for requesting/saving specific parts of the analysis.
• To use PROCs: SUBMIT; statements; ENDSUBMIT;
  • As with macros, you can list variables already existing in IML that you would like to use in the PROC. Then, inside the SUBMIT block, refer to these variables using &Varname
  • Substitutions take place before the block is processed, so no macro variable is created
  • If you use SUBMIT *, you indicate a wildcard so that any of the existing variables can be referenced
  • Any variable that is referenced (&var) inside the SUBMIT block but not created in the IML procedure does not get substituted. This is used for referencing true macro variables.
Interacting with Procs (cont.)
    PROC IML;
    Q={2 5 7 9};
    CREATE MYDATA VAR {Q};                /* WRITE Q TO A SAS DATASET */
    APPEND;
    CLOSE MYDATA;
    /* ALTERNATIVE WITH SUBSTITUTION: TABLE="Moments"; THEN USE "SUBMIT TABLE;" BELOW */
    SUBMIT;
      PROC UNIVARIATE DATA=MYDATA;
        VAR Q;
        ODS OUTPUT MOMENTS=MOMENTS;       /* ALTERNATIVE: ODS OUTPUT MOMENTS=&TABLE; */
      RUN;
    ENDSUBMIT;
    USE MOMENTS;                          /* READ THE RESULTS BACK INTO IML */
    READ ALL VAR {NVALUE1 LABEL1};
    CLOSE MOMENTS;
    LABL="MY OUTPUT";
    PRINT NVALUE1[ROWNAME=LABEL1 LABEL=LABL];
    QUIT;
Modules
• Modules are used for two purposes
  • To create a user-defined subroutine or function.
  • To define variables that are local to the module.
• START MODULE-NAME <(arguments)> <options>; statements; FINISH MODULE-NAME;
• To execute the module use
  • RUN MODULE-NAME; looks for a user-defined module first, then for a built-in subroutine
  • CALL MODULE-NAME; looks for a built-in subroutine first, then for a user-defined module
• A function is a special type of module that returns a single value.
  • START MODULE-NAME(arguments); statements; RETURN(variable); FINISH MODULE-NAME;
  • Any variables created inside the module but not mentioned in the RETURN statement will not be retained for future use.
• It is possible to store and load modules (like a macro library or source() in R)
  • STORE MODULE=MODULE-NAME;
  • LOAD MODULE=MODULE-NAME;
  • These retain a module after IML has exited
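The difference between a function module and a subroutine module can be sketched as follows. The module names AddOne and MinMax and their arguments are hypothetical, chosen only for illustration.

```sas
proc iml;
  /* function module: hands back one value through RETURN */
  start AddOne(x);
    tmp = x + 1;                  /* tmp is local to the module */
    return(tmp);
  finish AddOne;

  /* subroutine module: hands back results through its arguments */
  start MinMax(lo, hi, x);
    lo = min(x);
    hi = max(x);
  finish MinMax;

  y = AddOne({1 2 3});            /* y = {2 3 4} */
  run MinMax(lo, hi, {4 9 1});    /* lo = 1, hi = 9 */
  print y lo hi;

  store module=(AddOne MinMax);   /* save both modules to the current storage catalog */
quit;
```

In a later session, LOAD MODULE=(AddOne MinMax); would bring the stored modules back, provided the same storage catalog is in use.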
Creating a Permanent Module Library
• Permanent libraries maintain functions for multiple users. This is equivalent to datasets stored in a permanent library vs. the work folder
    LIBNAME LIBRARY 'PATH';
    PROC IML;
    START FUNC1(X);
      RETURN(X+1);
    FINISH;
    START FUNC2(X);
      RETURN(X**2);
    FINISH;
    RESET STORAGE=LIBRARY.SOURCEFILE;   /* LIBREF.CATALOG: STORE IN CATALOG SOURCEFILE OF THE LIBRARY LIBREF */
    STORE MODULE=_ALL_;
    QUIT;
Calling R from within IML
• Check whether your SAS session has permission to call R
  • PROC OPTIONS OPTION=RLANG; RUN;
  • If not, you will have to add the -RLANG option at startup
• Similar to calling procs
  • SUBMIT / R; statements; ENDSUBMIT;
• Export
  • ExportDataSetToR: SAS dataset -> R data frame
  • ExportMatrixToR: IML matrix -> R matrix
• Import
  • ImportDataSetFromR: R expression -> SAS dataset
  • ImportMatrixFromR: R expression -> IML matrix
• R objects tend to be complex, so you can only transfer something that has been coerced to a data frame
SAS to R and back again
    proc iml;
    /* Comparison of matrix operations in IML and R */
    print "---------- SAS/IML Results -----------------";
    x = 1:3;                                   /* vector of sequence 1,2,3 */
    m = {1 2 3, 4 5 6, 7 8 9};                 /* 3 x 3 matrix */
    q = m * t(x);                              /* matrix multiplication */
    print q;

    print "------------- R Results --------------------";
    submit / R;
      rx <- matrix( 1:3, nrow=1)               # vector of sequence 1,2,3
      rm <- matrix( 1:9, nrow=3, byrow=TRUE)   # 3 x 3 matrix
      rq <- rm %*% t(rx)                       # matrix multiplication
      print(rq)
    endsubmit;

    /* fragment: assumes p (the data) and est (a density estimate) already exist in the R session */
    submit / R;
      hist(p, freq=FALSE)                      # histogram
      lines(est)                               # kde overlay
    endsubmit;
    quit;

    proc iml;
    use Sashelp.Class;
    read all var {Weight Height};
    close Sashelp.Class;

    /* send matrices to R */
    call ExportMatrixToR(Weight, "w");
    call ExportMatrixToR(Height, "h");
    submit / R;
      Model    <- lm(w ~ h, na.action="na.exclude")   # a
      ParamEst <- coef(Model)                         # b
      Pred     <- fitted(Model)
      Resid    <- residuals(Model)
    endsubmit;

    /* bring the parameter estimates back and use them in IML */
    call ImportMatrixFromR(pe, "ParamEst");
    print pe[r={"Intercept" "Height"}];
    ht = T( do(55, 70, 5) );
    A = j(nrow(ht),1,1) || ht;
    pred_wt = A * pe;
    print ht pred_wt;

    /* pass IML variables into the R block by substitution */
    call ExportDataSetToR("Sashelp.Class", "Class");  /* the data frame Class must exist in R for data=Class */
    YVar = "Weight";
    XVar = "Height";
    submit XVar YVar / R;
      Model <- lm(&YVar ~ &XVar, data=Class, na.action="na.exclude")
      print (Model$call)
    endsubmit;
    quit;