150 likes | 285 Views
Combining Lags and Arrays [and a little macro]. Lynn Lethbridge SHRUG OCT 28, 2011. Lag Function. A tool to manipulate data across observations Can lag back as far as you like (default is a lag of 1) Can be frustrating!
E N D
Combining Lags and Arrays [and a little macro] Lynn Lethbridge SHRUG OCT 28, 2011
Lag Function • A tool to manipulate data across observations • Can lag back as far as you like (default is a lag of 1) • Can be frustrating! • Don’t put a lag in a conditional statement; create a variable with the lag value first and use the variable in the conditional statement
Example 1a data work.temp; input x sex $; lx=lag(x); if sex='M' then z=lx; drop lx; cards; 1 M 1 M 3 F 5 M 5 F 8 F 8 F 8 F 9 F 10 M ; run; Obs x sex z 1 1 M . 2 1 M 1 3 3 F . 4 5 M 3 5 5 F . 6 8 F . 7 8 F . 8 8 F . 9 9 F . 10 10 M 9
Example 1b data work.temp; input x sex $; if sex='M' then z=lag(x); cards; 1 M 1 M 3 F 5 M 5 F 8 F 8 F 8 F 9 F 10 M ; run; Obs x sex z 1 1 M . 2 1 M 1 3 3 F . 4 5 M 1 5 5 F . 6 8 F . 7 8 F . 8 8 F . 9 9 F . 10 10 M 5
Arrays Allows you to assign a single name to multiple variables Can use an array to manipulate data across variables without having to write them out multiple times Can create new variables in arrays
data work.temp; input id sex $ a b c; array oldvars {*} a b c; array newvars {*} var1-var3; do i=1 to dim(oldvars); if sex='f' then newvars{i}=oldvars{i}; end; drop i; cards; 1 m 1 3 4 1 m 2 2 4 3 m 6 2 6 3 f 7 1 5 4 m 8 4 7 4 f 1 4 8 5 f 4 9 5 6 m 3 8 9 ; run;
Obs id sex a b c var1 var2 var3 1 1 m 1 3 4 . . . 2 1 m 2 2 4 . . . 3 3 m 6 2 6 . . . 4 3 f 7 1 5 7 1 5 5 4 m 8 4 7 . . . 6 4 f 1 4 8 1 4 8 7 5 f 4 9 5 4 9 5 8 6 m 3 8 9 . . .
Lags and Arrays • Lags allow you to work across observations • Arrays allow you to work across variables • Combining the two lets you work efficiently with your data as a matrix
Combined Example • Suppose you have multiple observations per ID • Ordered first by ID and then by date • You want to know if a code appears twice in a row for any individual ID across multiple variables
Data data work.temp; input id date a $ b $ c $; cards; 1 12951 a1 b2 g5 1 12999 a4 a1 v4 1 13777 s3 f5 g5 3 11111 a1 a1 g5 3 12222 h6 i8 t5 3 13333 b2 f4 f4 3 14444 d4 f4 s3 4 10000 c1 g4 d4 4 11111 f4 d4 s3 5 9900 d1 s2 w2 5 10000 a1 s2 d1 5 13456 a1 a2 a1 5 14566 f4 f3 f4 5 15999 d1 a1 a1 ; run;
Lags and Arrays data work.temp2; set work.temp; lagid=lag(id); array vars {*} a b c; array newvars {*} $ na nb nc; do i=1 to 3; lagvars=lag(vars{i}); if id=lagid and lagvars=vars{i} then newvars{i}=vars{i}; end; drop lagid i lagvars; run;
Printed Output Obs id date a b c na nb nc 1 1 17JUN95 a1 b2 g5 2 1 04AUG95 a4 a1 v4 3 1 20SEP97 s3 f5 g5 4 3 03JUN90 a1 a1 g5 5 3 18JUN93 h6 i8 t5 6 3 03JUL96 b2 f4 f4 7 3 19JUL99 d4 f4 s3 f4 8 4 19MAY87 c1 g4 d4 9 4 03JUN90 f4 d4 s3 10 5 08FEB87 d1 s2 w2 11 5 19MAY87 a1 s2 d1 s2 12 5 03NOV96 a1 a2 a1 a1 13 5 18NOV99 f4 f3 f4 14 5 21OCT03 d1 a1 a1
Higher Order Lag data work.temp; input id a ; la3=lag3(a); cards; 1 1 1 2 3 3 3 4 4 5 4 6 5 7 6 8 ; run; Obs a la3 1 1 . 2 2 . 3 3 . 4 4 1 5 5 2 6 6 3 7 7 4 8 8 5
Higher Order Lag with a Macro %let num=3; data work.temp; input id a ; la&num=lag&num(a); cards; 1 1 1 2 3 3 3 4 4 5 4 6 5 7 6 8 ; run; Obs a la3 1 1 . 2 2 . 3 3 . 4 4 1 5 5 2 6 6 3 7 7 4 8 8 5