610 likes | 1.02k Views
“LAG with a WHERE” and other DATA Step Stories. Neil Howard A. Table of Contents. Chapter 1 “LAG with a WHERE” Chapter 2 “A DIF ferent LAG” Chapter 3 “To LAG or to LEAD” Chapter 4 “When RETAIN Doesn’t Retain” Chapter 5 “Don’t Order My Variables Around”
E N D
“LAGwith a WHERE”and other DATA Step Stories Neil Howard A
Table of Contents • Chapter 1“LAG with a WHERE” • Chapter 2“A DIFferent LAG” • Chapter 3“To LAG or to LEAD” • Chapter 4“When RETAIN Doesn’t Retain” • Chapter 5“Don’t Order My Variables Around” • Chapter 6“The Case of the Missing Values”
Chapter One LAG with a WHERE
“LAG with a WHERE” • an interesting fairy tale • must first understand: • LAG function • WHERE statement • implications of conditional execution
LAG function • Syntax: • nspecifies number of lagged values • argument is numeric or character LAG<n>(argument)
LAG function • LAG functions return values from a queue • A LAGn function stores a value in a queue and returns a value stored previously in that queue • Each occurrence of a LAGnfunction generates its own queue • n is the length of the queue
LAG function • LAG function is executable • LAG function can be conditionally executed • NOTE: storing and returning values from the queue occurs only when the function is executed
SIMPLE LAG data new; input x @@; lag1=lag1(x); lag2=lag2(x); cards; 1 2 3 4 5 6 ;
(Note initialization to missing) X LAG1 LAG2 1 . . 2 1 . 3 2 1 4 3 2 5 4 3 6 5 4
CONDITIONAL LAG data new; input a b @@; LAGa = LAG(a); if b=2 then LAGb=LAG(a); cards; 1 1 2 1 3 2 4 1 5 2 6 1 ;
A B LAGA LAGB 1 1 . . 2 1 1 . 3 2 2 . 4 1 3 . 5 2 4 3 6 1 5 .
Every other lagged value ? data new; input x @@; * conditional; if mod(x,2)=0 then condLAG1 = lag(x); LAGx=lag(x); * unconditional; if mod(x,2)=0 then condLAG2 = LAGx; cards; 1 2 3 4 5 6 7 8 ;
right answer X LAGx condLAG1 condLAG21 . . . 2 1 . 1 3 2 . . 4 3 2 3 5 4 . . 6 5 4 5 7 6 . . 8 7 6 7
WHERE statement • Selects observations before they’re brought into the LPDV • After data set options applied • Before any other data step statements executed, including SET, BY, etc. • Functions differently with BY and first. and last. • Only works w/ SAS data (not raw data)
Given this data: VISIT WEIGHT 01JAN2003 88 02JAN2003 22 03JAN2003 154 04JAN2003 21 05JAN2003 112 CUTOFF
WHERE Subsetting IF data w ; set q ; lagwgt = lag(weight) ; where visit>"01jan2003"d ; run ; data w ; set q ; lagwgt = lag(weight) ; if visit > "01jan2003"d ; run ; • DIFFERENCE? • WHERE will not pick up first lagged value • subsetting IF will…
Output from WHERE VISIT WEIGHT LAGWGT 02JAN2003 22 . 03JAN2003 154 22 04JAN2003 21 154 05JAN2003 112 21
Output from subsetting IF VISIT WEIGHT LAGWGT 02JAN2003 22 88 03JAN2003 154 22 04JAN2003 21 154 05JAN2003 112 21
Chapter Two A DIFferent LAG
“A DIFferent LAG” • DIF function • Syntax: • n specifies number of lags • argument is numeric DIF<n>(argument)
DIF function • DIF function returns the first difference between the argument and its nth lag. • Defined as: DIF(X) = X - LAGn(X) ;
DIF function • Same storing/returning from LAGn queues apply • Same caveats for conditional execution
data new; input x @@; lagx = lag(x); difx = dif(x); cards; 1 2 8 4 3 9 7 ;
x lagx difx1 . . 2 - 1 = 1 8 2 6 4 8 -4 3 4 -1 9 3 6 7 9 -2
Chapter Three To LAG or to LEAD
Is there a LEAD function? • No LEAD function or negative LAG • Several solutions at: • www.sconsig.com • Including: • Sort in descending order (reverse) • …then use the LAG function
Most elegant solution: • MERGE the data set with itself • Read the data set twice • Using a 1:1 MERGE • No BY statement • Using firstobs=2
data lagged ; merge master ( keep = var ) master ( firstobs = 2 rename = (var =nextvar ) ) ; **** no BY statement ; run;
Chapter Four When RETAIN Doesn’t Retain
Retained Variables • all SAS special variables, e.g. • _N_ • _ERROR_ • all vars in RETAIN statement • all vars from SET or MERGE • accumulator vars in SUM stmt
Variables Not Retained • Variables from INPUT statement • User-defined variables/ vars created in DATA step • UNLESS……what?
concatenation data A ; input id $ site $; cards; 10212 00 10213 00 ; dataB; input id $; cards; 02001 03005 06900 ; data c; setA B; if missing(site) then site = substr(id,1,2); run;
? ?
Solution data C; setA B(in=inb); if inb then site = substr(id,1,2); run; test that the observation has come from B and only then extract the site value....
Chapter Five Don’t Order My Variables Around “the variable order is not always declared where it seems to occur…” Ron Fehd
Question posed: How do I reorder the variables in my SAS data set?
“Don’t Order My Variables Around” • WHY? • exporting / export wizard • SAS Viewer end users • manipulate groups/lists of vars (age - - diag) • with PUT or ARRAY • what else?
“Don’t Order My Variables Around” • storage: • in LPDV • in SAS data set • presentation layer
My question to you: What forces the order of the variables in a SAS data set in the first place? The order in which they are seen by the compiler when the data set is created.
“Don’t Order My Variables Around” • RETAIN statement • (ATTRIB statement) • (LENGTH statement) • (PROC TRANSPOSE) • ??????
“Don’t Order My Variables Around” • Why RETAIN? • retain functionality implicit for vars coming from SET or MERGE • Nothing you can mess up (attributes, etc.)!
Original CONTENTS PROCEDURE -----Variables Ordered by Position----- # Variable Type Len Pos 1 NAME Char 8 0 2 SEX Char 8 8 3 AGE Num 8 16 4 ID Num 8 24 5 RX_GRP Num 8 32
Original NAME SEX AGE ID RX_GRP John M 35 101 2 Dan M 53 206 1 Howard M 45 321 3
data new; retain id rx_grp name sex age; *** 1st reference to compiler; set master; run;
Reordered CONTENTS PROCEDURE -----Variables Ordered by Position----- # Variable Type Len Pos 1 ID Num 8 0 2 RX_GRP Num 8 8 3 NAME Char 8 16 4 SEX Char 8 24 5 AGE Num 8 32
Reordered ID RX_GRP NAME SEX AGE 101 2 John M 35 206 1 Dan M 53 321 3 Howard M 45
Chapter Six The Case of the Missing Values
“How do MISSINGs compare?” • QUESTION: If A > B then<perform action>; If either A or B is missing, isn’t the statement just ignored? What if both are missing?