SET statement in DATA step

marly

Presentation Transcript


  1. SET statement in DATA step. Based on S. David Riba’s "The Set statement and beyond: Uses and Abuses of the SET statement".
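The slides below read two data sets, temp1 and temp2, that the transcript never creates. Judging from the code, temp1 holds an index i and a variable x, and temp2 holds i and y, both sorted by i. A minimal sketch that builds hypothetical versions of them so the steps can be run; the sizes (10 rows each) and the seeds are assumptions:

```sas
/* Hypothetical test data: the transcript does not show how temp1 and  */
/* temp2 were built; variable names are inferred from the later code.  */
data temp1;
  call streaminit(2024);          /* assumed seed, for reproducible x */
  do i = 1 to 10;                 /* sorted by i by construction */
    x = rand('uniform');          /* uniform on (0, 1) */
    output;
  end;
run;

data temp2;
  call streaminit(2025);          /* assumed seed */
  do i = 1 to 10;
    y = rand('uniform');
    output;
  end;
run;
```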

  2. Simple SET statement

     *Simple SET statement: copy temp1;
     data temp1b;
       set temp1;
     run;

     *Concatenate: temp1 stacked three times;
     data temp1x3;
       set temp1 temp1 temp1;
     run;

     *Interleave: both data sets must be sorted by i;
     data temp12a;
       set temp1 temp2;
       by i;
     run;

     *Combine one-to-one: the step stops when either data set runs out;
     data temp12b;
       set temp1;
       set temp2;
     run;

     *Attach the first observation of temp1 to all observations of temp2;
     data temp12c;
       if (_n_ eq 1) then set temp1;
       set temp2;
     run;
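The practical difference between these forms is how many observations each one produces. A small sketch with two hypothetical data sets, a (3 rows) and b (5 rows), both sorted by k; the names and values are illustrative only:

```sas
/* Hypothetical inputs: a has 3 rows, b has 5 rows, both sorted by k. */
data a;
  do k = 1 to 3;
    x = k * 10;
    output;
  end;
run;

data b;
  do k = 1 to 5;
    y = k * 100;
    output;
  end;
run;

data cat;   set a a;         run;  /* concatenate: 3 + 3 = 6 rows */
data intl;  set a b; by k;   run;  /* interleave: 3 + 5 = 8 rows, in k order */
data one;   set a; set b;    run;  /* one-to-one: stops after 3 rows, the shorter input */

/* First row of a attached to every row of b: 5 rows.  Variables read  */
/* by SET are retained, so x = 10 on every row.                        */
data attach;
  if _n_ eq 1 then set a;
  set b;
run;
```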

  3. Data set options inside the SET statement

     /* Data set options that can be used inside the SET statement:
          DROP = varlist          KEEP = varlist
          FIRSTOBS = num          OBS = num
          IN = var                RENAME = (old = new ...)
          WHERE = (condition)                                */

     *Combine a data set with itself to calculate the change in a variable's
      value: frwx is the next row's x, and temp3 has one fewer observation;
     data temp3;
       set temp1 (keep = x);
       set temp1 (firstobs = 2 rename = (x = frwx));
       delta = x - frwx;
     run;

     *The IN= data set option is used with multiple data sets when it is
      important to know which data set contributed an observation;
     data temp4;
       set temp1 (in = in_1) temp2 (in = in_2);
       by i;
       if (in_1) then x2 = x**2;
       else if (in_2) then yexp = exp(y);
     run;

     *WHERE= filters each input data set separately;
     data temp5;
       set temp1 (where = (x > .5)) temp2 (where = (y < .5));
     run;
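To see what the look-ahead trick in temp3 produces, here is a sketch on hypothetical data with known values (x = 1, 4, 9, 16). The second SET starts one row later, so each row sees the next value of x, and the step ends as soon as that second SET runs out:

```sas
/* Hypothetical input with known values: x = 1, 4, 9, 16 */
data sq;
  do i = 1 to 4;
    x = i**2;
    output;
  end;
run;

data sq_next;
  set sq (keep = x);
  /* KEEP= is applied before RENAME= on input, so it uses the old name */
  set sq (firstobs = 2 keep = x rename = (x = frwx));   /* next row's x */
  delta = x - frwx;         /* same sign convention as temp3 */
run;
/* sq_next has 3 rows: (x, frwx, delta) = (1, 4, -3), (4, 9, -5), (9, 16, -7) */
```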

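  4. SET statement options

     /* SET statement options:
          END = var    KEY = index    NOBS = var    POINT = var */

     *END= option;
     *The END= option names a variable that is set to 1 when the SET statement
      reads its last observation, so it identifies the last observation
      processed. Here eof reaches 1 only if temp2 has at least as many
      observations as temp1, because the step stops as soon as either SET
      runs out;
     data temp6;
       set temp1 end = eof;
       set temp2;
       if (eof) then do;
         lx = x;
         ly = y;
       end;
     run;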
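A common use of END= is to write one summary row after the last observation has been read. A sketch on hypothetical data (x = 1 to 5):

```sas
/* Hypothetical input: x = 1, 2, 3, 4, 5 */
data nums;
  do x = 1 to 5;
    output;
  end;
run;

data total_only;
  set nums end = eof;
  total + x;              /* sum statement: total is retained and starts at 0 */
  if eof then output;     /* eof is 1 only while the last row is processed */
  keep total;
run;
/* total_only has one row with total = 15 */
```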

  5. *KEY= option;
     *The KEY= option retrieves observations from an indexed data set based on
      the index key, which can be either a simple key or a composite key;
     data pan1;
       do i = 1 to 30;
         k = i;
         x = rand('unif');
         output;
       end;
     run;

     data pan2 (index = (k));
       do i = 1 to 20;
         k = i + 10;
         y = rand('unif');
         output;
       end;
     run;

     *k = 1 to 10 have no match in pan2: those lookups fail and set _ERROR_ to 1;
     data pan3;
       set pan1;
       set pan2 key = k;
       xymax = max(x, y);
     run;
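In pan3 the keys with no match in pan2 are treated as errors. A sketch of the pattern SAS documents for KEY= lookups, which tests the automatic variable _IORC_ with the %SYSRC autocall macro and resets _ERROR_ on an expected miss. The data sets master and lookup are hypothetical:

```sas
/* Hypothetical data: master has k = 1..5; lookup is indexed on k   */
/* and holds only k = 3..5, so keys 1 and 2 have no match.           */
data master;
  do k = 1 to 5;
    output;
  end;
run;

data lookup (index = (k));
  do k = 3 to 5;
    y = k * 10;
    output;
  end;
run;

data matched;
  set master;
  set lookup key = k;
  select (_iorc_);
    when (%sysrc(_sok)) matched = 1;     /* key found: y comes from lookup */
    when (%sysrc(_dsenom)) do;           /* key not found in lookup */
      matched = 0;
      y = .;          /* otherwise y keeps the value from the last successful read */
      _error_ = 0;    /* do not report the expected miss as an error */
    end;
    otherwise do;
      put 'Unexpected I/O error: ' _iorc_=;
      stop;
    end;
  end;
run;
/* k = 1, 2: y = ., matched = 0;  k = 3, 4, 5: y = 30, 40, 50, matched = 1 */
```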

  6. *NOBS= option;
     *The NOBS= option creates a variable that contains the total number of
      observations in the input data set(s). If multiple data sets are listed
      in the SET statement, the NOBS= variable holds the total number of
      observations in all the listed data sets. Its value is assigned at
      compile time, so it is available before the SET statement executes;

     *Read temp1 and temp2 only if they contain observations;
     data temp7;
       if (0) then set temp1 (drop = _all_) temp2 (drop = _all_) nobs = totobs;
       if (totobs) then set temp1 temp2;
       else abort;
     run;

     *Just get the number of observations into a macro variable
      (SYMPUTX strips the leading blanks that PUT(n_obs, 5.) would leave);
     data _null_;
       call symputx('n_obs', n_obs);
       stop;
       set temp1 temp2 nobs = n_obs;
     run;
     %put &n_obs;
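A sketch with two hypothetical data sets of known size (3 and 5 rows) confirms that NOBS= counts every listed data set and is set before any row is read:

```sas
/* Hypothetical data sets of known size: 3 rows and 5 rows. */
data three; do j = 1 to 3; output; end; run;
data five;  do j = 1 to 5; output; end; run;

/* NOBS= is filled in at compile time, so the STOP before the SET  */
/* ends the step without reading a row, yet total is already 8.    */
data _null_;
  call symputx('total_obs', total);
  stop;
  set three five nobs = total;
run;

%put NOTE: three and five contain &total_obs observations in total;
```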

  7. *POINT= option;
     *The POINT= option uses a numeric variable for direct (random) access into
      a SAS data set. The POINT= variable must be assigned a value before the
      SET statement executes. A SET statement with POINT= never detects
      end-of-file, so a step that reads only through POINT= needs a STOP
      statement;

     *Read only the third observation;
     data temp8;
       ptr = 3;
       set temp1 point = ptr;
       if (_error_) then abort;
       output;
       stop;
     run;

     *Reverse the order of the data;
     data temp9;
       do ptr = lastrec to 1 by -1;
         set temp1 point = ptr nobs = lastrec;
         if (_error_) then abort;
         output;
       end;
       stop;
     run;
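Because the DO loop controls which row numbers are read, POINT= also handles systematic sampling. A sketch on a hypothetical 7-row data set that keeps every other row:

```sas
/* Hypothetical input: 7 rows, row number stored in r */
data seven;
  do r = 1 to 7;
    output;
  end;
run;

data every_other;
  do ptr = 1 to n by 2;              /* n is known at compile time via NOBS= */
    set seven point = ptr nobs = n;
    output;
  end;
  stop;                              /* POINT= never signals end-of-file */
run;
/* every_other has 4 rows: r = 1, 3, 5, 7 */
```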

  8. *Random replicates of a data set (sampling with replacement);
     data john1;
       do i = 1 to 20;
         x = rand('unif');
         output;
       end;
     run;

     *Draw 10 observations at random, with replacement;
     data john2;
       do _i_ = 1 to 10;
         ptr = ceil(totobs * ranuni(totobs));   *row number from 1 to totobs;
         set john1 point = ptr nobs = totobs;
         if (_error_) then abort;
         output;
       end;
       stop;
     run;
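The same idea extends to bootstrap resampling: several replicates, each the size of the original data set, drawn with replacement and labeled by a replicate number. A sketch assuming 3 replicates of a hypothetical 20-row data set; it swaps RANUNI for RAND('UNIFORM') with CALL STREAMINIT only so that the seed is set explicitly, and the seeds are assumptions:

```sas
/* Hypothetical input: 20 rows */
data base;
  call streaminit(123);               /* assumed seed */
  do id = 1 to 20;
    x = rand('uniform');
    output;
  end;
run;

/* 3 bootstrap replicates (assumed), each with totobs rows */
data boot;
  call streaminit(456);               /* assumed seed for the resampling */
  do replicate = 1 to 3;
    do draw = 1 to totobs;
      ptr = ceil(totobs * rand('uniform'));   /* row number in 1..totobs */
      set base point = ptr nobs = totobs;
      output;
    end;
  end;
  stop;
run;
/* boot has 3 * 20 = 60 rows; id shows which original row was drawn */
```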

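  9. *Replicates of observations: for each row of kevin1, read rows start
      through stop of kevin2 (kevin3 ends up with 1 + 2 + ... + 10 = 55 rows);
     data kevin1;
       do i = 1 to 10;
         start = 1;
         stop = i;
         output;
       end;
     run;

     data kevin2;
       do i = 1 to 10;
         x = rand('unif');
         output;
       end;
     run;

     *The sequential SET of kevin1 ends the step, so no STOP is needed;
     data kevin3;
       set kevin1;
       do ptr = start to stop;
         set kevin2 point = ptr;
         if (_error_) then abort;
         output;
       end;
     run;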
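A small worked case makes the counting explicit. Each row of a hypothetical control data set asks for a block of row numbers; with the ranges (1, 2) and (2, 4) the output holds 2 + 3 = 5 rows. The range variables are named first_row and last_row here (an illustrative choice) rather than start and stop:

```sas
/* Hypothetical source data: 4 rows with x = 10, 20, 30, 40 */
data src;
  do x = 10 to 40 by 10;
    output;
  end;
run;

/* Hypothetical ranges of row numbers to copy */
data ranges;
  first_row = 1; last_row = 2; output;
  first_row = 2; last_row = 4; output;
run;

data blocks;
  set ranges;                         /* sequential SET ends the step */
  do ptr = first_row to last_row;
    set src point = ptr;
    output;
  end;
run;
/* blocks: x = 10, 20, then 20, 30, 40 */
```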

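  10. *Add the minimum, maximum and sum of x to every observation. On the
       first iteration the DO UNTIL loop reads all of temp1; the second SET
       then reads temp1 from the top, and RETAIN keeps the statistics;
      data voytek;
        retain minval maxval sumval;
        if (_n_ eq 1) then do until (lastrec);
          set temp1 (keep = x) end = lastrec;
          minval = min(minval, x);
          maxval = max(maxval, x);
          sumval = sum(sumval, x);
        end;
        set temp1;
      run;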
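With known input values the result of the two-pass step is easy to check. A sketch on hypothetical data x = 2, 5, 3 that also derives a mean; the count variable nval and the mean meanval are additions for illustration:

```sas
/* Hypothetical input: x = 2, 5, 3 */
data vals;
  input x @@;
  datalines;
2 5 3
;
run;

data with_stats;
  retain minval maxval sumval nval;
  if _n_ eq 1 then do until (lastrec);
    set vals (keep = x) end = lastrec;   /* first pass: read every row */
    minval = min(minval, x);
    maxval = max(maxval, x);
    sumval = sum(sumval, x);
    nval   = sum(nval, 1);
  end;
  set vals;                              /* second pass: rows in order */
  meanval = sumval / nval;
run;
/* every row has minval = 2, maxval = 5, sumval = 10, nval = 3 */
```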
