SAS Programming: Working With Variables
Data Step Manipulations • New variables should be created during a DATA step • Existing variables should be manipulated during a DATA step
Missing Values in SAS • SAS uses a period (.) to represent missing values in a SAS data set • Different SAS procedures and functions treat missing values differently - always be careful when your SAS data set contains missing values
Working With Numeric Variables • SAS uses the standard arithmetic operators +, -, *, /, and ** (exponentiation). Note on missing values: arithmetic operators propagate missing values, so the result is missing if any operand is missing. • SAS has many built-in numeric functions, for example:
    round(variable, value): Rounds variable to the nearest multiple of value.
    sum(variable1, variable2, …): Adds any number of variables and ignores missing values.
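The difference between the + operator and sum() can be sketched as follows. This is a minimal illustration, not part of the original slides: the data set name demo and the variables a and b are hypothetical.

```sas
* Hypothetical data: b is missing in the second record;
data demo;
    input a b;
    total_op  = a + b;         * missing whenever a or b is missing;
    total_fn  = sum(a, b);     * ignores missing arguments;
    a_rounded = round(a, 10);  * rounds a to the nearest multiple of 10;
    datalines;
12 5
47 .
;
run;

proc print data=demo;
run;
```

For the second record, total_op is missing while total_fn is 47, which is why the choice between the operator and the function matters when data are incomplete.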
Acting on Selected Observations • Working with selected observations - subsets of a SAS data set - is easy in SAS • First, you must decide on a selection process. What is the distinguishing characteristic of the observations you want to work with?
Selecting Observations: IF-THEN Statements • The IF-THEN statement is the most common way to select observations. Format: IF condition THEN action; • condition is one or more comparisons. For any observation, condition is either true or false. If condition is true, SAS performs the action.
IF-THEN Statement: Example • Suppose INC is a variable representing annual household income and you want to create a dummy variable, DUM, based on income that takes value 1 when income is less than $10,000.
    IF INC < 10000 THEN DUM=1;
    IF INC >= 10000 THEN DUM=0;
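One caveat with this kind of comparison: SAS treats a missing numeric value as smaller than any number, so INC < 10000 is true when INC is missing. The sketch below guards against that so DUM stays missing for such records. The data set name households and the data values are made up for illustration.

```sas
* Hypothetical data: the third record has a missing income;
data households;
    input inc;
    * Require a non-missing income before flagging it as low;
    if inc ne . and inc < 10000 then dum = 1;
    if inc >= 10000 then dum = 0;
    datalines;
8500
23000
.
;
run;

proc print data=households;
run;
```

Because DUM is never assigned for the third record, it keeps its default missing value instead of being set to 1.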
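Using OBS in condition • In a SAS data set, each record has an observation number, which is what printed output shows in the Obs column • The observation number can be used in a condition, but there is no variable named OBS to refer to; in a DATA step you must use the automatic variable _n_, which counts DATA step iterations and so matches the observation number when one record is read per iteration • Example: set INC equal to zero for the first 10 observations
    IF _n_ <= 10 THEN INC=0;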
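A small sketch of _n_ in use, with a cutoff of 2 rather than 10 to keep the sample data short. The data set name zeroed and the variable obs_no are hypothetical. Since _n_ is not written to the output data set, the sketch copies it into a regular variable so it can be inspected.

```sas
data zeroed;
    input inc;
    obs_no = _n_;              * keep a copy of the automatic counter;
    if _n_ <= 2 then inc = 0;  * zero out INC for the first 2 records;
    datalines;
8500
23000
41000
;
run;

proc print data=zeroed;
run;
```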
Comparison Operators • There are 6 comparison operators • Can use either the symbol or the mnemonic
    Symbol  Mnemonic  Meaning
    =       EQ        Equal to
    ^=      NE        Not equal to
    >       GT        Greater than
    <       LT        Less than
    >=      GE        Greater than or equal to
    <=      LE        Less than or equal to
Multiple Comparisons • Can make more than one comparison in condition by using AND/OR • AND / &: All parts must be true for condition to be true • OR / |: At least one part must be true for condition to be true • Be careful when combining AND and OR: AND is evaluated before OR • Can use parentheses in condition to make the grouping explicit
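The effect of parentheses can be sketched with two flags built from the same comparisons. The data set name flagged, the variables inc and age, and the thresholds are all hypothetical.

```sas
data flagged;
    input inc age;
    * Without parentheses AND is evaluated before OR;
    flag_a = 0;
    if inc < 10000 or inc > 50000 and age >= 65 then flag_a = 1;
    * Parentheses force the OR to be evaluated first;
    flag_b = 0;
    if (inc < 10000 or inc > 50000) and age >= 65 then flag_b = 1;
    datalines;
8000 30
60000 70
20000 70
;
run;

proc print data=flagged;
run;
```

The first record shows the difference: flag_a is 1 because the low-income test alone satisfies the condition, while flag_b is 0 because the age test must also hold.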
Selecting Observations for New SAS Data Sets • Can use IF-THEN statements to create new SAS data sets • Either delete or keep selected observations based on condition
Deleting Observations • Format for IF-THEN: IF condition THEN DELETE; • Example: Removing observations with missing values. Suppose the variable INC is missing for some households and you want to drop these observations
    IF INC=. THEN DELETE;
Keeping Selected Observations • A more straightforward way to create new SAS data sets is to keep only those observations that meet some condition. Format: IF condition; • Observations for which condition is false are left out of the new data set
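The two approaches, deleting and keeping, can be compared side by side. In this sketch the data set names households, nonmissing_a, and nonmissing_b and the data values are hypothetical.

```sas
* Hypothetical data: the second record has a missing income;
data households;
    input inc;
    datalines;
8500
.
23000
;
run;

* Approach 1: delete records where INC is missing;
data nonmissing_a;
    set households;
    if inc = . then delete;
run;

* Approach 2: keep only records where INC is not missing;
data nonmissing_b;
    set households;
    if inc ne .;
run;
```

Both new data sets should contain the same two records; which form reads better depends on whether the rule is easier to state as what to drop or what to keep.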
Example • The file salary.dat contains data for 93 employees of a Chicago bank. The file contains the following variables:
    Y: Salary
    X: Years of education
    E: Months of previous work experience
    T: Number of months after 1/1/69 that the individual was hired
• The first 61 observations are females, the last 32 are males
Example: Create Dummy for Males
    *Program to create dummy variables and;
    *new SAS data sets;
    data salary;
        infile 's:\mysas\salary.dat';
        input y x e t;
        IF _n_ > 61 THEN G=1;
        IF _n_ <= 61 THEN G=0;
    run;
Example: Create Data Set for Males
    *Make a new SAS data set composed of only;
    *records for males;
    data males;       *New SAS data set;
        set salary;   *Created from salary;
        IF G=1;
    run;
Example: Create Data Set for Females
    *Make a new SAS data set composed of only;
    *records for females;
    data females;     *New SAS data set;
        set salary;   *Created from salary;
        IF G=0;
    run;
Describing Data: Sample Statistics • Format:
    PROC UNIVARIATE <option-list>;
        VAR variable-list;
        BY variable-list;
        FREQ variable;
        WEIGHT variable;
Selected Options
    DATA=SAS-data-set   Specify data set. If omitted, uses the most recent SAS data set
    FREQ                Generate frequency table
    NOPRINT             Suppress printed output
VAR Statement • List of variables to calculate sample statistics for. • If no variables are specified, sample statistics are generated for all numeric variables
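As a minimal sketch of the PROC statement options and the VAR statement together, the step below names the data set explicitly, requests the frequency table, and limits the analysis to one variable. The data set name workers and its values are hypothetical.

```sas
* Hypothetical earnings data;
data workers;
    input sex earn;
    datalines;
1 42000
0 39000
1 51000
0 45000
;
run;

* DATA= names the input data set and FREQ adds a frequency table;
proc univariate data=workers freq;
    var earn;   * without VAR, all numeric variables would be analyzed;
run;
```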
WEIGHT Statement • Specifies a numeric variable in the SAS data set whose values are used to weight each observation
BY Statement • Can be used to obtain separate analyses on observations in groups defined by some value of a variable. • Example: Suppose SEX=1 if individual is male, SEX=0 if individual is female; EARN=annual earnings.
    PROC UNIVARIATE;   *Generates statistics;
        VAR EARN;      *on earnings for men;
        BY SEX;        *and women;
    RUN;
BY Statements and Sorting • Before using a BY statement in a procedure, the SAS data set must be sorted on the variables specified in that BY statement • Use PROC SORT, which puts the observations in order based on the values of the variables in its own BY statement
PROC SORT • Format:
    PROC SORT <options>;
        BY <options> variables;
• Sort order: ascending by default. For descending order, put DESCENDING before the variable name in the BY statement
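Putting the sorting requirement and the BY statement together, a hedged sketch might look like this. The data set names workers and by_earn and the data values are hypothetical, and OUT= is used in the second sort only so that the first sorted data set is left untouched.

```sas
* Hypothetical earnings data, not yet sorted by SEX;
data workers;
    input sex earn;
    datalines;
1 42000
0 39000
1 51000
0 45000
;
run;

* Sort on the BY variable first;
proc sort data=workers;
    by sex;
run;

* Separate statistics on earnings for each value of SEX;
proc univariate data=workers;
    var earn;
    by sex;
run;

* Descending sort: the keyword precedes the variable it applies to;
proc sort data=workers out=by_earn;
    by descending earn;
run;
```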
Describing Data: Frequencies • Format:
    PROC FREQ <options>;
        BY variables;
        TABLES requests </ options>;
        WEIGHT variable;
One-Way Frequency Table • SEX=1 (Male), SEX=0 (Female) • EDUCATION=1 (Less than High School), =2 (High School), =3 (Some College), =4 (College grad.) • EARN=Annual Earnings
    PROC FREQ;
        TABLES EDUCATION;
    RUN;

    PROC FREQ;
        TABLES EDUCATION;
        BY SEX;
    RUN;
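Because the second PROC FREQ uses a BY statement, the data must be sorted by SEX beforehand. A self-contained sketch under that assumption follows; the data set name survey and the data values are hypothetical.

```sas
* Hypothetical survey data using the SEX and EDUCATION codes above;
data survey;
    input sex education;
    datalines;
1 2
0 4
1 3
0 2
1 4
0 1
;
run;

* One-way frequency table for the whole sample;
proc freq data=survey;
    tables education;
run;

* Sort by SEX so the BY statement can be used;
proc sort data=survey;
    by sex;
run;

* One-way frequency tables computed separately for each value of SEX;
proc freq data=survey;
    tables education;
    by sex;
run;
```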