
SAS Programming: Working With Variables


jalia

Presentation Transcript


  1. SAS Programming: Working With Variables

  2. Data Step Manipulations
     • New variables should be created during a DATA step
     • Existing variables should be manipulated during a DATA step

  3. Missing Values in SAS
     • SAS uses a period (.) to represent missing values in a SAS data set
     • Different SAS procedures and functions treat missing values differently, so always be careful when your SAS data set contains missing values

  4. Working With Numeric Variables
     • SAS uses the standard arithmetic operators +, -, *, /, ** (exponentiation)
       Note on Missing Values: Arithmetic operators propagate missing values.
     • SAS has many built-in numeric functions
       round(variable, value): Rounds variable to the nearest unit given by value.
       sum(variable1, variable2, …): Adds any number of variables and ignores missing values.
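The difference between the + operator and the sum() function with missing values can be seen in a small sketch. The data set name demo, the variable names, and the two data lines are hypothetical and chosen only for illustration.

```sas
* Hypothetical data: y is missing in the second record;
data demo;
  input x y;
  total_op  = x + y;          * missing whenever x or y is missing;
  total_fn  = sum(x, y);      * adds the nonmissing arguments only;
  x_rounded = round(x, 0.1);  * rounds x to the nearest tenth;
  datalines;
1.26 2
3.14 .
;
run;

proc print data=demo;
run;
```

For the second record, total_op is missing while total_fn equals 3.14, because sum() skips the missing argument.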

  5. Acting on Selected Observations
     • Working with selected observations (subsets of a SAS data set) is easy in SAS
     • First, you must decide on a selection process. What is the distinguishing characteristic of the observations you want to work with?

  6. Selecting Observations: IF-THEN Statements
     • The IF-THEN statement is the most common way to select observations.
       Format: IF condition THEN action;
     • condition is one or more comparisons. For any observation, condition is either true or false. If condition is true, SAS performs the action.

  7. IF-THEN Statement: Example
     • Suppose INC is a variable representing annual household income and you want to create a dummy variable, DUM, based on income that takes the value 1 when income is less than $10,000.
       IF INC < 10000 THEN DUM=1;
       IF INC >= 10000 THEN DUM=0;
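One point worth checking when building a dummy like DUM: SAS treats a missing numeric value as smaller than any number, so the comparison INC < 10000 is true when INC is missing. A minimal sketch of one way to guard against that, using IF-THEN/ELSE, follows. The data set name households and the three data lines are hypothetical.

```sas
* Hypothetical data: the third household has missing income;
data households;
  input inc;
  if inc = . then dum = .;           * keep DUM missing when INC is missing;
  else if inc < 10000 then dum = 1;  * low-income household;
  else dum = 0;                      * income of 10000 or more;
  datalines;
8500
25000
.
;
run;
```

Using ELSE also means SAS stops testing conditions for an observation as soon as one of them is true.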

  8. Using OBS in condition
     • In a SAS data set, each record has an observation number, which is the number shown in the OBS column of printed output
     • OBS is not a variable you can name in a condition; in a DATA step, refer to the observation number using the automatic variable _n_
     • Example: set INC equal to zero for the first 10 observations
       IF _n_ <= 10 THEN INC=0;
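A small sketch of _n_ in use is given below. Strictly, _n_ counts iterations of the DATA step, which matches the observation number when one record is read per iteration, as here. The data set name demo, the variable obs_number, and the data lines are hypothetical, and the cutoff is 2 instead of 10 to keep the example short.

```sas
* Hypothetical data with three records;
data demo;
  input inc;
  obs_number = _n_;          * _n_ itself is not saved, so copy it to keep it;
  if _n_ <= 2 then inc = 0;  * zero out INC for the first two observations;
  datalines;
100
200
300
;
run;
```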

  9. Comparison Operators
     • There are 6 comparison operators
     • Can use either the symbol or the mnemonic

       Symbol   Mnemonic   Meaning
       =        EQ         Equal to
       ^=       NE         Not equal to
       >        GT         Greater than
       <        LT         Less than
       >=       GE         Greater than or equal to
       <=       LE         Less than or equal to

  10. Multiple Comparisons
     • Can make more than one comparison in condition by using AND/OR
     • AND / &: All parts must be true for condition to be true
     • OR / |: At least one part must be true for condition to be true
     • Be careful when using AND/OR
     • Can use parentheses in condition
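The reason for care with AND/OR is that SAS evaluates AND before OR, so parentheses can change which observations satisfy a condition. The sketch below illustrates this with hypothetical data; the data set name flags, the variables flag_a and flag_b, and the cutoff of 50000 are assumptions made for the example.

```sas
* Hypothetical records: sex, education level, annual earnings;
data flags;
  input sex education earn;
  * without parentheses this reads as sex=1 OR (education=4 AND earn>50000);
  flag_a = 0;
  if sex = 1 or education = 4 and earn > 50000 then flag_a = 1;
  * parentheses force the OR to be evaluated first;
  flag_b = 0;
  if (sex = 1 or education = 4) and earn > 50000 then flag_b = 1;
  datalines;
1 2 30000
0 4 60000
0 2 60000
;
run;
```

The first record is flagged by flag_a (SEX=1 is enough) but not by flag_b, which also requires earnings above 50000.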

  11. Selecting Observations for New SAS Data Sets
     • Can use IF-THEN statements to create new SAS data sets
     • Either delete or keep selected observations based on condition

  12. Deleting Observations
     • Format for IF-THEN: IF condition THEN DELETE;
     • Example: Removing observations with missing values. Suppose the variable INC is missing for some households and you want to drop these observations
       IF INC=. THEN DELETE;

  13. Keeping Selected Observations
     • A more straightforward way to create new SAS data sets is to keep only those observations that meet some condition.
       Format: IF condition;
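The two approaches, deleting and keeping, can be compared side by side. In this sketch the data set names raw, drop_missing, and keep_nonmissing and the data lines are hypothetical; both DATA steps should produce the same two-observation result.

```sas
* Hypothetical data: the second record has missing income;
data raw;
  input inc;
  datalines;
12000
.
8000
;
run;

* Approach 1: delete the observations you do not want;
data drop_missing;
  set raw;
  if inc = . then delete;
run;

* Approach 2: keep only the observations that meet the condition;
data keep_nonmissing;
  set raw;
  if inc ne .;
run;
```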

  14. Example
     • The file salary.dat contains data for 93 employees of a Chicago bank. The file contains the following variables:
       Y: Salary
       X: Years of education
       E: Months of previous work experience
       T: Number of months after 1/1/69 that the individual was hired
     • The first 61 observations are females, the last 32 are males

  15. Example: Create Dummy for Males
       *Program to create dummy variables and;
       *new SAS data sets ;
       data salary;
         infile 's:\mysas\salary.dat';
         input y x e t;
         IF _n_ > 61 THEN G=1;
         IF _n_ <= 61 THEN G=0;
       run;

  16. Example: Create Data Set for Males
       *Make a new SAS data set composed of only;
       *records for males ;
       data males;    *New SAS data set;
         set salary;  *Created from salary;
         IF G=1;
       run;

  17. Example: Create Data Set for Females
       *Make a new SAS data set composed of only;
       *records for females ;
       data females;  *New SAS data set;
         set salary;  *Created from salary;
         IF G=0;
       run;

  18. Describing Data: Sample Statistics
     • Format:
       PROC UNIVARIATE <option-list>;
       VAR variable-list;
       BY variable-list;
       FREQ variable;
       WEIGHT variable;

  19. Selected Options

       DATA=SAS-data-set   Specify data set. If omitted, uses the most recent SAS data set
       FREQ                Generate frequency table
       NOPRINT             Suppress printed output

  20. VAR Statement
     • List of variables to calculate sample statistics for.
     • If no variables are specified, sample statistics are generated for all numeric variables
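As a minimal sketch of the DATA= option and the VAR statement together, the step below requests sample statistics for a single variable. The data set name pay and its values are hypothetical; the variable names y and x follow the salary example only for familiarity.

```sas
* Hypothetical data: salary (y) and years of education (x);
data pay;
  input y x;
  datalines;
5000 12
6200 15
5800 12
7100 16
;
run;

* DATA= names the data set explicitly and VAR limits the analysis to y;
proc univariate data=pay;
  var y;
run;
```

Leaving out the VAR statement here would produce statistics for both y and x, since both are numeric.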

  21. WEIGHT Statement
     • Specifies a numeric variable in the SAS data set whose values are used to weight each observation

  22. BY Statement
     • Can be used to obtain separate analyses on observations in groups defined by some value of a variable.
     • Example: Suppose SEX=1 if individual is male, SEX=0 if individual is female; EARN=annual earnings.
       PROC UNIVARIATE;  *Generates statistics;
         VAR EARN;       *on earnings for men;
         BY SEX;         *and women;
       RUN;

  23. BY Statements and Sorting
     • Before using a BY statement, the SAS data set must be sorted on the variable specified
     • SAS puts the observations in order, based on the values of the variables specified in the BY statement.
     • Use PROC SORT

  24. PROC SORT
     • FORMAT:
       PROC SORT <options>;
       BY <options> variables;
     • Sort order: ascending. For descending order, put DESCENDING before the variable name in the BY statement
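Putting the sorting requirement and the BY statement together, a sketch of the full sequence might look like the following. The data set name earnings and its four records are hypothetical; SEX and EARN follow the coding described in the BY statement example.

```sas
* Hypothetical data: SEX=1 male, SEX=0 female, EARN=annual earnings;
data earnings;
  input sex earn;
  datalines;
1 42000
0 39000
1 51000
0 45000
;
run;

* Sort on the BY variable first (ascending by default);
proc sort data=earnings;
  by sex;
run;

* Separate sample statistics for each value of SEX;
proc univariate data=earnings;
  var earn;
  by sex;
run;
```

To process males first, the sort could use BY DESCENDING SEX, in which case the BY statement in PROC UNIVARIATE needs the same DESCENDING keyword.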

  25. Describing Data: Frequencies
     • FORMAT:
       PROC FREQ <options>;
       BY variables;
       TABLES requests </ options>;
       WEIGHT variable;

  26. One-Way Frequency Table
     • SEX=1 (Male), SEX=0 (Female)
     • EDUCATION=1 (Less than High School), =2 (High School), =3 (Some College), =4 (College grad.)
     • EARN=Annual Earnings
       PROC FREQ;
         TABLES EDUCATION;
       RUN;
       PROC FREQ;
         TABLES EDUCATION;
         BY SEX;
       RUN;
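A self-contained sketch of the same two requests is shown below, with the sort that the BY statement needs added in between. The data set name survey and its five records are hypothetical; the codes follow the definitions on this slide.

```sas
* Hypothetical data: sex and education codes as defined above;
data survey;
  input sex education;
  datalines;
1 2
0 4
1 3
0 2
1 2
;
run;

* One-way frequency table of EDUCATION for everyone;
proc freq data=survey;
  tables education;
run;

* BY-group processing requires the data to be sorted on SEX;
proc sort data=survey;
  by sex;
run;

* Separate one-way tables of EDUCATION for females and males;
proc freq data=survey;
  tables education;
  by sex;
run;
```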
