370 likes | 659 Views
Working with Your Data (Chapter 2 in the Little SAS Book) . Animal Science 500 Lecture No. 4 September 9, 2010. Working with Your Data. To this point we have identified Many forms which data can stored and ultimately imported into SAS Spreadsheets – Excel, Lotus, Quattro Pro, etc.
E N D
Working with Your Data(Chapter 2 in the Little SAS Book) Animal Science 500 Lecture No. 4 September 9, 2010
Working with Your Data To this point we have identified • Many forms which data can stored and ultimately imported into SAS • Spreadsheets – Excel, Lotus, Quattro Pro, etc. • Databases – Access, SQL, others • Text files – from Word, WordPad, Notepad, others • Other fileformats • Many ways to import our data into SAS • Import wizard • Infile statement • Others Many options to use with the importing of the data, formatting the input data, etc.
Modifying your Data • Data step • read and modify data • create a new dataset • performs actions on rows • Proc step • use an existing dataset • produce an output/results • performs actions on columns
Modifying your Data • Creating and redefining variables is straightforward in a SAS data step • variable = expression; • Examples • Newvariable = constant; • Newvariable = oldvariable* constant; • Adjusted Backfat, growth rate, loin muscle area = predetermined equation
Comparison Operators • Comparison operators set up a comparison, operation, or calculation with two variables, constants, or expressions within the dataset being used . • If the comparison is true, the result is 1. • If the comparison is false, the result is 0. • Comparison operators can be expressed as symbols or with their mnemonic equivalents, which are shown in the following table:
Logical (Boolean) Operators and Expressions Logical operators, also called Boolean operators, are usually used in expressions to link sequences of comparisons.
Order of calculations • The order in which any of the functions follow standard mathematical rules of precedence. • To overcome this parentheses are used to override that order.
Modifying your Data • The DROP or KEEP statements • Used to decrease the number of variables • Usually not a concern with datasets normally encountered • Remember that the variables are dropped or retained (keep) within the SAS dataset unless you specify otherwise
Modifying your Data Data new2; set new; ADG = ((Finalwt. – Beginningwt) / DaysOnTest); Drop BeginningwtDaysOnTest; Proc Means; Run; Quit;
Using IF – THEN Statements • In cases where we want to assign some statement to some of your observations but not all. • For example adjustment factors for backfat, loin muscle area, growth rate that differing by sex of animal • Called condition – action statements • IF condition THEN action;
Using IF – THEN Statements • Example1 • If job='banker' then highsal=1; • IF condition AND condition THEN action; • Example2 • If job='banker' and age>65 then ret_banker=1; • If job eq ‘banker’ and age ge 65 tehnret_banker=1;
Using IF – THEN Statements • Example1 • If job='banker' then highsal=1; • IF condition AND condition THEN action; • Example2 • If job='banker' and age>65 then ret_banker=1; • If job eq ‘banker’ and age ge 65 tehnret_banker=1;
Using the IN Operator • Using the IN operator makes comparisons and works similarly to the If – Then statement but gives a bit more flexibility • IF Model IN (‘Corvette’, ‘Camaro’) Then make = ‘Cheverolet’; • Assumes you have a column or variable titled Model • Creates new variable or column titled Cheverolet
Using the IN Operator • Example using animal data. • IF SEX IN (‘gilt’, ‘barrow’) Then adjustedBF = BF + ((actualwt – 250) * (actualbf / (actualwt – constant1))); • IF SEX IN (‘boar”) Then adjustedBF = BF + ((actualwt – 250) * (actualbf / (actualwt– constant2)));
Using IF – THEN Statements • A single IF – THEN statement can have only one action • Using the key words DO and END then it is possible to execute more than 1 action • Example IF condition THEN DO; action; action; END;
Using IF – THEN Statements • A single IF – THEN statement can have only one action • Using the key words DO and END then it is possible to execute more than 1 action • Example IF Model = ‘Mustang” THEN DO; Make = ‘Ford’; Size = ‘Compact’; END;
Using IF – THEN Statements • The AND and OR keywords can be used to specify multiple conditions • IF condition AND condition THEN action; • Example • IF Model = ‘Mustang’ AND Year < 1975 THEN Status = ‘Classic’; both conditions must be met to reach the ‘Classic” status • IF Model = ‘Mustang’ OR Year < 1975 THEN Status = ‘Classic’; only one of the conditions must be met to reach the ‘Classic” status
Using IF – THEN /ELSE Statements • Using the IF - Then / Else statement is typically used to group observations • Basic form of statement • IF condition THEN action; ELSE IF condition THEN action; ELSE IF condition THEN action; Advantages: When compared to regular IF – THEN statements • Computationally more efficient as it uses less computer time because once a condition is satisfied SAS skips the rest of the steps. • The ELSE statement is mutually exclusive thus preventing an observation from ending up in more than one group.
The DO – END statement The DO-END statement is useful if you want to make several changes or create new variables for a subgroup or under certain conditions. Note that the DO-loop continues until you end it using END; Example If sex = ’female’ then do; • AdjustedBF = equation; • AdjustedLMA = equation; • AdjustedDAYS = equation; • End;
The DO – END statement The DO-END statement is useful if you want to make several changes or create new variables for a subgroup or under certain conditions. Note that the DO-loop continues until you end it using END; • Else if sex EQ ’male’ then do; • ... • ... • end;
Using IF – THEN /ELSE Statements • Example: • IF CowBC . THEN AdjustedCowBC = .; ELSE IF CowBCge 9 THEN AdjustedCowBC = 5; ELSE IF CowBCge 7 and lt 9 THEN AdjustedCowBC = 4; ELSE IF CowBCge5 and lt 7 THEN AdjustedCowBC = 3; ELSE IF CowBCge3 and lt 5 THEN AdjustedCowBC = 2; ELSE IF CowBCge1 and lt3 THEN AdjustedCowBC = 1; ELSE IF condition THEN action;
Using IF – THEN /ELSE Statements • When the condition is true, SAS assigns the stated value to AdjustedCowBC and then leaves the loop. The last ELSE is a trash-bin: anything that is not covered by the previous conditions is put to missing. • Look at what gets assigned to the last ELSE statement as it may identify an error in the data set or other problems • Make sure the condition part includes all possibilities, else you might get missingsor hidden errors. • Examine the log where it will reveal the number of missing values created, but there is no indication for observations that were not covered by your programming!
Subsetting your data • Sometimes researchers or programmers want to look at only a portion of the data that is collected. • Can accomplish this using the IF statement • Example only interested in gilts in a dataset that includes data from boars and barrows. • IF sex = ‘barrow’ THEN delete; • IF sex = ‘boar” THEN delete;
Subsetting your data • Sometimes researchers or programmers want to look at only a portion of the data that is collected. • Another way to use the IF statement • Rather than making it an deletion statement, make an inclusionary statement; • Example only interested in gilts in a dataset that includes data from boars and barrows. • IF sex = ‘gilt’; this results in the program looking at data where sex = gilt
Subsetting your data What is the difference when looking at only gilts if the statements IF sex = ‘gilt’; this results in the program looking at data where sex = gilt; IF sex = ‘barrow’ THEN delete; IF sex = ‘boar” THEN delete; Dataset would include anything coded incorrectly in the dataset other than ‘barrow’ or ‘boar’ Using the inclusionary statement (IF sex =‘gilt”), requires that every line of data be examined
The RETAIN Statements • Use the RETAIN statement when you want to keep some or all of the variable from a previous DATA step. • The RETAIN variable list; can appear anywhere in the DATA step • You can specify an initial value instead of missing for variables as follows RETAIN variable list initial value;
The Sum Statements • The Sum statement is used when have a cumulative total for some variable. Example used in the book; RETAIN MaxRuns; MaxRuns = MAX (MaxRuns, Runs); RunsToDate + Runs; Might want to use something similar if you are totaling milk production across the days in lactation or pigs born alive across parity.
Using Arrays in SAS • An array is a temporary holding site for a collection of variables upon which the same operations will be performed. • Arrays provide convenient shortcuts in programming. • An array is a group of variables that is user defined. • The array is user defined in the DATA step. • All the variables in an array must either be characters or numeric CANNOT mix character and numeric variables in the same array.
Using Arrays in SAS • Array name (n) $ (may or may not be included) variable list. • The number (n) must match the number of variables in the list. • An array by itself does nothing • You create the array to perform some function that you want to perform on all array variables (book uses changing missing value from 9 to .
Using Shortcuts for Lists of Variable Names • Provides you a listing for inputting variables that have very similar names • Variable1, Variable2, Variable3, Variable4 and so forth • You could use an input statement that includes all of the names – INPUT Variable1 Variable2 Variable3 Variable4; • Alternatively you could write it • INPUT Variable1 – Variable4; and all variable will have been inputted.