1 / 32

Working with Your Data (Chapter 2 in the Little SAS Book)

Working with Your Data (Chapter 2 in the Little SAS Book) . Animal Science 500 Lecture No. 4 September 9, 2010. Working with Your Data. To this point we have identified Many forms which data can stored and ultimately imported into SAS Spreadsheets – Excel, Lotus, Quattro Pro, etc.

mary
Download Presentation

Working with Your Data (Chapter 2 in the Little SAS Book)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Working with Your Data(Chapter 2 in the Little SAS Book) Animal Science 500 Lecture No. 4 September 9, 2010

  2. Working with Your Data To this point we have identified • Many forms which data can stored and ultimately imported into SAS • Spreadsheets – Excel, Lotus, Quattro Pro, etc. • Databases – Access, SQL, others • Text files – from Word, WordPad, Notepad, others • Other fileformats • Many ways to import our data into SAS • Import wizard • Infile statement • Others Many options to use with the importing of the data, formatting the input data, etc.

  3. Modifying your Data • Data step • read and modify data • create a new dataset • performs actions on rows • Proc step • use an existing dataset • produce an output/results • performs actions on columns

  4. Modifying your Data • Creating and redefining variables is straightforward in a SAS data step • variable = expression; • Examples • Newvariable = constant; • Newvariable = oldvariable* constant; • Adjusted Backfat, growth rate, loin muscle area = predetermined equation

  5. Arithmetic Operators

  6. Comparison Operators • Comparison operators set up a comparison, operation, or calculation with two variables, constants, or expressions within the dataset being used . • If the comparison is true, the result is 1. • If the comparison is false, the result is 0. • Comparison operators can be expressed as symbols or with their mnemonic equivalents, which are shown in the following table:

  7. Comparison Operators

  8. Logical (Boolean) Operators and Expressions Logical operators, also called Boolean operators, are usually used in expressions to link sequences of comparisons.

  9. Order of calculations • The order in which any of the functions follow standard mathematical rules of precedence. • To overcome this parentheses are used to override that order.

  10. Modifying your Data • The DROP or KEEP statements • Used to decrease the number of variables • Usually not a concern with datasets normally encountered • Remember that the variables are dropped or retained (keep) within the SAS dataset unless you specify otherwise

  11. Modifying your Data Data new2; set new; ADG = ((Finalwt. – Beginningwt) / DaysOnTest); Drop BeginningwtDaysOnTest; Proc Means; Run; Quit;

  12. Using IF – THEN Statements • In cases where we want to assign some statement to some of your observations but not all. • For example adjustment factors for backfat, loin muscle area, growth rate that differing by sex of animal • Called condition – action statements • IF condition THEN action;

  13. Using IF – THEN Statements • Example1 • If job='banker' then highsal=1; • IF condition AND condition THEN action; • Example2 • If job='banker' and age>65 then ret_banker=1; • If job eq ‘banker’ and age ge 65 tehnret_banker=1;

  14. Using IF – THEN Statements • Example1 • If job='banker' then highsal=1; • IF condition AND condition THEN action; • Example2 • If job='banker' and age>65 then ret_banker=1; • If job eq ‘banker’ and age ge 65 tehnret_banker=1;

  15. Using the IN Operator • Using the IN operator makes comparisons and works similarly to the If – Then statement but gives a bit more flexibility • IF Model IN (‘Corvette’, ‘Camaro’) Then make = ‘Cheverolet’; • Assumes you have a column or variable titled Model • Creates new variable or column titled Cheverolet

  16. Using the IN Operator • Example using animal data. • IF SEX IN (‘gilt’, ‘barrow’) Then adjustedBF = BF + ((actualwt – 250) * (actualbf / (actualwt – constant1))); • IF SEX IN (‘boar”) Then adjustedBF = BF + ((actualwt – 250) * (actualbf / (actualwt– constant2)));

  17. Using IF – THEN Statements • A single IF – THEN statement can have only one action • Using the key words DO and END then it is possible to execute more than 1 action • Example IF condition THEN DO; action; action; END;

  18. Using IF – THEN Statements • A single IF – THEN statement can have only one action • Using the key words DO and END then it is possible to execute more than 1 action • Example IF Model = ‘Mustang” THEN DO; Make = ‘Ford’; Size = ‘Compact’; END;

  19. Using IF – THEN Statements • The AND and OR keywords can be used to specify multiple conditions • IF condition AND condition THEN action; • Example • IF Model = ‘Mustang’ AND Year < 1975 THEN Status = ‘Classic’; both conditions must be met to reach the ‘Classic” status • IF Model = ‘Mustang’ OR Year < 1975 THEN Status = ‘Classic’; only one of the conditions must be met to reach the ‘Classic” status

  20. Using IF – THEN /ELSE Statements • Using the IF - Then / Else statement is typically used to group observations • Basic form of statement • IF condition THEN action; ELSE IF condition THEN action; ELSE IF condition THEN action; Advantages: When compared to regular IF – THEN statements • Computationally more efficient as it uses less computer time because once a condition is satisfied SAS skips the rest of the steps. • The ELSE statement is mutually exclusive thus preventing an observation from ending up in more than one group.

  21. The DO – END statement The DO-END statement is useful if you want to make several changes or create new variables for a subgroup or under certain conditions. Note that the DO-loop continues until you end it using END; Example If sex = ’female’ then do; • AdjustedBF = equation; • AdjustedLMA = equation; • AdjustedDAYS = equation; • End;

  22. The DO – END statement The DO-END statement is useful if you want to make several changes or create new variables for a subgroup or under certain conditions. Note that the DO-loop continues until you end it using END; • Else if sex EQ ’male’ then do; • ... • ... • end;

  23. Using IF – THEN /ELSE Statements • Example: • IF CowBC . THEN AdjustedCowBC = .; ELSE IF CowBCge 9 THEN AdjustedCowBC = 5; ELSE IF CowBCge 7 and lt 9 THEN AdjustedCowBC = 4; ELSE IF CowBCge5 and lt 7 THEN AdjustedCowBC = 3; ELSE IF CowBCge3 and lt 5 THEN AdjustedCowBC = 2; ELSE IF CowBCge1 and lt3 THEN AdjustedCowBC = 1; ELSE IF condition THEN action;

  24. Using IF – THEN /ELSE Statements • When the condition is true, SAS assigns the stated value to AdjustedCowBC and then leaves the loop. The last ELSE is a trash-bin: anything that is not covered by the previous conditions is put to missing. • Look at what gets assigned to the last ELSE statement as it may identify an error in the data set or other problems • Make sure the condition part includes all possibilities, else you might get missingsor hidden errors. • Examine the log where it will reveal the number of missing values created, but there is no indication for observations that were not covered by your programming!

  25. Subsetting your data • Sometimes researchers or programmers want to look at only a portion of the data that is collected. • Can accomplish this using the IF statement • Example only interested in gilts in a dataset that includes data from boars and barrows. • IF sex = ‘barrow’ THEN delete; • IF sex = ‘boar” THEN delete;

  26. Subsetting your data • Sometimes researchers or programmers want to look at only a portion of the data that is collected. • Another way to use the IF statement • Rather than making it an deletion statement, make an inclusionary statement; • Example only interested in gilts in a dataset that includes data from boars and barrows. • IF sex = ‘gilt’; this results in the program looking at data where sex = gilt

  27. Subsetting your data What is the difference when looking at only gilts if the statements IF sex = ‘gilt’; this results in the program looking at data where sex = gilt; IF sex = ‘barrow’ THEN delete; IF sex = ‘boar” THEN delete; Dataset would include anything coded incorrectly in the dataset other than ‘barrow’ or ‘boar’ Using the inclusionary statement (IF sex =‘gilt”), requires that every line of data be examined

  28. The RETAIN Statements • Use the RETAIN statement when you want to keep some or all of the variable from a previous DATA step. • The RETAIN variable list; can appear anywhere in the DATA step • You can specify an initial value instead of missing for variables as follows RETAIN variable list initial value;

  29. The Sum Statements • The Sum statement is used when have a cumulative total for some variable. Example used in the book; RETAIN MaxRuns; MaxRuns = MAX (MaxRuns, Runs); RunsToDate + Runs; Might want to use something similar if you are totaling milk production across the days in lactation or pigs born alive across parity.

  30. Using Arrays in SAS • An array is a temporary holding site for a collection of variables upon which the same operations will be performed. • Arrays provide convenient shortcuts in programming. • An array is a group of variables that is user defined. • The array is user defined in the DATA step. • All the variables in an array must either be characters or numeric CANNOT mix character and numeric variables in the same array.

  31. Using Arrays in SAS • Array name (n) $ (may or may not be included) variable list. • The number (n) must match the number of variables in the list. • An array by itself does nothing • You create the array to perform some function that you want to perform on all array variables (book uses changing missing value from 9 to .

  32. Using Shortcuts for Lists of Variable Names • Provides you a listing for inputting variables that have very similar names • Variable1, Variable2, Variable3, Variable4 and so forth • You could use an input statement that includes all of the names – INPUT Variable1 Variable2 Variable3 Variable4; • Alternatively you could write it • INPUT Variable1 – Variable4; and all variable will have been inputted.

More Related