1 / 26

Modifying and Combing SAS Data Sets (Chapter in the 6 Little SAS Book)

Modifying and Combing SAS Data Sets (Chapter in the 6 Little SAS Book) . Animal Science 500 Lecture No. 6 September 16, 2010. Modifying Variable Lengths. The code below uses only 3 character spaces for the county by specifying the length of the location variable

jenifer
Download Presentation

Modifying and Combing SAS Data Sets (Chapter in the 6 Little SAS Book)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Modifying and Combing SAS Data Sets(Chapter in the 6 Little SAS Book) Animal Science 500 Lecture No. 6 September 16, 2010

  2. Modifying Variable Lengths • The code below uses only 3 character spaces for the county by specifying the length of the location variable • You can also use this with numeric variables data new; length location $3; input location $ 1-15 date $ rainfall; Cards; /*remember you could use the datalines statement to do the same thing);*/ Storycounty 6/10/10 4.5 Polk county 6/10/10 6.5 Story county 7/10/10 4.9 Polk county 7/10/10 2.4 ; Run; Quit;

  3. Creating new variables • You can create variables in the initial data step where you are inputting the data or you can use a new data step to create the variables. • As we did when we created • Adjusted backfat variables for barrows and gilts • Total weight gain during the test period • Average daily gain during the test period • You do not have to make calculations when making new variables. • You could divide backfat, loin muscle are, gain, etc into categories to evaluate if model effects differ by the classes you developed

  4. Creating new variables • If you choose to create a new data step – you will need to either create a new file in SAS or modify the existing file. • Example what we have done in Lab by the following statements Data Pig13 Set Pig12; • Alternatively you could create a new file in SAS • If you create a new file you may need to merge it with the original data file; • This is a place where students often have difficulty

  5. Creating New Variables in SAS data new; input @1 location $ 1-15 date mmddyy8. rainfall; log_rain = log(rainfall); sqrt_rain= sqrt(rain); datalines; Story county 6/10/10 4.5 Polk county 6/10/10 6.5 Story county7/10/10 4.9 Polk county 7/10/10 2.4 ; run; data new; set new; log_rain = log(rainfall); sqrt_rain = sqrt(rain); run; • Both of these options will result in a data set named “New” with all the variables that have been defined • This option creates the variables lograinfall and sqrtrainfall in the initial data step • In the second set of code you are creating a new file in SAS and naming it “New” the set statement tells SAS to Assign the data from the first “New” to this file “New”

  6. Creating New Variables in SAS • SAS has many other functions to perform various calculations for trigonometry, finance, and other applications. • Some examples assuming x is the variable you want to modify: • Log = log(x) • Sin = sin(x) • Cos = cos(x) • As we will see in the next lab, you may find the distribution does not meet the assumptions of the analysis of variance • Will need to transform data often using some of the • Use SAS help to find the correct notation • Some helpful search hints are: • Search under SAS Functions, Arithmetic Functions, Numeric Variables, Logical Operators

  7. Operators • Recall from previous discussions • Addition, subtraction, multiplication, and division are specified by +, -, *, and /, respectively. • For exponentiation, a double asterisk ** is used. • exprainfall = exprainfall**2 • Parentheses can be used to group expressions, and these expressions can be nested within several levels. SAS follows the standard PEMDAS order, () ** * / + - , for evaluating functions.

  8. Logical Operators data new; input @1 location $ 1-15 date mmddyy8. rainfall; *creating a new variable based on location; if location = “Story county" then x = rainfall +5; else x = 5; *creating a new variable based on level of rainfall; if rainfall < 3 then y=1; else if rainfall < 4.9 then y=2; else y = 3; datalines; Story county6/10/10 4.5 Polk county 6/10/10 6.5 Story county 7/10/10 4.9 Polk county 7/10/10 2.4 ; Run; Quit; data new; set new; log_rain = log(rainfall); sqrt_rain = sqrt(rain); run; • SAS can also evaluate logical expressions • < ,>, =, if, then, else, else if, and (&), or(|), not (^)… • Search Logical Operators in SAS Help

  9. Other ways to Modify Variables - Do Loops • DO loops can be used to create an ordered sequence of numbers. • Below is an example of a do loop in SAS • The program "loops" through the values of Q from 1 to 5 and performs the calculations requested for the current value of Q. The OUTPUT statement tells SAS to export Q and the new variables to the dataset EXAMPLE. The END statement signifies the end of the loop. An END statement is necessary for each DO statement! Notice that neither INPUT nor DATALINES statements are used. data do; set new; do q=1 to 5; q_rain=q*rainfall; q_rainsquared=q_rain**2; output; end; procprint data = do; run; Quit;

  10. Modifying your data using PROC Transpose • Sometimes you need to reshape your data which is in a long format (shown below)

  11. Modifying your data using PROC Transpose • into a wide format (shown below).

  12. Modifying your data using PROC Transpose • How do we accomplish this? • SAS proc transpose to reshape the data from a long to a wide format.

  13. Modifying your data using PROC Transpose data long1 ; input famid year faminc ; cards ; 1 96 40000 1 97 40500 1 98 41000 2 96 45000 2 97 45400 2 98 45800 3 96 75000 3 97 76000 3 98 77000 ; run; quit; proc transpose data=long1 out=wide1 prefix=faminc; by famid ; id year; varfamin; run; quit; proc print data = wide1; run; quit; Notice that the option prefix=famincspecifies a prefix to use in constructing names for transposed variables in the output data set. SAS automatic variable _NAME_ contains the name of the variable being transposed

  14. Modifying your data using PROC Transpose • What does this get you? • SAS output that looks like the following Obsfamid _Name_ faminc96 faminc97 faminc98 1 1 faminc40000 40500 41000 2 2 faminc45000 45400 45800 3 3 faminc75000 76000 77000

  15. Removing Observations • When performing statistical calculations, SAS, by default, uses all of the observations that are in the dataset. • You can selectively delete observations that you do not wish to use with: • IF statements - specify which observations to keep, • IF and THEN DELETE will delete observations.

  16. Removing Observations • Both statements result in a file with only data from Story County data new; set new; if location = “Story county"; Run; Quit; data new; set new; if location = “Polk county" then delete; run;

  17. Removing Variables • You may only need to use a few variables from a larger dataset. This can be done with KEEP or DROP statements. • For many datasets, you can keep unneeded variables in the dataset, and SAS can handle them with ease. • This will be the case for most of you dealing with several hundred or even several thousand observations • Those dealing with very large data sets might benefit from using the Keep or Drop statements • Caution be sure you are thinking ahead so that you do not drop a variable needed later in some calculation or function

  18. Removing Variables • Let’s say we wanted to keep only the location, x and y variables from the “new” file. • Both data steps below will accomplish the task. data subset; set new; drop date rainfall; run; data subset; set new; keep location x y; run;

  19. Combining Datasets • There are two different ways to combine datasets using the SET and MERGE statements in a Data Step. • The SET statement is used to add observations to an existing dataset. • This is what we have done in lab • We have developed new variables using calculations with existing data and have adjusted current data in the data set. • Data Pig12 Set Pig12; • ADG = (OFFWT – ONWT) / DOT; • Consider the following example, using monthly rainfall totals in Gainesville in 1995 and 1996. When used to combine observations into one dataset, the SET command works as follows:

  20. Combining Datasets – Set data rain95; input month rainfall @@; year=1995; datalines; 1 3.08 2 1.07 3 6.14 4 5.18 5 2.47 6 7.55 7 7.66 8 7.20 9 2.10 10 4.33 11 3.15 12 1.29 ; run; quit; data rain96; input month rainfall @@; year=1996; datalines; 1 0.97 2 0.66 3 10.52 4 1.72 5 2.01 6 6.05 7 11.00 8 4.90 9 2.23 10 6.18 11 1.73 12 6.63 ; run; quit; data rain9596; set rain95 rain96; run; quit;

  21. Combining Datasets • The MERGE statement adds the variables in one dataset to another dataset. • Consider the following example using Southern teams in the National Basketball Association:

  22. Combining Datasets - Merge data nba1; input @1 city $11. @11 division $; Cards; Orlando Atlantic Miami Atlantic Atlanta Central Charlotte Central ; run; quit; data nba2; input mascot $ @@; Cards; Magic Heat Hawks Hornets ; run; quit; data nba3; merge nba1 nba2; run; quit; Proc Print; run; quit;

  23. Combining Datasets – Merge Output Results Obscity division mascot 1 Orlando Atlantic Magic 2 Miami Atlantic Heat 3 Atlanta Central Hawks 4 Charlotte Central Hornets

  24. Combining Datasets • In the last example, the observations were ordered so that each dataset corresponds to the other. You will often need to put datasets together based on values of variables which are included in both datasets. To do this, both datasets must first be sorted in order by the common variable or variables.

  25. Combining Datasets Proc sort sorts the data file using the variable chosen in the by statement – we will discuss Proc Sort in more detail later on

  26. Combining Datasets with Merge -Output The SAS System 10:16 Saturday, September 11, 2010 14 Obscity division mascot 1 Atlanta Central Hawks 2 Charlotte Central Hornets 3 Miami Atlantic Heat 4 Orlando Atlantic Magic

More Related