Getting Your Data Into SAS (Chapter 2 in The Little SAS Book)
Animal Science 500, Lecture No. 3, September 7, 2010
Comparison Operators
• Comparison operators set up a comparison between two variables, constants, or expressions in the data set being used.
• If the comparison is true, the result is 1.
• If the comparison is false, the result is 0.
• Comparison operators can be written as symbols or as their mnemonic equivalents:
    Symbol   Mnemonic   Meaning
    =        EQ         equal to
    ^=       NE         not equal to
    >        GT         greater than
    <        LT         less than
    >=       GE         greater than or equal to
    <=       LE         less than or equal to
             IN         equal to one of a list
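A minimal sketch of the 1/0 result, assuming two hypothetical weights that are not part of the lecture data. The symbol and its mnemonic give the same answer, and PUT writes the results to the log.

```sas
data compare_demo;
    /* Hypothetical values, for illustration only */
    onweight  = 250;
    offweight = 600;

    heavier_sym = (offweight > onweight);    /* comparison is true,  so 1 */
    heavier_mne = (offweight gt onweight);   /* same test, mnemonic form  */
    same        = (offweight = onweight);    /* comparison is false, so 0 */

    put heavier_sym= heavier_mne= same=;     /* writes the three results to the log */
run;
```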
Logical (Boolean) Operators and Expressions
• Logical operators, also called Boolean operators, are usually used in expressions to link sequences of comparisons.
• The logical operators are AND (&), OR (|), and NOT (^).
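A short sketch of linking comparisons, assuming hypothetical variables age and weight. Each linked expression is itself 1 (true) or 0 (false).

```sas
data logic_demo;
    /* Hypothetical values, for illustration only */
    age    = 30;
    weight = 180;

    both   = (age >= 18 and weight > 150);   /* both comparisons true     */
    either = (age < 18  or  weight > 150);   /* at least one is true      */
    negate = not (age >= 18);                /* reverses a true comparison */

    put both= either= negate=;
run;
```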
Finding your data
• Most of the time your "raw" data files will be saved as external files:
  • Text files: Word, WordPerfect, Writer, etc.
  • Spreadsheets: Excel, Lotus, Quattro Pro, etc.
  • Other systems: Unix, OpenVMS, etc.
Reading external files into SAS
• The files containing your data will typically be stored either
  • on the hard drive of the computer that you will use to analyze the data with SAS, or
  • externally:
    • USB memory stick (flash memory)
    • External hard drive
• You must get your data from "storage" into SAS to conduct the analyses.
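Reading external files into SAS
• Use the Infile statement within a DATA step:
    Data mytrial;
    Infile 'c:\mydocument\trial.csv';
    Input <variable names>;
• List the variable names in the Input statement. Remember to put a $ after each character variable.
• You may have to tell SAS in which columns individual variables are found and where to place the decimal point.
• Infile reads plain-text files. Spreadsheet data must first be saved as text (for example, .csv or tab-delimited); a native Excel workbook (.xls) is read with the Import Wizard or Proc Import instead.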
Reading external files into SAS
    Data mytrial;
    Infile 'c:\mydocument\trial.csv' DLM=',';
• Many options are available to assist you when using the Infile statement.
• DLM= specifies the delimiter that separates the variables in your raw data file.
  • dlm=',' indicates that a comma is the delimiter (a comma-separated file, .csv).
  • dlm='09'x indicates that tabs separate your variables (a tab-separated file).
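A self-contained sketch of DLM=, assuming hypothetical variables animal, onweight, and offweight. It reads in-stream lines with infile datalines so it runs without an external file; with a real file, the quoted path replaces the word datalines.

```sas
data mytrial;
    infile datalines dlm=',';            /* comma separates the values */
    input animal $ onweight offweight;   /* $ marks the character variable */
    datalines;
A101,250,610
A102,245,598
;
run;

proc print data=mytrial;
run;
```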
Reading external files into SAS
• Other options
  • DSD: the dsd option has two functions.
  • First, it recognizes two consecutive delimiters as a missing value.
  • For example, if your file contains the line 20,30,,50, SAS will read it as 20 30 50 without the dsd option. With the dsd option SAS will read it as 20 30 . 50, which is probably what you intended.
Reading external files into SAS
• Other options
  • Second, the DSD option allows you to include the delimiter within quoted strings.
  • For example, use the dsd option if you have a comma-separated file and your data include values like "George Bush, Jr.".
  • With the dsd option, SAS recognizes that the comma in "George Bush, Jr." is part of the name, not a separator indicating a new variable.
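Both functions of DSD can be seen in one line of data. This is a sketch with hypothetical variable names (name, a, b, c, d); DSD on its own also makes the comma the default delimiter, so DLM= is not repeated here.

```sas
data dsd_demo;
    infile datalines dsd;    /* DSD: comma delimiter, ",," is missing, quotes protect commas */
    length name $ 20;        /* room for the full quoted name */
    input name $ a b c d;
    datalines;
"George Bush, Jr.",20,30,,50
;
run;

/* name holds George Bush, Jr. and c is missing (.) */
proc print data=dsd_demo;
run;
```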
Reading external files into SAS
• Other options
  • FIRSTOBS= tells SAS on which line to start reading your raw data file. (Default = 1)
  • If the first record(s) contain header information such as variable names, set firstobs=n, where n is the record number where the data actually begin.
  • Example: you are reading a comma-separated or tab-separated file with the variable names on the first line. Use firstobs=2 to tell SAS to begin reading at the second line, skipping the line of variable names.
Reading external files into SAS
• Other options
  • MISSOVER prevents SAS from going to a new input line if it does not find values for all of the variables in the current line of data.
  • For example, you may be reading a space-delimited file that is supposed to have 10 values per line, but one of the lines has only 9 values. Without the missover option, SAS will look for the 10th value on the next line of data.
  • With missover, SAS sets the variables it could not read to missing when it reaches the end of a short line.
Reading external files into SAS
• Other options
  • MISSOVER (continued): if your data are supposed to have one observation per line of raw data, a short line read without missover can cause errors throughout the rest of your data file.
  • If you have a raw data file with one record per line, this option is a prudent way to keep such errors from cascading through the rest of your data.
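A small sketch of the short-line problem, assuming hypothetical variables id, x1, x2, and x3. The second data line is one value short.

```sas
data short_line;
    infile datalines missover;   /* stay on the current line when values run out */
    input id x1 x2 x3;
    datalines;
1 10 20 30
2 11 21
3 12 22 32
;
run;

/* Three observations; x3 is missing (.) for id 2 */
proc print data=short_line;
run;
```

Removing missover from the Infile statement lets SAS take x3 for id 2 from the start of the third line, so the third record is lost and only two observations are created.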
Reading external files into SAS
• Other options
  • OBS= indicates which line in your raw data file is the last record SAS should read.
  • This is a good option to use for testing your program. For example, use obs=100 to read only the first 100 lines of data while you are testing.
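A sketch of OBS= on a tiny in-stream file, with hypothetical variables id and weight. Here obs=2 plays the role that obs=100 would play on a large file.

```sas
data first_two;
    infile datalines obs=2;   /* stop after the second line of raw data */
    input id weight;
    datalines;
1 250
2 245
3 260
4 255
;
run;

/* Only ids 1 and 2 are read */
proc print data=first_two;
run;
```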
Reading external files into SAS
• Other options
  • A typical Infile statement for reading a comma-delimited file that contains the variable names in the first line of data would be:
    INFILE "test.txt" DLM=',' DSD MISSOVER FIRSTOBS=2;
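Placed in a complete DATA step, that statement might look like the sketch below. The layout of test.txt (one character variable followed by three numeric ones) and the variable names are assumptions for illustration; test.txt must exist in the current SAS working folder for this to run.

```sas
data test;
    /* Comma-delimited file whose first line holds the variable names */
    infile "test.txt" dlm=',' dsd missover firstobs=2;
    input animal $ onweight offweight daysontest;   /* hypothetical variables */
run;

proc print data=test;
run;
```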
Reading external files into SAS
• Other options
  • LRECL= (logical record length) is really useful for Windows users.
  • By default, SAS 9.3 and earlier read external files with a logical record length of 256 characters (SAS 9.4 raised the default to 32767).
  • When data lines are longer than the record length, it may appear that SAS is not reading all of your data, or that beyond some point the variables are not being read.
Reading external files into SAS
• Other options
  • LRECL= (continued): you can tell SAS exactly how long the records are on the Filename statement:
    filename myFile "c:\some directory\some file.txt" LRECL=400;
  • This option is REQUIRED if the length of a data line exceeds the default record length (256 in the SAS releases described here).
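Once the fileref is defined, the DATA step refers to it by name instead of by path. A sketch, assuming the file is comma-delimited and holds an id followed by five scores (both assumptions; the lecture does not describe the file's contents):

```sas
filename myFile "c:\some directory\some file.txt" lrecl=400;

data longlines;
    infile myFile dlm=',' dsd missover;   /* fileref, not a quoted path */
    input id score1-score5;               /* hypothetical variables */
run;
```

LRECL= can also be written directly on the Infile statement when no Filename statement is used.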
Knowing what Options are Available
• You can look options up using:
  • SAS online help
  • SAS manuals and books
  • Other example programs
• You can also list the SAS system options and their current settings with Proc Options:
    Proc Options;
    Run;
• The list is written to the SAS log and shows the settings in effect at that point in your program.
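The full list is long, so it can help to ask about one option at a time. A brief sketch using the OPTION= argument of Proc Options; OBS and FIRSTOBS are chosen here only because they also appear as Infile options in this lecture.

```sas
/* All system options and their current settings, written to the log */
proc options;
run;

/* The current setting of a single system option */
proc options option=obs;
run;

proc options option=firstobs;
run;
```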
Informats
• A host of selected informats is listed on pages 46-47 of The Little SAS Book, 4th Edition.
• Informats describe the different ways data can be formatted and read into SAS:
  • Dates, times, and combined datetimes
  • Julian dates
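A sketch of three date informats reading the same day written three ways. The variable names and column positions are assumptions chosen for this example; the Julian value 2010250 is day 250 of 2010, the lecture date.

```sas
data dates;
    /* Column pointers (@) place each informat on its field */
    input @1  birth mmddyy10.    /* 09/07/2010                */
          @12 visit date9.       /* 07SEP2010                 */
          @22 jul   julian7.;    /* 2010250: year + day 250   */
    format birth visit jul date9.;   /* display all three the same way */
    datalines;
09/07/2010 07SEP2010 2010250
;
run;

proc print data=dates;
run;
```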
Titles and Footnotes
• SAS allows up to 10 lines of text at the top (titles) and the bottom (footnotes) of each page of output, using the Title and Footnote statements:
    Titlen 'text';
    Footnoten 'text';
• Here n is the line number, from 1 to 10, written directly after the keyword (for example Title2 or Footnote3).
• If the text is omitted, the title or footnote is deleted.
• Otherwise it remains in effect until it is redefined.
Titles and Footnotes
• To have no titles, include the statement title;
• By default, SAS prints the date and page number at the top of each page of output.
• To suppress them, add nodate and/or nonumber to an Options statement.
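A sketch that puts these statements together. The title and footnote text is made up for illustration, and the example prints the sashelp.class sample data set that ships with SAS rather than lecture data.

```sas
options nodate nonumber;              /* drop the date and page number */

title1 'Animal Science 500';          /* first title line  */
title2 'Titles and footnotes demo';   /* second title line */
footnote1 'Sample data: sashelp.class';

proc print data=sashelp.class(obs=5);
run;

title;      /* clear all titles    */
footnote;   /* clear all footnotes */
```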
Temporary versus Permanent SAS Data Sets
• Temporary SAS data set
  • Exists only during the current job or session
  • Is erased by SAS when you finish and close down SAS
• Permanent SAS data set
  • Does not mean it is around forever
  • Remains stored even after you close your SAS session
• If you use a data set more than once, it is more efficient to save it as a permanent SAS data set.
Temporary versus Permanent SAS Data Sets
• A permanent SAS data set lets you skip the step of reading the raw data, whether you did that with the Import Wizard or with an Infile statement.
• If you are still going to modify your data set, it is likely easier to use a temporary SAS data set, for example when you:
  • Need to add more data to the "final" data set
  • Have not checked the "final" data set for errors
  • Have other reasons
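The slides do not show how a data set is made permanent; in SAS this is normally done with a Libname statement and a two-level data set name. A minimal sketch, assuming a hypothetical library name ansci and a folder that already exists:

```sas
libname ansci 'c:\mydocument';   /* hypothetical library name and folder */

/* Temporary: lives in the WORK library and is erased when SAS closes */
data work.mytrial;
    input animal $ onweight offweight;   /* hypothetical variables */
    datalines;
A101 250 610
A102 245 598
;
run;

/* Permanent: saved in the folder that ansci points to */
data ansci.mytrial;
    set work.mytrial;
run;
```

In a later session, the same Libname statement followed by set ansci.mytrial; picks the data up again without any Infile step.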
Listing the Contents of a SAS Data Set
• Use Proc Contents:
    Proc Contents data=yourdatasetname;
    Run;
• If you leave off data=, SAS runs Proc Contents on the last data set created.
• It is a good way to check that all of your data are being read into SAS correctly before further analyses.
Listing the Contents of a SAS Data Set
• Output from Proc Contents:
  • Data Set Name: be sure you evaluated the correct data set
  • Observations: was the correct number of observations read in?
  • Variables: was the correct number of variables identified?
  • Created: the date the data set was created
  • Label: any label you provided for the data set
Listing the Contents of a SAS Data Set
• Output from Proc Contents: a listing of the variables in alphabetical order
• The following is reported for each variable:
  • Type: numeric or character
  • Length: storage size (in bytes)
  • Format for printing, if any (for example, a date displayed with worddate.)
  • Informat for input, if any (for example, mmddyy10. for a date)
  • Variable label (e.g., date of birth, height in inches, weight in pounds)
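A sketch that builds a small data set with a format, an informat, and labels so that each of those columns has something to show. The data set name calves and its variables are hypothetical.

```sas
data calves(label='Hypothetical calf records');
    informat birth mmddyy10.;    /* how the date is read    */
    format   birth worddate.;    /* how the date is printed */
    label birth  = 'Date of birth'
          weight = 'Weight in pounds';
    input id birth weight;
    datalines;
1 09/07/2010 85
2 09/09/2010 92
;
run;

/* Lists data set name, observations, variables, and per-variable attributes */
proc contents data=calves;
run;
```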
Processing an Existing Data Set
• When you want to process an existing SAS data set, use the Set statement rather than an Infile statement.
• Each time the Set statement executes, SAS reads one observation, with all of its variables, from the existing data set.
Processing an Existing Data Set
    Data data1;
    set data2;
    AverageDailyGain = (offweight - onweight) / daysontest;
    Run;
• Variable names cannot contain spaces, so the new variable is written as one word.
• Again, if you do not specify a data set, SAS uses the most recently created data set.
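A self-contained sketch of the same step. The contents of data2 are hypothetical values made up so that the example runs on its own.

```sas
/* Hypothetical existing data set */
data data2;
    input animal $ onweight offweight daysontest;
    datalines;
A101 250 610 120
A102 245 593 116
;
run;

/* Read data2 one observation at a time and add a computed variable */
data data1;
    set data2;
    AverageDailyGain = (offweight - onweight) / daysontest;
run;

proc print data=data1;
run;
```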
Arithmetic Operators
• Arithmetic operators indicate that an arithmetic calculation is performed:
    Symbol   Meaning
    **       exponentiation
    *        multiplication
    /        division
    +        addition
    -        subtraction
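A short sketch of each arithmetic operator applied to two hypothetical constants, with the results written to the log.

```sas
data arith_demo;
    /* Hypothetical values, for illustration only */
    a = 7;
    b = 2;

    total      = a + b;    /* addition       */
    difference = a - b;    /* subtraction    */
    product    = a * b;    /* multiplication */
    quotient   = a / b;    /* division       */
    power      = a ** b;   /* exponentiation */

    put total= difference= product= quotient= power=;
run;
```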