Getting Started Using SAS Software Animal Science 500 Lecture No. 2
SAS History
• Developed in the late 1960s and 1970s at North Carolina State University
• Original use was the management and analysis of agricultural field experiments
• Headquarters are still in Cary, NC
• Claims to be the most widely used statistical software
• SAS used to be an acronym for “Statistical Analysis System”
• No real meaning today – just “SAS”
SAS Products
• Base SAS – data management and basic procedures
• SAS/STAT – statistical analysis
• SAS/GRAPH – presentation quality graphics
• SAS/OR – operations research
• SAS/ETS – econometrics and time series analysis
• SAS/IML – interactive matrix language
• SAS/AF – applications facility (menus and interfaces)
• SAS/QC – quality control
• SAS/Genetics – used for analyzing genetic marker data
• Other products for spreadsheets, databases, and connectivity between different machine interfaces that are running SAS
Resources for SAS
• Numerous books
   • Published by SAS and frequently authored by users themselves
   • Published independently of SAS
   • All can be purchased from a variety of on-line book stores
      • Amazon
      • ABE.com
      • Etc.
• SAS documentation is available on-line
Resources for SAS
• SAS documentation is available on-line
   • The majority of book or “hard” documentation can be found on-line
   • Extensive help section for SAS
   • Can e-mail SAS software consultants with technical questions that do not appear to have solutions available
On-line support • http://support.sas.com/onlinedoc/913/docMainpage.jsp
Format of SAS program
• SAS is very user friendly
• There are few rules about the format of your program, and several programming styles are allowed:
   • Statements or programs can be written in UPPERCASE, lowercase, or a COMbinATion
   • Statements can continue across multiple lines
   • Two statements can be on the same line
   • You can start the program or statements in any column
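A short sketch of these free-format rules. The data set name work.demo and its values are made up for illustration; they are not from the lecture.

```sas
* Several statements can share one line;
data work.demo; input id weight; datalines;
1 250
2 265
;
run;

* One statement can span several lines, start in any column, and mix case;
        PROC print
   DATA = Work.Demo;
RuN;
```

Both steps should run as written, although the second one shows why a consistent layout is easier to read.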
Making your SAS program User Friendly
• Keep your program organized
   • It is often easier to review your program if you write only one statement per line
   • Start DATA and PROC statements in the leading columns
   • Indent the other code that follows within the DATA or PROC step
• To keep your code understandable, include comments
   • For future use
   • For when someone else has to use or manipulate your data or program
Making your SAS program User Friendly
• To keep your code understandable, include comments
   • To insert a comment, put * ahead of the comment and end it with a semicolon; SAS will not read this text
   • Comment lines often start with multiple asterisks (****) so that comments are easily identifiable
   • You can also “comment out” lines of SAS code
      • Surround the SAS code you want to save but not run with /* at the beginning and */ at the end
Making your SAS program User Friendly
• To keep your code understandable, include comments
• SAS ignores any text that is marked as a comment
• Two ways to insert comments
   • Start the comment with * and end it with a ;
   • Start with /* and end with */
      • Use care when using this method if using program with another program or in another
• Commonly called commenting or “commenting out” lines of code
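The two comment styles could look like the sketch below. The data set work.pigs and its variables are hypothetical and exist only to give the comments something to annotate.

```sas
**** Build a small example data set (all names and values are illustrative) ****;
DATA work.pigs;
   INPUT pen $ weight;
   DATALINES;
A 250
A 262
B 241
B 255
;
RUN;

* Mean weight by pen;
PROC MEANS DATA=work.pigs MEAN;
   CLASS pen;
   VAR weight;        /* a block comment can share a line with code */
RUN;

/* This step is commented out but kept for later use:
PROC PRINT DATA=work.pigs;
RUN;
*/
```

A comment that starts with * cannot contain a semicolon, because the first semicolon ends it; the /* ... */ form can, which is why it is the one used to comment out whole statements.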
Making your SAS program User Friendly
• Particularly important when:
   • Writing materials and methods for a variety of publications
   • In the review process of journal articles
• Keep meticulous records on your analyses
Programming Tips
• Programming in SAS is a step by step process
• Write a portion of the program
   • Begin by writing the program in small steps
      • The infile statement
      • Obtain means
      • Examine the distribution
      • Etc.
   • Sort out and correct any errors, then move to the next step of the process
• Just because your program “works” or does not give you an error message does not mean everything is correct
   • Check results or output at this point
• The step by step process makes it easier to find and correct errors than writing the entire program and only then beginning the analysis
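One way the step by step approach might look in practice is sketched below. The data set work.gain, the variable adg, and the values are assumptions made for illustration. INFILE DATALINES is used so the example runs without an external file; with a real file, a quoted path would take its place.

```sas
/* Step 1: read the data, then check the log before going on */
DATA work.gain;
   INFILE DATALINES;
   INPUT id adg;
   DATALINES;
1 1.82
2 1.95
3 1.71
4 2.10
;
RUN;

/* Step 2: obtain means and confirm that N, minimum, and maximum look sensible */
PROC MEANS DATA=work.gain N MEAN STD MIN MAX;
   VAR adg;
RUN;

/* Step 3: examine the distribution */
PROC UNIVARIATE DATA=work.gain PLOT;
   VAR adg;
RUN;
```

Submitting and checking each step separately keeps any error confined to the few lines that were just added.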
Getting Started with a SAS Analysis
• Your data set must be imported
   • Once imported, your data set is a “SAS data set”
• SAS can read almost any type of data
• Once your data are read, SAS keeps track of what is where and what form it is in
   • The user only has to provide the name and location of the data set to use
   • SAS can figure out what is in it
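As a sketch of how SAS keeps track of a data set's contents, PROC CONTENTS lists the variables, their types, and their lengths. SASHELP.CLASS is a small sample data set that ships with SAS; it is used here only because it needs no import step.

```sas
/* Only the library and data set name are supplied; SAS reports what is in it */
PROC CONTENTS DATA=sashelp.class;
RUN;

/* Look at the first few observations */
PROC PRINT DATA=sashelp.class (OBS=5);
RUN;
```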
Data
• The data you want analyzed are in a table format
   • Columns represent the variables that were measured
   • Rows represent the observations
• Data types
   • SAS simplifies data into 2 types
      • Numeric
      • Character
Data
• Data types
   • Numeric – are numbers
      • Can have mathematical operations carried out (added, subtracted, multiplied, divided, etc.)
      • Can have any number of decimal places
      • Can be positive or negative (the sign + or – can be included in the numeric data column)
      • E can be used for scientific notation
Data
• Data types
   • Character
      • Anything that is not numeric data is character
      • Data can contain numbers, letters, and special characters ($, #, !, etc.)
      • If the data contain letters or special characters, the data must be character
      • If the data contain only numbers, the data can be either numeric or character
      • Sometimes data that are only numbers may still be better as a character variable
         • Example – zip codes
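A minimal sketch of the two data types, assuming a made-up data set work.addresses. The $ after a variable name in the INPUT statement tells SAS to read it as character; without it the variable is numeric.

```sas
DATA work.addresses;
   /* city and zip are character; balance is numeric */
   INPUT city $ zip $ balance;
   DATALINES;
Ames 50011 2.5E2
Boston 02134 -13.75
;
RUN;

PROC PRINT DATA=work.addresses;
RUN;
```

Because zip is read as character, the leading zero in 02134 is kept. The numeric column accepts both a signed value and a value written in E notation.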
Missing data
• Character missing data is depicted in a data set as a blank
• Numeric missing data is noted in a data set by a period (.)
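The sketch below shows both kinds of missing value being assigned and then tested. The data set and variable names are hypothetical.

```sas
DATA work.missing_demo;
   LENGTH breed $ 10;
   id = 1; breed = 'Duroc'; weight = 250; OUTPUT;
   id = 2; breed = ' ';     weight = .;   OUTPUT;   /* blank and period are missing */
RUN;

DATA work.flagged;
   SET work.missing_demo;
   no_breed  = (breed = ' ');    /* 1 when the character value is missing */
   no_weight = (weight = .);     /* 1 when the numeric value is missing */
RUN;

PROC PRINT DATA=work.flagged;
RUN;
```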
Data characteristics
• Data set size
   • Prior to SAS 9.1, data sets could contain up to 32,767 variables (columns)
   • In SAS 9.1 and later, the number of variables is not limited by the software
      • Hardware may limit the size of the data set that can be evaluated
   • The number of observations (rows) is not limited in any version of SAS
      • Limited only by computer hardware
      • The spreadsheet used to record the observations may be a limiting factor
SAS Naming rules
• Variable names are limited to 32 characters or fewer
   • Before SAS version 7 the limit was 8
• Names may use only letters, digits, and the underscore (_), in either upper or lower case
   • SAS remembers the case of the first occurrence of each variable name and uses that case when printing results
• Advice: use as few characters as possible, so long as you can still clearly identify the variable at some later point
   • Long names have to be typed in every procedure (PROC) where the variable is identified or used in a model
SAS Naming Rules
• Variable names may not contain embedded blanks
   • V1 and V_1 are acceptable; V 1 is not
   • Graduation_Date is acceptable; Graduation Date is not
• Certain names are reserved for use by SAS
   • _N_
   • _TYPE_
   • _NAME_
• Logical operators such as ge, lt, and, and eq should not be used as variable names
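A small sketch of names that follow these rules. The data set work.names and the date value are made up for illustration. SAS also requires that a name begin with a letter or an underscore, not a digit.

```sas
DATA work.names;
   V1 = 1;
   V_1 = 2;
   Graduation_Date = '15MAY2010'd;     /* underscore instead of a blank */
   FORMAT Graduation_Date DATE9.;
   /* V 1 = 3;  would be a syntax error because of the embedded blank */
RUN;

PROC PRINT DATA=work.names;
RUN;
```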
SAS Program Structure
• A program might also be referred to as code by some
• Two components to SAS programs
   • Data step
   • Procedure step (often called the Proc step)
• Every SAS statement MUST end with a semicolon ( ; )
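A minimal program with both components might look like this. The data set work.litters and its values are illustrative only.

```sas
DATA work.litters;              /* DATA step: creates the data set */
   INPUT sow $ born_alive;
   DATALINES;
S101 11
S102 13
S103 9
;
RUN;

PROC PRINT DATA=work.litters;   /* PROC step: uses the data set */
RUN;
```

Every statement ends with a semicolon; the lone semicolon after the data lines marks the end of the in-stream data.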
SAS Program Structure
• The Data step
   • Reads data from an external source – data can be read into your program in a variety of ways
   • Manipulates your data – often making new calculations based on the original data
   • Combines data with other data – combining data sets to do even more calculations or manipulations for a desired outcome
   • Prints reports based on the data that are input – can be used for a variety of purposes
   • These tools can be used to prepare the data for use by one of the procedures available in SAS
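As a sketch of a DATA step that reads data, makes new calculations, and writes a simple report, consider the following. The variable names and values are assumptions for illustration.

```sas
DATA work.growth;
   INPUT id start_wt end_wt days;
   gain = end_wt - start_wt;     /* new variable from the original data */
   adg  = gain / days;           /* average daily gain */
   PUT id= gain= adg=;           /* writes one report line per animal to the log */
   DATALINES;
1 50 250 110
2 48 262 112
;
RUN;
```

The resulting data set work.growth holds both the original and the calculated variables, ready for use by a procedure.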
SAS Program Structure
• The Procedure step
   • Performs analyses on your data
      • Proc Sort
      • Proc Means
      • Proc Anova
      • Proc Mixed
   • Data sets are merged with the MERGE statement in a Data step, not with a procedure
   • Can produce volumes of output
• Often the most effective way to learn SAS (using both the data step and the procedure step) is by doing
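Two of the procedures listed above, sketched on the SASHELP.CLASS sample data set that ships with SAS. The output data set name work.class_sorted is an arbitrary choice.

```sas
/* Sort by sex so that BY-group processing can be used afterwards */
PROC SORT DATA=sashelp.class OUT=work.class_sorted;
   BY sex;
RUN;

/* Descriptive statistics for each sex */
PROC MEANS DATA=work.class_sorted N MEAN STD;
   BY sex;
   VAR height weight;
RUN;
```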
SAS Program Structure
• Statements used are exclusive to either the DATA or the PROC steps
• Remember
   • Data steps read and modify the data
   • Proc steps analyze, perform a utility, or print the data
• Data steps must begin with the word DATA
   • This step can also include:
      • DO loops
      • IF – THEN / ELSE logic
      • SELECT – WHEN / OTHERWISE
      • A large assortment of other numeric and / or character functions
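The DATA step tools named above could be combined as in this sketch. All variable names, cut-offs, and labels are invented for illustration. In a SELECT group the fallback branch is written OTHERWISE.

```sas
DATA work.logic;
   DO pen = 1 TO 4;                       /* DO loop */
      weight = 230 + 10 * pen;
      IF weight >= 260 THEN wt_class = 'heavy';   /* IF-THEN / ELSE */
      ELSE wt_class = 'light';
      SELECT (pen);                       /* SELECT-WHEN */
         WHEN (1, 2) barn = 'North';
         WHEN (3)    barn = 'South';
         OTHERWISE   barn = 'Other';
      END;
      OUTPUT;                             /* write one observation per pen */
   END;
RUN;

PROC PRINT DATA=work.logic;
RUN;
```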
SAS Program Structure
• Data steps must begin with the word DATA
• Can combine data in a variety of ways
   • Match and merge – for example, merging by id
   • Concatenate – stacking two data sets end to end
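A sketch of both ways of combining data, using three small made-up data sets. A match-merge with a BY statement expects each input data set to be sorted by the BY variable, so the sort steps come first.

```sas
DATA work.weights;
   INPUT id weight;
   DATALINES;
2 262
1 250
;
RUN;

DATA work.backfat;
   INPUT id backfat;
   DATALINES;
1 0.65
2 0.72
;
RUN;

PROC SORT DATA=work.weights; BY id; RUN;
PROC SORT DATA=work.backfat; BY id; RUN;

/* Match and merge: one observation per id, with both weight and backfat */
DATA work.merged;
   MERGE work.weights work.backfat;
   BY id;
RUN;

DATA work.more_weights;
   INPUT id weight;
   DATALINES;
3 241
;
RUN;

/* Concatenate: observations of the second data set follow those of the first */
DATA work.stacked;
   SET work.weights work.more_weights;
RUN;
```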
SAS Program Structure
• Procedure statements must begin with PROC followed by the name of a procedure
   • Print
   • Sort
   • Means
• A step ends when SAS finds:
   • A new step (the next DATA or PROC statement)
   • A run statement
   • The end of the program (if running in batch mode)
SAS Program Structure
• The run statement tells SAS to run all of the preceding lines of a step
   • Can also indicate to run the lines that are highlighted
   • Is also one of the rare “global” statements that are not part of a DATA or PROC step
• I typically place a “quit” statement (quit;) after each run statement
   • Intended to keep the program from continuing to run, potentially indefinitely, when it contains an error; QUIT ends interactive procedures (for example Proc GLM) that stay active after RUN
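A sketch of RUN followed by QUIT. PROC GLM is used because it is an interactive procedure that stays active after RUN; it is part of SAS/STAT, so the example assumes that product is available. SASHELP.CLASS is a sample data set that ships with SAS.

```sas
PROC GLM DATA=sashelp.class;
   CLASS sex;
   MODEL weight = sex;
RUN;      /* runs the statements above; PROC GLM remains active */
QUIT;     /* ends the procedure */

/* For a procedure that is not interactive, RUN alone ends the step */
PROC PRINT DATA=sashelp.class (OBS=3);
RUN;
```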
Arithmetic Operators • Arithmetic operators indicate that an arithmetic calculation is performed, as shown in the following table: