BMTRY 789 Lecture 2: SAS Syntax, entering raw data, etc.
Readings – Chapters 1, 2, 12, & 13
Lab Problems 1.1, 1.2, 1.3, 1.5, 1.10, 12.1, 12.2, 12.6, 12.16, 13.3, 13.8
Homework Due – None
Homework for Next Week – No Class but turn in HW1!
Lecturer: Annie N. Simpson, MSc.
Parts of a SAS Program
• What are the two main parts of a SAS program?
BMTRY 789 Intro. To SAS Programming
Parts of a SAS Program
• What is a SAS STATEMENT?
DATA Step
• What takes place in a DATA step?
DATA Step = Do/Create Things
• What takes place in a DATA step?
  • Input data (what types?)
  • DO-END loops
  • IF-THEN-ELSE statements
  • Subset data: IF expression / IF expression THEN DELETE
  • Create and redefine variables
  • Functions
  • Interleave, merge, and update
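A minimal sketch of several of these DATA step tasks in a single step. The data set name Demo, the variable names, and the values are invented for illustration and do not come from the course materials.

```sas
* Hypothetical data set and variables, for illustration only;
DATA Demo;
   INPUT Group $ X Y;
   * Subset the data by dropping rows with a missing X;
   IF X = . THEN DELETE;
   * Create new variables, one of them with a function;
   Total = X + Y;
   LogX  = LOG(X);
   * IF-THEN-ELSE with DO-END blocks;
   IF Total > 35 THEN DO;
      Level = 'High';
      Flag  = 1;
   END;
   ELSE DO;
      Level = 'Low';
      Flag  = 0;
   END;
   DATALINES;
Control 12 17
Treat   23 19
Control  . 18
;
RUN;
```

The third data line has a missing X, so the subsetting IF removes it before the new variables are computed.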
PROC Step
• What takes place in a PROC step?
PROC Step = Produce Results
• What takes place in a PROC step?
  • Perform specific analysis or function
  • Sorting
  • Printing
  • Univariate Analysis
  • Analysis of variance
  • Regression…
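As a rough illustration of PROC steps producing results from an existing data set, the sketch below sorts, prints, and summarizes a small data set. The data set Scores and its variables are hypothetical.

```sas
* Hypothetical data set Scores, for illustration only;
DATA Scores;
   INPUT Group $ X Y;
   DATALINES;
Treat   23 19
Control 12 17
Control 19 18
;
RUN;

* Sorting;
PROC SORT DATA=Scores;
   BY Group;
RUN;

* Printing;
PROC PRINT DATA=Scores;
RUN;

* A specific analysis, using the same statistics as the PROC MEANS example later in this lecture;
PROC MEANS DATA=Scores N MEAN STD;
   VAR X Y;
RUN;
```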
PROC Step
• What PROCs have you learned about in your readings so far?
PROC Step
• What PROC would you use to produce simple descriptive statistics?
• What about to produce a stem-and-leaf plot, boxplot, histogram, QQ plot, etc.?
PROC Step broken down into subgroups
• How do you get the PROC MEANS output separately for men and women if you have a GENDER variable?
• What descriptive stats can you do on the non-numeric data? What PROC would you use?
PROC Step for Graphics?
• What PROCs can you use to produce graphs and charts?
PROC Step for Graphics?
• What is the difference between PROC PLOT and PROC GPLOT? PROC CHART and PROC GCHART?
DATA…How do we work with it?
• What type of data is this?
    Data EX1;
    INPUT Group$ X Y Z;
    DATALINES;
    Control 12 17 19
    Treat 23 . 29
    Control 19 18 16
    Treat 22 22 .
    ;
    Run;
SAS INPUT & INFILE Statements
• In what 2 situations do you use an INPUT statement?
  • ________
  • ________
• When is the only time that you use an INFILE statement?
• What is the INPUT statement really accomplishing? (i.e., why does SAS need it?)
SAS INPUT Statement
• Before you can analyze your data with SAS software, your data must be in a form that SAS can read
• If you put raw data directly in your SAS program, then your data are internal
• You may want to do this when you have small amounts of data, or you are testing a program with a small test data set
• INPUT is used to read data from an external source or from internal data contained in your SAS program
• The INFILE statement names an external file from which to read the data; otherwise the CARDS (or DATALINES) statement is used to precede the internal data
External raw data files
• Usually you will want to keep your data in external files, separating the data from the program.
• Use the INFILE statement to tell SAS the filename and path (directory) of the external file containing the data. The INFILE statement follows the DATA statement and must precede the INPUT statement. After the INFILE keyword, the file path and name are enclosed in quotes (single or double).
INPUT statement example
    *Reading from an external file into a SAS data set;
    Data one;
    INFILE 'c:\MyData\diabetes.dat';
    Input a$ b c;
    Run;

    *Reading internal data to create SAS data set 'one';
    Data one;
    Input a$ b c;
    cards;
    8 76 5
    7 43 9
    1 22 2
    ;
    Run;
*Note - SAS log
• Whenever you read data from an external file, SAS gives some very valuable information about the file in the SAS log
• Always check this information after you read a file, as it could indicate problems
• A simple comparison of the number of records read from the INFILE with the number of observations in the SAS data set can tell you a lot about whether or not SAS is reading your data correctly
*Note – Long Records
• In some operating environments, SAS assumes external files have a record length of 256 or less. (The record length is the number of characters, including spaces, on a data line.)
• If your data lines are long, and it looks like SAS is not reading all your data, then use the LRECL= option in the INFILE statement to specify a record length at least as long as the longest record in your data file.
    INFILE 'c:\MyData\Diabetes.dat' LRECL=2000;
Controlling INPUT with Options in the INFILE statement
• The following options are useful for reading particular types of data files. Place these options after the filename in the INFILE statement.
• FIRSTOBS=
  • This tells SAS at what line to begin reading data. This is useful if you have a data file that contains descriptive text or header information at the beginning and you want to skip over these lines to begin reading the data.
• OBS=
  • This tells SAS to stop reading when it gets to that line in the raw data file.
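A small sketch of FIRSTOBS= and OBS= together. To keep it self-contained it reads in-stream data through INFILE DATALINES rather than an external file; the data set name, variables, and values are made up for illustration.

```sas
* Hypothetical data, for illustration only;
DATA Weights;
   * Start at line 3 to skip the two header lines and stop after line 5;
   INFILE DATALINES FIRSTOBS=3 OBS=5;
   INPUT ID Weight;
   DATALINES;
Clinic weights
ID Weight
1 150
2 162
3 171
4 158
;
RUN;
```

Note that OBS= counts lines of the raw data, not observations, so OBS=5 here keeps the records for IDs 1 through 3 and leaves out ID 4.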
Controlling INPUT with Options in the INFILE statement (cont.)
• MISSOVER
  • By default, SAS will go to the next data line to read more data if SAS has reached the end of the data line and there are still more variables in the INPUT statement that have not been assigned values.
  • The MISSOVER option tells SAS that if it runs out of data, don’t go to the next data line. Instead, assign missing values to any remaining variables before proceeding to the next line.
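The MISSOVER behavior can be sketched with list input and one short data line. The names and scores below are hypothetical, and in-stream data is used only so the step runs without an external file.

```sas
* Hypothetical data, for illustration only;
DATA Tests;
   INFILE DATALINES MISSOVER;
   INPUT Name $ Test1 Test2 Test3;
   DATALINES;
Ann 90 85 88
Bob 72 80
Cal 95 91 89
;
RUN;
```

With MISSOVER, the second line should yield a missing Test3 for Bob. Without it, SAS would move to the next line looking for Test3, and the log would report that SAS went to a new line.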
Controlling INPUT with Options in the INFILE statement (cont.)
• PAD
  • You need this option when you are reading data using column or formatted input and some data lines are shorter than others. If a variable’s field extends past the end of the data line, then, by default, SAS will go to the next line to start reading the variable’s value.
  • PAD pads each short record with blanks out to the record length, so SAS reads the variable from the current line, up to the last column specified in the format or column range, instead of moving on to the next line.
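A sketch of PAD with column input. Because PAD matters for external files, the example first writes a small temporary file and then reads it back; the fileref Tmp, the data set name, and the values are assumptions made for this illustration.

```sas
* Hypothetical temporary file and data, for illustration only;
FILENAME Tmp TEMP;

* Write three records of different lengths to the temporary file;
DATA _NULL_;
   FILE Tmp;
   PUT 'Ann    90 85';
   PUT 'Bob    72';
   PUT 'Cal    95 91';
RUN;

* Read the records back with column input, padding short records with blanks;
DATA Padded;
   INFILE Tmp PAD;
   INPUT Name $ 1-6 Test1 8-9 Test2 11-12;
RUN;
```

The second record ends before columns 11-12, so with PAD the value of Test2 for Bob should be missing rather than taken from the following line.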
Data Step: INPUT statement
There are three basic forms of the INPUT statement:
• List input (free form) – data fields must be separated by at least one blank. List the names of the variables; follow the name with $ for character data.
    Example: Input Name $ Age;
• Column input – follow the variable name (and $ for character) with startingcolumn-endingcolumn.
    Example: Input Name $ 1-15;
• Formatted input – optionally precede the variable name with @startingcolumn; follow the variable name with a SAS informat.
    Example: Input @1 Name $20. @21 DOB mmddyy8.;
LIST INPUT: Reading Raw Data Separated by Spaces
• If the values in your raw data file are all separated by at least one space, then using list input to read the data may be appropriate
• Any missing data must be indicated with a period
• Character data, if present, must be simple: no embedded spaces, and no values greater than eight characters in length. (Use the LENGTH statement, placed before the INPUT statement, to change the length.)
    LENGTH Name $ 20;
• If the data file contains dates or other values which need special treatment, then list input may not be appropriate
    INPUT Name $ Age Height;
• The $ after Name indicates that it is a character variable, whereas the Age and Height variables are both numeric
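Putting the list input rules together, a minimal sketch with a name longer than eight characters and a missing value marked by a period. The data set name and values are hypothetical.

```sas
* Hypothetical data, for illustration only;
DATA People;
   * LENGTH comes before INPUT so Name can hold more than 8 characters;
   LENGTH Name $ 20;
   INPUT Name $ Age Height;
   DATALINES;
Christopher 34 70.5
Ann . 64
;
RUN;
```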
COLUMN INPUT: Reading Raw Data Arranged in Columns
• If each of the variable’s values is always found in the same place in the data line, then you can use column input as long as all values are character or standard numeric
• Standard numeric data contain only numbers, decimal points, plus and minus signs, and E for scientific notation. Dates or numbers with embedded commas, for example, are not standard
    INPUT Name $ 1-10 Age 11-13 Height 14-18;
• The first variable, Name, is character and the data values are in columns 1 through 10. The Age and Height variables are both numeric, since they are not followed by a $, and data values for both of these variables are in the column ranges listed after their names
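A runnable sketch of the column input statement above, with made-up data laid out to match the column ranges. The first name contains an embedded blank, which column input can read because it takes whatever is in columns 1-10.

```sas
* Hypothetical data, for illustration only;
DATA People;
   INPUT Name $ 1-10 Age 11-13 Height 14-18;
   DATALINES;
Mary Ann   34 64.5
Bob        41 70
;
RUN;
```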
FORMATTED INPUT: Reading Raw Data NOT in Standard Format
• This is where you want to use formatted input or mixed input.
• Informats are useful any time you have non-standard data
• Numbers with embedded commas or dollar signs are examples of non-standard data
• Dates are perhaps the most common non-standard data
• Using date informats, SAS will convert conventional forms of dates into a number, the number of days since January 1, 1960 (which is itself day 0). This number is referred to as a SAS date value
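To see SAS date values directly, the sketch below reads three dates with the MMDDYY10. informat and keeps an unformatted copy. The data set and variable names are invented for illustration.

```sas
* Hypothetical data, for illustration only;
DATA Dates;
   INPUT Visit MMDDYY10.;
   * Raw holds the same number as Visit but is printed without a date format;
   Raw = Visit;
   FORMAT Visit DATE9.;
   DATALINES;
01/01/1960
01/02/1960
12/31/1959
;
RUN;

PROC PRINT DATA=Dates;
RUN;
```

Raw should print as 0, 1, and -1: January 1, 1960 is day 0, and dates before it are negative.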
Difference between INFORMAT and FORMAT?
• INFORMATs give SAS special instructions for reading a variable
• FORMATs give SAS special instructions for writing a variable
• If specified in a DATA step, the name of the informat or format will be saved in the data set and will be printed by PROC CONTENTS
• Like the LABEL statement, these can also be used in the PROC step to customize your reports, but they would not be stored in the data set
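A sketch contrasting the two, under the assumption that the informat and format are attached with INFORMAT and FORMAT statements in the DATA step. The data set Sales and its values are hypothetical.

```sas
* Hypothetical data, for illustration only;
DATA Sales;
   * The informat controls how the raw text is read;
   INFORMAT Amount COMMA9.;
   * The format controls how the stored number is written;
   FORMAT Amount DOLLAR12.;
   INPUT Amount;
   DATALINES;
4,065,493
130
;
RUN;

* The stored informat and format names appear in this output;
PROC CONTENTS DATA=Sales;
RUN;

* A FORMAT statement in a PROC step applies to that step only;
PROC PRINT DATA=Sales;
   FORMAT Amount COMMA12.;
RUN;
```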
Informats: 3 basic types
• Character, numeric, date
  • Character: $informatw.
  • Numeric: informatw.d
  • Date: informatw.
• The $ indicates character informats, informat is the name of the informat, w is the total width, and d is the number of decimal places (numeric only)
• Two informats do not have names: $w., which reads standard character data, and w.d, which reads standard numeric data
Informats (cont.)
• The period in an informat is very important because it distinguishes an informat from a variable name, which, by default, cannot contain any special characters except the underscore
    INPUT Name : $10. Age : 3. Height : 5.1 DOB : MMDDYY10.;
*Selected informats can be found on pp. 44-45 (3rd Ed) of “The Little SAS Book”.
Formatted Input Example
    INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.
          (Score1 Score2 Score3 Score4 Score5) (4.1);
• The variable Name has an informat of $16., meaning that it is a character variable 16 columns wide. Variable Age has an informat of 3., is numeric, three columns wide, and has no decimal places. The +1 skips over one column. Variable Type is character, and it is one column wide. Variable Date has an informat of MMDDYY10. and reads dates in the form 10-31-1999 or 10/31/1999, each 10 columns wide. The remaining variables, Score1 through Score5, all require the same informat, 4.1. By putting the variables and the informat in separate sets of parentheses, you have only to list the informat once.
Mixing Input Styles
• List style is the easiest; column style is a bit more work; and formatted style is the hardest of the three. However, column and formatted styles do not require spaces (or other delimiters) between variables and can read embedded blanks.
• Sometimes you use one style, sometimes another, and sometimes the easiest way is to use a combination of styles. SAS is so flexible that you can mix and match any of the input styles for your own convenience.
Mixing Input Styles (cont.)
• With list style input, SAS automatically scans to the next non-blank field and starts reading.
• With column style input, SAS starts reading in the exact column that you specify.
• But with formatted input, SAS just starts reading: wherever the pointer is, that is where SAS reads. Sometimes you need to move the pointer explicitly, and you can do that by using the column pointer, @n, where n is the number of the column SAS should move to.
Mixed Input example
    INPUT ParkName $ 1-22 State $ Year @40 Acreage COMMA9.;
Raw data (ParkName in columns 1-22, State starting in column 23, Acreage starting in column 40):
    ----+----1----+----2----+----3----+----4----+---
    Yellowstone           ID/MT/WY 1872 *  4,065,493
    Everglades            FL 1934 *        1,398,800
    Yosemite              CA 1864 *          760,917
    Great Smoky Mountains NC/TN 1926 *       520,269
    Wolf Trap Farm        VA 1966 *              130
Without the @40 column pointer:
    INPUT ParkName $ 1-22 State $ Year Acreage COMMA9.;
SAS would start reading Acreage at the * (the * marks the pointer position after Year is read; it is not part of the data), and Acreage would look like:
    4065
    .
    .
    5
    .
Reading Multiple Lines of Raw Data per Observation
• In a typical raw data file each line of data represents one observation, but sometimes the data for each observation are spread out over more than one line.
• To tell SAS when to skip to a new line, you simply add line pointers to your INPUT statement.
• To read more than one line of raw data for a single observation, you simply insert a slash (/) into your INPUT statement when you want to skip to the next line of raw data.
Reading Multiple Lines of Raw Data per Observation (cont.)
• The #n line pointer works the same as the slash (/) but is more flexible. You give #n the number of the line, within that observation, from which you want to read your raw data.
    INPUT City$ State$ / NormHi NormLo #3 RecHi RecLo;
Raw data:
    Nome AK
    55 44
    88 29
    Miami FL
    …
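A complete, runnable version of this kind of step might look like the sketch below. The data set name Temps is an assumption, and the Miami values are borrowed from the @@ slide purely to fill out a second observation.

```sas
* Hypothetical data set name with illustrative values for the second city;
DATA Temps;
   * The slash moves to line 2 and #3 moves to line 3 of each observation;
   INPUT City $ State $
       / NormHi NormLo
      #3 RecHi RecLo;
   DATALINES;
Nome AK
55 44
88 29
Miami FL
72 62
105 40
;
RUN;
```

Six lines of raw data should produce two observations, one per city.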
Reading Multiple Observations per Line of Raw Data (@@)
• When you have multiple observations per line of raw data, you can use double trailing at signs (@@) at the end of your INPUT statement.
• SAS will hold that line of data, continuing to read observations until it either runs out of data or reaches an INPUT statement that does not end with a double trailing @. This is also known as a “hard hold”.
    INPUT City$ State$ NormHi NormLo RecHi RecLo @@;
Raw data:
    Nome AK 55 44 88 29 Miami FL 72 62 105 40 Atlanta . 59 . 12
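As a self-contained sketch, the step below reads the two complete sets of values from the slide off a single data line. The data set name Temps is an assumption.

```sas
* Hypothetical data set name and the first two cities from the slide;
DATA Temps;
   * The double trailing @ holds the line across iterations of the DATA step;
   INPUT City $ State $ NormHi NormLo RecHi RecLo @@;
   DATALINES;
Nome AK 55 44 88 29 Miami FL 72 62 105 40
;
RUN;
```

One line of raw data should yield two observations here.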
Reading Part of a Raw Data File (@)
• You don’t have to read all the data before you tell SAS whether to keep an observation. Instead, you can read just enough variables to decide whether to keep the current observation.
• Similar to the @@, SAS will hold that line of data with a single trailing @. This is known as a “soft hold”.
• While the trailing @ holds that line, you can test the observation with an IF statement to see if it’s one you want to keep. If it is, you can then read the data for the remaining variables with a second INPUT statement.
• Without a trailing @, SAS automatically starts reading the next line of raw data with each INPUT statement. The single trailing @ holds the line for later INPUT statements in the same iteration of the DATA step and releases it when SAS returns to the top of the DATA step.
Reading Part of a Raw Data File (@) Example
Suppose you have a data set containing heart and lung transplant information, but you are trying to construct a data set of only lung transplant patients. It is a very large data set that takes a lot of time to run, so you don’t want to read it all in first and then select out the portion you want to keep. It would be better to read in only those data that you want initially.
Reading Part of a Raw Data File (@) Example (cont.)
Raw data:
    Heart 7823 12nov1989
    Heart 6477 08sep1992
    Lung 7231 22jul1995
    Heart 2347 30jan1990
    Lung 7842 12mar1998
Program:
    DATA Lung;
    INFILE 'c:\MyData\Trnsplnt.dat';
    INPUT Type$ @;
    If Type = 'Heart' then DELETE;
    INPUT RecNum TranDt : Date9.;
    Run;
Reading external comma-delimited data
• We have two choices when given this type of data
  • We can use an editor and replace all the commas with blanks, or
  • We can leave the commas in the data and use the DLM= option in the INFILE statement
    Data HtWt;
    Infile 'c:\MyData\survey.txt' DLM=',';
    Input ID Gender$ Age Height Weight;
    Run;
Reading external comma-delimited data (cont.)
• Another method besides the DLM= option is to use DSD in the INFILE statement
• This option performs several other functions besides treating commas as delimiters.
  • If it finds two adjacent commas, it will assign a missing value
  • It will allow text strings surrounded by quotes to be read into a character variable and will strip the quotes in the process
    Data HtWt;
    Infile 'c:\MyData\survey.txt' DSD;
    Input ID Gender$ Age Height Weight;
    Run;
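The two DSD behaviors can be tried without an external file by reading in-stream data. The sketch below uses INFILE DATALINES and made-up survey values; only the variable list comes from the slide.

```sas
* Hypothetical comma-delimited data, for illustration only;
DATA HtWt;
   INFILE DATALINES DSD;
   INPUT ID Gender $ Age Height Weight;
   DATALINES;
1,M,23,68,155
2,F,,61,102
3,"M",55,70,202
;
RUN;
```

The second line has two adjacent commas, so Age should be missing for ID 2. The third line has a quoted value, which should be stored as M with the quotes stripped.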
Permanent SAS Data Sets
• A permanent SAS data set has a two-level name: LibraryName.DataSetName
• A temporary SAS data set has the one-level name that we have been using:
    Data new;
    Set AIDS;
    Run;
• Temporary SAS data sets will not exist when you shut down the instance of SAS in which they were created.
• To create a permanent data set, first define a SAS Library (Libref)
Libname Statement
• Use this statement to define your SAS Library location before using your SAS data sets
Example:
    LIBNAME Annie 'C:\SASDATA';
    Proc Means Data = Annie.EX4A N MEAN STD;
    Var X Y Z;
    Run;
Creating Permanent SAS Data Sets
    Libname annie "C:\SASDATA";
    Data Annie.EX1;
    INPUT Group$ X Y Z;
    DATALINES;
    Control 12 17 19
    Treat 23 . 21
    Control 19 18 16
    Treat 22 22 .
    ;
    Run;
Using the Permanent SAS Data Sets
    Libname xyz "C:\SASDATA";
    Title "Means from EX1";
    Proc Means Data=xyz.EX1;
    Var X Y Z;
    Run;
Now let’s try the in-class problems listed on slide 1