Intro to SAS®
Objectives
• List the components of a SAS program.
• Open an existing SAS program and run it.
SAS Programs
A SAS program is a sequence of steps that the user submits for execution.
• DATA steps are typically used to create SAS data sets.
• PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, edit data, and sort data).
[Diagram: Raw Data / SAS Data Set → DATA Step → SAS Data Set → PROC Step → Report]
Step Boundaries
   data work.staff;
      infile 'raw-data-file';
      input LastName $ 1-20 FirstName $ 21-30
            JobTitle $ 36-43 Salary 54-59;
   run;
   proc print data=work.staff;
   proc means data=work.staff;
      class JobTitle;
      var Salary;
   run;
SAS Windowing Environment Interactive windows enable you to interface with SAS.
Exercises
• Open the SAS program “example.sas.”
• Submit the program and examine the results.
• Data for today's class is located at http://www.missouri.edu/~baconr/sas/econ
Objectives
• Learn the two fundamental SAS syntax programming rules.
• Learn to create a SAS data set from a text file.
• Write a DATA step to read a course data file.
Fundamental SAS Syntax Rules
SAS statements have these characteristics:
• usually begin with an identifying keyword
• always end with a semicolon
   data staff;
      input LastName $ FirstName $ JobTitle $ Salary;
      datalines;
   …insert text here…
   run;
   proc print data=staff;
   run;
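As a minimal sketch of the two rules, the program below fills in the data lines with made-up records. The names, job titles, and salaries are hypothetical and are not from the course data. Every statement starts with a keyword (DATA, INPUT, DATALINES, RUN, PROC) and ends with a semicolon, and a lone semicolon marks the end of the in-stream data.

```sas
/* Hypothetical sample records; the values are made up for illustration. */
data staff;
   input LastName $ FirstName $ JobTitle $ Salary;
   datalines;
Smith Anna Pilot 52000
Jones Raj Mechanic 41000
;
run;

/* List the two observations that were just created. */
proc print data=staff;
run;
```

Because the fields are separated by blanks and no columns are given, this uses list input; each character value here is at most 8 characters, the default length for character variables read this way.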
Reading Raw Data Files
Data for flights from New York to Dallas (DFW) and Los Angeles (LAX) is stored in a raw data file. Create a SAS data set from the raw data.
            1    1    2
   1---5----0----5----0
   43912/11/00LAX 20137
   92112/11/00DFW 20131
   11412/12/00LAX 15170
   98212/12/00dfw  5 85
   43912/13/00LAX 14196
   98212/13/00DFW 15116
   43112/14/00LaX 17166
   98212/14/00DFW  7 88
   11412/15/00LAX   187
   98212/15/00DFW 14 31
Creating a SAS Data Set
In order to create a SAS data set from a raw data file, you must do the following:
• Start a DATA step and name the SAS data set being created (DATA statement).
• Identify the location of the raw data file to read (INFILE statement).
• Describe how to read the data fields from the raw data file (INPUT statement).
   data SAS-data-set-name;
      infile 'raw-data-filename';
      input input-specifications;
   run;
[Diagram: Raw Data File → DATA Step → SAS Data Set]
Creating a SAS Data Set
• General form of the DATA statement:
   DATA libref.SAS-data-set(s);
• Example: This DATA statement creates a temporary SAS data set named dfwlax:
   data work.dfwlax;
• Example: This DATA statement creates a permanent SAS data set named dfwlax:
   libname ia 'SAS-data-library';
   data ia.dfwlax;
Pointing to a Raw Data File
• General form of the INFILE statement:
   INFILE 'filename' <options>;
• Examples:
   z/OS (OS/390)   infile 'userid.prog1.dfwlax';
   UNIX            infile '/users/userid/dfwlax.dat';
   Windows         infile 'c:\workshop\winsas\prog1\dfwlax.dat';
• The PAD option in the INFILE statement is useful for reading variable-length records typically found in Windows and UNIX environments.
Reading Data Fields
• General form of the INPUT statement:
   INPUT input-specifications;
• input-specifications
   • names the SAS variables
   • identifies the variables as character or numeric
   • specifies the locations of the fields in the raw data
   • can be specified as column, formatted, list, or named input.
Reading Data Using Column Input
• Column input is appropriate for reading the following:
   • data in fixed columns
   • standard character and numeric data
• General form of a column INPUT statement:
   INPUT variable <$> startcol-endcol . . . ;
• The term standard data refers to character and numeric data that SAS recognizes automatically.
• Examples of standard numeric data:
   15   -15   15.4   +1.23   1.23E3   -1.23E-3
Reading Data Using Column Input
            1    1    2
   1---5----0----5----0
   43912/11/00LAX 20137
   92112/11/00DFW 20131
   11412/12/00LAX 15170

   input Flight $ 1-3 Date $ 4-11 Dest $ 12-14
         FirstClass 15-17 Economy 18-20;
...
Creating Temporary SAS Data Sets
Store the dfwlax data set in the work library.
   data work.dfwlax;
      infile 'raw-data-file';
      input Flight $ 1-3 Date $ 4-11 Dest $ 12-14
            FirstClass 15-17 Economy 18-20;
   run;
NOTE: The data set WORK.DFWLAX has 10 observations and 5 variables.
c06s1d1
Assignment Statements: Creating Variables
   data work.dfwlax;
      infile 'raw-data-file';
      input Flight $ 1-3 Date $ 4-11 Dest $ 12-14
            FirstClass 15-17 Economy 18-20;
      Total=FirstClass+Economy;
      LogEconomy=LOG(Economy);
   run;
NOTE: The data set WORK.DFWLAX has 10 observations and 7 variables.
c06s1d1
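One detail worth seeing in action: an arithmetic expression returns a missing value when any operand is missing, and one flight record (flight 114 on 12/15/00) has a blank FirstClass field. The sketch below reads two of the flight records in-stream and compares the + operator with the SUM function. The data set name work.totals and the variable TotalSum are hypothetical names introduced only for this illustration.

```sas
data work.totals;
   input Flight $ 1-3 Date $ 4-11 Dest $ 12-14
         FirstClass 15-17 Economy 18-20;
   Total    = FirstClass + Economy;      /* missing if either operand is missing */
   TotalSum = sum(FirstClass, Economy);  /* SUM ignores missing arguments */
   datalines;
43912/11/00LAX 20137
11412/15/00LAX   187
;
run;

proc print data=work.totals;
run;
```

For the first record both totals are 157. For the second record Total is missing (printed as a period) while TotalSum is 187, and the log reports that missing values were generated.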
Reading Data Using Formatted Input
Formatted input is appropriate for reading the following:
• data in fixed columns
• standard and nonstandard character and numeric data
• calendar values to be converted to SAS date values
Working with Date Values
• Date values that are stored as SAS dates are special numeric values.
• A SAS date value is interpreted as the number of days between January 1, 1960, and a specific date.
   Raw value    (informat)    SAS date value    (format)    Displayed value
   01JAN1959        →              -365            →         01/01/1959
   01JAN1960        →                 0            →         01/01/1960
   01JAN1961        →               366            →         01/01/1961
Reading Data Using Formatted Input
• General form of the INPUT statement with formatted input:
   INPUT pointer-control variable informat . . . ;
• Formatted input is used to read data values by doing the following:
   • moving the input pointer to the starting position of the field
   • specifying a variable name
   • specifying an informat
Reading Data Using Formatted Input
• Pointer controls:
   • @n moves the pointer to column n.
   • +n moves the pointer n positions.
• An informat specifies the following:
   • the width of the input field
   • how to read the data values that are stored in the field
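The two pointer controls can be combined in one INPUT statement. The sketch below, which assumes the fixed-column flight layout used in this deck, reads Flight from columns 1-3 and then uses +8 to skip over the 8-column date so that Dest is read from columns 12-14. The data set name work.dest_only is a hypothetical name for this illustration.

```sas
data work.dest_only;
   input @1 Flight $3.   /* columns 1-3; the pointer is now at column 4 */
         +8 Dest $3.;    /* skip the 8-column date; read columns 12-14 */
   datalines;
43912/11/00LAX 20137
92112/11/00DFW 20131
;
run;
```

After each informat is applied, the pointer rests on the column just past the field that was read, which is why +8 (not +7) lands on column 12.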
What Is a SAS Informat?
• An informat is an instruction that SAS uses to read data values.
• SAS informats have the following form:
   <$>informat-namew.<d>
   $               indicates a character informat
   informat-name   informat name
   w               total width of the field to read
   .               required delimiter
   d               number of decimal places
Selected Informats
• w. standard numeric informat
[Table: Raw Data Value | Informat | SAS Data Value. Only the Informat entries (8.0, 8.0) are legible.]
Selected Informats
• COMMAw. reads numeric data and removes selected nonnumeric characters such as dollar signs and commas.
   [Table: Raw Data Value | Informat | SAS Data Value. Only the Informat entry (COMMA7.0) is legible.]
• MMDDYYw. reads dates of the form mm/dd/yyyy.
   [Table: Raw Data Value | Informat | SAS Data Value. Only the Informat entry (MMDDYY8.) is legible.]
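A quick way to see what an informat does is to apply it with the INPUT function in a DATA _NULL_ step, which runs without creating a data set. The raw value '$12,345' below is a made-up example, not a value from the slides.

```sas
data _null_;
   /* COMMA7. reads 7 characters and strips the dollar sign and comma. */
   amount = input('$12,345', comma7.);
   put amount=;   /* writes amount=12345 to the log */
run;
```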
Selected Informats
• $w. standard character informat (removes leading blanks)
[Table: Raw Data Value | Informat | SAS Data Value. Only the Informat entry ($8.) is legible.]
Converting Dates to SAS Date Values
• SAS uses date informats to read and convert dates to SAS date values.
• Examples:
   Raw Data Value    Informat     Converted Value
   10/29/2001        MMDDYY10.    15277
   10/29/01          MMDDYY8.     15277
   29OCT2001         DATE9.       15277
   29/10/2001        DDMMYY10.    15277
The converted value is the number of days between 01JAN1960 and 29OCT2001.
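The conversions can be checked in the log with a short sketch that uses the INPUT function and two of the informats listed above. The variable names d1 and d2 are hypothetical.

```sas
data _null_;
   d1 = input('10/29/2001', mmddyy10.);
   d2 = input('29OCT2001', date9.);
   put d1= d2=;          /* writes d1=15277 d2=15277 */
   put d1= mmddyy10.;    /* writes d1=10/29/2001 (same value, shown with a format) */
run;
```

The stored value is the same number in both cases; the informat only controls how the raw text is interpreted, and a format controls how the number is displayed.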
Reading Data: Formatted Input
            1    1    2
   1---5----0----5----0
   43912/11/00LAX 20137
   92112/11/00DFW 20131
   11412/12/00LAX 15170

   input @1 Flight $3. @4 Date mmddyy8.
         @12 Dest $3. @15 FirstClass 3.
         @18 Economy 3.;
...
Reading Data: Formatted Input
Raw Data File
            1    1    2
   1---5----0----5----0
   43912/11/00LAX 20137
   92112/11/00DFW 20131
   11412/12/00LAX 15170

   data work.dfwlax;
      infile 'raw-data-file';
      input @1 Flight $3. @4 Date mmddyy8.
            @12 Dest $3. @15 FirstClass 3.
            @18 Economy 3.;
   run;
c06s2d1
Reading Data: Formatted Input
   proc print data=work.dfwlax;
   run;

                     The SAS System

                                    First
   Obs    Flight     Date    Dest   Class    Economy
     1     439      14955    LAX      20       137
     2     921      14955    DFW      20       131
     3     114      14956    LAX      15       170
     4     982      14956    dfw       5        85
     5     439      14957    LAX      14       196
     6     982      14957    DFW      15       116
     7     431      14958    LaX      17       166
     8     982      14958    DFW       7        88
     9     114      14959    LAX       .       187
    10     982      14959    DFW      14        31

(The Date column shows SAS date values.)
c06s2d1
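The raw day counts in the Date column can be made readable by attaching a format in the PROC PRINT step. This is a sketch that assumes work.dfwlax was created with the formatted-input DATA step shown earlier; the choice of MMDDYY10. is only one possibility.

```sas
proc print data=work.dfwlax;
   format Date mmddyy10.;   /* display 14955 as 12/11/2000 */
run;
```

A FORMAT statement inside a PROC step affects only that step's output; the stored values in the data set remain numeric SAS dates.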
List Input with the Default Delimiter
   50001 4feb1989 132 530
   50002 11nov1989 152 540
   50003 22oct1991 90 530
   50004 4feb1993 172 550
   50005 24jun1993 170 510
   50006 20dec1994 180 520
• The data is not in fixed columns.
• The fields are separated by spaces.
• There is one nonstandard field.
Delimiters
Common delimiters are
• blanks
• commas
• tab characters
A space (blank) is the default delimiter.
List Input
• General form of the INPUT statement for list input:
   INPUT var-1 $ var-2 . . . var-n;
• You must specify the variables in the order that they appear in the raw data file.
• Specify a $ after the variable name if it is character. No symbol after the variable name indicates a numeric variable.
Input Data
   50001 4feb1989 132 530
   50002 11nov1989 152 540
   50003 22oct1991 90 530
   50004 4feb1993 172 550
   50005 24jun1993 170 510
   50006 20dec1994 180 520
• The second field is a date. How does SAS store date values?
Informats
• To read nonstandard data, you must apply an informat.
• General form of an informat:
   <$>INFORMAT-NAME<w>.<d>
• Informats are instructions that specify how SAS reads raw data.
Specifying an Informat
• To specify an informat when using list input, use the colon (:) format modifier in the INPUT statement between the variable name and the informat.
• General form of a format modifier in an INPUT statement:
   INPUT variable : informat;
Reading a Delimited Raw Data File
   data airplanes;
      infile 'raw-data-file';
      input ID $
            InService : date9.
            PassCap CargoCap;
   run;
Non-Default Delimiter
   50001 , 4feb1989,132, 530
   50002, 11nov1989,152, 540
   50003, 22oct1991,90, 530
   50004, 4feb1993,172, 550
   50005, 24jun1993, 170, 510
   50006, 20dec1994, 180, 520
• The fields are separated by commas.
Using the DLM= Option
• The DLM= option sets a character or characters that SAS recognizes as a delimiter in the raw data file.
• General form of the INFILE statement with the DLM= option:
   INFILE 'raw-data-file' DLM='delimiter(s)';
• Any character you can type on your keyboard can be a delimiter. You can also use hexadecimal characters.
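A minimal sketch of DLM= applied to the airplane data follows. To keep it self-contained, it reads two of the records in-stream (INFILE DATALINES) rather than from an external file, and the embedded blanks in the original records are removed for simplicity; both choices are assumptions made for this illustration.

```sas
data airplanes;
   infile datalines dlm=',';        /* a comma, not a blank, separates fields */
   input ID $
         InService : date9.         /* colon modifier: list input with an informat */
         PassCap CargoCap;
   datalines;
50001,4feb1989,132,530
50002,11nov1989,152,540
;
run;

proc print data=airplanes;
   format InService date9.;         /* show the stored date values as dates */
run;
```

To read the external file instead, replace DATALINES in the INFILE statement with the quoted file name and drop the in-stream records.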
Multiple Records Per Observation
   Farr, Sue
   Anaheim, CA
   869-7008
   Anderson, Kay B.
   Chicago, IL
   483-3321
   Tennenbaum, Mary Ann
   Jefferson, MO
   589-9030
• A raw data file has three records per employee. Record 1 contains the first and last names, record 2 contains the city and state of residence, and record 3 contains the employee’s phone number.
Desired Output
• The SAS data set should have one observation per employee.
   LName         FName       City         State    Phone
   Farr          Sue         Anaheim      CA       869-7008
   Anderson      Kay B.      Chicago      IL       483-3321
   Tennenbaum    Mary Ann    Jefferson    MO       589-9030
Reading Multiple Records per Observation
   data address;
      infile 'raw-data-file' dlm=',';
      input #1 LName $ FName $
            #2 City $ State $
            #3 Phone $;
   run;
...