750 likes | 1.1k Views
Data In. Information Out. The Information Delivery Process. Manage. Organize. Exploit. Turning Data Into Information. DATA Step. Data. Data. PROC Steps. Data. SAS Data Sets. PROC Steps. Information. Information. Turning Data Into Information.
E N D
Data In Information Out The Information Delivery Process Manage Organize Exploit
Turning Data Into Information DATA Step Data Data PROC Steps Data SAS Data Sets PROC Steps Information Information
Turning Data Into Information Process of delivering meaningful information: 80% Data-related: • Access • Scrub • Transform • Manage • Store and retrieve 20% Analysis
The Raw Data Partial fixed-column raw data file: 1 1 2 2 3 3 4 4 1---5----0----5----0----5----0----5----0----5
Browsing the Data Values Listing of Flight Crew Employees Obs empid lastname firstname jobcode salary 1 0031 GOLDENBERG DESIREE PILOT1 50221.62 2 0040 WILLIAMS ARLENE M. FLTAT1 23666.12 3 0071 PERRY ROBERT A. FLTAT1 21957.71 4 0082 MCGWIER-WATTS CHRISTINA PILOT3 96387.39 5 0091 SCOTT HARVEY F. FLTAT2 32278.40 6 0106 THACKER DAVID S. FLTAT1 24161.14 7 0275 GRAHAM DEBORAH S. FLTAT2 32024.93 8 0286 DREWRY SUSAN PILOT1 55377.00 9 0309 HORTON THOMAS L. FLTAT1 23705.12 10 0334 DOWN EDWARD PILOT1 56584.87 11 0347 CHERVENY BRENDA B. FLTAT2 38563.45 12 0355 BELL THOMAS B. PILOT1 59803.16 13 0366 GLENN MARTHA S. PILOT3 120202.38 14 0385 HOLMAN GREGORY A. PILOT2 93001.09 15 0390 NOE BARBARA E. FLTAT2 37101.32
Reading a Raw Data File Raw Data File SAS Data Set
0031GOLDENBERG DESIREE0040WILLIAMS ARLENE M.0071PERRY ROBERT A. 0082MCGWIER-WATTSCHRISTINA Reading Raw Data Files Raw Data File SAS Data Set DATA Step empid lastname firstname 0031 GOLDENBERG DESIREE0040 WILLIAMS ARLENE M. 0071 PERRY ROBERT A. 0082 MCGWIER-WATTS CHRISTINA data . . .; infile . . .; input . . .;run;
Reading Raw Data Files In order to create a SAS data set from a raw data file, you must • start a DATA step and name the SAS data set being created (DATA statement) • identify the location of the raw data file to read (INFILE statement) • describe how to read the data fields from the raw data file (INPUT statement).
Creating a SAS Data Set with the DATA Statement General form of the DATA statement: This DATA statement creates a SAS data set called WORK.EMPDATA: data work.empdata; DATA SAS-data-set(s);
Pointing to a Raw Data File with the INFILE Statement General form of the INFILE statement: Examples: OS/390 infile ‘edc.prog1.employee’; UNIX infile ‘/user/prog1/employee.dat’; WIN infile ‘C:\workshop\winsas\ prog1\employee.dat’; INFILE ‘filename’ <options>;
Reading Raw Data Using Column Input General form of column input: To read raw data values with column input, 1. name the SAS variable you want to create 2. use a dollar sign, $, if the SAS variable is character 3. specify the starting column, a dash, and the ending column of the raw data field. INPUT variable $ startcol-endcol …;
Reading Raw Data Using Column Input 1 1 2 2 3 3 4 4 0031GOLDENBERG DESIREE PILOT1 50221.62 input empid $ 1-4 lastname $ 5-17 1---5----0----5----0----5----0----5----0----5
Reading Raw Data Using Column Input 1 1 2 2 3 3 4 4 0031GOLDENBERG DESIREE PILOT1 50221.62 input empid $ 1-4 lastname $ 5-17 firstname $ 18-30 1---5----0----5----0----5----0----5----0----5
Reading Raw Data Using Column Input 1 1 2 2 3 3 4 4 0031GOLDENBERG DESIREE PILOT1 50221.62 input empid $ 1-4 lastname $ 5-17 firstname $ 18-30 jobcode $ 31-36 1---5----0----5----0----5----0----5----0----5
Reading Raw Data Using Column Input 1 1 2 2 3 3 4 4 0031GOLDENBERG DESIREE PILOT1 50221.62 input empid $ 1-4 lastname $ 5-17 firstname $ 18-30 jobcode $ 31-36 salary 37-45; 1---5----0----5----0----5----0----5----0----5
Reading Raw Data Using Column Input 1 1 2 2 3 3 4 4 0031GOLDENBERG DESIREE PILOT1 50221.62 input empid $ 1-4 lastname $ 5-17 firstname $ 18-30 jobcode $ 31-36 salary 37-45; 1---5----0----5----0----5----0----5----0----5
Excel Spreadsheet SAS Data Set Business Scenario International Airlines is preparing to review its flight crew. The immediate goal is to read the Excel spreadsheet and create a SAS data set.
What is the Import Wizard? A point-and-click graphical interface that enables you to create a SAS data set from several types of external files including • dBASE file (*.DBF) • Excel 97 Spreadsheet (*.XLS) • Microsoft Access Table • Delimited file (*.*) • Comma Separated Values (*.CSV)
1 1 2 2 3 3 4 4 1---5----0----5----0----5----0----5----0----5 JetCruise LF5200 030003 04/05/1994 03/11/2001 JetCruise LF5200 030005 02/15/1999 07/05/2001 JetCruise LF5200 030008 03/06/1996 04/02/2002 JetCruise LF5200 030009 10/14/1998 09/15/2001 JetCruise LF5200 030011 09/04/1998 08/31/2001 JetCruise LF5200 030012 01/02/1994 03/29/2001 JetCruise LF5200 030013 02/01/1996 11/23/2002 JetCruise LF5200 030015 06/24/1998 02/06/2001 The Raw Data Partial data: aircraft ID last maintenance date The aircraft data is stored in a fixed-column raw data file: aircraft model date in service
The raw data file will be read with formatted input. Using Formatted Input Raw Data File JetCruise LF5200 030003 04/05/1994 03/11/2001 JetCruise LF5200 030005 02/15/1999 07/05/2001 JetCruise LF5200 030008 03/06/1996 04/02/2002 DATA Step data sas-data-set-name; infile raw-filename; input pointer-controlvariableinformat-name;run; SAS Data Set
What is a SAS Format? A format is an instruction that the SAS System uses to write data values. SAS formats have the following form: <$>format<w>.<d>
SAS Formats Selected SAS formats: w.d standard numeric format $w. standard character format COMMAw.d commas in a number: 12,234.21 DOLLARw.d dollar signs and commas in a number: $12,234.41
Using Formatted Input General form of the INPUT statement with formatted input: Pointer control: @n moves the pointer to column n. +n moves the pointer n positions. INPUTpointer-control column informat ...;
Using Formatted Input Formatted input can be used to read non-standard data values by • moving the input pointer to the starting position of the field • specifying a column name • specifying an informat. An informat specifies the width of the input field and how to read the data values that are stored in the field.
Using Formatted Input General form of an informat: $ indicates a character format. informat-name names the informat. w is an optional field width. . is the required delimiter. d optionally, specifies a decimal for numeric informats. $informat-namew.d
Selected Informats 7. or 7.0 reads seven columns of numeric data. 7.2 reads seven columns of numeric data and inserts a decimal point in the data value. $5. reads five columns of character data and removes leading blanks. $CHAR5. reads five columns of character data and preserves leading blanks.
Selected Informats COMMA7. reads seven columns of numeric data and removes selected nonnumeric characters, such as dollar signs and commas. PD4. reads four columns of packed decimal data. MMDDYY10. reads dates of the form 01/20/2000.
Working with Date Values The raw data file contains date values. These date values will be read with the MMDDYY10. informat: Jetcruise LF5200 030003 04/05/1990 3/11/2001 Jetcruise LF5200 030005 02/15/1990 7/05/2001 Jetcruise LF5200 030008 03/06/1990 4/02/2002 1 1 2 2 3 3 4 4 1---5----0----5----0----5----0----5----0----5
Converting Dates to SAS Date Values SAS uses date informats to read and convert dates to SAS date values. For example, Stored Value Informat Converted Value 10/29/1999 MMDDYY10. 14546 29OCT1999 DATE9. 14546 29/10/1999 DDMMYY10. 14546
SAS Formats Selected SAS date formats: MMDDYYw. 101692 (MMDDYY6.) 10/16/92 (MMDDYY8.) 10/16/1992 (MMDDYY10.) DATEw. 16OCT92 (DATE7.) 16OCT1992 (DATE9.)
1 1 2 2 3 3 4 4 1---5----0----5----0----5----0----5----0----5 JetCruise LF5200 030003 04/05/1994 03/11/2001 JetCruise LF5200 030005 02/15/1999 07/05/2001 JetCruise LF5200 030008 03/06/1996 04/02/2002 JetCruise LF5200 030009 10/14/1998 09/15/2001 JetCruise LF5200 030011 09/04/1998 08/31/2001 JetCruise LF5200 030012 01/02/1994 03/29/2001 JetCruise LF5200 030013 02/01/1996 11/23/2002 JetCruise LF5200 030015 06/24/1998 02/06/2001 Locating and Browsing the Raw Data File Partial raw data file: aircraft ID last maintenance date Browse the raw data file and determine the column layout and type: aircraft model date in service
Starting the DATA Step Use the DATA statement to begin the DATA step and name the SAS data set: data work.aircraft; other SAS statements run; Use the INFILE statement to identify the input raw data file: data work.aircraft; infile ‘aircraft.dat’;other SAS statementsrun;
1 1 2 2 3 3 4 4 1---5----0----5----0----5----0----5----0----5 Writing the INPUT Statement Use the INPUT statement and pointer control to read the record starting with the first column. Read the value with the $16. informat and assign it to the variable MODEL. JetCruise LF5200 030003 04/05/1994 03/11/2001 data work.aircraft; infile ‘aircraft.dat’; input @1 model $16. other SAS statementsrun;
1 1 2 2 3 3 4 4 1---5----0----5----0----5----0----5----0----5 Writing the INPUT Statement Use the INPUT statement and pointer control to read the record starting with column 18. Read the value with the $6. informat and assign the value to AIRCRAFTID. JetCruise LF5200 030003 04/05/1994 03/11/2001 data work.aircraft;infile ‘aircraft.dat’; input @1 model $16. @18 aircraftid $6.other SAS statementsrun;
1 1 2 2 3 3 4 4 1---5----0----5----0----5----0----5----0----5 Writing the INPUT Statement Use the INPUT statement and pointer control to read the record starting with column 25. Read the value with the MMDDYY10. informat and assign the value to INSERVICE. JetCruise LF5200 030003 04/05/1994 03/11/2001 data work.aircraft infile ‘aircraft.dat’; input @1 model $16. @18 aircraftid $6. @25 inservice mmddyy10. other SAS statementsrun;
1 1 2 2 3 3 4 4 1---5----0----5----0----5----0----5----0----5 Writing the INPUT Statement Use the INPUT statement and pointer control to read the record starting with column 36. Read the value with the MMDDYY10. informat and assign the value to LASTMAINT. JetCruise LF5200 030003 04/05/1994 03/11/2001 data work.aircraft; infile ‘aircraft.dat’; input @1 model $16. @18 aircraftid $6. @25 inservice mmddyy10. @36 lastmaint mmddyy10.;run;
SAS Syntax Rules SAS statements are free-format. • They can begin and end in any column. • One or more blanks or special characters can be used to separate words. • A single statement can span multiple lines. • Several statements can be on the same line. Unconventional spacing data work.mech_pilot; infile 'c:\coursedata\emplist.dat'; input lastname $ 1-20 firstname $ 21-30 jobtitle $ 36-43 salary 54-59; run; proc means data=work.mech_pilot n mean; class jobtitle; var salary;run;
SAS Syntax Rules SAS statements are free-format. • They can begin and end in any column. • One or more blanks or special characters can be used to separate words. • A single statement can span multiple lines. • Several statements can be on the same line. Unconventional spacing data work.mech_pilot; infile 'c:\coursedata\emplist.dat'; input lastname $ 1-20 firstname $ 21-30 jobtitle $ 36-43 salary 54-59; run; proc means data=work.mech_pilot n mean; class jobtitle; var salary;run;
SAS Syntax Rules • SAS statements • usually begin with an identifying keyword • always end with a semicolon. data work.mech_pilot; infile 'c:\coursedata\emplist.dat'; input lastname $ 1-20 firstname $ 21-30 jobtitle $ 36-43 salary 54-59; run; proc print data=work.mech_pilot; run; proc means data=work.mech_pilot n mean; class jobtitle; var salary; run;
Adding a New Variable Create a new variable by extracting the four-digit year values from the SAS date values. Aircraft Service Records
Using an Assignment Statement An assignment statement evaluates an expression and assigns the resulting value to a variable. General syntax of an assignment statement: variable=expression;
Selected operators for basic arithmetic calculations in an assignment statement: Using Operators
Using SAS Functions A SAS function is a routine that returns a value that is determined from specified arguments. General syntax of a SAS function: function-name(argument1,argument2, . . .)
Using SAS Functions SAS functions • perform arithmetic operations • compute statistics (for example, mean) • manipulate SAS dates and process character values • perform many other tasks.
Creating a Vertical Bar Chart Use the GCHART procedure and the VBAR statement to create a vertical bar chart. proc gchart data=work.aircraft; vbar yrbeg_service; title 'Aircraft In Service, by Year';run;
Reading a Subset of Raw Data Use the DATA step that was written earlier. Add a subsetting IF statement to process only the subset in which the value of AGE is at least 15. data work.aircraft; infile ‘aircraft.dat’; input @1 model $16. @18 aircraftid $6. @25 inservice mmddyy10. @36 lastmaint mmddyy10.;yrbeg_service=year(inservice); age=year(today())-yrbeg_service; if age>=15;run;
What Is a SAS Data Library? Regardless of which host operating system you use, you identify SAS data libraries by assigning each one a libref. libref
What Is a SAS Data Library? By default, SAS creates two SAS data libraries: • a temporary library called WORK • a permanent library called SASUSER. WORK SASUSER