PubH 6420 Introduction to SAS Programming. Instructor: Greg Grandits. TA: Michael Petzold. Textbook: The Little SAS Book, 5th Edition. Course Information. Prerequisite: Want to learn SAS. Evaluation: 6 assignments and 2 exams. Monitored computer lab hours (Mayo C381). Web site for class
PubH 6420 Introduction to SAS Programming
Instructor: Greg Grandits
TA: Michael Petzold
Textbook: The Little SAS Book, 5th Edition

Course Information
• Prerequisite: Want to learn SAS
• Evaluation: 6 assignments and 2 exams
• Monitored computer lab hours (Mayo C381)
• Web site for class: http://www.biostat.umn.edu/~greg-g/ph6420.html
  Datasets, programs, lectures, help links

Course Information
• Access to SAS using a personal copy of PC SAS or any computer with SAS available to you.
• Version 9.2 or 9.3 or 9.4

Course Resources
• Dataset Documentation
  • Case Report Forms for TOMHS
  • Data dictionary for TOMHS dataset
  • Instructions on working with TOMHS dataset
• Datasets
• SAS programs for download
• Help tutorials

SAS OS/Environment
• Windows PC
• UNIX/Linux
Lecture 1 Readings • LSB (Chapter 1)
What is SAS?
• SAS is a programming language that reads, processes, and performs statistical analyses of data.
• A SAS program is made up of programming statements, which SAS interprets to carry out those functions.
Note: Programming statements are sometimes referred to as “syntax” or programming “code”. A program is sometimes called a “syntax” file.

SAS Usage
• Started in late 1970s
• Used extensively in academic and business environments (medical device and pharmaceutical companies)
• Many analyses published in medical journals use SAS
Parts of a SAS Program
• DATA step
  • Reads in and processes your raw data and makes a SAS dataset.
• Procedures (PROCs)
  • Perform specific statistical analyses
  • Some procedures are utility procedures, such as PROC SORT, which is used to sort your data
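The two parts can be sketched in one small program: a DATA step that builds a dataset, then a utility PROC and a reporting PROC that use it. This is a minimal sketch, not the course's own program; the dataset name demo_small is hypothetical, and the four rows are taken from the lecture's example data.

```sas
/* DATA step: read raw data and create a SAS dataset.           */
/* demo_small is a hypothetical name; the rows are a subset of  */
/* the lecture's example data.                                  */
data demo_small;
   input gender $ age marstat $ credits state $;
   datalines;
F 23 S 15 MN
F 35 M 02 MN
M 20 S 13 MN
M 26 M 15 WI
;
run;

/* Utility procedure: sort the observations by age (ascending by default) */
proc sort data=demo_small;
   by age;
run;

/* Reporting procedure: list the sorted data */
proc print data=demo_small;
run;
```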
Data Step: Raw Data → Read in Data → Process Data (Create new variables) → Output Data (Create SAS Dataset)
PROCs: Analyze Data Using Statistical Procedures
Structure of Data
• Made up of rows and columns
• Rows in SAS are called observations
• Columns in SAS are called variables
• Together they make up the dataset (table)
An observation (row) is all the information for one entity (patient, patient visit, clinical center, county).
The SAS data step processes data one observation at a time.
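One way to see the one-observation-at-a-time behavior is to write each observation to the log as it is read. The sketch below assumes three rows (gender and age only) from the lecture's example data; _N_ is SAS's automatic variable that counts DATA step iterations, and data _null_ runs the step without creating a dataset.

```sas
/* Run the DATA step without creating an output dataset */
data _null_;
   input gender $ age;
   /* Executed once per observation: writes the iteration */
   /* counter and the current values to the log           */
   put _n_= gender= age=;
   datalines;
F 23
F 21
M 20
;
run;
```

The log should then contain one line per observation, which is a convenient way to check how the DATA step is looping.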
Example of Data
12 observations and 5 variables
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN

Types of Variables in SAS
• Numeric (e.g. age, blood pressure)
  • 54, 140
• Character (e.g. patient ID, diagnosis)
  • A001, TIA, 0410
You need to tell SAS if the data is character. The default is numeric.
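A short sketch of how the type is declared when reading data: a $ after a variable name in the INPUT statement marks it as character, and anything else defaults to numeric. The dataset name types, the variable names, and the second data row are illustrative assumptions, not part of the lecture.

```sas
/* "types" and the variable names are hypothetical; the second */
/* row of values is made up for illustration.                  */
data types;
   /* id and dx are followed by $, so they are character; */
   /* age and sbp have no $, so they are numeric          */
   input id $ age sbp dx $;
   datalines;
A001 54 140 TIA
0410 61 128 TIA
;
run;

/* PROC CONTENTS lists each variable with its type (Num or Char) */
proc contents data=types;
run;

/* Because id is character, 0410 keeps its leading zero */
proc print data=types;
run;
```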
Rules for SAS Statements
• SAS statements end with a semicolon (;)
    data demo;
    infile datalines;
    input gender $ age;
• SAS statements can be entered in lower or uppercase
    data demo;
    infile datalines;
    input gender $ age;
  IS SAME AS:
    DATA DEMO;
    INFILE DATALINES;
    INPUT GENDER $ AGE;

Rules for SAS Statements
• Multiple SAS statements can appear on one line
    data demo; infile datalines; input gender $ age;
    X1 = 0; X2 = 0; X3 = 0; X4 = 0;
• A SAS statement can use multiple lines
    input gender $
          age
          marstat;
Rules for SAS Variable Names
• Variable names can be 1 to 32 characters long and must begin with a letter (A-Z) or an underscore (_). No special characters except the underscore are allowed.
• OK AS VARIABLE NAMES
  • dbp12
  • DiastolicBloodPressure
  • _dbp12
• NOT OK AS VARIABLE NAMES
  • 12dbp
  • dbp 12
  • dbp*12
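The naming rules can be tried directly in a small DATA step. This is a minimal sketch: the dataset name names and the value 80 are assumptions for illustration. The invalid names are left inside comments, since assigning to them would be rejected as syntax errors.

```sas
/* "names" and the value 80 are illustrative only */
data names;
   dbp12 = 80;                    /* letters followed by digits: OK */
   DiastolicBloodPressure = 80;   /* 22 characters, within the limit: OK */
   _dbp12 = 80;                   /* leading underscore: OK */

   /* Not valid, so kept as comments:       */
   /*   12dbp  = 80;   begins with a digit  */
   /*   dbp 12 = 80;   contains a blank     */
   /*   dbp*12 = 80;   contains a special character */
run;

proc print data=names;
run;
```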
DATA STEP:

* This is a short example program to demonstrate what a
  SAS program looks like. This is a comment statement because
  it begins with a * and ends with a semi-colon ;

data demo;
  infile datalines;
  input gender $ age marstat $ credits state $ ;

  if credits > 12 then fulltime = 1; else fulltime = 2;
  if state = 'MN' then resid = 1; else resid = 2;
datalines;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
run;

SAS PROCEDURE:

proc print data=demo ;
  var gender age marstat credits fulltime state ;
run;

* More procedures;
1  data demo;                   Create a SAS dataset called demo
2  infile datalines;            Where is the data?
3  input gender $               What are the variable names and types?
         age
         marstat $
         credits
         state $ ;
4  if credits > 12 then fulltime = 1; else fulltime = 2;
5  if state = 'MN' then resid = 1; else resid = 2;
Statements 4 and 5 create 2 new variables

6  datalines;                   Tells SAS the data is coming
   F 23 S 15 MN
   F 21 S 15 WI
   F 22 S 09 MN
   F 35 M 02 MN
   F 22 M 13 MN
   F 25 S 13 WI
   M 20 S 13 MN
   M 26 M 15 WI
   M 27 S 05 MN
   M 23 S 14 IA
   M 21 S 14 MN
   M 29 M 15 MN
   ;                            Tells SAS the data has ended
7  run;                         Tells SAS to run the statements above
Main SAS Windows (PC)
• Editor Window – where you type your program
• Log Window – lists program statements processed, giving notes, warnings and errors.
  Always look at the log window! It tells how SAS understood your program.
• Output Window/Results Viewer – gives the output generated from the PROCs
• Results Window – index to all of your output
Submit the program by clicking on the run icon.
Messages in SAS Log
• Errors: fatal, in that the step containing the error will stop
• Warnings: messages that are usually important
• Notes: messages that may or may not be important
(Notes and warnings will not abort your program.)
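A small sketch of code that should produce both kinds of message, as a reason to read the log after every run. The dataset name logdemo and the variable names half, total, and nosuchvar are hypothetical, and the exact wording of the log messages can vary by SAS version.

```sas
/* "logdemo", "half", "total" and "nosuchvar" are hypothetical names */
data logdemo;
   input credits;
   /* total is never assigned a value. SAS typically writes a NOTE   */
   /* that the variable is uninitialized, and half is set to missing. */
   /* The step still completes.                                       */
   half = total / 2;
   datalines;
15
9
;
run;

/* nosuchvar does not exist in logdemo, so this step stops with an */
/* ERROR message in the log and prints nothing                     */
proc print data=logdemo;
   var nosuchvar;
run;
```

The first step finishes and creates the dataset despite the notes, while the second produces no output, which matches the distinction between notes and errors described above.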
* This is a short example program to demonstrate what a
  SAS program looks like. This is a comment statement because
  it begins with a * and ends with a semi-colon ;

data demo;
  infile datalines;
  input gender $ age marstat $ credits state $ ;

  if credits > 12 then fulltime = 1; else fulltime = 2;
  if state = 'MN' then resid = 1; else resid = 2;
datalines;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
run;

title 'Running the Example Program';

proc print data=demo ;
  var gender age marstat credits fulltime state ;
run;
OUTPUT (Results) WINDOW

Running the Example Program

Obs  gender  age  marstat  credits  fulltime  state
  1    F      23     S        15        1      MN
  2    F      21     S        15        1      WI
  3    F      22     S         9        2      MN
  4    F      35     M         2        2      MN
  5    F      22     M        13        1      MN
  6    F      25     S        13        1      WI
  7    M      20     S        13        1      MN
  8    M      26     M        15        1      WI
  9    M      27     S         5        2      MN
 10    M      23     S        14        1      IA
 11    M      21     S        14        1      MN
 12    M      29     M        15        1      MN

proc means data=demo; var age credits;

The MEANS Procedure

Variable     N            Sum           Mean
----------------------------------------------
age         12    294.0000000     24.5000000
credits     12    143.0000000     11.9166667
----------------------------------------------

proc freq data=demo; tables gender;

The FREQ Procedure

                                Cumulative    Cumulative
gender   Frequency   Percent     Frequency       Percent
-----------------------------------------------------------
F               6      50.00             6         50.00
M               6      50.00            12        100.00
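The MEANS output on this slide shows only N, Sum and Mean. A plain PROC MEANS normally prints a different default set (N, mean, standard deviation, minimum, maximum), so one plausible way to get the columns shown is to list statistic keywords on the PROC MEANS statement. That the slide was produced this way is an assumption; the sketch below rebuilds the demo dataset so it runs on its own.

```sas
data demo;
   input gender $ age marstat $ credits state $;
   datalines;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
run;

/* Request only N, Sum and Mean (assumed to match the slide's columns) */
proc means data=demo n sum mean;
   var age credits;
run;

/* One-way frequency table for gender */
proc freq data=demo;
   tables gender;
run;
```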
Some common procedures
PROC PRINT
• lists out your data - always a good idea!!
PROC MEANS
• descriptive statistics for continuous data
PROC FREQ
• descriptive statistics for categorical data
PROC UNIVARIATE
• detailed descriptive statistics for continuous data
PROC TTEST
• performs t-tests (continuous data)
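The last two procedures in the list can be sketched on the example data as follows. The dataset name demo2 is hypothetical, only the gender and age columns from the lecture data are used, and comparing age between genders is chosen purely to illustrate the syntax.

```sas
/* demo2 is a hypothetical name; gender and age are taken */
/* from the lecture's example data                        */
data demo2;
   input gender $ age;
   datalines;
F 23
F 21
F 22
F 35
F 22
F 25
M 20
M 26
M 27
M 23
M 21
M 29
;
run;

/* Detailed descriptive statistics (quantiles, extremes, etc.) for age */
proc univariate data=demo2;
   var age;
run;

/* Two-sample t-test comparing mean age between the two gender groups */
proc ttest data=demo2;
   class gender;
   var age;
run;
```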