220 likes | 229 Views
Learn about SAS, a versatile tool for data processing and analysis. Explore data access, management, analysis, and presentation tasks with SAS. Understand SAS components and how to create, access, and manage SAS data sets. Discover SAS data processing steps, procedures, and data access methods using SAS.
E N D
Introduction to SAS ISYS 650
What Is SAS? • SAS is a collection of modules that are used to process and analyze data. • It began in the late ’60s and early ’70s as a statistical package (Statistical Analysis System). • SAS is also an extremely powerful, general-purpose programming language. • In recent years, it has been enhanced to provide state-of-the-art data mining tools and programs for Web development and analysis.
Data-Driven Tasks • The functionality of the SAS System is built around the four data-driven tasks common to virtually any application: • 1. data access: • addresses the data required by the application • 2. data management: • shapes data into a form required by the application • 3. data analysis: • summarizes, reduces, or otherwise transforms raw data into meaningful and useful information • 4. data presentation: • communicates information in ways that clearly demonstrate its significance
An Overview of SAS Data Processing DATA steps are used to create SAS data sets. PROC steps are used to process SAS data sets.
Explore the SAS workspace • When you first start SAS, the five main SAS windows open: • the Explorer • Results • Program Editor or Editor • Log • Output windows. • Menu: • Tools: New Library
Demo • Creating a new library: • Tools/New Library: • Name • Folder • Enable at start up • import a table to the new library from MS Access database and create a SAS data set: • File/Import data • Open a SAS dataset: a SAS data set (also called a table) is a file containing descriptor information and related data values. The file is organized as a table of observations (rows) and variables (columns) that SAS can process.
Components of SAS programs • DATA steps typically create or modify SAS data sets. For example, you can use DATA steps to • put your data into a SAS data set • compute the values for new variables • check for and correct errors in your data • produce new SAS data sets by subsetting, merging, and updating existing data sets. • PROC (procedure) steps typically analyze and process data in the form of a SAS data set, and they sometimes create SAS data sets that contain the results of the procedure.
A program accessing the SAS data set named “student” in Mydata library DATA myStudent; set Mydata.student; run; PROC print data=myStudent; run; Note: The DATA statement creates a temporary data set that references the “student” data set in the Mydata library. Temporary data sets are stored in the Work library.
SAS Data Access1. Import Wizard: File/Import DataDemo: Access, Excel
Process SAS Data Set 1. Reference the library name: PROC print data=mydata.emp; run; 2. Reference the Windows name directly: PROC print data="c:\mydata\emp"; run; 3. Creating a temporary SAS data set from existing SAS data set: DATA myStudent; *USE Mydata.student; *USE "C:\Mydata\student"; *SET Mydata.student; SET "C:\Mydata\student"; run; PROC print data=myStudent; run; Note: To add line comment, use “*”. To add block comment, use /* …. */
Creating a Permanent SAS Data Set by Using Windows’ File Name or Library.FileName in Data Statement DATA "c:\MyData\myStudent"; *USE Mydata.student; *USE "C:\Mydata\student"; *SET Mydata.student; SET "C:\Mydata\student"; run; DATA MyData.myStudent; SET "C:\Mydata\student"; run; Note: This example creates a new permanent data set from the “student” data set in MyData library.
Creating a Data Set Using Input Statement Temporary data set: DATA StGPA; INPUT SID $ Sname $ GPA; DATALINES; S1 Peter 3.2 S2 Paul 2.8 S3 Mary 3.0 run; Permanent data set: DATA MyData.StGPA2; INPUT SID $ Sname $ GPA; DATALINES; S1 Peter 3.2 S2 Paul 2.8 S3 Mary 3.0 run;
SAS Data Access2. Using ODBC with PROC SQL PROC SQL; CONNECT TO ODBC(DSN='MySalesDB2007'); CREATE TABLE temp_sas AS SELECT * FROM CONNECTION TO ODBC(SELECT * FROM Customer); Data Customer; set Work.temp_sas; run; PROC Print DATA=Customer; run; Note: The CREATE TABLE statement creates a SAS data set from the Customer table.
Create a SAS Data Set as the Result of a SQL Join Statement PROC SQL; CONNECT TO ODBC(DSN='MySalesDB2007'); CREATE TABLE temp_sas AS SELECT * FROM CONNECTION TO ODBC(SELECT Customer.CID, Cname, OID, Odate FROM Customer, Orders where Customer.cid=orders.cid); Data CustomerOrder; set Work.temp_sas; run; PROC Print DATA=CustomerOrder; run;
SAS Data Management • Creating calculated field • Use DROP and KEEP to select fields • Create a subset of a data set • Append two data sets • Merge data set • Equivalent to SQL outer join
Creating Calculated Field • Arithmetic operators: • +, -, *, /, ** • Using SAS functions: • ABS, INT, SQRT, ROUND • Date functions: • TODAY(): return current date • INTCK(‘interval’,from, to) • The ‘interva’ can be: DAY, WEEK, MONTH, QTR, YEAR • Example: Age = intck('year',dob,today()); • Year, Month, Qtr
Examples DATA GPAGroup; set work.Mystudent; IF GPA <2.0 then scholarship=1000; Else scholarship=3000; IF GPA <2.0 then GRPGrp='Poor'; Else GPAGrp='Good'; run; proc print data=GPAGroup; run; DATA AgeGroup; set Mydata.student2; Age = year(today())-year(DOB); run; proc print data=AgeGroup; run;
DROP/KEEP DATA Student; set work.Mystudent; DROP Gender DOB; run; proc print data=Student; run;
Subset a Data Set with IF DATA highIncome; set Mydata.Emp; IF Salary>60000; run; proc print data=highIncome; run;
Vertically Merging Two Data Sets(Append) DATA StDOB; set Mydata.Student; Name=Sname; KEEP Name DOB; run; DATA EmpDOB; set Mydata.Emp; DOB=Birthdate; KEEP Name DOB; run; DATA AllDOB; SET STDOB EmpDOB; Run; proc print data=ALLDOB; run;
Horizontally Merging Two Data Sets(1. Must be sorted by the same field; 2. this operation is equivalent to SQL Outer Join) PROC SQL; CONNECT TO ODBC(DSN='MySalesDB2007'); CREATE TABLE temp_sas AS SELECT * FROM CONNECTION TO ODBC(SELECT * FROM Customer); CREATE TABLE temp_sas2 AS SELECT * FROM CONNECTION TO ODBC(SELECT * FROM Orders); PROC SORT Data=Work.temp_sas; BY CID; PROC SORT Data=Work.temp_sas2; BY CID; Data CustomerOrders; MERGE temp_sas temp_sas2; BY CID; KEEP CID CNAME OID ODATE SALESPERSON; run; PROC Print DATA=CustomerOrders; run;
A few SAS PROCs • PROC PRINT • PROC SORT • PROC MEANS • PROC SQL