SAS 101: The Beginner’s Guide to Beginning in SAS Frank DiIorio CodeCrafters, Inc. Chapel Hill NC
Off-Topic: Staten Island and Hilton Head: Separated at Birth???
Today We'll Discuss • Company background • Overview of the product, focusing on components of Base SAS • Sample programs • Resources for learning more about the product line It's a quick overview. Don't get wrapped up in syntax and details.
Some Background (1 of 2) • Started at NC State University in the late 1960s • Initial focus was data manipulation and statistics on IBM mainframes (System/360 and System/370) • Incorporated in July 1976 • The breadth of the product and its user base grew: graphics and econometrics packages, followed by "vertical" products • Continued expansion in functionality, platforms, and data interchange
Some Background (2 of 2) • Currently, nearly $2 billion in revenue, 43,000+ customers and about 10,000 employees (with high retention rates for both), 400+ offices worldwide, and a national reputation as a good employer. • Largest privately held software company in the world. Co-founder and President Jim Goodnight is the wealthiest man in North Carolina. • Regarded by users as a company that is highly responsive to its customer base. • Over the years, the idea of being a professional SAS programmer (not, for example, "a utility rate analyst who uses SAS") has "gained traction."
The Big Picture, Take 1: SAS Viewed As an Elephant Recall the Indian parable of the blind men and the elephant: each man touched a different part and thought it was a tree, a spear, a fan, etc. If SAS were an elephant, its parts might be: Analytics • Data Integration • Base SAS • Enterprise Intelligence • Enterprise Guide • ETL • Integration Technologies • Solutions
The Big Picture, Take 2: Platforms, Products, Solutions These are powerful, user-friendly, and cutting-edge. But … their functionality is predefined. Consider what's at the heart of it all (next slide).
The Big Picture, Take 3: Focus on Base SAS • Base SAS is at the heart of it all • Here it is, in a vastly simplified nutshell, illustrating key concepts: data manipulation, the interplay of DATA steps and PROCs, and the pervasive influence of the macro language and of the global statements that control the SAS program's execution environment.
[Diagram: global statements (options, headers, footers, etc.) and the macro language surround the DATA step, PROCs, and ODS. Data sources: SAS, Oracle, Access, Excel, others. ODS output formats: doc, pdf, rtf, txt, xml, others.]
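As a small sketch of how global statements shape the execution environment, the program below sets OPTIONS, TITLE, and FOOTNOTE once and lets two different procedures inherit them. It assumes the SASHELP.CLASS sample dataset that ships with Base SAS; the title and footnote text are made up for illustration.

```sas
/* Global statements are not part of any DATA or PROC step; each one
   stays in effect for later steps until it is changed or cleared. */
options nodate nonumber;             /* no date or page number on output */
title "Class Roster";                /* heading line for later output    */
footnote "Source: SASHELP.CLASS";    /* footer line for later output     */

/* Both procedures below pick up the same title and footnote */
proc print data=sashelp.class(obs=5);
run;

proc means data=sashelp.class;
   var height weight;
run;

title;      /* a TITLE statement with no text clears all titles */
footnote;   /* likewise for footnotes */
```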
What’s Most Important? • It depends on your needs, of course, but it’s always important to understand how the core elements work • Knowledge of Base SAS is essential. Using it, you can: • Read and write data in virtually any format (e.g., SAS, Microsoft Access/Excel, Oracle) • Manipulate data using SAS's syntax and/or industry-standard Structured Query Language (SQL) • Produce summarized or detailed reports, graphic or text-based, in many different formats (JPG, GIF, HTML, plain text, PDF, RTF, etc.) • Better understand and effectively use the other components of the SAS System
Base SAS Basics • What does a SAS program look like? • A series of statements, each ending with a semicolon • Each statement performs a specific activity: a calculation, rearranging or printing a dataset, etc. • SAS determines whether a statement is syntactically (not logically) correct by noting its location in the program (its context) and applying the syntax rules for that context. • Principal program units/building blocks: • DATA step ("third generation") • PROCEDUREs, or PROCs ("fourth generation") • Output rendering (Output Delivery System, ODS) • Text generation (macro language) • How do you run a program? • Batch • Interactive • Either way, the syntax is (nearly) identical
A Base SAS Procedure Sampler Data handling is at the heart of it all. Now let's look at datasets.
Dataset Basics • Much of a program's activity is focused on manipulating datasets. • Datasets are collections of measurements on subjects. • The items being measured are variables. • The set of values recorded for one subject or event is an observation. • Other products sometimes use the terms columns and rows, respectively. • Datasets can be read and written in SAS's "native" format or as spreadsheets, database tables, etc.
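A minimal sketch, using made-up subject data: the DATA step below reads three lines of in-stream data (DATALINES), so each line becomes one observation and PTID, SEX, and AGE become its variables. The dataset name WORK.DEMOG and its values are hypothetical; WORK is the temporary library that SAS clears at the end of the session.

```sas
/* Hypothetical example: three subjects (observations), three variables */
data work.demog;
   input ptid $ sex $ age;   /* $ marks a character variable */
   datalines;
001022 F 34
001145 M 51
001201 F 47
;
run;

/* List the dataset: one row per observation, one column per variable */
proc print data=work.demog;
run;
```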
It Isn't Just a Single Language SAS software addresses a wide range of user needs. This diversity is reflected in the variety of syntax. Different sub-languages, all working together despite disparate appearance and "look and feel" Among these are: • Base SAS • SQL (Structured Query Language) • IML (Interactive Matrix Language) • Macro We'll focus on Base SAS and macro.
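To illustrate two sub-languages doing the same job, this sketch subsets a small hypothetical dataset (subjects aged 40 or older) once with Base SAS DATA step syntax and once with PROC SQL. The dataset and variable names are invented for the example; both result datasets should contain the same two observations.

```sas
/* Hypothetical input data so the example is self-contained */
data work.demog;
   input ptid $ age;
   datalines;
001022 34
001145 51
001201 47
;
run;

/* Base SAS: a WHERE statement in a DATA step selects observations */
data work.older_ds;
   set work.demog;
   where age >= 40;
run;

/* SQL: the same subset as a query inside PROC SQL */
proc sql;
   create table work.older_sql as
      select ptid, age
      from work.demog
      where age >= 40;
quit;
```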
Statement Syntax Most statements consist of keywords, identifiers, and values, and end with a semicolon:
    libname master 'p:\archive\data';
    data test;
       set master.fy2005;
       change = (rev2005-rev2004) / rev2004;
    run;
    proc print data=test;
       title "FY2005, with CHANGE variable added";
    run;
Consider: • Case usually doesn't matter (unless the text is enclosed in quotes, as in the TITLE statement above) • Spaces usually don't matter • The order of statements does matter • If you don't completely specify a statement, SAS assigns (usually) sensible default values.
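A quick way to see two of these rules in action: keywords and variable names are not case sensitive, and PROC PRINT with no DATA= option falls back to a default, the most recently created dataset. The dataset WORK.TEST and its variables here are illustrative only.

```sas
/* DATA, data, and Data are the same keyword */
DATA work.test;
   x = 1;  Y = 2;        /* two statements on one line: spacing is free */
   total = X + y;        /* X and x (Y and y) name the same variables   */
run;

/* No DATA= option: PROC PRINT defaults to the most recently
   created dataset, which here is WORK.TEST */
proc print;
run;
```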
Sample Program Identify the data source [1], then read a dataset [2], create a new variable [3], and write only observations meeting a particular criterion to a new dataset [4]. Rearrange the order of observations [5], then list the dataset [6].
    libname anldata 'f:\dev\c1\data\analysis';   /* [1] */
    data aeSer;
       set anldata.ae;                           /* [2] */
       aeDur = (aeEnd - aeStart) + 1;            /* [3] */
       if aeSev >= 2 then output;                /* [4] */
    run;
    proc sort data=aeSer;                        /* [5] */
       by aeSev;
    run;
    proc print data=aeSer;                       /* [6] */
       by aeSev;
       title "Mild or Severe Adverse Events";
    run;
Sample Program (sidebar) SAS doesn't impose many layout rules. The following is functionally equivalent to the program on the previous slide, but is harder to read because it lacks indentation and line breaks:
    libname anldata 'f:\dev\c1\data\analysis'; data aeSer; set anldata.ae; aeDur = (aeEnd - aeStart) + 1; if aeSev >= 2 then output; run; proc sort data=aeSer;by aeSev;run; proc print data=aeSer; by aeSev; title "Mild or Severe Adverse Events"; run;
Take the time to make even simple programs readable!
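Because the sample program depends on a site-specific library (ANLDATA), the sketch below substitutes a small made-up WORK.AE dataset and shows three equivalent ways to keep observations with aeSev of 2 or more: the explicit OUTPUT used on the slide, a subsetting IF, and a WHERE statement. The data values are hypothetical. SAS dates are counts of days, so (aeEnd - aeStart) + 1 gives an inclusive duration in days, and a missing aeSev compares as smaller than any number, so all three versions exclude it.

```sas
/* Hypothetical stand-in for ANLDATA.AE; dates read as SAS date values */
data work.ae;
   input ptid $ aeSev aeStart :date9. aeEnd :date9.;
   format aeStart aeEnd date9.;
   datalines;
001022 1 01MAR2005 03MAR2005
001022 3 10MAR2005 10MAR2005
001145 2 05APR2005 09APR2005
001145 . 12APR2005 14APR2005
;
run;

data work.aeSer1;                 /* explicit OUTPUT, as on the slide */
   set work.ae;
   aeDur = (aeEnd - aeStart) + 1;
   if aeSev >= 2 then output;
run;

data work.aeSer2;                 /* subsetting IF: keep only if true */
   set work.ae;
   aeDur = (aeEnd - aeStart) + 1;
   if aeSev >= 2;
run;

data work.aeSer3;                 /* WHERE selects observations as they are read */
   set work.ae;
   where aeSev >= 2;
   aeDur = (aeEnd - aeStart) + 1;
run;
```

WHERE is often the most efficient of the three, but it can only test variables that already exist in the input dataset; it could not, for example, filter on aeDur, which is computed within the step.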
Modify the Sample Program • Use options to control display of the date and page number • Use the Output Delivery System (ODS) to create a PDF file
    libname anldata 'f:\dev\c1\data\analysis';
    options nodate number;           /* no date; number the pages */
    data aeSer;
       set anldata.ae;
       aeDur = (aeEnd - aeStart) + 1;
       if aeSev >= 2 then output;
    run;
    proc sort data=aeSer;
       by aeSev;
    run;
    ods listing close;               /* stop writing to the listing destination */
    ods pdf file='AEser.pdf';        /* start writing to a PDF file */
    options noDate noNumber;         /* no date or page numbers in the PDF */
    proc print data=aeSer;
       by aeSev;
       title "Mild or Severe Adverse Events";
    run;
    ods pdf close;                   /* finish and close the PDF file */
    ods listing;                     /* reopen the listing destination */
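ODS can also write the same output to several destinations at once. In this sketch, which uses a tiny made-up stand-in for the AESER dataset, PROC PRINT output goes to both an RTF file and an HTML file; the file names are arbitrary, and the files land in the SAS session's current working directory.

```sas
/* Hypothetical data so the example runs on its own */
data work.aeSer;
   input ptid $ aeSev;
   datalines;
001022 3
001145 2
;
run;

ods listing close;                 /* no plain-text listing this time */
ods rtf  file='AEser.rtf';         /* open two destinations ...       */
ods html file='AEser.html';

proc print data=work.aeSer;        /* ... and one PROC feeds both     */
   title "Adverse Events, Severity 2 or Higher";
run;

ods html close;                    /* close each destination          */
ods rtf  close;
ods listing;
```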
What Have We Learned So Far? • SAS syntax is compact and flexible • It pays to make it readable • The DATA step lets you read, write, and manipulate data from many sources. PROCs do this as well, and they perform common reporting and analytical tasks with a minimum of syntax. SAS software really shines when you need to exploit repetition. Let's look at the macro language.
Macro Language (1 of 5) Suppose our datasets had a variable PTID that held the subject number, and that we wanted to print only the observations for a specific subject. One approach is shown below.
    libname anldata 'f:\dev\c1\data\analysis';
    ods listing close;
    ods pdf file="ptList.pdf";
    proc print data=anlData.DM;
       where ptid = '001022';
       title "Demography for 001022";
    run;
    proc print data=anlData.AE;
       where ptid = '001022';
       title "Adverse Events for 001022";
    run;
    proc print data=anlData.CM;
       where ptid = '001022';
       title "Con Meds for 001022";
    run;
    ods pdf close;
    ods listing;
We specify the "filter" value in both the WHERE and TITLE statements, then duplicate it in the other calls to the PRINT procedure.
Macro Language (2 of 5) Consider the tedium involved if you wanted to print data for two subjects. You'd have to duplicate the statements for the next subject, as shown below:
    /* PRINTs for first PTID value */
    proc print data=anlData.DM;
       where ptid = '001022';
       title "Demography for 001022";
    run;
    proc print data=anlData.AE;
       where ptid = '001022';
       title "Adverse Events for 001022";
    run;
    proc print data=anlData.CM;
       where ptid = '001022';
       title "Con Meds for 001022";
    run;
    /* PRINTs for second PTID value */
    proc print data=anlData.DM;
       where ptid = '001145';
       title "Demography for 001145";
    run;
    proc print data=anlData.AE;
       where ptid = '001145';
       title "Adverse Events for 001145";
    run;
    proc print data=anlData.CM;
       where ptid = '001145';
       title "Con Meds for 001145";
    run;
Macro Language (3 of 5) • We create a macro, %listPT, that accepts an ID value (the ID= portion of the %macro statement; ID is a macro parameter) • Then we run, or "call," the macro twice, once for each patient.
    libname anldata 'f:\dev\c1\data\analysis';
    ods listing close;
    ods pdf file="ptList.pdf";
    %macro listPT(id=);                 /* ID is a macro parameter */
       proc print data=anlData.DM;
          where ptid = "&id.";
          title "Demography for &id.";
       run;
       proc print data=anlData.AE;
          where ptid = "&id.";
          title "Adverse Events for &id.";
       run;
       proc print data=anlData.CM;
          where ptid = "&id.";
          title "Con Meds for &id.";
       run;
    %mend;
    %listPT(id=001022)
    %listPT(id=001145)
    ods pdf close;
    ods listing;
The macro body is basically the same program as before, but with no patient ID values, just references to them (&id.). Note the switch from single to double quotes in the WHERE statements: macro references resolve inside double quotes but not inside single quotes.
Macro Language (4 of 5) The macro processor generates statements, which SAS then runs. The previous slide showed the program; what SAS actually processed was:
    libname anldata 'f:\dev\c1\data\analysis';
    ods listing close;
    ods pdf file="ptList.pdf";
    /* PRINTs for first PTID value */
    proc print data=anlData.DM;
       where ptid = "001022";
       title "Demography for 001022";
    run;
    proc print data=anlData.AE;
       where ptid = "001022";
       title "Adverse Events for 001022";
    run;
    proc print data=anlData.CM;
       where ptid = "001022";
       title "Con Meds for 001022";
    run;
    /* PRINTs for second PTID value */
    proc print data=anlData.DM;
       where ptid = "001145";
       title "Demography for 001145";
    run;
    proc print data=anlData.AE;
       where ptid = "001145";
       title "Adverse Events for 001145";
    run;
    proc print data=anlData.CM;
       where ptid = "001145";
       title "Con Meds for 001145";
    run;
    ods pdf close;
    ods listing;
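You don't have to reconstruct the generated code by hand: the MPRINT system option writes each statement a macro generates to the SAS log, prefixed with the macro's name. The sketch below uses a cut-down, hypothetical version of %listPT that reads a made-up WORK.DM dataset instead of anlData.DM. (The related SYMBOLGEN option shows how each macro variable reference resolves.)

```sas
/* Hypothetical data standing in for anlData.DM */
data work.dm;
   input ptid $ age;
   datalines;
001022 34
001145 51
;
run;

options mprint;     /* echo macro-generated statements to the log */

%macro listPT(id=);
   proc print data=work.dm;
      where ptid = "&id.";
      title "Demography for &id.";
   run;
%mend listPT;

%listPT(id=001022)  /* log shows the generated PROC PRINT statements */

options nomprint;   /* turn the echo back off */
```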
Macro Language (5 of 5) This example used the macro language only for text substitution. More sophisticated applications allow you to: • Write part of a statement, a whole statement, or several statements based on a logical condition. • Write "bulletproofed" applications that identify and react to incorrect or suspect user parameters, data values, etc. • Control output formatting, based on user-specified parameters • Create datasets • Create "global" macro variables that can be used by other macros and applications.
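As a hedged sketch of several of these ideas, the hypothetical macro %listPT2 below rejects a missing ID= value, generates one PROC PRINT for each dataset named in a DOMAINS= parameter using a %DO loop, and records the last subject listed in a global macro variable. The macro name, parameters, and WORK datasets are invented for illustration; they are not part of the original program.

```sas
/* Hypothetical data standing in for the slides' DM and AE datasets */
data work.dm;
   input ptid $ age;
   datalines;
001022 34
001145 51
;
run;

data work.ae;
   input ptid $ aeSev;
   datalines;
001022 3
;
run;

%global lastPT;     /* visible to other macros and to open code */

%macro listPT2(id=, domains=dm ae);
   %local i dom;
   /* "Bulletproofing": reject a missing ID= before generating any code */
   %if %length(&id.) = 0 %then %do;
      %put ERROR: listPT2: ID= is required.;
      %return;
   %end;
   /* Generate one PROC PRINT per dataset named in DOMAINS= */
   %do i = 1 %to %sysfunc(countw(&domains.));
      %let dom = %scan(&domains., &i.);
      proc print data=work.&dom.;
         where ptid = "&id.";
         title "Listing of %upcase(&dom.) for &id.";
      run;
   %end;
   %let lastPT = &id.;   /* updates the global variable */
%mend listPT2;

%listPT2(id=001022)
%listPT2()          /* writes an ERROR message and generates nothing */
```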
Resources • You're using one right now: user groups. Review the keywords and interest levels of presentations in the program. Also: visit the Demo Room and talk to poster presenters (today 1:30-2:30) • See support.sas.com/userGroups for details about other regional, local, and national groups • Web sites (support.sas.com, www.lexjansen.com) • SAS Online Help • Publications (by SAS Institute; Books by Users; others). Visit the Publications booth in the Demo Room • Colleagues: the SAS-L list server, via Google Groups (http://groups.google.com/group/comp.soft-sys.sas/topics?hl=en) or UGA (http://listserv.uga.edu/archives/sas-l.html); and at work (people, sample programs)
Recap • Many products fall under the SAS System "umbrella" • Base SAS is, arguably, the most versatile, and has a strong presence in the job market • Building blocks: DATA step, PROCs, ODS, macro • Vast power brings vast potential for confusion, so take advantage of resources here, on the Web, and elsewhere.
Contact Frank@CodeCraftersInc.com www.CodeCraftersInc.com Thanks for attending!