Chapter 1: Getting Started • Overview of basic components • Data Sets • Data Steps • Windowing (DM) environment • Submitting programs • Reviewing Output • System options
The SAS Language • Actually, SAS contains several languages. • SAS statements vs. SAS commands • All statements end with “;” . • Free format language. Can have • multiple statements per line • multiple lines for a single statement. • Neither is a good idea most of the time.
SAS Names • Used to be limited to 8 characters. With v7 the limit went to 32. • First character a letter or underscore (_). • Subsequent characters in name can be letters, digits or underscores. • Case is significant only for cosmetic/display purposes. SAS stores names in mixed case but will match totpop and TotPop.
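A minimal sketch of the name-matching rule above. The variable name and value are made up for illustration; the point is only that all three spellings refer to one variable.

```sas
data _null_;
  TotPop = 100;          /* first reference fixes the displayed spelling */
  totpop = totpop + 1;   /* same variable as TotPop, not a new one */
  put TOTPOP=;           /* still the same variable; its value is 101 */
run;
```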
Exception: librefs & filerefs • Names associated with SAS data libraries and ASCII files are still limited to 8 characters. (Because of platform limits on MVS, CMS, others?) • Also applies to names of SAS formats created with PROC FORMAT.
SAS Comments • Two kinds: • “Statement style” comments begin with “*” and end with “;” • The other kind begins with “/*” and ends with “*/” • If you use statement-style for your real comments then you can use the other kind to “comment out” sections of code.
Ex. of “Commented Out” Code
/* ===========Begin commented out code=========
*---Step 1: Read the data--;
data one; infile 'name_of_the_file';
  input a b c d e f g;
  if a=1 then a=0; else if b=2 then b=3; *--edit vals;
run;
*--Step 2: Sort and Print the Data--;
proc sort data=one; by d e g; run;
proc print data=one; by d; title 'Data Set One'; run;
================end comment================= */
*---Step 3: Begin statistical analysis of the data--;
proc univariate data=one ;
----etc-----
SAS Data Sets • This is where SAS stores the data. • Statistical vs. database terminology: • Observations = Rows • Variables = Columns • Data Sets = Tables • The observations describe entities, the variables are attributes of those entities. • In our environment the rows are usually geographic areas and the variables are summary statistics regarding those areas.
Variable Attributes • Type (character or numeric) • Length (3-8 bytes for nums, 1-32,767 bytes for character strings as of v7). • Label: Up to 256 characters. • Format: Used by default when the variable is displayed. E.g. comma9. or $mocnty. • Informat: Used by default to convert values on input, e.g. typed values entered interactively.
Date Variables • No such thing as an explicit date var type. • Dates are stored as numeric values: the # of days since Jan. 1, 1960. • Informats and formats are used to read and display date variables. E.g. read it with mmddyy6. and display it with date9.
Sample Pgm: Dates
data dates;
  input dateval mmddyy6. sales;
  format dateval date9.;
datalines;
020198 1234
122501 5678
80199 725
091101 1,023
run;
options ls=80;
proc print data=dates;
  title 'Listing of dates';
run;
Sample Pgm: Dates - Output
Obs    dateval      sales
  1    01FEB1998     1234
  2    25DEC2001     5678
  3    01AUG1999      725
  4    11SEP2001        .
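To see that a date really is just a number, a small sketch like the following can help. It uses date literals (the '...'d form) rather than raw input, and the variable names day0 and day1 are made up for the example.

```sas
data _null_;
  day0 = '01JAN1960'd;   /* the reference date: stored as 0 */
  day1 = '02JAN1960'd;   /* one day later: stored as 1 */
  put day0= day1=;       /* unformatted: writes the raw numbers to the log */
  put day1= date9.;      /* same value, displayed through the date9. format */
run;
```

Because the stored value is a day count, subtracting two date variables gives the number of days between them.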
Program Steps • Data steps and Proc(edure) steps. • Some stmts (e.g. title, options, %let) are not part of any specific step. (“global statements”). • Step boundaries: • Begin with data or proc statements. • End with run stmt or next step or EOF. • Highly recommended: always use run;
How Many Steps?
data dates2;
  input date mmddyy6. sales;
  informat sales comma.;
  format date date9.;
datalines;
020198 1234
122501 5678
10299 725
091101 1,023
run;
options ls=80;
proc print data=dates2;
proc sort data=dates2; by date;
Data Step Cycles (“Built-In Loop”) • Most data steps have 1 and only 1 data source. Usually an infile/input or a set or merge statement represents the data source. • SAS executes the data step stmts once for each input line/observation. • The data step stmts are compiled and, if no errors, executed -- once for each set of data. • Variable _n_ (“automatic”) counts the cycles through the implicit loop.
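The implicit loop can be watched directly with a sketch like this one. The data set name cycles, the variables x and cycle, and the three data values are all made up for illustration.

```sas
data cycles;
  input x;
  cycle = _n_;                       /* copy _n_: automatic variables are not written to the output set */
  put 'Cycle ' _n_ ' read x=' x;     /* one log line per pass through the step */
datalines;
10
20
30
;
run;
```

The step has no explicit loop, yet the statements run three times, once per input line, and the output set gets three observations.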
SAS Windowing Environment • AKA DM - “Display Manager” • You can run SAS without using it -- edit code with a text editor and use batch mode. • It takes some getting used to, but it’s worth it. • The Windows version is different from all the rest. Platform independence vs. MS software standards clashed and MS won.
The Enhanced Editor • Only mentioned in TLSB. It is here and it makes the PROGRAM window obsolete under Windows. (But still needed for Unix and all other platforms.) • It is a Windows editor. The text editor used in the Program window was modeled after the SPF editor developed by IBM in the 70s.
Major Differences • Code does not disappear and have to be recalled when you submit it. • Code is color-coded as you type to serve as a serious debugging aid. • Does not support many of the commands that the pgm window does. New users won’t care. • You can have bunches of them open at the same time.
Other Windows • Log: see what happened with submitted code. Error messages, notes, warnings, etc. • Output: “Printed” output goes here. Results of most SAS procs. • Explorer and Results. • Notepad: another text editor; for data usually. • Keys: Define function keys. Different ones for different window types. • Filename, Libname, Dir and Var are very handy.
Ways to Issue Commands • Not only are there lots of different windows with lots of different commands, but there are lots of ways to specify those commands. • Pull down menus. (The pmenu option can be used to turn on/off these menus.) • Toolbar icons associated with commands. • Entering command in the Command box. • Function keys! (Not mentioned in TLSB).
Accessing Windows • To bring a window to the foreground and make it the “active” window: • Click within it if it is visible. • Enter the name of the window as a command. • Use Window pull-down and select it. • Use a function key associated with the window name. (E.g. if F10 = “Log”, just hit F10 to go to the log window.) • Enter Next command to go to “next” window. • Click on the window name tab in bar at bottom.
Submitting Code • Differs somewhat between pgm window and Enhanced Editor window. • If text is selected in the window then only selected text is submitted. Otherwise, the entire program is submitted. • In Program window you need to use Recall command to bring the submitted code back.
Viewing Results of Submit • The log window tells you what happened. Rather detailed. Error messages color coded. • If no errors and code executes, “printed” results go to Output window and/or to an HTML file (output destinations can be specified). • Results window is a sort of index to the Output window.
Compile & Go Phases • Code must be compiled prior to executing. The execution phase will be skipped if there are errors at compile phase. • In batch runs, SAS will set “options obs=0” when it detects an error. In this mode, later steps will compile but not execute. • Once a step fails, it can cause lots of bogus error messages in subsequent steps.
SAS System Options • System opts control all sorts of things regarding how SAS runs. • Options can be specified in many ways at different times (at SAS startup, or during execution.) • Can be specified via: • config file with “-set ..” stmt • as a parameter at invocation • using options statement or Options window.
Common Options • Printing options: • linesize= ; pagesize= ; date/nodate; center/nocenter; number/nonumber • DMS, DMR (invocation options) • Obs= (limit # observations to process) • (no)source (show source code in log) • (no)mprint (show code generated by macros)
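As a sketch, the options listed above can be set with ordinary options statements. The specific values here (80, 60, 100) are arbitrary choices for illustration, not recommendations from the slides.

```sas
options linesize=80 pagesize=60 nodate nonumber nocenter;  /* printing options */
options obs=100;         /* while testing: process at most 100 observations */
options source mprint;   /* show source code and macro-generated code in the log */

/* ... test steps would go here ... */

options obs=max;                      /* restore full processing */
proc options option=linesize; run;    /* write the current setting to the log */
```

Since options is a global statement, each setting stays in effect until it is changed again or the session ends.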
Sample SAS Code • Follow the URL: mcdc2.missouri.edu/cgi-bin/uexplore?/pub/data/indctrs@secure • Click on the “Tools” subdirectory and then on the mocopop.sas file. • The direct URL for this file is: mcdc2.missouri.edu/data/indctrs/Tools/mocopop.sas
Browsing .sas, .log and .lst Files • The Windows Registry may associate the SAS program with these 3 filetypes. • With IE, this can cause an instance of SAS to start up when all you want to do is browse the contents of a .sas file. • You can do a manual remove of the registry entry. • Netscape does not recognize the association.
mocopop.sas • You are NOT yet expected to understand (completely) most of what’s in the program. • It has lots of steps, and accesses a set of 5 data sources -- 4 SAS data sets from the archive and 1 dbf file. • A common key, fipco, is kept on each data set. Such keys are critical. • Step 5 uses a merge stmt to bring all the data together into a single permanent SAS data set. Note the by fipco; statement.
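The merge-by-key idea can be sketched on a much smaller scale. This is not the code from mocopop.sas: the data set names pop90, pop00 and both, the variables other than fipco, and all the values are made up for illustration.

```sas
data pop90;                 /* hypothetical source 1, already in fipco order */
  input fipco $ pop1990;
datalines;
29001 1000
29003 2000
;
run;

data pop00;                 /* hypothetical source 2, same key, same order */
  input fipco $ pop2000;
datalines;
29001 1100
29003 1900
;
run;

data both;
  merge pop90 pop00;        /* bring the sources together side by side */
  by fipco;                 /* match observations on the common key */
  change = pop2000 - pop1990;
run;
```

Each input set must be sorted by fipco (or indexed on it) before the merge. That requirement is one reason a consistent key on every data set matters so much.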
mocopop.sas - 2 • Note how all data definition statements -- libname and filename statements -- are grouped at the top of the program. Not required, but a good convention. • Note (extensive) use of only statement-style comments. In debugging this setup, we used /* - */ “commenting out” extensively.
mocopop.sas - 3 • Note the “classic” SAS data step for accessing the data archive: • data <set-name>; • set / merge <set(s)>; (often with data set options specified). • where statement to filter observations. • Assignment stmts to edit data or create derived variables. Sometimes as part of an if … then statement. • Keep or drop stmts to specify variables to be included on output set.
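Filled in, the “classic” pattern above might look like the following sketch. The stand-in data set work.popdemo, the output set work.growing, and every variable except fipco are hypothetical; a real program would read from an archive library instead.

```sas
/* Hypothetical stand-in for an archive data set; names and values are made up */
data work.popdemo;
  input fipco $ pop1990 pop2000;
datalines;
29001 1000 1100
29003 2000 1900
29005 500 650
;
run;

data work.growing;
  set work.popdemo(keep=fipco pop1990 pop2000);   /* data set option on the input */
  where pop2000 > pop1990;                         /* filter observations */
  change = pop2000 - pop1990;                      /* derived variable */
  if pop1990 > 0 then pctchg = 100 * change / pop1990;  /* assignment inside if ... then */
  keep fipco pop2000 change pctchg;                /* variables on the output set */
run;
```

With these made-up values, the where statement drops the one area whose population fell, so work.growing ends up with two observations.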
mocopop.sas - 4 • Note ability to access dbf file via proc dbf. Could also have used proc import. • Note use of attrib statements in Step 5 to establish not only the attributes of the variables (labels, length/types and formats) but also the order of the variables on the output set. • Note that the obs identifier variables are of type character, but all indicator variables are numeric.
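A minimal sketch of how an attrib statement sets attributes and variable order at the same time. It is not taken from Step 5 of mocopop.sas; the data set name, labels, lengths and the single data line are assumptions made for illustration.

```sas
data labeled;
  /* attrib comes first, so it fixes both the attributes and the variable order */
  attrib fipco   length=$5 label='County FIPS code'
         pop2000 length=8  label='Population, 2000' format=comma12.;
  input fipco $ pop2000;
datalines;
29001 1100
;
run;

proc contents data=labeled varnum;   /* list variables in their stored order */
run;
```

Note that the identifier fipco is character (length $5) even though it contains only digits, while the indicator pop2000 is numeric.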
mocopop.sas - 5 • The creation of indctrs.mocopopg as a sas data step view is way too advanced for us now. • For now, just know that there is a way to combine data sets logically rather than physically. Indctrs.mocopopg looks like a data set to SAS, but is stored as code, not actual data.
mocopop.sas - 6 • The step to aggregate the data in mocopopg to DED regions is still further beyond what we have covered so far. • Involves use of an application macro named %agg. This macro is like an extension of the language for us. • Aggregation of our data is a critically important capability.
mocopop.sas - 7 • Use the uexplore utility application to browse the indctrs data directory. • Display hypercon reports for the mocopop and mocopopg data sets. • Extract data regarding the pop change over the decade of the 90’s with components of change. Create a listing report and a csv (opened with Excel) file.
mocopop.sas - Summary • A typical “real world” SAS program. • In a way, quite complex; but with SAS it becomes just a little long. • Most of the processing is fairly routine once you have mastered a small subset of the SAS language. • Organizing such applications into carefully structured and commented modules makes it easy for us to document how we got our data.
The Data Archive • The source of most data you’ll be working with. Specialists create these sets and verify the data. • The uexplore/xtract/hypercon tools are - for now - critical in making these data accessible to the outside world. Wide use helps ensure reliability. • For you, access directly via SAS is much faster and more flexible. • The key indicators data base is just one -- very important -- component of the archive.