Chapter 1: Introduction
Objectives • List the tasks in the SAS Programming 3 course. • Explain the naming convention that is used for the course files. • Compare the three levels of exercises that are used in the course. • Describe, at a high level, how data is used and stored at Orion Star Sports & Outdoors. • Navigate to the Help facility.
Tasks in the SAS Programming 3 Course • The course topics include techniques for the following data management tasks: • compressing SAS data sets • creating indexes for a quick retrieval of subsets • performing table lookups using arrays, hash objects, or formats • combining data by merging, using the SQL procedure, or using multiple SET statements • combining summary and detail data • sorting and grouping data • developing a program quickly
Resource Utilization • As programmers, you want to perform these tasks as efficiently as possible and optimize the use of the following resources: • programmer time • I/O • CPU • memory • data storage space • network bandwidth
Business Scenarios • The business scenarios are opportunities to compare multiple techniques for performing the tasks. • For example: • Task: Table Lookups • Possible Techniques: • DATA step MERGE statement • PROC SQL joins • Formats in PUT functions or in FORMAT statements • DATA step arrays • DATA step hash objects
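As a rough illustration of one technique from the list above (formats in PUT functions), the following sketch performs a table lookup with a user-defined format. The format name $ctry, the data set work.orders, and all data values are hypothetical and are not taken from the course files.

```sas
/* Hypothetical lookup table stored as a character format */
proc format;
   value $ctry
      'AU'  = 'Australia'
      'US'  = 'United States'
      other = 'Unknown';
run;

/* Look up each code with the PUT function; no merge or sort is needed */
data work.orders;
   input Order_ID Country $;
   length Country_Name $ 20;
   Country_Name = put(Country, $ctry.);
   datalines;
1 AU
2 US
3 ZZ
;
run;
```

Whether this approach is cheaper than a merge, a join, an array, or a hash object depends on the data, which is why the course compares the techniques by benchmarking.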
1.01 Multiple Answer Poll • What type(s) of SAS programs do you write? • Data manipulation with the DATA step • Data analysis with procedures • Report writing • A combination of the above • SAS training only; no programs written • Other
Filename Conventions • Pattern (p304d01x): course ID (p3), chapter # (04), type (d), item # (01), placeholder (x) • Course files: p304a01, p304a02, p304a02s, p304d01, p304d02, p304e01, p304e02, p304s01, p304s02 • Example: The SAS Programming 3 course ID is p3, so p304d01 = SAS Programming 3, Chapter 4, Demo 1.
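The naming pattern can be sketched as a small DATA step that splits a filename into its parts. The variable names are hypothetical, and the character positions assume the layout described above (two-character course ID, two-digit chapter, one-character type, two-digit item, optional one-character placeholder).

```sas
data _null_;
   length name $ 8 course_id $ 2 type $ 1 placeholder $ 1;
   name = 'p304d01';
   course_id   = substr(name, 1, 2);              /* course ID           */
   chapter     = input(substr(name, 3, 2), 2.);   /* chapter number      */
   type        = substr(name, 5, 1);              /* type code, d = demo */
   item        = input(substr(name, 6, 2), 2.);   /* item number         */
   placeholder = substr(name, 8, 1);              /* blank when absent   */
   put course_id= chapter= type= item= placeholder=;
run;
```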
Three Levels of Exercises You are not expected to complete all of the exercises in the time allotted. Choose the exercise or exercises that are at the level with which you are most comfortable.
Orion Star Sports & Outdoors • Orion Star Sports & Outdoors is a fictitious global sports and outdoors retailer with traditional stores, an online store, and a large catalog business. • The corporate headquarters is located in the United States with offices and stores in many countries throughout the world. • Orion Star has about 1,000 employees and 90,000 customers, processes approximately 150,000 orders annually, and purchases products from 64 suppliers.
Orion Star Data As is the case with most organizations, Orion Star has a large amount of data about its customers, suppliers, products, and employees. Much of this information is stored in transactional systems in various formats. Using applications and processes such as SAS Data Integration Studio, this transactional information was extracted, transformed, and loaded into a data warehouse. Data marts were created to meet the needs of specific departments such as Marketing.
1.02 Quiz • Start your SAS session. • Open the Help facility. • Determine the path to use to obtain information about the SAS component objects.
1.02 Quiz – Correct Answer • Determine the path to use to obtain information about the SAS component objects. • Information relevant to this course can be found by following this path in the SAS Help facility: Contents tab > SAS Products > Base SAS > SAS 9.2 Language Reference: Dictionary > Dictionary of Component Object Language Elements
SAS OnlineDoc • You can also obtain information from SAS OnlineDoc. • Information relevant to this course can be found by following this path in SAS OnlineDoc: Contents tab > Products Documentation A-Z > Base SAS > SAS 9.2 Language Reference: Dictionary > Dictionary of Component Object Language Elements
Objectives • Identify the resources used by a SAS program. • Report computer resource usage using SAS system options. • Interpret resource usage statistics in your operating environment. • Benchmark resource usage.
Running a SAS Program • What resources are required to run a SAS program? • The programmer must perform the following tasks: • determine program specifications • write the program • test the program • execute the program • maintain the program
Running a SAS Program • The computer must perform the following actions: • load the required SAS software into memory • compile the program • read the data • execute the compiled program • store output data files • store output reports
What Resources Are Used? • Resources used: • CPU • programmer time • I/O • network bandwidth • memory • data storage space
1.03 Multiple Answer Poll • Which of the following resources do you need to conserve? • CPU • I/O • Memory • Data storage space • Network bandwidth • Your time
Understanding Efficiency Trade-offs When you decrease the use of one resource, the use of other resources might increase. Resource usage is dependent on your data. A specific technique might be more efficient with one data set and less efficient with another.
Understanding Efficiency Trade-offs • Less data space often implies more CPU: decreasing the size of a SAS data set can result in an increase in CPU usage.
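A minimal sketch of this trade-off, assuming the COMPRESS= data set option is used to shrink the data set. The data set names and contents are hypothetical. With FULLSTIMER on, the CPU time of the two steps can be compared, and the log typically also notes how much compression changed the data set size.

```sas
options fullstimer;

/* Uncompressed data set: a long, mostly blank character variable */
data work.plain;
   length text $ 200;
   do i = 1 to 100000;
      text = 'ABC';
      output;
   end;
run;

/* Compressed copy: usually smaller on disk, but compressing costs CPU */
data work.packed(compress=yes);
   set work.plain;
run;

options nofullstimer;
```

Compression does not always pay off. On data with little repetition, the compressed data set can even be larger, so the result should be checked against real data.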
Understanding Efficiency Trade-offs • Less I/O often implies more memory: decreasing the number of I/O operations comes at the expense of increased memory usage.
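One way to see this trade-off is the SASFILE statement, which holds a data set in memory so that later steps read it without repeated disk I/O. This is a sketch under assumptions: the data set work.lookup is hypothetical, and the benefit depends on having enough memory available to hold the file.

```sas
/* Hypothetical data set that several steps will read */
data work.lookup;
   do id = 1 to 100000;
      value = ranuni(1);
      output;
   end;
run;

/* Load the data set into memory once */
sasfile work.lookup load;

/* Both steps read from memory instead of from disk */
proc means data=work.lookup mean;
   var value;
run;

proc means data=work.lookup min max;
   var value;
run;

/* Release the memory when finished */
sasfile work.lookup close;
```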
Deciding What Is Important for Efficiency • Your programs • Your site • Your data
Understanding Efficiency at Your Site • Hardware • Operating environment • System load • SAS environment
1.04 Multiple Choice Poll • This class uses SAS 9.2. • What is the latest version of SAS that you are running? • SAS 8.2 • SAS 9.1 • SAS 9.2 • Other
Knowing How Your Program Will Be Used • The importance of efficiency increases with the following: • the complexity of the program and/or the size of the files being processed • the number of times that the program will be executed
1.05 Multiple Answer Poll • What type(s) of data do you use? • SAS data sets • External files • Data from a relational database – for example, Oracle, Teradata, or SQL Server • Excel spreadsheets • OLAP cubes • Information maps • Other
Considering Trade-Offs • In this class, many tasks are performed using one or more techniques. • To decide which technique is most efficient for a given task, benchmark the techniques: measure the resource usage of each one and compare the results. • Benchmark with the actual data, because the effectiveness of any efficiency technique depends greatly on the data with which you use it.
Running Benchmarks: Guidelines continued... • To benchmark your programming techniques, do the following: • Turn on the appropriate options to report resource usage. • Test each technique in a separate SAS session. • Test only one technique or change at a time, with as little additional code as possible. • Run your tests under the conditions that your final program will use (for example, batch execution, large data sets, and so on).
Running Benchmarks: Guidelines • Run each program several times and base your conclusions on averages, not on a single execution. (This is more critical when you benchmark elapsed time.) • Exclude outliers from the analysis because that data might lead you to tune your program to run less efficiently than it should. • Turn off the options that report resource usage after testing is finished, because they consume resources. • In a multi-user environment, other computer activities might affect the running of your program.
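The advice to average several runs can be sketched as follows. The timings are hypothetical placeholders, not measured results. In practice you would enter the CPU times recorded from your own logs, leaving out any runs that you have identified as outliers.

```sas
/* Hypothetical CPU times (seconds) recorded from three runs of two techniques */
data work.bench;
   input technique $ trial cpu_seconds;
   datalines;
A 1 1.02
A 2 0.98
A 3 1.00
B 1 0.91
B 2 0.89
B 3 0.93
;
run;

/* Compare techniques on averages rather than on a single execution */
proc means data=work.bench mean median min max maxdec=2;
   class technique;
   var cpu_seconds;
run;
```

Reporting the median next to the mean gives a quick hint about outliers: a large gap between the two suggests that one run was disturbed by other system activity.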
1.06 Multiple Choice Poll • Which of the following SAS programs should be benchmarked? • A report that shows all the customers in the United Kingdom in March 2006 • A report that calculates trends in sales at the end of every day for every department • A report showing the projected total cost of a 5% cost-of-living increase in employee salaries for a Human Resources project conducted on January 1, 2007 • A yearly report that calculates the average sales of a line of apparel for the clothing manager
1.06 Multiple Choice Poll – Correct Answer • Which of the following SAS programs should be benchmarked? • A report that shows all the customers in the United Kingdom in March 2006 • A report that calculates trends in sales at the end of every day for every department • A report showing the projected total cost of a 5% cost-of-living increase in employee salaries for a Human Resources project conducted on January 1, 2007 • A yearly report that calculates the average sales of a line of apparel for the clothing manager
Tracking Resource Usage • SAS options: • STIMER • FULLSTIMER • STATS (z/OS only) • MEMRPT (z/OS only)
Tracking Resources with SAS Options • Windows, UNIX: OPTIONS STIMER | NOSTIMER; OPTIONS NOFULLSTIMER | FULLSTIMER; • z/OS: STIMER | NOSTIMER (invocation option only); OPTIONS STATS | NOSTATS; OPTIONS MEMRPT | NOMEMRPT; OPTIONS NOFULLSTIMER | FULLSTIMER;
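A minimal sketch of how these options might be used around a test step on Windows or UNIX. The DATA step is a hypothetical stand-in for the code being measured.

```sas
/* Turn on detailed resource reporting before the test */
options fullstimer;

/* Hypothetical step to measure */
data _null_;
   do x = 1 to 1000000;
      y = sqrt(x);
   end;
run;

/* Turn detailed reporting off again after testing */
options nofullstimer;
```

The resource statistics are written to the SAS log after the step finishes, in the same form as the sample logs later in this chapter.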
Business Scenario • You should benchmark to determine the most efficient technique for creating a new variable based on a condition. • The following methods can be used: • IF-THEN with an assignment statement • IF-THEN/ELSE with an assignment statement • SELECT/WHEN with an assignment statement
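The second and third methods might look like the sketches below. They are modeled on the IF-THEN program shown in the sample logs (p301a01a), but they are assumptions and not the contents of the course files p301a01b and p301a01c. The loop count is also an assumption.

```sas
options fullstimer;

/* IF-THEN/ELSE sketch: once a condition is true, the rest are skipped */
data _null_;
   length var $ 30;
   do x = 1 to 10000000;
      var1 = 10000000*ranuni(x);
      if var1 > 1000000 then var = 'Greater than 1,000,000';
      else if 500000 <= var1 <= 1000000 then var = 'Between 500,000 and 1,000,000';
      else if 100000 <= var1 < 500000 then var = 'Between 100,000 and 500,000';
      else if 10000 <= var1 < 100000 then var = 'Between 10,000 and 100,000';
      else if 1000 <= var1 < 10000 then var = 'Between 1,000 and 10,000';
      else var = 'Less than 1,000';
   end;
run;

/* SELECT/WHEN sketch: run it in a separate SAS session when benchmarking */
data _null_;
   length var $ 30;
   do x = 1 to 10000000;
      var1 = 10000000*ranuni(x);
      select;
         when (var1 > 1000000) var = 'Greater than 1,000,000';
         when (500000 <= var1 <= 1000000) var = 'Between 500,000 and 1,000,000';
         when (100000 <= var1 < 500000) var = 'Between 100,000 and 500,000';
         when (10000 <= var1 < 100000) var = 'Between 10,000 and 100,000';
         when (1000 <= var1 < 10000) var = 'Between 1,000 and 10,000';
         otherwise var = 'Less than 1,000';
      end;
   end;
run;
```

Both steps appear in one block only for compactness. Following the benchmarking guidelines, each technique should be submitted in its own SAS session and run several times.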
1.07 Quiz • Open and submit p301a01a. Record the user CPU: ____________ • Exit SAS. Start SAS. Open and submit p301a01b. Record the user CPU: ____________ • Exit SAS. Start SAS. Open and submit p301a01c. Record the user CPU: ____________ • Which technique is most efficient? • In z/OS, record the CPU.
Sample Windows Log (p301a01a, Partial SAS Log)
5    options fullstimer;
6    data _null_;
7       length var $ 30;
8       retain var2-var50 0 var51-var100 'ABC';
9       do x=1 to 100000000;
10         var1=10000000*ranuni(x);
11         if var1>1000000 then var='Greater than 1,000,000';
12         if 500000<=var1<=1000000 then var='Between 500,000 and 1,000,000';
13         if 100000<=var1<500000 then var='Between 100,000 and 500,000';
14         if 10000<=var1<100000 then var='Between 10,000 and 100,000';
15         if 1000<=var1<10000 then var='Between 1,000 and 10,000';
16         if var1<1000 then var='Less than 1,000';
17      end;
18   run;
NOTE: DATA statement used (Total process time):
      real time           1.26 seconds
      user cpu time       0.98 seconds
      system cpu time     0.04 seconds
      Memory              278k
      OS Memory           4976k
      Timestamp           6/29/2010 12:39:21 PM
Sample UNIX Log (p301a01a, Partial SAS Log)
1    options fullstimer;
2    data _null_;
3       length var $30;
4       retain var2-var50 0 var51-var100 'ABC';
5       do x=1 to 10000000;
6          var1=10000000*ranuni(x);
7          if var1>1000000 then var='Greater than 1,000,000';
8          if 500000<=var1<=1000000 then var='Between 500,000 and 1,000,000';
9          if 100000<=var1<500000 then var='Between 100,000 and 500,000';
10         if 10000<=var1<100000 then var='Between 10,000 and 100,000';
11         if 1000<=var1<10000 then var='Between 1,000 and 10,000';
12         if var1<1000 then var='Less than 1,000';
13      end;
14   run;
NOTE: DATA statement used (Total process time):
      real time                       6.62 seconds
      user cpu time                   5.14 seconds
      system cpu time                 0.01 seconds
      Memory                          526k
      OS Memory                       5680k
      Timestamp                       6/29/2010 11:55:32 AM
      Page Faults                     82
      Page Reclaims                   0
      Page Swaps                      0
      Voluntary Context Switches      91
      Involuntary Context Switches    48
      Block Input Operations          91
      Block Output Operations         0
Sample z/OS Log p301a01a Partial SAS Log