260 likes | 544 Views
SAS Programming. SAS Data Mart. Outline. Access different format of data for SAS SAS data mart SAS data manipulation. Making Use of SAS Enterprise Guide Code. Import a text file Example: Orders.txt Import an Excel file Example: SupplyInfo.xls. Learn from Examples. SAS Help
E N D
SAS Programming SAS Data Mart
Outline • Access different format of data for SAS • SAS data mart • SAS data manipulation
Making Use of SAS Enterprise Guide Code • Import a text file • Example: Orders.txt • Import an Excel file • Example: SupplyInfo.xls
Learn from Examples • SAS Help • Contents -> Learning to use SAS -> Sample SAS Programs -> Base SAS • “Base Usage Guide Examples” • Chapter 3, 4
Import an Excel Sheet procimport out=work.commrex datafile ="C:\Lin\Shared\ISQS6339\Commrex_3358.xls" dbms=excel replace; sheet="Company"; getnames=yes; mixed=no; scantext=yes; usedate=yes; scantime=yes; run; procprint data=work.commrex; run;
Excel SAS/ACCESS LIBNAME Engine libname xlsdata 'C:\Lin\Shared\ISQS6339\Commrex_6339.xls'; procprint data=xlsdata.New1; run;
Exercise 10: SAS Data Step Programming • Import and print text file Userlog00.txt • Import and print Excel sheet customers in Afs_customers.xls • Send the Word file containing the Screenshots of two outputs to zhangxi.lin@hotmail.com
SAS Data Mart • Definitions with minor differences • A repository for data to be used as a source for business intelligence • Object-oriented data repository, held for decision support and BI needs. • A data mart is a specialized version of a data warehouse. • Like data warehouses, data marts contain a snapshot of operational data that helps business people to strategize based on analyses of past trends and experiences. • The key difference is that the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of select data. • A data mart configuration emphasizes easy access to relevant information.
SAS Data Mart Structures for Different Purposes • Longitudinal analysis • Data mart has observations over time • The generalized form of the data structure: time <-- value --> category • Example: Beer sales • Transactional data, and aggregated levels • Example: web log • The order of rows matters • Cross-sectional dimension data – ranked by categories • Interleaved time series – ranked by time
Auction • Description: This is data from 19 livestock auction markets. The columns include: the number of head of different livestock sold (in thousands) including CATTLE, CALVES, HOGS, and SHEEP, the cost of operation of the auction market (in thousands of dollars) (COST), and the market identifier (MARKETID). • The object is to use multiple linear regression to describe the relationship between the cost of operations to the number of livestock sold in the various classes. COST will be the dependent variable and CATTLE, CALVES, HOGS, and SHEEP the independent variables. • An additional variable, VOLUME, is the total of all major livestock sold in each market. It is the sum of the variables CATTLE, CALVES, HOGS, and SHEEP, and can be used to demonstrate an exact linear dependency between independent variables. rows, columns. • Regression, Graphic Analysis
Beer Sales • Description: Beer sales records monthly sales of beer in hectoliters, along with the average high and low temperatures in the region, over a period of five years. • The object is to see how beer sales change over time. You can also consider the relationships between beer sales and temperatures. • 4 rows, 60 columns. • Regression, Time Series
SAS Data Mart Topics • Data accessing • Data manipulation – Proc SQL • Data transposing • Data transformations • Interval-scaled observations processing • Derived variables • Time intervals • Define start and end of the interval • Define how intervals are measured • Binning Observations into groups • Transformations of distributions • Replacing missing values • Categorical observations processing • Coding for predictive modeling • Data preparation for different formats of data marts • Data sampling • Model scoring and automation
Data Transpose • Basic transposing tasks (Ch14) • From a multi-rows-per-subject to a one-row-per-subject • From a one-row-per-subject to a multi-rows-per-subject • Creating key-value tables • Transposing longitudinal data (Ch15) • From STANDARD to INTERLEAVED and back • From CROSS SECTIONAL to STANDARD and back • Complex scenarios • Macro
Data Transformations • Transformations of Interval-Scaled Variables (Ch16) • Simple derived variables • Derived relative variables • Time intervals • Binning observations into groups • Transformations of distributions • Replacing missing values • Transformations of Categorical Variables (Ch17) • Derived variables • Combining categories • Dummy coding of categorical variables • Multidimensional categorical variables
Exercise 11 • Data transpose. Write a SAS code to transpose SAS data set Auction in the shared directory. • Transpose without other parameters • Drop some variables • Remove the generated variables Input data. • Userlog0.txt, Userlog1.txt
SAS Programming Illustrative Examples • Data sets merge • SAS Procedures (Proc) • Proc Freq • Proc Means • Demonstration: QUERYLOG.TXT
PROC FREQ data SummerSchool; input Gender $ Internship $ Enrollment $ Count @@; datalines; boys yes yes 35 boys yes no 29 boys no yes 14 boys no no 27 girls yes yes 32 girls yes no 10 girls no yes 53 girls no no 23 ; proc freq data=SummerSchool order=data; weight count; tables Internship*Enrollment / chisq; run;
PROC FREQ PROCSQL; CREATE VIEW WORK.SORT AS SELECT AnonID, QueryDate FROM SASUSER.IMPW_0012; QUIT; PROCfreq DATA=WORK.SORT ORDER=INTERNAL; TABLES AnonID / NOPERCENT NOCUM SCORES=TABLE; TABLES QueryDate / NOPERCENT NOCUM SCORES=TABLE; RUN;
PROC MEANS PROCSQL; CREATE VIEW WORK.SORTTempTableSorted AS SELECT QueryTime, AnonID, QueryDate FROM SASUSER.IMPW_0012; QUIT; PROCMEANS DATA=WORK.SORTTempTableSorted (drop= _way_ _type_) FW=12 PRINTALLTYPES CHARTYPE NWAY VARDEF=DF MIN MAX N; VAR QueryTime; CLASS AnonID / ORDER=UNFORMATTED ASCENDING; CLASS QueryDate / ORDER=UNFORMATTED ASCENDING; OUTPUT OUT=QUERYLOG(LABEL="Summary Statistics for SASUSER.IMPW_0012") MIN()= MAX()= N()= / AUTONAME AUTOLABEL WAYS INHERIT; RUN;
Case Study – Building a Customer Data Mart • Business Context • To create an analytical model that predicts customers that have a high probability to leave the company. • To create a data mart that reflects the properties and behaviors of customers. • The data mart will also contain a variable that presents the fact that a customer has left the company at a certain point in time. • Five SAS datasets • Customer – demographic and customer baseline data • Accounts – information customer accounts • Leasing – data on leasing information • Callcenter – transactional data on call center contacts • Scores – transactional data of value segment scores • The datasets have the unique CUSTID variable.
Data Mart Programs • Create aggregation on a subject level of ACCOUNTS, LEASING, CALLCENTERS and SCORES tables • ACCOUNT • Derive sums per customer • Calculate an account type split based on the balance • LEASING • Summarize the value and the annual rate per customer • CALLCENTER • Aggregate the call center data over all customers using the FREQ proc • Calculate the frequencies for the category COMPLAINT • SCORES • Contains the VALUESEGMENT for each customer and month. If a customer cancelled his contract, he got an “8.lost”, or he got “1.Gold”, etc. • Create three output data sets: value segments in actual month, one month ago, and in a future month • Join the resulting tables together and create a CUSTOMER DATA MART table.
Exercise 12 • Use PROC FREQ to generate a summary statistics table from 6339EX12.xls dataset • Hints • Define path: • libname xlsdata ‘<path>\6339EX12.xls'; • Define the worksheet as ‘weblog$’n • Column: Page, Row: UserID • Explore the data mart created by the code of Ch25.sas • Use SAS Enterprise Guide to create joint tables from the dataset • Try One-way Frequency and Summary Statistics functions with the ACCOUNTS data set
More Topics • Multiple Interval-Scaled Observation per Subject • Multiple Categorical per Subject • Coding for Predictive Modeling • Data Preparation for Multi-Rows-per-Subject and Longitudinal Data Marts • Sampling • Scoring and Automation • Do’s and Don’ts
Base Procedures Guide Examples • Elementary Statistics Procedures, Chapter 1 • Reporting Procedures, Chapter 2 • APPEND Procedure, Chapter 5 • CALENDAR Procedure, Chapter 7 • CHART Procedure, Chapter 9 • COMPARE Procedure, Chapter 11 • CORR Procedure, Chapter 15 • FORMAT Procedure, Chapter 18 • FORMS Procedure, Chapter 19 • FREQ Procedure, Chapter 20 • MEANS Procedure, Chapter 21 • PLOT Procedure, Chapter 25 • PRINT Procedure, Chapter 27 • RANK Procedure, Chapter 29 • SQL Procedure, Chapter 34 • STANDARD Procedure, Chapter 35 • TABULATE Procedure, Chapter 37 • TIMEPLOT Procedure, Chapter 40 • TRANSPOSE Procedure, Chapter 41 • UNIVARIATE Procedure, Chapter 42
Base Usage Guide Examples • What is the SAS System?, Chapter 1 • Introduction to DATA Step Processing, Chapter 2 • Starting with Raw Data, Chapter 3 • Starting with SAS Data Sets, Chapter 4 • Understanding DATA Step Processing, Chapter 5 • Working with Numeric Variables, Chapter 6 • Working with Character Variables, Chapter 7 • Acting on Selected Observations, Chapter 8 • Creating Subsets of Observations, Chapter 9 • Grouped or Sorted Observations, Chapter 10 • More than One Obs. in a Calculation, Chapter 11 • Finding Shortcuts in Programming, Chapter 12 • Working with Dates in the SAS System, Chapter 13 • Concatenating SAS Data Sets, Chapter 15 • Interleaving SAS Data Sets, Chapter 16 • Merging SAS Data Sets, Chapter 17 • Updating SAS Data Sets, Chapter 18 • Manipulating SAS Data Sets, Chapter 19 • Understanding and Enhancing Your Output, Chapter 20 • Printing Detailed Reports, Chapter 24 • Creating Summary Tables, Chapter 25 • Mailing Labels and Other Forms, Chapter 26 • Plotting Relationships Between Variables, Chap. 27 • Charting the Values of a Variable, Chapter 28 • Writing Output, Chapter 29 • Customizing Output, Chapter 30
SQL Examples • Demonstrates Basic SQL SELECT Statements • Demonstrates Basic SQL CREATE Statements • Demonstrates Basic SQL DELETE Statements • Demonstrates Basic SQL Dictionary Statements • Demonstrates Basic SQL Insert Statements • Demonstrates SQL Join Performance with Macros • Demonstrates Basic SQL JOIN Features • Demonstrates SQL Set Operators • Introduction to the SQL Procedure, Chapter 1 • Retrieving Data with the SELECT Statement, Chapter 2 • Datasets Used in SQL Usage Guide • Demonstrates Basic SQL UPDATE Statements • Demonstrates SQL View Capabilities • Demonstrates SQL View Update Capabilities • Demonstrates Recursive Joins with PROC SQL • Demonstrates Nested Statistics with SQL • Demonstrates CASE Statement with PROC SQL • Demonstrates the Product of Columns with PROC SQL • Demonstrates Merging Tables with Proc SQL • Demonstrates Use of Macros with PROC SQL • Demonstrates Use of Cartesian Product with PROC SQL • More Uses of the Cartesian Product with PROC SQL • Demonstrates Use of Subqueries with PROC SQL • Demonstrates Relationship Between Data Step and SQL