370 likes | 496 Views
Marketing 8001: Database Marketing Class 3. Discovering and Utilizing Reliable Patterns in Customer Data. Today’s Agenda. Exam 1 Next Week Reminder Class 3 Materials Accessing online SAS Data Step Issues Displaying Results: Proc Format Differentiating Customers Review Homework1
E N D
Marketing 8001:Database MarketingClass 3 Discovering and Utilizing Reliable Patterns in Customer Data Class 3
Today’s Agenda • Exam 1 Next Week • Reminder • Class 3 Materials • Accessing online SAS • Data Step Issues • Displaying Results: Proc Format • Differentiating Customers • Review Homework1 • Class Exercise • Notes on Missing Data…Review on Your Own Class 3
Exam 1 Next Week • Readings and class work to date, including material and homework for next week • SAS • Data Step programming • Procedures, including Proc Format • Make sure you can find topics in… • LSB • Online documentation of SAS • No multiple choice • You will need to be able to write SAS code and execute SAS Procs to answer questions about customers. • Exam should take 2 hours…but I’ve reserved the room until 7:30 p.m., in case some students need more time. Class 3
Reminder • Keep up with readings • LSB • D&D • Missou Media Pack • Material in readings and Power Point slides can be on exams, even if not covered in class • Bring questions about readings and SAS to class or contact me • SAS is the tool we’ll use to do much of what is presented in D&D text • Do not wait until Tuesday to do homework, unless you’ve mastered SAS Class 3
Class 3 Materials • Download Class 3 files to desktop…C:\M8001 • Class 3 PowerPoint => Class3_SP2008.ppt • Word document for SAS data analysis class exercise • ProcTabulateToProcExportExerciseClass3SP2008.doc • Work on the Proc Tabulate part of this problem (and outputting the Proc Tabulate table) BEFORE you look at the SAS code I’ve provided for you. • SAS program and data for SAS Class Exercise • ProcTabulateAndExportClass3SP2008.sas • HLPrimSampForTabExportClass3SP08.sas7bdat • GJC Notes on Proc Tabulate • Notes on Using Proc Tabulate.doc Class 3
Accessing Online SAS • http://v8doc.sas.com/sashtml/ • For Procs • Click on + Base SAS Software • Click on + Base SAS Procedures Guide • Click on + Procedures • Click on Proc desired • For Data Step documentation • Click on + SAS Language Reference: Concepts Class 3
SAS Data Step Issues • SAS is standard tool for DBM • Helps us deal with numbers, data • Numbers are not abstract…they represent measurements of attributes/characteristics of real customers • DBM is about marketing to people…and using data to do it well and profitably. Class 3
SAS Data Step Issues • Creating & manipulating variables…see LSB • Variable = expression; • New_Variable = Old_Variable + 17; • Math operations • Add + Var3 = Var1 + Var2; • Subtract - Var3 = Var1 – Var2 • Multiply * Var3 = Var1 * Var2 • Divide / Var3 = Var1 / Var2 • Exponentiate ** Var3 = Var1**2 • For division…remember zero divide exception • Do not to forget to end statement with semi-colon…; Class 3
SAS Data Step Issues • Standard math rules for precedence for one line of SAS code…see LSB • Exponentiation • Multiplication & Division • Addition & Subtraction • If many calculations in Data Step, remember SAS executes statements in order. • GJC Recommendation • Always use parentheses…don’t rely on standard math rules above • Remember to use as many left “(“ as right “)” • e.g. Logit_Score=Log(Prob_Score/(1-Prob_Score)) • Always check variables…especially complex ones • Proc Freq or Tabulate for categorical variables • Proc Print for continuous Class 3
SAS Data Step Issues • SAS Functions • See LSB 3.3 for functions (pp 80-81) • If necessary, you may need to look online • Be familiar with functions…lookup when needed Class 3
IF-THEN / ELSE Statements • Mnemonics…see LSB EQ (equals) (=) NE (not equals) (^=) GT (greater than) (>) LT (less than) (<) GE (greater than or equal) (>=) LE (less than or equal) (<=) • GJC Recommendation • Except for “=“ sign, use the character mnemonics • You’ll make fewer errors Class 3
IF-THEN / ELSE Statements • IF condition THEN action; ELSE action; • IF Gender=‘F’ THEN Female=1; ELSE Female=0; • Note: Because Gender is character, need to place values in single quotes. • Validate Coding Proc Freq; Tables Gender * Female / Missing; Run; • Remember…the more complicated the code, the more detailed your checking must be. • Don’t want a preventable SAS bug to cost you money…or a lower grade on an exam Class 3
IF-THEN / ELSE Statements • Always remember to initialize variables to zero or a default value • ELSE does it… IF Gender=‘F’ THEN Female=1; ELSE Female=0; • Alternative Coding, explicitly initialize value… Female=0; IF Gender=‘F’ THEN Female=1; • GJC Recommendation • Explicitly initialize variable, fewer errors Class 3
Displaying Results: Proc Format • Very useful for categorizing and displaying continuous variables. • Can also be used to combine levels of categorical variables together. • Can define verbal labels for variables because: • Analysis takes place in your head, not in a computer! • Proc Format does not read data • Following code will generate an error because Proc Format does not read an input data file • Proc Format Data=Temp; Class 3
Displaying Results: Proc Format • Business managers most comfortable with data in tables. • Formats created with Proc Format do not change any values for variables in your dataset…it only affects printing in Procs. Class 3
Displaying Results: Proc Format • Example…first use Proc Format to create formats, then use the Format statement in analysis Proc to associate format with specific variable. Proc Format; Value AgeFmt 35-55=’35-55 years’ Other=‘Else’; Run; Proc Freq Data=Temp; Tables Age; Format Age AgeFmt. ;/* Period at end of format associates it with var */ Run; Class 3
Displaying Results: Proc Format • We have seen how Proc Format works in Class 1 and 2 exercises • Make sure you can use it! • I recommend adapting code I’ve given you if you can. Class 3
Differentiating Customers • DBM is all about differences among customers that can be captured in data about them. • DBMers want to use data to segment their customer base so that they can… • Market to them appropriately • Target them for increased profitability • So, DBMers are always looking for differences in response (order, pay, return, not pay) for customers with different characteristics Class 3
So, How Do We Determine if Customers with Different Characteristics Have Different Response Rates? • First step is to conduct a test • Second step is to analyze the data Class 3
From a Data Point-of-View, How Do We Differentiate Customers? • Let’s say we have conducted a test and have merged our test’s “frozen file” of customer characteristics with our response variable (Order). • The marketing team wants to know “how much higher than average do buyers of multiple Healthy Living books respond?” • The frozen file variable NHLBKPAY measures the number of HL books paid. Class 3
From a Data Point-of-View, How Do We Differentiate Customers? • Method 1: Subset data via a Data Step and analyze separate datasets…this works, but is really inefficient. Data Multi NotMulti; /*Creates 2 files */ Set A.TestData; If NHLBKPAY GE 2 then Output Multi; Else Output NotMulti; Proc Means Data=Multi; Var Order; Run; Proc Means Data=NotMulti; Var Order; Run; Proc Means Data=A.TestData; Var Order; Run; Class 3
From a Data Point-of-View, How Do We Differentiate Customers? • Method 2: Use Proc Freq • Cross characteristic with response variable • Need Proc Format for NHLBKPAY because it can have many values. • Proc Format Value Freq Low-1 = ‘NotMulti’ 2-High=‘Multi’; Run; • Proc Freq Data=A.TestData; Format NHLBKPAY Freq.; Tables NHLBKPAY * Order; Run; Class 3
From a Data Point-of-View, How Do We Differentiate Customers? • Method 3: Use Where statement with Proc Means Proc Means Data=A.TestData; Where NHLBKPay GE 2; Var Order; /* Order rate for Multis */ Run; Proc Means Data=A.TestData; Where NHLBKPay LE 1; Var Order; /*Order rate for non-Multis*/ Run; Proc Means Data=A.TestData; Var Order; /*Order rate for all names, the average rate*/ Run; Class 3
From a Data Point-of-View, How Do We Differentiate Customers? • Use Proc Tabulate to create tables • It’s complicated…but very powerful…so… • For simple tables, use Proc Freq and Proc Means • For more complicated tables, use Proc Tabulate Class 3
More Complicated Tables • Let’s say we want order rates for cells created by crossing one customer characteristic with another…a complicated problem. • For example, calculate order rate for all 6 combinations of NHLBKBUY (in 3 categories) crossed with TLHLPAY (in 2 categories) • A 3 * 2 tables has three rows and two columns for a total of 6 cells or combinations…will also be row results and column results. • Draw a picture of such a table on paper! Class 3
From a Data Point-of-View, How Do We Differentiate Customers? • What do we need to calculate order rates for names with a given characteristic?…or combination of characteristics? • Number orderers with characteristic (numerator) • Number all names with characteristic (denominator) • So, for a 3*2 table how can we get the numbers we need? Class 3
More Complicated Tables: Use Proc Freq with Where Statement • Calculate numbers of orderers for each combination of characteristics in the 3 * 2 table • Proc Format Value Freq Low-0 = ‘Zero’ 1= ‘One’ 2-High=‘2 or More’; Value Recency Low-12=’12 Mo Recent’ 13-High=‘GT 12 Mo Rec’; Run; • Proc Freq Data=Temp; Tables NHLBKBUY * TLHLPAY; Where Order=1; /* 3*2 table for orderers in sample */ Format NHLBKBUY Freq. TLHLPAY Recency. ; Run; Class 3
More Complicated Tables: Use Proc Freq with Where Statement • Calculate total number of namesfor one combination of characteristics in the 3 * 2 table Proc Format Value Freq Low-0 = ‘Zero’ 1= ‘One’ 2-High=‘2 or More’; Value Recency Low-12=’12 Mo Recent’ 13-High=‘GT 12 Mo Rec’; Run; • Proc Freq Data=Temp; Tables NHLBKBUY * TLHLPAY; /*3*2 table for names in sample */ Format NHLBKBUY Freq. TLHLPAY Recency. ; Run; Class 3
More Complicated Tables: Use Proc Freq with Where Statement • Calculate order rates and indices by hand. • Have 6 numerators • Have 6 denominators Class 3
More Complicated Tables: Use Proc Means with Where Statement • Mean of 0-1 coded order variable is order rate as a proportion. • Problem: You cannot use Proc Format with Proc Means. • But, Where statement does work for ranges of variables • Proc Means Data=Temp; Where NHLBKBUY LE 0 and TLHLPAY LE 12; Var Order; Run; • Proc Means Data=Temp; Where NHLBKBUY EQ 1 and TLHLPAY LE 12; Var Order; Run; • Need to make 4 more runs…but gets job done Class 3
More Complicated Tables:Use Proc Tabulate • Need Proc Format to categorize variables Proc Tabulate Data=Temp; Class NHLBKBUY TLHLPAY ; Var Order; Table NHLBKBUY * (N PCTN MEAN*ORDER), TLHLPAY; Format NHLBKBUY Freq. TLHLPAY Recency. ; Run; • On paper, draw what this table would look like • Save the code for any Proc Tabulate that works…it’s a difficult Proc to use and adapting code is easier than re-writing it. Class 3
Review Homework1 Class 3
Class Exercise • Summarizing Data with Proc Tabulate • Outputting Proc Tabulate table to Excel Class 3
Notes on Missing Data…a Quick Primer Review This Section on Your Own • How SAS represents missing data… • Numeric data => with a decimal point (.) • Character data => with blanks => ’ ‘ • In real world data, almost all data have missing values, e.g…. • In a SAS file with 15,000 names, the mean of VarA is based on 14,001 names • But, mean of VarB based on 832 names • Can’t compare means (or do correlations) because names may be different Class 3
Notes on Missing Data • When you have missing data… • Need to determine you do not have data problems…never assume data are OK! • May need to create special flags to encode missing data for modeling and profiling, especially if names with missing data respond at different rate than names with data. • In example above… • 1 means female • 0 means not female (i.e., male or missing) Class 3
Notes on Missing Data • Another example • 1 mean female • 2 means male • Missing Recoded to 3 where 3 means Don’t Know GenderNew = 0; If Gender = ‘ ‘ then GenderNew = 3; If Gender = ‘F’ then GenderNew = 1; If Gender = ‘M’ then GenderNew = 2; • Note: In this example the variable Gender is character…the variable GenderNew is numeric. • There should no names with a value of zero! Class 3
Notes on Missing Data • The Bad News • Treatment of missing data is usually complicated • Variables with much missing data not very useful • In new business, transaction & promotion variables mostly missing…that is, they have little coverage • The Good News • Only our Healthy Living (HL) data have missing values • So, for class don’t worry about it • In real world, worry about it • If you want to work with missing data, practice with the HL data…in these data, missing values and other unique values of variables are encoded as negative integer numbers Class 3