Lecture 8 MARK2039 Summer 2006 George Brown College Wednesday 9-12
Assignment 6
• Backend: H4B2E5STRUGER
• Marketing list: H4B2E5STRUGERJOHN4849MAYFAIR
• Unaddressed Campaign: H4B2E5
Assignment 6
Id     Total Amount   # of months since last trans.
456    1280           6 months
123    300            5 months
789    76             8 months
12     10             10 months
Assignment 6
• Data needs to be standardized so that there is exactly one value for each gender outcome.
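
A minimal sketch of this standardization step, assuming the raw gender field arrives in mixed encodings. The raw codes and the mapping below are hypothetical, since the slide does not show the actual field values.

```python
# Hypothetical raw codes; the real field values are not given on the slide.
GENDER_MAP = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "2": "F",
}

def standardize_gender(raw):
    """Return 'M', 'F', or 'U' (unknown) for a raw gender value."""
    if raw is None:
        return "U"
    key = str(raw).strip().lower()
    return GENDER_MAP.get(key, "U")

raw_values = ["M", "male", " F ", "Female", "1", "", None]
cleaned = [standardize_gender(v) for v in raw_values]
print(cleaned)
```

Unrecognized values are kept as an explicit "U" rather than guessed, so they can be reviewed before modelling.
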
Assignment 6
• Use the purchase behaviour field and look at a purchase window (say 3 months, April 2006 to June 2006).
• No purchase in the window means the customer is a defector (1), while a purchase in the window means the customer is a non-defector (0).
• I would use the other information (income, region, age, and tenure) as potential variables to help predict defection.
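
A minimal sketch of how the defection flag could be derived, assuming a defector is a customer with no purchase inside the April 2006 to June 2006 window. The customer ids and purchase dates are made up for illustration.

```python
from datetime import date

WINDOW_START = date(2006, 4, 1)
WINDOW_END = date(2006, 6, 30)

# Hypothetical purchase history: customer id -> purchase dates.
purchases = {
    "A": [date(2006, 1, 15), date(2006, 5, 2)],
    "B": [date(2005, 11, 30)],
    "C": [],
}

def defector_flag(dates):
    """1 = defector (no purchase in the window), 0 = non-defector."""
    bought = any(WINDOW_START <= d <= WINDOW_END for d in dates)
    return 0 if bought else 1

flags = {cust: defector_flag(dates) for cust, dates in purchases.items()}
print(flags)
```
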
Classification/Profiling vs. Predictive Modelling
• Profiling (classification): compare defectors with non-defectors on age, tenure, income, and transaction behaviour.
• Predictive modelling (predict): use age, tenure, income, and transaction behaviour from the pre period (independent variables) to predict defector vs. non-defector in the post period (dependent variable).
Predictive Modelling
• Examples: Discrete Models
• Response Models
• Cross Sell
• Upsell
• Acquisition
• Attrition Models
• Product Affinity Models
• Risk Models
Predictive Modelling
• Examples: Continuous Models
• Profitability/Value Models
• Spending Models
Types of Predictive Models
• An acquisition campaign with no targeting was conducted in January. The available information is as follows:
• Mail files containing name and address
• Responder files containing name and address
• 2001 Stats Can Census data available at the enumeration area level
• A conversion table which maps enumeration areas to postal codes
• How would you use the above information to better target prospects to become new customers?
• Describe how the analytical file would be created:
• 1) Define the objective function, here the response variable to be created.
• 2) Create the response variable by matching the responder file to the mail file using a match key of postal code and last name. Assign a value of 1 for matches (responders) and 0 for non-matches (non-responders). This field will be created on the mail file or analytical file.
• 3) Match the analytical file to the Stats Can conversion file (which contains the enumeration area) by postal code. Match the new output file to the Stats Can Census file by enumeration area; the Census file contains the very rich demographic information.
• Remember the end deliverable is to create a table with the dependent variable or objective function and examples of other independent or predictor variables.
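
The three steps above can be sketched as follows. This is an illustration only: every record, field name, and Census attribute below is made up, and the match key is simply postal code plus last name after normalizing case and spaces.

```python
# All records below are made up for illustration.
mail_file = [
    {"last_name": "Smith", "postal_code": "M5V2T6"},
    {"last_name": "Lee", "postal_code": "H4B2E5"},
    {"last_name": "Roy", "postal_code": "K1A0B1"},
]
responder_file = [{"last_name": "LEE", "postal_code": "h4b 2e5"}]

# Conversion table: postal code -> enumeration area (EA).
postal_to_ea = {"M5V2T6": "EA001", "H4B2E5": "EA002"}
# Census data keyed by EA (hypothetical fields).
census_by_ea = {
    "EA001": {"avg_income": 61000, "pct_renters": 0.55},
    "EA002": {"avg_income": 48000, "pct_renters": 0.40},
}

def clean_postal(code):
    return code.replace(" ", "").upper()

def match_key(rec):
    """Postal code + last name, normalized for case and spaces."""
    return clean_postal(rec["postal_code"]) + rec["last_name"].strip().upper()

# Step 2: response = 1 when the mail record matches a responder record.
responders = {match_key(r) for r in responder_file}

analytical_file = []
for rec in mail_file:
    row = dict(rec)
    row["response"] = 1 if match_key(rec) in responders else 0
    # Step 3: postal code -> EA -> Census demographics.
    ea = postal_to_ea.get(clean_postal(rec["postal_code"]))
    row["ea"] = ea
    # Missing Census data stays missing rather than being guessed.
    row.update(census_by_ea.get(ea, {"avg_income": None, "pct_renters": None}))
    analytical_file.append(row)

for row in analytical_file:
    print(row)
```

Each output row holds the dependent variable (`response`) next to candidate predictors, which is the table the slide describes as the end deliverable.
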
Types of Predictive Models
• You have been asked to create programs that better target existing customers for insurance products. You have the following info:
• What would you do and how would you create the analytical file?
• 1) Define the objective function, here an insurance response variable.
• 2) Create the insurance response variable by looking at the amount spent in a certain transaction type within a certain timeframe. Assign a value of 1 if this condition is met and 0 if not. This field will be created on the analytical file.
• 3) Create independent model predictors by creating recency, frequency, and amount variables (also by type) from the transaction file. Create demographic variables from the customer file, such as region of country, tenure, age, income, etc.
• Remember the end deliverable is to create a table with the dependent variable or objective function and examples of other independent or predictor variables.
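
A minimal sketch of steps 2 and 3, assuming a hypothetical response window of April to June 2006 and a made-up transaction file. Predictors are built only from transactions before the window so that they do not overlap with the behaviour being predicted. The breakdown by transaction type and the customer-file demographics are omitted for brevity.

```python
from datetime import date

RESP_START = date(2006, 4, 1)   # hypothetical response window
RESP_END = date(2006, 6, 30)

# Made-up transaction file: (customer id, date, type, amount).
transactions = [
    ("C1", date(2005, 12, 10), "banking", 200.0),
    ("C1", date(2006, 5, 3), "insurance", 350.0),
    ("C2", date(2006, 1, 20), "banking", 75.0),
    ("C2", date(2006, 2, 14), "banking", 125.0),
]

def months_between(earlier, later):
    """Whole calendar months from earlier to later."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

analytical_file = {}
for cust, when, kind, amount in transactions:
    row = analytical_file.setdefault(
        cust,
        {"response": 0, "recency_months": None, "frequency": 0, "amount": 0.0},
    )
    if RESP_START <= when <= RESP_END:
        # Dependent variable: insurance spend inside the response window.
        if kind == "insurance" and amount > 0:
            row["response"] = 1
    elif when < RESP_START:
        # Predictors come only from the pre-window period.
        row["frequency"] += 1
        row["amount"] += amount
        months = months_between(when, RESP_START)
        if row["recency_months"] is None or months < row["recency_months"]:
            row["recency_months"] = months

for cust, row in analytical_file.items():
    print(cust, row)
```
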
Types of Predictive Models
• You have been asked to build a targeting tool for a cross-sell campaign to get existing customers to purchase an insurance policy. A campaign was conducted in May of 2005. What questions do you need to ask in order to help design a proper tool?
• Was the campaign data captured? Are responders clearly identified, or do we have to impute them from the database based on transaction data that occurred within a certain time frame of the campaign?
Types of Predictive Models
• You have been asked to target customers that will not only purchase insurance but will also purchase the largest premiums.
• What type of model would be built here?
• A two-stage model that targets both insurance response and premium. The objective function is the expected value of premium: Pr(Response) × Premium
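
A small sketch of how the two-stage objective could be used for ranking, assuming one model supplies a response probability and a second supplies a predicted premium. The scores below are invented, not output from any real model.

```python
# Hypothetical model outputs per customer:
# p_response from a response model, premium from a premium (value) model.
scores = [
    {"id": 1, "p_response": 0.10, "premium": 400.0},
    {"id": 2, "p_response": 0.02, "premium": 3000.0},
    {"id": 3, "p_response": 0.25, "premium": 100.0},
]

# Objective function: expected premium = Pr(Response) x Premium.
for s in scores:
    s["expected_premium"] = s["p_response"] * s["premium"]

ranked = sorted(scores, key=lambda s: s["expected_premium"], reverse=True)
print([s["id"] for s in ranked])
```

In this example customer 3 is the most likely responder and customer 2 the least likely, yet customer 2 ranks first on expected premium, which is why the two stages are combined instead of ranking on response alone.
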
Types of Predictive Models
• Creating the analytical file
• Defining the objective function
• Defining the model predictors
• Once this is done, the first diagnostic that can be done is the correlation matrix.
Correlation
• Want to determine which variables have the greatest relationship with response.
• Run the correlation of the dependent variable with all the independent variables (in your reduced set).
• Select the best variables based on the highest absolute correlation coefficients (usually those that meet a statistical significance criterion of at least 95% confidence).
• Correlation can be negative or positive.
• Serves as a great pre-screening tool.
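
A minimal sketch of this pre-screening routine on made-up data. The slides do not say how the confidence level is computed; the Fisher z approximation used here is an assumption chosen because it needs only the standard library, and it is rough for a sample this small.

```python
from math import atanh, sqrt
from statistics import NormalDist

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def confidence_level(r, n):
    """Approximate two-sided confidence that r differs from 0.

    Uses the Fisher z transform; requires n > 3 and |r| < 1.
    """
    z = abs(atanh(r)) * sqrt(n - 3)
    return 2 * NormalDist().cdf(z) - 1

# Made-up analytical file: 10 customers, 1 = responder.
response = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
predictors = {
    "age": [25, 52, 47, 31, 60, 44, 28, 55, 39, 50],
    "household_size": [2, 4, 1, 3, 2, 4, 3, 1, 3, 2],
}

n = len(response)
report = []
for name, values in predictors.items():
    r = pearson(values, response)
    report.append((name, r, confidence_level(r, n)))

# Rank by strength of relationship, regardless of sign.
report.sort(key=lambda row: abs(row[1]), reverse=True)
selected = [name for name, r, conf in report if conf >= 0.95]

for name, r, conf in report:
    print(f"{name:<16} r={r:+.3f}  confidence={conf:.1%}")
print("selected:", selected)
```
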
The Concept of Correlation
• Using correlation analysis for selecting variables for our response model.
• The analytical file contains one dependent (modelled) variable, Response, and six independent variables:
• Age
• Tenure
• # of Products
• # of Promotions
• Income
• Household Size
• The key diagnostics in this routine are:
• Correlation coefficient
• Confidence level
Correlation Analysis
• The male gender variable has a perfect correlation of +1.
• The female gender variable has a perfect correlation of -1.
• Household size has no correlation with response, hence the correlation coefficient is 0.
Correlation Results
• Show the level of confidence which a given variable has with the modelled behaviour, i.e. response.
• Reported for each variable: correlation coefficient and confidence interval
Correlation
• Why couldn’t we just use the results of correlation to create the model and create index values for each significant variable?
• Age
• Tenure
• # of products purchased
• # of promotions since last purchase
• Because there is interaction between variables (multicollinearity) that needs to be accounted for in the modelling exercise. You can review this concept in more detail in any introductory stats textbook.
Examples-Correlation-Response Model
• Listed below is an example of a correlation matrix.
• Answer the following:
• Is each variable relevant?
  - All, with the exception of live in Quebec, # in household, and # of months since last purchase.
• What is the relationship or impact of each variable with response?
  - The sign of the variable tells you the relationship, while the correlation coefficient tells you the impact.
• What is the strongest variable and what is the weakest variable?
  - Strongest variable: # of months since last promoted. Weakest variable: live in Quebec.
More examples of correlation
• Younger people are more likely to respond; higher-income people are more likely to respond; males are less likely to respond. Would the correlation values against response be highly positive, close to zero, or negative for age, income, and females?
  - Age: highly negative
  - Income: highly positive
  - Females: highly positive
• People who live in Quebec exhibit no impact on response; people with high tenure and a high number of months since last promotion are less likely to respond. Would the correlation values against response for each variable be highly positive, close to zero, or negative?
  - Quebec: close to zero
  - Tenure: highly negative
  - Number of months since last promotion: highly negative
More examples of correlation
• Previous analysis has indicated the following trends.
• Would the correlations be closer to 1, -1, or 0 here for both variables?
  - Spending: close to 0
  - Tenure: close to -1
More examples of correlation
• Would the correlations be closer to 1, -1, or 0 here for both variables?
  - Spending: close to 1
  - Tenure: close to 0
• What is the learning here vs. the previous slide?
  - The variables have changed in their impact on response.
Exploratory Data Analysis Reports (EDA)
• After looking at the correlation reports, we also need to create EDA reports, which help to better understand the relationship of a given variable with the desired marketing behaviour.
• They help the business people and marketers to get inside the so-called black box of modelling.
Exploratory Data Analysis Reports (EDA)
• Let’s take a look at an example of a binary variable:
Male       # of Observations   Response Rate
Yes        50000               2.00%
No         50000               2.60%
Average    100000              2.30%
• On the next page are some examples of EDA reports of variables that are not statistically significant according to the correlation matrix.
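
A short sketch that rebuilds the Male report from counts. The responder counts (1,000 and 1,300) are implied by the slide's rates on 50,000 observations each. The index column (group rate relative to the overall rate, times 100) is an added assumption and does not appear on the slide.

```python
# (observations, responders) per group, implied by 2.00% and 2.60% of 50,000.
groups = {"Yes": (50000, 1000), "No": (50000, 1300)}

total_obs = sum(obs for obs, _ in groups.values())
total_resp = sum(resp for _, resp in groups.values())
overall_rate = total_resp / total_obs

print(f"{'Male':<8}{'# Obs':>8}{'Resp. rate':>12}{'Index':>7}")
rows = []
for label, (obs, resp) in groups.items():
    rate = resp / obs
    # Index of 100 means the group responds at the overall average rate.
    index = round(100 * rate / overall_rate)
    rows.append((label, obs, rate, index))
    print(f"{label:<8}{obs:>8}{rate:>12.2%}{index:>7}")
print(f"{'Average':<8}{total_obs:>8}{overall_rate:>12.2%}{100:>7}")
```
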
Exploratory Data Analysis Reports (EDA)
• EDAs of variables that are not statistically significant
Exploratory Data Analysis Reports
• What does this tell us?
Exploratory Data Analysis Reports
• What does this mean?
Creating the Final Model
• Why couldn’t we just use the results of correlation to create the model and create index values for each significant variable?
• Age
• Tenure
• # of products purchased
• # of promotions since last purchase
• Think statistics here.
The Data Mining Process: Application of Data Mining Techniques - Creating the Final Model
Problems with Multicollinearity
• Example: Years of Education and Income on Response Rate
• Regression equation: Response = .50 + .00001*income - .03*yrs. of education
Correlation with Response   Years of Education   Income
Correlation Coefficient     0.11                 0.12
Confidence Interval         99%                  99.50%
• What is the problem here and what do you do?
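
One plausible reading of this slide is that a variable can correlate positively with response on its own yet receive a negative sign in the regression when it is highly correlated with another predictor. The sketch below illustrates that effect on made-up data in which income (in $000s) moves almost in lockstep with education. The response values are generated from the slide's equation, so the true coefficients are known in advance.

```python
from math import sqrt

# Made-up data: income (in $000s) rises almost in lockstep with education.
education = [10, 11, 12, 13, 14, 15]
income_k = [30, 42, 50, 62, 70, 82]
# Response built from the slide's equation (0.00001 per dollar = 0.01 per $000).
response = [0.50 + 0.01 * inc - 0.03 * edu
            for inc, edu in zip(income_k, education)]

def centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def corr(a, b):
    """Pearson correlation of two already-centered lists."""
    return dot(a, b) / sqrt(dot(a, a) * dot(b, b))

inc_c, edu_c, y_c = centered(income_k), centered(education), centered(response)

corr_edu = corr(edu_c, y_c)        # education vs. response, on its own
corr_inc = corr(inc_c, y_c)        # income vs. response, on its own
corr_between = corr(inc_c, edu_c)  # the two predictors vs. each other

# Two-predictor least squares via the normal equations on centered data.
s11, s22, s12 = dot(inc_c, inc_c), dot(edu_c, edu_c), dot(inc_c, edu_c)
s1y, s2y = dot(inc_c, y_c), dot(edu_c, y_c)
det = s11 * s22 - s12 ** 2
b_income = (s22 * s1y - s12 * s2y) / det
b_education = (s11 * s2y - s12 * s1y) / det

print(f"corr(education, response) = {corr_edu:+.3f}")
print(f"corr(income, education)   = {corr_between:+.3f}")
print(f"regression coefficient on education = {b_education:+.4f}")
```

Education is positively correlated with response by itself, but its regression coefficient is negative once income is in the equation. A common remedy, mentioned here only as an option, is to keep one of the two variables or to combine them into a single derived variable.
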
Continuing to build the model
• Multivariate analytical techniques such as multiple regression, logistic regression, etc. may be employed to produce the final model.
• Final equation: Predicted Response Rate = A - B1*Age + B2*Tenure
• What is the problem here?
Continuing to build the model
Variable           Correlation
Spend              0.6
Live in Ontario    0.5
Number in House    -0.3
Response = A + (.05 × Spend) - (.03 × Live in Ontario) - (.01 × Number in House)
Variable           Correlation
# of products      0.6
Credit Score       0.4
Tenure             -0.2
Response = A - (.03 × # of products) + (.08 × Credit Score) - (.01 × Tenure)
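
A quick diagnostic that may help with these two examples: compare the sign of each variable's correlation with the sign of its coefficient in the equation. The numbers are taken from the slide; treating a sign mismatch as a warning sign of multicollinearity is an interpretation, not something the slide states.

```python
models = {
    "Model 1": [
        # (variable, correlation with response, regression coefficient)
        ("Spend", 0.6, 0.05),
        ("Live in Ontario", 0.5, -0.03),
        ("Number in House", -0.3, -0.01),
    ],
    "Model 2": [
        ("# of products", 0.6, -0.03),
        ("Credit Score", 0.4, 0.08),
        ("Tenure", -0.2, -0.01),
    ],
}

def sign_conflicts(rows):
    """Variables whose coefficient sign disagrees with their correlation sign."""
    return [name for name, corr, coef in rows if corr * coef < 0]

conflicts = {model: sign_conflicts(rows) for model, rows in models.items()}
for model, names in conflicts.items():
    print(model, "->", names)
```
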
Continuing to build the model
• After observing correlation results and EDAs, what can we begin to do at this point?
• Derive new variables - EDAs
• Derive new variables - multicollinearity
• Derive new variables - Factor Analysis
• Derive new variables - CHAID (will explore later)
Reference Material:
• Factor Analysis: look up in any statistics handbook
• Regression: look up in a textbook under Regression
Continuing to build the model
• Running further statistical routines, we are able to develop a final model. The marketer or business person should receive a report that looks as follows:
• For those of you that have statistics training, how is the % Contribution to model derived?
Continuing to Build the Model
Variable Entered   Partial R-Square   Model R-Square
var 4              0.0036             0.0036
var 3              0.0034             0.007
var 1              0.0016             0.0086
var 2              0.0007             0.0092
var 6              0.0009             0.0102
var 5              0.0003             0.0105
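
One common way to derive a % contribution figure from a stepwise table like this is to divide each variable's partial R-square by the final model R-square. The slides do not give the formula, so this is an assumption about the intended calculation.

```python
# (variable entered, partial R-square) in order of entry, from the table.
steps = [
    ("var 4", 0.0036),
    ("var 3", 0.0034),
    ("var 1", 0.0016),
    ("var 2", 0.0007),
    ("var 6", 0.0009),
    ("var 5", 0.0003),
]

# The partial R-squares add up to the final model R-square (0.0105).
total = sum(partial for _, partial in steps)
contribution = {name: partial / total for name, partial in steps}

for name, share in contribution.items():
    print(f"{name}: {share:.1%}")
```
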
Continuing to Build the Model What would be the final equation in terms of the sign?
Continuing to build the model
• What would you do here?
Continuing to build the model
• Suppose we have the following equation:
Response = .09 + (.05 × Income) + (.06 × Tenure) + (.08 × Product Spend) - (.04 × Male)
• What is the problem here?