Target’s Pregnancy Prediction Problem: The Complete Analytical Process

“Take a fictional Target shopper named Jenny Ward, who is 23, lives in Atlanta and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant and that her delivery date is sometime in late August.”

The Model

Model Inputs Based on Duhigg’s description of Jenny Ward, what kinds of data did Target analysts use in constructing the pregnancy prediction model?

DATA (Past Purchases: related items, target items; Age; Gender) → OUTPUT (Pregnancy Scores) → ACTION (Product Selection in Brochures)

Discuss An analyst wants to improve the model by adding more variables to it. Suggest some additional variables.

Discuss Suggest other actions that can be informed by the predicted pregnancy scores.

Model: DATA (Past Purchases: related items, target items; Age; Gender) → OUTPUT (Pregnancy Scores) → ACTION (Product Selection in Brochures)

What is a Model? Valuation Model Source: Keith Howe (2009)

What is a Model? Climate Model Source: Mark Chandler, EdGCM

What is a Model? Climate Model Source: IPCC

What is a Model? Digital Marketing Attribution Model

Digital Marketing Attribution: For each response, allocate credit to the responsible channel (Display Ad, SEO, SEM, Email)

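[Figure: a timeline of touchpoints (t0 through t10) leading up to a Response, with channels Display Ad, SEO, SEM and Email. Example events: user clicked on banner ad; user clicked on Google organic search result. Three ways to allocate credit are compared: FIRST CLICK, EXP DECAY and LAST CLICK. Influence scales with time order.]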
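The three allocation schemes named on this slide can be sketched in a few lines of Python. This is only an illustration, not the method behind the slide: the function name `allocate_credit`, the `decay_rate` value, the sample touchpoints and the convention that time is counted in periods before the response are all assumptions made for the example.

```python
from math import exp


def allocate_credit(touches, scheme="last_click", decay_rate=0.5):
    """Split the credit for one response across the channels that touched the user.

    touches: list of (channel, periods_before_response) pairs (hypothetical format).
    scheme: "first_click", "last_click" or "exp_decay".
    decay_rate: assumed decay constant for the exponential scheme.
    Returns a dict mapping channel -> share of credit (shares sum to 1).
    """
    if not touches:
        return {}
    # Order touches from oldest to most recent.
    ordered = sorted(touches, key=lambda t: t[1], reverse=True)
    if scheme == "first_click":
        weights = [1.0] + [0.0] * (len(ordered) - 1)
    elif scheme == "last_click":
        weights = [0.0] * (len(ordered) - 1) + [1.0]
    elif scheme == "exp_decay":
        # More recent touches (smaller time gap) receive more weight.
        weights = [exp(-decay_rate * gap) for _, gap in ordered]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(weights)
    credit = {}
    for (channel, _), weight in zip(ordered, weights):
        credit[channel] = credit.get(channel, 0.0) + weight / total
    return credit


# Hypothetical path: banner ad 9 periods ago, organic search 4, email 1.
touches = [("Display Ad", 9), ("SEO", 4), ("Email", 1)]
first = allocate_credit(touches, "first_click")
last = allocate_credit(touches, "last_click")
decay = allocate_credit(touches, "exp_decay")
```

Under first click all credit goes to the banner ad, under last click all of it goes to the email, and under exponential decay every channel gets some credit, with the most recent touch getting the most.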

A model is an abstraction, a simplified view of reality

“All models are wrong, but some are useful” (George Box)

First to Your Door “Right around the birth of a child... parents are exhausted and overwhelmed and their shopping patterns and brand loyalties are up for grabs.” “We knew that if we could identify them in their second trimester, there’s a good chance we could capture them for years.”

Brochure Design “As long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons.” “We’d put an ad for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.” “We started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random.”

Customer acquisition tool • It’s predictive • So accurate it’s creepy

The Marketing Problem: Better Prospecting; More Relevant Brochures; More New Customers; More Revenue Per Current Customer; Lower Cost; More Revenue; More Profit

Discuss Identify any problems with the way Target analysts framed the business problem.

Hint: How is this model scored?

The Business Problem Revisited: Targeting of At-Risk Customers; More Relevant Brochures; Retain More Customers; More Revenue Per Customer; Cost: New Brochures v. Fewer Brochures + Any Offers; More Revenue; More Profit

Measuring Success According to Duhigg, Target’s pregnancy prediction effort was highly successful. How accurate was Target’s prediction model? The accuracy rate was not disclosed.

10% of targets are pregnant

20% predicted to be pregnant • 3 of 10 predictions are accurate • 2 of 5 pregnancies are missed

Predictive Lift
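The figures on the preceding slides (10% of targets pregnant, 20% predicted to be pregnant, 3 of 10 predictions accurate) can be checked with a short calculation. The sketch below assumes a hypothetical population of 1,000 customers; the function name `confusion_summary` is illustrative, and lift is taken here as precision divided by the base rate.

```python
from fractions import Fraction


def confusion_summary(population, base_rate, predicted_rate, precision):
    """Derive counts and lift from the three rates quoted on the slides.

    population is a hypothetical customer count chosen for illustration.
    """
    actual_pos = population * base_rate        # customers who are pregnant
    predicted_pos = population * predicted_rate  # customers flagged by the model
    true_pos = predicted_pos * precision       # flagged and actually pregnant
    return {
        "true_positives": true_pos,
        "false_positives": predicted_pos - true_pos,
        "missed": actual_pos - true_pos,
        "recall": true_pos / actual_pos,
        "lift": precision / base_rate,
    }


# Exact fractions avoid floating-point rounding in the ratios.
summary = confusion_summary(
    population=1000,
    base_rate=Fraction(1, 10),
    predicted_rate=Fraction(2, 10),
    precision=Fraction(3, 10),
)
```

With these inputs the model flags 200 customers, of whom 60 are pregnant and 140 are not, and it misses 40 of the 100 pregnancies (2 of 5). The lift is 3: a flagged customer is three times as likely to be pregnant as a randomly chosen one.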

20% predicted to be pregnant: Accurate Predictions, Missed Opportunities, False Positives. Why did Target mix in random products? A) 7 out of 10 receiving brochures will not be pregnant B) 3 out of 10 receiving brochures will feel creeped out. A >> B

Customer acquisition tool • It’s predictive • So accurate it’s creepy • Inaccurate and detracting • Not for customer acquisition • It’s not very predictive, even with Big Data

Target’s Pregnancy Prediction Problem • Defining and framing the business problem • Collecting data for the analytical model • Selecting an analytical method • Developing a useful model that solves the problem • Describing how model outputs can drive action • Projecting the impact of such action • Measuring the model performance

Complete Analytical Process

The Baby Names Voyager: Importance of Proper Framing

Use Cases: Which of the following questions can be answered directly by the Baby Names Voyager (without referring to other materials)? A. Why did the name Barbara peak in the 1940s? B. Is the name Charlotte or Chelsea more popular? C. How popular will David be in the year 2025? D. What name should I choose for my baby girl?

Other Analyses of the Data Source: Social Security Administration

Other Analyses of the Data

Inverting the Frame • Baby Names Voyager: Given a time period, which names are popular? • Inverted: Given a name, which time period is most likely? Given a name, guess someone’s age. Given a name, guess what languages he/she speaks. Source: fivethirtyeight.com

DATA (Address, Religion, First Name, Last Name) → OUTPUT (Probabilities of speaking English, Spanish, German, Japanese, etc.) → ACTION (Segmentation, Targeting, etc.)

Evaluating Model Performance • Make prediction using the median • Use IQR as a measure of error • Accuracy varies with name Source: fivethirtyeight.com
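Predicting with the median and using the IQR as the error measure can be illustrated with a small weighted-quantile calculation. This is a sketch under stated assumptions, not the fivethirtyeight.com analysis: the counts of living people by birth year are invented, the reference year 2014 is assumed, and `weighted_quantile` is a hypothetical helper.

```python
def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    running = 0
    for value, weight in pairs:
        running += weight
        if running >= q * total:
            return value
    return pairs[-1][0]


# Hypothetical counts of living people with a given name, by birth year.
living_by_birth_year = {1940: 15, 1950: 25, 1960: 30, 1970: 20, 1980: 10}
reference_year = 2014  # assumed year in which ages are measured

ages = [reference_year - year for year in living_by_birth_year]
counts = list(living_by_birth_year.values())

median_age = weighted_quantile(ages, counts, 0.50)  # the point prediction
q1 = weighted_quantile(ages, counts, 0.25)
q3 = weighted_quantile(ages, counts, 0.75)
iqr = q3 - q1  # spread of the middle half: the error measure
```

A name whose bearers cluster in a few birth years gives a narrow IQR and a more reliable age guess, while a name that stayed popular for decades gives a wide one, which is why accuracy varies with the name.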

Evaluating Model Performance • Accuracy varies with gender • Accuracy improves with more co-variates Source: fivethirtyeight.com

Discuss What other co-variates might be useful to help predict age more accurately?

Complete Analytical Process

Course Project I. Project Proposal (Wk 3) II. Midterm: Data Cleaning & Processing (Wk 7) III. Final: Analysis & Modeling (Wk 12)

Project Proposal • Objectives: • Select a dataset and specify a business/organizational problem you want to solve • Diagnose data issues in your dataset (you will fix these issues in Deliverable #2). We cover diagnosing and fixing data issues in Module 2 • Due Date: [Sep 25th], 11:59 PM • Grading: max 10 points • All assignment files must be uploaded to Canvas. We do not accept emailed files. • Reminder: Late assignments (excused or not) will incur a penalty of 20%. Late without prior notification, or late by more than 7 days, will be scored zero. • Ling or I will provide feedback and approval on Canvas. (Please open your documents before you email us asking where our comments are.)

Choosing your Dataset • Not too small (e.g. > 500 rows) • Not too big (e.g. < 1 million) • Not too aggregated • Not too dirty • Not too clean • Non-anticipatory (if Prediction)

Example of a Bad Dataset: Ebola in West Africa data • Too aggregated • For any given business problem, many of these rows will be useless • Too few variables

Selecting an Analytical Problem
PREDICTION
Examples: • Probability of a borrower defaulting on a loan • Probability of an email being spam • Probability of a customer deactivating (“churn”) • Amount of revenues • Frequency of visits
Characteristics: • There is a response (outcome) variable • If the response is binary (yes/no) or categorical (e.g. which product type), also called a “classification” problem • Looking for correlations between the response and co-variates • Predictions can be validated
SEGMENTATION
Examples: • How many types of customers do we have? • What are the characteristics of different types of shoppers? • What is the probability that a company has a business model of type A (B, C, etc.)? (advanced)
Characteristics: • No response (outcome) variable • Adding structure to the data • Looking for correlations between co-variates • Difficult to validate, need external evidence such as survey results
