Target’s Pregnancy Prediction Problem: The Complete Analytical Process

Explore how Target's analysts constructed a pregnancy prediction model, the data inputs used, additional variables to improve the model, and the actions informed by predicted pregnancy scores.

ncervantes

Presentation Transcript


  1. Target’s Pregnancy Prediction Problem: The Complete Analytical Process “Take a fictional Target shopper named Jenny Ward, who is 23, lives in Atlanta and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant and that her delivery date is sometime in late August.”

  2. The Model

  3. Model Inputs Based on Duhigg’s description of Jenny Ward, what kinds of data did Target analysts use in constructing the pregnancy prediction model?

  4. DATA: Past Purchases (Related items, Target items), Age, Gender • OUTPUT: Pregnancy Scores • ACTION: Product Selection in Brochures

  5. Discuss An analyst wants to improve the model by adding more variables to it. Suggest some additional variables.

  6. Discuss Suggest other actions that can be informed by the predicted pregnancy scores.

  7. Model • DATA: Past Purchases (Related items, Target items), Age, Gender • OUTPUT: Pregnancy Scores • ACTION: Product Selection in Brochures

  8. What is a Model? Valuation Model Source: Keith Howe (2009)

  9. What is a Model? Climate Model Source: Mark Chandler, EdGCM

  10. What is a Model? Climate Model Source: IPCC

  11. What is a Model? Digital Marketing Attribution Model

  12. Digital Marketing Attribution For each response, allocate credit to the responsible channel • Display Ad • SEO • SEM • Email

  13. [Figure: attribution rules FIRST CLICK, EXP DECAY and LAST CLICK on a timeline from t0 to t10 ending at the Response; channels Display Ad, SEO, SEM, Email; example touches: user clicked on banner ad, user clicked on Google organic search result] Influence scales with time order
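
The three attribution rules named on slide 13 can be sketched in a few lines of Python. This is a minimal illustration, not the model from the presentation: the function name `allocate_credit`, the half-life form of the exponential decay, the `half_life` value, and the sample touches are all assumptions made for the example.

```python
from collections import defaultdict


def allocate_credit(touches, response_time, model="last_click", half_life=2.0):
    """Split the credit for one response across marketing channels.

    touches: list of (channel, time) pairs, each time <= response_time.
    Returns {channel: share}; the shares sum to 1.
    """
    if not touches:
        return {}
    ordered = sorted(touches, key=lambda touch: touch[1])
    n = len(ordered)
    if model == "first_click":
        # All credit goes to the earliest touch.
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "last_click":
        # All credit goes to the touch closest to the response.
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "exp_decay":
        # Assumed form: a touch loses half its weight every `half_life` time units.
        weights = [0.5 ** ((response_time - t) / half_life) for _, t in ordered]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights)
    credit = defaultdict(float)
    for (channel, _), weight in zip(ordered, weights):
        credit[channel] += weight / total
    return dict(credit)


# Hypothetical touch history for one user, with the response at t = 10.
sample_touches = [("Display Ad", 1), ("SEO", 4), ("Email", 8)]
for name in ("first_click", "exp_decay", "last_click"):
    print(name, allocate_credit(sample_touches, response_time=10, model=name))
```

First click and last click are the two extremes; the decay rule sits between them and gives every touch some credit, with more going to touches nearer the response.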

  14. A model is an abstraction, a simplified view of reality

  15. All models are wrong but some are useful. George Box

  16. First to Your Door “Right around the birth of a child... parents are exhausted and overwhelmed and their shopping patterns and brand loyalties are up for grabs.” “We knew that if we could identify them in their second trimester, there’s a good chance we could capture them for years.”

  17. Brochure Design “As long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons.” “We’d put an ad for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.” “We started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random.”

  18. Customer acquisition tool • It’s predictive • So accurate it’s creepy

  19. The Marketing Problem • Better Prospecting • More Relevant Brochures • More New Customers • More Revenue Per Current Customer • Lower Cost • More Revenue • More Profit

  20. Discuss Identify any problems with the way Target analysts framed the business problem.

  21. Hint: How is this model scored?

  22. The Business Problem Revisited • Targeting of At-Risk Customers • More Relevant Brochures • Retain More Customers • More Revenue Per Customer • Cost: New Brochures v. Fewer Brochures + Any Offers • More Revenue • More Profit

  23. Measuring Success According to Duhigg, Target’s pregnancy prediction effort was highly successful. How accurate was Target’s prediction model? The accuracy rate was not disclosed.

  24. 10% of targets are pregnant

  25. 20% predicted to be pregnant • 3 of 10 predictions are accurate • 2 of 5 pregnancies are missed

  26. Predictive Lift
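
The figures on slides 24 and 25 (10% of targets pregnant, 20% predicted pregnant, 3 of 10 predictions accurate) are enough to rebuild the full confusion matrix and the lift. The sketch below does that arithmetic; the function name `targeting_metrics` and the dictionary keys are assumptions, and the inputs are the slide's illustrative numbers, not Target's undisclosed accuracy.

```python
def targeting_metrics(base_rate, predicted_rate, precision):
    """Derive confusion-matrix shares (fractions of all customers) and lift.

    base_rate:      share of customers who are actually pregnant
    predicted_rate: share of customers the model flags as pregnant
    precision:      share of flagged customers who are actually pregnant
    """
    true_pos = predicted_rate * precision   # flagged and pregnant
    false_pos = predicted_rate - true_pos   # flagged but not pregnant
    false_neg = base_rate - true_pos        # pregnant but not flagged
    true_neg = 1.0 - true_pos - false_pos - false_neg
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "recall": true_pos / base_rate,      # share of pregnancies caught
        "miss_rate": false_neg / base_rate,  # share of pregnancies missed
        "lift": precision / base_rate,       # improvement over random mailing
    }


metrics = targeting_metrics(base_rate=0.10, predicted_rate=0.20, precision=0.30)
for key, value in metrics.items():
    print(f"{key}: {value:.2f}")
```

With these inputs, 6% of all customers are correctly flagged and 4% are pregnant but not flagged, which matches "2 of 5 pregnancies are missed". The lift is 3: a flagged customer is three times as likely to be pregnant as a randomly chosen one, yet 7 of 10 brochures still reach customers who are not pregnant.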

  27. 20% predicted to be pregnant • Accurate Prediction • Missed Opp • False Positives • Why did Target mix in random products? A) 7 out of 10 receiving brochures will not be pregnant B) 3 out of 10 receiving brochures will feel creeped out • A >> B

  28. Customer acquisition tool • It’s predictive • So accurate it’s creepy • Inaccurate and detracting • Not for customer acquisition 2. It’s not very predictive - even with Big Data

  29. Target’s Pregnancy Prediction Problem • Defining and framing the business problem • Collecting data for the analytical model • Selecting an analytical method • Developing a useful model that solves the problem • Describing how model outputs can drive action • Projecting the impact of such action • Measuring the model performance

  30. Complete Analytical Process

  31. The Baby Names Voyager: Importance of Proper Framing

  32. Use Cases Which of the following questions can be answered directly by the Baby Names Voyager (without referring to other materials)? A. Why did the name Barbara peak in the 1940s? B. Is the name Charlotte or Chelsea more popular? C. How popular will David be in the year 2025? D. What name should I choose for my baby girl?

  33. Other Analyses of the Data Source: Social Security Administration

  34. Other Analyses of the Data Source: Social Security Administration

  35. Other Analyses of the Data Source: Social Security Administration

  36. Other Analyses of the Data

  37. Other Analyses of the Data

  38. Other Analyses of the Data

  39. Other Analyses of the Data

  40. Inverting the Frame • Baby Names Voyager: Given a time period, which names are popular? • Inverted: Given a name, which time period is most likely? • Given a name, guess someone’s age • Given a name, guess what languages he/she speaks • fivethirtyeight.com

  41. DATA: Address, Religion, First Name, Last Name • OUTPUT: Probabilities of speaking English, Spanish, German, Japanese, etc. • ACTION: Segmentation, Targeting, etc.

  42. Evaluating Model Performance • Make prediction using the median • Use IQR as a measure of error • Accuracy varies with name • Source: fivethirtyeight.com
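
Slide 42's recipe (predict with the median, report the IQR as the error) can be sketched as a weighted quantile over birth-year counts. This is a simplified illustration under stated assumptions: the helper names `weighted_quantile` and `age_estimate` are hypothetical, the birth counts are made up rather than taken from Social Security Administration data, and the sketch ignores mortality, so it treats everyone ever given the name as still alive.

```python
def weighted_quantile(pairs, q):
    """Return the smallest value whose cumulative weight reaches fraction q.

    pairs: iterable of (value, weight) tuples with positive weights.
    """
    ordered = sorted(pairs)
    total = sum(weight for _, weight in ordered)
    cumulative = 0.0
    for value, weight in ordered:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return ordered[-1][0]


def age_estimate(births_by_year, current_year):
    """Guess the age of a person from their name's birth-year distribution."""
    ages = [(current_year - year, count) for year, count in births_by_year.items()]
    q1 = weighted_quantile(ages, 0.25)
    median = weighted_quantile(ages, 0.50)
    q3 = weighted_quantile(ages, 0.75)
    return {"median_age": median, "q1": q1, "q3": q3, "iqr": q3 - q1}


# Hypothetical counts of babies given one name, by birth year.
sample_births = {1940: 100, 1950: 200, 1960: 300, 1970: 300, 1980: 100}
print(age_estimate(sample_births, current_year=2014))
```

A name whose births are concentrated in a few years gives a small IQR and a confident age guess. A name that stays popular for decades gives a wide IQR, which is one way to read "accuracy varies with name".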

  43. Evaluating Model Performance • Accuracy varies with gender • Accuracy improves with more co-variates • Source: fivethirtyeight.com

  44. Discuss What other co-variates might be useful to help predict age more accurately?

  45. Complete Analytical Process

  46. Course Project I. Project Proposal (Wk 3) II. Midterm: Data Cleaning & Processing (Wk 7) III. Final: Analysis & Modeling (Wk 12)

  47. Project Proposal • Objectives: • Select a dataset and specify a business/organizational problem you want to solve • Diagnose data issues in your dataset (you will fix these issues in Deliverable #2). We cover diagnosing and fixing data issues in Module 2 • Due Date: [Sep 25th], 11:59 PM • Grading: max 10 points • All assignment files must be uploaded to Canvas. We do not accept emailed files. • Reminder: Late assignments (excused or not) will incur a penalty of 20%. Late without prior notification, or late by more than 7 days, will be scored zero. • Ling or I will provide feedback and approval on Canvas. (Please open your documents before you email us asking where our comments are.)

  48. Choosing your Dataset • Not too small (e.g. > 500 rows) • Not too big (e.g. < 1 million) • Not too aggregated • Not too dirty • Not too clean • Non-anticipatory (if Prediction)

  49. Example of a Bad Dataset: Ebola in West Africa data • Too aggregated • For any given business problem, many of these rows will be useless • Too few variables

  50. Selecting an Analytical Problem
      PREDICTION
      • Probability of a borrower defaulting on a loan
      • Probability of an email being spam
      • Probability of a customer deactivating (“churn”)
      • Amount of revenues
      • Frequency of visits
      • There is a response (outcome) variable
      • If the response is binary (yes/no) or categorical (e.g. which product type), also called a “classification” problem
      • Looking for correlations between the response and co-variates
      • Predictions can be validated
      SEGMENTATION
      • How many types of customers do we have?
      • What are the characteristics of different types of shoppers?
      • What is the probability that a company has a business model of type A (B, C, etc.)? (advanced)
      • No response (outcome) variable
      • Adding structure to the data
      • Looking for correlations between co-variates
      • Difficult to validate, need external evidence such as survey results
