Learn how to measure the impact of marketing campaigns without running tests using Bayesian analysis. Discover the CausalImpact toolkit and a workflow example.
“Hard Talks”: Talking Bayes to Business A marketing spend use-case Yizhar (Izzy) Toren
About me • Adopted R in 2003 & introduced it to my university • Bayesian by belief - Frequentist by practice • I call myself a “Data Scientist” because I know math, stats & just enough programming to be dangerous • Currently focused on forecasting & causality (for elasticity, optimisation, etc.) and NLP for recommendations & search Find me on Twitter / Linkedin / Github
Agenda • Motivation: Is my ad campaign working? • Why Bayes? • Use Case: Measuring impact without a test • Toolkits: CausalImpact • A workflow example
Meet Nadia Nadia is a marketing director. Nadia is smart. She wants to know if a new video ad campaign will be effective. She talks to you about impact, tracking & KPIs before planning the campaign. BE LIKE NADIA 🙋
Meet Nadia Nadia is a marketing director. Nadia is ~~smart~~ responsible. She wants to know if a new video ad campaign will be effective. She talks to you about impact, tracking & KPIs before ~~planning~~ releasing the campaign. BE LIKE NADIA, but be better next time 💁
Meet Nadia Nadia is a marketing director. Nadia is ~~smart~~ responsible. She wants to know if a new video ad campaign will be effective. She talks to you about impact, tracking & KPIs ~~before planning~~ after releasing the campaign. BE LIKE NADIA, but be better next time 🙎
In a ~~perfect~~ the real world • We have a model of population & causality (e.g. more ads ➡ more signups) • We have well defined KPIs (daily signups) and an understanding of effect size • Sufficient volume for significance & power • Sufficient velocity for a timely answer • Good randomisation & user tracking infra for A/B tests 💁 ⚠ harder than you’d think ⚠
Is it working? Good news! We pass the IOTT (Intra-Ocular Trauma Test) [Plot: test group, before vs. after] 95% CI: [102.2, 130.9], p-value < 2e-15
Is it still working? Life is noisy and complicated ➡ Let’s test! • Nadia: “Can we say the ad campaign worked?” • You: “Well… we saw an X% increase in daily visits, with p < 0.005” • Nadia: “So… 99.5% it’s working?” • You: “No… also, not necessarily” And without a test? 😳
So Why Bayes? • Get the answers you want (p-values = hard talk) • A healthy conversation with stakeholders (priors) • Problems first, not solutions backwards • Sometimes you just can’t test • Because you have nifty tools in R* * And some of them only in R, so you now have a great excuse to introduce R into your org toolset!
Use Case • On 09/12 we implemented a new ad-spend policy on a content website (“ThyPipe”), where we can’t run tests • Nadia asks: “Is the new policy better than the old one?” • Translation: what would have happened if we did not change anything? (a.k.a “The Counterfactual”)
Data • User signups from different sources (ours is source1) • Queries from a search engine (rhymes with “Doodle”) • Calendar: Holidays, seasonality, feature releases, etc.
Possible Approaches • A/B testing • Compare “Before” / “After” ❌ • Difference in differences 🔎🦄 • Multivariate regression + time series + GLM + … 🔎📦 • Bayesian Time Series ✔ We “got lucky”: CausalImpact 🎯
CausalImpact [Plot: “before” vs. “after” periods] Step 1: Fit the best model you can on the “before” data (out-of-the-box or customized via BSTS). Step 2: Compare the actual “after” data to the model’s simulation of the same period: the difference is the effect! “We see a 4% increase but it’s probably noise: 45% chance we’re below the min 3% uplift required”
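A minimal sketch of what steps 1 and 2 look like in R with the CausalImpact package; the series names, regressor and exact dates below are illustrative assumptions, not the talk’s actual data:

```r
# Minimal CausalImpact sketch (assumed data): daily signups as the response,
# search queries as a control regressor that is NOT affected by the policy.
library(zoo)
library(CausalImpact)

data <- zoo(cbind(signups, queries), order.by = dates)

pre.period  <- as.Date(c("2018-01-01", "2018-09-11"))  # "before" the policy change
post.period <- as.Date(c("2018-09-12", "2018-10-31"))  # "after" the 09/12 change

# Step 1: fit a model on the pre-period; Step 2: compare the post-period
# to the model's counterfactual simulation of the same dates.
impact <- CausalImpact(data, pre.period, post.period)
summary(impact)            # posterior effect estimate + credible interval
summary(impact, "report")  # plain-language summary for stakeholders
plot(impact)               # actual vs. counterfactual, pointwise & cumulative effect
```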
Work in Progress! Workflow example [Diagram: DB signals (historical data, regressors), calendar, git signals and manual signals feed a BSTS model; model simulations feed CausalImpact]
Summary • A/B testing is great, when testing is feasible and the answers are meaningful • If you can’t test - simulate! • Think problem first, not solution backwards • Priors are an opportunity to engage with stakeholders • Use powerful tools, but with care 🕸 • More details on Github and Youtube
Thank you! We’re Hiring! Find me on Twitter / Linkedin / Github
The answers you want • The answer Nadia wants: P(“it works” | data) = P(data | “it works”) × P(“it works”) / P(data), i.e. Likelihood × Prior divided by P(data), which might be hard to compute • The answer you have: p-value = P(data | “it’s not working”) You are already having a “hard talk”...
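A toy calculation (all numbers invented for illustration) makes the gap between the two quantities concrete:

```r
# Toy numbers, purely illustrative: how a prior and a likelihood combine
# into the probability Nadia actually asked about.
prior_works      <- 0.5    # P("it works") before seeing the data
lik_if_works     <- 0.30   # P(data | "it works")
lik_if_not_works <- 0.05   # P(data | "it's not working") -- the p-value-like quantity

evidence <- lik_if_works * prior_works + lik_if_not_works * (1 - prior_works)  # P(data)
posterior_works <- lik_if_works * prior_works / evidence
posterior_works  # ~0.86, not 0.95: a small p-value is not a high P("it works" | data)
```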
Priors: The “Hard Talk” • Choice of priors: subjective 🙀, but there are guidelines • New discussions with stakeholders: • Internally: “if you had to guess”, surveys, games, ... • Externally: industry benchmarks • Some obvious defaults (mean=0, “natural” limits, ...) • Defaults from your tools (when in doubt, use them) Your new job: Translate business insights into a distribution
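As a hypothetical example of that translation, a stakeholder answer like “probably around a 3% uplift, and we’d be surprised by anything above 9%” can be encoded as a normal prior (the numbers are made up):

```r
# Hypothetical elicitation: stakeholders guess ~3% uplift and would be
# surprised by anything above ~9%; encode that as Normal(0.03, 0.03),
# which puts roughly 95% of the prior mass below a 9% uplift.
prior_mean <- 0.03
prior_sd   <- 0.03
qnorm(c(0.025, 0.975), mean = prior_mean, sd = prior_sd)  # roughly [-3%, +9%]

curve(dnorm(x, prior_mean, prior_sd), from = -0.10, to = 0.15,
      xlab = "uplift", ylab = "prior density")  # show the prior back to stakeholders
```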
Thinking & Framing Frequentist: “Solution Backwards” vs. Bayesian: “Problem First” • Frequentist: phrase the problem to fit the tools • Bayes: find a model that fits the problem (but in a finite time…) [Diagram: problem scope, tool scope and time to solve, contrasted for the two approaches]
Some More Toolkits • Stan: • Fully flexible & powerful • New syntax • Cross platform • Multiple R wrappers (brms, rstanarm, …) • Prophet: • Stan wrapper • R & Python bindings • (log) additive Gaussian only • Time series / trend oriented • BSTS: • R library (with R syntax) • (log) additive Gaussian / Poisson (sometimes) • Regression/causality oriented • CausalImpact is a wrapper [Diagram: toolkits placed on flexible-to-specific and easy-to-hard axes]
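Since CausalImpact is a wrapper around BSTS, a custom bsts model can be swapped in when the default model isn’t flexible enough. A rough sketch following the package’s documented custom-model path, reusing the assumed series names and dates from the earlier example:

```r
# Rough sketch (assumed series from the earlier example): fit a custom bsts
# model on data where the post-period response is hidden, then hand it to
# CausalImpact together with the true post-period response.
library(bsts)
library(CausalImpact)

post <- dates >= as.Date("2018-09-12")
y <- signups
y[post] <- NA                                  # hide the "after" period from the model

ss <- AddLocalLevel(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 7)         # weekly seasonality
fit <- bsts(y ~ queries, state.specification = ss, niter = 1000)

impact <- CausalImpact(bsts.model = fit,
                       post.period.response = signups[post])
plot(impact)
```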