Multi-Armed Bandits
chalpert@meetup.com
Click-through Rate (Clicks / Impressions): 20% [current button]
New button variant: Click-through Rate ? [two buttons shown side by side]
AB Test
• Randomized Controlled Experiment
• Show each button to 50% of users
Click-through Rate: 20% vs. ?
AB Test Timeline
Before Test → Exploration Phase (Testing) → After Test: Exploitation Phase (Show Winner)
Test result: Click-through Rate 20% vs. 30%. The new button wins.
• 10,000 impressions/month
• Need 4,000 clicks by EOM
• A 30% CTR won't be enough
ABCDEFG... Test
With N variants, each variant is assigned with probability 1/N.
[Grid of many candidate buttons]
• Need to keep testing (Exploration)
• Need to minimize regret (Exploitation)
Multi-Armed Bandit: a balance of Exploitation & Exploration
Bandit Algorithm Balances Exploitation & Exploration
AB Test: discrete exploration & exploitation phases (Before Test → AB Test → After Test)
Multi-Armed Bandit: continuous exploration & exploitation; over time the bandit favors the winning arm
Bandit Algorithm Reduces the Risk of Testing
AB Test: best arm shown with probability 1/N
• More arms: less exploitation
Bandit: best arm exploited with a probability the algorithm determines
• Reduced exposure to suboptimal arms
Demo
Borrowed from Probabilistic Programming & Bayesian Methods for Hackers
Demo results:
• Split Test: still sending losers
• Bandit: winner breaks away!
• An AB test would have cost 4.3 percentage points
How It Works: Epsilon Greedy Algorithm
ε = probability of exploration. At the start of each round:
• With probability ε, explore: pick an arm at random, so each of the N arms is shown with probability ε/N
• With probability 1 - ε, exploit: show the best arm so far
Epsilon Greedy with ε = 1 is just an AB test.
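A minimal sketch of that selection rule in Python (the function and variable names are illustrative, not from the talk); `estimates` holds each arm's observed CTR so far:

```python
import random

def epsilon_greedy(estimates, epsilon=0.1):
    """Start of round: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        # Exploration: pick uniformly at random, so each of the N arms
        # is shown with probability epsilon / N
        return random.randrange(len(estimates))
    # Exploitation: show the arm with the best estimated CTR so far
    return max(range(len(estimates)), key=lambda i: estimates[i])
```

Setting epsilon=1.0 makes every round an exploration round, which is exactly the AB test case noted above.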
Epsilon Greedy Issues
• Constant epsilon: initially under-explores, later over-explores
• Better if the probability of exploration decreases with sample size (annealing); see the sketch below
• No way to incorporate prior knowledge
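One way to anneal, as a sketch: decay ε with the round number t. The 1/sqrt(t + 1) schedule is a common choice assumed here, not a prescription from the talk:

```python
import math
import random

def annealed_epsilon_greedy(estimates, t):
    """Exploration probability shrinks as more rounds (t) accumulate."""
    epsilon = 1.0 / math.sqrt(t + 1)  # high exploration early, low later
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda i: estimates[i])
```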
Some Alternatives
• Epsilon-First
• Epsilon-Decreasing
• Softmax
• UCB (UCB1, UCB2)
• Bayesian-UCB
• Thompson Sampling (Bayesian Bandits)
Bandit Algorithm Comparison
Regret: the reward given up by not always playing the best arm. After T rounds,
R(T) = T·μ* − Σ_{t=1..T} μ_{a(t)}
where μ* is the best arm's expected reward and a(t) is the arm played in round t. Lower cumulative regret is better.
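A small harness for comparing policies by cumulative expected regret; this is a sketch with assumed Bernoulli arms, and `simulate` plus the policy signature are my own illustration:

```python
import random

def simulate(true_ctrs, select, rounds=10_000):
    """Run a policy against arms with known CTRs; return cumulative regret."""
    n = len(true_ctrs)
    counts = [0] * n
    estimates = [0.0] * n
    best = max(true_ctrs)
    regret = 0.0
    for t in range(rounds):
        arm = select(estimates, t)
        reward = 1 if random.random() < true_ctrs[arm] else 0
        counts[arm] += 1
        # Incremental running mean of observed rewards per arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best - true_ctrs[arm]  # expected regret of this pull
    return regret

# e.g. simulate([0.20, 0.30], lambda est, t: epsilon_greedy(est, epsilon=0.1))
# or   simulate([0.20, 0.30], annealed_epsilon_greedy)
```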
Thompson Sampling
Setup: assign each arm a Beta distribution with parameters (α, β) = (# successes, # failures)
[Three arms, each with its own Beta(α, β)]
Thompson Sampling
Setup: initialize priors with the ignorant state Beta(1,1) (the uniform distribution), or initialize with an informed prior to aid convergence
[All arms start at Beta(1,1)]
Thompson Sampling
For each round:
1. Sample a random variable X from each arm's Beta distribution
2. Select the arm with the largest X
3. Observe the result of the selected arm
4. Update the selected arm's Beta distribution
Round 1: all arms at Beta(1,1); samples X = 0.7, 0.2, 0.4. Arm 1 has the largest X, is shown, and succeeds, so it updates to Beta(2,1).
Round 2: arms at Beta(2,1), Beta(1,1), Beta(1,1); samples X = 0.4, 0.8, 0.2. Arm 2 has the largest X, is shown, and fails, so it updates to Beta(1,2).
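A compact numpy sketch of that loop (the talk's demo comes from Bayesian Methods for Hackers; this version and names like `true_ctrs` are my own illustration against simulated arms):

```python
import numpy as np

def thompson_sampling(true_ctrs, rounds=10_000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(true_ctrs)
    alpha = np.ones(n)  # per-arm successes + 1 (Beta(1,1) ignorant prior)
    beta = np.ones(n)   # per-arm failures + 1
    for _ in range(rounds):
        x = rng.beta(alpha, beta)               # 1: sample X from each arm's Beta
        arm = int(np.argmax(x))                 # 2: select the arm with largest X
        click = rng.random() < true_ctrs[arm]   # 3: observe the result
        if click:                               # 4: update that arm's posterior
            alpha[arm] += 1
        else:
            beta[arm] += 1
    return alpha, beta
```

As evidence accumulates, the winning arm's Beta distribution concentrates at a high CTR, so it wins the sampling step more and more often: exploitation emerges from the same rule that drives exploration.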
Welcome Email - 76 Arms
Control: Welcome To Meetup! - 60% Open Rate
Winner: Hi - 75% Open Rate (+25%)
Coupon Email - 16 Arms
Control: Save 50%, start your Meetup Group - 42% Open Rate
Winner: Here is a coupon - 53% Open Rate (+26%)
210% Click-through Difference
Best: "Looking to start the perfect Meetup for you? We'll help you find just the right people" / "Start the perfect Meetup for you! We'll help you find just the right people"
Worst: "Launch your own Meetup in January and save 50%" / "Start the perfect Meetup for you. 50% off promotion ends February 1st."
Choose the Right Metric of Success
• Success was tied to clicks in the last experiment
• Sale-end & discount messaging performed badly
• Perhaps people don't know that hosting a Meetup costs $$$?
• Better to tie success to group creation
More Issues
• Email opens & clicks arrive with a delay
• New-subject-line (novelty) effect
• A problem when testing notifications
• Monitor success trends to detect weirdness
Seasonality
• Thompson Sampling should naturally adapt to seasonal changes
• A learning rate can be added for faster adaptation; see the sketch below
[Chart: one arm wins during part of the day; the other arm is the winner all other times]
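The talk does not specify how the learning rate works; one common approach, sketched here under that assumption, is to geometrically discount the Beta counts so old evidence fades and the posterior can track drift (the 0.999 decay is an illustrative value, not a number from the talk):

```python
import numpy as np

def decayed_update(alpha, beta, arm, click, decay=0.999):
    """Thompson update step with a forgetting factor for seasonal drift."""
    alpha, beta = alpha * decay, beta * decay  # down-weight past observations
    alpha = np.maximum(alpha, 1.0)             # never drop below the Beta(1,1) prior
    beta = np.maximum(beta, 1.0)
    if click:
        alpha[arm] += 1.0
    else:
        beta[arm] += 1.0
    return alpha, beta
```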
Bandit or Split Test?
AB Test good for:
• Biased tests
• Complicated tests
Bandit good for:
• Unbiased tests
• Many variants
• Time constraints
• Set it and forget it
Thanks! chalpert@meetup.com