Network A/B Testing: From Sampling to Estimation

Ya Xu‡ • Joint work with Huan Gui†, Anmol Bhasin‡, Jiawei Han† • † University of Illinois at Urbana-Champaign, Urbana • ‡ LinkedIn Corporation

Introduction

A/B Testing • Uniformly Random • Control • Treatment • Average Treatment Effect
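As a minimal sketch of the standard setup on this slide, the snippet below assigns units to treatment uniformly at random and estimates the average treatment effect as a difference in means. The function names and the `treat_frac` parameter are illustrative assumptions, not taken from the talk.

```python
import random
from statistics import mean


def assign_uniformly(units, treat_frac=0.5, seed=0):
    """Independently put each unit in treatment (1) with probability treat_frac, else control (0)."""
    rng = random.Random(seed)
    return {u: 1 if rng.random() < treat_frac else 0 for u in units}


def difference_in_means(assignment, outcomes):
    """Estimate the ATE as mean(treated outcomes) - mean(control outcomes)."""
    treated = [outcomes[u] for u, z in assignment.items() if z == 1]
    control = [outcomes[u] for u, z in assignment.items() if z == 0]
    if not treated or not control:
        raise ValueError("both arms need at least one unit")
    return mean(treated) - mean(control)
```

This estimator is only meaningful when each unit's outcome depends on its own assignment alone, which is the assumption the following slides question for networks.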

Assumption • A/B Testing: two parallel universes • Parallel Universe 1 (control) • Parallel Universe 2 (treatment) • Real World (observations)

Network A/B Testing Interactions between nodes in networks

Examples • Experiment on feed ranking algorithms • Treatment feed algorithm ranks more relevant items higher • Adam (treatment) clicks on a feed update (X) • X shows up higher for Adam’s friend Ben (control) • Ben (control) clicks on X • Experiment on People You May Know recommendations • …

Assumption: SUTVA • SUTVA (Stable Unit Treatment Value Assumption) • Treatment Assignment Vector • Response function • Each individual’s response is affected only by their own treatment assignment.

Network A/B Testing Framework

Framework • Experimental Design • Randomize assignment to minimize interactions • Experimental Analysis • Adjust for network effect post experiment

Experimental Design • Partition the network/graph • Randomize at the cluster level • Minimize the links between clusters • Minimize the interactions between treatment and control • Minimize information leakage • Smaller bias for ATE
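A rough sketch of cluster-level randomization, assuming a partition is already available as a node-to-cluster mapping. The helper names and the choice to treat exactly half of the clusters are illustrative assumptions; the cross-arm edge count is one simple proxy for treatment/control interaction.

```python
import random


def randomize_clusters(cluster_of, seed=0):
    """Assign half of the clusters to treatment; every node inherits its cluster's arm.

    cluster_of: dict mapping node -> cluster id (hypothetical input format).
    """
    rng = random.Random(seed)
    clusters = sorted(set(cluster_of.values()))
    rng.shuffle(clusters)
    treated = set(clusters[: len(clusters) // 2])
    return {node: 1 if c in treated else 0 for node, c in cluster_of.items()}


def cross_arm_edges(edges, assignment):
    """Count edges whose endpoints are in different arms (potential leakage paths)."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])
```

Because whole clusters share an arm, only edges between clusters can connect treatment to control, which is why the partition tries to cut as few edges as possible.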

Balanced Graph Partition • If all clusters have the same size, the covariance is zero no matter what the users’ responses are, so the estimator is unbiased. • See Middleton and Aronow (2011) for the derivation.

Clustering Real Network • Heterogeneous & large scale (350MM+) • An employee network from LinkedIn • 3-net clustering (Ugander et al., KDD’13)

Randomized Balanced Graph Partition • Random Shuffling on Label Propagation • Randomly initialize clusters (equal size) • Select two nodes and swap their labels if it results in fewer edges between clusters. • Randomly shuffle x% of labels to break out of local optima. • Repeat until convergence.
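The steps above can be sketched as a small single-machine loop. This is a naive illustration, not the distributed algorithm from the talk: it recomputes the cut from scratch on every swap, runs a fixed number of rounds instead of testing convergence, and returns the best labeling seen (an added assumption). All parameter names are hypothetical.

```python
import random


def cut_size(edges, label):
    """Number of edges whose endpoints carry different cluster labels."""
    return sum(1 for u, v in edges if label[u] != label[v])


def balanced_partition(nodes, edges, k, rounds=20, swaps_per_round=200,
                       shuffle_frac=0.05, seed=0):
    rng = random.Random(seed)
    nodes = list(nodes)
    order = nodes[:]
    rng.shuffle(order)
    # Random equal-size initialization (sizes differ by at most one).
    label = {u: i % k for i, u in enumerate(order)}
    best, best_cut = dict(label), cut_size(edges, label)
    for _ in range(rounds):
        for _ in range(swaps_per_round):
            u, v = rng.sample(nodes, 2)
            if label[u] == label[v]:
                continue
            before = cut_size(edges, label)
            label[u], label[v] = label[v], label[u]
            if cut_size(edges, label) >= before:
                # Not an improvement: undo the swap.
                label[u], label[v] = label[v], label[u]
        current = cut_size(edges, label)
        if current < best_cut:
            best, best_cut = dict(label), current
        # Permute the labels of a random x% of nodes to escape local optima.
        picked = rng.sample(nodes, int(shuffle_frac * len(nodes)))
        shuffled = [label[u] for u in picked]
        rng.shuffle(shuffled)
        for u, lab in zip(picked, shuffled):
            label[u] = lab
    return best
```

Both the swap and the shuffle only exchange labels between nodes, so cluster sizes stay fixed throughout and the partition remains balanced.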

Clustering Results • Network Statistics • Number of edges within each cluster • RSLP can be distributed as easily as the Label Propagation Algorithm, while achieving performance comparable to Modularity Maximization.

Experimental Analysis • Exposure Models • SUTVA • Neighborhood Exposure (Ugander et al., KDD’13) • Definition: i is neighborhood exposed to treatment if (1) i is in treatment, and (2) at least θ% of i’s neighbors are in treatment • Assumption: i’s response under neighborhood exposure is the same as if everyone receives treatment.
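The neighborhood exposure definition can be checked per node as in the sketch below. It assumes θ is given as a fraction in [0, 1] and that a treated node with no neighbors counts as exposed; both choices, along with the function and argument names, are assumptions made for illustration.

```python
def neighborhood_exposed(node, neighbors, assignment, theta):
    """Return True if node is treated and at least theta of its neighbors are treated.

    neighbors: dict mapping node -> list of adjacent nodes (hypothetical format).
    assignment: dict mapping node -> 1 (treatment) or 0 (control).
    theta: threshold as a fraction between 0 and 1.
    """
    if assignment[node] != 1:
        return False
    nbrs = neighbors.get(node, [])
    if not nbrs:
        # Assumption: an isolated treated node is treated as fully exposed.
        return True
    treated_frac = sum(assignment[v] for v in nbrs) / len(nbrs)
    return treated_frac >= theta
```

Nodes that fail this check (and its control-side mirror) are the data points an analysis based on this model would have to discard.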

Bias-Variance Tradeoff • θ = 0.9: about 80% of data points would be invalid (high variance) • θ = 0.3: stronger assumption, Yi(θ = 0.3) = Yi(θ = 1) (large bias)

Fraction Neighborhood Exposure • Users’ responses are determined by • the treatment assignment • the fraction of neighbors having the same treatment assignment • E.g., additive models, whose component functions can be arbitrary
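One way to make this concrete is to assume the response is linear in the same-arm neighbor fraction, with a separate line per arm, and then compare the two fitted lines at fraction 1 (everyone in the same arm). This linear form is an illustrative assumption and is not claimed to match the additive models on the following slides; all names are hypothetical.

```python
from statistics import mean


def same_arm_fraction(node, neighbors, assignment):
    """Fraction of node's neighbors that share node's treatment assignment."""
    nbrs = neighbors.get(node, [])
    if not nbrs:
        return 1.0  # assumption: isolated nodes count as fully same-arm
    return sum(assignment[v] == assignment[node] for v in nbrs) / len(nbrs)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation; slope is not identified")
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b * xbar, b


def estimate_ate(assignment, fraction, outcomes):
    """Fit one line per arm and compare predictions at same-arm fraction 1."""
    predicted = {}
    for arm in (0, 1):
        units = [u for u, z in assignment.items() if z == arm]
        a, b = fit_line([fraction[u] for u in units],
                        [outcomes[u] for u in units])
        predicted[arm] = a + b * 1.0
    return predicted[1] - predicted[0]
```

Unlike a threshold-based exposure rule, this uses every unit's data and extrapolates to the full-treatment and full-control universes, at the cost of relying on the assumed functional form.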

Example • Additive Model I • ATE

Example • Additive Model II • ATE

Simulations & Real Experiments

Simulations • Real network graph • Generation model (Eckles et al. 2014) • Compare bias & variance of five estimators

[Figure: bias and variance of the estimators as the treatment percentage increases]

[Figure: bias and variance of the estimators as the network effect increases]

Real Online Experiment • Select a country • Apply randomized balanced graph partitioning to assign treatment/control • Apply two Feed ranking algorithms to treatment/control • Estimate ATE using various approaches

Real Online Experiment • Picked the Netherlands • 600 clusters, 300/300 in treatment/control • Conducted A/A test to ensure no bias

Real Online Experiments Results

Key Takeaways • Network effect in A/B Testing • Experimental Design: Balanced Graph Partition • Experimental Analysis: Fraction Neighborhood Exposure Model • Experiments • Simulation • Real Online Experiments • Lots of future work!

Percentage of Units in Treatment • The distribution of changes with percentage of units in treatment. is not representative.

Graph Cluster Randomization (Ugander et al., KDD’13) • Partition the social network • How to cluster? Any constraints? • Randomization on the cluster level • Users in the same cluster receive the same treatment assignment (treatment/control). • Estimate Average Treatment Effect • Any assumptions?
