Network A/B Testing: From Sampling to Estimation • Ya Xu‡ • Joint work with Huan Gui†, Anmol Bhasin‡, Jiawei Han† • † University of Illinois at Urbana-Champaign, Urbana • ‡ LinkedIn Corporation
A/B Testing • Uniformly Random • Control • Treatment • Average Treatment Effect
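A minimal sketch of the classic setup on this slide: users are assigned uniformly at random and the Average Treatment Effect is estimated as a difference in means. The function names, the toy outcomes, and the 50/50 split are illustrative assumptions, not taken from the talk.

```python
import random
from statistics import mean

def assign_uniformly(user_ids, treatment_fraction=0.5, seed=0):
    """Randomly assign each user to treatment (1) or control (0)."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    n_treated = int(round(len(shuffled) * treatment_fraction))
    treated = set(shuffled[:n_treated])
    return {u: int(u in treated) for u in user_ids}

def difference_in_means(assignment, outcomes):
    """Estimate the ATE as mean(treatment outcomes) - mean(control outcomes)."""
    treated = [outcomes[u] for u, z in assignment.items() if z == 1]
    control = [outcomes[u] for u, z in assignment.items() if z == 0]
    return mean(treated) - mean(control)

# Toy data (assumption): baseline 10.0, every treated user gains exactly 2.0,
# and nobody is influenced by anyone else.
users = list(range(1000))
assignment = assign_uniformly(users)
outcomes = {u: 10.0 + 2.0 * assignment[u] for u in users}
ate_hat = difference_in_means(assignment, outcomes)
```

This estimator is only trustworthy when a user's outcome does not depend on other users' assignments, which is the assumption the following slides call into question.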
Assumption: A/B Testing – Two parallel universes • Parallel Universe 1 (control) • Parallel Universe 2 (treatment) • Real World (observations)
Network A/B Testing Interactions between nodes in networks
Examples • Experiment on feed ranking algorithms • Treatment feed algorithm ranks more relevant items higher • Adam (treatment) clicks on a feed update (X) • X shows up higher for Adam’s friend Ben (control) • Ben (control) clicks on X • Experiment on People You May Know recommendations • …
Assumption: SUTVA • SUTVA (Stable Unit Treatment Value Assumption) • Treatment assignment vector • Response function • Each individual’s response is affected only by their own treatment assignment.
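The symbols on this slide did not survive extraction. One common way to write the assumption is sketched below; the notation (Z for the assignment vector, Y_i for user i's response function) is assumed here and may differ from the talk's.

```latex
\[
  Z = (Z_1, \dots, Z_n) \in \{0,1\}^n, \qquad
  Y_i(Z) = Y_i(Z_i) \quad \text{for all } i \quad \text{(SUTVA)},
\]
\[
  \mathrm{ATE} = \frac{1}{n} \sum_{i=1}^{n}
    \bigl[\, Y_i(Z = \mathbf{1}) - Y_i(Z = \mathbf{0}) \,\bigr].
\]
```

In the feed example, Ben's response depends on Adam's assignment, so Y_Ben(Z) is not a function of Z_Ben alone and SUTVA fails.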
Framework • Experimental Design • Randomize assignment to minimize interactions • Experimental Analysis • Adjust for network effect post experiment
Experimental Design • Partition the network/graph • Randomize at cluster level • Minimize the links between clusters • Minimize the interactions between treatment and control • Minimize information leakage • Smaller bias for ATE
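The design step can be sketched as follows, assuming a partition is already available as a user-to-cluster mapping. The helper names, the input format, and the use of cross-arm edge count as a proxy for leakage are assumptions made for illustration.

```python
import random

def randomize_clusters(cluster_of, treatment_fraction=0.5, seed=0):
    """Assign whole clusters to treatment (1) or control (0).

    cluster_of: dict mapping user id -> cluster id (hypothetical input format).
    Returns a dict mapping user id -> 0/1; users inherit their cluster's arm.
    """
    rng = random.Random(seed)
    clusters = sorted(set(cluster_of.values()))
    rng.shuffle(clusters)
    n_treated = int(round(len(clusters) * treatment_fraction))
    treated_clusters = set(clusters[:n_treated])
    return {u: int(c in treated_clusters) for u, c in cluster_of.items()}

def cross_arm_edges(edges, assignment):
    """Count edges whose endpoints sit in different arms (a leakage proxy)."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# Toy graph (assumption): four triangles chained together by single bridges.
cluster_of = {u: u // 3 for u in range(12)}
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
         (6, 7), (7, 8), (6, 8), (9, 10), (10, 11), (9, 11),
         (2, 3), (5, 6), (8, 9)]
assignment = randomize_clusters(cluster_of)
leaky = cross_arm_edges(edges, assignment)
```

Because the clusters follow the triangles, only the three bridging edges can ever connect a treated user to a control user, whatever the random draw.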
Balanced Graph Partition • If the cluster sizes are the same for all clusters • No matter what users’ responses are, the covariance is zero, leading to an unbiased estimator. • See Middleton and Aronow 2011 for the derivation.
Clustering Real Network • Heterogeneous & large scale (350MM+) • An employee network from LinkedIn • 3-net clustering (Ugander et al., KDD’13)
Randomized Balanced Graph Partition • Random Shuffling on Label Propagation • Randomly initialize clusters (equal size) • Select two nodes and swap their labels if it results in fewer edges between clusters. • Randomly shuffle x% of labels to break out of local optima. • Repeat until convergence.
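The steps above can be sketched in a single-machine form. This is an illustrative reading of the slide, not the authors' implementation: the fixed number of rounds (in place of a convergence test), the random choice of swap candidates, the best-so-far bookkeeping, and all names and parameters are assumptions.

```python
import random

def cut_size(adj, label):
    """Number of edges whose endpoints carry different cluster labels."""
    return sum(1 for u in adj for v in adj[u] if u < v and label[u] != label[v])

def local_cut(adj, label, nodes):
    """Cut edges incident to the given nodes (enough to compare a swap)."""
    return sum(1 for n in nodes for m in adj[n] if label[n] != label[m])

def rslp(adj, k, shuffle_frac=0.05, rounds=20, swaps_per_round=2000, seed=0):
    rng = random.Random(seed)
    nodes = sorted(adj)
    # Random balanced initialization: cluster sizes differ by at most one.
    order = nodes[:]
    rng.shuffle(order)
    label = {n: i % k for i, n in enumerate(order)}
    best, best_cut = dict(label), cut_size(adj, label)
    for _ in range(rounds):
        # Pairwise swaps: keep a swap only if it strictly reduces the cut.
        # Swapping labels never changes cluster sizes.
        for _ in range(swaps_per_round):
            u, v = rng.sample(nodes, 2)
            if label[u] == label[v]:
                continue
            before = local_cut(adj, label, (u, v))
            label[u], label[v] = label[v], label[u]
            if local_cut(adj, label, (u, v)) >= before:
                label[u], label[v] = label[v], label[u]  # undo the swap
        current = cut_size(adj, label)
        if current < best_cut:
            best, best_cut = dict(label), current
        # Random shuffle: permute labels within a random subset of nodes,
        # which perturbs the solution while preserving cluster sizes.
        subset = rng.sample(nodes, int(len(nodes) * shuffle_frac))
        shuffled = [label[n] for n in subset]
        rng.shuffle(shuffled)
        for n, lab in zip(subset, shuffled):
            label[n] = lab
    return best

# Toy graph (assumption): two 4-cliques joined by a single edge.
adj = {n: set() for n in range(8)}
for group in (range(0, 4), range(4, 8)):
    for a in group:
        for b in group:
            if a != b:
                adj[a].add(b)
adj[3].add(4)
adj[4].add(3)
labels = rslp(adj, k=2, shuffle_frac=0.25)
```

On this toy graph the only edge left between clusters should be the bridge between the two cliques. A production version would distribute the swap step, which this sketch does not attempt.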
Clustering Results • Network Statistics • # of edges within each cluster • RSLP can be distributed as easily as the Label Propagation Algorithm, while achieving performance comparable to Modularity Maximization.
Experimental Analysis • Exposure Models • SUTVA • Neighborhood Exposure (Ugander et al., KDD’13) • Definition: i is neighborhood exposed to treatment if (1) i is in treatment, and (2) at least a fraction θ of i’s neighbors are in treatment • Assumption: i’s response under neighborhood exposure is the same as if everyone receives treatment.
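The neighborhood exposure definition can be checked mechanically. The sketch below assumes an adjacency-list graph and a 0/1 assignment dict; treating users with no neighbors as not exposed is an assumption of this sketch, since the slide does not cover that case.

```python
def treated_neighbor_fraction(adj, assignment, i):
    """Fraction of i's neighbors that are in treatment."""
    neighbors = adj[i]
    return sum(assignment[j] for j in neighbors) / len(neighbors)

def neighborhood_exposed(adj, assignment, theta):
    """Users in treatment with at least a fraction theta of treated neighbors.

    Users with no neighbors are skipped (assumption of this sketch).
    """
    return {
        i for i in adj
        if assignment[i] == 1
        and adj[i]
        and treated_neighbor_fraction(adj, assignment, i) >= theta
    }

# Toy graph (assumption): user 0 is a hub; users 3 and 4 hang off it.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
assignment = {0: 1, 1: 1, 2: 1, 3: 0, 4: 1}

strict = neighborhood_exposed(adj, assignment, theta=0.9)   # {1, 2}
loose = neighborhood_exposed(adj, assignment, theta=0.3)    # {0, 1, 2}
```

Raising θ shrinks the set of usable treated users, while lowering it admits users such as user 0 whose neighborhood is only partly treated.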
Bias-Variance Tradeoff • θ = 0.9: about 80% of data points would be invalid (high variance) • θ = 0.3: stronger assumption, Y_i(θ = 0.3) = Y_i(θ = 1) (large bias)
Fraction Neighborhood Exposure • Users’ responses are determined by • the treatment assignment • the fraction of neighbors having the same treatment assignment • E.g., additive models; the function can be arbitrary
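One way to turn this idea into an estimator is sketched below, under a strong simplifying assumption that is not from the slides: within each arm, the response is linear in the fraction of neighbors sharing the user's assignment. A line is fitted per arm and both are evaluated at fraction 1, which corresponds to the "everyone treated" and "everyone in control" universes. All names and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def same_arm_fraction(adj, assignment, i):
    """Fraction of i's neighbors that share i's treatment assignment."""
    neighbors = adj[i]
    return sum(assignment[j] == assignment[i] for j in neighbors) / len(neighbors)

def ate_fraction_exposure(adj, assignment, outcomes):
    """Predict each arm's response at same-arm fraction 1 and take the difference."""
    predicted = {}
    for arm in (0, 1):
        users = [i for i in adj if assignment[i] == arm and adj[i]]
        xs = [same_arm_fraction(adj, assignment, i) for i in users]
        ys = [outcomes[i] for i in users]
        intercept, slope = fit_line(xs, ys)
        predicted[arm] = intercept + slope * 1.0
    return predicted[1] - predicted[0]

# Toy data (assumption): ring of 8 users with noise-free linear responses.
adj = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
assignment = dict(enumerate([1, 1, 1, 0, 0, 0, 1, 0]))
outcomes = {}
for i, z in assignment.items():
    sigma = same_arm_fraction(adj, assignment, i)
    outcomes[i] = 1.0 + 0.5 * z + (0.4 if z else 0.2) * sigma

ate_hat = ate_fraction_exposure(adj, assignment, outcomes)
treated = [outcomes[i] for i in adj if assignment[i] == 1]
control = [outcomes[i] for i in adj if assignment[i] == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)
```

In this toy setup the true effect of moving everyone from control to treatment is 0.7. The fitted model recovers it, whereas the plain difference in means gives about 0.6 because most users are only partially surrounded by their own arm.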
Example • Additive Model I • ATE
Example • Additive Model II • ATE
Simulations • Real network graph • Generation model (Eckles et al. 2014) • Compare bias & variance of five estimators
Simulation results (figure): bias and variance of the estimators as the treatment % increases
Simulation results (figure): bias and variance of the estimators as the network effect increases
Real Online Experiment • Select a country • Apply randomized balanced graph partitioning to assign treatment/control • Apply two Feed ranking algorithms to treatment/control • Estimate ATE using various approaches
Real Online Experiment • Picked the Netherlands • 600 clusters, 300/300 in treatment/control • Conducted A/A test to ensure no bias
Real Online Experiments Results
Key Takeaways • Network effect in A/B Testing • Experimental Design: Balanced Graph Partition • Experimental Analysis: Fraction Neighborhood Exposure Model • Experiments • Simulation • Real Online Experiments • Lots of future work!
Percentage of Units in Treatment • The distribution of changes with percentage of units in treatment. is not representative.
Graph Cluster Randomization (Ugander et al., KDD’13) • Partition the social network • How to cluster? Any constraints? • Randomization on the cluster level • Users in the same cluster receive the same treatment assignment (treatment/control). • Estimate Average Treatment Effect • Any assumptions?