Estimating Rates of Rare Events at Multiple Resolutions. Deepak Agarwal, Andrei Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, Mayssam Sayyadian.
Estimation in the “tail” • Contextual Advertising • Show an ad on a webpage (“impression”) • Revenue is generated if a user clicks • Problem: Estimate the click-through rate (CTR) of an ad on a page • Most (ad, page) pairs have very few impressions, if any, and even fewer clicks • Severe data sparsity
Estimation in the “tail” • Use an existing, well-understood hierarchy • Categorize ads and webpages to leaves of the hierarchy • CTR estimates of siblings are correlated • The hierarchy allows us to aggregate data • Coarser resolutions provide reliable estimates for rare events, which then influence estimation at finer resolutions
System overview • Retrospective data [URL, ad, isClicked] • Crawl a sample of URLs • Classify pages and ads • Rare event estimation using hierarchy • Impute impressions, fix sampling bias
Sampling of webpages • Naïve strategy: sample at random from the set of URLs • Sampling errors in impression volume AND click volume • Instead, we propose: • Crawling all URLs with at least one click, and • a sample of the remaining URLs • Variability is only in impression volume
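The proposed sampling scheme can be sketched as follows. This is a minimal illustration, not the authors' crawler: the function name `sample_urls`, the `rate` parameter, and the returned scale-up factor `1 / rate` are all hypothetical, introduced only to show that clicked URLs are kept in full while the rest are subsampled.

```python
import random

def sample_urls(clicked_urls, other_urls, rate, seed=0):
    """Keep every URL with at least one click; keep each remaining URL
    independently with probability `rate` (hypothetical parameter)."""
    rng = random.Random(seed)
    sampled = [u for u in other_urls if rng.random() < rate]
    # Assumed re-weighting factor for counts from the sampled pool.
    scale = 1.0 / rate
    return list(clicked_urls), sampled, scale

clicked, sampled, scale = sample_urls(["a", "b"], ["c", "d", "e", "f"], rate=0.5)
```

Because no clicked URL is dropped, click counts carry no sampling error under this scheme; only impression counts from the sampled pool need to be scaled up.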
Imputation of impression volume • Grid of page classes (rows) × ad classes (columns) • #impressions in cell ij = nij + mij + xij • nij: clicked pool • mij: sampled non-clicked pool • xij: excess impressions (to be imputed) • Row constraint: each row sums to ∑nij + K·∑mij • Column constraint: each column sums to #impressions on ads of this ad class • All cells together sum to the total impressions (known)
Imputation of impression volume • Region = (page node, ad node) • Region Hierarchy • A cross-product of the page hierarchy and the ad hierarchy • [Figure: a region at level i of the grid of page classes × ad classes, formed from the page hierarchy and the ad hierarchy, starting at level 0]
Imputation of impression volume • Block constraint: the regions of a block at level i+1 sum to their parent region at level i
Imputing xij • Iterative Proportional Fitting [Darroch+/1972] • Initialize xij = nij + mij • Iteratively scale xij values to match the row/column/block constraints • Ordering of constraints: top-down, then bottom-up, and repeat
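A minimal sketch of iterative proportional fitting for a single level, assuming only row and column targets (the block constraint between levels is left out here). The function name `ipf`, its parameters, and the stopping rule are illustrative choices, not taken from the slides; the sketch also assumes every row and column of the starting matrix has a positive sum and that the two target vectors have the same total.

```python
import numpy as np

def ipf(x, row_targets, col_targets, iters=100, tol=1e-9):
    """Alternately rescale rows and columns of x until both sets of
    marginal sums match their targets."""
    x = np.asarray(x, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(iters):
        # Scale each row so that it sums to its row target.
        x *= (row_targets / x.sum(axis=1))[:, None]
        # Scale each column so that it sums to its column target.
        x *= (col_targets / x.sum(axis=0))[None, :]
        # Column sums are now exact; stop once the row sums also agree.
        if np.allclose(x.sum(axis=1), row_targets, atol=tol):
            break
    return x

fitted = ipf([[1.0, 2.0], [3.0, 4.0]], row_targets=[30.0, 70.0], col_targets=[40.0, 60.0])
```

Each scaling step preserves zeros and the relative structure inside a row or column, which is why the starting values matter.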
Imputation: Summary • Given • nij (impressions in clicked pool) • mij (impressions in sampled non-clicked pool) • # impressions on ads of each ad class in the ad hierarchy • We get • Estimated impression volume Ñij = nij + mij + xij in each region ij of every level
System overview • Retrospective data [page, ad, isClicked] • Crawl a sample of pages • Classify pages and ads • Rare event estimation using hierarchy • Impute impressions, fix sampling bias
Rare rate modeling • Freeman-Tukey transform: • yij = F-T(clicks and impressions at ij) ≈ transformed CTR • Variance stabilizing transformation: Var(y) is independent of E[y], which is needed in further modeling
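The slide does not spell out the transform. The sketch below assumes one common form of the Freeman-Tukey square-root transform applied to a rate, y = ½(√(c/N) + √((c+1)/N)) for c clicks and N impressions; the function name and this exact formula are assumptions, not taken from the slides.

```python
import numpy as np

def freeman_tukey(clicks, impressions):
    """Assumed Freeman-Tukey form: average of sqrt(c/N) and sqrt((c+1)/N)."""
    c = np.asarray(clicks, dtype=float)
    n = np.asarray(impressions, dtype=float)
    return 0.5 * (np.sqrt(c / n) + np.sqrt((c + 1.0) / n))

# Two zero-click regions with different impression volumes.
y = freeman_tukey([0, 0], [100, 10000])
```

Under this form, a region with zero clicks still gets a value that shrinks as impressions grow, so zero-click regions are not all mapped to the same number.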
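Rare rate modeling • Generative Model (Tree-structured Markov Model) • Unobserved “state” Sij in each region, tied to the parent state Sparent(ij) with variance Wij • Observation yij depends on Sij and on the covariates (coefficients βij), with variance Vij • The same structure repeats at the parent: yparent(ij), βparent(ij), Vparent(ij), Wparent(ij)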
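The slide gives only a diagram. One linear-Gaussian form consistent with its labels is written out below as an assumption; the covariate vector u_ij is a hypothetical symbol introduced here, and the additive structure is inferred from the use of a Kalman filter rather than stated in the slides.

```latex
\begin{align}
  y_{ij} &= u_{ij}^{\top}\beta_{ij} + S_{ij} + \epsilon_{ij},
         & \epsilon_{ij} &\sim \mathcal{N}(0, V_{ij}) \\
  S_{ij} &= S_{\mathrm{parent}(ij)} + w_{ij},
         & w_{ij} &\sim \mathcal{N}(0, W_{ij})
\end{align}
```

Read this way, V controls how noisy each observed transformed CTR is, and W controls how far a region's state may drift from its parent's.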
Rare rate modeling • Model fitting with a 2-pass Kalman filter: • Filtering: Leaf to root • Smoothing: Root to leaf • Linear in the number of regions
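The two passes can be illustrated on a simplified scalar model. The sketch assumes each state equals its parent's state plus noise of variance W, each observation equals its state plus noise of variance V, covariates are omitted, and the root has prior N(0, W[0]). It is written as Gaussian message passing rather than the authors' Kalman recursions; for this model both sweeps yield the exact posterior mean and variance of every state. The function name and the convention that nodes are numbered so each parent precedes its children are assumptions.

```python
import numpy as np

def tree_smooth(parent, y, V, W):
    """parent[r] is the parent index of node r (-1 for the root, node 0).
    Nodes must be ordered so that parent[r] < r."""
    y, V, W = (np.asarray(a, dtype=float) for a in (y, V, W))
    n = len(y)
    J = 1.0 / V              # precision contributed by each observation
    h = y / V                # precision-weighted observation
    msg_J = np.zeros(n)
    msg_h = np.zeros(n)
    # Filtering: leaves to root. Each node sends its parent a summary
    # of all data in its subtree.
    for r in range(n - 1, 0, -1):
        shrink = 1.0 + J[r] * W[r]
        msg_J[r] = J[r] / shrink
        msg_h[r] = h[r] / shrink
        J[parent[r]] += msg_J[r]
        h[parent[r]] += msg_h[r]
    mean = np.zeros(n)
    var = np.zeros(n)
    var[0] = 1.0 / (J[0] + 1.0 / W[0])
    mean[0] = var[0] * h[0]
    # Smoothing: root to leaves. Each node combines its subtree summary
    # with what the rest of the tree says about its parent.
    for r in range(1, n):
        p = parent[r]
        cav_J = 1.0 / var[p] - msg_J[r]      # parent belief without r's message
        cav_h = mean[p] / var[p] - msg_h[r]
        prior_var = 1.0 / cav_J + W[r]
        prior_mean = cav_h / cav_J
        var[r] = 1.0 / (J[r] + 1.0 / prior_var)
        mean[r] = var[r] * (h[r] + prior_mean / prior_var)
    return mean, var

mean, var = tree_smooth([-1, 0, 0, 1, 1],
                        y=[0.1, 0.3, -0.2, 0.5, 0.0],
                        V=[0.5, 1.0, 2.0, 0.7, 1.5],
                        W=[1.0, 0.4, 0.6, 0.2, 0.9])
```

Each node is visited once per pass and does a constant amount of work, so the cost is linear in the number of regions.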
Experiments • 503M impressions • 7-level hierarchy, of which the top 3 levels were used • Zero clicks in • 76% of regions in level 2 • 95% of regions in level 3 • Full dataset DFULL, and a 2/3 sample DSAMPLE
Experiments • Estimate CTRs for all regions R in level 3 with zero clicks in DSAMPLE • Some of these regions, R>0, get clicks in DFULL • A good model should predict higher CTRs for R>0 than for the other regions in R
Experiments • We compared 4 models • TS: our tree-structured model • LM (level-mean): each level smoothed independently • NS (no smoothing): CTR proportional to 1/Ñ • Random: Assuming |R>0| is given, randomly predict the membership of R>0 out of R
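One plausible way to score such a comparison is sketched below. This metric is an assumption, since the slides do not state how the models were scored: rank the zero-click regions by predicted CTR, take the top |R>0| of them, and report the fraction that really belong to R>0. The function name `topk_hits` is hypothetical.

```python
import numpy as np

def topk_hits(pred_ctr, got_clicks):
    """Fraction of the k highest-predicted regions that truly received
    clicks in the full data, where k is the number of such regions."""
    pred = np.asarray(pred_ctr, dtype=float)
    truth = np.asarray(got_clicks, dtype=bool)
    k = int(truth.sum())
    if k == 0:
        return 0.0
    top = np.argsort(-pred, kind="stable")[:k]
    return float(truth[top].sum()) / k

score = topk_hits([0.9, 0.1, 0.8, 0.2, 0.05, 0.01],
                  [True, False, True, False, False, False])
```

For the Random model, which picks k regions uniformly out of R, the expected value of this fraction is k / |R|, which gives a baseline to compare against.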
Experiments • [Figure: comparison of the TS, Random, LM, and NS models]
Experiments • Few impressions: estimates depend more on siblings • Enough impressions: little “borrowing” from siblings
Related Work • Multi-resolution modeling • studied in time series modeling and spatial statistics [Openshaw+/79, Cressie/90, Chou+/94] • Imputation • studied in statistics [Darroch+/1972] • Application of such models to estimation of such rare events (rates of ~10^-3) is novel
Conclusions • We presented a method to estimate • rates of extremely rare events • at multiple resolutions • under severe sparsity constraints • Our method has two parts • Imputation: incorporates hierarchy, fixes sampling bias • Tree-structured generative model: extremely fast parameter fitting
Rare rate modeling • Freeman-Tukey transform • Computed from # clicks in region r and # impressions in region r • Distinguishes between regions with zero clicks based on the number of impressions • Variance stabilizing transformation: Var(y) is independent of E[y], which is needed in further modeling
Rare rate modeling • Generative Model • Sij values can be quickly estimated using a Kalman filtering algorithm • Kalman filter requires knowledge of β, V, and W • EM wrapped around the Kalman filter
Rare rate modeling • Fitting using a Kalman filtering algorithm • Filtering: Recursively aggregate data from leaves to root • Smoothing: Propagate information from root to leaves • Complexity: linear in the number of regions, for both time and space
Rare rate modeling • Fitting using a Kalman filtering algorithm • Filtering: Recursively aggregate data from leaves to root • Smoothing: Propagate information from root to leaves • Kalman filter requires knowledge of β, V, and W • EM wrapped around the Kalman filter
Imputing xij • Iterative Proportional Fitting [Darroch+/1972] • Initialize xij = nij + mij • Top-down: • Scale all xij in every block in Z(i+1) to sum to its parent in Z(i) • Scale all xij in Z(i+1) to sum to the row totals • Scale all xij in Z(i+1) to sum to the column totals • Repeat for every level Z(i) • Bottom-up: Similar
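The first top-down step, scaling each block of Z(i+1) to sum to its parent in Z(i), can be sketched as below. The function name `scale_to_parent` and the index arrays `row_of` / `col_of`, which map each fine row and column to its parent row and column, are hypothetical conventions chosen for the illustration.

```python
import numpy as np

def scale_to_parent(child, parent, row_of, col_of):
    """Rescale every block of the fine-level matrix `child` so that it
    sums to the matching cell of the coarse-level matrix `parent`."""
    child = np.asarray(child, dtype=float)
    parent = np.asarray(parent, dtype=float)
    rows = np.asarray(row_of)[:, None]
    cols = np.asarray(col_of)[None, :]
    # Current sum of each block, accumulated into the parent's shape.
    block_sum = np.zeros_like(parent)
    np.add.at(block_sum, (rows, cols), child)
    # Per-block scaling factor; empty blocks are left at zero.
    factor = np.divide(parent, block_sum,
                       out=np.zeros_like(block_sum), where=block_sum > 0)
    return child * factor[rows, cols]

scaled = scale_to_parent(np.ones((4, 4)),
                         parent=[[8.0, 4.0], [2.0, 6.0]],
                         row_of=[0, 0, 1, 1],
                         col_of=[0, 0, 1, 1])
```

The row and column scalings at the same level would follow this step, and the bottom-up pass would apply the analogous adjustment in the other direction.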