240 likes | 349 Views
Predicting Click Through Rate for Job Listings. Manish Gupta Yahoo! HotJobs Jan 22, 2009. CTR and its applications. CTR = Ratio of clicks to get full description of entity to views of a reduced version Rank results Impacts publisher revenue in pay for perf models Bidding in ad exchanges
E N D
Predicting Click Through Rate for Job Listings Manish Gupta Yahoo! HotJobs Jan 22, 2009
CTR and its applications • CTR = Ratio of clicks to get full description of entity to views of a reduced version • Rank results • Impacts publisher revenue in pay for perf models • Bidding in ad exchanges • Trends can help detect click frauds
CTR for new job listings • Avg CTR = 2.29% • MLE would have high variance
Related work • Regelson and Fain • Estimate CTR using topic clusters (job categories) • Richardson et. al. • Describe features for predicting CTR for ads. • Our baseline: avg CTR for a test job (2.29%)
Refined Problem definition • Ideal: Predict CTR(job j, position p, user cluster u, context c) Data sparsity Huge feature vector • Predict CTR(job) Use CTR versus position curve • Predict CTR(job, position)
Data set • Used HotJobs data from Aug 11, 2008 to Aug 31, 2008 to predict CTR of jobs on Sep 1, 2008 • 40K jobs from 7k+ companies • 32K train set and 8K as test set • Jobs have location, company name, category, creation date, posting date, optional position wise click history, job source, title, snippet & job description.
Different models • Weka: Linear Regression and SMOReg • Treenet: Gradient Boosted Decision Trees • Feature selection: • Weka: wrapper with evaluator=linear regression and search=GreedyStepwise • Treenet: Variable importance metrics
Features • Features from Similar Jobs (60) • CTR of jobs with same title/company/state/city+state/category and their cardinalities posted in past one/two weeks or all jobs based on the click history of past one/two/three weeks • Features from Related Jobs (288) • CTR_mn of related jobs with m= |A-B| and n=|B-A| and cardinalities (0 ≤m,n≤ 5) posted in past one/two weeks or all jobs based on the click history of past one/two/three weeks
Features • Job Title Features (11) • #words, #capitalized words, isAllCaps, hasHighPunct, hasLongWords, hasNumbers, vocabulory features • Daily CTR Features for past 3 weeks (21) • Other Features (10) • Job Category, age, location specificity, job source, and job description page features • Other potential features • high-marketing-pitch words, brand value of company, spam feedback, seasonal variations
Experiments and results • Baseline: Predict avg CTR for a test job (2.29%) • Predicting avg - category-wise – CTR (A) • Linear Regression over 390 features (B) – uses only 142 regressors. • GBDT using Treenet over 390 features (C) – uses 300 regressors. (at 256_600_0.01_100)
Important features • Similar Jobs features • Same company, title, city+state using 1 week click history • Others features • Creation date, job description page size, date of update, posting date, job category • Related Jobs features • Related_11, related_12 jobs posted in past 1/3 weeks over 1/3 week click history
Wrapper based feature selection with linear regression and with Treenet’s variable importance (E) -11 features. Pruning the feature set
Linear regression with 369 features (F) – uses 187 regressors. • Treenet uses 282 regressors at 256_600_0.01_20 (G) In absence of click history …
None of the sets alone helps! Analysis of regressor distribution
More features • Dyadic models to predict user-personalized CTR with (job feature vector, user feature vector) dyads. • Auto model updates to correct model drift • We built a machine learning system to predict CTR for job listings and presented our results using various regression metrics. Conclusion and future work