This study focuses on predicting click-through rates (CTR) for job listings using machine learning techniques. The research analyzes data from Yahoo! HotJobs to build models that can estimate CTRs based on various features of job listings. The study explores the impact of different models, feature selection methods, and experiments conducted to improve prediction accuracy. Results indicate the effectiveness of gradient boosted decision trees and linear regression models in predicting CTRs. The analysis of important features such as job title, category, and related job histories provides insights for better CTR prediction. The study also discusses the challenges of data sparsity and the importance of continuous model updates to address model drift. Overall, this research contributes to the optimization of job listing performance through improved CTR predictions.
Predicting Click Through Rate for Job Listings • Manish Gupta • Yahoo! HotJobs • Jan 22, 2009
CTR and its applications • CTR = ratio of clicks (to get the full description of an entity) to views of its reduced version • Used to rank results • Impacts publisher revenue in pay-for-performance models • Bidding in ad exchanges • Trends can help detect click fraud
CTR for new job listings • Avg CTR = 2.29% • The MLE (clicks / views) would have high variance for a new listing
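To illustrate why the MLE is noisy for a new listing, here is a minimal sketch. It assumes clicks follow a binomial model, so the standard error of clicks / views is sqrt(p(1 - p) / views). The 2.29% figure is from the slide; the view counts and function names are made up for illustration.

```python
import math

AVG_CTR = 0.0229  # average CTR reported on the slide


def mle_ctr(clicks: int, views: int) -> float:
    """Maximum-likelihood CTR estimate: clicks divided by views."""
    if views <= 0:
        raise ValueError("views must be positive")
    return clicks / views


def mle_std_error(true_ctr: float, views: int) -> float:
    """Standard error of the MLE under a binomial click model (an assumption)."""
    return math.sqrt(true_ctr * (1.0 - true_ctr) / views)


# Hypothetical view counts: the fewer the views, the noisier the estimate.
for views in (50, 500, 5_000, 50_000):
    se = mle_std_error(AVG_CTR, views)
    print(f"views={views:>6}  std error={se:.5f}  relative={se / AVG_CTR:.2f}")
```

With only a few dozen views the standard error is about as large as the average CTR itself, which is one way to read the motivation for predicting CTR from features instead.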
Related work • Regelson and Fain: estimate CTR using topic clusters (job categories) • Richardson et al.: describe features for predicting CTR for ads • Our baseline: avg CTR for a test job (2.29%)
Refined problem definition • Ideal: predict CTR(job j, position p, user cluster u, context c); problems: data sparsity, huge feature vector • Predict CTR(job); use the CTR versus position curve • Predict CTR(job, position)
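The slide does not say how the CTR versus position curve is combined with the per-job prediction. One possible reading, sketched below, is a multiplicative correction: each position gets a multiplier equal to its CTR divided by the overall CTR, and the job-level prediction is scaled by it. The multiplicative form, the function names, and all numbers are assumptions for illustration.

```python
def position_multipliers(position_ctr: dict, overall_ctr: float) -> dict:
    """Turn a CTR-versus-position curve into per-position multipliers."""
    if overall_ctr <= 0:
        raise ValueError("overall_ctr must be positive")
    return {pos: ctr / overall_ctr for pos, ctr in position_ctr.items()}


def ctr_at_position(job_ctr: float, position: int, multipliers: dict) -> float:
    """Scale a position-independent job CTR by the multiplier of a position."""
    return job_ctr * multipliers[position]


# Hypothetical curve: CTR observed at result positions 1 to 3.
curve = {1: 0.04, 2: 0.02, 3: 0.01}
multipliers = position_multipliers(curve, overall_ctr=0.02)
print(ctr_at_position(0.03, 1, multipliers))
```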
Data set • Used HotJobs data from Aug 11, 2008 to Aug 31, 2008 to predict CTR of jobs on Sep 1, 2008 • 40K jobs from 7K+ companies • 32K jobs as train set and 8K as test set • Jobs have location, company name, category, creation date, posting date, optional position-wise click history, job source, title, snippet, and job description.
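The 32K/8K split corresponds to an 80/20 partition of the 40K jobs. The slide does not say how jobs were assigned to each set; the sketch below assumes a seeded random shuffle, and the integer job IDs are placeholders.

```python
import random


def split_train_test(job_ids, test_fraction=0.2, seed=0):
    """Shuffle job IDs reproducibly and split them into (train, test)."""
    shuffled = list(job_ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder IDs standing in for the 40K jobs.
train_ids, test_ids = split_train_test(range(40_000))
print(len(train_ids), len(test_ids))
```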
Different models • Weka: Linear Regression and SMOReg • Treenet: Gradient Boosted Decision Trees • Feature selection: • Weka: wrapper with evaluator=linear regression and search=GreedyStepwise • Treenet: Variable importance metrics
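For readers unfamiliar with gradient boosted decision trees, the idea can be sketched in a few lines. This is not Treenet and makes no claim about its internals or settings: it is a toy booster for squared loss that uses depth-1 trees (stumps), where each round fits a stump to the current residuals and adds a shrunken copy to the model. All names, defaults, and the tiny data set are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(xs[0])):
        values = sorted(set(row[j] for row in xs))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(xs, residuals) if row[j] <= thr]
            right = [r for row, r in zip(xs, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return None if best is None else best[1:]


def fit_gbdt(xs, ys, n_trees=100, learning_rate=0.1):
    """Boost stumps on the residuals of a squared-loss regression."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        if stump is None:  # no feature has two distinct values
            break
        j, thr, lm, rm = stump
        stumps.append(stump)
        preds = [p + learning_rate * (lm if row[j] <= thr else rm)
                 for row, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict_gbdt(model, row):
    """Sum the base value and the shrunken output of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        lm if row[j] <= thr else rm for j, thr, lm, rm in stumps)


# Hypothetical one-feature data: low CTR for small x, higher CTR for large x.
toy_x = [[0], [1], [2], [3]]
toy_y = [0.01, 0.01, 0.05, 0.05]
toy_model = fit_gbdt(toy_x, toy_y)
print(predict_gbdt(toy_model, [0]), predict_gbdt(toy_model, [3]))
```

A production system would use deeper trees and tuned shrinkage; the sketch only shows the residual-fitting loop.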
Features • Features from Similar Jobs (60) • CTR of jobs with the same title/company/state/city+state/category, and their cardinalities, for jobs posted in the past one/two weeks or all jobs, based on the click history of the past one/two/three weeks • Features from Related Jobs (288) • CTR_mn of related jobs with m = |A-B| and n = |B-A|, and their cardinalities (0 ≤ m, n ≤ 5), for jobs posted in the past one/two weeks or all jobs, based on the click history of the past one/two/three weeks
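The slide does not define the sets A and B. A minimal sketch under one plausible reading: A is the word set of the target job's title, B is the word set of another job's title, and the two jobs are related at level (m, n) when m = |A-B| and n = |B-A|. Requiring at least one shared word is an extra assumption, as are the function names and the sample history.

```python
def title_words(title: str) -> set:
    """Lower-cased word set of a title (assumed definition of A and B)."""
    return set(title.lower().split())


def relatedness(title_a: str, title_b: str) -> tuple:
    """Return (m, n) = (|A-B|, |B-A|) for two titles."""
    a, b = title_words(title_a), title_words(title_b)
    return len(a - b), len(b - a)


def related_ctr(target_title, history, m, n):
    """Aggregate CTR and cardinality over jobs related at level (m, n).

    history is a list of (title, clicks, views) tuples.
    Returns (ctr, count); ctr is None when no related job has views.
    """
    target = title_words(target_title)
    clicks = views = count = 0
    for title, c, v in history:
        shares_word = bool(target & title_words(title))  # assumption
        if shares_word and relatedness(target_title, title) == (m, n):
            clicks, views, count = clicks + c, views + v, count + 1
    return (clicks / views if views else None), count


# Hypothetical click history.
history = [("Java Engineer", 3, 100),
           ("Python Engineer", 5, 100),
           ("Java Architect", 1, 100)]
print(related_ctr("Java Developer", history, m=1, n=1))
```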
Features • Job Title Features (11) • #words, #capitalized words, isAllCaps, hasHighPunct, hasLongWords, hasNumbers, vocabulary features • Daily CTR Features for past 3 weeks (21) • Other Features (10) • Job category, age, location specificity, job source, and job description page features • Other potential features • High-marketing-pitch words, brand value of company, spam feedback, seasonal variations
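Several of the job title features can be computed directly from the title string. The sketch below covers six of them; the slide gives no thresholds, so the cut-offs for hasHighPunct and hasLongWords (and the parameter names) are assumptions, and the vocabulary features are omitted because they are not described.

```python
import string


def job_title_features(title: str, long_word_len: int = 12,
                       punct_ratio: float = 0.1) -> dict:
    """Compute simple surface features of a job title.

    long_word_len and punct_ratio are assumed thresholds, not from the source.
    """
    words = title.split()
    letters = [ch for ch in title if ch.isalpha()]
    punct = sum(ch in string.punctuation for ch in title)
    return {
        "num_words": len(words),
        "num_capitalized_words": sum(w[0].isupper() for w in words),
        "is_all_caps": bool(letters) and all(ch.isupper() for ch in letters),
        "has_high_punct": len(title) > 0 and punct / len(title) > punct_ratio,
        "has_long_words": any(len(w) >= long_word_len for w in words),
        "has_numbers": any(ch.isdigit() for ch in title),
    }


print(job_title_features("HIRING NOW!!! $5000 BONUS"))
print(job_title_features("Senior Software Engineer"))
```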
Experiments and results • Baseline: predict avg CTR for a test job (2.29%) • Predicting category-wise avg CTR (A) • Linear regression over 390 features (B): uses only 142 regressors. • GBDT using Treenet over 390 features (C): uses 300 regressors (at 256_600_0.01_100).
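The two simplest predictors on this slide, the global average baseline and the category-wise average (A), can be sketched as follows. The slides mention "various regression metrics" without naming them, so RMSE is used here as an assumed example metric; the category names and CTR values are made up.

```python
import math
from collections import defaultdict


def fit_global_mean(train):
    """Baseline: average CTR over all (category, ctr) training pairs."""
    return sum(ctr for _, ctr in train) / len(train)


def fit_category_means(train):
    """Model A: average CTR per job category."""
    totals = defaultdict(lambda: [0.0, 0])
    for category, ctr in train:
        totals[category][0] += ctr
        totals[category][1] += 1
    return {cat: total / n for cat, (total, n) in totals.items()}


def predict(category, category_means, global_mean):
    """Use the category average, falling back to the global average."""
    return category_means.get(category, global_mean)


def rmse(predicted, actual):
    """Root mean squared error (assumed metric, not confirmed by the source)."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))


# Hypothetical training and test data.
train = [("sales", 0.01), ("sales", 0.03), ("it", 0.04)]
test = [("sales", 0.02), ("it", 0.05), ("legal", 0.03)]
g = fit_global_mean(train)
means = fit_category_means(train)
actual = [ctr for _, ctr in test]
print("baseline RMSE:", rmse([g] * len(test), actual))
print("category RMSE:", rmse([predict(c, means, g) for c, _ in test], actual))
```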
Important features • Similar Jobs features • Same company, title, city+state using 1 week click history • Other features • Creation date, job description page size, date of update, posting date, job category • Related Jobs features • Related_11, Related_12 jobs posted in past 1/3 weeks over 1/3 week click history
Pruning the feature set • Wrapper-based feature selection with linear regression and with Treenet’s variable importance (E): 11 features.
In absence of click history … • Linear regression with 369 features (F): uses 187 regressors. • Treenet uses 282 regressors at 256_600_0.01_20 (G)
Analysis of regressor distribution • None of the feature sets alone helps!
Conclusion and future work • We built a machine learning system to predict CTR for job listings and presented our results using various regression metrics. • More features • Dyadic models to predict user-personalized CTR with (job feature vector, user feature vector) dyads • Automatic model updates to correct model drift