
Predicting Click Through Rate for Job Listings

This study focuses on predicting click-through rates (CTR) for job listings using machine learning techniques. The research analyzes data from Yahoo! HotJobs to build models that can estimate CTRs based on various features of job listings. The study explores the impact of different models, feature selection methods, and experiments conducted to improve prediction accuracy. Results indicate the effectiveness of gradient boosted decision trees and linear regression models in predicting CTRs. The analysis of important features such as job title, category, and related job histories provides insights for better CTR prediction. The study also discusses the challenges of data sparsity and the importance of continuous model updates to address model drift. Overall, this research contributes to the optimization of job listing performance through improved CTR predictions.


Presentation Transcript


  1. Predicting Click Through Rate for Job Listings Manish Gupta Yahoo! HotJobs Jan 22, 2009

  2. CTR and its applications • CTR = ratio of clicks (to get the full description of an entity) to views of its reduced version • Rank results • Impacts publisher revenue in pay-for-performance models • Bidding in ad exchanges • Trends can help detect click fraud

  3. CTR for new job listings • Avg CTR = 2.29% • MLE (observed clicks / views) would have high variance for new listings with few views
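A rough way to see the variance concern, assuming each view is an independent Bernoulli trial (a modeling assumption, not something the slide states): the standard error of the clicks/views estimate is sqrt(p(1-p)/n), which is large relative to p when the number of views n is small. The function names below are illustrative.

```python
import math

AVG_CTR = 0.0229  # average CTR reported on the slide


def mle_ctr(clicks: int, views: int) -> float:
    """Maximum-likelihood CTR estimate: observed clicks / observed views."""
    return clicks / views


def mle_std_error(true_ctr: float, views: int) -> float:
    """Standard deviation of the MLE under the (assumed) binomial click model."""
    return math.sqrt(true_ctr * (1.0 - true_ctr) / views)


# Compare the standard error to the CTR itself for a few view counts.
for views in (10, 100, 10_000):
    se = mle_std_error(AVG_CTR, views)
    print(f"views={views:>6}  std_error={se:.4f}  relative_to_ctr={se / AVG_CTR:.2f}")
```

Under this assumption, a listing with only a handful of views has a standard error larger than the average CTR itself, which is one reason to predict CTR from features rather than from the listing's own counts.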

  4. CTR for job listings

  5. Related work • Regelson and Fain: estimate CTR using topic clusters (job categories) • Richardson et al.: describe features for predicting CTR for ads • Our baseline: avg CTR for a test job (2.29%)

  6. Refined problem definition • Ideal: predict CTR(job j, position p, user cluster u, context c), but: data sparsity, huge feature vector • Predict CTR(job) • Use CTR versus position curve • Predict CTR(job, position)
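The slide does not say how CTR(job) and the CTR-versus-position curve are combined. A minimal sketch, assuming a simple multiplicative adjustment (the curve values, the normalization, and the function names are all hypothetical):

```python
# Hypothetical CTR-versus-position curve: average CTR observed at each
# result position. These numbers are illustrative, not from the slides.
POSITION_CTR = {1: 0.060, 2: 0.040, 3: 0.030, 4: 0.020, 5: 0.015}


def position_multipliers(curve: dict[int, float]) -> dict[int, float]:
    """Scale the curve so that the mean multiplier over positions is 1.0."""
    mean_ctr = sum(curve.values()) / len(curve)
    return {pos: ctr / mean_ctr for pos, ctr in curve.items()}


def ctr_at_position(job_ctr: float, position: int, curve: dict[int, float]) -> float:
    """Combine a position-independent job CTR with the position curve.

    Assumes the two factors multiply, which is one common simplification
    and not necessarily what the authors did.
    """
    return job_ctr * position_multipliers(curve)[position]


for pos in sorted(POSITION_CTR):
    print(pos, round(ctr_at_position(0.0229, pos, POSITION_CTR), 4))
```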

  7. Data set • Used HotJobs data from Aug 11, 2008 to Aug 31, 2008 to predict CTR of jobs on Sep 1, 2008 • 40K jobs from 7K+ companies • 32K as train set and 8K as test set • Jobs have location, company name, category, creation date, posting date, optional position-wise click history, job source, title, snippet & job description.

  8. Different models • Weka: Linear Regression and SMOReg • Treenet: Gradient Boosted Decision Trees • Feature selection: • Weka: wrapper with evaluator=linear regression and search=GreedyStepwise • Treenet: Variable importance metrics
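To illustrate the idea behind gradient boosted decision trees, here is a toy sketch in plain Python: each round fits a one-split tree (a stump) to the current residuals and adds a shrunken copy of it to the prediction. This is not Treenet and does not reproduce its settings; the squared-error loss, stump depth, toy data, and all names are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single-feature threshold split that minimizes squared error."""
    best = None  # (error, feature, threshold, left_value, right_value)
    for f in range(len(xs[0])):
        for threshold in sorted({row[f] for row in xs}):
            left = [r for row, r in zip(xs, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(xs, residuals) if row[f] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, threshold, lv, rv)
    return best[1:] if best else None


def stump_predict(stump, row):
    f, threshold, lv, rv = stump
    return lv if row[f] <= threshold else rv


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    """Boosting for squared loss: start at the mean, then fit residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, xs)]
    return base, learning_rate, stumps


def gbdt_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Toy data: [CTR of same-title jobs, number of title words] -> observed CTR.
xs = [[0.010, 3], [0.020, 5], [0.050, 2], [0.080, 7]]
ys = [0.012, 0.020, 0.045, 0.070]
model = fit_gbdt(xs, ys)
print([round(gbdt_predict(model, row), 4) for row in xs])
```

Starting from the mean prediction means the model with zero trees is exactly the average-CTR baseline, so every added tree can be read as a correction to that baseline.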

  9. Features • Features from Similar Jobs (60) • CTR of jobs with same title/company/state/city+state/category and their cardinalities posted in past one/two weeks or all jobs based on the click history of past one/two/three weeks • Features from Related Jobs (288) • CTR_mn of related jobs with m = |A-B| and n = |B-A| and cardinalities (0 ≤ m, n ≤ 5) posted in past one/two weeks or all jobs based on the click history of past one/two/three weeks
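The two feature families above could be computed along the following lines. The slide does not define the sets A and B; this sketch assumes they are the word sets of the target job's title and of another job's title. The record layout (`title`, `company`, `clicks`, `views`) and the function names are also hypothetical, and the time windows are omitted for brevity.

```python
from collections import defaultdict


def similar_job_ctr(history, key):
    """Aggregate CTR and cardinality of past jobs sharing the same `key` value."""
    clicks, views, count = defaultdict(int), defaultdict(int), defaultdict(int)
    for job in history:
        k = job[key]
        clicks[k] += job["clicks"]
        views[k] += job["views"]
        count[k] += 1
    return {k: (clicks[k] / views[k] if views[k] else 0.0, count[k]) for k in count}


def related_bucket(title_a, title_b):
    """Return (m, n) = (|A - B|, |B - A|) for the two titles' word sets (assumed)."""
    a = set(title_a.lower().split())
    b = set(title_b.lower().split())
    return len(a - b), len(b - a)


def related_job_ctr(target_title, history, max_diff=5):
    """CTR_mn and cardinality over past jobs, bucketed by (m, n), 0 <= m, n <= max_diff."""
    clicks, views, count = defaultdict(int), defaultdict(int), defaultdict(int)
    for job in history:
        m, n = related_bucket(target_title, job["title"])
        if m <= max_diff and n <= max_diff:
            clicks[(m, n)] += job["clicks"]
            views[(m, n)] += job["views"]
            count[(m, n)] += 1
    return {mn: (clicks[mn] / views[mn] if views[mn] else 0.0, count[mn]) for mn in count}


history = [
    {"title": "Java Developer", "company": "Acme", "clicks": 10, "views": 500},
    {"title": "Senior Java Developer", "company": "Acme", "clicks": 20, "views": 500},
    {"title": "Nurse", "company": "Care", "clicks": 5, "views": 100},
]
print(similar_job_ctr(history, "company"))
print(related_job_ctr("Java Developer", history))
```

Under this reading, the bucket (0, 0) coincides with the same-title similar-job feature, and larger m or n means a looser match.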

  10. Features • Job Title Features (11) • #words, #capitalized words, isAllCaps, hasHighPunct, hasLongWords, hasNumbers, vocabulary features • Daily CTR Features for past 3 weeks (21) • Other Features (10) • Job Category, age, location specificity, job source, and job description page features • Other potential features • high-marketing-pitch words, brand value of company, spam feedback, seasonal variations
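A sketch of how the surface-level job title features might be extracted. The slide gives only the feature names, so the thresholds for "long" words and "high" punctuation below are assumptions, as are the dictionary keys; the vocabulary features are left out because the slide does not describe them.

```python
import string


def title_features(title, long_word_len=10, high_punct_ratio=0.1):
    """Surface features of a job title; both thresholds are assumed values."""
    words = title.split()
    letters = [c for c in title if c.isalpha()]
    punct = sum(1 for c in title if c in string.punctuation)
    return {
        "num_words": len(words),
        "num_capitalized_words": sum(1 for w in words if w[:1].isupper()),
        "is_all_caps": bool(letters) and all(c.isupper() for c in letters),
        "has_high_punct": bool(title) and punct / len(title) > high_punct_ratio,
        # Word length is measured after stripping surrounding punctuation.
        "has_long_words": any(
            len(w.strip(string.punctuation)) >= long_word_len for w in words
        ),
        "has_numbers": any(c.isdigit() for c in title),
    }


print(title_features("SALES MANAGER!!! $5000"))
print(title_features("Registered Nurse"))
```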

  11. Experiments and results • Baseline: Predict avg CTR for a test job (2.29%) • Predicting category-wise avg CTR (A) • Linear Regression over 390 features (B) – uses only 142 regressors. • GBDT using Treenet over 390 features (C) – uses 300 regressors. (at 256_600_0.01_100)
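The two simplest predictors in this list, the global average and the category-wise average, can be sketched as below. The slides do not name the regression metrics used, so RMSE here is an assumed choice; the data layout and the fallback to the global average for unseen categories are also assumptions.

```python
import math
from collections import defaultdict


def global_average(train):
    """Baseline: one number, the mean CTR over all training jobs."""
    return sum(ctr for _, ctr in train) / len(train)


def category_averages(train):
    """Experiment (A): mean training CTR per job category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for category, ctr in train:
        sums[category] += ctr
        counts[category] += 1
    return {c: sums[c] / counts[c] for c in sums}


def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def evaluate(train, test):
    """Return (baseline RMSE, category-wise RMSE) on the test jobs."""
    avg = global_average(train)
    by_category = category_averages(train)
    actual = [ctr for _, ctr in test]
    baseline_pred = [avg] * len(test)
    # Categories unseen in training fall back to the global average.
    category_pred = [by_category.get(category, avg) for category, _ in test]
    return rmse(baseline_pred, actual), rmse(category_pred, actual)


# Toy (category, CTR) pairs; values are illustrative only.
train = [("it", 0.01), ("it", 0.03), ("sales", 0.05)]
test = [("it", 0.02), ("sales", 0.05), ("legal", 0.03)]
print(evaluate(train, test))
```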

  12. Analysis of regressor distribution

  13. Important features • Similar Jobs features • Same company, title, city+state using 1 week click history • Other features • Creation date, job description page size, date of update, posting date, job category • Related Jobs features • Related_11, related_12 jobs posted in past 1/3 weeks over 1/3 week click history

  14. Pruning the feature set

  15. Pruning the feature set • Wrapper-based feature selection with linear regression and with Treenet’s variable importance (E) – 11 features.
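Wrapper-based selection with a greedy stepwise search can be sketched as a forward search: repeatedly add the single feature that most reduces the evaluator's error, and stop when no addition helps. This is a generic illustration, not Weka's implementation. In a real wrapper, `evaluate` would fit the model (for example linear regression) on the candidate subset and return a held-out error; the toy evaluator, its gain values, and the feature names below are hypothetical.

```python
def greedy_forward_selection(features, evaluate, max_features=None):
    """Forward stepwise search; `evaluate(subset)` returns an error to minimize."""
    selected = []
    best_error = evaluate(selected)
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every one-feature extension of the current subset.
        error, feature = min((evaluate(selected + [f]), f) for f in remaining)
        if error >= best_error:
            break  # no single addition improves the evaluator's error
        selected.append(feature)
        remaining.remove(feature)
        best_error = error
    return selected, best_error


# Toy stand-in for "fit a model on this subset and measure its error":
# each feature removes a fixed amount of error, and every feature used
# costs a small complexity penalty.
GAINS = {
    "same_title_ctr": 0.30,
    "same_company_ctr": 0.20,
    "job_category": 0.05,
    "num_words": 0.0,
}


def toy_error(subset):
    return 1.0 - sum(GAINS[f] for f in subset) + 0.01 * len(subset)


selected, error = greedy_forward_selection(list(GAINS), toy_error)
print(selected, round(error, 2))
```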

  16. In absence of click history … • Linear regression with 369 features (F) – uses 187 regressors. • Treenet uses 282 regressors at 256_600_0.01_20 (G)

  17. Analysis of regressor distribution • None of the sets alone helps!

  18. Pruning the feature set

  19. Variable importance curves

  20. Conclusion and future work • We built a machine learning system to predict CTR for job listings and presented our results using various regression metrics. • More features • Dyadic models to predict user-personalized CTR with (job feature vector, user feature vector) dyads. • Auto model updates to correct model drift

  21. Thanks for your time
