
Learning to Rank on a Cluster using Boosted Decision Trees



Presentation Transcript


  1. Learning to Rank on a Cluster using Boosted Decision Trees
Krysta M. Svore, Christopher J.C. Burges
Microsoft Research

  2. Motivation
• Growing number of extremely large datasets
• Terabytes/day of search user interaction data
• Cheaper hardware
• Decreased model training time allows for solutions to more problems (faster throughput)
• Distribute LambdaMART [Wu et al., 2009]
• Gradient boosted decision trees, where gradients are approximated using λ-functions
• Constructed incrementally as F(x) = Σ_t α_t f_t(x), over t = 1, …, T boosting iterations, where x is a sample feature vector and f_t is a weak hypothesis
• Key component of the winning system in the Yahoo! Learning to Rank Challenge
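The ensemble is built one weak hypothesis at a time, F(x) = Σ_t α_t f_t(x). A minimal sketch of that incremental loop, using a single-feature regression stump as the weak hypothesis and a plain least-squares residual standing in for the λ-gradient (both are simplifications; `fit_stump` and `boost` are illustrative names, not from the paper):

```python
import numpy as np

def fit_stump(x, residual):
    """Least-squares regression stump on a single feature vector x."""
    best = None
    for thr in np.unique(x)[:-1]:          # last value would leave the right side empty
        left, right = residual[x <= thr], residual[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, lval, rval = best
    return lambda q: np.where(q <= thr, lval, rval)

def boost(x, y, n_iters=20, lr=0.5):
    """Build F(x) = sum_t alpha_t * f_t(x) one weak hypothesis at a time;
    here every alpha_t is the fixed shrinkage lr."""
    pred = np.zeros_like(y, dtype=float)
    stumps = []
    for _ in range(n_iters):
        residual = y - pred                # stand-in for the lambda-gradient
        f = fit_stump(x, residual)
        stumps.append(f)
        pred += lr * f(x)
    return lambda q: lr * sum(f(q) for f in stumps)
```

In real LambdaMART the residual is replaced by the λ-gradient derived from the ranking metric, and the trees split on many features; the incremental structure is the same.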

  3. Our Approach
• Feature distribution
  • Data fits in main memory of a single machine
  • Tree split computations are distributed
  • Learned model is equivalent to the centralized model
• Data distribution
  • Data does not fit in main memory of a single machine
  • Data is distributed and used for cross-validation
  • Learned model is not equivalent to the centralized model
  • Two strategies for selection of the next weak hypothesis (full and sample)
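Under feature distribution, every node holds all training rows but only a subset of the feature columns; each node finds the best split over its own columns and the master keeps the global best, which is why the learned tree (and model) matches the centralized one. A rough single-process simulation of that split search, using squared-error splitting on a regression residual (`best_split` and `distributed_split` are hypothetical names for illustration):

```python
import numpy as np

def best_split(feature_block, residual, feature_ids):
    """Best (sse, feature, threshold) among this worker's feature columns."""
    best = (np.inf, None, None)
    for j, col in zip(feature_ids, feature_block.T):
        for thr in np.unique(col)[:-1]:    # last value would leave the right side empty
            l, r = residual[col <= thr], residual[col > thr]
            sse = ((l - l.mean()) ** 2).sum() + ((r - r.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, thr)
    return best

def distributed_split(X, residual, n_workers=2):
    """Simulate K workers, each owning a disjoint slice of the columns;
    the master keeps the globally best split, so the resulting tree is
    identical to the centralized one."""
    blocks = np.array_split(np.arange(X.shape[1]), n_workers)
    candidates = [best_split(X[:, ids], residual, ids) for ids in blocks]
    return min(candidates, key=lambda c: c[0])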

  4. Time Comparison
• Two types of cluster nodes.
• Features split across nodes.
• Each node stores all of the training data.
• Centralized (dotted)
• Feature-distributed (solid)
Feature distribution yields a 6× speed-up.

  5. Time Comparison
• Data split across nodes.
• Each node stores 3500 queries. (Centralized stores 3500 × K queries.)
• Centralized (dotted)
• Full data-distributed (solid)
• Sample data-distributed (dashed)
Data distribution yields a 10× speed-up.
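One plausible reading of the full and sample data-distributed strategies from slide 3: each node fits a candidate weak hypothesis on its local shard, candidates are scored on the other shards (all of them for full, one sampled shard for sample), and the best-scoring candidate joins the ensemble. The sketch below assumes that semantics and squared-error scoring; it is not taken verbatim from the paper, and `select_weak_hypothesis` is an illustrative name:

```python
import numpy as np

def select_weak_hypothesis(shards, fit, strategy="full", rng=None):
    """shards: list of (X, y) node-local datasets. Each node fits one
    candidate on its own shard; candidates are scored (squared error)
    on the remaining shards -- all of them for 'full', one randomly
    sampled shard for 'sample' -- and the best candidate is returned."""
    if rng is None:
        rng = np.random.default_rng(0)
    candidates = [fit(x, y) for x, y in shards]
    scores = []
    for k, f in enumerate(candidates):
        others = [s for i, s in enumerate(shards) if i != k]
        if strategy == "sample":
            others = [others[rng.integers(len(others))]]
        scores.append(sum(float(((f(x) - y) ** 2).sum()) for x, y in others))
    return candidates[int(np.argmin(scores))]
```

Because every node trains on only its own shard, the selected hypothesis is generally not the one a centralized learner would have produced, which is why the data-distributed model differs from the centralized model.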

  6. Accuracy Comparison
• Data split across nodes.
• Each node stores 3500 queries.
• Full data-distributed (solid)
• Sample data-distributed (dashed)
Data distribution yields a significant accuracy gain when the data size exceeds main memory.

  7. Accuracy Comparison
• Data split across nodes.
• Each node stores 3500 queries. (Centralized stores 3500 × K queries.)
• Centralized (dotted)
• Full data-distributed (solid)
• Sample data-distributed (dashed)
The centralized model is significantly better than the data-distributed model: more training data is better than massive cross-validation.

  8. Future Work
• Performing massive, distributed cross-validation results in significant accuracy loss compared to learning from centralized training data.
• Can we distribute data and achieve equal or better accuracy while decreasing training time?
• Can we develop an asynchronous approach resulting in additional speed improvements?
Thanks!
