
RapStar’s Solution to Data Mining Hackathon on Best Buy Mobile Site


gerodi

Presentation Transcript


  1. RapStar’s Solution to Data Mining Hackathon on Best Buy Mobile Site Kingsfield, Dragon

  2. Beat Benchmark

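  3. Beat Benchmark • Naive Bayes • We want to know the probability that a user clicks a sku given a context. • We use the query as the context first. • So we have: $P(\text{sku} \mid \text{query}) \propto P(\text{sku})\, P(\text{query} \mid \text{sku})$ • Select the 5 items with the highest predicted probability as the prediction.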
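A minimal sketch of the Naive Bayes baseline on this slide may help make it concrete. It assumes training data arrives as (query, sku) pairs, whitespace tokenization, and add-one (Laplace) smoothing; the function names and the smoothing choice are illustrative assumptions, not details taken from the slides.

```python
import math
from collections import Counter, defaultdict

def train(clicks):
    """Count sku frequencies and per-sku word frequencies.

    clicks: iterable of (query, sku) pairs (assumed input format).
    """
    sku_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for query, sku in clicks:
        sku_counts[sku] += 1
        for word in query.lower().split():
            word_counts[sku][word] += 1
            vocab.add(word)
    return sku_counts, word_counts, vocab

def predict(query, model, k=5, alpha=1.0):
    """Return the k skus with the highest log P(sku) + sum log P(word | sku)."""
    sku_counts, word_counts, vocab = model
    total = sum(sku_counts.values())
    words = query.lower().split()
    scores = {}
    for sku, n in sku_counts.items():
        # Add-alpha smoothing is an assumption; the slides do not say
        # how unseen words were handled.
        denom = sum(word_counts[sku].values()) + alpha * len(vocab)
        score = math.log(n / total)
        for word in words:
            score += math.log((word_counts[sku][word] + alpha) / denom)
        scores[sku] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Made-up toy data, for illustration only.
toy_clicks = [("xbox 360", "A"), ("xbox 360", "A"),
              ("xbox game", "B"), ("iphone case", "C")]
toy_model = train(toy_clicks)
top5 = predict("xbox 360", toy_model)
```

Working in log space avoids underflow when a query has many words; the ranking is the same as with the raw product.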

  4. Use Time information • Time is a good feature in data mining.

  5. Use Time information • Divide the data into 12 time periods based on the click_time field. • Use the frequency within the time period that click_time falls into as the “prior”, instead of the global frequency.
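The time-dependent prior could be sketched as follows. The slides do not say how the 12 periods were chosen, so this sketch assumes equal-width bins between a start and end timestamp; `period_index`, `period_priors`, and the toy dates are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

N_PERIODS = 12

def period_index(click_time, start, end, n_periods=N_PERIODS):
    """Map click_time to one of n_periods equal-width bins (an assumption)."""
    if end <= start:
        return 0
    # timedelta * int // timedelta is exact integer arithmetic.
    idx = (click_time - start) * n_periods // (end - start)
    return max(0, min(idx, n_periods - 1))

def period_priors(clicks, start, end):
    """Estimate P(sku | period) from (click_time, sku) pairs."""
    counts = defaultdict(Counter)
    for click_time, sku in clicks:
        counts[period_index(click_time, start, end)][sku] += 1
    return {
        period: {sku: n / sum(c.values()) for sku, n in c.items()}
        for period, c in counts.items()
    }

# Made-up toy data: a 12-day window, so each period is one day.
start, end = datetime(2020, 1, 1), datetime(2020, 1, 13)
toy_clicks = [
    (datetime(2020, 1, 1, 0), "A"),
    (datetime(2020, 1, 1, 6), "A"),
    (datetime(2020, 1, 1, 12), "B"),
    (datetime(2020, 1, 12, 12), "B"),
]
priors = period_priors(toy_clicks, start, end)
```

At prediction time, the prior for a query would be looked up in the period that its own click_time falls into, in place of the global sku frequency.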

  6. Use Time information • Smooth data

  7. Unigram to Bigram • Likelihood of Naive Bayes: $P(\text{query} \mid \text{sku}) = \prod_i P(w_i \mid \text{sku})$ • Here $w_i$ is a word of the query. • Use bigrams instead of unigrams (single words). • Take the query “xbox call of duty” • Reorder the words alphabetically: “call duty of xbox” • Bigrams: [“call duty”, “call of”, “call xbox”, … “of xbox”] • Once we have the bigram training data, the rest is the same as for unigrams. • Blending unigram and bigram:
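The bigram construction in the example above appears to be: sort the query words, then take every pair of words in sorted order. A small sketch, assuming whitespace tokenization and lowercasing (the function name `sorted_bigrams` is hypothetical):

```python
from itertools import combinations

def sorted_bigrams(query):
    """Sort the query words, then emit every ordered pair as a bigram."""
    words = sorted(query.lower().split())
    return [f"{a} {b}" for a, b in combinations(words, 2)]

bigrams = sorted_bigrams("xbox call of duty")
# ['call duty', 'call of', 'call xbox', 'duty of', 'duty xbox', 'of xbox']
```

Sorting first makes the features independent of word order, so “call of duty xbox” and “xbox call of duty” produce the same bigrams.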

  8. Data Processing • The most important part: query correction • Lemmatization • Split words and numbers • Query correction (in small version) • Many things could still help to improve results: • “x box”, “x men” • A new algorithm for query correction • Rank lower the predictions that the user clicked.
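Two of the preprocessing steps can be illustrated with a short sketch: splitting words from numbers, and merging split spellings such as “x box”. The merge table is a hypothetical illustration of the “x box” / “x men” idea, which the slide lists as a possible improvement rather than something that was implemented; lemmatization and real query correction are not shown.

```python
import re

# Hypothetical merge table; the canonical forms are assumptions.
MERGES = {"x box": "xbox", "x men": "xmen"}

def normalize(query):
    """Lowercase, split letters from digits, then merge known split spellings."""
    # Keep runs of letters and runs of digits as separate tokens,
    # e.g. "xbox360" -> ["xbox", "360"]. Punctuation is dropped.
    tokens = re.findall(r"[a-z]+|[0-9]+", query.lower())
    text = " ".join(tokens)
    for split_form, merged in MERGES.items():
        text = re.sub(r"\b" + re.escape(split_form) + r"\b", merged, text)
    return text

cleaned = normalize("Xbox360 games")
```

With this normalization, “Xbox360”, “xbox 360”, and “x box 360” all map to the same token sequence, so their click counts are pooled during training.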

  9. Conclusion • Data preprocessing and feature engineering are the most important things.
