RapStar’s Solution to Data Mining Hackathon on Best Buy Mobile Site
Kingsfield, Dragon
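Beat Benchmark
• Naive Bayes
• We want to know the probability that a user clicks a sku under a given context.
• We use the query as the context first.
• So we have: $P(\mathrm{sku} \mid \mathrm{query}) \propto P(\mathrm{sku})\, P(\mathrm{query} \mid \mathrm{sku})$
• Select the 5 items with the highest predicted probability as the prediction.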
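The naive Bayes ranking above could be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the `train` and `predict` names, the `(query, sku)` input layout, and the add-alpha smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train(clicks):
    """clicks: iterable of (query, sku) pairs (assumed layout of the click log)."""
    sku_count = Counter()               # how often each sku was clicked
    word_count = defaultdict(Counter)   # word_count[sku][word]
    vocab = set()
    for query, sku in clicks:
        sku_count[sku] += 1
        for word in query.lower().split():
            word_count[sku][word] += 1
            vocab.add(word)
    return sku_count, word_count, vocab


def predict(query, model, k=5, alpha=1.0):
    """Return up to k skus with the highest naive Bayes score for the query."""
    sku_count, word_count, vocab = model
    total_clicks = sum(sku_count.values())
    scores = {}
    for sku, n in sku_count.items():
        score = math.log(n / total_clicks)  # log prior P(sku)
        denom = sum(word_count[sku].values()) + alpha * len(vocab)
        for word in query.lower().split():
            # Add-alpha smoothing (an assumption) avoids log(0) for unseen words.
            score += math.log((word_count[sku][word] + alpha) / denom)
        scores[sku] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Working in log space keeps the product of many small word probabilities from underflowing; the ranking is unchanged because the logarithm is monotonic.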
Use Time information
• Time is a good feature in data mining.
• Divide the data into 12 time periods based on the click_time field.
• Use the frequency within the time period that click_time falls in as the “prior”, instead of the global frequency.
• Smooth the data.
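One way the time-dependent prior might look in code is sketched below. The slides give only the number of periods, so the equal-width split, the neighbour-averaging used for smoothing, the `weight` parameter, and all function names are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

N_PERIODS = 12  # from the slides; the equal-width split below is an assumption


def period_of(t, start, end, n_periods=N_PERIODS):
    """Map a datetime in [start, end] to a period index in [0, n_periods - 1]."""
    # timedelta * int // timedelta is exact integer arithmetic.
    idx = (t - start) * n_periods // (end - start)
    return min(idx, n_periods - 1)


def period_counts(clicks, start, end):
    """clicks: iterable of (click_time, sku). Returns one Counter per period."""
    counts = [Counter() for _ in range(N_PERIODS)]
    for click_time, sku in clicks:
        counts[period_of(click_time, start, end)][sku] += 1
    return counts


def smoothed_prior(sku, period, counts, weight=0.5):
    """Prior for sku in a period, mixed with the average of adjacent periods."""
    def freq(p):
        total = sum(counts[p].values())
        return counts[p][sku] / total if total else 0.0

    neighbours = [p for p in (period - 1, period + 1) if 0 <= p < len(counts)]
    if not neighbours:
        return freq(period)
    neighbour_avg = sum(freq(p) for p in neighbours) / len(neighbours)
    return (1 - weight) * freq(period) + weight * neighbour_avg
```

The returned value would replace the global `P(sku)` term in the naive Bayes score. Mixing in adjacent periods is only one possible smoothing choice; interpolating with the global frequency would be another.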
Unigram to Bigram
• Likelihood of Naive Bayes: $P(\mathrm{query} \mid \mathrm{sku}) = \prod_i P(w_i \mid \mathrm{sku})$
• Here $w_i$ is a word.
• Use bigrams instead of unigrams (single words).
• Example query: “xbox call of duty”
• Reorder the words alphabetically: “call duty of xbox”
• Bigrams: [“call duty”, “call of”, “call xbox”, … “of xbox”]
• Once we have the bigram training data, the rest is the same as for unigrams.
• Blend the unigram and bigram models.
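The bigram construction in the example above (sort the words, then take every pair) can be sketched as below. The blending formula from the slide is not available, so `blend` shows a plain linear combination with a hypothetical weight `lam` purely as an assumption.

```python
from itertools import combinations


def bigrams(query):
    """All word pairs of a query, with the words first put in sorted order."""
    words = sorted(query.lower().split())
    return [f"{a} {b}" for a, b in combinations(words, 2)]


def blend(unigram_score, bigram_score, lam=0.5):
    """Linear blend of two model scores; lam is a hypothetical weight."""
    return lam * unigram_score + (1 - lam) * bigram_score
```

For the query “xbox call of duty”, `bigrams` yields the six pairs from “call duty” through “of xbox”. Sorting first means that “call of duty xbox” and “xbox call of duty” produce identical features, so word order in the query no longer matters.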
Data Processing
• The most important part: Query Correction
• Lemmatization
• Split words and numbers
• Query correction (in the small version)
• A lot of things could still help to improve:
• “x box”, “x men”
• New algorithm for query correction
• Rank predictions that the user clicked lower.
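A few of the cleaning steps listed above could look like the sketch below. It covers only splitting words from numbers, merging split brand names, and a simple spelling correction; the `MERGES` table, the `normalize` name, and the use of `difflib` for correction are assumptions, since the slides do not describe the correction algorithm. Lemmatization is left out.

```python
import re
from difflib import get_close_matches

# Hypothetical merge table for names that users type as two words.
MERGES = {"x box": "xbox", "x men": "xmen"}


def normalize(query, vocab=(), cutoff=0.8):
    """Lowercase, split letters from digits, merge known pairs, fix near-misses."""
    q = query.lower().strip()
    # Split words and numbers: "xbox360" -> "xbox 360".
    q = re.sub(r"(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])", " ", q)
    q = re.sub(r"\s+", " ", q)
    for split_form, merged in MERGES.items():
        q = re.sub(rf"\b{re.escape(split_form)}\b", merged, q)
    words = []
    for word in q.split():
        if word in vocab or word.isdigit():
            words.append(word)
            continue
        # Replace an unknown word with its closest vocabulary entry, if any.
        match = get_close_matches(word, vocab, n=1, cutoff=cutoff)
        words.append(match[0] if match else word)
    return " ".join(words)
```

Here `vocab` would be built from trusted text such as product names. A high `cutoff` keeps the correction conservative, so rare but valid words are less likely to be rewritten.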
Conclusion
• Data preprocessing and feature engineering are the most important things.