Problem Description: “Using Machine Learning to Make Money at Horse Races”

Pos | Draw | Btn | Horse | Wgt | Jockey | Trainer | Age | SP | Comments | Raceid
1 | 4 | | Timocracy | 10-0 | S Drowne | A B Haynes | 5 | 4/9 f | led after 1f, ridden 2f out, stayed on well and in command final furlong opened 4/5 touched 4/5 £800-£1100 £400-£550 £400-£650 (x3) £400-£750 (x4) £500-£1000 (x4) £300-£600 £200-£400 (x2) | 372966
2 | 5 | 1¾ | Bussell Along (IRE) | 9-3 | S Sanders | Stef Higgins | 4 | 8/1 | held up early, headway on outside to chase leaders 4f out, effort and hung left from 2f out, went 2nd 1f out, no chance with winner opened 10/1 touched 10/1 | 372966

Task: Given a training set, learn a function that predicts the winner, i.e. selects a horse to bet $10 on from a given set of entries.

Performance Measures:
• Accuracy in picking the winner of a race (simple version)
• Return of placing a $10 bet on a horse in the race (advanced version; solves the “real problem” of trying to make money at the track)

Links:
http://www.racingpost.com
http://www.drf.com/
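The two performance measures can be made concrete with a small sketch. It assumes that SP is a fractional price such as "4/9 f" or "8/1" (a trailing "f" marking the favourite), that a winning $10 bet is settled at SP with no deductions, and that a losing bet forfeits the stake. The function names `parse_sp`, `bet_return`, and `evaluate` are illustrative only and are not part of the project materials.

```python
from fractions import Fraction


def parse_sp(sp: str) -> Fraction:
    """Parse a fractional starting price such as '4/9 f' or '8/1'.

    Assumption: anything after the first whitespace (e.g. the 'f'
    favourite marker) can be ignored.
    """
    token = sp.split()[0]
    num, den = token.split("/")
    return Fraction(int(num), int(den))


def bet_return(picked_pos: int, sp: str, stake: float = 10.0) -> float:
    """Net return of one win bet: stake * odds if the pick won, else -stake."""
    if picked_pos == 1:
        return float(stake * parse_sp(sp))
    return -stake


def evaluate(picks):
    """picks: one (finishing position, SP) pair per race for the selected horse.

    Returns (accuracy, total net return) over all races.
    """
    wins = sum(1 for pos, _ in picks if pos == 1)
    total = sum(bet_return(pos, sp) for pos, sp in picks)
    return wins / len(picks), total


# The two entries from the sample race, treated as two separate picks
# purely for illustration.
accuracy, net = evaluate([(1, "4/9 f"), (2, "8/1")])
print(accuracy, round(net, 2))
```

Note that a system with high accuracy can still lose money if it mostly picks short-priced favourites, which is why the second measure is the harder one.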
Problem Description 2

• This is an individual project.
• In general, the problem is a ranking problem; one approach is to learn a function that assigns a score to the horses in a race and pick the horse with the highest score. But it can also be viewed as a classification or prediction problem.
• The datasets will be “very basic”, containing only a few attributes, but you are allowed to create additional attributes by computing statistics from the datasets or by extracting information from other sources (e.g. percentage of races won by a jockey).
• Basically, the project tries to predict the future. We will likely use races from a single race track. You are given a true temporal sequence of races: DS1 (races in Jan./Feb.), DS2 (races in March/April), …, DS6 (the true test set; you are not allowed to peek into this one, and only Chun-sheng has access to it). These datasets serve as training sets, validation sets, test sets, and sources of new feature generation in the project.
• Students have freedom in what approaches to use. There are many of them; ad hoc approaches are welcome. Likely every student will use a different approach, and some will solve the problem.
• The goal is to get something running; students who use a well-tuned simple approach will get a better grade than students who use a very complicated, sophisticated approach that does not run at all.
• Deliverables: You will demo your system, write a medium-sized report, and Chun-sheng will test your system with a test set of his own.
• You are allowed to use any software or tool in the project; you just have to mention what you used in your report.
• In general, the submission deadline is Wed., March 23, 11 p.m., but the idea is that you spend at most 5 weeks on the project!
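As one possible illustration of the scoring approach and of a derived attribute, the sketch below computes each jockey's win percentage from an earlier dataset and then picks, in a later race, the horse whose jockey has the highest percentage. This is only a minimal baseline under stated assumptions, not the intended solution: the dictionary keys, the function names `jockey_win_rates` and `pick_horse`, and the later-race entries ("Horse A", "Horse B") are hypothetical, and the history consists of just the two sample rows shown earlier.

```python
from collections import defaultdict


def jockey_win_rates(history):
    """Fraction of rides won by each jockey in an earlier dataset (e.g. DS1)."""
    rides = defaultdict(int)
    wins = defaultdict(int)
    for row in history:
        rides[row["jockey"]] += 1
        if row["pos"] == 1:
            wins[row["jockey"]] += 1
    return {jockey: wins[jockey] / rides[jockey] for jockey in rides}


def pick_horse(entries, win_rates, default=0.0):
    """Score each entry by its jockey's win rate and return the top-scoring horse.

    Jockeys not seen in the history get the (assumed) default score.
    """
    best = max(entries, key=lambda e: win_rates.get(e["jockey"], default))
    return best["horse"]


# History: the two sample rows from race 372966.
history = [
    {"raceid": 372966, "pos": 1, "horse": "Timocracy", "jockey": "S Drowne"},
    {"raceid": 372966, "pos": 2, "horse": "Bussell Along (IRE)", "jockey": "S Sanders"},
]

# Hypothetical entries for a later race (placeholder horse names).
entries = [
    {"horse": "Horse A", "jockey": "S Sanders"},
    {"horse": "Horse B", "jockey": "S Drowne"},
]

rates = jockey_win_rates(history)
print(pick_horse(entries, rates))
```

Because the datasets form a temporal sequence, statistics such as these should be computed only from races that precede the race being predicted; otherwise the validation results will look better than what the system can achieve on unseen races.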