KDD-Cup 2004

Explore the results and strategies of the KDD-Cup 2004 competition focusing on tasks in particle physics and protein matching, analyzing metrics, winners, and the impact of optimizing to different measures.

Presentation Transcript

  1. KDD-Cup 2004 Chairs: Rich Caruana & Thorsten Joachims Web Master: Lars Backstrom Cornell University

  2. KDD-Cup Tasks • Goal: Optimize learning for different performance metrics • Task1: Particle Physics • Accuracy • Cross-Entropy • ROC Area • SLAC Q-Score • Task2: Protein Matching • Squared Error • Average Precision • Top 1 • Rank of Last

  3. Competition Participation • Timeline • April 28: tasks and datasets available • July 14: submission of predictions • Participation • 500+ registrants/downloads • 102 teams submitted predictions • Physics: 65 submissions • Protein: 59 submissions • Both: 22 groups • Demographics • Registrations from 49 Countries (including .com) • Winners from China, Germany, India, New Zealand, USA • Winners half from companies, half from universities

  4. Task 1: Particle Physics • Data contributed by Charles Young et al., SLAC (Stanford Linear Accelerator Center) • Binary classification: distinguishing B from B-Bar particles • Balanced: 50-50 B/B-Bar • 78 features (most real-valued) describing track • Some missing values • Train: 50,000 cases • Test: 100,000 cases

  5. Task 1: Particle Physics Metrics • 4 performance metrics: • Accuracy: had to specify threshold • Cross-Entropy: probabilistic predictions • ROC Area: only ordering is important • SLAC Q-Score: domain-specific performance metric from SLAC • Participants submit separate predictions for each metric • About half of participants submitted different predictions for different metrics • Winner submitted four sets of predictions, one for each metric • Calculate performance using PERF software we provided to participants
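
The three generic metrics on this slide can be sketched in a few lines of plain Python. This is only an illustration of why a threshold matters for accuracy, why cross-entropy needs probabilities, and why ROC area depends on ordering alone. It is not the PERF implementation, and the function names, the 0.5 default threshold, and the natural-log convention are assumptions. The SLAC Q-Score is domain-specific and is not reproduced here.

```python
import math

def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases where (score >= threshold) matches the 0/1 label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(y_true, scores))
    return hits / len(y_true)

def cross_entropy(y_true, probs, eps=1e-12):
    """Mean negative log-likelihood of the labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(y_true)

def roc_area(y_true, scores):
    """Probability that a random positive outranks a random negative.
    Ties count one half. Quadratic time, so suitable for small examples only."""
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and scores, for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]
print(accuracy(labels, scores), cross_entropy(labels, scores), roc_area(labels, scores))
```

Squaring every score leaves `roc_area` unchanged but alters `cross_entropy`, which is one reason a single set of predictions need not be best for all metrics.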

  6. Determining the Winners • For each performance metric • Calculate performance using same PERF software available to participants • Rank participants by performance • Honorable mention for participant ranked first • Overall winner is participant with best average rank across all metrics
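
The average-rank rule above can be sketched as follows. The team names and scores are invented for illustration, and the tie handling (tied participants share the best rank of the tie) is an assumption; the slides do not say how ties were resolved.

```python
def rank_participants(scores, higher_is_better=True):
    """Map participant -> rank (1 = best). Tied scores share the best rank."""
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {name: ordered.index(value) + 1 for name, value in scores.items()}

def overall_ranking(results, higher_is_better):
    """results[metric][participant] = score.
    Returns (participant, mean rank) pairs, best average rank first."""
    per_metric = {
        metric: rank_participants(scores, higher_is_better[metric])
        for metric, scores in results.items()
    }
    participants = list(next(iter(results.values())))
    mean_rank = {
        p: sum(ranks[p] for ranks in per_metric.values()) / len(per_metric)
        for p in participants
    }
    return sorted(mean_rank.items(), key=lambda item: item[1])

# Hypothetical scores for three hypothetical teams.
results = {
    "ACC": {"A": 0.73, "B": 0.72, "C": 0.70},
    "CXE": {"A": 0.60, "B": 0.55, "C": 0.70},  # lower is better
    "ROC": {"A": 0.80, "B": 0.81, "C": 0.75},
}
direction = {"ACC": True, "CXE": False, "ROC": True}
for name, rank in overall_ranking(results, direction):
    print(name, round(rank, 2))
```

In this toy example team A wins the accuracy metric (an honorable mention under the rule above) while team B has the best average rank and would be the overall winner.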

  7. and the winners are…

  8. Task 1: Physics Winners • Christophe Lambert (Golden Helix Inc.): 3rd place overall (out of 65) • Lalit Wangikar et al. (Inductis Inc.): 2nd place overall, HM Acc • David Vogel et al. (MEDai Inc./University of Central Florida): 1st place overall, HM ROC, HM Cross-Entropy, HM SLQ

  9. Bootstrap Analysis of Results • How much does selection of winner depend on specific test set (100k)? • Algorithm: • Repeat many times: • Take 100k bootstrap sample (with replacement) from test set • Evaluate performance on bootstrap sample and re-rank participants • What is probability of winning/placing?
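
A minimal sketch of the bootstrap procedure above, for a single metric. The actual analysis re-ranked participants on the competition metrics; here `bootstrap_win_probability`, its parameters, and the even splitting of ties are assumptions made for illustration.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_win_probability(y_true, predictions, metric, n_rounds=1000, seed=0):
    """predictions[name] is a list aligned with y_true.
    Returns name -> fraction of bootstrap samples in which that participant
    scores best (higher metric is better; ties are split evenly)."""
    rng = random.Random(seed)
    n = len(y_true)
    wins = {name: 0.0 for name in predictions}
    for _ in range(n_rounds):
        # Resample test cases with replacement, same size as the test set.
        idx = [rng.randrange(n) for _ in range(n)]
        truth = [y_true[i] for i in idx]
        scores = {
            name: metric(truth, [pred[i] for i in idx])
            for name, pred in predictions.items()
        }
        best = max(scores.values())
        leaders = [name for name, s in scores.items() if s == best]
        for name in leaders:
            wins[name] += 1.0 / len(leaders)
    return {name: w / n_rounds for name, w in wins.items()}
```

Replacing the single `metric` with an average rank over several metrics would give the probability of winning overall rather than on one measure.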

  10. Physics Winners: Bootstrap Analysis • 1000 bootstrap samples

  11. Physics: Full Table of Results

  12. Task 2: Protein Matching • Data contributed by Ron Elber, Cornell University • Finding homologous proteins (structural similarity) • 74 real-valued features describing match between two proteins • Data comes in blocks • Unbalanced: typically < 10 homologs (+) per block of 1000 • Train: 153 Proteins (145,751 cases) • Test: 150 Proteins (139,658 cases)

  13. Task 2: Protein Matching Metrics • Four performance metrics: • Mean Squared Error: probabilistic predictions • Mean Average Precision: only ordering within each block is important • Mean Top 1: best predicted match is true homolog in each block • Mean Rank of Last: finding all homologs • Again participants submitted separate predictions for each metric • Again, about half of participants submitted multiple sets of predictions • 19/20 top participants submitted multiple sets of predictions • Optimizing to each metric separately helped more on Protein than on Physics
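
The three ranking metrics above are computed per block and then averaged over blocks. The sketch below shows one plausible reading of them. It is not the PERF implementation: the exact average-precision variant and the tie-breaking used by PERF may differ, the function names are invented, and every block is assumed to contain at least one homolog.

```python
def average_precision(ranked_labels):
    """ranked_labels: 0/1 labels sorted by decreasing predicted score."""
    hits, total = 0, 0.0
    for position, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / position  # precision at each homolog
    return total / hits if hits else 0.0

def block_metrics(labels, scores):
    """Metrics for one block (one query protein and its candidate matches)."""
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    ranked = [labels[i] for i in order]
    return {
        "APR": average_precision(ranked),
        "TOP1": 1.0 if ranked[0] else 0.0,  # is the best-scored match a homolog?
        # Rank position of the lowest-ranked homolog (assumes one exists).
        "RKL": max(pos for pos, lab in enumerate(ranked, start=1) if lab),
    }

def mean_over_blocks(blocks):
    """blocks: list of (labels, scores) pairs, one pair per block."""
    per_block = [block_metrics(labels, scores) for labels, scores in blocks]
    return {k: sum(b[k] for b in per_block) / len(per_block) for k in per_block[0]}
```

Because all three depend only on the ordering within a block, a monotone rescaling of the scores changes none of them, whereas it does change squared error.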

  14. Task 2: Protein Winners • Katharina Morik et al. (University of Dortmund): HM Rank Last • David Vogel et al. (Aimed / University of Central Florida): 3rd place overall, HM Top1 • Yan Fu et al. (Inst. of Comp. Tech., Chinese Academy of Sci.): 2nd place overall, HM Squared Error, HM Average Precision • Bernhard Pfahringer (University of Waikato): 1st place overall

  15. Protein Winners: Bootstrap Analysis • 10,000 bootstrap samples

  16. Protein: Full Table of Results

  17. Does Optimizing to Each Metric Help? • About half of participants submitted different predictions for each metric • Among winners: • Some evidence that top performers benefit from optimizing to each metric • Some metrics incompatible: e.g., optimizing to APR hurts RMS

  18. Did Groups Effectively Optimize to Different Measures? • Score predictions for one measure using the other measures. • Columns: Submitted For; rows: Tested On • Physics measures: ACC, CXE, ROC, SLQ • Protein measures: APR, RKL, RMS, TOP1 • Row ACC / APR: +9,-9 +14,-10 +6,-7 +14,-15 +8,-8 +5,-11 • Row CXE / RKL: +4,-16 +6,-18 +0,-17 +1,-18 +3,-12 +7,-19 • Row ROC / RMS: +1,-27 +4,-9 +8,-9 +1,-28 +8,-8 +2,-29 • Row TOP1 / SLQ: +4,-12 +6,-6 +8,-7 +13,-9 +5,-9 +12,-16 • Totals: Physics: +67,-125; Biology: +82,-204

  19. Did Groups Effectively Optimize to Different Measures? • How often did a submission for another measure perform better? • Do not count screw-ups and invalid predictions • Count only those predictions where the rank stays within a window of a given size (x-axis) • Count only the groups in the top 40 • Plots: Physics, Protein

  20. Did Good Groups Benefit more than Bad Groups? • How often did a submission for another measure perform better? • Do not count screw-ups and invalid predictions • Count only those predictions where the rank stays within a window of size 10 • Count only the groups in the top k (x-axis) • Plots: Physics, Protein

  21. How Big is the Benefit? • How much does swapping predictions change rank? • Count only those predictions where the rank stays within a window of a given size (x-axis) • Count only the groups in the top 40 • Plots: Physics, Protein

  22. How Much did Predictions Differ Between Groups? • Fit MDS to Euclidean Distance between Prediction Vectors • Top 30 Groups • MDS plots: Physics, RMSE; Protein, APR
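
The slide does not say which MDS variant was fitted. As a rough sketch, the code below assumes classical (Torgerson) MDS: compute Euclidean distances between the groups' prediction vectors, double-center the squared distances, and take the leading eigenvectors, found here by simple power iteration so that only the standard library is needed. All function names and parameters are hypothetical.

```python
import math

def distance_matrix(vectors):
    """Pairwise Euclidean distances between prediction vectors."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]

def classical_mds(dist, dims=2, iters=500):
    """Embed n items in `dims` dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + 1 + k) for i in range(n)]  # deterministic start vector
        for _ in range(iters):  # power iteration for the leading eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

Each row of the result is a 2-D point for one group; groups whose predictions are similar end up close together in the plot.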

  23. The Easy, the Difficult, and the Impossible • How often do the competitors agree on a classification? • X-Axis: number of competitors • Y-Axis: percentage of test examples x competitors classified correctly • Plots: Physics Accuracy, Top 10; Physics Accuracy, Top 30

  24. The Easy and the Impossible • How often does everybody agree? • X-Axis: number of competitors from the top • Y-Axis: percentage of test examples everybody classified correctly / incorrectly • Plots: Physics Accuracy, Everybody Incorrect; Physics Accuracy, Everybody Correct
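
The quantity plotted on these two slides can be sketched as a histogram over test examples: for each example, count how many competitors classified it correctly. The function name and input layout below are assumptions for illustration.

```python
from collections import Counter

def agreement_histogram(y_true, predictions):
    """predictions: list of 0/1 prediction lists, one per competitor.
    Returns {k: percentage of test examples that exactly k competitors
    classified correctly}, for k = 0 .. number of competitors."""
    n = len(y_true)
    counts = Counter(
        sum(pred[i] == y_true[i] for pred in predictions) for i in range(n)
    )
    return {k: 100.0 * counts.get(k, 0) / n for k in range(len(predictions) + 1)}
```

The entry for k equal to the number of competitors is the "everybody correct" share (the easy examples), and the entry for k = 0 is the "everybody incorrect" share (the impossible ones).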

  25. How to Win KDD-Cup 2005: Collaborate • Ensemble that averages predictions of best participants

  26. How to Win KDD-Cup 2005: Collaborate • Ensemble that averages predictions of best participants
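
A minimal sketch of such an ensemble, assuming an unweighted mean over the top-k participants. The slides do not state how many participants were averaged or whether weights were used, so `top_k` and the function name are hypothetical.

```python
def average_ensemble(predictions, ranking, top_k):
    """predictions[name] is a list of numeric predictions, aligned across names.
    ranking lists participant names from best to worst.
    Returns the element-wise mean of the top_k participants' predictions."""
    chosen = [predictions[name] for name in ranking[:top_k]]
    return [sum(values) / len(chosen) for values in zip(*chosen)]

# Made-up predictions from three hypothetical participants.
preds = {"A": [0.2, 0.8], "B": [0.4, 0.6], "C": [1.0, 0.0]}
print(average_ensemble(preds, ["A", "B", "C"], top_k=2))
```

Averaging tends to help most when the participants' errors are not strongly correlated, which is the situation the MDS and agreement slides probe.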

  27. Lessons Learned • Use WWW site for organizing competition. • Data and all results still available online • Approx. 400 new registrations since end of competition (used in courses, papers, research) • Registration process that provides anonymity, but allows tracking • Selection of suitable tasks • Sample size large enough, so that evaluation statistically reliable • But small enough so that tractable for most methods • Two tasks: one traditional, one that required non-standard techniques • Well-defined evaluation criteria, if possible • Automation if possible • Provide evaluation software for download (PERF software) • Automatic format and plausibility checking of submissions • Crucial team members: • Web Master++: Lars Backstrom (Cornell) • Data Providers: Charles Young (SLAC), Ron Elber (Cornell) • PERF: Alex Niculescu (Cornell), Filip Radlinski (Cornell), Claire Cardie (Cornell), …participants who found bugs: Chinese Academy of Sciences, University of Dortmund • Who is interested in results? • Data providers get connected with Data Mining experts • Data Mining community • Regulate exploitation by Winners: the “Vogel Effect” • Affiliated with conference, program, organization ...?

  28. Closing • Data and all results available online: http://kodiak.cs.cornell.edu/kddcup • PERF software download: http://www.cs.cornell.edu/~caruana • Thanks to: • Web Master++: Lars Backstrom (Cornell) • Physics Data: Charles Young (SLAC) • Protein Data: Ron Elber (Cornell) • PERF: Alex Niculescu (Cornell), Filip Radlinski (Cornell), Claire Cardie (Cornell), … • Thanks to participants who found bugs in the PERF software: • Chinese Academy of Sciences • University of Dortmund • And of course, thanks to everyone who participated!

  29. The Contest Goes On • Plots: Physics, Protein

