
Netflix Prize and Heritage Health Prize

Netflix Prize and Heritage Health Prize. Philip Chan. Cash Prizes to Stimulate Research: Ansari X Prize for Private Spaceflight (2004) [$10M], 100 km above earth twice within 2 weeks; DARPA Grand Challenge (2005) [$2M], autonomous vehicle, 131 miles in 10 hours.


Presentation Transcript


  1. Netflix Prize and Heritage Health Prize • Philip Chan

  2. Cash Prizes to Stimulate Research • Ansari X Prize for Private Spaceflight (2004) [$10M] • 100 km above earth twice within 2 weeks • DARPA Grand Challenge (2005) [$2M] • autonomous vehicle: 131 miles in 10 hours • Archon X Prize for Genomics (2006) [$10M] • map 100 human genomes in 10 days

  3. Cash Prizes to Stimulate Research • Netflix Prize (2006) [$1M] • Recommend movies with 10% improvement • Heritage Health Prize (2011) [$3M] • Days in hospital next year with 0.4 error

  4. Netflix Prize netflixprize.com

  5. Netflix Prize • Task • Given customer ratings on some movies • Predict customer ratings on other movies • If John rates • “Mission Impossible” a 5 • “Over the Hedge” a 3, and • “Back to the Future” a 4, • how would he rate “Harry Potter”, … ? • Performance • Error rate (accuracy)
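
The task on this slide can be illustrated with a deliberately simple baseline: predict a rating as the global mean plus a per-user offset and a per-movie offset. This is only a sketch and not the method of any contestant. John's three ratings come from the slide; the other users and all of their ratings are invented for illustration.

```python
from collections import defaultdict

# Ratings as (user, movie, stars). John's ratings are from the slide;
# Mary's and Ann's ratings are invented for this example.
ratings = [
    ("John", "Mission Impossible", 5),
    ("John", "Over the Hedge", 3),
    ("John", "Back to the Future", 4),
    ("Mary", "Mission Impossible", 4),
    ("Mary", "Harry Potter", 5),
    ("Ann", "Over the Hedge", 2),
    ("Ann", "Harry Potter", 3),
]


def mean(values):
    return sum(values) / len(values)


global_mean = mean([stars for _, _, stars in ratings])

by_user = defaultdict(list)
by_movie = defaultdict(list)
for user, movie, stars in ratings:
    by_user[user].append(stars)
    by_movie[movie].append(stars)


def predict(user, movie):
    """Global mean + user offset + movie offset, clipped to the 1-5 scale."""
    user_bias = mean(by_user[user]) - global_mean if user in by_user else 0.0
    movie_bias = mean(by_movie[movie]) - global_mean if movie in by_movie else 0.0
    return min(5.0, max(1.0, global_mean + user_bias + movie_bias))


john_hp = predict("John", "Harry Potter")
print(f"Predicted rating: {john_hp:.2f}")
```

A user or movie with no ratings simply falls back to the global mean, which is one reason real systems need far more than this baseline.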

  6. Cash Award • Grand Prize • $1M • 10% improvement • by 2011 (in 5 years) • Progress Prize • $50K per year • 1% improvement
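
The percentage thresholds on this slide can be read as a relative reduction in prediction error against a fixed baseline system. The slide does not name the baseline or give error values, so the numbers below are hypothetical and serve only to show the arithmetic.

```python
def relative_improvement(baseline_error, new_error):
    """Fractional reduction in error relative to a baseline (0.10 means 10%)."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical error values, not taken from the competition.
baseline = 1.00
candidate = 0.89

improvement = relative_improvement(baseline, candidate)
qualifies_for_grand_prize = improvement >= 0.10     # 10% threshold from the slide
qualifies_for_progress_prize = improvement >= 0.01  # 1% threshold from the slide

print(f"improvement: {improvement:.1%}")
```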

  7. Intellectual Property • Netflix has a non-exclusive license to the algorithm • Authors tell the world what the algorithm is

  8. Participation • 51K contestants • 41K teams • 186 countries

  9. Leader Board • Started on Oct 2, 2006 • Improvement by the top algorithm • after a week: ~0.9% • after two weeks: ~4.5% • after a month: ~5% • after a year: ~8.4% • after two years: ~9.4% • July 26, 2009 (less than 3 years): 10%

  10. Winner • BellKor’s Pragmatic Chaos • 7 members • Merger of 3 teams • BellKor • AT&T Labs, USA & Yahoo! Research, Israel • PragmaticTheory • telecommunications, Canada • BigChaos • started a company, Austria • A combination of different algorithms
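
To give a feel for what "a combination of different algorithms" can mean, here is a minimal sketch of blending two predictors with a single weight chosen to minimize squared error on held-out ratings. It is not the winning team's actual procedure; the ratings and both models' predictions are invented.

```python
import math


def rmse(preds, truth):
    """Root mean squared error between predictions and true ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth))


def best_blend_weight(a, b, truth):
    """Weight w minimizing the squared error of w*a + (1-w)*b (closed form)."""
    num = sum((ai - bi) * (ti - bi) for ai, bi, ti in zip(a, b, truth))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return num / den if den else 0.5


# Invented held-out ratings and two invented models' predictions.
truth = [5, 3, 4, 2, 4]
model_a = [4.6, 3.5, 3.4, 2.6, 4.5]
model_b = [5.5, 2.4, 4.5, 1.6, 3.4]

w = best_blend_weight(model_a, model_b, truth)
blend = [w * a + (1 - w) * b for a, b in zip(model_a, model_b)]

print(f"A: {rmse(model_a, truth):.3f}  B: {rmse(model_b, truth):.3f}  "
      f"blend (w={w:.2f}): {rmse(blend, truth):.3f}")
```

Because w = 1 and w = 0 reproduce the individual models, the fitted blend can never score worse than either model on the data used to fit the weight. In practice the weight would be checked on separate data to guard against overfitting.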

  11. Runner-up • The Ensemble • ~30 members • “last-minute” merger • teams had 30 days to beat the first team that crossed the 10% threshold • same accuracy • behind by 20 minutes!

  12. Heritage Health Prize heritagehealthprize.com

  13. Health Care • 71M individuals admitted to US hospitals each year • Unnecessary admissions cost $30B

  14. Heritage Provider Network • Has a network of doctors in California • Can we identify earlier those most at risk and ensure they get the treatment they need? • Can we reduce unnecessary hospitalizations?

  15. Heritage Health Prize • Launch • http://www.youtube.com/watch?v=GuZ8nkpygAs • Given patient data • Predict how many days a patient will spend in a hospital in the next year • The prediction helps develop strategies to reduce emergencies and hence hospitalizations

  16. Grand Prize • $3M • At most 0.4 in error (~0.5 day) • By Apr 4, 2013 [2 years] • $500K Consolation Prize • if no team reaches the 0.4 error threshold

  17. Milestone Prizes • top 2 performers at each milestone • Aug 31, 2011 • $30K, $20K • Feb 13, 2012 • $50K, $30K • http://www.youtube.com/watch?v=pkmkNnGyihY • Sep 4, 2012 • $60K, $40K

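  18. Performance of Algorithms • Prediction Error Rate (RMSLE) • $\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\mathrm{prediction}_i - \mathrm{real}_i)^2}$ • where • n = number of members • real = log ( actual # of days + 1 ) • prediction = log ( predicted # of days + 1 ) • Prediction error threshold = 0.4 (~0.5 day)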
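
A minimal sketch of the RMSLE metric described on this slide, assuming "log" means the natural logarithm (the slide does not say, but the "~0.5 day" remark is consistent with it). The function and variable names are illustrative.

```python
import math


def rmsle(predicted_days, actual_days):
    """Root mean squared logarithmic error over paired predictions and outcomes."""
    if len(predicted_days) != len(actual_days) or not actual_days:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2  # log1p(x) computes log(x + 1)
        for p, a in zip(predicted_days, actual_days)
    ]
    return math.sqrt(sum(squared) / len(squared))


# For a member who actually spends 0 days in hospital, a log-scale error
# of 0.4 corresponds to predicting exp(0.4) - 1 days.
days_equivalent = math.expm1(0.4)
print(f"0.4 on the log scale is about {days_equivalent:.2f} days")
```

Adding 1 before taking the log keeps the metric defined for the many members with zero hospital days, and the log compresses large errors on long stays.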

  19. Intellectual Property • Exclusive license to Sponsor • and participant’s own use • Algorithms not previously published • Use of data sets is for the competition only • written consent for other purposes

  20. Data Sets • Training and validation data sets • For participants to design algorithms • Feedback data set • For calculating standings on Leaderboard • Scoring data set • For determining winners for prizes • http://www.heritagehealthprize.com/c/hhp/Data

  21. Data (in CSV format) • Members Data (113K members) • Claims Data (2.7M claims) • Drug Count Data (818K prescriptions) • Lab Count Data (361K labs) • Outcome Data (76K in Y2, 71K in Y3) • Target (71K in Y4 for prediction) • Total ~264 MB (including other files)

  22. Members Data • MemberID • AgeAtFirstClaim • Sex

  23. Claims Data • MemberID • ProviderID • Vendor ID • PCP (Primary care physician) ID • Year • Specialty (of physician/vendor?) • PlaceSvc (place of service) • office, outpatient hospital, inpatient hospital, … • PayDelay (between service and payment)

  24. Claims Data [continued] • LengthOfStay (in hospital) • DSFS (days since first claim) • PrimaryConditionGroup (diagnostic categories) • CharlsonIndex (effect of diseases on illness) • ProcedureGroup (intervention categories) • SupLOS (supplement to LengthOfStay) • 1 if LengthOfStay is NULL because of de-identification

  25. Drug Count Data • MemberID • Year • DSFS (Days since first service) • DrugCount (unique prescription drugs)

  26. Lab Count Data • MemberID • Year • DSFS (Days since first service) • LabCount (unique lab or pathology tests)

  27. Outcome Data • MemberID • DaysInHospital_Y2 (claims in Y1) • i.e., predict Y2 based on Y1 • DaysInHospital_Y3 (claims in Y2) • ClaimsTruncated • 1 if the member's claims were “truncated”
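
The data files described above are linked by MemberID, so a first modeling step is typically to aggregate each member's claims into features and pair them with that member's outcome. The sketch below uses only the standard library and tiny invented extracts. The PlaceSvc values and the outcome column name `DaysInHospital` are assumptions, and the real files have many more rows and columns.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for the Claims and Outcome CSV files.
claims_csv = """MemberID,Year,PlaceSvc
1,Y1,Office
1,Y1,Inpatient Hospital
2,Y1,Office
1,Y1,Office
"""
outcome_csv = """MemberID,DaysInHospital
1,3
2,0
3,0
"""

# Aggregate Y1 claims per member.
claim_counts = Counter()
inpatient_counts = Counter()
for row in csv.DictReader(io.StringIO(claims_csv)):
    claim_counts[row["MemberID"]] += 1
    if row["PlaceSvc"] == "Inpatient Hospital":
        inpatient_counts[row["MemberID"]] += 1

# One training row per member: Y1 claim features paired with the Y2 outcome.
# Members with no claims get zero counts.
training_rows = [
    {
        "MemberID": row["MemberID"],
        "n_claims": claim_counts[row["MemberID"]],
        "n_inpatient": inpatient_counts[row["MemberID"]],
        "days_in_hospital": int(row["DaysInHospital"]),
    }
    for row in csv.DictReader(io.StringIO(outcome_csv))
]

for r in training_rows:
    print(r)
```

The drug-count and lab-count files could be folded in the same way, summing DrugCount or LabCount per member and year.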

  28. Using Other Data? • Yes • Freely available to anyone (public source) • URL needs to be published to the forum • Except for • demographic, socioeconomic or clinical information about the members

  29. Naive Algorithms • For predicting the number of Days in Hospital in the next year • Posted as “benchmarks” on the Leaderboard

  31. Always Predict 15 (max) • Everyone goes to the hospital for at least 15 days • RMSLE = 2.628062 • 550+% over threshold

  33. Always Predict Zero • no one goes to the hospital • RMSLE = 0.522226 • 31% over threshold

  35. Predict Random Values • between 0 and 15 • RMSLE = 0.752297 • 88% over threshold

  37. Always Predict Average • Average ~= 0.209179 • RMSLE = 0.486459 • 22% over threshold
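
The constant-prediction benchmarks can be compared on any outcome vector. The sketch below uses invented outcomes, so its scores will not match the leaderboard figures. It also shows that, under RMSLE, the best single constant is the mean taken on the log scale rather than the arithmetic mean. Whether the benchmark's "average" was computed that way is not stated on the slide.

```python
import math


def rmsle(preds, actual):
    """Root mean squared logarithmic error (natural log assumed)."""
    sq = [(math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(preds, actual)]
    return math.sqrt(sum(sq) / len(sq))


def constant_rmsle(value, actual):
    """Score the strategy that predicts the same value for every member."""
    return rmsle([value] * len(actual), actual)


# Invented outcomes: most members spend 0 days in hospital, a few spend more.
actual = [0] * 85 + [1] * 6 + [2] * 4 + [5] * 3 + [15] * 2

arithmetic_mean = sum(actual) / len(actual)
# Mean of log(days + 1), mapped back to days: the RMSLE-optimal constant.
log_scale_mean = math.expm1(sum(math.log1p(a) for a in actual) / len(actual))

scores = {
    "always 15": constant_rmsle(15, actual),
    "always 0": constant_rmsle(0, actual),
    "arithmetic mean": constant_rmsle(arithmetic_mean, actual),
    "log-scale mean": constant_rmsle(log_scale_mean, actual),
}

for name, score in scores.items():
    print(f"{name:16s} RMSLE = {score:.4f}")
```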

  39. Leader Board • Competition started on Apr 4, 2011 with partial data • All data were released on June 4, 2011 • Sep 9, 2011 • RMSLE: 0.456384 • ~14.1% over threshold • Aug 29, 2012 • RMSLE: 0.450426 • ~12.6% over threshold
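
The "% over threshold" figures quoted for the benchmarks and the leaderboard follow from dividing each RMSLE by the 0.4 target. A short sketch of that arithmetic, using the RMSLE values reported on the slides (the dictionary labels are just descriptive names):

```python
THRESHOLD = 0.4  # grand-prize RMSLE target from the slides


def percent_over_threshold(score, threshold=THRESHOLD):
    """How far a score exceeds the threshold, as a percentage of the threshold."""
    return (score / threshold - 1.0) * 100.0


# RMSLE values as reported on the slides.
reported = {
    "always predict 15": 2.628062,
    "always predict zero": 0.522226,
    "random values": 0.752297,
    "always predict average": 0.486459,
    "leader, Sep 9, 2011": 0.456384,
    "leader, Aug 29, 2012": 0.450426,
}

over = {name: percent_over_threshold(score) for name, score in reported.items()}

for name, pct in over.items():
    print(f"{name:24s} {pct:6.1f}% over threshold")
```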

  40. Teams • Sep 9, 2011 • 914 teams • 6021 entries • Aug 29, 2012 • 1292 teams

  41. Considerations • Accurate Prediction • algorithms • Efficiency • time • space

  42. Teams • Form your own teams • www.heritagehealthprize.com • Join my team • CSE 4403 Independent Study • CSE 5801 Independent Research

  43. www.heritagehealthprize.com Thank you
