
PIRRACCHIO R, Petersen M, Carone M, Resche Rigon M, Chevret S and van der Laan M

An example of using data from the MIMIC-II database to build a predictive ICU mortality score. Mortality Prediction in the ICU: Can we do better? Super ICU Learner Algorithm (SICULA) Project.


Presentation Transcript


  1. An example of using data from the MIMIC-II database to build a predictive ICU mortality score. Mortality Prediction in the ICU: Can we do better? Super ICU Learner Algorithm (SICULA) Project. PIRRACCHIO R, Petersen M, Carone M, Resche Rigon M, Chevret S and van der Laan M. Division of Biostatistics, UC Berkeley, USA; Département de Biostatistiques et Informatique Médicale, UMR-717, Paris, France; Service d’Anesthésie-Réanimation, HEGP, Paris

  2. Motivations for Mortality Prediction • Improved mortality prediction for ICU patients remains an important challenge: • Clinical research: stratification/adjustment on patients’ severity • ICU care: adapting the level of care/monitoring; choosing the appropriate structure • Health policies: performance indicators

  3. Currently used Scores • SAPS, APACHE, MPM, LODS, SOFA,… • And several updates for each of them • The most widely used in practice are: • The SAPS II score in Europe (Le Gall, JAMA 1993) • The APACHE II score in the US (Knaus, Crit Care Med 1985)

  4. Currently used Scores • SAPS, APACHE, MPM, LODS, SOFA,… • And several updates for each of them • The most widely used in practice are: • The SAPS II score in Europe (Le Gall, JAMA 1993) • The APACHE II score in the US (Knaus, Crit Care Med 1985) • PROBLEM: fair discrimination but poor calibration

  5. Why are the current scores performing that badly? • Three potential reasons: • Global decrease of ICU mortality • Covariate selection • Parametric logistic regression => which means we implicitly assume a linear relationship (on the logit scale) between the outcome and the covariates

  6. Why are the current scores performing that badly? • Three potential reasons: • Global decrease of ICU mortality • Covariate selection • Parametric logistic regression => which means we implicitly assume a linear relationship (on the logit scale) between the outcome and the covariates (see the R sketch below) • WHY would we accept that??? • We have alternatives! • Data-adaptive machine learning techniques • Non-parametric modelling algorithms
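
A minimal R sketch of the parametric assumption criticized above: a SAPS II-style logistic regression forces the log-odds of death to be linear in the covariates. The data and variable names (age, hr, death) are simulated stand-ins, not actual MIMIC-II fields.

    set.seed(42)
    # Simulated stand-in for an ICU working file (hypothetical variables)
    icu <- data.frame(age = rnorm(500, 65, 15),
                      hr  = rnorm(500, 95, 25))
    icu$death <- rbinom(500, 1, plogis(-6 + 0.05 * icu$age + 0.02 * icu$hr))

    # SAPS II-style parametric model: the log-odds of death are constrained
    # to be a linear function of the covariates
    fit <- glm(death ~ age + hr, data = icu, family = binomial())
    summary(fit)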

  7. Super Learner • A method to choose the optimal regression algorithm among a set of (user-supplied) candidates, including both parametric regression models and data-adaptive algorithms (the SL library) • The selection strategy relies on estimating a risk for each candidate algorithm, based on: • a loss function (its expectation defines the risk of each prediction method) • V-fold cross-validation (see the sketch below) • Discrete Super Learner: select the best candidate algorithm, defined as the one with the smallest cross-validated risk, and rerun it on the full data to obtain the final prediction model • Super Learner convex combination: a weighted linear combination of the candidate learners, where the weights are fitted to minimize the cross-validated risk of the combination. van der Laan, Stat Appl Genet Mol Biol 2007
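
As a hedged illustration of "loss function + V-fold cross-validation", the sketch below computes the 10-fold cross-validated risk of a single candidate (a plain logistic regression) under the negative log-likelihood loss, on simulated data. This is the quantity the Super Learner compares across candidates.

    set.seed(3)
    n <- 500
    d <- data.frame(x = rnorm(n))
    d$y <- rbinom(n, 1, plogis(d$x))

    V     <- 10
    folds <- sample(rep(1:V, length.out = n))   # random fold assignment
    loss  <- 0
    for (v in 1:V) {
      # fit on the V-1 training folds, predict on the held-out fold
      fit <- glm(y ~ x, data = d[folds != v, ], family = binomial())
      p   <- predict(fit, newdata = d[folds == v, ], type = "response")
      yv  <- d$y[folds == v]
      loss <- loss - sum(yv * log(p) + (1 - yv) * log(1 - p))  # neg. log-lik.
    }
    loss / n   # cross-validated risk estimate for this candidate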

  8. Discrete Super Learner (or Cross-validated Selector) van der Laan, Targeted Learning, Springer 2011

  9. Discrete Super Learner • The discrete SL can only do as well as the best algorithm included in the library • Not bad, but… • We can do better than that!

  10. Super Learner • A method to choose the optimal regression algorithm among a set of (user-supplied) candidates, including both parametric regression models and data-adaptive algorithms (the SL library) • The selection strategy relies on estimating a risk for each candidate algorithm, based on: • a loss function • V-fold cross-validation • Discrete Super Learner: select the best candidate algorithm, defined as the one with the smallest cross-validated risk, and rerun it on the full data to obtain the final prediction model • Super Learner convex combination: a weighted linear combination of the candidate learners, where the weights themselves are fitted data-adaptively using cross-validation to give the best overall fit (sketched with the SuperLearner package below). van der Laan, Stat Appl Genet Mol Biol 2007
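
A minimal sketch using the SuperLearner R package (van der Laan et al.); the candidate library, the simulated data and the variable names below are illustrative assumptions, not the actual SICULA library.

    library(SuperLearner)   # SL.randomForest also needs randomForest installed

    set.seed(1)
    n <- 500
    X <- data.frame(age = rnorm(n, 65, 15), hr = rnorm(n, 95, 25))
    Y <- rbinom(n, 1, plogis(-6 + 0.05 * X$age + 0.02 * X$hr))

    # Candidate library mixing a parametric model and data-adaptive learners
    sl <- SuperLearner(Y = Y, X = X, family = binomial(),
                       SL.library = c("SL.glm", "SL.mean", "SL.randomForest"),
                       cvControl  = list(V = 10))   # V-fold cross-validation

    sl$cvRisk             # cross-validated risk of each candidate
    which.min(sl$cvRisk)  # the discrete Super Learner selects this candidate
    sl$coef               # weights of the convex combination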

  11. Discrete Super Learner (or Cross-validated Selector) van der Laan, Targeted Learning, Springer 2011

  12. On which data did we perform our analyses? • We needed: • A large database • Not too specific • Reflecting current medical practice… and thus current mortality • Complete • With all the items used in previous scores • With few missing data • Well described • Easily accessible

  13. MIMIC-II!!! • A publicly available dataset including all patients admitted to an ICU at the Beth Israel Deaconess Medical Center (BIDMC) in Boston, MA since 2001 • Medical (MICU), trauma-surgical (TSICU), coronary (CCU), cardiac surgery recovery (CSRU) and medico-surgical (MSICU) critical care units • Patient recruitment is still ongoing • Data collected before December 1st, 2012 • Adult ICU patients (> 15 years old; see the filtering sketch below). Lee, Conf Proc IEEE Eng Med Biol Soc 2011; Saeed, Crit Care Med 2011
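
A hedged sketch of the two stated inclusion criteria (admission before December 1st, 2012, and age > 15); the patients table and its columns are hypothetical stand-ins for the MIMIC-II admission data.

    patients <- data.frame(id    = 1:4,
                           age   = c(12, 45, 78, 16),
                           admit = as.Date(c("2010-03-01", "2013-02-10",
                                             "2008-07-21", "2012-11-30")))
    # Keep adult ICU patients admitted before the cut-off date
    adults <- subset(patients, age > 15 & admit < as.Date("2012-12-01"))
    adults   # ids 3 and 4 remain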

  14. MIMIC-II: beginnings of an obstacle course • Access to the Clinical Database: • An on-line course on protecting human research participants (minimum 3 hours) • Required for all participants • Basic-access web interface: • Requires knowledge of SQL… user-friendly for database specialists • Limited size of the data export • number of patients • number of variables • Slow • Suited to small studies, rare diseases or rare events

  15. MIMIC-II: beginnings of an obstacle course • The entire MIMIC-II Clinical Database: • More than 40,000 directories (1 per patient) • Within each directory, around 25 .txt files • Around 20 GB • We needed: • Endpoint: hospital mortality • Explanatory variables: those included in the SAPS II score (20 variables) • Dichotomized as in SAPS II (Super Learner 1) • Non-transformed (Super Learner 2); both encodings are sketched below
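
The two encodings can be sketched as follows; the heart-rate cut points below are the approximate SAPS II classes quoted from memory (they should be checked against Le Gall, JAMA 1993), and the data are simulated.

    set.seed(7)
    hr <- round(rnorm(1000, 95, 30))   # simulated heart rates

    # Super Learner 2: the non-transformed continuous variable
    x_continuous <- hr

    # Super Learner 1: categorized as in SAPS II (approximate cut points)
    x_saps <- cut(hr,
                  breaks = c(-Inf, 39, 69, 119, 159, Inf),
                  labels = c("<40", "40-69", "70-119", "120-159", ">=160"))
    table(x_saps)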

  16. Constraints • Statistical software: R • Packages work with data frames • Slow with large datasets • Little knowledge of SQL • Time • Distance between statisticians

  17. Choices • Decision to use R to read the datasets and build the working file (sketched below) • Allowed us to properly define the covariates we needed • Quick to write • Independent of database specialists • But • Long (3 hours to obtain the .Rdata file) • Not so easy to modify • Requires a good understanding of the database and of how the data are coded
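
A minimal sketch of that R extraction step, assuming the flat-file layout described on slide 15 (one directory per patient, around 25 .txt files each); the directory name, file names and columns are hypothetical.

    # Hypothetical layout: mimic2/<patient_id>/DEMOGRAPHICS.txt, CHART.txt, ...
    patient_dirs <- list.dirs("mimic2", recursive = FALSE)

    read_patient <- function(dir) {
      demo <- read.table(file.path(dir, "DEMOGRAPHICS.txt"),
                         header = TRUE, sep = "\t", stringsAsFactors = FALSE)
      # ... extract the 20 SAPS II items from the other .txt files here ...
      data.frame(id = basename(dir), age = demo$age[1])
    }

    # One row per patient; this is the step that took around 3 hours
    working <- do.call(rbind, lapply(patient_dirs, read_patient))
    save(working, file = "working.Rdata")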

  18. Results

  19. SAPS II

  20. SAPS II Super Learner 1

  21. Super Learner 1

  22. Super Learner 2

  23. Conclusion • As compared to conventional severity scores, our Super Learner-based proposal offers improved performance for predicting hospital mortality in ICU patients • The score will evolve together with: • New observations • New explanatory variables • SICULA: just play with it!! http://webapps.biostat.berkeley.edu:8080/sicula/

  24. Personal experience • An increasing number of reviews for papers with only one angle of attack: the size of the database • Forgetting: • The quality of the data • The difficulties in terms of modelling • An increasing number of requests for analysis… but late in the process => better collaboration is needed

  25. Limits • A good example of what a large database allows • In this example: • Covariates and endpoints were: • Relatively well defined • Systematically collected • The question was simple and unique • But this is still a monocentric observational study

  26. Size does not matter! • From our point of view, the key characteristic of such data is not the size • Size mainly reflects issues of statistical power, related to: • the frequency of exposures and outcomes • the effect sizes • The key characteristics are: • Observational data • Not planned to answer a specific question • Complex data

  27. Size does not matter! • This implies: • Questions should be adapted to this type of data, or the data collection should be adapted to the questions • Choice of a suitable analysis model: • a trade-off between analytic flexibility and the granularity of information • Recent methodological advances have given us new analytic tools to perform complex statistical analyses • Fortunately, we have a lot of data ☺ • Unfortunately, we have a lot of data ☹

  28. Size does not matter! • Strong hypotheses: • Non-informative censoring • No confounders • Missing at random • … • Strong computing constraints
