Prediction models: The value of validation
M. Dramaix-Wilmet, Département de Biostatistique
Prediction model
• An equation predicting a "dependent" variable from several others
• Linear regression models
• Logistic regression models
• Classification
• Others ...
Prediction models: the problem – "OPTIMISM"
The model "predicts" well for the sample from which it was built. What about other samples?
Prediction models: Validation – an example
External validation is necessary in prediction research: A clinical example
S.E. Bleeker, H.A. Moll, E.W. Steyerberg, A.R.T. Donders, G. Derksen-Lubsen, D.E. Grobbee, K.G.M. Moons
Journal of Clinical Epidemiology 56 (2003) 826–832
Prediction models: Validation – an example
The derivation set comprised 376 children with fever without apparent source, and the validation set consisted of 179 children who had been referred for the same reason (three and zero patients, respectively, were excluded because of isolation of Haemophilus influenzae). Except for the variable pale skin, no material differences were found in the distribution of the general characteristics and the predictors between the two sets. A serious bacterial infection was present in 20% of the children in the derivation set and in 25% of the validation set.
Prediction models: Validation – an example
Strong predictors of serious bacterial infection included age above 1 year, duration of fever, changed crying pattern, nasal discharge or earache in history, ill clinical appearance, pale skin, chest-wall retractions, crepitations, and signs of pharyngitis or tonsillitis. The ROC area of this model was 0.825 (95% CI: 0.78–0.87) and the R²: 32.3% (95% CI: 15.1–49.4%). Subsequently, the model was applied to the validation set to test its predictive performance. The ROC area dropped to 0.57 (95% CI: 0.47–0.67) and the R² to 2.0%.
Prediction model: Validation criteria
• Example: logistic regression
• Performance
• Discrimination: ability to distinguish the "diseased" from the "non-diseased" (ROC area)
• Calibration
• Agreement between predicted and observed probabilities, e.g. the Hosmer-Lemeshow test
• Bias: difference between the mean predicted probability and the mean observed probability
• Global measures
• e.g. Nagelkerke's R²
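The following is a minimal sketch, not part of the original slides, of how the criteria above could be computed for a logistic model in Python. The dataset is a placeholder from make_classification; the Hosmer-Lemeshow statistic is a hand-rolled decile version and the Nagelkerke R² is derived from the model and null log-likelihoods.

```python
# Minimal sketch of the performance criteria listed above, assuming a fitted
# logistic model with predicted probabilities p_hat for a binary outcome y.
import numpy as np
from scipy.stats import chi2
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, log_loss

X, y = make_classification(n_samples=400, n_features=8, random_state=0)  # placeholder data
model = LogisticRegression(max_iter=1000).fit(X, y)
p_hat = model.predict_proba(X)[:, 1]

# Discrimination: area under the ROC curve.
auc = roc_auc_score(y, p_hat)

# Calibration: Hosmer-Lemeshow-type comparison of observed and expected
# events within deciles of predicted probability.
deciles = np.quantile(p_hat, np.linspace(0, 1, 11))
groups = np.clip(np.digitize(p_hat, deciles[1:-1]), 0, 9)
hl_stat = sum(
    ((y[groups == g].sum() - p_hat[groups == g].sum()) ** 2)
    / (p_hat[groups == g].sum() * (1 - p_hat[groups == g].mean()) + 1e-12)
    for g in range(10)
)
hl_p = chi2.sf(hl_stat, df=8)

# Bias: difference between the mean predicted and observed event probability.
bias = p_hat.mean() - y.mean()

# Global measure: Nagelkerke R^2 from the log-likelihoods of the null and fitted models.
n = len(y)
ll_model = -log_loss(y, p_hat, normalize=False)
ll_null = -log_loss(y, np.full(n, y.mean()), normalize=False)
r2_cs = 1 - np.exp(2 * (ll_null - ll_model) / n)
r2_nagelkerke = r2_cs / (1 - np.exp(2 * ll_null / n))

print(f"AUC={auc:.3f}  HL p={hl_p:.3f}  bias={bias:+.4f}  R2_N={r2_nagelkerke:.3f}")
```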
Prediction models: Validation methods
• Internal validation
• Data splitting
• Cross-validation
• Bootstrap
• External validation
• A different sample
Prediction models: Validation methods
• "Shrinkage factor" = reduction factor
• A factor estimated with the internal validation methods above
• Corrects the "optimism" of a model built from the data at hand
• e.g. it shrinks the regression coefficients, and hence the OR, the ROC area, ...
Prediction models: Shrinkage factor
• Data: breast cancer
• n = 138
• Event: recurrence (76)
• Median follow-up: 83 months
• Cox model
• 1 predictor = S-phase fraction
• HR = exp(b*X)
• Question: best cut-point?
Prediction models: Shrinkage factor
• "Ad hoc" approach: cut-point with minimum p-value
• Without correction
• Cut-point = 10.7
• pmin = 0.007
• b = 0.863
• HR = 2.37
• P-value "corrected" to account for "optimism"
• pcor = 0.12
• Shrinkage factor = 0.57
• bcor = 0.57 × 0.863
• HRcor = 1.64
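As a quick check of the arithmetic on this slide, the shrinkage factor is applied to the Cox coefficient before exponentiating; the snippet below simply reproduces the numbers shown above.

```python
# Check of the corrected hazard ratio: the shrinkage factor multiplies the
# regression coefficient before exponentiation.
import math

b = 0.863             # Cox coefficient for the dichotomised S-phase fraction
shrinkage = 0.57      # shrinkage factor from internal validation
hr = math.exp(b)                  # uncorrected HR ~ 2.37
hr_cor = math.exp(shrinkage * b)  # corrected HR ~ 1.64
print(round(hr, 2), round(hr_cor, 2))
```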
Prediction models: Validation – data splitting
• Simple
• The sample is split at random into two parts: a "training" sample and a "validation" sample
Prediction models: Validation – data splitting
• Repeated data splitting
• The sample is split at random into a "training sample" and a "validation sample"
• The procedure is repeated a large number of times
• The distribution of the statistics of interest is then examined
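A minimal sketch of repeated data splitting, assuming the predictors X and binary outcome y are available as NumPy arrays: the model is refitted on each random training half and the AUC is recorded on both halves, so the gap between the two distributions can be inspected.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def repeated_split_auc(X, y, n_repeats=100, test_size=0.5, seed=0):
    rng = np.random.RandomState(seed)
    auc_train, auc_valid = [], []
    for _ in range(n_repeats):
        # Random split into a training and a validation sample.
        X_tr, X_va, y_tr, y_va = train_test_split(
            X, y, test_size=test_size, stratify=y,
            random_state=rng.randint(1_000_000))
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        auc_train.append(roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1]))
        auc_valid.append(roc_auc_score(y_va, model.predict_proba(X_va)[:, 1]))
    # The difference between the two distributions reflects the model's "optimism".
    return np.mean(auc_train), np.mean(auc_valid)
```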
Prediction models: Validation – data splitting
STATISTICA, anno LXIII, n. 2, 2003
MODEL PERFORMANCE ANALYSIS AND MODEL VALIDATION IN LOGISTIC REGRESSION
R. Arboretti Giancristofaro, L. Salmaso
Prediction models: Validation – data splitting
i) Data-splitting. The original sample is randomly split into the fitting and validation samples.
ii) Model fitting. The model is fitted on the fitting sample using the SAS logistic procedure.
iii) Event probability estimation. We use the fitted model to estimate the probability of a positive outcome for each of the subjects in both the fitting and the validation samples.
Prediction models: Validation – data splitting
iv) Computation of performance measures on both samples. For both the fitting and the validation samples we compute the following statistics:
- C statistic (measure of discrimination);
- Hosmer and Lemeshow chi-squared test (measure of calibration);
- bias (measure of calibration).
v) Iterations and full model. In order for a model to be validated, the above described procedure is repeated 100 times. After that we also fit the model on the full original sample.
Prediction models: Validation – data splitting
vi) Results. Each time the procedure is repeated the sample is split into two random portions, the model is fitted on one of the two portions, and its performance is evaluated on both portions. Since each iteration is based on a different split of the original data, it results in different model coefficients, significance levels, and performance values.
vii) Presentation of the results. Results are presented using both tables and box-plots.
Prediction models: Validation – data splitting
viii) Interpretation of the results
• Variability of the estimates of the model's parameters over the 100 repetitions: if the variability is large, the model's coefficients depend heavily on the particular portion of the original data used to fit the model, which is a clear symptom of instability of the model and, what is worse, of overfitting (not enough data to compute a reliable estimate of the model's parameters).
Prediction models: Validation – data splitting
• Quality of fit: the distributions of the estimates should be centred around the same values as the estimates computed on the whole original sample. If this does not happen, the model cannot be validated because of its internal instability.
• Performance outside the fitting sample: compare the fitting and validation distributions of the measures of discrimination and calibration. The model is expected to perform better on the fitting sample, i.e. a reduction in the magnitude of the performance measures is to be expected when shifting from the fitting to the validation distribution.
• If the drop in the value of the measures is too large, the model does not validate outside the fitting sample.
Prediction models: Validation – cross-validation
• K-fold cross-validation
• The sample is divided into K mutually exclusive subsamples
• The model is fitted on the K-1 subsamples taken together
• The model is applied to the omitted subsample
• The operation is repeated, omitting each of the K subsamples in turn
• The distribution of the statistics obtained in the validation subsamples is examined (e.g. their mean), or a "shrinkage factor" (reduction factor) is derived to estimate the "optimism" of the model in the original sample
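A minimal sketch of K-fold cross-validation for a logistic model, again assuming NumPy arrays X and y: the model is fitted on K-1 folds, evaluated on the omitted fold, and the held-out AUCs are averaged.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def kfold_auc(X, y, k=10, seed=0):
    folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in folds.split(X, y):
        # Fit on K-1 folds, evaluate on the omitted fold.
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        p = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], p))
    return np.mean(aucs)   # cross-validated estimate of the AUC
```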
Prediction models: Validation – cross-validation
• Special case: leave-one-out = jackknife
• Each subsample contains a single subject: each subject is left out in turn; the model is fitted on the other n-1 subjects and validated on the remaining subject
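A minimal sketch of leave-one-out (jackknife) validation under the same assumptions: each subject is omitted in turn, the model is refitted on the n-1 others, and the prediction for the omitted subject is stored; performance is then computed on the pooled held-out predictions, since discrimination cannot be measured on a single subject.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut

def loo_auc(X, y):
    p_holdout = np.empty(len(y))
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Fit on n-1 subjects, predict the one left out.
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        p_holdout[test_idx] = model.predict_proba(X[test_idx])[:, 1]
    return roc_auc_score(y, p_holdout)   # AUC over the pooled held-out predictions
```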
Prediction models: Cross-validation
Document title: Construction et validation d'un modèle de prédiction de la date de floraison du colza d'hiver = Modelisation of the winter rape flowering date
Authors: HUSSON F.; LETERME P.
Abstract: The aim of this study is to present a model of the flowering date of winter rape. The model is then parameterised for several rape varieties: Darmor, Bienvenu, Eurol, Bristol, Aligator, Goéland, Vivol and Symbol. The model is then validated using cross-validation techniques. This type of validation is necessary when the model must be both parameterised and validated with few observations.
Journal: OCL. Oléagineux, corps gras, lipides, ISSN 1258-8210, 1997, vol. 4, no. 5, pp. 379-384 (23 ref.)
Prediction models: Validation – bootstrap
• A random sample of size n (or smaller) is drawn with replacement
• The model is fitted in this "bootstrap" sample and its characteristics are recorded
• The operation is repeated a large number of times (1000, 2000 times)
• The distribution of the characteristics recorded in the bootstrap samples is examined and/or a "shrinkage factor" (reduction factor) is derived from it
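A minimal sketch of the optimism-corrected bootstrap under the same assumptions (NumPy arrays X and y): the model is refitted on each bootstrap sample drawn with replacement, and the average drop in AUC between the bootstrap sample and the original data estimates the optimism to subtract from the apparent AUC.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def bootstrap_corrected_auc(X, y, n_boot=1000, seed=0):
    rng = np.random.RandomState(seed)
    full_model = LogisticRegression(max_iter=1000).fit(X, y)
    apparent = roc_auc_score(y, full_model.predict_proba(X)[:, 1])
    optimism = []
    for _ in range(n_boot):
        idx = rng.randint(0, len(y), len(y))   # sample of size n with replacement
        if len(np.unique(y[idx])) < 2:         # skip degenerate resamples
            continue
        boot_model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        auc_boot = roc_auc_score(y[idx], boot_model.predict_proba(X[idx])[:, 1])
        auc_orig = roc_auc_score(y, boot_model.predict_proba(X)[:, 1])
        optimism.append(auc_boot - auc_orig)
    return apparent - np.mean(optimism)        # optimism-corrected AUC
```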
Prediction models: Validation – bootstrap
A prediction rule for selective screening of Chlamydia trachomatis infection
H M Götz, J E A M van Bergen, I K Veldhuijzen, J Broer, C J P A Hoebe and J H Richardus
Sex. Transm. Inf. 2005;81:24-30
Prediction models: Validation – bootstrap
Statistical analysis. Univariate logistic regression analyses were performed, with self reported characteristics as independent variables and diagnosis of C trachomatis as the dependent variable. For the odds ratios, 95% confidence intervals (CI) were calculated. Variables showing an association of p<0.2 were included in the multivariable analysis. Backward stepwise selection was performed with a p value for the likelihood ratio test >0.10 as the criterion for elimination of variables from the model. Interactions between predictors and sex were assessed to study whether effects of predictors were different for men and women.
Prediction models: Validation – bootstrap
The goodness of fit (reliability) of the model was tested by the Hosmer-Lemeshow statistic. The model's ability to discriminate between participants with or without a chlamydial infection was quantified by using the area under the receiver operating characteristic curve (AUC). AUC values of 0.7–0.8 are considered acceptable, 0.8–0.9 excellent, and >0.9 outstanding [17]. Calibration was assessed graphically by plotting observed frequencies of chlamydial infection against predicted probabilities.
Prediction models: Validation – bootstrap
The performance of screening criteria in a study population from which the model is developed is known often to be too optimistic. The internal validity of the regression model was therefore assessed to estimate the performance of the model in new participants, similar to the population used to develop the model. We used bootstrapping techniques: random samples, with replacement, were taken one hundred times from the study population. At each step predictive models were developed, including variable selection.
Prediction models: Validation – bootstrap
Bootstrapping may help to reduce the bias in the estimated regression coefficients, and give an impression of the discriminative ability in similar screening participants. The outcome is a correction factor for the AUC, and a shrinkage factor to correct for statistical over-optimism in the regression coefficients and to improve calibration of the model in future participants.
Prediction models: External validation
External validity was assessed by leaving out the four MHS in the sample one by one, and fitting regression models, including variable selection, on the remaining data. The discriminative ability of this model was assessed externally on the MHS data not included in the fitting procedure. This procedure replicates the situation in which the prediction model is applied in another MHS region with a population that may to some extent be different.
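A minimal sketch of this leave-one-region-out check, assuming NumPy arrays X and y and a hypothetical vector `region` identifying the MHS of each participant: the model is refitted with one region held out and its discrimination is assessed on that region.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

def leave_one_region_out_auc(X, y, region):
    aucs = {}
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=region):
        # Fit on all regions but one, evaluate on the held-out region.
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        p = model.predict_proba(X[test_idx])[:, 1]
        aucs[region[test_idx][0]] = roc_auc_score(y[test_idx], p)
    return aucs   # one externally assessed AUC per held-out region
```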
Prediction models: Validation – bootstrap
Performance of the predictive model and development of the prediction score. Multivariable logistic regression analysis showed that chlamydial infection was associated with high urbanisation, young age, ethnicity (Surinamese/Antillian), low/intermediate education, multiple lifetime partners, a new contact in the previous two months, no condom use at last sexual contact, and complaints of (post)coital bleeding in women and frequent urination in men (table 1). The only statistically significant interaction term in the model was sex and the number of lifetime partners.
Prediction models: Validation – bootstrap; external
The Hosmer-Lemeshow goodness of fit test had a p value of 0.12, indicating adequate goodness of fit. The model discriminated well between participants who were and were not infected by C trachomatis, with an AUC of 0.81 (95% CI 0.77 to 0.84). Internal validation showed optimism in the AUC of 0.03, resulting in a correction of the AUC from 0.81 to 0.78. In the external validation similar sets of predictors were selected. When tested in each separate MHS, the AUC varied from 0.74 to 0.80.
Prediction models: External validation – an example
• Example: Prognostic Indices for Mortality of Hospitalized Children in Central Africa. Michele Dramaix, Daniel Brasseur, Philippe Donnen, Paluku Bawhere, Denis Porignon, Rene Tonglet and Philippe Hennart. American Journal of Epidemiology 1996; 143 (12)