INTERNATIONAL SYMPOSIUM OF UROLOGY FUT-UROLOGY 2008. ROBUST CLINICAL PREDICTION. Topics. Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY Case study: INVASIVE BLADDER CANCER
INTERNATIONAL SYMPOSIUM OF UROLOGY FUT-UROLOGY 2008 ROBUST CLINICAL PREDICTION
Topics • Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY • Case study: INVASIVE BLADDER CANCER • Application and results of several statistical methods to the case study • Robust clinical prediction using the NonParametric Combination of Dependent Permutation Tests (NPC Test) • Conclusions and practical suggestions
Necessary steps for ‘optimal’ statistical predictions
• Study protocol
• Study design
• Collecting data using a Web-based database
• Robust statistical analysis by suitable statistical methods (e.g. nonparametric permutation methods)
• Individual predictions based, e.g., on nomograms or other techniques
Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY
• An electronic database can improve the quality and completeness of the collected data, in particular by reducing the amount of missing data and the risk of imputation errors.
• Accurately defining the nature of the study (observational, randomized, …) and its endpoints leads to a better choice of the sample size and of the subsequent statistical analysis.
ELECTRONIC DATABASE: An example (figures: Web-based database; variables’ coding)
STATISTICAL ANALYSIS: standard methods and recent advances
• Univariate tests (Student t test, Wilcoxon)
• Survival analysis
• Multivariate methods (logistic regression, …)
• Complex classification methods (neural networks, artificial intelligence, …)
• NonParametric Combination of Dependent Permutation Tests (NPC Test)
Case study: INVASIVE BLADDER CANCER
Aim of the study: detecting the variables (factors) that best predict the outcome (DEAD or ALIVE) after a BLADDER CANCER DIAGNOSIS
• Italian multicentre observational study (from Jan 2001 to Dec 2006). Reference: prof. P.F. Bassi (Univ. Cattolica, Rome)
• Total sample size: 1,003 subjects. Lost patients and DOC (Dead of Other Causes) patients were excluded
• 469 subjects: DOD (Dead of Disease) and AWD (Alive with Disease, i.e. counted as deaths in the statistical analysis) patients
• 534 subjects: NED (No Evidence of Disease) patients
Case study: INVASIVE BLADDER CANCER
• Phase I (first symptom → diagnosis): patient state of health at the first medical visit
• Phase II (diagnosis): patient condition after bladder cancer diagnosis
• Phase III (diagnosis → surgery): patient state after surgery (histopathological variables were examined)
• The TNM classification of bladder cancer was used, according to Wittekind & Sobin (2002); the original variables were therefore transformed into ordinal variables. 30 endpoints were considered relevant for the statistical analysis.
• In particular, the interest is in evaluating how important the endpoints, collected in the three phases of the study, are in predicting the outcome.
Results of Kaplan-Meier (survival analysis) (artificial example)
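As a reminder of what a Kaplan-Meier curve computes, here is a minimal Python sketch of the product-limit estimator. The follow-up times and event indicators are invented for illustration (like the artificial example on the slide) and are not taken from the bladder cancer study; the function name `kaplan_meier` is our own.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time of each subject
    events : 1 if the death was observed at that time, 0 if censored
    Returns a list of (time, estimated survival) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Artificial data: months of follow-up and event indicator (hypothetical).
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored subjects do not lower the curve; they only shrink the risk set used at later death times.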
Results of Logistic Regression
• The logistic regression model was applied to the same dataset, but very poor results were obtained (only two significant predictors: TNM stage at Phase I and Phase II).
• The main problems in its application:
• the inability of logistic regression to handle missing values (missing data are present in 522 of the 1,003 subjects);
• the high number of coefficients to be estimated, so that the iterative algorithm does not converge (after 1000 iterations). Note that when the parameter estimates do not converge, results may be unreliable.
Results of Logistic Regression: number and % of missing values by variable
• Mean number of missing values per variable: 85.9 (9%)
• Subjects with at least one missing value: 522 (52%)
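To see why missing values hurt a standard logistic regression, consider the usual complete-case (listwise deletion) behaviour: any subject with at least one missing value is dropped before fitting. The Python sketch below illustrates this on a toy table. The variable names and records are hypothetical, and we assume complete-case analysis is what a default fit would do. Only the totals 1,003 and 522 come from the slides.

```python
def complete_cases(records):
    """Keep only the records in which no variable is missing (None)."""
    return [r for r in records if all(v is not None for v in r.values())]


# Toy records with hypothetical variable names; None marks a missing value.
toy = [
    {"stage": 2, "grade": 3, "outcome": 1},
    {"stage": None, "grade": 2, "outcome": 0},
    {"stage": 3, "grade": None, "outcome": 1},
    {"stage": 1, "grade": 1, "outcome": 0},
]
kept = complete_cases(toy)
print(len(kept), "of", len(toy), "toy records are complete")

# Same arithmetic with the figures reported on the slides.
total_subjects = 1003
with_missing = 522
usable = total_subjects - with_missing
print(usable, "subjects would remain in a complete-case analysis")
```

Under that assumption, roughly half of the sample would be discarded before any coefficient is estimated.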
Robust statistical prediction using NPC Test
PERMUTATION APPROACH FOR HYPOTHESIS TESTING
The multivariate permutation approach for hypothesis testing by NonParametric Combination (NPC) offers the following advantages:
• NPC Test implements methods and algorithms presented in several international papers by prof. L. Salmaso and prof. F. Pesarin. L. Salmaso leads an internationally recognised research group in theoretical and applied nonparametric statistics.
• NPC TEST is an innovative statistical method (and software) that provides researchers with powerful solutions for hypothesis testing.
Robust statistical prediction using NPC Test FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0 • NPC TEST allows us to perform hypothesis testing in the case of: • NPC TEST also provides: • Data (including mixed variables):
Robust statistical prediction using NPC Test
FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0
An innovation of NPC TEST with respect to existing methods is that it can perform any combination of tests: starting from an appropriate set of elementary tests, the NPC methodology leads to a multivariate or multistrata overall (global) test.
Elementary partial test statistics include:
Combining functions for intermediate tests include:
NPC TEST supports the standard functions of statistical software (data import, data manipulation) and produces a report that can be easily integrated and customized with a text editor.
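The idea of combining elementary (partial) tests into a global test can be sketched in a few lines of Python. This is only an illustration of the general NPC scheme, not the NPC TEST software or its interface. We assume a two-group comparison, the absolute difference of group means as partial statistic, and Fisher's combining function. The function names and the toy data are ours. The key point is that the same label permutations are applied to all variables, so the dependence between them is preserved.

```python
import bisect
import math
import random


def abs_mean_difference(values, labels):
    """Partial test statistic: |mean(group 1) - mean(group 0)|."""
    g1 = [v for v, g in zip(values, labels) if g == 1]
    g0 = [v for v, g in zip(values, labels) if g == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def permutation_p_values(column):
    """p-value of every entry: share of the column that is >= that entry."""
    ordered = sorted(column)
    n = len(ordered)
    return [(n - bisect.bisect_left(ordered, x)) / n for x in column]


def npc_fisher(variables, labels, n_perm=500, seed=0):
    """Return (partial p-values, global p-value) for a two-group comparison."""
    rng = random.Random(seed)
    shuffled = list(labels)
    # Row 0 holds the observed statistics, rows 1..n_perm the permuted ones.
    rows = [[abs_mean_difference(v, labels) for v in variables]]
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one permutation shared by all variables
        rows.append([abs_mean_difference(v, shuffled) for v in variables])
    p_by_variable = [
        permutation_p_values([row[k] for row in rows])
        for k in range(len(variables))
    ]
    # Fisher's combining function, evaluated on every row.
    combined = [
        -2.0 * sum(math.log(p[b]) for p in p_by_variable)
        for b in range(len(rows))
    ]
    global_p = sum(1 for c in combined if c >= combined[0]) / len(combined)
    partial_p = [p[0] for p in p_by_variable]
    return partial_p, global_p


# Toy data (hypothetical): 6 subjects per group, two endpoints.
labels = [1] * 6 + [0] * 6
stage = [3, 4, 3, 4, 4, 3, 1, 2, 1, 2, 2, 1]
marker = [5.1, 6.0, 5.5, 6.2, 5.8, 6.4, 3.0, 3.4, 2.9, 3.8, 3.1, 3.6]
partial_p, global_p = npc_fisher([stage, marker], labels)
print("partial p-values:", partial_p)
print("global p-value:", global_p)
```

A mean difference is a crude choice for ordinal endpoints; in practice each variable would get a partial statistic suited to its type, which is what makes the combination approach flexible.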
Robust statistical prediction using NPC Test
• After processing the variables and obtaining p-values with NPC methods, we also controlled the familywise error rate (FWE).
• The need for multiplicity control arises whenever a problem is structured into two or more experimental hypotheses (Finos and Salmaso, 2006).
• To make inference on all the hypotheses defining the multivariate problem, it is necessary to control the probability of erroneously rejecting at least one true univariate (elementary) hypothesis; this is called the multivariate type I error or familywise error rate (FWE) (Marcus et al., 1976).
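As a simple illustration of FWE control, the Python sketch below adjusts a set of p-values with the Bonferroni-Holm step-down procedure. This is a deliberately simpler stand-in: Holm's method is the shortcut of closed testing when every intersection hypothesis is tested with Bonferroni, whereas the analysis in these slides is based on NPC tests. The p-values and the function name `holm_adjust` are hypothetical.

```python
def holm_adjust(p_values):
    """Bonferroni-Holm adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by m - rank.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted p-values must not decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for four endpoints.
raw = [0.001, 0.04, 0.03, 0.20]
adjusted = holm_adjust(raw)
for p, q in zip(raw, adjusted):
    print(p, "->", round(q, 3))
```

An endpoint is declared significant at FWE level 0.05 when its adjusted p-value is at most 0.05; here only the first endpoint would survive the correction.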
Robust statistical prediction using NPC Test CLOSED TESTING GRAPHICAL REPRESENTATION
Conclusions and practical suggestions
• The NPC method can make a significant contribution to biomedical studies with several endpoints.
• The advantages of NPC Test stem from its flexibility in handling any type of variable.
• We recommend this methodology whenever the normality assumption is hard to justify, in the presence of missing values, and when the number of variables is higher than the number of subjects.
REFERENCES
• Bassi P.F., Pagano F. (2007). Invasive Bladder Cancer. Springer.
• Corain L., Salmaso L. (2007). A critical review and a comparative study on conditional permutation tests for two-way ANOVA. Communications in Statistics – Simulation and Computation, 36, 791-805.
• Finos L., Salmaso L. (2006). Weighted methods controlling the multiplicity when the number of variables is much higher than the number of observations. Journal of Nonparametric Statistics, 18, 245-261.
• Finos L., Salmaso L. (2006). FDR- and FWE-controlling methods using data-driven weights. Journal of Statistical Planning and Inference, 137, 3859-3870.
• Finos L., Salmaso L., Solari A. (2007). Conditional inference under simultaneous stochastic ordering constraints. Journal of Statistical Planning and Inference, 137, 2633-2641.
• Marcus R., Peritz E., Gabriel K.R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.
• Marozzi M., Salmaso L. (2006). Multivariate bi-aspect testing for two-sample location problem. Communications in Statistics – Theory and Methods, 35, 477-488.
• Salmaso L., Solari A. (2005). Multiple aspect testing for case-control designs. Metrika, 62, 331-340.
• Wittekind C., Sobin L.H. (2002). TNM Classification of Malignant Tumours. UICC, International Union Against Cancer (6th ed.). Wiley-Liss, New York.
• http://www.gest.unipd.it/~salmaso/NPC_TEST.htm
Results of Neural Networks
• We applied a neural network model (Multilayer Perceptron) to the same dataset.
• With k-fold cross-validation, we obtained a correct classification rate of 75.3% for DOD+AWD and of 60.5% for NED. Using only the subset of variables identified by univariate analysis gave a very similar performance (74.5% and 62.4%).
• The main problems of neural networks are:
• Neural networks work as black boxes, hence it is not possible to convert the network structure into a known model structure.
• All input fields ‘must’ be numeric (in this study the variables are not numerical but ordinal categorical).
• Neural networks can suffer from a problem called interference (i.e. the network forgets some of what it learned on older data).
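The two percentages above are per-class correct classification rates estimated on held-out folds. The Python sketch below shows how such rates can be computed with k-fold cross-validation. A trivial threshold classifier on a single ordinal variable stands in for the Multilayer Perceptron, and the data, function names and choice of k = 5 are all hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Split the indices 0..n-1 into k disjoint folds after shuffling."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Stand-in model: cut-off halfway between the two class means."""
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    return (mean0 + mean1) / 2, mean1 > mean0


def cross_validated_predictions(xs, ys, k=5, seed=0):
    """Predict each subject with a model fitted on the other k-1 folds."""
    predictions = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        cutoff, class1_is_high = fit_threshold(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        for i in fold:
            predictions[i] = int((xs[i] > cutoff) == class1_is_high)
    return predictions


def per_class_rates(y_true, y_pred):
    """Share of correctly classified subjects within each true class."""
    rates = {}
    for c in sorted(set(y_true)):
        members = [i for i, y in enumerate(y_true) if y == c]
        rates[c] = sum(1 for i in members if y_pred[i] == c) / len(members)
    return rates


# Hypothetical data: one ordinal predictor, outcome 1 = DOD+AWD, 0 = NED.
xs = [1, 1, 2, 2, 1, 2, 1, 2, 2, 1, 3, 4, 3, 4, 4, 3, 4, 3, 3, 4]
ys = [0] * 10 + [1] * 10
predictions = cross_validated_predictions(xs, ys)
rates = per_class_rates(ys, predictions)
print(rates)
```

Reporting the rate separately for each class, as the slide does, avoids hiding a weak class behind a good overall accuracy.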