Prediction of Malignancy of Ovarian Tumors Using Least Squares Support Vector Machines
C. Lu¹, T. Van Gestel¹, J. A. K. Suykens¹, S. Van Huffel¹, I. Vergote², D. Timmerman²
¹ Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
² Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
Email: chuan.lu@esat.kuleuven.ac.be

1. Introduction
Ovarian masses are a common problem in gynecology. A reliable test for preoperative discrimination between benign and malignant ovarian tumors is of considerable help to clinicians in choosing the appropriate treatment for each patient. In this work we develop and evaluate several LS-SVM models within the Bayesian evidence framework to preoperatively predict the malignancy of ovarian tumors. The analysis includes exploratory data analysis, optimal input variable selection, parameter estimation, and performance evaluation via Receiver Operating Characteristic (ROC) curve analysis.
Goal:
• High sensitivity for malignancy at a low false positive rate.
• Provide the probability of malignancy for individual patients.

Data
Patient data collected at University Hospitals Leuven, 1994-1999: 425 records with 25 features (32% malignant), comprising demographic, serum marker, color Doppler imaging and morphologic variables.
[Figure: Biplot of the ovarian tumor data, visualizing the correlations between the variables and the relations between the variables and the clusters; o: benign case, x: malignant case.]

2. Methods
Model development proceeds in three stages: data exploration, input selection and model building, and model evaluation.
[Figure: Procedure of developing models to predict the malignancy of ovarian tumors, with separate training and test stages.]

Data exploration
• Descriptive statistics, histograms and univariate analysis.
• Multivariate analysis: PCA and factor analysis.
• Stepwise logistic regression and preprocessing of the inputs.

LS-SVM classifier within the Bayesian evidence framework
• A positive definite kernel K(·,·) satisfying Mercer's theorem is used, either
  RBF: K(x, x_i) = exp(-||x - x_i||² / σ²), or
  Linear: K(x, x_i) = x_iᵀ x.
• Given the training data D_train, the kernel type (RBF/linear) and an initial set of kernel widths {σ_j} for RBF kernels, the classifier is solved in the dual space (a minimal sketch is given after this section).
• Bayesian inference proceeds at three levels: Level 1 infers the model parameters w and b, Level 2 infers the hyperparameters, and Level 3 compares models through the model evidence.
• Viewed as a black box, the Bayesian LS-SVM classifier maps an input x to the latent output y(x) and the posterior probability p(y = +1 | x, D, H).

Input variable selection
• Given a kernel type, forward selection is performed (sketched after this section).
• Initial: 0 variables.
• Add: at each iteration, the variable that gives the greatest increase in the current model evidence.
• Stop: when adding any remaining variable no longer increases the model evidence.
• With an RBF kernel, 10 variables were selected: l_ca125, pap, sol, colsc3, bilat, meno, asc, shadows, colsc4, irreg.

Posterior class probability
• The conditional class probabilities are computed using Gaussian distributions (sketched after this section).
• The posterior probability that a tumor is malignant, p(y = +1 | x, D, H), is used for the final classification by thresholding.

Model evaluation
• Models: Bayesian LS-SVM classifiers (RBF and linear kernels), compared with logistic regression (LR) and the risk of malignancy index (RMI).
• Criteria: ROC analysis (AUC) and model evidence.
• Validation: hold-out and K-fold cross-validation.
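The following is a minimal sketch, not the authors' code, of how an LS-SVM classifier with an RBF kernel is trained by solving the dual linear system. The function names and the use of NumPy are illustrative assumptions; the regularization constant gamma and kernel width sigma are treated as fixed inputs here, whereas in the poster they are inferred at levels 2 and 3 of the Bayesian evidence framework.

import numpy as np

def rbf_kernel(X1, X2, sigma):
    # K(x, z) = exp(-||x - z||^2 / sigma^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_train(X, y, gamma, sigma):
    # Solve the LS-SVM dual system (Suykens et al., 2002):
    #   [ 0      y^T            ] [ b     ]   [ 0 ]
    #   [ y   Omega + I / gamma ] [ alpha ] = [ 1 ]
    # with Omega_ij = y_i * y_j * K(x_i, x_j) and y_i in {-1, +1}.
    n = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_latent(X_new, X, y, alpha, b, sigma):
    # Latent output y(x) = sum_i alpha_i * y_i * K(x, x_i) + b
    return rbf_kernel(X_new, X, sigma) @ (alpha * y) + b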
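The forward input selection described above reduces to a short greedy loop. Here evidence_fn is a hypothetical placeholder for the level-3 model evidence of a Bayesian LS-SVM trained on a given subset of variables; that computation is not reproduced in this sketch.

def forward_select(candidates, evidence_fn):
    # Greedy forward selection that maximizes the model evidence.
    # `evidence_fn(subset)` is a placeholder: it should return the (log) model
    # evidence of a Bayesian LS-SVM built on the listed input variables.
    selected, best = [], float("-inf")
    while True:
        remaining = [v for v in candidates if v not in selected]
        if not remaining:
            break
        scores = {v: evidence_fn(selected + [v]) for v in remaining}
        v_star = max(scores, key=scores.get)
        if scores[v_star] <= best:      # no remaining variable improves the evidence
            break
        selected.append(v_star)
        best = scores[v_star]
    return selected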
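A simplified sketch of the posterior class probability: the class-conditional densities of the latent LS-SVM output are modeled as one-dimensional Gaussians estimated from the training outputs, and Bayes' rule combines them with the prior class probabilities. This omits the predictive-variance correction of the full Bayesian treatment, and the default prior of 0.32 is only an assumption based on the 32% prevalence reported above.

from scipy.stats import norm

def posterior_malignant(z, z_train, y_train, prior_malignant=0.32):
    # z: latent output y(x) for a new case; z_train, y_train: NumPy arrays of
    # training latent outputs and labels (+1 malignant, -1 benign).
    z_pos = z_train[y_train == 1]
    z_neg = z_train[y_train == -1]
    lik_pos = norm.pdf(z, z_pos.mean(), z_pos.std())
    lik_neg = norm.pdf(z, z_neg.mean(), z_neg.std())
    num = prior_malignant * lik_pos
    den = num + (1 - prior_malignant) * lik_neg
    return num / den    # classify as malignant when this exceeds the chosen threshold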
3. Experimental Results
1) Temporal validation
• Training set: data from the first 265 treated patients.
• Test set: data from the 160 most recently treated patients.
2) Randomized cross-validation (30 runs)
• The data are randomly split into a training set (n = 265) and a test set (n = 160).
• The split is stratified so that the benign-to-malignant ratio (about 2:1, consistent with the 32% malignancy rate) is preserved in each training and test set.
• The random split is repeated 30 times.
Reference test, RMI: risk of malignancy index = score_morph × score_meno × CA125 (a small computational sketch is given after the references).
[Figure: ROC curves on the test set and expected ROC curves on validation, comparing LSSVMrbf, LSSVMlin, LR and RMI.]
[Tables: performance on the test set, and averaged performance over the 30 validation runs.]

4. Conclusions
• Within the Bayesian evidence framework, hyperparameter tuning, input variable selection and the computation of posterior class probabilities can be done in a unified way, without the need for a separate validation set.
• A forward input selection procedure that maximizes the model evidence can be used to identify the subset of variables that are important for model building.
• LS-SVMs have the potential to give reliable preoperative predictions of the malignancy of ovarian tumors.
Future work
• LS-SVMs are black-box models; a hybrid methodology, e.g. combining Bayesian networks with LS-SVM learning, might be promising.
• A larger-scale validation is needed.

References
• C. Lu, T. Van Gestel, J. A. K. Suykens, S. Van Huffel, I. Vergote, D. Timmerman. Prediction of malignancy of ovarian tumors using least squares support vector machines. Artificial Intelligence in Medicine, vol. 28, no. 3, Jul. 2003, pp. 281-306.
• J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle. Least Squares Support Vector Machines. World Scientific, Singapore, 2002.
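For completeness, a minimal sketch of the RMI baseline and of the stratified random splits used in the randomized cross-validation (see Section 3). The variable names score_morph, score_meno and ca125, and the use of scikit-learn's train_test_split, are illustrative assumptions rather than the original implementation.

from sklearn.model_selection import train_test_split

def rmi(score_morph, score_meno, ca125):
    # Risk of malignancy index = morphology score x menopausal score x serum CA 125.
    return score_morph * score_meno * ca125

def random_splits(X, y, n_runs=30, test_size=160, seed=0):
    # Stratified random train/test splits (n_train = 265, n_test = 160), repeated
    # 30 times, preserving the benign-to-malignant ratio in each subset.
    for run in range(n_runs):
        yield train_test_split(X, y, test_size=test_size, stratify=y,
                               random_state=seed + run)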