Efficient Feature Selection Algorithm for CT-Scan Brain Image Diagnosis

A New Feature Selection Algorithm for an Efficient Computer-Aided Diagnosis System of CT-Scan Brain Images
Mrs. K. Dhanalakshmi, Associate Prof./CSE, PSNA CET, Dindigul.

Objectives
• To develop an efficient Computer-Aided Diagnosis (CAD) system for CT-scan brain images
• To develop a new feature selection technique called the Relevant Feature Selection and Discretization algorithm (ReFe_SeDi)
• To evaluate it using three different classification techniques, among them Fuzzy Support Vector Machine and Combined Artificial Neural Network

Introduction
• Medical image diagnosis as a manual process: depends on the knowledge of the physician or radiologist; can be inaccurate and slow
• Computer-Aided Diagnosis (CAD) system: an automatic system that is accurate, faster, and reliable

Techniques
• Image preprocessing and segmentation: Image preprocessing enhances the image. Image segmentation divides the image into non-overlapping regions and separates the objects from the background. The regions of interest (ROIs) are then allocated for feature extraction.
• Feature extraction and selection: This step finds a feature set of the medical image that can accurately distinguish lesion from non-lesion or benign from malignant. The feature space can be very large and complex, so extracting and selecting the most effective features is very important.
• Decision support system: Based on the selected features, the suspicious regions are classified as normal, benign, or malignant by various classification methods.

Proposed Work

Proposed System – Preprocessing and Segmentation
• Image enhancement
  • Median filtering
  • Histogram equalization
• Segmentation
  • Active contour model based on the Mumford–Shah segmentation technique (Chan and Vese 2001)
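The two enhancement steps named on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the function names, the 3×3 window, the border handling (the window is simply truncated at the image edge), and the tiny sample images are all assumptions made for the example.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Replace each pixel by the median of its neighbourhood.

    The window is truncated at the image border (an assumption); median_low
    keeps the result an existing grey level when the window size is even.
    """
    rows, cols = len(image), len(image[0])
    filtered = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            filtered[r][c] = median_low(window)
    return filtered


def equalize_histogram(image, levels=256):
    """Spread the grey levels over the full range using the cumulative histogram."""
    pixels = [p for row in image for p in row]
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)
    cdf_min = min(v for v in cdf if v > 0)
    span = len(pixels) - cdf_min
    if span == 0:  # constant image: nothing to stretch
        return [row[:] for row in image]
    lookup = [max(0, round((v - cdf_min) / span * (levels - 1))) for v in cdf]
    return [[lookup[p] for p in row] for row in image]


# Hypothetical sample data: one bright noise pixel, and a low-contrast patch.
noisy = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
denoised = median_filter(noisy)
low_contrast = [[50, 50], [100, 150]]
equalized = equalize_histogram(low_contrast)
print(denoised)
print(equalized)
```

The median filter removes the isolated bright pixel without blurring the rest of the patch, and equalization maps the darkest and brightest grey levels of the second patch to 0 and 255.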

Feature Extraction and Selection
• Feature analysis – feature extraction
• Two different textural features are extracted
  • Haralick-defined features
  • Gabor wavelet features
  • Histogram-based features
• Haralick-defined features
  • Grey level co-occurrence matrices M(d, θ) are calculated for the directions 0°, 45°, 90°, and 135° and for the distances 1, 2, 3, 4, and 5
  • Twenty GLCMs per image are produced
  • For each matrix, 14 Haralick texture features are calculated
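The GLCM bookkeeping above (4 directions × 5 distances = 20 matrices, 14 features each = 280 features) can be illustrated with a small sketch. It assumes a non-symmetric co-occurrence count, an angle convention in which 45° means "up and to the right", and a made-up 4×4 image with four grey levels. Only one of the 14 Haralick features (contrast) is shown.

```python
from itertools import product

# One-pixel (row, column) step for each direction; the angle convention is an assumption.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, distance, angle, levels):
    """Count how often grey level i has grey level j at the given offset."""
    dr, dc = OFFSETS[angle]
    dr, dc = dr * distance, dc * distance
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Haralick contrast: sum of (i - j)^2 * p(i, j) over the normalised matrix."""
    total = sum(map(sum, matrix))
    if total == 0:  # offset larger than the image: no pixel pairs
        return 0.0
    return sum((i - j) ** 2 * count / total
               for i, row in enumerate(matrix)
               for j, count in enumerate(row))


# Hypothetical 4x4 image with grey levels 0..3.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]

settings = list(product([1, 2, 3, 4, 5], [0, 45, 90, 135]))
matrices = {(d, a): glcm(image, d, a, levels=4) for d, a in settings}
print(len(matrices))        # number of GLCMs per image
print(len(matrices) * 14)   # feature count if all 14 Haralick features are computed
print(contrast(matrices[(1, 0)]))
```

On a real CT slice the grey levels would normally be quantised first so that the matrices stay small; that step is left out here.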

Feature Extraction and Selection – Proposed New Algorithm ReFe_SeDi (Relevant Feature Selection and Discretization)
• Supervised, filter-based feature selection algorithm
• Performs discretization of the continuous feature values and selects the most relevant features
• Considers one feature at a time
• Sorts the continuous values of each individual feature
• Attempts to find interval split points such that each interval has a strong majority of one particular class
Definition 1: Class is the important keyword of a medical image diagnosis given by the radiologist (normal, benign, or malignant).
Definition 2: Split points are the limits of an interval of values.
Definition 3: The most frequent class is the majority class in a particular interval.
Definition 4: An instance Ii belongs to the interval Qk if its value vi lies between two successive split points ts and ts+1.

ReFe_SeDi Algorithm
Two input parameters:
• min_int: the minimum number of occurrences of the most frequent class allowed in an interval
• min_occ_use: the minimum occupancy of the most frequent class in an interval
Condition 1: Generate a split point ts if the class label of the present instance Ii, i ≥ 1, is different from the class label of the previous instance.
Condition 2: The number of occurrences of the most frequent class in the interval Qk must be equal to or greater than the min_int value.
Condition 3: The split point ts+1 between two successive intervals Qk = (ts, ts+1) and Qk+1 = (ts+1, ts+2) is removed if the majority classes of the two intervals are the same and (number of occurrences of the majority class in Qk) / (number of occurrences of the majority class in Qk+1) ≥ min_occ_use.

ReFe_SeDi Algorithm
Input: feature vectors V; image class labels C (three classes c1, c2, and c3 represent normal, benign, and malignant respectively); parameters min_int, min_occ_use, dim_reduce
Output: feature vector FV for the selected features
Algorithm:
for each feature v ∈ V do
    Sort the values of v
    for each transaction i do
        Create an instance Ii of the form (vi, ci), where ci ∈ C
        Use Condition 1 to create a vector T of split points ts
    end for
    for each ts ∈ T do
        Remove ts according to Condition 2
        Remove ts and merge the two consecutive intervals according to Condition 3
    end for
    Save the remaining split points in a vector Tf
end for
Arrange the features in ascending order of the number of split points
Select the (dim_reduce * |F|) * 3/4 features with the fewest split points as the relevant features
Write the selected features, discretized, to FV
Return FV
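The following is a rough sketch of how Conditions 1 to 3 and the final ranking might fit together. It is one reading of the slides, not the author's code. Several details are assumptions: an interval that fails Condition 2 is merged with the interval to its right (or to its left if it is the last one), intervals are tracked as lists of class labels rather than numeric thresholds, the function names are hypothetical, and a plain `keep` count stands in for the (dim_reduce * |F|) * 3/4 rule.

```python
from collections import Counter


def majority(labels):
    """Return (class, count) for the most frequent class in an interval."""
    return Counter(labels).most_common(1)[0]


def split_feature(values, classes, min_int, min_occ_use):
    """Return the intervals (lists of class labels) left after Conditions 1-3."""
    ordered = [c for _, c in sorted(zip(values, classes), key=lambda pair: pair[0])]
    if not ordered:
        return []

    # Condition 1: start a new interval whenever the class label changes.
    intervals = [[ordered[0]]]
    for label in ordered[1:]:
        if label != intervals[-1][-1]:
            intervals.append([label])
        else:
            intervals[-1].append(label)

    # Condition 2: an interval whose majority class occurs fewer than min_int
    # times absorbs the next interval (assumed merge direction).
    merged = []
    for interval in intervals:
        if merged and majority(merged[-1])[1] < min_int:
            merged[-1].extend(interval)
        else:
            merged.append(list(interval))
    if len(merged) > 1 and majority(merged[-1])[1] < min_int:
        tail = merged.pop()
        merged[-1].extend(tail)

    # Condition 3: merge neighbours with the same majority class when the
    # ratio of their majority counts reaches min_occ_use.
    result = [merged[0]]
    for interval in merged[1:]:
        class_a, count_a = majority(result[-1])
        class_b, count_b = majority(interval)
        if class_a == class_b and count_a / count_b >= min_occ_use:
            result[-1].extend(interval)
        else:
            result.append(interval)
    return result


def select_features(features, classes, min_int, min_occ_use, keep):
    """Rank features by number of split points (fewest first) and keep `keep`."""
    counts = {name: len(split_feature(vals, classes, min_int, min_occ_use)) - 1
              for name, vals in features.items()}
    return sorted(counts, key=counts.get)[:keep], counts


# Hypothetical data: "n" = normal, "b" = benign, "m" = malignant.
classes = list("nnnbbbmmm")
features = {"clean": [1, 2, 3, 4, 5, 6, 7, 8, 9],
            "noisy": [1, 9, 2, 8, 3, 7, 4, 6, 5]}
selected, counts = select_features(features, classes, min_int=1, min_occ_use=1.0, keep=1)
print(selected, counts)
```

A feature whose sorted values separate the classes cleanly ends up with few split points and is ranked first, which is the selection criterion the pseudocode describes. The parameter values used here are arbitrary.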

Illustration of ReFe_SeDi Algorithm with example

Results and Discussion – Preprocessing and segmentation

Results and Discussion – Results of split point calculation for the Contrast feature with distance 1 and orientation 45°

Results and Discussion Top 50 features with their split points

PR values for different K values using the K-NN classifier for the ReFe_SeDi algorithm and the Relief algorithm

Results and Discussion
• The performance of the proposed algorithm is analyzed using the K-NN search query method on the following three sets of features:
  • Top 32 features selected using the ReFe_SeDi algorithm
  • Top 32 features selected by Relief
  • All 280 extracted features

Conclusion
• Various important texture features of CT-scan brain images have been extracted using gray level co-occurrence matrices
• A new filter-based supervised algorithm for feature selection, called ReFe_SeDi, is proposed and explained
• The relevance of the selected features is compared with that of a feature set selected by the traditional Relief algorithm, using the K-NN search query method
• The experimental results show that the features selected by the proposed method produce a higher classification rate
• The proposed method executes faster than Relief

Decision Support System using Fuzzy Support Vector Machine
• Preprocessing
• Feature extraction and selection
  • Gray level co-occurrence matrices (GLCM) are computed for the directions 0°, 45°, 90°, and 135° and the distances 1, 2, 3, 4, and 5 for the following six features: variance, homogeneity, correlation, energy, entropy, and inverse difference moment
  • In total, 120 features are extracted
  • Applying the ReFe_SeDi algorithm selects the top 18 features

Background of Support Vector Machine
Mapping function φ: Rn → H, from the input space Rn to the feature space H

Background of Support Vector Machine - Non-Linearly Separable Data
Let z = φ(x) denote the mapping from Rn to the feature space Z. The goal is to find the hyperplane with the maximum margin, f(x) = w · z + b = 0, such that for each point (zi, yi), where zi = φ(xi),
    yi (w · zi + b) ≥ 1, i = 1, …, L
For data samples that are not linearly separable:
• Introduce L nonnegative slack variables ξi ≥ 0
• The above constraint can then be rewritten as
    yi (w · zi + b) ≥ 1 − ξi, i = 1, …, L
The optimal hyperplane problem is to
    minimize (1/2) w · w + C Σi ξi
subject to
    yi (w · zi + b) ≥ 1 − ξi, ξi ≥ 0, i = 1, …, L

Background of Support Vector Machine - Non-Linearly Separable Data
The optimal hyperplane is calculated by combining Lagrange multipliers αi and a kernel function K as:
    f(x) = sign( Σi αi yi K(xi, x) + b )
Commonly used kernel functions:
• Linear: K(xi, xj) = xiT · xj
• Polynomial: K(xi, xj) = (xiT · xj + 1)^d
• Gaussian radial basis function (RBF): K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))
• Two-layer neural perceptron: K(xi, xj) = tanh(xiT · xj + b)
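The four kernels listed above translate directly into code. The sketch below uses plain Python lists as vectors. The RBF form used here, exp(−‖xi − xj‖² / (2σ²)), is the common parameterization in σ and should be read as an assumption. The function names are illustrative only.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, d):
    return (dot(x, y) + 1) ** d


def rbf_kernel(x, y, sigma):
    # Assumed form: exp(-||x - y||^2 / (2 * sigma^2)).
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def perceptron_kernel(x, y, b):
    return math.tanh(dot(x, y) + b)


xi, xj = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(xi, xj))
print(polynomial_kernel(xi, xj, d=2))
print(rbf_kernel(xi, xj, sigma=0.6))
print(perceptron_kernel(xi, xj, b=-11.0))
```

With the RBF kernel, σ controls how quickly similarity decays with distance: identical points always score 1, and smaller σ values push the score for distant points toward 0.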

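Fuzzy SVM
The optimal hyperplane problem is then to
    minimize (1/2) w · w + C Σi mi ξi
subject to
    yi (w · zi + b) ≥ 1 − ξi, ξi ≥ 0, i = 1, …, L
where mi is the fuzzy membership of sample i. The optimal hyperplane is calculated by combining Lagrange multipliers αi and a kernel function K as:
    f(x) = sign( Σi αi yi K(xi, x) + b )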

Logistic Regression Model
• Describes the relationship between explanatory variables x1, …, xk and a dichotomous dependent variable y
• A membership mi is assigned to each feature vector
• The logistic regression model is defined as
    ln( p / (1 − p) ) = β0 + β1 x1 + … + βk xk, where p = P(y = 1)
• From it we may compute the probability of y:
    P(y = 1) = 1 / (1 + e^−(β0 + β1 x1 + … + βk xk))
• The fuzzy membership mi can then be defined as:
    IF (yi == 1) THEN mi = P(yi = 1)
    ELSE IF (yi == −1) THEN mi = 1 − P(yi = 1)
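The membership rule above can be sketched as follows. The standard logistic function is assumed for P(y = 1). The intercept, weights, and feature vector are made-up values for illustration, not the fitted coefficients from the experiments, and the function names are hypothetical.

```python
import math


def probability_positive(x, intercept, weights):
    """P(y = 1 | x) under a logistic model, computed in a numerically stable way."""
    z = intercept + sum(w * v for w, v in zip(weights, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuzzy_membership(y, p_positive):
    """Membership m_i: P(y = 1) for the +1 class, 1 - P(y = 1) for the -1 class."""
    if y == 1:
        return p_positive
    if y == -1:
        return 1.0 - p_positive
    raise ValueError("class label must be +1 or -1")


# Hypothetical two-feature model and sample.
p = probability_positive([0.5, -0.5], intercept=0.0, weights=[2.0, 2.0])
print(p)
print(fuzzy_membership(1, 0.9), fuzzy_membership(-1, 0.9))
```

A sample whose label agrees with the model's prediction receives a membership close to 1 and so carries more weight in the FSVM objective. A sample that the model considers unlikely for its label is down-weighted.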

Results and Discussion Results of logistic regression Model

Results and Discussion – Fuzzy Membership

Results and Discussion – Fuzzy Membership
Based on the experimental results, the logistic regression model is described as:
1327.6 − 287.3 * CO13 − 342.8 * CO12 + 129.4 * HO24 + 193.2 * HO53 + 181.7 * HO31 − 498.9 * CO23 + 34.528 * IDM13 − 145.6 * CO51 − 0.0856 * VAR24 + 42.351 * ENT34 + 20.183 * ENT51 + 0.00953 * VAR32 + 56.634 * EN24 − 0.0345 * VAR53 − 216.3901 * IDM21 − 12.0934 * IDM22 + 2.9251 * EN32 − 310.3617 * IDM54

Results and Discussion – Training Phase
Classification rate of the FSVM for the different support vectors generated using parameters C = 50 to 1000 and σ = 0.2 to 1.0

Results and Discussion – Training the FSVM
• The trade-off parameter C is varied from 50 to 1000
• The RBF kernel parameter σ is varied from 0.1 to 1
• The FSVM model with C = 800 and σ = 0.6 is selected
• Number of support vectors (SV) = 28

Results and Discussion – Performance analysis

Results and Discussion – Performance analysis using ROC curve

References
• CBTRUS, Central Brain Tumor Registry of the United States, Statistical Report Supplement 2013: Primary Brain and Central Nervous System Tumors Diagnosed in the United States 2006-2010, Neuro-Oncology, Journal of the Society for Neuro-Oncology, vol. 15, supplement 2, 2013.
• Ashby L.S., Troester M.M., Shapiro W.R., “Central nervous system tumors”, Update on Cancer Therapeutics, vol. 1, pp. 475–513, 2006.
• Pan H., Jianzhong L., and Zhang W., “Incorporating domain knowledge into medical image clustering”, Journal of Applied Mathematics and Computation, vol. 185, pp. 844–856, 2007.
• Hui L., Hanhu W., Mei C., Ten W., “Clustering ensemble technique applied in the discovery and diagnosis of brain lesions”, in Proc. Sixth International Conference on Intelligent Systems Design and Applications (ISDA), vol. 2, pp. 512–520, 2006.
• Joaquim C.F., Marcela X.R., Elaine P.M.S., Agma J.M.T., Caetano T.J., “Effective shape-based retrieval and classification of mammograms”, in Proc. ACM Symposium on Applied Computing, pp. 250–255, 2006.
• K. Bommanna Raja, M. Madheswaran, K. Thyagarajah, “Texture pattern analysis of kidney tissues for disorder identification and classification using dominant Gabor wavelet”, Machine Vision and Applications, vol. 21, pp. 287–300, 2010.
• R. J. Ferrari, R. M. Rangayyan, J. E. L. Desautels, and A. F. Frère, “Analysis of asymmetry in mammograms via directional filtering with Gabor wavelets”, IEEE Transactions on Medical Imaging, vol. 20, no. 9, 2001.
• Forman G., “An extensive empirical study of feature selection metrics for text classification”, Journal of Machine Learning Research, vol. 3, pp. 1289–1305, 2003.

References
• Sotiris Kotsiantis, Dimitris Kanellopoulos, “Association rules mining: a recent overview”, GESTS International Transactions on Computer Science and Engineering, vol. 32(1), pp. 71–82, 2006.
• D. West and West, “Model selection for a medical diagnostic decision support system: a breast cancer detection case”, Artificial Intelligence in Medicine, vol. 20(3), pp. 183–204, 2000.
• I.A. Basheer and M. Hajmeer, “Artificial neural networks: fundamentals, computing, design and application”, Journal of Microbiological Methods, vol. 43(1), pp. 3–31, 2000.
• B. Verma and J. Zakos, “A computer-aided diagnosis system for digital mammograms based on fuzzy-neural and feature extraction techniques”, IEEE Trans. Inf. Technol. Biomed., vol. 5(1), pp. 46–54, 2001.
• Xiangjun Shi, H.D. Cheng, Liming Hua, Wen Jua, Jiawei Tian, “Detection and classification of masses in breast ultrasound images”, Digital Signal Processing, vol. 20, issue 3, pp. 824–836, 2010.
• J. Jiang, P. Trundle, J. Ren, “Medical image analysis with artificial neural networks”, Computerized Medical Imaging and Graphics, vol. 34, pp. 617–631, 2010.
• T. Chan and L. Vese, “Active contours without edges”, IEEE Transactions on Image Processing, vol. 10, pp. 266–277, 2001.
• Ashizawa K., MacMahon H., Ishida T., Nakamura K., Vyborny C.J., Katsuragawa S., Doi K., “Effect of an artificial neural network on radiologists' performance in the differential diagnosis of interstitial lung disease using chest radiographs”, Am. J. Roentgenol., vol. 172, pp. 1311–1315, 1999.

References
• Chen D.R., Chang R.F., Huang Y.L., “Computer-aided diagnosis applied to US of solid breast nodules by using neural networks”, Radiology, vol. 213, pp. 407–412, 1999.
• Ubeyli E.D. & Guler, “Neural network analysis of internal carotid arterial Doppler signals: predictions of stenosis and occlusion”, Expert Systems with Applications, vol. 25(1), pp. 1–13, 2003.
• Elif Derya Ubeyli, “Combined neural network model employing wavelet coefficients for EEG signals classification”, Digital Signal Processing, Elsevier, vol. 19, pp. 297–308, 2009.
• C. Ordonez, C. A. Santana, and L. de Braal, “Discovering interesting association rules in medical data”, Proc. of ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge, pp. 78–85, 2000.
• Carlos Ordonez, “Association rule discovery with the train and test approach for heart disease prediction”, IEEE Transactions on Information Technology in Biomedicine, vol. 10, no. 2, 2006.
• Gourab Kundu, Sirajum Munir, Faizul Bari, Monirul Islam, and K. Murase, “A novel algorithm for associative classification”, Neural Information Processing, pp. 453–459, 2008.
• M. Madheswaran and P. Rajendran, “An improved brain image classification technique with mining and shape prior segmentation procedure”, J Med Syst 36 (2010), 747–764.
• V. Hansen, R.D. Nelson, “Data mining of time series using stacked generalizers”, Neurocomputing, vol. 10, pp. 271–289, 2002.
• Haralick R.M., Shanmugam K., Dinstein I., “Textural features for image classification”, IEEE Trans. Syst. Man Cybern., vol. 3, pp. 610–621, 1973.

Thank You
