
Breast Cancer Diagnosis via Linear Hyper-plane Classifier


Presentation Transcript


  1. Breast Cancer Diagnosis via Linear Hyper-plane Classifier. Presented by Joseph Maalouf, December 14, 2001.

  2. Problem Description: Breast cancer is second only to lung cancer as a tumor-related cause of death in women. Although there is reasonable agreement on the criteria for benign/malignant diagnoses (benign means that the lump or other problem is not cancer; malignant means that the tissue does contain cancer cells) using fine needle aspirate (FNA) and mammogram data, the application of these criteria is often quite subjective and time-consuming for the physician. The idea of the project is to construct a discriminant function (a separating plane in this case) to determine whether an unknown sample is benign or malignant. The project will use the Wisconsin Diagnosis Breast Cancer (WDBC) database, made publicly available by Dr. William H. Wolberg of the Department of Surgery of the University of Wisconsin Medical School. There are 569 samples (357 benign, 212 malignant) with 32 attributes (patient ID, diagnosis type, and 30 real-valued input features). The most effective pair of attributes for determining a correct diagnosis will be identified and used to plot all the testing-set points on a two-dimensional figure.
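  The WDBC data described on this slide also ships with scikit-learn as load_breast_cancer (the same 569 samples with 30 real-valued features). As an illustrative Python sketch, not part of the original MATLAB project, the data can be loaded and split into training and testing sets like this:

  from sklearn.datasets import load_breast_cancer
  from sklearn.model_selection import train_test_split

  # Load the 569-sample, 30-feature WDBC data (target 0 = malignant, 1 = benign).
  data = load_breast_cancer()
  X, y = data.data, data.target

  # Hold out a testing set so the diagnosis success rate can be estimated on
  # samples the classifier never saw; the 30% split size is an assumption,
  # since the slides do not state how the testing set was chosen.
  X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.3, random_state=0, stratify=y)

  print(X_train.shape, X_test.shape)  # (398, 30) (171, 30)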

  3. Solution Methods: Formally, given two sets B and M in the 30-dimensional real space R^30, we wish to construct a discriminant function f from R^30 into R such that: f(x) > 0 => x ∈ M, and f(x) ≤ 0 => x ∈ B. Two approaches will be used to find the optimal hyper-plane: Linear Optimization and Quadratic Optimization. Linear Optimization Problem Formulation: The discriminant function f can be given by f(x) = w'x − γ, determining a plane w'x = γ that separates, to the extent possible, malignant points from benign ones in R^30. It remains to show how to determine w ∈ R^30 and γ ∈ R from the training data. If we let the set of m points, M, be represented by a matrix M ∈ R^(m×n) and the set of k points, B, be represented by a matrix B ∈ R^(k×n),

  4. Solution Methods (continued): then the problem becomes one of choosing w and γ to

  min over w, γ of (1/m)·||(−M·w + e·γ + e)+||1 + (1/k)·||(B·w − e·γ + e)+||1

  where e is a vector of ones and (·)+ replaces negative components by zero. This can be implemented using the MATLAB Optimization Toolbox lp routine; see the sketch after this slide.

  Quadratic Optimization Problem Formulation: The function f is given by f(x) = w'x + b. For a support vector x_i:
  if x_i ∈ M, then d_i = +1 and w_o'·x_i + b_o = +1;
  if x_i ∈ B, then d_i = −1 and w_o'·x_i + b_o = −1.
  The objective is to find w and b such that Φ(w_o) = (w_o'·w_o)/2 is minimized subject to the constraints d_i·(w_o'·x_i + b_o) ≥ 1, where 1 ≤ i ≤ N (the number of support vectors). To implement the quadratic optimization solution, I'll use the OSU support vector machine toolbox.
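  To make the linear program concrete, here is a minimal Python/SciPy sketch of the formulation above (the original project used the MATLAB lp routine, so this translation is an illustration, not the author's code). It introduces nonnegative slack vectors y and z to linearize the two plus-functions:

  import numpy as np
  from scipy.optimize import linprog

  def rlp_train(M, B):
      """Choose (w, gamma) minimizing
      (1/m)||(-M w + e gamma + e)+||1 + (1/k)||(B w - e gamma + e)+||1,
      rewritten as an LP with slacks y >= -M w + e gamma + e and
      z >= B w - e gamma + e."""
      m, n = M.shape
      k = B.shape[0]
      # Variable layout: [w (n), gamma (1), y (m), z (k)]
      c = np.concatenate([np.zeros(n + 1),
                          np.full(m, 1.0 / m), np.full(k, 1.0 / k)])
      A_ub = np.block([
          [-M,  np.ones((m, 1)), -np.eye(m),        np.zeros((m, k))],
          [ B, -np.ones((k, 1)),  np.zeros((k, m)), -np.eye(k)],
      ])
      b_ub = -np.ones(m + k)
      # w and gamma are free; the slack vectors are nonnegative.
      bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k)
      res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
      return res.x[:n], res.x[n]

  # Example use with the split from the earlier sketch (0 = malignant, 1 = benign):
  # w, gamma = rlp_train(X_train[y_train == 0], X_train[y_train == 1])
  # then diagnose malignant where X_test @ w > gamma, benign otherwise,
  # matching the discriminant f(x) = w'x - gamma on slide 3.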

  5. Results: The linear optimization method determined a correct diagnosis with a success rate of 97.2%. The following plot shows the testing-set points against one of the most effective pairs of attributes for determining a correct diagnosis. For the quadratic optimization method, I tried different types of kernels with different parameters. So far, the SVM classifier with a polynomial kernel of order 4 gives the best result among the kernels tried, with a success rate of 91.4%. Training an SVM classifier is time-consuming.
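  For comparison, here is a rough scikit-learn analogue of the polynomial-kernel experiment. The project itself used the OSU SVM toolbox for MATLAB, and the feature scaling below is my assumption (added because SVM kernels are sensitive to feature ranges), so this is a sketch rather than a reproduction:

  from sklearn.svm import SVC
  from sklearn.preprocessing import StandardScaler
  from sklearn.pipeline import make_pipeline

  # Polynomial kernel of order 4, as on this slide; X_train/y_train come from
  # the loading sketch above. Results will not match the quoted 91.4% exactly,
  # since the toolbox, split, and parameters differ.
  clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=4))
  clf.fit(X_train, y_train)
  print("test success rate:", clf.score(X_test, y_test))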
