
Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg

Survival-Time Classification of Breast Cancer Patients DIMACS Workshop on Data Mining and Scalable Algorithms August 22-24, 2001- Rutgers University Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg Data Mining Institute University of Wisconsin - Madison Second Annual Review June 1, 2001





Presentation Transcript


  1. Survival-Time Classification of Breast Cancer Patients • DIMACS Workshop on Data Mining and Scalable Algorithms, August 22-24, 2001, Rutgers University • Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg • Data Mining Institute, University of Wisconsin - Madison • Second Annual Review, June 1, 2001

  2. American Cancer Society Year 2001 Breast Cancer Estimates • Breast cancer, the most common cancer among women, is the second leading cause of cancer deaths in women (after lung cancer) • 192,200 new cases of breast cancer in women will be diagnosed in the United States • 40,600 deaths will occur from breast cancer (40,200 among women, 400 among men) in the United States • According to the World Health Organization, more than 1.2 million people will be diagnosed with breast cancer this year worldwide

  3. Key Objective • Identify breast cancer patients for whom adjuvant chemotherapy prolongs survival time • Main Difficulty: Cannot carry out comparative tests on human subjects • Similar patients must be treated similarly • Our Approach: Classify patients into Good, Intermediate & Poor groups • Classification based on: 5 cytological features plus tumor size • Classification criteria: Tumor size & lymph node status

  4. Principal Results for 253 Breast Cancer Patients • All 69 patients in the Good group: • Had the best survival rate • Had no chemotherapy • All 73 patients in the Poor group: • Had the worst survival rate • Had chemotherapy • For the 111 patients in the Intermediate group: • The 67 patients who had chemotherapy had a better survival rate than: • The 44 patients who did not have chemotherapy • Last result reverses chemotherapy role for overall population • Very useful for treatment prescription

  5. Outline • Tools used • Support vector machines (SVMs). • Feature selection • Classification • Clustering • k-Median (k-Mean fails!) • Cluster chemo patients into chemo-good & chemo-poor • Cluster no-chemo patients into no-chemo-good & no-chemo-poor • Three final classes • Good = No-chemo good • Poor = Chemo poor • Intermediate = Remaining patients • Generate survival curves for three classes • Use SVM to classify new patients into one of above three classes

  6. Support Vector Machines Used in this Work • Feature selection: SVM with 1-norm approach, $\min_{w,\gamma,y} \; \nu e^{\top} y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$, where $D_{ii} = \pm 1$ denotes Lymph node > 0 or Lymph node = 0 • 6 out of 31 features selected by SVM: • 5 out of 30 cytological features that describe nuclear size, shape and texture • Tumor size from surgery • Classification: Use SSVMs with Gaussian kernel

  7. Clustering in Data Mining: General Objective • Given: A dataset of m points in n-dimensional real space • Problem: Extract hidden distinct properties by clustering the dataset

  8. Concave Minimization Formulation of Clustering Problem • Given: Set $\mathcal{A}$ of $m$ points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number $k$ of desired clusters • Problem: Determine centers $C_\ell$, $\ell = 1, \ldots, k$, in $R^n$ such that the sum of the minima over $\ell \in \{1, \ldots, k\}$ of the 1-norm distance between each point $A_i$, $i = 1, \ldots, m$, and cluster centers $C_\ell$, $\ell = 1, \ldots, k$, is minimized • Objective Function: Sum of $m$ minima of linear functions, hence it is piecewise-linear concave • Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard
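
The objective described on this slide (sum over all points of the 1-norm distance to the nearest center) can be evaluated directly. The following is a minimal sketch in plain Python; the function names and the toy points are illustrative assumptions, not taken from the presentation.

```python
def one_norm(p, q):
    """1-norm (Manhattan) distance between two points of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_median_objective(points, centers):
    """Sum, over all points, of the 1-norm distance to the nearest center."""
    return sum(min(one_norm(p, c) for c in centers) for p in points)


# Toy 2-dimensional data (made up for illustration only).
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]

good_centers = [(0.0, 0.0), (10.0, 10.0)]   # one center inside each group
bad_centers = [(5.0, 5.0), (6.0, 5.0)]      # both centers between the groups

print(k_median_objective(points, good_centers))  # 2.0
print(k_median_objective(points, bad_centers))   # 38.0
```

Each inner `min` picks one of k distance terms, which is why the total is a sum of minima.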

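  9. Clustering via Concave Minimization • Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$: $\min_{C,D} \sum_{i=1}^{m} \min_{\ell=1,\ldots,k} \{ e^{\top} D_{i\ell} \}$ s.t. $-D_{i\ell} \le A_i^{\top} - C_\ell \le D_{i\ell}$, $i = 1, \ldots, m$, $\ell = 1, \ldots, k$ • Bilinear reformulation: $\min_{C,D,T} \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell} \, e^{\top} D_{i\ell}$ s.t. $-D_{i\ell} \le A_i^{\top} - C_\ell \le D_{i\ell}$, $\sum_{\ell=1}^{k} T_{i\ell} = 1$, $T_{i\ell} \ge 0$, $i = 1, \ldots, m$, $\ell = 1, \ldots, k$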

  10. Finite k-Median Clustering Algorithm (Minimizing Piecewise-linear Concave Function) • Step 0 (Initialization): Given k initial cluster centers • Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm • Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm) • Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1 • Different initial centers will lead to different clusters
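
Steps 0 to 3 above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' implementation: the coordinate-wise median is used as the 1-norm cluster median, and keeping the old center when a cluster becomes empty, the `max_iter` cap, and the toy data are all assumptions made here.

```python
from statistics import median


def one_norm(p, q):
    """1-norm (Manhattan) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_median(points, centers, max_iter=100):
    """Alternate assignment and median update until centers stop changing."""
    centers = [tuple(c) for c in centers]          # Step 0: initial centers
    labels = []
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest center in 1-norm.
        labels = [
            min(range(len(centers)), key=lambda j: one_norm(p, centers[j]))
            for p in points
        ]
        # Step 2: move each center to the coordinate-wise median of its cluster.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(median(col) for col in zip(*members)))
            else:
                new_centers.append(old)            # assumption: keep empty cluster's center
        # Step 3: stop when the centers are unchanged.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Toy data (made up): two well-separated groups in the plane.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, labels = k_median(points, [(1, 1), (9, 9)])
print(centers)  # [(0, 0), (10, 10)]
print(labels)   # [0, 0, 0, 1, 1, 1]
```

Because the result depends on the starting centers, running the sketch with other initial centers can return a different partition, which is the point of the last bullet on the slide.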

  11. Clustering Process: Feature Selection & Initial Cluster Centers • 6 out of 31 features selected by a linear SVM • SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0) • Perform k-Median algorithm in 6-dimensional feature space • Initial cluster centers used: Medians of Good1 & Poor1 • Good1: Patients with Lymph = 0 AND Tumor < 2 • Poor1: Patients with Lymph > 4 OR Tumor >= 4 • Typical indicator for chemotherapy
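
The initial centers described here can be sketched as follows. The record layout (`lymph`, `tumor`, `features`) and the toy values are hypothetical, two features stand in for the six selected ones to keep the example short, and the Poor1 tumor threshold (Tumor >= 4) is taken from the flowchart on the next slide.

```python
from statistics import median


def is_good1(p):
    """Good1 rule from the slide: Lymph = 0 AND Tumor < 2."""
    return p["lymph"] == 0 and p["tumor"] < 2


def is_poor1(p):
    """Poor1 rule: Lymph > 4 OR Tumor >= 4 (tumor threshold from the flowchart slide)."""
    return p["lymph"] > 4 or p["tumor"] >= 4


def median_center(patients):
    """Coordinate-wise median of the patients' feature vectors."""
    return tuple(median(col) for col in zip(*(p["features"] for p in patients)))


# Hypothetical toy records; real data would carry 6 selected features each.
patients = [
    {"lymph": 0, "tumor": 1.0, "features": (1.0, 2.0)},
    {"lymph": 0, "tumor": 1.5, "features": (3.0, 4.0)},
    {"lymph": 0, "tumor": 1.2, "features": (2.0, 6.0)},
    {"lymph": 6, "tumor": 2.0, "features": (8.0, 9.0)},
    {"lymph": 1, "tumor": 5.0, "features": (10.0, 7.0)},
    {"lymph": 2, "tumor": 3.0, "features": (5.0, 5.0)},   # in neither group
]

good1_center = median_center([p for p in patients if is_good1(p)])
poor1_center = median_center([p for p in patients if is_poor1(p)])
print(good1_center)  # (2.0, 4.0)
print(poor1_center)  # (9.0, 8.0)
```

The two resulting vectors would then serve as the starting centers for the k-Median runs.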

  12. Clustering Process: 253 Patients (113 NoChemo, 140 Chemo) • Compute Initial Cluster Centers: • Good1: Lymph = 0 AND Tumor < 2, Compute Median Using 6 Features • Poor1: Lymph >= 5 OR Tumor >= 4, Compute Median Using 6 Features • Cluster 113 NoChemo Patients: Use k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1 • Result: 69 NoChemo Good, 44 NoChemo Poor • Cluster 140 Chemo Patients: Use k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1 • Result: 67 Chemo Good, 73 Chemo Poor • Good: 69 NoChemo Good • Intermediate: 44 NoChemo Poor & 67 Chemo Good • Poor: 73 Chemo Poor
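
The mapping from the four clusters to the three final classes, and the resulting group sizes, can be checked with a short sketch. The function name and the string labels are illustrative assumptions; the counts are the ones shown on the slide.

```python
def final_class(had_chemo, cluster):
    """Map (chemo status, 'good'/'poor' cluster) to one of the three final classes."""
    if not had_chemo and cluster == "good":
        return "Good"            # Good = NoChemo Good
    if had_chemo and cluster == "poor":
        return "Poor"            # Poor = Chemo Poor
    return "Intermediate"        # remaining patients


# Cluster sizes as reported on the slide.
cluster_sizes = {
    (False, "good"): 69,   # NoChemo Good
    (False, "poor"): 44,   # NoChemo Poor
    (True, "good"): 67,    # Chemo Good
    (True, "poor"): 73,    # Chemo Poor
}

totals = {}
for (had_chemo, cluster), n in cluster_sizes.items():
    cls = final_class(had_chemo, cluster)
    totals[cls] = totals.get(cls, 0) + n

print(totals)                 # {'Good': 69, 'Intermediate': 111, 'Poor': 73}
print(sum(totals.values()))   # 253
```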

  13. Survival Curves for Good, Intermediate & Poor Groups

  14. Survival Curves for Intermediate Group: Split by Chemo & NoChemo

  15. Survival Curves for All Patients: Split by Chemo & NoChemo

  16. Survival Curves for Intermediate Group: Split by Lymph Node & Chemotherapy

  17. Survival Curves for All Patients: Split by Lymph Node Positive & Negative

  18. Nonlinear SVM Classifier: 82.7% Tenfold Test Correctness • Four groups from the clustering result: Good, Intermediate (ChemoGood), Intermediate (NoChemoPoor), Poor • SVM separates Good2: Good & ChemoGood from Poor2: NoChemoPoor & Poor • Good2: Compute LI(x) & CI(x), then SVM separates Good from Intermediate • Poor2: Compute LI(x) & CI(x), then SVM separates Poor from Intermediate
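
One way to read this flowchart is as a two-stage cascade: a first classifier splits Good2 from Poor2, and a second classifier inside each branch separates Intermediate patients from the rest. The sketch below only illustrates that control flow under this reading. The three classifier arguments are hypothetical callables standing in for trained SVMs, the threshold stubs and score values are made up, and the LI(x) and CI(x) computations from the slide are not modeled.

```python
def classify(x, is_good2, is_good_within_good2, is_poor_within_poor2):
    """Two-stage cascade: Good2 vs Poor2 first, then a split inside each branch."""
    if is_good2(x):
        return "Good" if is_good_within_good2(x) else "Intermediate"
    return "Poor" if is_poor_within_poor2(x) else "Intermediate"


# Hypothetical stand-ins for trained SVMs: simple thresholds on a single score.
def stub_top(x):
    return x < 0.5           # True means the Good2 branch


def stub_good(x):
    return x < 0.25          # True means Good, otherwise Intermediate


def stub_poor(x):
    return x >= 0.75         # True means Poor, otherwise Intermediate


for score in (0.1, 0.4, 0.6, 0.9):
    print(score, classify(score, stub_top, stub_good, stub_poor))
# 0.1 Good
# 0.4 Intermediate
# 0.6 Intermediate
# 0.9 Poor
```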

  19. Conclusion • Used five cytological features & tumor size to cluster breast cancer patients into 3 groups: • Good: No chemotherapy recommended • Intermediate: Chemotherapy likely to prolong survival • Poor: Chemotherapy may or may not enhance survival • 3 groups have very distinct survival curves • First categorization of a breast cancer group for which chemotherapy enhances longevity • SVM-based procedure assigns new patients into one of above three survival groups
