
Survival-Time Classification of Breast Cancer Patients and Chemotherapy

Survival-Time Classification of Breast Cancer Patients and Chemotherapy. Yuh-Jye Lee, Olvi Mangasarian & W. H. Wolberg UW Madison & UCSD La Jolla. Computational and Applied Mathematics Seminar April 19, 2005. Breast Cancer Estimates American Cancer Society & World Health Organization.


Presentation Transcript


  1. Survival-Time Classification of Breast Cancer Patients and Chemotherapy. Yuh-Jye Lee, Olvi Mangasarian & W. H. Wolberg, UW Madison & UCSD La Jolla. Computational and Applied Mathematics Seminar, April 19, 2005

  2. Breast Cancer Estimates American Cancer Society & World Health Organization • Breast cancer is the most common cancer among women in the US. • 212,930 new cases of breast cancer are estimated by the ACS to occur in the US in 2005: 211,240 in women and 1,690 in men. • 40,870 deaths are estimated to occur from breast cancer in the US in 2005: 40,410 among women and 460 among men. • WHO estimates: More than 1.2 million people worldwide were diagnosed with breast cancer in 2001 and 0.5 million died from breast cancer in 2000.

  3. Key Objective • Identify breast cancer patients for whom chemotherapy prolongs survival time • Main Difficulty: Cannot carry out comparative tests on human subjects • Similar patients must be treated similarly • Our Approach: Classify patients into Good, Intermediate & Poor groups such that: • Good group does not need chemotherapy • Intermediate group benefits from chemotherapy • Poor group is not likely to benefit from chemotherapy

  4. Outline • Data description • Tools used • Support vector machines (Linear & Nonlinear SVMs) • Feature selection & classification • Clustering (k-Median algorithm, not k-Means) • Cluster into good, intermediate & poor classes • Cluster no-chemo patients into 2 groups: good & poor • Cluster chemo patients into 2 groups: good & poor • Generate three final classes • Good class (Good from no-chemo cluster group) • Poor class (Poor from chemo cluster group) • Intermediate class: Remaining patients (chemo & no-chemo) • Generate survival curves for three classes • Use SSVM to classify new patients into one of above three classes

  5. Cell Nuclei of a Fine Needle Aspirate

  6. Thirty Cytological Features Collected at Diagnosis Time

  7. Two Histological Features Collected at Surgery Time

  8. Breast Cancer Diagnosis Based on 3 FNA Features • 97% Ten-fold Cross Validation Correctness • 780 Patients: 494 Benign, 286 Malignant • Research by Mangasarian, Street, Wolberg

  9. 1-Norm Support Vector Machines: Maximize the Margin between Bounding Planes (figure labels: A+, A-)

  10. Support Vector Machine: Algebra of 2-Category Linearly Separable Case • Given m points in n-dimensional space • Represented by an m-by-n matrix $A$ • Membership of each point $A_i$ in class +1 or -1 specified by: • An m-by-m diagonal matrix $D$ with +1 & -1 entries • Separate by two bounding planes, $x'w = \gamma \pm 1$: $A_i w \ge \gamma + 1$ for $D_{ii} = +1$; $A_i w \le \gamma - 1$ for $D_{ii} = -1$ • More succinctly: $D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones.
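The separability condition on this slide can be checked numerically. The following is a minimal sketch, assuming the condition takes the standard form $D(Aw - e\gamma) \ge e$; the function name and the toy data are hypothetical and not from the talk.

```python
def satisfies_bounding_planes(A, d, w, gamma):
    """Return True if D(Aw - e*gamma) >= e holds for every row.

    A: list of m points, each a list of n floats
    d: list of m labels, each +1 or -1 (the diagonal of D)
    w, gamma: normal vector and offset of the plane x'w = gamma
    """
    for row, label in zip(A, d):
        # label * (A_i w - gamma) must be at least 1 for each point
        margin = label * (sum(a * wi for a, wi in zip(row, w)) - gamma)
        if margin < 1.0:
            return False
    return True


# Hypothetical toy data: two points per class in the plane.
A = [[3.0, 0.0], [4.0, 1.0], [-3.0, 0.0], [-4.0, -1.0]]
d = [1, 1, -1, -1]

print(satisfies_bounding_planes(A, d, [1.0, 0.0], 0.0))  # True
print(satisfies_bounding_planes(A, d, [0.1, 0.0], 0.0))  # False: w is too short
```

Scaling `w` down shrinks every margin, which is why the second call fails even though the plane direction is unchanged.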

  11. Feature Selection Using 1-Norm Linear SVM: Classification Based on Lymph Node Status • Feature selection: 1-norm SVM: $\min_{w,\gamma,y} \nu e'y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$, where $D_{ii} = \pm 1$ denotes Lymph node > 0 or Lymph node = 0 • Features selected: 6 out of 31 by above SVM: • Tumor size from surgery • 5 out of 30 cytological features that describe nuclear size, shape and texture from fine needle aspirate
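A 1-norm objective is not linear as written, but it becomes a linear program after a standard change of variables. The sketch below assumes the 1-norm SVM has the usual form $\min \nu e'y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$; the weight $\nu$, the slack vector $y$ and the bounding vector $s$ are notation assumed here, not taken from the slide.

```latex
\begin{aligned}
\min_{w,\gamma,y,s}\quad & \nu e'y + e's \\
\text{s.t.}\quad & D(Aw - e\gamma) + y \ge e, \\
& -s \le w \le s, \\
& y \ge 0 .
\end{aligned}
```

At a solution each $s_j$ equals $|w_j|$, so $e's = \|w\|_1$. The 1-norm penalty tends to drive many components of $w$ exactly to zero, which is what makes this formulation usable for feature selection: features with $w_j = 0$ are dropped.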

  12. Features Selected by Support Vector Machine

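  13. Nonlinear SVM for Classifying New Patients • Linear SVM (linear separating surface: $x'w = \gamma$): (LP) $\min_{w,\gamma,y \ge 0} \nu e'y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e$ • By QP "duality": $w = A'Du$. Maximizing the margin in the "dual space" gives: $\min_{u,\gamma,y \ge 0} \nu e'y + \|u\|_1$ s.t. $D(AA'Du - e\gamma) + y \ge e$ • Replace $AA'$ by a nonlinear kernel $K(A, A')$: $\min_{u,\gamma,y \ge 0} \nu e'y + \|u\|_1$ s.t. $D(K(A, A')Du - e\gamma) + y \ge e$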

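  14. The Nonlinear Classifier • The nonlinear classifier: $K(x', A')Du = \gamma$ • Where $K$ is a nonlinear kernel, e.g.: • Gaussian (Radial Basis) Kernel: $K(A, A')_{ij} = \exp(-\mu \|A_i - A_j\|_2^2)$, $i, j = 1, \dots, m$ • The $ij$-entry of $K(A, A')$ represents "similarity" between the data points $A_i$ and $A_j$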
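The kernel classifier can be sketched in a few lines. This is an illustrative sketch only, assuming the Gaussian kernel $K(x, z) = \exp(-\mu \|x - z\|_2^2)$ and a decision rule given by the sign of $K(x', A')Du - \gamma$; the values of `u`, `gamma` and `mu` below are hypothetical and are not the result of solving any SVM.

```python
import math


def gaussian_kernel(x, z, mu):
    """K(x, z) = exp(-mu * ||x - z||_2^2); equals 1 when x == z."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-mu * sq_dist)


def classify(x, A, d, u, gamma, mu):
    """Return +1 or -1 according to the sign of K(x', A') D u - gamma."""
    score = sum(
        gaussian_kernel(x, a, mu) * di * ui for a, di, ui in zip(A, d, u)
    ) - gamma
    return 1 if score >= 0 else -1


# Hypothetical training points, labels, and multipliers.
A = [[0.0, 0.0], [1.0, 1.0]]
d = [1, -1]
u = [1.0, 1.0]

print(classify([0.1, 0.0], A, d, u, gamma=0.0, mu=1.0))  # 1
print(classify([0.9, 1.0], A, d, u, gamma=0.0, mu=1.0))  # -1
```

Each new point is labeled by the training points it most resembles: the kernel value decays with squared distance, so nearby training points dominate the sum.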

  15. Clustering in Data Mining: General Objective • Given: A dataset of m points in n-dimensional real space • Problem: Extract hidden distinct properties by clustering the dataset into k clusters

  16. Concave Minimization Formulation of 1-Norm Clustering Problem (k-Median) • Given: Set $\mathcal{A}$ of m points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number $k$ of desired clusters • Find: Cluster centers $C_1, \dots, C_k \in R^n$ that minimize the sum of 1-norm distances of each point $A_1, \dots, A_m$ to its closest cluster center • Objective Function: Sum of m minima of linear functions, hence it is piecewise-linear concave • Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard

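  17. Clustering via Finite Concave Minimization • Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$: $\min_{C_\ell, D_{i\ell}} \sum_{i=1}^{m} \min_{\ell = 1, \dots, k} \{ e'D_{i\ell} \}$ s.t. $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$, $i = 1, \dots, m$, $\ell = 1, \dots, k$ • Equivalent bilinear reformulation: $\min_{C_\ell, D_{i\ell}, T_{i\ell}} \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell} \, e'D_{i\ell}$ s.t. $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$, $\sum_{\ell=1}^{k} T_{i\ell} = 1$, $T_{i\ell} \ge 0$, $i = 1, \dots, m$, $\ell = 1, \dots, k$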
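The step from the min-of-minima objective to the bilinear one rests on a single observation, sketched here. The notation is assumed rather than taken from the slide: $D_{i\ell}$ bounds the componentwise distance between point $A_i$ and center $C_\ell$, and $T_{i\ell}$ is a weight assigning point $i$ to cluster $\ell$.

```latex
\min_{\ell = 1,\dots,k} e'D_{i\ell}
  \;=\;
  \min_{T_{i1},\dots,T_{ik}}
  \left\{ \sum_{\ell=1}^{k} T_{i\ell}\, e'D_{i\ell}
  \;:\; \sum_{\ell=1}^{k} T_{i\ell} = 1,\; T_{i\ell} \ge 0 \right\}
```

For fixed $D_{i\ell}$ the right-hand side is a linear function of $T_i$ minimized over the unit simplex, so its minimum is attained at a vertex, that is, with $T_{i\ell} = 1$ for a closest center and $0$ elsewhere. Summing over $i$ gives the bilinear objective, which is linear in $T$ for fixed centers and linear in $(C, D)$ for fixed $T$.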

  18. K-Median Clustering Algorithm: Finite Termination at Local Solution • Step 0 (Initialization): Pick 2 initial cluster centers as medians of: (L = 0 & T < 2) & (L ≥ 5 or T ≥ 4) • Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm • Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm) • Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1
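Steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the cluster median is taken coordinate-wise (which minimizes the sum of 1-norm distances to the cluster's points), and the toy points and starting centers are hypothetical.

```python
from statistics import median


def one_norm(p, q):
    """1-norm distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_median(points, centers, max_iter=100):
    """Alternate 1-norm assignment and coordinate-wise median updates."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest center in 1-norm.
        labels = [
            min(range(len(centers)), key=lambda j: one_norm(p, centers[j]))
            for p in points
        ]
        # Step 2: recompute each center as its cluster's coordinate-wise median.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old)  # empty cluster: keep its center
        # Step 3: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical toy data: two well-separated groups of three points.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, labels = k_median(points, [[1.0, 1.0], [9.0, 9.0]])
print(centers)  # [[0.0, 0.0], [10.0, 10.0]]
print(labels)   # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centers, since the algorithm terminates at a local solution; in the talk the starting centers are the medians of the two clinically defined groups from Step 0.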

  19. Feature Selection & Initial Cluster Centers • 6 out of 31 features selected by 1-norm SVM • SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0) • Apply k-Median algorithm in 6-dimensional input space • Initial cluster centers used: Medians of Good1 & Poor1 • Good1: Patients with Lymph = 0 AND Tumor < 2 • Poor1: Patients with Lymph > 4 OR Tumor ≥ 4 • Typical indicator for chemotherapy
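Computing the two starting centers amounts to filtering patients by the Good1 and Poor1 rules and taking medians over the selected features. The sketch below assumes each patient is a dictionary; the keys `lymph`, `tumor` and `f1` and all values are hypothetical stand-ins, and only two of the six features are shown.

```python
from statistics import median


def initial_centers(patients, feature_keys):
    """Medians of Good1 (Lymph = 0 AND Tumor < 2) and Poor1 (Lymph >= 5 OR Tumor >= 4).

    Raises statistics.StatisticsError if either group is empty.
    """
    good1 = [p for p in patients if p["lymph"] == 0 and p["tumor"] < 2]
    poor1 = [p for p in patients if p["lymph"] >= 5 or p["tumor"] >= 4]

    def center(group):
        # Coordinate-wise median over the selected features.
        return [median(p[k] for p in group) for k in feature_keys]

    return center(good1), center(poor1)


# Hypothetical records; "f1" stands in for one cytological feature.
patients = [
    {"lymph": 0, "tumor": 1.0, "f1": 0.2},
    {"lymph": 0, "tumor": 1.5, "f1": 0.4},
    {"lymph": 0, "tumor": 1.2, "f1": 0.3},
    {"lymph": 6, "tumor": 3.0, "f1": 0.9},
    {"lymph": 1, "tumor": 4.5, "f1": 0.7},
    {"lymph": 8, "tumor": 5.0, "f1": 0.8},
    {"lymph": 2, "tumor": 2.5, "f1": 0.5},  # in neither group
]
good_center, poor_center = initial_centers(patients, ["tumor", "f1"])
print(good_center)  # [1.2, 0.3]
print(poor_center)  # [4.5, 0.8]
```

Patients who satisfy neither rule do not influence the starting centers, but they are still assigned to a cluster once the k-Median iterations begin.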

  20. Overall Clustering Process • 253 Patients (113 NoChemo, 140 Chemo) • Compute initial cluster centers: Good1 (Lymph = 0 AND Tumor < 2): compute median using 6 features; Poor1 (Lymph >= 5 OR Tumor >= 4): compute median using 6 features • Cluster 113 NoChemo patients: use k-Median algorithm with initial centers: medians of Good1 & Poor1, yielding 69 NoChemo Good and 44 NoChemo Poor • Cluster 140 Chemo patients: use k-Median algorithm with initial centers: medians of Good1 & Poor1, yielding 67 Chemo Good and 73 Chemo Poor • Final classes: Good (69 NoChemo Good), Intermediate (44 NoChemo Poor & 67 Chemo Good), Poor (73 Chemo Poor)
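The rule that merges the four clusters into three final classes is simple enough to write down directly. This sketch follows the outline's definitions (Good from the no-chemo clustering, Poor from the chemo clustering, everyone else Intermediate) and uses the cluster sizes reported on this slide; the function and label names are hypothetical.

```python
from collections import Counter


def final_class(had_chemo, cluster):
    """Map (treatment, cluster label) to one of the three final classes."""
    if not had_chemo and cluster == "good":
        return "Good"
    if had_chemo and cluster == "poor":
        return "Poor"
    return "Intermediate"  # NoChemo Poor and Chemo Good


# Cluster sizes as reported on the slide: (had_chemo, cluster, count).
cluster_sizes = [
    (False, "good", 69),
    (False, "poor", 44),
    (True, "good", 67),
    (True, "poor", 73),
]

counts = Counter()
for had_chemo, cluster, n in cluster_sizes:
    counts[final_class(had_chemo, cluster)] += n

print(dict(counts))  # {'Good': 69, 'Intermediate': 111, 'Poor': 73}
```

The three class sizes sum to 253, matching the total number of patients.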

  21. Survival Curves for Good, Intermediate & Poor Groups (Nonlinear SSVM for New Patients)

  22. Survival Curves for Intermediate Group: Split by Chemo & NoChemo

  23. Survival Curves for Overall Patients: With & Without Chemotherapy

  24. Survival Curves for Intermediate Group: Split by Lymph Node & Chemotherapy

  25. Survival Curves for Overall Patients: Split by Lymph Node Positive & Negative

  26. Conclusion • Used five cytological features & tumor size to cluster breast cancer patients into 3 groups: • Good: No chemotherapy recommended • Intermediate: Chemotherapy likely to prolong survival • Poor: Chemotherapy may or may not enhance survival • 3 groups have very distinct survival curves • First categorization of a breast cancer group for which chemotherapy enhances longevity • SVM-based procedure assigns new patients into one of above three survival groups

  27. Talk & Paper Available on Web • www.cs.wisc.edu/~olvi • Y.-J. Lee, O. L. Mangasarian & W. H. Wolberg: Computational Optimization and Applications, Volume 25, 2003, pages 151-166
