Survival-Time Classification of Breast Cancer Patients and Chemotherapy ISMP-2003 Copenhagen August 18-22, 2003. Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg. Data Mining Institute University of Wisconsin - Madison. Breast Cancer Estimates American Cancer Society & World Health Organization.
Survival-Time Classification of Breast Cancer Patients and Chemotherapy
ISMP-2003, Copenhagen, August 18-22, 2003
Y.-J. Lee, O. L. Mangasarian & W. H. Wolberg
Data Mining Institute, University of Wisconsin - Madison
Breast Cancer Estimates: American Cancer Society & World Health Organization
• Breast cancer is the most common cancer among women in the United States.
• 212,600 new cases of breast cancer will be diagnosed in the United States in 2003: 211,300 in women, 1,300 in men
• 40,200 deaths will occur from breast cancer in the United States in 2003: 39,800 in women, 400 in men
• WHO estimates: More than 1.2 million people worldwide were diagnosed with breast cancer in 2001 and 0.5 million died from breast cancer in 2000.
Key Objective
• Identify breast cancer patients for whom chemotherapy prolongs survival time
• Main Difficulty: Cannot carry out comparative tests on human subjects
  • Similar patients must be treated similarly
• Our Approach: Classify patients into Good, Intermediate & Poor groups such that:
  • Good group does not need chemotherapy
  • Intermediate group benefits from chemotherapy
  • Poor group not likely to benefit from chemotherapy
Outline
• Data description
• Tools used
  • Support vector machines (Linear & Nonlinear SVMs)
    • Feature selection & classification
  • Clustering (k-Median algorithm, not k-Means)
• Cluster into chemo & no-chemo groups
  • Cluster chemo patients into 2 groups: good & poor
  • Cluster no-chemo patients into 2 groups: good & poor
• Merge into three final classes
  • Good (No-chemo)
  • Poor (Chemo)
  • Intermediate: Remaining patients (chemo & no-chemo)
• Generate survival curves for three classes
• Use SSVM to classify new patients into one of above three classes
1-Norm Support Vector Machines: Maximize the Margin between Bounding Planes
[Figure: point sets A+ and A- separated by two bounding planes]
Support Vector Machine: Algebra of 2-Category Linearly Separable Case
• Given m points in n-dimensional space
• Represented by an m-by-n matrix A
• Membership of each point $A_i$ in class +1 or -1 specified by:
  • An m-by-m diagonal matrix D with +1 & -1 entries
• Separate by two bounding planes, $x'w = \gamma \pm 1$:
  $A_i w \ge \gamma + 1$ for $D_{ii} = +1$; $A_i w \le \gamma - 1$ for $D_{ii} = -1$
• More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.
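The bounding-plane condition can be checked row by row: point i lies on the correct side of its bounding plane when $D_{ii}(A_i w - \gamma) \ge 1$. The sketch below assumes this standard form; the names `w` and `gamma`, the helper `satisfies_bounding_planes`, and the toy data are illustrative assumptions, not values from the talk.

```python
def satisfies_bounding_planes(A, d, w, gamma):
    """Return True if d_i * (A_i . w - gamma) >= 1 holds for every point i.

    A is a list of m feature rows, d holds the diagonal of D (+1 or -1),
    and (w, gamma) is a hypothetical separating plane x'w = gamma.
    """
    for row, label in zip(A, d):
        score = sum(a * b for a, b in zip(row, w)) - gamma
        if label * score < 1:
            return False
    return True


# Toy data (assumed for illustration): two points per class in 2 dimensions.
A = [[3.0, 3.0], [4.0, 2.0], [0.0, 0.0], [1.0, -1.0]]
d = [+1, +1, -1, -1]
w, gamma = [0.5, 0.5], 1.5

print(satisfies_bounding_planes(A, d, w, gamma))
```

Shifting `gamma` far enough in either direction pushes some point inside the margin, and the check then returns False.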
Feature Selection Using 1-Norm Linear SVM: Classification Based on Lymph Node Status
• Feature selection: 1-norm SVM:
  $\min_{w, \gamma, y} \; \nu e'y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$,
  where $D_{ii} = \pm 1$ denotes Lymph node > 0 or Lymph node = 0
• Features selected: 6 out of 31 by above SVM:
  • 5 out of 30 cytological features that describe nuclear size, shape and texture from fine needle aspirate
  • Tumor size from surgery
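Nonlinear SVM for Classifying New Patients
• Linear SVM (linear separating surface: $x'w = \gamma$):
  (LP) $\min_{w, \gamma, y} \; \nu e'y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$
• By QP duality: $w = A'Du$. Maximizing the margin in the "dual space" gives:
  $\min_{u, \gamma, y} \; \nu e'y + \|u\|_1$ s.t. $D(AA'Du - e\gamma) + y \ge e$, $y \ge 0$
• Replace $AA'$ by a nonlinear kernel $K(A, A')$:
  $\min_{u, \gamma, y} \; \nu e'y + \|u\|_1$ s.t. $D(K(A, A')Du - e\gamma) + y \ge e$, $y \ge 0$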
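The Nonlinear Classifier
• The nonlinear classifier: $K(x', A')Du = \gamma$
• Where K is a nonlinear kernel, e.g.:
  • Gaussian (Radial Basis) Kernel: $(K(A, A'))_{ij} = \exp(-\mu \|A_i - A_j\|_2^2)$, $i, j = 1, \dots, m$
  • The $ij$-entry of $K(A, A')$ represents "similarity" between the data points $A_i$ and $A_j$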
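A minimal sketch of how a Gaussian kernel classifier of this kind could be evaluated for a new patient, assuming the kernel has the usual form $\exp(-\mu \|x - A_i\|^2)$ and the class is the sign of $K(x', A')Du - \gamma$. The multipliers `u`, the offset `gamma`, the width `mu` and the two training points are hypothetical; in practice `u` and `gamma` would come from solving the SVM problem.

```python
import math


def gaussian_kernel(x, z, mu):
    """Similarity exp(-mu * ||x - z||^2) between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-mu * sq_dist)


def classify(x, A, d, u, gamma, mu):
    """Return +1 or -1 according to the sign of K(x', A') D u - gamma."""
    score = sum(
        gaussian_kernel(x, A_i, mu) * d_i * u_i
        for A_i, d_i, u_i in zip(A, d, u)
    ) - gamma
    return 1 if score >= 0 else -1


# Hypothetical training points, labels and multipliers (not from the talk).
A = [[0.0, 0.0], [2.0, 2.0]]
d = [-1, +1]
u = [1.0, 1.0]
gamma, mu = 0.0, 1.0

print(classify([1.9, 2.1], A, d, u, gamma, mu))   # close to the +1 point
print(classify([0.1, -0.1], A, d, u, gamma, mu))  # close to the -1 point
```

A new point is pulled toward the class of the training points it most resembles, since the kernel value decays quickly with distance.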
Clustering in Data Mining: General Objective
• Given: A dataset of m points in n-dimensional real space
• Problem: Extract hidden distinct properties by clustering the dataset into k clusters
Concave Minimization Formulation of 1-Norm Clustering Problem (k-Median)
• Given: Set $\mathcal{A}$ of m points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number k of desired clusters
• Find: Cluster centers $C_1, \dots, C_k \in R^n$ that minimize the sum of 1-norm distances of each point $A_1, \dots, A_m$ to its closest cluster center.
• Objective Function: Sum of m minima of linear functions, hence it is piecewise-linear concave
• Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard
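Clustering via Finite Concave Minimization
• Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$:
  $\min_{C_\ell,\, D_{i\ell} \in R^n} \; \sum_{i=1}^{m} \min_{\ell = 1, \dots, k} \{ e'D_{i\ell} \}$
  s.t. $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$, $i = 1, \dots, m$, $\ell = 1, \dots, k$
• Equivalent bilinear reformulation:
  $\min_{C_\ell,\, D_{i\ell},\, T_{i\ell}} \; \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell}\, e'D_{i\ell}$
  s.t. $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$, $\sum_{\ell=1}^{k} T_{i\ell} = 1$, $T_{i\ell} \ge 0$, $i = 1, \dots, m$, $\ell = 1, \dots, k$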
K-Median Clustering Algorithm: Finite Termination at Local Solution
• Step 0 (Initialization): Pick 2 initial cluster centers
  • (L = 0 & T < 2) & (L >= 5 or T >= 4)
• Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm
• Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm)
• Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1
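The steps above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code: the function names, the `max_iter` safeguard and the toy points are assumptions. The center update uses the coordinate-wise median, which minimizes the sum of 1-norm distances to the points of a cluster.

```python
from statistics import median


def l1_distance(p, q):
    """1-norm distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_median(points, initial_centers, max_iter=100):
    """Alternate cluster assignment and median update until centers stop changing."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest center in 1-norm.
        labels = [
            min(range(len(centers)), key=lambda j: l1_distance(p, centers[j]))
            for p in points
        ]
        # Step 2: recompute each center as the coordinate-wise median.
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old_center)  # keep an empty cluster's center
        # Step 3: stop when the centers are unchanged.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Toy data (assumed): two well-separated groups in 2 dimensions.
points = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
centers, labels = k_median(points, [[1, 0], [11, 10]])
print(centers, labels)
```

Because the result is only a local solution, it depends on the initial centers, which is why the talk chooses them from clinically meaningful patient groups.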
Feature Selection & Initial Cluster Centers
• 6 out of 31 features selected by 1-norm SVM
  • SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0)
• Perform k-Median algorithm in 6-dimensional input space
• Initial cluster centers used: Medians of Good1 & Poor1
  • Good1: Patients with Lymph = 0 AND Tumor < 2
  • Poor1: Patients with Lymph > 4 OR Tumor >= 4
• Typical indicator for chemotherapy
Overall Clustering Process
[Flowchart] 253 Patients (113 NoChemo, 140 Chemo)
• Compute initial cluster centers
  • Good1: Lymph = 0 AND Tumor < 2; compute median using 6 features
  • Poor1: Lymph >= 5 OR Tumor >= 4; compute median using 6 features
• Cluster 113 NoChemo patients: use k-Median algorithm with initial centers: medians of Good1 & Poor1
  • 69 NoChemo Good
  • 44 NoChemo Poor
• Cluster 140 Chemo patients: use k-Median algorithm with initial centers: medians of Good1 & Poor1
  • 67 Chemo Good
  • 73 Chemo Poor
• Final classes
  • Good: 69 NoChemo Good
  • Intermediate: 44 NoChemo Poor & 67 Chemo Good
  • Poor: 73 Chemo Poor
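The merge into three final classes follows the rule given in the outline: Good is the no-chemo good cluster, Poor is the chemo poor cluster, and every other patient is Intermediate. The short sketch below applies that rule to the four cluster sizes reported in the flowchart; the function name `merge_class` and the data layout are illustrative assumptions.

```python
from collections import Counter


def merge_class(chemo, cluster):
    """Map (chemo status, k-Median cluster) to one of the three final classes."""
    if not chemo and cluster == "good":
        return "Good"
    if chemo and cluster == "poor":
        return "Poor"
    return "Intermediate"


# Cluster sizes from the flowchart: (received chemo?, cluster) -> patient count.
cluster_sizes = {
    (False, "good"): 69,
    (False, "poor"): 44,
    (True, "good"): 67,
    (True, "poor"): 73,
}

totals = Counter()
for (chemo, cluster), n in cluster_sizes.items():
    totals[merge_class(chemo, cluster)] += n

print(dict(totals))  # {'Good': 69, 'Intermediate': 111, 'Poor': 73}
```

The three class sizes add up to the 253 patients in the study.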
Survival Curves for Good, Intermediate & Poor Groups (Classified by Nonlinear SSVM)
Survival Curves for Intermediate Group: Split by Chemo & NoChemo
Survival Curves for Overall Patients: With & Without Chemotherapy
Survival Curves for Intermediate Group: Split by Lymph Node & Chemotherapy
Survival Curves for Overall Patients: Split by Lymph Node Positive & Negative
Conclusion
• Used five cytological features & tumor size to cluster breast cancer patients into 3 groups:
  • Good: No chemotherapy recommended
  • Intermediate: Chemotherapy likely to prolong survival
  • Poor: Chemotherapy may or may not enhance survival
• 3 groups have very distinct survival curves
• First categorization of a breast cancer group for which chemotherapy enhances longevity
• SVM-based procedure assigns new patients into one of above three survival groups
Talk & Paper Available on Web
• www.cs.wisc.edu/~olvi
• Y.-J. Lee, O. L. Mangasarian & W. H. Wolberg: "Computational Optimization and Applications", Volume 25, 2003, pages 151-166