1 / 23

Knowledge-Based Breast Cancer Prognosis

Computation and Informatics in Biology and Medicine Training Program Annual Retreat October 13, 2006. Knowledge-Based Breast Cancer Prognosis. Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison. Objectives.

powa
Download Presentation

Knowledge-Based Breast Cancer Prognosis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computation and Informatics in Biology and Medicine Training Program Annual Retreat October 13, 2006 Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison

  2. Objectives • Primary objective: Incorporate prior knowledge over completely arbitrary sets into: • function approximation, and • classification • without transforming (kernelizing) the knowledge • Secondary objective: Achieve transparency of the prior knowledge for practical applications • Use prior knowledge to improve accuracy on two difficult breast cancer prognosis problems

  3. Classification and Function Approximation • Given a set of m points in n-dimensional real space Rn with corresponding labels • Labels in {+1, -1} for classification problems • Labels in R for approximation problems • Points are represented by rows of a matrix A2Rm£n • Corresponding labels or function values are given by a vector y • Classification: y2 {+1, -1}m • Approximation: y 2 Rm • Find a function f(Ai) = yibased on the given data points Ai • f : Rn ! {+1, -1} for classification • f : Rn ! R for approximation

  4. Graphical Example with no Prior Knowledge Incorporated +  + + +  +  + +  +     K(x0, B0)u = 

  5. Classification and Function Approximation • Problem: utilizing only given data may result in a poor classifier or approximation • Points may be noisy • Sampling may be costly • Solution: use prior knowledge to improve the classifier or approximation

  6. h1(x) · 0 g(x) · 0 h2(x) · 0 Graphical Example with Prior Knowledge Incorporated +  + + +  +  + +  +  K(x0, B0)u =     Similar approach for approximation

  7. Kernel Machines • Approximate f by a nonlinear kernel function K using parameters u 2Rk and  in R • A kernel function is a nonlinear generalization of scalar product • f(x) K(x0, B0)u-, x 2Rn, K:Rn£Rn£k!Rk • B 2Rk£nis a basis matrix • Usually, B = A 2Rm£n= Input data matrix • In Reduced Support Vector Machines, B is a small subset of the rows of A • B may be any matrix with n columns

  8. Kernel Machines • Introduce slack variable sto measure error in classification or approximation • Error s in kernel approximation of given data: • -sK(A, B0)u-e-ys, e is a vector of ones in Rm • Function approximation: f(x) K(x0,B0)u- • Error s in kernel classification of given data • K(A+, B0)u -e + s+¸ e, s+¸ 0 • K(A-, B0)u -e -s- -e, s-¸ 0 • More succinctly, let: D = diag(y), the m£m matrix with diagonal y of § 1’s, then: • D(K(A, B0)u-e) + s¸ e, s¸0 • Classifier: f(x) sign(K(x0,B0)u-)

  9. Kernel Machines in Approximation OR Classification Positive parameter  controls trade off between • solution complexity: e0a = ||u||1 at solution • data fitting: e0s = ||s||1 at solution OR

  10. Nonlinear Prior Knowledge in Function Approximation • Start with arbitrary nonlinear knowledge implication • g, h are arbitrary functions on  • g:! Rk, h:! R • g(x)  0K(x0, B0)u-h(x), 8x2 ½ Rn • Linear in v, u,  9v¸ 0: v0g(x) +K(x0, B0)u--h(x) ¸ 0 8x2

  11. Theorem of the Alternative for Convex Functions • Assume that g(x), K(x0, B0)u-,-h(x) are convex functions of x, that  is convex and 9x2 : g(x) < 0. Then either: • I. g(x)  0, K(x0, B0)u--h(x) < 0 has a solution x,or • II. vRk, v 0: K(x0, B0)u--h(x) + v0g(x)  0 x • But never both. • If we can find v 0: K(x0, B0)u--h(x) + v0g(x)  0 x, then by above theorem • g(x)  0, K(x0, B0)u--h(x) < 0 hasno solution x or equivalently: • g(x)  0 K(x0, B0)u-h(x), 8x2

  12. Incorporating Prior Knowledge Linear semi-infinite program: infinite number of constraints Add term in objective to drive prior knowledge error to zero Discretize to obtain a finite linear program Slacks zi allow knowledge to be satisfied inexactly at the point xi g(xi) · 0 )K(xi0, B0)u - ¸h(xi), i = 1, …, k

  13. Incorporating Prior Knowledge in Classification (Very Similar) • Implication for positive region • g(x)  0K(x0, B0)u-, 8x2 ½ Rn • 9v ¸0, K(x0, B0)u-- + v0g(x) ¸ 0, 8x2 • Similar implication for negative regions • Add discretized constraints to linear program

  14. Incorporating Prior Knowledge in Classification

  15. 100 100 Checkerboard Dataset:Black and White Points in R2 Classifier based on the 16 points at the center of each square and no prior knowledge Prior knowledge given at 100 points in the two left-most squares of the bottom row Perfect classifier based on the same 16 points and the prior knowledge

  16. Predicting Lymph Node Metastasis as a Function of Tumor Size • Number of metastasized lymph nodes is an important prognostic indicator for breast cancer recurrence • Determined by surgery in addition to the removal of the tumor • Optional procedure especially if tumor size is small • Wisconsin Prognostic Breast Cancer (WPBC) data • Lymph node metastasis and tumor size for 194 patients • Task: predict the number of metastasized lymph nodes given tumor size alone

  17. Predicting Lymph Node Metastasis • Split data into two portions • Past data: 20% used to find prior knowledge • Present data: 80% used to evaluate performance • Simulates acquiring prior knowledge from an expert

  18. Prior Knowledge for Lymph Node Metastasis as a Function of Tumor Size • Generate prior knowledge by fitting past data: • h(x) := K(x0, B0)u- • B is the matrix of the past data points • Use density estimation to decide where to enforce knowledge • p(x) is the empirical density of the past data • Prior knowledge utilized on approximating function f(x): • Number of metastasized lymph nodes is greater than the predicted value on past data, with tolerance of 1% • p(x)  0.1 f(x) ¸h(x) - 0.01

  19. Predicting Lymph Node Metastasis: Results • RMSE: root-mean-squared-error • LOO: leave-one-out error • Improvement due to knowledge: 14.9%

  20. Predicting Breast Cancer Recurrence Within 24 Months • Wisconsin Prognostic Breast Cancer (WPBC) dataset • 155 patients monitored for recurrence within 24 months • 30 cytological features • 2 histological features: number of metastasized lymph nodes and tumor size • Predict whether or not a patient remains cancer free after 24 months • 82% of patients remain disease free • 86% accuracy (Bennett, 1992) best previously attained • Prior knowledge allows us to incorporate additional information to improve accuracy

  21. Generating WPBC Prior Knowledge • Gray regions indicate areas where g(x) · 0 • Simulate oncological surgeon’s advice about recurrence • Knowledge imposed at dataset points inside given regions Number of Metastasized Lymph Nodes • Recur • Cancer free Tumor Size in Centimeters

  22. WPBC Results 49.7 % improvement due to knowledge 35.7 % improvement over best previous predictor

  23. Conclusion • General nonlinear prior knowledge incorporated into kernel classification and approximation • Implemented as linear inequalities in a linear programming problem • Knowledge appears transparently • Demonstrated effectiveness of nonlinear prior knowledge on two real world problems from breast cancer prognosis • Future work • Prior knowledge with more general implications • User-friendly interface for knowledge specification • More information • http://www.cs.wisc.edu/~olvi/ • http://www.cs.wisc.edu/~wildt/

More Related