Logical Analysis of Diffuse Large B Cell Lymphoma • Gabriela Alexe (1), Sorin Alexe (1), David Axelrod (2), Peter Hammer (1), and David Weissmann (3) • (1) RUTCOR and (2) Department of Genetics, Rutgers University; (3) Robert Wood Johnson Medical School
This Talk • Lymphoma • Gene Expression Level Analysis • cDNA Microarray • Applied to Diffuse Large B-Cell Lymphoma • Logical Analysis of Data • Discretization/Binarization • Support Sets • Pattern Generation • Theories and Models • Prediction
Lymphoma • Cancer of lymphoid cells • Clonal • Uncontrolled growth • Metastasis • Lymphoma • Diagnosis • Grade
Diffuse Large B Cell Lymphoma (DLBCL) • 31% of non-Hodgkin lymphoma cases • 50% long-term, disease-free survival • Clinical variability • Prognosis & therapy • IPI (International Prognostic Index) • Morphology • Gene expression
DNA-RNA Hybridization
Gene Expression Profiling • cDNA microarray analysis • [Figure labels: Standard, Tumor]
DLBCL & cDNA Microarray Analysis • "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling", Alizadeh et al., Nature, vol. 403, pp. 503-511 • cDNA microarray data -> unsupervised hierarchical agglomerative clustering • Germinal center signature: 76% survival at 5 years • Activated B cell signature: 16% survival at 5 years
DLBCL Clustering • Each case (patient) is a point in N-dimensional space, where N = number of genes • [Figure labels: Germinal center genes, Activated B cell genes]
DLBCL Survival by Type
Supervised Learning Classification of DLBCL • "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning", Shipp et al., Nature Medicine, vol. 8, pp. 68-74 • Prognosis of DLBCL • Highly correlated genes -> weighted voting algorithm
Logical Analysis of Data (LAD) • Non-statistical method based on: • Combinatorics • Optimization • Logic • Based on dataset of cases/patients • LAD learns patterns characteristic of classes • Subsets of patients who are +/- for a condition • Collections of patterns are extensible • Predictions
The Problem: Approximation of a Hidden Function • [Figure labels: Dataset, Hidden Function, LAD Approximation]
Main Components of LAD • Discretization/Binarization • Support Sets • Pattern Generation • Theories and Models • Prediction
Discretization • [Figure labels: Separating Cutpoints, Minimum Set of Separating Cutpoints]
Cutpoints and Support Set • Minimizing the set of separating cutpoints is NP-hard • Numerous powerful methods • Support set: • Cutpoints define a grid in which ideally no cell contains both + and - cases • Cutpoints simplify data and decrease noise
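The cutpoint idea on this slide can be illustrated with a minimal sketch for a single gene. It assumes one common way of proposing candidates (midpoints between consecutive expression values whose cases differ in class); the function names `candidate_cutpoints` and `binarize` and the toy values are hypothetical, and the sketch makes no attempt at the NP-hard minimization step.

```python
def candidate_cutpoints(values, labels):
    """Return midpoints between consecutive distinct values of one gene
    whenever the neighbouring values include cases of different classes."""
    groups = []  # list of (value, set of class labels seen at that value)
    for value, label in sorted(zip(values, labels)):
        if groups and groups[-1][0] == value:
            groups[-1][1].add(label)
        else:
            groups.append((value, {label}))
    cuts = []
    for (v1, labs1), (v2, labs2) in zip(groups, groups[1:]):
        # A cutpoint here separates at least one + case from one - case.
        if len(labs1 | labs2) > 1:
            cuts.append((v1 + v2) / 2)
    return cuts


def binarize(value, cuts):
    """Turn one expression level into 0/1 indicators, one per cutpoint."""
    return [int(value > c) for c in cuts]


# Toy data (illustrative only): expression of one gene in five cases.
expr = [12.0, 18.0, 30.0, 41.0, 55.0]
cls = ["-", "-", "+", "+", "-"]
cuts = candidate_cutpoints(expr, cls)
print(cuts)                  # [24.0, 48.0]
print(binarize(30.0, cuts))  # [1, 0]
```

A support set would then be a subset of such indicators, taken over all genes, that still keeps + and - cases in different grid cells.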
Patterns • Examples: • Gene A > 34 & gene B < 24 & gene C < 2 • Positive and negative patterns • Pattern parameters: • Degree (# of conditions) • Prevalence (# of +/- cases that satisfy it) • Homogeneity (proportion of +/- cases among those it covers) • Best: low degree, large prevalence, high homogeneity • Patterns are extensible!
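The three pattern parameters can be computed directly from a labelled dataset. The sketch below uses the example pattern from this slide on made-up cases; the representation of a pattern as a list of `(gene, operator, threshold)` tuples and the helper names are assumptions for illustration, and prevalence is reported as a fraction of the target class (as in the percentage thresholds used for pattern generation) rather than as a raw count.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}


def covers(pattern, case):
    """True if the case satisfies every condition of the pattern."""
    return all(OPS[op](case[gene], thr) for gene, op, thr in pattern)


def pattern_stats(pattern, cases, labels, target="+"):
    """Degree, prevalence and homogeneity of a pattern for one class."""
    covered = [lab for case, lab in zip(cases, labels) if covers(pattern, case)]
    n_target = labels.count(target)
    hits = covered.count(target)
    return {
        "degree": len(pattern),                                  # number of conditions
        "prevalence": hits / n_target if n_target else 0.0,      # share of target cases covered
        "homogeneity": hits / len(covered) if covered else 0.0,  # share of covered cases in target class
    }


# Toy cases (illustrative only).
cases = [
    {"A": 40, "B": 10, "C": 1},
    {"A": 50, "B": 20, "C": 0},
    {"A": 36, "B": 30, "C": 1},
    {"A": 20, "B": 10, "C": 1},
    {"A": 45, "B": 12, "C": 5},
]
labels = ["+", "+", "+", "-", "-"]
pattern = [("A", ">", 34), ("B", "<", 24), ("C", "<", 2)]
stats = pattern_stats(pattern, cases, labels)
print(stats)
```

Here the pattern has degree 3, covers two of the three positive cases, and covers no negative case, so its homogeneity is 100%.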
Pattern Generation • Generate patterns based on learning set • Stipulate control parameters. For example: • Degree ≤ 4 • + & - prevalences ≥ 70% • + & - homogeneities = 100% • All 75 patterns in 1.2 seconds on Pentium IV 1 GHz PC • Evaluate set: • Average # of patterns covering each observation • Accuracy applied to evaluation set
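A brute-force version of pattern generation under such control parameters can be sketched as follows. This is not the authors' generator: it assumes already-binarized attributes, enumerates every conjunction up to a maximum degree, and keeps those with 100% homogeneity and sufficient prevalence. The function name, the encoding of a pattern as a tuple of `(attribute index, required value)` pairs, and the toy data are hypothetical.

```python
from itertools import combinations, product


def generate_patterns(rows, labels, target, max_degree=2, min_prevalence=0.7):
    """Enumerate pure conjunctions over binary attributes by brute force."""
    n_attrs = len(rows[0])
    n_target = sum(1 for lab in labels if lab == target)
    found = []
    for degree in range(1, max_degree + 1):
        for attrs in combinations(range(n_attrs), degree):
            for values in product((0, 1), repeat=degree):
                covered = [
                    lab for row, lab in zip(rows, labels)
                    if all(row[a] == v for a, v in zip(attrs, values))
                ]
                hits = sum(1 for lab in covered if lab == target)
                # Homogeneity = 100%: every covered case is in the target class.
                pure = hits > 0 and hits == len(covered)
                if pure and hits / n_target >= min_prevalence:
                    found.append(tuple(zip(attrs, values)))
    return found


# Toy binarized data (illustrative only): three indicators per case.
rows = [
    (1, 1, 0), (1, 1, 1), (1, 1, 0), (1, 0, 1),  # positive cases
    (0, 1, 1), (1, 0, 0), (0, 0, 1),             # negative cases
]
labels = ["+", "+", "+", "+", "-", "-", "-"]
positive_patterns = generate_patterns(rows, labels, "+")
print(positive_patterns)  # [((0, 1), (1, 1))]
```

In this toy set no single indicator yields a pure pattern, but the pair "indicator 0 = 1 and indicator 1 = 1" does. The sketch does not prune patterns that merely extend a shorter pure pattern, and its cost grows quickly with the degree bound.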
Patterns: Illustration • [Figure labels: Negative Pattern, Positive Pattern]
Theories: Approximations of the 2 Regions • A theory is a set of positive (or negative) patterns such that every positive (or negative) case is covered. • [Figure labels: Positive Theory, Negative Theory]
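The covering condition in this definition is easy to check mechanically. The sketch below assumes binarized cases and patterns encoded as tuples of `(attribute index, required value)` pairs; the names `uncovered_cases` and `is_theory` and the data are hypothetical.

```python
def covers(pattern, row):
    """True if the row matches every (attribute index, value) condition."""
    return all(row[attr] == value for attr, value in pattern)


def uncovered_cases(patterns, rows, labels, target):
    """Indices of target-class cases that no pattern in the set covers."""
    return [
        i for i, (row, lab) in enumerate(zip(rows, labels))
        if lab == target and not any(covers(p, row) for p in patterns)
    ]


def is_theory(patterns, rows, labels, target):
    """A pattern set is a theory for a class if it covers every case of it."""
    return not uncovered_cases(patterns, rows, labels, target)


# Toy binarized data (illustrative only).
rows = [(1, 1, 0), (1, 1, 1), (1, 0, 1), (0, 1, 1), (0, 0, 0)]
labels = ["+", "+", "+", "-", "-"]
p1 = ((0, 1), (1, 1))  # indicator 0 = 1 and indicator 1 = 1
p2 = ((0, 1), (1, 0))  # indicator 0 = 1 and indicator 1 = 0

print(is_theory([p1], rows, labels, "+"))      # False: case 2 is left uncovered
print(is_theory([p1, p2], rows, labels, "+"))  # True
```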
Models • A model is a pair: a positive theory and a negative theory • A good model: • Small number of features (genes) • Patterns are high quality • Low degrees • High prevalences • High homogeneities • Number of patterns is small • Maximize their biologic interpretability
Theories and Models • [Figure labels: Positive Theory, Negative Theory, Model, Positive Area, Negative Area, Discordant Area, Unexplained Area]
LAD Prediction • A new case: a set of gene expression levels • Satisfies some positive & no negative patterns? • Satisfies some negative & no positive patterns? • Satisfies some of both? • Which more? • Does not satisfy any (rare)
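The decision questions on this slide can be sketched as a simple vote between the two theories. The sketch assumes binarized cases, patterns encoded as tuples of `(attribute index, required value)` pairs, and an unweighted count of satisfied patterns for the "which more?" step; the actual weighting used in the talk is not stated, so the tie and no-coverage handling here is an assumption.

```python
def covers(pattern, case):
    """True if the case matches every (attribute index, value) condition."""
    return all(case[attr] == value for attr, value in pattern)


def predict(case, positive_patterns, negative_patterns):
    """Classify a new case by comparing how many patterns of each kind it satisfies."""
    pos = sum(covers(p, case) for p in positive_patterns)
    neg = sum(covers(p, case) for p in negative_patterns)
    if pos > neg:
        return "+"
    if neg > pos:
        return "-"
    # Covers the tie case and the (rare) case where no pattern is satisfied.
    return "unclassified"


# Hypothetical model: one positive and one negative theory.
positive_patterns = [((0, 1), (1, 1)), ((0, 1), (1, 0))]
negative_patterns = [((0, 0),), ((1, 0), (2, 0))]

print(predict((1, 1, 0), positive_patterns, negative_patterns))  # +
print(predict((0, 0, 0), positive_patterns, negative_patterns))  # -
print(predict((1, 0, 0), positive_patterns, negative_patterns))  # unclassified
```

The third case satisfies one positive and one negative pattern, so it falls in what the previous slide calls the discordant area.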
8 Gene Classification Model
Accuracy of Prognosis
Conclusion • Logical Analysis of Data (LAD): a versatile new classification method, here applied to diagnosis and prognosis of lymphoma. • LAD genes differ almost entirely from those specified by other studies. • Genes are not individually correlated with diagnosis or prognosis but are highly correlated in combinations of as few as two genes. • Patterns suggest biologic pathways • LAD provides highly accurate prognosis of DLBCL
Contacts • Gabriela Alexe: galexe@us.ibm.com • Sorin Alexe: salexe@rutcor.rutgers.edu • David Axelrod: axelrod@biology.rutgers.edu • Peter Hammer: hammer@rutcor.rutgers.edu • David Weissmann: weissmdj@umdnj.edu