Logical Analysis of Diffuse Large B Cell Lymphoma
Gabriela Alexe(1), Sorin Alexe(1), David Axelrod(2), Peter Hammer(1), and David Weissmann(3)
(1) RUTCOR and (2) Department of Genetics, Rutgers University; (3) Robert Wood Johnson Medical School
This Talk
RUTCOR
• Lymphoma
• Gene Expression Level Analysis
  • cDNA Microarray
  • Applied to Diffuse Large B-Cell Lymphoma
• Logical Analysis of Data
  • Discretization/Binarization
  • Support Sets
  • Pattern Generation
  • Theories and Models
  • Prediction
Lymphoma
• Cancer of lymphoid cells
  • Clonal
  • Uncontrolled growth
  • Metastasis
• Lymphoma
  • Diagnosis
  • Grade
Diffuse Large B Cell Lymphoma (DLBCL)
• 31% of non-Hodgkin lymphoma cases
• 50% long-term, disease-free survival
• Clinical variability
• Prognosis & therapy guided by:
  • IPI (International Prognostic Index)
  • Morphology
  • Gene expression
DNA-RNA Hybridization
Gene Expression Profiling
(Figure: standard vs. tumor sample; cDNA microarray analysis)
DLBCL & cDNA Microarray Analysis
• "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling", Alizadeh et al., Nature, Vol 403, pp 503-511 (2000)
• cDNA microarray data -> unsupervised hierarchical agglomerative clustering
• Germinal center B-cell signature: 76% survival at 5 years
• Activated B-cell signature: 16% survival at 5 years
DLBCL Clustering
(Figure: clustered expression map showing germinal center genes and activated B cell genes)
Each case (patient) is a point in N-dimensional space, where N is the number of genes.
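The clustering step can be illustrated with a small sketch. This is not the software used by Alizadeh et al.; it is a minimal average-linkage agglomerative clustering in plain Python that assumes Euclidean distance between expression profiles (correlation-based distances are also common for expression data). The function names and the 4-patient, 2-gene data are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points):
    """Agglomerative clustering with average linkage.

    Starts with every case in its own cluster and repeatedly merges the two
    clusters whose members have the smallest mean pairwise distance.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(points[p], points[q])
                            for p in clusters[i] for q in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical toy data: 4 patients x 2 genes.
patients = [[1.0, 0.9], [1.1, 1.0], [5.0, 5.2], [5.1, 4.9]]
for a, b, d in average_linkage(patients):
    print(a, b, round(d, 3))
```

Reading the merge history from first to last gives the dendrogram: tight groups of similar patients join early, distinct expression programs join last.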
DLBCL Survival by Type
Supervised Learning Classification of DLBCL
• "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning", Shipp et al., Nature Medicine, Vol 8, pp 68-74
• Prognosis of DLBCL
• Highly correlated genes -> weighted voting algorithm
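A weighted-voting classifier can be sketched as follows, assuming the signal-to-noise weighting popularized for expression data by Golub and colleagues: each gene votes with weight (mean1 - mean0) / (sd1 + sd0) around the midpoint of the two class means. This is an illustrative reconstruction, not the exact procedure of Shipp et al.; the function names and toy data are hypothetical, and each class needs at least two training cases so that the standard deviations are defined.

```python
from statistics import mean, stdev


def train_weighted_voting(X, y, genes):
    """X: list of samples (lists of expression values); y: 1/0 labels.

    For each selected gene index g, compute a signal-to-noise weight
    a_g = (mu1 - mu0) / (sd1 + sd0) and a boundary b_g = (mu1 + mu0) / 2.
    """
    params = []
    for g in genes:
        pos = [x[g] for x, lab in zip(X, y) if lab == 1]
        neg = [x[g] for x, lab in zip(X, y) if lab == 0]
        a = (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))
        b = (mean(pos) + mean(neg)) / 2
        params.append((g, a, b))
    return params


def predict_weighted_voting(params, x):
    """Each gene votes a_g * (x_g - b_g); the sign of the total picks the class."""
    total = sum(a * (x[g] - b) for g, a, b in params)
    return 1 if total > 0 else 0


# Hypothetical toy data: gene 0 is high in class 1, gene 1 is high in class 0.
X = [[5.0, 1.0], [6.0, 1.5], [1.0, 4.0], [1.5, 5.0]]
y = [1, 1, 0, 0]
params = train_weighted_voting(X, y, [0, 1])
print(predict_weighted_voting(params, [5.5, 1.2]))
```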
Logical Analysis of Data (LAD)
• Non-statistical method based on:
  • Combinatorics
  • Optimization
  • Logic
• Based on a dataset of cases (patients)
• LAD learns patterns characteristic of classes
  • Subsets of patients who are + or – for a condition
• Collections of patterns are extensible
• Predictions
The Problem: Approximation of a Hidden Function
(Figure: dataset, hidden function, and its LAD approximation)
Main Components of LAD
• Discretization/Binarization
• Support Sets
• Pattern Generation
• Theories and Models
• Prediction
Discretization
(Figure: separating cutpoints; minimum set of separating cutpoints)
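Candidate cutpoints for a single gene can be sketched as below: a cutpoint is placed midway between two consecutive observed values whenever one of those values occurs in a positive case and the other in a negative case, and each cutpoint t then yields a Boolean attribute "value ≥ t". This is a minimal illustration under those assumptions; the helper names and data are hypothetical.

```python
def candidate_cutpoints(values, labels):
    """Midpoints between consecutive distinct values of one gene, kept only
    when one value occurs in a positive case and the other in a negative case
    (equivalently: the two values are not both of the same single class)."""
    classes_at = {}
    for v, lab in zip(values, labels):
        classes_at.setdefault(v, set()).add(lab)
    ordered = sorted(classes_at)
    cuts = []
    for lo, hi in zip(ordered, ordered[1:]):
        same_pure_class = classes_at[lo] == classes_at[hi] and len(classes_at[lo]) == 1
        if not same_pure_class:
            cuts.append((lo + hi) / 2)
    return cuts


def binarize(values, cuts):
    """Turn each value into Boolean attributes 'value >= t', one per cutpoint."""
    return [[int(v >= t) for t in cuts] for v in values]


# Hypothetical expression levels of one gene in four patients (1 = positive).
cuts = candidate_cutpoints([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
print(cuts, binarize([1.0, 3.5], cuts))
```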
Cutpoints and Support Set
• Finding a minimum set of separating cutpoints is NP-hard
• Numerous powerful methods exist nonetheless
• Support set:
  • Cutpoints define a grid in which, ideally, no cell contains both + and – cases
  • Cutpoints simplify the data and reduce noise
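Choosing a small support set can be phrased as set covering: every (positive, negative) pair of cases must differ on at least one chosen binary attribute. Because exact minimization is NP-hard, the sketch below uses a plain greedy heuristic; it stands in for, and is not necessarily one of, the methods referred to on the slide. Names and data are hypothetical.

```python
from itertools import product


def greedy_support_set(binary_rows, labels):
    """binary_rows: 0/1 vectors, one column per candidate cutpoint.

    Greedy set cover: repeatedly pick the column that separates the most
    still-unseparated (positive, negative) pairs. Returns column indices.
    """
    pos = [r for r, lab in zip(binary_rows, labels) if lab == 1]
    neg = [r for r, lab in zip(binary_rows, labels) if lab == 0]
    uncovered = set(product(range(len(pos)), range(len(neg))))
    chosen = []
    while uncovered:
        best_col, best_pairs = None, set()
        for c in range(len(binary_rows[0])):
            pairs = {(i, j) for i, j in uncovered if pos[i][c] != neg[j][c]}
            if len(pairs) > len(best_pairs):
                best_col, best_pairs = c, pairs
        if best_col is None:
            raise ValueError("a positive and a negative case agree on every column")
        chosen.append(best_col)
        uncovered -= best_pairs
    return chosen


# Hypothetical binarized cases: column 0 alone separates the classes.
rows = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]]
print(greedy_support_set(rows, [1, 1, 0, 0]))
```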
Patterns
• Example: Gene A > 34 & Gene B < 24 & Gene C < 2
• Positive and negative patterns
• Pattern parameters:
  • Degree (number of conditions)
  • Prevalence (number, or share, of the + or – cases that satisfy it)
  • Homogeneity (proportion of + or – cases among those it covers)
• Best patterns: low degree, large prevalence, high homogeneity
• Patterns are extensible!
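The three pattern parameters can be computed directly from a dataset. The sketch assumes cases stored as dictionaries keyed by gene name and reports prevalence as a share of the target class (matching the percentages used for pattern generation); the genes A, B, C follow the slide's example, but their values and the function names are hypothetical.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def covers(pattern, case):
    """A pattern is a conjunction: the case must satisfy every condition."""
    return all(OPS[op](case[gene], t) for gene, op, t in pattern)


def pattern_stats(pattern, cases, labels, target=1):
    """Degree, prevalence (share of target cases covered) and homogeneity
    (share of covered cases that belong to the target class)."""
    covered = [lab for case, lab in zip(cases, labels) if covers(pattern, case)]
    n_target = sum(1 for lab in labels if lab == target)
    hits = sum(1 for lab in covered if lab == target)
    return {
        "degree": len(pattern),
        "prevalence": hits / n_target if n_target else 0.0,
        "homogeneity": hits / len(covered) if covered else 0.0,
    }


pattern = [("A", ">", 34), ("B", "<", 24), ("C", "<", 2)]
cases = [{"A": 40, "B": 20, "C": 1}, {"A": 50, "B": 10, "C": 0},
         {"A": 30, "B": 20, "C": 1}, {"A": 45, "B": 30, "C": 1}]
labels = [1, 1, 1, 0]
print(pattern_stats(pattern, cases, labels))
```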
Pattern Generation
• Generate patterns based on the learning set
• Stipulate control parameters, for example:
  • Degree ≤ 4
  • + and – prevalences ≥ 70%
  • + and – homogeneities = 100%
• All 75 patterns generated in 1.2 seconds on a 1 GHz Pentium IV PC
• Evaluate the pattern set:
  • Average number of patterns covering each observation
  • Accuracy on the evaluation set
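A brute-force version of pattern generation over binarized data is sketched below. It enumerates conjunctions of up to max_degree literals, keeps only pure patterns (homogeneity 100%) whose prevalence reaches the threshold, and skips extensions of patterns already found. This exhaustive search is only workable for small examples and is not the algorithm behind the timing quoted above; names and data are hypothetical.

```python
from itertools import combinations, product


def satisfies(row, pattern):
    """A pattern is a conjunction of literals (column, required_value)."""
    return all(row[j] == v for j, v in pattern)


def generate_patterns(rows, labels, target, max_degree, min_prevalence):
    """Pure patterns for class `target` with prevalence >= min_prevalence."""
    targets = [r for r, lab in zip(rows, labels) if lab == target]
    others = [r for r, lab in zip(rows, labels) if lab != target]
    found = []
    for degree in range(1, max_degree + 1):
        for cols in combinations(range(len(rows[0])), degree):
            for values in product((0, 1), repeat=degree):
                pattern = tuple(zip(cols, values))
                # Skip extensions of a pattern already found: they cover no more cases.
                if any(set(p) <= set(pattern) for p in found):
                    continue
                # Homogeneity 100%: the pattern must cover no opposite-class case.
                if any(satisfies(r, pattern) for r in others):
                    continue
                prevalence = sum(satisfies(r, pattern) for r in targets) / len(targets)
                if prevalence >= min_prevalence:
                    found.append(pattern)
    return found


# Hypothetical binarized learning set with two attributes.
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = [1, 1, 0, 0]
print(generate_patterns(rows, labels, target=1, max_degree=2, min_prevalence=0.7))
```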
Patterns: Illustration
(Figure: a negative pattern and a positive pattern)
Theories: Approximations of the 2 Regions
(Figure: positive theory and negative theory)
A theory is a set of positive (or negative) patterns such that every positive (or negative) case is covered by at least one of them.
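Selecting a theory from a pool of patterns of one class is again a covering problem. The sketch below is a greedy version, assuming patterns are tuples of (column, value) literals over binarized cases; names and data are hypothetical, and the greedy choice need not yield the smallest possible theory.

```python
def satisfies(case, pattern):
    """A pattern is a conjunction of literals (column, required_value)."""
    return all(case[j] == v for j, v in pattern)


def greedy_theory(patterns, cases):
    """Pick patterns until every case of the class is covered at least once."""
    uncovered = set(range(len(cases)))
    theory = []
    while uncovered:
        # Pattern covering the most still-uncovered cases (first one on ties).
        best = max(patterns, key=lambda p: sum(satisfies(cases[i], p) for i in uncovered))
        gain = {i for i in uncovered if satisfies(cases[i], best)}
        if not gain:
            raise ValueError("the patterns cannot cover every case")
        theory.append(best)
        uncovered -= gain
    return theory


# Hypothetical positive cases and candidate positive patterns.
positive_cases = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
candidates = [((0, 1),), ((1, 1),), ((2, 1),)]
print(greedy_theory(candidates, positive_cases))
```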
Models
• A model is a pair consisting of a positive and a negative theory
• A good model has:
  • A small number of features (genes)
  • High-quality patterns:
    • Low degrees
    • High prevalences
    • High homogeneities
  • A small number of patterns
  • Maximal biologic interpretability
Theories and Models
(Figure: positive theory, negative theory, and the resulting model, with regions labeled positive area, negative area, discordant area, and unexplained area)
LAD Prediction
• A new case: a set of gene expression levels
• Satisfies some positive and no negative patterns? -> positive
• Satisfies some negative and no positive patterns? -> negative
• Satisfies some of both? -> which kind does it satisfy more of?
• Satisfies none (rare)
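The decision rules above can be sketched as a small function that counts how many positive and negative patterns a binarized case satisfies. It assumes raw counts answer "which more?"; weighting patterns (for example by prevalence) is one possible refinement not shown here. The labels "tie" and "unclassified", and all names, are hypothetical.

```python
def satisfies(case, pattern):
    """A pattern is a conjunction of literals (column, required_value)."""
    return all(case[j] == v for j, v in pattern)


def lad_predict(case, pos_theory, neg_theory):
    """Classify a binarized case by comparing satisfied pattern counts."""
    pos_hits = sum(satisfies(case, p) for p in pos_theory)
    neg_hits = sum(satisfies(case, p) for p in neg_theory)
    if pos_hits == 0 and neg_hits == 0:
        return "unclassified"  # no pattern applies (the rare case)
    if pos_hits > neg_hits:
        return "positive"      # includes: some positive, no negative
    if neg_hits > pos_hits:
        return "negative"      # includes: some negative, no positive
    return "tie"               # discordant case with equal support


# Hypothetical model: two positive patterns, one negative pattern.
pos_theory = [((0, 1),), ((1, 1),)]
neg_theory = [((0, 0),)]
print(lad_predict([1, 1], pos_theory, neg_theory))
```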
8 Gene Classification Model
Accuracy of Prognosis
Conclusion
• Logical Analysis of Data (LAD): a versatile new classification method, here applied to the diagnosis and prognosis of lymphoma.
• The genes selected by LAD differ almost entirely from those identified in other studies.
• These genes are not individually correlated with diagnosis or prognosis, but are highly correlated in combinations of as few as two genes.
• Patterns suggest biologic pathways.
• LAD provides highly accurate prognosis of DLBCL.
Contacts
• Gabriela Alexe: galexe@us.ibm.com
• Sorin Alexe: salexe@rutcor.rutgers.edu
• David Axelrod: axelrod@biology.rutgers.edu
• Peter Hammer: hammer@rutcor.rutgers.edu
• David Weissmann: weissmdj@umdnj.edu