Logical Analysis of Data (LAD) Applied to Mass Spectrometry Data to Predict Rate of Decline of Kidney Function
M. Lipkowitz1, M. Subasi2, E. Subasi2, V. Anbalagan1, W. Zhang1, P.L. Hammer2, J. Roboz1 and the AASK Investigators
1 Mount Sinai School of Medicine, NY, NY
2 RUTCOR, Rutgers Center for Operations Research, Piscataway, NJ
DIMACS-RUTCOR Workshop on Boolean and Pseudo-Boolean Functions in Memory of Peter L. Hammer
January, 2009
Acknowledgements
• 1,094 Participants
• Investigators and Staff at 21 AASK Clinical Centers and Coordinating Center
• Sponsors
  • NIDDK
  • NIH Office on Research in Minority Health
  • King Pharmaceuticals
Prevalence of Renal Disease in US (Age > 20 yrs, NHANES III)
Stage          GFR      Number
ESRD                    300,000
Severe CKD     15-29    400,000
Moderate CKD   30-59    7-12 million
Mild CKD       60-89    55 million
Normal         > 90     114 million
Creat > 1.3-1.4 (men); Creat > 1.1-1.2 (women)
Adapted from: Coresh et al, AJKD 41:1-12, 2003
Risk of Death and Cardiovascular Disease in CKD Go et al. N Engl J Med 2004;351:1296-305.
African American Study of Kidney Disease and Hypertension (AASK)
• Motivated by the high incidence of kidney disease in African Americans with hypertension
• Extremely hard to recruit: 500,000 medical records were screened to recruit 1,094 participants
Two Phases of AASK
• Phase 1: Randomized trial (completed Sept 2001)
  • 1,094 African-Americans with non-diabetic, hypertensive CKD (baseline GFR of 20-65 ml/min/1.73 m2)
  • Demonstrated that one class of BP medications, ACE inhibitors, slowed progression of kidney disease
• Phase 2: Observational cohort (completed June 2007)
  • One objective: document the long-term effects of trial interventions on CKD events
  • Therapy: all participants received recommended BP therapy:
    • ACEi (or ARB)
    • BP goal < 130/80 mmHg
Main Results of Phase 1
• Trial results published in JAMA 2002
• ACEi was more effective than CCBs and BBs in slowing progression of hypertensive renal disease
• Largest difference seen in participants with UP/Cr > 0.22 (>300 mg/24h)
• No difference between participants randomized to the lower MAP goal (<92 mmHg) vs 102-107 mmHg, regardless of UP/Cr
Second Phase of AASK
Cohort study (completed 6/07)
• One objective: document the long-term effects of trial interventions on CKD events
• Therapy: all participants received recommended BP therapy:
  • ACEi (or ARB)
  • BP goal < 130/80 mmHg
• Primary composite outcome:
  • doubling of serum Cr from the trial baseline, ESRD, or death across both the trial and cohort phases
Conclusion: ACE inhibition does slow progression of CKD. However, the residual progression rate on best therapy is unacceptable!
Heterogeneity of Progression of CKD
• Glomerular Filtration Rate (GFR): a measure of kidney function; normal is 100 ml/min/1.73 m2
• GFR slope: we use rate of decline of GFR as our main measure of progression
Clinical Case 1: ACEi, good BP control, 1 g proteinuria
Clinical Case 2 (plots: Blood Pressure, eGFR): ACEi, sub-optimal BP control, Uprot 1.1 g/24 h
How do we find the “Rapid Progressors” and “Non-progressors”?
A Serum Proteomics Approach
• Use SELDI-TOF Mass Spectrometry to detect serum proteins
• Use Logical Analysis of Data (LAD), a data analysis methodology that combines ideas and concepts from optimization, combinatorics, and Boolean functions
The Data Set *Matched for randomized drug class
SELDI Data insulin
Logical Analysis of Data (LAD)
• Non-statistical method based on combinatorics, optimization, and logic
• Initiated by Peter L. Hammer in 1988
• Has been applied to numerous disciplines: economics and business, seismology, oil exploration, medicine
LAD Approximation (figure labels: Dataset, Hidden Function, LAD Approximation)
Main Components of LAD
• Discretization
• Support set
• Pattern generation
• Model
• Prediction
Discretization
• Feasible set of cut-points
• Minimum set of cut-points
• Set covering
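A minimal Python sketch of how a feasible set of candidate cut-points might be generated for a single numerical attribute. It assumes the common convention of placing a cut-point midway between consecutive distinct values whose observations belong to different classes; the function name and the toy intensities are hypothetical and are not taken from the AASK data.

```python
def candidate_cut_points(values, labels):
    """Return midpoints between consecutive distinct values that separate classes.

    values: numeric attribute values, one per observation
    labels: matching class labels (True = positive, False = negative)
    """
    # Collect the class labels observed at each distinct value.
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)

    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        # A cut is useful only if both classes appear across the gap.
        if len(classes_at[lo] | classes_at[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


# Hypothetical peak intensities for six observations.
intensities = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
is_positive = [False, False, True, True, False, True]
cuts = candidate_cut_points(intensities, is_positive)
```

Choosing a minimum subset of these candidates that still separates every positive observation from every negative one is then a set-covering problem.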
Support Set
• Smallest (by cardinality) subset of attributes that is sufficient to distinguish between the positive and negative observations.
• Finding a support set is a set-covering problem!
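The set-covering view can be sketched in Python as follows. Each (positive, negative) pair of observations is an element to be covered, and a binary attribute covers a pair when the two observations differ on it. The greedy rule used here is a standard heuristic for set covering and is not guaranteed to return a minimum support set; it is an illustration, not necessarily the procedure used in this study. The function name and the toy data are hypothetical.

```python
from itertools import product


def greedy_support_set(positives, negatives):
    """Greedy set-cover heuristic for a support set of binary attributes.

    positives, negatives: lists of equal-length 0/1 tuples.
    Returns indices of attributes that together distinguish every
    positive observation from every negative one.
    """
    n_attrs = len(positives[0])
    # Each (positive, negative) pair is an element that must be covered.
    uncovered = set(product(range(len(positives)), range(len(negatives))))
    # Attribute j covers a pair when the two observations differ on j.
    covers = {
        j: {(p, n) for (p, n) in uncovered
            if positives[p][j] != negatives[n][j]}
        for j in range(n_attrs)
    }
    chosen = []
    while uncovered:
        best = max(range(n_attrs), key=lambda j: len(covers[j] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            raise ValueError("a positive and a negative observation are identical")
        chosen.append(best)
        uncovered -= gained
    return sorted(chosen)


# Hypothetical binarized observations with four attributes.
pos = [(1, 0, 1, 0), (1, 1, 0, 0)]
neg = [(0, 0, 1, 0), (0, 1, 0, 1)]
support = greedy_support_set(pos, neg)
```

In this toy example the first attribute alone separates the two classes, so the greedy pass stops after one selection.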
Patterns (figure labels: Negative Pattern, Positive Pattern)
Pattern Characteristics
Positive Pattern Covering A:
  i) Covers A
  ii) Does not cover D, E, F
Coverage(P) = number of observations covered by P
Degree(P) = number of conditions in P
Homogeneity(P) = proportion of positive observations among those P covers
Prevalence(P) = proportion of all positive observations that P covers
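The four characteristics above can be computed directly from their definitions. The Python sketch below assumes a pattern is represented as a conjunction of threshold conditions on named attributes; that representation, the attribute names "a" and "b", and the toy observations are hypothetical.

```python
def covers(pattern, obs):
    """True when obs satisfies every (attribute, op, cut) condition of the pattern."""
    return all(
        obs[attr] > cut if op == ">" else obs[attr] <= cut
        for attr, op, cut in pattern
    )


def pattern_stats(pattern, positives, negatives):
    """Degree, coverage, homogeneity and prevalence of a positive pattern."""
    pos_cov = sum(covers(pattern, o) for o in positives)
    neg_cov = sum(covers(pattern, o) for o in negatives)
    coverage = pos_cov + neg_cov
    return {
        "degree": len(pattern),
        "coverage": coverage,
        "homogeneity": pos_cov / coverage if coverage else 0.0,
        "prevalence": pos_cov / len(positives),
    }


# Hypothetical observations described by two peak intensities.
positives = [{"a": 3.0, "b": 1.0}, {"a": 4.0, "b": 0.5}, {"a": 1.0, "b": 0.2}]
negatives = [{"a": 3.5, "b": 2.5}, {"a": 0.5, "b": 0.1}]

# Pattern: a > 2.5 AND b <= 2.0
pattern = [("a", ">", 2.5), ("b", "<=", 2.0)]
stats = pattern_stats(pattern, positives, negatives)
```

Here the pattern has degree 2, covers two of the three positive observations and no negative ones, so its homogeneity is 1 and its prevalence is 2/3.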
Theory (figure labels: Positive Theory, Negative Theory)
LAD Model (figure regions: Positive area, Negative area, Unexplained area, Discordant area)
A good LAD Model!
• Small # of features
• High quality patterns
  • Small degree
  • High prevalence
  • High homogeneity
• Small # of patterns
LAD Prediction
• Model: P1, P2, … , Pp ; N1, N2, … , Nn
• Discriminant
• Prediction:
  • Based on the sign of the discriminant.
• The discriminant is used not only for prediction but also as an effective risk score!
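The discriminant formula itself did not survive on this slide. A form commonly used in the LAD literature is a weighted sum of the positive patterns that cover an observation minus a weighted sum of the negative patterns that cover it. The Python sketch below assumes equal weights (1/p for each positive pattern, 1/n for each negative pattern); that weighting, the function names, and the toy patterns are assumptions for illustration and may differ from what was used in this study.

```python
def discriminant(obs, pos_patterns, neg_patterns):
    """Equal-weight discriminant: fraction of positive patterns covering obs
    minus fraction of negative patterns covering obs (weights are an assumption)."""
    pos_score = sum(p(obs) for p in pos_patterns) / len(pos_patterns)
    neg_score = sum(n(obs) for n in neg_patterns) / len(neg_patterns)
    return pos_score - neg_score


def predict(obs, pos_patterns, neg_patterns):
    """Classify by the sign of the discriminant; zero is left unclassified."""
    d = discriminant(obs, pos_patterns, neg_patterns)
    if d > 0:
        return "positive"
    if d < 0:
        return "negative"
    return "unclassified"


# Hypothetical patterns over two peak intensities "a" and "b".
pos_patterns = [
    lambda o: o["a"] > 2.5 and o["b"] <= 2.0,
    lambda o: o["a"] > 3.8,
]
neg_patterns = [
    lambda o: o["a"] <= 2.5,
    lambda o: o["b"] > 2.0,
]

patient = {"a": 4.0, "b": 1.0}
score = discriminant(patient, pos_patterns, neg_patterns)
label = predict(patient, pos_patterns, neg_patterns)
```

With this weighting the value lies between -1 and 1, which is what lets it be ranked across patients as a risk score and not only thresholded at zero.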
LAD Software
• Sorin Alexe, Datascope: http://rutcor.rutgers.edu/~salexe/LAD_kit/SETUP-LAD-DS-SE20.zip
• Pierre Lemaire, Ladoscope: http://www.kamick.org/lemaire/LAD
LAD Applied to AASK Data
• Generates groups of “combinatorial biomarkers”
  • Pairs of SELDI peak intensities that are either “positive” (predict rapid progression) or “negative” (predict slow progression) biomarkers
• Groups of these “combinatorial biomarkers” are combined to create a model that predicts outcomes
• The small number of peak pairs potentially provides targets for future research
The ‘Support Set’
• 5751 SELDI protein peaks
• 7 are enough to predict outcomes
Validation of the LAD Model
• “10-folding” experiments:
  • patients randomly divided into 10 equal groups
  • use data from 9 groups to predict outcomes in the 10th
  • repeat for each group
  • randomly re-divide and repeat 10 times (100 total runs)
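The splitting scheme above can be sketched in Python as follows. Only the index bookkeeping is shown; model fitting and scoring are omitted. The function name, the seed, and the sample count of 50 are hypothetical.

```python
import random


def repeated_k_fold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for each of k * repeats runs."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(repeats):
        rng.shuffle(indices)  # fresh random division for each repeat
        # Striding deals the shuffled indices into k near-equal groups.
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            test = sorted(folds[held_out])
            train = sorted(
                idx for f, fold in enumerate(folds) if f != held_out for idx in fold
            )
            yield train, test


# Hypothetical cohort of 50 patients: 10 repeats x 10 folds = 100 runs.
runs = list(repeated_k_fold(n_samples=50))
```

Within each repeat every patient is held out exactly once, so each patient receives 10 out-of-sample predictions over the full experiment.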
Outcomes by Quintile of “Risk Score” (figure panels: LAD, Upro/UCr)
LAD vs Proteinuria to Predict Progression
• Both work well to find rapid progressors
  • >95% of patients with a high LAD risk score or high proteinuria progress
• LAD Risk Score better defines slow progressors
  • None with the lowest LAD risk score progress
  • 16% with the lowest proteinuria progress
  • In fact, the degree of proteinuria in the 3 lowest quintiles may not be distinguishable on repeated testing, so progression could be up to 40%
Future Studies
• Expand this pilot SELDI study to the full AASK data set (800 samples).
• If the data are reproducible, this could lead to a clinical test for progression rate.
• The ultimate goal: isolate and identify components of combinatorial biomarkers
  • This will hopefully lead to new therapeutic targets for drug development
  • Identification of proteins is difficult, and LAD limits the number that must be identified