
Association Rule Mining in Type-2 Diabetes Risk Prediction

Association Rule Mining in Type-2 Diabetes Risk Prediction. Gyorgy J. Simon Dept. of Health Sciences Research Mayo Clinic. SHARPn Summit 2012. Outline. Introduction Modeling Diabetes Risk Association Rule Mining Results Diabetes Disease Network Reconstruction Diabetes Risk Prediction


Presentation Transcript


  1. Association Rule Mining in Type-2 Diabetes Risk Prediction Gyorgy J. Simon Dept. of Health Sciences Research Mayo Clinic SHARPn Summit 2012

  2. Outline • Introduction • Modeling Diabetes Risk • Association Rule Mining • Results • Diabetes Disease Network Reconstruction • Diabetes Risk Prediction • Applicability to SHARP

  3. Diabetes • In the US, 25.8 million people (8% of the population) suffer from Diabetes Mellitus • Focus: Type 2 Diabetes Mellitus (DM) • DM leads to significant medical complications • Effective preventive treatments exist • Identifying subpopulations at risk is important • Pre-Diabetes (PreDM) is a condition that precedes DM • Fasting glucose 100-125 mg/dL • Goal: identify sets of risk factors that significantly increase the risk of developing diabetes in a pre-diabetic population • Risk factors: • Co-morbid diseases: obesity, cardiac and vascular conditions • Vitals, lab test results, medications, co-morbid conditions • Data: 85k Mayo patients, 1999-2004, with research consent

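  4. Design [Figure: study design timeline. Study period 1/1/1999 to 12/31/2004, follow-up through 7/2010. Patient counts as labeled in the figure, in extraction order: PreDM 23,828; PreDM 21,826; 2,002; DM 424; DM 19,013; 16,664; 347; Normal 84,708; Normal 44,156; Normal 43,809]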

  5. Data • Follow-up time (FUT): time since PreDM Dx • Co-morbidities: recorded before the elevated glucose measurement • Hypertension, hyperlipidemia, obesity, various cardiac and vascular diseases • Age and FUT are predictive of DM • They are not modifiable, so we need to compensate for them • Goal is different from high-throughput phenotyping • None of the patients have the disease at baseline • Predict the risk that patients progress to DM

  6. Outline • Introduction • Modeling Diabetes Risk • Association Rule Mining • Results • Diabetes Disease Network Reconstruction • Diabetes Risk Prediction • Applicability to SHARP

  7. Computational Model • Level 1: unmodifiable “nuisance” factors (sex, age, …) • Level 2: clinical factors of interest (statin, tobacco, BMI, HDL, HTN, …) • Level 3: glucose, the “definition” of DM (glucose, DM Dx) • Unknown disease mechanism • Goal: find sets of clinical factors (level 2) that are associated with elevated risk of DM • We have to adjust for level 1 factors before we can assess the effect of level 2 factors

  8. Modeling Approaches • Logistic regression / Survival analysis • No ability to discover interactions • Decision Trees / Random Forest / Gradient-boosted Trees • Greedy approach to discovering interactions • No ability to compensate for age and follow-up time (FUT) • Association Rule Mining (ARM) • Specifically designed to discover interactions • No ability to compensate for age and FUT • Proposed: Regression Analysis + Association Rule Mining • Regression removes the effect of age, gender and FUT • ARM finds associations between the risk factors and the DM risk not explained by age and FUT • Simon et al. AMIA 2011

  9. Overview • O: observed number of DM incidents • 1st Phase (regression modeling: survival model or logistic regression): E1 = expected number of DM incidents based on age and sex only • 1st Phase residual: R1 = O - E1 • 2nd Phase (Association Rule Mining): E2 = expected number of DM incidents based on co-morbidities only (after adjusting for age and sex) • 2nd Phase residual: R2 = O - (E1 + E2) = R1 - E2 • 3rd Phase: E3 = expected number of DM incidents based on glucose (after adjusting for everything else) • Final prediction: E = E1 + E2 + E3
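The phase-wise bookkeeping on this slide can be illustrated with a small numeric sketch. The counts below are hypothetical and are not taken from the study; the helper names `subtract` and `add` are also illustrative. The sketch only shows how the residuals and the final prediction relate to each other, not how E1, E2 and E3 are estimated.

```python
# Hypothetical per-group counts; none of these numbers come from the study.
observed = [12.0, 30.0, 7.0]  # O: observed DM incidents per patient group
e1 = [8.0, 20.0, 5.0]         # E1: expected from age and sex only
e2 = [3.0, 6.0, 1.0]          # E2: additional expectation from co-morbidities
e3 = [0.5, 2.0, 0.5]          # E3: additional expectation from glucose


def subtract(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def add(*vectors):
    """Element-wise sum of any number of equally long vectors."""
    return [sum(values) for values in zip(*vectors)]


r1 = subtract(observed, e1)                  # 1st phase residual: R1 = O - E1
r2 = subtract(r1, e2)                        # 2nd phase residual: R2 = R1 - E2
r2_direct = subtract(observed, add(e1, e2))  # same residual: R2 = O - (E1 + E2)
final = add(e1, e2, e3)                      # final prediction: E = E1 + E2 + E3

print(r1, r2, final)
```

Each phase only has to explain what the previous phases left unexplained, which is why the second-phase residual can be computed either from R1 or directly from O.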

  10. Association Rule Mining • Originated in the analysis of sales data • Items (columns): co-morbid conditions • Transactions (rows): patients • Itemsets: sets of co-morbid conditions • Goal: find all itemsets (sets of conditions) that frequently co-occur in patients • One of those conditions should be DM • Support: number of transactions in which the itemset I appears • Example: Support({OB, HTN, IHD}) = 3 • Frequent: an itemset I is frequent if support(I) > minsup
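The definitions of support and frequent itemsets can be made concrete with a brute-force sketch. The patient rows below are hypothetical toy data (chosen so that the support of {OB, HTN, IHD} is 3, as in the slide's example), and the function names are illustrative. Practical miners such as Apriori prune the search space instead of enumerating every candidate.

```python
from itertools import combinations

# Hypothetical toy data: each transaction (row) is one patient's set of conditions.
patients = [
    {"OB", "HTN", "IHD", "DM"},
    {"OB", "HTN", "IHD"},
    {"OB", "HTN", "IHD", "HL"},
    {"HTN", "HL"},
    {"OB", "DM"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def frequent_itemsets(transactions, minsup):
    """Enumerate all itemsets I with support(I) > minsup (brute force)."""
    all_items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            s = support(candidate, transactions)
            if s > minsup:
                result[candidate] = s
    return result


print(support({"OB", "HTN", "IHD"}, patients))
frequent = frequent_itemsets(patients, minsup=2)
for itemset, s in frequent.items():
    print(itemset, s)
```

Brute force is exponential in the number of items, so it is only meant to show the definitions at work on a handful of conditions.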

  11. Distributional Association Rule Mining • Distributional association rules associate an itemset with a continuous outcome [Figure: two histograms, frequency versus risk R] • Application to diabetes: find all sets I of co-morbid conditions such that the distribution of risk R is significantly different between the patient population having I and the population without I • Simon et al, KDD 2011a
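The comparison behind a distributional rule can be sketched as follows: split patients by whether they carry itemset I, then compare the outcome R in the two groups. The data and function names are hypothetical, and the exact permutation test on the difference of means is a stand-in chosen for this sketch; it is an assumption, not the test used in the cited work.

```python
from itertools import combinations
from statistics import mean

# Hypothetical patients: (set of conditions, residual risk R). Not study data.
patients = [
    ({"HTN", "OB", "IHD"}, 0.9),
    ({"HTN", "OB"}, 1.1),
    ({"HTN", "OB", "HL"}, 0.8),
    ({"HTN", "OB", "IHD", "HL"}, 1.2),
    ({"HTN"}, -0.1),
    ({"OB"}, 0.0),
    ({"IHD"}, 0.1),
    ({"HL"}, -0.2),
    ({"HTN", "HL"}, 0.05),
    (set(), -0.05),
]


def split_by_itemset(rows, itemset):
    """Return the R values of patients with and without every item in `itemset`."""
    with_i = [r for conds, r in rows if itemset <= conds]
    without_i = [r for conds, r in rows if not (itemset <= conds)]
    return with_i, without_i


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for the difference of group means."""
    pooled = group_a + group_b
    n, k = len(pooled), len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    count = 0
    for idx in combinations(range(n), k):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / k - (total - sum_a) / (n - k))
        count += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point noise
            extreme += 1
    return extreme / count


with_i, without_i = split_by_itemset(patients, {"HTN", "OB"})
p_value = exact_permutation_pvalue(with_i, without_i)
print(len(with_i), len(without_i), p_value)
```

Enumerating all relabelings is feasible only for tiny groups; it is used here because it keeps the example deterministic.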

  12. Why Association Rule Mining?

  13. Outline • Introduction • Modeling Diabetes Risk • Association Rule Mining • Results • Diabetes Disease Network Reconstruction • 4.5-yr DM Risk Prediction • Applicability to SHARP

  14. Diabetes Disease Network Reconstruction • Metabolic Syndrome: DM + cardiac/vascular diseases • Use Association Rule Mining to map out the relationships between DM and other metabolic syndrome diseases • Also measure their effect on DM progression risk • Predictors: Age, sex, FUT; co-morbid disease Dx • 1st Phase model is survival model • 2nd Phase ARM

  15. Results • 37 distributional association rules were discovered • 11 are significant (Poisson test; Bonferroni-adjusted 5% level) • Interpretation: patients with HTN, OB, IHD and HL have an age- and FUT-adjusted relative risk (RR) of DM of 2.15 • Effect of age and FUT adjustment: the entire PreDM population has an 8.4% chance of DM • Without age and FUT adjustment, the above subpopulation has a risk of 61/339 = 17.9% • With age and FUT adjustment, 1 - (1 - 0.084)^2.15 = 17.2%
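The adjusted-risk arithmetic on this slide can be checked with a few lines of Python. The function name `adjusted_risk` is illustrative; the inputs are the baseline value 0.084 used in the slide's formula and the relative risk 2.15 of the rule.

```python
def adjusted_risk(baseline_risk, relative_risk):
    """Risk of a subpopulation given a baseline risk and a relative risk,
    using the slide's formula 1 - (1 - baseline)^RR."""
    return 1.0 - (1.0 - baseline_risk) ** relative_risk


baseline = 0.084  # baseline value used in the slide's formula
rr = 2.15         # adjusted relative risk for HTN, OB, IHD and HL

risk = adjusted_risk(baseline, rr)
print(f"adjusted risk: {100 * risk:.1f}%")  # prints "adjusted risk: 17.2%"

# Unadjusted risk in the same subpopulation: 61 DM cases among 339 patients.
unadjusted = 61 / 339
print(f"unadjusted risk: {unadjusted:.4f}")
```

A relative risk of 1 returns the baseline unchanged, which is a quick sanity check on the formula.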

  16. Results • Legend: condition(s), subpopulation size (relative risk) • IHD: 2366 (1.16) [p-value .11] • HTN, OB, IHD: 382 (2.08) • HTN, IHD, HL: 1210 (1.36) [p-value .015]

  17. Outline • Introduction • Modeling Diabetes Risk • Association Rule Mining • Results • Diabetes disease network re-construction • 4.5-yr DM risk prediction • Applicability to SHARP

  18. DM Progression Risk Prediction • Predicting the probability of progression to DM within 4.5 years • Predictors: age, sex, co-morbid Dx, laboratory results and medication orders • 1st Phase: spline logistic regression to adjust for age and sex • 2nd Phase: ARM • 3rd Phase: linear regression using glucose

  19. Machine Learned Indices • Comparison to machine learning methods • Gradient Boosted Trees (GBM): 10,000 trees • Linear Model (LM) • Random Forest (RF): 275-325 trees • Association Rule Mining (ARM): 100 rules • 10-fold CV repeated 50 times • Same predictive performance but more interpretable model [Figure: C-statistic by method]
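The C-statistic used for this comparison is the probability that a randomly chosen patient who progressed to DM receives a higher predicted risk than a randomly chosen patient who did not. A minimal pure-Python sketch follows; the scores and outcomes are hypothetical, and the function name is illustrative.

```python
def c_statistic(scores, outcomes):
    """Concordance over all (case, non-case) pairs; tied scores count as 0.5."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks and observed outcomes (1 = progressed to DM).
scores = [0.9, 0.4, 0.35, 0.6, 0.5, 0.1]
outcomes = [1, 1, 0, 1, 0, 0]

c = c_statistic(scores, outcomes)
print(round(c, 3))
```

In a repeated cross-validation setup such as the one on the slide, a function like this would be applied to the held-out fold of each split and the results averaged.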

  20. Traditional Indices • Performance similar to San Antonio (Refit) • ARM readily provides a justification as to why the risk is high • Proposed method places the patient on a path in the diabetes network

  21. Clinical Validation • Work in progress… • Apply the rules to both normo-glycemic and Pre-DM patients • Each point is a rule • Patterns similar for lower-risk subpopulations • For high-RR rules, risk of DM is higher for Pre-DM patients

  22. Outline • Introduction • Modeling Diabetes Risk • Association Rule Mining • Results • Interpretability • Predictive Performance • Applicability to SHARP

  23. High-Throughput Phenotyping (HTP) • We can use the association rules as an HTP algorithm • Discover the rules with ARM • Validate the rules with an expert clinician

  24. Acknowledgment

  25. References • Vemuri P, Simon G, Kantarci K, Whitwell J, Senjem M, Przybelski S, Gunter J, Josephs K, Knopman D, Boeve B, Ferman T, Dickson D, Parisi J, Petersen R, Jack C. Antemortem differential diagnosis of dementia pathology using structural MRI: Differential-STAND. NeuroImage, 2010. • Caraballo P, Li P, Simon G. Use of Association Rule-mining to Assess Diabetes Risk in Patients with Impaired Fasting Glucose. AMIA, 2011. • Simon G, Kumar V, Li P. A Simple statistical model and association rule filtering. In Proc. ACM International Conference on Data Mining and Knowledge Discovery (KDD), 2011. • Simon G, Li P, Jack C, Vemuri P. Understanding Atrophy Trajectories in Alzheimer’s Disease Using Association Rules on MRI images. In Proc. ACM International Conference on Data Mining and Knowledge Discovery (KDD), 2011.
