Introduction to HST 951 Medical Decision Support

Introduction to HST 951 Medical Decision Support. Lucila Ohno-Machado, MD, PhD machado@dsg.harvard.edu Division of Health Sciences and Technology Harvard Medical School Massachusetts Institute of Technology. Welcome. Objectives Provide a practical approach to medical decision support


Presentation Transcript


  1. Introduction to HST 951 Medical Decision Support. Lucila Ohno-Machado, MD, PhD (machado@dsg.harvard.edu), Division of Health Sciences and Technology, Harvard Medical School, Massachusetts Institute of Technology

  2. Welcome. Objectives: • Provide a practical approach to medical decision support • Put a strong emphasis on computer-based applications that utilize concepts from the fields of artificial intelligence and statistics • Focus on principled predictive modeling in biomedicine. Audience: • Background in quantitative methods is desirable • Undergraduates • Graduate students and post-doctoral fellows (MDs) in medical informatics

  3. Decision Support Cycle [Figure: cycle diagram with the stages Goals, Data Pre-Processing, Model Selection, Model Construction, and System Evaluation]

  4. Types of Models What type of support is needed? • “Exploratory analysis” • “Confirmatory analysis” (gold-standard) • Clustering • Classification

  5. Models: Logistic Regression, Neural Networks, CART, Rough Sets [Figure: a logistic regression diagram and a neural network diagram, each taking the inputs (independent variables) Age = 34, Gender, and Mitoses = 4 and producing the prediction "Probability of cancer" = 0.6; the logistic regression panel shows $p = \frac{1}{1 + e^{-(\Sigma + \mathrm{cte})}}$, where $\Sigma$ is the sum of the inputs weighted by their coefficients]

  6. Requirements, Strengths and Weaknesses, Application Examples • Naïve Bayes • Bayesian Networks • Logistic Regression • Neural Networks • Classification Trees • Rough Set Models • Support Vector Machines • Clustering (Hierarchical and Partitioning)

  7. Evaluation and Comparisons Classification • Calibration (plots, goodness-of-fit) • Discrimination (ROC areas) • Explanation (variable selection) • Outliers, influential observations (case selection) Clustering • Distance metrics • Homogeneity • Inter-cluster distance

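  8. Threshold [Figure: overlapping distributions of normal ("nl") and diseased ("D") cases along an axis marked 1.0, 1.7, 3.0, with the TN, FN, FP, and TP regions on either side of the threshold]
     • Test "nl": 40 nl, 10 D (total 50)
     • Test "D": 10 nl, 40 D (total 50)
     • Column totals: 50 nl, 50 D
     • Sensitivity = 40/50 = .8
     • Specificity = 40/50 = .8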

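  9. ROC curve [Figure: Sensitivity plotted against 1 - Specificity, both from 0 to 1, with one point per threshold]
     • Threshold 1.4: test "nl": 40 nl, 0 D (total 40); test "D": 10 nl, 50 D (total 60); column totals 50 nl, 50 D
     • Threshold 1.7: test "nl": 40 nl, 10 D (total 50); test "D": 10 nl, 40 D (total 50); column totals 50 nl, 50 D
     • Threshold 2.0: test "nl": 50 nl, 20 D (total 70); test "D": 0 nl, 30 D (total 30); column totals 50 nl, 50 D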
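The three threshold tables on this slide can be turned into ROC points directly. The following is a minimal sketch, assuming the counts are read from the slide as (TP, FN, FP, TN) per threshold; the function name `sens_spec` and the dictionary layout are illustrative choices, not part of the lecture.

```python
# Counts per threshold as read from the slide: (TP, FN, FP, TN).
# Each table has 50 diseased ("D") and 50 normal ("nl") cases.
tables = {
    1.4: (50, 0, 10, 40),
    1.7: (40, 10, 10, 40),
    2.0: (30, 20, 0, 50),
}

def sens_spec(tp, fn, fp, tn):
    """Return (sensitivity, specificity) for one 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)

# One ROC point (1 - specificity, sensitivity) per threshold.
roc_points = {}
for threshold, counts in sorted(tables.items()):
    sens, spec = sens_spec(*counts)
    roc_points[threshold] = (1 - spec, sens)
    print(f"threshold {threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the threshold trades sensitivity for specificity, which is what moves the point along the ROC curve.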

  10. ROC Curves [Figure: ROC curves for the LR, NN, and RS models; axes labeled 1-Specificity and Sensitivity, each running from 0 to 1]
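The area under an ROC curve equals the fraction of (diseased, normal) pairs in which the diseased case receives the higher score. A short sketch of that pairwise computation follows; the scores are hypothetical and the function name `roc_area` is ours.

```python
def roc_area(scores_pos, scores_neg):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs that are ranked correctly; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model outputs, for illustration only.
diseased = [0.9, 0.8, 0.6, 0.4]
normal = [0.7, 0.3, 0.2, 0.1]
auc = roc_area(diseased, normal)
print(auc)
```

An area of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.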

  11. Calibration Curves [Figure: axes labeled "Sum of system's estimates" and "Sum of real outcomes", each running from 0 to 1; one region is labeled "overestimation"]
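One common way to obtain the two quantities compared in a calibration plot is to sort cases by estimated probability, split them into groups, and sum the estimates and the observed outcomes within each group. The sketch below assumes that grouping scheme; the data and the function name `calibration_table` are hypothetical.

```python
def calibration_table(estimates, outcomes, n_groups):
    """Sort cases by estimate, split them into n_groups groups, and return
    (sum of estimates, sum of real outcomes) for each group."""
    pairs = sorted(zip(estimates, outcomes))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = len(pairs) if g == n_groups - 1 else start + size
        chunk = pairs[start:end]
        rows.append((sum(e for e, _ in chunk), sum(o for _, o in chunk)))
    return rows

# Hypothetical estimates and 0/1 outcomes, for illustration only.
estimates = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
rows = calibration_table(estimates, outcomes, 2)
for est_sum, obs_sum in rows:
    print(f"sum of estimates={est_sum:.1f}, sum of outcomes={obs_sum}")
```

A group whose sum of estimates exceeds its sum of outcomes is one where the system overestimates risk.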

  12. Important Topics • Decision Analysis • Cost-effectiveness analysis • Design of Experiments • Real-World Applications • Blocking inferences: quantifying anonymity

  13. Examples of Projects

  14. Students have worked in the past in different domains • Diagnosis of: Coronary Artery Disease, Breast Cancer, Melanoma • Prognosis in: Interventional Cardiology, Spinal Cord Injury, AIDS, Pregnancy

  15. Data Mining and Predictive Modeling in (Bio) Medical Databases

  16. We emphasize comparison of different models [Figure: Area under ROC (axis from 0.75 to 0.91) and balance (axis from 0.05 to 0.45) by year (1 to 6), for the Logistic and Neural Net models]

  17. Modeling the Risk of Major In-Hospital Complications Following Percutaneous Coronary Interventions Frederic S. Resnic, Lucila Ohno-Machado, Gavin J. Blake, Jimmy Pavliska, Andrew Selwyn, Jeffrey J. Popma ACC, 2000

  18. Methods • Consecutive BWH patients, 1/97 through 2/99, randomly divided into training (n = 1,877) and test (n = 927) sets • Outcomes: death and combined death, CABG or MI (MACE) • Validation using independent dataset: 3/99 - 12/99 (n = 1,460)

  19. Dataset: Attributes (Data Source: Medical Record, Clinician Derived)
     • History: age, gender, diabetes, iddm, history CABG, baseline creatinine, CRI, ESRD, hyperlipidemia
     • Presentation: acute MI, primary, rescue, CHF class, angina class, cardiogenic shock, failed CABG
     • Angiographic: occluded, lesion type (A, B1, B2, C), graft lesion, vessel treated, ostial
     • Procedural: number lesions, multivessel, number stents, stent types (8), closure device, gp 2b3a antagonists, dissection post, rotablator, atherectomy, angiojet, max pre stenosis, max post stenosis, no reflow
     • Operator/Lab: annual volume, device experience, daily volume, lab device experience, unscheduled case

  20. Study Population: Development Set (1/97-2/99) vs. Validation Set (3/99-12/99)
     • Cases: 2,804 vs. 1,460
     • Women: 909 (32.4%) vs. 433 (29.7%), p=.066
     • Age > 74yrs: 595 (21.2%) vs. 308 (22.5%), p=.340
     • Acute MI: 250 (8.9%) vs. 144 (9.9%), p=.311
     • Primary: 156 (5.6%) vs. 95 (6.5%), p=.214
     • Shock: 62 (2.2%) vs. 20 (1.4%), p=.058
     • Class 3/4 CHF: 176 (6.3%) vs. 80 (5.5%), p=.298
     • gp IIb/IIIa antagonist: 1,005 (35.8%) vs. 777 (53.2%), p<.001
     • Death: 67 (2.4%) vs. 24 (1.6%), p=.110
     • Death, MI, CABG (MACE): 177 (6.3%) vs. 96 (6.6%), p=.739

  21. Logistic Regression. These models are based on statistics and can only discover linear relationships among the data (the model is linear in the log-odds). [Figure: the inputs (independent variables) Age = 34, Gender = 1, and Mitoses = 4 are weighted by the coefficients .5, .4, and .8 and summed ($\Sigma$) to give the output prediction "Probability of cancer" = 0.6, with $p = \frac{1}{1 + e^{-(\Sigma + \mathrm{cte})}}$]
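The logistic formula on this slide is a weighted sum of the inputs plus a constant, passed through a sigmoid. A minimal sketch follows; the coefficient and constant values are hypothetical, chosen only so that the output lands near the 0.6 shown in the figure, and are not the fitted values of any model from the lecture.

```python
import math

def logistic_probability(inputs, coefficients, constant):
    """p = 1 / (1 + exp(-(sum_i coefficient_i * input_i + constant)))."""
    weighted_sum = sum(c * x for c, x in zip(coefficients, inputs))
    return 1.0 / (1.0 + math.exp(-(weighted_sum + constant)))

# Hypothetical, illustrative values (assumptions, not the slide's model).
inputs = [34, 1, 4]              # age, gender, mitoses
coefficients = [0.05, 0.4, 0.8]
constant = -4.9
p = logistic_probability(inputs, coefficients, constant)
print(f"probability = {p:.3f}")
```

Because each input enters only through its own coefficient, the log-odds change linearly with each variable, which is the sense in which the model is limited to linear relationships.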

  22. Complications in Coronary Intervention [Figure: the inputs age, IDDM, CHF class, type, number, and procedure are summed ($\Sigma$) to give the output "Probability of complication" = 0.6]

  23. Logistic and Score Models for Death. Logistic Regression Model (odds ratio, p-value):
     • Age > 74yrs: 2.51, p=0.02
     • B2/C Lesion: 2.12, p=0.05
     • Acute MI: 2.06, p=0.13
     • Class 3/4 CHF: 8.41, p=0.00
     • Left main PCI: 5.93, p=0.03
     • IIb/IIIa Use: 0.57, p=0.20
     • Stent Use: 0.53, p=0.12
     • Cardiogenic Shock: 7.53, p=0.00
     • Unstable Angina: 1.70, p=0.17
     • Tachycardic: 2.78, p=0.04
     • Chronic Renal Insuf.: 2.58, p=0.06

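  24. Logistic and Score Models for Death. Logistic Regression Model (odds ratio, p-value, beta coefficient) and Prognostic Risk Score Model (risk value):
     • Age > 74yrs: odds ratio 2.51, p=0.02, beta 0.921, risk value 2
     • B2/C Lesion: odds ratio 2.12, p=0.05, beta 0.752, risk value 1
     • Acute MI: odds ratio 2.06, p=0.13, beta 0.724, risk value 1
     • Class 3/4 CHF: odds ratio 8.41, p=0.00, beta 2.129, risk value 4
     • Left main PCI: odds ratio 5.93, p=0.03, beta 1.779, risk value 3
     • IIb/IIIa Use: odds ratio 0.57, p=0.20, beta -0.554, risk value -1
     • Stent Use: odds ratio 0.53, p=0.12, beta -0.626, risk value -1
     • Cardiogenic Shock: odds ratio 7.53, p=0.00, beta 2.019, risk value 4
     • Unstable Angina: odds ratio 1.70, p=0.17, beta 0.531, risk value 1
     • Tachycardic: odds ratio 2.78, p=0.04, beta 1.022, risk value 2
     • Chronic Renal Insuf.: odds ratio 2.58, p=0.06, beta 0.948, risk value 2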
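In a logistic model each odds ratio is the exponential of its beta coefficient, and an additive score model sums the risk values of the factors a patient has. The sketch below checks the first relation against the slide's numbers and totals a score for a hypothetical patient. It assumes the score is a plain sum of risk values, which the slide does not state explicitly; the patient and the function name `risk_score` are ours.

```python
import math

# (beta coefficient, risk value, odds ratio) per factor, as read from the slide.
death_model = {
    "Age > 74yrs": (0.921, 2, 2.51),
    "B2/C Lesion": (0.752, 1, 2.12),
    "Acute MI": (0.724, 1, 2.06),
    "Class 3/4 CHF": (2.129, 4, 8.41),
    "Left main PCI": (1.779, 3, 5.93),
    "IIb/IIIa Use": (-0.554, -1, 0.57),
    "Stent Use": (-0.626, -1, 0.53),
    "Cardiogenic Shock": (2.019, 4, 7.53),
    "Unstable Angina": (0.531, 1, 1.70),
    "Tachycardic": (1.022, 2, 2.78),
    "Chronic Renal Insuf.": (0.948, 2, 2.58),
}

# Largest gap between exp(beta) and the reported odds ratio.
max_gap = max(abs(math.exp(beta) - odds_ratio)
              for beta, _, odds_ratio in death_model.values())

def risk_score(present_factors):
    """Sum the risk values of the factors present (assumed additive)."""
    return sum(death_model[name][1] for name in present_factors)

# Hypothetical patient, for illustration only.
score = risk_score(["Age > 74yrs", "Cardiogenic Shock", "Stent Use"])
print(f"max |exp(beta) - OR| = {max_gap:.3f}, score = {score}")
```

Factors with negative beta coefficients (odds ratio below 1) lower the score, matching their protective association in the logistic model.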

  25. Neural Networks. These are mathematical models that can discover non-linear relationships among the data. [Figure: the inputs (independent variables) Age = 34, Gender = 2, and Mitoses = 4 connect through weights to a hidden layer of summing units ($\Sigma$), which connects through further weights to the output (dependent variable), the prediction "Probability of Cancer" = 0.6]
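The diagram on this slide corresponds to a forward pass through one hidden layer. A minimal sketch follows, assuming sigmoid units and no bias terms; the weights and the rescaled inputs are hypothetical, not the values drawn on the slide.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, output_weights):
    """One hidden layer: every unit applies a sigmoid to a weighted sum.
    Bias terms are omitted to keep the sketch short."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical inputs (rescaled age, gender, mitoses) and weights.
inputs = [0.34, 1.0, 0.4]
hidden_weights = [[0.6, 0.1, 0.2],
                  [0.4, 0.2, 0.3]]
output_weights = [0.8, -0.5]
prediction = forward(inputs, hidden_weights, output_weights)
print(f"prediction = {prediction:.3f}")
```

The hidden layer is what allows non-linear relationships: the output is a sigmoid of sigmoids rather than of a single weighted sum.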

  26. Neural networks for predicting death and complications [Figure: network with the inputs age, IDDM, CHF class, type, number, and procedure, and the outputs disease free, death, and other complications]

  27. Death Models. Validation Set: 1,460 Cases. ROC Area: LR 0.840, Score 0.855, aNN 0.835 [Figure: ROC curves for the three models, with a reference line at ROC = 0.50]

  28. Risk Score of Death: BWH Experience. Unadjusted Overall Mortality Rate = 2.1% [Figure: Number of Cases and Mortality Risk by risk score; mortality risk labels in the figure: 62%, 26%, 7.6%, 1.4%, 2.9%, 0.4%, 1.6%, 1.3%]

  29. CART (Classification and Regression Trees). These are models that partition the data using one variable at a time and can model non-linear relationships among the data
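Partitioning on one variable at a time can be illustrated with the search for a single best split. The sketch below scores candidate splits by Gini impurity, which is one common criterion and an assumption here, on hypothetical data; a full tree would apply the same search recursively to each side of the split.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (feature index, threshold, weighted impurity) for the best
    split of the form row[feature] < threshold, or None if no split exists."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

# Hypothetical cases with two variables each, and 0/1 outcomes.
rows = [(1, 5), (2, 3), (3, 8), (4, 1)]
labels = [0, 0, 1, 1]
split = best_split(rows, labels)
print(split)
```

Here the first variable separates the outcomes perfectly at a threshold of 3, so the weighted impurity of that split is zero.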

  30. Diagnosis of Melanoma (Michael Binder, Greg Sharp et al., 1999)

  31. Dermatoscopy

  32. Dermatoscopy [Figure: classification tree that splits on asymmetry, color, border, and detail, using thresholds such as < 2 and > 10, with leaves labeled "malig" and "benign"]

  33. Performance using ABCD rule

  34. Rough Sets. These are mathematical models that derive rules for grouping cases based on Boolean logic

  35. Multiple subsamples of a large table are created and combined for rule extraction. Rules: If [(number>2) and …] then Complication = true

  36. Comparison of Practical Prediction Models for Ambulation Following Spinal Cord Injury (Rowland et al., 1998)

  37. Study Population: Spinal Cord Injury Model Systems of Care Database • Admitted to one of 24 federally funded designated regional SCI care systems • 17,861 patients who sustained a spinal cord injury between 1973 and 1997 • 1,755 patients had data for LEMS scores, 1993 to 1997 • 1,138 had complete data for variables of interest

  38. SCI Mortality NN Design: Input & Output • Admission Info (9 items): system days, injury days, age, gender, racial/ethnic group, level of neurologic fxn, ASIA impairment index, UEMS, LEMS • Ambulation (1 item): yes, no

  39. Results: ROC Curve Area

  40. Results: ROC Curves

  41. Other methods: Support Vector Machines, multiple variations of the nearest neighbor algorithm, etc.

  42. Heart Attack Alert Program (Wang et al., 2001)

  43. Cox’s Models for Prediction [Figure: x-axis labeled time (years)]

  44. Genetic Algorithms. Search mechanism: • Used for variable selection (model construction) • Case selection (regression diagnostics) • Multidisorder diagnosis

  45. People • Brigham and Women’s Hospital • Children’s Hospital • EECS MIT • School of Public Health • Partners Information Systems

  46. Administrivia. Grading based on: • 30% homeworks (almost every week)/participation • 30% midterm, open notes • 40% project (no final exam). Lectures on the WWW for reference. Handouts with Prof. Szolovits’ assistant at NE-43 r416

  47. Questions/Suggestions • machado@dsg.harvard.edu • isaac_kohane@harvard.edu • psz@mit.edu
