
Marvin Ploetz Philip Docena Ojaswi Pandey Aakash Mohpal 23 April, 2019

MACHINE LEARNING FOR IMPROVED RISK STRATIFICATION OF NCD PATIENTS IN ESTONIA: Big Data and Machine Learning in Health Care







  1. MACHINE LEARNING FOR IMPROVED RISK STRATIFICATION OF NCD PATIENTS IN ESTONIA Big Data and Machine Learning in Health Care Marvin Ploetz Philip Docena Ojaswi Pandey Aakash Mohpal 23 April, 2019

  2. Objectives • Propose an alternative, machine-learning-based approach to patient risk stratification for ECM in Estonia • Illustrate the use and applicability of machine learning to other areas of work relevant to EHIF

  3. Overview Big Data and Machine Learning in Health Care Machine Learning Basics Context of ECM Research Question Data Overview & Sample Construction Feature Engineering Evaluation & Modelling Choices Results Conclusions

  4. Big Data and Machine Learning in Health Care

  5. Big Data and Machine Learning in Health Care Take advantage of massive amounts of data and provide the right intervention to the right patient at the right time Personalized care to the patient Potentially benefit all agents in the health care system: patient, provider, payer, management

  6. Uses of Machine Learning in Health Care [Diagram: personalized medicine delivers the right intervention to the right patient at the right time, benefiting patients, providers, and payers]

  7. Example 1: Hip and knee replacement in the US • Osteoarthritis: a common and painful chronic condition • Often requires replacement of hips and knees • More than 500,000 Medicare beneficiaries receive replacements each year • Medical costs: roughly $15,000 per surgery • Medical benefits: accrue over time, since the months immediately after surgery are painful and spent in disability • Therefore, a joint replacement only makes sense if the patient lives long enough to enjoy it; if they die soon after, it can be futile and painful • Prediction/classification problem: can we predict which surgeries will be futile using only data available at the time of surgery?

  8. Example 1: Hip and knee replacement in the US • Sample: 20% of 7.4 million Medicare beneficiaries; 98,090 had a hip or knee replacement in 2010 • 3,305 independent variables • Train data: 65,395 observations; test data: 32,695 observations • 1.4% died within one month of surgery; 4.2% died within 1-12 months • Model to predict the riskiest patients • Traditional analysis is about averages; big data and ML analytics predict individual risks

  9. Example 1: Hip and knee replacement in the US The first column sorts the test sample by risk percentile. Among the riskiest 5 percent of patients, the observed mortality rate within 1-12 months of surgery was 43.5%. Reallocating these surgeries to patients at the median risk level (50th percentile) would have averted 1,984 futile procedures and reallocated $30m to other beneficiaries.

  10. Example 2: Diagnoses of pediatric conditions • Apply natural language processing algorithms to extract data from EHRs • Extracted 101.6m data points from 1.3m EHRs of pediatric patients • High diagnostic accuracy across multiple organ systems, comparable to the performance of experienced pediatric physicians

  11. Example 2: Diagnoses of pediatric conditions

  12. Example 3: Breast cancer screening • The most common form of cancer, afflicting 2.5 million patients worldwide in 2015 • Need to distinguish malignant tumors from benign ones; early detection is key • Data: 62,219 mammography findings from the Wisconsin State Cancer Reporting System • A neural-network-based algorithm classifies the tumors as well as radiologists do

  13. Machine Learning Basics

  14. Definition of Big Data • Collection of large and complex data sets which are difficult to process using common database management tools or traditional data processing applications • Not only about size: finding insights from complex, noisy, heterogeneous, and longitudinal data sets • This includes capturing, storing, searching, sharing and analyzing

  15. Types of Machine Learning Problems • Supervised – Making predictions using labeled/structured data • Classification: use data to predict which category something falls into • Examples: whether an image contains a store front or not; whether a patient is high risk or not • Regression: use data to make predictions on a continuous scale • Examples: predict the stock price of a company; given historical data, what will the temperature be tomorrow • Unsupervised – Detecting patterns from unstructured data • Problems where we have little or no idea what the results should look like • Provide algorithms with data and ask them to look for hidden features and cluster the data in a way that makes sense • Examples: identifying patterns in genomics data, separating voice from noise in audio files

  16. Machine Learning Implementation Collect data -> Feature engineering/data construction -> Standardize and clean data -> Split data into train (80%) and test (20%) -> Build model using train data -> Validate model results using test data
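The split step in the pipeline above can be sketched in a few lines; this is a minimal illustration (not the authors' code), assuming the data fits in memory and rows are independent:

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the data, then hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # Return (train, test): the last 80% and first 20% of the shuffled rows.
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(range(100))  # 80 train rows, 20 test rows
```

Fixing the seed makes the split reproducible, so the model is always validated on the same held-out rows.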

  17. Assessing Model Performance: Precision and Recall Accuracy = (TP+TN)/All Precision = TP/(TP+FP) Recall = TP/(TP+FN)

  18. Assessing Model Performance: Precision and Recall Case I: High recall, low precision • Accuracy = 150/165 = 78% • Precision = 100/145 = 69% • Recall = 100/105 = 95% Case II: Low recall, high precision • Accuracy = 190/230 = 83% • Precision = 90/95 = 95% • Recall = 90/125 = 72%
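The arithmetic on this slide can be verified in a few lines. The confusion-matrix counts below (TP=90, FP=5, FN=35, TN=100) are reconstructed from the slide's ratios for the high-precision case:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total      # (TP+TN)/All
    precision = tp / (tp + fp)        # TP/(TP+FP)
    recall = tp / (tp + fn)           # TP/(TP+FN)
    return accuracy, precision, recall

# High-precision case: accuracy = 190/230, precision = 90/95, recall = 90/125
acc, prec, rec = confusion_metrics(tp=90, fp=5, fn=35, tn=100)
```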

  19. Assessing Model Performance: ROC Curve Plot the true and false positive rate for every classification threshold A perfect model has a curve that passes through the upper left corner (AUC = 1) The diagonal (red line) represents random guessing (AUC = 0.5)
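A minimal sketch of how the curve is traced, assuming a list of predicted scores and true 0/1 labels: each distinct score is used once as the classification threshold, producing one (false positive rate, true positive rate) point.

```python
def roc_points(scores, labels):
    """One (FPR, TPR) point per distinct classification threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    # Sweep thresholds from strictest (highest score) to loosest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# A perfectly separating classifier passes through the upper-left corner (0, 1):
pts = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```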

  20. Decision Tree: Playing Golf • A non-parametric supervised learning method used for classification and regression • Built in the form of a tree structure • Breaks data down into smaller and smaller subsets while incrementally building the tree • The final result is a tree with decision nodes and leaf nodes

  21. Decision Tree: Playing Golf Outlook = Rainy -> No Golf; Outlook = Overcast -> Golf; Outlook = Sunny -> Windy? True -> No Golf, False -> Play Golf
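A tree this small can be transcribed directly as code. This is one plausible reading of the slide's diagram, not the authors' implementation:

```python
def play_golf(outlook, windy):
    """Tiny decision tree: Rainy -> no golf; Overcast -> golf;
    Sunny -> depends on wind."""
    if outlook == "Rainy":
        return False          # leaf node: No Golf
    if outlook == "Overcast":
        return True           # leaf node: Golf
    # Sunny branch: the Windy decision node decides.
    return not windy
```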

  22. Decision tree to Random Forest • A collection of decision trees whose results are aggregated into one final output • Each tree uses a different sub-sample of the data and a different subset of features • Aggregation helps reduce overfitting and variance relative to a single deep tree
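A toy sketch of the two ideas on this slide, bootstrap sub-sampling and vote aggregation; the "trees" here are stand-in callables, not real fitted trees:

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Each tree trains on a sample of the data drawn with replacement."""
    return [rng.choice(rows) for _ in rows]

def forest_predict(trees, x):
    """The forest's final output is the majority vote over its trees."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three stand-in trees; two vote "Golf", so the forest predicts "Golf".
trees = [lambda x: "Golf", lambda x: "No Golf", lambda x: "Golf"]
prediction = forest_predict(trees, x=None)
```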

  23. Context of ECM

  24. A Big Challenge of the Estonian Healthcare System • Changes in the demand for health care due to population ageing and the rise of non-communicable diseases • Chronic conditions are the driving force behind the need for better care integration • Low coverage of preventive services and a considerable share of avoidable specialist and hospital care • Opportunity to improve management of specific patient groups at the PHC level -> care management for empaneled patients • Predicting for which patients breaches in care coordination will occur -> risk stratification of patients

  25. Risk Stratification Until Now 1. DM/hypertension/hyperlipidemia? No -> Not eligible 2. Min. and max. number/combination of CVD/respiratory/mental health/functional impairment conditions? No -> Not eligible 3. Dominant/complex condition (cancer, schizophrenia, rare disease etc.)? Yes -> Not eligible 4. Review by GPs (behavioral & social factors, information not in the data) -> ECM candidate or not eligible • No actual prediction analysis done • Involvement of providers to gain trust/understanding • Behavioral and social criteria are key, but sparsely available -> use insider knowledge of doctors
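The clinical screen above is a sequence of hard rules with a final human check. A hedged sketch of that logic (the predicate names are hypothetical, not from the actual algorithm):

```python
def ecm_candidate(has_target_condition, meets_comorbidity_rule,
                  has_dominant_condition, gp_approves):
    """Rule-based ECM screen as read from the slide's flowchart."""
    # Step 1: must have DM/hypertension/hyperlipidemia.
    if not has_target_condition:
        return False
    # Step 2: must meet the min/max comorbidity combination rule.
    if not meets_comorbidity_rule:
        return False
    # Step 3: a dominant/complex condition (cancer, schizophrenia, ...) excludes.
    if has_dominant_condition:
        return False
    # Step 4: final GP review using behavioral/social knowledge not in the data.
    return gp_approves
```

Note there is no prediction anywhere in this pipeline; every step is a deterministic rule until the GP's judgment at the end, which is the gap the ML approach aims to fill.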

  26. Enhanced Care Management So Far In Estonia • Successful enhanced care management pilot with 15 GPs and < 1,000 patients to assess feasibility and acceptability • Commitment of the Estonian Health Insurance Fund (EHIF) to scale up the care management pilot • Model for risk stratification: clinical algorithm + provider intuition • Need for a better risk-stratification approach!?

  27. Research Question

  28. The Prediction Problem • Target patients: who benefits from care management? A combination of disease, social and behavioral factors… • Objective of ECM: ultimately improve health outcomes for patients with cardiovascular, respiratory, and mental disease • What is the right proxy prediction variable in the data? There is no single relevant adverse event (e.g. death, hospital admission, health complication, high healthcare spending) • After some discussion on how to choose the dependent variable: unplanned hospital admissions have a large negative impact on patients' lives, are costly and relatively frequent. Some are also avoidable…

  29. Many Patients Repeatedly Have Hospitalizations 22 percent of patients need to be hospitalized again in the following year…

  30. Hospitalizations account for the bulk of healthcare costs

  31. Predicting Hospital Admissions • Hospital admissions are the main (avoidable) adverse health event • But predicting hospitalizations is a hard problem • Social factors matter a lot; patients may have many contacts with the healthcare system, or none at all… • Tradeoff in choosing which hospitalizations to predict • Admissions due to specific conditions vs. hospitalizations in general

  32. Predicting Hospital Admissions Key question Not “What is the best algorithm for predicting hospital admissions?” But “How can we obtain the most useful prediction of hospital admissions for a specific purpose?”

  33. Data Overview & Sample Construction

  34. Administrative Claims Data (in Estonia) • Very reliable • High-quality data available since 2007/2008 • Comprehensive coding requirements for providers • Reporting lag of data is on average 2 weeks • No info on clinical outcomes (i.e. test results) • Limited information on social conditions and behavioral characteristics • Need for a lot of feature engineering to create "meaningful" variables at the patient level

  35. Description of Available Data

  36. Patient Cohort Selection for the ML Analysis

  37. Characteristics of Patients in the ML sample vs. Total Population • Relative to the population, the ML sample is older and more likely to be female.

  38. Characteristics of Patients in the ML sample vs. Total Population

  39. Most Common Chronic Conditions The ML sample population is also sicker on average (i.e. the prevalence of chronic conditions is higher)

  40. Characteristics of Patients in the ML sample vs. Total Population

  41. Feature Selection & Engineering

  42. Feature Selection & Engineering Series of attempts with interim features to extract better performance… Final set: 141 features

  43. Features Used…

  44. Features Used…

  45. Features Used…

  46. Getting to Know the Data: Diagnosis and Admissions • Single diagnoses and pairs of diagnoses: AFib (atrial fibrillation and flutter), CHF (congestive heart failure), HTN (hypertension), and ischemic HTD (ischemic heart disease) are strong indicators of potential admissions in the following year (2017) • Patient groups with these conditions have a non-trivial (~10%) likelihood of hospital admission • This likelihood increases to ~20-30% with one 2016 hospital admission and to >50% with 3 or more admissions in 2016

  47. Evaluation & Modelling Choices

  48. ML Models Selected for Evaluation • Selection criteria: • Algorithms are readily available in easy-to-use, comprehensive and well-tested open-source Python libraries (scikit-learn) • Algorithms and results are relatively easy to describe/explain (common algorithms) • For interpretability and model familiarity, no attempt at exploring more complex models; no deep networks • Included in the comparison: • Decision Tree • Random Forest and Extremely Randomized Trees (ExtraTrees) • k-Nearest Neighbors* • Gaussian Naïve-Bayes** • Logistic Regression (L1, L2) • SVM (RBF, polynomial)*** • Multi-layer Perceptrons (1 hidden layer) • AdaBoost (Decision Tree and Random Forest) • Gradient Boosted Trees (scikit-learn GBT, not XGBoost) • Calibrated (isotonic) variations of the above classifiers • Neural Networks • Eventually excluded: *kNN for execution time and memory requirements, **NB for weak performance, and ***SVMs for very slow training (but considered for the final paper)

  49. Evaluation metrics • Variable to be predicted: yes/no hospital admission in 2017 • Use data from 2011-2016 • We deal with an unbalanced sample (i.e. only 7.5% of patients had an admission in 2017) • Appropriate metrics of model performance on an unbalanced dataset: precision, recall, ROC curve and area under the curve (AUC) • Problem-specific custom metric to penalize one type of error more heavily: cost of a false positive (cost of ECM) vs. cost of a missed positive (cost of a subsequent hospitalization) • Different ML models have different strengths, but differences should not be huge
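The custom metric described above reduces to a weighted sum of the two error types. A sketch, with placeholder cost values that are not figures from the analysis:

```python
def expected_cost(fp, fn, cost_fp, cost_fn):
    """Asymmetric error cost: false positives are priced as the cost of
    enrolling a patient in ECM unnecessarily, false negatives as the cost
    of a missed (subsequent) hospitalization."""
    return fp * cost_fp + fn * cost_fn

# Placeholder costs: ECM enrollment priced at 100, a hospitalization at 1,000.
# With these weights, 5 false positives + 2 false negatives cost 2,500.
cost = expected_cost(fp=5, fn=2, cost_fp=100.0, cost_fn=1000.0)
```

Because a missed hospitalization is typically far more expensive than an unnecessary ECM enrollment, a metric like this pushes model selection toward higher recall.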

  50. Intuitive Interpretation of Metrics Precision is the probability that a patient classified as a patient with a hospital admission by an algorithm is actually going to have a hospital admission. Recall is the probability that a patient who is going to have a hospital admission is being classified as such by an algorithm. Which one is more important? It depends a lot on the application. There is a tradeoff between maximizing either of them…
