Computational Intelligence in Biomedical and Health Care Informatics HCA 590 (Topics in Health Sciences). Rohit Kate. Data Mining : Sample Medical Applications. Reading.
Computational Intelligence in Biomedical and Health Care Informatics • HCA 590 (Topics in Health Sciences) • Rohit Kate • Data Mining: Sample Medical Applications
Reading • Dan A. Simovici. Data Mining of Medical Data: Opportunities and Challenges in Mining Association Rules. International Academy of Life Sciences conference, Cecilienhof, Potsdam, August 2012.
Data Mining Applications in Medicine • Numerous and well established • Evaluating treatment effectiveness • Health care management • Analyzing relationships between patients and providers of care • Pharmacovigilance • Fraud and abuse detection • Limitations • Limited accessibility of medical data • Technical challenges: distributed data (clinical, administrative) • Legal and social challenges: privacy concerns, data ownership • Incomplete or noisy data
Adverse Drug Reactions • A serious problem • 5% of hospital admissions • 28% of emergency department visits • 5% of hospital deaths • Loss of several billion dollars each year • Why are they not detected earlier? • Although drugs are thoroughly tested before they are introduced to the market, it is not possible to predict: • Long-term effects • Effects in every type of patient • Effects in every combination with other treatments (e.g. every possible drug-drug interaction)
Adverse Drug Reactions Data • Monitored internationally in multiple sites • Uppsala Monitoring Center in Sweden • Unit of World Health Organization (WHO) • Center mines data from case safety reports • Vigibase: Case safety reporting database • Data from 1978, access allowed for a fee • Food and Drug Administration (FDA) • FDA Adverse Event Reporting System (FAERS) database • Formerly AERS • Access is free • http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/default.htm • Various pharma entities maintain proprietary databases • Must record adverse drug reactions by US law
Association Rule Mining for Adverse Drug Reactions • Given a database of adverse drug reactions, the goal is to mine useful patterns from it • Which drugs and drug combinations usually lead to which single or multiple side effects? • Association rules are a good format for such patterns • Simplest association rule: Vioxx → heart attack
Association Rule Mining for Adverse Drug Reactions • Harpaz et al. [2010] studied the mining of association rules for adverse drug reactions, including undesirable drug interactions • Based on 162,744 reports of suspected adverse drug reactions from the AERS database • Database items are naturally partitioned into two classes, drugs and symptoms, and association rules have the form X → Y, where X is a set of drugs and Y is a set of symptoms • This partition makes the mining algorithm more efficient
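The drug/symptom partition can be illustrated with a minimal Python sketch. It assumes a hypothetical toy list of reports and uses brute-force enumeration rather than Apriori; the function name `mine_rules` and its parameters are illustrative and not taken from the study. Antecedents are drawn only from drugs and consequents only from symptoms, so far fewer candidate rules are generated than when every item may appear on either side.

```python
from itertools import combinations

# Hypothetical toy reports: each is (set of drugs, set of symptoms).
REPORTS = [
    ({"atorvastatin", "lisinopril"}, {"DYSPNOEA"}),
    ({"atorvastatin", "lisinopril"}, {"DYSPNOEA", "NAUSEA"}),
    ({"atorvastatin"}, {"NAUSEA"}),
    ({"metformin", "metoprolol"}, {"NAUSEA"}),
]


def mine_rules(reports, min_support=2, max_drugs=2, max_symptoms=1):
    """Return {(drugs, symptoms): (support, confidence)} for rules X -> Y.

    X is a set of drugs and Y a set of symptoms. This is brute-force
    counting for illustration, not the Apriori algorithm.
    """
    drug_counts = {}  # drug set -> number of reports containing it
    rule_counts = {}  # (drug set, symptom set) -> reports containing both
    for drugs, symptoms in reports:
        drug_sets = [frozenset(c) for k in range(1, max_drugs + 1)
                     for c in combinations(sorted(drugs), k)]
        symptom_sets = [frozenset(c) for k in range(1, max_symptoms + 1)
                        for c in combinations(sorted(symptoms), k)]
        for x in drug_sets:
            drug_counts[x] = drug_counts.get(x, 0) + 1
            for y in symptom_sets:
                rule_counts[(x, y)] = rule_counts.get((x, y), 0) + 1
    # Keep rules that reach the minimum support count.
    return {
        (x, y): (support, support / drug_counts[x])
        for (x, y), support in rule_counts.items()
        if support >= min_support
    }


RULES = mine_rules(REPORTS)
```

On this toy data, the rule {atorvastatin, lisinopril} → {DYSPNOEA} has support 2 and confidence 1.0, while {metformin, metoprolol} → {NAUSEA} is dropped because it appears in only one report.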
Association Rule Mining for Adverse Drug Reactions • The Apriori algorithm was applied along with the “relative reporting ratio” interestingness measure • 1167 association rules were automatically mined • Sample mined association rules: • metformin, metoprolol → NAUSEA 50 7.4 • cyclophosphamide, prednisone, vincristine → FEBRILE NEUTROPENIA 78 45 • cyclophosphamide, doxorubicin, prednisone, rituximab → FEBRILE NEUTROPENIA 63 59 • atorvastatin, lisinopril → DYSPNOEA 55 3.5 • omeprazole, simvastatin → DYSPNOEA 58 12 • varenicline, darvocet → ABNORMAL DREAMS, FATIGUE, INSOMNIA, MEMORY IMPAIRMENT, NAUSEA 52 2668 • Association rules known: 67% • Association rules unknown: 33%
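A relative reporting ratio is commonly defined as the observed number of reports containing both the drugs and the symptoms, divided by the number expected if the two were reported independently. The sketch below assumes that definition and a hypothetical toy report list; the exact variant used in the study may differ.

```python
# Hypothetical toy reports: each is (set of drugs, set of symptoms).
TOY_REPORTS = [
    ({"atorvastatin", "lisinopril"}, {"DYSPNOEA"}),
    ({"atorvastatin", "lisinopril"}, {"DYSPNOEA", "NAUSEA"}),
    ({"atorvastatin"}, {"NAUSEA"}),
    ({"metformin", "metoprolol"}, {"NAUSEA"}),
]


def relative_reporting_ratio(reports, drugs, symptoms):
    """Observed co-occurrence count divided by the count expected under
    independence: N * n(X, Y) / (n(X) * n(Y))."""
    n = len(reports)
    n_x = sum(1 for d, _ in reports if drugs <= d)      # reports with all drugs
    n_y = sum(1 for _, s in reports if symptoms <= s)   # reports with all symptoms
    n_xy = sum(1 for d, s in reports if drugs <= d and symptoms <= s)
    if n_x == 0 or n_y == 0:
        return 0.0  # ratio is undefined; 0.0 is a convention chosen here
    return n * n_xy / (n_x * n_y)


RR_EXAMPLE = relative_reporting_ratio(
    TOY_REPORTS, {"atorvastatin", "lisinopril"}, {"DYSPNOEA"}
)
```

A ratio well above 1 means the combination is reported more often than independence would predict; on the toy data the ratio is 4 · 2 / (2 · 2) = 2.0.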
Association Rule Mining for Adverse Drug Reactions Drug Combinations in Association Rules: • Drug-drug interactions found that are known: 4% • Drug-drug combinations known to be given together or treat same indication: 78% • Drug-drug combinations that seem to be due to confounding: 9% • Drug-drug interactions that are unknown: 9%
Drug-Resistant Bacteria • Some bacteria develop drug resistance, making infection control difficult • Brossette & Hymel [2008] and Brossette et al. [1998] applied data mining to infection control • They studied Pseudomonas aeruginosa, a bacterium notorious for drug resistance • Common cause of infections in humans • Transmission is caused by medical equipment
Association Rule Mining for Infection Control • The data collection includes records for single Pseudomonas aeruginosa isolates with the attributes: • date reported • source of isolate (sputum, blood) • location of patient in the hospital • patient’s home zip code • resistant (R), intermediate resistance (I), or susceptible (S) for piperacillin, ticarcillin/clavulanate, ceftazidime, imipenem, amikacin, gentamicin, tobramycin, ciprofloxacin
Association Rule Mining for Infection Control • The system was designed to detect patterns of increasing resistance to antimicrobials • Data is partitioned into time slices to determine the change in resistance of the bacterium • The variation of the confidence of a rule X → Y across time slices is computed • A substantial increase in confidence is deemed to constitute an event
Association Rule Mining for Infection Control • The following time slices were created: • 12 one-month fragments to find short-lived patterns • 4 three-month fragments to find medium-duration patterns • 2 six-month fragments to find long-lived patterns • The Apriori algorithm was applied with minimum support 2 for items and 10 for association rules
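The time-slice idea can be sketched in Python as follows. The isolate records, the item labels, and the 10-percentage-point `min_increase` threshold are hypothetical; the slides do not state what counts as a "substantial" increase.

```python
from collections import defaultdict

# Hypothetical isolates: (month 1..12, set of items observed for the isolate).
ISOLATES = [
    (7, {"R-piperacillin", "sputum"}),
    (8, {"R-piperacillin"}),
    (8, {"R-piperacillin"}),
    (9, {"R-piperacillin"}),
    (10, {"R-piperacillin", "sputum"}),
    (11, {"R-piperacillin", "sputum"}),
    (12, {"R-piperacillin", "sputum"}),
    (12, {"R-piperacillin"}),
]


def confidence_by_slice(isolates, antecedent, consequent, months_per_slice):
    """Confidence of antecedent -> consequent within each time slice.

    Slices are numbered from 0; with months_per_slice=3, slice 2 is Q3.
    An empty antecedent matches every isolate.
    """
    matched = defaultdict(int)  # isolates containing the antecedent
    both = defaultdict(int)     # isolates containing antecedent and consequent
    for month, items in isolates:
        slice_index = (month - 1) // months_per_slice
        if antecedent <= items:
            matched[slice_index] += 1
            if consequent <= items:
                both[slice_index] += 1
    return {s: both[s] / matched[s] for s in sorted(matched)}


def find_events(confidences, min_increase=0.10):
    """Pairs of successive slices (with data) whose confidence rose by at
    least min_increase. The threshold is an assumption for illustration."""
    slices = sorted(confidences)
    return [(a, b) for a, b in zip(slices, slices[1:])
            if confidences[b] - confidences[a] >= min_increase]


QUARTERLY = confidence_by_slice(ISOLATES, {"R-piperacillin"}, {"sputum"}, 3)
EVENTS = find_events(QUARTERLY)
```

Running the same function with `months_per_slice` set to 1, 3, and 6 gives the monthly, quarterly, and half-year views described above. On the toy data the confidence rises from 0.25 in Q3 to 0.75 in Q4, which is flagged as an event.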
Association Rule Mining for Infection Control • Sample association rules found: • ∅ → R-ticarcillin/clavulanate, R-ceftazidime, R-piperacillin • a jump from 4% (Oct) to 8% (Nov) to 11% (Dec) in the share of isolates resistant to ticarcillin/clavulanate, ceftazidime, and piperacillin • R-ceftazidime, R-piperacillin → sputum, R-ticarcillin/clavulanate • an increase from 8% (Feb) to 32% (Aug) in the probability that the isolate is from sputum and is ticarcillin/clavulanate resistant, given that it is resistant to ceftazidime and piperacillin • R-piperacillin → sputum, R-ticarcillin/clavulanate, R-ceftazidime • an increase from 6% (Q3) to 26% (Q4) in the probability that the isolate is from sputum and is ticarcillin/clavulanate and ceftazidime resistant, given that it is piperacillin resistant • R-ticarcillin/clavulanate → sputum, R-ceftazidime, R-piperacillin • an increase from 7% (Q3) to 24% (Q4) in the probability that the isolate is from sputum and is ceftazidime and piperacillin resistant, given that it is ticarcillin/clavulanate resistant • R-ticarcillin/clavulanate, R-ceftazidime, R-piperacillin → sputum • an increase from 12% (Q3) to 42% (Q4) in the probability that the isolate is from sputum, given that it is resistant to ticarcillin/clavulanate, ceftazidime, and piperacillin
Transitivity of Association Rules • For medical data mining, it is desirable for association rules to have the transitivity property: X → Y and Y → Z should also imply X → Z • For consistency • For analyzing the rules • Popular data mining methods, such as the Apriori algorithm, do not ensure transitivity of the association rules they mine
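A small counterexample shows why a confidence threshold alone does not give transitivity. The transactions and the 0.8 threshold below are hypothetical and chosen only to make the point.

```python
# Hypothetical transactions over three abstract items X, Y, Z.
TRANSACTIONS = [
    {"X", "Y"},
    {"Y", "Z"},
    {"Y", "Z"},
    {"Y", "Z"},
    {"Y", "Z"},
]

MIN_CONFIDENCE = 0.8  # assumed threshold for accepting a rule


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    matched = [t for t in transactions if antecedent <= t]
    if not matched:
        return 0.0
    return sum(1 for t in matched if consequent <= t) / len(matched)


CONF_XY = confidence(TRANSACTIONS, {"X"}, {"Y"})  # 1/1
CONF_YZ = confidence(TRANSACTIONS, {"Y"}, {"Z"})  # 4/5
CONF_XZ = confidence(TRANSACTIONS, {"X"}, {"Z"})  # 0/1
```

Here X → Y and Y → Z both reach the threshold (confidence 1.0 and 0.8), yet X → Z has confidence 0.0, so a miner that filters rules by confidence would report the first two rules without the third.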
Transitivity of Association Rules • Existing methods have been modified to ensure transitivity of association rules: • Investigate X → Z if X → Y and Y → Z have a medical interpretation [Mukhopadhyay et al. 2004]; TransMiner software • Starting from X → Z, seek candidates X → Y and Y → Z [Wright et al. 2010]
Data Mining in Medicine • Data mining cannot replace the human factor in medical research, but it can greatly aid it • Interaction between data mining and medical research benefits both domains • Biology and medicine suggest novel problems for data mining and machine learning • Open problems: • Mining from unstructured data (natural language text: progress reports, outpatient notes, etc.) • Evaluation of association rules