
Discovering Patterns in Adverse Drug Reactions

Discovering Patterns in Adverse Drug Reactions. Student: Ernst Joham Supervisor: Associate Prof Jiuyong Li Associate Supervisor Dr. Jan Stanek. Outline. Background Motivation Research questions Literature review Data Mining process Results Conclusion. Background.


Presentation Transcript


  1. Discovering Patterns in Adverse Drug Reactions • Student: Ernst Joham • Supervisor: Associate Prof Jiuyong Li • Associate Supervisor: Dr. Jan Stanek

  2. Outline • Background • Motivation • Research questions • Literature review • Data Mining process • Results • Conclusion

  3. Background • What is data mining? Data mining is used to discover unexpected, interesting and valuable information in datasets. • A high percentage of hospital admissions and prolonged hospitalisations are due to adverse drug reactions (ADRs). • What can cause ADRs? • The dosage given to the patient • More than one drug taken at the same time • Ingredients in drugs that can trigger an adverse reaction.

  4. Background • Problems with medical datasets • Medical data is more diverse and complex • Ethical and legal issues • Data quality • Missing values • Noise • Ownership • Lack of information

  5. Motivation • To discover patterns in medical datasets successfully • To find the algorithms best suited to handling noise and missing values in medical datasets • To improve the handling of the complexity and diversity of medical datasets

  6. Research Questions • The aim of the research was to use data mining methods in an attempt to produce relevant results from real-world medical data. • The following research questions were addressed: (1) Is it possible to discover patterns in sparse datasets? (2) What patterns can be identified through data mining for ADRs?

  7. Literature review (techniques) • Decision Tree, Logistic programs, K nearest neighbour and Bayesian classifier techniques have been applied to medical datasets (Lavrač 1999). • Lee et al. (2000) state that techniques that easily extract specific knowledge are the key to medical decision making. • A study on drug discovery showed that neural networks performed better than logistic regression, but decision trees performed better in identifying active compounds (Obenshain 2004).

  8. Literature review (process model) • Medical data mining applications that are expected to discover new knowledge should follow a five-stage process model (Wang 2000): • planning tasks • developing data mining hypotheses • preparing data • selecting data mining tools • evaluating data mining results. • Cios & Moore (2002) state that success requires following the DMKD process model, which adds several steps to the CRISP-DM model and has been applied to several medical problem domains.

  9. Literature review (problems with medical datasets) • Brown & Kros (2003) focused on the impact of missing data and how existing methods can help. They categorise methods for dealing with missing data into: • Use complete data only • Delete selected cases or variables • Data imputation • Model-based approaches • Some researchers have focused on data cleansing tools to help eliminate noise, but these can only achieve a reasonable result (Zhu & Wu 2004).
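
The first three categories on this slide can be illustrated with a minimal Python sketch. The records, field names and helper functions below are hypothetical and are not taken from the study's dataset or from Brown & Kros (2003). Mean imputation stands in for "data imputation" only as the simplest example; model-based approaches are not shown.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"agedays": 707, "route": "Oral"},
    {"agedays": None, "route": "Topical"},
    {"agedays": 4173, "route": None},
    {"agedays": 1200, "route": "Oral"},
]

def complete_cases(rows):
    """Use complete data only: keep rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def drop_variable(rows, name):
    """Delete a selected variable (column) from every row."""
    return [{k: v for k, v in r.items() if k != name} for r in rows]

def impute_mean(rows, name):
    """Data imputation: replace missing numeric values with the observed mean."""
    observed = [r[name] for r in rows if r[name] is not None]
    fill = mean(observed)
    return [{**r, name: fill if r[name] is None else r[name]} for r in rows]

print(len(complete_cases(records)), "complete rows out of", len(records))
```

Using complete data only discards half of these four rows, which shows why that strategy is costly on a sparse dataset.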

  10. Literature review • Attribute noise is more difficult to handle (Zhu & Wu 2004) and includes: • (1) Incorrect attribute values • (2) Missing or "don't know" attribute values • (3) Incomplete attributes or "don't care" values

  11. Data Mining Process • The project followed the six-step CRISP-DM data mining process • Understand the main aim of the project • Understand the dataset (sample records; "-" marks an empty cell)
      ADRDATE     Agedays  BRAND        DRUG           ID   Prob  ROUTE    Recov  Severity  URNO     ATC
      31/01/2007  -        Lyclear      Permethrin     707  Cert  Topical  Rec    Minor     unknown  P03AC04
      9/06/2003   14367    Tegretol CR  Carbamazepine  4    Cert  Oral     Rec    -         ax6cx8z  N03AF01
      11/06/2003  1 4173   Zoloft       Sertraline     5    Unc   Oral     -      -         ax66486  N06AB06

  12. Data Mining Process • Summary of missing values (total: 1286 records)

  13. Data Mining Process • Data in .csv format • R programming language • Rattle tool for data mining • Data preparation • Remove duplicates • Correct misspelled words • Correct meanings of values • Look up missing ATC (Anatomical Therapeutic Chemical) codes • Leave missing values in the rest of the dataset as they are

  14. Data Mining Process • Data transformation • Date when the patient was admitted to hospital for the ADR (October-March = 1, April-September = 0) • Patient age, grouped so that each category holds a roughly equal number of records (0-2 years old = 1, 2-5 years old = 2, 5-11 years old = 3, 11-16 years old = 4, above 16 years of age = 5) • Route of administration of the medication that caused the ADR, either oral or intravenous (Oral = 1, Intravenous = 0) • Whether the patient recovered from the ADR (Recovered = 0, Not recovered = 1) • Whether the drug given to the patient was an antibiotic (Antibiotic = 1, Not antibiotic = 0)
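
The coding scheme on this slide could be expressed roughly as follows. This is a Python sketch rather than the project's own R code, and it rests on assumptions that the slides do not confirm: that Agedays holds the age in days, that an age on a category boundary falls in the older group, that routes other than oral or intravenous are left uncoded, and that antibiotics are identified by the ATC group J01 (antibacterials for systemic use). All function names are hypothetical.

```python
from datetime import date
from typing import Optional

def encode_adrdate(adr_date: date) -> int:
    """October-March = 1, April-September = 0."""
    return 1 if adr_date.month >= 10 or adr_date.month <= 3 else 0

def encode_age(agedays: float) -> int:
    """Map age to groups 1-5 (0-2, 2-5, 5-11, 11-16, over 16 years)."""
    years = agedays / 365.25  # assumption: Agedays holds the age in days
    for group, upper in enumerate((2, 5, 11, 16), start=1):
        if years < upper:  # assumption: a boundary age goes to the older group
            return group
    return 5

def encode_route(route: str) -> Optional[int]:
    """Oral = 1, Intravenous = 0; any other route is left uncoded (None)."""
    return {"oral": 1, "intravenous": 0}.get(route.strip().lower())

def encode_recov(recovered: bool) -> int:
    """Recovered = 0, Not recovered = 1."""
    return 0 if recovered else 1

def encode_antibiotic(atc: str) -> int:
    """Antibiotic = 1, otherwise 0 (assumption: ATC group J01 marks antibiotics)."""
    return 1 if atc.strip().upper().startswith("J01") else 0

# Second sample record from the dataset slide: 9/06/2003, 14367 days, Oral, N03AF01.
print(encode_adrdate(date(2003, 6, 9)), encode_age(14367),
      encode_route("Oral"), encode_antibiotic("N03AF01"))
```

Returning None for an uncoded route keeps the missing information visible instead of silently forcing it into one of the two classes.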

  15. Data Mining Process • [Figure; panel labels: ADRDATE, AGE, RECOV, ATC, ROUTE]

  16. Data Mining Process • Modelling phase • Logistic regression • Decision tree • Risk pattern algorithm • Evaluation phase • Deployment

  17. Results • Results for the logistic regression technique
      Coefficients:
                    Estimate   Std. Error  z value  Pr(>|z|)
      (Intercept)  -1.901353   0.466304    -4.077   4.55e-05 ***
      ADRDATE       0.136312   0.285722     0.477   0.633
      AGEDAYS       0.002067   0.115482     0.018   0.986
      ROUTE         0.059532   0.290016     0.205   0.837
      ANTIBIOTICS  -0.181255   0.300150    -0.604   0.546
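
As a reading aid for the coefficient table, the following Python sketch recomputes each z value as the estimate divided by its standard error, derives the two-sided p-value from the standard normal distribution, and converts each estimate into an odds ratio. The numbers are copied from the slide; the function name `wald_summary` is hypothetical. None of the four predictors reaches significance at the 0.05 level.

```python
import math

# (estimate, standard error) pairs copied from the slide's coefficient table.
coefficients = {
    "(Intercept)": (-1.901353, 0.466304),
    "ADRDATE": (0.136312, 0.285722),
    "AGEDAYS": (0.002067, 0.115482),
    "ROUTE": (0.059532, 0.290016),
    "ANTIBIOTICS": (-0.181255, 0.300150),
}

def wald_summary(estimate, std_error):
    """Return (z value, two-sided p-value, odds ratio) for one coefficient."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return z, p, math.exp(estimate)

for name, (est, se) in coefficients.items():
    z, p, odds_ratio = wald_summary(est, se)
    print(f"{name:12s} z={z:7.3f}  p={p:.3g}  odds ratio={odds_ratio:.3f}")
```

An odds ratio close to 1, as for AGEDAYS, means the fitted odds of the outcome barely change with that variable.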

  18. Results • Decision Tree Result
      1) root 1035 473 1 (0.4570048 0.5429952)
        2) AGE>=3.5 407 140 0 (0.6560197 0.3439803)
          4) ADRDATE< 0.5 203 61 0 (0.6995074 0.3004926) *
          5) ADRDATE>=0.5 204 79 0 (0.6127451 0.3872549)
            10) AGE>=4.5 100 35 0 (0.6500000 0.3500000)
              20) ROUTE>=0.5 79 27 0 (0.6582278 0.3417722) *
              21) ROUTE< 0.5 21 8 0 (0.6190476 0.3809524)
                42) RECOV=Yes 18 6 0 (0.6666667 0.3333333) *
                43) RECOV=NO 3 1 1 (0.3333333 0.6666667) *

  19. Results • Decision Tree Result
            11) AGE< 4.5 104 44 0 (0.5769231 0.4230769)
              22) ROUTE< 0.5 77 30 0 (0.6103896 0.3896104) *
              23) ROUTE>=0.5 27 13 1 (0.4814815 0.5185185) *
        3) AGE< 3.5 628 206 1 (0.3280255 0.6719745)
          6) ROUTE< 0.5 236 109 1 (0.4618644 0.5381356)
            12) RECOV=NO 24 6 0 (0.7500000 0.2500000)
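
The decision tree output appears to follow the text layout printed by R's rpart package: node number, split condition, number of records (n), number of misclassified records (loss), predicted class, class probabilities in parentheses, and a trailing * for a leaf. Assuming that layout, the Python sketch below parses one node line and checks that loss / n equals one minus the larger class probability. The regular expression and the function name `parse_node` are illustrative only.

```python
import re

# node) split n loss yval (prob0 prob1) [*]
NODE = re.compile(
    r"(?P<node>\d+)\)\s+(?P<split>.+?)\s+(?P<n>\d+)\s+(?P<loss>\d+)\s+"
    r"(?P<yval>\S+)\s+\((?P<p0>[\d.]+)\s+(?P<p1>[\d.]+)\)\s*(?P<leaf>\*?)$"
)

def parse_node(line):
    """Parse one rpart-style node line into a dictionary."""
    match = NODE.match(line.strip())
    if match is None:
        raise ValueError(f"not a node line: {line!r}")
    n, loss = int(match["n"]), int(match["loss"])
    return {
        "node": int(match["node"]),
        "split": match["split"],
        "n": n,
        "loss": loss,
        "predicted": match["yval"],
        "probs": (float(match["p0"]), float(match["p1"])),
        "is_leaf": match["leaf"] == "*",
        "error_rate": loss / n,  # share of records not in the predicted class
    }

node = parse_node("4) ADRDATE< 0.5 203 61 0 (0.6995074 0.3004926) *")
print(node["split"], node["n"], round(node["error_rate"], 4), node["is_leaf"])
```

Read this way, node 4 covers 203 records, predicts class 0, and misclassifies 61 of them.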

  20. Results • Risk patterns for NO
      1  3  3.0324  2.4852  269 7   ADRDATE=1 AGEDAYS=3 ANTIBIOTICS=0
      2  2  3.1002  2.5582  624616  AGEDAYS=3 ANTIBIOTICS=0
      3  3  2.5663  2.1904  2596    ADRDATE=1 AGEDAYS=4 ROUTE=1
      4  3  2.5375  2.1757  34268   AGEDAYS=4 ROUTE=1 ANTIBIOTICS=0
      • Pattern 1, where Risk Ratio = 2.48 • Agedays = between 5 and 11 years old • Adrdate = months between October and March • Antibiotics = No
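
For readers unfamiliar with the risk ratio reported for Pattern 1, the sketch below shows the usual epidemiological definition: the proportion of records with the outcome among those matching a pattern, divided by the same proportion among those not matching it. The counts are hypothetical and are not taken from the study; whether the risk pattern algorithm used here computes the ratio in exactly this way is an assumption.

```python
def relative_risk(matching_cases, matching_total, other_cases, other_total):
    """Risk of the outcome in records matching a pattern, relative to the rest."""
    if matching_total <= 0 or other_total <= 0 or other_cases <= 0:
        raise ValueError("totals and comparison cases must be positive")
    risk_matching = matching_cases / matching_total
    risk_other = other_cases / other_total
    return risk_matching / risk_other

# Hypothetical counts: 30 of 100 matching records have the outcome,
# compared with 10 of 100 records that do not match the pattern.
print(round(relative_risk(30, 100, 10, 100), 2))
```

A risk ratio of 2.48 would therefore mean that records matching the pattern show the outcome about two and a half times as often as the remaining records.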

  21. Conclusion • Build a data mining process to answer the problem posed. • Use algorithms that work for medical applications. • Noise and missing values do pose a problem, but reasonable results can still be achieved. • More relevant patterns can be produced for medical experts if as much information as possible is included in the dataset.

  22. References • Brown, ML & Kros, JF 2003, 'Data mining and the impact of missing data', Industrial Management & Data Systems, vol. 103, pp. 611-621. • Cios, KJ & Moore, GW 2002, 'Uniqueness of medical data mining', Artificial Intelligence in Medicine, vol. 26, no. 1-2, pp. 1-24. • CRISP_DM 2000, Cross Industry Standard Process for Data Mining, viewed 27 August 2008, <http://www.crisp-dm.org/Partners/index.htm>. • Li, J, Fu, AW-c, He, H, Chen, J, Jin, H, McAullay, D, Williams, G, Sparks, R & Kelman, C 2005, Mining risk patterns in medical data, ACM, Chicago, Illinois, USA. • Lavrač, N 1999, 'Selected techniques for data mining in medicine', Artificial Intelligence in Medicine, vol. 16, no. 1, pp. 3-23. • Lee, I-N, Liao, S-C & Embrechts, M 2000, 'Data mining techniques applied to medical information', Medical Informatics & the Internet in Medicine, vol. 25, no. 2, pp. 81-102. • Obenshain, MK 2004, 'Application of Data Mining Techniques to Healthcare Data', Infection Control and Hospital Epidemiology, vol. 25, no. 8, pp. 690-695. • Safety of Medicines 2002, A Guide to Detecting and Reporting Adverse Drug Reactions: Why Health Professionals Need to Take Action, WHO publications, viewed 15 April 2008, <http://whqlibdoc.who.int/hq/2002/WHO_EDM_QSM_2002.2.pdf>. • Wang, H & Wang, S 2008, 'Medical knowledge acquisition through data mining', paper presented at the IEEE International Symposium on IT in Medicine and Education (ITME 2008), Xiamen. • Zhu, X, Khoshgoftaar, T, Davidson, I & Zhang, S 2007, 'Editorial: Special issue on mining low-quality data', Knowledge and Information Systems, vol. 11, no. 2, pp. 131-136.
