
MUDIM (Petr Šimeček, Euromise)

MUDIM (Petr Šimeček, Euromise): a system for multidimensional compositional models (Radim Jiroušek). C++ code, distributed as an R package, focused on medical applications. Contents: the idea of conditional independence and (de)composition; possible applications of MUDIM (expert system, data mining).


Presentation Transcript


  1. MUDIM (Petr Šimeček, Euromise): system for multidimensional compositional models (Radim Jiroušek); C++ code, distributed as an R package; focused on medical applications

  2. Contents: • idea of conditional independence and (de)composition • possible applications of MUDIM • expert system • data mining • STULONG dataset

  3. CI (conditional independence) – Theory of Storks. Diagram nodes: BIRTH RATE, STORK POPULATION

  4. CI – Theory of Storks. Do storks deliver newborns? BIRTH RATE and STORK POPULATION are statistically connected.

  5. CI – Theory of Storks. No! Diagram nodes: ENVIRONMENT, BIRTH RATE, STORK POPULATION

  6. CI – Theory of Storks. ENVIRONMENT is connected to BIRTH RATE and connected to STORK POPULATION.
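
The stork slides describe a common cause: ENVIRONMENT influences both BIRTH RATE and STORK POPULATION, so the two are correlated overall but stop being related once ENVIRONMENT is taken into account. A minimal simulation sketch of that idea follows; the linear relations, noise levels and sample size are assumptions made up for illustration and do not come from the presentation.

```r
# Simulated common-cause example (all numbers are hypothetical).
set.seed(1)
n <- 10000
env        <- rnorm(n)          # ENVIRONMENT, the common cause
birth_rate <- env + rnorm(n)    # BIRTH RATE depends only on env
storks     <- env + rnorm(n)    # STORK POPULATION depends only on env

# Marginal correlation: clearly non-zero (0.5 in theory for this setup).
marginal <- cor(birth_rate, storks)

# Partial correlation: remove the linear effect of env from both
# variables and correlate what is left. It should be close to zero.
partial <- cor(resid(lm(birth_rate ~ env)),
               resid(lm(storks ~ env)))

print(c(marginal = marginal, partial = partial))
```

The partial correlation only captures linear dependence, which is enough here because the simulated relations are linear.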

  7. CI – Weather. Diagram nodes: WEATHER YESTERDAY, WEATHER TODAY, WEATHER TOMORROW

  8. CI – Weather. Diagram nodes: WEATHER YESTERDAY, WEATHER TODAY, WEATHER TOMORROW
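
Read as a chain, the weather slides presumably express the Markov property: once today's weather is known, yesterday's weather adds nothing for predicting tomorrow's. In symbols (the notation $W_{t-1}$, $W_t$, $W_{t+1}$ for yesterday, today and tomorrow is introduced here and is not taken from the slides):

```latex
P(W_{t+1} \mid W_t, W_{t-1}) = P(W_{t+1} \mid W_t)
\quad\Longleftrightarrow\quad
P(W_{t-1}, W_t, W_{t+1}) = P(W_{t-1})\, P(W_t \mid W_{t-1})\, P(W_{t+1} \mid W_t)
```

The right-hand form shows the (de)composition idea: the three-dimensional distribution is assembled from two-dimensional pieces.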

  9. CI – Sample Medical Data. [node] = variable (attribute), e.g. AGE, BLOOD PRESSURE, …

  10. CI – Sample Medical Data. [node] = variable (attribute), e.g. AGE, BLOOD PRESSURE, …; [edge] = (unconditional) statistical connection (correlation) between the pair of variables

  11. CI – Storks & Weather. Diagram nodes: ENVIRONMENT, BIRTH RATE, STORK POPULATION; YESTERDAY, TODAY, TOMORROW

  12. CI – Storks & Weather. Diagram nodes: ENVIRONMENT, BIRTH RATE, STORK POPULATION; YESTERDAY, TODAY, TOMORROW

  13. CI – Sample Medical Data. [node] = variable (attribute), e.g. AGE, BLOOD PRESSURE, …; [arrow] = causality between the pair of variables

  14. Locality – illustration. Diagram groups: variable X; directly explanatory variables for X; other variables. If we know the values of the directly explanatory variables for X, then knowledge of the other variables is useless for predicting X.
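
The locality statement can be checked on a small composed distribution. The sketch below builds a joint distribution of three binary variables from low-dimensional pieces and confirms that, once the directly explanatory variable is fixed, the remaining variable does not change the distribution of X. The variable names O, D, X and all probabilities are hypothetical, and the code uses plain base R arrays, not the MUDIM package interface.

```r
# Hypothetical binary variables:
#   O = "other" variable, D = directly explanatory variable, X = target.
p_o   <- c(0.3, 0.7)                        # P(O)
p_d_o <- matrix(c(0.9, 0.1,                 # P(D | O = 1)
                  0.2, 0.8),                # P(D | O = 2)
                nrow = 2, byrow = TRUE)     # rows: O, columns: D
p_x_d <- matrix(c(0.6, 0.4,                 # P(X | D = 1)
                  0.1, 0.9),                # P(X | D = 2)
                nrow = 2, byrow = TRUE)     # rows: D, columns: X

# Compose the joint distribution P(O, D, X) = P(O) * P(D | O) * P(X | D).
joint <- array(0, dim = c(2, 2, 2))
for (o in 1:2) for (d in 1:2) for (x in 1:2) {
  joint[o, d, x] <- p_o[o] * p_d_o[o, d] * p_x_d[d, x]
}

# Recover P(X | O, D) from the joint by normalising over X.
p_od         <- apply(joint, c(1, 2), sum)          # P(O, D)
p_x_given_od <- sweep(joint, c(1, 2), p_od, "/")    # P(X | O, D)

# For a fixed D, the conditional distribution of X is the same for every O.
print(p_x_given_od[1, 1, ])   # O = 1, D = 1
print(p_x_given_od[2, 1, ])   # O = 2, D = 1
```

Both printed rows equal the first row of `p_x_d`, which is the sense in which knowing O is useless for predicting X once D is known.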

  15. Applications – Expert Systems Causality

  16. Applications – Expert Systems Causality

  17. Applications – Expert Systems Causality

  18. Applications – Expert Systems Causality

  19. Applications – Expert Systems Causality

  20. Idea of Compositional Models

  21. Applications – Expert Systems. What is the distribution of [variable] if we know [variable]? Causality

  22. Data Mining. We don't know "anything": there are lots of variables and lots of possible relations between them. We need to formulate possible hypotheses, suggest some promising models, etc. (useful in pre-research).

  23. Data Mining Variables Data

  24. Direction of Causality Problem. [diagram] is equivalent to [diagram]; [diagrams] are equivalent, but they are not equivalent to [diagram]

  25. STULONG Dataset = dataset containing research data on cardiovascular disease (1976–79) • 1417 patients (Czech middle-aged men) • 244 attributes surveyed for each patient at the entry examination • 37 selected attributes are described here

  26. (Incomplete) List of Attributes • MYOCARDIAL INFARCTION • HYPERTENSION • ICTUS • HYPERLIPIDEMIA • CHEST PAIN • ASTHMA • HEIGHT & WEIGHT • BLOOD PRESSURE • … • AGE • MARITAL STATUS • EDUCATION • OCCUPATION • PHYSICAL ACTIVITY • TRANSPORT TO JOB • SMOKING • ALCOHOL • TEA AND COFFEE

  27. Graph of Correlated Pairs: 464 of 666 possible pairs are statistically connected (p = 0.05)

  28. Graph of Correlated Pairs 2: 160 of 666 possible pairs are statistically connected (p = 0.05/666)
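
The numbers on the two "Graph of Correlated Pairs" slides can be retraced with a few lines of arithmetic: 37 selected attributes give 666 unordered pairs, and dividing the 0.05 level by the number of tests is the Bonferroni correction (the slides do not name it; that label is added here). The variable names are illustrative.

```r
# Number of unordered pairs among the 37 selected attributes.
n_attributes <- 37
n_pairs <- choose(n_attributes, 2)

# Per-test significance level without and with Bonferroni correction.
alpha <- 0.05
alpha_bonferroni <- alpha / n_pairs

# If no pair were truly connected, the uncorrected level would still
# flag about alpha * n_pairs pairs by chance.
expected_false_positives <- alpha * n_pairs

print(c(pairs = n_pairs,
        alpha_bonferroni = alpha_bonferroni,
        expected_false_positives = expected_false_positives))
```

The corrected threshold is far stricter, which is consistent with the count dropping from 464 to 160 connected pairs.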

  29. 56 arrows

  30. Risk Factors for Hypertension

      > summary(glm(HT ~ HYPLIP + IM + AGE + SUBSC, data = C, family = "binomial"))

      Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
      (Intercept) -4.322730   1.274252  -3.392 0.000693 ***
      IM           1.246937   0.513342   2.429 0.015138 *
      HYPLIP       1.126383   0.333971   3.373 0.000744 ***
      SUBSC        0.009521   0.003978   2.393 0.016699 *
      AGE          0.245182   0.136678   1.794 0.072835 .
      ---
      Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

  31. Risk Factors for Hypertension. Interpretation: • HYPERLIPIDEMIA and IM each roughly triple the odds of hypertension • Each three years of AGE roughly double the odds • There is also a small but demonstrable connection to the skinfold above the musculus subscapularis (SUBSC)
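
The interpretation follows from exponentiating the logistic-regression coefficients reported on slide 30, since each coefficient is a change in log-odds. A short sketch, with the estimates copied by hand from that output; treating AGE as measured in years is an assumption based on the "three years" wording of the slide.

```r
# Coefficient estimates copied from the glm summary on slide 30.
coefs <- c(IM = 1.246937, HYPLIP = 1.126383,
           SUBSC = 0.009521, AGE = 0.245182)

# Odds ratio for a one-unit increase in each predictor.
odds_ratios <- exp(coefs)

# Odds ratio for a three-unit increase in AGE (assumed to be three years).
age_three_years <- exp(3 * coefs[["AGE"]])

print(round(odds_ratios, 2))
print(round(age_three_years, 2))
```

The odds ratios for IM and HYPLIP come out near 3.5 and 3.1, and three units of AGE multiply the odds by about 2.1, which matches "triple" and "double" on the slide.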
