Machine Learning Group
group leader: Prof. Olga Štěpánková
members: Dr. Jiří Kléma, Dr. Filip Železný, Lenka Nováková, Michal Jakob, Pavel Novák
http://gerstner.felk.cvut.cz/machine-learning/
The Gerstner laboratory for intelligent decision making and control, Czech Technical University
Workshop on Intelligent and Adaptive Systems in Medicine, Mar 31-Apr 1, 2003
Introduction
• Research: centers on Machine Learning and its applications in Data Mining; we teach the principles of both fields in several courses.
• Theory: basic ML principles, such as Instance-Based Learning, Inductive Logic Programming and various probabilistic/randomization techniques.
• Applications: several real-life projects, namely in the medical domain (heart-surgery mortality prediction), in industry (intelligent fault diagnosis), and in telecommunications (tracing patterns in callers' behaviour).
• Development: ML/DM systems for both practical and experimental purposes.
Research Streams
• Probabilistic Reasoning in Relational Learning - learning hypotheses in first-order logic (a field known as Inductive Logic Programming) and Bayesian inference
• Data Preprocessing for Machine Learning and Data Mining - adapting a data preprocessing tool for Inductive Logic Programming and other ML tools
• Learning in Multi-Agent Systems - the ability to improve the future performance of the whole multi-agent system, a part of it, or a single agent
• Instance-Based Learning - automated optimization of IBL predictive and classification systems
Projects
• Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise, Sol-Eu-Net (IST-1999-11495), January 2000 - March 2003, http://soleunet.ijs.si (O. Štěpánková)
• Data-Mining and Decision Support Integration, CTU 0209013, 2002 (J. Kléma)
• KDnet (team member: CTU)
• European Network on Intelligent Technologies for Smart Adaptive Systems, Eunite (IST-2000-29207), 2001-2003 (team member: CTU)
• Inductive Logic Programming Network of Excellence, ILPNet2 (INCO Network of Excellence 977 102) (team member: CTU)
Selected Publications (1)
• O. Štěpánková, J. Kléma, P. Mikšovský: Applying Sumatra TT and RAMSYS: Prediction of Resources for a Health Farm. To appear in Data Mining and Decision Support: Integration and Collaboration. To be published by Kluwer in 2003.
• O. Štěpánková, P. Aubrecht, Z. Kouba, P. Mikšovský: Preprocessing for Data Mining and Decision Support. To appear in Data Mining and Decision Support: Integration and Collaboration. To be published by Kluwer in 2003.
• J. Kléma, F. Železný, O. Štěpánková: Strojové učení a dobývání znalostí z dat [Machine Learning and Knowledge Discovery from Data], chapter in the book Artificial Intelligence (4), in Czech. To be published by Academia Publishers in 2003.
• F. Železný: Two probabilistic approaches to first-order theory induction (PhD thesis, 2003).
• J. Kléma: Prototype Applications of Instance-Based Reasoning (PhD thesis, 2002).
Selected Publications (2)
• J. Kléma, J. Kubalík, J. Palouš: Optimized Model Tuning in Medical Systems. In: Proceedings - Computer-Based Medical Systems. New York: IEEE Computer Society Press, 2002, vol. 1.
• F. Železný, O. Štěpánková: Efektivní převod multirelační database na jednorelační reprezentaci [Efficient Conversion of a Multi-Relational Database into a Single-Relational Representation], in Czech. In: Proceedings Znalosti 2003. Ostrava: VŠB-TUO, 2003, vol. 1.
• O. Štěpánková, J. Kléma, Š. Lauryn, P. Mikšovský, L. Nováková: Data Mining for Resource Allocation: A Case Study. In: Knowledge and Technology Integration in Production and Services. New York: Kluwer Academic / Plenum Publishers, 2002.
• F. Železný: Learning Functions from Imperfect Positive Data. In: Inductive Logic Programming. Berlin: Springer, 2001, vol. 1, p. 248-259. ISBN 3-540-42538-1.
Research Partners
• Rockwell Automation, USA - pump fault diagnostics
• Grundfos, Denmark - intelligent pump diagnostics
• TeleDataElectronics, Germany - prediction of gas consumption
• CertiCon, CZ - OPS and SPS predictive tools, medical diagnostics
• IKEM Prague, CZ - heart-surgery mortality prediction
• Atlantis Telecom, CZ - data mining in telephony
• University of Maribor, System Design Laboratory, Slovenia - decision support in medical systems
Developed Systems
• iBARET (Instance-BAsed REasoning Tool) - a universal tool for modelling and predicting in domains described by a vector of numeric or symbolic values
• PreDO (PREcisely Defined Objects) - a system that generates experimental data for training and testing of ML algorithms
• CIDeT (Clustering and Induction of DEcision Trees) - a system for unsupervised learning
• RSD - First-Order Feature Construction and Relational Subgroup Discovery
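To illustrate what instance-based prediction over a vector of numeric or symbolic values can look like, here is a minimal k-nearest-neighbour sketch. It is not iBARET's implementation, which the slides do not describe. The function names, the range-normalised distance, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def mixed_distance(a, b, ranges):
    """Distance over mixed attributes (an assumed, generic choice).

    Numeric attributes contribute |x - y| divided by the attribute range;
    symbolic attributes contribute 0 when equal and 1 otherwise.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:  # symbolic attribute
            total += 0.0 if x == y else 1.0
        else:  # numeric attribute
            total += abs(x - y) / r if r > 0 else 0.0
    return total


def knn_classify(train, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest cases.

    `train` is a list of (vector, label) pairs.
    """
    ranges = []
    for i in range(len(query)):
        column = [vec[i] for vec, _ in train]
        if all(isinstance(v, (int, float)) for v in column):
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)  # treat the attribute as symbolic
    neighbours = sorted(
        train, key=lambda item: mixed_distance(item[0], query, ranges)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy cases: (numeric value, symbolic value) -> label
CASES = [
    ((1.0, "a"), "low"),
    ((1.2, "a"), "low"),
    ((5.0, "b"), "high"),
    ((5.5, "b"), "high"),
    ((4.8, "a"), "high"),
]

if __name__ == "__main__":
    print(knn_classify(CASES, (1.1, "a")))
    print(knn_classify(CASES, (5.2, "b")))
```

Dividing each numeric difference by the attribute range keeps numeric and symbolic attributes on a comparable 0 to 1 scale, so that no single attribute dominates the vote.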
ML and KDD applications (1)
• Resource allocation at a spa
• Input: relational data (patients ~ 20,000, procedures ~ 40, procedure prescriptions ~ 1,500,000, forbidden procedure combinations ~ x10)
• Goals:
  • project start: exploratory analysis to find interesting patterns or regularities that can help improve resource allocation and control of the facilities
  • after analysis: predict in advance the overall number of prescriptions of specific health procedures within a specific time period; identify previously unknown groups of clients exhibiting characteristic behaviour or requirements for procedures
• Algorithms and tools:
  • preprocessing is a demanding task -> SumatraTT
  • "regression per partes" used for prediction
  • collaborative task solved by several remote teams
ML and KDD applications (1)
• Results:
  • accurate and timely prediction (88% accuracy based on a vague client description)
  • understandable knowledge gained during prediction (groups of clients)
• Practical utilization:
  • application/modification for other similar facilities
  • incorporated into the IS developed by Lauryn v.o.s.
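The slides name "regression per partes" but do not define it. One plausible reading, assumed here, is regression fitted separately on each part of the data, for example one model per client group. The sketch below fits an ordinary least-squares line per part. The function names and the toy numbers are hypothetical and are not taken from the spa data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all x equal: fall back to a constant model
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_part(samples):
    """Fit one line per part.

    `samples` is a list of (part, x, y) triples; the result maps each
    part to its (slope, intercept).
    """
    grouped = {}
    for part, x, y in samples:
        xs, ys = grouped.setdefault(part, ([], []))
        xs.append(x)
        ys.append(y)
    return {part: fit_line(xs, ys) for part, (xs, ys) in grouped.items()}


def predict(models, part, x):
    """Predict y for `x` using the model fitted for `part`."""
    slope, intercept = models[part]
    return slope * x + intercept


# Hypothetical toy data: (client group, week, number of prescriptions)
SAMPLES = [
    ("A", 1, 10), ("A", 2, 12), ("A", 3, 14),
    ("B", 1, 5), ("B", 2, 4), ("B", 3, 3),
]

if __name__ == "__main__":
    models = fit_per_part(SAMPLES)
    print(predict(models, "A", 4), predict(models, "B", 4))
```

Fitting each part separately lets groups with opposite trends keep their own slope instead of being averaged into a single global line.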
ML and KDD applications (2)
• Data mining in telecommunications
• Task
  • Analyze the logging file of an enterprise branch telephone exchange
  • Create descriptions of recognized events
  • Discover frequent patterns in events
  • Visualize data
• Solution
  • Learn event descriptions from generated event examples
  • Decompose structured logging data into multiple relations
  • Apply descriptive and predictive multi-relational machine learning algorithms, such as Inductive Logic Programming, as well as visualization techniques
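As a rough illustration of two of the steps above, the sketch below splits flat log records into two relations and then counts event pairs that frequently follow one another within a call. The record layout, event names, and function names are hypothetical. The pair counting is a deliberately simple stand-in, not Inductive Logic Programming and not the group's actual method.

```python
from collections import Counter

# Hypothetical flat log records: (call_id, timestamp, extension, event)
LOG = [
    (1, 0, "101", "ring"), (1, 1, "101", "answer"), (1, 2, "101", "hangup"),
    (2, 0, "102", "ring"), (2, 1, "102", "busy"),
    (3, 0, "101", "ring"), (3, 1, "101", "answer"), (3, 2, "101", "transfer"),
]


def decompose(log):
    """Split flat records into two relations.

    Returns calls {call_id: extension} and a list of
    events (call_id, timestamp, event).
    """
    calls = {}
    events = []
    for call_id, timestamp, extension, event in log:
        calls.setdefault(call_id, extension)
        events.append((call_id, timestamp, event))
    return calls, events


def frequent_pairs(events, min_support=2):
    """Find consecutive event pairs seen in at least `min_support` calls."""
    by_call = {}
    for call_id, _, event in sorted(events, key=lambda e: (e[0], e[1])):
        by_call.setdefault(call_id, []).append(event)
    support = Counter()
    for sequence in by_call.values():
        # Count each pair at most once per call.
        support.update(set(zip(sequence, sequence[1:])))
    return {pair: n for pair, n in support.items() if n >= min_support}


if __name__ == "__main__":
    calls, events = decompose(LOG)
    print(calls)
    print(frequent_pairs(events))
```

Keeping calls and events in separate relations mirrors the multi-relational setting the slide describes, in which a learner joins the relations on `call_id` rather than working from one wide table.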
ML and KDD applications (2)
[Figure: Interconnection of parts of the DM/DS system. Machine learning algorithms are applied in the red boxes. Diagram labels: Telecomm. Traffic, Telephone Exchange, Logging Data, Event Reconstruction, Event Descriptions, Rules, Prediction.]
ML and KDD applications (2)
• Results
  • Most events successfully recognized
  • Insight into telecommunication habits in the enterprise
  • Rules of a predictive nature plugged in for decision support of the exchange operator