University of Economics, Prague MLNET related activities of Laboratory for Intelligent Systems and Dept. of Information and Knowledge Engineering http://lisp.vse.cz/~berka/MLNet.html

Research
• probabilistic methods: decomposable probability models and Bayesian networks
• symbolic methods: generalized association rules and decision rules
• logical calculi for knowledge discovery in databases
(c) Petr Berka, LISp, 2000

People
Petr Berka, Jiří Ivánek, Radim Jiroušek, Jan Rauch, Vojtěch Svátek, Tomáš Kočka

Software: LISp-Miner
• two data mining procedures: 4FT Miner (generalized association rules) and KEX (decision rules)
• large preprocessing module including SQL
• rules are output in database format, which enables users to implement their own interpretation procedures

LISp-Miner procedures
• 4FT-Miner (GUHA procedure): generalized association rules in the form Ant ~ Suc / Cond
• KEX: weighted decision rules in the form Ant ==> C (weight)
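
The slides give only the form of a KEX rule, Ant ==> C (weight), not how the weights of several applicable rules are combined. The sketch below assumes a PROSPECTOR-style pseudo-Bayesian combination, x ⊕ y = xy / (xy + (1 − x)(1 − y)), which is an assumption made for illustration and is not stated in the slides. The rules, attribute values, weights, and the `combine`, `applicable`, and `classify` names are all hypothetical.

```python
from functools import reduce

def combine(x: float, y: float) -> float:
    """Assumed pseudo-Bayesian combination of two weights from the open interval (0, 1)."""
    return (x * y) / (x * y + (1 - x) * (1 - y))

# Hypothetical weighted rules "Ant ==> C (weight)" for one class C.
# Each antecedent is a conjunction of attribute = value conditions.
RULES = [
    ({"Salary": "low"}, 0.8),
    ({"Sex": "F", "District": "Prague"}, 0.7),
    ({"Age": "20-30"}, 0.4),
]

def applicable(ant: dict, example: dict) -> bool:
    """An antecedent applies when every one of its conditions matches the example."""
    return all(example.get(attr) == value for attr, value in ant.items())

def classify(example: dict, rules=RULES, neutral: float = 0.5) -> float:
    """Combine the weights of all applicable rules, starting from the neutral weight 0.5."""
    weights = [weight for ant, weight in rules if applicable(ant, example)]
    return reduce(combine, weights, neutral)

client = {"Age": "45", "Sex": "F", "Salary": "low", "District": "Prague"}
score = classify(client)  # combines the weights 0.8 and 0.7
print(round(score, 3))
```

Under this assumption a weight of 0.5 is neutral, weights above 0.5 support class C, and weights below 0.5 speak against it. Weights of exactly 0 and 1 are excluded because combining them would divide by zero.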

4FT-Miner

Data matrix:

CLIENTS                                   | LOANS
Id      Age  Sex  Salary  District  | Amount  Payment  Months  Quality
1       45   F    28 000  Prague    | 48 000  1 000    48      good
...     ...  ...  ...     ...       | ...     ...      ...     ...
70 000  18   M    12 000  Brno      | 36 000  2 000    18      bad

Problem: Are there segments of clients SC and segments of loans SL such that
• being in SC is at least 90% equivalent to having a loan from SL, and
• there are at least 100 such clients?

"Ant is at 90% equivalent to Suc": Ant ~(0.90, 100) Suc is true iff a/(a+b+c) ≥ 0.9 ∧ a ≥ 100, where a, b, c, d are the frequencies in the four-fold table of Ant and Suc:

        Suc   ¬Suc
Ant     a     b
¬Ant    c     d

• a: number of objects satisfying both Ant and Suc
• b: number of objects satisfying Ant and not satisfying Suc
• c: number of objects not satisfying Ant and satisfying Suc
• d: number of objects satisfying neither Ant nor Suc
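
The four-fold table and the quantifier test a/(a+b+c) ≥ p ∧ a ≥ Base can be sketched in a few lines. This is a minimal illustration, not the LISp-Miner implementation: the toy rows, the `FourFoldTable`, `four_fold_table`, and `quantifier_holds` names, and the thresholds used in the usage lines are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

Row = Mapping[str, object]
Predicate = Callable[[Row], bool]

@dataclass(frozen=True)
class FourFoldTable:
    a: int  # objects satisfying Ant and Suc
    b: int  # objects satisfying Ant but not Suc
    c: int  # objects satisfying Suc but not Ant
    d: int  # objects satisfying neither

def four_fold_table(rows: Iterable[Row], ant: Predicate, suc: Predicate) -> FourFoldTable:
    """Count the four frequencies for a given antecedent and succedent."""
    a = b = c = d = 0
    for row in rows:
        in_ant, in_suc = ant(row), suc(row)
        if in_ant and in_suc:
            a += 1
        elif in_ant:
            b += 1
        elif in_suc:
            c += 1
        else:
            d += 1
    return FourFoldTable(a, b, c, d)

def quantifier_holds(t: FourFoldTable, p: float = 0.9, base: int = 100) -> bool:
    """True iff a/(a+b+c) >= p and a >= base."""
    covered = t.a + t.b + t.c
    return covered > 0 and t.a >= base and t.a / covered >= p

# Toy data matrix (hypothetical): 10 objects with two derived attributes.
ROWS = (
    [{"Salary": "low", "Quality": "bad"}] * 3
    + [{"Salary": "low", "Quality": "good"}] * 1
    + [{"Salary": "high", "Quality": "bad"}] * 1
    + [{"Salary": "high", "Quality": "good"}] * 5
)

table = four_fold_table(
    ROWS,
    ant=lambda r: r["Salary"] == "low",
    suc=lambda r: r["Quality"] == "bad",
)
print(table)
print(quantifier_holds(table, p=0.5, base=3))  # relaxed thresholds for the toy data
print(quantifier_holds(table))                 # thresholds 0.9 and 100 from the slide
```

Note that d does not enter the test: objects satisfying neither Ant nor Suc have no effect on whether the relation holds.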

4FT Miner
• Input:
  • data matrix
  • quantifier ~(0.90, 100)
  • derived attributes for SC (possible Ant): Age (7 values), Sex (2 values), Salary (3 values), District (77 values)
  • derived attributes for SL (possible Suc): Amount (6 values), Duration (5 values), Quality (2 values)
• Output:
  • all relations Ant ~(0.90, 100) Suc that are true in the data matrix (5 equivalences out of about 5 million possible relations)
  • an example: Age(20 - 30) ∧ Sex(F) ∧ Salary(low) ∧ District(Prague) ~(0.90, 100) Amount⟨20, 50) ∧ Quality(bad)

        Suc   ¬Suc
Ant     950   30
¬Ant    20    69 000

a/(a+b+c) = 950/(950+30+20) = 0.95 ≥ 0.9 and a = 950 ≥ 100
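
The arithmetic of the example can be checked directly from its four frequencies. The snippet below is only a verification sketch; the variable and function names are illustrative and not part of LISp-Miner.

```python
def equivalence_ratio(a: int, b: int, c: int) -> float:
    """Share of objects satisfying both Ant and Suc among those satisfying at least one."""
    return a / (a + b + c)

# Frequencies taken from the example four-fold table on the slide.
a, b, c, d = 950, 30, 20, 69_000

ratio = equivalence_ratio(a, b, c)
total = a + b + c + d
holds = ratio >= 0.90 and a >= 100

print(f"a/(a+b+c) = {ratio:.2f}, objects in total = {total}, relation holds: {holds}")
```

Only 1 000 of the objects in the table satisfy Ant or Suc at all, and 950 of them satisfy both, which is why the relation passes both the 0.9 threshold and the minimum of 100 clients.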

KEX - classification

KEX - learning

LISp-Miner

4FT Miner and KEX applications
• truck reliability assessment
• quality control in a brewery
• segmentation of clients of a bank
• short-term electric load prediction

LISp-Miner references
• Berka, P., Ivánek, J.: Automated Knowledge Acquisition for PROSPECTOR-like Expert Systems. In: Bergadano, De Raedt (eds.): Proc. ECML'94, Springer, 1994, 339-342.
• Berka, P., Rauch, J.: Data Mining using GUHA and KEX. In: Callaos, Yang, Aguilar (eds.): 4th Int. Conf. on Information Systems, Analysis and Synthesis ISAS'98, 1998, Vol. 2, 238-244.
• Rauch, J.: Classes of Four Fold Table Quantifiers. In: Zytkow, Quafafou (eds.): Principles of Data Mining and Knowledge Discovery, Springer, 1998, 203-211.

Datasets
PKDD'99 Discovery Challenge data (http://lisp.vse.cz/pkdd99/chall.htm)
• financial data: clients of a bank, their accounts, transactions, loans, etc.
• medical data: patients with collagen disease

Financial data

Medical data

Other activities
• Organized conferences: http://lisp.vse.cz/ecml97/, http://lisp.vse.cz/pkdd99/
• Teaching (in Czech): KDD, KDD seminar, ML

New projects
SOL-EU-NET project "Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise" (supported by EU grant IST-1999-11.495)