Datamining in e-Business: Veni, Vidi, Vici! Prof. Dr. Veljko Milutinovic: Coarchitect of the world's first 200 MHz RISC microprocessor, for DARPA, about a decade before Intel.
Datamining in e-Business: Veni, Vidi, Vici!
Prof. Dr. Veljko Milutinovic:
• Coarchitect of the world's first 200 MHz RISC microprocessor, for DARPA, about a decade before Intel.
• Responsible for several successful datamining-oriented e-business products on the Internet, developed in cooperation with leading industry in the USA and Europe.
• Consulted for a number of high-tech companies (TechnologyConnect, BioPop, IBM, AT&T, NCR, RCA, Honeywell, Fairchild, etc.).
• Ph.D. from Belgrade. After that, for about a decade, held various positions (professor) at one of the top 5 (out of about 2000) US universities in computer engineering (Purdue).
• Author and coauthor of about 50 IEEE journal papers (plus many more in other journals). According to some, a European record for his research field.
• Guest editor for a number of special issues of Proceedings of the IEEE, IEEE Transactions on Computers, IEEE Concurrency, IEEE Computer, etc.
• Over 20 books published by leading USA publishers (Wiley, Prentice-Hall, North-Holland, Kluwer, IEEE CS Press, etc.).
• Forewords for 7 of his books written by 7 different Nobel Laureates, in cooperation with Telecom Italia learning services (you are welcome to visit http://www.ssgrr.it).
vm@etf.bg.ac.yu
http://galeb.etf.bg.ac.yu/~vm
THIS IS A DEMO VERSION OF THE TUTORIAL IN DATAMINING FOR E-BUSINESS. ONLY A FEW SLIDES OF THE ORIGINAL TUTORIAL ARE PRESENTED HERE.
Focus of this Presentation
• Data Mining problem types
• Data Mining models and algorithms
• Efficient Data Mining
• Available software
Decision Trees
[Figure: example decision tree with branch tests Balance>10 / Balance<=10, Age<=32 / Age>32, Married=NO / Married=YES]
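A decision tree of this kind can be sketched in a few lines of Python. The branch tests (Balance, Age, Married) are the ones named on the slide; the way they are nested, the field names `balance`, `age`, `married`, and the leaf labels `class A` / `class B` are assumptions made only for illustration, since the slide does not show how the tree is wired or what it predicts.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # predicted class at the end of a path


@dataclass
class Node:
    description: str               # human-readable form of the test
    test: Callable[[dict], bool]   # True -> follow if_true, False -> follow if_false
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, case: dict) -> str:
    """Walk from the root to a leaf, applying one test per level."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.test(case) else tree.if_false
    return tree.label


# Hypothetical nesting and leaf labels; only the three tests come from the slide.
example_tree = Node(
    "Balance > 10",
    lambda c: c["balance"] > 10,
    if_true=Node(
        "Age <= 32",
        lambda c: c["age"] <= 32,
        if_true=Node(
            "Married = YES",
            lambda c: c["married"],
            if_true=Leaf("class A"),
            if_false=Leaf("class B"),
        ),
        if_false=Leaf("class A"),
    ),
    if_false=Leaf("class B"),
)
```

For example, `classify(example_tree, {"balance": 20, "age": 25, "married": True})` follows the path Balance > 10, Age <= 32, Married = YES. Every case reaches exactly one leaf, which is the property that distinguishes a tree from a free-standing rule set.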
Rule Induction
• Method of deriving a set of rules to classify cases
• Creates independent rules that are unlikely to form a tree
• Rules may not cover all possible situations
• Rules may sometimes conflict in a prediction
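The two caveats above (uncovered cases and conflicting rules) are easy to see in a small Python sketch. The rules themselves, the field names, the `approve` / `reject` labels, and the "first matching rule wins" conflict policy are all hypothetical; the slide does not say how any particular tool induces or reconciles its rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    prediction: str


# Hypothetical, independent rules: they do not partition the input space.
RULES = [
    Rule("high balance", lambda c: c["balance"] > 10, "approve"),
    Rule("young and unmarried",
         lambda c: c["age"] <= 32 and not c["married"], "reject"),
]


def predict(rules: List[Rule], case: dict) -> Tuple[Optional[str], List[str], bool]:
    """Return (prediction, names of rules that fired, conflict flag).

    prediction is None when no rule covers the case.
    conflict is True when the fired rules disagree; the first fired rule
    then decides (an assumed policy, chosen only to keep the sketch short).
    """
    fired = [r for r in rules if r.condition(case)]
    if not fired:
        return None, [], False
    conflict = len({r.prediction for r in fired}) > 1
    return fired[0].prediction, [r.name for r in fired], conflict
```

A case with `balance=20, age=25, married=False` triggers both rules, which disagree; a case with `balance=5, age=40, married=True` triggers none. A decision tree cannot produce either situation, because each case follows exactly one path.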
Comparison of fourteen DM tools
• Evaluated by four undergraduates inexperienced at data mining, a relatively experienced graduate student, and a professional data mining consultant
• Run under MS Windows 95, MS Windows NT, or Macintosh System 7.5
• Use one of four technologies: Decision Trees, Rule Induction, Neural Networks, or Polynomial Networks
• Solve two binary classification problems, a multi-class classification problem, and a noiseless estimation problem
• Priced from $75 to $25,000
Comparison of fourteen DM tools
• The Decision Tree products were
  - CART
  - Scenario
  - See5
  - S-Plus
• The Rule Induction tools were
  - WizWhy
  - DataMind
  - DMSK
• Neural Networks were built from three programs
  - NeuroShell2
  - PcOLPARS
  - PRW
• The Polynomial Network tools were
  - ModelQuest Expert
  - Gnosis
  - a module of NeuroShell2
  - KnowledgeMiner
Criteria for evaluating DM tools
A list of 20 criteria for evaluating DM tools, put into 4 categories:
• Capability measures what a desktop tool can do, and how well it does it
  - Handles missing data
  - Considers misclassification costs
  - Allows data transformations
  - Includes quality of testing options
  - Has a programming language
  - Provides useful output reports
  - Provides visualisation
Criteria for evaluating DM tools
• Interoperability shows a tool's ability to interface with other computer applications
  - Importing data
  - Exporting data
  - Links to other applications
• Flexibility
  - Model adjustment flexibility
  - Customizable work environment
  - Ability to write or change code
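One way to turn a checklist like this into a per-category comparison is to rate each criterion and average within each category. The sketch below uses only the criteria listed on these slides; the 1 to 5 rating scale, the unweighted mean, and the function name `category_scores` are assumptions and are not taken from the evaluation itself.

```python
from statistics import mean
from typing import Dict, Optional

# Criteria as listed on the slides, grouped by category.
CRITERIA = {
    "Capability": [
        "Handles missing data",
        "Considers misclassification costs",
        "Allows data transformations",
        "Includes quality of testing options",
        "Has a programming language",
        "Provides useful output reports",
        "Provides visualisation",
    ],
    "Interoperability": [
        "Importing data",
        "Exporting data",
        "Links to other applications",
    ],
    "Flexibility": [
        "Model adjustment flexibility",
        "Customizable work environment",
        "Ability to write or change code",
    ],
}


def category_scores(ratings: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Average the ratings of one tool within each category.

    ratings maps a criterion name to a score on an assumed 1-5 scale.
    Unrated criteria are skipped; a category with no ratings yields None.
    """
    scores = {}
    for category, criteria in CRITERIA.items():
        rated = [ratings[c] for c in criteria if c in ratings]
        scores[category] = mean(rated) if rated else None
    return scores
```

Calling `category_scores` once per tool gives one row of a comparison table. A weighted mean would be a natural refinement if some criteria matter more than others for a given project.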