Competitive advantage from Data Mining: some lessons learnt in the Information Systems field

PMKD'05, Copenhagen, Denmark, August 22-26, 2005. Competitive advantage from Data Mining: some lessons learnt in the Information Systems field. Mykola Pechenizkiy, Seppo Puuronen, Department of Computer Science, University of Jyväskylä, Finland; Alexey Tsymbal, Department of Computer Science, Trinity College Dublin, Ireland

  What is Data Mining? • Data mining, or knowledge discovery, is the process of finding previously unknown and potentially interesting patterns and relations in large databases (Fayyad, KDD'96). • Data mining is the emerging science and industry of applying modern statistical and computational technologies to the problem of finding useful patterns hidden within large databases (John 1997). • Intersection of many fields: statistics, AI, machine learning, databases, neural networks, pattern recognition, econometrics, etc.

  H. Information Systems • H.0 GENERAL • H.1 MODELS AND PRINCIPLES • H.2 DATABASE MANAGEMENT • H.2.0 General • Security, integrity, and protection • H.2.8 Database Applications • Data mining • Image databases • Scientific databases • Spatial databases and GIS • Statistical databases • H.2.m Miscellaneous (http://www.acm.org/class/1998/, valid in 2003)


  G. Mathematics of Computing • G.3 PROBABILITY AND STATISTICS • Correlation and regression analysis • Distribution functions • Experimental design • Markov processes • Multivariate statistics • Nonparametric statistics • Probabilistic algorithms (including Monte Carlo) • Statistical computing

  Our Message • DM is still a technology that is expected to help organizations benefit more from their huge databases. • There are some success stories in which organizations have gained a competitive advantage from DM. • Still, the strong focus of most DM researchers on technology-oriented topics does not support expanding the scope to less rigorous but practically very relevant sub-areas. • Research in the IS discipline has a strong tradition of taking into account the human and organizational aspects of systems alongside the technical ones.

  Our Message • DM-supporting processes that take human and organizational aspects into account are still in their infancy. • The DM community might benefit, at least from a practical point of view, from looking at older sub-areas of IT that have a tradition of considering solution-driven concepts with a focus on human and organizational aspects as well. • By becoming more receptive to the research results of the IS community, the DM community might be able to increase its collective understanding of • how DM artifacts are developed: conceived, constructed, and implemented, • how DM artifacts are used, supported and evolved, • how DM artifacts impact and are impacted by the contexts in which they are embedded.

  Part I • Existing Frameworks for DM • Theory-oriented • Databases; • Statistics; • Machine learning; • Data compression • Process-oriented • Fayyad's • CRISP-DM • Reinartz's

  Theory-Oriented Frameworks


  Reductionism Approach • Two basic Statistical Paradigms • "Statistical Experiment" • Fisher's version, inductive principle of maximum likelihood • Neyman and Pearson-Wald's version, inductive behaviour • Bayesian version, maximum posterior probability • "Statistical learning from empirical process" • "Structural Data Analysis" • SVD • Data mining vs. statistics • the issue of computational feasibility has a much clearer role in data mining than in statistics • data mining emphasizes database integration, simplicity of use, and the understandability of results • the theoretical framework of statistics is not much concerned with data analysis as a process that includes several steps
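
  To make the contrast between the maximum likelihood and maximum posterior principles concrete, here is a minimal sketch for a coin-flip (Bernoulli) experiment. The function names, the Beta(a, b) prior and the example counts are illustrative assumptions, not part of the original slides.

```python
def mle_estimate(successes, trials):
    """Maximum likelihood estimate of a Bernoulli success probability."""
    return successes / trials


def map_estimate(successes, trials, a=2.0, b=2.0):
    """Maximum posterior estimate under an assumed Beta(a, b) prior.

    The posterior is Beta(successes + a, trials - successes + b); the value
    returned is its mode (valid when both posterior parameters exceed 1).
    """
    return (successes + a - 1.0) / (trials + a + b - 2.0)


if __name__ == "__main__":
    print(mle_estimate(7, 10))  # uses the observed data alone
    print(map_estimate(7, 10))  # pulled towards the prior mode of 0.5
```

  With a uniform prior (a = b = 1) the two estimates coincide, which is one way to see the likelihood principle as a special case of the Bayesian one.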

  Machine Learning Approach • "let the data suggest a model" can be seen as a practical alternative to the statistical paradigm "fit a model to the data" • Constructive Induction – a learning process with two intertwined phases: construction of the "best" representation space and generation of hypotheses in the space found (Michalski & Wnek, 1993). • Feature transformation (PCA, SVD, Random Projection) • Feature selection • LSI
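
  The representation-construction phase can be illustrated with two of the operations listed on this slide, random projection and feature selection. This is a standard-library sketch under stated assumptions: the function names, the Gaussian projection matrix and the variance-based selection criterion are illustrative choices, not the authors' method.

```python
import random


def random_projection(rows, k, seed=0):
    """Feature transformation: project d-dimensional rows onto k random directions."""
    rng = random.Random(seed)
    d = len(rows[0])
    # k x d matrix with N(0, 1/k) entries, so squared lengths are
    # preserved in expectation.
    matrix = [[rng.gauss(0.0, 1.0) / k ** 0.5 for _ in range(d)] for _ in range(k)]
    return [
        [sum(w * x for w, x in zip(direction, row)) for direction in matrix]
        for row in rows
    ]


def select_by_variance(rows, keep):
    """Feature selection: indices of the `keep` highest-variance columns."""
    columns = list(zip(*rows))

    def variance(col):
        mean = sum(col) / len(col)
        return sum((v - mean) ** 2 for v in col) / len(col)

    ranked = sorted(range(len(columns)), key=lambda j: variance(columns[j]), reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    data = [[1.0, 0.0, 5.0], [1.0, 10.0, 6.0], [1.0, 20.0, 7.0]]
    print(select_by_variance(data, 2))      # the constant first column is dropped
    print(random_projection(data, 2, seed=1))
```

  Either function yields a new representation space; a learner would then generate hypotheses in that space, which is the second phase of constructive induction.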

  Data Compression Approach • Compress the data set by finding some structure or knowledge for it, where knowledge is interpreted as a representation that allows coding the data using fewer bits. • Theories should not be ad hoc, that is, they should not overfit the examples used to build them. • Occam's razor principle, 14th century. • "when you have two competing models which make exactly the same predictions, the one that is simpler is the better". Mehta, M., Rissanen, J., and Agrawal, R. 1995, MDL-based decision tree pruning. In U.M. Fayyad, R. Uthurusamy (Eds.) Proceedings of the KDD 1995, AAAI Press, Montreal, Canada, 216-221.
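
  The compression view can be sketched as a two-part description length: bits to state the model plus bits to code the data given the model. The example below compares a parameter-free fair-coin model with a fitted Bernoulli model on a 0/1 sequence. It is an illustrative sketch, not the decision-tree pruning procedure of Mehta et al.; the function names and the 0.5 * log2(n) parameter cost (a common approximation) are assumptions.

```python
import math


def data_bits(sequence, p):
    """Bits needed to code a 0/1 sequence with an ideal code for Bernoulli(p)."""
    bits = 0.0
    for symbol in sequence:
        prob = p if symbol == 1 else 1.0 - p
        bits += -math.log2(prob)
    return bits


def description_length(sequence, fit_parameter):
    """Two-part code length: model bits plus data bits given the model."""
    n = len(sequence)
    if not fit_parameter:
        # Fair-coin model: no parameter has to be transmitted.
        return data_bits(sequence, 0.5)
    # Fitted model: estimate p, kept away from 0 and 1 so log2 is defined.
    p = min(max(sum(sequence) / n, 1.0 / (2 * n)), 1.0 - 1.0 / (2 * n))
    model_bits = 0.5 * math.log2(n)  # assumed cost of one real-valued parameter
    return model_bits + data_bits(sequence, p)


if __name__ == "__main__":
    skewed = [1] * 90 + [0] * 10
    balanced = [0, 1] * 50
    for name, seq in (("skewed", skewed), ("balanced", balanced)):
        print(name, description_length(seq, False), description_length(seq, True))
```

  On the balanced sequence both models code the data equally well, so the extra parameter only adds bits and the simpler model wins, in line with Occam's razor; on the skewed sequence the fitted model pays for itself.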

  Other Theoretical frameworks for DM • Microeconomic view • the key point is that data mining is about finding actionable patterns: the only interest is in patterns that can somehow be used to increase utility; • a decision theoretic formulation of this principle: the goal can be formulated as finding a decision x that maximises the utility function f(x). Kleinberg, J., Papadimitriou, C., and Raghavan, P. 1998, A microeconomic view of data mining, Data Mining and Knowledge Discovery 2(4), 311-324 • Philosophy of Science • logical empiricism, critical rationalism, systems theory • formism, mechanism, contextualism • dispersive vs. integrative, analytical vs. synthetic theories • subjectivist vs. objectivist, nomothetic vs. ideographic, nominalism vs. realism, voluntarism vs. determinism, epistemological assumptions • Explanation, prediction, understanding
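
  The decision theoretic formulation in the microeconomic view can be sketched as a search over candidate decisions for the one with the highest utility. In this sketch f(x) is assumed to be the sum of per-customer utilities, and the pricing scenario, function names and numbers are hypothetical illustrations rather than material from the slides.

```python
def best_decision(decisions, customers, utility):
    """Return the decision x that maximises f(x) = sum of utility(x, customer)."""
    return max(decisions, key=lambda x: sum(utility(x, c) for c in customers))


def revenue(price, willingness_to_pay):
    """Hypothetical utility: a customer buys only if the price is affordable."""
    return price if price <= willingness_to_pay else 0


if __name__ == "__main__":
    prices = [5, 8, 12]                 # candidate decisions x
    willingness = [5, 8, 8, 12]         # one value per customer
    print(best_decision(prices, willingness, revenue))
```

  A mined pattern is "actionable" in this sense only if knowing it changes which decision maximises the utility.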

  Process-Oriented Frameworks


  CRISP-DM http://www.crisp-dm.org/

  KDD: "Vertical Solutions" Reinartz, T. 1999, Focusing Solutions for Data Mining. LNAI 1623, Berlin Heidelberg.


  Part II Where are we? Rigor and Relevance in DM Research


  Rigor vs Relevance in DM Research


  Part III Towards the new framework for DM research


  The ISs-based paradigm for DM • Ives B., Hamilton S., Davis G. (1980). "A Framework for Research in Computer-based MIS" Management Science, 26(9), 910-934. • "Information systems are powerful instruments for organizational problem solving through formal information processing" Lyytinen, K., 1987, "Different perspectives on ISs: problems and solutions." ACM Computing Surveys, 19(1), 5-46.

  A multimethodological approach to the construction of an artefact for DM [Figure labels: Theory Building, DM Artifact Development, Observation, Experimentation.] Adapted from: Nunamaker, W., Chen, M., and Purdin, T. 1990-91, Systems development in information systems research, Journal of Management Information Systems, 7(3), 89-106.

  Research methods in a paper on DM • Theoretical approach: theory creating • Hypothesis, new algorithm, etc. • Constructive approach • Prototype of a DM tool • Theoretical approach: theory testing and evaluation • Artificial, benchmark, real-world data • Evaluation techniques • Conclusion on theory

  The Action Research and Design Science Approach to Artifact Creation [Figure labels: Awareness of business problem, Contextual Knowledge, Action planning, Artifact Development, Business Knowledge, Design Knowledge, Artifact Evaluation, Action taking, Conclusion.]

  DM Artifact Use: Success Model 1 of 3 [Figure labels: System Quality, Information Quality, Service Quality, Use, User Satisfaction, Individual Impact, Organizational Impact.] Adapted from D&M IS Success Models


  DM Artifact Use: Success Model 3 of 3 • Hermiz stated his belief that there are four critical success factors for DM projects: • (1) having a clearly articulated business problem that needs to be solved and for which DM is a proper tool; • (2) ensuring that the problem being pursued is supported by the right type of data of sufficient quality and in sufficient quantity for DM; • (3) recognizing that DM is a process with many components and dependencies; the entire project cannot be "managed" in the traditional business sense of the word; • (4) planning to learn from the DM process regardless of the outcome, and clearly understanding that there is no guarantee that any given DM project will be successful.

  KM Perspective • A knowledge-driven approach to enhance the dynamic integration of DM strategies in knowledge discovery systems. • The focus here is on knowledge management aimed at organising a systematic process of (meta-)knowledge capture and refinement over time. • knowledge extracted from data • the higher-level knowledge required for managing the selection, combination and application of DM techniques • Basic knowledge management processes of • knowledge creation and identification, representation, collection and organization, sharing, adaptation, and application • DEXA'05: TAKMA WS paper & presentation are available

  New Research Framework for DM Research


  Further Work • Definition of the relevance concept in DM research • Revision of the book chapter • Further work on the new framework for DM research • Organization of a workshop, special track, or working conference on more social directions in DM research, likely with one focus on IS as a sister discipline. A few options: • IRIS Scandinavian Conference on IS • Next PMKD • Workshop in Jyväskylä

  Thank You! Feedback is very welcome: • Questions • Suggestions • Collaboration. Book chapter draft is available on request from Mykola Pechenizkiy, Department of Computer Science and Information Systems, University of Jyväskylä, FINLAND. E-mail: mpechen@cs.jyu.fi Tel.: +358 14 2602472 Fax: +358 14 260 3011 http://www.cs.jyu.fi/~mpechen

