Explore the core concepts of data mining, including induction-based learning, KDD process, supervised and unsupervised learning, and expert systems. Discover the importance of data selection, cleaning, and integration in the knowledge discovery process. Learn about decision trees, clustering, and data mining applications in customer value assessment.
Part I Data Mining Fundamentals
Data Mining: A First View Chapter 1
Data Mining: The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.
Induction-based Learning: The process of forming general concept definitions by observing specific examples of concepts to be learned.
Knowledge Discovery in Databases (KDD): The application of the scientific method to data mining. Data mining is one step of the KDD process.
Data Mining: A KDD Process. Data mining is the core of the knowledge discovery process. [Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge]
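The KDD pipeline can be sketched in Python as a chain of small functions, one per step. This is a toy illustration under stated assumptions: the two source tables and their field names are hypothetical, "drop any record with a missing value" stands in for data cleaning, and a simple frequency count stands in for the data mining step. It is not the method of any particular tool.

```python
from collections import Counter

# Two hypothetical source databases keyed by customer id (illustrative data only).
accounts = {1: {"account_type": "joint"}, 2: {"account_type": "custodial"},
            3: {"account_type": "joint"}, 4: {"account_type": None}}
trading = {1: {"method": "online"}, 2: {"method": "broker"},
           3: {"method": "online"}, 4: {"method": "online"}}

def clean(db):
    """Data cleaning (assumed rule): drop records that have any missing value."""
    return {k: r for k, r in db.items() if all(v is not None for v in r.values())}

def integrate(a, b):
    """Data integration: join two sources on their shared key into one warehouse table."""
    return [{"id": k, **a[k], **b[k]} for k in a.keys() & b.keys()]

def select(warehouse, attrs):
    """Selection: keep only the task-relevant attributes."""
    return [{x: row[x] for x in attrs} for row in warehouse]

def mine(rows):
    """Data mining (toy stand-in): count co-occurring attribute-value combinations."""
    return Counter(tuple(sorted(r.items())) for r in rows)

def evaluate(patterns, min_support):
    """Pattern evaluation: keep only patterns seen at least min_support times."""
    return {p: n for p, n in patterns.items() if n >= min_support}

warehouse = integrate(clean(accounts), clean(trading))
knowledge = evaluate(mine(select(warehouse, ["account_type", "method"])), min_support=2)
print(knowledge)  # → {(('account_type', 'joint'), ('method', 'online')): 2}
```

Customer 4 is removed during cleaning, so the warehouse holds three records, and only the joint/online combination survives pattern evaluation.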
Four Levels of Learning: facts, concepts, procedures (to be worked out), and principles.
Concepts: Computers are good at learning concepts. Concepts are the output of a data mining session.
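Three Concept Views: The classical view (crisp), for old hands: a concept is given as a definition. The probabilistic view (e.g., 85%), for those with some experience: data mining rules with a confidence value. The exemplar view (CBR, case-based reasoning), for newcomers: a concept is represented by stored examples. An illustrated example: good credit?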
Supervised Learning: Build a learner model using data instances of known origin. Use the model to determine the outcome of new instances of unknown origin.
Supervised Learning: A Decision Tree Example
Decision Tree: A tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.
Production Rules:
IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
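Because the production rules are read directly off the decision tree, they can be applied as an ordered list of condition sets. A minimal Python sketch, assuming each patient is a dictionary keyed by the attribute names used in the rules; the names RULES and diagnose are hypothetical.

```python
from typing import Optional

# Production rules from the slide: (conditions that must all hold, diagnosis).
RULES = [
    ({"Swollen Glands": "Yes"}, "Strep Throat"),
    ({"Swollen Glands": "No", "Fever": "Yes"}, "Cold"),
    ({"Swollen Glands": "No", "Fever": "No"}, "Allergy"),
]

def diagnose(patient: dict) -> Optional[str]:
    """Return the diagnosis of the first rule whose conditions all hold, else None."""
    for conditions, diagnosis in RULES:
        if all(patient.get(attr) == value for attr, value in conditions.items()):
            return diagnosis
    return None

patient = {"Sore Throat": "Yes", "Fever": "No", "Swollen Glands": "No",
           "Congestion": "No", "Headache": "No"}
print(diagnose(patient))  # → Allergy
```

Rules are tested in order and the first match wins; since these three rules are mutually exclusive, the order does not change the result. Attributes the tree never tests (Sore Throat, Congestion, Headache) have no effect on the diagnosis.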
Unsupervised Clustering: A data mining method that builds models from data without predefined classes.
Three groups formed (Table 1.3 shows only part of the whole table):
G1: MarginAccount = yes and Age = 20-29 and AnnualIncome = 40-59K; accuracy = 80%, coverage = 0.5
G2: AccountType = Custodial and FavoriteRecreation = Skiing and AnnualIncome = 40-59K; accuracy = 95%, coverage = 0.35
G3: AccountType = joint and Trades/Month > 5 and TransactionMethod = online; accuracy = 82%, coverage = 0.65
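The slide does not define accuracy and coverage, so this Python sketch assumes one common reading: accuracy is the fraction of instances satisfying a group's conditions that actually belong to the group, and coverage is the fraction of the group's members that satisfy the conditions. The six customer records are hypothetical and are not taken from Table 1.3.

```python
def rule_accuracy_and_coverage(rows, conditions, group):
    """Return (accuracy, coverage) of the rule 'IF conditions THEN group'.

    Assumed definitions (not given on the slide):
      accuracy = group members among rows matching the conditions / rows matching
      coverage = group members matching the conditions / all group members
    """
    matching = [r for r in rows if all(r.get(a) == v for a, v in conditions.items())]
    members = [r for r in rows if r["group"] == group]
    hits = [r for r in matching if r["group"] == group]
    accuracy = len(hits) / len(matching) if matching else 0.0
    coverage = len(hits) / len(members) if members else 0.0
    return accuracy, coverage

# Hypothetical customers already assigned to groups by a clustering run.
rows = [
    {"MarginAccount": "yes", "Age": "20-29", "group": 1},
    {"MarginAccount": "yes", "Age": "20-29", "group": 1},
    {"MarginAccount": "yes", "Age": "20-29", "group": 2},
    {"MarginAccount": "no",  "Age": "20-29", "group": 1},
    {"MarginAccount": "no",  "Age": "30-39", "group": 1},
    {"MarginAccount": "no",  "Age": "30-39", "group": 2},
]
acc, cov = rule_accuracy_and_coverage(rows, {"MarginAccount": "yes", "Age": "20-29"}, group=1)
print(f"accuracy={acc:.0%} coverage={cov:.2f}")  # → accuracy=67% coverage=0.50
```

Three customers satisfy the conditions and two of them are in group 1, giving 67% accuracy; two of the four group 1 members satisfy the conditions, giving a coverage of 0.5.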
Data Mining or Data Query? Four kinds of knowledge: shallow knowledge (SQL), multidimensional knowledge (OLAP), hidden knowledge (data mining), and deep knowledge (human).
Data Mining vs. Data Query: An Example. Use a data query if you already know, or almost know, what you are looking for. Use data mining to find regularities in data that are not obvious.
Expert System: A computer program that emulates the problem-solving skills of one or more human experts.
Knowledge Engineer: A person trained to interact with an expert in order to capture their knowledge.
Assembling the Data: the data warehouse; relational databases and flat files.
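1.6 Why Not Simple Search? Nearest neighbor classifier (i.e., CBR): assign a new instance to the class of the stored instance most similar to it. Drawbacks: it is time-consuming, and it is entropy-independent (no entropy measure is used to decide which attributes matter most). K-nearest neighbor classifier: form a class from the K nearest neighbors, i.e., assign the new instance the majority class among its K most similar stored instances.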
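A minimal Python sketch of both classifiers, assuming similarity is simply the number of attributes on which two instances agree (the matched-attribute count used in the assignment comparison). The training records, the attribute names a1 to a3, and the class labels X and Y are hypothetical. With k = 1 this is the nearest neighbor classifier; a larger k gives a majority vote among the k most similar instances.

```python
from collections import Counter

ATTRS = ["a1", "a2", "a3"]  # hypothetical yes/no attributes

# Hypothetical training instances of known class.
TRAIN = [
    {"a1": "yes", "a2": "no",  "a3": "no",  "class": "X"},
    {"a1": "yes", "a2": "yes", "a3": "no",  "class": "Y"},
    {"a1": "no",  "a2": "no",  "a3": "no",  "class": "Y"},
    {"a1": "no",  "a2": "yes", "a3": "yes", "class": "X"},
    {"a1": "yes", "a2": "no",  "a3": "yes", "class": "Y"},
]

def similarity(a, b, attrs):
    """Count the attributes on which instances a and b have the same value."""
    return sum(a[x] == b[x] for x in attrs)

def knn_classify(train, new, attrs, k):
    """Return the majority class among the k stored instances most similar to `new`."""
    # Every stored instance is scored for each new one: this full scan is why it is slow.
    ranked = sorted(train, key=lambda r: similarity(r, new, attrs), reverse=True)
    votes = Counter(r["class"] for r in ranked[:k])
    return votes.most_common(1)[0][0]

new = {"a1": "yes", "a2": "no", "a3": "no"}
print(knn_classify(TRAIN, new, ATTRS, k=1))  # → X (nearest neighbor)
print(knn_classify(TRAIN, new, ATTRS, k=3))  # → Y (3-nearest neighbors)
```

The single nearest neighbor (three matched attributes) votes X, but with k = 3 the next two instances (two matched attributes each) outvote it. Ties in similarity are broken by training order because Python's sort is stable, and ties in the vote go to the class encountered first; a fuller implementation would need an explicit tie-breaking policy.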
Assignment 4: A new instance, Patient ID = 14: Sore Throat = Yes, Fever = No, Swollen Glands = No, Congestion = No, Headache = No. Comparison with the training instances: one matched attribute: ID 1, 9; two matched attributes: ID 2, 5, 10; three matched attributes: ID 3, 6, 7, 8; four matched attributes: ID 4. The nearest neighbor (ID 4) suggests strep throat? Using the decision tree, the correct diagnosis should be allergy. Q: Try the K-nearest neighbor classifier.