Introduction to Data Mining. Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu. Introduction to Data Mining. Definition General Concept Foundations Evolution Applications Challenges Algorithms Classical Next Generations. Introduction to Data Mining.
Introduction to Data Mining • Group Members: • Karim C. El-Khazen • Pascal Suria • Lin Gui • Philsou Lee • Xiaoting Niu
Introduction to Data Mining • Definition • General Concept • Foundations • Evolution • Applications • Challenges • Algorithms • Classical • Next Generations
Introduction to Data Mining • What is Data Mining? • Data mining is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data stored in repositories, using pattern recognition technologies as well as statistical and mathematical methods.
Introduction to Data Mining • Foundations • Massive data collection • Powerful multiprocessor computers • Data mining algorithms
Introduction to Data Mining • Evolution
Introduction to Data Mining • Applications • Industry • Retail • Health maintenance groups • Telecommunications • Credit cards • Web mining • Sports and entertainment solutions
Introduction to Data Mining • Challenges • Ability to handle different types of data • Graceful degradation of data mining algorithms • Valuable data mining results • Representation of data mining requests and results • Mining at different abstraction levels • Mining information from different sources of data • Protection of privacy and data security
Introduction to Data Mining • Hierarchy of Choices and Decisions • Business goal • Collecting, cleaning and preparing data • Prediction • Model type and algorithms
Introduction to Data Mining • Data Description • Descriptions of data characteristics in elementary and aggregated form • Summarization • Visualization
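As a rough illustration of describing data in elementary and aggregated form, the sketch below groups a few records and summarizes each group. The record layout, segment names, and amounts are invented for the example and do not come from the slides.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical elementary records: (customer segment, purchase amount).
records = [
    ("retail", 120.0),
    ("retail", 80.0),
    ("telecom", 200.0),
    ("telecom", 100.0),
    ("telecom", 300.0),
]

def summarize(rows):
    """Aggregate elementary records into per-segment summary statistics."""
    groups = defaultdict(list)
    for segment, amount in rows:
        groups[segment].append(amount)
    return {
        segment: {
            "count": len(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
        }
        for segment, amounts in groups.items()
    }

summary = summarize(records)
for segment, stats in summary.items():
    print(segment, stats)
```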
Introduction to Data Mining • Predictive Data Mining • Predictive modeling is a term used to describe the process of mathematically or mentally representing a phenomenon or occurrence with a series of equations or relationships.
Introduction to Data Mining • Prediction: Classification • Classification predicts class membership • Pre-classify (using classification algorithms) • Test to determine the quality of the model • Predict (using effective classifier)
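The three steps on this slide (pre-classify, test, predict) can be sketched with a very small classifier. A nearest-centroid classifier is used here only because it is short; the slides do not name a specific algorithm, and the feature values and the labels "low" and "high" are made up.

```python
def train_centroids(rows):
    """Pre-classify: rows are (features, label); return one centroid per label."""
    sums, counts = {}, {}
    for features, label in rows:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    """Predict: return the label whose centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def accuracy(centroids, rows):
    """Test: fraction of held-out rows that the model labels correctly."""
    hits = sum(predict(centroids, features) == label for features, label in rows)
    return hits / len(rows)

# Hypothetical pre-classified data, split into a training set and a test set.
training_rows = [([1.0, 1.0], "low"), ([1.5, 2.0], "low"),
                 ([8.0, 8.0], "high"), ([9.0, 7.5], "high")]
test_rows = [([2.0, 1.0], "low"), ([7.5, 9.0], "high")]

model = train_centroids(training_rows)
print("accuracy on test set:", accuracy(model, test_rows))
print("prediction for a new case:", predict(model, [8.0, 9.0]))
```

Only if the accuracy on the held-out rows is acceptable would the model be used on new, unlabeled cases.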
Introduction to Data Mining • Prediction: Regression • Regression takes a numerical dataset and develops a mathematical formula that fits the data. • When you're ready to use the results to predict future behavior, you simply take your new data, plug it into the developed formula and you get a prediction!
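A minimal sketch of this idea, assuming the simplest case of a straight line fitted by ordinary least squares (the slides do not say which kind of regression is meant). The data points are invented.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Plug a new value into the fitted formula."""
    return slope * x + intercept

# Hypothetical numerical dataset that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print("formula: y =", slope, "* x +", intercept)
print("prediction for x = 5:", predict(slope, intercept, 5.0))
```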
Introduction to Data Mining • Algorithms • Classical Techniques • Statistics • Neighborhoods • Clustering • Next Generations • Decision Tree • Neural Network • Rule Induction
Introduction to Data Mining • Statistics • Classical Statistics: • Related to the collection and description of data • Assumption: there is an underlying pattern in the data distribution • Objective: find the best guess • Data Mining: • Employs statistical methods • Needs to analyze huge amounts of data • Goes beyond traditional statistics
Introduction to Data Mining • Neighborhoods • Basic idea: • For a new problem, look for similar problems (neighbors) that have already been solved • Key point: find the neighborhood • Calculate the distance: how close must a case be to count as a neighbor? • Which class does the new problem belong to? • Large computational load: • A new calculation for each new case
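The neighborhood idea can be sketched as a k-nearest-neighbor vote. The choice of Euclidean distance, the value k = 3, and the sample cases are assumptions made for illustration. Note that every prediction scans all solved cases, which is the computational load the slide mentions.

```python
import math
from collections import Counter

def knn_predict(solved_cases, new_case, k=3):
    """Classify new_case by majority vote among its k closest solved cases.

    solved_cases is a list of (features, label) pairs.
    """
    # A fresh distance calculation over all solved cases for each new case.
    by_distance = sorted(
        solved_cases, key=lambda case: math.dist(case[0], new_case)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical solved problems with two numeric features each.
solved = [
    ((1.0, 1.0), "A"),
    ((1.0, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((8.0, 8.0), "B"),
    ((9.0, 8.0), "B"),
]

print(knn_predict(solved, (2.0, 2.0)))
print(knn_predict(solved, (8.0, 9.0)))
```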
Introduction to Data Mining • Clustering • Elements are grouped together according to different characteristics • Members of a cluster share similar values (homogeneous) • Problem: controlling the number of clusters • Hierarchical clustering: flexible number of clusters • Non-hierarchical clustering: number of clusters given by the user • Used most frequently for: • Consolidating data into a high-level view • Grouping records by likely behavior
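For the non-hierarchical case, where the user supplies the number of clusters, k-means is one common technique. The slides do not name it, so treat this as an illustrative sketch. Seeding the centroids with the first k points and running a fixed number of iterations are simplifying assumptions, and the points are invented.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, assignments)."""
    # Assumption: seed centroids with the first k points to stay deterministic.
    centroids = [list(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, assignments

# Hypothetical two-dimensional records.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.0), (1.0, 0.5), (8.5, 9.5)]
centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```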
Introduction to Data Mining • Decision Tree • A way of representing a series of rules that lead to a class or value • Structure: • Decision nodes, branches, leaves • Example: A loan officer wants to determine the creditworthiness of applicants
Introduction to Data Mining • Decision Tree (continued) • Decision tree algorithms induce the tree and its rules from data, and the tree is then used to make predictions
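A sketch of how a tree of decision nodes, branches, and leaves leads to a class for the loan example. The tree is written by hand here rather than induced from data, and the attributes, thresholds, and class names are hypothetical because the original diagram is not part of this transcript.

```python
# Hypothetical hand-built tree: each decision node asks whether an attribute
# is at least a threshold; "yes" and "no" are its branches; leaves hold a class.
loan_tree = {
    "question": ("income", 40000),
    "yes": {
        "question": ("years_employed", 2),
        "yes": {"leaf": "good credit risk"},
        "no": {"leaf": "poor credit risk"},
    },
    "no": {"leaf": "poor credit risk"},
}

def classify(tree, applicant):
    """Follow branches from the root until a leaf is reached."""
    node = tree
    while "leaf" not in node:
        attribute, threshold = node["question"]
        node = node["yes"] if applicant[attribute] >= threshold else node["no"]
    return node["leaf"]

print(classify(loan_tree, {"income": 52000, "years_employed": 5}))
print(classify(loan_tree, {"income": 30000, "years_employed": 10}))
```

Each path from the root to a leaf reads as one rule, for example "if income is at least 40000 and years employed is at least 2, then good credit risk".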
Introduction to Data Mining • Neural Networks • Efficiently model large and complex problems with hundreds of predictor variables • Structure: • Input layer, hidden layer, output layer • Activation function between nodes • Requires training and testing of relations
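The layered structure can be sketched as a single forward pass through one hidden layer. The sigmoid activation and all weights and biases below are assumptions chosen for illustration; in practice the weights would be learned during training, which this sketch does not cover.

```python
import math

def sigmoid(x):
    """Activation function applied at each node."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Compute one layer: a weighted sum per node followed by the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Pass values from the input layer through the hidden layer to the output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.2], [0.3, 0.8]]
hidden_b = [0.1, -0.1]
output_w = [[1.0, -1.0]]
output_b = [0.0]

output = forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
print(output)
```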
Introduction to Data Mining • Neural Networks (continued) • Example:
Introduction to Data Mining • Rule Induction • A method to derive a set of rules to classify cases • For example, rule induction can be used to discover patterns relating to decisions (e.g., credit card applications) • Rules may not cover all possible situations
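One simple way to derive rules from cases is the OneR (one rule) method: for each attribute, map every value to its most frequent class, then keep the attribute whose rules make the fewest errors. The slides do not specify an algorithm, so OneR is used here only as an example, and the credit card cases are invented.

```python
from collections import Counter, defaultdict

def one_r(cases, attributes, target):
    """Return (attribute, rules, errors) for the single best attribute."""
    best = None
    for attribute in attributes:
        # Count how often each class occurs for each value of this attribute.
        counts = defaultdict(Counter)
        for case in cases:
            counts[case[attribute]][case[target]] += 1
        # Rule per value: predict the most frequent class for that value.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors: cases that do not match the majority class of their value.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attribute, rules, errors)
    return best

# Hypothetical credit card application decisions.
cases = [
    {"income": "high", "owns_home": "yes", "decision": "approve"},
    {"income": "high", "owns_home": "no", "decision": "approve"},
    {"income": "low", "owns_home": "yes", "decision": "reject"},
    {"income": "low", "owns_home": "no", "decision": "reject"},
    {"income": "medium", "owns_home": "yes", "decision": "approve"},
]

attribute, rules, errors = one_r(cases, ["income", "owns_home"], "decision")
print(attribute, rules, errors)

# A value never seen in the cases has no rule, so the lookup returns None.
print(rules.get("unknown"))
```

The last lookup illustrates the final bullet: induced rules only cover situations that appeared in the cases they were derived from.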
Introduction to Data Mining • Questions