Artificial Intelligence: Machine Learning, Pattern Recognition, Data Mining
Dae-Won Kim, School of Computer Science & Engineering, Chung-Ang University
AI Scope
1. Search-based optimization techniques for real-life problems
• Hill climbing, Branch and bound, A*, Greedy algorithm
• Simulated annealing, Tabu search, Genetic algorithm
2. Reasoning: Logic, Inference, and knowledge representation
• Logical language: Syntax and Semantics
• Inference algorithm: Forward/Backward chaining, Resolution, and Expert System
3. Machine Learning / Pattern Recognition / Data Mining
• Classification: Bayesian algorithm, Nearest-neighbor algorithm, Neural network
• Clustering: Hierarchical algorithm, K-Means algorithm
4. Uncertainty based on Probability theory
5. Planning, Scheduling, Robotics, and Industry Automation
Progress in digital data acquisition and storage technology has resulted in the growth of huge databases.
Data mining is the extraction of implicit, previously unknown, and potentially useful information from data.
We build algorithms that sift through databases automatically, seeking patterns.
Strong patterns, if found, will likely generalize to make accurate predictions on future data.
Algorithms need to be robust enough to cope with imperfect data and to extract patterns that are inexact but useful.
Machine learning provides the technical basis of data mining.
We will study simple machine learning methods, looking for patterns in data.
People have been seeking patterns in data since human life began. For example, with Samsung Pay on Samsung Galaxy phones, managers at Samsung want to find the spending patterns of users so that the company can provide personalized services.
In data mining, computer algorithms solve problems by analyzing data in databases.
Data mining is defined as the process of discovering patterns (knowledge) in data.
We have 100 fish and have measured their lengths. (e.g., fish: x = [length]^T)
Our algorithm can measure the length of a new fish, and estimate its label.
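As a minimal sketch of this idea, the snippet below learns a single length threshold from labeled fish and uses it to label a new fish. The lengths, the labels "A" and "B", and the assumption that class "A" is the shorter one are all invented for illustration; the slides do not give the actual data or the classifier used.

```python
# Hypothetical training data: (length in cm, label).
# Values and labels are invented for illustration only.
training = [
    (20.0, "A"), (22.5, "A"), (25.0, "A"), (27.0, "A"),
    (30.0, "B"), (33.5, "B"), (36.0, "B"), (40.0, "B"),
]


def predict(length, threshold):
    """Assumed rule: fish shorter than the threshold are "A", others "B"."""
    return "A" if length < threshold else "B"


def fit_threshold(samples):
    """Return the candidate threshold with the fewest training errors.

    Candidates are the midpoints between consecutive sorted lengths.
    """
    lengths = sorted(length for length, _ in samples)
    candidates = [(a + b) / 2 for a, b in zip(lengths, lengths[1:])]

    def errors(t):
        return sum(predict(length, t) != label for length, label in samples)

    return min(candidates, key=errors)


threshold = fit_threshold(training)
print("threshold:", threshold)
print("new fish of length 24.0 ->", predict(24.0, threshold))
print("new fish of length 35.0 ->", predict(35.0, threshold))
```

With real measurements the two classes usually overlap in length, so no threshold gives zero errors.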
Yes, this is a typical prediction task solved with a classification technique. However, with a single feature the result is often inexact and unsatisfactory.
Next, we measured their lightness. (e.g., fish: x=[lightness])
Let us use both lightness and width. (e.g., fish: x=[lightness, width])
Each fish is represented as a point (vector) in 2D x-y coordinate space.
Everything is represented as an N-dimensional vector in a coordinate space.
We assume that you have learned the basic concepts of linear algebra.
The objective is to find a line that effectively separates the two groups.
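One way to find such a line is the perceptron learning rule, sketched below. This is an illustrative choice, not necessarily the method used in the course. The [lightness, width] values and the labels -1/+1 are hypothetical. The learned line is w[0]*lightness + w[1]*width + b = 0.

```python
# Hypothetical 2D fish data: ([lightness, width], label), label in {-1, +1}.
data = [
    ([2.0, 3.0], -1), ([3.0, 2.5], -1), ([2.5, 4.0], -1), ([1.5, 3.5], -1),
    ([6.0, 6.5], +1), ([7.0, 5.5], +1), ([6.5, 7.0], +1), ([8.0, 6.0], +1),
]


def train_perceptron(samples, epochs=2000, lr=1.0):
    """Perceptron rule: nudge the line toward every misclassified point.

    Stops early once a full pass over the data makes no mistakes,
    which is guaranteed to happen only if the data is linearly separable.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            score = w[0] * x[0] + w[1] * x[1] + b
            if y * score <= 0:  # point is on the wrong side (or on the line)
                w[0] += lr * y * x[0]
                w[1] += lr * y * x[1]
                b += lr * y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def classify(x, w, b):
    """Return +1 or -1 depending on which side of the line x falls."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1


w, b = train_perceptron(data)
print("line: %.2f * lightness + %.2f * width + %.2f = 0" % (w[0], w[1], b))
print("new fish [2.0, 2.0] ->", classify([2.0, 2.0], w, b))
```

The perceptron returns some separating line, not the best one. Methods such as SVM or LDA choose the line by an explicit criterion.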
We can build a complex nonlinear boundary that separates the training data exactly.
This illustrates a predictive task of data mining, often called pattern classification, recognition, or prediction.
Pattern classification is the act of taking in raw data and taking an action based on the category of the pattern.
Q: How to represent and classify texts? • Opinion mining • Sentiment analysis
Another well-known task of data mining is the descriptive task. Cluster analysis is a well-known technique for discovering groups.
We will explore the basic issues in the prediction task (pattern classification) in the forthcoming weeks.
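To make group discovery concrete, here is a minimal K-Means sketch, one of the clustering algorithms listed in the AI Scope slide. The 2D points are hypothetical, and using the first k points as initial centers is a simplification chosen to keep the example deterministic. Practical implementations usually initialize randomly.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain K-Means: alternate an assignment step and a center update step."""
    # Simplifying assumption: the first k points serve as initial centers.
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        assignment = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, assignment


# Hypothetical unlabeled data: two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.5, 9.5), (2.0, 1.5), (9.5, 8.5)]
centers, assignment = kmeans(points, k=2)
print("centers:", centers)
print("assignment:", assignment)
```

No labels are used anywhere. The groups come only from the distances between points, which is what makes this a descriptive rather than a predictive task.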
Given a training data set as an 'n x d' pattern/data matrix:
• 'd' features (attributes, variables, dimensions, fields)
• 'n' patterns (objects, observations, vectors, records)
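A small sketch of such a pattern matrix, stored as a list of rows. The feature names and values are hypothetical. Each row is one pattern and each column is one feature.

```python
# Hypothetical 'n x d' pattern matrix: n = 3 patterns, d = 2 features.
feature_names = ["length", "width"]  # assumed names, for illustration
patterns = [
    [20.0, 3.0],  # pattern 1
    [25.0, 4.0],  # pattern 2
    [30.0, 5.0],  # pattern 3
]

n = len(patterns)     # number of patterns (rows)
d = len(patterns[0])  # number of features (columns)

# Reading a column gives all observed values of one feature.
columns = list(zip(*patterns))
feature_means = [sum(col) / n for col in columns]

print("n =", n, "d =", d)
for name, mean in zip(feature_names, feature_means):
    print("mean", name, "=", mean)
```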
Classification
• General description
  • Supervised pattern classification
  • Labeled training patterns, the groups are known a priori
  • Constructs rules for classifying new data into the known groups
• Specific terms
  • Pattern = object = observation, represented as a feature vector
  • Distance measure for numeric and categorical data
  • Training set (answer database) and test set (new observations)
  • Prediction performance by accuracy, sensitivity, specificity, …
  • e.g., Bayesian classifier, Nearest-neighbor classifier, SVM, NN, LDA, …
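The terms above can be tied together in a short sketch: a 1-nearest-neighbor classifier with Euclidean distance, a training set, a test set, and the three performance measures. All data values are hypothetical, and class 1 is treated as the "positive" class by assumption.

```python
def sq_dist(p, q):
    """Squared Euclidean distance (a distance measure for numeric data)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_neighbor(x, training_set):
    """Predict the label of the closest training pattern (1-NN)."""
    _, label = min(training_set, key=lambda item: sq_dist(x, item[0]))
    return label


# Hypothetical labeled training set (the "answer database").
train = [
    ([2.0, 3.0], 0), ([3.0, 2.5], 0), ([2.5, 4.0], 0),
    ([6.0, 6.5], 1), ([7.0, 5.5], 1), ([6.5, 7.0], 1),
]
# Hypothetical test set: new observations with known true labels.
test = [
    ([2.2, 3.1], 0), ([6.8, 6.0], 1), ([4.0, 4.5], 1),
    ([3.0, 3.0], 0), ([7.5, 6.5], 1),
]

tp = tn = fp = fn = 0
for x, truth in test:
    pred = nearest_neighbor(x, train)
    if truth == 1 and pred == 1:
        tp += 1
    elif truth == 0 and pred == 0:
        tn += 1
    elif truth == 0 and pred == 1:
        fp += 1
    else:
        fn += 1

# These ratios assume the test set has at least one pattern of each class.
accuracy = (tp + tn) / len(test)  # fraction of correct predictions
sensitivity = tp / (tp + fn)      # fraction of positives that were found
specificity = tn / (tn + fp)      # fraction of negatives that were found
print(accuracy, sensitivity, specificity)
```

Here the test pattern [4.0, 4.5] lies closer to the class 0 training patterns and is misclassified, which lowers accuracy and sensitivity but leaves specificity unaffected.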
The training pattern matrix is stored in a file or database.
Given labeled training patterns, the class groups are known a priori.