Learning From Data

Learning From Data Chichang Jou Tamkang University

Chapter Objectives • Analyze the general model of inductive learning • Explain how to select an approximating function • Introduce risk functional for regression and classification problems • Identify concepts in statistical learning theory • Discuss the differences of inductive principles, empirical risk minimization, and structural risk minimization • Discuss practical aspects of VC dimension • Compare inductive learning tasks using graphics • Introduce validation methods of inductive learning results

Background • Biological systems learn to cope with the unknown statistical nature of the environment in a data-driven fashion • Two phases of the predictive-learning process: • Learning or estimating unknown dependencies • Induction: progressing from particular cases to a model • Using estimated dependencies to predict • Deduction: progressing from a model and given input to particular cases

Induction, Deduction, Transduction Local estimation, like association rules

4.1 Learning machine • Machine learning algorithms vary in their goals, in the available training data sets, and in the learning strategies and representation of data • Inductive machine learning • A generalization of models is obtained from a set of samples

Observational setting of a learning machine • Real-world systems often have unmeasured inputs • The system is described by the conditional probability p(Y|X)

Inductive Learning machine • Tries to form generalizations from particular true facts (called the training data set) • Formalized as a set of functions that approximate a system’s behavior • Given input X, the machine implements a set of functions f(X, w), where w is a parameter of the function • Its solution requires a priori knowledge

Inductive Learning machine • The task of inductive inference • Given samples (xi, f(xi)), return a function h(x), called a hypothesis, that approximates f(x) • The hypothesis may be linear or non-linear
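The inference task on this slide can be sketched in a few lines. This is only an illustration: the target function `f`, the noise level, and the choice of polynomial hypotheses are assumptions made for the example, not part of the slides.

```python
import numpy as np

# Hypothetical target function f(x); in a real problem it is unknown.
def f(x):
    return np.sin(2.0 * x)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 40)
y = f(x) + rng.normal(scale=0.05, size=x.size)  # noisy samples (x_i, f(x_i))

# Linear hypothesis h(x) = w0 + w1 * x, fitted by least squares.
h_linear = np.poly1d(np.polyfit(x, y, deg=1))
# Non-linear hypothesis: a cubic polynomial in x.
h_cubic = np.poly1d(np.polyfit(x, y, deg=3))

def mse(h):
    """Mean squared error of hypothesis h on the training samples."""
    return float(np.mean((y - h(x)) ** 2))

mse_linear = mse(h_linear)
mse_cubic = mse(h_cubic)
print(f"linear MSE: {mse_linear:.4f}, cubic MSE: {mse_cubic:.4f}")
```

Because the linear hypotheses are a subset of the cubic ones, the cubic fit never has a larger training error; whether it generalizes better is a separate question.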

Inductive Learning machine • Statistical dependency vs. causality • Inductive-learning processes build a model of dependencies, but these should not be automatically interpreted as causal relations • Examples: people in Florida are on average older than in other states; married men live longer than single men

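Loss function and Risk function • Loss function L(y, f(X, w)): measures the difference between y and f(X, w) • Risk function R(w): the expected value of the loss, R(w) = ∫∫ L(y, f(X, w)) p(X, y) dX dy • Inductive learning is the process of estimating f(X, wopt), which minimizes R(w)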

Common Loss function
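The formulas from this slide did not survive in the transcript. As a minimal sketch, assuming the two usual choices (squared error for regression and 0-1 loss for classification), the loss and its sample average, the empirical risk, could be written as follows. All function names are hypothetical.

```python
import numpy as np

def squared_loss(y, y_hat):
    """Regression loss: (y - f(X, w))^2 for each sample."""
    return (np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)) ** 2

def zero_one_loss(y, y_hat):
    """Classification loss: 0 if the predicted class is correct, 1 otherwise."""
    return (np.asarray(y) != np.asarray(y_hat)).astype(float)

def empirical_risk(loss, y, y_hat):
    """Average loss over the sample, a finite-sample estimate of the risk."""
    return float(np.mean(loss(y, y_hat)))

reg_risk = empirical_risk(squared_loss, [1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
clf_risk = empirical_risk(zero_one_loss, [0, 1, 1, 0], [0, 1, 0, 0])
print(reg_risk, clf_risk)
```

The true risk is an expectation over the unknown distribution, so in practice only this sample average can be computed.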

Inductive principle • An inductive principle is a general prescription (what to do with the data) for obtaining an estimate f(X, wopt*) • Human intervention in the learning algorithm • Selection of input and output variables • Data encoding and representation • Incorporating a priori knowledge • Influence over the generator in terms of the sampling rate or distribution

4.2 Statistical Learning Theory • A formalized theory for finite-sample inductive learning, mainly for classification or pattern recognition • Provides a quantitative description of the trade-off between model complexity and the available information • Also called VC (Vapnik-Chervonenkis) theory • Other approaches are more engineering-oriented, without proofs and formalizations

Empirical risk minimization (ERM) • Typically used when the model is given or approximated first, and then its parameters are estimated from the data
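A minimal sketch of ERM, assuming a linear model f(X, w) = Xw and squared-error loss; under those assumptions minimizing the empirical risk is ordinary least squares. The data and the "true" parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# Design matrix: a bias column plus one input variable.
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=n)])
w_true = np.array([0.5, 2.0])  # hypothetical parameters of the unknown system
y = X @ w_true + rng.normal(scale=0.1, size=n)

def empirical_risk(w):
    """Mean squared error of the linear model f(X, w) = X @ w on the sample."""
    return float(np.mean((y - X @ w) ** 2))

# The model form is fixed first; only its parameters are estimated from data.
w_erm, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated w:", w_erm, "empirical risk:", empirical_risk(w_erm))
```

By construction, `w_erm` has an empirical risk no larger than that of any other parameter vector, including `w_true`.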

Empirical risk minimization (ERM) • The consistency property • Minimizing the empirical risk for a given data set will also minimize the true (expected) risk as the number of samples grows • Nontrivial consistency • The consistency requirement must hold for all approximating functions

Behavior of the Growth function G(n) Approximating functions in the form of G(n) will have a consistency property

Structural Risk Minimization (SRM) • ERM is good when n/h is large (n: number of training samples, h: VC dimension) • When n/h < 20, use SRM • Selecting an element of a structure having optimal complexity • Estimating the model based on the set of approximating functions defined in the selected element of the structure
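The two SRM steps can be sketched with a nested structure of polynomial models. Everything here is an assumption for illustration: the data, the use of polynomials, taking h = degree + 1 for a model that is linear in its parameters, and the penalization factor 1 / (1 - sqrt(p - p ln p + ln n / (2n))) with p = h/n, which is one practical form of a VC bound for regression found in the literature and is not given in the slides.

```python
import numpy as np

def vc_penalty(h, n):
    """Assumed penalization factor; returns inf when the bound is vacuous."""
    p = h / n
    root = np.sqrt(p - p * np.log(p) + np.log(n) / (2.0 * n))
    return 1.0 / (1.0 - root) if root < 1.0 else float("inf")

rng = np.random.default_rng(2)
n = 30
x = np.linspace(-1.0, 1.0, n)
y = x ** 3 - 0.5 * x + rng.normal(scale=0.1, size=n)  # hypothetical data

max_degree = 8
emp_risks, pen_risks = [], []
for degree in range(1, max_degree + 1):      # nested structure S1 ⊂ S2 ⊂ ...
    coeffs = np.polyfit(x, y, deg=degree)    # ERM inside one element
    r_emp = float(np.mean((y - np.polyval(coeffs, x)) ** 2))
    h = degree + 1                           # assumed VC dimension
    emp_risks.append(r_emp)
    pen_risks.append(r_emp * vc_penalty(h, n))

# Step 1: select the element with the smallest penalized risk.
best_degree = 1 + int(np.argmin(pen_risks))
# Step 2: estimate the model within the selected element.
best_model = np.poly1d(np.polyfit(x, y, deg=best_degree))
print("selected degree:", best_degree)
```

The empirical risk alone always favors the most complex element; the penalty grows with h/n and pushes the selection toward simpler models when samples are scarce.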

SRM in practice

SRM • Applications of SRM for non-linear approximations are difficult, and in many cases impossible • Use heuristics, like early stopping rules and weight initialization • Three optimization approaches • Stochastic approximation (gradient descent) • Iterative methods • Greedy optimization
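As a rough illustration of the first optimization approach, the sketch below runs plain batch gradient descent on the squared-error empirical risk of a linear model. The data, the step size, and the number of steps are hypothetical; a stochastic-approximation variant would update on one sample (or a small batch) per step instead of the full sample.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=n)])
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=n)  # hypothetical data

def risk(w):
    """Empirical risk (mean squared error) of the linear model X @ w."""
    return float(np.mean((y - X @ w) ** 2))

w = np.zeros(2)       # initial condition
learning_rate = 0.1   # hypothetical step size
history = [risk(w)]
for step in range(500):                      # fixed number of steps as the stopping rule
    grad = -2.0 * X.T @ (y - X @ w) / n      # gradient of the mean squared error
    w = w - learning_rate * grad
    history.append(risk(w))

print("final w:", w, "final risk:", history[-1])
```

This particular risk is convex, so the result does not depend on the starting point; for non-linear models the same loop can stop in different local minima depending on the initial `w` and the stopping rule.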

SRM • Problems with the optimization approaches • Too sensitive to initial conditions • Too sensitive to stopping rules • Too sensitive to many local minima • Two useful guidelines • Do not attempt to solve a problem by indirectly solving a harder general problem • Occam’s razor: The best performance is provided by a model of optimal complexity

Requirement of any inductive-learning process

Types of Learning Methods • Supervised learning • Examples: logistic regression, multilayer perceptron, decision rules, decision trees, etc. • Unsupervised learning • Emphasis on a task-independent measure of quality of representation • Examples: cluster analysis, artificial neural networks, association rules

Common Learning Tasks • Classification • Regression • Clustering • Summarization (Formalized Description) • Dependency-modeling • Deviation Detection (Outlier, Changes in time)

Data-mining and Knowledge-discovery techniques • Statistical Methods • Cluster Analysis • Decision Trees and Decision Rules • Association Rules • Artificial Neural Network • Genetic Algorithms • Fuzzy Inference Systems • N-dimensional Visualization Methods

4.5 Model Estimation

Testing

Objective of Testing

How to Split Samples

Common Resampling Methods • Resubstitution Method • Holdout Method • Leave-one-out Method • Rotation Method • Bootstrap Method
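The resampling methods listed here differ only in how sample indices are divided into training and testing sets. The helpers below are a hypothetical sketch: the rotation method is taken to be k-fold cross-validation, leave-one-out is its special case k = n, and testing the bootstrap on the samples left out of the draw is an assumption. Resubstitution needs no helper, since it trains and tests on all n samples.

```python
import numpy as np

def holdout_split(n, test_fraction=0.3, seed=0):
    """Holdout: one random split into training and testing indices."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(test_fraction * n))
    return idx[n_test:], idx[:n_test]

def rotation_splits(n, k, seed=0):
    """Rotation (k-fold): each fold is the test set once; k == n is leave-one-out."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def bootstrap_sample(n, seed=0):
    """Bootstrap: draw n indices with replacement; test on the indices not drawn."""
    train = np.random.default_rng(seed).integers(0, n, size=n)
    test = np.setdiff1d(np.arange(n), train)
    return train, test

train_idx, test_idx = holdout_split(10)
loo = list(rotation_splits(5, k=5))   # leave-one-out on 5 samples
boot_train, boot_test = bootstrap_sample(20)
print(len(train_idx), len(test_idx), len(loo))
```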

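Error rate, Accuracy • Error rate: R = E / S, where E is the number of errors and S is the number of samples in the testing set • Accuracy: A = 1 – R = (S – E) / S • Two classes • False Negative: False Reject Rate (FRR) • False Positive: False Acceptance Rate (FAR) • More than two classes • Confusion matrix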
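A short worked example of these measures for the two-class case. It assumes labels are coded 1 (positive, should be accepted) and 0 (negative), that FRR is the fraction of positives rejected, and that FAR is the fraction of negatives accepted. The function name and data are hypothetical.

```python
import numpy as np

def binary_rates(y_true, y_pred):
    """Error rate, accuracy, FRR and FAR for labels coded as 1 / 0."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    S = y_true.size                                            # samples tested
    E = int(np.sum(y_true != y_pred))                          # errors
    false_neg = int(np.sum((y_true == 1) & (y_pred == 0)))
    false_pos = int(np.sum((y_true == 0) & (y_pred == 1)))
    R = E / S
    return {
        "error_rate": R,
        "accuracy": 1.0 - R,
        "FRR": false_neg / int(np.sum(y_true == 1)),
        "FAR": false_pos / int(np.sum(y_true == 0)),
    }

rates = binary_rates(
    y_true=[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    y_pred=[1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
)
print(rates)
```

Here E = 3 and S = 10, so R = 0.3 and A = 0.7; one of four positives is rejected (FRR = 0.25) and two of six negatives are accepted (FAR = 1/3).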

Confusion matrix for three classes
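The table from this slide is not in the transcript. As a sketch of how such a matrix is built, the example below counts (true class, predicted class) pairs for three classes. Putting true classes in rows and predicted classes in columns is an assumed convention, and the labels are made up.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]   # hypothetical true classes
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]   # hypothetical predictions
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = float(np.trace(cm) / cm.sum())  # correct predictions lie on the diagonal
print(cm)
print("accuracy:", accuracy)
```

Off-diagonal entries show which classes are confused with which, information that a single error rate hides.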

Receiver Operating Characteristic (ROC) Curve • To evaluate FAR and FRR at the same time • The ROC curve plots sensitivity (1 – FRR) on the vertical axis against 1 – specificity (FAR) on the horizontal axis
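A minimal sketch of how ROC points can be computed, taking sensitivity as 1 – FRR (the fraction of positives accepted) and 1 – specificity as FAR (the fraction of negatives accepted). It assumes the classifier outputs a score and accepts a sample when the score reaches a threshold; the scores and labels are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (FAR, sensitivity) pairs as the acceptance threshold is lowered."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Start above every score (nothing accepted), then step down through the scores.
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1]))
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    far, sens = [], []
    for t in thresholds:
        accept = scores >= t
        far.append(np.sum(accept & (labels == 0)) / n_neg)   # FAR = 1 - specificity
        sens.append(np.sum(accept & (labels == 1)) / n_pos)  # sensitivity = 1 - FRR
    return np.array(far), np.array(sens)

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]   # hypothetical classifier scores
labels = [1, 1, 0, 1, 0, 0]               # 1 = positive, 0 = negative
far, sensitivity = roc_points(scores, labels)
for fa, se in zip(far, sensitivity):
    print(f"FAR={fa:.2f}  sensitivity={se:.2f}")
```

Each threshold gives one operating point; lowering it accepts more samples, so both FAR and sensitivity can only increase along the curve.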
