320 likes | 436 Views
Learning From Data. Chichang Jou Tamkang University. Chapter Objectives. Analyze the general model of inductive learning Explain how to select an approximating function Introduce risk functional for regression and classification problems Identify concepts in statistical learning theory
E N D
Learning From Data Chichang Jou Tamkang University
Chapter Objectives • Analyze the general model of inductive learning • Explain how to select an approximating function • Introduce risk functional for regression and classification problems • Identify concepts in statistical learning theory • Discuss the differences of inductive principles, empirical risk minimization, and structural risk minimization • Discuss practical aspects of VC dimension • Compare inductive learning tasks using graphics • Introduce validation methods of inductive learning results
Background • Biological systems learn to cope with the unknown, statistical environment in a data-driven fashion • Two-phases of predictive-learning process: • Learning or estimating unknown dependencies • Induction: progressing from particular cases to a model • Using estimated dependencies to predict • Deduction: progressing from a model and given input to particular cases
Induction, Deduction, Transduction Local estimation, like association rules
4.1 Learning machine • Machine learning algorithms vary in their goals, in the available training data sets, and in the learning strategies and representation of data • Inductive machine learning • A generalization of models is obtained from a set of samples
Observational setting of a Learning machine Real-worlds systems often have un-measured inputs Conditional probability p(Y/X)
Inductive Learning machine • Try to form generalizations from particular true facts (called training data set). • Formalized as a set of functions that approximate a system’s behavior • Given X as an input, implementing a set of functions f(X, w), w is a parameter of the function • Its solution requires a priori knowledge
Inductive Learning machine • The task of inductive inference • Given samples (xi, f(xi)), return a function h(x), called hypothesis, that approximate f(x) linear non-linear
Inductive Learning machine • Statistical dependency vs. causality • Inductive-learning processes build the model of dependencies, but they should not be automatically interpreted as causality relations • Example: people in Florida are on average older than in other states. Married mnn live longer than single men.
Loss function and Risk function • L(y, f(X,w)) • Measures the difference between y and f(X,w) • Induction learning is the process of estimating f(X,wopt), which minimizes R(w)
Inductive principle • An inductive principle is a general prescription (what to do with the data) for obtaining an estimate f(X, wopt*) • Human intervention inthe learning algorithm • Selection of input and output variables • Data encoding and representation • Incorporating a priori knowledge • Influence over the generator of the sampling rate or distribution
4.2 Statistical Learning Method • A formalized theory for finite-sample inductive learning, mainly for classification or pattern recognition • Provide quantitative description of the trade-off between model complexity and the available information • Also called VC (Vapnik-Chervonenkis) theory • Other approaches are more engineering-oriented, without proofs and formalizations
Empirical risk minimization (ERM) Typically used when the model is given or approximated first, and then its parameters are estimated from the data
Empirical risk minimization (ERM) • The consistency property • Minimizing one risk for a given data set will also minimize the other risk • Nontrivial Consistency • Consistency requirement must hold for all approximating functions
Behavior of the Growth function G(n) Approximating functions in the form of G(n) will have a consistency property
Structural Risk Minimization (SRM) • ERM is good when n/h is large • When n/h < 20, use SRM • Selecting an element of a structure having optimal complexity • Estimating the model based on the set of approximating functions defined in the selected element of the structure
SRM • Applications of SRM for non-linear approximations are difficult, impossible in many cases • use heuristics, like early stopping rules and weight initialization • Three optimization approaches • Stochastic approximation (gradient descent) • Iterative methods • Greedy optimization
SRM • Problems with the optimization approaches • Too sensitive to initial conditions • Too sensitive to stopping rules • Too sensitive to many local minima • Two useful guidelines • Do not attempt to solve a problem by indirectly solving a harder general problem • Occam’s razor: The best performance is provided by a model of optimal complexity
Types of Learning Methods Examples: logistic regression, multilayered perception, decision rules, decision trees, etc. Emphasis on a task-independent measure of quality of representation. Examples: cluster analysis, artificial neural network, association rules
Common Learning Tasks • Classification • Regression • Clustering • Summarization (Formalized Description) • Dependency-modeling • Deviation Detection (Outlier, Changes in time)
Data-mining and Knowledge-discovery techniques • Statistical Methods • Cluster Analysis • Decision Trees and Decision Rules • Association Rules • Artificial Neural Network • Genetic Algorithms • Fuzzy Inference Systems • N-dimensional Visualization Methods
Common Resampling Methods • Resubstitution Method • Holdout Method • Leave-one-out Method • Rotation Method • Bootstrap Method
Error rate, Accuracy • R= E / S • A = 1 – R = (S – E) / S • Two classes • False Negative: False Reject Rate (FRR) • False Positive: False Acceptance Rate (FAR) • More than two classes • Confusion matrix
Receiver Operating Characteristic (ROC) Curve • To evaluate FAR and FRR at the same time • The following ROC shows sensitivity (FAR) vs. 1-specificity (1-FRR) FAR