Machine Learning: Regression or Classification? J.-S. Roger Jang (張智星) jang@mirlab.org http://mirlab.org/jang MIR Lab, CSIE Dept. National Taiwan University
GHW Dataset • GHW: Gender-height-weight • Source • A table of 10000 entries with 3 columns • Gender: Categorical • Height: Numerical • Weight: Numerical
Multiple Ways of Prediction • Multiple ways of prediction based on the GHW dataset • Classification: Height, weight → gender • Regression: Gender, height → weight • Regression: Gender, weight → height • In general • Classification: the output is categorical • Regression: the output is numerical • Sometimes it's not so obvious!
K-Nearest Neighbor Classifiers(KNNC) J.-S. Roger Jang (張智星) jang@mirlab.org http://mirlab.org/jang MIR Lab, CSIE Dept. National Taiwan University
Concept of KNNC Quiz! • Concept: a point takes on the class of the company it keeps (近朱者赤、近墨者黑: "near vermilion one turns red; near ink one turns black") • Steps: • Find the k nearest neighbors of a given point. • Determine the class of the given point by voting among the k nearest neighbors. • Figure: class-A and class-B points on the feature-1 vs. feature-2 plane; the 3 nearest neighbors of a point of unknown class are found, and the point is classified as B via 3NNC.
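The two steps above can be sketched in a few lines of stdlib Python (the training points are made up for illustration):

```python
import math
from collections import Counter

def knn_classify(x, train, k=3):
    """train: list of (feature_vector, label) pairs; x: query feature vector."""
    # Step 1: find the k nearest neighbors (Euclidean distance)
    neighbors = sorted(train, key=lambda p: math.dist(x, p[0]))[:k]
    # Step 2: majority vote among the neighbors' labels
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
knn_classify((4, 4), train, k=3)   # -> "B"
```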
Flowchart for KNNC • General flowchart of PR vs. the KNNC instance: • Feature extraction: from raw data to features • Model construction: clustering (optional) • Model evaluation: KNNC evaluation on the test dataset
Decision Boundary for 1NNC • Voronoi diagram: piecewise linear boundary Quiz!
Characteristics of KNNC • Strengths of KNNC • Intuitive • No computation for model construction • Weaknesses of KNNC • Massive computation required when the dataset is big • No straightforward way • To determine the value of k • To rescale the dataset along each dimension Quiz!
Preprocessing/Variants for KNNC Quiz! • Preprocessing: • Z-normalization to have zero mean and unit variance along each feature • Value of k obtained via trial and error • Variants: • Weighted votes • Nearest-prototype classification • Edited nearest neighbor classification • k+k-nearest neighbor
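The z-normalization step can be sketched as follows (stdlib only; `pstdev` is the population standard deviation, so each column ends up with mean 0 and unit population variance):

```python
import statistics

def z_normalize(data):
    """Scale each feature (column) to zero mean and unit variance."""
    cols = list(zip(*data))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in data]

data = [(160, 50), (170, 60), (180, 70)]
normed = z_normalize(data)   # each column now has mean 0 and unit std
```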
Demos by Cleve Moler • Cleve's demos of Delaunay triangulation and Voronoi diagrams • books/dcpr/example/cleve/vshow.m
1NNC Decision Boundaries
Using Prototypes in KNNC • Number of prototypes per class: 4
Decision Boundaries of Different Classifiers • Quadratic classifier • 1NNC • Naive Bayes classifier
Naive Bayes Classifiers(NBC) J.-S. Roger Jang (張智星) jang@mirlab.org http://mirlab.org/jang MIR Lab, CSIE Dept. National Taiwan University
Assumptions & Characteristics • Assumptions: • Statistical independence between features • Statistical independence between samples • Each feature governed by a feature-wise parameterized PDF (usually a 1D Gaussian) • Characteristics: • Simple and easy (that's why it's called "naive") • Highly successful in real-world applications despite the strong assumptions
Training and Test Stages of NBC Quiz! • Training stage • Identify the PDF of each class, as follows: • Identify each feature's PDF by MLE for 1D Gaussians • The class PDF is the product of the corresponding feature PDFs: p(x|c) = ∏ᵢ p(xᵢ|c) • Test stage • Assign a sample x to the class by taking the class prior into consideration: c* = arg maxᶜ P(c) p(x|c)
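A minimal Gaussian NBC along the lines of the two stages above (pure Python; `pvariance` is the biased MLE variance; the toy 2-class, 2-feature data is made up for illustration):

```python
import math
import statistics
from collections import defaultdict

class NaiveBayes:
    """Gaussian naive Bayes: one 1D Gaussian per (class, feature), fit by MLE."""

    def fit(self, X, y):
        self.priors, self.params = {}, {}
        by_class = defaultdict(list)
        for xi, yi in zip(X, y):
            by_class[yi].append(xi)
        for c, pts in by_class.items():
            self.priors[c] = len(pts) / len(y)
            cols = list(zip(*pts))
            # MLE per feature: sample mean and biased (divide-by-n) variance
            self.params[c] = [(statistics.mean(col), statistics.pvariance(col))
                              for col in cols]
        return self

    def predict(self, x):
        def log_score(c):   # log P(c) + sum of per-feature log-likelihoods
            s = math.log(self.priors[c])
            for v, (m, var) in zip(x, self.params[c]):
                s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            return s
        return max(self.priors, key=log_score)

X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2),
     (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
y = ["A", "A", "A", "B", "B", "B"]
nbc = NaiveBayes().fit(X, y)
nbc.predict((1.1, 2.0))   # -> "A"
```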
NBC for Iris Dataset (1/3) Scatter plot of the Iris dataset (using only the last two dimensions) PDF of each feature for each class
NBC for Iris Dataset (2/3) PDF for each class
NBC for Iris Dataset (3/3) Decision boundaries (using the last 2 features)
Strengths and Weaknesses of NBC Quiz! • Strengths • Fast computation during training and evaluation • More robust than QC • Weaknesses • Not able to deal with bi-modal data correctly • Class boundaries not as flexible as QC's
Quadratic Classifiers(QC) J.-S. Roger Jang (張智星) jang@mirlab.org http://mirlab.org/jang MIR Lab, CSIE Dept. National Taiwan University
Review: PDF Modeling • Goal: • Find a PDF (probability density function) that best describes a given dataset • Steps: • Select a class of parameterized PDFs • Identify the parameters via MLE (maximum likelihood estimation) based on a given set of sample data • Commonly used PDFs: • Multi-dimensional Gaussian PDF • Gaussian mixture models (GMM)
PDF Modeling for Classification • Procedure for classification based on PDFs • Training stage: PDF modeling of each class based on the training dataset • Test stage: for each entry in the test dataset, pick the class with the maximum PDF value • Commonly used classifiers: • Quadratic classifiers, with D-dim. Gaussian PDFs • Gaussian-mixture-model classifiers, with GMM PDFs
1D Gaussian PDF Modeling • 1D Gaussian PDF: g(x; μ, σ²) = (1/√(2πσ²)) exp(−(x−μ)²/(2σ²)) • MLE of μ (mean) and σ² (variance): μ = (1/n) Σᵢ xᵢ, σ² = (1/n) Σᵢ (xᵢ − μ)²
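The MLE formulas above correspond directly to the sample mean and the biased (divide-by-n) variance; a quick check with made-up data:

```python
import math
import statistics

def gauss_pdf(x, mu, var):
    """1D Gaussian PDF g(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

data = [4.8, 5.1, 5.0, 4.9, 5.2]       # illustrative sample
mu = statistics.mean(data)              # MLE of the mean
var = statistics.pvariance(data)        # MLE of the variance (divide by n)
# The fitted PDF peaks at the sample mean
```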
D-dim. Gaussian PDF Modeling • D-dim. Gaussian PDF: g(x; μ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp(−½ (x−μ)ᵀ Σ⁻¹ (x−μ)) • MLE of μ (mean vector) and Σ (covariance matrix): μ = (1/n) Σᵢ xᵢ, Σ = (1/n) Σᵢ (xᵢ − μ)(xᵢ − μ)ᵀ
2D Gaussian PDF Contours • PDF and contours of a bivariate normal distribution
Training and Test Stages of QC Quiz! • Training stage • Identify the Gaussian PDF of each class via MLE • Test stage • Assign a sample point x to a class by taking the class prior into consideration: c* = arg maxᶜ P(c) g(x; μᶜ, Σᶜ)
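A sketch of both stages for the 2D case, in pure Python (the 2×2 inverse is written out by hand; the two point clouds are made up; equal priors are assumed):

```python
import math

def fit_gaussian(points):
    """MLE of the mean vector and 2x2 covariance matrix for 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

def log_gauss2d(x, mean, cov):
    """Log of the 2D Gaussian PDF g(x; mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det   # 2x2 inverse
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    mahal = dx * (ia * dx + ib * dy) + dy * (ic * dx + id_ * dy)
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * mahal

def qc_predict(x, models, priors):
    """Pick the class maximizing log P(c) + log g(x; mu_c, Sigma_c)."""
    return max(models, key=lambda c: math.log(priors[c])
                                     + log_gauss2d(x, *models[c]))

models = {"A": fit_gaussian([(0, 0), (1, 0), (0, 1), (1, 1)]),
          "B": fit_gaussian([(4, 4), (5, 4), (4, 5), (5, 5)])}
priors = {"A": 0.5, "B": 0.5}
qc_predict((0.6, 0.4), models, priors)   # -> "A"
```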
Characteristics of QC Quiz! • If each class is modeled by a Gaussian PDF, the decision boundary between any two classes is a quadratic function. • That is why it is called a quadratic classifier. • How to prove it?
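A sketch of the proof asked for above: equate the two classes' log-scores and expand the Gaussian log-PDFs.

```latex
% Boundary between classes i and j: equal (prior-weighted) log-likelihoods
\begin{aligned}
&\log P(c_i)\,g(\mathbf{x};\boldsymbol{\mu}_i,\Sigma_i)
 = \log P(c_j)\,g(\mathbf{x};\boldsymbol{\mu}_j,\Sigma_j) \\
\Longleftrightarrow\;
&-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{\top}\Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)
 +\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_j)^{\top}\Sigma_j^{-1}(\mathbf{x}-\boldsymbol{\mu}_j)
 = \tfrac{1}{2}\log\frac{|\Sigma_i|}{|\Sigma_j|}-\log\frac{P(c_i)}{P(c_j)}
\end{aligned}
```

The left-hand side is quadratic in x, with quadratic term ½ xᵀ(Σⱼ⁻¹ − Σᵢ⁻¹)x plus linear terms, so the boundary is a quadratic surface; if Σᵢ = Σⱼ, the quadratic term cancels and the boundary degenerates to a linear one.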
QC on Iris Dataset (1/3) • Scatter plot of Iris dataset (with only the last two dim.)
QC on Iris Dataset (2/3) • PDF for each class:
QC on Iris Dataset (3/3) • Decision boundaries among classes:
Strengths and Weaknesses of QC Quiz! • Strengths • Easy computation when the dimension d is small • Efficient way to compute leave-one-out cross validation • Weaknesses • The covariance matrix (d by d) is big when the dimension d is moderately large • The inverse of the covariance matrix may not exist • Cannot handle bi-modal data correctly
Modified Versions of QC Quiz! • How to modify QC so that it need not deal with a big covariance matrix? • Make the covariance matrix diagonal → equivalent to naive Bayes classifiers (proof?) • Make the covariance matrix a constant times the identity matrix
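The diagonal-covariance equivalence can be checked numerically: with Σ = diag(σ₁², σ₂²), the QC log-likelihood decouples into a sum of per-feature 1D Gaussian log-likelihoods, which is exactly the NBC class log-PDF (sample point and parameters below are made up):

```python
import math

# Hypothetical 2D sample and per-feature Gaussian parameters for one class
x = (1.3, 0.7)
means = (1.0, 0.5)
variances = (0.04, 0.09)

# NBC view: sum of per-feature 1D Gaussian log-likelihoods
nbc = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
          for xi, m, v in zip(x, means, variances))

# QC view with diagonal covariance diag(variances):
# log|Sigma| = sum of log variances; the Mahalanobis term decouples per feature
logdet = sum(math.log(v) for v in variances)
mahal = sum((xi - m) ** 2 / v for xi, m, v in zip(x, means, variances))
qc = -math.log(2 * math.pi) - 0.5 * logdet - 0.5 * mahal

# The two log-likelihoods coincide, so diagonal-covariance QC = NBC
```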
Performance Evaluation: Accuracy Estimate for Classification and Regression J.-S. Roger Jang (張智星) jang@mirlab.org http://mirlab.org/jang MIR Lab, CSIE Dept. National Taiwan University
Introduction to Performance Evaluation • Performance evaluation: an objective procedure to derive the performance index (or figure of merit) of a given model • Typical performance indices • Accuracy • Recognition rate: ↗ (larger is better) • Error rate: ↘ (smaller is better) • RMSE (root mean squared error): ↘ (smaller is better) • Computation load: ↘ (smaller is better)
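Two of the indices above, sketched directly from their definitions (stdlib only; the sample values are illustrative):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: smaller is better."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

def accuracy(y_true, y_pred):
    """Recognition rate: larger is better."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

rmse([1, 2, 3], [1, 2, 5])                 # -> sqrt(4/3), about 1.155
accuracy(["M", "F", "M"], ["M", "F", "F"])  # -> 2/3
```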
Synonyms • Sets of synonyms to be used interchangeably (since we are focusing on classification): • Classifiers, models • Recognition rate, accuracy • Training, design-time • Test, run-time
Performance Indices for Classifiers Quiz! • Performance indices of a classifier • Recognition rate • How to derive it via an objective method or procedure • Computation load • Design-time computation (training) • Run-time computation (test) • Our focus • Recognition rate and the procedures to derive it • The estimated accuracy depends on • Dataset partitioning • Model (type and complexity)
Methods for Performance Evaluation • Methods to derive the recognition rates • Inside test (resubstitution accuracy) • One-side holdout test • Two-side holdout test • M-fold cross validation • Leave-one-out cross validation
Dataset Partitioning • Data partitioning: to make the best use of the dataset • Training set only (for model construction) • Training and test sets (training for model construction, test for model evaluation) • Training, validating, and test sets (the validating set is for model selection) • All sets are disjoint!
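A minimal sketch of the three-way split (the 60/20/20 fractions are illustrative, not prescribed by the slides):

```python
import random

def split(dataset, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle, then cut into disjoint training/validating/test sets."""
    data = dataset[:]
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train_set, val_set, test_set = split(list(range(10)))
# The three sets are disjoint and together cover the whole dataset
```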
Why Is It So Complicated? • To avoid overfitting! • Overfitting in regression • Overfitting in classification
Inside Test (1/2) • Dataset partitioning • Use the whole dataset for both training and evaluation • Recognition rate (RR) • Inside-test or resubstitution recognition rate
Inside Test (2/2) • Characteristics • Too optimistic, since the RR tends to be higher than the true RR • For instance, 1NNC always has an inside-test RR of 100%! • Can be used as an upper bound of the true RR • Potential reasons for a low inside-test RR: • Bad features in the dataset • Bad method for model construction, such as • Bad results from neural network training • Bad results from k-means clustering
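The 1NNC claim above is easy to verify: under an inside test, each point's nearest neighbor is itself (distance 0), so the resubstitution RR is always 100%. A tiny check with made-up points:

```python
import math

def one_nn(x, train):
    """1NNC: return the label of the nearest training point."""
    return min(train, key=lambda p: math.dist(x, p[0]))[1]

train = [((1, 1), "A"), ((2, 1), "A"), ((5, 5), "B"), ((6, 6), "B")]
# Inside test: evaluate on the very same data used for "training";
# every point finds itself at distance 0, hence a perfect score
inside_rr = sum(one_nn(x, train) == y for x, y in train) / len(train)  # -> 1.0
```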
One-side Holdout Test (1/2) • Dataset partitioning • Training set for model construction • Test set for performance evaluation • Recognition rate • Inside-test RR • Outside-test RR