Commonly Used Classification Techniques and Recent Developments Presented by Ke-Shiuan Lynn
Terminology
[Figure: block diagram — Input Vector (Feature) → Classifier → Output (Class)]
A classifier can be viewed as a functional block: it assigns one class to each point of the input space. The input space is thus partitioned into disjoint subsets, called decision regions, each associated with a class.
Terminology (cont.)
[Figure: scatter plot over Input #1 / Input #2 showing inputs of class A and class B, with decision boundaries separating the decision regions]
The way a classifier classifies inputs is defined by its decision regions. The borderlines between decision regions are called decision-region boundaries, or simply decision boundaries.
Terminology (cont.)
In practice, input vectors of different classes are rarely so neatly distinguishable: samples of different classes may share the same input vector. Due to such uncertainty, areas of the input space can be clouded by a mixture of samples of different classes.
[Figure: scatter plot over Input #1 / Input #2 with overlapping class samples]
Terminology (cont.)
• The optimal classifier is the one expected to produce the least number of misclassifications.
• Such misclassifications are due to uncertainty in the problem rather than a deficiency in the decision regions.
Terminology (cont.)
• A designed classifier is said to generalize well if it achieves similar classification accuracy on both training samples and real-world samples.
Types of Models
• Decision-Region Boundaries
• Probability Density Functions
• Posterior Probabilities
Decision-Region Boundaries
• This type of model defines decision regions by explicitly constructing boundaries in the input space.
• These models attempt to minimize the expected number of misclassifications by placing boundaries appropriately in the input space.
Probability Density Functions (PDFs)
• Models of this type attempt to construct, for each class C, a class-conditional probability density function p(x|C) over points x in the input space.
• The prior probabilities p(C) are estimated from the given database.
• This model assigns the most probable class to an input vector x by selecting the class that maximizes p(C)p(x|C).
Posterior Probabilities
• Let there be m possible classes, denoted C1, C2, …, Cm. This type of model attempts to generate m posterior probabilities p(Ci|x), i = 1, 2, …, m, for any input vector x.
• Classification is made by assigning the input vector to the class with the maximal p(Ci|x).
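As a concrete illustration (not from the original slides), the minimal Python sketch below shows how the PDF and posterior-probability model types connect: posteriors follow from priors and class-conditional densities by normalization, and classification picks the argmax. The function names and toy densities are illustrative assumptions.

```python
# A sketch linking the PDF and posterior model types:
# p(Ci|x) = p(Ci) p(x|Ci) / sum_j p(Cj) p(x|Cj)
import numpy as np
from scipy.stats import norm

def posteriors(x, priors, densities):
    """priors: list of p(Ci); densities: callables with densities[i](x) = p(x|Ci)."""
    joint = np.array([p * d(x) for p, d in zip(priors, densities)])
    return joint / joint.sum()

def classify(x, priors, densities):
    """Assign x to the class with the maximal posterior p(Ci|x)."""
    return int(np.argmax(posteriors(x, priors, densities)))

# toy example: two equally likely classes with unit-variance Gaussian densities
print(classify(1.0, [0.5, 0.5], [norm(0, 1).pdf, norm(3, 1).pdf]))  # -> 0
```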
Approaches to Modeling
• Fixed models
• Parametric models
• Nonparametric models
Fixed Models
Fixed models are used when the exact input-output relationship is known.
• Decision-region boundary: a known threshold value (e.g., a particular BMI value for defining obesity)
• PDF: when each class's PDF can be obtained
• Posterior probability: when the probability that any observation belongs to each class is known
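A toy sketch of a fixed model in the spirit of the BMI example above; the cutoff of 30 (a common WHO obesity threshold) is used here only for illustration.

```python
# Toy fixed model: the decision boundary is a known threshold, so nothing
# is learned from data. The cutoff BMI >= 30 is an illustrative assumption.
def classify_bmi(weight_kg: float, height_m: float, threshold: float = 30.0) -> str:
    bmi = weight_kg / height_m ** 2          # body-mass index
    return "obese" if bmi >= threshold else "not obese"

print(classify_bmi(95.0, 1.75))  # -> 'obese'
```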
Parametric Models
• A parametric model is used when a parametric mathematical form for the classifier can be derived.
• The development process of such models consists of two stages: (1) derive an appropriate parametric form, and (2) tune the parameters to fit the data.
Parametric Models (cont.)
• Decision-region boundary: linear discriminant function, e.g., y = ax1 + bx2 + cx3 + d
• PDF: multivariate Gaussian function
• Posterior probability: logistic regression
Nonparametric Models
• Nonparametric models are used when the relationships between input vectors and their associated classes are not well understood.
• Models of varying smoothness and complexity are generated, and the one with the best generalization is chosen.
Nonparametric Models (cont.)
• Decision-region boundary: learning vector quantization (LVQ), k-nearest-neighbor classifier, decision tree
• PDF: Gaussian mixture methods, Parzen's window
• Posterior probability: artificial neural network (ANN), radial basis function (RBF), group method of data handling (GMDH)
Practical Constraints • Memory usage • Training time • Classification time
Comparison of Algorithms
Linear regression: y = w0 + w1x1 + w2x2 + … + wNxN
Logistic regression: y = 1 / (1 + e^-(w0 + w1x1 + … + wNxN))
• Linear and logistic regression both explicitly construct the decision-region boundaries.
• Advantages: easy implementation; the input-output relationship is easy to explain.
• Disadvantages: limited complexity of the constructed boundary.
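For illustration, a minimal logistic-regression sketch using scikit-learn; the library choice and the toy make_blobs data are assumptions, not something the slides prescribe.

```python
# Illustrative logistic-regression classifier (library and data are assumed)
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=200, centers=2, random_state=0)  # toy 2-class data
clf = LogisticRegression().fit(X, y)
# the fitted w0 (intercept) and w1..wN (coefficients) define a linear boundary
print(clf.intercept_, clf.coef_)
print(clf.predict_proba(X[:1]))  # logistic output: class probabilities
```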
Comparison of Algorithms (cont.)
Binary decision tree:
[Figure: binary tree with a root node and successive splits xi ≥ c1 / xi < c1, xj ≥ c2 / xj < c2, xk ≥ c3 / xk < c3]
• Binary and linear decision trees also explicitly construct the decision-region boundaries.
• Advantages: easy implementation; the input-output relationship is easy to explain.
• Disadvantages: limited complexity of the constructed boundary; the tree structure may not be globally optimal.
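A matching sketch of a binary decision tree, again with scikit-learn on assumed toy data; export_text prints the learned axis-parallel splits, mirroring the diagram above.

```python
# Illustrative binary decision tree on assumed toy data
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree))  # axis-parallel splits of the form x_i <= c
```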
Comparison of Algorithms (cont.)
Neural network:
• Feedforward neural networks and radial-basis-function networks both implicitly construct the decision-region boundaries.
• Advantages: both can approximate arbitrarily complex decision boundaries, provided enough nodes are used.
• Disadvantages: long training time.
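A minimal feedforward-network sketch (scikit-learn's MLPClassifier, assumed purely for illustration); one hidden layer of 16 nodes stands in for "enough nodes".

```python
# Illustrative feedforward network; more hidden nodes allow more complex
# boundaries, at the cost of the longer training time noted above
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)
print(net.predict(X[:5]))
```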
Comparison of Algorithms (cont.)
Support vector machine:
• The support vector machine also implicitly constructs the decision-region boundaries.
• Advantages: this type of classifier has been shown to have good generalization capability.
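A minimal SVM sketch with an RBF kernel (scikit-learn's SVC, an assumed choice); the boundary is defined implicitly by the support vectors.

```python
# Illustrative SVM with an RBF kernel on assumed toy data
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
svm = SVC(kernel="rbf", C=1.0).fit(X, y)
# the decision boundary is defined implicitly by the support vectors
print(len(svm.support_vectors_), svm.predict(X[:5]))
```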
Comparison of Algorithms (cont) Bay’s Rule: Unimodal Gaussian: • Unimodal Gaussian explicitly construct the PDF, compute the prior probability P(Cj) and posterior probability P(Cj|X). • Advantages: Easy implementation, confidence level can be obtained from the posterior probabilities. • Disadvantages: Sample distributions may not be Gaussian.
Comparison of Algorithms (cont.)
• Gaussian mixtures modify the unimodal Gaussian in that the PDF is estimated by a weighted average of multiple Gaussians.
• Similarly, Parzen's windows approximate the PDF using a weighted average of radial Gaussian kernels.
• Advantage: given enough Gaussian components, the above architectures can approximate arbitrarily complex distributions.
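A minimal sketch of both density estimators, assuming scikit-learn's GaussianMixture and KernelDensity (a Parzen-window estimator with Gaussian kernels); the data and bandwidth are illustrative.

```python
# Gaussian mixture vs. Parzen window density estimates (scikit-learn assumed)
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # weighted Gaussians
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X)  # Parzen window
x = np.array([[2.0, 2.0]])
print(gmm.score_samples(x), kde.score_samples(x))  # log p(x) under each model
```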
Comparison of Algorithms (cont.)
K-nearest-neighbor classifier:
• The k-nearest-neighbor classifier constructs the posterior probabilities P(Cj|x).
• Advantages: no training is required; a confidence level can be obtained.
• Disadvantages: classification accuracy is low if a complex decision-region boundary exists; large storage is required.
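A minimal k-nearest-neighbor sketch (scikit-learn, assumed); the fraction of the k neighbors in each class acts as the confidence estimate.

```python
# Illustrative k-nearest-neighbor classifier (scikit-learn assumed)
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)  # "training" just stores X, y
print(knn.predict(X[:3]), knn.predict_proba(X[:1]))  # neighbor votes -> confidence
```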
Other Useful Classifiers
• Projection pursuit aims to decompose the task of high-dimensional modeling into a sequence of low-dimensional modeling tasks.
• The algorithm consists of two stages: the first projects the input data onto a one-dimensional space, while the second constructs the mapping from the projected space to the output space.
Other Useful Classifiers (cont.)
• Multivariate adaptive regression splines (MARS) approximate the decision-region boundaries in two stages.
• In the first stage, the algorithm partitions the state space into small portions.
• In the second stage, the algorithm constructs a low-order polynomial to approximate the decision-region boundary within each partition.
• Disadvantage: the algorithm is intractable for problems with high-dimensional (> 10) inputs.
Other Useful Classifiers (cont.)
• Group method of data handling (GMDH) also aims to approximate the decision-region boundaries, using high-order polynomial functions.
• The modeling process begins with a low-order polynomial and then iteratively combines terms to produce higher-order polynomials until the modeling accuracy saturates.
Keep The Following In Mind
• Use multiple algorithms without bias and let your specific data determine which model is best suited to your problem.
• Occam's razor: entities should not be multiplied unnecessarily -- "when you have two competing models that make exactly the same predictions on the data, the simpler one is the better."