Explore the comparison between Linear Discriminant Analysis (LDA) and Logistic Regression (LR) for classification problems, focusing on decision boundaries and model fitting techniques. Understand the advantages and differences between these two popular linear classifiers.
Linear Methods for Classification. Presentation for the MA seminar in statistics, 20.04.2015. Eli Dahan
Outline: Introduction - problem and solution; LDA - Linear Discriminant Analysis; LR - Logistic Regression (and Linear Regression); LDA vs. LR; In a word - Separating Hyperplanes
Introduction - the problem: given an observation X, should it be assigned to group k or to group l? We base the decision on the posterior probabilities p_j = P(G = j | X = x). (*We can think of G as a "group label".)
Introduction - the solution: a linear decision boundary. If p_k > p_l choose group k, if p_l > p_k choose group l; the boundary is the set of points where p_k = p_l.
Linear Discriminant Analysis • Let P(G = k) = π_k and P(X = x | G = k) = f_k(x) • Then by Bayes' rule: P(G = k | X = x) = π_k f_k(x) / Σ_l π_l f_l(x) • Decision boundary between classes k and l: the set {x : P(G = k | X = x) = P(G = l | X = x)}
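To make the Bayes-rule step concrete, here is a minimal sketch (not part of the original slides) that computes the posteriors P(G = k | X = x) from given priors and class densities; the function name posteriors and the two-Gaussian example are illustrative assumptions.

```python
# Illustrative only: posteriors from priors pi_k and class densities f_k.
import numpy as np
from scipy.stats import norm

def posteriors(x, priors, densities):
    """P(G=k | X=x) = pi_k f_k(x) / sum_l pi_l f_l(x)."""
    num = np.array([pi * f(x) for pi, f in zip(priors, densities)])
    return num / num.sum()

# Example with two hypothetical Gaussian class densities, equal priors:
dens = [norm(0, 1).pdf, norm(2, 1).pdf]
print(posteriors(1.0, [0.5, 0.5], dens))   # x = 1 is on the boundary: posteriors are [0.5, 0.5]
```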
Linear Discriminant Analysis • Assuming f_k(x) ~ Gaussian(μ_k, Σ_k) with a common covariance Σ_1 = Σ_2 = … = Σ_K = Σ • We get a linear (in x) decision boundary • For covariances that are not common we get QDA (or its regularized version, RDA)
Linear Discriminant Analysis • Using empirical (plug-in) estimates: π̂_k = N_k / N, μ̂_k = the mean of the x's in class k, and Σ̂ = the pooled within-class covariance • A top classifier in the comparative study of Michie et al. (1994), presumably because the data support linear boundaries and the plug-in estimates are stable
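As an illustration of the plug-in estimation and the resulting linear discriminant scores, here is a hedged NumPy sketch; it is not the presenter's code, and the names fit_lda / lda_predict are assumptions. It classifies x to the class maximizing δ_k(x) = xᵀΣ̂⁻¹μ̂_k − ½ μ̂_kᵀΣ̂⁻¹μ̂_k + log π̂_k.

```python
# Minimal LDA sketch with plug-in (empirical) estimates -- illustrative only.
import numpy as np

def fit_lda(X, y):
    """Estimate pi_k, mu_k and the pooled covariance Sigma from data."""
    classes = np.unique(y)
    N, p = X.shape
    priors, means = [], []
    Sigma = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        priors.append(len(Xk) / N)            # pi_k = N_k / N
        mu_k = Xk.mean(axis=0)                # class mean
        means.append(mu_k)
        Sigma += (Xk - mu_k).T @ (Xk - mu_k)
    Sigma /= (N - len(classes))               # pooled within-class covariance
    return np.array(priors), np.array(means), Sigma

def lda_predict(X, priors, means, Sigma):
    """Classify by the linear discriminant scores delta_k(x)."""
    Sigma_inv = np.linalg.inv(Sigma)
    scores = (X @ Sigma_inv @ means.T
              - 0.5 * np.einsum('kp,pq,kq->k', means, Sigma_inv, means)
              + np.log(priors))
    return scores.argmax(axis=1)              # index of the winning class
```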
Logistic Regression • Models the posterior probabilities of the K classes; they sum to one and remain in [0, 1]: log( P(G = k | X = x) / P(G = K | X = x) ) = β_k0 + β_kᵀ x, for k = 1, …, K−1 • Linear decision boundary: the set {x : (β_k0 − β_l0) + (β_k − β_l)ᵀ x = 0}, where the log-odds of classes k and l are equal
Logistic Regression • Model fit: the parameters are estimated by maximizing the conditional log-likelihood ℓ(β) = Σ_i log p_{g_i}(x_i; β) • To maximize the likelihood, the Newton-Raphson algorithm is used (equivalent to iteratively reweighted least squares)
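A minimal sketch of the Newton-Raphson fit for the two-class case (equivalently, iteratively reweighted least squares); this is an illustration under standard assumptions, not the code used in the presentation, and names such as fit_logistic are made up.

```python
# Hedged sketch of Newton-Raphson (IRLS) for two-class logistic regression.
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximize the conditional log-likelihood by Newton-Raphson.
    X: (N, p) design matrix without intercept column; y: (N,) labels in {0, 1}."""
    N, p = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])     # add intercept column
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = Xb @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))      # fitted probabilities
        W = mu * (1.0 - mu)                  # Newton weights
        grad = Xb.T @ (y - mu)               # score
        hess = Xb.T @ (Xb * W[:, None])      # X^T W X
        # Newton step: beta <- beta + (X^T W X)^{-1} X^T (y - mu)
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```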
Linear Regression • Recall the usual requirements of multivariate regression (linearity, well-behaved errors, lack of multicollinearity, etc.) • Here: assuming N instances (an N×p observation matrix X), Y is an N×K indicator response matrix (K classes); we fit Ŷ = X(XᵀX)⁻¹XᵀY and classify a new observation to the class with the largest fitted value
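For completeness, a short sketch of regression on the indicator response matrix; again illustrative only, with hypothetical names (indicator_regression, indicator_predict) that are not from the slides.

```python
# Sketch of linear regression on an N x K indicator response matrix Y.
import numpy as np

def indicator_regression(X, y, n_classes):
    """Fit the coefficient matrix B for Y_hat = Xb @ B, with y integer labels 0..K-1."""
    N = X.shape[0]
    Xb = np.hstack([np.ones((N, 1)), X])         # include intercept
    Y = np.eye(n_classes)[y]                     # N x K indicator matrix
    B, *_ = np.linalg.lstsq(Xb, Y, rcond=None)   # (p+1) x K coefficients
    return B

def indicator_predict(X, B):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (Xb @ B).argmax(axis=1)               # class with the largest fitted value
```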
LDA vs. LR • Similar results, LDA slightly better (56% vs. 67% error rate for LR) • Presumably they behave so similarly because both models lead to the same linear form of decision boundary (recall the derivations above)
LDA vs. LR • LDA: parameters are fit by maximizing the full log-likelihood, based on the joint density P(X, G = k) = φ(x; μ_k, Σ) π_k, which assumes Gaussian class densities (Efron 1975: in the worst case, ignoring the Gaussian part of the likelihood costs about a 30% reduction in efficiency). Linearity is derived. • LR: leaves P(X) arbitrary (an advantage for model selection and for the ability to absorb extreme X values), and fits the parameters of P(G | X) by maximizing the conditional likelihood. Linearity is assumed.
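A small, hedged comparison sketch using scikit-learn on synthetic Gaussian data (the setting where LDA's assumptions hold); it only shows how such a comparison can be run, and its error rates will not match the numbers quoted above.

```python
# Fit both classifiers on the same synthetic two-class Gaussian data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = np.vstack([rng.normal(0.0, 1.0, size=(n, 2)),    # class 0
               rng.normal(1.5, 1.0, size=(n, 2))])   # class 1, common covariance
y = np.repeat([0, 1], n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
lr = LogisticRegression().fit(X_tr, y_tr)
print("LDA test error:", 1 - lda.score(X_te, y_te))
print("LR  test error:", 1 - lr.score(X_te, y_te))
```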