Linear Methods for Classification
20.04.2015: Presentation for the MA seminar in statistics, Eli Dahan
Outline
• Introduction - problem and solution
• LDA - Linear Discriminant Analysis
• LR - Logistic Regression (and Linear Regression)
• LDA vs. LR
• In a word - Separating Hyperplanes
Introduction - the problem
Given an observation X, to which group should it be assigned: group k or group l? The decision is based on the posterior probabilities pj = P(G = j | X = x).
*We can think of G as a "group label".
Introduction - the solution
A linear decision boundary: the set where pk = pl. If pk > pl choose group k; if pl > pk choose group l.
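To make the rule concrete, here is a minimal Python sketch of classifying an observation to the group with the largest posterior; the posterior values are made up for illustration and are not from the presentation.

```python
# Minimal sketch: classify to the group with the largest posterior P(G = j | X = x).
# The posterior values below are assumed for illustration only.
posteriors = {"k": 0.62, "l": 0.38}
predicted = max(posteriors, key=posteriors.get)
print(predicted)  # -> "k", since p_k > p_l
```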
Linear Discriminant Analysis • Let P(G = k) = πk and P(X = x | G = k) = fk(x) • Then by Bayes' rule: P(G = k | X = x) = πk fk(x) / Σl πl fl(x) • Decision boundary: the set of x where P(G = k | X = x) = P(G = l | X = x)
Linear Discriminant Analysis • Assuming fk(x) ~ Gaussian(μk, Σk) and Σ1 = Σ2 = … = ΣK = Σ • We get a linear (in x) decision boundary • For a non-common Σk we get QDA (RDA)
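The linearity in x can be made explicit by writing out the log posterior ratio under the equal-covariance Gaussian assumption; the following is a standard derivation sketch in LaTeX (notation as above), not taken verbatim from the slides.

```latex
\[
\log\frac{P(G=k \mid X=x)}{P(G=l \mid X=x)}
  = \log\frac{\pi_k}{\pi_l}
  - \tfrac{1}{2}(\mu_k+\mu_l)^{\top}\Sigma^{-1}(\mu_k-\mu_l)
  + x^{\top}\Sigma^{-1}(\mu_k-\mu_l)
\]
```

The quadratic terms x'Σ⁻¹x cancel only because Σ is common to all classes; without that assumption they remain, which is what produces the quadratic boundaries of QDA.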
Linear Discriminant Analysis • Using empirical estimates: π̂k = Nk/N, μ̂k = the mean of the observations in class k, and Σ̂ = the pooled within-class covariance • A top classifier in the STATLOG study (Michie et al., 1994) - the data often support linear boundaries, and the estimates are stable
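As an illustration of these empirical estimates, here is a minimal Python sketch of LDA fitting and prediction; the function names and variable names are illustrative and not from the presentation.

```python
# A minimal LDA sketch, assuming a NumPy feature matrix X (N x p) and
# integer class labels g; names are illustrative.
import numpy as np

def lda_fit(X, g):
    classes = np.unique(g)
    N, p = X.shape
    priors = {k: np.mean(g == k) for k in classes}          # pi_k = N_k / N
    means  = {k: X[g == k].mean(axis=0) for k in classes}   # mu_k
    # Pooled within-class covariance estimate of the common Sigma.
    Sigma = sum((X[g == k] - means[k]).T @ (X[g == k] - means[k])
                for k in classes) / (N - len(classes))
    return priors, means, np.linalg.inv(Sigma)

def lda_predict(X, priors, means, Sigma_inv):
    # Linear discriminant functions:
    # delta_k(x) = x' Sigma^-1 mu_k - 0.5 mu_k' Sigma^-1 mu_k + log pi_k
    labels = sorted(means)
    scores = np.column_stack([
        X @ Sigma_inv @ means[k]
        - 0.5 * means[k] @ Sigma_inv @ means[k]
        + np.log(priors[k])
        for k in labels])
    return np.array(labels)[np.argmax(scores, axis=1)]
```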
Logistic Regression • Models the posterior probabilities of the K classes; they sum to one and remain in [0, 1]: log[ P(G = k | X = x) / P(G = K | X = x) ] = βk0 + βkᵀx, for k = 1, …, K−1 • Linear decision boundary: the set of x where the log-odds of classes k and l are equal
Logistic Regression • Model fit: parameters are estimated by maximizing the (conditional) likelihood • The Newton-Raphson algorithm (iteratively reweighted least squares) is used to maximize the log-likelihood
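A minimal sketch of the Newton-Raphson fit for the two-class case, assuming a design matrix X (with an intercept column already appended) and a 0/1 response y; the function name and defaults are illustrative.

```python
# Newton-Raphson / IRLS sketch for two-class logistic regression.
import numpy as np

def logistic_newton(X, y, n_iter=25, tol=1e-8):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        W = p * (1.0 - p)                     # diagonal of the weight matrix
        # Newton step: beta_new = beta + (X' W X)^-1 X' (y - p)
        step = np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```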
Linear Regression • Recall the usual assumptions of multivariate regression (lack of multicollinearity, etc.) • Here: given N instances (an N×p observation matrix X), Y is an N×K indicator response matrix (K classes); each column of Y is regressed on X and a new x is assigned to the class with the largest fitted value (see the sketch below)
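A minimal sketch of classification by regression on the indicator response matrix, under the setup above; the helper names are illustrative and the labels are assumed to be coded 0..K−1.

```python
# Regression on an N x K indicator response matrix, then classify to the
# largest fitted value. Assumes integer labels g in 0..K-1.
import numpy as np

def indicator_regression(X, g, K):
    X1 = np.column_stack([np.ones(len(X)), X])   # add intercept column
    Y = np.eye(K)[g]                              # N x K indicator matrix
    B = np.linalg.lstsq(X1, Y, rcond=None)[0]     # B_hat = (X'X)^-1 X'Y
    return B

def indicator_predict(X, B):
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.argmax(X1 @ B, axis=1)              # class with largest fitted value
```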
LDA vs. LR • Similar results; LDA slightly better (56% vs. 67% error rate for LR) • Presumably the two behave similarly because both yield decision boundaries that are linear in x (we return to this below)
LDA vs. LR • LDA: parameters are fit by maximizing the full log-likelihood based on the joint density of X and G, which assumes Gaussian class densities (Efron 1975: in the worst case, ignoring Gaussianity costs about a 30% reduction in efficiency). Linearity is derived. • LR: P(X) is left arbitrary (an advantage in model selection and in the ability to absorb extreme X values); parameters of P(G | X) are fit by maximizing the conditional likelihood. Linearity is assumed.