
Logistic Regression



Presentation Transcript


  1. Logistic Regression Jia-Bin Huang Virginia Tech ECE-5424G / CS-5824 Spring 2019

  2. Administrative • Please start HW 1 early! • Questions are welcome!

  3. Two principles for estimating parameters • Maximum Likelihood Estimate (MLE): Choose θ that maximizes the probability of the observed data • Maximum a posteriori estimation (MAP): Choose θ that is most probable given the prior probability and the data Slide credit: Tom Mitchell
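To make the two principles concrete, here is a minimal sketch (not from the slides) that estimates a Bernoulli parameter both ways; the coin-flip data and the Beta(3, 3) prior are assumptions chosen for illustration:

```python
# Illustrative sketch: MLE vs. MAP for a Bernoulli parameter theta
# estimated from coin flips. Data and prior are made up for the example.
import numpy as np

flips = np.array([1, 1, 1, 0, 1])   # observed data: 4 heads, 1 tail

# MLE: theta that maximizes P(data | theta) -> fraction of heads
theta_mle = flips.mean()

# MAP with a Beta(a, b) prior: theta that maximizes P(theta | data)
# -> (heads + a - 1) / (n + a + b - 2); the prior acts like extra counts
a, b = 3.0, 3.0
theta_map = (flips.sum() + a - 1) / (len(flips) + a + b - 2)

print(f"MLE estimate: {theta_mle:.2f}")   # 0.80
print(f"MAP estimate: {theta_map:.2f}")   # 0.67, pulled toward the prior mean 0.5
```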

  4. Naïve Bayes classifier • Want to learn P(Y | X_1, …, X_n) • But this requires estimating a number of parameters exponential in n... • How about applying Bayes rule? • P(X_1, …, X_n | Y): still needs exponentially many parameters • P(Y): needs 1 parameter • Apply the conditional independence assumption • P(X_i | Y), i = 1, …, n: needs only a number of parameters linear in n

  5. Naïve Bayes classifier • Bayes rule: P(Y = y_k | X_1, …, X_n) = P(Y = y_k) P(X_1, …, X_n | Y = y_k) / Σ_j P(Y = y_j) P(X_1, …, X_n | Y = y_j) • Assume conditional independence among the X_i's: P(Y = y_k | X_1, …, X_n) ∝ P(Y = y_k) ∏_i P(X_i | Y = y_k) • Pick the most probable Y: Y ← argmax_{y_k} P(Y = y_k) ∏_i P(X_i | Y = y_k) Slide credit: Tom Mitchell

  6. Example • Estimating the parameters P(Y) and P(X_i | Y) from training data • Test example: apply the conditional independence assumption and Bayes rule to compute the posterior for each class and predict the most probable one (the worked example on the slide arrives at 0.6)

  7. Naïve Bayes algorithm – discrete • For each value y_k: estimate π_k ≡ P(Y = y_k) • For each value x_ij of each attribute X_i: estimate θ_ijk ≡ P(X_i = x_ij | Y = y_k) • Classify a new example X^new: Y^new ← argmax_{y_k} π_k ∏_i P(X_i^new | Y = y_k) Slide credit: Tom Mitchell

  8. Estimating parameters: discrete • Maximum likelihood estimates (MLE): π̂_k = #D{Y = y_k} / |D| and θ̂_ijk = #D{X_i = x_ij ∧ Y = y_k} / #D{Y = y_k}, where #D{·} counts matching training examples Slide credit: Tom Mitchell
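As a sketch of the discrete algorithm above (illustrative code, not from the slides; the toy data and variable names are made up), training amounts to counting and classification to multiplying the counted probabilities:

```python
# Minimal discrete Naive Bayes sketch: estimate P(Y) and P(X_i | Y) by
# counting, then classify with argmax_y P(Y=y) * prod_i P(X_i=x_i | Y=y).
import numpy as np

X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 1]])  # binary attributes
y = np.array([1, 1, 0, 0, 1])                            # binary labels

classes = np.unique(y)
prior = {c: np.mean(y == c) for c in classes}            # pi_k = P(Y = y_k)

# theta[c][i][v] = P(X_i = v | Y = c), estimated by MLE (simple counts)
theta = {c: [{v: np.mean(X[y == c, i] == v) for v in (0, 1)}
             for i in range(X.shape[1])]
         for c in classes}

def classify(x_new):
    """Return argmax_y P(Y=y) * prod_i P(X_i = x_new[i] | Y=y)."""
    scores = {c: prior[c] * np.prod([theta[c][i][v] for i, v in enumerate(x_new)])
              for c in classes}
    return max(scores, key=scores.get)

print(classify([1, 1]))  # -> 1 on this toy data
```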

  9. F = 1 iff you live in Fox Ridge • S = 1 iff you watched the Super Bowl last night • D = 1 iff you drive to VT • G = 1 iff you went to the gym in the last month

  10. Naïve Bayes: Subtlety #1 • Often the X_i's are not really conditionally independent • Naïve Bayes often works pretty well anyway • It often gives the right classification, even when not the right probability [Domingos & Pazzani, 1996] • What is the effect on the estimated P(Y | X)? • What if we have two copies of the same attribute, X_i = X_k? Slide credit: Tom Mitchell

  11. Naïve Bayes: Subtlety #2 • The MLE estimate for P(X_i = x_ij | Y = y_k) might be zero (for example, X_i = birthdate, x_ij = Feb_4_1995) • Why worry about just one parameter out of many? A single zero factor drives the whole product ∏_i P(X_i | Y) to zero, no matter what the other attributes say • What can we do to address this? • MAP estimates (adding "imaginary" examples) Slide credit: Tom Mitchell

  12. Estimating parameters: discrete • Maximum likelihood estimates (MLE): π̂_k = #D{Y = y_k} / |D|, θ̂_ijk = #D{X_i = x_ij ∧ Y = y_k} / #D{Y = y_k} • MAP estimates (Dirichlet priors): add l "imaginary" examples to each count, e.g. θ̂_ijk = (#D{X_i = x_ij ∧ Y = y_k} + l) / (#D{Y = y_k} + l·J), where J is the number of values X_i can take Slide credit: Tom Mitchell
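A quick sketch of the difference (illustrative; the counts and the pseudo-count l = 1 are assumptions):

```python
# Effect of MAP / Dirichlet smoothing on a zero-count parameter.
# Suppose attribute X_i takes J = 2 values and we saw 0 of 20 class-y_k
# examples with X_i = x_ij; the counts and l are made up for illustration.
count_xy, count_y, J, l = 0, 20, 2, 1

theta_mle = count_xy / count_y                   # 0.0 -> zeroes the whole product
theta_map = (count_xy + l) / (count_y + l * J)   # 1/22 ~= 0.045, never zero

print(theta_mle, round(theta_map, 3))
```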

  13. What if we have continuous X_i? • Gaussian Naïve Bayes (GNB): assume P(X_i = x | Y = y_k) is Gaussian, N(μ_ik, σ_ik) • Additional assumptions on σ_ik: • it is independent of Y (σ_i) • it is independent of X_i (σ_k) • it is independent of both (σ) Slide credit: Tom Mitchell

  14. Naïve Bayes algorithm – continuous • For each value y_k: estimate π_k ≡ P(Y = y_k) • For each attribute X_i: estimate the class-conditional mean μ_ik and variance σ_ik • Classify a new example X^new: Y^new ← argmax_{y_k} π_k ∏_i N(X_i^new; μ_ik, σ_ik) Slide credit: Tom Mitchell
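A minimal Gaussian Naïve Bayes sketch under these assumptions (illustrative code; the toy data is made up):

```python
# Gaussian Naive Bayes sketch: per-class priors plus per-class, per-attribute
# means and variances, then classification by argmax of prior times the
# product of Gaussian densities. Toy data is made up for illustration.
import numpy as np
from scipy.stats import norm

X = np.array([[1.0, 2.1], [0.9, 1.9], [3.0, 3.2], [3.2, 2.9]])
y = np.array([0, 0, 1, 1])

classes = np.unique(y)
prior = {c: np.mean(y == c) for c in classes}
mu = {c: X[y == c].mean(axis=0) for c in classes}            # mu_ik
sigma = {c: X[y == c].std(axis=0) + 1e-6 for c in classes}   # sigma_ik (avoid 0)

def classify(x_new):
    """argmax_y  pi_y * prod_i N(x_i; mu_iy, sigma_iy)"""
    scores = {c: prior[c] * np.prod(norm.pdf(x_new, mu[c], sigma[c]))
              for c in classes}
    return max(scores, key=scores.get)

print(classify(np.array([1.1, 2.0])))  # -> 0 on this toy data
```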

  15. Things to remember • Probability basics • Conditional probability, joint probability, Bayes rule • Estimating parameters from data • Maximum likelihood (MLE): maximize P(D | θ) • Maximum a posteriori estimation (MAP): maximize P(θ | D) • Naïve Bayes

  16. Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification

  17. Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification

  18. [Plot: Malignant? (0 = No, 1 = Yes) vs. Tumor Size] • Threshold the classifier output h_θ(x) at 0.5 • If h_θ(x) ≥ 0.5, predict "y = 1" • If h_θ(x) < 0.5, predict "y = 0" Slide credit: Andrew Ng

  19. Classification: y = 0 or 1 • h_θ(x) (from linear regression) can be > 1 or < 0 • Logistic regression: 0 ≤ h_θ(x) ≤ 1 • Logistic regression is actually a classification algorithm, despite its name Slide credit: Andrew Ng

  20. Hypothesis representation • Want 0 ≤ h_θ(x) ≤ 1 • h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z)) • g is the sigmoid function • Also called the logistic function Slide credit: Andrew Ng
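A small sketch of this hypothesis in code (illustrative; the example θ and x are made up):

```python
# Sigmoid / logistic function and the logistic-regression hypothesis
# h_theta(x) = g(theta^T x). Example theta and x are made up.
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), interpreted as P(y = 1 | x; theta)."""
    return sigmoid(theta @ x)

theta = np.array([-3.0, 1.0])       # made-up parameters
x = np.array([1.0, 4.0])            # x_0 = 1 (intercept), x_1 = feature
print(hypothesis(theta, x))         # ~0.73, i.e. predict y = 1
```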

  21. Interpretation of hypothesis output • h_θ(x) = estimated probability that y = 1 on input x • Example: if h_θ(x) = 0.7 for a feature vector x describing the tumor, tell the patient there is a 70% chance of the tumor being malignant Slide credit: Andrew Ng

  22. Logistic regression • Suppose we predict "y = 1" if h_θ(x) ≥ 0.5, i.e., θᵀx ≥ 0 • and predict "y = 0" if h_θ(x) < 0.5, i.e., θᵀx < 0 Slide credit: Andrew Ng

  23. Decision boundary • E.g., h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2) • Predict "y = 1" if θ_0 + θ_1 x_1 + θ_2 x_2 ≥ 0; the set of points where θᵀx = 0 is the decision boundary • [Plot: training examples in the (Tumor Size, Age) plane separated by a linear decision boundary] Slide credit: Andrew Ng

  24. Non-linear decision boundary • E.g., with polynomial features, h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1² + θ_4 x_2²) • Predict "y = 1" if θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1² + θ_4 x_2² ≥ 0, which can give, e.g., a circular boundary Slide credit: Andrew Ng

  25. Where does the form of h_θ(x) come from? • Logistic regression hypothesis representation • Consider learning f: X → Y, where • X is a vector of real-valued features ⟨X_1, …, X_n⟩ • Y is Boolean • Assume all X_i are conditionally independent given Y • Model P(X_i | Y = y_k) as Gaussian N(μ_ik, σ_i) • Model P(Y) as Bernoulli(π) • What is P(Y | X)? Slide credit: Tom Mitchell

  26. Applying Bayes rule: P(Y = 1 | X) = P(Y = 1) P(X | Y = 1) / [P(Y = 1) P(X | Y = 1) + P(Y = 0) P(X | Y = 0)] • Divide numerator and denominator by the numerator • Apply exp(ln(·)) to the remaining ratio • Plug in the Gaussian model for P(X_i | Y) and the Bernoulli model for P(Y): the result is the sigmoid of a linear function of X Slide credit: Tom Mitchell
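Written out, the steps on this slide are (a sketch of the standard derivation under the GNB assumptions above, with the variance σ_i shared across classes):

```latex
P(Y=1 \mid X)
  = \frac{P(Y=1)\,P(X \mid Y=1)}{P(Y=1)\,P(X \mid Y=1) + P(Y=0)\,P(X \mid Y=0)}
  = \frac{1}{1 + \dfrac{P(Y=0)\,P(X \mid Y=0)}{P(Y=1)\,P(X \mid Y=1)}}
  = \frac{1}{1 + \exp\!\left( \ln\frac{P(Y=0)}{P(Y=1)}
        + \sum_i \ln\frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)} \right)}
```

Plugging the Gaussians N(μ_i0, σ_i), N(μ_i1, σ_i) and the Bernoulli P(Y = 1) = π into the log-ratios makes each term linear in X_i, so P(Y = 1 | X) = 1 / (1 + exp(w_0 + Σ_i w_i X_i)), which is exactly the logistic (sigmoid) form of h_θ(x).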

  27. Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification

  28. Training set with m examples: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}, with x ∈ R^(n+1), x_0 = 1, y ∈ {0, 1}, and h_θ(x) = 1 / (1 + e^(−θᵀx)) • How to choose the parameters θ? Slide credit: Andrew Ng

  29. Cost function for Linear Regression • J(θ) = (1 / 2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² • With the sigmoid hypothesis, this squared-error cost is non-convex in θ, so logistic regression needs a different cost Slide credit: Andrew Ng

  30. Cost function for Logistic Regression • Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1 • Cost(h_θ(x), y) = −log(1 − h_θ(x)) if y = 0 Slide credit: Andrew Ng

  31. Logistic regression cost function • J(θ) = (1/m) Σ_{i=1}^{m} Cost(h_θ(x^(i)), y^(i)) • Compact form: Cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x)) Slide credit: Andrew Ng
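A sketch of this cost in code (illustrative; the toy data is made up):

```python
# Logistic-regression cost J(theta) = (1/m) * sum of
# -y*log(h) - (1-y)*log(1-h), with h = sigmoid(X @ theta).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta) over a design matrix X (first column = 1s)."""
    h = sigmoid(X @ theta)
    eps = 1e-12                      # avoid log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

# Toy data: first column is the intercept feature x_0 = 1
X = np.array([[1.0, 0.5], [1.0, 2.3], [1.0, 2.9], [1.0, 0.1]])
y = np.array([0, 1, 1, 0])
print(cost(np.array([-3.0, 2.0]), X, y))   # lower is better
```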

  32. Logistic regression • Learning: fit the parameters θ by minimizing J(θ) • Prediction: given a new x, output h_θ(x) = 1 / (1 + e^(−θᵀx)), the estimated probability that y = 1 Slide credit: Andrew Ng

  33. Where does the cost come from? • Training set with m examples • Maximum likelihood estimate for the parameter θ: maximize P(data | θ) • Maximum conditional likelihood estimate for the parameter θ: maximize ∏_i P(y^(i) | x^(i), θ) Slide credit: Tom Mitchell

  34. Goal: choose θ to maximize the conditional likelihood of the training data • Training data D = {⟨x^(1), y^(1)⟩, …, ⟨x^(m), y^(m)⟩} • Data likelihood: ∏_i P(x^(i), y^(i) | θ) • Data conditional likelihood: ∏_i P(y^(i) | x^(i), θ) Slide credit: Tom Mitchell

  35. Expressing the conditional log-likelihood • l(θ) = Σ_i ln P(y^(i) | x^(i), θ) = Σ_i [ y^(i) ln h_θ(x^(i)) + (1 − y^(i)) ln(1 − h_θ(x^(i))) ] • Maximizing l(θ) is the same as minimizing the logistic regression cost J(θ)
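Spelled out (a standard derivation, not verbatim from the slide): each label is Bernoulli with parameter h_θ(x), so

```latex
P(y \mid x, \theta) = h_\theta(x)^{\,y} \bigl(1 - h_\theta(x)\bigr)^{1-y}
\quad\Longrightarrow\quad
\ell(\theta) = \sum_{i=1}^{m} \ln P\bigl(y^{(i)} \mid x^{(i)}, \theta\bigr)
  = \sum_{i=1}^{m} \Bigl[\, y^{(i)} \ln h_\theta(x^{(i)})
      + \bigl(1 - y^{(i)}\bigr) \ln\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]
```

and J(θ) = −(1/m) ℓ(θ), so minimizing the logistic regression cost is exactly maximizing the conditional log-likelihood.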

  36. Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification

  37. Gradient descent • Goal: min_θ J(θ) • Repeat { θ_j := θ_j − α ∂J(θ)/∂θ_j } (simultaneously update all θ_j) • Good news: J(θ) is a convex function! • Bad news: no analytical solution Slide credit: Andrew Ng

  38. Gradient descent • Goal: min_θ J(θ) • Repeat { θ_j := θ_j − (α/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) } (simultaneously update all θ_j) Slide credit: Andrew Ng

  39. Gradient descent for Linear Regression • Repeat { θ_j := θ_j − (α/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) }, with h_θ(x) = θᵀx • Gradient descent for Logistic Regression • Repeat { θ_j := θ_j − (α/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) }, with h_θ(x) = 1 / (1 + e^(−θᵀx)) • The updates look identical; only the definition of h_θ(x) changes Slide credit: Andrew Ng
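A minimal batch gradient descent sketch for logistic regression (illustrative; the learning rate, iteration count, and toy data are assumptions):

```python
# Batch gradient descent for logistic regression:
# theta := theta - (alpha/m) * X^T (sigmoid(X theta) - y), all theta_j at once.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, n_iters=5000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m   # vectorized partial derivatives
        theta -= alpha * grad                        # simultaneous update of all theta_j
    return theta

# Toy data: first column is the intercept feature x_0 = 1
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 2.5], [1.0, 3.0]])
y = np.array([0, 0, 1, 1])
theta = gradient_descent(X, y)
print(theta, sigmoid(X @ theta).round(2))  # probabilities approach the labels
```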

  40. Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification

  41. How about MAP? • Maximum conditional likelihood estimate (MCLE): θ ← argmax_θ ∏_i P(y^(i) | x^(i), θ) • Maximum conditional a posteriori estimate (MCAP): θ ← argmax_θ P(θ) ∏_i P(y^(i) | x^(i), θ)

  42. Prior P(θ) • Common choice of P(θ): • Normal distribution, zero mean, identity covariance • "Pushes" parameters towards zero • Corresponds to L2 regularization • Helps avoid very large weights and overfitting Slide credit: Tom Mitchell
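With a zero-mean Gaussian prior, the MCAP objective just adds an L2 penalty to the cost (illustrative sketch; the value of λ and the convention of not regularizing the intercept θ_0 are assumptions):

```python
# L2-regularized logistic regression: add (lambda/2m)*sum(theta_j^2) to the
# cost and (lambda/m)*theta_j to each gradient, leaving theta_0 unpenalized.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost_and_grad(theta, X, y, lam=1.0):
    m = len(y)
    h = sigmoid(X @ theta)
    reg = np.r_[0.0, theta[1:]]                      # do not penalize the intercept
    cost = (-np.mean(y * np.log(h + 1e-12) + (1 - y) * np.log(1 - h + 1e-12))
            + lam / (2 * m) * np.sum(reg ** 2))
    grad = X.T @ (h - y) / m + lam / m * reg
    return cost, grad
```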

  43. MLE vs. MAP • Maximum conditional likelihood estimate (MCLE): maximize ∏_i P(y^(i) | x^(i), θ) • Maximum conditional a posteriori estimate (MCAP): maximize P(θ) ∏_i P(y^(i) | x^(i), θ); with the Gaussian prior this adds a −λθ_j shrinkage term to each gradient update

  44. Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification

  45. Multi-class classification • Email foldering/tagging: Work, Friends, Family, Hobby • Medical diagnosis: Not ill, Cold, Flu • Weather: Sunny, Cloudy, Rain, Snow Slide credit: Andrew Ng

  46. Binary classification vs. multiclass classification • [Plots: two-class data on the left, multi-class data on the right]

  47. One-vs-all (one-vs-rest) • Train a separate binary classifier for each class: • Class 1 vs. the rest: h_θ^(1)(x) • Class 2 vs. the rest: h_θ^(2)(x) • Class 3 vs. the rest: h_θ^(3)(x) Slide credit: Andrew Ng

  48. One-vs-all • Train a logistic regression classifier h_θ^(i)(x) for each class i to predict the probability that y = i • Given a new input x, pick the class i that maximizes h_θ^(i)(x) Slide credit: Andrew Ng
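A self-contained sketch of one-vs-all (illustrative; the toy data, learning rate, and helper names are assumptions):

```python
# One-vs-all: fit one logistic-regression classifier per class on binarized
# labels, then predict the class whose classifier outputs the highest probability.
# Toy data, learning rate, and iteration count are made up for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, alpha=0.1, n_iters=5000):
    """Plain batch gradient descent for one binary logistic regression."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta -= alpha * X.T @ (sigmoid(X @ theta) - y) / len(y)
    return theta

def one_vs_all(X, y, classes):
    """Return {class: theta}, one 'class c vs. the rest' classifier each."""
    return {c: fit_binary(X, (y == c).astype(float)) for c in classes}

def predict(thetas, x):
    """Pick argmax_i h_theta^(i)(x) over the per-class classifiers."""
    return max(thetas, key=lambda c: sigmoid(x @ thetas[c]))

# Toy 3-class data; first column is the intercept feature x_0 = 1
X = np.array([[1, 0.2], [1, 0.4], [1, 2.0], [1, 2.2], [1, 4.0], [1, 4.3]])
y = np.array([0, 0, 1, 1, 2, 2])
thetas = one_vs_all(X, y, classes=[0, 1, 2])
print(predict(thetas, np.array([1.0, 4.1])))  # -> 2 on this toy data
```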

  49. Discriminative approach • Ex: logistic regression • Estimate P(Y | X) directly (or learn a discriminant function, e.g., SVM) • Prediction: ŷ = argmax_y P(Y = y | X) • Generative approach • Ex: Naïve Bayes • Estimate P(X | Y) and P(Y), then apply Bayes rule to get P(Y | X) • Prediction: ŷ = argmax_y P(Y = y) P(X | Y = y)
