Logistic Regression Jia-Bin Huang Virginia Tech ECE-5424G / CS-5824 Spring 2019
Administrative • Please start HW 1 early! • Questions are welcome!
Two principles for estimating parameters • Maximum Likelihood Estimate (MLE): choose θ that maximizes the probability of the observed data D, i.e., θ_MLE = argmax_θ P(D | θ) • Maximum a posteriori (MAP) estimation: choose θ that is most probable given the prior P(θ) and the data, i.e., θ_MAP = argmax_θ P(θ | D) = argmax_θ P(D | θ) P(θ) Slide credit: Tom Mitchell
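As a minimal sketch of the two principles (not from the slides; the coin-flip data and the Beta(a, b) prior are assumptions chosen for illustration), consider estimating a Bernoulli parameter θ:

```python
import numpy as np

# Hypothetical coin-flip data: 1 = heads, 0 = tails
flips = np.array([1, 1, 1, 0, 1, 1, 0, 1])

# MLE: theta that maximizes P(data | theta) -> the observed frequency
theta_mle = flips.mean()

# MAP with a Beta(a, b) prior: the prior acts like (a - 1) imaginary heads
# and (b - 1) imaginary tails added to the counts
a, b = 2.0, 2.0
theta_map = (flips.sum() + a - 1) / (len(flips) + a + b - 2)

print(theta_mle, theta_map)
```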
Naïve Bayes classifier • Want to learn P(Y | X1, …, Xn) • But modeling it directly requires a huge number of parameters • How about applying Bayes rule? P(Y | X1, …, Xn) ∝ P(X1, …, Xn | Y) P(Y) • P(X1, …, Xn | Y): for Boolean features, need on the order of 2(2^n − 1) parameters • P(Y): need 1 parameter • Apply the conditional independence assumption • P(X1, …, Xn | Y) = ∏_i P(Xi | Y): need only about 2n parameters
Naïve Bayes classifier • Bayes rule: P(Y = y_k | X1, …, Xn) = P(Y = y_k) P(X1, …, Xn | Y = y_k) / Σ_j P(Y = y_j) P(X1, …, Xn | Y = y_j) • Assume conditional independence among the Xi's: P(Y = y_k | X1, …, Xn) ∝ P(Y = y_k) ∏_i P(Xi | Y = y_k) • Pick the most probable Y: y* = argmax_k P(Y = y_k) ∏_i P(Xi | Y = y_k) Slide credit: Tom Mitchell
Example • Estimate the parameters P(Y) and P(Xi | Y) from the class-survey data, then classify a test example by combining conditional independence with Bayes rule (worked numbers omitted; the survey attributes are defined below)
Naïve Bayes algorithm – discrete Xi • For each value y_k: estimate π_k ≡ P(Y = y_k) • For each value x_ij of each attribute Xi: estimate θ_ijk ≡ P(Xi = x_ij | Y = y_k) • Classify a new example X_new: Y_new = argmax_k π_k ∏_i P(Xi_new | Y = y_k) Slide credit: Tom Mitchell
Estimating parameters: discrete Xi, Y • Maximum likelihood estimates (MLE): π̂_k = #D{Y = y_k} / |D|, θ̂_ijk = #D{Xi = x_ij ∧ Y = y_k} / #D{Y = y_k} Slide credit: Tom Mitchell
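A minimal sketch of the discrete algorithm with binary features (illustrative only; the function names and the toy data are assumptions, not from the slides):

```python
import numpy as np

def fit_discrete_nb(X, y):
    """Estimate P(Y = k) and P(X_i = 1 | Y = k) by counting (MLE, binary features)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])        # pi_k = P(Y = k)
    cond = np.array([X[y == k].mean(axis=0) for k in classes])   # P(X_i = 1 | Y = k)
    return classes, priors, cond

def predict_discrete_nb(x_new, classes, priors, cond):
    """Pick the class k maximizing P(Y = k) * prod_i P(X_i = x_i | Y = k)."""
    likelihood = np.where(x_new == 1, cond, 1.0 - cond)  # per-feature likelihoods
    scores = priors * likelihood.prod(axis=1)
    return classes[np.argmax(scores)]

# Toy example with 4 binary features (think F, S, D, G from the survey slide)
X = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 0, 0, 1]])
y = np.array([1, 0, 1, 0])
classes, priors, cond = fit_discrete_nb(X, y)
print(predict_discrete_nb(np.array([1, 0, 1, 0]), classes, priors, cond))
```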
F = 1 iff you live in Fox Ridge • S = 1 iff you watched the Super Bowl last night • D = 1 iff you drive to VT • G = 1 iff you went to the gym in the last month
Naïve Bayes: Subtlety #1 • Often the Xi are not really conditionally independent • Naïve Bayes often works pretty well anyway • It often gives the right classification, even when not the right probability [Domingos & Pazzani, 1996] • What is the effect on the estimated P(Y | X)? • What if we have two identical copies of a feature, e.g., Xi = Xk? (Its evidence gets counted twice.) Slide credit: Tom Mitchell
Naïve Bayes: Subtlety #2 • The MLE estimate for P(Xi = x_ij | Y = y_k) might be zero (for example, Xi = birthdate, x_ij = Feb_4_1995) • Why worry about just one parameter out of many? A single zero factor makes the whole product ∏_i P(Xi | Y = y_k) zero, vetoing that class no matter what the other features say • What can we do to address this? • MAP estimates (adding “imaginary” examples) Slide credit: Tom Mitchell
Estimating parameters: discrete Xi, Y • Maximum likelihood estimates (MLE): θ̂_ijk = #D{Xi = x_ij ∧ Y = y_k} / #D{Y = y_k} • MAP estimates (Dirichlet priors): add l “imaginary” examples of each value, θ̂_ijk = (#D{Xi = x_ij ∧ Y = y_k} + l) / (#D{Y = y_k} + l·J), where J is the number of values Xi can take (l = 1 gives Laplace smoothing) Slide credit: Tom Mitchell
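A small sketch of the MLE count versus the smoothed MAP count (the pseudo-count l and the example numbers are my own, not values from the slide):

```python
def estimate_theta(count_xy, count_y, num_values, l=1.0):
    """Estimate P(X_i = x | Y = y_k): MLE ratio vs. MAP with l 'imaginary' examples."""
    mle = count_xy / count_y
    map_ = (count_xy + l) / (count_y + l * num_values)
    return mle, map_

# A value never observed with this class: the MLE is exactly 0, the MAP estimate is not
print(estimate_theta(count_xy=0, count_y=25, num_values=2, l=1.0))  # (0.0, ~0.037)
```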
What if we have continuous Xi? • Gaussian Naïve Bayes (GNB): assume P(Xi = x | Y = y_k) = N(x; μ_ik, σ_ik) • Additional assumptions on σ_ik, e.g. that it: • is independent of Y (σ_i) • is independent of Xi (σ_k) • is independent of both (σ) Slide credit: Tom Mitchell
Naïve Bayes algorithm – continuous Xi • For each value y_k: estimate π_k ≡ P(Y = y_k) • For each attribute Xi: estimate the class-conditional mean μ_ik and variance σ_ik • Classify a new example X_new: Y_new = argmax_k π_k ∏_i N(Xi_new; μ_ik, σ_ik) Slide credit: Tom Mitchell
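A minimal Gaussian Naïve Bayes sketch, assuming one variance per (feature, class) pair; the function names and the small variance floor are my additions, not from the slides:

```python
import numpy as np

def fit_gnb(X, y):
    """Estimate priors pi_k, class-conditional means mu_ik and variances sigma_ik^2."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    vars_ = np.array([X[y == k].var(axis=0) + 1e-9 for k in classes])  # floor for stability
    return classes, priors, means, vars_

def predict_gnb(x_new, classes, priors, means, vars_):
    """argmax_k  log pi_k + sum_i log N(x_i; mu_ik, sigma_ik^2)  (logs avoid underflow)."""
    log_lik = -0.5 * (np.log(2 * np.pi * vars_) + (x_new - means) ** 2 / vars_)
    scores = np.log(priors) + log_lik.sum(axis=1)
    return classes[np.argmax(scores)]
```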
Things to remember • Probability basics • Conditional probability, joint probability, Bayes rule • Estimating parameters from data • Maximum likelihood (MLE): maximize P(D | θ) • Maximum a posteriori (MAP) estimation: maximize P(θ | D) • Naïve Bayes
Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification
Threshold the classifier output h_θ(x) at 0.5: • If h_θ(x) ≥ 0.5, predict “y = 1” • If h_θ(x) < 0.5, predict “y = 0” (Figure: Malignant? 1 (Yes) / 0 (No) vs. Tumor Size) Slide credit: Andrew Ng
Classification: y = 0 or 1 • h_θ(x) from linear regression can be > 1 or < 0 • Logistic regression: 0 ≤ h_θ(x) ≤ 1 • Despite the name, logistic regression is actually a classification algorithm Slide credit: Andrew Ng
Hypothesis representation • Want 0 ≤ h_θ(x) ≤ 1 • h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z)) • g is the sigmoid function, also called the logistic function Slide credit: Andrew Ng
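The sigmoid and the hypothesis written directly in code (a straight transcription of the formulas above; the bias-term convention is the standard one assumed throughout):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)): squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x is assumed to include the intercept term x_0 = 1."""
    return sigmoid(theta @ x)
```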
Interpretation of hypothesis output • h_θ(x) = estimated probability that y = 1 on input x, i.e., h_θ(x) = P(y = 1 | x; θ) • Example: if x = [x_0; x_1] = [1; tumorSize] and h_θ(x) = 0.7 • Tell the patient there is a 70% chance of the tumor being malignant Slide credit: Andrew Ng
Logistic regression • h_θ(x) = g(θᵀx), g(z) = 1 / (1 + e^(−z)) • Suppose we predict “y = 1” if h_θ(x) ≥ 0.5, i.e., θᵀx ≥ 0 • and predict “y = 0” if h_θ(x) < 0.5, i.e., θᵀx < 0 Slide credit: Andrew Ng
Decision boundary • E.g., h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2) • Predict “y = 1” if θ_0 + θ_1 x_1 + θ_2 x_2 ≥ 0; the line θᵀx = 0 is the (linear) decision boundary (Figure: Age vs. Tumor Size) Slide credit: Andrew Ng
Non-linear decision boundary • E.g., h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1² + θ_4 x_2²) • Predict “y = 1” when this weighted sum of features is ≥ 0 (for example, a circular boundary such as x_1² + x_2² ≥ 1) Slide credit: Andrew Ng
Where does the form of h_θ(x) come from? • Logistic regression hypothesis representation • Consider learning f: X → Y, where • X is a vector of real-valued features ⟨X_1, …, X_n⟩ • Y is Boolean • Assume all X_i are conditionally independent given Y • Model P(X_i | Y = y_k) as Gaussian N(μ_ik, σ_i) • Model P(Y) as Bernoulli(π) • What is P(Y | X)? Slide credit: Tom Mitchell
Applying Bayes rule: P(Y = 1 | X) = P(Y = 1) P(X | Y = 1) / [P(Y = 1) P(X | Y = 1) + P(Y = 0) P(X | Y = 0)] • Divide numerator and denominator by P(Y = 1) P(X | Y = 1): = 1 / (1 + P(Y = 0) P(X | Y = 0) / (P(Y = 1) P(X | Y = 1))) • Apply exp(ln(·)): = 1 / (1 + exp(ln[P(Y = 0) P(X | Y = 0) / (P(Y = 1) P(X | Y = 1))])) • Plug in the Gaussian class-conditionals and simplify: = 1 / (1 + exp(w_0 + Σ_i w_i X_i)) for suitable weights w, i.e., a sigmoid of a linear function of X Slide credit: Tom Mitchell
Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification
Training set with m examples: {(x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾)}, with x ∈ ℝ^(n+1), x_0 = 1, y ∈ {0, 1}, and h_θ(x) = 1 / (1 + e^(−θᵀx)) • How do we choose the parameters θ? Slide credit: Andrew Ng
Cost function for Linear Regression • J(θ) = (1/m) Σ_i (1/2)(h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² • With the sigmoid hypothesis, this squared-error cost is non-convex in θ, so it is a poor choice for logistic regression Slide credit: Andrew Ng
Cost function for Logistic Regression • J(θ) = (1/m) Σ_i Cost(h_θ(x⁽ⁱ⁾), y⁽ⁱ⁾), where Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1, and −log(1 − h_θ(x)) if y = 0 Slide credit: Andrew Ng
Logistic regression cost function • Combined into one expression: Cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x)) • If y = 1: Cost = −log(h_θ(x)) • If y = 0: Cost = −log(1 − h_θ(x)) • J(θ) = −(1/m) Σ_i [ y⁽ⁱ⁾ log h_θ(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ] Slide credit: Andrew Ng
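The same cost in vectorized NumPy, a sketch assuming X already has an intercept column of ones (the small epsilon is my addition to avoid log(0)):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y log h + (1 - y) log(1 - h) ]."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x) for every example at once
    eps = 1e-12                            # numerical guard against log(0)
    return -(1.0 / m) * np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```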
Logistic regression • Learning: fit the parameters θ by minimizing J(θ) • Prediction: given a new x, output h_θ(x) = 1 / (1 + e^(−θᵀx)), interpreted as P(y = 1 | x; θ) Slide credit: Andrew Ng
Where does the cost come from? • Training set with m examples {(x⁽ⁱ⁾, y⁽ⁱ⁾)} • Maximum likelihood estimate for parameter θ: θ_MLE = argmax_θ P(⟨x⁽¹⁾, y⁽¹⁾⟩, …, ⟨x⁽ᵐ⁾, y⁽ᵐ⁾⟩ | θ) • Maximum conditional likelihood estimate for parameter θ: θ_MCLE = argmax_θ ∏_i P(y⁽ⁱ⁾ | x⁽ⁱ⁾; θ) Slide credit: Tom Mitchell
Goal: choose θ to maximize the conditional likelihood of the training data • Training data D = {⟨x⁽¹⁾, y⁽¹⁾⟩, …, ⟨x⁽ᵐ⁾, y⁽ᵐ⁾⟩} • Data likelihood = ∏_i P(x⁽ⁱ⁾, y⁽ⁱ⁾; θ) • Data conditional likelihood = ∏_i P(y⁽ⁱ⁾ | x⁽ⁱ⁾; θ) • Taking the negative log of the conditional likelihood gives exactly the logistic regression cost J(θ) above Slide credit: Tom Mitchell
Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification
Gradient descent • Goal: min_θ J(θ) • Repeat { θ_j := θ_j − α ∂J(θ)/∂θ_j } (simultaneously update all θ_j) • Good news: J(θ) is convex, so gradient descent reaches the global minimum • Bad news: no analytical (closed-form) solution Slide credit: Andrew Ng
Gradient descent • Goal: min_θ J(θ) • The gradient works out to ∂J(θ)/∂θ_j = (1/m) Σ_i (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾ • Repeat { θ_j := θ_j − α (1/m) Σ_i (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾ } (simultaneously update all θ_j) Slide credit: Andrew Ng
Gradient descent for Linear Regression • Repeat { θ_j := θ_j − α (1/m) Σ_i (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾ }, with h_θ(x) = θᵀx • Gradient descent for Logistic Regression • Repeat { θ_j := θ_j − α (1/m) Σ_i (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾ }, with h_θ(x) = 1 / (1 + e^(−θᵀx)) • The update rules look identical; only the hypothesis h_θ(x) differs Slide credit: Andrew Ng
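A minimal gradient-descent sketch for logistic regression; the learning rate, iteration count, and zero initialization are arbitrary illustrative choices, not values from the slides:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Repeat: theta_j := theta_j - alpha * (1/m) sum_i (h_theta(x_i) - y_i) * x_ij,
    updating all theta_j simultaneously (done here as one vectorized step)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))   # predictions for all examples
        grad = (X.T @ (h - y)) / m             # gradient of J(theta)
        theta = theta - alpha * grad
    return theta
```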
Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification
How about MAP? • Maximum conditional likelihood estimate (MCLE): θ_MCLE = argmax_θ ∏_i P(y⁽ⁱ⁾ | x⁽ⁱ⁾; θ) • Maximum conditional a posteriori estimate (MCAP): θ_MCAP = argmax_θ P(θ) ∏_i P(y⁽ⁱ⁾ | x⁽ⁱ⁾; θ)
Prior P(θ) • Common choice of P(θ): • Normal distribution, zero mean, identity covariance • “Pushes” parameters towards zero • Corresponds to L2 regularization • Helps avoid very large weights and overfitting Slide credit: Tom Mitchell
MLE vs. MAP • Maximum conditional likelihood estimate (MCLE) update: θ_j := θ_j + α Σ_i (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾)) x_j⁽ⁱ⁾ • Maximum conditional a posteriori estimate (MCAP) update with a zero-mean Gaussian prior: θ_j := θ_j − αλθ_j + α Σ_i (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾)) x_j⁽ⁱ⁾ (the extra −αλθ_j term shrinks the weights)
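A sketch of one MCAP (L2-regularized) gradient step; the value of lambda and the convention of leaving the bias θ_0 unregularized are assumptions on my part, not stated on the slide:

```python
import numpy as np

def regularized_step(theta, X, y, alpha=0.1, lam=1.0):
    """One MCAP gradient step: the Gaussian prior contributes a weight-decay term lam * theta.
    The intercept theta_0 is conventionally left out of the penalty."""
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = (X.T @ (h - y)) / m
    penalty = (lam / m) * np.concatenate(([0.0], theta[1:]))  # skip the bias term
    return theta - alpha * (grad + penalty)
```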
Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification
Multi-class classification • Email foldering/tagging: Work, Friends, Family, Hobby • Medical diagnosis: Not ill, Cold, Flu • Weather: Sunny, Cloudy, Rain, Snow Slide credit: Andrew Ng
(Figure: binary classification with two classes vs. multiclass classification with three or more classes, shown as scatter plots)
One-vs-all (one-vs-rest) • Turn the 3-class problem into three binary problems: Class 1 vs. rest, Class 2 vs. rest, Class 3 vs. rest • For each, fit a classifier h_θ⁽ⁱ⁾(x) = P(y = i | x; θ) Slide credit: Andrew Ng
One-vs-all • Train a logistic regression classifier h_θ⁽ⁱ⁾(x) for each class i to predict the probability that y = i • Given a new input x, pick the class i that maximizes h_θ⁽ⁱ⁾(x) Slide credit: Andrew Ng
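A one-vs-all sketch that reuses the `gradient_descent` trainer sketched earlier (assumed available in scope); class labels are assumed to be integers 0 … num_classes − 1:

```python
import numpy as np

def train_one_vs_all(X, y, num_classes, alpha=0.1, num_iters=1000):
    """Train one binary logistic regression classifier per class (y == i vs. the rest)."""
    thetas = [gradient_descent(X, (y == i).astype(float), alpha, num_iters)
              for i in range(num_classes)]
    return np.array(thetas)           # shape: (num_classes, n)

def predict_one_vs_all(thetas, x):
    """Pick the class whose classifier assigns the highest probability h_theta^(i)(x)."""
    probs = 1.0 / (1.0 + np.exp(-thetas @ x))
    return int(np.argmax(probs))
```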
Discriminative approach (e.g., logistic regression): estimate P(y | x) directly (or a discriminant function, e.g., SVM) • Prediction: ŷ = argmax_y P(y | x) • Generative approach (e.g., Naïve Bayes): estimate P(x | y) and P(y) • Prediction: ŷ = argmax_y P(x | y) P(y)