
Linear Regression


Presentation Transcript


  1. Linear Regression

  2. Task: Learning a real-valued function f: x -> y, where x = <x1, …, xn>, as a linear function of the input features xi. Using x0 = 1, we can write the hypothesis as: hθ(x) = θ0 x0 + θ1 x1 + … + θn xn = θᵀx

  3. Linear Regression

  4. Cost function • We want to penalize deviation from the target values: J(θ) = (1/2) Σ_{i=1..m} (hθ(x(i)) − y(i))² • The cost function J(θ) is a convex quadratic function of θ, so there are no local minima.
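
  As an illustration (not part of the slides), a minimal NumPy sketch of this squared-error cost; the names lsq_cost, X, y, and theta are assumptions:

    import numpy as np

    def lsq_cost(theta, X, y):
        # Least-squares cost J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2
        # X: (m, n+1) design matrix with x0 = 1 in the first column
        # y: (m,) target vector, theta: (n+1,) parameter vector
        residuals = X @ theta - y
        return 0.5 * np.dot(residuals, residuals)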

  5. Linear Regression – Cost function

  6. Finding θ that minimizes J(θ) • Gradient descent: θj := θj − α ∂J(θ)/∂θj • Let's consider what happens for a single input pattern: θj := θj + α (y(i) − hθ(x(i))) xj(i)

  7. Gradient Descent • Stochastic Gradient Descent (update after each pattern) vs. Batch Gradient Descent (one update per pass over all patterns):
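
  A minimal sketch of the two variants for linear regression, assuming a NumPy design matrix X (with x0 = 1) and target vector y; function names and hyperparameters are illustrative, not from the slides:

    import numpy as np

    def batch_gd(X, y, alpha=0.01, epochs=1000):
        # Batch gradient descent: one update per pass, gradient summed over all m patterns.
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(epochs):
            grad = X.T @ (X @ theta - y)   # sum of per-pattern gradients
            theta -= alpha * grad / m       # averaged over the m patterns
        return theta

    def stochastic_gd(X, y, alpha=0.01, epochs=10):
        # Stochastic gradient descent: update after each pattern.
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(epochs):
            for i in np.random.permutation(m):
                error = X[i] @ theta - y[i]
                theta -= alpha * error * X[i]
        return theta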

  8. Need for scaling input features
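
  One common way to do this scaling is standardization; a small sketch (an assumption about the method, since the slide itself shows no formula):

    import numpy as np

    def standardize(X):
        # Scale each feature column to zero mean and unit variance so gradient
        # descent makes comparable progress along every axis.
        # (The constant x0 = 1 column is typically appended *after* scaling.)
        mu = X.mean(axis=0)
        sigma = X.std(axis=0)
        sigma[sigma == 0] = 1.0   # guard against division by zero
        return (X - mu) / sigma, mu, sigma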

  9. Finding θ that minimizes J(θ) • Closed-form solution (the normal equations): θ = (XᵀX)⁻¹ Xᵀ y, where X is the design matrix whose rows are the data points x(i)ᵀ and y is the vector of targets.
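
  The same formula in NumPy (a sketch; np.linalg.solve is used instead of an explicit matrix inverse for numerical stability):

    import numpy as np

    def normal_equation(X, y):
        # Closed-form least-squares solution: theta = (X^T X)^{-1} X^T y
        return np.linalg.solve(X.T @ X, X.T @ y)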

  10. If we assume y(i) = θᵀx(i) + ε(i), with the ε(i) being i.i.d. and normally distributed around zero, we can see that least-squares regression corresponds to finding the maximum likelihood estimate of θ.
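
  A compressed version of that argument (a sketch following the standard CS229 derivation, not taken from the slide itself):

    p(y^{(i)} \mid x^{(i)}; \theta)
      = \frac{1}{\sqrt{2\pi}\,\sigma}
        \exp\!\Big(-\frac{(y^{(i)} - \theta^{\top} x^{(i)})^{2}}{2\sigma^{2}}\Big)

    \ell(\theta) = \log \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta)
      = m \log\frac{1}{\sqrt{2\pi}\,\sigma}
        - \frac{1}{\sigma^{2}} \cdot \frac{1}{2}\sum_{i=1}^{m} \big(y^{(i)} - \theta^{\top} x^{(i)}\big)^{2}

  Maximizing ℓ(θ) therefore amounts to minimizing (1/2) Σ_i (y(i) − θᵀx(i))², i.e. the least-squares cost J(θ).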

  11. Underfitting: What if a line isn’t a good fit? • We can add more features (e.g., higher-order terms), but too many features lead to overfitting => Regularization

  12. Regularized Linear Regression

  13. Regularized Linear Regression
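
  The slides' regularized cost is not reproduced in the transcript; a common choice is the ridge penalty, J(θ) = (1/2) Σ_i (hθ(x(i)) − y(i))² + (λ/2) Σ_{j≥1} θj², which also has a closed form. A sketch under that assumption:

    import numpy as np

    def ridge_fit(X, y, lam=1.0):
        # Regularized least squares: theta = (X^T X + lam * I)^{-1} X^T y.
        # The intercept theta_0 (first column, x0 = 1) is conventionally not penalized.
        n = X.shape[1]
        reg = lam * np.eye(n)
        reg[0, 0] = 0.0
        return np.linalg.solve(X.T @ X + reg, X.T @ y)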

  14. Skipped • Locally weighted linear regression • You can read more in: http://cs229.stanford.edu/notes/cs229-notes1.pdf

  15. Logistic Regression

  16. Logistic Regression - Motivation • Let's now focus on the binary classification problem, in which • y can take on only two values, 0 and 1. • x is a vector of real-valued features, <x1, …, xn>. • We could approach the classification problem ignoring the fact that y is discrete-valued, and use our old linear regression algorithm to try to predict y given x. • However, it is easy to construct examples where this method performs very poorly. • Intuitively, it also doesn’t make sense for h(x) to take values larger than 1 or smaller than 0 when we know that y ∈ {0, 1}.

  17. Logistic Function • g(z) = 1 / (1 + e^(−z)), and the hypothesis becomes hθ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx)).

  18. Derivative of the Logistic Function • g′(z) = g(z) (1 − g(z))
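
  A small sketch that defines the logistic function and checks this identity numerically (names are illustrative, not from the slides):

    import numpy as np

    def sigmoid(z):
        # Logistic function g(z) = 1 / (1 + e^{-z})
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_grad(z):
        # Derivative g'(z) = g(z) * (1 - g(z))
        g = sigmoid(z)
        return g * (1.0 - g)

    # Quick numerical check of the identity at a few points.
    z = np.array([-2.0, 0.0, 3.0])
    eps = 1e-6
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    assert np.allclose(numeric, sigmoid_grad(z), atol=1e-6)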

  19. Interpretation: hθ(x) is the estimate of the probability that y = 1 for a given x: hθ(x) = P(y = 1 | x; θ) • Thus: P(y = 1 | x; θ) = hθ(x) and P(y = 0 | x; θ) = 1 − hθ(x). • Which can be written more compactly as: P(y | x; θ) = (hθ(x))^y (1 − hθ(x))^(1−y)


  22. Mean Squared Error – Not Convex

  23. Alternative cost function?

  24. New cost function • Make the cost function steeper: • Intuitively, saying that p(malignant | x) = 0 and being wrong should be penalized severely!

  25. New cost function • Cost(hθ(x), y) = −log(hθ(x)) if y = 1, and −log(1 − hθ(x)) if y = 0.

  26. New cost function


  29. Minimizing the New Cost function • J(θ) = −(1/m) Σ_{i=1..m} [ y(i) log hθ(x(i)) + (1 − y(i)) log(1 − hθ(x(i))) ] • Convex!
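
  For concreteness, a minimal NumPy sketch of this cross-entropy cost (names are assumptions, not from the slides):

    import numpy as np

    def logistic_cost(theta, X, y):
        # Cross-entropy cost for logistic regression (convex in theta).
        # X: (m, n+1) design matrix, y: (m,) labels in {0, 1}.
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))
        return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))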

  30. Fitting θ

  31. Fitting θ • Working with a single input and remembering h(x) = g(θᵀx), the resulting update rule is: θj := θj + α (y(i) − hθ(x(i))) xj(i)
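
  A sketch of that per-pattern update as stochastic gradient ascent on the log-likelihood (function name and hyperparameters are illustrative):

    import numpy as np

    def fit_logistic_sgd(X, y, alpha=0.1, epochs=50):
        # After each pattern: theta_j += alpha * (y_i - h_theta(x_i)) * x_ij
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(epochs):
            for i in np.random.permutation(m):
                h = 1.0 / (1.0 + np.exp(-(X[i] @ theta)))
                theta += alpha * (y[i] - h) * X[i]
        return theta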

  32. Skipped • Alternative: Maximizing ℓ(θ) using Newton’s method

  33. From http://www.cs.cmu.edu/~tom/10701_sp11/recitations/Recitation_3.pdf

  34. Regularized Logistic Regression

  35. Softmax Regression (Multinomial Logistic Regression, MaxEnt Classifier)

  36. Softmax Regression • The softmax regression model generalizes logistic regression to classification problems where the class label y can take on more than two possible values. • The response variable y can take on any one of k values, so y ∈ {1, 2, . . . , k}.

  37. The parameters θ can be arranged as a k × (n+1) matrix, one row of n+1 weights per class.
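
  A sketch of how that k × (n+1) parameter matrix produces class probabilities (variable names Theta and x are assumptions):

    import numpy as np

    def softmax_probs(Theta, x):
        # Class probabilities P(y = j | x; Theta) for softmax regression.
        # Theta: (k, n+1) parameter matrix, x: (n+1,) feature vector with x0 = 1.
        scores = Theta @ x                 # one score theta_j^T x per class
        scores -= scores.max()             # subtract max for numerical stability
        exp_scores = np.exp(scores)
        return exp_scores / exp_scores.sum()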


  39. Softmax Derivation from Logistic Regression

  40. One fairly simple way to arrive at the multinomial logit model is to imagine, for K possible outcomes, running K-1 independent binary logistic regression models, in which one outcome is chosen as a "pivot" and then the other K-1 outcomes are separately regressed against the pivot outcome. This would proceed as follows, if outcome K (the last outcome) is chosen as the pivot:
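
  Written out (a sketch of the standard derivation, not copied from the slide): each of the K−1 binary models fixes the log-odds against the pivot, and normalization then yields the softmax form.

    \ln\frac{P(y = i \mid x)}{P(y = K \mid x)} = \theta_i^{\top} x,
    \qquad i = 1, \dots, K-1

    P(y = K \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{\theta_j^{\top} x}},
    \qquad
    P(y = i \mid x) = \frac{e^{\theta_i^{\top} x}}{1 + \sum_{j=1}^{K-1} e^{\theta_j^{\top} x}}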


  43. Cost Function We now describe the cost function that we'll use for softmax regression. In the equation below, 1{.}  is the indicator function, so that 1{a true statement} = 1, and 1{a false statement} = 0. For example, 1{2 + 2 = 4} evaluates to 1; whereas  1{1 + 1 = 5} evaluates to 0.

  44. Remember that for logistic regression, we had: J(θ) = −(1/m) Σ_{i=1..m} [ y(i) log hθ(x(i)) + (1 − y(i)) log(1 − hθ(x(i))) ], which can be written similarly, using the indicator notation, as: J(θ) = −(1/m) Σ_{i=1..m} Σ_{j=0..1} 1{y(i) = j} log P(y(i) = j | x(i); θ)

  45. The softmax cost function is similar, except that we now sum over the k different possible values of the class label: J(θ) = −(1/m) Σ_{i=1..m} Σ_{j=1..k} 1{y(i) = j} log [ e^(θjᵀx(i)) / Σ_{l=1..k} e^(θlᵀx(i)) ]. • Note also the form of the class probability P(y = j | x; θ): logistic: 1 / (1 + e^(−θᵀx)); softmax: e^(θjᵀx) / Σ_{l=1..k} e^(θlᵀx).
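
  To make the indicator notation concrete, a minimal NumPy sketch of this softmax cost (names and the 0-based label convention are assumptions, not from the slides):

    import numpy as np

    def softmax_cost(Theta, X, y):
        # J(Theta) = -(1/m) * sum_i sum_j 1{y_i = j} * log P(y_i = j | x_i; Theta)
        # Theta: (k, n+1), X: (m, n+1), y: (m,) integer labels in {0, ..., k-1}
        scores = X @ Theta.T                           # theta_j^T x^(i) for every i, j
        scores -= scores.max(axis=1, keepdims=True)    # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        m = X.shape[0]
        # The indicator 1{y_i = j} picks out exactly one log-probability per example.
        return -log_probs[np.arange(m), y].mean()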
