
Machine Learning 10601 Recitation 6 Sep 30, 2009 Oznur Tastan






Presentation Transcript


  1. Machine Learning 10601 Recitation 6 Sep 30, 2009 Oznur Tastan

  2. Outline Multivariate Gaussians Logistic regression

  3. Multivariate Gaussians (or "multinormal distribution" or "multivariate normal distribution") Univariate case: a single mean μ and variance σ², with density p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)). Multivariate case: a vector of observations x, a vector of means μ, and a covariance matrix Σ, with density p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−(1/2)(x − μ)^T Σ⁻¹(x − μ)), where d is the dimension of x and |Σ| is the determinant of Σ.

  4. Multivariate Gaussians In both the univariate and the multivariate case the density has two parts: a normalization constant, 1/(√(2π)σ) or 1/((2π)^(d/2)|Σ|^(1/2)), which does not depend on x, and an exponential term which depends on x and is always positive.
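
A minimal MATLAB sketch of the density above, evaluated directly from the formula (the values of mu, Sigma, and the test point x are made up for illustration):

    % Evaluate the multivariate Gaussian density p(x) directly from the formula
    mu    = [0; 0];                        % mean vector
    Sigma = [1 0.5; 0.5 2];                % covariance matrix, must be positive definite
    x     = [1; -1];                       % point at which to evaluate the density
    d     = length(mu);                    % dimension of x
    dx    = x - mu;
    normc = 1 / ((2*pi)^(d/2) * sqrt(det(Sigma)));   % normalization constant, no x here
    p     = normc * exp(-0.5 * dx' * (Sigma \ dx))   % exponential term, depends on x

If the Statistics Toolbox is available, mvnpdf(x', mu', Sigma) should return the same value.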

  5. The mean vector μ = E[x], the vector whose components are the means of the individual variables: μi = E[xi].

  6. Covariance of two random variables Recall that for two random variables xi, xj: Cov(xi, xj) = E[(xi − E[xi])(xj − E[xj])] = E[xi xj] − E[xi] E[xj]

  7. The covariance matrix Σ = E[(x − μ)(x − μ)^T], where ^T is the transpose operator. The (i, j) entry of Σ is Cov(xi, xj), and the diagonal entries are the variances: Var(xm) = Cov(xm, xm)
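
As a quick numerical counterpart, a sketch that estimates the mean vector and covariance matrix from data (assuming one observation per row of X):

    % Estimate the mean vector and covariance matrix from samples
    n = 1000;
    X = randn(n, 2) * [1 0.5; 0 1];       % toy correlated 2-D data
    mu_hat    = mean(X)                   % 1-by-2 sample mean vector
    Sigma_hat = cov(X)                    % 2-by-2 sample covariance matrix
    % the diagonal entries of Sigma_hat are the variances: Var(x_m) = Cov(x_m, x_m)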

  8. An example: 2-variate case with a diagonal covariance matrix Σ = [σ1² 0; 0 σ2²], so the determinant is |Σ| = σ1²σ2². The pdf of the multivariate Gaussian becomes p(x1, x2) = (1 / (2π σ1 σ2)) exp(−(x1 − μ1)² / (2σ1²) − (x2 − μ2)² / (2σ2²))

  9. An example: 2-variate case This pdf is factorized into two independent univariate Gaussians, one in x1 and one in x2, so the variables are independent! Recall that in the general case independence implies uncorrelatedness, but uncorrelatedness does not necessarily imply independence. The multivariate Gaussian is a special case where uncorrelatedness implies independence as well.

  10. Diagonal covariance matrix If all the variables are independent of each other, the covariance matrix will be diagonal. For Gaussians the reverse is also true: if the covariance matrix is diagonal, the variables are independent. (A diagonal matrix is an m × m matrix whose off-diagonal entries are zero.)
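
A quick MATLAB check of this factorization for a diagonal covariance (the particular values of mu, s1, s2, and x are illustrative):

    % A diagonal-covariance bivariate Gaussian factorizes into two univariate Gaussians
    mu = [1; -1];   s1 = 0.8;   s2 = 1.5;
    Sigma = diag([s1^2, s2^2]);
    x  = [0.3; 0.7];                          % test point
    dx    = x - mu;
    joint = exp(-0.5 * dx' * (Sigma \ dx)) / (2*pi*sqrt(det(Sigma)));
    g1    = exp(-(x(1)-mu(1))^2 / (2*s1^2)) / (sqrt(2*pi)*s1);
    g2    = exp(-(x(2)-mu(2))^2 / (2*s2^2)) / (sqrt(2*pi)*s2);
    disp([joint, g1*g2])                      % the two numbers agree: p(x1,x2) = p(x1) p(x2)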

  11. Gaussian Intuitions: size of Σ Three plots with the same mean μ = [0 0] and covariances Σ = I (the identity matrix), Σ = 0.6I, and Σ = 2I. As Σ becomes larger, the Gaussian becomes more spread out.

  12. Gaussian Intuitions: off-diagonal As the off-diagonal entries increase, there is more correlation between the value of x and the value of y.

  13. Gaussian Intuitions: off-diagonal and diagonal • Plots #1-2: decreasing the off-diagonal entries • Plot #3: increasing the variance of one dimension on the diagonal

  14. Isocontours

  15. Isocontours example We have shown the form of the 2-variate density with diagonal covariance. Now, for some constant c, let's try to find the isocontour, i.e. the set of points where p(x1, x2) = c.

  16. Isocontours continued

  17. Isocontours continued Defining r1 and r2 in terms of σ1, σ2 and the constant c, the isocontour becomes the equation of an ellipse centered at (μ1, μ2) with axis lengths 2r1 and 2r2 (see the sketch below).
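
A sketch of the algebra behind this slide for the diagonal-covariance case (c is the contour level; r1, r2 follow the slide's notation):

    \frac{1}{2\pi\sigma_1\sigma_2}\exp\!\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right) = c
    \;\Longleftrightarrow\;
    \frac{(x_1-\mu_1)^2}{r_1^2}+\frac{(x_2-\mu_2)^2}{r_2^2} = 1,
    \qquad r_i^2 = 2\sigma_i^2\,\log\frac{1}{2\pi\sigma_1\sigma_2\,c}.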

  18. We had started with a diagonal covariance matrix. In the diagonal covariance matrix case the ellipses will be axis-aligned.

  19. Don't confuse multivariate Gaussians with mixtures of Gaussians Mixture of Gaussians: p(x) = Σk πk N(x | μk, Σk), where each N(x | μk, Σk) is a component and πk is its mixing coefficient (the slide's example uses K = 3 components).
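
A minimal MATLAB sketch of sampling from a 1-D mixture of K = 3 Gaussians (the mixing coefficients, means, and standard deviations are made up for illustration):

    % Sample n points from a 1-D mixture of K = 3 Gaussians
    n    = 500;
    pik  = [0.5 0.3 0.2];        % mixing coefficients, sum to 1
    muk  = [-2   0   3];         % component means
    sigk = [0.5  1   0.8];       % component standard deviations
    cdf  = cumsum(pik);
    x    = zeros(n, 1);
    for i = 1:n
        k    = find(rand <= cdf, 1);        % pick a component with probability pik(k)
        x(i) = muk(k) + sigk(k) * randn;    % draw from that component's Gaussian
    end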

  20. Logistic regression Linear regression: the outcome variable Y is continuous. Logistic regression: the outcome variable Y is binary.

  21. Logistic (sigmoid) function σ(z) = 1 / (1 + e^(−z)), plotted against z. The term e^(−z) lies in [0, ∞). Notice σ(z) is always bounded between 0 and 1 (a nice property), and as z increases σ(z) approaches 1, while as z decreases σ(z) approaches 0.
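
A one-line MATLAB check of these properties (the handle name sigma is just for illustration):

    % Logistic (sigmoid) function and its limiting behaviour
    sigma = @(z) 1 ./ (1 + exp(-z));
    sigma([-10 -1 0 1 10])     % close to 0 for very negative z, 0.5 at z = 0, close to 1 for large z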

  22. Logistic regression Learn a function to map X values to Y given data. Y is discrete (binary); X can be continuous or discrete. The function we try to learn is P(Y|X).

  23. Logistic regression
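
Following the parametric form used in Tom Mitchell's chapter linked on slide 30, the assumed model is

    P(Y=1 \mid X) = \frac{1}{1 + \exp\!\left(w_0 + \sum_i w_i X_i\right)},
    \qquad
    P(Y=0 \mid X) = \frac{\exp\!\left(w_0 + \sum_i w_i X_i\right)}{1 + \exp\!\left(w_0 + \sum_i w_i X_i\right)}.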

  24. Classification If P(Y=0|X) / P(Y=1|X) > 1, then Y=0 is more probable than Y=1 given X.

  25. Classification Take the log of both sides of this ratio; the classification rule becomes: if the resulting linear function of X is greater than 0, classify as Y=0 (see the sketch below).
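
A sketch of the omitted step, under the same model form as above:

    \frac{P(Y=0 \mid X)}{P(Y=1 \mid X)} = \exp\!\left(w_0 + \sum_i w_i X_i\right) > 1
    \;\Longleftrightarrow\;
    w_0 + \sum_i w_i X_i > 0
    \;\Longrightarrow\; \text{classify as } Y = 0.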

  26. Logistic regression is a linear classifier The decision boundary w0 + Σi wi Xi = 0 separates the region classified as Y=0 from the region classified as Y=1.

  27. Classification Example plots of σ(z) = σ(w0 + w1X1) against X1. In the first plot σ(z) is 0.5 when X1 = 0; in the second, with w0 = +2, σ(z) is 0.5 when X1 = 2 (to check, evaluate at X1 = 0, which the slide notes gives roughly 0.1). Values of X1 on one side of the 0.5 crossing are classified as Y=0, those on the other side as Y=1.
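
A small numerical check of this kind of example (the weights w0 = 2 and w1 = -1 are assumed purely for illustration):

    % Where does the decision flip for a 1-D logistic regression?
    w0 = 2;  w1 = -1;                     % illustrative weights
    X1  = -2:6;
    pY1 = 1 ./ (1 + exp(w0 + w1*X1));     % P(Y=1|X1) under the convention above
    disp([X1; pY1])                       % pY1 crosses 0.5 exactly where w0 + w1*X1 = 0, i.e. at X1 = 2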

  28. Estimating the parameters Given data {(x(l), y(l))}, the objective is to train the model to get the w that maximizes the conditional likelihood of the labels given the inputs, Πl P(y(l) | x(l), w).
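
A minimal MATLAB sketch of this objective, maximized by gradient ascent (the toy data, step size, and iteration count are assumptions; X holds one example per row and y holds labels in {0,1}):

    % Gradient ascent on the conditional log-likelihood of logistic regression,
    % using the same convention as above: P(Y=1|x) = 1 / (1 + exp(w0 + w1*x1 + w2*x2))
    n  = 200;
    X  = [randn(n/2,2) + 1; randn(n/2,2) - 1];   % toy 2-D features, one example per row
    y  = [ones(n/2,1); zeros(n/2,1)];            % labels in {0,1}
    Xa = [ones(n,1) X];                          % prepend a column of ones for w0
    w   = zeros(3,1);                            % parameters [w0; w1; w2]
    eta = 0.5;                                   % step size
    for it = 1:2000
        p1   = 1 ./ (1 + exp(Xa * w));           % P(Y=1|x) for every training example
        grad = Xa' * (p1 - y) / n;               % gradient of the average conditional log-likelihood
        w    = w + eta * grad;                   % ascent step: increase the conditional likelihood
    end
    w                                            % learned weights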

  29. Difference between Naïve Bayes and Logistic Regression: the loss function! They optimize different objective functions → obtain different solutions. Naïve Bayes: argmax P(X|Y) P(Y). Logistic Regression: argmax P(Y|X).

  30. Naïve Bayes and Logistic Regression Have a look at Tom Mitchell's book chapter http://www.cs.cmu.edu/%7Etom/mlbook/NBayesLogReg.pdf (linked under the Sep 23 Lecture Readings as well).

  31. Some Matlab tips for the last question in HW3 The logical function might be useful for dividing the data into splits. An example of logical in use (please read the Matlab help): S = X(logical(X(:,1)==1),:). This will also work: S = X(X(:,1)==1,:). Both subset the portion of the X matrix where the first column has value 1 and put it in matrix S (like Data > Filter in Excel). Matlab has functions for mean, std, sum, inv, log2. Scaling data to zero mean and unit variance means shifting by the mean (subtracting the mean from every element of the vector) and scaling so that it has variance 1 (dividing every element of the vector by the standard deviation). To do that on matrices you will need the repmat function; have a look at it, otherwise the sizes of the matrices will not match. For elementwise multiplication use .* (see the sketch below).
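
A short sketch putting these tips together (the layout of X, one example per row with a group label in the first column, is an assumption about the homework data):

    % Toy data: first column is a group label (1 or 2), remaining columns are features
    X = [ones(5,1) randn(5,3); 2*ones(5,1) randn(5,3)];
    % Split: keep only the rows whose first column equals 1 (like Data > Filter in Excel)
    S = X(X(:,1) == 1, :);
    % Standardize the feature columns to zero mean and unit variance using repmat
    F  = X(:, 2:end);
    n  = size(F, 1);
    Fz = (F - repmat(mean(F), n, 1)) ./ repmat(std(F), n, 1);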

  32. References http://www.stanford.edu/class/cs224s/lec/224s.09.lec10.pdf http://www.cs.cmu.edu/%7Etom/mlbook/NBayesLogReg.pdf Carlos Guestrin lecture notes Andrew Ng lecture notes
