
10-701/15-781 Machine Learning (Recitation 1)

Learn about probability, random variables, probability distributions, compressing the information in a pdf, maximum likelihood, Bayesian inference, and characteristics of distributions.


Presentation Transcript


  1. A significant portion of the slides is copied from Prof. Andrew Moore’s tutorials on Statistical Data Mining: http://www.autonlab.org/tutorials/ 10-701/15-781 Machine Learning (Recitation 1) By Fan Guo 9/14/06

  2. We start with… • Probability: a degree of uncertainty • Random Variable • Probability Distribution • Discrete (probability mass function) • Continuous (cdf, pdf) • Remember, they should be normalized!

  3. We are going to cover… • Compress the information from the pdf • Maximum Likelihood (ML) • Bayesian inference • Gaussian and Gaussian magic

  4. Characterizing a distribution • Mean (or Expectation) • Variance • Standard Deviation • Mode • Min, Max • Entropy • the Plot!
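A minimal sketch of how these summaries can be computed for a small discrete distribution; the pmf values and variable names below are made up for illustration, not taken from the slides:

    import math

    # A toy pmf: outcome -> probability. It must be normalized (sum to 1).
    pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}

    mean = sum(x * p for x, p in pmf.items())                # expectation E[X]
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())   # variance
    std = math.sqrt(var)                                     # standard deviation
    mode = max(pmf, key=pmf.get)                             # most probable outcome
    entropy = -sum(p * math.log2(p) for p in pmf.values() if p > 0)  # in bits

    print(f"mean={mean:.2f} var={var:.2f} std={std:.2f} mode={mode} H={entropy:.2f}")

Min and max are simply the smallest and largest outcomes with nonzero probability, and "the Plot!" is a reminder to always look at the distribution itself.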

  5. Discussion • Prove that E[X] is the value u that minimizes E[(X-u)^2] • Answer: write E[(X-u)^2] explicitly as a function of u, take the derivative with respect to u, and set it to zero.
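Written out, that derivation is:

    \[
    E[(X-u)^2] = E[X^2] - 2u\,E[X] + u^2,
    \qquad
    \frac{d}{du}\,E[(X-u)^2] = -2\,E[X] + 2u = 0
    \;\Longrightarrow\; u = E[X].
    \]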

  6. Discussion • What is the value u that minimizes E[|X-u|]? • Answer: the median. For a continuous distribution with cdf F, it is F^{-1}(0.5).
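A sketch of why, assuming X has a density so we may differentiate under the expectation:

    \[
    \frac{d}{du}\,E\bigl[\,|X-u|\,\bigr] = P(X < u) - P(X > u) = 0
    \;\Longrightarrow\; P(X < u) = P(X > u) = \tfrac{1}{2},
    \]

which is exactly the defining property of the median.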

  7. In 2 dimensions X – Miles per Gallon Y – Car weight

  8. In 2 dimensions X – Miles per Gallon Y – Car weight • Expectation (Centroid) • Entropy • Marginal Dists. • Conditional Dists. • Covariance • Independence
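A small sketch of these quantities for a toy discrete joint distribution; the numbers are invented for illustration and are unrelated to the MPG data on the slide:

    # Toy joint distribution P(X, Y): (x, y) -> probability, normalized to 1.
    joint = {
        (0, 0): 0.2, (0, 1): 0.1,
        (1, 0): 0.1, (1, 1): 0.6,
    }

    # Marginal distributions: sum the joint over the other variable.
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p

    ex = sum(x * p for x, p in px.items())                  # E[X]
    ey = sum(y * p for y, p in py.items())                  # E[Y]
    exy = sum(x * y * p for (x, y), p in joint.items())     # E[XY]
    cov = exy - ex * ey                                     # Cov(X, Y)

    # Independence holds iff P(x, y) = P(x) P(y) for every cell.
    indep = all(abs(p - px[x] * py[y]) < 1e-12 for (x, y), p in joint.items())
    print(f"E[X]={ex:.2f} E[Y]={ey:.2f} Cov={cov:.2f} independent={indep}")

Here Cov(X, Y) = 0.11 > 0 and the independence check fails, since the probability mass concentrates on the diagonal.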

  9. Test your understanding When, if ever, Var[X+Y] = Var[X] + Var[Y]? • All the time?

  10. Test your understanding When, if ever, Var[X+Y] = Var[X] + Var[Y]? • All the time? • Only when X and Y are independent?

  11. Test your understanding When, if ever, Var[X+Y] = Var[X] + Var[Y]? • All the time? • Only when X and Y are independent? • It can fail even if X and Y are independent?
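The identity that resolves the quiz:

    \[
    \mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}(X, Y),
    \]

so equality holds exactly when Cov(X, Y) = 0. Independence is sufficient but not necessary: X and Y may be dependent yet uncorrelated, and the equality still holds.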

  12. This slide is copied from Jimeng Sun’s recitation slides for the fall 2005 class.

  13. What if? • X = {T, T, T, T, T} • L(p) = ? • p_ML = ?
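Working this out under the usual coin-flip setup (the assumption here is that p denotes the probability of heads, so all five observations are tails):

    \[
    L(p) = \prod_{i=1}^{5} P(x_i \mid p) = (1-p)^5,
    \qquad
    p_{ML} = \arg\max_{p \in [0,1]} (1-p)^5 = 0.
    \]

An estimate that rules out heads entirely after five flips is the kind of overconfident answer that motivates the Bayesian treatment on the next slide.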

  14. Being Bayesian • We are uncertain about p… • We treat p as a random variable • We have a prior belief: p ~ Uniform(0,1)

  15. Computing the Posterior
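A sketch of the standard computation for the running example (likelihood (1-p)^5 from five tails, prior p ~ Uniform(0,1)):

    \[
    p(p \mid X) = \frac{p(X \mid p)\,p(p)}{p(X)}
    = \frac{(1-p)^5 \cdot 1}{\int_0^1 (1-q)^5 \, dq}
    = 6\,(1-p)^5,
    \]

which is the Beta(1, 6) distribution.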

  16. Comments on Bayesian Inference • We are uncertain about p • This uncertainty is represented by a prior belief on p • We observe a data set X • We update our belief using Bayes' rule • The posterior distribution may be useful for future experiments/inference • Sometimes the posterior is hard to compute, because finding p(X) requires an integral • With a conjugate prior, the problem becomes easy (see the sketch below) • The choice of prior depends on background knowledge, the model, and the desired computational cost • Now let's see how to estimate p after we compute the posterior distribution
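A minimal sketch of the conjugate-prior shortcut mentioned above, using the Beta-Bernoulli pair (the function name and variables are illustrative, not from the slides):

    # With a Beta(a, b) prior on p = P(heads) and Bernoulli observations,
    # the posterior is Beta(a + #heads, b + #tails): no integral needed.
    def beta_bernoulli_posterior(a, b, flips):
        """Update a Beta(a, b) prior on P(heads) from a list of 'H'/'T' flips."""
        return a + flips.count("H"), b + flips.count("T")

    # The recitation's example: Uniform(0, 1) = Beta(1, 1) prior, five tails.
    a_post, b_post = beta_bernoulli_posterior(1, 1, ["T"] * 5)
    print(f"posterior: Beta({a_post}, {b_post})")  # Beta(1, 6)

Because Uniform(0, 1) is exactly Beta(1, 1), this reproduces the Beta(1, 6) posterior from the previous slide without computing any integral.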

  17. The Posterior Belief • MAP • easier to compute • Posterior Mean • MAP may not be desired for a skewed distribution
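For a Beta posterior, as in the running example, both point estimates have closed forms:

    \[
    p \mid X \sim \mathrm{Beta}(\alpha, \beta):
    \qquad
    p_{MAP} = \frac{\alpha - 1}{\alpha + \beta - 2},
    \qquad
    E[p \mid X] = \frac{\alpha}{\alpha + \beta}.
    \]

For the Beta(1, 6) posterior above, p_MAP = 0 (the same degenerate answer as the MLE), while the posterior mean is 1/7: a concrete case of MAP being misleading for a skewed distribution.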

  18. What we covered… • Collapse the pdf • The joint distribution could tell everything… • Likelihood, log-likelihood • ML estimation vs. Bayesian inference • MAP, Posterior Mean • uninformative prior, conjugate prior

  19. What we didn’t cover… • Many interesting and useful pdfs • Conditional independence • Gaussians • http://www.autonlab.org/tutorials/ • MLE and Bayesian inference for continuous distributions

  20. Thank you!
