10-701/15-781 Machine Learning (Recitation 1)
Fan Guo, 9/14/06

A significant portion of these slides is copied from Prof. Andrew Moore's tutorials on Statistical Data Mining: http://www.autonlab.org/tutorials/
We start with…
• Probability: degree of uncertainty
• Random variable
• Probability distribution
  • Discrete (probability mass function)
  • Continuous (cdf, pdf)
• Remember, they should be normalized!
We are going to cover…
• Compressing the information in the pdf
• Maximum likelihood (ML)
• Bayesian inference
• Gaussians and Gaussian magic
Characterizing a distribution
• Mean (or expectation)
• Variance
• Standard deviation
• Mode
• Min, max
• Entropy
• the Plot!
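As a quick illustration, here is a minimal sketch that computes these summaries for a small discrete distribution (assuming NumPy is available; the pmf values are made up for illustration):

```python
import numpy as np

# A small discrete distribution over four values, given as a pmf.
values = np.array([0.0, 1.0, 2.0, 3.0])
pmf = np.array([0.1, 0.2, 0.4, 0.3])
assert np.isclose(pmf.sum(), 1.0)   # remember, it should be normalized!

mean = np.sum(values * pmf)                    # E[X]
variance = np.sum((values - mean) ** 2 * pmf)  # E[(X - E[X])^2]
std = np.sqrt(variance)                        # standard deviation
mode = values[np.argmax(pmf)]                  # most probable value
entropy = -np.sum(pmf * np.log2(pmf))          # in bits

print(mean, variance, std, mode, entropy)
```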
Discussion
• Prove that E[X] is the value u that minimizes E[(X-u)^2]
• Answer: write E[(X-u)^2] explicitly as a function of u, take the derivative with respect to u, and set it to zero.
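Written out, the proof is one line of calculus:

```latex
\[
E[(X-u)^2] = E[X^2] - 2u\,E[X] + u^2,
\qquad
\frac{d}{du}\,E[(X-u)^2] = -2E[X] + 2u = 0
\;\Rightarrow\; u = E[X].
\]
```

The second derivative is 2 > 0, so this stationary point is indeed a minimum.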
Discussion
• What is the value u that minimizes E[|X-u|]?
• Answer: the median. For a continuous distribution with cdf F, it is F^{-1}(0.5).
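A sketch of why, assuming X is continuous with density f and cdf F:

```latex
\[
\frac{d}{du}\,E[|X-u|]
= \frac{d}{du}\!\left(\int_{-\infty}^{u}(u-x)f(x)\,dx
  + \int_{u}^{\infty}(x-u)f(x)\,dx\right)
= F(u) - \bigl(1 - F(u)\bigr) = 2F(u) - 1,
\]
```

which is zero exactly when F(u) = 1/2, i.e. at the median.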
In 2 dimensions
[Scatter plot: X = miles per gallon, Y = car weight]
• Expectation (centroid)
• Entropy
• Marginal dists.
• Conditional dists.
• Covariance
• Independence
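A minimal sketch of these 2-d summaries, using synthetic (mpg, weight) pairs for illustration only, not real car data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: heavier cars tend to get fewer miles per gallon.
mpg = rng.normal(25.0, 5.0, size=500)
weight = 4000.0 - 80.0 * mpg + rng.normal(0.0, 200.0, size=500)
data = np.column_stack([mpg, weight])

centroid = data.mean(axis=0)             # expectation (centroid)
cov = np.cov(data, rowvar=False)         # 2x2 covariance matrix
corr = np.corrcoef(data, rowvar=False)   # normalized covariance

print(centroid)    # the two marginal means
print(cov)         # Cov(X, Y) is the off-diagonal entry
print(corr[0, 1])  # strongly negative here, so X and Y are not independent
```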
Test your understanding
When, if ever, does Var[X+Y] = Var[X] + Var[Y] hold?
• All the time?
• Only when X and Y are independent?
• It can fail even if X and Y are independent?
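The identity that settles it (a standard fact, not spelled out on the original slide):

```latex
\[
\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}(X,Y),
\]
```

so equality holds exactly when Cov(X, Y) = 0. Independence is sufficient (ruling out the third option, at least when the variances are finite) but not necessary: there are uncorrelated pairs that are still dependent.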
This slide is copied from Jimeng Sun’s recitation slides for the fall 2005 class.
What if?
• X = {T, T, T, T, T}
• L(p) = ?
• pML = ?
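Working it out, assuming the usual i.i.d. coin-flip setup where p is the probability of heads:

```latex
\[
L(p) = P(X \mid p) = (1-p)^5,
\qquad
\log L(p) = 5\log(1-p),
\]
```

which is strictly decreasing on [0, 1), so pML = 0. Five tails in a row push the ML estimate to an extreme, which is part of the motivation for being Bayesian.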
Being Bayesian
• We are uncertain about p…
• We treat p as a random variable
• We have a prior belief: p ~ Uniform(0,1)
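For the five-tails data above, the uniform prior (which is Beta(1,1)) makes the update concrete:

```latex
\[
p(p \mid X) \;\propto\; P(X \mid p)\,p(p) = (1-p)^5 \cdot 1,
\]
```

which, after normalization, is exactly a Beta(1, 6) density. Unlike the ML estimate, it spreads belief over all of (0, 1).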
Comments on Bayesian Inference
• We are uncertain about p
• This uncertainty is represented by a prior belief on p
• We observe a data set X
• We update our belief using Bayes' rule
• The posterior distribution may be useful for future experiments/inference
• Sometimes the posterior is not easy to compute, because we have to integrate over p to compute p(X)
• If we use a conjugate prior, the problem becomes easy (see the sketch below)
• The choice of prior depends on the background knowledge, the model, and the computational cost we are willing to pay
• Now let's see how to estimate p after we compute the posterior distribution
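A minimal sketch of the conjugate-prior shortcut, using the Beta-Bernoulli pair from the coin example (hypothetical code assuming SciPy; not from the slides):

```python
from scipy import stats

# Beta(a, b) prior; Uniform(0,1) is the special case a = b = 1.
a, b = 1, 1

# Observed flips: 1 = heads, 0 = tails (the five-tails data set).
flips = [0, 0, 0, 0, 0]

# Conjugacy: the posterior is Beta(a + #heads, b + #tails).
# No integration needed -- p(X) never has to be computed explicitly.
a_post = a + sum(flips)
b_post = b + len(flips) - sum(flips)

posterior = stats.beta(a_post, b_post)   # Beta(1, 6) here
print(posterior.mean())                  # posterior mean = 1/7
```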
The Posterior Belief
• MAP
  • easier to compute
• Posterior mean
  • MAP may not be desired for a skewed distribution
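To make the contrast concrete with standard Beta facts: for a Beta(a, b) posterior,

```latex
\[
\hat{p}_{MAP} = \frac{a-1}{a+b-2} \quad (a, b > 1),
\qquad
\hat{p}_{mean} = E[p \mid X] = \frac{a}{a+b}.
\]
```

For the Beta(1, 6) posterior from the coin example, the mode sits at the boundary, so pMAP = 0 (the same extreme answer as ML), while the posterior mean is 1/7: exactly the skewed case where MAP may not be desired.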
What we covered…
• Collapsing the pdf into a few summary numbers
• The joint distribution can tell everything…
• Likelihood, log-likelihood
• ML estimation vs. Bayesian inference
• MAP, posterior mean
• Uninformative prior, conjugate prior
What we didn’t cover… • Many interesting and useful pdf • Conditional independence • Gaussian • http://www.autonlab.org/tutorials/ • MLE and Bayesian Inference for continuous distribution