10-701/15-781 Machine Learning (Recitation 1)
Fan Guo, 9/14/06

A significant portion of these slides is copied from Prof. Andrew Moore's tutorials on Statistical Data Mining: http://www.autonlab.org/tutorials/
We start with…
• Probability: degree of uncertainty
• Random variable
• Probability distribution
  • Discrete (probability mass function)
  • Continuous (cdf, pdf)
• Remember, they should be normalized!
We are going to cover…
• Compressing the information in the pdf
• Maximum likelihood (ML)
• Bayesian inference
• Gaussians and Gaussian magic
Characterizing a distribution
• Mean (or expectation)
• Variance
• Standard deviation
• Mode
• Min, max
• Entropy
• the Plot!
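As a quick illustration, here is a minimal sketch that computes these summaries for a small discrete distribution (assuming NumPy is available; the pmf values are made up for illustration):

```python
import numpy as np

# A small discrete distribution over four values, given as a pmf.
values = np.array([0.0, 1.0, 2.0, 3.0])
pmf = np.array([0.1, 0.2, 0.4, 0.3])
assert np.isclose(pmf.sum(), 1.0)   # remember, it should be normalized!

mean = np.sum(values * pmf)                    # E[X]
variance = np.sum((values - mean) ** 2 * pmf)  # E[(X - E[X])^2]
std = np.sqrt(variance)                        # standard deviation
mode = values[np.argmax(pmf)]                  # most probable value
entropy = -np.sum(pmf * np.log2(pmf))          # in bits

print(mean, variance, std, mode, entropy)
```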
Discussion
• Prove that E[X] is the value u that minimizes E[(X-u)^2]
• Answer: write E[(X-u)^2] explicitly as a function of u, take the derivative with respect to u, and set it to zero.
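Written out, the proof is one line of calculus:

```latex
\[
E[(X-u)^2] = E[X^2] - 2u\,E[X] + u^2,
\qquad
\frac{d}{du}\,E[(X-u)^2] = -2E[X] + 2u = 0
\;\Rightarrow\; u = E[X].
\]
```

The second derivative is 2 > 0, so this stationary point is indeed a minimum.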
Discussion
• What is the value u that minimizes E[|X-u|]?
• Answer: the median. For a continuous distribution with cdf F, it is F^{-1}(0.5).
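A sketch of why, assuming X is continuous with density f and cdf F:

```latex
\[
\frac{d}{du}\,E[|X-u|]
= \frac{d}{du}\!\left(\int_{-\infty}^{u}(u-x)f(x)\,dx
  + \int_{u}^{\infty}(x-u)f(x)\,dx\right)
= F(u) - \bigl(1 - F(u)\bigr) = 2F(u) - 1,
\]
```

which is zero exactly when F(u) = 1/2, i.e. at the median.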
In 2 dimensions
[Scatter plot: X = miles per gallon, Y = car weight]
• Expectation (centroid)
• Entropy
• Marginal dists.
• Conditional dists.
• Covariance
• Independence
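A minimal sketch of these 2-d summaries, using synthetic (mpg, weight) pairs for illustration only, not real car data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: heavier cars tend to get fewer miles per gallon.
mpg = rng.normal(25.0, 5.0, size=500)
weight = 4000.0 - 80.0 * mpg + rng.normal(0.0, 200.0, size=500)
data = np.column_stack([mpg, weight])

centroid = data.mean(axis=0)             # expectation (centroid)
cov = np.cov(data, rowvar=False)         # 2x2 covariance matrix
corr = np.corrcoef(data, rowvar=False)   # normalized covariance

print(centroid)    # the two marginal means
print(cov)         # Cov(X, Y) is the off-diagonal entry
print(corr[0, 1])  # strongly negative here, so X and Y are not independent
```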
Test your understanding
When, if ever, does Var[X+Y] = Var[X] + Var[Y] hold?
• All the time?
• Only when X and Y are independent?
• It can fail even if X and Y are independent?
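The identity that settles it (a standard fact, not spelled out on the original slide):

```latex
\[
\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}(X,Y),
\]
```

so equality holds exactly when Cov(X, Y) = 0. Independence is sufficient (ruling out the third option, at least when the variances are finite) but not necessary: there are uncorrelated pairs that are still dependent.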
This slide is copied from Jimeng Sun’s recitation slides for the fall 2005 class.
What if?
• X = {T, T, T, T, T}
• L(p) = ?
• pML = ?
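Working it out, assuming the usual i.i.d. coin-flip setup where p is the probability of heads:

```latex
\[
L(p) = P(X \mid p) = (1-p)^5,
\qquad
\log L(p) = 5\log(1-p),
\]
```

which is strictly decreasing on [0, 1), so pML = 0. Five tails in a row push the ML estimate to an extreme, which is part of the motivation for being Bayesian.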
Being Bayesian
• We are uncertain about p…
• We treat p as a random variable
• We have a prior belief: p ~ Uniform(0,1)
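For the five-tails data above, the uniform prior (which is Beta(1,1)) makes the update concrete:

```latex
\[
p(p \mid X) \;\propto\; P(X \mid p)\,p(p) = (1-p)^5 \cdot 1,
\]
```

which, after normalization, is exactly a Beta(1, 6) density. Unlike the ML estimate, it spreads belief over all of (0, 1).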
Comments on Bayesian Inference
• We are uncertain about p
• This uncertainty is represented by a prior belief on p
• We observe a data set X
• We update our belief using Bayes' rule
• The posterior distribution may be useful for future experiments/inference
• Sometimes the posterior is not easy to compute, because we have to integrate over p to compute p(X)
• If we use a conjugate prior, the problem becomes easy (see the sketch below)
• The choice of prior depends on the background knowledge, the model, and the computational cost we are willing to pay
• Now let's see how to estimate p after we compute the posterior distribution
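A minimal sketch of the conjugate-prior shortcut, using the Beta-Bernoulli pair from the coin example (hypothetical code assuming SciPy; not from the slides):

```python
from scipy import stats

# Beta(a, b) prior; Uniform(0,1) is the special case a = b = 1.
a, b = 1, 1

# Observed flips: 1 = heads, 0 = tails (the five-tails data set).
flips = [0, 0, 0, 0, 0]

# Conjugacy: the posterior is Beta(a + #heads, b + #tails).
# No integration needed -- p(X) never has to be computed explicitly.
a_post = a + sum(flips)
b_post = b + len(flips) - sum(flips)

posterior = stats.beta(a_post, b_post)   # Beta(1, 6) here
print(posterior.mean())                  # posterior mean = 1/7
```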
The Posterior Belief
• MAP
  • easier to compute
• Posterior mean
  • MAP may not be desired for a skewed distribution
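To make the contrast concrete with standard Beta facts: for a Beta(a, b) posterior,

```latex
\[
\hat{p}_{MAP} = \frac{a-1}{a+b-2} \quad (a, b > 1),
\qquad
\hat{p}_{mean} = E[p \mid X] = \frac{a}{a+b}.
\]
```

For the Beta(1, 6) posterior from the coin example, the mode sits at the boundary, so pMAP = 0 (the same extreme answer as ML), while the posterior mean is 1/7: exactly the skewed case where MAP may not be desired.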
What we covered…
• Collapsing the pdf into a few summary numbers
• The joint distribution can tell everything…
• Likelihood, log-likelihood
• ML estimation vs. Bayesian inference
• MAP, posterior mean
• Uninformative prior, conjugate prior
What we didn’t cover… • Many interesting and useful pdf • Conditional independence • Gaussian • http://www.autonlab.org/tutorials/ • MLE and Bayesian Inference for continuous distribution