
Review of Lecture Two


Presentation Transcript


  1. Review of Lecture Two • Linear Regression • Cost Function • Gradient Descent • Normal Equation: (XᵀX)⁻¹ • Probabilistic Interpretation • Maximum Likelihood Estimation vs. Linear Regression • Gaussian Distribution of the Data • Generative vs. Discriminative

  2. General Linear Regression Methods: Important Implications • Recall that θ, a column vector (one intercept θ₀ plus n parameters), can be obtained from the normal equation θ = (XᵀX)⁻¹Xᵀy; • When the X variables are linearly independent (XᵀX being full rank), there is a unique solution to the normal equations; • The inversion of XᵀX depends on the existence of a matrix (XᵀX)⁻¹ with (XᵀX)⁻¹(XᵀX) = I, that is, on finding a matrix equivalent of a numerical reciprocal; • Only models with a single output variable can be trained this way.
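
A minimal NumPy sketch of the normal equation on toy data (the dataset and all names are illustrative, not from the slides):

```python
import numpy as np

# Toy data: m examples, n features, plus a leading column of ones for the intercept theta0.
rng = np.random.default_rng(0)
m, n = 100, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])  # design matrix
theta_true = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.1, size=m)

# Normal equation: theta = (X^T X)^{-1} X^T y.
# Requires X^T X to be full rank, i.e., linearly independent columns of X.
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # close to theta_true
```

In practice np.linalg.solve(X.T @ X, X.T @ y) is numerically preferable to forming the explicit inverse, but the inverse mirrors the slide's formula.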

  3. Maximum Likelihood Estimation • Assume the data are i.i.d. (independently and identically distributed) • The likelihood L(θ) = the probability of the y's given the x's, parameterized by θ: L(θ) = ∏ᵢ p(y⁽ⁱ⁾ | x⁽ⁱ⁾; θ) • What is Maximum Likelihood Estimation (MLE)? • Choose the parameters θ that maximize this function, so as to make the training data set as probable as possible.
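
Since the examples are i.i.d., the likelihood is a product over examples and the log-likelihood is a sum. A minimal sketch under an assumed Gaussian noise model (the model choice, σ, and names are illustrative):

```python
import numpy as np

def log_likelihood(theta, X, y, sigma=1.0):
    """Gaussian log-likelihood of i.i.d. data under y = theta^T x + noise.
    The i.i.d. product of densities becomes a sum of log-densities."""
    residuals = y - X @ theta
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - residuals**2 / (2 * sigma**2))
```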

  4. The Connection Between MLE and OLS • Choose parameters θ to maximize the data likelihood L(θ) • Under the model y⁽ⁱ⁾ = θᵀx⁽ⁱ⁾ + ε⁽ⁱ⁾ with Gaussian noise ε⁽ⁱ⁾ ~ N(0, σ²), this is equivalent to minimizing ½ Σᵢ (y⁽ⁱ⁾ − θᵀx⁽ⁱ⁾)²

  5. The Equivalence of MLE and OLS Maximizing the log-likelihood ℓ(θ) = m log(1/(√(2π)σ)) − (1/σ²) · ½ Σᵢ (y⁽ⁱ⁾ − θᵀx⁽ⁱ⁾)² is the same as minimizing ½ Σᵢ (y⁽ⁱ⁾ − θᵀx⁽ⁱ⁾)² = J(θ) !?
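
A quick numerical spot-check of this equivalence (the data, optimizer, and tolerance are illustrative): maximizing the Gaussian log-likelihood, here done by minimizing its negation, recovers the same θ as the closed-form least-squares solution.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.3, size=50)

# -l(theta) up to additive/positive constants, which do not move the optimum.
neg_log_lik = lambda t: 0.5 * np.sum((y - X @ t) ** 2)

theta_mle = minimize(neg_log_lik, np.zeros(3)).x     # numerical MLE
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)        # closed-form OLS
print(np.allclose(theta_mle, theta_ols, atol=1e-4))  # True
```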

  6. Today’s Content • Logistic Regression • Discrete Output • Connection to MLE • The Exponential Family • Bernoulli • Gaussian • Generalized Linear Models (GLMs)

  7. Sigmoid (Logistic) Function g(z) = 1/(1 + e^(−z)), giving the hypothesis hθ(x) = g(θᵀx). Other functions that smoothly increase from 0 to 1 can also be found, but for a couple of good reasons (which we will see next time with Generalized Linear Models) the choice of the logistic function is a natural one.
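
A minimal sketch of the function itself (nothing here beyond the formula above):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); maps the reals smoothly onto (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                      # 0.5
print(sigmoid(np.array([-10.0, 10.0])))  # ~[0, 1]
```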

  8. Gradient Ascent for MLE of the Logistic Function Recall hθ(x) = g(θᵀx). Let's work with just one training example (x, y) and derive the Gradient Ascent rule on the log-likelihood: given ℓ(θ) = y log hθ(x) + (1 − y) log(1 − hθ(x)), the update is θⱼ := θⱼ + α(y − hθ(x))xⱼ.
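
A sketch of that single-example update (the learning rate and names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent_step(theta, x, y, alpha=0.1):
    """One gradient-ascent step on the log-likelihood for a single
    training example (x, y) with y in {0, 1}."""
    h = sigmoid(theta @ x)              # h_theta(x) = g(theta^T x)
    return theta + alpha * (y - h) * x  # theta_j += alpha * (y - h) * x_j
```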

  9. One Useful Property of the Logistic Function g′(z) = g(z)(1 − g(z)), which is what makes the gradient above come out so simply.
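
The property can be checked numerically against a central difference (the step size and grid are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)
analytic = sigmoid(z) * (1 - sigmoid(z))                  # g'(z) = g(z)(1 - g(z))
numeric = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6  # central difference
print(np.allclose(analytic, numeric, atol=1e-6))          # True
```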

  10. Identical to Least Squares Again? The update rule θⱼ := θⱼ + α(y⁽ⁱ⁾ − hθ(x⁽ⁱ⁾))xⱼ⁽ⁱ⁾ looks identical to the LMS rule for linear regression, but it is not the same algorithm: hθ(x⁽ⁱ⁾) is now the nonlinear function g(θᵀx⁽ⁱ⁾).

  11. Discriminative vs. Generative Algorithms • Discriminative Learning • Either learn p(y|x) directly, or learn a hypothesis hθ that, given x, outputs a label in {0, 1} directly; • Logistic regression is an example of a discriminative learning algorithm; • In contrast, Generative Learning • Builds the probability distribution of x conditioned on each of the classes, p(x|y=1) and p(x|y=0), respectively; • Also builds the class priors p(y=1) and p(y=0) (the weights); • Uses Bayes rule to compare p(x|y=1)p(y=1) against p(x|y=0)p(y=0), i.e., to see which class is more likely given x.
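
A minimal sketch of the generative recipe, assuming one-dimensional Gaussian class-conditionals (the densities and priors are illustrative placeholders):

```python
import numpy as np
from scipy.stats import norm

# Class-conditional densities p(x|y) and priors p(y) -- illustrative choices.
p_x_given_y = {0: norm(loc=-1.0, scale=1.0), 1: norm(loc=1.0, scale=1.0)}
p_y = {0: 0.5, 1: 0.5}

def posterior_y1(x):
    """Bayes rule: p(y=1|x) = p(x|y=1)p(y=1) / sum_k p(x|y=k)p(y=k)."""
    joint = {k: p_x_given_y[k].pdf(x) * p_y[k] for k in (0, 1)}
    return joint[1] / (joint[0] + joint[1])

print(posterior_y1(0.8))  # > 0.5, so the generative classifier predicts y = 1
```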

  12. Question for P(y|x; θ) • We learn θ in order to maximize P(y | x; θ) • When we do so: • If y ~ Gaussian, we use Least Squares Regression • If y ∈ {0, 1} ~ Bernoulli, we use Logistic Regression. Why? Any natural reasons?

  13. Any Probabilistic, Linear, and General (PLG) Learning Framework? A website-visit prediction problem calls for a PLG solution.

  14. Generalized Linear Models: The Exponential Family p(y; η) = b(y) exp(ηᵀT(y) − a(η)) • η: the natural (distribution) parameter • T(y): the sufficient statistic, often T(y) = y • a(η): the normalization (log-partition) term. A fixed choice of T, a, and b defines a set of distributions parameterized by η; as we vary η we get different distributions within this family (affecting the mean). The Bernoulli, Gaussian, and other distributions are examples of exponential family distributions. This is a way of unifying various statistical models, such as linear regression, logistic regression, and Poisson regression, into one framework.
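
A small sketch that evaluates the family density with the Bernoulli choices T(y) = y, a(η) = log(1 + e^η), b(y) = 1, and checks it against the direct pmf (the parameter value is illustrative):

```python
import numpy as np

def bernoulli_expfam(y, eta):
    """Bernoulli pmf in exponential-family form:
    p(y; eta) = b(y) * exp(eta * T(y) - a(eta)),
    with T(y) = y, a(eta) = log(1 + e^eta), b(y) = 1."""
    return np.exp(eta * y - np.log1p(np.exp(eta)))

phi = 0.3
eta = np.log(phi / (1 - phi))             # natural parameter: eta = log(phi / (1 - phi))
print(bernoulli_expfam(1, eta), phi)      # both 0.3
print(bernoulli_expfam(0, eta), 1 - phi)  # both 0.7
```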

  15. Examples of distributions in the exponential family • Gaussian • Bernoulli • Binomial • Multinomial • Chi-square • Exponential • Poisson • Beta • …

  16. Bernoulli y | x; θ ~ ExpFamily(η); here we choose a, b, and T in the specific form that makes the distribution Bernoulli. For any fixed x and θ, we hope our algorithm will output hθ(x) = E[y|x; θ] = p(y=1|x; θ) = φ = 1/(1 + e^(−η)) = 1/(1 + e^(−θᵀx)). If you recall that the logistic function has the form 1/(1 + e^(−z)), you should now understand why we choose the logistic form for a learning process when the data follow a Bernoulli distribution.
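
A tiny sketch of the resulting hypothesis (the names are illustrative): setting η = θᵀx and applying the canonical response 1/(1 + e^(−η)) is exactly the logistic hypothesis.

```python
import numpy as np

def h(theta, x):
    """Bernoulli GLM hypothesis: eta = theta^T x, then
    E[y|x; theta] = phi = 1/(1 + e^(-eta)) -- the logistic function."""
    eta = theta @ x
    return 1.0 / (1.0 + np.exp(-eta))

print(h(np.array([0.5, -1.0]), np.array([2.0, 1.0])))  # sigmoid(0) = 0.5
```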

  17. To Build a GLM • y | x; θ ~ ExponentialFamily(η), i.e., y follows an exponential-family distribution given x and θ • Given x, our goal is to output E[T(y)|x] • i.e., we want h(x) = E[T(y)|x] (note that in most cases T(y) = y) • Think about the relationship between the input x and the parameter η, which we use to define the desired distribution, according to • η = θᵀx (linear, as a design choice); η is a number or a vector.
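
A sketch of those design choices in code (the response-to-mean mappings below are the standard canonical ones; the function name and structure are illustrative):

```python
import numpy as np

def glm_predict(theta, x, family="bernoulli"):
    """GLM prediction: (1) eta = theta^T x (the linear design choice),
    (2) y|x ~ ExpFamily(eta), (3) output h(x) = E[T(y)|x]."""
    eta = theta @ x
    if family == "bernoulli":
        return 1.0 / (1.0 + np.exp(-eta))  # E[y|x] = 1/(1+e^(-eta)) -> logistic regression
    if family == "gaussian":
        return eta                         # E[y|x] = eta            -> linear regression
    if family == "poisson":
        return np.exp(eta)                 # E[y|x] = e^eta          -> Poisson regression
    raise ValueError(f"unknown family: {family}")
```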

  18. Generalized Linear Models: The Exponential Family

  19. More precisely… A GLM is a flexible generalization of ordinary least squares regression that relates the random component (the distribution of the response) to the systematic component (the linear predictor) through a function called the link function.

  20. Extensions The standard GLZ assumes that the observations are uncorrelated (i.i.d.). Models that deal with correlated data are extensions of GLZs: • Generalized estimating equations: use population-averaged effects. • Generalized linear mixed models: a type of multilevel (mixed) model, an extension of logistic regression. • Hierarchical generalized linear models: similar to generalized linear mixed models, apart from two distinctions: • The random effects can have any distribution in the exponential family, whereas generalized linear mixed models nearly always assume normal random effects; • They are computationally less complex than linear mixed models.

  21. Summary • A GLM is a flexible generalization of ordinary least squares regression. • It generalizes linear regression by allowing the linear model to be related to the output variable via a link function, and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. • GLMs are a way of unifying various other statistical models, including linear, logistic, …, and Poisson regressions, under one framework. • This allows us to develop a general algorithm for maximum likelihood estimation in all these models. • The framework extends naturally to encompass many other models as well. • In a GLM, the output is thus assumed to be generated from a particular distribution function in the exponential family.
