
Bayesian Learning & VC Dimension





  1. Bayesian Learning & VC Dimension Jahwan Kim 2000. 5. 24 AIPR Lab., CSD, KAIST

  2. Contents • Bayesian learning • General idea and an example • Parametric vs. nonparametric statistical inference • Model capacity and generalizability • Further readings

  3. Bayesian learning • Conclusions are drawn from hypotheses constructed from the given data. • Predictions are made from the hypotheses, weighted by their posterior probabilities.

  4. Bayesian learning: Formulation • X is the prediction, the H's are the hypotheses, and D is the given data; the prediction is P(X|D) = Σ_H P(X|H) P(H|D). • This requires calculating P(H|D) for all H's, which is intractable in many cases.
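A minimal sketch of this weighted prediction over a discrete hypothesis space; the coin-bias hypotheses, the uniform prior, and the flip data are illustrative assumptions, not from the slides.

```python
import numpy as np

# Hypotheses: the coin's probability of heads is one of these values.
thetas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
prior = np.full(len(thetas), 1 / len(thetas))    # uniform prior P(H)

data = [1, 1, 0, 1]                              # observed flips (1 = heads)

# Likelihood P(D|H) of the i.i.d. flips under each hypothesis.
likelihood = np.array([np.prod([t if x else 1 - t for x in data]) for t in thetas])

# Posterior P(H|D) by Bayes' rule; dividing by the sum handles P(D).
posterior = likelihood * prior
posterior /= posterior.sum()

# Prediction: P(next flip is heads | D) = sum_H P(heads|H) P(H|D).
print(np.sum(thetas * posterior))
```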

  5. Bayesian learning: Maximum a posteriori hypothesis • Take the H that maximizes the a posteriori probability P(H|D). • How do we find such an H? Use Bayes' rule: P(H|D) = P(D|H) P(H) / P(D).

  6. Bayesian learning, continued • P(D) remains fixed for all H. • P(D|H) is the likelihood that the given data is observed under H. • P(H), the prior probability, has been the source of debate. • If it is too biased, we get underfitting. • Sometimes a uniform prior is appropriate. In that case, we choose the maximum likelihood hypothesis.
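A minimal sketch contrasting the MAP and maximum-likelihood choices in the coin setup above; the biased prior favoring a fair coin is an illustrative assumption.

```python
import numpy as np

thetas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
biased_prior = np.array([0.05, 0.1, 0.7, 0.1, 0.05])  # strongly favors a fair coin

data = [1, 1, 1, 1]                                   # four heads in a row
likelihood = np.array([np.prod([t if x else 1 - t for x in data]) for t in thetas])

h_ml = thetas[np.argmax(likelihood)]                  # maximizes P(D|H)
h_map = thetas[np.argmax(likelihood * biased_prior)]  # maximizes P(D|H)P(H); P(D) is constant
print(h_ml, h_map)  # 0.9 vs 0.5: a strong prior pulls the choice toward it
```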

  7. Bayesian learning: Parameter estimation • Problem: Find p(x|D) when • We know the form of the pdf, i.e., the pdf is parametrized by θ and written p(x|θ). • The a priori pdf p(θ) is known. • Data D is given. • We only have to find p(θ|D), since then we may use p(x|D) = ∫ p(x|θ) p(θ|D) dθ.

  8. Parameter estimation, continued • By Bayes' rule, p(θ|D) ∝ p(D|θ) p(θ). • Assume also that each sample in D is drawn independently with an identical pdf, i.e., that D is i.i.d. Then p(D|θ) = ∏_k p(x_k|θ). • This gives the formal solution to the problem.
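A minimal sketch of this formal solution on a parameter grid, for a one-dimensional unknown mean with the variance treated as known; the data, the grid, and the normal prior are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=20)   # scale (sigma) treated as known

theta = np.linspace(-5.0, 5.0, 1001)             # grid over the unknown mean
prior = norm.pdf(theta, loc=0.0, scale=2.0)      # a priori pdf p(theta)

# i.i.d. assumption: log p(D|theta) = sum_k log p(x_k|theta).
log_lik = norm.logpdf(data[:, None], loc=theta[None, :], scale=1.0).sum(axis=0)

# Posterior p(theta|D) ∝ p(D|theta) p(theta), normalized on the grid.
post = np.exp(log_lik - log_lik.max()) * prior
post /= post.sum() * (theta[1] - theta[0])
print(theta[np.argmax(post)])                    # close to the sample mean
```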

  9. Parameter estimation: Example • One-dimensional normal distribution p(x|μ). • Two parameters, μ and σ. • Assume that p(μ) is normal with known mean m and variance s. • Assume also that σ is known, so only μ is estimated. • Then p(μ|D) ∝ ∏_k p(x_k|μ) p(μ).

  10. Example, continued • A term quadratic in μ appears in the exponent of the expression (or compute it directly). • Namely, p(μ|D) is also normal. • Its variance and mean are given by s_n = s σ² / (n s + σ²) and μ_n = (n s x̄_n + σ² m) / (n s + σ²), where x̄_n is the sample mean.
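A minimal sketch checking the closed form above against a brute-force grid computation; the particular data, prior mean m, and prior variance s are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 1.0                       # known variance of p(x|mu)
m, s = 0.0, 4.0                    # prior mean and variance of p(mu)
data = rng.normal(2.0, np.sqrt(sigma2), size=30)
n, xbar = len(data), data.mean()

# Closed-form posterior mean and variance.
mu_n = (n * s * xbar + sigma2 * m) / (n * s + sigma2)
s_n = s * sigma2 / (n * s + sigma2)
print(mu_n, s_n)

# Grid check: the mode of p(mu|D) ∝ p(D|mu) p(mu) should sit at mu_n.
mu = np.linspace(-5.0, 5.0, 20001)
log_post = (-((data[:, None] - mu) ** 2).sum(axis=0) / (2 * sigma2)
            - (mu - m) ** 2 / (2 * s))
print(mu[np.argmax(log_post)])
```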

  11. Estimation of mean • As n goes to infinity, p(μ|D) approaches the Dirac delta function centered at the sample mean.
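A quick numerical illustration of this concentration, using the posterior variance formula above with the same illustrative s and σ²: s_n shrinks roughly like σ²/n.

```python
s, sigma2 = 4.0, 1.0                          # prior variance, known data variance
for n in (1, 10, 100, 10_000):
    print(n, s * sigma2 / (n * s + sigma2))   # posterior variance -> 0
```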

  12. Two main approaches to (statistical) inference • Parametric inference • The investigator should know the problem well. • The model contains a finite number of unknown parameters. • Nonparametric inference • Used when there is no reliable a priori information about the problem. • The number of samples required is much larger.
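A minimal sketch of the contrast: a parametric density estimate (fit a normal, two parameters) next to a nonparametric one (a histogram); the bimodal data is an illustrative assumption chosen to break the parametric form.

```python
import numpy as np

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

# Parametric: assume p(x) = N(mu, sigma^2) and estimate its two parameters.
mu, sigma = data.mean(), data.std()

# Nonparametric: a histogram density estimate with no assumed form.
density, edges = np.histogram(data, bins=30, density=True)

# The normal fit centers on a region with little data, while the
# histogram's peak sits on an actual mode: the assumed form was wrong.
print(round(mu, 2), round(sigma, 2), round(edges[np.argmax(density)], 2))
```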

  13. Capacity of models • Well-known fact: • If a model is too complicated, it doesn't generalize well; • if too simple, it doesn't represent the data well. • How do we measure model capacity? • In classical statistics, by the number of parameters, or degrees of freedom. • In the (new) statistical learning theory, by the VC dimension.
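A minimal sketch of this trade-off: polynomials of increasing degree fitted to noisy samples of a sine curve (an illustrative target); the low degree underfits, the high degree fits the noise.

```python
import numpy as np

rng = np.random.default_rng(3)
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)   # capacity = degree + 1 parameters
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(test_err, 4))               # the middle degree generalizes best
```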

  14. VC dimension • The Vapnik-Chervonenkis dimension is a measure of the capacity of a model. • It is the largest number of points that functions from the model can shatter, i.e., classify correctly under every possible assignment of labels.

  15. VC dimension: Examples • It is not always equal to the number of parameters: • The family of lines {sgn(ax+by+c)} in the 2D plane has VC dimension 3, but • the one-parameter family {sgn(sin ax)} (in one dimension) has infinite VC dimension!
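A minimal sketch verifying both claims numerically: every labeling of three non-collinear points is realized by a line (solving a 3×3 linear system gives exact coefficients), and sgn(sin ax) realizes every labeling of the points 10⁻¹, …, 10⁻ⁿ using Vapnik's classic choice of the frequency a.

```python
import itertools
import numpy as np

# (1) Lines sgn(ax + by + c) shatter three non-collinear points.
pts = np.array([[0.0, 0.0, 1.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])                    # rows are [x, y, 1]
for labels in itertools.product([-1.0, 1.0], repeat=3):
    w = np.linalg.solve(pts, np.array(labels))       # exact (a, b, c) for this labeling
    assert np.all(np.sign(pts @ w) == np.array(labels))

# (2) sgn(sin(ax)) shatters the n points 10^-1, ..., 10^-n for any n.
n = 4
x = 10.0 ** -np.arange(1, n + 1)
for labels in itertools.product([-1, 1], repeat=n):
    z = (1 - np.array(labels)) // 2                  # encode labels as 0/1
    a = np.pi * (1 + np.sum(z * 10.0 ** np.arange(1, n + 1)))
    assert np.all(np.sign(np.sin(a * x)) == np.array(labels))

print("all labelings realized")
```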

  16. Theorem from STL on VC dimension and generalizability • Vapnik's bound: with probability at least 1−η, R(α) ≤ R_emp(α) + sqrt([h(ln(2n/h) + 1) − ln(η/4)] / n) for all functions α in the model, where R is the expected risk, R_emp the empirical risk, h the VC dimension, and n the number of samples. • In particular, for fixed h the gap between the true and empirical risk vanishes as n grows.
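A minimal sketch evaluating the confidence term of the bound above as a function of n, for an assumed VC dimension h = 10 and η = 0.05.

```python
import numpy as np

def vc_confidence(n, h, eta=0.05):
    """Confidence term sqrt((h*(ln(2n/h) + 1) - ln(eta/4)) / n)."""
    return np.sqrt((h * (np.log(2 * n / h) + 1) - np.log(eta / 4)) / n)

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(vc_confidence(n, h=10), 3))   # the risk gap shrinks as n grows
```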

  17. Further readings • Vapnik, Statistical Learning Theory, Ch. 0 and Sections 1.1-1.3 • Haykin, Neural Networks, Sections 2.13-2.14 • Duda & Hart, Pattern Classification and Scene Analysis, Sections 3.3-3.5
