Bayesian Learning & VC Dimension
Jahwan Kim, 2000. 5. 24, AIPR Lab., Dept. of CS, KAIST
Contents
• Bayesian learning: general idea and an example
• Parametric vs. nonparametric statistical inference
• Model capacity and generalizability
• Further readings
Bayesian learning
• Conclusions are drawn from hypotheses, each weighted by how well it is supported by the given data.
• Predictions are made from all hypotheses, weighted by their posterior probabilities.
Bayesian learning: Formulation
• P(X|D) = Σ_i P(X|H_i) P(H_i|D), where X is the prediction, the H_i are the hypotheses, and D is the given data.
• This requires computing P(H|D) for all H, which is intractable in many cases.
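A minimal sketch of this full Bayesian prediction over a finite hypothesis space (the hypotheses, prior, and data below are made up for illustration, not from the slides):

```python
# Each hypothesis H_i is a coin bias; predict the next outcome by
# averaging P(X|H_i) weighted by the posterior P(H_i|D).

biases = [0.1, 0.3, 0.5, 0.7, 0.9]   # hypotheses H_i: P(heads)
prior = [0.2] * 5                    # uniform prior P(H_i)
data = [1, 1, 0, 1]                  # observed flips (1 = heads)

def likelihood(theta, d):
    """P(D|H): product over i.i.d. flips."""
    p = 1.0
    for x in d:
        p *= theta if x == 1 else (1.0 - theta)
    return p

# Posterior P(H_i|D) by Bayes' rule (normalize over all hypotheses)
joint = [likelihood(t, data) * p for t, p in zip(biases, prior)]
posterior = [j / sum(joint) for j in joint]

# Prediction P(next = heads | D) = sum_i P(heads|H_i) P(H_i|D)
p_heads = sum(t * p for t, p in zip(biases, posterior))
print(p_heads)
```

Note that even this tiny example must evaluate the posterior of every hypothesis; with a large or continuous hypothesis space, this is exactly the intractability the slide mentions.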
Bayesian learning: Maximum a posteriori hypothesis
• Take the H that maximizes the posterior probability P(H|D).
• How do we find such an H? Use Bayes' rule: P(H|D) = P(D|H) P(H) / P(D).
Bayesian learning, continued
• P(D) is the same for all H, so it can be ignored when maximizing.
• P(D|H) is the likelihood of observing the given data under H.
• P(H), the prior probability, has been a source of debate.
• If the prior is too biased, we get underfitting.
• Sometimes a uniform prior is appropriate; in that case, maximizing the posterior reduces to choosing the maximum likelihood (ML) hypothesis.
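A sketch contrasting MAP and ML selection on the same illustrative hypothesis space as above (the biased prior is made up to show the two choices can differ):

```python
biases = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [0.05, 0.15, 0.6, 0.15, 0.05]   # prior biased toward a fair coin
data = [1, 1, 0, 1]

def likelihood(theta, d):
    p = 1.0
    for x in d:
        p *= theta if x == 1 else (1.0 - theta)
    return p

liks = [likelihood(t, data) for t in biases]

# ML maximizes P(D|H); MAP maximizes P(D|H) * P(H).
# P(D) is constant in H, so it drops out of the argmax.
h_ml = max(zip(biases, liks), key=lambda tp: tp[1])[0]
h_map = max(zip(biases, [l * p for l, p in zip(liks, prior)]),
            key=lambda tp: tp[1])[0]
print(h_ml, h_map)   # 0.7 vs. 0.5: the prior pulls MAP toward the fair coin
```

With a uniform prior the two selections coincide, which is the slide's last point.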
Bayesian learning: Parameter estimation
• Problem: find p(x|D) when
• we know the form of the pdf, i.e., it is parametrized by θ and written p(x|θ);
• the prior pdf p(θ) is known;
• the data D are given.
• We only need to find p(θ|D), since then we may use p(x|D) = ∫ p(x|θ) p(θ|D) dθ.
Parameter estimation, continued
• By Bayes' rule, p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ.
• Assume also that each sample in D is drawn independently from the same pdf, i.e., the samples are i.i.d. Then p(D|θ) = Π_{k=1}^{n} p(x_k|θ).
• Together these give the formal solution to the problem.
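A minimal numerical sketch of this formal solution (illustrative, not from the slides): approximate p(θ|D) on a uniform grid for a Bernoulli model, then sum out θ to get the predictive probability. NumPy is assumed to be available.

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter
prior = np.ones_like(theta)              # uniform prior p(theta)
data = np.array([1, 1, 0, 1, 1])         # i.i.d. Bernoulli samples

# i.i.d. likelihood: p(D|theta) = prod_k p(x_k|theta)
heads = data.sum()
lik = theta**heads * (1 - theta)**(len(data) - heads)

# Bayes' rule; on a uniform grid the normalizer is just the sum
post = lik * prior
post /= post.sum()

# Predictive probability: p(x=1|D) = integral of theta * p(theta|D)
p_next = (theta * post).sum()
print(p_next)   # ~ (4+1)/(5+2) = 0.714, Laplace's rule of succession
```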
Parameter estimation: Example
• One-dimensional normal distribution.
• Two parameters, μ and σ².
• Assume that p(μ) is normal with known mean m and variance s².
• Assume also that σ² is known, so only μ is estimated.
• Then p(μ|D) ∝ Π_{k=1}^{n} p(x_k|μ) · p(μ).
Example, continued
• A term quadratic in μ appears in the exponent of this expression (complete the square, or compute it directly).
• Namely, p(μ|D) is also normal.
• Its mean and variance are given by
μ_n = (n s² x̄_n + σ² m) / (n s² + σ²),  σ_n² = s² σ² / (n s² + σ²),
where x̄_n = (1/n) Σ_k x_k is the sample mean.
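A numerical check of these closed-form formulas (the prior parameters m, s² follow the slide's notation; the true mean and simulated data are made up; NumPy is assumed):

```python
import numpy as np

def posterior_of_mean(data, m, s2, sigma2):
    """Mean and variance of p(mu|D) for a N(m, s2) prior on mu
    and known data variance sigma2."""
    n = len(data)
    xbar = data.mean()
    mu_n = (n * s2 * xbar + sigma2 * m) / (n * s2 + sigma2)
    s2_n = (s2 * sigma2) / (n * s2 + sigma2)
    return mu_n, s2_n

rng = np.random.default_rng(0)
true_mu, sigma2 = 2.0, 1.0
for n in (1, 10, 100, 10000):
    data = rng.normal(true_mu, np.sqrt(sigma2), size=n)
    mu_n, s2_n = posterior_of_mean(data, m=0.0, s2=4.0, sigma2=sigma2)
    print(n, round(mu_n, 3), round(s2_n, 6))
# The posterior mean moves from the prior mean m toward the sample
# mean, and the posterior variance shrinks like 1/n.
```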
Estimation of the mean
• As n goes to infinity, σ_n² → 0 and p(μ|D) approaches a Dirac delta function centered at the sample mean.
Two main approaches to (statistical) inference
• Parametric inference
• The investigator must know the problem well.
• The model contains a finite number of unknown parameters.
• Nonparametric inference
• Used when there is no reliable a priori information about the problem.
• The number of samples required can be very large.
Capacity of models
• A well-known fact:
• If a model is too complicated, it does not generalize well;
• if it is too simple, it does not represent the data well.
• How do we measure model capacity?
• In classical statistics, by the number of parameters (degrees of freedom).
• In the (new) statistical learning theory, by the VC dimension.
VC dimension
• The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity of a model.
• For a class of binary classifiers, it is the largest number of points the class can shatter, i.e., realize every possible labeling of them.
VC dimension: Examples
• It is not always equal to the number of parameters:
• the three-parameter family of half-planes {sgn(ax+by+c)} in the 2D plane has VC dimension 3, but
• the one-parameter family {sgn(sin ax)} on the real line has infinite VC dimension!
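A brute-force sketch of the second example, using Vapnik's classic construction (as reproduced in Burges' SVM tutorial): for points x_i = 10^(-i), a single frequency a can be chosen to realize any labeling.

```python
import itertools
import numpy as np

n = 4
xs = np.array([10.0 ** -(i + 1) for i in range(n)])   # 0.1, 0.01, ...

for labels in itertools.product([1, -1], repeat=n):
    y = np.array(labels)
    # Choose the single parameter a so that sgn(sin(a*x_i)) = y_i:
    # each point labeled -1 contributes 10^(i+1) to the frequency.
    a = np.pi * (1 + sum((1 - y[i]) / 2 * 10.0 ** (i + 1)
                         for i in range(n)))
    pred = np.sign(np.sin(a * xs))
    assert np.array_equal(pred, y), (labels, pred)

print(f"all {2**n} labelings of {n} points realized by sgn(sin(ax))")
```

The same construction works for any n, so this one-parameter family shatters arbitrarily large point sets: its VC dimension is infinite.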
Theorem from STL on VC dimension and generalizability
• The standard bound from Vapnik's statistical learning theory: with probability at least 1−η, every function in a class of VC dimension h satisfies
R(α) ≤ R_emp(α) + √( (h(ln(2n/h) + 1) − ln(η/4)) / n ),
where R is the expected risk, R_emp the empirical risk, and n the number of samples.
• The confidence term grows with h and shrinks with n: capacity must be controlled relative to the amount of data.
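A small sketch evaluating the confidence term of this bound for a fixed VC dimension and growing sample size (the values of h, n, and η are illustrative):

```python
import math

def vc_confidence(h, n, eta=0.05):
    """sqrt((h*(ln(2n/h) + 1) - ln(eta/4)) / n)"""
    return math.sqrt((h * (math.log(2 * n / h) + 1)
                      - math.log(eta / 4)) / n)

for n in (100, 1000, 10000, 100000):
    print(n, round(vc_confidence(h=10, n=n), 3))
# The term shrinks roughly like sqrt(h * ln(n) / n): more data or
# lower capacity tightens the guarantee on the expected risk.
```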
Further readings
• Vapnik, Statistical Learning Theory, Ch. 0 and Sections 1.1-1.3
• Haykin, Neural Networks, Sections 2.13-2.14
• Duda & Hart, Pattern Classification and Scene Analysis, Sections 3.3-3.5