
Bayesian regularization of learning




  1. Bayesian regularization of learning Sergey Shumsky NeurOK Software LLC

  2. Scientific methods • Induction (F. Bacon): Data → Models, i.e. machine learning • Deduction (R. Descartes): Models → Data, i.e. mathematical modeling

  3. Outline • Learning as ill-posed problem • General problem: data generalization • General remedy: model regularization • Bayesian regularization. Theory • Hypothesis comparison • Model comparison • Free Energy & EM algorithm • Bayesian regularization. Practice • Hypothesis testing • Function approximation • Data clustering

  4. Outline • Learning as ill-posed problem • General problem: data generalization • General remedy: model regularization • Bayesian regularization. Theory • Hypothesis comparison • Model comparison • Free Energy & EM algorithm • Bayesian regularization. Practice • Hypothesis testing • Function approximation • Data clustering

  5. Problem statement • Learning is an inverse, ill-posed problem • Model ← Data • Learning paradoxes • Infinite predictions from finite data? • How to optimize future predictions? • How to separate the regular from the accidental in data? • Regularization of learning • Optimal model complexity

  6. Well-posed problem • Solution is unique • Solution is stable • Hadamard (1900s) • Tikhonov (1960s)

  7. Learning from examples • Problem: • Find a hypothesis h generating the observed data D in model H • Well defined if not sensitive to: • noise in data (Hadamard) • learning procedure (Tikhonov)

  8. Learning is an ill-posed problem • Example: function approximation • Sensitive to noise in data • Sensitive to the learning procedure (see the sketch below)
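
To make the instability concrete, here is a minimal Python sketch (illustrative, not from the slides): a degree-9 polynomial fit to ten points changes drastically when the targets are perturbed by 1% noise.

```python
# Sketch: why function approximation is ill-posed. A degree-9 polynomial
# fit to 10 points is an exact interpolant, so tiny noise in the targets
# is amplified between the data points.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x)

# Fit twice: clean targets vs. targets with 1% noise.
c_clean = np.polyfit(x, y, deg=9)
c_noisy = np.polyfit(x, y + 0.01 * rng.standard_normal(10), deg=9)

grid = np.linspace(0, 1, 200)
gap = np.max(np.abs(np.polyval(c_clean, grid) - np.polyval(c_noisy, grid)))
print(f"max deviation between the two fits: {gap:.2f}")  # >> 0.01 noise level
```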

  9. Learning is an ill-posed problem • Solution is non-unique

  10. Outline • Learning as ill-posed problem • General problem: data generalization • General remedy: model regularization • Bayesian regularization. Theory • Hypothesis comparison • Model comparison • Free Energy & EM algorithm • Bayesian regularization. Practice • Hypothesis testing • Function approximation • Data clustering

  11. Problem regularization • Main idea: restrict solutions, sacrificing precision for stability • How to choose the restriction?

  12. Statistical Learning practice • Data = Learning set + Validation set • Cross-validation (see the sketch below) • Systematic approach to ensembles: Bayes
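
A minimal sketch of the "learning set + validation set" recipe, assuming ridge regression as the learner and a hand-rolled K-fold split; all names and data below are illustrative, not from the slides.

```python
# K-fold cross-validation for choosing a ridge penalty: fit on the
# learning folds, score on the held-out validation fold, average.
import numpy as np

def kfold_cv_error(X, y, lam, k=5):
    """Mean validation error of ridge regression over k folds."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Xt, yt = X[train], y[train]
        w = np.linalg.solve(Xt.T @ Xt + lam * np.eye(X.shape[1]), Xt.T @ yt)
        errs.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 8))
y = X @ rng.standard_normal(8) + 0.3 * rng.standard_normal(40)
best = min([0.01, 0.1, 1.0, 10.0], key=lambda lam: kfold_cv_error(X, y, lam))
print("lambda chosen by cross-validation:", best)
```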

  13. Outline • Learning as ill-posed problem • General problem: data generalization • General remedy: model regularization • Bayesian regularization. Theory • Hypothesis comparison • Model comparison • Free Energy & EM algorithm • Bayesian regularization. Practice • Hypothesis testing • Function approximation • Data clustering

  14. Statistical Learning theory • Learning as inverse probability • Probability theory, H: h → D (Bernoulli, 1713) • Learning theory, H: h ← D (Bayes, ~1750)

  15. Bayesian learning • Bayes' rule: Posterior = Likelihood × Prior / Evidence, i.e. P(h|D,H) = P(D|h,H) P(h|H) / P(D|H)
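
A quick numeric instance of the rule above, with a flat prior over a coin's head-probability; the grid and the counts are made up for illustration.

```python
# Bayes' rule on a grid: posterior over a coin's head-probability
# after observing some tosses.
import numpy as np

h = np.linspace(0.01, 0.99, 99)         # hypotheses: P(heads) = h
prior = np.ones_like(h) / len(h)        # flat prior P(h|H)
heads, tails = 7, 3                     # observed data D
likelihood = h**heads * (1 - h)**tails  # P(D|h,H)

evidence = np.sum(likelihood * prior)   # P(D|H), the normalizer
posterior = likelihood * prior / evidence

print("posterior mean:", np.sum(h * posterior))  # ~0.67, shrunk from 0.7
```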

  16. Coin tossing game

  17. Monte Carlo simulations
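
The transcript preserves only the titles of these two slides, so the following Monte Carlo sketch is an assumed setup in their spirit: many short coin-tossing games, comparing the raw frequency estimate with a regularized Bayesian one.

```python
# Monte Carlo check that a Laplace-smoothed estimate of a coin's bias
# predicts better than the raw frequency on small samples.
import numpy as np

rng = np.random.default_rng(2)
p_true, n_tosses, n_games = 0.7, 10, 100_000

heads = rng.binomial(n_tosses, p_true, size=n_games)
ml = heads / n_tosses                     # maximum-likelihood estimate
bayes = (heads + 1) / (n_tosses + 2)      # posterior mean under a flat prior

print("ML    mean squared error:", np.mean((ml - p_true) ** 2))
print("Bayes mean squared error:", np.mean((bayes - p_true) ** 2))  # smaller
```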

  18. Bayesian regularization • Most probable hypothesis: -log P(h|D,H) = [-log P(D|h,H)] + [-log P(h|H)] + const = learning error + regularization • Example: function approximation (sketch below)
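
One standard instantiation of this decomposition, assuming Gaussian noise and a Gaussian prior on the weights, under which the MAP problem is ridge regression; the data and variances below are made up.

```python
# MAP = learning error + regularization, instantiated as ridge regression.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(30)

sigma2, sigma2_prior = 0.01, 1.0   # noise and prior variances
lam = sigma2 / sigma2_prior        # effective regularization weight

# argmin ||y - Xw||^2 / (2 sigma2) + ||w||^2 / (2 sigma2_prior)
w_map = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

err = np.sum((y - X @ w_map) ** 2) / (2 * sigma2)   # learning error term
reg = np.sum(w_map ** 2) / (2 * sigma2_prior)       # regularization term
print(f"learning error {err:.1f} + regularization {reg:.1f}")
print("MAP weights:", np.round(w_map, 2))
```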

  19. Minimal Description Length, Rissanen (1978) • Most probable hypothesis = shortest code length for data and hypothesis • Example: optimal prefix code with codewords 0, 10, 110, 111
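
A sketch of building such an optimal prefix code with Huffman's algorithm; for the illustrative probabilities below it reproduces exactly the codewords 0, 10, 110, 111 from the slide, with length close to -log2(p) for each symbol.

```python
# Optimal binary prefix code via Huffman's algorithm: more probable
# symbols get shorter codewords.
import heapq

def huffman(probs):
    """Return {symbol: codeword} for an optimal binary prefix code."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    n = len(heap)                       # unique tie-breaker for merged nodes
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        n += 1
        heapq.heappush(heap, (p0 + p1, n, merged))
    return heap[0][2]

code = huffman({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(code)  # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```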

  20. Data complexity • Complexity K(D|H) = min_h L(h, D|H), Kolmogorov (1965) • Code length L(h,D) = coded data L(D|h) + decoding program L(h)

  21. Complex = unpredictable, Solomonoff (1978) • Prediction error ~ L(h,D)/L(D) • Random data is incompressible • Compression = predictability • Example: block coding (program h of length L(h,D) decodes the data D)
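
A quick check of "random data is incompressible" using a general-purpose compressor; zlib stands in here for the ideal coder.

```python
# Structured bytes compress well; uniformly random bytes do not.
import os
import zlib

structured = b"0123456789" * 1000   # highly regular, 10,000 bytes
random_ = os.urandom(10_000)        # no structure to exploit

print("structured:", len(zlib.compress(structured)), "bytes")  # tiny
print("random:    ", len(zlib.compress(random_)), "bytes")     # ~10,000+
```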

  22. Universal prior, Solomonoff (1960), Bayes (~1750) • All 2^L programs of length L are equiprobable, giving the prior P(h) = 2^(-L(h)) • Evidence P(D|H) = Σ_h 2^(-L(h,D)), so the data complexity -log2 P(D|H) ≈ K(D|H)

  23. Statistical ensemble • Shorter description length: -log Σ_h P(h) P(D|h) ≤ min_h [-log P(h) P(D|h)] • Proof: a sum of nonnegative terms is at least its largest term • Corollary: ensemble predictions are superior to the most probable prediction (numeric check below)
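
A numeric check of the corollary with toy numbers standing in for the products P(h) P(D|h):

```python
# The ensemble's description length -log sum_h P(h)P(D|h) never exceeds
# that of the single best hypothesis, since the sum is >= its max term.
import numpy as np

rng = np.random.default_rng(4)
joint = rng.random(20)                  # stand-ins for P(h) * P(D|h)
ensemble_len = -np.log(np.sum(joint))
best_len = -np.log(np.max(joint))
print(ensemble_len <= best_len)         # always True
```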

  24. Ensemble prediction

  25. Outline • Learning as ill-posed problem • General problem: data generalization • General remedy: model regularization • Bayesian regularization. Theory • Hypothesis comparison • Model comparison • Free Energy & EM algorithm • Bayesian regularization. Practice • Hypothesis testing • Function approximation • Data clustering

  26. Model comparison • Posterior P(H|D) ∝ P(D|H) P(H) • Evidence P(D|H) = Σ_h P(D|h,H) P(h|H)

  27. Statistics: Bayes vs. Fisher • Fisher: max Likelihood • Bayes: max Evidence
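
A sketch contrasting the two criteria on polynomial model choice, assuming Bayesian linear regression with a Gaussian prior (all data and variances are illustrative): the maximum-likelihood fit keeps improving with degree, while the evidence typically peaks near the true degree.

```python
# Fisher vs. Bayes on model choice for polynomials of increasing degree.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(-1, 1, 30)
y = 1.0 - 2.0 * x**2 + 0.1 * rng.standard_normal(30)   # true degree is 2
s2, sw2 = 0.01, 1.0                                    # noise / prior variance

for deg in range(5):
    Phi = np.vander(x, deg + 1)                        # polynomial features
    # Marginalizing the weights gives y ~ N(0, s2*I + sw2*Phi@Phi.T).
    C = s2 * np.eye(30) + sw2 * Phi @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    log_ev = -0.5 * (30 * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))
    w_ml = np.linalg.lstsq(Phi, y, rcond=None)[0]      # Fisher's answer
    rss = np.sum((y - Phi @ w_ml) ** 2)                # only ever decreases
    print(f"degree {deg}: log-evidence {log_ev:9.2f}, ML residual {rss:.3f}")
```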

  28. Historical outlook • 1920s-60s • Parametric statistics (Fisher, 1912) • Asymptotic regime, N → ∞ • 1960s-80s • Non-parametric statistics (Chentsov, 1962) • Regularization of ill-posed problems (Tikhonov, 1963) • Non-asymptotic learning (Vapnik, 1968) • Algorithmic complexity (Kolmogorov, 1965) • Statistical physics of disordered systems (Gardner, 1988)

  29. Outline • Learning as ill-posed problem • General problem: data generalization • General remedy: model regularization • Bayesian regularization. Theory • Hypothesis comparison • Model comparison • Free Energy & EM algorithm • Bayesian regularization. Practice • Hypothesis testing • Function approximation • Data clustering

  30. Statistical physics • Probability of hypothesis - microstate • Optimal model - macrostate

  31. Free energy • F = -log Z: log of a sum, with Z = Σ_h exp(-L(h,D)) • F = E - TS: sum of logs, with energy E = Σ_h P(h) L(h,D) and entropy S = -Σ_h P(h) log P(h) • The two agree for the Gibbs distribution P(h) ∝ exp(-L(h,D)), i.e. P = P{L}
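
A numeric check of the identity at T = 1, with toy description lengths: the "log of a sum" and the "sum of logs" coincide when P is the Gibbs distribution.

```python
# -log Z equals E - S at the Gibbs distribution P(h) ~ exp(-L(h)).
import numpy as np

rng = np.random.default_rng(6)
L = rng.random(8) * 5.0            # toy description lengths L(h, D)

Z = np.sum(np.exp(-L))
P = np.exp(-L) / Z                 # Gibbs distribution P{L}
E = np.sum(P * L)                  # energy:  <L>_P
S = -np.sum(P * np.log(P))         # entropy: -<log P>_P

print(-np.log(Z), E - S)           # the two numbers coincide
```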

  32. EM algorithm. Main idea • Introduce an independent distribution P • Iterate: • E-step: optimize P with the model fixed • M-step: optimize the model with P fixed (reconstruction below)
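
The slide's formulas did not survive transcription; a standard reconstruction of the variational argument reads:

```latex
% Free energy as a functional of an independent distribution P,
% bounding -log Z from above:
F(P, H) = \sum_h P(h)\,L(h, D \mid H) + \sum_h P(h)\ln P(h)
        \;\ge\; -\ln \sum_h e^{-L(h, D \mid H)}
% E-step: minimize F over P with the model H fixed:
P(h) \leftarrow \frac{e^{-L(h, D \mid H)}}{\sum_{h'} e^{-L(h', D \mid H)}}
% M-step: minimize F over the model with P fixed:
H \leftarrow \arg\min_H \sum_h P(h)\,L(h, D \mid H)
```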

  33. EM algorithm • E-step • Estimate the posterior for the given model • M-step • Update the model for the given posterior

  34. Outline • Learning as ill-posed problem • General problem: data generalization • General remedy: model regularization • Bayesian regularization. Theory • Hypothesis comparison • Model comparison • Free Energy & EM algorithm • Bayesian regularization. Practice • Hypothesis testing • Function approximation • Data clustering

  35. Bayesian regularization: examples • Hypothesis testing • Function approximation • Data clustering [slide figures: y vs. h; y = h(x); density P(x|H)]

  36. Outline • Learning as ill-posed problem • General problem: data generalization • General remedy: model regularization • Bayesian regularization. Theory • Hypothesis comparison • Model comparison • Free Energy & EM algorithm • Bayesian regularization. Practice • Hypothesis testing • Function approximation • Data clustering

  37. Hypothesis testing • Problem • Noisy observations y • Is the theoretical value h0 true? • Model H: Gaussian noise, Gaussian prior (sketch below)
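
A sketch of the conjugate computation for this model, with illustrative numbers: the Gaussian posterior shrinks the sample mean toward the theoretical value h0.

```python
# Conjugate Gaussian update: noisy observations y of an unknown h,
# Gaussian prior centred on the theoretical value h0.
import numpy as np

rng = np.random.default_rng(7)
h0, s2_noise, s2_prior = 1.0, 0.25, 0.1
h_true = 1.3
y = h_true + np.sqrt(s2_noise) * rng.standard_normal(20)

# Precisions add; the posterior mean is a precision-weighted average.
prec = 1.0 / s2_prior + len(y) / s2_noise
post_mean = (h0 / s2_prior + np.sum(y) / s2_noise) / prec
print(f"posterior: mean {post_mean:.3f}, std {prec ** -0.5:.3f}")
# The posterior mean lies between the sample mean and h0.
```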

  38. Optimal model: phase transition • The optimal confidence (prior precision) is either finite or infinite: two phases

  39. Threshold effect • The Student coefficient determines the regime: • hypothesis h0 is true • corrections to h0

  40. Outline • Learning as ill-posed problem • General problem: data generalization • General remedy: model regularization • Bayesian regularization. Theory • Hypothesis comparison • Model comparison • Free Energy & EM algorithm • Bayesian regularization. Practice • Hypothesis testing • Function approximation • Data clustering

  41. Function approximation • Problem • Noisy data y(x) • Find an approximation h(x) • Model: noise + prior

  42. Optimal model • Free energy minimization

  43. Saddle point approximation • Function of best hypothesis

  44. EM learning • E-step: optimal hypothesis • M-step: optimal regularization (one instantiation sketched below)
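
One common instantiation of this loop, assumed here rather than taken from the slides: Bayesian linear regression in which the E-step computes the weight posterior and the M-step re-estimates the noise and prior variances, i.e. the regularization strength.

```python
# Alternating E/M updates for the hyperparameters of a linear model.
import numpy as np

rng = np.random.default_rng(8)
n, d = 50, 6
X = rng.standard_normal((n, d))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5, 0.0]) + 0.2 * rng.standard_normal(n)

s2, sw2 = 1.0, 1.0                      # initial noise / prior variance
for _ in range(50):
    # E-step: Gaussian posterior over weights for the current variances.
    Sigma = np.linalg.inv(X.T @ X / s2 + np.eye(d) / sw2)
    m = Sigma @ X.T @ y / s2
    # M-step: variances minimizing free energy given that posterior.
    sw2 = (m @ m + np.trace(Sigma)) / d
    s2 = (np.sum((y - X @ m) ** 2) + np.trace(X @ Sigma @ X.T)) / n

print("effective regularization s2/sw2:", s2 / sw2)
print("posterior mean weights:", np.round(m, 2))
```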

  45. Laplace prior • Pruned weights • Equisensitive weights
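
Why the Laplace prior prunes, in one line of algebra turned into code: under an orthonormal design the MAP update is a soft threshold, so small weights land exactly at zero, while a Gaussian prior only shrinks them (the numbers below are made up).

```python
# Laplace vs. Gaussian MAP updates on least-squares weights.
import numpy as np

w_ls = np.array([2.0, 0.08, -1.5, -0.05, 0.6])  # least-squares weights
lam = 0.1

w_laplace = np.sign(w_ls) * np.maximum(np.abs(w_ls) - lam, 0.0)  # soft threshold
w_gauss = w_ls / (1.0 + lam)                                     # pure shrinkage

print("Laplace MAP:", w_laplace)   # exact zeros appear: pruned weights
print("Gauss   MAP:", w_gauss)     # everything merely shrinks
```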

  46. Outline • Learning as ill-posed problem • General problem: data generalization • General remedy: model regularization • Bayesian regularization. Theory • Hypothesis comparison • Model comparison • Free Energy & EM algorithm • Bayesian regularization. Practice • Hypothesis testing • Function approximation • Data clustering

  47. Clustering • Problem • Noisy data x • Find prototypes (mixture density approximation) • How many clusters? • Model: noise

  48. Optimal model • Free energy minimization • Iterations • E-step • M-step

  49. EM algorithm • E-step • M-step (worked example below)
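
A minimal EM loop for a one-dimensional Gaussian mixture matching this E/M split; the data, initialization, and number of clusters are illustrative.

```python
# EM for a 1-D Gaussian mixture: E-step assigns responsibilities,
# M-step re-estimates weights, prototypes, and variances.
import numpy as np

rng = np.random.default_rng(9)
x = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(1.5, 0.8, 300)])

K = 2
pi = np.full(K, 1.0 / K)           # mixing weights
mu = rng.choice(x, K)              # prototype positions
var = np.full(K, np.var(x))

for _ in range(100):
    # E-step: posterior responsibility of each cluster for each point.
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate the mixture parameters.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print("means:", np.round(mu, 2), "weights:", np.round(pi, 2))
```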

  50. How many clusters? • Number of clusters M(·) • Optimal number of clusters
