Minimal Neural Networks

Minimal Neural Networks. Support vector machines and Bayesian learning for neural networks. Peter Andras, andrasp@ieee.org.


Presentation Transcript


  1. Minimal Neural Networks: support vector machines and Bayesian learning for neural networks. Peter Andras, andrasp@ieee.org

  2. Bayesian neural networks I. The Bayes rule: Consider a model of a system and an observation of the system, an event. The a posteriori probability that the model is correct, after the event is observed, is proportional to the product of the a priori probability that the model is correct and the probability of the event conditioned on the correctness of the model. Mathematically: $P(\theta \mid D, H) \propto P(D \mid \theta, H)\, P(\theta \mid H)$, where $\theta$ is the parameter of the model $H$ and $D$ is the observed event.

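  3. Bayesian neural networks II. Best model: the model with the highest a posteriori probability of correctness. Model selection by optimizing the formula over the model parameter $\theta$: $\theta^{*} = \arg\max_{\theta} P(D \mid \theta, H)\, P(\theta \mid H)$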
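A minimal sketch of the Bayes rule and of selecting the model with the highest a posteriori probability, assuming a finite set of three candidate parameter values. The prior and likelihood numbers are made up for illustration, and `posterior` is a hypothetical helper name; none of them come from the slides.

```python
def posterior(priors, likelihoods):
    """Return normalized posteriors P(theta | D) from P(theta) and P(D | theta)."""
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(unnormalized)  # P(D), the normalizing constant
    return [u / evidence for u in unnormalized]

# Assumed illustrative values for three candidate models.
priors = [0.5, 0.3, 0.2]        # a priori probability of each model
likelihoods = [0.1, 0.4, 0.7]   # probability of the observed event under each model

post = posterior(priors, likelihoods)

# Best model: the index with the highest a posteriori probability.
best = max(range(len(post)), key=post.__getitem__)
```

In this toy setting the model with the lowest prior can still win, because the observed event is much more probable under it.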

  4. Bayesian neural networks III. Application to neural networks: $g_\theta$ is the function represented by the neural network, where $\theta$ is the vector of all parameters of the network, and $D = \{(x_i, y_i)\}$ is the observed event. We suppose a normal distribution for the data conditioned on the validity of a model, i.e., the observed values $y_i$ are normally distributed around $g_\theta(x_i)$ if $\theta$ is the correct parameter vector.

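  5. Bayesian neural networks IV. By making the calculations we get: $P(\theta \mid D, H) \propto \exp\left(-\frac{1}{2\sigma^2} \sum_i (y_i - g_\theta(x_i))^2\right) P(\theta \mid H)$, where $\sigma^2$ is the variance of the normal distribution of the data, and the new formula for optimization is: $\min_{\theta} \left[\frac{1}{2\sigma^2} \sum_i (y_i - g_\theta(x_i))^2 - \log P(\theta \mid H)\right]$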

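  6. Bayesian neural networks V. The equivalence of regularization and Bayesian model selection. Regularization formula: $\min_{\theta} \left[\sum_i (y_i - g_\theta(x_i))^2 + \lambda\, \Omega(\theta)\right]$, where $\Omega$ is the regularization term and $\lambda$ is its weight. Bayesian optimization formula: $\min_{\theta} \left[\frac{1}{2\sigma^2} \sum_i (y_i - g_\theta(x_i))^2 - \log P(\theta \mid H)\right]$, where $\sigma^2$ is the variance of the normal distribution of the data. Equivalence: $\lambda\, \Omega(\theta) = -2\sigma^2 \log P(\theta \mid H)$, up to an additive constant. Both represent a priori information about the correct solution.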
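The equivalence can be checked numerically on a toy problem. The sketch below assumes a one-parameter linear model g(x) = theta * x, a zero-mean Gaussian prior on theta, and made-up data; all function names and values are hypothetical. Under these assumptions the Bayesian objective multiplied by twice the noise variance equals the regularized error with `lam = sigma2 / prior_var`.

```python
def squared_error(theta, xs, ys):
    """Sum of squared residuals of the assumed model g(x) = theta * x."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

def neg_log_posterior(theta, xs, ys, sigma2, prior_var):
    """Negative log posterior, dropping constants that do not depend on theta."""
    return squared_error(theta, xs, ys) / (2 * sigma2) + theta ** 2 / (2 * prior_var)

def regularized_error(theta, xs, ys, lam):
    """Squared error plus a weighted quadratic penalty on theta."""
    return squared_error(theta, xs, ys) + lam * theta ** 2

# Made-up data and variances, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]
sigma2, prior_var = 0.25, 1.0
lam = sigma2 / prior_var

# Closed-form minimizer of the regularized error for this one-parameter model.
theta_best = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
```

Because the two objectives differ only by a positive factor, `theta_best` minimizes both of them.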

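  7. Bayesian neural networks VI. Bayesian pruning by regularization (regularization terms given up to scale constants). Gauss pruning: $\sum_{k=1}^{N} \theta_k^2$. Laplace pruning: $\sum_{k=1}^{N} |\theta_k|$. Cauchy pruning: $\sum_{k=1}^{N} \log(1 + \theta_k^2)$. $N$ is the number of components of the $\theta$ vectors.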
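The three pruning variants correspond to penalties induced by different priors on the parameters. A minimal sketch, assuming the standard unit-scale forms (sum of squares for a Gaussian prior, sum of absolute values for a Laplace prior, sum of log(1 + theta^2) for a Cauchy prior). The exact scaling used on the slide is not recoverable, so the function names and constants here are assumptions.

```python
import math

def gauss_penalty(theta):
    """Penalty from a Gaussian prior: sum of squared parameters."""
    return sum(t * t for t in theta)

def laplace_penalty(theta):
    """Penalty from a Laplace prior: sum of absolute parameter values."""
    return sum(abs(t) for t in theta)

def cauchy_penalty(theta):
    """Penalty from a Cauchy prior: sum of log(1 + parameter squared)."""
    return sum(math.log(1.0 + t * t) for t in theta)

# Hypothetical parameter vector with one large and one small component.
theta = [3.0, 0.1]
penalties = {
    "gauss": gauss_penalty(theta),
    "laplace": laplace_penalty(theta),
    "cauchy": cauchy_penalty(theta),
}
```

Comparing the three values for the same vector shows how differently each prior weighs large parameters against small ones.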

  8. Support vector machines - SVM I. Linearly separable classes:
     - many separators
     - there is an optimal separator

  9. Support vector machines - SVM II. How to find the optimal separator?
     - support vectors
     - overspecification
     Property: one less support vector gives a new optimal separator

  10. Support vector machines - SVM III. We look for minimal and robust separators. These are minimal and robust models of the data. The full data set is equivalent to the set of support vectors with respect to the specification of the minimal robust model.

  11. Support vector machines - SVM IV. Mathematical problem formulation I. We represent the separator as a pair $(w, b)$, where $w$ is a vector and $b$ is a scalar. We look for $w$ and $b$ such that they satisfy: $y_i (w^T x_i + b) \ge 1$ for every data vector $x_i$ with class label $y_i \in \{-1, +1\}$. The support vectors are those $x_i$-s for which this inequality is in fact an equality.

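  12. Support vector machines - SVM V. Mathematical problem formulation II. The distances from the origin of the hyperplanes of the support vectors, $w^T x + b = 1$ and $w^T x + b = -1$, are: $\frac{|1 - b|}{\|w\|}$ and $\frac{|-1 - b|}{\|w\|}$. The distance between the two planes is: $\frac{2}{\|w\|}$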
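The constraints and the distance between the two planes can be checked directly for a given separator. A minimal sketch, assuming a made-up four-point data set with labels in {-1, +1} and a hand-picked pair (w, b); the names `functional_margins`, `support` and `width` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def functional_margins(w, b, X, y):
    """Return y_i * (w . x_i + b) for every data vector."""
    return [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]

# Made-up linearly separable data.
X = [(0.0, 0.0), (-1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]
y = [-1, -1, 1, 1]

# Hand-picked separator for this data.
w, b = (1.0, 0.0), -1.0

margins = functional_margins(w, b, X, y)

# The separator is admissible if every margin is at least 1.
feasible = all(m >= 1.0 - 1e-9 for m in margins)

# Support vectors are the points where the inequality holds with equality.
support = [i for i, m in enumerate(margins) if abs(m - 1.0) < 1e-9]

# Distance between the planes w . x + b = 1 and w . x + b = -1.
width = 2.0 / math.sqrt(dot(w, w))
```

Only the two points with margin exactly 1 determine the separator; moving the other two points slightly would not change it.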

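  13. Support vector machines - SVM VI. Mathematical problem formulation III. Optimal separator: the distance between the two hyperplanes is maximal. Optimization: $\max_{w, b} \frac{2}{\|w\|}$, equivalently $\min_{w, b} \frac{1}{2}\|w\|^2$, with the restrictions that $w^T x_i + b \ge 1$ if $y_i = 1$ and $w^T x_i + b \le -1$ if $y_i = -1$, or in other form $y_i (w^T x_i + b) \ge 1$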

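  14. Support vector machines - SVM VII. Mathematical problem formulation IV. Complete optimization formula, using Lagrange multipliers: $L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[y_i (w^T x_i + b) - 1\right]$, with $\alpha_i \ge 0$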

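  15. Support vector machines - SVM VIII. Mathematical problem formulation V. Writing the optimality conditions for $w$ and $b$ we get: $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$. The dual problem is: $\max_{\alpha} \left[\sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j\right]$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$. The support vectors are those $x_i$-s for which the Lagrange multiplier $\alpha_i$ is strictly positive.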

  16. Support vector machines - SVM IX. Graphical interpretation: we search for the tangent point of a hyper-ellipsoid with the positive quadrant of the space.

  17. Support vector machines - SVM X. How to solve the support vector problem? Optimization with respect to the $\alpha_i$-s:
     - gradient method
     - Newton and quasi-Newton methods
     We get as result:
     - the support vectors
     - the optimal linear separator
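A small sketch of optimizing the multipliers on a made-up four-point data set. Note the swap: instead of the gradient or Newton methods listed above, it uses pairwise coordinate ascent (an SMO-style update), chosen here only because it keeps the multipliers non-negative and their label-weighted sum at zero after every step. The function `solve_dual` and the data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_dual(X, y, sweeps=50, tol=1e-12):
    """Pairwise coordinate ascent on the hard-margin dual (SMO-style sketch)."""
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= tol:
                    continue  # identical points give no usable direction
                # Outputs without the bias; the bias cancels in e_i - e_j.
                f_i = sum(alpha[k] * y[k] * K[k][i] for k in range(n))
                f_j = sum(alpha[k] * y[k] * K[k][j] for k in range(n))
                e_i, e_j = f_i - y[i], f_j - y[j]
                a_j = alpha[j] + y[j] * (e_i - e_j) / eta
                # Bounds that keep both multipliers non-negative.
                if y[i] != y[j]:
                    low, high = max(0.0, alpha[j] - alpha[i]), float("inf")
                else:
                    low, high = 0.0, alpha[i] + alpha[j]
                a_j = min(max(a_j, low), high)
                if abs(a_j - alpha[j]) > tol:
                    # Compensate on alpha_i so that sum(alpha_k * y_k) stays zero.
                    alpha[i] += y[i] * y[j] * (alpha[j] - a_j)
                    alpha[j] = a_j
                    changed = True
        if not changed:
            break
    return alpha

# Made-up linearly separable data.
X = [(0.0, 0.0), (-1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]
y = [-1, -1, 1, 1]

alpha = solve_dual(X, y)
support = [i for i, a in enumerate(alpha) if a > 1e-9]

# Recover the separator: w from the multipliers, b from one support vector.
w = [sum(alpha[i] * y[i] * X[i][d] for i in range(len(X))) for d in range(2)]
b = y[support[0]] - dot(w, X[support[0]])
```

The result contains both outputs named above: the indices in `support` are the support vectors, and `(w, b)` is the linear separator.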

  18. Support vector machines - SVM XI. Implications for artificial neural networks:
     - robust perceptron (low sensitivity to noise)
     - minimal linear classifier neural network

  19. Support vector machines - SVM XII. What can we do if the boundary is nonlinear? Idea: transform the data vectors to a space where the separator is linear.
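A minimal sketch of this idea, assuming made-up one-dimensional data in which one class lies on both sides of the other. The transformation x -> (x, x^2) and the separator in the transformed space are hand-picked for illustration and are not taken from the slides.

```python
def separable_by_threshold(xs, ys):
    """True if a single threshold on the real line splits the two classes."""
    labels = [label for _, label in sorted(zip(xs, ys))]
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes <= 1

def feature_map(x):
    """Assumed transformation from the line to the plane."""
    return (x, x * x)

# Made-up data: the positive class surrounds the negative class.
xs = [-2.0, -0.5, 0.5, 2.0]
ys = [1, -1, -1, 1]

linear_in_input_space = separable_by_threshold(xs, ys)

# Hand-picked linear separator in the transformed space: x^2 - 2 = 0.
w, b = (0.0, 1.0), -2.0
scores = [w[0] * f[0] + w[1] * f[1] + b for f in map(feature_map, xs)]
linear_in_feature_space = all(y * s > 0 for y, s in zip(ys, scores))
```

No threshold separates the classes on the line, while a single straight line does so after the transformation.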

  20. Support vector machines - SVM XIII. The transformation is often made to an infinite-dimensional space, usually a function space. Example: $x \mapsto \cos(u^T x)$

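  21. Support vector machines - SVM XIV. The new optimization formulas are: $\max_{\alpha} \left[\sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle \phi(x_i), \phi(x_j) \rangle\right]$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$, where $\phi$ is the transformation of the data vectors.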

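  22. Support vector machines - SVM XIV. How to handle the products of the transformed vectors? Idea: use a transformation that fits the Mercer theorem. Mercer theorem: let $K(x, z)$ be a symmetric function of two data vectors; then $K$ has a decomposition $K(x, z) = \langle \phi(x), \phi(z) \rangle$, where $\phi$ maps the data vectors into $H$ and $H$ is a function space, if and only if for each square-integrable function $f$: $\int\!\!\int K(x, z)\, f(x)\, f(z)\, dx\, dz \ge 0$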
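A finite-sample analogue of the Mercer condition replaces the double integral with a quadratic form over a set of points, which must be non-negative for every choice of coefficients. A minimal sketch, assuming a Gaussian kernel as the example (it is not named in the surviving slide text) and a negative squared distance as a contrasting function that fails the check; all names and points are hypothetical.

```python
import math
import random

def gaussian_kernel(x, z, width=1.0):
    """Assumed example kernel: exp(-||x - z||^2 / (2 * width^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * width ** 2))

def neg_sq_dist(x, z):
    """A symmetric function that is not a Mercer kernel."""
    return -sum((a - b) ** 2 for a, b in zip(x, z))

def quadratic_form(kernel, points, coeffs):
    """Discrete counterpart of the double integral: sum of c_i * c_j * K(x_i, x_j)."""
    return sum(ci * cj * kernel(pi, pj)
               for ci, pi in zip(coeffs, points)
               for cj, pj in zip(coeffs, points))

random.seed(0)
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]

# Random coefficient vectors never produce a negative value for the Gaussian kernel.
values = [quadratic_form(gaussian_kernel, points,
                         [random.uniform(-1.0, 1.0) for _ in points])
          for _ in range(200)]
min_value = min(values)

# One coefficient vector is enough to show that neg_sq_dist violates the condition.
violation = quadratic_form(neg_sq_dist, points[:2], [1.0, 1.0])
```

Random sampling can only fail to find a violation; it does not prove the condition, which for the Gaussian kernel is a known result.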

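  23. Support vector machines - SVM XV. Optimization formula with a transformation that fits the Mercer theorem: $\max_{\alpha} \left[\sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)\right]$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$. The form of the solution: $f(x) = \mathrm{sign}\left(\sum_i \alpha_i y_i K(x_i, x) + b\right)$, where $b$ is determined from an equation valid for a support vector.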
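A minimal sketch of the kernel form of the solution, assuming a Gaussian kernel and a made-up training set of only two points with opposite labels. For that special case the dual has a closed-form solution with equal multipliers 1 / (1 - k), where k is the kernel value between the two points. The names `decision_value` and `classify` are hypothetical.

```python
import math

def gaussian_kernel(x, z, width=1.0):
    """Assumed example kernel: exp(-||x - z||^2 / (2 * width^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * width ** 2))

def decision_value(x, support, labels, alphas, b, kernel):
    """Sum of alpha_i * y_i * K(x_i, x) over the support vectors, plus b."""
    return sum(a * y * kernel(s, x) for a, y, s in zip(alphas, labels, support)) + b

def classify(x, support, labels, alphas, b, kernel):
    return 1 if decision_value(x, support, labels, alphas, b, kernel) >= 0 else -1

# Two made-up training points; both are support vectors.
support = [(0.0, 0.0), (2.0, 0.0)]
labels = [1, -1]

# Closed-form dual solution for this two-point case (K(x, x) = 1 for this kernel).
k = gaussian_kernel(support[0], support[1])
alphas = [1.0 / (1.0 - k)] * 2

# b from the equality y_s * f(x_s) = 1 at the first support vector.
b = labels[0] - decision_value(support[0], support, labels, alphas, 0.0, gaussian_kernel)

prediction = classify((0.5, 0.0), support, labels, alphas, b, gaussian_kernel)
```

The new point (0.5, 0.0) is closer to the positive support vector, so the kernel sum is dominated by the positive term.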

  24. Support vector machines - SVM XVI. Examples of transformations and kernels a. b. c.

  25. Support vector machines - SVM XVII. Other typical kernels

  26. Support vector machines - SVM XVIII. Summary of main ideas
     • look for minimal complexity classification
     • transform the data to another space where the class boundaries are linear
     • use Mercer kernels

  27. Support vector machines - SVM XIX. Practical issues
     • global optimization does not work with a large amount of data, so use sequential optimization with chunks of the data
     • the resulting models are minimal complexity models; they are insensitive to noise and keep the generalization ability of the more complex models
     • applications: character recognition, economic forecasting

  28. Regularization neural networks. General optimization vs. optimization over the grid. The regularization operator $T$ specifies the grid:
     - we look for functions that satisfy $\|Tg\|^2 = 0$
     - in the relaxed case the regularization operator is incorporated as a constraint in the error function: $E(g) = \sum_i (y_i - g(x_i))^2 + \lambda \|Tg\|^2$
