MML Inference of RBFs Enes Makalic Lloyd Allison Andrew Paplinski
Presentation Outline • RBF architecture selection • Existing methods • Overview of MML • MML87 • MML inference of RBFs • MML estimators for RBF parameters • Results • Conclusion • Future work
RBF Architecture Selection (1) • Determine the optimal network architecture for a given problem • Involves choosing: • Number and type of basis functions • Influences the success of the training process • If we choose an RBF that is: • Too small: poor performance (underfitting) • Too large: overfitting
RBF Architecture Selection (2) • [Figure: example fits contrasting poor performance (too few basis functions) with overfitting (too many)]
RBF Architecture Selection (3) • Architecture selection solutions: • Use as many basis functions as there are data points • Expectation Maximization (EM) • K-means clustering • Regression trees (M. Orr) • BIC, GPE, etc. • Bayesian inference • Reversible jump MCMC
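Of these, K-means clustering of the training inputs is a common way to place basis-function centres; a minimal numpy sketch (function name and defaults are my own, not the authors' code):

```python
import numpy as np

def kmeans_centres(X, k, iters=50, seed=0):
    """Pick k RBF centres by k-means clustering of the inputs X (shape n x m)."""
    rng = np.random.default_rng(seed)
    # Initialise centres as k distinct training points.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre.
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its assigned points.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centres[j] = pts.mean(axis=0)
    return centres
```

Note this only chooses the centres; radii and weights still need to be set by another method.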
Overview of MML (1) • Objective function to estimate the goodness of a model • A sender wishes to send data, x, to a receiver over a noiseless transmission channel • How well is the data encoded? • Message length (for example, in bits) • [Diagram: Sender → noiseless transmission channel → Receiver]
Overview of MML (2) • Transmit the data in two parts: • Part 1: encoding of the model (hypothesis H), length −log Pr(H) • Part 2: encoding of the data given the model, length −log Pr(D|H) • Quantitative form of Occam’s razor
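The two-part scheme can be made concrete with a toy comparison of two hypotheses for the same data (the probabilities are illustrative, not from the talk):

```python
import math

def two_part_length(prior_h, likelihood_d_given_h):
    """Total message length in bits: -log2 Pr(H) + -log2 Pr(D|H)."""
    return -math.log2(prior_h) - math.log2(likelihood_d_given_h)

# A simple hypothesis fits the data loosely; a complex one fits it tightly.
simple = two_part_length(prior_h=0.5, likelihood_d_given_h=0.125)     # 1 + 3 = 4 bits
complex_ = two_part_length(prior_h=0.0625, likelihood_d_given_h=0.5)  # 4 + 1 = 5 bits
# Occam's razor, quantified: the shorter total message wins (here, the simple model).
```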
Overview of MML (3) • MML87 • Efficient approximation to strict MML • Total message length for a model with parameters θ: MsgLen(θ, x) = −log h(θ) − log f(x|θ) + (1/2) log |F(θ)| + (k/2)(1 + log κ_k)
Overview of MML (4) • MML87 • h(θ) is the prior information • f(x|θ) is the likelihood function • k is the number of parameters • κ_k is a dimension (quantising lattice) constant • |F(θ)| is the determinant of the expected Fisher information matrix with entries (i, j): F_ij(θ) = −E[∂² log f(x|θ) / ∂θ_i ∂θ_j]
Overview of MML (5) • MML87 • Fisher Information: • Sensitivity of the likelihood function to the parameters • Determines the accuracy with which the model is stated • Small second derivatives: parameters stated less precisely • Large second derivatives: parameters stated more precisely • A model that minimises the total message length is optimal
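A worked instance of the MML87 message length, for the simplest model it applies to: the mean of Gaussian data with known sigma and a uniform prior over the mean. This toy example is my own illustration of the formula, not part of the talk; here the expected Fisher information is n/σ² and κ_1 = 1/12.

```python
import math

def mml87_gaussian_mean(x, sigma, prior_range):
    """MML87 message length (in nats) for the mean of Gaussian data
    with known sigma and a uniform prior of width prior_range."""
    n = len(x)
    mu = sum(x) / n                        # MML estimate = sample mean here
    neg_log_prior = math.log(prior_range)  # -log h(theta) for a uniform prior
    nll = (0.5 * n * math.log(2 * math.pi * sigma ** 2)
           + sum((xi - mu) ** 2 for xi in x) / (2 * sigma ** 2))
    fisher = n / sigma ** 2                # expected Fisher information F(theta)
    kappa1 = 1.0 / 12.0                    # optimal 1-D quantising lattice constant
    k = 1                                  # one free parameter
    return (neg_log_prior + nll
            + 0.5 * math.log(fisher)
            + 0.5 * k * (1 + math.log(kappa1)))
```

A vaguer prior (larger `prior_range`) lengthens the first part of the message, exactly as the two-part decomposition suggests.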
MML Inference of RBFs (1) • Regression problems • We require: • A likelihood function • Fisher information • Priors on all model parameters
MML Inference of RBFs (2) • Notation
MML Inference of RBFs (3) • RBF Network • m inputs, n parameters, o outputs • Mapping from parameters to outputs • w: vector of network parameters • Network output implicitly depends on the network input vector • Define an output non-linearity
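A minimal sketch of the forward mapping, assuming Gaussian basis functions and an identity output non-linearity (appropriate for regression); the function name and argument shapes are my own:

```python
import numpy as np

def rbf_forward(x, centres, radii, weights, bias=0.0):
    """Output of a single-output RBF network with Gaussian basis functions.

    x: (m,) input vector; centres: (n_h, m); radii: (n_h,); weights: (n_h,).
    The output non-linearity is the identity, as in regression."""
    # Gaussian activation of each hidden basis function.
    phi = np.exp(-np.sum((x - centres) ** 2, axis=1) / (2 * radii ** 2))
    # Linear output layer.
    return weights @ phi + bias
```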
MML Inference of RBFs (4) • Likelihood function • Learning: minimisation of a scalar function • We define L as the negative log likelihood • L implicitly depends on given targets, z, for network outputs • Different input-target pairs are considered independent
MML Inference of RBFs (5) • Likelihood function • Regression problems • The network error is assumed Gaussian with zero mean and variance σ²
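Under the independence and Gaussian-error assumptions, the negative log-likelihood L takes the familiar sum-of-squares form; a small sketch (names are mine):

```python
import math

def neg_log_likelihood(outputs, targets, sigma):
    """L = -log likelihood of targets z given network outputs y,
    assuming independent Gaussian errors with standard deviation sigma."""
    n = len(outputs)
    sse = sum((y - z) ** 2 for y, z in zip(outputs, targets))
    # Normalising term plus the scaled sum of squared errors.
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sse / (2 * sigma ** 2)
```

Minimising L over the network parameters therefore reduces to (scaled) least squares when sigma is fixed.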
MML Inference of RBFs (6) • Fisher information • Expected Hessian matrix of the negative log-likelihood L • Built from the Jacobian matrix of L • and the Hessian matrix of L
MML Inference of RBFs (7) • Fisher information • Taking expectations and simplifying we obtain • Positive semi-definite • The complete Fisher matrix includes a summation over the whole data set D • We used an approximation to F • Block-diagonal • Hidden basis functions assumed to be independent • Simplified determinant: the product of the determinants of each block
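The pay-off of the block-diagonal approximation is that the determinant factorises across blocks, so the full Fisher matrix never has to be built or factorised; a quick numerical check of that identity (my own illustration):

```python
import numpy as np

def log_det_block_diag(blocks):
    """log-determinant of a block-diagonal matrix = sum of the blocks' log-dets.

    Each block corresponds to one hidden basis function's parameters."""
    return sum(np.linalg.slogdet(b)[1] for b in blocks)
```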
MML Inference of RBFs (8) • Priors • Must specify a prior density for each parameter • Centres: uniform • Radii: uniform (log-scale) • Weights: Gaussian • Zero mean and standard deviation σ_w • σ_w is usually taken to be large (vague prior)
MML Inference of RBFs (9) • Message length of a RBF: MsgLen = MsgLen(n_h) − log h(w) + (1/2) log F(w) + L + C • where: • MsgLen(n_h) denotes the cost of transmitting the number of basis functions • F(w) is the determinant of the expected Fisher information • L is the negative log-likelihood • C is a dimension constant • Independent of w
MML Inference of RBFs (10) • MML estimators for parameters • Standard unbiased estimator for the error s.d. • Numerical optimisation using differentiation of the expected Fisher information determinant
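A sketch of the error-s.d. step, assuming the usual sum-of-squared-residuals estimator with a degrees-of-freedom correction; the talk does not state the exact correction used, so `n_params` is my assumption:

```python
import math

def error_sd_estimate(residuals, n_params=0):
    """Estimate the error standard deviation from network residuals,
    correcting the divisor for the n_params fitted parameters
    (n_params is an assumption, not stated in the talk)."""
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    return math.sqrt(sse / (n - n_params))
```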
Results (1) • MML inference criterion is compared to: • Conventional MATLAB RBF implementation • M. Orr’s regression tree method • Functions used for criteria evaluation • Correct answer known • Correct answer not known
Results (2) • Correct answer known • Generate data from a known RBF (one, three and five basis functions respectively) • Inputs uniformly sampled in the range (-8,8) • 1D and 2D inputs were considered • Gaussian noise N(0,0.1) added to the network outputs • Training set and test set comprise 100 and 1000 patterns respectively
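The data-generation recipe above can be sketched as follows. The centres, radii, and weights are illustrative, and I read N(0, 0.1) as a standard deviation of 0.1 (the slide does not say whether 0.1 is the variance or the s.d.):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(centres, radii, weights, n_patterns, noise_sd=0.1):
    """Sample inputs uniformly on (-8, 8), push them through a known
    Gaussian RBF, and add zero-mean Gaussian noise, as in the experiments."""
    m = centres.shape[1]
    X = rng.uniform(-8, 8, (n_patterns, m))
    # Squared distances from every input to every centre: shape (n, n_h).
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2 * radii ** 2))
    y = phi @ weights + rng.normal(0.0, noise_sd, n_patterns)
    return X, y

# 100 training and 1000 test patterns from a three-basis-function 1-D RBF.
centres = np.array([[-4.0], [0.0], [4.0]])
radii = np.array([1.0, 2.0, 1.5])
weights = np.array([1.0, -2.0, 0.5])
X_train, y_train = make_dataset(centres, radii, weights, 100)
X_test, y_test = make_dataset(centres, radii, weights, 1000)
```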
Results (3) • MSE • Correct answer known (1D input)
Results (4) • MSE • Correct answer known (2D inputs)
Results (5) • Correct answer not known • The following functions were used:
Results (6) • Correct answer not known • Gaussian noise N(0,0.1) added to the network outputs • Training set and test set comprise 100 and 1000 patterns respectively
Results (9) • MSE • Correct answer not known
Results (10) • Sensitivity of criteria to noise
Results (11) • Sensitivity of criteria to data set size
Conclusion (1) • Novel approach to architecture selection in RBF networks • MML87 • Block-diagonal Fisher information matrix approximation • MATLAB code available from: • http://www.csse.monash.edu.au/~enesm
Conclusion (2) • Results • Initial testing • Good performance as the noise level and data set size are varied • No overfitting • Future work • Further testing • Examine whether MML parameter estimators improve performance • MML and regularisation
Conclusion (3) • Questions?
Conclusion (4) Thank you :)