Transfer functions: hidden possibilities for better neural networks
Włodzisław Duch and Norbert Jankowski
Department of Computer Methods, Nicholas Copernicus University, Torun, Poland
http://www.phys.uni.torun.pl/kmk
Why is this an important issue?
MLPs are universal approximators - so is there no need for other TFs?
Wrong bias => poor results and overly complex networks.
Example of a two-class problem:
• Class 1 inside a sphere, Class 2 outside.
MLP: at least N+1 hyperplanes, O(N²) parameters.
RBF: 1 Gaussian, O(N) parameters.
• Class 1 in the corner defined by the (1,1,...,1) hyperplane, Class 2 outside.
MLP: 1 hyperplane, O(N) parameters.
RBF: many Gaussians, O(N²) parameters, poor approximation.
(A worked sketch of both cases follows below.)
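A minimal worked sketch of the two cases above; the thresholds and notation are assumptions for illustration, not taken from the slides.

```latex
% Sphere case: one Gaussian node (O(N) parameters) covers Class 1 directly,
% while an MLP needs at least N+1 hyperplanes to enclose the sphere.
G(\mathbf{X}) = \exp\!\left(-\frac{\|\mathbf{X}-\mathbf{C}\|^{2}}{b^{2}}\right) > \theta
\;\Longleftrightarrow\; \|\mathbf{X}-\mathbf{C}\| < b\sqrt{\ln(1/\theta)}

% Corner case: one sigmoidal node with weights (1,1,\dots,1) (O(N) parameters)
% covers Class 1, while an RBF network needs many Gaussians to tile the corner.
\sigma\!\left(\sum_{i=1}^{N} x_i - \theta'\right) > \tfrac{1}{2}
\;\Longleftrightarrow\; \sum_{i=1}^{N} x_i > \theta'
```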
Inspirations
The logical rule IF x1>0 & x2>0 THEN Class1 ELSE Class2 is not properly represented by either MLPs or RBFs!
Result: decision trees and logical rules perform significantly better than MLPs on some datasets (cf. hypothyroid)!
Speed of learning and network complexity depend on the TF.
Fast learning requires flexible "brain modules" - TFs.
• Biological inspirations: sigmoidal neurons are a crude approximation at the basic level of neural tissue.
• Interesting brain functions are performed by interacting minicolumns, implementing complex functions.
• Modular networks: networks of networks.
• First step beyond single neurons: transfer functions providing flexible decision borders.
Transfer functions
A transfer function f(I(X)) combines a vector activation I(X) with a scalar output o(I).
1. Fan-in, scalar product activation W·X, giving hyperplane decision borders.
2. Distance functions as activations, for example Gaussian functions.
3. Mixed activation functions combining both.
(The three activation types are sketched below.)
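A sketch of the three activation types in standard notation; the exact formulas on the slide were figures, so the forms below are assumptions consistent with the text.

```latex
% 1. Inner-product (fan-in) activation -- hyperplane decision borders:
I(\mathbf{X};\mathbf{W}) = \mathbf{W}\cdot\mathbf{X} + \theta

% 2. Distance-based activation, e.g. a Gaussian radial function centred at T:
D(\mathbf{X};\mathbf{T}) = \|\mathbf{X}-\mathbf{T}\|, \qquad
G(\mathbf{X};\mathbf{T},b) = \exp\!\left(-\frac{\|\mathbf{X}-\mathbf{T}\|^{2}}{b^{2}}\right)

% 3. Mixed activation, combining inner-product and distance terms:
A(\mathbf{X};\mathbf{W},\mathbf{T},\alpha) = \mathbf{W}\cdot\mathbf{X} + \alpha\,\|\mathbf{X}-\mathbf{T}\|
```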
TF in Neural Networks
Choices:
• Homogeneous NN: select the best TF, try several types. Ex: RBF networks; SVM kernels (today 50=>80% change).
• Heterogeneous NN: one network, several types of TF. Ex: Adaptive Subspace SOM (Kohonen 1995), linear subspaces; projections onto a space of basis functions.
• Input enhancement: adding f_i(X) to achieve separability. Ex: functional link networks (Pao 1989), tensor products of inputs; the D-MLP model (see the sketch after this list).
Heterogeneous networks:
1. Start from a large network with different TFs, use regularization to prune.
2. Construct the network by adding nodes selected from a pool of candidates.
3. Use very flexible TFs and force them to specialize.
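A minimal numpy sketch of the input-enhancement idea in the spirit of functional link networks; the function name and the choice of tensor-product features are illustrative assumptions, not the authors' exact model.

```python
import numpy as np

def enhance_inputs(X):
    """Append extra features f_i(X) -- here pairwise products (tensor-product
    terms) -- so that a simple linear/sigmoidal model can separate classes
    that are not linearly separable in the original input space."""
    n_samples, n_features = X.shape
    products = [X[:, i] * X[:, j]
                for i in range(n_features)
                for j in range(i, n_features)]
    return np.hstack([X, np.column_stack(products)])

# Example: the rule "Class 1 iff x1>0 and x2>0" is not linearly separable,
# but with the added product feature x1*x2 the discriminant (x1+x2)+(x1*x2)
# separates the four prototype points below at threshold 0.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
print(enhance_inputs(X))
```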
Most flexible TFs
Conical functions: mixed activations.
Lorentzian functions: mixed activations.
Bicentral functions: separable functions.
(One parameterization of each is sketched below.)
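One common parameterization of these functions, assumed here because the original formula figures are missing; σ denotes the logistic function.

```latex
% Conical: sigmoid of a mixed (inner-product + distance) activation
C(\mathbf{X};\mathbf{W},\mathbf{T},\omega,\theta)
  = \sigma\!\big(\mathbf{W}\cdot\mathbf{X} + \omega\,\|\mathbf{X}-\mathbf{T}\| + \theta\big)

% Lorentzian: a window-like function of the scalar-product activation
L(\mathbf{X};\mathbf{W},\theta)
  = \frac{1}{1 + (\mathbf{W}\cdot\mathbf{X} + \theta)^{2}}

% Bicentral: a separable product of sigmoid pairs, one soft window per dimension
Bi(\mathbf{X};\mathbf{t},\mathbf{b},\mathbf{s})
  = \prod_{i=1}^{N} \sigma\!\big(s_i(x_i - t_i + b_i)\big)\,
      \Big(1 - \sigma\!\big(s_i(x_i - t_i - b_i)\big)\Big)
```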
Bicentral + rotations
With 6N parameters this is the most general form: a box in N−1 dimensions times a rotated window.
A rotation matrix with band structure implements the rotation through 2×2 rotations (see the sketch below).
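A minimal numpy sketch of a bicentral function with band-structured rotation, under the assumption that each dimension mixes only with its neighbour through a single rotation coefficient; names and parameter values are illustrative, not the authors' exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bicentral_rotated(x, t, b_left, b_right, s_left, s_right, alpha):
    """Bicentral transfer function with a band-structured rotation:
    each dimension i uses the rotated coordinate y_i = x_i + alpha_i * x_{i+1}
    (alpha_N = 0), so the rotation is built from simple 2x2 mixing terms.
    Per dimension: centre t_i, two biases, two slopes and one rotation
    coefficient alpha_i -- roughly 6N parameters in total."""
    y = x + alpha * np.roll(x, -1)                  # banded rotation, alpha[-1] = 0
    left = sigmoid(s_left * (y - t + b_left))       # rising edge of the window
    right = sigmoid(s_right * (y - t - b_right))    # falling edge of the window
    return np.prod(left * (1.0 - right))

# Illustrative call on a 3-dimensional input (all parameter values are made up).
x = np.array([0.2, -0.1, 0.4])
t = np.zeros(3); b = np.ones(3); s = 3.0 * np.ones(3)
alpha = np.array([0.1, -0.2, 0.0])
print(bicentral_rotated(x, t, b, b, s, s, alpha))
```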
Some properties of TFs
For logistic functions: renormalization of a Gaussian gives a logistic function, with weights W_i = 4D_i/b_i² (see the identity below).
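A sketch of this identity in standard notation (assumed, since the slide formulas were figures): a Gaussian centred at D, renormalized against its mirror image centred at -D, reduces exactly to a logistic function of a scalar product.

```latex
G(\mathbf{X};\mathbf{D},\mathbf{b}) = \exp\!\left(-\sum_i \frac{(x_i - D_i)^{2}}{b_i^{2}}\right),
\qquad
\frac{G(\mathbf{X};\mathbf{D},\mathbf{b})}{G(\mathbf{X};\mathbf{D},\mathbf{b}) + G(\mathbf{X};-\mathbf{D},\mathbf{b})}
 = \frac{1}{1 + \exp\!\left(-\sum_i 4 D_i x_i / b_i^{2}\right)}
 = \sigma(\mathbf{W}\cdot\mathbf{X}),
\qquad W_i = \frac{4 D_i}{b_i^{2}}
```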
Example of input transformation
Minkowski's distance function replaces the Euclidean distance.
The sigmoidal activation W·X is changed to a distance-based activation.
Adding a single input renormalizing the vector makes this possible (see the sketch below).
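A sketch of this transformation, following the standard D-MLP construction mentioned earlier; the exact forms are assumptions, since the slide formulas are not reproduced in the text.

```latex
% Minkowski distance with exponent alpha (alpha = 2 recovers the Euclidean case):
d_\alpha(\mathbf{X},\mathbf{W}) = \left(\sum_{i=1}^{N} |x_i - w_i|^{\alpha}\right)^{1/\alpha}

% For vectors of constant norm the scalar product is a function of distance,
% so the sigmoidal activation can be rewritten and then generalized:
\sigma(\mathbf{W}\cdot\mathbf{X} + \theta)
  = \sigma\!\left(\tfrac{1}{2}\big(\|\mathbf{W}\|^{2} + \|\mathbf{X}\|^{2} - \|\mathbf{X}-\mathbf{W}\|^{2}\big) + \theta\right)
  \;\longrightarrow\; \sigma\!\left(\theta' - d_\alpha(\mathbf{X},\mathbf{W})\right)

% A single extra input renormalizes every vector to a constant norm R
% without losing information:
x_{N+1} = \sqrt{R^{2} - \|\mathbf{X}\|^{2}}, \qquad R \ge \max_j \|\mathbf{X}_j\|
```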
Conclusions
Radial and sigmoidal functions are not the only choice.
StatLog report: large differences between RBF and MLP results on many datasets.
Better learning cannot repair the wrong bias of the model.
Systematic investigation and a taxonomy of TFs is worthwhile.
Networks should select/optimize their own functions.
Open questions:
What is the optimal balance between complex nodes and complex interactions (weights)?
How to train heterogeneous networks?
How to optimize nodes in constructive algorithms?
Hierarchical, modular networks: nodes that are networks themselves.
The End? Perhaps the beginning ...