Transfer functions: hidden possibilities for better neural networks
Włodzisław Duch and Norbert Jankowski
Department of Computer Methods, Nicholas Copernicus University, Torun, Poland
http://www.phys.uni.torun.pl/kmk
Why is this an important issue?
MLPs are universal approximators - so is there no need for other TFs?
Wrong bias => poor results and overly complex networks.
Example of a two-class problem:
• Class 1 inside a sphere, Class 2 outside.
MLP: at least N+1 hyperplanes, O(N²) parameters.
RBF: 1 Gaussian, O(N) parameters.
• Class 1 in the corner defined by the (1,1,...,1) hyperplane, Class 2 outside.
MLP: 1 hyperplane, O(N) parameters.
RBF: many Gaussians, O(N²) parameters, poor approximation.
(A worked sketch of both cases follows below.)
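A minimal worked sketch of the two cases above; the thresholds and notation are assumptions for illustration, not taken from the slides.

```latex
% Sphere case: one Gaussian node (O(N) parameters) covers Class 1 directly,
% while an MLP needs at least N+1 hyperplanes to enclose the sphere.
G(\mathbf{X}) = \exp\!\left(-\frac{\|\mathbf{X}-\mathbf{C}\|^{2}}{b^{2}}\right) > \theta
\;\Longleftrightarrow\; \|\mathbf{X}-\mathbf{C}\| < b\sqrt{\ln(1/\theta)}

% Corner case: one sigmoidal node with weights (1,1,\dots,1) (O(N) parameters)
% covers Class 1, while an RBF network needs many Gaussians to tile the corner.
\sigma\!\left(\sum_{i=1}^{N} x_i - \theta'\right) > \tfrac{1}{2}
\;\Longleftrightarrow\; \sum_{i=1}^{N} x_i > \theta'
```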
Inspirations
The logical rule IF x1>0 & x2>0 THEN Class1 ELSE Class2 is not properly represented by either MLPs or RBFs!
Result: decision trees and logical rules perform significantly better than MLPs on some datasets (cf. hypothyroid)!
Speed of learning and network complexity depend on the TF.
Fast learning requires flexible "brain modules" - TFs.
• Biological inspirations: sigmoidal neurons are a crude approximation at the basic level of neural tissue.
• Interesting brain functions are performed by interacting minicolumns, implementing complex functions.
• Modular networks: networks of networks.
• First step beyond single neurons: transfer functions providing flexible decision borders.
Transfer functions
A transfer function f(I(X)) combines a vector activation I(X) with a scalar output o(I).
1. Fan-in, scalar product activation W·X, giving hyperplane decision borders.
2. Distance functions as activations, for example Gaussian functions.
3. Mixed activation functions combining both.
(The three activation types are sketched below.)
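A sketch of the three activation types in standard notation; the exact formulas on the slide were figures, so the forms below are assumptions consistent with the text.

```latex
% 1. Inner-product (fan-in) activation -- hyperplane decision borders:
I(\mathbf{X};\mathbf{W}) = \mathbf{W}\cdot\mathbf{X} + \theta

% 2. Distance-based activation, e.g. a Gaussian radial function centred at T:
D(\mathbf{X};\mathbf{T}) = \|\mathbf{X}-\mathbf{T}\|, \qquad
G(\mathbf{X};\mathbf{T},b) = \exp\!\left(-\frac{\|\mathbf{X}-\mathbf{T}\|^{2}}{b^{2}}\right)

% 3. Mixed activation, combining inner-product and distance terms:
A(\mathbf{X};\mathbf{W},\mathbf{T},\alpha) = \mathbf{W}\cdot\mathbf{X} + \alpha\,\|\mathbf{X}-\mathbf{T}\|
```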
TF in Neural Networks
Choices:
• Homogeneous NN: select the best TF, try several types. Ex: RBF networks; SVM kernels (today 50=>80% change).
• Heterogeneous NN: one network, several types of TF. Ex: Adaptive Subspace SOM (Kohonen 1995), linear subspaces; projections onto a space of basis functions.
• Input enhancement: adding f_i(X) to achieve separability. Ex: functional link networks (Pao 1989), tensor products of inputs; the D-MLP model (see the sketch after this list).
Heterogeneous networks:
1. Start from a large network with different TFs, use regularization to prune.
2. Construct the network by adding nodes selected from a pool of candidates.
3. Use very flexible TFs and force them to specialize.
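A minimal numpy sketch of the input-enhancement idea in the spirit of functional link networks; the function name and the choice of tensor-product features are illustrative assumptions, not the authors' exact model.

```python
import numpy as np

def enhance_inputs(X):
    """Append extra features f_i(X) -- here pairwise products (tensor-product
    terms) -- so that a simple linear/sigmoidal model can separate classes
    that are not linearly separable in the original input space."""
    n_samples, n_features = X.shape
    products = [X[:, i] * X[:, j]
                for i in range(n_features)
                for j in range(i, n_features)]
    return np.hstack([X, np.column_stack(products)])

# Example: the rule "Class 1 iff x1>0 and x2>0" is not linearly separable,
# but with the added product feature x1*x2 the discriminant (x1+x2)+(x1*x2)
# separates the four prototype points below at threshold 0.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
print(enhance_inputs(X))
```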
Most flexible TFs
Conical functions: mixed activations.
Lorentzian functions: mixed activations.
Bicentral functions: separable functions.
(One parameterization of each is sketched below.)
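One common parameterization of these functions, assumed here because the original formula figures are missing; σ denotes the logistic function.

```latex
% Conical: sigmoid of a mixed (inner-product + distance) activation
C(\mathbf{X};\mathbf{W},\mathbf{T},\omega,\theta)
  = \sigma\!\big(\mathbf{W}\cdot\mathbf{X} + \omega\,\|\mathbf{X}-\mathbf{T}\| + \theta\big)

% Lorentzian: a window-like function of the scalar-product activation
L(\mathbf{X};\mathbf{W},\theta)
  = \frac{1}{1 + (\mathbf{W}\cdot\mathbf{X} + \theta)^{2}}

% Bicentral: a separable product of sigmoid pairs, one soft window per dimension
Bi(\mathbf{X};\mathbf{t},\mathbf{b},\mathbf{s})
  = \prod_{i=1}^{N} \sigma\!\big(s_i(x_i - t_i + b_i)\big)\,
      \Big(1 - \sigma\!\big(s_i(x_i - t_i - b_i)\big)\Big)
```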
Bicentral + rotations
With 6N parameters this is the most general form: a box in N−1 dimensions times a rotated window.
A rotation matrix with band structure implements the rotation through 2×2 rotations (see the sketch below).
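A minimal numpy sketch of a bicentral function with band-structured rotation, under the assumption that each dimension mixes only with its neighbour through a single rotation coefficient; names and parameter values are illustrative, not the authors' exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bicentral_rotated(x, t, b_left, b_right, s_left, s_right, alpha):
    """Bicentral transfer function with a band-structured rotation:
    each dimension i uses the rotated coordinate y_i = x_i + alpha_i * x_{i+1}
    (alpha_N = 0), so the rotation is built from simple 2x2 mixing terms.
    Per dimension: centre t_i, two biases, two slopes and one rotation
    coefficient alpha_i -- roughly 6N parameters in total."""
    y = x + alpha * np.roll(x, -1)                  # banded rotation, alpha[-1] = 0
    left = sigmoid(s_left * (y - t + b_left))       # rising edge of the window
    right = sigmoid(s_right * (y - t - b_right))    # falling edge of the window
    return np.prod(left * (1.0 - right))

# Illustrative call on a 3-dimensional input (all parameter values are made up).
x = np.array([0.2, -0.1, 0.4])
t = np.zeros(3); b = np.ones(3); s = 3.0 * np.ones(3)
alpha = np.array([0.1, -0.2, 0.0])
print(bicentral_rotated(x, t, b, b, s, s, alpha))
```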
Some properties of TFs
For logistic functions: renormalization of a Gaussian gives a logistic function, with weights W_i = 4D_i/b_i² (see the identity below).
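A sketch of this identity in standard notation (assumed, since the slide formulas were figures): a Gaussian centred at D, renormalized against its mirror image centred at -D, reduces exactly to a logistic function of a scalar product.

```latex
G(\mathbf{X};\mathbf{D},\mathbf{b}) = \exp\!\left(-\sum_i \frac{(x_i - D_i)^{2}}{b_i^{2}}\right),
\qquad
\frac{G(\mathbf{X};\mathbf{D},\mathbf{b})}{G(\mathbf{X};\mathbf{D},\mathbf{b}) + G(\mathbf{X};-\mathbf{D},\mathbf{b})}
 = \frac{1}{1 + \exp\!\left(-\sum_i 4 D_i x_i / b_i^{2}\right)}
 = \sigma(\mathbf{W}\cdot\mathbf{X}),
\qquad W_i = \frac{4 D_i}{b_i^{2}}
```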
Example of input transformation
Minkowski's distance function replaces the Euclidean distance.
The sigmoidal activation W·X is changed to a distance-based activation.
Adding a single input renormalizing the vector makes this possible (see the sketch below).
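A sketch of this transformation, following the standard D-MLP construction mentioned earlier; the exact forms are assumptions, since the slide formulas are not reproduced in the text.

```latex
% Minkowski distance with exponent alpha (alpha = 2 recovers the Euclidean case):
d_\alpha(\mathbf{X},\mathbf{W}) = \left(\sum_{i=1}^{N} |x_i - w_i|^{\alpha}\right)^{1/\alpha}

% For vectors of constant norm the scalar product is a function of distance,
% so the sigmoidal activation can be rewritten and then generalized:
\sigma(\mathbf{W}\cdot\mathbf{X} + \theta)
  = \sigma\!\left(\tfrac{1}{2}\big(\|\mathbf{W}\|^{2} + \|\mathbf{X}\|^{2} - \|\mathbf{X}-\mathbf{W}\|^{2}\big) + \theta\right)
  \;\longrightarrow\; \sigma\!\left(\theta' - d_\alpha(\mathbf{X},\mathbf{W})\right)

% A single extra input renormalizes every vector to a constant norm R
% without losing information:
x_{N+1} = \sqrt{R^{2} - \|\mathbf{X}\|^{2}}, \qquad R \ge \max_j \|\mathbf{X}_j\|
```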
Conclusions
Radial and sigmoidal functions are not the only choice.
StatLog report: large differences between RBF and MLP results on many datasets.
Better learning cannot repair the wrong bias of the model.
Systematic investigation and a taxonomy of TFs is worthwhile.
Networks should select/optimize their own functions.
Open questions:
What is the optimal balance between complex nodes and complex interactions (weights)?
How to train heterogeneous networks?
How to optimize nodes in constructive algorithms?
Hierarchical, modular networks: nodes that are networks themselves.
The End? Perhaps the beginning ...