This presentation examines how large a neural network needs to be for a given problem, approaching the question from two directions: a fundamental analysis based on Kolmogorov's superposition theorem, which underlies the universal-approximation ability of networks with a single hidden layer, and an adaptive approach, Cascade Correlation, which grows its own architecture during training. It covers the limitations of single-layer networks, gradient-descent learning on an error function, the construction of Kolmogorov-style networks using spline functions, and experimental results showing that Cascade Correlation learns quickly, builds deep networks automatically, and scales to problems such as N-input parity.
Dimensions of Neural Networks
Ali Akbar Darabi, Ghassem Mirroshandel, Hootan Nokhost
Outline • Motivation • Neural Networks Power • Kolmogorov Theory • Cascade Correlation
Motivation • Suppose you are an engineer who knows artificial neural networks (ANNs) • You encounter a problem that cannot be solved with common analytical approaches • You decide to use an ANN
But… • Some questions arise • Is this problem solvable using an ANN? • How many neurons are needed? • How many layers? • …
Two Approaches • Fundamental Analysis • Kolmogorov Theory • Adaptive Networks • Cascade Correlation
Outline • Motivation • Neural Networks Power • Kolmogorov Theory • Cascade Correlation
Single-Layer Networks • Limitations of the perceptron and linear classifiers
Network Construction • A linearly non-separable problem can be made separable by mapping the inputs to a new feature space, e.g. (x, y) → (x^2, y^2, x*y); see the sketch below
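A minimal sketch of this construction (the circle-shaped toy data, the perceptron learning rule, and all names are assumptions added for illustration, not from the slides): after the mapping, a single linear threshold unit separates a class boundary that is not linearly separable in the original (x, y) plane.

```python
# Sketch: the feature map (x, y) -> (x^2, y^2, x*y) makes a circular boundary
# linearly separable, so a single-layer unit can learn it.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: points inside the unit circle are class 1, outside are class 0.
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(float)

def feature_map(X):
    """(x, y) -> (x^2, y^2, x*y), as on the slide."""
    return np.column_stack([X[:, 0] ** 2, X[:, 1] ** 2, X[:, 0] * X[:, 1]])

Phi = feature_map(X)
Phi_b = np.column_stack([Phi, np.ones(len(Phi))])   # add a bias column

# Train a single linear threshold unit with the perceptron rule.
w = np.zeros(Phi_b.shape[1])
for _ in range(100):
    for phi, target in zip(Phi_b, y):
        pred = float(phi @ w > 0)
        w += 0.1 * (target - pred) * phi            # perceptron update

accuracy = np.mean((Phi_b @ w > 0) == y)
print(f"accuracy in mapped space: {accuracy:.2f}")  # close to 1.0
```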
Learning Mechanism • Define an error function over the training patterns • Minimize it by gradient descent (a sketch follows)
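A minimal sketch of this mechanism, assuming a single sigmoid unit, a sum-of-squares error function, and toy data (none of which come from the slides):

```python
# Sketch: define a squared-error function and follow its gradient downhill.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 2))
t = (X @ np.array([2.0, -1.0]) + 0.5 > 0).astype(float)   # toy targets

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w, b = np.zeros(2), 0.0
lr = 0.5
for epoch in range(500):
    y = sigmoid(X @ w + b)                 # forward pass
    error = 0.5 * np.sum((y - t) ** 2)     # sum-of-squares error function
    grad = (y - t) * y * (1 - y)           # dE/d(activation) for each pattern
    w -= lr * X.T @ grad / len(X)          # gradient descent step on the weights
    b -= lr * grad.mean()                  # and on the bias

print(f"final error: {error:.3f}")
```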
Outline • Motivation • Neural Networks Power • Kolmogorov Theory • Cascade Correlation
Kolmogorov Theorem (concept) • Any continuous function of n variables can be completely characterized by one-dimensional continuous functions • An example follows on the next slides
An Idea • Suppose we want to construct f(x, y) • A simple idea: find a mapping (x, y) → r • Then define a function g such that g(r) = f(x, y)
An Example • Suppose we have a discrete function f(x1, x2) • We choose a mapping (x1, x2) → r • We define the one-dimensional function g with g(r) = f(x1, x2) • So g, applied to the mapped value, reproduces f (see the sketch below)
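A minimal sketch of the idea, with an illustrative discrete function and mapping chosen here for demonstration (the slides' own numerical example is not reproduced):

```python
# Sketch: encode the pair (x1, x2) into a single value r, then store f's
# values in a one-dimensional table g so that g(r) = f(x1, x2).
f = {                       # a discrete two-dimensional function
    (0, 0): 0.0, (0, 1): 1.0,
    (1, 0): 1.0, (1, 1): 0.0,
}

def phi(x1, x2):
    """Mapping (x1, x2) -> r; here a simple injective encoding."""
    return 2 * x1 + x2

g = {phi(x1, x2): value for (x1, x2), value in f.items()}   # 1-D function g

for (x1, x2), value in f.items():
    assert g[phi(x1, x2)] == value       # g(phi(x1, x2)) == f(x1, x2)
print("g reproduces f through the mapping phi")
```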
Kolmogorov Theorem • In the illustrated example we had a single mapping and a single outer function with g(r) = f(x, y) • The general theorem combines several such mappings and outer functions (a standard statement follows)
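For reference, a standard statement of Kolmogorov's superposition theorem, which the example above is building toward:

```latex
% Kolmogorov superposition theorem: any continuous f on [0,1]^n can be written as
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \psi_{q,p}(x_p)\right)
% where the \psi_{q,p} are fixed continuous one-dimensional functions and the
% continuous one-dimensional functions \Phi_q depend on f.
```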
Universal Approximation • Neural networks with one hidden layer can approximate any continuous function to arbitrary precision • Unlike the Kolmogorov construction, the activation functions are fixed, independent of the target function • The Kolmogorov network can in turn be approximated with traditional networks (a sketch follows)
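A minimal sketch of universal approximation in practice (the target function, network size, and training settings are illustrative assumptions): one hidden layer of sigmoid units, trained by gradient descent, fits a continuous one-dimensional function.

```python
# Sketch: a one-hidden-layer network approximates sin(x) on [-pi, pi].
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
t = np.sin(x)                                   # target continuous function

H = 20                                          # hidden units
W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 1, (H, 1)); b2 = np.zeros(1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

lr = 0.2
for epoch in range(10000):
    h = sigmoid(x @ W1 + b1)                    # hidden layer
    y = h @ W2 + b2                             # linear output
    dy = (y - t) / len(x)                       # gradient of mean squared error
    dW2 = h.T @ dy; db2 = dy.sum(0)
    dh = dy @ W2.T * h * (1 - h)                # backpropagate through the sigmoid
    dW1 = x.T @ dh; db1 = dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"mean squared error: {np.mean((y - t) ** 2):.4f}")
```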
A Kolmogorov Network • We have to define: • The mapping • The function g
Spline Function • A linear combination of several third-degree (cubic) basis functions • Used to approximate a function from given sample points
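A minimal sketch, assuming SciPy's CubicSpline (the slides do not name a library): a cubic spline passes through the given knot points and approximates the underlying function between them.

```python
# Sketch: a piecewise third-degree polynomial fitted through sample points.
import numpy as np
from scipy.interpolate import CubicSpline

x_knots = np.linspace(0, 2 * np.pi, 8)          # knot locations
y_knots = np.sin(x_knots)                       # function values at the knots

spline = CubicSpline(x_knots, y_knots)          # cubic spline through the knots

x_dense = np.linspace(0, 2 * np.pi, 200)
max_err = np.max(np.abs(spline(x_dense) - np.sin(x_dense)))
print(f"max approximation error with 8 knots: {max_err:.3f}")
```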
Mapping • [figure: the (x, y) input plane divided into a grid, each square mapped to a single value]
Example • [figure: a grid over the (x1, x2) plane with knots at 1.4, 1.6, 2.1, 2.5, 3.2; the input point x1 = 2.5, x2 = 4.5 falls in one square of the grid]
Function g • For each unique value of the mapped input we define an output value of g corresponding to f • We choose the value of f at the center of the square
Reduce Error • Shift the defined patterns (grids) • N different patterns are generated • Use the average of their outputs
Replace the Function • Replace the piecewise-constant g with spline functions • With a sufficiently large number of knots, the approximation becomes as accurate as desired (see the sketch below)
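A minimal sketch of the construction described on the preceding slides (grid width, number of shifts, and the target function are illustrative assumptions): each input is mapped to the grid square it falls in, g returns f at that square's centre, and averaging several shifted grids reduces the error.

```python
# Sketch: grid mapping + centre-valued g + averaging over shifted grids.
import numpy as np

def f(x1, x2):
    return np.sin(x1) * np.cos(x2)              # target function

def grid_approx(x1, x2, width, shift):
    """Piecewise-constant approximation from one (possibly shifted) grid."""
    i = np.floor((x1 - shift) / width)          # square index along x1
    j = np.floor((x2 - shift) / width)          # square index along x2
    c1 = (i + 0.5) * width + shift              # centre of the square
    c2 = (j + 0.5) * width + shift
    return f(c1, c2)                            # g(r) = f at the centre

x1, x2 = np.meshgrid(np.linspace(0, 3, 100), np.linspace(0, 3, 100))
width, N = 0.5, 5
shifts = [k * width / N for k in range(N)]      # N shifted grids

single = grid_approx(x1, x2, width, 0.0)
averaged = np.mean([grid_approx(x1, x2, width, s) for s in shifts], axis=0)

print(f"error, one grid:        {np.mean(np.abs(single - f(x1, x2))):.4f}")
print(f"error, {N} shifted grids: {np.mean(np.abs(averaged - f(x1, x2))):.4f}")
```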
Outline • Motivation • Neural Networks Power • Kolmogorov Theory • Cascade Correlation
Cascade Correlation • Dynamic size, depth, and topology • Single-layer learning despite a multilayer structure • Fast learning
Correlation • Quantities used by the correlation measure: • the residual error of an output unit for pattern p • the average residual error of that output unit over all patterns • Z(p): the candidate unit's computed activation for input vector x(p) • the average activation of the candidate unit over all patterns
Correlation • Maximize the covariance between the candidate's activation and the residual errors as the similarity criterion • Update the candidate's weights with a rule analogous to gradient descent (gradient ascent on the correlation; sketch below)
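A minimal sketch of the candidate-training step (array names and data are assumptions; the update treats the averages as constants, as is usual for Cascade-Correlation): the candidate's weights are adjusted by gradient ascent so that the magnitude of the covariance between its activation and the residual errors grows.

```python
# Sketch: maximise S = sum_o | sum_p (V_p - V_bar)(E_{p,o} - E_bar_o) |.
import numpy as np

rng = np.random.default_rng(3)
P, n_in, n_out = 50, 4, 2
X = rng.normal(size=(P, n_in))                  # inputs seen by the candidate
E = rng.normal(size=(P, n_out))                 # residual errors E[p, o]

w = rng.normal(scale=0.1, size=n_in)            # candidate unit's weights

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

lr = 0.1
for step in range(200):
    V = sigmoid(X @ w)                          # candidate activation V_p
    Vc = V - V.mean()                           # V_p - V_bar
    Ec = E - E.mean(axis=0)                     # E_{p,o} - E_bar_o
    cov = Vc @ Ec                               # covariance per output unit
    S = np.abs(cov).sum()                       # correlation measure to maximise
    dV = V * (1 - V)                            # sigmoid derivative
    grad = X.T @ (dV * (Ec @ np.sign(cov)))     # dS/dw with averages held fixed
    w += lr * grad                              # gradient ascent on S

print(f"correlation score S after training: {S:.3f}")
```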
An Example • 100 runs • 1,700 epochs on average • Beats standard backpropagation by a factor of 10 at the same network complexity
Results • Cascade Correlation performs better in every case • Only a forward pass is needed • Many of the epochs are run while the network is still very small • A caching mechanism can store unit activations
Another Example • The N-input parity problem • Standard backpropagation takes 2,000 epochs for N = 8 with 16 hidden neurons
Discussion • No need to guess the size, depth, and connectivity pattern in advance • Learns fast • Can build deep networks (high-order feature detectors) • Avoids the herd effect (candidate units are trained one at a time) • Results can be cached
Conclusion • A network with one hidden layer can define complex decision boundaries and approximate any continuous function • The number of neurons in the hidden layer determines the accuracy of the approximation • Dynamic networks such as Cascade Correlation choose that size automatically