Neural Networks Marcel Jiřina Institute of Computer Science, Prague
Introduction • Neural networks and their use for classification and other tasks • ICS AS CR • Theoretical computer science • Neural networks, genetic algorithms and nonlinear methods • Numerical algorithms .. 1 million equations • Fuzzy sets, approximate reasoning, possibility theory • Applications: Nuclear science, Ecology, Meteorology, Reliability in machinery, Medical informatics …
Structure of talk • NN classification • Some theory • Interesting paradigms • NN and statistics • NN and optimization and genetic algorithms • About application of NN • Conclusions
NN classification • By function: approximators, associative memories (auto-associative, hetero-associative), general predictors, classifiers • With teacher: MLP-BP, RBF, GMDH, NNSU, Marks, Klán, Hopfield, Perceptron(*), Hamming • Without teacher: Kohonen, Carpenter-Grossberg (SOM), NE, Kohonen (NE) • NE = not existing: the associated response can be arbitrary and then must be given by a teacher • Signals: continuous (real-valued) or binary / multi-valued (continuous) • Feed-forward or recurrent • Fixed structure or growing
Some theory • Kolmogorov theorem • Kůrková's theorem • Sigmoid transfer function
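For reference, standard textbook statements of the results the slide names (not taken verbatim from the slides):

```latex
% Kolmogorov superposition theorem: every continuous function of n variables
% on [0,1]^n can be written using only sums and continuous one-variable functions.
f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)

% Sigmoid transfer function commonly used in MLPs:
\sigma(x) = \frac{1}{1 + e^{-x}}
```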
MLP - BP • Three-layer MLP: a single hidden layer • Four-layer MLP: two hidden layers • Other paradigms have their own, different theory
Interesting paradigms • Paradigm: a general notion of the structure, functions and algorithms of a NN • MLP - BP • RBF • GMDH • NNSU • All of them are approximators • Approximator + thresholding = classifier
MLP - BP • MLP trained by error Back Propagation • Learning-rate and momentum coefficients in (0,1) • Levenberg-Marquardt • Optimization tools • MLP with a jump (step) transfer function: optimization • Feed-forward (in recall) • Matlab, NeuralWorks, … • Good when the default is sufficient or when the network is well tuned: layers, neurons, learning rate, momentum
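A minimal sketch of a single-hidden-layer MLP trained by error back-propagation with plain gradient descent (NumPy; the layer sizes, learning rate and toy data are illustrative assumptions, not the speaker's setup):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy data: approximate y = sin(x) on [-pi, pi] (illustrative only).
X = rng.uniform(-np.pi, np.pi, size=(200, 1))
Y = np.sin(X)

n_in, n_hidden, n_out = 1, 10, 1
W1 = rng.normal(scale=0.5, size=(n_in, n_hidden));  b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, n_out)); b2 = np.zeros(n_out)
eta = 0.05                       # learning rate in (0, 1)

for epoch in range(2000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)     # hidden-layer activations
    Y_hat = H @ W2 + b2          # linear output layer
    err = Y_hat - Y              # error driving the MSE gradient

    # Backward pass (gradients of the mean squared error).
    dW2 = H.T @ err / len(X)
    db2 = err.mean(axis=0)
    dH = err @ W2.T * H * (1 - H)
    dW1 = X.T @ dH / len(X)
    db1 = dH.mean(axis=0)

    # Gradient-descent update.
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2
```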
RBF • Structure same as in MLP • Bell-shaped (Gaussian) transfer function • Number and positions of centers: random or by cluster analysis • "Broadness" (width) of the bell • Size of individual bells • Learning methods • Theory similar to MLP • Matlab, NeuralWorks, … • Good when the default is sufficient or when the network is well tuned: layers (mostly one hidden), number of neurons, transfer function, proper cluster analysis (fixed or variable number of clusters, near/far metric or criteria)
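A small sketch of an RBF network with randomly chosen centers and linear-least-squares output weights (NumPy; the number of centers, the bell width and the toy data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data (illustrative): learn y = sin(x) with a Gaussian RBF network.
X = rng.uniform(-np.pi, np.pi, size=(200, 1))
y = np.sin(X).ravel()

n_centers = 12
centers = X[rng.choice(len(X), n_centers, replace=False)]  # centers picked at random from the data
width = 0.8                                                # "broadness" of the bells (assumed)

def design_matrix(X):
    # One Gaussian bell per center: squared distance of each sample to each center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * width ** 2))

Phi = design_matrix(X)
# Output weights by linear least squares (the only training of the output layer).
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

y_hat = design_matrix(X) @ w   # recall: feed-forward evaluation
```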
GMDH 1 (…5) Group Method of Data Handling • Group: initially only a pair of signals • "Per partes", i.e. successive, polynomial approximator • Growing network • "Parameterless" (very few parameters): only the number of new neurons in each layer (processing time), plus output limits and stopping-rule parameters • Against overtraining, the learning set is split into • Adjusting set • Evaluation set • GMDH 2-5: neuron, growing network, learning strategy, variants
GMDH 2 – neuron • Only two inputs x1, x2 • True inputs • Outputs from neurons of the preceding layer • Full second-order polynomial: y = a·x1² + b·x1·x2 + c·x2² + d·x1 + e·x2 + f, where y is the neuron's output • n inputs => n(n-1)/2 neurons in the first layer • Number of neurons grows exponentially • Order of the resulting polynomial grows exponentially: 2, 4, 8, 16, 32, … • Ivakhnenko polynomials … some elements are missing
GMDH 3 – learning a neuron • Matrix of data, inputs and desired value: (u1, u2, u3, …, un, y) for sample 1, sample 2, …, sample m • A pair of the u's are the neuron's inputs x1, x2 • m approximating equations, one per sample: a·x1² + b·x1·x2 + c·x2² + d·x1 + e·x2 + f = y • In matrix form X β = Y with β = (a, b, c, d, e, f)^T • Each row of X is (x1², x1·x2, x2², x1, x2, 1) • LMS solution: β = (X^T X)^(-1) X^T Y • If X^T X is singular, we omit this neuron
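A minimal sketch of this least-squares fit of one GMDH neuron (NumPy; the helper names are mine, not from the slides):

```python
import numpy as np

def fit_gmdh_neuron(x1, x2, y):
    """Fit y ~ a*x1^2 + b*x1*x2 + c*x2^2 + d*x1 + e*x2 + f by least squares.

    Returns (a, b, c, d, e, f), or None when X^T X is (numerically) singular,
    i.e. the slide's rule "omit this neuron".
    """
    X = np.column_stack([x1**2, x1*x2, x2**2, x1, x2, np.ones_like(x1)])
    XtX = X.T @ X
    if np.linalg.matrix_rank(XtX) < XtX.shape[0]:
        return None                       # singular X^T X -> omit this neuron
    return np.linalg.solve(XtX, X.T @ y)  # (X^T X)^(-1) X^T Y

def eval_gmdh_neuron(coef, x1, x2):
    a, b, c, d, e, f = coef
    return a*x1**2 + b*x1*x2 + c*x2**2 + d*x1 + e*x2 + f
```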
GMDH 4 - growing network (diagram: layers of pairwise quadratic neurons built over the input signals x1, x2, …; y = desired output)
GMDH 5 – learning strategy Problem: the number of neurons grows exponentially, N = n(n-1)/2 per layer • Let the first layer of neurons grow unlimited • In the next layers: • [the learning set is split into an adjusting set and an evaluating set] • Compute the parameters a, …, f using the adjusting set • Evaluate the error on the evaluating set and sort • Select some n best neurons and delete the others • Build the next layer, OR • Stop learning if the stopping condition is met (a sketch of this loop follows)
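A compact sketch of this growing-and-selection loop (NumPy; `gmdh_layer`, `n_keep`, the toy data and the stopping rule are my illustrative choices, and for brevity the first layer is pruned to `n_keep` as well, whereas the slide lets it grow freely):

```python
import numpy as np
from itertools import combinations

def fit_quad(x1, x2, y):
    # Least-squares fit of the full quadratic (Ivakhnenko) polynomial in (x1, x2).
    X = np.column_stack([x1**2, x1*x2, x2**2, x1, x2, np.ones_like(x1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def eval_quad(coef, x1, x2):
    X = np.column_stack([x1**2, x1*x2, x2**2, x1, x2, np.ones_like(x1)])
    return X @ coef

def gmdh_layer(sig_adj, sig_eval, y_adj, y_eval, n_keep):
    """One GMDH layer: fit a neuron on every pair of signals using the adjusting
    set, rank the neurons by error on the evaluating set, keep the n_keep best."""
    candidates = []
    for i, j in combinations(range(sig_adj.shape[1]), 2):
        coef = fit_quad(sig_adj[:, i], sig_adj[:, j], y_adj)
        err = np.mean((eval_quad(coef, sig_eval[:, i], sig_eval[:, j]) - y_eval) ** 2)
        candidates.append((err, i, j, coef))
    candidates.sort(key=lambda c: c[0])
    best = candidates[:n_keep]
    # Outputs of the kept neurons become the input signals of the next layer.
    next_adj = np.column_stack([eval_quad(c, sig_adj[:, i], sig_adj[:, j])
                                for _, i, j, c in best])
    next_eval = np.column_stack([eval_quad(c, sig_eval[:, i], sig_eval[:, j])
                                 for _, i, j, c in best])
    return best[0][0], next_adj, next_eval

# Toy data and the growing loop (all sizes are illustrative).
rng = np.random.default_rng(2)
U = rng.normal(size=(120, 4))
y = U[:, 0] * U[:, 1] + 0.5 * U[:, 2] ** 2 + 0.1 * rng.normal(size=120)
sig_adj, sig_eval = U[:80], U[80:]        # adjusting / evaluating split
y_adj, y_eval = y[:80], y[80:]

best_err = np.inf
while True:
    err, sig_adj, sig_eval = gmdh_layer(sig_adj, sig_eval, y_adj, y_eval, n_keep=6)
    if err >= best_err:                   # stop when the evaluating-set error no longer improves
        break
    best_err = err
```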
GMDH 6 – learning strategy 2 • "Select some n best neurons and delete the others": this n is the control parameter of the GMDH network
GMDH 7 - variants • Basic: full quadratic polynomial (Ivakhnenko polynomial) • Cubic, simplified fourth-order, … • Reach a higher order in fewer layers and with fewer parameters • Different stopping rules • Different ratio of the sizes of the adjusting and evaluating sets
NNSU GA Neural Network with Switching Units, learned by means of a Genetic Algorithm • Approximation by many local hyperplanes; today also by more general local hypersurfaces • Feed-forward network • Originally derived from the MLP for optical implementation • Structure looks like columns above the individual inputs • More … František
Learning and testing sets • Learning set • Adjusting (tuning) set • Evaluation set • Testing set • There is one data set, and its splitting influences the results • Fair evaluation problem (a splitting sketch follows)
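A small sketch of such a three-way split (NumPy; the fractions and the function name are illustrative assumptions):

```python
import numpy as np

def split_dataset(n_samples, frac_adjust=0.5, frac_eval=0.25, seed=0):
    """Randomly split sample indices into adjusting, evaluation and testing sets.

    The remaining samples form the testing set. Fixing the seed makes the split,
    and hence the reported results, reproducible, which is part of the "fair
    evaluation" problem mentioned on the slide.
    """
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_adj = int(frac_adjust * n_samples)
    n_eval = int(frac_eval * n_samples)
    return (idx[:n_adj],                   # adjusting (tuning) set
            idx[n_adj:n_adj + n_eval],     # evaluation set
            idx[n_adj + n_eval:])          # testing set

adj_idx, eval_idx, test_idx = split_dataset(1000)
```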
NN and statistics • MLP-BP: mean squared error minimization • Sum of squared errors … the MSE criterion • Hamming distance for (pure) classifiers • No other statistical criteria or tests are built into a NN: • the NN transforms data and generates a mapping • statistical criteria or tests are applied outside the NN (χ², K-S, C-vM, …) • Is a NN good for the K-S test? … is y = sin(x) good for the χ² test? • Bayes classifiers, k-th nearest neighbor, kernel methods …
NN and optimization and genetic algorithms • Learning is an optimization procedure • Specific to the given NN, or general optimization systems and methods • Applied to the whole NN, or to its parts (GMDH and NNSU: linear regression) • Genetic algorithm • optimizes not only the parameters but also the structure • may be faster than iterative learning (a minimal sketch follows)
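A minimal sketch of a genetic algorithm optimizing the weight vector of a small fixed-structure network (NumPy; the population size, mutation scale, network size and toy data are illustrative assumptions, not the NNSU GA itself):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy regression data (illustrative).
X = rng.uniform(-1, 1, size=(100, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

N_W = 21   # weights and biases of a tiny 2-5-1 network, stored as one flat vector

def forward(w, X):
    # Unpack the flat weight vector into a 2-5-1 network and evaluate it.
    W1 = w[:10].reshape(2, 5); b1 = w[10:15]
    W2 = w[15:20];             b2 = w[20]
    H = np.tanh(X @ W1 + b1)
    return H @ W2 + b2

def fitness(w):
    return -np.mean((forward(w, X) - y) ** 2)   # negative MSE: larger is better

pop = rng.normal(size=(40, N_W))                # population of candidate weight vectors
for generation in range(200):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]     # selection: keep the 10 fittest
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(10, size=2)]
        mask = rng.random(N_W) < 0.5            # uniform crossover
        child = np.where(mask, a, b) + rng.normal(scale=0.05, size=N_W)  # mutation
        children.append(child)
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(w) for w in pop])]
```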
About application of NN • Soft problems • Nonlinear • A lot of noise • Problematic variables • Mutual dependence of variables • Application areas • Economy • Pattern recognition • Robotics • Particle physics • …
Strategy when using NN • For "soft problems" only • NOT for • exact function generation • periodic signals etc. • First subtract all "systematics" • what remains is nearly noise • approximate this near-noise residual • add all systematics back (a small sketch of this strategy follows) • Understand your paradigm • tune it patiently, or • use a "parameterless" paradigm
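A tiny sketch of the subtract / approximate / add-back strategy (the linear trend used as the "systematics", the toy signal, and the use of scikit-learn's MLPRegressor are all illustrative assumptions):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)

# Toy signal (illustrative): linear trend ("systematics") + small nonlinear residual + noise.
x = np.linspace(0, 10, 300).reshape(-1, 1)
y = 3.0 * x.ravel() + 1.0 + 0.3 * np.sin(2 * x.ravel()) + 0.05 * rng.normal(size=300)

# 1) Subtract the known or estimated systematics (here: a least-squares linear trend).
A = np.column_stack([x.ravel(), np.ones(len(x))])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
systematics = slope * x.ravel() + intercept
residual = y - systematics              # what remains is nearly noise

# 2) Approximate the residual with a NN.
net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
net.fit(x, residual)

# 3) Add the systematics back to obtain the final model.
y_hat = systematics + net.predict(x)
```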
Conclusions • Powerful tool • Good when well used • Simple paradigm, complex behavior • Special tool • Approximator • Classifier • Universal tool • Very different problems • Soft problems