Defeating the Black Box – Neural Networks in HEP Data Analysis
Jan Therhaag (University of Bonn)
TMVA Workshop @ CERN, January 21st, 2011
TMVA on the web: http://tmva.sourceforge.net/
A simple approach:
• Code the classes as a binary variable (here: blue = 0, orange = 1)
• Perform a linear fit \(\hat{y}(x) = w^T x + w_0\) to this discrete target
• Define the decision boundary by \(\hat{y}(x) = 0.5\)

//######################################################################################
// TMVA code
//######################################################################################
// create Factory
TMVA::Factory *factory = new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');
// book linear discriminant classifier (LD)
factory->BookMethod(TMVA::Types::kLD, "LD");
factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();
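The snippets on these slides omit how the training data reaches the Factory. A minimal sketch of that step is given below; the file and tree names and the split options are illustrative assumptions, only the Factory calls themselves (AddSignalTree, AddBackgroundTree, PrepareTrainingAndTestTree) are TMVA API.

// Sketch only: feeding signal and background trees to the Factory before booking any method
TFile *input   = TFile::Open("input.root");      // hypothetical input file
TTree *sigTree = (TTree*)input->Get("TreeS");    // hypothetical signal tree
TTree *bkgTree = (TTree*)input->Get("TreeB");    // hypothetical background tree
factory->AddSignalTree(sigTree, 1.0);            // global event weight 1.0
factory->AddBackgroundTree(bkgTree, 1.0);
TCut preselection = "";                          // no preselection cut
factory->PrepareTrainingAndTestTree(preselection, "SplitMode=Random:NormMode=NumEvents:!V");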
Now consider the sigmoid transformation \(\sigma(a) = \frac{1}{1+e^{-a}}\) applied to the linear discriminant \(a = w^T x + w_0\):
• \(\sigma(a)\) has values in [0,1] and can be interpreted as the probability p(orange | x)
• then obviously p(blue | x) = 1 − p(orange | x) = \(\sigma(-a)\)
We have just invented the neuron!
• \(y = \sigma(a)\) is called the activity of the neuron, while \(a = \sum_i w_i x_i\) is called the activation
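A minimal stand-alone sketch of this neuron (not TMVA code; all names are illustrative):

#include <cmath>
#include <vector>

// Logistic sigmoid: maps the activation to the activity in [0,1]
double sigmoid(double a) { return 1.0 / (1.0 + std::exp(-a)); }

// Single neuron: activation a = sum_i w_i * x_i, activity y = sigmoid(a),
// interpreted as p(orange | x); p(blue | x) = 1 - y = sigmoid(-a)
double neuron(const std::vector<double>& x, const std::vector<double>& w) {
  double a = 0.0;
  for (size_t i = 0; i < x.size(); ++i) a += w[i] * x[i];   // activation
  return sigmoid(a);                                        // activity
}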
The training proceeds via minimization of the error function
• e.g. the cross-entropy error \(E(w) = -\sum_n \left[\, t_n \ln y(x_n;w) + (1-t_n)\ln\bigl(1-y(x_n;w)\bigr) \right]\) for targets \(t_n \in \{0,1\}\)
• The neuron learns via gradient descent*: \(w \rightarrow w - \eta\, \nabla_w E(w)\)
• Examples may be learned one by one (online learning) or all at once (batch learning)
• Overtraining may occur!
*more sophisticated minimization techniques may be used
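As an illustration (not the slide's code), a minimal sketch of online gradient descent for this single neuron, assuming the cross-entropy error above:

#include <cmath>
#include <vector>

struct Example { std::vector<double> x; double t; };   // t = 0 (blue) or 1 (orange)

// Online learning: the weights are updated after every training example
void trainOnline(std::vector<double>& w, const std::vector<Example>& data,
                 double eta, int epochs) {
  for (int e = 0; e < epochs; ++e) {
    for (const Example& ex : data) {
      double a = 0.0;                                             // activation
      for (size_t i = 0; i < w.size(); ++i) a += w[i] * ex.x[i];
      double y = 1.0 / (1.0 + std::exp(-a));                      // activity
      // for the cross-entropy error, dE/dw_i = (y - t) * x_i
      for (size_t i = 0; i < w.size(); ++i) w[i] -= eta * (y - ex.t) * ex.x[i];
    }
  }
}
// Batch learning would instead accumulate (y - t) * x_i over all examples
// and apply a single update per epoch.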
The class of networks used for regression and classification tasks is called feedforward networks
• Neurons are organized in layers
• The output of a neuron in one layer becomes the input for the neurons in the next layer

//######################################################################################
// TMVA code
//######################################################################################
// create Factory
TMVA::Factory *factory = new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');
// book Multi-Layer Perceptron (MLP) network and define the network architecture
factory->BookMethod(TMVA::Types::kMLP, "MLP", "NeuronType=sigmoid:HiddenLayers=N+5,N");
factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();
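For completeness, a sketch of how the trained MLP would later be applied to new events with the TMVA Reader; the weight-file path follows the usual TMVA naming convention and is an assumption here, as are the example variable values.

// Sketch only: applying the trained MLP in the application phase
TMVA::Reader *reader = new TMVA::Reader("!Color:!Silent");
Float_t x1, x2;                        // local variables the Reader reads from
reader->AddVariable("x1", &x1);
reader->AddVariable("x2", &x2);
// weight file written by the Factory (default naming convention, assumed here)
reader->BookMVA("MLP", "weights/TMVAClassification_MLP.weights.xml");
x1 = 0.5f; x2 = -1.2f;                 // fill with the variable values of one event
Double_t mvaValue = reader->EvaluateMVA("MLP");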
• Feedforward networks are universal approximators
• Any continuous function can be approximated with arbitrary precision
• The complexity of the output function is determined by the number of hidden units and the characteristic magnitude of the weights
From neuron training to network training – backpropagation
• In order to find the optimal set of weights w, we have to calculate the derivatives \(\partial E / \partial w_{ij}\)
• Recall the single neuron: \(\partial E / \partial w_i = (y - t)\, x_i\)
• It turns out that \(\partial E / \partial w_{ij} = \delta_j\, z_i\), with \(\delta_j = y_j - t_j\) for output neurons and \(\delta_j = \sigma'(a_j) \sum_k w_{jk}\, \delta_k\) else
While input information is always propagated forward, errors are propagated backwards!
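A minimal sketch of one backpropagation step for a network with a single sigmoid hidden layer and one sigmoid output (illustrative only, not the TMVA implementation; the cross-entropy error is assumed as before):

#include <cmath>
#include <vector>

double sigma(double a) { return 1.0 / (1.0 + std::exp(-a)); }

// x: input, t: target, W1[j][i]: input->hidden weights, w2[j]: hidden->output weights
void backpropStep(const std::vector<double>& x, double t,
                  std::vector<std::vector<double>>& W1,
                  std::vector<double>& w2, double eta) {
  const size_t H = w2.size();
  std::vector<double> z(H);            // hidden activities
  double aOut = 0.0;
  // forward pass: hidden activities and output activation
  for (size_t j = 0; j < H; ++j) {
    double aj = 0.0;
    for (size_t i = 0; i < x.size(); ++i) aj += W1[j][i] * x[i];
    z[j] = sigma(aj);
    aOut += w2[j] * z[j];
  }
  double y = sigma(aOut);
  // backward pass: delta of the output neuron is (y - t) for the cross-entropy error
  double deltaOut = y - t;
  for (size_t j = 0; j < H; ++j) {
    // hidden delta: sigma'(a_j) * w2_j * deltaOut, with sigma'(a) = z * (1 - z)
    double deltaHidden = z[j] * (1.0 - z[j]) * w2[j] * deltaOut;
    w2[j] -= eta * deltaOut * z[j];                       // dE/dw2_j = deltaOut * z_j
    for (size_t i = 0; i < x.size(); ++i)
      W1[j][i] -= eta * deltaHidden * x[i];               // dE/dW1_ji = deltaHidden * x_i
  }
}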
Some issues in network training
• The error function has several minima; the result of the minimization typically depends on the starting values of the weights
• The scaling of the inputs has an effect on the final solution

//######################################################################################
// TMVA code
//######################################################################################
// create Factory
TMVA::Factory *factory = new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');
// book Multi-Layer Perceptron (MLP) network with a fixed random seed and normalized input distributions
factory->BookMethod(TMVA::Types::kMLP, "MLP", "RandomSeed=1:VarTransform=N");
factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();

• Overtraining: bad generalization and overconfident predictions
(plot: NN with 10 hidden units)
Regularization and early stopping
• Early stopping: stop the training before the minimum of E(w) is reached
• a validation data set is needed
• convergence is monitored in TMVA
• Weight decay: penalize large weights explicitly, \(\tilde{E}(w) = E(w) + \frac{\lambda}{2}\sum_i w_i^2\)

//######################################################################################
// TMVA code
//######################################################################################
// create Factory
TMVA::Factory *factory = new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');
// book Multi-Layer Perceptron (MLP) network with regularization
factory->BookMethod(TMVA::Types::kMLP, "MLP", "NCycles=500:UseRegulator");
factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();

(plot: NN with 10 hidden units and λ=0.02)
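The effect of the penalty term on gradient descent can be made explicit (a standard derivation, spelled out here; it is not shown on the slide):

\[
\tilde{E}(w) = E(w) + \frac{\lambda}{2}\sum_i w_i^2
\quad\Rightarrow\quad
\frac{\partial \tilde{E}}{\partial w_i} = \frac{\partial E}{\partial w_i} + \lambda w_i ,
\]
\[
w_i \;\rightarrow\; w_i - \eta\left(\frac{\partial E}{\partial w_i} + \lambda w_i\right)
= (1-\eta\lambda)\, w_i - \eta\,\frac{\partial E}{\partial w_i} ,
\]

so each update shrinks ("decays") the weights towards zero unless the data gradient counteracts it.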
Network complexity vs. regularization
• Unless prohibited by computing power, a large number of hidden units H is to be preferred
• no ad hoc limitation of the model
• In the limit \(H \to \infty\), network complexity is entirely determined by the typical size of the weights
Advanced Topics: Network learning as inference and Bayesian neural networks
Network training as inference
• Reminder: Given the network output \(y(x;w)\), the error function E(w) is just minus the log likelihood of the training data D: \(E(w) = -\ln p(D\,|\,w)\)
• Similarly, we can interpret the weight decay term as a log probability distribution (a prior) for w: \(\frac{\lambda}{2}\sum_i w_i^2 = -\ln p(w) + \text{const}\)
• Obviously, there is a close connection between the regularized error function and the inference of the network parameters:
\(p(w\,|\,D) = \dfrac{p(D\,|\,w)\; p(w)}{p(D)}\)   (likelihood × prior / normalization)
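Taking the negative logarithm of the posterior makes this correspondence explicit (standard bookkeeping, written out here for clarity):

\[
-\ln p(w\,|\,D) = -\ln p(D\,|\,w) - \ln p(w) + \ln p(D)
= E(w) + \frac{\lambda}{2}\sum_i w_i^2 + \text{const},
\]

so minimizing the regularized error is the same as maximizing the posterior probability of the weights.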
Predictions and confidence
• Minimizing the error corresponds to finding the most probable value \(w_{\mathrm{MP}}\), which is then used to make predictions
• Problem: Predictions for points in regions less populated by the training data may be too confident
Can we do better?
Using the posterior to make predictions
• Instead of using \(w_{\mathrm{MP}}\) alone, we can also exploit the full information in the posterior: \(p(t\,|\,x,D) = \int p(t\,|\,x,w)\, p(w\,|\,D)\, dw\)
See Jiahang's talk this afternoon for details of the Bayesian approach to NN in the TMVA framework!
A full Bayesian treatment
• In a full Bayesian framework, the hyperparameter(s) λ are estimated from the data by maximizing the evidence \(p(D\,|\,\lambda)\)
• no test data set is needed
• the neural network tunes itself
• the relevance of input variables can be tested (automatic relevance determination, ARD)
• Simultaneous optimization of parameters and hyperparameters is technically challenging
• TMVA uses a clever approximation
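The evidence is the likelihood of the data once the weights have been integrated out (written out here for reference; this formula is not on the slide):

\[
p(D\,|\,\lambda) = \int p(D\,|\,w)\, p(w\,|\,\lambda)\, dw ,
\]

and maximizing it with respect to λ trades data fit against model complexity, which fixes the amount of regularization without a separate validation sample.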
Summary (1)
• A neuron can be understood as an extension of a linear classifier
• A neural net consists of layers of neurons; input information always propagates forward, errors propagate backwards
• Feedforward networks are universal approximators
• The model complexity is governed by the typical weight size, which can be controlled by weight decay or early stopping
• In the Bayesian framework, error minimization corresponds to inference and regularization corresponds to the choice of a prior for the parameters
• The Bayesian approach makes use of the full posterior and gives better predictive power
• The amount of regularization can be learned from the data by maximizing the evidence
Summary (2)
Current features of the TMVA MLP:
• Support for regression, binary and multiclass classification (new in 4.1.0!)
• Efficient optional preprocessing (Gaussianization, normalization) of the input distributions
• Optional regularization to prevent overtraining
  + efficient approximation of the posterior distribution of the network weights
  + self-adapting regulator
  + error estimation
Future development in TMVA:
• Automatic relevance determination for input variables
• Extended automatic model (network architecture) comparison
Thank you!
References
Figures taken from:
David MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003
Christopher Bishop, "Pattern Recognition and Machine Learning", Springer, 2006
Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", 2nd Ed., Springer, 2009
These books are also recommended for further reading on neural networks.