Investigating the role of individual neurons as outlier detectors
Carlos López-Vázquez
Laboratorio LatinGEO, SGM + Universidad ORT del Uruguay
September 15th, 2015
carloslopez@uni.ort.edu.uy
Agenda • Motivation for Outlier detection stage • ANN as a regression tool • Formulation of the rule • Case 1 of application: small dataset • Case 2 of application: large dataset
Why worry about outliers? • Outliers are unusual events (in some sense) • They might adversely affect further calculations OR • They might be the most valuable result! • An ANN usually produces an output for any given input • Always! • What about the consequences? • We might want to detect spurious inputs
Example #1: Medical From Lucila Ohno-Machado, 2004 • Given some inputs, detect/classify a possible coronary disease
Myocardial Infarction Network • Inputs: Pain Intensity, Pain Duration, ECG: ST Elevation, Smoker, Age, Male • Answer: just a number, y = “probability” of MI, e.g. 0.8 • No room for I DON'T KNOW!
Example #2: Autonomous Land Vehicle (ALVINN System) • NN learns to steer an autonomous vehicle • 960 input units, 4 hidden units, 30 output units • Driving at speeds up to 70 miles per hour • [Figures: image from a forward-mounted camera; weight values for one of the hidden units]
Goal • Identify unlikely incoming events • And thus (maybe) refuse to estimate outputs! • Supplement the ANN answer (numerical, categorical) with some credibility flag • How? • Showing unlikely events during training (supervised) • Relying on an already trained ANN (unsupervised)
Multi Layer Perceptron (MLP) • Inputs X1…X5, hidden neurons v1…v3, single output y; all weights adjustable • Example hidden-to-output combinations from the diagram: y=18.4*v1-22.1*v2+10.2*v3; y=10.4*v1+5.12*v2+8.9*v3; y=20.2*v1+0.18*v2-9.1*v3
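As a minimal sketch of the forward pass of such an MLP, assuming tanh hidden units (an assumption, not stated on the slide) and using one of the output-weight rows from the diagram; the input-to-hidden weights below are made-up placeholders:

```python
import numpy as np

def mlp_forward(x, W_in, w_out):
    """One-hidden-layer perceptron: y = w_out . tanh(W_in @ x)."""
    v = np.tanh(W_in @ x)          # hidden activations v1..v3
    return float(w_out @ v), v     # scalar output and the activations

# 3 hidden neurons, 5 inputs; input-to-hidden weights are placeholders
rng = np.random.default_rng(0)
W_in = rng.normal(size=(3, 5))
w_out = np.array([18.4, -22.1, 10.2])   # one weight row from the slide

x = np.ones(5)                          # dummy input X1..X5
y, v = mlp_forward(x, W_in, w_out)
```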
Why are the weights so different? • Conjecture: • It might denote a specific role for the neuron • Such a role can be connected to outliers • Wow! Which ones are candidates? • Large weights? Small weights? • Preliminary analysis suggested that Large Weights ⇒ Outlier detectors • But... convince me!
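The rule can be sketched as a simple weight inspection: rank the hidden neurons by the magnitude of their outgoing (hidden-to-output) weight and take the largest ones as candidate outlier detectors. The function name and `top_k` parameter are illustrative, not from the talk:

```python
import numpy as np

def candidate_outlier_neurons(w_out, top_k=1):
    """Indices of hidden neurons with the largest-magnitude output weights."""
    order = np.argsort(-np.abs(w_out))   # descending by |weight|
    return order[:top_k]

w_out = np.array([2.143, 0.03, 21.7])    # illustrative output weights
idx = candidate_outlier_neurons(w_out)   # neuron with the largest |weight|
```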
Two different problems • 1) Does the rule indeed work? If so: • 2) How does it perform when compared with other outlier detection procedures?
Example #3: Iris Flower Classification • 3 species of Iris: SETOSA, VERSICOLOR, VIRGINICA • Each flower has parts called sepal & petal • The length and width of the sepal & petal can be used to determine the iris type • Data were collected on a large number of iris flowers • For example, one flower had petal length=6.7mm, petal width=4.3mm, sepal length=22.4mm & sepal width=62.4mm; its type was SETOSA • An ANN can be trained to determine the species of iris for a given set of petal and sepal widths and lengths
Classification using regression • Inputs: sepal length, sepal width, petal length, petal width; hidden neurons v1, v2, v3; single output y • Somewhat unusual paper: it used regression with a single output instead of the common three binary outputs! • Unusual paper: the internal weights of the ANN were published! • From Benítez et al., 1997
Classification using regression • Same network: inputs sepal length, sepal width, petal length, petal width; hidden neurons v1, v2, v3; output y • From Benítez et al., 1997 • The ANN can be simplified...
Pruned ANN • Inputs: sepal length, sepal width, petal length, petal width; only one hidden neuron retained • The classification is still good, although not exact • Which role did the other two have?
Modified version • y=2.143*v3 (class score) • z=“credibility flag” • All misclassifications are now announced by z=1!
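The modified network can be sketched as below. Only the 2.143 output weight comes from the slide; the idea that the two pruned neurons v1, v2 drive the flag follows the talk's rule, but the threshold value and the use of max-of-absolute-activations are assumptions for illustration:

```python
def classify_with_flag(v, z_threshold=0.5):
    """Class score y from the retained neuron; z flags low-credibility inputs.

    v: activations (v1, v2, v3) of the three original hidden neurons.
    The retained neuron v3 carries the class score (weight 2.143 from the
    pruned ANN); the pruned neurons v1, v2 act as outlier detectors.
    The 0.5 threshold is a placeholder, not a value from the talk.
    """
    y = 2.143 * v[2]                                  # class score
    z = int(max(abs(v[0]), abs(v[1])) > z_threshold)  # credibility flag
    return y, z

y, z = classify_with_flag((0.1, 0.9, 0.5))  # v2 is large -> flagged
```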
Example #4: daily rain dataset • Weather records typically have missing values • Many applications require complete databases • Well-established linear methods for interpolating spatial observations exist • Their performance is poor for daily rain records • Why not ANN?
Data and test area description • 30 years of daily records for 10 stations were available • 30% of the events have missing values • More than 80% of the readings are of zero rain, evenly distributed over the year • Annual averages range from 500 to 1600 mm/year; time correlation is low
Non-linear interpolants: ANN • We used ANNs as interpolators, with 9 inputs and 1 output • The training was performed with one third of the dataset using backpropagation, minimizing the RMSE • Several architectures were considered (one and two hidden layers; different numbers of neurons, etc.) as well as some transformations of the data
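The training setup above (9 inputs, 1 output, backpropagation minimizing the RMSE) can be sketched with a minimal one-hidden-layer regressor; the hidden-layer size, learning rate, and synthetic stand-in data are assumptions, not the talk's actual configuration:

```python
import numpy as np

# Synthetic stand-in for the rain records: 9 neighbouring-station
# readings as inputs, one target value to interpolate.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 9))
t = X.sum(axis=1, keepdims=True) * 0.1   # placeholder target

n_hidden = 6                             # assumed architecture
W1 = rng.normal(scale=0.3, size=(9, n_hidden))
w2 = rng.normal(scale=0.3, size=(n_hidden, 1))
lr = 0.01

rmse_history = []
for _ in range(500):
    h = np.tanh(X @ W1)                  # hidden activations
    y = h @ w2                           # linear output
    err = y - t
    rmse_history.append(float(np.sqrt((err ** 2).mean())))
    # backpropagation: gradients of the mean squared error
    g_y = 2 * err / len(X)
    W1 -= lr * X.T @ ((g_y @ w2.T) * (1 - h ** 2))
    w2 -= lr * h.T @ g_y
```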
Skipping other details… • We applied our rule to each of the 10 ANNs • We ran a Monte Carlo experiment, seeding known outliers at random and locating them afterwards • Thorough comparison against state-of-the-art alternatives (details in the paper) • The ANN-based outlier detection tool performed very well • Best when outlier size (Mozilla effect) was ignored • Satisfactory otherwise
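The Monte Carlo seeding step can be sketched as follows. The outlier size, the rain-like synthetic data, and the z-score detector standing in for the ANN-based rule are all placeholders for illustration:

```python
import numpy as np

def seed_outliers(values, n_outliers, size, rng):
    """Corrupt n_outliers randomly chosen entries by +/- size; return
    the corrupted copy and the true outlier positions."""
    corrupted = values.copy()
    idx = rng.choice(len(values), size=n_outliers, replace=False)
    corrupted[idx] += rng.choice([-size, size], size=n_outliers)
    return corrupted, set(idx.tolist())

rng = np.random.default_rng(1)
clean = rng.gamma(shape=0.4, scale=8.0, size=500)   # rain-like values
dirty, truth = seed_outliers(clean, n_outliers=25, size=50.0, rng=rng)

# A detector would now flag suspects and be scored against `truth`;
# here a simple z-score rule stands in for the ANN-based one.
deviations = np.abs(dirty - dirty.mean())
suspects = set(np.flatnonzero(deviations > 3 * dirty.std()).tolist())
hit_rate = len(suspects & truth) / len(truth)
```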
Pros… • The training stage is as usual; no special routine is required • We inspect the internal weights; no retraining is needed • Unsupervised classification: outliers are not declared as such in advance • Might offer an objective criterion to suspect underfitting
Cons… • Weights might be sensitive to outliers (masking effect), which in turn might prevent detecting them • Which outliers are located? Only some suitable ones?
Questions? Carlos López-Vázquez Laboratorio LatinGEO SGM+Universidad ORT del Uruguay September 15th, 2015 carloslopez@uni.ort.edu.uy