Constrain, Train, Validate and Explain: A Classifier for Mission-Critical Applications Yaki Engel & Amir Navot
“Mission-Critical”? Classification error ⇒ Mission failed!
Challenges:
• Hard to test the classifier in real scenarios
• Training data is biased
But,
• We have a good understanding of the features
Sea Spotter – Naval IRST
• Helicopters
• Guided missiles
• Attack aircraft
• Sea-skimming cruise missiles
• Marine vessels
• Small attack craft
Main Threat – Sea-Skimming Missiles
• Missile rises 25–30 km from the ship
• Time of flight ~30–60 s
• Trajectory < 1 mrad above the horizon
• Faint, “stationary” point target
[Diagram: engagement geometry, 3 mrad annotation]
Detect and Track Algorithm
[Block diagram: Video Input → Low Level (LL): Detection → High Level (HL): Track Generation → Tracks vs. Time → Machine Learning Classification Block (CB) → Track Classification (Grade) → Report Decision: Start / Continue / Stop Report → Target “Pool”]
Conventional (offline) ML Workflow
1. Collect lots of data.
2. Label it.
3. Choose and train a classifier.
4. Validate it (e.g., by cross-validation).
5. Deploy the classifier.
So What’s Wrong with That?
• The training data is biased, and the bias is unknown. We cannot “trust the data and hope for the best”.
• We know a lot about our features – why not use that?
• Validation must go beyond cross-validation – held-out data is just as biased as the training data!
Classifier Desiderata
• Transparent
• Interpretable
• Malleable (prior knowledge)
• Discriminative, probabilistic: P(Y|X)
• Efficiently learnable (“Big Data”)
• Handles missing values
Factored Logistic Regression
• Learn a logistic regression model for each feature separately.
• Model each per-feature component as a linear combination of basis functions.
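The two equations on this slide were images in the original deck; under the factored model just described, they plausibly take the following form (notation assumed here: n features, basis functions φ_k, weights w_ik, bias b):

```latex
% Combined log-odds as a sum of per-feature models:
\log\frac{P(Y=1\mid x)}{P(Y=0\mid x)} \;=\; b \;+\; \sum_{i=1}^{n} f_i(x_i)
% Each per-feature model as a linear combination of basis functions:
f_i(x_i) \;=\; \sum_{k=1}^{K} w_{ik}\,\phi_k(x_i)
```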
Naïve Bayes FLR (NBLR)
• Combine the per-feature models by applying a monotonic function of the individual model log-odds. E.g., classify by thresholding the log-odds.
• Logistic regression pros:
  • Linear (in the parameters), convex problem.
  • Efficient algorithms for ML/MAP/Bayes solutions.
  • Discriminative, probabilistic output: P(Y|X).
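As a concrete illustration, here is a minimal sketch of this additive log-odds scoring. The per-feature models and weights below are toy assumptions for a Titanic-like feature set, not the actual trained system:

```python
import math

# Toy per-feature log-odds models. In a real FLR each f_i is *learned* as a
# linear combination of basis functions of feature i; these closures are
# hand-set for illustration only.
feature_models = [
    lambda age: -0.05 * age,       # higher age -> lower score
    lambda gender: 2.0 * gender,   # female (1) -> higher score
    lambda cls: -0.8 * (cls - 1),  # worse travel class -> lower score
]
bias = 0.5

def log_odds(x):
    """Combined log-odds: sum of independent per-feature scores plus a bias."""
    return bias + sum(f(v) for f, v in zip(feature_models, x))

def prob_positive(x):
    """Probabilistic output P(Y=1|X) via the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))

def classify(x, threshold=0.0):
    """Classify by thresholding the log-odds, as on the slide."""
    return log_odds(x) > threshold
```

For example, `classify((8, 1, 1))` (young female, first class) gives log-odds 0.5 − 0.4 + 2.0 + 0 = 2.1, hence a positive classification.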
Control by Constraints
• Example: classify Titanic passengers into those who survived and those who didn’t.
• Features: Age (0/1), Gender (0/1), Class (1–4).
• Prior knowledge (ceteris paribus – all else being equal):
  • The Titanic sank slowly.
  • The “women and children first” policy was applied.
  • Traveling in a higher class did not hurt one’s chances of survival.
Control by Constraints
[Plots: per-feature score curves for Age, Gender and Class, each under a monotonicity constraint]
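One common way to impose such a monotonicity constraint (assumed here for illustration; the slides do not specify the authors’ exact method) is to restrict the sign of a per-feature weight, e.g. by projection after each gradient step:

```python
import math

# Minimal sketch: fit a one-feature logistic model f(x) = w*x + b with an
# "increasing" monotonicity constraint, enforced by projecting w onto
# w >= 0 after every gradient step. Data, step size and step count are
# illustrative assumptions.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_monotone(xs, ys, steps=2000, lr=0.1):
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
        w = max(w, 0.0)  # projection: keep the model monotone increasing
    return w, b

# These labels actually *decrease* with x; the constraint pins w at 0
# instead of letting the model contradict the prior knowledge.
w, b = train_monotone([0, 1, 2, 3, 4], [1, 0, 0, 0, 0])
```

An unconstrained fit would choose w < 0 on this data; with the projection, w stays at 0 and only the bias adapts.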
Veto Analysis
• A veto set (for the positive or the negative class) is a set of feature values that, on its own, forces classification into that class regardless of the values of the remaining features.
• The existence of very small veto sets may indicate a lack of robustness.
Veto Analysis – Example Veto sets of size 1 for the Titanic data-set:
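For an additive scorer, size-1 veto sets can be found by checking whether a single feature value outweighs the best case of all other features. A sketch with illustrative score tables (toy values, not the actual Titanic model):

```python
# score tables: scores[i][v] = log-odds contribution of feature i at value v
scores = [
    {0: 1.0, 1: -0.5},           # Age: child (0) / adult (1)
    {0: -1.5, 1: 1.5},           # Gender: male (0) / female (1)
    {1: 0.8, 2: 0.2, 3: -3.0},   # travel Class (toy 3-class version)
]
bias = 0.0

def is_veto_negative(i, v, threshold=0.0):
    """True if feature i taking value v forces a negative classification:
    the log-odds stays below the threshold even when every other feature
    takes its most favorable value."""
    best_others = sum(max(s.values()) for j, s in enumerate(scores) if j != i)
    return bias + scores[i][v] + best_others < threshold

def vetoes_of_size_one():
    return [(i, v) for i, s in enumerate(scores) for v in s
            if is_veto_negative(i, v)]
```

In this toy model only Class = 3 is a size-1 veto: even a young female passenger in third class scores below the threshold.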
Explanations
• The ability to explain classifier decisions:
  • is useful in investigating failures,
  • is a strong validation tool.
• What’s an explanation? For an instance classified as positive, an explanation is a minimal subset of “in favor” features required to tip the scale in favor of the positive class (against all “against” features).
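For an additive scorer, such a subset can be found greedily by adding “in favor” contributions, largest first, until they outweigh all “against” contributions. A sketch (the greedy rule is an assumed heuristic, not necessarily the authors’ exact algorithm):

```python
# Explanation extraction for an additive scorer. For an instance classified
# positive, return a small subset of positive-contribution features whose
# sum exceeds the deficit left by the negative contributions.
def explain(contributions, bias=0.0, threshold=0.0):
    """contributions: feature name -> signed score for this instance."""
    against = sum(c for c in contributions.values() if c < 0)
    deficit = threshold - bias - against  # what the explanation must exceed
    in_favor = sorted(((c, f) for f, c in contributions.items() if c > 0),
                      reverse=True)
    chosen, total = [], 0.0
    for c, f in in_favor:
        if total > deficit:
            break
        chosen.append(f)
        total += c
    return chosen if total > deficit else None  # None: not classified positive
```

E.g., `explain({"Gender": 1.5, "Age": 1.0, "Class": -0.8})` returns `["Gender"]`: the Gender contribution alone tips the scale against the Class contribution.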
Explanation for Groups Useful for validation purposes An especially interesting group: the set of misclassified instances.
Our ML Workflow
1. Collect data.
2. Label it.
3. Choose, constrain and train a classifier.
4. Validate: visualization, veto analysis, explanations and cross-validation.
5. Deploy the classifier. Explain its decisions.
Missing Values
• Common problem: we can’t always measure all features.
• There are two separate problems:
  • Missing values in the training set
  • Missing values when using the classifier
• Common solution: imputation – substitute the missing value with some representative value. Easy to implement for any classifier, and it usually works. Can we do better?
Missing Values – Cont.
• Ideally, when we need to classify an instance with missing values, we would use a classifier that was trained on exactly the valid (not missing) features.
• The problem: for a problem with n features this would require training 2^n different classifiers, one per subset of valid features.
• For most classifiers this is impractical, but for FLR it’s easy.
Missing Values – Cont.
• In the training phase of FLR, we train n classifiers (models), one for each feature.
• For each feature’s classifier we use all instances without a missing value in that feature.
• In the classification phase, we sum only the scores of the models of the valid features (reminder: the FLR score is a sum of independent per-feature models).
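This per-feature factorization makes the missing-value rule a one-line change: drop the terms of missing features from the sum. A toy sketch (per-feature models are illustrative assumptions):

```python
# With an additive FLR scorer, an instance with missing values is scored by
# dropping the missing features' terms from the sum -- equivalent to using
# the classifier trained only on the valid features.
def log_odds_valid(x, models, bias=0.0):
    """x: feature values, with None marking a missing value."""
    return bias + sum(f(v) for f, v in zip(models, x) if v is not None)

# Toy per-feature models for a Titanic-like feature set.
models = [
    lambda age: -0.05 * age,
    lambda gender: 2.0 * gender,
    lambda cls: -0.8 * (cls - 1),
]
```

For example, with Age missing, `log_odds_valid([None, 1, 2], models)` sums only the Gender and Class terms: 2.0 − 0.8 = 1.2.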
Experiments
• Goal: empirical support that constraints can confer on FLR classifiers resistance to the effects of sample-selection bias.
• Data sets were chosen for the clear semantics of their features, allowing a layman to specify sensible constraints.
• We compare FLR with logistic regression, which is known to be resistant to a certain form of bias.
Experiments – Cont.
• Four forms of bias:
  • “insufficient coverage” – small training sets
  • “truncation bias” – remove instances whose value of a chosen feature is above a threshold
  • “one-class truncation bias” – truncation bias applied only to instances of one class
  • “one-class truncation bias” + prior modification
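The truncation-bias protocols above can be sketched as simple training-set filters (feature index and threshold values are illustrative):

```python
def truncate(instances, feature_idx, threshold):
    """'truncation bias': drop instances whose feature exceeds the threshold."""
    return [x for x in instances if x[feature_idx] <= threshold]

def truncate_one_class(instances, labels, feature_idx, threshold, biased_class):
    """'one-class truncation bias': apply truncation only to one class."""
    kept = [(x, y) for x, y in zip(instances, labels)
            if y != biased_class or x[feature_idx] <= threshold]
    return [x for x, _ in kept], [y for _, y in kept]
```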
Experiments – Cont. * 1, x, x2, log(x − ℓ + 1), log(u−x+1), log(x−ℓ+1)2 and . log(u−x+1)2 where u and l are upper and lower bounds. • Algorithms: • Three flavors of FLR, all with quad-log-quad* set of basis functions: • FLR(free) – No constraints, No model selection • FLR(con) – with constraints, No model selection • FLRα(con) – with constraints and model selection • Two flavors of Logistic Regression(LR): • LRx – linear LR • LRqlq – LR with quad-log-quad basis
Experiments – Cont.
• Data-sets:
  • Titanic – predict whether a passenger survived, based on their Age, Gender and traveling Class.
  • Boston housing – predict whether a house is valued at more than $25,000, based on various properties of its area.
  • Adult – predict whether a person’s annual income exceeds $50,000, based on census data.
Experiments – Cont.
• Constraints were placed using common sense. Some examples:
  • Titanic – lower age, being female, and traveling in a better class would each increase a person’s chances of survival.
  • Boston housing – e.g., we expect a high level of crime to reduce the chance of a high house value.
  • Adult – e.g., on “employed” and “education” we set an “increasing” constraint, while on “age” and “hoursPerWeek” we set a concavity constraint.
Results – small training set Titanic
Results – small training set Boston Housing
Results – Truncation Bias + Prior Modification
Results – Truncation Bias + Prior Modification
Wrap-up • FLR classifiers pack all of the capabilities mentioned above into a single classification tool. • FLR supports a workflow that we find to be useful in mission-critical applications: • Constrain your classifier using prior knowledge, • train it, • thoroughly validate it, • and be ready to explain and visualize it.
Wrap-up • The end result is a classifier that is consistent both with • training data, and • the designer’s prior knowledge on the physical meaning of the classifier’s inputs.
Example of Mission-Critical Application Naval Infrared Search & Track (IRST) System