Conjunctive Formulation of the Random Set Framework for Multiple Instance Learning: Application to Remote Sensing
Jeremy Bolton, Paul Gader, CSI Laboratory, University of Florida
Highlights • Conjunctive forms of Random Sets for Multiple Instance Learning: • Random Sets can be used to solve the MIL problem when multiple concepts are present • Previously developed formulations assume a disjunctive relationship between the learned concepts • The new formulation provides for a conjunctive relationship between concepts, and its utility is exhibited on a Ground Penetrating Radar (GPR) data set
Outline • Multiple Instance Learning • MI Problem • RSF-MIL • Multiple Target Concepts • Experimental Results • GPR Experiments • Future Work
Standard Learning vs. Multiple Instance Learning • Standard supervised learning • Optimize some model (or learn a target concept) given training samples and corresponding labels • MIL • Learn a target concept given multiple sets of samples and corresponding labels for the sets • Interpretation: learning with uncertain labels / a noisy teacher
Multiple Instance Learning (MIL) • Given: • a set of I bags, each labeled + or - • the i-th bag is a set of J_i samples in some feature space • Interpretation of labels • Goal: learn the target concept • What characteristic is common to the positive bags that is not observed in the negative bags?
Multiple Instance Learning • Traditional classification: each sample carries its own label, e.g., x1 label = 1, x2 label = 1, x3 label = 0, x4 label = 0, x5 label = 1 • Multiple instance learning: only bags of samples carry labels, e.g., {x1, x2, x3, x4} label = 1, {x1, x2, x3, x4} label = 1, {x1, x2, x3, x4} label = 0
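To make the two data layouts concrete, here is a minimal Python sketch (toy values, purely illustrative):

```python
import numpy as np

# Traditional classification: one label per sample.
X = np.array([[0.2, 1.1], [0.9, 0.4], [1.5, 2.0], [0.1, 0.3], [2.2, 0.8]])
y = np.array([1, 1, 0, 0, 1])  # a label for every feature vector

# Multiple instance learning: one label per bag of samples.
bags = [
    np.array([[0.2, 1.1], [0.9, 0.4]]),  # bag 1
    np.array([[1.5, 2.0], [0.1, 0.3]]),  # bag 2
    np.array([[2.2, 0.8], [0.7, 0.6]]),  # bag 3
]
bag_labels = np.array([1, 0, 1])  # labels attach to bags, not samples
```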
MIL Application: GPR Example • Collaboration: Frigui, Collins, Torrione • Construction of bags: collect 15 EHD (edge histogram descriptor) feature vectors from the 15 depth bins of each image • Mine images = positive bags • False-alarm (FA) images = negative bags • (Figure: an EHD feature vector is extracted from each depth bin.)
Standard vs. MI Learning: GPR Example • Standard learning: each training sample (feature vector) must have a label • An arduous task: • many feature vectors per image and multiple images • difficult to label given GPR echoes, ground-truthing errors, etc. • the label of each vector may not be known • (Figure: EHD feature vector.)
Standard vs. MI Learning: GPR Example • Multiple instance learning: each training bag must have a label • No need to label all feature vectors; just identify the images (bags) where targets are present • Implicitly accounts for class-label uncertainty • (Figure: EHD feature vector.)
Random Set Brief • Random Set
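For reference, the standard definition behind this slide (stated here from the random set literature, not from the slide itself): a random set Γ is a set-valued random variable, and it is characterized not by a cumulative distribution function but by its capacity functional, the probability that it hits a given set K:

```latex
T_\Gamma(K) = P(\Gamma \cap K \neq \emptyset)
```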
How can we use Random Sets for MIL? • Random set for MIL: bags are sets (multi-sets) • The idea of finding commonality among positive bags is inherent in the random set formulation • Sets have an empty-intersection or non-empty-intersection relationship • Find commonality using the intersection operator • The random set's governing functional is based on the intersection operator • Capacity functional T, a.k.a. the noisy-OR gate (Pearl 1988): it is NOT the case that EACH element is NOT the target concept
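A minimal sketch of the noisy-OR gate (values are illustrative): the bag-level probability is one minus the probability that every instance misses the concept.

```python
import numpy as np

def noisy_or(p):
    # "It is NOT the case that EACH element is NOT the target concept":
    # P(at least one instance expresses the concept) = 1 - prod_i (1 - p_i).
    p = np.asarray(p)
    return 1.0 - np.prod(1.0 - p)

print(noisy_or([0.1, 0.05, 0.8]))   # ~0.83: one strong instance dominates
print(noisy_or([0.1, 0.05, 0.02]))  # ~0.16: no strong instance
```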
Random Set Functionals • Capacity functionals for the intersection calculation • Use a germ and grain model to model the random set • Multiple (J) concepts • Calculate the probability of intersection given X and the germ/grain pairs • Grains are governed by random radii with an assumed cumulative distribution • (Figure: germs and grains; the random set model parameters are the germ locations and the radius distributions.)
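A sketch of the capacity-functional computation under stated assumptions: the germs are concept centers c_j, and the grain radii are taken, purely for illustration, to be exponentially distributed, so the probability that an instance falls inside a grain decays with its distance from the germ.

```python
import numpy as np

def hit_prob(x, germ, rate=1.0):
    # P(x lies in the grain around `germ`) = P(R >= ||x - germ||),
    # assuming an exponential radius R with the given rate.
    return np.exp(-rate * np.linalg.norm(x - germ))

def capacity(bag, germs, rate=1.0):
    # T(X) = P(bag X intersects the random set): a noisy-OR over
    # every instance in the bag and every germ/grain pair (J concepts).
    miss = 1.0
    for x in bag:
        for c in germs:
            miss *= 1.0 - hit_prob(x, c, rate)
    return 1.0 - miss

germs = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # J = 2 concepts
bag = [np.array([0.1, 0.2]), np.array([5.0, 5.0])]
print(capacity(bag, germs))  # high: the first instance is near a germ
```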
RSF-MIL: Germ and Grain Model • (Scatter plot: instances marked x, target concepts marked T) • Positive bags = blue • Negative bags = orange • Distinct shapes = distinct bags
Multiple Concepts: Disjunction or Conjunction? • Disjunction • When you have multiple types of concepts • When each instance can indicate the presence of a target • Conjunction • When you have a target type that is composed of multiple (necessary) concepts • When each instance can indicate a concept, but not necessarily the composite target type
Conjunctive RSF-MIL • Previously developed disjunctive RSF-MIL (RSF-MIL-d): a noisy-OR combination across concepts and samples, built from the standard noisy-OR for a single concept j • Conjunctive RSF-MIL (RSF-MIL-c): a noisy-AND combination across concepts, with a noisy-OR over samples within each concept (see the sketch below)
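A side-by-side sketch of the two combination rules (illustrative values; p[j][i] is the probability that instance i expresses concept j):

```python
import numpy as np

def bag_prob_disjunctive(p):
    # RSF-MIL-d style: one noisy-OR across all concepts and samples.
    return 1.0 - np.prod(1.0 - p)

def bag_prob_conjunctive(p):
    # RSF-MIL-c style: noisy-OR over samples within each concept j,
    # then a noisy-AND (product) across the J concepts.
    per_concept = 1.0 - np.prod(1.0 - p, axis=1)
    return np.prod(per_concept)

# Rows = concepts, columns = instances in the bag.
p = np.array([[0.9, 0.1, 0.1],   # concept 1 strongly present
              [0.1, 0.1, 0.1]])  # concept 2 essentially absent
print(bag_prob_disjunctive(p))  # ~0.94: any single hit suffices
print(bag_prob_conjunctive(p))  # ~0.25: both concepts are required
```

The same instance probabilities produce opposite bag-level conclusions, which is the distinction the synthetic experiments below probe.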
Synthetic Data Experiments • The Extreme Conjunct data set requires that a target bag exhibit two distinct concepts, rather than one or none • (Results table: AUC, with the AUC when initialized near the solution in parentheses.)
Disjunctive Target Concepts • Using large overlapping bins (gross extraction), the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists • (Diagram: Target Concept Type 1 through Type n, each detected by a noisy-OR, combined by an OR into "Target Concept Present?")
What if we want features with finer granularity? • Fine extraction: more detail about the image and more shape information, but we may lose the disjunctive nature between (multiple) instances • Our features have more granularity, so our concepts may be constituents of a target rather than encapsulating the whole target concept • (Diagram: Constituent Concept 1 (top of hyperbola) and Constituent Concept 2 (wings of hyperbola), each detected by a noisy-OR, combined by an AND into "Target Concept Present?")
GPR Experiments • Extensive GPR data set • ~800 targets • ~5,000 non-targets • Experimental design • Run RSF-MIL-d (disjunctive) and RSF-MIL-c (conjunctive) • Compare both feature extraction methods • Gross extraction: bins large enough to encompass the target concept • Fine extraction: non-overlapping bins • Hypothesis: RSF-MIL-d will perform well when using gross extraction, whereas RSF-MIL-c will perform well when using fine extraction
Experimental Results • Highlights • RSF-MIL-d using gross extraction performed best • RSF-MIL-c performed better than RSF-MIL-d when using fine extraction • Other influencing factors: the optimization methods for RSF-MIL-d and RSF-MIL-c are not the same • (ROC plots: gross extraction and fine extraction.)
Future Work • Implement a general form that can learn the disjunctive or conjunctive relationship from the data • Implement a general form that can learn the number of concepts • Incorporate spatial information • Develop an improved optimization scheme for RSF-MIL-c
MIL Example (AHI Imagery) • A robust learning tool: MIL can learn a target signature with limited or incomplete ground truth • Which spectral signature(s) should we use to train a target model or classifier? • Complications: spectral mixing, background signal, inexact ground truth
MI-RVM • Adds set observations and inference using a noisy-OR gate to the RVM model • Prior on the weight w
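Presumably the prior is the standard RVM choice (stated here as an assumption): a zero-mean Gaussian with a separate precision for each weight, where A is the precision matrix estimated in step 2 of the optimization slides below:

```latex
p(\mathbf{w} \mid A) = \mathcal{N}(\mathbf{w} \mid \mathbf{0},\, A^{-1}), \qquad A = \operatorname{diag}(\alpha_1, \ldots, \alpha_D)
```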
SVM review • Classifier structure • Optimization
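For reference, the textbook soft-margin SVM being reviewed (not specific to these slides): the classifier is f(x) = sign(wᵀx + b), with w and b found by

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.
```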
MI-SVM Discussion • The RVM was altered to fit the MIL problem by changing the form of the target variable's posterior to model a noisy-OR gate • The SVM can be altered to fit the MIL problem by changing how the margin is calculated • Boost the margin between the bag (rather than the samples) and the decision surface • Look for the MI separating linear discriminant: at least one sample from each positive bag lies in the positive half-space
mi-SVM • Enforce the MI scenario using extra constraints: at least one sample in each positive bag must have a label of 1, and all samples in each negative bag must have a label of -1 • This is a mixed integer program: we must find the optimal hyperplane and the optimal labeling jointly (a heuristic sketch follows)
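A heuristic sketch of that mixed integer program in the alternating style of Andrews et al. (the synthetic data and the use of scikit-learn's SVC are illustrative, not the authors' implementation):

```python
import numpy as np
from sklearn.svm import SVC

def mi_svm(bags, bag_labels, max_iter=20, C=1.0):
    # Alternate between fitting an SVM on imputed instance labels and
    # relabeling the instances in positive bags.
    X = np.vstack(bags)
    bag_idx = np.concatenate([[i] * len(b) for i, b in enumerate(bags)])
    # Initialize: every instance inherits its bag's label.
    y = np.concatenate([[1 if bag_labels[i] else -1] * len(b)
                        for i, b in enumerate(bags)])
    clf = SVC(kernel="linear", C=C)
    for _ in range(max_iter):
        clf.fit(X, y)
        scores = clf.decision_function(X)
        y_new = y.copy()
        for i, lab in enumerate(bag_labels):
            if not lab:
                continue  # negative bags: every instance stays -1
            in_bag = bag_idx == i
            y_new[in_bag] = np.where(scores[in_bag] > 0, 1, -1)
            if not (y_new[in_bag] == 1).any():
                # At least one instance per positive bag must be positive.
                j = np.flatnonzero(in_bag)[np.argmax(scores[in_bag])]
                y_new[j] = 1
        if (y_new == y).all():
            break  # the labeling has reached a fixed point
        y = y_new
    return clf

rng = np.random.default_rng(0)
pos = [rng.normal([2, 2], 0.5, (4, 2)) for _ in range(5)]
neg = [rng.normal([0, 0], 0.5, (4, 2)) for _ in range(5)]
model = mi_svm(pos + neg, [1] * 5 + [0] * 5)
```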
Current Applications • Multiple Instance Learning • MI Problem • MI Applications • Multiple Instance Learning: Kernel Machines • MI-RVM • MI-SVM • Current Applications • GPR imagery • HSI imagery
HSI: Target Spectra Learning • Given labeled areas of interest: learn target signature • Given test areas of interest: classify set of samples
Overview of MI-RVM Optimization • Two-step optimization • 1) Estimate the optimal w given the posterior of w • There is no closed-form solution for the parameters of the posterior, so a gradient-based update is used • Iterate until convergence, then proceed to step 2 • 2) Update the parameters of the prior on w • The distribution on the target variable has no specific parameters • Until the system converges, return to step 1
1) Optimization of w • Optimize the posterior of w (Bayes' rule) • Update the weights using the Newton-Raphson method
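A simplified sketch of this step under stated assumptions: instance probabilities come from a sigmoid σ(wᵀx), bag probabilities from a noisy-OR, and plain gradient ascent stands in for the slide's Newton-Raphson update to keep the code short:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mi_rvm_step(w, bags, labels, alpha=1.0, lr=0.1):
    # One gradient-ascent step on the log-posterior of a noisy-OR bag
    # likelihood with a N(0, alpha^{-1} I) prior on w.
    grad = -alpha * w  # gradient of the Gaussian log-prior
    for X, t in zip(bags, labels):
        s = sigmoid(X @ w)    # instance probabilities
        q = np.prod(1.0 - s)  # P(no instance fires)
        if t == 1:
            # d/dw log(1 - q) = (q / (1 - q)) * sum_i s_i x_i
            grad += (q / max(1.0 - q, 1e-12)) * (X.T @ s)
        else:
            # d/dw log(q) = -sum_i s_i x_i
            grad -= X.T @ s
    return w + lr * grad

rng = np.random.default_rng(1)
bags = [rng.normal(m, 0.5, (3, 2))
        for m in ([2, 2], [2, 2], [-2, -2], [-2, -2])]
labels = [1, 1, 0, 0]
w = np.zeros(2)
for _ in range(200):
    w = mi_rvm_step(w, bags, labels)
print(w)  # points toward the positive-bag region
```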
2) Optimization of Prior • Optimization of covariance of prior • Making a large number of assumptions, diagonal elements of A can be estimated
Random Sets: Multiple Instance Learning • Random set framework for multiple instance learning • Bags are sets • The idea of finding commonality among positive bags is inherent in the random set formulation • Find commonality using the intersection operator • The random set's governing functional is based on the intersection operator
MI issues • MIL approaches • Some approaches are biased to believe only one sample in each bag caused the target concept • Some approaches can only label bags • It is not clear whether anything is gained over supervised approaches
RSF-MIL • MIL-like • (Scatter plot: instances marked x, target concepts marked T) • Positive bags = blue • Negative bags = orange • Distinct shapes = distinct bags
Side Note: Bayesian Networks • Noisy-OR Assumption • Bayesian Network representation of Noisy-OR • Polytree: singly connected DAG
Side Note • A full Bayesian network may be intractable • Occurrences of causal factors are rare (sparse co-occurrence) • So assume a polytree • So assume the result has a Boolean relationship with the causal factors • Absorb I, X, and A into one node, governed by the randomness of I • These assumptions greatly simplify the inference calculation • Calculate Z based on probabilities rather than constructing a distribution using X
Diverse Density (DD) • Probabilistic approach • Standard statistical approaches identify areas in a feature space with a high density of target samples and a low density of non-target samples • DD: identify areas in a feature space with a high "density" of samples from EACH of the positive bags ("diverse") and a low density of samples from negative bags • Identify attributes or characteristics similar to the positive bags and dissimilar from the negative bags • Assume t is a target characterization • Goal: maximize the probability of t given the bags, assuming the bags are conditionally independent
Diverse Density • Calculation (noisy-OR model): it is NOT the case that EACH element is NOT the target concept • Optimization
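A sketch of the DD computation (a Gaussian-style instance model, with an exhaustive search over positive-bag instances as candidate concepts, per the optimization note later; all data are illustrative):

```python
import numpy as np

def instance_prob(x, t):
    # P(instance x expresses concept t): Gaussian-style similarity.
    return np.exp(-np.sum((x - t) ** 2))

def diverse_density(t, pos_bags, neg_bags):
    # Product over positive bags of the noisy-OR hit probability,
    # times the product over negative bags of the miss probability.
    dd = 1.0
    for B in pos_bags:
        dd *= 1.0 - np.prod([1.0 - instance_prob(x, t) for x in B])
    for B in neg_bags:
        dd *= np.prod([1.0 - instance_prob(x, t) for x in B])
    return dd

rng = np.random.default_rng(2)
# Each positive bag contains one instance near the origin (the concept).
pos_bags = [np.vstack([rng.normal(2, 1, (3, 2)), rng.normal(0, 0.2, (1, 2))])
            for _ in range(4)]
neg_bags = [rng.normal(2, 1, (4, 2)) for _ in range(4)]
# Exhaustive search over positive instances (gradient ascent could refine).
cands = np.vstack(pos_bags)
scores = [diverse_density(t, pos_bags, neg_bags) for t in cands]
print(cands[np.argmax(scores)])  # lands near the shared concept at the origin
```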
Random Set Functionals • Capacity and avoidance functionals • Given a germ and grain model • Assumed random radii
When disjunction makes sense • Using large overlapping bins, the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists • (Diagram: instance detections combined by an OR into "Target Concept Present".)
Theoretical and Developmental Progress • Previous optimization: did not necessarily promote diverse density • Current optimization: better for context learning and MIL • Previously no feature relevance or selection (hypersphere); improvement: included learned weights on each feature dimension • Previous TO DO list • Improve existing code • Develop joint optimization for context learning and MIL • Apply MIL approaches (broad scale) • Learn similarities between feature sets of mines • Aid in training existing algorithms: find the "best" EHD features for training / testing • Construct set-based classifiers?
How do we impose the MI scenario?: Diverse Density (Maron et al.) • Calculation (noisy-OR model): it is NOT the case that EACH element is NOT the target concept; this is inherent in the random set formulation • Optimization: a combination of exhaustive search and gradient ascent
How can we use Random Sets for MIL? • Random set for MIL: bags are sets • The idea of finding commonality among positive bags is inherent in the random set formulation • Sets have an empty-intersection or non-empty-intersection relationship • Find commonality using the intersection operator • The random set's governing functional is based on the intersection operator • Example: • Bags with target: {l,a,e,i,o,p,u,f}, {f,b,a,e,i,z,o,u}, {a,b,c,i,o,u,e,p,f}, {a,f,t,e,i,u,o,d,v} • Bags without target: {s,r,n,m,p,l}, {z,s,w,t,g,n,c}, {f,p,k,r}, {q,x,z,c,v}, {p,l,f} • Intersection of the positive bags: {a,e,i,o,u,f} • Union of the negative bags: {f,s,r,n,m,p,l,z,w,t,g,c,v,q,x,k} • Target concept = {a,e,i,o,u,f} \ {f,s,r,n,m,p,l,z,w,t,g,c,v,q,x,k} = {a,e,i,o,u}
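The toy example is easy to verify directly with Python sets (same bags as above):

```python
from functools import reduce

pos_bags = [set("laeiopuf"), set("fbaeizou"),
            set("abciouepf"), set("afteiuodv")]
neg_bags = [set("srnmpl"), set("zswtgnc"), set("fpkr"),
            set("qxzcv"), set("plf")]

common = reduce(set.intersection, pos_bags)  # shared by all positive bags
seen_negative = reduce(set.union, neg_bags)  # appears in some negative bag

print(sorted(common))                  # ['a', 'e', 'f', 'i', 'o', 'u']
print(sorted(common - seen_negative))  # ['a', 'e', 'i', 'o', 'u']
```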