Multiple Instance Hidden Markov Model: Application to Landmine Detection in GPR Data. Jeremy Bolton, Seniha Yuksel, Paul Gader. CSI Laboratory, University of Florida. Highlights. Hidden Markov Models (HMMs) are useful tools for landmine detection in GPR imagery
Multiple Instance Hidden Markov Model: Application to Landmine Detection in GPR Data • Jeremy Bolton, Seniha Yuksel, Paul Gader • CSI Laboratory, University of Florida
Highlights • Hidden Markov Models (HMMs) are useful tools for landmine detection in GPR imagery • Explicitly incorporating the Multiple Instance Learning (MIL) paradigm in HMM learning is intuitive and effective • Classification performance is improved when using the MI-HMM over a standard HMM • Results further support the idea that explicitly accounting for the MI scenario may lead to improved learning under class label uncertainty
Outline • HMMs for Landmine detection in GPR • Data • Feature Extraction • Training • MIL Scenario • MI-HMM • Classification Results
GPR Data • GPR data • 3-D image cube • Axes: down-track (Dt), cross-track (xt), depth • Subsurface objects are observed as hyperbolas
GPR Data Feature Extraction • Many features extracted from GPR data measure the occurrence of an “edge” • For the typical HMM algorithm (Gader et al.): • Preprocessing techniques are used to emphasize edges • Image morphology and structuring elements can be used to extract edges [Figure panels: Image, Preprocessed, Edge Extraction]
4-d Edge Features [Figure: Edge Extraction]
Concept behind the HMM for GPR • Using the extracted features (an observation sequence when scanning from left to right in an image) we will attempt to estimate some hidden states
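The idea of estimating hidden states from a left-to-right observation sequence can be sketched with the Viterbi algorithm. This is a minimal illustration for a discrete-observation HMM, not the authors' implementation: the function name `viterbi`, the argument names `pi`, `A`, `B`, and the two-state toy model in the usage note are all assumptions made for the example.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for a discrete HMM (illustrative sketch).

    obs : sequence of integer observation symbols
    pi  : initial state probabilities, shape (S,)
    A   : transition matrix, A[i, j] = P(next state j | current state i)
    B   : emission matrix, B[s, k] = P(symbol k | state s)
    """
    with np.errstate(divide="ignore"):  # log(0) = -inf is acceptable here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    n_steps, n_states = len(obs), len(pi)
    back = np.zeros((n_steps, n_states), dtype=int)
    log_delta = log_pi + log_B[:, obs[0]]
    for t in range(1, n_steps):
        # scores[i, j]: best log-probability of being in i at t-1, then moving to j
        scores = log_delta[:, None] + log_A
        back[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + log_B[:, obs[t]]
    # Trace the best path backwards from the most likely final state
    path = [int(log_delta.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a toy model in which each state prefers its own symbol and transitions are sticky (`pi = [0.5, 0.5]`, `A = B = [[0.9, 0.1], [0.1, 0.9]]`), the observation sequence `[0, 0, 1, 1]` decodes to the state path `[0, 0, 1, 1]`.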
HMM Features • Current AIM viewer by Smock [Figure panels: Image, Feature Image, Rising Edge Feature, Falling Edge Feature]
Sampling HMM Summary • Feature Calculation • Dimensions (it is not always relevant whether a positive or negative diagonal is observed, only that a diagonal is observed) • HMMSamp: 2d • Down sampling depth • HMMSamp: 4 • HMM Models • Number of States • HMMSamp: 4 • Gaussian components per state (fewer total components for probability calculation) • HMMSamp: 1 (recent observation)
Training the HMM • Xuping Zhang proposed a Gibbs sampling algorithm for HMM learning • But given one or more images, how do we choose the training sequences? • Which sequence(s) do we choose from each image? • There is an inherent problem in many image analysis settings due to class label uncertainty per sequence • That is, each image has a class label associated with it, but each image contains multiple instances (samples or sequences). Which sample(s) are truly indicative of the target? • Using standard training techniques, this translates to identifying the optimal training set within a set of sequences • If an image has N sequences, this translates to a search over 2^N possible subsets
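To make the size of that search concrete, the sketch below enumerates every candidate training subset of an image's sequences. The function name `candidate_training_sets` and the placeholder sequence labels are hypothetical, and the count includes the empty subset.

```python
from itertools import combinations

def candidate_training_sets(sequences):
    """Yield every subset of the given sequences, including the empty one."""
    for k in range(len(sequences) + 1):
        yield from combinations(sequences, k)

# Three sequences already give 2**3 = 8 candidate training sets.
subsets = list(candidate_training_sets(["s1", "s2", "s3"]))
n_subsets = len(subsets)
```

The count doubles with every added sequence, so an image with only 20 sequences would already have 2**20 = 1,048,576 candidate subsets, which is why exhaustive selection is impractical.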
Training Sample Selection Heuristic • Currently, an MRF approach (Collins et al.) is used to bound the search to a localized area within the image rather than search all sequences within the image. • Reduces search space, but multiple instance problem still exists
Standard Learning vs. Multiple Instance Learning • Standard supervised learning • Optimize some model (or learn a target concept) given training samples and corresponding labels • MIL • Learn a target concept given multiple sets of samples and corresponding labels for the sets • Interpretation: learning with uncertain labels / noisy teacher
Multiple Instance Learning (MIL) • Given: • Set of I bags • Labeled + or - • The i-th bag is a set of J_i samples in some feature space • Interpretation of labels • Goal: learn concept • What characteristic is common to the positive bags that is not observed in the negative bags?
Standard learning doesn’t always fit: GPR Example • Standard Learning • Each training sample (feature vector) must have a label • But which ones, and how many, compose the optimal training set? • Arduous task: many feature vectors per image and multiple images • Difficult to label given GPR echoes, ground truthing errors, etc. • Label of each vector may not be known [Figure: EHD feature vector]
Learning from Bags • In MIL, a label is attached to a set of samples • A bag is a set of samples • A sample within a bag is called an instance • A bag is labeled as positive if and only if at least one of its instances is positive [Figure: positive bags and negative bags; each bag is an image]
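The bag-labeling rule can be stated in a few lines of Python. This is only an illustration of the rule itself: the function name `bag_label` is hypothetical, and in practice the instance labels are exactly what is unknown during training.

```python
def bag_label(instance_labels):
    """A bag is positive if and only if at least one instance is positive."""
    return any(instance_labels)

# A mine image with one target-like sequence among background sequences
positive_bag = bag_label([False, False, True, False])
# A false-alarm image in which no sequence is a target
negative_bag = bag_label([False, False, False])
```

Here `positive_bag` is `True` and `negative_bag` is `False`. Note the asymmetry: a negative bag label fixes every instance label, while a positive bag label only guarantees one positive instance somewhere in the bag.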
MI Learning: GPR Example • Multiple Instance Learning • Each training bag must have a label • No need to label all feature vectors, just identify images (bags) where targets are present • Implicitly accounts for class label uncertainty … [Figure: EHD feature vector]
MI-HMM • In MI-HMM, instances are sequences [Figure: positive bags and negative bags of sequences, with an arrow marking the direction of movement]
MI-HMM • Assuming independence between the bags and a noisy-OR (Pearl) relationship between the sequences within each bag, the likelihood of the training data factorizes over bags [equations not preserved]
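The noisy-OR relationship named on this slide can be written out as follows. This is the standard noisy-OR form under assumed notation, not necessarily the exact expression from the slide: $p_{ij}$ is taken to be the probability, under HMM parameters $\theta$, that sequence $j$ of bag $i$ is a target sequence, $J_i$ is the number of sequences in bag $i$, and $\mathcal{I}^{+}$, $\mathcal{I}^{-}$ index the positive and negative bags.

```latex
\begin{align}
P(B_i = + \mid \theta) &= 1 - \prod_{j=1}^{J_i} \bigl(1 - p_{ij}\bigr), \\
P(B_i = - \mid \theta) &= \prod_{j=1}^{J_i} \bigl(1 - p_{ij}\bigr), \\
P(\text{bags} \mid \theta) &= \prod_{i \in \mathcal{I}^{+}} \Bigl[\, 1 - \prod_{j=1}^{J_i} \bigl(1 - p_{ij}\bigr) \Bigr]
  \; \prod_{i \in \mathcal{I}^{-}} \prod_{j=1}^{J_i} \bigl(1 - p_{ij}\bigr).
\end{align}
```

The last line uses the independence assumption between bags: the overall likelihood is the product of the per-bag probabilities.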
MI-HMM learning • Due to the cumbersome nature of the noisy-OR, the parameters of the HMM are learned using Metropolis-Hastings sampling.
Sampling • HMM parameters are sampled from Dirichlet distributions • A new state is accepted or rejected based on the ratio r at iteration t + 1 [equation not preserved] • where P is the noisy-OR model.
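A minimal sketch of one such accept/reject step is given below. It rests on several assumptions that the slide does not confirm: proposals are drawn independently from a flat Dirichlet(1, ..., 1) for each row of a row-stochastic parameter matrix, so the proposal density is constant and the ratio reduces to r = P(new) / P(current); `log_target` is a placeholder for the log of the noisy-OR objective P; and the function names are hypothetical.

```python
import numpy as np

def propose_row_stochastic(shape, rng):
    """Draw each row independently from a flat Dirichlet(1, ..., 1)."""
    n_rows, n_cols = shape
    return rng.dirichlet(np.ones(n_cols), size=n_rows)

def mh_step(theta, log_target, rng):
    """One Metropolis-Hastings step with a flat Dirichlet independence proposal.

    theta      : current row-stochastic matrix (e.g. transitions or emissions)
    log_target : callable returning log P(theta); stands in for the noisy-OR model
    Returns (state, accepted).
    """
    theta_new = propose_row_stochastic(theta.shape, rng)
    # The flat Dirichlet density is constant on the simplex, so the
    # Hastings correction cancels and only the target ratio remains.
    log_r = log_target(theta_new) - log_target(theta)
    accept = rng.random() < np.exp(min(0.0, log_r))
    return (theta_new, True) if accept else (theta, False)
```

A chain is obtained by calling `mh_step` repeatedly with `rng = np.random.default_rng(seed)` and keeping the returned state. A proposal centred on the current parameters would need the proposal densities in the ratio as well, which this sketch deliberately avoids.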
Discrete Observations • Note that since we have chosen a Metropolis-Hastings sampling scheme using Dirichlets, our observations must be discretized.
MI-HMM Summary • Feature Calculation • Dimensions • HMMSamp: 2d • MI-HMM: 2d features are discretized into 16 symbols • Down sampling depth • HMMSamp: 4 • MI-HMM: 4 • HMM Models • Number of States • HMMSamp: 4 • MI-HMM: 4 • Components per state (fewer total components for probability calculation) • HMMSamp: 1 Gaussian • MI-HMM: Discrete mixture over 16 symbols
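The slides do not say how the 2d features are mapped to 16 symbols. One simple possibility, shown purely as an assumption, is a uniform 4 x 4 grid over the feature range; the function name `discretize_2d` and the bounds `lo` and `hi` are hypothetical, and a learned codebook (for example from clustering) would be an equally plausible choice.

```python
import numpy as np

def discretize_2d(features, lo, hi, bins_per_dim=4):
    """Map 2d feature vectors to integer symbols on a uniform grid.

    With bins_per_dim=4 this gives 4 * 4 = 16 symbols, numbered 0..15.
    Values outside [lo, hi] are clipped into the border cells.
    """
    features = np.asarray(features, dtype=float)
    scaled = (features - lo) / (hi - lo)  # rescale each dimension to [0, 1]
    idx = np.clip((scaled * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    return idx[:, 0] * bins_per_dim + idx[:, 1]

symbols = discretize_2d([[0.0, 0.0], [1.0, 1.0], [0.3, 0.8]], lo=0.0, hi=1.0)
```

For these three vectors the symbols are 0, 15, and 7. Each observation sequence then becomes a sequence of integers, which is what a discrete emission distribution with a Dirichlet prior requires.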
MI-HMM vs Sampling HMM • Small Millbrook [Figure legend: HMM Samp (12,000), MI-HMM (100)]
Concluding Remarks • Explicitly incorporating the Multiple Instance Learning (MIL) paradigm in HMM learning is intuitive and effective • Classification performance is improved when using the MI-HMM over a standard HMM • More effective and efficient • Future Work • Construct bags without using MRF heuristic • Apply to EMI data: spatial uncertainty
MIL Application: Example GPR • Collaboration: Frigui, Collins, Torrione • Construction of bags • Collect 15 EHD feature vectors from the 15 depth bins • Mine images = + bags • FA images = - bags [Figure: EHD feature vector]
Standard vs. MI Learning: GPR Example • Standard Learning • Each training sample (feature vector) must have a label • Arduous task • Many feature vectors per image and multiple images • Difficult to label given GPR echoes, ground truthing errors, etc. • Label of each vector may not be known [Figure: EHD feature vector]
Standard vs. MI Learning: GPR Example • Multiple Instance Learning • Each training bag must have a label • No need to label all feature vectors, just identify images (bags) where targets are present • Implicitly accounts for class label uncertainty … [Figure: EHD feature vector]
Random Set Brief • Random Set
How can we use Random Sets for MIL? • Random set for MIL: bags are sets • Idea of finding commonality of positive bags is inherent in the random set formulation • Sets have an empty intersection or non-empty intersection relationship • Find commonality using the intersection operator • The random set's governing functional is based on the intersection operator • Capacity functional T, a.k.a. the noisy-OR gate (Pearl 1988): it is NOT the case that EACH element is NOT the target concept
Random Set Functionals • Capacity functionals for intersection calculation • Use germ and grain model to model random set • Multiple (J) concepts • Calculate probability of intersection given X and germ and grain pairs [equation not preserved] • Grains are governed by random radii with assumed cumulative [equation not preserved] [Annotations: random set model parameters, germ, grain]
RSF-MIL: Germ and Grain Model • Positive Bags = blue • Negative Bags = orange • Distinct shapes = distinct bags [Figure: scatter plot with points marked x and T]
Multiple Concepts: Disjunction or Conjunction? • Disjunction • When you have multiple types of concepts • When each instance can indicate the presence of a target • Conjunction • When you have a target type that is composed of multiple (necessary) concepts • When each instance can indicate a concept, but not necessarily the composite target type
Conjunctive RSF-MIL • Previously developed disjunctive RSF-MIL (RSF-MIL-d): noisy-OR combination across concepts and samples • Conjunctive RSF-MIL (RSF-MIL-c): standard noisy-OR for one concept j, then noisy-AND combination across concepts [equations not preserved]
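The two combination rules named on this slide can be sketched in Python. The array layout `p[j, n]` (probability that sample n of a bag exhibits concept j) and the function names are assumptions for illustration, and the noisy-AND is taken here to be a plain product of the per-concept noisy-OR values.

```python
import numpy as np

def disjunctive_bag_prob(p):
    """Noisy-OR across all concepts and samples (RSF-MIL-d style).

    The bag is positive if any sample exhibits any concept.
    """
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.prod(1.0 - p))

def conjunctive_bag_prob(p):
    """Noisy-OR within each concept, then noisy-AND across concepts.

    The bag is positive only if every concept is exhibited by some sample.
    """
    p = np.asarray(p, dtype=float)
    per_concept = 1.0 - np.prod(1.0 - p, axis=1)  # noisy-OR for each concept j
    return float(np.prod(per_concept))            # product as noisy-AND

# Two concepts (rows), two samples (columns)
p = [[0.5, 0.5],
     [0.5, 0.0]]
d = disjunctive_bag_prob(p)   # 1 - 0.5 * 0.5 * 0.5 * 1.0 = 0.875
c = conjunctive_bag_prob(p)   # (1 - 0.25) * (1 - 0.5) = 0.375
```

A bag that strongly exhibits only one of the two concepts scores high under the disjunctive rule but near zero under the conjunctive rule, which is the distinction the conjunctive variant is meant to capture.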
Synthetic Data Experiments • The Extreme Conjunct data set requires that a target bag exhibit two distinct concepts rather than one or none [Table: AUC (AUC when initialized near solution)]
Disjunctive Target Concepts • Using large overlapping bins (GROSS extraction), the target concept can be encapsulated within 1 instance; therefore a disjunctive relationship exists [Diagram: Target Concept Type 1 (noisy-OR) OR Target Concept Type 2 (noisy-OR) OR … OR Target Concept Type n (noisy-OR) → Target Concept Present?]
What if we want features with finer granularity? • Fine Extraction • More detail about the image and more shape information, but may lose the disjunctive nature between (multiple) instances • Our features have more granularity, therefore our concepts may be constituents of a target rather than encapsulating the target concept [Diagram: Constituent Concept 1 (top of hyperbola, noisy-OR) AND … AND Constituent Concept 2 (wings of hyperbola, noisy-OR) → Target Concept Present?]