Multiple Instance Learning
Outline • Motivation • Multiple Instance Learning (MIL) • Diverse Density • Single Point Concept • Disjunctive Point Concept • SVM Algorithms for MIL • Single Instance Learner (SIL) • Sparse MIL • mi-SVM • MI-SVM • Results • Some Thoughts
Motivation • It is not always possible to provide labeled data for training • Reasons: • Labeling requires substantial human effort • Labeling requires expensive tests • Experts disagree on the labels • Labeling is not possible at the instance level • Objective: present a learning algorithm that can learn from ambiguously labeled training data
Multiple Instance Learning (MIL) • In MIL, instead of giving the learner labels for individual examples, the trainer labels only collections of examples, called bags • A bag is labeled positive if at least one example in it is positive • A bag is labeled negative if all of the examples in it are negative • [Figure: example negative bags (Bi−) and positive bags (Bi+)]
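As a minimal illustration of the bag-labeling rule (hypothetical code, not from the cited papers):

```python
def bag_label(instance_labels):
    """A bag is positive iff at least one of its instances is positive."""
    return int(any(y == 1 for y in instance_labels))

# One hidden positive instance makes the whole bag positive.
print(bag_label([0, 0, 1]))  # 1 (positive bag)
print(bag_label([0, 0, 0]))  # 0 (negative bag)
```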
Multiple Instance Learning (MIL) • The key challenge in MIL is coping with the ambiguity of not knowing which examples in a positive bag are actually positive and which are not • The MIL model was first formalized by Dietterich et al. to deal with the drug activity prediction problem • Following that, an algorithm called Diverse Density was developed to provide a solution to MIL • Later, the method was extended to handle real-valued labels instead of binary labels
Diverse Density • Diverse Density solves the MIL problem by examining the distribution of the instances • It looks for a point that is close to instances from different positive bags and far from the instances in the negative bags • Such a point represents the concept that we would like to learn • Diverse Density is a measure of the intersection of the positive bags minus the union of the negative bags
Diverse Density – Molecular Example • Suppose the shape of a candidate molecule can be described by a feature vector • If a molecule is labeled positive, then at some point along its shape manifold it took the right shape to fit into the target protein
Noisy-Or for Estimating the Density • It is assumed that the event can only happen if at least one of its causes occurred • It is also assumed that the probability of any cause failing to trigger the event is independent of the other causes
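Under these independence assumptions, the probability of the event is one minus the probability that every cause fails. A small sketch (function name is illustrative):

```python
import numpy as np

def noisy_or(cause_probs):
    """P(event) = 1 - prod_j (1 - p_j): the event fires unless every cause fails."""
    cause_probs = np.asarray(cause_probs)
    return 1.0 - np.prod(1.0 - cause_probs)

print(noisy_or([0.2, 0.5]))  # 1 - 0.8 * 0.5 = 0.6
```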
Diverse Density – Formally • By maximizing the Diverse Density we can find the point of intersection (the desired concept): $\hat{t} = \arg\max_x \prod_i \Pr(x = t \mid B_i^+) \prod_i \Pr(x = t \mid B_i^-)$ • Under the noisy-or model: $\Pr(x = t \mid B_i^+) = 1 - \prod_j \big(1 - \Pr(x = t \mid B_{ij}^+)\big)$ and $\Pr(x = t \mid B_i^-) = \prod_j \big(1 - \Pr(x = t \mid B_{ij}^-)\big)$ • Alternatively, one can use the most-likely-cause estimator: $\Pr(x = t \mid B_i) = \max_j \Pr(x = t \mid B_{ij})$
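A hedged NumPy sketch of the noisy-or Diverse Density objective (function and variable names are my own; the formulas follow Maron and Lozano-Pérez):

```python
import numpy as np

def instance_prob(x, instances, s):
    """Pr(x = t | B_ij) = exp(-sum_k s_k^2 (B_ijk - x_k)^2), one value per instance."""
    return np.exp(-np.sum((s * (instances - x)) ** 2, axis=1))

def diverse_density(x, pos_bags, neg_bags, s):
    """Noisy-or Diverse Density of a candidate concept point x.
    Each bag is an (n_instances, n_features) array; s is a per-feature scaling vector."""
    dd = 1.0
    for bag in pos_bags:   # at least one instance should match the concept
        dd *= 1.0 - np.prod(1.0 - instance_prob(x, bag, s))
    for bag in neg_bags:   # no instance should match the concept
        dd *= np.prod(1.0 - instance_prob(x, bag, s))
    return dd
```

In practice one maximizes this (or its log) over x with gradient ascent, restarting from instances of the positive bags.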
Single Point Concept • A concept that corresponds to a single point in feature space • Every Bi+ has at least one instance that is equal to the true concept corrupted by some Gaussian noise • Every Bi− has no instances that are equal to the true concept corrupted by some Gaussian noise • The instance-level probability is modeled as $\Pr(x = t \mid B_{ij}) = \exp\big(-\sum_k s_k^2 (B_{ijk} - x_k)^2\big)$, where k indexes the dimensions of the feature space and $s_k$ is a scaling vector
Disjunctive Point Concept • More complicated concepts are disjunctions of d single-point concepts • A bag is positive if at least one of its instances is in one of the concepts $x_{t1}, x_{t2}, \ldots, x_{td}$
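For a disjunction, an instance matches the concept if it is close to any of the d concept points; a minimal sketch extending the previous one (assumed names):

```python
import numpy as np

def disjunctive_instance_prob(instances, concept_points, s):
    """Pr(instance in concept) = max over the d concept points x_t1..x_td."""
    probs = [np.exp(-np.sum((s * (instances - x)) ** 2, axis=1))
             for x in concept_points]
    return np.max(probs, axis=0)  # best-matching concept per instance
```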
Single Instance Learning MIL • SIL-MIL: Single Instance Learning approach • Applies each bag's label to all instances in the bag • A normal SVM is trained on the resulting dataset
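A hypothetical scikit-learn sketch of SIL: propagate each bag's label to its instances, then train a standard SVM.

```python
import numpy as np
from sklearn.svm import SVC

def train_sil(bags, bag_labels):
    """bags: list of (n_i, d) arrays; bag_labels: list of 0/1 bag labels."""
    X = np.vstack(bags)  # flatten all instances into one dataset
    y = np.concatenate([[label] * len(bag)
                        for bag, label in zip(bags, bag_labels)])
    return SVC(kernel="rbf").fit(X, y)
```

The obvious cost is label noise: every negative instance inside a positive bag is trained on with the wrong label.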
Sparse MIL • All instances from negative bags are truly negative instances • Small positive bags are more informative than large positive bags • A bag is represented as the sum of all its instances, normalized by its 1-norm or 2-norm
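The bag representation can be sketched as follows (the normalization choice is a parameter; names are illustrative):

```python
import numpy as np

def bag_feature(bag, norm=2):
    """Represent a bag as the sum of its instances, normalized by its p-norm.
    Normalization down-weights large positive bags, which are less informative."""
    v = bag.sum(axis=0)
    return v / np.linalg.norm(v, ord=norm)
```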
Results • Datasets used: • AIMed: sparse dataset created from a corpus of protein-protein interactions; contains 670 positive and 1,040 negative bags • CBIR: Content-Based Image Retrieval domain; the task is to categorize images by whether they contain an object of interest • MUSK: drug activity dataset; each bag corresponds to a molecule, and its instances correspond to three-dimensional conformations of that molecule • TST: text categorization dataset in which MEDLINE articles are represented as bags of overlapping text passages
mi-SVM • Instance-level classification • Treats the instance labels $y_i$ as unobserved hidden variables • Goal is to maximize the margin over the unknown instance labels • Suitable for instance classification
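A schematic sketch of the mi-SVM alternating optimization, assuming scikit-learn's SVC and helper names of my own; the real formulation solves a mixed-integer program, which this heuristic only approximates:

```python
import numpy as np
from sklearn.svm import SVC

def mi_svm(bags, bag_labels, n_iters=10):
    """Alternate between imputing hidden instance labels and retraining the SVM."""
    X = np.vstack(bags)
    sizes = [len(b) for b in bags]
    # Initialize with SIL-style labels: every instance gets its bag's label.
    y = np.concatenate([[l] * n for l, n in zip(bag_labels, sizes)])
    for _ in range(n_iters):
        clf = SVC(kernel="rbf").fit(X, y)
        scores, start = clf.decision_function(X), 0
        for label, n in zip(bag_labels, sizes):
            if label == 1:  # relabel instances of positive bags by the current SVM
                s = scores[start:start + n]
                y_bag = (s > 0).astype(int)
                if y_bag.sum() == 0:          # enforce >= 1 positive per positive bag
                    y_bag[np.argmax(s)] = 1
                y[start:start + n] = y_bag
            start += n
        # (a fuller version would stop once the imputed labels no longer change)
    return clf
```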
MI-SVM • Bag-level classification • Goal is to maximize the bag margin, which is determined by • the "most positive" instance in the case of positive bags • the "least negative" instance in the case of negative bags • Suitable for bag classification
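By contrast with mi-SVM, MI-SVM lets each positive bag contribute only its current "witness" (most positive instance). A hedged sketch under the same assumptions as above:

```python
import numpy as np
from sklearn.svm import SVC

def mi_svm_bag(bags, bag_labels, n_iters=10):
    """Bag-level MI-SVM heuristic: train on all negative instances plus one
    witness per positive bag, re-selecting witnesses each iteration."""
    neg_X = np.vstack([b for b, l in zip(bags, bag_labels) if l == 0])
    pos_bags = [b for b, l in zip(bags, bag_labels) if l == 1]
    witnesses = np.vstack([b.mean(axis=0) for b in pos_bags])  # init: bag centroids
    for _ in range(n_iters):
        X = np.vstack([neg_X, witnesses])
        y = np.concatenate([np.zeros(len(neg_X)), np.ones(len(witnesses))])
        clf = SVC(kernel="rbf").fit(X, y)
        witnesses = np.vstack([b[np.argmax(clf.decision_function(b))]
                               for b in pos_bags])  # pick most positive instance
    return clf
```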
Results: mi-SVM vs. MI-SVM • [Tables: accuracy comparison on the Corel image data sets and the TREC9 document categorization sets]
Some Thoughts • Can we find multiple positive concepts in a single bag and learn these concepts? • Do varying sizes of negative bags influence the learning algorithm? • Can we re-formulate MIL using fuzzy logic?
References • O. Maron and T. Lozano-Pérez, "A framework for multiple-instance learning," in Advances in Neural Information Processing Systems, 1998, pp. 570–576. • R. C. Bunescu and R. J. Mooney, "Multiple instance learning for sparse positive bags," in Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 105–112. • J. Yang, "Review of multi-instance learning and its applications," Technical report, 2008. • S. Andrews, I. Tsochantaridis, and T. Hofmann, "Support vector machines for multiple-instance learning," in Advances in Neural Information Processing Systems, 2003, pp. 577–584.