A Framework for Learning Rules from Multi-Instance Data
Yann Chevaleyre and Jean-Daniel Zucker
University of Paris VI – LIP6 – CNRS
Motivations
• The choice of a good representation is a central issue in ML tasks. Representations span a spectrum:
  • Att/Val representation — global description: low expressivity, but tractable
  • MI representation — intermediate between the two
  • Relational representation — atomic description: high expressivity, but intractable unless strong biases are used
• Most available MI learners use numerical data and generate hypotheses that are not easily interpretable
• Our goal: design efficient MI learners that handle numeric and symbolic data and generate interpretable hypotheses, such as decision trees or rule sets
Outline
• 1) Multiple-instance learning: the multiple-instance representation, where MI data can be found, the MI learning problem
• 2) Extending a propositional algorithm to handle MI data: method, extending the Ripper rule learner
• 3) Analysis of the multiple-instance extension of Ripper: misleading literals, irrelevant literals, the literal selection problem
• 4) Experiments & applications
• Conclusion and future work
The Multiple-Instance Representation: definition
• Standard A/V representation: example i is represented by an A/V vector x_i plus a {0,1}-valued label l_i
• Multiple-instance representation: example i (a bag) is represented by a set of instances, i.e. A/V vectors x_{i,1}, x_{i,2}, ..., x_{i,r}, plus a single {0,1}-valued label l_i
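As a minimal sketch (the class and field names below are ours, not the authors'), the two representations can be written as:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AVExample:
    """Standard A/V representation: one vector x_i and one {0,1}-valued label l_i."""
    x: Tuple[float, ...]
    label: int

@dataclass
class Bag:
    """MI representation: a bag of instances x_{i,1}..x_{i,r} and one {0,1}-valued label l_i."""
    instances: List[Tuple[float, ...]]
    label: int
```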
Where can we find MI data?
• Many complex objects, such as images or molecules, can easily be represented as bags of instances
• Relational databases may also be represented this way (a 1 – 0,n relation between two tables naturally yields one bag per row of the first table)
• More complex representations, such as Datalog facts, may be MI-propositionalized [Zucker 98], [Alphonse and Rouveirol 99]
Representing time series as MI data
[figure: time series s(t) with sliding windows starting at t_k and t_j]
• By encoding each sub-sequence (s(t_k), ..., s(t_k+n)) as an instance, the representation becomes invariant under translation
• Windows of various sizes can be chosen to make the representation invariant under rescaling
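A small sketch of this windowing scheme, with hypothetical helper names and a plain Python list standing in for the series:

```python
def series_to_bag(series, n):
    """Turn a time series into a bag of overlapping length-n windows.
    A pattern is then found regardless of where it occurs (translation invariance)."""
    return [tuple(series[k:k + n]) for k in range(len(series) - n + 1)]

def multi_scale_bag(series, sizes=(4, 8, 16)):
    """Pool windows of several sizes into one bag, one way to approximate
    invariance under rescaling, as suggested on the slide."""
    return [w for n in sizes for w in series_to_bag(series, n)]
```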
The multiple-instance learning problem
• Unbiased multiple-instance learning problem: from B+, B− (the sets of positive, resp. negative, bags), find a consistent hypothesis H
• Multi-instance learning [Dietterich 97] (single-tuple bias): there exists a function f such that lab(b) = 1 iff ∃x ∈ b, f(x) = 1; find a function h covering at least one instance per positive bag and no instance from any negative bag
• Note: the domain of h is the instance space, instead of the bag space
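A minimal sketch of the single-tuple consistency requirement stated above; `h` is any predicate on instances, and each bag is assumed to be a list of instances:

```python
def mi_consistent(h, positive_bags, negative_bags):
    """h is a hypothesis on the *instance* space. It is MI-consistent iff it
    covers at least one instance of every positive bag and no instance of any
    negative bag."""
    covers_one_per_positive = all(any(h(x) for x in bag) for bag in positive_bags)
    covers_no_negative = all(not h(x) for bag in negative_bags for x in bag)
    return covers_one_per_positive and covers_no_negative
```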
Extending a propositional learner
• We need to represent the bags of instances as a single set of vectors: add a bag-id and the bag label to each instance (e.g. bags b1+ and b2− yield one tagged row per instance)
• Measure the degree of multiple-instance consistency of the hypothesis being refined
• Single-tuple coverage measure: instead of measuring p(r), n(r), the numbers of vectors covered by rule r, compute p*(r), n*(r), the numbers of bags for which r covers at least one instance
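A sketch of this encoding and of the single-tuple coverage measure, with illustrative names (rows are (bag_id, label, instance) triples, reusing the Bag sketch above):

```python
def bags_to_rows(bags):
    """Flatten bags into one set of vectors, tagging each instance with its bag-id and label."""
    return [(i, bag.label, x) for i, bag in enumerate(bags) for x in bag.instances]

def bag_coverage(rule, rows):
    """p*(r), n*(r): number of positive / negative bags with at least one covered instance."""
    pos, neg = set(), set()
    for bag_id, label, x in rows:
        if rule(x):
            (pos if label == 1 else neg).add(bag_id)
    return len(pos), len(neg)
```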
Extending the Ripper algorithm (Cohen 95)
• Ripper (Cohen 95) is a fast and efficient top-down rule learner, comparable to C4.5 in terms of accuracy while being much faster
• Naive-RipperMi is the MI-extension of Ripper
• Naive-RipperMi was tested on the musk tasks (Dietterich 97). On musk1 (avg. of 5.2 instances per bag), it achieved good accuracy. On musk2 (avg. 65 instances per bag), only 77% accuracy.
Empirical Analysis of Naive-RipperMI
• Goal: analyse pathologies linked to the MI problem and to the Naive-RipperMi algorithm:
  • Misleading literals
  • Irrelevant literals
  • Literal selection problem
• Analysing the behaviour of Naive-RipperMi on a simple 2-D dataset (attributes X and Y):
  • 5 positive bags: the white-triangles bag, the white-squares bag, ...
  • 5 negative bags: the black-triangles bag, the black-squares bag, ...
Analysing Naive-RipperMI
[figure: instances of the 10 bags plotted in the (X, Y) plane]
• Learning task: induce a rule covering at least one instance of each positive bag
• Target concept: X > 5 & X < 9 & Y > 3
Analysing Naive-RipperMI: misleading literals
• Target concept: X > 5 & X < 9 & Y > 3
• 1st step: Naive-RipperMi induces the rule X > 11 & Y < 5, built from misleading literals
Analysing Naive-RipperMI: misleading literals
• 2nd step: Naive-RipperMi removes the covered bag(s), and induces another rule...
Analysing Naive-RipperMI: misleading literals
• Misleading literals: literals bringing information gain but contradicting the target concept
• A multiple-instance-specific phenomenon: unlike single-instance pathologies (overfitting, the attribute selection problem), increasing the number of examples won't help
• The « cover-and-differentiate » algorithm reduces the chance of finding the target concept
• If l is a misleading literal, then ¬l is not. It is thus sufficient, when the literal l has been induced, to examine ¬l at the same time => partitioning the instance space
Analysing Naive-RipperMI: misleading literals
• Build a partition of the instance space
• Extract the best possible rule: X < 11 & Y < 6 & X > 5 & Y > 3
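A toy sketch of the partition idea, not RipperMi's actual refinement procedure: it enumerates every cell of the partition induced by the literals considered so far (each literal kept or negated), and scores each cell with the bag-level counts from the earlier sketch, which illustrates why examining ¬l alongside l avoids being trapped by a misleading literal. Enumerating all 2^n cells is of course only for illustration.

```python
from itertools import product

def negate(lit):
    return lambda x: not lit(x)

def cells(literals):
    """All conjunctions built by keeping each literal or replacing it by its negation."""
    for signs in product((True, False), repeat=len(literals)):
        yield [lit if keep else negate(lit) for lit, keep in zip(literals, signs)]

def best_rule(literals, rows):
    """Return the cell maximising p*(r) - n*(r), using bag_coverage from the sketch above."""
    def score(cell):
        rule = lambda x: all(l(x) for l in cell)
        p_star, n_star = bag_coverage(rule, rows)
        return p_star - n_star
    return max(cells(literals), key=score)
```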
Analysing Naive-RipperMI: irrelevant literals
• In multiple-instance learning, irrelevant literals can occur anywhere in the rule (e.g. Y < 6 & Y > 3 & X > 5 & X < 9), instead of mainly at the end of a rule as in the single-instance case
• Use global pruning
Analysing Naive-RipperMI: literal selection problem
• When the number of instances per bag increases, any literal covers any bag. Thus, we lack information to select a good literal
Analysing Naive-RipperMI: literal selection problem
• We must take into account the number of covered instances
• Making an assumption on the distribution of instances can lead to a formal coverage measure:
  • The single-distribution model: a bag is made of r instances drawn i.i.d. from a unique distribution D
    + widely studied in MI learning [Blum 98, Auer 97, ...]
    + simple coverage measure and good learnability properties
    − very unrealistic
  • The two-distribution model: a positive (resp. negative) bag is made of r instances drawn i.i.d. from D+ (resp. D−), with at least one (resp. none) covered by f
    + more realistic
    − complex formal measure, useful for a small number of instances (log # bags)
• Design algorithms or measures which « work well » under these models
Analysing Naive-RipperMI: literal selection problem
[figure: a candidate literal Y > 5 and the target concept in the (X, Y) plane]
• Compute, for each positive bag, Pr(at least one of the k covered instances belongs to the target concept)
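One illustrative way to compute such a probability, not necessarily the paper's refined measure: under the single-distribution model, and under the strong additional assumption that rule coverage is independent of target membership, if each instance belongs to the target concept with probability p and the rule covers k of the bag's r instances, then Pr(at least one covered instance is in the target | the bag is positive) = (1 − (1−p)^k) / (1 − (1−p)^r).

```python
def covered_target_probability(k, r, p):
    """Illustrative probability that >=1 of the k covered instances of a positive
    bag (r instances, each in the target with probability p, independently of
    coverage) belongs to the target concept."""
    return (1 - (1 - p) ** k) / (1 - (1 - p) ** r)

# Such a probability can weight each covered positive bag in the coverage
# measure, instead of counting it as 1 in p*(r).
```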
Analysis of RipperMi: experiments
• Artificial datasets of 100 bags with a variable number of instances per bag
• Target concepts: monomials (hard to learn with 2 instances per bag [Haussler 89])
[figure: error rate (%) as a function of the number of instances per bag]
• On the mutagenesis problem: NaiveRipperMi: 78%, RipperMi-refined-cov: 82%
Application: Anchoring symbols [with Bredeche]
[figure: robot perception pipeline — "What is all this?", segmentation, "I see a door", lab = door]
• Learned rule: IF Color = blue AND size > 53 THEN DOOR
• Early experiments with NaiveRipperMi reached 80% accuracy
Conclusion & Future work • Many problems which existed in relational learning appear clearly within the multiple-instance framework. • Algorithms presented here are aimed at solving these problems They were tested on artificial datasets. • Other realistic models, leading to better heuristics • Instance selection and attribute selection • Future work: MI-propositionalization, applying multiple-instance learning to data-mining tasks • Many ongoing applications ...