ILP : Inductive Logic Programming



  1. ILP : Inductive Logic Programming

  2. Induction
     • Given
        • a background theory Th (clauses),
        • positive examples Pos (ground facts),
        • negative examples Neg (ground facts),
     • find a hypothesis Hyp in the form of a logic program such that
        • for every $p \in Pos$: $Th \cup Hyp \models p$ (Hyp covers p given Th),
        • for every $n \in Neg$: $Th \cup Hyp \not\models n$ (Hyp does not cover n given Th).
     • ILP generates Hyp in the form of a logic program.

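  3. Consistent hypothesis [figure: a consistent hypothesis covers no negative example; it is complete if it also covers every positive example, and incomplete otherwise]

  4. Inconsistent hypothesis [figure: an inconsistent hypothesis covers at least one negative example]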
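
The coverage conditions of slides 2 to 4 can be checked mechanically when Th and Hyp are function-free Horn clauses. The sketch below assumes that restriction (Datalog) and, for brevity, a Th made only of ground facts. Atoms are Python tuples, variables are capitalised strings, and the least model of Th ∪ Hyp is computed by naive forward chaining. The facts are borrowed from the rlgg example on slide 16; the negative examples and the three hypotheses are hypothetical illustrations.

```python
# Minimal sketch: coverage test Th ∪ Hyp |= e for function-free Horn clauses (Datalog).
# Atoms are tuples (predicate, arg1, ...); variables are strings starting with an uppercase letter.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, fact, theta):
    """Extend substitution theta so that pattern·theta equals the ground fact, or return None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if theta.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return theta

def solve(body, facts, theta):
    """Yield every substitution that makes all body atoms true in the set of facts."""
    if not body:
        yield theta
        return
    for fact in facts:
        t = match(body[0], fact, theta)
        if t is not None:
            yield from solve(body[1:], facts, t)

def least_model(facts, rules):
    """Naive forward chaining; assumes every head variable also occurs in the body."""
    model = set(facts)
    while True:
        new = {(head[0],) + tuple(theta.get(a, a) for a in head[1:])
               for head, body in rules
               for theta in solve(body, model, {})} - model
        if not new:
            return model
        model |= new

def classify(th, hyp, pos, neg):
    """Return (complete, consistent) for hypothesis hyp given background facts th."""
    model = least_model(th, hyp)
    complete = all(p in model for p in pos)        # every positive example is covered
    consistent = not any(n in model for n in neg)  # no negative example is covered
    return complete, consistent

TH = {("parent", "ann", "mary"), ("parent", "ann", "tom"), ("parent", "tom", "eve"),
      ("parent", "tom", "ian"), ("female", "ann"), ("female", "mary"), ("female", "eve")}
POS = [("daughter", "mary", "ann"), ("daughter", "eve", "tom")]
NEG = [("daughter", "tom", "ann"), ("daughter", "ian", "tom")]   # hypothetical negatives

HEAD = ("daughter", "X", "Y")
hyp_a = [(HEAD, [("parent", "Y", "X"), ("female", "X")])]
hyp_b = [(HEAD, [("parent", "Y", "X"), ("female", "X"), ("female", "Y")])]
hyp_c = [(HEAD, [("parent", "Y", "X")])]

for name, hyp in [("a", hyp_a), ("b", hyp_b), ("c", hyp_c)]:
    print(name, classify(TH, hyp, POS, NEG))
```

Here hyp_a is complete and consistent, hyp_b is consistent but incomplete (it misses daughter(eve,tom) because tom is not female), and hyp_c is inconsistent because it also covers both negative examples.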

  5. Example
     • Predicates:
        • group(X), in_group(e1,c1),
        • circle(Z), square(Z),
        • triangle(t3,up).
     • Description of the first set:
        • group(e1).
        • circle(c1). triangle(t1,up). triangle(t2,up).
        • triangle(t3,up). square(s1).
        • in_group(e1,c1). in_group(e1,t1). in_group(e1,t2).
        • inside(t3,c1). inside(s1,t2).
     • What might a candidate hypothesis look like?
        • positive(X) :- group(X), in_group(X,Y1), triangle(Y1,up), in_group(X,Y2), triangle(Y2,up).
        • negative(X) :- group(X), in_group(X,Y1), triangle(Y1,down).
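
To see what the two candidate rules of slide 5 actually say about the first set, their bodies can be evaluated against its facts. This is a minimal sketch, assuming atoms as Python tuples and capitalised strings as variables; the helper names (match, solve, covered) are illustrative, not part of any ILP system.

```python
# Sketch: evaluate the candidate rules from slide 5 against the facts of the first set.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, fact, theta):
    """Extend theta so that pattern·theta equals the ground fact, or return None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if theta.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return theta

def solve(body, facts, theta):
    """Yield every substitution that satisfies all body atoms."""
    if not body:
        yield theta
        return
    for fact in facts:
        t = match(body[0], fact, theta)
        if t is not None:
            yield from solve(body[1:], facts, t)

FACTS = [
    ("group", "e1"), ("circle", "c1"),
    ("triangle", "t1", "up"), ("triangle", "t2", "up"), ("triangle", "t3", "up"),
    ("square", "s1"),
    ("in_group", "e1", "c1"), ("in_group", "e1", "t1"), ("in_group", "e1", "t2"),
    ("inside", "t3", "c1"), ("inside", "s1", "t2"),
]
POSITIVE_BODY = [("group", "X"), ("in_group", "X", "Y1"), ("triangle", "Y1", "up"),
                 ("in_group", "X", "Y2"), ("triangle", "Y2", "up")]
NEGATIVE_BODY = [("group", "X"), ("in_group", "X", "Y1"), ("triangle", "Y1", "down")]

def covered(body, var="X"):
    """Values of var for which the rule body holds."""
    return sorted({theta[var] for theta in solve(body, FACTS, {})})

print(covered(POSITIVE_BODY))   # ['e1']
print(covered(NEGATIVE_BODY))   # []

# Y1 and Y2 may bind to the same triangle, so the rule does not force two distinct triangles.
pairs = sorted({(th["Y1"], th["Y2"]) for th in solve(POSITIVE_BODY, FACTS, {})})
print(pairs)   # [('t1', 't1'), ('t1', 't2'), ('t2', 't1'), ('t2', 't2')]
```

Note that t3 does not count: it is inside c1 but not in_group(e1,t3), so only t1 and t2 satisfy the body.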

  6. Induction: example. What operations are used in the process of induction? Generalization and specialization:
        example        action        hypothesis
        +p(b,[b])      add clause    p(X,Y).
        -p(x,[])       specialise    p(X,[V|W]).
        -p(x,[a,b])    specialise    p(X,[X|W]).
        +p(b,[a,b])    add clause    p(X,[X|W]).
                                     p(X,[V|W]) :- p(X,W).
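
The trace above can be replayed by translating each successive hypothesis into a Python predicate by hand (Prolog lists become Python lists). This is only an illustration of why each step is triggered, not an ILP system: every hypothesis agrees with the examples seen so far and fails on the next one.

```python
# Hand translation of the four successive hypotheses from slide 6.

def h1(x, lst):            # p(X,Y).
    return True

def h2(x, lst):            # p(X,[V|W]).
    return len(lst) > 0

def h3(x, lst):            # p(X,[X|W]).
    return len(lst) > 0 and lst[0] == x

def h4(x, lst):            # p(X,[X|W]).   p(X,[V|W]) :- p(X,W).
    return len(lst) > 0 and (lst[0] == x or h4(x, lst[1:]))

# (label, X, list): True = positive, False = negative, in the order used on the slide
EXAMPLES = [(True, "b", ["b"]), (False, "x", []), (False, "x", ["a", "b"]), (True, "b", ["a", "b"])]

def errors(h, examples):
    """Examples on which hypothesis h disagrees with the label."""
    return [(label, x, lst) for label, x, lst in examples if h(x, lst) != label]

# Hypothesis k is built after example k: no errors so far, but an error on example k+1.
for k, h in enumerate([h1, h2, h3, h4], start=1):
    print(h.__name__, errors(h, EXAMPLES[:k]), errors(h, EXAMPLES[:k + 1]))
```

The error on a negative example triggers a specialisation; the error on the last positive example triggers adding the recursive clause.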

  7. ILP algorithms
     A generic ILP algorithm needs a description of the operations used to construct new hypotheses:
     • Top-down approach: specialization (used e.g. in FOIL)
     • Bottom-up approach: generalization (used e.g. in GOLEM)

  8. Generality of clauses
     [Figure: part of the generality lattice for m/2, showing the clauses m(X,Y), m(X,X), m(X,Y):-m(Y,X), m([X|Y],Z), m(X,[Y|Z]), m(X,[X|Z]), m(X,[Y|Z]):-m(X,Z)]
     • The set of (equivalence classes of) clauses is a lattice:
        • $C_1$ is at least as general as $C_2$ iff $C_1\theta \subseteq C_2$ for some substitution $\theta$
        • greatest lower bound: MGS (most general specialisation); least upper bound: LGG (least general generalisation)
     • Specialisation: applying a substitution and/or adding a literal
     • Generalisation: applying an inverse substitution and/or removing a literal
     • Note: there can be infinite chains!

  9. Specialization operators
     Hypothesis F is a specialization of G iff F is a logical consequence of G, i.e. $G \models F$ (every model of G is a model of F). A specialization operator spec specifies the set of specializations of a given clause. Two basic specialization operations:
     • operations on the variables used:
        • unification of two variables: spec(p(X,Y)) = p(X,X)
        • substitution
           • by a constant: spec(num(X)) = num(0)
           • by a compound term: spec(num(X)) = num(s(Y))
     • adding a literal to the body: spec(p(X,Y)) = (p(X,Y) :- edge(U,V))
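
The basic operators of slide 9 can be sketched as functions from a clause to one specialised clause. The representation is an assumption made for this example: a clause is a pair (head, body), variables are capitalised strings, constants are other strings, and compound terms are tuples (functor, args...). The function names are hypothetical.

```python
# Sketch of the basic specialisation operators from slide 9 on a clause (head, body).

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def apply(t, theta):
    """Apply substitution theta to a term or atom t."""
    if is_var(t):
        return theta.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply(a, theta) for a in t[1:])
    return t

def variables(t):
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(variables(a) for a in t[1:]))
    return set()

def clause_vars(clause):
    head, body = clause
    return variables(head).union(*(variables(b) for b in body))

def substitute(clause, theta):
    head, body = clause
    return apply(head, theta), [apply(b, theta) for b in body]

def unify_vars(clause, x, y):
    """Unify two variables: replace y by x everywhere."""
    return substitute(clause, {y: x})

def bind_constant(clause, var, const):
    return substitute(clause, {var: const})

def bind_compound(clause, var, functor, fresh):
    """Replace var by functor(fresh...); the fresh variables must not occur in the clause."""
    assert not set(fresh) & clause_vars(clause), "variables must be fresh"
    return substitute(clause, {var: (functor, *fresh)})

def add_literal(clause, literal):
    head, body = clause
    return head, body + [literal]

def show_term(t):
    if isinstance(t, tuple):
        return f"{t[0]}({', '.join(show_term(a) for a in t[1:])})"
    return t

def show(clause):
    head, body = clause
    return show_term(head) + (" :- " + ", ".join(map(show_term, body)) if body else "") + "."

P_CLAUSE = (("p", "X", "Y"), [])
NUM_CLAUSE = (("num", "X"), [])
print(show(unify_vars(P_CLAUSE, "X", "Y")))              # p(X, X).
print(show(bind_constant(NUM_CLAUSE, "X", "0")))          # num(0).
print(show(bind_compound(NUM_CLAUSE, "X", "s", ["Y"])))   # num(s(Y)).
print(show(add_literal(P_CLAUSE, ("edge", "U", "V"))))    # p(X, Y) :- edge(U, V).
```

A real refinement operator would enumerate all such results (every variable pair, every constant and functor from the language, every literal allowed by the language bias) rather than apply one chosen step.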

  10. Part of the specialisation graph for element/2
     [Figure: nodes element(X,Y), element(X,X), element(X,Y):-element(Y,X), element(X,[Y|Z]), element([X|Y],Z), element(X,[X|Z]), element(X,[Y|Z]):-element(X,Z)]

  11. ILP generalization methods (searching the hypothesis space bottom-up)
     The set of clauses is partially ordered (up to θ-equivalence) by θ-subsumption, which characterizes generalization and specialization (refinement).
     Def.: Let c, c1 be clauses. We say that c θ-subsumes c1 if there is a substitution θ such that $c\theta \subseteq c_1$.
     Example:
        c  = daughter(X,Y) :- parent(Y,X).
        c1 = daughter(X,Y) :- female(X), parent(Y,X).
        c2 = daughter(mary,ann) :- female(mary), parent(ann,mary), parent(ann,tom).
     Here c θ-subsumes c1 (with the empty substitution), and c1 θ-subsumes c2 with θ = {X/mary, Y/ann}.
     Clause c is at least as general as clause c1 iff c θ-subsumes c1. Clause c is more general than clause c1 (c1 is a specialization of c) iff c θ-subsumes c1 and c1 does not θ-subsume c.

  12. Usage of the operation -subsumes Lemma 1: If c-subsumes c1, thenc1 is a consequence of c, ie. c |- c1. Does the reverse claim hold? NO! See example c = list([V|W]) :- list(W). c1= list([X,Y|Z]) :- list(Z). Lemma 2: Using the partial order defined by -subsumption there can be found for any 2 clauses c, d their least upper and biggest lower bound (which is unique up to renaming of variables and -equivalence). Ussage? Pruning the space of hypotéz. Notation: d<cifd-subsumesc, ie. d is a generalization of c Application: Let e be an positive example covered by the clause c, ie. c |- e. According toL1 our hypothesis should be the generalizations of examples.

  13. θ-subsumption and the search in the hypothesis space
     If we generalize c to d (d < c), all examples covered by c will be covered by d as well. So if c covers some negative example, there is no point in generalizing c.
     If we specialize c to f (c < f), then an example not covered by c will not be covered by f either. So if c does not cover some positive example, c is not worth further specialization.
     Finding the least general generalization (operator lgg) is a purely syntactic task. Example:
        lgg([a,b,c], [a,c,d]) = [a,X,Y].
        lgg(f(a,a), f(b,b)) = f(lgg(a,b), lgg(a,b)) = f(V,V).
     Note that the same variable V is used for both occurrences of lgg(a,b); this is not the case for lgg(a,b) and lgg(b,a), which yield different variables.

  14. Definition of the lgg operator
     lgg for terms t1, t2:
     • lgg(t,t) = t
     • lgg(f(s1,..,sn), f(t1,..,tn)) = f(lgg(s1,t1),..,lgg(sn,tn))
     • lgg(f(s1,..,sn), g(t1,..,tm)) = V, where V is a variable and f, g are different function symbols
     • lgg(s,t) = V, where V is a variable, provided that s ≠ t and at least one of the terms s, t is a variable
     lgg for atomic formulas lgg(A1,A2):
     • lgg(p(s1,..,sn), p(t1,..,tn)) = p(lgg(s1,t1),..,lgg(sn,tn)), the case of two atoms with the same predicate p
     • lgg(p(s1,..,sn), q(t1,..,tn)) is not defined if p and q are different symbols
     lgg for literals lgg(L1,L2):
     • If both L1 and L2 are positive, the task reduces to the lgg of atomic formulas.
     • If both L1 and L2 are negative, i.e. L1 = not A1, L2 = not A2, then lgg(L1,L2) = not lgg(A1,A2).
     • If L1 is positive and L2 negative (or vice versa), lgg(L1,L2) is not defined.
     Example: lgg(parent(ann,mary), parent(ann,tom)) = parent(ann,X).
              lgg(parent(ann,mary), daughter(ann,tom)) is not defined.
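
The rules of slide 14 translate almost line by line into code. In this sketch terms are strings (constants or variables) or tuples (functor, args...), Prolog lists are nested '.'/2 tuples ending in '[]', and a literal is (positive?, atom). The table of already generalised pairs implements the remark on slide 13 that repeated occurrences of lgg(a,b) share one variable. The fresh names V1, V2, ... are assumed not to clash with variables in the input.

```python
# Sketch of the lgg operator for terms, atoms and literals.

class LGG:
    def __init__(self):
        self.table = {}   # (s, t) -> variable, so a repeated pair reuses the same variable

    def term(self, s, t):
        if s == t:                                        # lgg(t, t) = t
            return s
        if (isinstance(s, tuple) and isinstance(t, tuple)
                and s[0] == t[0] and len(s) == len(t)):   # same functor and arity
            return (s[0],) + tuple(self.term(a, b) for a, b in zip(s[1:], t[1:]))
        if (s, t) not in self.table:                      # different functors, or a variable
            self.table[(s, t)] = f"V{len(self.table) + 1}"
        return self.table[(s, t)]

    def atom(self, a1, a2):
        if a1[0] != a2[0] or len(a1) != len(a2):          # different predicates: undefined
            return None
        return self.term(a1, a2)

    def literal(self, l1, l2):
        (pos1, a1), (pos2, a2) = l1, l2
        if pos1 != pos2:                                  # opposite signs: undefined
            return None
        a = self.atom(a1, a2)
        return None if a is None else (pos1, a)

def lst(*items):
    """Build a Prolog list [items...] as nested '.'/2 terms."""
    result = "[]"
    for item in reversed(items):
        result = (".", item, result)
    return result

print(LGG().term(lst("a", "b", "c"), lst("a", "c", "d")))   # ('.', 'a', ('.', 'V1', ('.', 'V2', '[]')))
print(LGG().term(("f", "a", "a"), ("f", "b", "b")))          # ('f', 'V1', 'V1')
print(LGG().term(("f", "a", "b"), ("f", "b", "a")))          # ('f', 'V1', 'V2')
print(LGG().atom(("parent", "ann", "mary"), ("parent", "ann", "tom")))    # ('parent', 'ann', 'V1')
print(LGG().atom(("parent", "ann", "mary"), ("daughter", "ann", "tom")))  # None
```

The first three calls reproduce the examples of slide 13, including the shared variable in f(V,V) and the two distinct variables for lgg(a,b) and lgg(b,a).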

  15. lgg for clauses c1, c2
     Suppose $c_1 = \{L_1,\dots,L_n\}$ and $c_2 = \{K_1,\dots,K_m\}$; then
     $lgg(c_1,c_2) = \{ F_{ij} = lgg(L_i,K_j) : L_i \in c_1,\ K_j \in c_2 \text{ and } lgg(L_i,K_j) \text{ is defined} \}$
     Example:
        c1 = daughter(mary,ann) :- female(mary), parent(ann,mary).
        c2 = daughter(eve,tom) :- female(eve), parent(tom,eve).
        lgg(c1,c2) = daughter(X,Z) :- female(X), parent(Z,X).
     Here X = lgg(mary,eve) and Z = lgg(ann,tom); the same pair of terms yields the same variable throughout the clause.
     Generalization with respect to background knowledge represented by a conjunction K of ground facts is relative generalization, computed by the operator rlgg:
        rlgg(A1,A2) = lgg(A1 :- K, A2 :- K)
     Application in ILP: K is the set of all available facts from the task domain; the atoms A1, A2 correspond to the training examples.
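
The clause-level lgg of slide 15 pairs every literal of c1 with every compatible literal of c2, using one shared table of generalised term pairs. A minimal sketch, assuming clauses as lists of literals (positive?, atom) with flat atoms; the variables come out as V1, V2 instead of the slide's X, Z.

```python
# Sketch of lgg for clauses: all pairwise lggs of compatible literals.

def lgg_clauses(c1, c2):
    table = {}   # one shared table: the same pair (s, t) always yields the same variable

    def term(s, t):
        if s == t:
            return s
        if isinstance(s, tuple) and isinstance(t, tuple) and s[0] == t[0] and len(s) == len(t):
            return (s[0],) + tuple(term(a, b) for a, b in zip(s[1:], t[1:]))
        return table.setdefault((s, t), f"V{len(table) + 1}")

    result = []
    for pos1, a1 in c1:
        for pos2, a2 in c2:
            # lgg(Li, Kj) is defined only for literals of the same sign and predicate/arity
            if pos1 == pos2 and a1[0] == a2[0] and len(a1) == len(a2):
                lit = (pos1, term(a1, a2))
                if lit not in result:
                    result.append(lit)
    return result

def show(clause):
    """Prolog-style rendering; assumes flat atoms and a single positive (head) literal."""
    def fmt(atom):
        return f"{atom[0]}({', '.join(atom[1:])})"
    head = [fmt(a) for pos, a in clause if pos]
    body = [fmt(a) for pos, a in clause if not pos]
    return ", ".join(head) + (" :- " + ", ".join(body) if body else "") + "."

C1 = [(True, ("daughter", "mary", "ann")), (False, ("female", "mary")), (False, ("parent", "ann", "mary"))]
C2 = [(True, ("daughter", "eve", "tom")), (False, ("female", "eve")), (False, ("parent", "tom", "eve"))]
print(show(lgg_clauses(C1, C2)))   # daughter(V1, V2) :- female(V1), parent(V2, V1).
```

Pairs such as female(mary) with parent(tom,eve) are skipped because their lgg is undefined, which is why the result keeps exactly one literal per predicate here.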

  16. Example: application of rlgg (abbreviations: d = daughter, p = parent, f = female; a = ann, m = mary, t = tom, e = eve, i = ian)
     e1 = daughter(mary,ann), e2 = daughter(eve,tom)
     K = parent(ann,mary) & … & parent(tom,ian) & female(ann) & … & female(eve).
     c1 = e1 :- K = d(m,a) :- p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e).
     c2 = e2 :- K = d(e,t) :- p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e).
     rlgg(e1,e2) = lgg(c1,c2) =
     $d(V_{m,e},V_{a,t}) \text{ :- } p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e), p(a,V_{m,t}), p(V_{a,t},V_{m,e}), p(V_{a,t},V_{m,i}), p(a,V_{t,m}), p(V_{t,a},V_{i,m}), \dots, f(V_{a,m}), f(V_{a,e}), f(V_{m,e}), \dots$
     where $V_{x,y}$ denotes the variable lgg(x,y); for example, $V_{m,e} = lgg(m,e)$.
     Caution! The results of rlgg tend to be very long!
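
The size of the rlgg result on slide 16 can be checked directly: rlgg is just the clause lgg applied to e1 :- K and e2 :- K. This sketch repeats the clause-level lgg from the previous slide so that it runs on its own, and spells out K from the body of c1. Variables are numbered V1, V2, ... in order of creation rather than written as $V_{x,y}$.

```python
# Sketch: rlgg(A1, A2) = lgg(A1 :- K, A2 :- K), applied to the data of slide 16.

def lgg_clauses(c1, c2):
    table = {}   # shared across the whole clause: each pair (s, t) gets exactly one variable

    def term(s, t):
        if s == t:
            return s
        if isinstance(s, tuple) and isinstance(t, tuple) and s[0] == t[0] and len(s) == len(t):
            return (s[0],) + tuple(term(a, b) for a, b in zip(s[1:], t[1:]))
        return table.setdefault((s, t), f"V{len(table) + 1}")

    result = []
    for pos1, a1 in c1:
        for pos2, a2 in c2:
            if pos1 == pos2 and a1[0] == a2[0] and len(a1) == len(a2):
                lit = (pos1, term(a1, a2))
                if lit not in result:
                    result.append(lit)
    return result

def rlgg(e1, e2, k):
    """Relative lgg of two ground examples with respect to the ground background facts k."""
    def clause(e):
        return [(True, e)] + [(False, fact) for fact in k]
    return lgg_clauses(clause(e1), clause(e2))

K = [("parent", "ann", "mary"), ("parent", "ann", "tom"), ("parent", "tom", "eve"),
     ("parent", "tom", "ian"), ("female", "ann"), ("female", "mary"), ("female", "eve")]

result = rlgg(("daughter", "mary", "ann"), ("daughter", "eve", "tom"), K)
head = next(a for pos, a in result if pos)
body = [a for pos, a in result if not pos]
print(head)        # ('daughter', 'V1', 'V2'): V1 = lgg(mary, eve), V2 = lgg(ann, tom)
print(len(body))   # 25: 4 x 4 parent pairs + 3 x 3 female pairs
print(("parent", "V2", "V1") in body, ("female", "V1") in body)   # True True
```

Even with only seven background facts the body has 25 literals, and it grows with the product of the numbers of facts per predicate, which is the reason for the caution on the slide.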

  17. Irrelevant literals
     $d(V_{m,e},V_{a,t}) \text{ :- } p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e), p(a,V_{m,t}), p(V_{a,t},V_{m,e}), p(V_{a,t},V_{m,i}), p(a,V_{t,m}), p(V_{t,a},V_{i,m}), \dots, f(V_{a,m}), f(V_{a,e}), f(V_{m,e}), \dots$
     Are there literals that make no difference in distinguishing between positive and negative examples? If so, can they be omitted? If omitting a literal does not result in covering a negative example, we consider the literal irrelevant (removing a literal only generalizes the clause, so the positive examples stay covered). Removing the irrelevant literals leaves
     $d(V_{m,e},V_{a,t}) \text{ :- } p(V_{a,t},V_{m,e}), f(V_{m,e})$,
     i.e. daughter(X,Y) :- parent(Y,X), female(X).
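
The reduction on slide 17 can be sketched as a greedy pass over the body: drop a literal whenever the shorter clause still covers all positive examples and no negative one. The starting clause takes a few literals of the rlgg result, renamed (X for $V_{m,e}$, Y for $V_{a,t}$, Z for $V_{m,t}$, W for $V_{a,m}$). The slide gives no negative examples, so the three used here are hypothetical.

```python
# Sketch: greedy removal of irrelevant body literals. Atoms are flat tuples; capitalised = variable.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, fact, theta):
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if theta.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return theta

def satisfiable(body, facts, theta):
    """True if some extension of theta turns every body atom into a fact."""
    if not body:
        return True
    for fact in facts:
        t = match(body[0], fact, theta)
        if t is not None and satisfiable(body[1:], facts, t):
            return True
    return False

def covers(clause, example, facts):
    head, body = clause
    theta = match(head, example, {})
    return theta is not None and satisfiable(body, facts, theta)

def acceptable(clause, pos, neg, facts):
    return (all(covers(clause, e, facts) for e in pos)
            and not any(covers(clause, e, facts) for e in neg))

def drop_irrelevant(clause, pos, neg, facts):
    """Remove each body literal whose omission keeps the clause complete and consistent."""
    head, body = clause[0], list(clause[1])
    i = 0
    while i < len(body):
        trial = body[:i] + body[i + 1:]
        if acceptable((head, trial), pos, neg, facts):
            body = trial          # literal i is irrelevant: remove it
        else:
            i += 1                # literal i is needed: keep it
    return head, body

FACTS = [("parent", "ann", "mary"), ("parent", "ann", "tom"), ("parent", "tom", "eve"),
         ("parent", "tom", "ian"), ("female", "ann"), ("female", "mary"), ("female", "eve")]
POS = [("daughter", "mary", "ann"), ("daughter", "eve", "tom")]
NEG = [("daughter", "tom", "ann"), ("daughter", "ian", "tom"), ("daughter", "ann", "mary")]
START = (("daughter", "X", "Y"),
         [("parent", "ann", "mary"), ("female", "ann"), ("parent", "ann", "Z"),
          ("parent", "Y", "X"), ("female", "X"), ("female", "W")])

print(drop_irrelevant(START, POS, NEG, FACTS))
# (('daughter', 'X', 'Y'), [('parent', 'Y', 'X'), ('female', 'X')])
```

The greedy result can depend on the order in which literals are tried, and on which negative examples are available: without daughter(ann,mary) among the negatives, parent(Y,X) would also be dropped.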

  18. Generic ILP algorithm using a set R of rules for modifying hypotheses
     Input: B background knowledge, E+ (E-) the set of positive (negative) examples
     QH := initialize(B, E+, E-);      /* suggestion of the starting hypothesis */
     while not end_criterion(QH) do
         choose a hypothesis H from QH;
         choose modification rules r1,…,rk from R;
         applying r1,…,rk to H, create the new hypothesis H1 fitting E+ and E- best;
         QH := (QH - H) + H1;
         cancel_some_members_of(QH);   /* pruning */
         filter the sets of examples E+ and E-;
     choose_hypothesis P from QH
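
One possible instance of the generic loop on slide 18 is top-down hill-climbing for a single target predicate. Everything specific here is an assumption for illustration: R contains one modification rule (add a literal from a fixed candidate list), hypotheses are scored by positives covered minus negatives covered, pruning keeps the best beam entries, and the "filter the sets of examples" step is omitted. The data and negative examples are the hypothetical ones used for the daughter/2 examples above.

```python
# Sketch: the generic ILP loop instantiated as top-down hill-climbing for daughter/2.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, fact, theta):
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if theta.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return theta

def satisfiable(body, facts, theta):
    if not body:
        return True
    for fact in facts:
        t = match(body[0], fact, theta)
        if t is not None and satisfiable(body[1:], facts, t):
            return True
    return False

def covers(clause, example):
    head, body = clause
    theta = match(head, example, {})
    return theta is not None and satisfiable(body, FACTS, theta)

FACTS = [("parent", "ann", "mary"), ("parent", "ann", "tom"), ("parent", "tom", "eve"),
         ("parent", "tom", "ian"), ("female", "ann"), ("female", "mary"), ("female", "eve")]
POS = [("daughter", "mary", "ann"), ("daughter", "eve", "tom")]
NEG = [("daughter", "tom", "ann"), ("daughter", "ian", "tom"), ("daughter", "ann", "mary")]
CANDIDATES = [("parent", "Y", "X"), ("female", "X"), ("parent", "X", "Y"), ("female", "Y")]

def score(h):
    return sum(covers(h, e) for e in POS) - sum(covers(h, e) for e in NEG)

def fits(h):
    """Complete and consistent: covers all positives and no negatives."""
    return all(covers(h, e) for e in POS) and not any(covers(h, e) for e in NEG)

def refine(h):
    """The only modification rule in R here: add one candidate literal to the body."""
    head, body = h
    return [(head, body + [lit]) for lit in CANDIDATES if lit not in body]

def generic_ilp(beam=1, max_steps=10):
    qh = [(("daughter", "X", "Y"), [])]              # initialize: the most general clause
    for _ in range(max_steps):
        if any(fits(h) for h in qh):                  # end criterion
            break
        h = max(qh, key=score)                        # choose a hypothesis H from QH
        new = refine(h)                               # apply the modification rule
        if not new:
            break
        h1 = max(new, key=score)                      # new hypothesis fitting E+, E- best
        qh = [g for g in qh if g is not h] + [h1]     # QH := (QH - H) + H1
        qh = sorted(qh, key=score, reverse=True)[:beam]   # pruning
    return max(qh, key=score)                         # choose hypothesis P from QH

print(generic_ilp())   # (('daughter', 'X', 'Y'), [('female', 'X'), ('parent', 'Y', 'X')])
```

Because QH := (QH - H) + H1 replaces H by a single successor, this instance never holds more than one hypothesis; systems such as FOIL use richer heuristics, and keeping several refinements per step would turn it into a beam search.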

  19. When is ILP useful?
     • ILP is a good choice whenever
        • relations among the objects considered have to be taken into account,
        • the training data have no uniform structure (some objects are described extensively, others are mentioned in only a few facts),
        • there is extensive background knowledge that should be used for constructing the hypothesis.
     • Some domains with successful industrial or research ILP applications:
        • bioinformatics, medicine, ecology
        • technical applications (finite element mesh design, ..)
        • natural language processing

  20. Bioinformatics: SAR tasks
     • Structure-activity relationship (SAR) task: given
        • the chemical structure of a compound,
        • empirical data about its toxicity / mutagenicity / therapeutic effect,
     • what is the cause of the observed behaviour?
     [Figure: positive and negative example compounds; result: a structural indicator]

  21. Bioinformatics: structural description of organic compounds
     • Primary structure = sequence of amino acids.
     • Is it possible to predict the secondary structure (folds in space) from information about the primary structure?
     • Support for the interpretation of NMR (nuclear magnetic resonance) spectra: this requires classification into 23 structural types. Classical ML methods reach 80% accuracy and ILP 90%, which corresponds to the results of a domain expert.

  22. Bioinformatics: carcinogenicity
     • 230 aromatic and heteroaromatic nitro compounds: 188 compounds are well classifiable by attribute methods, and the remaining 42 compounds are highly regression-unfriendly (denoted as the RU group).
     • The advantages of the relational representation have been demonstrated on the RU group: the hypothesis suggested by the ILP system PROGOL achieved 88% accuracy, while the classical attribute-based ML methods reached about 20% less.

  23. East-West Trains

  24. Systems
     • Aleph (descendant of P-Progol), Oxford University
     • Tilde + WARMR = ACE (Blockeel, De Raedt 1998)
     • FOIL (Quinlan 1993)
     • GOLEM designs a hypothesis by a method that combines several rlgg steps with the omission of irrelevant literals
     • MIS (Shapiro 1981), Markus (Grobelnik 1992), WiM (1994)
     • RSD (Železný 2002): search for interesting subgroups
     • Other systems: http://www-ai.ijs.si/~ilpnet2/systems/
