Inductive Logic Programming
Content
• Introduction to ILP
• Basic ILP techniques
• An overview of the different ILP systems
• The application field of ILP
• Summary
Introduction to ILP
• Inductive logic programming (ILP) = Inductive concept learning (I) $\cap$ Logic Programming (LP)
• Goal:
  • Develop a theoretical framework for induction
  • Build practical algorithms for inductive learning of relational concepts described in the form of logic programs
• Background:
  • ILP theory is based on the theory of LP
  • ILP algorithms are based on experimental and application-oriented ML research
• Motivation:
  • Use of a representational formalism that is more expressive than propositional logic
  • Use of background knowledge in learning (in AI, the use of domain knowledge is essential for achieving intelligent behaviour)
Introduction to ILP 2
• Inductive learning with background knowledge: given a set of training examples $E = E^{+} \cup E^{-}$ (positive and negative examples) and background knowledge $B$, find a hypothesis $H$, expressed in some concept description language $L$, such that $H$ is complete and consistent with respect to the background knowledge $B$ and the examples $E$
• A hypothesis $H$ is complete with respect to the background knowledge $B$ and examples $E$ if it covers all the positive examples, i.e., if $B \cup H \models e$ for all $e \in E^{+}$
• A hypothesis $H$ is consistent with respect to the background knowledge $B$ and examples $E$ if it covers none of the negative examples, i.e., if $B \cup H \not\models e$ for all $e \in E^{-}$
Introduction to ILP 4
• Example: The task is to define the target relation daughter(X,Y)
• Background knowledge consists of ground facts about the predicates female(X) and parent(Y,X):
  parent(ann, mary)    female(ann)
  parent(ann, tom)     female(mary)
  parent(tom, eve)     female(eve)
  parent(tom, ian)
• Training examples:
  +: daughter(mary, ann)   daughter(eve, tom)
  -: daughter(tom, ann)    daughter(eve, ann)
• A possible definition of the target relation: daughter(X,Y) ← female(X), parent(Y,X)
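The completeness and consistency conditions can be checked mechanically on this example. The following is a minimal sketch, assuming the candidate hypothesis daughter(X,Y) ← female(X), parent(Y,X) and the name spelled mary throughout; the helper names `covers`, `is_complete` and `is_consistent` are illustrative and not part of any ILP system.

```python
# Background knowledge B: the ground facts of the family example.
PARENT = {("ann", "mary"), ("ann", "tom"), ("tom", "eve"), ("tom", "ian")}
FEMALE = {"ann", "mary", "eve"}

# Training examples for the target relation daughter(X, Y).
POSITIVE = [("mary", "ann"), ("eve", "tom")]
NEGATIVE = [("tom", "ann"), ("eve", "ann")]


def covers(x, y):
    """Assumed hypothesis H: daughter(X, Y) <- female(X), parent(Y, X)."""
    return x in FEMALE and (y, x) in PARENT


def is_complete(hypothesis, positives):
    """H is complete if it covers every positive example."""
    return all(hypothesis(x, y) for x, y in positives)


def is_consistent(hypothesis, negatives):
    """H is consistent if it covers no negative example."""
    return not any(hypothesis(x, y) for x, y in negatives)


print(is_complete(covers, POSITIVE), is_consistent(covers, NEGATIVE))  # True True
```

Because the background knowledge is a set of ground facts and the clause introduces no new variables, coverage reduces to simple set membership here; in general it requires a proof procedure.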
Introduction to ILP 5
• Dimensions of ILP
  • The learner learns either a single concept or multiple concepts
  • The learner requires all the training examples to be given before the learning process (batch learners) or accepts examples one by one (incremental learners)
  • The learner may rely on an oracle to verify the validity of generalisations and/or to classify examples generated by the learner (interactive vs. non-interactive learners)
  • The learner may try to learn a concept from scratch or may accept an initial hypothesis (theory), which is then revised in the learning process. The latter kind of system is called a theory revisor.
• Existing ILP systems
  • Empirical ILP system: batch, non-interactive system that learns single predicates from scratch
  • Interactive ILP system: interactive and incremental theory revision system that learns multiple predicates
θ-subsumption
• Structuring the hypothesis space: introducing a partial ordering into a set of clauses based on θ-subsumption
• Def: A substitution $\theta = \{X_1/t_1, \ldots, X_k/t_k\}$ is a function from variables to terms. The application $W\theta$ of a substitution $\theta$ to a well-formed formula $W$ is obtained by replacing all occurrences of each variable $X_j$ in $W$ by the same term $t_j$
• Def: Let $c$ and $c'$ be two program clauses. Clause $c$ θ-subsumes $c'$ if there exists a substitution $\theta$ such that $c\theta \subseteq c'$
• Def: Two clauses $c$ and $d$ are θ-subsumption equivalent if $c$ θ-subsumes $d$ and $d$ θ-subsumes $c$
• Def: A clause is reduced if it is not θ-subsumption equivalent to any proper subset of itself
θ-subsumption (2)
• Example 1: Let c be the clause c = daughter(X,Y) ← parent(Y,X). A substitution $\theta = \{X/\mathrm{mary}, Y/\mathrm{ann}\}$ applied to clause c is obtained by applying $\theta$ to each of its literals: $c\theta$ = daughter(mary, ann) ← parent(ann, mary).
• Example 2: Clause c θ-subsumes the clause c' = daughter(X,Y) ← female(X), parent(Y,X) under the empty substitution $\theta = \emptyset$, since the literals of c are a proper subset of the literals of c'
• Example 3: Clause c θ-subsumes the clause c' = daughter(mary,ann) ← female(mary), parent(ann,mary), parent(ann,tom) under the substitution $\theta = \{X/\mathrm{mary}, Y/\mathrm{ann}\}$
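The subsumption test in these examples can be sketched as a small backtracking search. This is an illustration under stated assumptions, not a reference implementation: clauses are lists of literals `(sign, predicate, args)` with `sign=True` for the head and `False` for body literals, arguments are function-free strings, and a name starting with an upper-case letter is a variable. The function names are hypothetical.

```python
def is_var(term):
    """Assumed convention: variables start with an upper-case letter."""
    return term[:1].isupper()


def match_literal(lit, target, theta):
    """Extend theta so that lit*theta == target; return None if impossible."""
    sign, pred, args = lit
    t_sign, t_pred, t_args = target
    if (sign, pred, len(args)) != (t_sign, t_pred, len(t_args)):
        return None
    theta = dict(theta)
    for a, b in zip(args, t_args):
        if is_var(a):
            # Bind the variable, or check it agrees with an earlier binding.
            if theta.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return theta


def theta_subsumes(c, d, theta=None):
    """Return a substitution theta with c*theta a subset of d, else None."""
    theta = {} if theta is None else theta
    if not c:
        return theta
    first, rest = c[0], c[1:]
    for target in d:  # try every literal of d, backtracking on failure
        extended = match_literal(first, target, theta)
        if extended is not None:
            result = theta_subsumes(rest, d, extended)
            if result is not None:
                return result
    return None


c = [(True, "daughter", ("X", "Y")), (False, "parent", ("Y", "X"))]
c3 = [(True, "daughter", ("mary", "ann")), (False, "female", ("mary",)),
      (False, "parent", ("ann", "mary")), (False, "parent", ("ann", "tom"))]
print(theta_subsumes(c, c3))  # {'X': 'mary', 'Y': 'ann'}
print(theta_subsumes(c3, c))  # None
```

Callers should compare the result with `None` rather than test its truthiness, because the empty substitution `{}` is a valid answer. The terms of the second clause are treated as fixed, so the two clauses do not need to be renamed apart.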
θ-subsumption (3)
• θ-subsumption introduces a syntactic notion of generality:
  • Clause $c$ is at least as general as clause $c'$ ($c \le c'$) if $c$ θ-subsumes $c'$
  • Clause $c$ is more general than clause $c'$ ($c < c'$) if $c \le c'$ holds and $c' \le c$ does not hold
  • $c'$ is a refinement (specialisation) of $c$
  • $c$ is a generalisation of $c'$
θ-subsumption (4)
• θ-subsumption is important for learning:
  • It provides a generality ordering for hypotheses, thus structuring the hypothesis space
  • It can be used to prune large parts of the search space:
    • When generalising c to c', all the examples covered by c will also be covered by c'. This property is used to prune the search of more general clauses: if c covers a negative example e (c is inconsistent), then all its generalisations will also be inconsistent. Hence, the generalisations of c do not need to be considered.
    • When specialising c to c', an example not covered by c will not be covered by any of its specialisations either. This property is used to prune the search of more specific clauses: if c does not cover a positive example, none of its specialisations will. Hence, the specialisations of c do not need to be considered.
θ-subsumption (5)
• Techniques based on θ-subsumption:
  • Bottom-up: creating the least general generalisation from the training examples, relative to the background knowledge
  • Top-down: searching refinement graphs
Least General Generalisation
• Properties of θ-subsumption:
  • If $c$ θ-subsumes $c'$ then $c$ logically entails $c'$; the reverse is not always true
  • The relation $\le$ defines a lattice over the set of reduced clauses. This means that any two clauses have a least upper bound (lub) and a greatest lower bound (glb).
• Def: The least general generalisation (lgg) of two reduced clauses $c$ and $c'$, denoted by lgg($c$, $c'$), is the least upper bound of $c$ and $c'$ in the θ-subsumption lattice.
• Example:
  • Let c and c' be two clauses:
    c = daughter(mary,ann) ← female(mary), parent(ann,mary).
    c' = daughter(eve,tom) ← female(eve), parent(tom,eve).
  • lgg of c and c': daughter(X,Y) ← female(X), parent(Y,X).
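Least General Generalisation 2
• Computation of lgg with θ-subsumption:
• lgg of terms
  • $\mathrm{lgg}(t, t) = t$
  • $\mathrm{lgg}(f(s_1, \ldots, s_n), f(t_1, \ldots, t_n)) = f(\mathrm{lgg}(s_1, t_1), \ldots, \mathrm{lgg}(s_n, t_n))$
  • $\mathrm{lgg}(f(s_1, \ldots, s_m), g(t_1, \ldots, t_n)) = V$, where $f \ne g$ and $V$ is a variable which represents $\mathrm{lgg}(f(s_1, \ldots, s_m), g(t_1, \ldots, t_n))$
  • $\mathrm{lgg}(s, t) = V$, where $s \ne t$ and at least one of $s$ and $t$ is a variable; in this case, $V$ is a variable which represents $\mathrm{lgg}(s, t)$
• Example: $\mathrm{lgg}(f(a, a), f(b, b)) = f(\mathrm{lgg}(a, b), \mathrm{lgg}(a, b)) = f(V, V)$, where $V$ stands for $\mathrm{lgg}(a, b)$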
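The term rules can be sketched as a recursive anti-unification function. This is a minimal sketch under assumptions that are not in the slides: constants and variables are Python strings, compound terms are tuples `(functor, arg1, ...)`, and the generated variable names `V1`, `V2`, ... are assumed not to clash with names already in the input. The `table` argument remembers which variable stands for which pair of terms, so the same pair is always generalised to the same variable.

```python
from itertools import count


def lgg_term(s, t, table, fresh):
    """Least general generalisation of two terms (anti-unification)."""
    if s == t:
        return s
    # Same functor and arity: generalise argument by argument.
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        args = tuple(lgg_term(a, b, table, fresh) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # Otherwise the pair (s, t) is represented by one shared variable.
    if (s, t) not in table:
        table[(s, t)] = f"V{next(fresh)}"
    return table[(s, t)]


# lgg(f(a, a), f(b, b)) = f(V, V), where V stands for lgg(a, b)
print(lgg_term(("f", "a", "a"), ("f", "b", "b"), {}, count(1)))  # ('f', 'V1', 'V1')
```

Sharing one `table` across a whole clause, rather than creating a fresh one per term, is what makes repeated pairs such as (a, b) map to the same variable.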
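Least General Generalisation 3
• lgg of atoms
  • If the atoms have the same predicate symbol $p$: $\mathrm{lgg}(p(s_1, \ldots, s_n), p(t_1, \ldots, t_n)) = p(\mathrm{lgg}(s_1, t_1), \ldots, \mathrm{lgg}(s_n, t_n))$
  • If the predicate symbols differ, the lgg is undefined
• lgg of literals
  • If $L_1$ and $L_2$ are atoms, then $\mathrm{lgg}(L_1, L_2)$ is computed as defined above
  • If both $L_1$ and $L_2$ are negative literals, $L_1 = \neg A_1$ and $L_2 = \neg A_2$, then $\mathrm{lgg}(L_1, L_2) = \neg\,\mathrm{lgg}(A_1, A_2)$
  • If $L_1$ is a positive and $L_2$ is a negative literal, or vice versa, $\mathrm{lgg}(L_1, L_2)$ is undefined
• Example: lgg(parent(ann,mary), parent(ann,tom)) = parent(ann,X), where X stands for lgg(mary,tom)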
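Least General Generalisation 4
• lgg of clauses
  Let $c_1 = \{L_1, \ldots, L_n\}$ and $c_2 = \{K_1, \ldots, K_m\}$. Then
  $\mathrm{lgg}(c_1, c_2) = \{\mathrm{lgg}(L_i, K_j) \mid L_i \in c_1,\ K_j \in c_2 \text{ and } \mathrm{lgg}(L_i, K_j) \text{ is defined}\}$
• Example:
  If $c_1$ = daughter(mary,ann) ← female(mary), parent(ann,mary) and $c_2$ = daughter(eve,tom) ← female(eve), parent(tom,eve), then
  lgg($c_1$, $c_2$) = daughter(X,Y) ← female(X), parent(Y,X)
  where X stands for lgg(mary,eve) and Y stands for lgg(ann,tom)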
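The clause-level definition can be sketched as a double loop over literal pairs. This sketch makes simplifying assumptions that are not in the slides: literals are tuples `(sign, predicate, args)` with `sign=True` for the head and `False` for body literals, all arguments are constants or variables (no function symbols), and fresh variables are named `V1`, `V2`, ....

```python
from itertools import count


def lgg_clauses(c1, c2):
    """lgg of two function-free clauses given as lists of literals."""
    table, fresh = {}, count(1)  # shared across the clause: one variable per term pair

    def lgg_arg(s, t):
        if s == t:
            return s
        if (s, t) not in table:
            table[(s, t)] = f"V{next(fresh)}"
        return table[(s, t)]

    result = []
    for sign1, pred1, args1 in c1:
        for sign2, pred2, args2 in c2:
            # lgg is undefined for different signs, predicates or arities.
            if (sign1, pred1, len(args1)) != (sign2, pred2, len(args2)):
                continue
            lit = (sign1, pred1, tuple(lgg_arg(a, b) for a, b in zip(args1, args2)))
            if lit not in result:
                result.append(lit)
    return result


c1 = [(True, "daughter", ("mary", "ann")),
      (False, "female", ("mary",)), (False, "parent", ("ann", "mary"))]
c2 = [(True, "daughter", ("eve", "tom")),
      (False, "female", ("eve",)), (False, "parent", ("tom", "eve"))]
for literal in lgg_clauses(c1, c2):
    print(literal)
```

With V1 standing for lgg(mary,eve) and V2 for lgg(ann,tom), the three resulting literals read daughter(V1,V2) ← female(V1), parent(V2,V1), which is the clause from the example up to variable renaming.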
Least General Generalisation 5
• Search lgg(L1, L2)
Input: L1 and L2 are atoms or literals
Output: lgg(L1, L2) = G
function lgg(L1, L2):
begin
  if head(L1) ≠ head(L2) then
    G := undefined; return(G)
  else
    G1 := L1; G2 := L2
    while G1 ≠ G2 do
      find_position(G1, G2, t1, t2);
      generate_variable X;
      substitute(G1, G2, t1, t2, X)
    endwhile
    G := G1; return(G)
  endif
end
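Least General Generalisation 6
Input: G1 and G2 are terms or literals
Output: t1 and t2 terms (the first pair of differing subterms at the same position)
procedure find_position(G1, G2, t1, t2)
begin
  if G1 = G2 then return(nil, nil) endif
  if (G1 is atomic or G2 is atomic) then return(G1, G2) endif
  if functor(G1) ≠ functor(G2) then return(G1, G2) endif
  i := 1
  while i ≤ arity(G1) do
    find_position(arg(i, G1), arg(i, G2), t1, t2)
    if t1 ≠ nil then return(t1, t2) endif
    i := i+1
  endwhile
end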
Least General Generalisation 7
Input: G1 and G2 are terms or literals, t1 and t2 terms, X variable
Output: G1 and G2 with substitution X for t1 and t2
procedure substitute(G1, G2, t1, t2, X)
begin
  if G1 = t1 and G2 = t2 then return(X, X) endif
  if G1 = G2 then return(G1, G2) endif
  if (G1 is atomic or G2 is atomic) then return(G1, G2) endif
  G1 = f(s1, ..., sn); G2 = g(u1, ..., um);
  if f ≠ g then return(G1, G2) endif
  i := 1
  while i ≤ n do
    substitute(si, ui, t1, t2, X);
    i := i+1
  endwhile
end
Generalisation techniques
• Start from the most specific clause that covers a given positive example and then generalise the clause until it cannot be further generalised without covering negative examples.
• Generalisation operator: Let $L$ be a language bias. A generalisation operator $\rho$ maps a clause $c$ to a set of clauses $\rho(c)$ which are generalisations of $c$: $\rho(c) = \{c' \mid c' \in L,\ c' < c\}$
• Generalisation operators perform two basic syntactic operations on a clause:
  • Apply an inverse substitution to the clause
  • Remove a literal from the body of the clause
• Basic generalisation techniques:
  • Relative least general generalisation (rlgg)
  • Inverse resolution
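Relative Least General Generalisation 1
• Relative least general generalisation: the rlgg of two positive examples $e_1$ and $e_2$ relative to background knowledge $B$ is
  rlgg($e_1$, $e_2$) = lgg(($e_1$ ← $K$), ($e_2$ ← $K$))
  where $K$ is the conjunction of the ground facts in $B$, and $e_1$ and $e_2$ are positive examples (ground facts)
• Example:
  • Positive examples: $e_1$ = daughter(mary,ann) and $e_2$ = daughter(eve,tom), and $B$ as before
  • rlgg($e_1$, $e_2$) = lgg(($e_1$ ← $K$), ($e_2$ ← $K$)), where $K$ is the conjunction parent(ann,mary), parent(ann,tom), parent(tom,eve), parent(tom,ian), female(ann), female(mary), female(eve)
  • Result, after removing irrelevant literals: daughter(X,Y) ← female(X), parent(Y,X)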
Relative Least General Generalisation 2
Search for rlgg(c1, c2)
Input: c1 and c2 are two clauses in the form c1 = {L1, ..., Ln} and c2 = {K1, ..., Km}
Output: rlgg(c1, c2) = G
function rlgg(c1, c2):
begin
  G := {}; k := 1;
  while k ≤ n do
    l := 1;
    while l ≤ m do
      lgg(Lk, Kl, L);
      if L is defined then G := G ∪ {L} endif;
      l := l+1
    endwhile
    k := k+1
  endwhile;
  return(G)
end
Inverse resolution
• Basic idea: invert the resolution rule of deductive inference (Robinson), e.g., invert the SLD-resolution proof procedure for definite programs
• Example: Given the theory $T = \{u \leftarrow v,\ v \leftarrow w,\ w\}$, suppose we want to derive $u$. Proposition $w$ resolves with $v \leftarrow w$ to give $v$, which is then resolved with $u \leftarrow v$ to derive $u$.
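Inverse resolution 2
• Example: first-order derivation tree for the family example
  • $b_1$ = female(mary) and $b_2$ = parent(ann,mary) and $c$ = daughter(X,Y) ← female(X), parent(Y,X)
  • Let $T = \{c, b_1, b_2\}$. Suppose we want to derive the fact $e_1$ = daughter(mary,ann) from $T$
  • $c$ resolves with $b_1$ under $\theta_1 = \{X/\mathrm{mary}\}$ to give $c_1$ = daughter(mary,Y) ← parent(Y,mary), which resolves with $b_2$ under $\theta_2 = \{Y/\mathrm{ann}\}$ to give $e_1$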
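Inverse resolution 3
• Inverse resolution inverts the resolution process using a generalisation operator based on inverting substitution
• Given a well-formed formula $W$, an inverse substitution $\theta^{-1}$ of a substitution $\theta$ is a function that maps terms in $W\theta$ to variables such that $W\theta\theta^{-1} = W$
• Example:
  • Take $c$ = daughter(X,Y) ← female(X), parent(Y,X) and the substitution $\theta = \{X/\mathrm{mary}, Y/\mathrm{ann}\}$: $c' = c\theta$ = daughter(mary,ann) ← female(mary), parent(ann,mary)
  • Applying the inverse substitution $\theta^{-1} = \{\mathrm{mary}/X, \mathrm{ann}/Y\}$ to $c'$, the original clause is obtained: $c = c'\theta^{-1}$
• Example: Inverse substitution with places
  • Let $W$ = loves(X, daughter(Y)) and $\theta = \{X/\mathrm{ann}, Y/\mathrm{ann}\}$
  • Applying $\theta$ to $W$: $W\theta$ = loves(ann, daughter(ann))
  • The inverse substitution $\theta^{-1} = \{(\mathrm{ann}, \langle 1 \rangle)/X,\ (\mathrm{ann}, \langle 2, 1 \rangle)/Y\}$ specifies that the first occurrence of the subterm ann in the term $W\theta$ (at place $\langle 1 \rangle$) is replaced by variable X and the second occurrence (at place $\langle 2, 1 \rangle$) is replaced by Y.
  • The use of places ensures that $W\theta\theta^{-1} = W$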
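Inverse substitution with places can be illustrated with a few lines of code. This is a sketch under assumptions: the term loves(X, daughter(Y)) is used as an illustration, terms are encoded as nested tuples `(functor, arg1, ...)`, and a place is a tuple of 1-based argument positions, which conveniently coincides with tuple indices because index 0 holds the functor. All function names are hypothetical.

```python
def apply_substitution(term, theta):
    """Replace every variable bound in theta, keeping functors unchanged."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(apply_substitution(a, theta) for a in term[1:])
    return theta.get(term, term)


def subterm_at(term, place):
    """Follow a place (sequence of argument positions) down into the term."""
    for i in place:
        term = term[i]
    return term


def replace_at(term, place, new):
    """Return a copy of term with the subterm at `place` replaced by `new`."""
    if not place:
        return new
    i = place[0]
    return term[:i] + (replace_at(term[i], place[1:], new),) + term[i + 1:]


def apply_inverse_substitution(term, inverse):
    """inverse is a list of ((subterm, place), variable) pairs."""
    for (subterm, place), variable in inverse:
        if subterm_at(term, place) != subterm:
            raise ValueError(f"{subterm!r} does not occur at place {place}")
        term = replace_at(term, place, variable)
    return term


W = ("loves", "X", ("daughter", "Y"))
theta = {"X": "ann", "Y": "ann"}
W_theta = apply_substitution(W, theta)
inverse = [(("ann", (1,)), "X"), (("ann", (2, 1)), "Y")]
print(W_theta)                                       # ('loves', 'ann', ('daughter', 'ann'))
print(apply_inverse_substitution(W_theta, inverse))  # ('loves', 'X', ('daughter', 'Y'))
```

Without places, mapping ann back to a single variable would give loves(X, daughter(X)), which is not the original term; the places are what let the two occurrences of ann be generalised to different variables.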
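Inverse resolution 4
• Example: Inverse resolution
  • $B$ consists of two clauses, $b_1$ = female(mary) and $b_2$ = parent(ann,mary)
  • Let $H = \emptyset$
  • The learner encounters the positive example $e_1$ = daughter(mary,ann)
• The inverse resolution process proceeds as follows:
  • It attempts to find a clause $c_1$ which will, together with $b_2$, entail $e_1$ and can be added to the current hypothesis. Choosing the inverse substitution $\theta_2^{-1} = \{\mathrm{ann}/Y\}$, the inverse resolution step generates $c_1$ = ires($b_2$, $e_1$) = daughter(mary,Y) ← parent(Y,mary). Clause $c_1$ becomes the current hypothesis $H$, such that $\{b_2\} \cup H \models e_1$
  • It takes $b_1$ = female(mary) and $H = \{c_1\}$. By computing $c'$ = ires($b_1$, $c_1$) using $\theta_1^{-1} = \{\mathrm{mary}/X\}$, it generalises $c_1$ with respect to $B$, yielding $c'$ = daughter(X,Y) ← female(X), parent(Y,X). In $H$, clause $c_1$ can be replaced by $c'$, which together with $B$ entails $e_1$. The induced hypothesis is $H$ = {daughter(X,Y) ← female(X), parent(Y,X)}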
Specialisation techniques
• They search the hypothesis space in a top-down manner, from general to specific hypotheses, using a θ-subsumption-based specialisation operator (refinement operator)
• Refinement operator: Given a language bias $L$, a refinement operator $\rho$ maps a clause $c$ to a set of clauses $\rho(c)$ which are specialisations (refinements) of $c$: $\rho(c) = \{c' \mid c' \in L,\ c < c'\}$
• This operator typically computes only the set of minimal (most general) specialisations of a clause under θ-subsumption. It employs two basic syntactic operations on a clause:
  • Apply a substitution to the clause
  • Add a literal to the body of the clause
Specialisation techniques (2)
• The basic specialisation technique is top-down search of the refinement graph
• Top-down learners start from the most general clauses and repeatedly refine them until they no longer cover negative examples
Specialisation techniques 2
• For a selected $L$ and a given $B$, the hypothesis space of program clauses is a lattice structured by the θ-subsumption generality ordering
• In this lattice a refinement graph can be defined and used to direct the search from general to specific hypotheses
• The refinement graph is a directed, acyclic graph in which nodes are program clauses and edges correspond to the basic refinement operations: substituting a variable with a term, and adding a literal to the body of the clause
• First used in the Model Inference System (MIS, Shapiro 1983)
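One step in such a refinement graph can be sketched as a function that returns the children of a clause. This is an illustration under assumptions: the language bias is taken to be the predicates female/1 and parent/2 of the family example, clauses contain only variables, a clause is a pair `(head, body)` of `(predicate, args)` literals, and the substitution operation is restricted to replacing one variable by another existing variable (substituting constants is left out). The names `PREDICATES`, `substitute` and `refinements` are hypothetical.

```python
from itertools import product

PREDICATES = {"female": 1, "parent": 2}  # assumed language bias: name -> arity


def substitute(clause, old, new):
    """Apply the substitution {old/new} to every literal of the clause."""
    def rename(literal):
        pred, args = literal
        return (pred, tuple(new if a == old else a for a in args))
    head, body = clause
    return (rename(head), tuple(rename(lit) for lit in body))


def refinements(clause):
    """Children of a clause in the refinement graph (one edge each)."""
    head, body = clause
    variables = sorted({a for _, args in (head,) + body for a in args})
    result = []
    # Operation 1: substitute a variable with another term (here: a variable).
    for old, new in product(variables, repeat=2):
        if old != new:
            result.append(substitute(clause, old, new))
    # Operation 2: add a literal to the body.
    for pred, arity in PREDICATES.items():
        for args in product(variables, repeat=arity):
            if (pred, args) not in body:
                result.append((head, body + ((pred, args),)))
    return result


top = (("daughter", ("X", "Y")), ())  # most general clause: daughter(X,Y) <-
for child in refinements(top):
    print(child)
```

Starting from the most general clause daughter(X,Y) ←, two steps along the edges that add female(X) and then parent(Y,X) reach the target clause of the family example.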
Specialisation techniques 4
• Generic top-down algorithm:
Input: E the set of training examples (positive and negative), B the background knowledge, L the description language
Output: hypothesis H
procedure top_down_ILP(E, B, H)
begin
  H := {};
  repeat
    choose an initial clause C (e.g., the most general clause)
    repeat
      find the best refinement C' in ρ(C);
      C := C'
    until C is acceptable
    H := H ∪ {C};
    remove from E the positive examples covered by C
  until hypothesis H satisfies stopping criterion
  return(H)
end
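The two nested loops of the generic algorithm can be sketched on the family example. This is a simplified illustration, not any particular ILP system: the refinement operator only adds body literals over the head variables X and Y, "best" is assumed to mean the highest value of (positives covered minus negatives covered), a clause is "acceptable" once it covers no negative example, and the sketch assumes such a clause exists in the language. All names are hypothetical.

```python
from itertools import product

FACTS = {  # background knowledge B as ground facts
    "parent": {("ann", "mary"), ("ann", "tom"), ("tom", "eve"), ("tom", "ian")},
    "female": {("ann",), ("mary",), ("eve",)},
}
POSITIVE = {("mary", "ann"), ("eve", "tom")}
NEGATIVE = {("tom", "ann"), ("eve", "ann")}
HEAD_VARS = ("X", "Y")  # head of every clause: daughter(X, Y)


def covers(body, example):
    """True if every body literal holds in B when (X, Y) is bound to example."""
    binding = dict(zip(HEAD_VARS, example))
    return all(tuple(binding[v] for v in args) in FACTS[pred] for pred, args in body)


def refinements(body):
    """Specialise by adding one literal over the head variables."""
    for pred, facts in FACTS.items():
        arity = len(next(iter(facts)))
        for args in product(HEAD_VARS, repeat=arity):
            if (pred, args) not in body:
                yield body + ((pred, args),)


def score(body, positives):
    """Assumed heuristic: covered positives minus covered negatives."""
    return (sum(covers(body, e) for e in positives)
            - sum(covers(body, e) for e in NEGATIVE))


def top_down_ilp(positives):
    hypothesis, remaining = [], set(positives)
    while remaining:  # covering loop: until all positives are covered
        body = ()     # most general clause: daughter(X, Y) <-
        while any(covers(body, e) for e in NEGATIVE):  # specialisation loop
            body = max(refinements(body), key=lambda b: score(b, remaining))
        covered = {e for e in remaining if covers(body, e)}
        if not covered:
            break     # no progress; stop instead of looping forever
        hypothesis.append(body)
        remaining -= covered
    return hypothesis


print(top_down_ilp(POSITIVE))
# [(('parent', ('Y', 'X')), ('female', ('X',)))]
```

The single learned clause reads daughter(X,Y) ← parent(Y,X), female(X). Greedy selection of the best refinement keeps the search cheap but can miss clauses that only pay off after several steps, which is why practical systems add lookahead or beam search.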
An overview of the different ILP systems
• Prehistory (1970)
  • Plotkin: lgg, rlgg, 1970-1971
• Early enthusiasm (1975-1983)
  • Vere 1975, 1977; Summers 1977; Kodratoff and Jouannaud 1979
  • Shapiro 1981, 1983
• Dark ages (1983-87)
An overview of the different ILP systems 2
• Renaissance (1987- ...)
  • MARVIN - Sammut & Banerji 1986; DUCE - Muggleton 1987
  • Helft 1987
  • QuMAS - Mozetic 1987; LINUS - Lavrac et al. 1991
  • CIGOL - Muggleton & Buntine 1988
  • ML-SMART - Bergadano, Giordana & Saitta 1988
  • CLINT - De Raedt & Bruynooghe 1988
  • BLIP, MOBAL - Morik, Wrobel et al. 1988, 1989
  • GOLEM - Muggleton & Feng 1990
  • FOIL - Quinlan 1990; mFOIL - Dzeroski 1991; FOCL - Brunk & Pazzani; FFOIL - Quinlan 1996
  • CLAUDIEN - De Raedt & Bruynooghe 1993
  • PROGOL - Muggleton et al. 1995
  • FORS - Karalic & Bratko 1996
  • TILDE - Blockeel & De Raedt 1997
  • ....
• Studies comparing the inductive learning system FOIL with its variants FOCL, mFOIL, FFOIL, FOIL-I, FOSSIL, FOIDL and IFOIL
The application field of ILP
• Application areas:
  • Knowledge acquisition for expert systems
  • Knowledge discovery in databases
  • Scientific knowledge discovery
  • Logic program synthesis and verification
  • Inductive data engineering
• Successful applications:
  • Finite element mesh design
  • Structure-activity prediction for drug design
  • Protein secondary-structure prediction
  • Predicting mutagenicity of chemical compounds
Summary
• Inductive logic programming (ILP) = Inductive concept learning (I) $\cap$ Logic Programming (LP)
• Empirical vs. interactive ILP systems
• Generalisation vs. specialisation techniques