Introduction to Inductive Logic Programming (ILP)

Inductive Logic Programming Luis Tari

Logic Programming
• Consider the following example of a logic program:
  parent_of(charles,george).
  parent_of(george,diana).
  parent_of(bob,harry).
  parent_of(harry,elizabeth).
  grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y).
• From the program, we can ask queries about grandparents.
• Query: grandparent_of(X,Y)?
• Answers:
  • grandparent_of(charles,diana).
  • grandparent_of(bob,elizabeth).
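To make the query concrete, here is a minimal Python sketch that evaluates the grandparent rule over the parent facts by joining the relation with itself. The names PARENT_OF and grandparent_of are illustrative choices; a Prolog engine would answer the same query by resolution rather than by a set comprehension.

```python
# Facts from the logic program, stored as (parent, child) pairs.
PARENT_OF = {
    ("charles", "george"),
    ("george", "diana"),
    ("bob", "harry"),
    ("harry", "elizabeth"),
}


def grandparent_of(parent_of):
    """Return every (X, Y) with parent_of(X, Z) and parent_of(Z, Y) for some Z."""
    return {
        (x, y)
        for (x, z1) in parent_of
        for (z2, y) in parent_of
        if z1 == z2  # the shared variable Z links the two body literals
    }


answers = grandparent_of(PARENT_OF)
for x, y in sorted(answers):
    print(f"grandparent_of({x},{y}).")
```

The two printed facts match the answers listed for the query grandparent_of(X,Y)?.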

What is ILP?
• Inductive Logic Programming (ILP)
  • Automated learning of logic rules from examples and background knowledge
  • Example: learn the rule for grandparents, given background knowledge of parents and examples of grandparents
• ILP can be used for classification and prediction
• ILP generates explicit hypotheses (rules), unlike the “black box” approach of classifiers such as SVMs

ILP – formal definitions
• Given
  • a logic program B representing background knowledge
  • a set of positive examples E+
  • a set of negative examples E-
• Find hypothesis H such that:
  • B ∪ H ⊨ e for every e ∈ E+.
  • B ∪ H ⊭ f for every f ∈ E-.
  • B ∪ H is consistent.
• Assume that B ⊭ e for some e ∈ E+.

Example
• Background knowledge B:
  • parent_of(charles,george).
  • parent_of(george,diana).
  • parent_of(bob,harry).
  • parent_of(harry,elizabeth).
• Positive examples E+:
  • grandparent_of(charles,diana).
  • grandparent_of(bob,elizabeth).
• Generate hypothesis H:
  • grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y).
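The requirements on H can be checked mechanically for this example. The sketch below tests two of them: every positive example is derivable from B together with H, and no negative example is. The slide gives no negative examples, so E_NEG is a hypothetical set added for illustration, and each hypothesis is hand-coded as a Python function rather than interpreted from Prolog text.

```python
# Background knowledge B: parent_of facts as (parent, child) pairs.
B = {
    ("charles", "george"),
    ("george", "diana"),
    ("bob", "harry"),
    ("harry", "elizabeth"),
}

# Positive examples from the slide.
E_POS = {("charles", "diana"), ("bob", "elizabeth")}
# Hypothetical negative examples (not in the original slides).
E_NEG = {("charles", "george"), ("diana", "charles")}


def h_grandparent(b):
    """H: grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y)."""
    return {(x, y) for (x, z1) in b for (z2, y) in b if z1 == z2}


def h_parent(b):
    """A wrong hypothesis: grandparent_of(X,Y) :- parent_of(X,Y)."""
    return set(b)


def is_solution(derive, b, e_pos, e_neg):
    """Check that B with H derives all of E+ and none of E-."""
    derived = derive(b)
    covers_all_positives = e_pos <= derived
    covers_no_negatives = not (e_neg & derived)
    return covers_all_positives and covers_no_negatives


print(is_solution(h_grandparent, B, E_POS, E_NEG))
print(is_solution(h_parent, B, E_POS, E_NEG))
```

The first hypothesis passes both checks; the second derives the hypothetical negative grandparent_of(charles,george) and misses both positives.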

ILP systems
• Two of the most popular ILP systems:
  • Progol
  • FOIL
• Progol [Muggleton95]
  • Developed by S. Muggleton et al.
  • Learns first-order Horn clauses (no negation in head and body literals of hypotheses)
• FOIL [Quinlan93]
  • Developed by J. Quinlan et al.
  • Learns first-order rules (no negation in head literals of the hypotheses)

Rule Learning (Intuition)
• How do we come up with the rule for grandparent_of(X,Y)?
• Take the example grandparent_of(bob,elizabeth).
• Find the subset of background knowledge relevant to this example: parent_of(bob,harry), parent_of(harry,elizabeth).
• Form a rule from these facts:
  grandparent_of(bob,elizabeth) :- parent_of(bob,harry), parent_of(harry,elizabeth).
• Generalize the rule:
  grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y).
• Check whether this rule is valid with respect to the positive and negative examples.
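The generalization step can be illustrated by replacing each constant in the ground rule with a variable, using the same variable for every occurrence of the same constant. This is only a sketch of that one step; real ILP systems use more elaborate generalization operators, and the tuple representation and helper names (variabilize, show) are assumptions made for this example.

```python
def variabilize(head, body):
    """Replace each distinct constant with a fresh variable, consistently."""
    names = iter("XYZUVW")  # enough variable names for this small example
    mapping = {}

    def lift(literal):
        predicate, *args = literal
        lifted = []
        for constant in args:
            if constant not in mapping:
                mapping[constant] = next(names)
            lifted.append(mapping[constant])
        return (predicate, *lifted)

    # The head is lifted first, so its constants receive X and Y.
    return lift(head), [lift(literal) for literal in body]


def show(head, body):
    """Render a rule in Prolog-like syntax."""
    def fmt(literal):
        return f"{literal[0]}({','.join(literal[1:])})"
    return f"{fmt(head)} :- {', '.join(fmt(l) for l in body)}."


ground_head = ("grandparent_of", "bob", "elizabeth")
ground_body = [("parent_of", "bob", "harry"), ("parent_of", "harry", "elizabeth")]

general_head, general_body = variabilize(ground_head, ground_body)
print(show(ground_head, ground_body))
print(show(general_head, general_body))
```

The second printed line is grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y)., the generalized rule from the slide.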

Progol Algorithm Outline
1. From a positive example, construct the most specific rule rs.
2. Based on rs, find a generalized form rg of rs such that score(rg) is the highest among all candidates.
3. Remove all positive examples that are covered by rg.
4. Go to step 1 if there are still positive examples that are not yet covered.
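The outline is a covering loop, which can be sketched in Python as follows. This is not Progol's actual search: the candidate set is a hand-written list of three rules, the negative examples are hypothetical, and the score used is the simple "positives minus (negatives plus body length)" measure. The names Rule, candidates, score and learn are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Pair = Tuple[str, str]

PARENT_OF: Set[Pair] = {
    ("charles", "george"), ("george", "diana"),
    ("bob", "harry"), ("harry", "elizabeth"),
}
PEOPLE = {name for pair in PARENT_OF for name in pair}


@dataclass(frozen=True)
class Rule:
    text: str                        # human-readable form of the rule
    body_len: int                    # number of body literals
    covers: Callable[[Pair], bool]   # does the rule derive grandparent_of(x, y)?


def candidates(seed: Pair) -> List[Rule]:
    """Hypothetical search space: the seed's most specific rule and two generalizations."""
    return [
        Rule(f"most specific rule for grandparent_of({seed[0]},{seed[1]})", 2,
             lambda e: e == seed),
        Rule("grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y).", 2,
             lambda e: any((e[0], z) in PARENT_OF and (z, e[1]) in PARENT_OF
                           for z in PEOPLE)),
        Rule("grandparent_of(X,Y) :- parent_of(X,Z).", 1,
             lambda e: any((e[0], z) in PARENT_OF for z in PEOPLE)),
    ]


def score(rule: Rule, e_pos: Set[Pair], e_neg: Set[Pair]) -> int:
    p = sum(1 for e in e_pos if rule.covers(e))
    n = sum(1 for e in e_neg if rule.covers(e))
    return p - (n + rule.body_len)


def learn(e_pos: Set[Pair], e_neg: Set[Pair]) -> List[Rule]:
    remaining = set(e_pos)
    theory: List[Rule] = []
    while remaining:                                   # step 4: loop until all covered
        seed = min(remaining)                          # step 1: pick a positive example
        best = max(candidates(seed),                   # step 2: best-scoring candidate
                   key=lambda r: score(r, remaining, e_neg))
        covered = {e for e in remaining if best.covers(e)}
        if not covered:                                # safety stop for this sketch
            break
        theory.append(best)
        remaining -= covered                           # step 3: remove covered examples
    return theory


E_POS = {("charles", "diana"), ("bob", "elizabeth")}
E_NEG = {("charles", "george"), ("bob", "harry")}      # hypothetical negatives
theory = learn(E_POS, E_NEG)
for rule in theory:
    print(rule.text)
```

With these inputs a single pass suffices: the two-literal grandparent rule covers both positive examples and neither negative, so it outscores the other two candidates.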

Scoring hypotheses
• score(r) is a measure of how well a rule r explains all the examples, with preference given to shorter rules.
• pr = number of positive examples deducible from r (correctly covered)
• nr = number of negative examples deducible from r (incorrectly covered)
• cr = number of body literals in rule r
• score(r) = pr - (nr + cr)
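A short worked example of the score, assuming the grandparent data from earlier slides plus two hypothetical negative examples. The coverage sets are written out by hand, and score_rule is an illustrative helper, not part of any ILP system.

```python
def score(p_r: int, n_r: int, c_r: int) -> int:
    """score(r) = pr - (nr + cr), as defined on the slide."""
    return p_r - (n_r + c_r)


def score_rule(covered, e_pos, e_neg, body_len):
    """Count covered positives and negatives, then apply the score."""
    p_r = len(covered & e_pos)
    n_r = len(covered & e_neg)
    return score(p_r, n_r, body_len)


E_POS = {("charles", "diana"), ("bob", "elizabeth")}
E_NEG = {("charles", "george"), ("bob", "harry")}  # hypothetical negatives

# grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y).  (2 body literals)
general_score = score_rule(set(E_POS), E_POS, E_NEG, body_len=2)

# Ground rule built from grandparent_of(bob,elizabeth).  (2 body literals)
ground_score = score_rule({("bob", "elizabeth")}, E_POS, E_NEG, body_len=2)

# Hypothetical over-general rule covering every example.  (1 body literal)
overgeneral_score = score_rule(E_POS | E_NEG, E_POS, E_NEG, body_len=1)

print(general_score, ground_score, overgeneral_score)
```

The general rule scores 2 - (0 + 2) = 0, the ground rule 1 - (0 + 2) = -1, and the over-general rule 2 - (2 + 1) = -1, so the general rule is preferred.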

Applications of ILP
• Constructing Biological Knowledge Bases by Extracting Information from Text Sources (M. Craven & J. Kumlien) [Craven99]
• The automatic discovery of structural principles describing protein fold space (A. Cootes, S.H. Muggleton, and M.J.E. Sternberg) [Cootes03]
• More from the UT-ML group (Ray Mooney)
  • http://www.cs.utexas.edu/~ml/publication/ilp.html

Extraction of relations in biomedical text [Craven99]
• Applied to biomedical text
• Used the FOIL system to learn rules describing relations of interest

Relations of interest

Example of relation from text
• We want to extract the following relation:
• Sample sentence from biomedical articles:

From parse trees to background knowledge

Example of a learned rule
• The first two literals indicate that the phrase referencing the subcellular localization follows the phrase referencing the protein, and that there is one phrase separating them.
• Other literals indicate that the sentence must satisfy a particular Naïve Bayes classifier.

Protein fold space [Cootes03]
• Folds of a protein are described “in terms of the spatial and topological arrangements of their regular secondary structure elements.”
• Why use ILP? (The Progol system was used.)
  • “Classification alone will not explain why some types of fold are more prevalent than others or why some potential protein folds are not observed at all.”
• Goal: automatically generate descriptions for fold classes

Protein fold space
• One of the rules generated for the Rossmann fold:
• Fold A belongs to this fold class if:
  • A has a total number of helices between 3 and 4;
  • A has helices B of type h and at core positions b respectively;
  • B contains a glycine in the nterm region;
  • B contains a glycine in the middle region.

References
• [Quinlan93] J. R. Quinlan and R. M. Cameron-Jones. FOIL: A Midterm Report. Proceedings of Machine Learning: ECML-93.
• [Muggleton95] S. Muggleton. Inverse Entailment and Progol. New Generation Computing Journal, 13:245-286, 1995.
• [Craven99] M. Craven and J. Kumlien. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. ISMB 99, 1999.
• [Cootes03] A. Cootes, S.H. Muggleton, and M.J.E. Sternberg. The automatic discovery of structural principles describing protein fold space. Journal of Molecular Biology, 330(4):839-850, 2003.
