Inductive Logic Programming and its use in Datamining
Filip Zelezny
Center of Applied Cybernetics, Faculty of Electrotechnics
Czech Technical University in Prague

Structure of Talk
• Intro: ML & Datamining
• ILP: Motivation, Concept
• Basic Technique
• Some Applications
• Novel Approaches
• Conclusions

Introduction
• Machine Learning (ML)
  • A subfield of artificial intelligence that studies artificial systems that improve their behavior on the basis of experience, described formally by data. This is often achieved by reasoning analogically, or by building a model of the given domain from the data.
  • E.g. pattern recognition by a trained neural network
• Data Mining (DM)
  • Concerned with discovering understandably formulated knowledge that is valid but previously unknown in the given data. This is often achieved by employing ML methods that produce human-understandable models with predictive (e.g. predict an object attribute knowing the other attributes) or descriptive (e.g. find a frequently repeating pattern in the data) capabilities.
  • E.g. the ‘shopping bag rule’: sausage → mustard

ILP: Points of View
• Software Engineering View
  • ILP synthesizes logic programs from examples
  • ... but the programs may be used for data classification
• Machine Learning View
  • ILP develops theories about data using predicate logic
  • ... but the theories are as expressive as algorithms (Turing machine)

A Motivation

Data Mining Example 1
• Table of cars (not reproduced in this transcript)
• Predict the attribute ‘affordable’!
• Rule discovered:
  size=small & luxury=low → affordable
• Attribute-value learning is appropriate.
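A minimal sketch of how the discovered rule could be applied to a single table. The original table of cars is not available here, so the three rows below are invented purely for illustration; only the rule itself comes from the slide.

```python
# Hypothetical car table: the rows and values are invented for illustration.
cars = [
    {"name": "car_a", "size": "small", "luxury": "low"},
    {"name": "car_b", "size": "small", "luxury": "high"},
    {"name": "car_c", "size": "large", "luxury": "low"},
]

def affordable(car):
    """The rule from the slide: size=small & luxury=low -> affordable."""
    return car["size"] == "small" and car["luxury"] == "low"

# Predict the target attribute for every row of the single table.
predicted = [car["name"] for car in cars if affordable(car)]
print(predicted)
```

Each row is classified from its own attributes alone, which is why one flat table suffices in this case.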

Data Mining Example 2 (1) [L. De Raedt, 2000]
• Positive examples
• Negative examples

Data Mining Example 2 (2) [L. De Raedt, 2000]
• How to represent this in attribute-value learning (AVL)?
• Assume a fixed number of objects
• Problem 1: exchange objects 1 & 2
  • exponential number of different representations for the same entity

Data Mining Example 2 (3) [L. De Raedt, 2000]
• Problem 2: Positional relations
  • explosion of false attributes
• Problem 3: Variable number of objects
  • explosion of empty fields
  • explosion of entire table
→ We need a structural representation!

Data Mining Example 2 (4)
• Could be done with more relations (tables)
• BUT! Standard ML / data mining algorithms can work with 1 relation only
  • Neural nets, AQ (rules), C4.5 (decision trees), …
→ We need multirelational learning algorithms!
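The contrast between a fixed-column row and a relational encoding can be sketched as follows. The scene (a circle inside a triangle), the object names `o1` and `o2`, and the helper names `as_row` and `as_facts` are all hypothetical; the point is only that exchanging objects 1 and 2 changes the row but not the set of facts.

```python
# Hypothetical scene: a circle inside a triangle. Object names are invented.

def as_row(objects):
    """Attribute-value encoding: one fixed column slot per object position."""
    return tuple(shape for _, shape in objects)

def as_facts(objects, relations):
    """Relational encoding: an unordered set of ground facts."""
    shape_facts = {(shape, name) for name, shape in objects}
    return frozenset(shape_facts | set(relations))

objects = [("o1", "triangle"), ("o2", "circle")]
swapped = list(reversed(objects))          # exchange objects 1 and 2
relations = {("inside", "o2", "o1")}

# Compare the two encodings of the same scene before and after the swap.
print(as_row(objects) == as_row(swapped))
print(as_facts(objects, relations) == as_facts(swapped, relations))
```

A set of facts also grows naturally with the number of objects, so no empty columns are needed for scenes with fewer objects.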

The language of Prolog

The Language of Prolog - Informal Introduction (1)
• Ground facts (predicates with constants)
  add(1,1,2).
• Variables
  add(X,0,X).
• Functions
  e.g. s(X) - successor of X
• Rules (implications)
  add(s(X),Y,s(Z)) ← add(X,Y,Z).
  add(0,X,X).
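To make the two addition clauses concrete, here is a small Python sketch that mirrors them with successor terms. It is only an analogy: the Python function computes in one direction, whereas the Prolog clauses define a relation. The tuple encoding of `s(X)` and the helper `to_int` are assumptions made for this illustration.

```python
# Peano numerals: 0 is the integer 0, s(X) is the tuple ("s", X).
ZERO = 0

def s(x):
    """Build the successor term s(X)."""
    return ("s", x)

def add(x, y):
    """Mirror of the clauses add(0,X,X) and add(s(X),Y,s(Z)) <- add(X,Y,Z)."""
    if x == ZERO:
        return y
    _, pred = x
    return s(add(pred, y))

def to_int(n):
    """Count the nested successors of a Peano numeral."""
    count = 0
    while n != ZERO:
        _, n = n
        count += 1
    return count

one, two = s(ZERO), s(s(ZERO))
print(to_int(add(two, one)))
```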

The Language of Prolog - Informal Introduction (2)
• Invertibility
  minus(A,B,C) ← add(B,C,A).
• Functions can be avoided (flattening)
  suc(X,Y) ← X is Y-1.   (built-in arithmetic)
  add(0,X,X).
  add(X,Y,Z) ← suc(A,X) & suc(B,Z) & add(A,Y,B).
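Invertibility can be illustrated by treating `add` as a relation (a set of triples) and querying it with different arguments left unbound. This is a sketch under the assumption of a small bounded integer range; `LIMIT`, `query`, and the use of `None` as a variable are illustrative choices, not Prolog's actual execution model.

```python
from itertools import product

LIMIT = 6  # assumed bound on the integers considered, for illustration only

# The add relation as a finite set of triples (X, Y, Z) with X + Y = Z.
ADD = {(x, y, x + y) for x, y in product(range(LIMIT), repeat=2)}

def query(relation, pattern):
    """Return all triples matching the pattern; None plays the role of a variable."""
    return sorted(
        t for t in relation
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

def minus(a, b):
    """minus(A,B,C) <- add(B,C,A): solve for C by querying add with C unbound."""
    return [c for (_, c, _) in query(ADD, (b, None, a))]

print(query(ADD, (2, 3, None)))  # add used "forwards"
print(minus(5, 2))               # the same relation used "backwards"
```

The same relation serves both directions; only the choice of which argument is left unbound changes.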

The ILP Concept

Deduction (in Logic Programming)
• A priori (background) knowledge about integers
  suc(X,Y) ← X is Y-1.
• Theory (hypothesis) about addition
  add(0,X,X).
  add(X,Y,Z) ← suc(A,X) & suc(B,Z) & add(A,Y,B).
• Positive examples of addition
  add(1,1,2), add(3,5,8), add(4,1,5), ...
• Negative examples of addition
  add(1,3,5), add(8,7,6), add(1,1,1), ...

Induction (in Inductive Logic Programming)
• A priori (background) knowledge about integers
  suc(X,Y) ← X is Y-1.
• Positive examples of addition
  add(1,1,2), add(3,5,8), add(4,1,5), ...
• Negative examples of addition
  add(1,3,5), add(8,7,6), add(1,1,1), ...
• Theory (hypothesis) about addition
  add(0,X,X).
  add(X,Y,Z) ← suc(A,X) & suc(B,Z) & add(A,Y,B).
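The acceptance criterion behind induction, a theory that covers all positive examples and no negative ones, can be sketched as a simple filter over candidate hypotheses. The examples are the ones on the slide; the three candidate hypotheses are invented for illustration and are written as Python predicates rather than logic programs.

```python
# Examples copied from the slide.
positives = [(1, 1, 2), (3, 5, 8), (4, 1, 5)]
negatives = [(1, 3, 5), (8, 7, 6), (1, 1, 1)]

# Hypothetical candidate theories, each given as a Python predicate.
candidates = {
    "z is x + y": lambda x, y, z: z == x + y,
    "z is x * y": lambda x, y, z: z == x * y,
    "z > x": lambda x, y, z: z > x,
}

def consistent(hypothesis):
    """True if the hypothesis covers every positive and no negative example."""
    covers_all_pos = all(hypothesis(*e) for e in positives)
    covers_no_neg = not any(hypothesis(*e) for e in negatives)
    return covers_all_pos and covers_no_neg

accepted = [name for name, h in candidates.items() if consistent(h)]
print(accepted)
```

An ILP system does not receive such a list of candidates; it has to construct them by searching a space of clauses.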

Basic ILP Technique (1)
• Search through a clause implication lattice
  • From general to specific (top-down)
  • From specific to general (bottom-up)
add(X,Y,Z)
add(X,Y,Z) ← suc(A,X)
add(X,Y,Z) ← suc(B,Z)
add(X,Y,Z) ← suc(A,X), suc(B,X)
... etc.
add(X,Y,Z) ← suc(A,X) & suc(B,Z) & add(A,Y,B)
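A top-down walk through such a lattice can be sketched with a very simple refinement operator that specializes a clause by adding one body literal at a time. This is a deliberately reduced illustration: the literal pool `POOL` and the names `refinements`, `levels`, and `show` are assumptions, and a real ILP system would also refine by unifying variables and would prune the search using the examples.

```python
# Hypothetical pool of body literals the search may add to a clause.
POOL = ("suc(A,X)", "suc(B,Z)", "add(A,Y,B)")
HEAD = "add(X,Y,Z)"

def refinements(body):
    """Specialize a clause by adding one unused literal to its body."""
    return [body | {lit} for lit in POOL if lit not in body]

def levels(max_depth):
    """Enumerate the lattice level by level, from the most general clause down."""
    level = [frozenset()]
    for _ in range(max_depth + 1):
        yield level
        level = sorted({r for body in level for r in refinements(body)}, key=sorted)

def show(body):
    """Render a clause body together with the fixed head."""
    return HEAD if not body else HEAD + " <- " + " & ".join(sorted(body))

for depth, level in enumerate(levels(3)):
    print(depth, [show(b) for b in level])
```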

Basic ILP Technique (2)
• Clauses usually constructed one-by-one
  • e.g. specialize until the clause covers no negatives, then begin a new clause for the remaining positives
• Implication is undecidable
  • instead use syntactic subsumption (NP-hard)
  • measure generality of clause with background knowledge
• Efficiency: use strong bias!
  • syntactic:
    • indicate input/output variables; maximum clause length
  • semantic: e.g. preference heuristics
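Syntactic subsumption is commonly realized as θ-subsumption: clause C subsumes clause D if some substitution θ maps every literal of C onto a literal of D. The following is a minimal backtracking sketch under simplifying assumptions: terms are flat (no function symbols), literals are encoded as `(sign, predicate, args)` tuples with `"+"` for the head and `"-"` for body literals, and variables of D are treated as fixed symbols. The encoding and function names are illustrative, not taken from any ILP system.

```python
def is_var(term):
    """Prolog convention: identifiers starting with an uppercase letter are variables."""
    return isinstance(term, str) and term[:1].isupper()

def match(lit_c, lit_d, theta):
    """Extend theta so that lit_c under theta equals lit_d, or return None."""
    sign_c, pred_c, args_c = lit_c
    sign_d, pred_d, args_d = lit_d
    if (sign_c, pred_c, len(args_c)) != (sign_d, pred_d, len(args_d)):
        return None
    theta = dict(theta)
    for a, b in zip(args_c, args_d):
        if is_var(a):
            if theta.setdefault(a, b) != b:   # variable already bound elsewhere
                return None
        elif a != b:                          # constants must agree exactly
            return None
    return theta

def subsumes(c, d, theta=None):
    """True if some substitution theta maps every literal of c into d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    first, rest = c[0], c[1:]
    for lit_d in d:
        extended = match(first, lit_d, theta)
        if extended is not None and subsumes(rest, d, extended):
            return True
    return False

general = [("+", "add", ("X", "Y", "Z")), ("-", "suc", ("A", "X"))]
specific = general + [("-", "suc", ("B", "Z")), ("-", "add", ("A", "Y", "B"))]
print(subsumes(general, specific), subsumes(specific, general))
```

The backtracking over all literals of D for each literal of C is what makes the test expensive in the worst case.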

Applications

Protein Structure Prediction (1) [Muggleton, 1992]
• Predict the secondary structure of a protein
• examples:
  • alpha(Protein, Position). - residue at Position in Protein is in an alpha helix.
  • negatives: all other residues
• background knowledge:
  • position(Protein, Pos, Residue)
  • chem. properties of Residues
  • basic arithmetic
  • etc.

Protein Structure Prediction (2) [Muggleton, 1992]
• Results
  alpha0(A,B) ← ... position(A,D,O) & not_aromatic(O) & small_or_polar(O) & position(A,B,C) & very_hydrophobic(C) & not_aromatic(C) ... etc. (22 literals)
  • added to background knowledge, then 2nd search
  alpha1(A,B) ← oct(D,E,F,G,B,H,I,J,K) & alpha0(A,F) & alpha0(A,G).
  • again added to background knowledge (B) for the 3rd search
  alpha2(A,B) ← oct(C,D,E,F,B,G,H,I,J) & alpha1(A,B) & alpha1(A,G) & alpha1(A,H).

Protein Structure Prediction (3) [Muggleton, 1992]
• Final accuracy on testing set: 81%
• Best previous result (neural net): 76%
• General-purpose bottom-up ILP system Golem used.
• Experiment published in the journal Protein Engineering.

Mutagenicity Prediction [Srinivasan, 1995]
• Predict mutagenicity (carcinogenicity) of chemicals with the general system Progol [Muggleton]
• Examples: compounds (active, inactive)
• Result: structural alert

Datamining in Telephony [Zelezny, Stepankova, Zidek 2000]
• Discover frequent patterns of operations in an enterprise telephone exchange
• Examples: history of calls + related attributes
• Result: e.g. rule (lower case ~ constant)
  redirection(A,B,C,10) ← day(tuesday,A) & prefix(C,[5,0],2).
  covers:
  redirection([15], [13,14,48], [5,0,0,0,0,0,0,0], 10).
  redirection([15], [14,18,58], [5,0,9,6,0,1,8,9], 10).
  redirection([22], [18,50,30], [5,0,0,0,0,0,0,0], 10).
  redirection([29], [13,35,56], [5,0,0,0,0,0,0,0], 10).
  redirection([29], [13,57,36], [5,0,0,0,0,0,0,0], 10).
• Predicates day, prefix, etc. in background knowledge.
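Part of the rule body can be checked directly against the five covered facts. The sketch below assumes that prefix(C,P,N) means "the first N digits of C are P"; this reading is a guess, since the slides do not define the predicate. The day(tuesday,A) condition is not checked because the calendar behind the day numbers is not given.

```python
# The five covered facts from the slide: redirection(A, B, C, 10).
covered = [
    ([15], [13, 14, 48], [5, 0, 0, 0, 0, 0, 0, 0], 10),
    ([15], [14, 18, 58], [5, 0, 9, 6, 0, 1, 8, 9], 10),
    ([22], [18, 50, 30], [5, 0, 0, 0, 0, 0, 0, 0], 10),
    ([29], [13, 35, 56], [5, 0, 0, 0, 0, 0, 0, 0], 10),
    ([29], [13, 57, 36], [5, 0, 0, 0, 0, 0, 0, 0], 10),
]

def prefix(number, digits, length):
    """Assumed meaning of prefix/3: the first `length` digits of `number` are `digits`."""
    return len(digits) == length and number[:length] == digits

def body_prefix_holds(fact):
    """Check the constant 10 in the head and the prefix literal of the body."""
    _a, _b, c, last = fact
    return last == 10 and prefix(c, [5, 0], 2)

print(all(body_prefix_holds(f) for f in covered))
```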

Other Applications
• Finite element mesh design
• Control of dynamical systems
  • qualitative simulation
• Software engineering
• Many more, especially in data mining

Novel Approaches

Descriptive ILP
• Examples are interpretations (models)
  triangle(t,up) & circle(c1) & inside(c1,t) & circle(c2) & right_of(c2,t) & class(positive)
  is one example
• Hypothesis must be true in all examples
  class(positive) ← triangle(X,Y) & circle(Z) & inside(Z,X).
• Suited for data mining
  • finds ALL true hypotheses - maximum characterisation
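What it means for a hypothesis to be true in an interpretation can be sketched as follows: the rule is violated only if its body can be satisfied while its head is absent. The example interpretation follows the slide, with the circle inside the triangle taken to be c1; the function names are illustrative.

```python
from itertools import product

# The example interpretation from the slide, as a set of ground facts.
example = {
    ("triangle", "t", "up"), ("circle", "c1"), ("inside", "c1", "t"),
    ("circle", "c2"), ("right_of", "c2", "t"), ("class", "positive"),
}

def body_holds(facts):
    """Is triangle(X,Y) & circle(Z) & inside(Z,X) satisfiable in the facts?"""
    triangles = [f[1] for f in facts if f[0] == "triangle"]
    circles = [f[1] for f in facts if f[0] == "circle"]
    return any(("inside", z, x) in facts for x, z in product(triangles, circles))

def rule_true_in(facts):
    """class(positive) <- body is true unless the body holds and the head does not."""
    return (not body_holds(facts)) or ("class", "positive") in facts

print(rule_true_in(example))
```

A descriptive ILP system would keep the rule only if this check succeeds for every example interpretation.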

Descriptive ILP – Application [Zelezny, Stepankova, Zidek / ILP 2000]
• Call logging (mixed events)
• Examples of single events (sets of actions and their logs)
• Such as
  t(time(19,43,48),[1,2],time(19,43,48),e,li,empty,d,empty,empty,ex,[0,6,0,2,3,3,0,5,3,3],empty,anstr([0,0,5,0,0,0]),fe,fe,id(4)).
  t(time(19,43,48),[1,2],time(19,43,50),e,lb,e(relcause),d,dr,06,ex,[0,6,0,0,0,0,0,0,0,0],empty,anstr([0,0,5,0,0,0]),fe,fe,id(5)).
  ex_ans([0,6,0,2,3,3,0,5,3,3],[1,2]).
  hangsup([0,6,0,2,3,3,0,5,3,3]).

Descriptive ILP – Application [Zelezny, Stepankova, Zidek / ILP 2000]
• Results
• Rules that describe actions in terms of logging records
• Such as
  ex_ans(RNCA1,DN1) :-
      t(D1,IT1,DN1,ET1,e,li,empty,d,EF1,FI1,ex,RNCA1,empty,ANTR1,CO1,DE1,ID1),
      IT2=ET1,
      ANTR2=ANTR1,
      t(D2,IT2,DN2,ET2,e,lb,RC2,d,EF2,FI2,ex,RNCA2,empty,ANTR2,CO2,DE2,ID2),
      samenum(RNCA1,RNCA2).

Upgrades of Propositional Learners: 1st-order Decision Trees
• Upgrades the C4.5 algorithm
• E.g. Tilde [Blockeel, De Raedt]
[Figure: first-order decision tree with test nodes ?- circle(C1), ?- triangle(T,up) & inside(C1,T), ?- circle(C2) & inside(C1,C2) and leaves class(positive), class(negative)]

More Upgrades of Propositional Learners
• 1st-order association rules
  • the WARMR system [Dehaspe]
  • upgrade of Apriori
• 1st-order Bayesian nets
• 1st-order clustering
• 1st-order distance-based learning [Zelezny / ILP 2001]

Concluding Remarks
• Advantages of ILP
  • Theoretical: Turing-equivalent expressive power
  • Practical: rich but understandable language, integration of background knowledge, MULTI-relational data mining
• Problems still to be solved...
  • efficiency, handling numbers, user interfaces

Find out more
• On
  • ML and DM literature, sources
  • Our ML and DM group
    • What we do
    • How you can participate
  • Etc.
http://cyber.felk.cvut.cz/gerstner/machine-learning
