Learning from relational databases using recurrent neural networks Werner Uwents Hendrik Blockeel Katholieke Universiteit Leuven, Belgium
Disclaimer • I’m not an expert in neural networks… • This talk expresses a “relational learner’s” point of view • Will discuss: • Learning from relational databases, as opposed to propositional learning • Need for constructing features in the form of aggregate functions • How to do this with neural networks • Some experimental results
Propositional versus relational learning • Attribute-value (“propositional”) learning setting: • 1 data element = 1 vector • In RDB: one tuple • This is the setting most learning/mining systems use • [Figure: a single table R with attributes A1…A6] • Learn f such that f(A1,…,A5) = A6
Propositional versus relational learning • Relational learning setting: • 1 data element = 1 set of [sets of]* vectors (and vectors may be of different types) • In RDB: one tuple linked to any number of other tuples • How to deal with this when learning • symbolic expressions? • neural networks? • [Figure: table R(A1…A6) linked to table S(B1…B4)] • Learn f such that f(A1,…,A5, {(B11,B12,B13,B14), (B21,B22,B23,B24), …}) = A6
Propositional versus relational learning • Relational learning setting: • 1 data element = 1 set of [sets of]* vectors (and vectors may be of different types) • In RDB: one tuple linked to any number of other tuples • How to deal with this when learning • symbolic expressions? • neural networks? • [Figure: table R(A1…A6) linked to S(B1…B4), which is in turn linked to T(C1,C2)] • Learn f such that f(A1,…,A5,B) = A6 with B = {(Bi1,Bi2,Bi3,Bi4,Ci)} and Ci = {(Cij1,Cij2)}
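To make this nested setting concrete, here is a minimal Python sketch of what one such data element could look like; the structure follows the schema on this slide, but all field names and values are made up for illustration.

```python
# A single relational "data element": one target tuple from R, linked to a
# variable number of S-tuples, each linked in turn to a variable number of
# T-tuples. Values are invented for illustration.
example = {
    "A": [0.2, 1.0, 3.5, 0.0, 7.1],           # A1..A5 of the target tuple
    "S": [                                     # set of linked S-tuples
        {"B": [1.0, 0.3, 2.2, 0.0],            # B1..B4
         "T": [[0.5, 1.1], [0.9, 0.0]]},       # nested set of T-tuples (C1, C2)
        {"B": [0.0, 4.1, 1.7, 2.5],
         "T": []},                             # a linked tuple with no T-tuples
    ],
}
target = 1.0                                   # A6, the value to predict
```

The point of the example is that the input is not a fixed-length vector: the number of linked tuples (and of tuples linked to those) varies from one data element to the next.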
Relational algebra expressions • In relational algebra, we look for models containing expressions of the form A(C(R)) • With A an aggregation function (count, sum, avg, max, min, …) • And C a selection (a condition selecting tuples from a table R) • Need to learn combinations of A and C • Little explored up to now • In ILP: A = ∃ (existence test), C can be complex • Approaches with predefined aggregation functions: A = count/avg/sum/…, but typically C is trivial
Example of A(C(R)) • If you have children, you get some money for raising them… • The money received depends on • Number of children? • More precisely: the number of children that live with their parents and do not have an income of their own • Until recently, it was problematic or impossible to learn such (simple) rules
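As a minimal sketch of this feature (field names and values are hypothetical, not from any real dataset): A is a count and C is the selection "lives with parents and has no own income".

```python
# A(C(R)) for the child-benefit example:
#   C = selection: lives with parents and has no own income
#   A = aggregation: count
def eligible_children(children):
    selected = [c for c in children                       # C: selection
                if c["lives_with_parents"] and c["income"] == 0]
    return len(selected)                                  # A: aggregation (count)

children = [
    {"lives_with_parents": True,  "income": 0},
    {"lives_with_parents": True,  "income": 12000},
    {"lives_with_parents": False, "income": 0},
]
print(eligible_children(children))  # -> 1
```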
Complexity of A versus C • Trivial A, complex C: inductive logic programming • Complex A, trivial C: language bias containing aggregate functions (avg, …) • Complex A, complex C: complex aggregates in ILP (Vens et al., 2003–2007); relational neural nets? (subsymbolic C+A)
Relational Neural Networks (Uwents & Blockeel, ILP 2005) • Need to process an indefinite number of tuples per prediction (those in the sets) • Therefore: use recurrent neural networks • Feed the tuples in one by one, accumulating the information they contain • An ANN can express selection conditions on single tuples • A recurrent ANN can (on top of that) implement an aggregate function • Hypothesis: we can learn recurrent ANNs that express models containing A(C(T)) expressions (see the sketch below)
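A minimal numpy sketch of the accumulation idea, assuming a single 1–N relation. This is not the authors' actual architecture: the weights below are random, whereas in practice they would be learned (with backpropagation through time for the recurrent part).

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions (arbitrary for this sketch)
d_rel, d_tgt, d_hid = 4, 5, 8

# Randomly initialised weights; in a real system these are trained.
W_in  = rng.normal(size=(d_hid, d_rel))    # related tuple -> hidden state
W_rec = rng.normal(size=(d_hid, d_hid))    # hidden -> hidden (recurrence)
W_out = rng.normal(size=(1, d_tgt + d_hid))

def predict(target_attrs, related_tuples):
    """Feed the related tuples one by one, accumulate a hidden state,
    then combine it with the target tuple's own attributes."""
    h = np.zeros(d_hid)                          # empty set -> all-zero state
    for t in related_tuples:                     # order is randomised in training
        h = np.tanh(W_in @ np.asarray(t) + W_rec @ h)
    x = np.concatenate([np.asarray(target_attrs), h])
    return np.tanh(W_out @ x)[0]

# One data element: a target tuple plus a set of related tuples.
print(predict([0.2, 1.0, 3.5, 0.0, 7.1],
              [[1.0, 0.3, 2.2, 0.0], [0.0, 4.1, 1.7, 2.5]]))
```

The recurrent cell can implement a selection on each tuple (which tuples affect the state) and, through the recurrence, an aggregation over the whole set.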
Structure of relational neural networks • Can follow structure of RDB schema • [Figure: RDB schema with tables R and S in a 1–N relationship]
Structure of relational neural networks • Can follow structure of RDB schema • [Figure: worked example with tables R(A1,A2,A3) and S(B1,B2), sample tuples, and the corresponding network]
Structure of relational neural networks • Recursively apply the same principle… • [Figure: schema with tables R, S and T; R–S is a 1–N relationship and S–T an M–N relationship]
Relational neural networks • Some tricks & alternatives: • Choice of different architectures for recurrent neural nets • For complete 1-1 and N-1 relations, related tuples can simply be joined with target tuple • Need to indicate empty sets • One extra boolean input • Sets are not order dependent: randomize order (reshuffle tuples in sets during training)
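Two of these tricks, sketched with hypothetical helper functions (not the authors' code): an extra boolean input marking empty sets, and reshuffling the tuples in each set for every training pass.

```python
import random

def encode_set(related_tuples, width):
    """Empty-set trick (hypothetical helper): prepend one boolean input that
    tells the network whether the set of related tuples is empty."""
    if not related_tuples:
        return [[1.0] + [0.0] * width]        # a single "this set is empty" marker
    return [[0.0] + list(t) for t in related_tuples]

def shuffled_copies(related_tuples, n_epochs):
    """Order-independence trick: reshuffle the tuples in a set for every
    training epoch, so the recurrent net cannot exploit a fixed order."""
    for _ in range(n_epochs):
        tuples = list(related_tuples)
        random.shuffle(tuples)
        yield tuples

print(encode_set([], width=2))                        # -> [[1.0, 0.0, 0.0]]
for epoch_tuples in shuffled_copies([[1.0, 0.3], [0.0, 4.1], [2.2, 0.7]], 3):
    print(epoch_tuples)
```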
Experimental questions • 1) How well can we learn the aggregates themselves? • 2) How well can we learn models containing features of the form A(C(T))? • 3) How much does predictive performance on some real problems improve? • Not always clear, for a real problem, whether features of the proposed form are relevant
Some datasets • Artificial datasets: “trains” • Variants of the “eastbound/westbound” problems • Simple concept: “has at least two circle loads” • Complex concept: “has more than seven wheels and not more than one open car with a rectangular load OR has more than one circle load” • Real datasets: • Mutagenesis (a set of molecular structures) • Diterpenes (mass spectroscopy data)
Some experimental results • Compared to the symbolic approach: • Sometimes better (“Mutagenesis” data set) • Sometimes worse (“Diterpenes”, “Trains”) • For the trains (artificial) dataset: • RNN worse than the symbolic approach for the “simple” target • RNN as good as the symbolic approach for the “complex” target • Experiments on artificial datasets confirm expectations • Functions that can be expressed as relatively simple relational algebra expressions are easier to learn with symbolic learners • Experiments on “real” datasets • confirm that the RNN approach can be competitive • But no clear view, currently, on when it works better or worse, beyond what the artificial-data experiments confirm
Connection to “graph neural networks” (collab. Siena-Leuven, submitted) • Work by Marco Gori et al. on graph neural networks • Point of departure: learning in graphs • Connection to learning in RDBs is clear: linked tuples are “neighbors” in the graph • Comparison of RNNs and GNNs: • RNNs naturally allow for different “types” of nodes, with weights depending on these types • Different learning strategy • Experimental results: • Learning just the aggregate function: some functions are easier to learn with GNNs, others with RNNs • Real datasets: GNNs perform better or equally well
Ongoing work • A more extensive comparison between • recurrent nets, • nets with variable-sized input vectors and weight sharing, • and the GNN training method • More difficult aggregate functions • Not only the standard avg, sum, count, … but also sums of products of two attributes, etc. • Results not yet available…
To conclude • Learning models with features that combine aggregation and selection is an important problem • Some results obtained in symbolic learning • Subsymbolic approaches might work even better in some cases • Some results obtained with (recurrent) neural networks