Learning from relational databases using recurrent neural networks Werner Uwents Hendrik Blockeel Katholieke Universiteit Leuven, Belgium
Disclaimer • I’m not an expert in neural networks… • This talk expresses a “relational learner’s” point of view • Will discuss: • Learning from relational databases, as opposed to propositional learning • Need for constructing features in the form of aggregate functions • How to do this with neural networks • Some experimental results
Propositional versus relational learning • Attribute-value (“propositional”) learning setting: • 1 data element = 1 vector • In RDB: one tuple • This is the setting most learning/mining systems use • [Figure: a single table R with attributes A1…A6] • Learn f such that f(A1,…,A5) = A6
Propositional versus relational learning • Relational learning setting: • 1 data element = 1 set of [sets of]* vectors (and vectors may be of different types) • In RDB: one tuple linked to any number of other tuples • How to deal with this when learning • symbolic expressions? • neural networks? • [Figure: table R(A1…A6) linked to table S(B1…B4)] • Learn f such that f(A1,…,A5, {(B11,B12,B13,B14), (B21,B22,B23,B24), …}) = A6
Propositional versus relational learning • Relational learning setting: • 1 data element = 1 set of [sets of]* vectors (and vectors may be of different types) • In RDB: one tuple linked to any number of other tuples • How to deal with this when learning • symbolic expressions? • neural networks? • [Figure: table R(A1…A6) linked to S(B1…B4), which is in turn linked to T(C1,C2)] • Learn f such that f(A1,…,A5,B) = A6 with B = {(Bi1,Bi2,Bi3,Bi4,Ci)} and Ci = {(Cij1,Cij2)}
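To make this nested setting concrete, here is a minimal Python sketch of what one such data element could look like; the structure follows the schema on this slide, but all field names and values are made up for illustration.

```python
# A single relational "data element": one target tuple from R, linked to a
# variable number of S-tuples, each linked in turn to a variable number of
# T-tuples. Values are invented for illustration.
example = {
    "A": [0.2, 1.0, 3.5, 0.0, 7.1],           # A1..A5 of the target tuple
    "S": [                                     # set of linked S-tuples
        {"B": [1.0, 0.3, 2.2, 0.0],            # B1..B4
         "T": [[0.5, 1.1], [0.9, 0.0]]},       # nested set of T-tuples (C1, C2)
        {"B": [0.0, 4.1, 1.7, 2.5],
         "T": []},                             # a linked tuple with no T-tuples
    ],
}
target = 1.0                                   # A6, the value to predict
```

The point of the example is that the input is not a fixed-length vector: the number of linked tuples (and of tuples linked to those) varies from one data element to the next.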
Relational algebra expressions • In relational algebra, we look for models containing expressions of the form A(C(R)) • With A an aggregation function (count, sum, avg, max, min, …) • And C a selection (a condition selecting tuples from a table R) • Need to learn combinations of A and C • Little explored up to now • In ILP: A = ∃ (existence test), C can be complex • Approaches with predefined aggregation functions: A = count/avg/sum/…, but typically C is trivial
Example of A(C(R)) • If you have children, you get some money for raising them… • The money received depends on • Number of children? • More precisely: the number of children that live with their parents and do not have an income of their own • Until recently, it was problematic or impossible to learn such (simple) rules
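As a minimal sketch of this feature (field names and values are hypothetical, not from any real dataset): A is a count and C is the selection "lives with parents and has no own income".

```python
# A(C(R)) for the child-benefit example:
#   C = selection: lives with parents and has no own income
#   A = aggregation: count
def eligible_children(children):
    selected = [c for c in children                       # C: selection
                if c["lives_with_parents"] and c["income"] == 0]
    return len(selected)                                  # A: aggregation (count)

children = [
    {"lives_with_parents": True,  "income": 0},
    {"lives_with_parents": True,  "income": 12000},
    {"lives_with_parents": False, "income": 0},
]
print(eligible_children(children))  # -> 1
```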
Complexity of A versus C • Trivial A, complex C: inductive logic programming • Complex A, trivial C: language bias containing aggregate functions (avg, …) • Complex A, complex C: complex aggregates in ILP (Vens et al., 2003–2007); relational neural nets? (subsymbolic C+A)
Relational Neural Networks (Uwents & Blockeel, ILP 2005) • Need to process an indefinite number of tuples per prediction (those in the sets) • Therefore: use recurrent neural networks • Feed the tuples in one by one, accumulating the information they contain • An ANN can express selection conditions on single tuples • A recurrent ANN can (on top of that) implement an aggregate function • Hypothesis: we can learn recurrent ANNs that express models containing A(C(T)) expressions (see the sketch below)
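A minimal numpy sketch of the accumulation idea, assuming a single 1–N relation. This is not the authors' actual architecture: the weights below are random, whereas in practice they would be learned (with backpropagation through time for the recurrent part).

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions (arbitrary for this sketch)
d_rel, d_tgt, d_hid = 4, 5, 8

# Randomly initialised weights; in a real system these are trained.
W_in  = rng.normal(size=(d_hid, d_rel))    # related tuple -> hidden state
W_rec = rng.normal(size=(d_hid, d_hid))    # hidden -> hidden (recurrence)
W_out = rng.normal(size=(1, d_tgt + d_hid))

def predict(target_attrs, related_tuples):
    """Feed the related tuples one by one, accumulate a hidden state,
    then combine it with the target tuple's own attributes."""
    h = np.zeros(d_hid)                          # empty set -> all-zero state
    for t in related_tuples:                     # order is randomised in training
        h = np.tanh(W_in @ np.asarray(t) + W_rec @ h)
    x = np.concatenate([np.asarray(target_attrs), h])
    return np.tanh(W_out @ x)[0]

# One data element: a target tuple plus a set of related tuples.
print(predict([0.2, 1.0, 3.5, 0.0, 7.1],
              [[1.0, 0.3, 2.2, 0.0], [0.0, 4.1, 1.7, 2.5]]))
```

The recurrent cell can implement a selection on each tuple (which tuples affect the state) and, through the recurrence, an aggregation over the whole set.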
Structure of relational neural networks • Can follow structure of RDB schema • [Figure: RDB schema with tables R and S in a 1–N relationship]
Structure of relational neural networks • Can follow structure of RDB schema • [Figure: worked example with tables R(A1,A2,A3) and S(B1,B2), sample tuples, and the corresponding network]
Structure of relational neural networks • Recursively apply the same principle… • [Figure: schema with tables R, S and T; R–S is a 1–N relationship and S–T an M–N relationship]
Relational neural networks • Some tricks & alternatives: • Choice of different architectures for recurrent neural nets • For complete 1-1 and N-1 relations, related tuples can simply be joined with target tuple • Need to indicate empty sets • One extra boolean input • Sets are not order dependent: randomize order (reshuffle tuples in sets during training)
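Two of these tricks, sketched with hypothetical helper functions (not the authors' code): an extra boolean input marking empty sets, and reshuffling the tuples in each set for every training pass.

```python
import random

def encode_set(related_tuples, width):
    """Empty-set trick (hypothetical helper): prepend one boolean input that
    tells the network whether the set of related tuples is empty."""
    if not related_tuples:
        return [[1.0] + [0.0] * width]        # a single "this set is empty" marker
    return [[0.0] + list(t) for t in related_tuples]

def shuffled_copies(related_tuples, n_epochs):
    """Order-independence trick: reshuffle the tuples in a set for every
    training epoch, so the recurrent net cannot exploit a fixed order."""
    for _ in range(n_epochs):
        tuples = list(related_tuples)
        random.shuffle(tuples)
        yield tuples

print(encode_set([], width=2))                        # -> [[1.0, 0.0, 0.0]]
for epoch_tuples in shuffled_copies([[1.0, 0.3], [0.0, 4.1], [2.2, 0.7]], 3):
    print(epoch_tuples)
```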
Experimental questions • 1) How well can we learn the aggregates themselves? • 2) How well can we learn models containing features of the form A(C(T))? • 3) How much does predictive performance on some real problems improve? • Not always clear, for a real problem, whether features of the proposed form are relevant
Some datasets • Artificial datasets: “trains” • Variants of the “eastbound/westbound” problems • Simple concept: “has at least two circle loads” • Complex concept: “has more than seven wheels and not more than one open car with a rectangular load OR has more than one circle load” • Real datasets: • Mutagenesis (a set of molecular structures) • Diterpenes (mass spectroscopy data)
Some experimental results • Compared to the symbolic approach: • Sometimes better (“Mutagenesis” data set) • Sometimes worse (“Diterpenes”, “Trains”) • For the trains (artificial) dataset: • RNN worse than the symbolic approach for the “simple” target • RNN as good as the symbolic approach for the “complex” target • Experiments on artificial datasets confirm expectations • Functions that can be expressed as relatively simple relational algebra expressions are easier to learn with symbolic learners • Experiments on “real” datasets • confirm that the RNN approach can be competitive • But no clear view, currently, on when it works better or worse, beyond what the artificial-data experiments confirm
Connection to “graph neural networks” (collab. Siena-Leuven, submitted) • Work by Marco Gori et al. on graph neural networks • Point of departure: learning in graphs • Connection to learning in RDBs is clear: linked tuples are “neighbors” in the graph • Comparison of RNNs and GNNs: • RNNs naturally allow for different “types” of nodes, with weights depending on these types • Different learning strategy • Experimental results: • Learning just the aggregate function: some functions are easier to learn with GNNs, others with RNNs • Real datasets: GNNs perform better or equally well
Ongoing work • A more extensive comparison between • recurrent nets, • nets with variable-sized input vectors and weight sharing, • and the GNN training method • More difficult aggregate functions • Not only the standard avg, sum, count, … but also sums of products of two attributes, etc. • Results not yet available…
To conclude • Learning models with features that combine aggregation and selection is an important problem • Some results obtained in symbolic learning • Subsymbolic approaches might work even better in some cases • Some results obtained with (recurrent) neural networks