Anna Atramentov and Vasant Honavar* Artificial Intelligence Laboratory

Speeding Up Multi-Relational Data Mining
Anna Atramentov and Vasant Honavar*
Artificial Intelligence Laboratory
Department of Computer Science
Iowa State University
Ames, IA 50011, USA
www.cs.iastate.edu/~honavar/aigroup.html
* Support provided in part by National Science Foundation, Carver Foundation, and Pioneer Hi-Bred, Inc.

Motivation
Importance of relational learning:
• Growth of data stored in multi-relational databases (MRDB)
• Techniques for learning from unstructured data often extract the data into an MRDB
One of the promising approaches to relational learning:
• MRDM (Multi-Relational Data Mining) framework developed by Knobbe et al. (1999)
• MRDTL (Multi-Relational Decision Tree Learning) algorithm implemented by Leiva et al. (2002)
Goal:
• Speed up the MRDM framework and, in particular, the MRDTL algorithm

Problem Formulation
Given: data stored in a relational database
Goal: learn a predictive model for the instances in the target table
[Figure: example of a multi-relational database schema and instances]

MRDM overview. Selection graphs
• Nodes correspond to the tables from the database
• Edges correspond to the associations between tables
• A selection graph corresponds to the subset of instances from the target table that have some property
• It is a way of specifying attributes in the relational setting
[Figure: example selection graph; node labels: Staff, Grad.Student, Department, Grad.Student with GPA > 3.9; condition label: Specialization=math]

MRDM overview. Transforming selection graphs into SQL queries
Generic query:
    select distinct T0.primary_key
    from table_list
    where join_list
    and condition_list
Selection graph: Staff with an edge to Grad.Student
    select distinct T0.id
    from Staff T0, Graduate_Student T1
    where T0.id = T1.Advisor
Selection graph: Staff with an absent edge to Grad.Student
    select distinct T0.id
    from Staff T0
    where T0.id not in (
        select T1.Advisor
        from Graduate_Student T1)
Selection graph: Staff with an edge to Grad.Student and an absent edge to Grad.Student (GPA > 3.9)
    select distinct T0.id
    from Staff T0, Graduate_Student T1
    where T0.id = T1.Advisor
    and T0.id not in (
        select T1.Advisor
        from Graduate_Student T1
        where T1.GPA > 3.9)
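The translation from a selection graph to SQL can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MRDM implementation: the `Node` and `Edge` classes, the `to_sql` helper, and the toy rows are all hypothetical, and absent edges are restricted to leaf nodes to keep the sketch short.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Node:
    table: str
    conditions: list = field(default_factory=list)  # e.g. ["GPA > 3.9"]


@dataclass
class Edge:
    parent: int           # index of the parent node in the node list
    child: int            # index of the child node in the node list
    parent_key: str       # join column on the parent side
    child_key: str        # join column on the child side
    present: bool = True  # False marks an absent edge


def to_sql(nodes, edges, target_key="id"):
    """Build the SQL query for a selection graph whose node 0 is the target.

    Simplifying assumption: an absent edge may only point to a leaf node.
    """
    absent = {e.child for e in edges if not e.present}
    tables = [f"{n.table} T{i}" for i, n in enumerate(nodes) if i not in absent]
    where = []
    # join_list: one equality per present edge
    for e in edges:
        if e.present:
            where.append(f"T{e.parent}.{e.parent_key} = T{e.child}.{e.child_key}")
    # condition_list: conditions on nodes reached through present edges
    for i, n in enumerate(nodes):
        if i not in absent:
            where += [f"T{i}.{c}" for c in n.conditions]
    # absent edges become "not in" subqueries
    for e in edges:
        if not e.present:
            child = nodes[e.child]
            sub = f"select T{e.child}.{e.child_key} from {child.table} T{e.child}"
            if child.conditions:
                sub += " where " + " and ".join(
                    f"T{e.child}.{c}" for c in child.conditions)
            where.append(f"T{e.parent}.{e.parent_key} not in ({sub})")
    sql = f"select distinct T0.{target_key} from {', '.join(tables)}"
    if where:
        sql += " where " + " and ".join(where)
    return sql


# Toy database (made-up rows, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Staff (id integer primary key);
create table Graduate_Student (id integer primary key, Advisor integer, GPA real);
insert into Staff values (1), (2), (3);
insert into Graduate_Student values (10, 1, 3.95), (11, 1, 3.5), (12, 2, 3.2);
""")

# Staff with a graduate student, but with no graduate student whose GPA > 3.9.
nodes = [Node("Staff"), Node("Graduate_Student"),
         Node("Graduate_Student", ["GPA > 3.9"])]
edges = [Edge(0, 1, "id", "Advisor"),
         Edge(0, 2, "id", "Advisor", present=False)]

query = to_sql(nodes, edges)
result = sorted(row[0] for row in conn.execute(query))
print(query)
print(result)
```

On this toy data only staff member 2 is selected: staff member 1 advises a student with GPA 3.95, and staff member 3 advises no students at all.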

MRDM overview. Refinements of selection graphs
[Figure: a selection graph over Staff, Grad.Student, Department, and Grad.Student (GPA > 3.9) with Specialization=math, shown together with a refinement that adds the condition GPA > 2.0 to a Grad.Student node and with the corresponding complement refinement]
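A refinement and its complement are meant to split the instances covered by a selection graph into two disjoint subsets. The sketch below checks this on a toy database. The table contents are made up, and the three queries are hand-written assumptions in the style of the SQL translation slide rather than output of the MRDM framework.

```python
import sqlite3

# Toy database (made-up rows, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Staff (id integer primary key);
create table Graduate_Student (id integer primary key, Advisor integer, GPA real);
insert into Staff values (1), (2), (3);
insert into Graduate_Student values (10, 1, 3.95), (11, 1, 3.5), (12, 2, 3.2);
""")

# Base selection graph: staff members who advise at least one graduate student.
BASE = """select distinct T0.id
          from Staff T0, Graduate_Student T1
          where T0.id = T1.Advisor"""

# Refinement: add the condition GPA > 3.9 to the Graduate_Student node.
REFINED = BASE + " and T1.GPA > 3.9"

# Complement refinement: covered by the base graph, but with no student
# satisfying the added condition (expressed through an absent edge).
COMPLEMENT = BASE + """ and T0.id not in (
          select T2.Advisor from Graduate_Student T2 where T2.GPA > 3.9)"""


def ids(sql):
    """Return the set of target-table ids selected by a query."""
    return {row[0] for row in conn.execute(sql)}


base, refined, complement = ids(BASE), ids(REFINED), ids(COMPLEMENT)

print(base, refined, complement)
print(refined | complement == base, refined & complement == set())
```

Here the base graph covers staff members 1 and 2; the refinement keeps staff member 1, and the complement keeps staff member 2.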

The most time-consuming operations of MRDTL
Query associated with the selection graph:
    select distinct Staff.Salary, count(distinct Staff.ID)
    from Staff, Grad.Student, Department
    where join_list
    and condition_list
    group by Staff.Salary
[Figure: selection graph with nodes Staff, Grad.Student, Department, and Grad.Student (GPA > 3.9), and the condition Specialization=math]
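A counting query of this kind can be tried on a toy database. The sketch below is illustrative only: the rows are made up, the Department table is left out, and the join list is reduced to a single Staff to Graduate_Student join. It also shows why `count(distinct ...)` matters, since a staff member with several students appears in several joined rows.

```python
import sqlite3

# Toy database (made-up rows, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Staff (ID integer primary key, Salary text);
create table Graduate_Student (ID integer primary key, Advisor integer);
insert into Staff values (1, 'high'), (2, 'low'), (3, 'low'), (4, 'high');
insert into Graduate_Student values (10, 1), (11, 1), (12, 2), (13, 3);
""")

# Number of distinct target instances per class label (Salary) among the
# staff members covered by the selection graph Staff -> Graduate_Student.
COUNT_QUERY = """
select Staff.Salary, count(distinct Staff.ID)
from Staff, Graduate_Student
where Staff.ID = Graduate_Student.Advisor
group by Staff.Salary
"""

# The same query without "distinct" counts joined rows, not instances.
ROW_COUNT_QUERY = COUNT_QUERY.replace("count(distinct Staff.ID)", "count(Staff.ID)")

class_counts = dict(conn.execute(COUNT_QUERY).fetchall())
row_counts = dict(conn.execute(ROW_COUNT_QUERY).fetchall())

print(class_counts)
print(row_counts)
```

Staff member 1 advises two students, so the row count for 'high' is 2 while the instance count is 1. Staff member 4 advises nobody and is not covered by the graph.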

A way to speed up: eliminate redundant calculations
Problem: for a selection graph with 160 nodes, the time to execute a query is more than 3 minutes!
Redundancy in calculation: tables Staff and Grad.Student will be joined for all the children refinements
A way to fix: make the join only once and save the necessary information for all further calculations
[Figure: selection graph with nodes Staff, Grad.Student, Department, and Grad.Student (GPA > 3.9), and the condition Specialization=math]

Speed Up Method. Sufficient tables
Query associated with the selection graph:
    select S.Salary, count(distinct S.Staff_ID)
    from S
    group by S.Salary
[Figure: selection graph with nodes Staff, Grad.Student, Department, and Grad.Student (GPA > 3.9), and the condition Specialization=math]
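The idea of joining once and reusing the result can be sketched as follows. This is a minimal illustration under stated assumptions: the slide shows only the columns Salary and Staff_ID of S, so the extra GPA column, the `counts` helper, and the toy rows are hypothetical.

```python
import sqlite3

# Toy database (made-up rows, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Staff (ID integer primary key, Salary text);
create table Graduate_Student (ID integer primary key, Advisor integer, GPA real);
insert into Staff values (1, 'high'), (2, 'low'), (3, 'low');
insert into Graduate_Student values
    (10, 1, 3.95), (11, 1, 3.5), (12, 2, 3.2), (13, 3, 3.92);

-- Perform the join once and keep what later refinements need.
create table S as
    select T0.ID as Staff_ID, T0.Salary as Salary, T1.GPA as GPA
    from Staff T0, Graduate_Student T1
    where T0.ID = T1.Advisor;
""")


def counts(condition="1 = 1"):
    """Class counts computed from the stored table S, with no further joins."""
    sql = f"""select S.Salary, count(distinct S.Staff_ID)
              from S where {condition} group by S.Salary"""
    return dict(conn.execute(sql).fetchall())


# Counts for the current graph and for one candidate refinement.
all_counts = counts()
refined_counts = counts("S.GPA > 3.9")

# Reference: the same refinement evaluated by joining the base tables again.
direct_counts = dict(conn.execute("""
    select T0.Salary, count(distinct T0.ID)
    from Staff T0, Graduate_Student T1
    where T0.ID = T1.Advisor and T1.GPA > 3.9
    group by T0.Salary""").fetchall())

print(all_counts, refined_counts, refined_counts == direct_counts)
```

Every further candidate condition on GPA can be evaluated with another call to `counts`, which reads only S.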

Experimental results

Summary
• A general approach for speeding up the MRDM framework
• MRDTL is a competitive algorithm for learning from relational databases (RDB) in terms of both accuracy and time
Future work
• Techniques for handling missing values
• Pruning techniques or complexity regularization
• Use of aggregates for the attribute values
• More extensive evaluation of MRDTL on real-world data sets
