Experiment Databases: Towards better experimental research in machine learning and data mining
Hendrik Blockeel, Katholieke Universiteit Leuven
Motivation • Much research in ML / DM involves experimental evaluation • Interpreting results is more difficult than it may seem • Typically, a few specific implementations of algorithms, with specific parameter settings, are compared on a few datasets, and then general conclusions are drawn • How generalizable are these results really? • Evidence exists that overly general conclusions are often drawn • E.g., Perlich & Provost: different relative performance of techniques depending on the size of the dataset
Very sparse evidence • [Figure: a few scattered points in the space spanned by the algorithm parameter space (AP) and the dataset space (DS)] • A few points in an N-dimensional space, where N is very large: very sparse evidence!
An improved methodology • We argue here in favour of an improved experimental methodology: • Perform many more experiments • Better coverage of the algorithm-dataset space • Store results in an "experiment database" • Better reproducibility • Mine that database for patterns • More advanced analysis possible • The approach shares characteristics of inductive databases: • The database will be mined for specific kinds of patterns: inductive queries, constraint-based mining
Classical setup of experiments • Currently, performance evaluations of algorithms rely on a few specific instantiations of algorithms (implementations, parameters), tested on a few datasets (with specific properties), often focusing on specific evaluation criteria, and with a specific research question in mind • Disadvantages: • Limited generalisability (see above) • Limited reusability of experiments • If we want to test another hypothesis, we need to run new experiments, with a different setup, recording other information
Setup of an experiment database • The ExpDB is filled with results from random instantiations of algorithms, run on random datasets • Algorithm parameters and dataset properties are recorded • Performance criteria are measured and stored • These experiments cover the whole DS x AP space • Pipeline: choose algorithm (e.g. CART, C4.5, Ripper, ...) → choose parameters (e.g. leaf size > 2, heuristic = gain, ...) → generate dataset (e.g. #examples = 1000, #attr = 20, ...) → run → store algorithm parameters, dataset properties and results (a sketch of such a pipeline follows below)
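To make the pipeline concrete, here is a minimal Python sketch of how such an ExpDB could be populated in a SQLite database. The table name, the column names (MLS, heur, Ex, Attr, TP, FP, Runtime), the parameter ranges and the generate_dataset / run_experiment helpers are all illustrative assumptions, not part of the original proposal:

    import random
    import sqlite3

    con = sqlite3.connect("expdb.sqlite")
    con.execute("""
        CREATE TABLE IF NOT EXISTS ExpDB (
            Algorithm TEXT,            -- e.g. 'C4.5', 'CART', 'Ripper'
            MLS INTEGER,               -- minimal leaf size (algorithm parameter)
            heur TEXT,                 -- split heuristic (algorithm parameter)
            Ex INTEGER,                -- #examples (dataset property)
            Attr INTEGER,              -- #attributes (dataset property)
            TP INTEGER, FP INTEGER,    -- performance: true / false positives
            Runtime REAL               -- performance: runtime in seconds
        )""")

    def generate_dataset(n_examples, n_attributes):
        # Placeholder dataset generator: random attribute values only.
        # A real ExpDB needs far better generators (see the challenges slide).
        return [[random.random() for _ in range(n_attributes)]
                for _ in range(n_examples)]

    def run_experiment(algorithm, mls, heuristic, dataset):
        # Placeholder: a real implementation would train/test the chosen
        # learner with the given parameters and measure TP, FP and runtime.
        tp = random.randint(0, len(dataset))
        fp = random.randint(0, len(dataset) // 10)
        runtime = random.random() * len(dataset) / 100.0
        return tp, fp, runtime

    for _ in range(1000):                                    # in practice: many more experiments
        algorithm = random.choice(["C4.5", "CART", "Ripper"])     # choose algorithm
        mls = random.choice([1, 2, 5, 10, 50])                    # choose parameters
        heuristic = random.choice(["gain", "gainratio", "gini"])
        n_examples = random.choice([100, 500, 1000])              # generate dataset
        n_attributes = random.choice([10, 20, 50])
        dataset = generate_dataset(n_examples, n_attributes)
        tp, fp, runtime = run_experiment(algorithm, mls, heuristic, dataset)  # run
        con.execute("INSERT INTO ExpDB VALUES (?, ?, ?, ?, ?, ?, ?, ?)",      # store
                    (algorithm, mls, heuristic, n_examples, n_attributes,
                     tp, fp, runtime))
    con.commit()
    con.close()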
Setup of an experiment database • When experimenting with a single learner, e.g. C4.5:

    Algorithm parameters | Dataset characteristics | Performance
    MLS   heur   ...     | Ex     Attr   Compl ... | TP    FP   RT   ...
    2     gain   ...     | 1000   20     17    ... | 350   65   17   ...
Setup of an experiment database • When experimenting with multiple learners: • More complicated setting; will not be considered in detail here

    ExpDB:
    Alg.  Inst.  PI     | Ex     Attr   Compl ... | TP     FP   RT   ...
    DT    C4.5   C45-1  | 1000   20     17    ... | 1000   20   17   ...
    DT    CART   CA-1   | 2000   50     12    ... | 1000   20   17   ...

    C4.5ParInst:
    PI     MLS   heur  ...
    C45-1  2     gain  ...

    CART-ParInst:
    PI     BS    heur  ...
    CA-1   yes   Gini  ...
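Although the multi-learner setting is not worked out here, a normalized schema along the lines of these tables could, purely as an illustration, be created as follows (all table and column names are assumptions):

    import sqlite3

    con = sqlite3.connect("expdb_multi.sqlite")
    con.executescript("""
        -- one row per experiment, linking to an algorithm-specific
        -- parameter instantiation via PI
        CREATE TABLE IF NOT EXISTS ExpDB (
            AlgType TEXT,        -- e.g. 'DT' (decision tree)
            Alg     TEXT,        -- e.g. 'C4.5', 'CART'
            PI      TEXT,        -- parameter instantiation id, e.g. 'C45-1'
            Ex INTEGER, Attr INTEGER, Compl INTEGER,   -- dataset characteristics
            TP INTEGER, FP INTEGER, RT REAL            -- performance
        );
        -- one table of parameter instantiations per algorithm
        CREATE TABLE IF NOT EXISTS C45ParInst (
            PI TEXT PRIMARY KEY, MLS INTEGER, heur TEXT
        );
        CREATE TABLE IF NOT EXISTS CARTParInst (
            PI TEXT PRIMARY KEY, BS TEXT, heur TEXT
        );
    """)
    con.close()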
Experimental questions and hypotheses • Example questions: • What is the effect of parameter X on runtime? • What is the effect of the number of examples in the dataset on TP and FP? • ... • With the classical methodology: • Different sets of experiments are needed for each question • (Unless all questions are known in advance and the experiments are designed to answer all of them) • ExpDB approach: • Just query the ExpDB table for the answer • A new question = 1 new query, not new experiments
Inductive querying • To find the right patterns in the ExpDB, we need a suitable query language • Many queries can be answered with standard SQL, but (probably) not all (easily) • We illustrate this with some simple examples
Investigating a simple effect • The effect of the number of items (NItems) on Runtime for frequent itemset mining algorithms • [Figure: scatter plot of Runtime against NItems] • Raw data points:

    SELECT NItems, Runtime FROM ExpDB ORDER BY NItems

• Average runtime per number of items:

    SELECT NItems, AVG(Runtime) FROM ExpDB GROUP BY NItems ORDER BY NItems
Investigating a simple effect • Note: • Setting all parameters randomly creates more variance in the results • In the classical approach, the other parameters would simply be kept constant • This leads to clearer, but possibly less generalisable, results • This can easily be simulated in the ExpDB setting:

    SELECT NItems, Runtime FROM ExpDB WHERE MinSupport = 0.05 ORDER BY NItems

• +: the condition is explicit in the query • -: we use only a part of the ExpDB • So the ExpDB needs to contain many experiments (a small usage sketch follows below)
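As a small usage sketch, the queries above could be run from Python against a SQLite ExpDB of frequent itemset mining experiments; the database file name and the assumption that the table has columns NItems, MinSupport and Runtime are illustrative:

    import sqlite3

    def avg_runtime_per_nitems(db_path="itemsets.sqlite", min_support=None):
        con = sqlite3.connect(db_path)
        sql = "SELECT NItems, AVG(Runtime) FROM ExpDB"
        params = []
        if min_support is not None:          # simulate the 'classical' setup
            sql += " WHERE MinSupport = ?"
            params.append(min_support)
        sql += " GROUP BY NItems ORDER BY NItems"
        rows = con.execute(sql, params).fetchall()
        con.close()
        return rows                          # list of (NItems, average runtime)

    # e.g. avg_runtime_per_nitems(min_support=0.05)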
Investigating interaction of effects • E.g., does the effect of NItems on Runtime change with MinSupport and NTrans?

    FOR a IN (0.01, 0.02, 0.05, 0.1) DO
      FOR b IN (10^3, 10^4, 10^5, 10^6, 10^7) DO
        PLOT SELECT NItems, Runtime FROM ExpDB
             WHERE MinSupport = $a AND NTrans >= $b AND NTrans < 10*$b
             ORDER BY NItems
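A runnable version of this pseudocode, again as a sketch under the same assumptions (a SQLite ExpDB of itemset mining experiments with columns NItems, NTrans, MinSupport and Runtime), could plot one averaged runtime curve per (MinSupport, NTrans range) combination:

    import sqlite3
    import matplotlib.pyplot as plt

    con = sqlite3.connect("itemsets.sqlite")
    fig, ax = plt.subplots()
    for a in (0.01, 0.02, 0.05, 0.1):
        for b in (10**3, 10**4, 10**5, 10**6, 10**7):
            rows = con.execute(
                "SELECT NItems, AVG(Runtime) FROM ExpDB "
                "WHERE MinSupport = ? AND NTrans >= ? AND NTrans < ? "
                "GROUP BY NItems ORDER BY NItems",
                (a, b, 10 * b)).fetchall()
            if rows:
                xs, ys = zip(*rows)
                ax.plot(xs, ys, label=f"minsup={a}, NTrans in [{b}, {10*b})")
    ax.set_xlabel("NItems")
    ax.set_ylabel("average Runtime")
    ax.legend(fontsize="small")
    plt.show()
    con.close()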
Direct questions instead of repeated hypothesis testing ("true" data mining) • Which algorithm parameter has the strongest influence on the runtime of my decision tree learner?

    SELECT ParName, Var(A)/Avg(V) AS Effect
    FROM AlgorithmParameters,
         (SELECT $ParName, Var(Runtime) AS V, Avg(Runtime) AS A
          FROM ExpDB GROUP BY $ParName)
    GROUP BY ParName
    ORDER BY Effect DESC

• Not (easily) expressible in standard SQL! (Pivoting is possible by hard-coding all attribute names in the query, but that is not very readable or reusable.)
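Since the pivoting query above is not standard SQL, one pragmatic option is to do the pivoting on the client side. A sketch, assuming the single-learner ExpDB created earlier (hypothetical parameter columns MLS and heur, runtime column Runtime), computing the Var(A)/Avg(V) ratio per parameter, i.e. the variance of the per-value mean runtimes relative to the average within-value variance:

    import sqlite3
    from statistics import mean, pvariance

    PARAM_COLUMNS = ["MLS", "heur"]    # hypothetical: the algorithm parameter columns

    def parameter_effects(db_path="expdb.sqlite"):
        con = sqlite3.connect(db_path)
        effects = {}
        for par in PARAM_COLUMNS:
            rows = con.execute(
                f"SELECT {par}, Runtime FROM ExpDB WHERE Runtime IS NOT NULL"
            ).fetchall()
            groups = {}                                   # runtimes per parameter value
            for value, runtime in rows:
                groups.setdefault(value, []).append(runtime)
            per_value_avg = [mean(g) for g in groups.values()]
            per_value_var = [pvariance(g) for g in groups.values() if len(g) > 1]
            if len(per_value_avg) > 1 and per_value_var and mean(per_value_var) > 0:
                # between-value variance of the mean runtime, relative to the
                # average within-value variance (the Var(A)/Avg(V) of the query)
                effects[par] = pvariance(per_value_avg) / mean(per_value_var)
        con.close()
        # strongest influence first
        return sorted(effects.items(), key=lambda kv: kv[1], reverse=True)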
A comparison (classical approach vs. ExpDB approach)
1) Classical: experiments are goal-oriented. ExpDB: experiments are general-purpose.
2) Classical: experiments seem more convincing than they are. ExpDB: experiments seem as convincing as they are.
3) Classical: new experiments are needed when new research questions pop up. ExpDB: no new experiments needed when new research questions pop up.
4) Classical: the conditions under which results are valid are unclear. ExpDB: these conditions are explicit in the query.
5) Classical: relatively simple analysis of results. ExpDB: sophisticated analysis of results possible.
6) Classical: mostly repeated hypothesis testing, rather than direct questions. ExpDB: direct questions possible, given suitable inductive query languages.
7) Classical: low reusability and reproducibility. ExpDB: better reusability and reproducibility.
Summary • The ExpDB approach • Is more efficient • The same set of experiments is reusable and reused • Is more precise and trustworthy • The conditions under which the conclusions hold are explicitly stated • Yields better-documented experiments • Precise information on all experiments is kept; experiments are reproducible • Allows more sophisticated analysis of results • Interaction of effects, true data mining capacity • Note: also interesting for meta-learning!
The challenges... (*) • Good dataset generators are necessary • Generating truly varying datasets is not easy • One could start from real-life datasets and build variations (see the sketch below) • Extensive descriptions of datasets and algorithms • Vary as many potentially relevant properties as possible • A database schema for a multi-algorithm ExpDB • Suitable inductive query languages • (*) Note: even without solving all these problems, some improvement over the current situation is feasible and easy to achieve
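As an illustration of the "build variations of real-life datasets" idea, a minimal, hypothetical sketch that varies the number of examples by subsampling and the number of attributes by adding irrelevant noise columns (a dataset is assumed to be a list of rows, each a list of attribute values):

    import random

    def make_variation(dataset, sample_frac=0.5, noise_attrs=5, seed=None):
        rng = random.Random(seed)
        # vary the number of examples by subsampling
        n = max(1, int(sample_frac * len(dataset)))
        rows = rng.sample(dataset, n)
        # vary the number of (irrelevant) attributes by appending random noise
        return [row + [rng.random() for _ in range(noise_attrs)] for row in rows]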