Towards prediction of algorithm performance in real world problems Tommy Messelis* Stefaan Haspeslagh Burak Bilgin Patrick De Causmaecker Greet Vanden Berghe *tommy.messelis@kuleuven-kortrijk.be
Overview • scope • performance prediction • two approaches • experimental setup & results • conclusions • future work
Scope Real world timetabling and personnel scheduling problems example: Nurse Rostering Problem (NRP) • assign nurses to shifts • for some planning horizon • pre-determined demands • satisfying • hard constraints: e.g. no nurse can work two places at the same time • soft constraints: e.g. a nurse should not work more than 7 days in a row • objective: Find an assignment that satisfies all hard constraints and as many soft constraints as possible. NP-hard combinatorial optimisation problem
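To make the two constraint types concrete, here is a minimal sketch (not the authors' model): a roster as a binary nurse × day × shift matrix, with checks for the two example constraints above. The random roster and all names are invented for illustration.

```python
# Illustrative sketch only: a random roster and checks for the two
# example constraints above (all names are invented for this sketch).
import numpy as np

rng = np.random.default_rng(0)
n_nurses, n_days, n_shifts = 6, 14, 3
# roster[nurse, day, shift] == 1 means the nurse works that shift
roster = rng.integers(0, 2, size=(n_nurses, n_days, n_shifts))

# hard constraint: no nurse can work two places (shifts) at the same time
hard_satisfied = (roster.sum(axis=2) <= 1).all()

# soft constraint: a nurse should not work more than 7 days in a row;
# count the number of violating days across all nurses
MAX_RUN = 7
violations = 0
for nurse in range(n_nurses):
    consecutive = 0
    for works_today in roster[nurse].sum(axis=1) > 0:
        consecutive = consecutive + 1 if works_today else 0
        if consecutive > MAX_RUN:
            violations += 1

print("hard constraint satisfied:", hard_satisfied, "| soft violations:", violations)
```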
Scope • in practice, finding the optimal solution is computationally infeasible, even for small real world problems • rely on good-enough, fast-enough solutions, provided by (meta)heuristics
Performance prediction • There are many solution methods • some perform very well, while others perform very badly on the same instances • no method outperforms all others on all instances • It would be good to know in advance how well an algorithm will perform on a given problem instance • choose the best algorithm and use the resources as efficiently as possible • decisions should be made without spending the (possibly scarce) resources • based on basic, quickly computable properties of the instance itself
Empirical hardness models • idea of mapping efficiently computable features onto empirical hardness measures • empirical • we need to run some algorithm to get an idea of the hardness of an instance • hardness • is measured by some performance criteria of the algorithm
General framework Introduced by Leyton-Brown et al. • identify a problem instance distribution • select one or more algorithms • select a set of inexpensive features • generate a training set of instances, run all algorithms and determine runtimes, compute all feature values for all instances • eliminate redundant/uninformative features • build empirical hardness models (functions of the features that predict the runtime) K. Leyton-Brown, E. Nudelman, Y. Shoham. Learning the empirical hardness of optimisation problems: the case of combinatorial auctions. LNCS, 2002. K. Leyton-Brown, E. Nudelman, Y. Shoham. Empirical hardness models: methodology and a case study on combinatorial auctions. Journal of the ACM, 2009.
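As a rough illustration of steps 4–6 of this framework, the sketch below fits a linear model of runtime on feature values. The data is synthetic and the solver runs and feature extractors are stubbed out, so this is a toy under stated assumptions, not the authors' code.

```python
# Toy version of the framework: synthetic features and runtimes stand in
# for the real instance distribution, solver runs, and feature extractors.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n_instances, n_features = 500, 10
X = rng.random((n_instances, n_features))    # step 4: feature values per instance
true_weights = rng.random(n_features)
runtimes = X @ true_weights + 0.1 * rng.standard_normal(n_instances)  # step 4: measured runtimes

hardness_model = LinearRegression().fit(X, runtimes)  # step 6: empirical hardness model
print("training R^2:", hardness_model.score(X, runtimes))
```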
Performance prediction • We will use this framework to predict other performance criteria as well • quality of the solutions • quality gap (distance between the found solution and the optimal solution) • for both a complete solver and a metaheuristic • proof-of-concept study, on a small ‘real world’-like instance distribution
Approaches • NRP-specific context • feature set for nurse rostering problems • build empirical hardness models based on these features • General SAT context • translate the NRP instances into SAT instances • use an existing extensive feature set for SAT problems • build empirical hardness models based on SAT features
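To give a flavour of what such a translation involves, here is a hedged sketch of one fragment: encoding the hard constraint "no nurse works two places at the same time" as pairwise at-most-one clauses over Boolean assignment variables. The variable numbering scheme is an assumption for illustration, not the encoding used in the study.

```python
# Sketch of one fragment of an NRP-to-SAT translation: pairwise
# at-most-one clauses in DIMACS-style signed-integer form.
from itertools import combinations

n_nurses, n_days, n_shifts = 6, 14, 3

def var(nurse, day, shift):
    # map (nurse, day, shift) to a positive DIMACS variable index
    # (this numbering is an illustrative assumption)
    return 1 + nurse * n_days * n_shifts + day * n_shifts + shift

clauses = []
for nurse in range(n_nurses):
    for day in range(n_days):
        for s1, s2 in combinations(range(n_shifts), 2):
            # (not x_s1) or (not x_s2): the nurse works at most one of the two
            clauses.append([-var(nurse, day, s1), -var(nurse, day, s2)])

print(len(clauses), "at-most-one clauses")
```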
Experimental setup • Instance distribution • we focus on an instance distribution that produces small instances, still solvable by a complete solver in a reasonable amount of time • 6 nurses, 14 days, fluctuating sequence constraints, coverage and availabilities • Algorithm selection + performance criteria • integer program representation, CPLEX solver • runtime • quality of the optimal solution • variable neighbourhood search (metaheuristic) • quality of the approximate solution • quality gap between optimal and approximate solution
Experimental setup • Feature set • NRP features • structural property values: • min & max total number of assignments • min & max number of consecutive working days • min & max number of consecutive free days • ratios, expressing the 'tightness' of the constraints • hardness: availability / coverage demand • tightness: max / min ratios (the smaller, the less freedom)
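A minimal sketch of how the two ratio features above could be computed for one instance; only the two formulas come from the slide, while the bounds and totals are hypothetical numbers.

```python
# Hypothetical values for one instance; only the two ratio formulas
# come from the feature descriptions above.
min_assignments, max_assignments = 8, 12   # min & max total number of assignments
availability, coverage_demand = 60, 48     # available nurse-shifts vs. demanded ones

hardness = availability / coverage_demand        # how easily demand can be covered
tightness = max_assignments / min_assignments    # the smaller, the less freedom
print("hardness:", hardness, "| tightness:", tightness)
```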
Experimental setup • Feature set After translation to a SAT formula, we can use (a subset of) an existing feature set for SAT problems • SAT features • problem size: #clauses, #variables, c/v ratio, ... • problem structure: different graph representations lead to various node and edge degree statistics • balance: fraction of positive literals per clause, positive occurrences per variable, ... • proximity to Horn formulae
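The sketch below computes a few of these feature families from a clause list (clauses as lists of signed integers, the usual DIMACS convention); it is an illustrative subset, not the full feature set from the literature.

```python
# Illustrative subset of SAT features over clauses given as lists of
# signed integers (positive = positive literal, negative = negated).
def sat_features(clauses, n_vars):
    n_clauses = len(clauses)
    # balance: average fraction of positive literals per clause
    pos_frac = sum(sum(l > 0 for l in cl) / len(cl) for cl in clauses) / n_clauses
    # proximity to Horn: fraction of clauses with at most one positive literal
    horn_frac = sum(sum(l > 0 for l in cl) <= 1 for cl in clauses) / n_clauses
    return {
        "n_clauses": n_clauses,
        "n_variables": n_vars,
        "cv_ratio": n_clauses / n_vars,
        "frac_positive_per_clause": pos_frac,
        "frac_horn_clauses": horn_frac,
    }

example = [[1, -2], [-1, -3], [2, 3, -4]]   # tiny made-up formula
print(sat_features(example, n_vars=4))
```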
Experimental setup • Sampling and measuring • 2 training sets: 500 instances & 2000 instances • for computational reasons, the performance criteria that depend on CPLEX results are modelled using the smaller set • both algorithms are run on the training instances, all performance criteria are determined, and all feature values are computed • Feature elimination • useless features (e.g. univalued features) • correlated features (based on correlation analysis): if the correlation coefficient between two features exceeds 0.7, one of them is filtered out
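The elimination step could look like the following sketch (pandas-based, with invented names): constant columns are dropped first, then a greedy pass keeps only features whose pairwise correlation with the already-kept ones stays at or below the 0.7 threshold.

```python
# Sketch of the feature-elimination step described above.
import pandas as pd

def eliminate_features(df: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    df = df.loc[:, df.nunique() > 1]   # drop useless (univalued) features
    corr = df.corr().abs()
    kept = []
    for col in corr.columns:
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)           # keep only weakly correlated features
    return df[kept]

demo = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8],   # b duplicates a
                     "c": [1, 0, 1, 0], "d": [5, 5, 5, 5]})  # d is univalued
print(eliminate_features(demo).columns.tolist())   # ['a', 'c']
```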
Experimental setup • Model learning • statistical regression techniques • linear regression • a relatively simple technique, but with good results • models for all performance criteria • based on NRP features • based on SAT features • models are built iteratively • start with a model consisting of all uncorrelated features • features are iteratively removed from the regression model when their P-value exceeds 0.05 • evaluation using test sets of 100 and 1000 instances
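The iterative building procedure amounts to backward elimination; a sketch with statsmodels (assuming the features arrive as a pandas DataFrame) could look as follows.

```python
# Backward elimination as described above: start from all uncorrelated
# features, drop the least significant one while any P-value exceeds 0.05.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_elimination(X: pd.DataFrame, y, alpha: float = 0.05):
    X = sm.add_constant(X)                     # add the intercept term
    while True:
        fit = sm.OLS(y, X).fit()
        pvals = fit.pvalues.drop("const")      # never drop the intercept
        if pvals.empty or pvals.max() <= alpha:
            return fit
        X = X.drop(columns=pvals.idxmax())     # remove the least significant feature

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.random((200, 5)), columns=list("abcde"))
y = 3 * X["a"] + 2 * X["b"] + rng.normal(0, 0.1, 200)   # synthetic target
print(backward_elimination(X, y).params.index.tolist())  # typically ['const', 'a', 'b']
```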
Results • CPLEX runtime • models are not very accurate (R² = 0.10) • due to high variability in the runtimes • most instances (70%) are solved in 4 ms • some take up to 4 hours • models for log(runtime) are 'better', but still not very accurate (R² = 0.17)
Results • quality of the optimal solution: R² = 0.98 and R² = 0.94 [two plots omitted]
Results • quality of the approximate solution: R² = 0.96 and R² = 0.94 [two plots omitted]
Results • quality gap between approximate and optimal solution: R² = 0.54 and R² = 0.40 [two plots omitted]
Conclusions • We obtain good results for predicting solution quality • complete CPLEX solver • approximate metaheuristic • quality gap prediction is less accurate, though still not bad • CPLEX runtime could not be modelled • but is this really necessary?
Conclusions • importance of the translation to SAT • representing the instances as general, abstract SAT problems helps • expressing hidden structures • context-free modelling • SAT models are slightly better than NRP models • due to the limited quantity/quality of the NRP features • still, very good results, even with this limited NRP feature set! • We can build empirical hardness models!
Future work • 'real' real world instances • improve the runtime prediction, using other features better suited to runtime prediction • e.g. based on solving relaxations of integer programs • for some performance criteria, other machine learning techniques might be more suitable • e.g. a classification tool for runtime prediction • very short / feasible / infeasible • more systematic model building • instead of manual deletion of features in the iterative models
Thank you! Questions?