Testing Heuristics: We Have It All Wrong. J. N. Hooker (1995). Presented to EARG: davet.2008.10.22
Abstract / Summary • Comparing two algorithms on realistic test problems is hard • It answers which is faster, but not why • A more scientific approach is needed • We confuse research with development; competitive testing suits only the 'D' in R&D
Introduction • New algorithm: an algorithmic race determines its fate and fame • The emphasis on competition is anti-intellectual and does not build insight for the long run • The richest observations are often informal • Competition diverts time and resources from investigation
Alternative? • Instead of competition, controlled experimentation • For example: identify an algorithm 'characteristic' • Design experiments to see how the presence or absence of this characteristic affects performance (a minimal sketch follows) • Ideally, build a mathematical model that predicts behaviour, then test it experimentally
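A minimal sketch of the kind of controlled experiment the slide describes: run the same algorithm on the same instances with a single characteristic toggled on and off, so any performance difference is attributable to that characteristic. The names here (run_solver, use_feature) are illustrative assumptions, not from Hooker's paper.

```python
import random
import statistics

def run_solver(instance, use_feature):
    """Hypothetical solver stub: returns a performance measure
    (e.g. number of branching steps) for one instance.
    Replace with the real algorithm under study."""
    base = len(instance)
    # The toggled characteristic is the ONLY thing that differs
    # between the two experimental conditions.
    return base // 2 if use_feature else base

def controlled_experiment(instances):
    """Run with and without the characteristic on the SAME
    instances, then compare mean performance."""
    with_f = [run_solver(i, True) for i in instances]
    without_f = [run_solver(i, False) for i in instances]
    return statistics.mean(with_f), statistics.mean(without_f)

instances = [[0] * random.randint(10, 100) for _ in range(30)]
print(controlled_experiment(instances))
```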
Evils of Competitive Testing • Life's not fair • Implementation: coding skill, (parameter) tuning, the 'vanilla' paradox • Test problem selection: the pitfalls of randomly generated problems; a selective advantage for problems introduced alongside an algorithm; biased evolution (the tail wags the dog) • There is no such thing as a representative problem set
Insight-less • 'Kitchen-sink' algorithms yield no insight • The most informative testing occurs at the design stage • Too much time is spent on 'code optimization'
A More Scientific Alternative • Efficient code is important, but more preliminary work is required first: 'bridge competitions' • Case study: branching rules for DPLL on SAT (see the sketch below) • Benchmarks need to be constructed to isolate individual features
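To make the case study concrete, here is a toy DPLL skeleton with a pluggable branching rule, so the branching heuristic (the 'characteristic' under study) can be varied in isolation while everything else stays fixed. The CNF encoding (lists of integer literals) and both example rules are assumptions for illustration, not Hooker's code.

```python
def simplify(clauses, assignment):
    """Remove satisfied clauses and falsified literals; None = conflict."""
    out = []
    for clause in clauses:
        if any(l in assignment for l in clause):
            continue
        reduced = [l for l in clause if -l not in assignment]
        if not reduced:
            return None
        out.append(reduced)
    return out

def dpll(clauses, assignment, branch_rule):
    """Toy DPLL on CNF given as a list of integer-literal lists.
    branch_rule picks the next literal; everything else is fixed,
    so runs differ only in the branching heuristic under study."""
    clauses = simplify(clauses, assignment)
    if clauses is None:          # some clause falsified
        return None
    if not clauses:              # all clauses satisfied
        return assignment
    lit = branch_rule(clauses)
    for choice in (lit, -lit):
        result = dpll(clauses, assignment | {choice}, branch_rule)
        if result is not None:
            return result
    return None

# Two branching rules to compare in a controlled way:
first_literal = lambda clauses: clauses[0][0]
shortest_clause = lambda clauses: min(clauses, key=len)[0]

cnf = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
print(dpll(cnf, frozenset(), first_literal))
print(dpll(cnf, frozenset(), shortest_clause))
```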
What to Measure • Solution quality vs. running time: attempt to decouple the two (references McGeoch) • Measure only what a model predicts • Flip the paradigm (page 10, 2nd para.): the code is the phenomenon; the algorithm is a simplified model of that phenomenon • Running time is immaterial w.r.t. the real phenomenon • Subroutine calls matter more than subroutine details, which matter more than data structures (see the counting sketch below)
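One way to act on 'measure only what a model predicts': count the operations the algorithmic model actually talks about, such as branching decisions, instead of machine-dependent wall-clock time. A sketch assuming the toy dpll above; the counted wrapper is an illustration, not something from the paper.

```python
import functools

def counted(fn):
    """Wrap a function so each call bumps a counter: we measure the
    quantity the model predicts (calls), not seconds on one machine."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

# Counting calls to the branch rule counts branching decisions,
# a machine-independent measure of search effort:
rule = counted(lambda clauses: min(clauses, key=len)[0])
# dpll(cnf, frozenset(), rule)   # using the earlier DPLL sketch
print(rule.calls)                # 0 until a search actually runs
```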
Benefits of Scientific Testing • Irrelevant: machine speed, data structures*, coding skill, algorithm tuning, the established position of existing algorithm (implementations) • Removes reliance on benchmark problems: concoct problem sets that are deliberately atypical (see the generator below)
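The 'concoct atypical problem sets' idea can be made concrete with a generator whose parameter controls exactly one instance characteristic, here the clause-to-variable ratio of random 3-SAT. This generator is an assumption for illustration, not from the paper.

```python
import random

def random_3sat(n_vars, ratio, seed=None):
    """Random 3-SAT instance with a controlled clause/variable ratio,
    so instance hardness is varied deliberately rather than inherited
    from whatever 'representative' benchmarks happen to exist."""
    rng = random.Random(seed)
    n_clauses = round(ratio * n_vars)
    clauses = []
    for _ in range(n_clauses):
        vars_ = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in vars_])
    return clauses

# Sweep the single controlled parameter across instance classes:
for ratio in (2.0, 4.3, 6.0):   # under-, near-, over-constrained
    cnf = random_3sat(20, ratio, seed=1)
    print(ratio, len(cnf))
```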
Research vs. Development • Benchmark suites are good for 'development', but controlled experimentation is needed for 'research' • Evaluate research by its contribution to understanding, not by whether it advances the 'state of the art'