Testing Heuristics: We Have It All Wrong. J. N. Hooker (1995). Presented to EARG: davet, 2008.10.22
Abstract / Summary
• Comparing two algorithms on realistic test problems is hard
• It answers the question of which is faster, but not why
• A more scientific approach is needed
• We confuse research with development: this kind of testing is suitable only for the development side
Introduction
• For a new algorithm, an algorithmic race determines its fate and fame
• The emphasis on competition is anti-intellectual and does not build insight for the long run
• The richest observations are often informal
• Competition diverts time and resources from investigation
Alternative?
• Instead of competition, controlled experimentation
• For example: identify an algorithm 'characteristic'
• Design experiments to see how the presence or absence of this characteristic affects performance (see the sketch below)
• Ideally, build a mathematical model that predicts behaviour, then test it experimentally
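A minimal sketch of what such a controlled experiment might look like, assuming a hypothetical solver interface solve(instance, feature_on) that returns an effort measure; the interface and summary statistics are illustrative assumptions, not Hooker's code:

```python
import statistics

def run_experiment(solve, instances):
    """Run the same solver with one characteristic toggled on and off,
    holding everything else (code, machine, instances) fixed."""
    results = {False: [], True: []}
    for inst in instances:
        for feature_on in (False, True):
            # Hypothetical solver: returns an effort measure (e.g. a node
            # count), deliberately not wall-clock time.
            results[feature_on].append(solve(inst, feature_on))
    return results

def summarize(results):
    for feature_on, efforts in results.items():
        print(f"feature={feature_on}: mean effort "
              f"{statistics.mean(efforts):.1f} "
              f"(stdev {statistics.stdev(efforts):.1f})")
```

Toggling one characteristic while fixing everything else is what distinguishes this from a head-to-head race between two full implementations.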
Evils of Competitive Testing
• Life's not fair
• Implementation:
  • Coding skill, (parameter) tuning
  • The 'vanilla' paradox
• Test problem selection:
  • The pitfalls of randomly generated problems
  • Problems introduced alongside an algorithm confer a selective advantage on it
  • Biased evolution: the tail wags the dog
  • There is no such thing as a representative problem set
Insight-less
• 'Kitchen sink' algorithms
• The informative testing occurs at the design stage
• Too much time is spent on 'code optimization'
A More Scientific Alternative
• Efficient code is important, but more preliminary work is required: 'bridge competitions'
• Case study: branching rules in the DPLL algorithm for SAT (a toy sketch follows)
• Need for feature-isolating constructed benchmarks
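As a rough illustration of isolating one such characteristic, here is a toy DPLL solver with a pluggable branching rule; the CNF encoding (clauses as lists of signed integers) and the two example rules are my own assumptions, not taken from the paper:

```python
def unit_propagate(clauses, assignment):
    """Apply unit propagation to a fixed point; return None on conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if not unassigned:
                return None  # every literal false: conflict
            if len(unassigned) == 1:
                assignment[abs(unassigned[0])] = unassigned[0] > 0
                changed = True
    return assignment

def first_unassigned(clauses, assignment):
    """Branching rule A: the first unassigned variable encountered."""
    for clause in clauses:
        for l in clause:
            if abs(l) not in assignment:
                return abs(l)
    return None

def most_frequent(clauses, assignment):
    """Branching rule B: the unassigned variable occurring most often."""
    counts = {}
    for clause in clauses:
        for l in clause:
            if abs(l) not in assignment:
                counts[abs(l)] = counts.get(abs(l), 0) + 1
    return max(counts, key=counts.get) if counts else None

def dpll(clauses, assignment, branch):
    """DPLL search; `branch` is the single characteristic under study."""
    assignment = unit_propagate(clauses, dict(assignment))
    if assignment is None:
        return False  # conflict under this partial assignment
    var = branch(clauses, assignment)
    if var is None:
        return True  # no unassigned variables remain: satisfiable
    return any(dpll(clauses, {**assignment, var: value}, branch)
               for value in (True, False))

instance = [[1, -2], [2, 3], [-1, -3]]
print(dpll(instance, {}, first_unassigned))  # True: satisfiable
```

Swapping first_unassigned for most_frequent changes exactly one thing, so any performance difference can be attributed to the branching rule rather than to coding skill or tuning.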
What to Measure
• Solution quality vs. running time
  • Attempt to decouple the two
  • References McGeoch
• Measure only what a model predicts
• Flip the paradigm (page 10, 2nd para.):
  • The code is the phenomenon; the algorithm is a simplified model of that phenomenon (the code)
  • Running time is immaterial w.r.t. the real phenomenon
  • Subroutine calls » subroutine details » data structures (see the sketch below)
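In that spirit, effort can be reported as counts of algorithm-level events (branching decisions, propagations) rather than seconds; a minimal sketch with a call-counting wrapper, reusing names from the toy DPLL above (my own illustration):

```python
import functools

def counted(fn):
    """Wrap fn so that each call increments wrapper.calls."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

# Count branching decisions: a machine- and implementation-independent
# measure of search effort, unlike wall-clock time.
branch = counted(most_frequent)
dpll(instance, {}, branch)
print(f"branching decisions: {branch.calls}")
```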
Benefits of Scientific Testing
• Irrelevant:
  • Machine speed
  • Data structures*
  • Coding skill
  • Algorithm tuning
  • The established status of existing algorithms (implementations)
• Removes reliance on benchmark problems:
  • Problem sets can be concocted that are deliberately atypical (see the sketch below)
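For example, a constructed problem family might sweep a single structural parameter while fixing everything else; a sketch using random 3-SAT with a controlled clause-to-variable ratio (the generator and parameter values are my own illustration, not benchmarks from the paper):

```python
import random

def random_ksat(n_vars, n_clauses, k=3, seed=0):
    """Random k-SAT: each clause draws k distinct variables, random signs."""
    rng = random.Random(seed)
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, n_vars + 1), k)]
            for _ in range(n_clauses)]

# Vary only the clause/variable ratio; n_vars, k, and seeds stay fixed.
ratios = (3.0, 4.26, 5.5)  # below, near, and above the 3-SAT phase transition
family = {r: [random_ksat(50, int(50 * r), seed=s) for s in range(10)]
          for r in ratios}
```

Whether such instances are 'realistic' is beside the point: they are designed to expose the effect of a single feature.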
Research vs. Development
• Benchmark suites are good for 'development', but controlled experimentation is needed for 'research'
• Evaluate research on its contribution to understanding, not on advancing the 'state of the art'