No Free Lunch (NFL) Theorem
Presentation by Kristian Nolde
Many slides are based on a presentation by Y.C. Ho
General notes
Goal:
• Give an intuitive feeling for the NFL
• Present some mathematical background
To keep in mind:
• The NFL is an impossibility theorem, such as
• Gödel's proof in mathematics (roughly: some statements cannot be proved or disproved within a given mathematical system)
• Arrow's theorem in economics (in principle, perfect democracy is not realizable)
• Thus, its practical use is limited?!?
The No Free Lunch Theorem
• Without specific structural assumptions, no optimization scheme can perform better than blind search on average
• But blind search is very inefficient!
• Prob(at least one out of N samples is in the top-n of a search space of size |Q|) ≈ nN/|Q|
e.g. Prob ≈ 0.001 for |Q| = 10^9, n = 1000, N = 1000
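To make the inefficiency concrete, here is a minimal Python sketch (using the sizes |Q|, n, N quoted above) that compares the exact success probability of blind search with the nN/|Q| approximation:

# Probability that at least one of N uniform random samples
# lands in the top-n of a search space of size Q.
def blind_search_success(Q, n, N):
    exact = 1.0 - (1.0 - n / Q) ** N   # complement of "all N samples miss"
    approx = n * N / Q                  # first-order approximation
    return exact, approx

exact, approx = blind_search_success(Q=10**9, n=1000, N=1000)
print(exact, approx)   # both ~0.001: a thousand samples barely help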
Assume a Finite World
A finite number of input symbols (x's) and a finite number of output symbols (y's)
=> a finite number of possible mappings from input to output (f's), namely |F| = |Y|^|X|
The Fundamental Matrix F

        f1    f2    ...   f|F|
x1      0     1     ...    1
x2      0     1     ...    0
...
x|X|    1     0     ...    1

(Entries illustrative: each column is one possible mapping f, and the entry in row x, column f is y = f(x).)
In each row, each value of Y appears |Y|^(|X|-1) times!
FACT: an equal number of 0's and 1's in each row!
Averaged over all f, the value is independent of x!
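The counting fact is easy to verify by brute force. A minimal sketch, assuming a toy instance with |X| = 3 inputs and binary outputs Y = {0, 1}, that enumerates all |Y|^|X| = 8 columns of F:

from itertools import product

X = 3                              # number of inputs x1 .. x|X|
Y = (0, 1)                         # output alphabet
F = list(product(Y, repeat=X))     # all |Y|**|X| mappings, one column each

# For each row (input x), count how often each y appears across all f.
for x in range(X):
    row = [f[x] for f in F]
    counts = {y: row.count(y) for y in Y}
    assert all(c == len(Y) ** (X - 1) for c in counts.values())
    print(f"x{x+1}: {counts}")     # each y appears |Y|**(|X|-1) = 4 times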
Compare Algorithms
• Think of two algorithms a1 and a2, e.g. a1 always samples from x1 up to x_{|X|/2}, while a2 always samples from x_{|X|/2} up to x_{|X|}
• For a specific f, a1 or a2 may be better. However, if f is not known, the average performance of both is equal:
Σ_f P(d_y | f, m, a1) = Σ_f P(d_y | f, m, a2)
where d is a sample of m points and d_y is the cost value associated with d.
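The equality can be checked by exhaustive enumeration. A minimal sketch, assuming a toy setup with |X| = 4 and Y = {0, 1}, where a1 samples only the first half of X and a2 only the second half:

from itertools import product

X, Y = 4, (0, 1)
F = list(product(Y, repeat=X))     # all 16 cost functions on 4 points

def best_cost(f, points):
    # cost of the best sample found when evaluating a fixed set of points
    return min(f[x] for x in points)

a1 = range(0, X // 2)              # a1 samples only x1, x2
a2 = range(X // 2, X)              # a2 samples only x3, x4

avg1 = sum(best_cost(f, a1) for f in F) / len(F)
avg2 = sum(best_cost(f, a2) for f in F) / len(F)
print(avg1, avg2)                  # identical: 0.25 == 0.25

Any other pair of non-revisiting sampling strategies gives the same averages; only the performance on individual columns differs.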
Comparing Algorithms Continued
• Case 1: algorithms can be specific, e.g. a1 assumes one particular realization f_k
• Case 2: or they can be general, e.g. a2 assumes a more uniform distribution over the possible f
• Then the performance of a1 will be excellent for f_k but catastrophic in all other cases (great performance, no robustness)
• In contrast, a2 performs mediocrely in all cases, but never fails badly (poor performance, high robustness)
Common sense says: Robustness × Efficiency = Constant, or Generality × Depth = Constant
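A hypothetical illustration of this tradeoff in the same toy world: a1 commits to one assumed column f_k and always plays the input that is optimal for it, while a2 spreads its single evaluation uniformly over all inputs. Averaged over all columns their mean cost is equal (as the NFL demands), but a1's cost varies wildly while a2's stays mediocre and steady:

from itertools import product
import statistics

X = 3
F = list(product((0, 1), repeat=X))   # all 8 cost columns
fk = F[1]                             # the column a1 bets on: (0, 0, 1)

x_star = min(range(X), key=lambda x: fk[x])   # a1's specialized choice; cost 0 on fk

cost_a1 = [f[x_star] for f in F]      # a1 always plays x_star
cost_a2 = [sum(f) / X for f in F]     # a2's expected cost under a uniform pick

print(statistics.mean(cost_a1), statistics.pstdev(cost_a1))   # 0.5, 0.5
print(statistics.mean(cost_a2), statistics.pstdev(cost_a2))   # 0.5, ~0.29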
Implication 1
• Let x be the optimization variable, f the performance function, and y the performance, i.e., y = f(x)
• Then, averaged over all possible optimization problems, the result is choice-independent
• If you don't know the structure of f (i.e., which column of F you are dealing with), blind choice is as good as any!
Implications 2
• Let X be the space of all possible representations (as in genetic algorithms), or the space of all possible algorithms to apply to a class of problems
• Without an understanding of the problem, blind choice is as good as any
• "Understanding" means you know which column of the F matrix you are dealing with
Implications 3
• Even if you know which column or group of columns you are dealing with, and can therefore specialize your choice of rows,
• you must accept that you will suffer LOSSES should other columns occur due to uncertainties or disturbances
The Fundamental Matrix F, revisited
(Same matrix as above: rows x1 ... x|X|, columns f1 ... f|F|, entries y = f(x).)
Assume a distribution over the columns, then pick a row that yields minimal expected loss or maximal expected performance. This is stochastic optimization.
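A minimal sketch of that recipe, assuming a hypothetical prior p over the columns of the toy matrix: pick the row x that minimizes expected cost under p.

from itertools import product
import random

X = 3
F = list(product((0, 1), repeat=X))   # columns of the fundamental matrix

# Hypothetical non-uniform prior over which column f is in effect.
random.seed(0)
w = [random.random() for _ in F]
total = sum(w)
p = [wi / total for wi in w]

# Stochastic optimization: choose the row x with minimal expected cost.
expected_cost = [sum(pf * f[x] for pf, f in zip(p, F)) for x in range(X)]
best_x = min(range(X), key=lambda x: expected_cost[x])
print(best_x, expected_cost)

If the assumed prior is wrong, the chosen row can of course be badly suboptimal, which is exactly the point of the next implication.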
Implications 5
• Worse, if you estimate the probabilities incorrectly, your stochastically optimized solution may suffer catastrophically bad outcomes more frequently than you would like.
• Reason: you have already used up more of the good outcomes in your "optimal" choice. What is left are bad ones that were not supposed to occur! (cf. HOT design & power laws, Doyle)
Implications 6
• Generality for generality's sake is not very fruitful
• Working on a specific problem can be rewarding
• Because:
• the insight can often be generalized
• the problem is practically important
• the 80-20 effect (most of the benefit comes from a small part of the effort)