Tabu Search for Model Selection in Multiple Regression Zvi Drezner California State University Fullerton
Optimization Consider a “convex” optimization problem. Wherever you start, you “slide” into the optimal solution.
Non-convexity Now consider the “non-convex” case. You may end up with a local optimum.
The Descent Approach Sliding downhill is a descent approach. For maximization it is termed the ascent approach. For continuous functions the descent is done by a gradient search. When the function is non-convex one may end up with a local optimum which is not the best solution.
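As a small illustration of how a gradient search can stop at a local optimum, here is a sketch on a one-dimensional non-convex function. The function `f`, the step size, and the starting points are assumptions chosen only for illustration; they are not taken from the slides.

```python
def f(x):
    # Hypothetical non-convex function with two minima.
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, step=0.01, iterations=2000):
    # Repeatedly move a small step against the gradient ("slide downhill").
    for _ in range(iterations):
        x -= step * grad_f(x)
    return x


right = gradient_descent(2.0)   # slides into the shallower minimum near x = 1.13
left = gradient_descent(-2.0)   # slides into the deeper minimum near x = -1.30
print(right, f(right))
print(left, f(left))
```

Both runs use the same rule; only the starting point differs, and only the run started on the left reaches the deeper minimum.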
Tabu Search Tabu search (Glover, 1986) was designed to escape local optima and, ideally, reach the global optimum, which is the best solution. It starts as a descent approach, but once a local optimum is found the search continues. The idea in a nutshell is to allow upward moves while forbidding “sliding back.”
Tabu (Contd.) One tries to make the best move downward, but if you just made an up move (because no down move was possible), the best next move may be back down. If we do not forbid back moves the search will enter an infinite loop. The mechanism will be described in detail later.
Multiple Regression Data exist for n variables and m observations. A subset of the variables may provide a better description of the data than the full set. The common criterion used for model fit is the p-value (significance F). Other criteria such as adjusted R-square can be used.
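For reference, the adjusted R-square mentioned above is usually written as follows, where p (a symbol introduced here, not in the slides) is the number of variables in the selected subset and m is the number of observations:

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{m - 1}{m - p - 1}
```

Unlike the plain R-square, it can decrease when a variable that adds little explanatory power is included, which is why it can serve as a subset selection criterion.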
The Issue There are 2^n partial subsets. For n=30 there are more than one billion possible subsets. For n=100 there are more than 10^30 possible subsets.
Why Tabu? Suppose that you can check a million subsets per second (quite optimistic). Checking all possible subsets for n=30 is still manageable in about 20 minutes. For n=100 it will take more than 10^16 (ten million billions) years.
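The arithmetic behind these figures can be checked with a few lines. The rate of one million subsets per second comes from the slide; the constant and function names are only illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
RATE = 1_000_000  # subsets checked per second, as assumed on the slide


def subsets(n):
    # Number of subsets of n variables.
    return 2 ** n


def seconds_to_enumerate(n, rate=RATE):
    # Time needed to check every subset at the given rate.
    return subsets(n) / rate


minutes_30 = seconds_to_enumerate(30) / 60
years_100 = seconds_to_enumerate(100) / SECONDS_PER_YEAR
print(f"n=30:  {subsets(30):,} subsets, about {minutes_30:.0f} minutes")
print(f"n=100: {subsets(100):.2e} subsets, about {years_100:.1e} years")
```

For n=30 this gives roughly 18 minutes, and for n=100 roughly 4 x 10^16 years, consistent with the rounded values above.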
Stepwise Regression Stepwise regression is “a sort of” descent algorithm. Suppose that this is the “graph” of the significance F.
The Search The search starts on the left, and each addition or removal of a variable that reduces the significance F moves down the graph. It will end at a local minimum.
A Descent Algorithm Suppose that a subset of variables is selected. We would like to check whether a “move” to another subset can improve the criterion (significance F, adjusted R-square, etc.). A neighborhood of subsets needs to be defined.
We cannot include “all possible subsets” in the neighborhood because we are back to the original issue of checking all possible subsets. Stepwise regression considers adding or removing one variable from the subset. This is a possible neighborhood.
It may be more effective to add to the neighborhood also exchanging two variables, i.e., removing a variable and adding another in one move. It is possible that removing a variable or adding a variable is not beneficial but replacing a variable with a more suitable one is beneficial.
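A sketch of this extended neighborhood, listing every add, remove, and exchange move for a given subset. Variables are represented by integer indices 0..n-1, and the function name and move tuples are hypothetical.

```python
def neighborhood_moves(selected, n):
    # Variables currently in the subset and those left out.
    inside = sorted(selected)
    outside = [v for v in range(n) if v not in selected]

    adds = [("add", v) for v in outside]
    removes = [("remove", v) for v in inside]
    # An exchange removes one selected variable and adds one unselected variable.
    exchanges = [("exchange", out_v, in_v) for out_v in inside for in_v in outside]
    return adds + removes + exchanges


moves = neighborhood_moves({0, 1}, n=5)
print(len(moves))  # 3 adds + 2 removes + 6 exchanges = 11
```

With k selected variables out of n, the neighborhood has n + k(n-k) moves, so it stays small compared with the 2^n possible subsets: for n=30 and k=10 there are 230 moves.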
In the descent algorithm, the best move in the neighborhood is executed repeatedly until no move improves the criterion. The last subset is the result of this approach.
The descent algorithm terminates at a local minimum that may or may not be the global one. The final outcome depends on the starting point.
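The descent procedure can be sketched as below. To keep the example self-contained, the regression criterion is replaced by a hypothetical toy `cost` with one global minimum (`GLOBAL`) and one local minimum (`LOCAL`); in a real application `cost` would be the significance F or another fit criterion computed from the data.

```python
from itertools import product

N = 6
GLOBAL = frozenset({0, 1, 2})  # hypothetical best subset (cost 0)
LOCAL = frozenset({3, 4, 5})   # hypothetical local minimum (cost 0.5)


def cost(subset):
    # Toy criterion (lower is better): distance to the nearer of two "craters".
    return min(len(subset ^ GLOBAL), len(subset ^ LOCAL) + 0.5)


def neighbors(subset, n):
    # All subsets reachable by adding, removing, or exchanging one variable.
    inside = sorted(subset)
    outside = [v for v in range(n) if v not in subset]
    for v in outside:
        yield subset | {v}
    for v in inside:
        yield subset - {v}
    for out_v, in_v in product(inside, outside):
        yield (subset - {out_v}) | {in_v}


def descent(start, n=N):
    current = frozenset(start)
    while True:
        best = min(neighbors(current, n), key=cost)
        if cost(best) >= cost(current):
            return current, cost(current)  # no improving move: local minimum
        current = best


print(descent(frozenset()))     # ends at GLOBAL with cost 0
print(descent(frozenset({3})))  # ends at LOCAL with cost 0.5
```

The two runs differ only in the starting subset, which illustrates how the final outcome depends on the starting point.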
Intuition • Consider a plane full of craters. • The bottom of one of the craters, the deepest one, is the optimal solution we are looking for. • The descent algorithm starts at a random point in the plane and “slides” into the nearest crater, not necessarily the deepest one.
So, how can you get out of a crater and possibly land in a deeper one? One must perform a sequence of upward moves to get out of the crater and hopefully slide into a deeper one. In tabu search the best possible move (whether improving or not) is performed with one stipulation: recent inverse moves are not allowed.
A list of tabu moves is created. The best move which is not in the tabu list is performed. There is one exception to the rule: A move leading to a solution better than the best one found so far in the search is permitted.
The tabu tenure is the number of iterations a tabu move stays in the tabu list. Once a move is performed, the reverse move is entered into the tabu list and, if the length of the tabu list then exceeds the tabu tenure, the move that has been in the list the longest is removed.
Other rules for tabu moves can be devised. For example, once a variable is removed from the selected set (either as a removed variable or one of the exchanged variables) it is entered into the tabu list. Entering such a variable back into the selected set is forbidden for the next tabu tenure iterations.
Such a rule for the tabu list prevents short cycles. A cycle longer than the tabu tenure is possible in principle, but in practice it has never been observed. The choice of the tabu tenure can affect the effectiveness of the tabu search procedure.
Common values are between 10% and 20% of the number of possible moves. When the strategy of listing variables recently removed from the selected set is used, there are at most n possible elements to be included in the tabu list. So, between 10% and 20% of n (but at least 5) is a reasonable choice. With fewer than 5 members in the tabu list we run the danger of cycling.
Many variations for selecting the length of the tabu list (the tabu tenure) have been studied in the literature. One successful strategy is to select the tabu tenure randomly at every iteration. This reduces the probability of cycling to almost zero.
Suppose we randomly select the tabu tenure between 10% and 50% of n. For example, for n=50 we randomly select the tabu tenure between 5 and 25 at every iteration. In one iteration the tabu list may be 7 variables long, in the next one 15 variables long, and so on.
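Drawing a random tenure at every iteration takes only a few lines. The helper names and the use of Python's `random` module are illustrative assumptions; the 10% and 50% bounds are the ones from the example above.

```python
import random


def tenure_bounds(n, low_pct=10, high_pct=50):
    # Lower and upper tenure limits as percentages of the number of variables.
    return n * low_pct // 100, n * high_pct // 100


def draw_tenure(n, rng):
    # A fresh tenure is drawn uniformly between the bounds (inclusive).
    low, high = tenure_bounds(n)
    return rng.randint(low, high)


rng = random.Random(0)
print(tenure_bounds(50))                          # (5, 25)
print([draw_tenure(50, rng) for _ in range(5)])   # five tenures between 5 and 25
```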
Handling the tabu list is done efficiently as follows. For each variable we record the iteration number at which it was last taken out of the selected set (or a large negative number if it was never taken out). A variable is tabu, that is, it may not be considered for entering the selected set, if the difference between the current iteration number and this record does not exceed the tabu tenure.
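The pieces described so far (best non-tabu move, the exception for a new best solution, a random tenure, and the per-variable iteration record) can be combined into the following sketch. It is not the VBA program referred to in these slides: the toy `cost`, the subsets `GLOBAL` and `LOCAL`, and all function and parameter names are assumptions made so that the example runs without data.

```python
import random

N = 6
GLOBAL = frozenset({0, 1, 2})  # hypothetical best subset
LOCAL = frozenset({3, 4, 5})   # hypothetical local minimum


def cost(subset):
    # Toy criterion (lower is better) with a small penalty per selected variable.
    fit = min(len(subset ^ GLOBAL), len(subset ^ LOCAL) + 0.5)
    return fit + 0.01 * len(subset)


def moves(subset, n):
    # Moves are (removed variable, entering variable); None means "no variable".
    inside = sorted(subset)
    outside = [v for v in range(n) if v not in subset]
    for v in outside:
        yield (None, v)
    for v in inside:
        yield (v, None)
    for out_v in inside:
        for in_v in outside:
            yield (out_v, in_v)


def apply_move(subset, move):
    out_v, in_v = move
    result = set(subset)
    if out_v is not None:
        result.discard(out_v)
    if in_v is not None:
        result.add(in_v)
    return frozenset(result)


def tabu_search(start, n, tenure_range, max_iters, seed=0):
    rng = random.Random(seed)
    removed_at = [-10**9] * n  # large negative number: never taken out
    current = frozenset(start)
    best, best_cost = current, cost(current)
    for iteration in range(1, max_iters + 1):
        tenure = rng.randint(*tenure_range)  # new random tenure each iteration
        candidates = []
        for move in moves(current, n):
            candidate = apply_move(current, move)
            value = cost(candidate)
            in_v = move[1]
            is_tabu = in_v is not None and iteration - removed_at[in_v] <= tenure
            # A tabu move is allowed only if it beats the best solution so far.
            if is_tabu and value >= best_cost:
                continue
            candidates.append((value, move, candidate))
        if not candidates:
            continue  # every move is tabu; wait for entries to expire
        value, (out_v, _), current = min(candidates, key=lambda c: c[0])
        if out_v is not None:
            removed_at[out_v] = iteration  # this variable may not re-enter for a while
        if value < best_cost:
            best, best_cost = current, value
    return best, best_cost


print(tabu_search(LOCAL, N, tenure_range=(3, 3), max_iters=20))
```

Started at the local minimum `LOCAL` with the tenure fixed at 3, the search removes two variables, exchanges the third, and then adds the remaining two variables of `GLOBAL`, reaching it at the fifth iteration. With a tenure of 1 the same run falls back into `LOCAL`, which illustrates why very short tabu lists risk cycling.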
A Word of Caution In most cases the Tabu search finds the best (optimal) solution. However, the Tabu search does not guarantee that the best solution is obtained. It is therefore recommended to run the VBA macro several times and select the best solution. Because of the random nature of the procedure you may get different solutions each time the code is run.
The Excel File An Excel file with a program coded in VBA (Visual Basic for Applications) is available.
Metaheuristic Algorithms • Ascent/Descent • Simulated Annealing • Tabu Search • Genetic Algorithms.
Simulated Annealing • Simulated annealing (Kirkpatrick et al., 1983) simulates the cooling of melted metals. • The algorithm in its simplest form depends on 3 parameters: the starting temperature, the factor by which the temperature is reduced every iteration, and the number of iterations.
Simulated Annealing Algorithm • The temperature T is set to the starting temperature. • A starting solution P is randomly generated. • One move (as in ascent/descent) is randomly selected. • The move is accepted if it is an improving move. • If the value of the objective deteriorates by D by the move, it is accepted with a probability of exp(-D/T) . • The temperature is lowered by the factor. • Stop when the number of iterations is reached.
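The steps above can be sketched for subset selection as follows. The objective `toy_cost` is a hypothetical stand-in for a regression criterion, the parameter values are arbitrary, and keeping track of the best subset seen so far is an addition that is not listed in the steps.

```python
import math
import random


def neighbors(subset, n):
    # Add, remove, or exchange one variable (as in ascent/descent).
    inside = sorted(subset)
    outside = [v for v in range(n) if v not in subset]
    result = [subset | {v} for v in outside]
    result += [subset - {v} for v in inside]
    result += [(subset - {a}) | {b} for a in inside for b in outside]
    return result


def simulated_annealing(cost, n, start_temp, cooling, iterations, seed=0):
    rng = random.Random(seed)
    # Random starting solution: each variable is included with probability 1/2.
    current = frozenset(v for v in range(n) if rng.random() < 0.5)
    current_cost = cost(current)
    best, best_cost = current, current_cost  # best-so-far tracking (an addition)
    temp = start_temp
    for _ in range(iterations):
        candidate = rng.choice(neighbors(current, n))
        delta = cost(candidate) - current_cost  # D > 0 means deterioration
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling  # lower the temperature by the given factor
    return best, best_cost


def toy_cost(subset):
    # Hypothetical objective: number of mismatches with the subset {0, 1, 2}.
    return len(subset ^ frozenset({0, 1, 2}))


print(simulated_annealing(toy_cost, n=6, start_temp=2.0, cooling=0.99, iterations=500))
```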
Intuition • Simulated annealing is like a bouncing rubber ball. • When the temperature is high, the bounce is high. • The bounces become shorter and shorter. • It is easier to get out of a shallow crater and more difficult to get out of a deep one. • We hope that the ball settles at the bottom of the deepest crater.
Genetic Algorithms • Genetic algorithms (Holland, 1975) are borrowed from the natural sciences and Darwin's law of natural selection and survival of the fittest. • Genetic algorithms are based on the premise that, like in nature, successful matching of parents will tend to produce better, improved offspring.
Genetic Algorithms (Cont’d) • A population of solutions is randomly generated. • The following is repeated for G generations. • Two parents are selected and merged to produce an offspring. • If the offspring is better than the worst population member, it replaces it. • At the end of the process, the best population member is the solution.
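A minimal sketch of this loop, with the merging step passed in as a function. The `uniform_merge` used here is only a placeholder to make the example run, and `toy_cost`, the population size, and the random choice of parents are likewise illustrative assumptions.

```python
import random


def genetic_algorithm(cost, n, merge, population_size, generations, seed=0):
    rng = random.Random(seed)
    # Random initial population of variable subsets.
    population = [
        frozenset(v for v in range(n) if rng.random() < 0.5)
        for _ in range(population_size)
    ]
    for _ in range(generations):
        parent1, parent2 = rng.sample(population, 2)
        child = merge(parent1, parent2, rng)
        worst = max(range(population_size), key=lambda i: cost(population[i]))
        # The offspring replaces the worst member only if it is better.
        if cost(child) < cost(population[worst]):
            population[worst] = child
    return min(population, key=cost)


def uniform_merge(parent1, parent2, rng):
    # Placeholder merge: keep the common variables and include each of the
    # remaining variables of the union with probability 1/2.
    common = parent1 & parent2
    optional = sorted((parent1 | parent2) - common)
    return common | {v for v in optional if rng.random() < 0.5}


def toy_cost(subset):
    # Hypothetical objective: number of mismatches with the subset {0, 1, 2}.
    return len(subset ^ frozenset({0, 1, 2}))


result = genetic_algorithm(toy_cost, n=6, merge=uniform_merge,
                           population_size=8, generations=200)
print(sorted(result), toy_cost(result))
```

Because only the worst member is ever replaced, and only by something better, the best criterion value in the population never gets worse from one generation to the next.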
Discussion • There are many fine tuning techniques for the process. • The most important part of genetic algorithms is the merging process to produce the offspring.
The merging process • The following merging process was found very effective. • The two selected parents consist of two lists of variables of lengths p1 and p2. • The two lists have c variables in common. • The union of the two lists has p1+p2-c variables.
Merging process (Cont’d) • Several randomly selected variables (we used 4 variables) not in the union are added to the union. • The c common variables must be included in the offspring. • To create the offspring we select further variables in addition to the common ones. • These are selected by applying a descent/ascent approach to choose among the p1+p2-c+4 variables in the extended union.
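One possible reading of this merging process is sketched below. Starting the descent from the common variables alone is an assumption (the slides do not say where the descent starts), and `toy_cost` and the function name `merge` are hypothetical.

```python
import random


def merge(parent1, parent2, cost, n, rng, extra=4):
    common = parent1 & parent2
    union = parent1 | parent2
    # Extend the union with a few randomly chosen variables from outside it.
    outside = [v for v in range(n) if v not in union]
    extended = union | set(rng.sample(outside, min(extra, len(outside))))
    optional = sorted(extended - common)  # variables the descent may add or drop

    child = frozenset(common)  # assumption: start from the common variables only
    while True:
        chosen = [v for v in optional if v in child]
        unused = [v for v in optional if v not in child]
        candidates = [child | {v} for v in unused]
        candidates += [child - {v} for v in chosen]
        candidates += [(child - {a}) | {b} for a in chosen for b in unused]
        if not candidates:
            return child
        best = min(candidates, key=cost)
        if cost(best) >= cost(child):
            return child  # no improving move left
        child = best


def toy_cost(subset):
    # Hypothetical objective: number of mismatches with the subset {0, 1, 2}.
    return len(subset ^ frozenset({0, 1, 2}))


offspring = merge(frozenset({0, 1, 3}), frozenset({0, 2, 4}), toy_cost,
                  n=10, rng=random.Random(0))
print(sorted(offspring))  # [0, 1, 2]
```

The common variable 0 is never removed, and the descent only adds, drops, or exchanges variables from the extended union, so the offspring combines material from both parents with a few fresh candidates.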
Questions? zdrezner@fullerton.edu