Evolving Efficient List Search Algorithms - Kfir Wolfson, Moshe Sipper - Evolutionary Computation and Artificial Life (ECAL) course - CS BGU - July 8th, 2009
Agenda • Introduction • Evolutionary Setup • Results • Less Knowledge – More Automation • Related Work • Conclusions and Future Work
Introduction • Algorithm design is an important task in computer science • Evolutionary algorithms have been applied to many areas, but there has been limited research on applying them to software engineering and algorithm design • We introduce the notion of "algorithmic design through Darwinian evolution" • We begin with a benchmark case, list search algorithms: • Can evolution be applied to finding a search algorithm? • Can evolution be applied to finding an efficient search algorithm? • We apply Genetic Programming (GP) to the task and show that the answer to both questions is affirmative
Evolutionary Setup • Representation • Phenotype • Genotype • GP Parameters • Fitness Function • GP Operators
Representation • Phenotype • Array search algorithm • Searches for a key in a one-dimensional array • Java static function:

    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0;
        int M1 = n-1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {   // iterations: global variable
            // -> PLUG IN EVOLVING GENOTYPE HERE <-
        }
        return INDEX;
    }
Representation • iterations is a global variable, set to: • n for linear search • log2 n for sublinear search • The returned value is an array index, which might be "illegal" (outside the array bounds)
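To make the frame concrete, here is a minimal runnable sketch that models the genotype as a callback executed once per loop iteration. The names `SearchFrame`, `State`, `linearIterations`, and `sublinearIterations` are hypothetical, `iterations` is passed in rather than kept global, and reading "log2 n" as ceil(log2 n) for non-powers of two is an assumption.

```java
import java.util.function.Consumer;

// Hypothetical harness: the evolved genotype is modeled as a Consumer<State>
// that runs once per loop iteration and reads/updates the frame variables.
class SearchFrame {
    // Variables visible to the genotype, mirroring the slide's frame.
    static final class State {
        final int[] arr;
        final int key;
        int m0, m1, index, iter; // the genotype should only read iter

        State(int[] arr, int key) {
            this.arr = arr;
            this.key = key;
            this.m0 = 0;
            this.m1 = arr.length - 1;
            this.index = 0;
        }
    }

    // Linear case: one iteration per element.
    static int linearIterations(int n) {
        return n;
    }

    // Sublinear case, assumed to mean ceil(log2 n); 0 for n <= 1.
    static int sublinearIterations(int n) {
        return n <= 1 ? 0 : 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    static int search(int[] arr, int key, int iterations, Consumer<State> genotype) {
        State s = new State(arr, key);
        for (s.iter = 0; s.iter < iterations; s.iter++) {
            genotype.accept(s); // -> PLUG IN EVOLVING GENOTYPE HERE <-
        }
        return s.index; // may be an "illegal" index; the caller must check it
    }
}
```

Because the loop bound is fixed before the genotype runs, any plugged-in body halts, which matches the halting argument made for the real frame.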
Representation • Genotype • Koza-style genetic programming • Evaluation trees • Strongly typed • More understandable algorithms • Function and terminal sets • The same sets are used to evolve both linear and sublinear search algorithms [Figure: an example genotype tree built from the nodes If, =, Array[INDEX], KEY, NOP, INDEX:=, and ITER]
Representation [Figure: the evolving genotype plugged into the loop body of the frame, searching an example array for KEY = 18]
Representation • The [M0+M1]/2 terminal • Embodies human intuition about the problem to facilitate the solution • Still requires crucial algorithmic insight to be derived via evolution • Later we re-examine this terminal and remove it altogether
Representation - Example • An example correct solution to the linear search problem:
LISP:
    (If (= Array[INDEX] KEY) NOP (INDEX:= ITER))
Equivalent Java:
    if (arr[INDEX] == KEY) ; else INDEX = ITER;
Let's plug it into the phenotype frame…
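To illustrate what "strongly typed" means for these trees, here is a minimal sketch of an interpreter for the example tree, assuming three node types (statements, Booleans, integers), so that, e.g., `INDEX:=` only accepts an integer-valued child and an `If` only accepts a Boolean condition. All class names are hypothetical, only the nodes used by this example are modeled, and this is not the authors' GP system (it uses Java 16+ records).

```java
// Hypothetical strongly typed node hierarchy: the Java compiler rejects
// ill-typed trees, e.g. an If whose condition is an integer node.
class TypedTree {
    static final class Env {                  // frame variables seen by the tree
        int[] arr;
        int key, index, iter;
    }

    interface Stmt { void exec(Env e); }      // statement-typed nodes
    interface Bool { boolean eval(Env e); }   // Boolean-typed nodes
    interface Num  { int eval(Env e); }       // integer-typed nodes

    record If(Bool cond, Stmt then, Stmt otherwise) implements Stmt {
        public void exec(Env e) { if (cond.eval(e)) then.exec(e); else otherwise.exec(e); }
    }
    record Nop() implements Stmt {
        public void exec(Env e) { }
    }
    record SetIndex(Num value) implements Stmt {   // INDEX:=
        public void exec(Env e) { e.index = value.eval(e); }
    }
    record Eq(Num a, Num b) implements Bool {      // =
        public boolean eval(Env e) { return a.eval(e) == b.eval(e); }
    }
    record ArrayAtIndex() implements Num {         // Array[INDEX]
        public int eval(Env e) { return e.arr[e.index]; }
    }
    record Key() implements Num {                  // KEY
        public int eval(Env e) { return e.key; }
    }
    record Iter() implements Num {                 // ITER (read-only)
        public int eval(Env e) { return e.iter; }
    }

    // (If (= Array[INDEX] KEY) NOP (INDEX:= ITER))
    static final Stmt LINEAR_EXAMPLE =
        new If(new Eq(new ArrayAtIndex(), new Key()), new Nop(), new SetIndex(new Iter()));

    // Runs the tree inside the phenotype frame for `iterations` loop passes.
    static int search(Stmt tree, int[] arr, int key, int iterations) {
        Env e = new Env();
        e.arr = arr;
        e.key = key;
        e.index = 0;
        for (e.iter = 0; e.iter < iterations; e.iter++) tree.exec(e);
        return e.index;
    }
}
```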
Representation - Example • An example correct solution to the linear search problem, plugged into the frame:

    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0;
        int M1 = n-1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {   // iterations = n (global)
            if (arr[INDEX] == KEY)
                ;                  // NOP: keep the index found so far
            else
                INDEX = ITER;
        }
        return INDEX;
    }
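In this solution, each iteration compares the element at the index chosen in the previous iteration, so a match is confirmed one step late and the solution needs all n iterations. The sketch below checks that claim; passing `iterations` as a parameter (instead of the slides' global) and the names `LinearExample` and `findsAll` are assumptions for illustration.

```java
class LinearExample {
    // The example linear solution plugged into the frame.
    static int search(int[] arr, int key, int iterations) {
        int index = 0;
        for (int iter = 0; iter < iterations; iter++) {
            if (arr[index] == key) {
                // NOP: keep the index found so far
            } else {
                index = iter;
            }
        }
        return index;
    }

    // True if every key of arr is found at its own position.
    static boolean findsAll(int[] arr, int iterations) {
        for (int i = 0; i < arr.length; i++) {
            if (search(arr, arr[i], iterations) != i) return false;
        }
        return true;
    }
}
```

With only n - 1 iterations, the last element can never be reached, because INDEX is at most n - 2 when the loop ends.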
Representation • A search call: • Always halts • No loop functions • Only read access to ITER • The number of iterations is limited • Inherently deals with keys not in the array • via a wrapper function • Has no early termination when the key is found • This makes the problem harder: the evolved algorithm has to learn to retain the correct index. Why? Because the loop keeps running after the key is found, later iterations can overwrite INDEX.
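The slides do not show the wrapper function, so the following is only a hypothetical sketch of one way it could work: run the evolved search, then accept the returned index only if it is in range and actually holds the key. The -1 "not found" convention and the names `SearchWrapper` and `find` are assumptions.

```java
import java.util.function.ToIntBiFunction;

class SearchWrapper {
    // Hypothetical wrapper: validates the (possibly "illegal") index returned
    // by an evolved search and reports a missing key as -1.
    static int find(ToIntBiFunction<int[], Integer> evolvedSearch, int[] arr, int key) {
        int index = evolvedSearch.applyAsInt(arr, key);
        boolean inRange = index >= 0 && index < arr.length;
        return (inRange && arr[index] == key) ? index : -1;
    }
}
```

Keeping this check outside the evolved code means the genotype never has to learn how to signal a missing key.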
Fitness Function • How do we rate a solution? • Present the individual with many random input arrays • Use its search method to search for all keys in all arrays • Reward the individual for the closeness of the returned indexes • The training set includes arrays of all sizes in [minN, maxN], with minN = 2 and maxN = 100 • An array of size n contains: • Linear case: a random permutation of [1000, 1000+n-1] • Sublinear case: sorted unique numbers from [n, 100n] • Note that the key range is disjoint from the index range • This discourages "cheating"
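A sketch of how such training arrays could be generated, assuming `java.util.Random` as the source of randomness and one array per size (the slides specify neither). Class and method names are hypothetical. In both cases every element is at least n, so a key can never coincidentally equal a valid index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

class TrainingData {
    // Linear case: a random permutation of [1000, 1000 + n - 1].
    static int[] linearArray(int n, Random rnd) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) values.add(1000 + i);
        Collections.shuffle(values, rnd);
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    // Sublinear case: n sorted, unique numbers drawn from [n, 100n].
    static int[] sublinearArray(int n, Random rnd) {
        TreeSet<Integer> values = new TreeSet<>();            // unique and sorted
        while (values.size() < n) values.add(n + rnd.nextInt(99 * n + 1));
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    // One array of every size in [minN, maxN], e.g. minN = 2, maxN = 100.
    static List<int[]> trainingSet(int minN, int maxN, boolean sorted, Random rnd) {
        List<int[]> set = new ArrayList<>();
        for (int n = minN; n <= maxN; n++) {
            set.add(sorted ? sublinearArray(n, rnd) : linearArray(n, rnd));
        }
        return set;
    }
}
```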
Fitness Function • Define the error of a single key search as the distance between the correct index of KEY and the index returned by search(arr, KEY) • Elements are unique • So the error definition is unambiguous [Figure: searching arr for key = 18; the returned index is 2 positions from the correct index, so error = 2]
Fitness Function • Define a hit as finding the precise location of KEY [Figure: searching arr for key = 18; search(arr, key) returns the correct index: a hit, error = 0]
Fitness Function • The fitness value of an individual (lower is better) is defined as: $\text{fitness} = \overline{\text{error}} \cdot \left(1 - \frac{\text{hits}}{2 \cdot \text{calls}}\right)$, where $\overline{\text{error}}$ is the average error per search call • This gives a 0.5% bonus reduction for every 1% of correct hits • For example, if an individual scored 300 hits in 1000 search calls, its fitness will be its average error per call, reduced by 15% • This bonus • encourages perfect answers ("almost" is bad…) • increases fitness variation in the population
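A sketch of the per-call error, the hit count, and the fitness computation, assuming fitness = average error × (1 - hits / (2 · calls)) as implied by the 0.5%-per-1% rule (lower is better). The names `Fitness`, `error`, and `fitness` are hypothetical.

```java
class Fitness {
    // Error of one search call: distance between correct and returned index.
    static int error(int correctIndex, int returnedIndex) {
        return Math.abs(correctIndex - returnedIndex);
    }

    // Assumed form: average error, reduced by 0.5% for every 1% of hits.
    // Lower is better; a perfect individual scores 0.
    static double fitness(int[] errors) {
        long total = 0;
        int hits = 0;
        for (int e : errors) {
            total += e;
            if (e == 0) hits++;   // a hit: the exact location was returned
        }
        double calls = errors.length;
        double avgError = total / calls;
        return avgError * (1.0 - 0.5 * hits / calls);
    }
}
```

For the slide's example, 300 hits in 1000 calls multiply the average error by 1 - 0.15 = 0.85.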
Generality Test • The best solution of each run was subjected to a stringent generality test, by running it on random arrays of all lengths in the range [2, 5000] ([2, 500] for the linear case) • Kinnear (1993) noted that: "For any algorithm... that operates on an infinite domain of data, no amount of testing can ever establish generality. Testing can only increase confidence." • We also analyzed selected solutions by hand
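A sketch of such a test for the sublinear case, with hypothetical names (`GeneralityTest`, `Search`, `passes`). It draws one sorted array per length, which is an assumption, since the slides do not say how many arrays per length were used. In the usage check, `java.util.Arrays.binarySearch` serves only as a known-correct reference search.

```java
import java.util.Random;
import java.util.TreeSet;

class GeneralityTest {
    interface Search { int search(int[] arr, int key); }

    // Sorted, unique values drawn from [n, 100n], as in the sublinear training set.
    static int[] sortedArray(int n, Random rnd) {
        TreeSet<Integer> values = new TreeSet<>();
        while (values.size() < n) values.add(n + rnd.nextInt(99 * n + 1));
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    // Searches for every key of one random array of each length in [2, maxLen].
    // Passing only raises confidence; it cannot prove generality.
    static boolean passes(Search s, int maxLen, Random rnd) {
        for (int n = 2; n <= maxLen; n++) {
            int[] arr = sortedArray(n, rnd);
            for (int i = 0; i < n; i++) {
                if (s.search(arr, arr[i]) != i) return false;
            }
        }
        return true;
    }
}
```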
GP Operators and Parameters
Results - Linear • It turned out that evolving a linear-time search algorithm was quite easy with the function and terminal sets we designed • 46 out of 50 runs (92%) produced perfect solutions, passing the generality test on arrays up to length 500 • Our representation made the problem easy enough for a perfect individual to appear in the randomly generated generation 0 in three of the runs • The search space was small enough for random search to succeed
Results - Linear • An example evolved solution:
LISP:
    (If (= Array[INDEX] KEY) (M1:= [M0+M1]/2) (INDEX:= ITER))
Equivalent Java:
    if (arr[INDEX] == KEY) M1 = (M0+M1)/2; else INDEX = ITER;
Results - Linear • An example evolved solution, plugged into the frame:

    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0;
        int M1 = n-1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {   // iterations = n (global)
            if (arr[INDEX] == KEY)
                M1 = (M0+M1)/2;    // irrelevant, but does not affect the returned index
            else
                INDEX = ITER;
        }
        return INDEX;
    }
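Since M1 is never read when INDEX is updated, removing the `M1 = (M0+M1)/2` branch should not change the result. A small sketch that checks this on permutation arrays; the names `IrrelevantCode`, `evolved`, and `pruned` are hypothetical, and `iterations = n` is hard-coded for the linear case.

```java
class IrrelevantCode {
    // Evolved linear solution, including the M1 update.
    static int evolved(int[] arr, int key) {
        int n = arr.length, m0 = 0, m1 = n - 1, index = 0;
        for (int iter = 0; iter < n; iter++) {          // iterations = n (linear case)
            if (arr[index] == key) m1 = (m0 + m1) / 2;  // never read by the INDEX logic
            else index = iter;
        }
        return index;
    }

    // The same solution with the M1 update pruned away.
    static int pruned(int[] arr, int key) {
        int index = 0;
        for (int iter = 0; iter < arr.length; iter++) {
            if (arr[index] != key) index = iter;
        }
        return index;
    }
}
```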
Sublinear Search • We set iterations to log2 n and proceeded to evolve sublinear search algorithms, using the same phenotype frame
Results - Sublinear • Unsurprisingly, this case proved harder, but it was also solved by evolution • 35 out of 50 runs (70%) produced perfect solutions, passing the generality test on arrays up to length 5,000 • Solutions emerged between generations 22 and 3,632 • Solution sizes varied between 42 and 244 nodes • Runtime: between 2 hours and 2 days on the CS grid • 7 runs (14%) produced near-perfect solutions, which failed on a single key in the input arrays (99.96% hits on the generality test)
Results – Sublinear • An example evolved solution, simplified by hand from a tree of 50 nodes down to 14:
LISP:
    (PROGN2 (INDEX:= [M0+M1]/2)
            (If (> KEY Array[INDEX])
                (PROGN2 (M0:= [M0+M1]/2) (INDEX:= M1))
                (M1:= [M0+M1]/2)))
Equivalent Java:
    INDEX = (M0+M1)/2;
    if (KEY > arr[INDEX]) {
        M0 = (M0+M1)/2;
        INDEX = M1;
    } else
        M1 = (M0+M1)/2;
Results - Sublinear

    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0;
        int M1 = n-1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {   // iterations = log2 n (global)
            INDEX = (M0+M1)/2;
            if (KEY > arr[INDEX]) {
                M0 = (M0+M1)/2;
                INDEX = M1;
            } else
                M1 = (M0+M1)/2;
        }
        return INDEX;
    }

This is a form of binary search (with a small twist)
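A runnable sketch of this solution, assuming iterations = ceil(log2 n) (with plain floor(log2 n) it would already miss a key at n = 3). The names `EvolvedBinarySearch` and `iterations` are hypothetical. The twist is visible in the `if` branch: textbook binary search would move M0 to the midpoint plus one, while here M0 stays at the midpoint and INDEX jumps to the upper bound M1, so INDEX equals M1 at the end of every iteration.

```java
class EvolvedBinarySearch {
    // Assumed reading of "log2 n": ceil(log2 n), and 0 for n <= 1.
    static int iterations(int n) {
        return n <= 1 ? 0 : 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // The simplified evolved solution, plugged into the frame.
    static int search(int[] arr, int key) {
        int n = arr.length;
        int m0 = 0, m1 = n - 1, index = 0;
        int k = iterations(n);
        for (int iter = 0; iter < k; iter++) {
            index = (m0 + m1) / 2;
            if (key > arr[index]) {
                // Twist: m0 stays at the midpoint (not midpoint + 1),
                // and INDEX jumps to the upper bound.
                m0 = (m0 + m1) / 2;
                index = m1;
            } else {
                m1 = (m0 + m1) / 2;
            }
        }
        return index;
    }
}
```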
Less Knowledge – More Automation • Re-examining the representation: • Most terminals and functions are either • general-purpose or • problem-specific • However, one terminal stands out: [M0+M1]/2 • It is solution-specific • We therefore • remove the [M0+M1]/2 terminal • add an automatically defined function (ADF)
Adding ADF • In the main tree, the [M0+M1]/2 terminal is replaced by a call to ADF0 • Main-tree nodes: PROGN2, If, INDEX:=, M0:=, M1:=, <, =, >, INDEX, Array[INDEX], KEY, ITER, M0, M1, TRUE, FALSE, NOP, ADF0 • ADF functions and terminals: +, -, *, /, 0, 1, 2, M0, M1
GP Parameters – with ADF • Array sizes were increased to: • minN = 200 • maxN = 300 • This avoids non-general solutions (an example follows) • Different function sets for the main and ADF trees • Crossover is tree-wise • Mutation performed better than crossover • especially for the ADF tree
GP Parameters – with ADF • Same as previous setup, with the following changes:
Results – Sublinear with ADF • The sublinear search problem with an ADF naturally proved more difficult than with the [M0+M1]/2 terminal • 12 out of 50 runs (24%) produced perfect solutions, passing the generality test on arrays up to length 5,000 (and later passing all tests up to size 20,000) • Solutions emerged between generations 54 and 4,557 • Solution sizes varied between 53 and 244 nodes • Runtime: between 4 hours and 2 weeks on the CS grid • An additional run produced a non-standard solution, discussed later
Results – Sublinear with ADF • Analysis revealed all perfect solutions to be variations of binary search • The algorithmic idea can be deduced by inspecting the ADFs, all of which turned out to be equivalent to one of the following (all fractions truncated): • (M0+M1)/2 • (M0+M1+1)/2 • M0/2+(M1+1)/2 • These are reminiscent of the [M0+M1]/2 terminal we dropped
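To see why all three evolved ADFs work as midpoints, the sketch below checks that, for non-negative bounds with M0 ≤ M1, each form lies between the floor and the ceiling of the exact midpoint. The third form sometimes matches one and sometimes the other: (M0, M1) = (1, 4) gives 2 (the floor), while (0, 1) gives 1 (the ceiling). Class and method names are hypothetical.

```java
class AdfMidpoints {
    // The three evolved ADF forms; Java int division truncates, as on the slide.
    static int floorMid(int m0, int m1) { return (m0 + m1) / 2; }
    static int ceilMid(int m0, int m1)  { return (m0 + m1 + 1) / 2; }
    static int splitMid(int m0, int m1) { return m0 / 2 + (m1 + 1) / 2; }

    // Checks, for 0 <= m0 <= m1 < limit, that every form lies between the
    // floor and the ceiling of the exact midpoint (they differ by at most 1).
    static boolean allWithinOne(int limit) {
        for (int m1 = 0; m1 < limit; m1++) {
            for (int m0 = 0; m0 <= m1; m0++) {
                int lo = floorMid(m0, m1), hi = ceilMid(m0, m1), s = splitMid(m0, m1);
                if (hi - lo > 1 || s < lo || s > hi) return false;
            }
        }
        return true;
    }
}
```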
Results – Sublinear with ADF • An example evolved solution, simplified by hand from a tree of 58 nodes down to 26 (before simplification: slide 60):
LISP:
    (PROGN2 (PROGN2 (if (< Array[INDEX] KEY) (INDEX:= ADF0) NOP)
                    (if (< Array[INDEX] KEY) (M0:= INDEX) (M1:= INDEX)))
            (INDEX:= ADF0))
    ADF0: (/ (+ (+ 1 M0) M1) 2)
Equivalent Java:
    if (arr[INDEX] < KEY) INDEX = ((1+M0)+M1)/2;
    if (arr[INDEX] < KEY) M0 = INDEX; else M1 = INDEX;
    INDEX = ((1+M0)+M1)/2;
Results – Sublinear with ADF

    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0;
        int M1 = n-1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {   // iterations = log2 n (global)
            if (arr[INDEX] < KEY) INDEX = ((1+M0)+M1)/2;
            if (arr[INDEX] < KEY) M0 = INDEX; else M1 = INDEX;
            INDEX = ((1+M0)+M1)/2;
        }
        return INDEX;
    }

This is another form of binary search (with a different twist)
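A runnable sketch of this solution, again assuming iterations = ceil(log2 n); the names are hypothetical. After the first iteration, the first `if` only recomputes the same ADF0 value INDEX already holds, so each iteration effectively performs one comparison at the ceiling midpoint. With floor(log2 n) iterations the sketch misses keys, e.g. the third element of a 6-element array.

```java
class EvolvedAdfSearch {
    // Assumed reading of "log2 n": ceil(log2 n), and 0 for n <= 1.
    static int iterations(int n) {
        return n <= 1 ? 0 : 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // ADF0 from the slides: (/ (+ (+ 1 M0) M1) 2), i.e. the ceiling midpoint.
    static int adf0(int m0, int m1) {
        return ((1 + m0) + m1) / 2;
    }

    static int search(int[] arr, int key, int iterations) {
        int m0 = 0, m1 = arr.length - 1, index = 0;
        for (int iter = 0; iter < iterations; iter++) {
            // After iteration 0 this re-evaluates to the index already held.
            if (arr[index] < key) index = adf0(m0, m1);
            if (arr[index] < key) m0 = index; else m1 = index;
            index = adf0(m0, m1);
        }
        return index;
    }
}
```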
Interesting Results • Some of the other evolved solutions are also worth mentioning • With minN = 2, maxN = 100, and a main-tree maximum depth of 17, linear search algorithms evolved, which failed on longer arrays • How is this possible in log2 n iterations? • An O(log n) solution has a constant factor, i.e., the algorithm performs k·log n operations • We limited the number of iterations, but in each iteration the full genotype code is executed • A linear search could therefore evolve by taking advantage of the constant factor k
Interesting Results • Linear solution: ADF0 = (M0+1) • The main tree included 16 occurrences of: (If (= Array[INDEX] KEY) NOP (PROGN2 (M0:= ADF0) (INDEX:= M0))) • i.e., if the key is found, do nothing, else increment INDEX by 1 • For an array of size n = 100: • log n = 7, and for k = 16: k·log n = 16·7 = 112 > 100 (enough to traverse the whole array) • We therefore • increased minN and maxN (to 200 and 300) • decreased the maximum k by lowering the max-depth to 10
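The arithmetic can be checked with a small simulation, assuming (as a simplification) that the main tree consists of nothing but k copies of the subtree above, and that iterations = ceil(log2 n). Names are hypothetical. With k = 16, the scan reaches 16·7 = 112 positions for n = 100, but only 16·8 = 128 positions for n = 200, so the larger training arrays expose it.

```java
class UnrolledLinear {
    // Assumed reading of "log2 n": ceil(log2 n), and 0 for n <= 1.
    static int iterations(int n) {
        return n <= 1 ? 0 : 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // k copies of (If (= Array[INDEX] KEY) NOP (PROGN2 (M0:= ADF0) (INDEX:= M0)))
    // with ADF0 = M0 + 1, executed in each iteration.
    static int search(int[] arr, int key, int k) {
        int m0 = 0, index = 0;
        int passes = iterations(arr.length);
        for (int iter = 0; iter < passes; iter++) {
            for (int copy = 0; copy < k; copy++) {
                if (arr[index] != key) {
                    m0 = m0 + 1;   // M0:= ADF0
                    index = m0;    // INDEX:= M0
                }
            }
        }
        return index;
    }

    // True if every key of a sorted size-n array is found.
    static boolean findsAll(int n, int k) {
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) arr[i] = n + i;
        for (int i = 0; i < n; i++) {
            if (search(arr, arr[i], k) != i) return false;
        }
        return true;
    }
}
```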
Interesting Results • One more interesting solution evolved • It returns correct results (100% hits) up to array length ~6,640 • Analyzing it revealed an interesting algorithm that makes a series of jumps of exponentially increasing size • of the form 2^i, from 1 to 256, in every iteration • It was thus able to handle array sizes n such that, roughly, $n \le 512 \cdot \log_2 n$, i.e., $n \le 6656$ [Figure: jumps of size 1, 2, 4, 8, …, 256]
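The size bound can be checked numerically, assuming iterations = ceil(log2 n) and roughly 512 positions covered per iteration (1 + 2 + … + 256 = 511). Under these assumptions, the largest n satisfying n ≤ 512 · ceil(log2 n) is 6656 = 512 · 13. Class and method names are hypothetical.

```java
class JumpBound {
    static int ceilLog2(int n) {
        return n <= 1 ? 0 : 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // Largest n (up to `limit`) with n <= 512 * ceil(log2 n).
    static int largestCoveredSize(int limit) {
        int best = 0;
        for (int n = 2; n <= limit; n++) {
            if (n <= 512 * ceilLog2(n)) best = n;
        }
        return best;
    }
}
```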
Interesting Results • ADF0 = 2*M1-M0-1 • The main tree included 7-8 occurrences similar to: (PROGN2 (if (> Array[INDEX] KEY) (M1:= ADF0) NOP) (INDEX:= ADF0)) • The difference grows by a factor of 2: from $M1' = 2 \cdot M1 - M0 - 1$ and $M1'' = 2 \cdot M1' - M0 - 1$, subtracting gives $M1'' - M1' = 2(M1' - M1)$ [Figure: M1 jumping to 2*M1-M0-1, with jumps of size 1, 2, 4, 8, …, 256]
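A small check of the doubling identity, holding M0 fixed as the derivation does (an assumption about how the repeated subtrees interact). Subtracting the two updates gives (2·M1' - M0 - 1) - (2·M1 - M0 - 1) = 2(M1' - M1). Starting from M1 - M0 = 2, the jump sizes are 1, 2, 4, …, 256. `JumpDoubling` and `jumps` are hypothetical names.

```java
class JumpDoubling {
    // Repeatedly applies M1 := ADF0 = 2*M1 - M0 - 1 (M0 held fixed) and
    // returns the successive jump sizes M1' - M1.
    static long[] jumps(long m0, long m1, int steps) {
        long[] sizes = new long[steps];
        for (int s = 0; s < steps; s++) {
            long next = 2 * m1 - m0 - 1;
            sizes[s] = next - m1;
            m1 = next;
        }
        return sizes;
    }
}
```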
Related Work • No previous work on evolving list search algorithms • "Closest": sorting algorithms • Loosely related: in both cases, solutions have to be 100% correct • We found 10-15 works on evolving sorting algorithms • Most of them evolved O(n^2) sorting algorithms • One work evolved an O(n log n) algorithm • albeit with a highly specific setup
Related Work • Kinnear (1993) evolved an O(n^2) bubble sort using Koza-style GP • He tried a number of function sets, all quite specific, including • double-for, swap, and who-is-bigger functions • He showed that evolving a solution becomes harder as the functions become less problem-specific • e.g., (order x y) vs. (if-lt x y work) and (swap x y) • He noted that adding parsimony pressure increased the likelihood of evolving a general solution
Related Work • Withall et al. (2009) developed a new GP representation • fixed-length blocks of genes, each representing a single program statement • The phenotype is a Perl program • They showed improvement over previous linear-GP representations in the similarity between child and parent, i.e., the propagation of characteristics (building blocks) through multiple generations • A number of list algorithms were evolved • sum-of-elements, max-element, reverse, sort • using problem-specific functions for each algorithm • Functions included • a for-loop function • a double function: a highly specific double-for nested loop • With these specialized structures they evolved an O(n^2) bubble sort algorithm
Related Work • An O(n log n) solution was evolved by Agapitos et al. (2006-2007) • Their evolutionary setup was based on their object-oriented genetic programming system • They compared five different fitness functions based on various measures of array disorder • To avoid non-terminating programs, they defined an upper bound on recursive calls, based on their hand-coded implementation