Evolving Efficient List Search Algorithms
Kfir Wolfson, Moshe Sipper
Evolutionary Computation and Artificial Life (ECAL) course - CS BGU - July 8th, 2009
Agenda
• Introduction
• Evolutionary Setup
• Results
• Less Knowledge – More Automation
• Related Work
• Conclusions and Future Work
Introduction
• Algorithm design is an important task in CS
• Evolutionary algorithms have been applied to many areas, but there is limited research on software engineering and algorithmic design
• We introduce the notion of “Algorithmic design through Darwinian evolution”
• Begin with a benchmark case, list search algorithms:
  • Can evolution be applied to finding a search algorithm?
  • Can evolution be applied to finding an efficient search algorithm?
• We apply Genetic Programming (GP) to the task and show that the answer to both questions is affirmative
Evolutionary Setup
• Representation
  • Phenotype
  • Genotype
• GP Parameters
• Fitness Function
• GP Operators
Representation
• Phenotype
  • Array search algorithm
  • Searches for a key in a one-dimensional array
Java static function:
    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0;
        int M1 = n-1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            -> PLUG IN EVOLVING GENOTYPE HERE <-
        }
        return INDEX;
    }
Representation: callouts on the search frame
• Global variables
• iterations is set to:
  • n for linear search
  • log2 n for sublinear search
• Array index returned (might be “illegal”)
Representation
• Genotype
  • Koza-style genetic programming
  • Evaluation trees
  • Strongly typed
  • More understandable algorithms
• Function and Terminal sets
  • Same for evolution of both linear and sublinear search algorithms
[Figure: example evaluation tree with nodes If, =, NOP, INDEX:=, Array[INDEX], KEY, ITER]
Representation
• The [M0+M1]/2 terminal
  • Embodies human intuition about the problem to facilitate the solution
  • Still requires crucial algorithmic insight to be derived via evolution
  • Later we re-examine this terminal and remove it altogether
Representation - Example
• An example correct solution to the linear search problem:
LISP:
    (If (= Array[INDEX] KEY) NOP (INDEX:= ITER))
Equivalent Java:
    if (arr[INDEX] == KEY)
        ;
    else
        INDEX = ITER;
[Figure: the same genotype as a tree with nodes If, =, NOP, INDEX:=, Array[INDEX], KEY, ITER]
Let’s plug it into the phenotype frame…
Representation - Example
• An example correct solution to the linear search problem:
    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0;
        int M1 = n-1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            if (arr[INDEX] == KEY)
                ;
            else
                INDEX = ITER;
        }
        return INDEX;
    }
Representation
• search call:
  • Always halts
    • No loop functions
    • Only read access to ITER
    • Number of iterations is limited
  • Inherently deals with keys not in the array
    • With wrapper function
  • No early termination when key is found
    • Harder problem: the evolved algorithm will have to learn to retain the correct index. Why?
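The point about retaining the index can be seen by running the frame with the example linear genotype plugged in. The following is a minimal sketch, not the authors' code: the class name `LinearFrameDemo` and passing `iterations` as a parameter are assumptions made for illustration (the slides only say that `iterations` is set to n for the linear case).

```java
final class LinearFrameDemo {
    // Hypothetical runnable version of the slide's frame. `iterations` is passed
    // explicitly (an assumption); the caller sets it to arr.length for linear search.
    // The frame's n, M0 and M1 are omitted because this genotype never uses them.
    static int search(int[] arr, int KEY, int iterations) {
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            if (arr[INDEX] == KEY) {
                // NOP: the loop cannot exit early, so INDEX must simply stay put
            } else {
                INDEX = ITER;
            }
        }
        return INDEX;
    }

    public static void main(String[] args) {
        int[] arr = {1003, 1000, 1002, 1001};
        for (int key : arr) {
            System.out.println("key " + key + " -> index " + search(arr, key, arr.length));
        }
    }
}
```

Because the loop always runs all iterations, a body that kept assigning INDEX after a match would overwrite the correct answer; the NOP branch is what preserves it.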
Fitness Function
• How do we rate a solution?
  • Present the individual with many random input arrays
  • Use the search method to search for all keys in all arrays
  • Reward the individual for closeness of returned indexes
• Training set includes arrays of all sizes in [minN, maxN] (minN = 2, maxN = 100)
• An array of size n contains:
  • Linear case: random permutation of [1000, 1000+n-1]
  • Sublinear case: sorted unique numbers from [n, 100n]
• Note that the key range is disjoint from the index range
  • Discourages “cheating”
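The two kinds of training arrays described above can be sketched as follows. This is an illustrative sketch only: the class and method names are hypothetical, and drawing the sublinear values uniformly at random is an assumption, since the slides specify only the range and that the values are sorted and unique.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

final class TrainingArrays {
    // Linear case: a random permutation of the values 1000 .. 1000+n-1.
    static int[] linearCase(int n, Random rng) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(1000 + i);
        }
        Collections.shuffle(values, rng);
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = values.get(i);
        }
        return arr;
    }

    // Sublinear case: n sorted unique values from [n, 100n].
    // Uniform sampling is an assumption; the slides do not state the distribution.
    static int[] sublinearCase(int n, Random rng) {
        TreeSet<Integer> values = new TreeSet<>();
        while (values.size() < n) {
            values.add(n + rng.nextInt(99 * n + 1)); // value in [n, 100n]
        }
        int[] arr = new int[n];
        int i = 0;
        for (int v : values) {
            arr[i++] = v; // TreeSet iterates in ascending order
        }
        return arr;
    }
}
```

In the sublinear case every value is at least n while indexes run from 0 to n-1, so an individual cannot score hits by returning the key itself as an index. The same holds in the linear case for the array sizes used, since all values are at least 1000.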
Fitness Function
• Define the error per single key search as the distance between the correct index of KEY and the index returned by search(arr, KEY)
• Elements are unique
  • No ambiguity in the error definition
[Figure: array with key = 18; the correct index and the index returned by search(arr, key) are two positions apart, so error = 2]
Fitness Function
• Define a hit as finding the precise location of KEY
[Figure: array with key = 18; search(arr, key) returns the correct index, so error = 0. Hit!]
Fitness Function
• The fitness value of an individual is defined as:
    fitness = (average error per search call) × (1 − 0.5 × hits / calls)
• This gives a 0.5% bonus reduction for every 1% of correct hits
• For example, if an individual scored 300 hits in 1000 search calls, its fitness will be the average error per call, reduced by 15%
• This bonus
  • encourages perfect answers (“almost” is bad…),
  • increases fitness variation in the population
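The error, hit and bonus definitions can be sketched in a few lines. This is a sketch under stated assumptions: the class and method names are hypothetical, and the formula is read off the description above (average error per call, reduced by 0.5% for every 1% of hits), so lower values mean better individuals under this reading.

```java
final class FitnessDemo {
    // Error of a single search call: distance between correct and returned index.
    static int error(int correctIndex, int returnedIndex) {
        return Math.abs(correctIndex - returnedIndex);
    }

    // Average error per call, reduced by 0.5% for every 1% of hits.
    static double fitness(long totalError, int hits, int calls) {
        double averageError = (double) totalError / calls;
        double hitFraction = (double) hits / calls;
        return averageError * (1.0 - 0.5 * hitFraction);
    }

    public static void main(String[] args) {
        // The slide's example: 300 hits in 1000 calls gives a 15% reduction.
        // The total error of 2000 is a made-up figure for illustration.
        System.out.println(fitness(2000, 300, 1000));
    }
}
```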
Generality Test
• The best solution of each run was subjected to a stringent generality test, by running it on random arrays of all lengths in the range [2, 5000] ([2, 500] for the linear case).
• Kinnear (1993) noted that: “For any algorithm... that operates on an infinite domain of data, no amount of testing can ever establish generality. Testing can only increase confidence.”
• We included analysis by hand for selected solutions.
GP Operators and Parameters
Results - Linear
• It turned out that evolving a linear-time search algorithm was quite easy with the function and terminal sets we designed.
• 46 out of 50 runs (92%) produced perfect solutions, passing the generality test on arrays up to length 500.
• Our representation rendered the problem easy enough for a perfect individual to appear in the randomly generated generation 0 in three of the runs.
  • The search space was small enough for random search.
Results - Linear
• An example evolved solution:
LISP:
    (If (= Array[INDEX] KEY) (M1:= [M0+M1]/2) (INDEX:= ITER))
Equivalent Java:
    if (arr[INDEX] == KEY)
        M1 = (M0+M1)/2;
    else
        INDEX = ITER;
[Figure: the same genotype as a tree with nodes If, =, M1:=, INDEX:=, Array[INDEX], KEY, ITER, [M0+M1]/2]
Results - Linear
• An example evolved solution:
    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0;
        int M1 = n-1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            if (arr[INDEX] == KEY)
                M1 = (M0+M1)/2;
            else
                INDEX = ITER;
        }
        return INDEX;
    }
• The assignment to M1 is irrelevant, but it does not affect the output index
Sublinear Search
• We set iterations to log2 n in the search frame, and proceeded to evolve sublinear search algorithms.
Results - Sublinear
• Unsurprisingly, this case proved a harder problem, but it was also solved by evolution.
• 35 out of 50 runs (70%) produced perfect solutions, passing the generality test on arrays up to length 5,000.
  • Solutions emerged between generations 22 and 3,632
  • Solution sizes varied between 42 and 244 nodes
  • Runtime: between 2 hours and 2 days on the CS grid
• 7 runs (14%) produced near-perfect solutions, which failed on a single key in the input arrays (99.96% hits on the generality test)
Results – Sublinear
• An example simplified evolved solution:
  • Simplified by hand from a tree of 50 nodes down to 14
LISP:
    (PROGN2 (INDEX:= [M0+M1]/2)
            (If (> KEY Array[INDEX])
                (PROGN2 (M0:= [M0+M1]/2) (INDEX:= M1))
                (M1:= [M0+M1]/2)))
Equivalent Java:
    INDEX = (M0+M1)/2;
    if (KEY > arr[INDEX]) {
        M0 = (M0+M1)/2;
        INDEX = M1;
    }
    else
        M1 = (M0+M1)/2;
Results - Sublinear
    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0;
        int M1 = n-1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            INDEX = (M0+M1)/2;
            if (KEY > arr[INDEX]) {
                M0 = (M0+M1)/2;
                INDEX = M1;
            }
            else
                M1 = (M0+M1)/2;
        }
        return INDEX;
    }
This is a form of Binary Search (with a small twist)
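The evolved solution can be exercised directly to see the twist at work: after every iteration INDEX equals M1, and the window between M0 and M1 roughly halves. The sketch below is a runnable rendering under assumptions: the class name and the `ceilLog2` helper are hypothetical, and rounding log2 n up to an integer is an assumption (the slides say only that iterations is set to log2 n).

```java
final class EvolvedBinarySearchDemo {
    // ceil(log2(n)) for n >= 1; the rounding direction is an assumption.
    static int ceilLog2(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // The simplified evolved genotype plugged into the frame.
    static int search(int[] arr, int KEY) {
        int n = arr.length;
        int iterations = ceilLog2(n);
        int M0 = 0;
        int M1 = n - 1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            INDEX = (M0 + M1) / 2;
            if (KEY > arr[INDEX]) {
                M0 = (M0 + M1) / 2; // key lies strictly right of the midpoint
                INDEX = M1;         // the twist: report the upper bound
            } else {
                M1 = (M0 + M1) / 2; // key lies at or left of the midpoint
            }
        }
        return INDEX;
    }

    public static void main(String[] args) {
        int[] arr = {7, 12, 18, 25, 31, 40, 44};
        for (int key : arr) {
            System.out.println("key " + key + " -> index " + search(arr, key));
        }
    }
}
```

A small generality check in the spirit of the slides is to loop over many array lengths and search for every key, confirming the returned index each time.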
Less Knowledge – More Automation
• Re-examining the representation:
  • Most terminals and functions are either
    • General-purpose or
    • Problem-specific
  • However, one terminal stands out: [M0+M1]/2
    • Solution-specific
• We proceed to
  • Remove the [M0+M1]/2 terminal
  • Add an automatically defined function (ADF)
Adding ADF
[Figure: main-tree functions and terminals (PROGN2, If, <, =, >, INDEX:=, M0:=, M1:=, INDEX, Array[INDEX], KEY, ITER, M0, M1, TRUE, FALSE, NOP, [M0+M1]/2), with ADF0 introduced in place of the [M0+M1]/2 terminal]
Adding ADF
[Figure: main-tree functions and terminals as before, with ADF0 replacing [M0+M1]/2. ADF functions and terminals: +, -, *, /, 0, 1, 2, M0, M1]
GP Parameters – with ADF
• Array size was increased to:
  • minN = 200
  • maxN = 300
  • To avoid non-general solutions (example to follow)
• Different function sets for the main and ADF trees
• Crossover is tree-wise
• Mutation performed better than crossover
  • Especially for the ADF tree
GP Parameters – with ADF
• Same as previous setup, with the following changes:
Results – Sublinear with ADF
• The sublinear search problem with an ADF naturally proved more difficult than with the [M0+M1]/2 terminal
• 12 out of 50 runs (24%) produced perfect solutions, passing the generality test on arrays up to length 5,000 (they later passed all tests up to size 20,000)
  • Solutions emerged between generations 54 and 4,557
  • Solution sizes varied between 53 and 244 nodes
  • Runtime: between 4 hours and 2 weeks on the CS grid
• An additional run produced a non-standard solution, discussed later
Results – Sublinear with ADF
• Analysis revealed all perfect solutions to be variations of binary search
• The algorithmic idea can be deduced by inspecting the ADFs, all of which turned out to be equivalent to one of the following (all fractions truncated):
    (M0+M1)/2
    (M0+M1+1)/2
    M0/2+(M1+1)/2
  which are reminiscent of the [M0+M1]/2 terminal we dropped
Results – Sublinear with ADF
• An example simplified evolved solution:
  • Simplified by hand from a tree of 58 nodes down to 26 (before simplification: slide 60)
LISP:
    (PROGN2 (PROGN2 (if (< Array[INDEX] KEY) (INDEX:= ADF0) NOP)
                    (if (< Array[INDEX] KEY) (M0:= INDEX) (M1:= INDEX)))
            (INDEX:= ADF0))
    ADF0: (/ (+ (+ 1 M0) M1) 2)
Equivalent Java:
    if (arr[INDEX] < KEY)
        INDEX = ((1+M0)+M1)/2;
    if (arr[INDEX] < KEY)
        M0 = INDEX;
    else
        M1 = INDEX;
    INDEX = ((1+M0)+M1)/2;
Results – Sublinear with ADF
    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0;
        int M1 = n-1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            if (arr[INDEX] < KEY)
                INDEX = ((1+M0)+M1)/2;
            if (arr[INDEX] < KEY)
                M0 = INDEX;
            else
                M1 = INDEX;
            INDEX = ((1+M0)+M1)/2;
        }
        return INDEX;
    }
This is another form of Binary Search (with a different twist)
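The ADF-based solution can be checked the same way. In this sketch the evolved ADF0 becomes an ordinary helper method that rounds the midpoint up; the class name, the helper names and the choice of rounding log2 n up are assumptions made for illustration.

```java
final class EvolvedAdfSearchDemo {
    // ceil(log2(n)) for n >= 1; the rounding direction is an assumption.
    static int ceilLog2(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // The evolved ADF0: midpoint of M0 and M1, rounded up.
    static int adf0(int M0, int M1) {
        return ((1 + M0) + M1) / 2;
    }

    static int search(int[] arr, int KEY) {
        int n = arr.length;
        int iterations = ceilLog2(n);
        int M0 = 0;
        int M1 = n - 1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            if (arr[INDEX] < KEY) {
                INDEX = adf0(M0, M1);
            }
            if (arr[INDEX] < KEY) {
                M0 = INDEX; // key lies strictly right of INDEX
            } else {
                M1 = INDEX; // key lies at or left of INDEX
            }
            INDEX = adf0(M0, M1);
        }
        return INDEX;
    }

    public static void main(String[] args) {
        int[] arr = {7, 12, 18, 25, 31, 40, 44};
        for (int key : arr) {
            System.out.println("key " + key + " -> index " + search(arr, key));
        }
    }
}
```

The first test in the loop body matters only in the first iteration, where INDEX is still 0: it lets a key stored at index 0 collapse M1 to 0 immediately, which the rounded-up midpoint could not reach on its own.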
Interesting Results
• Some of the other evolved solutions are worth mentioning
• With minN = 2, maxN = 100 and main-tree max-depth = 17, linear search algorithms evolved, failing on longer arrays
• How is this possible (in log2 n iterations)?
  • An O(log n) solution has a constant factor, i.e. the algorithm does k·log n operations.
  • We set a limit on the number of iterations, where in each iteration the full genotype code is executed.
  • A linear search could evolve by taking advantage of the constant factor k
Interesting Results
• Linear solution ADF: ADF0 = (M0+1)
• Main tree included 16 occurrences of:
    (If (= Array[INDEX] KEY) NOP (PROGN2 (M0:= ADF0) (INDEX:= M0)))
  • If the key is found, do nothing; else increment INDEX by 1
• For an array of size n = 100:
  • log n = 7; for k = 16: k·log n = 16·7 = 112 > 100 (enough to traverse the whole array)
• We proceeded to
  • increase minN, maxN (to 200, 300),
  • decrease the maximum k, by lowering max-depth to 10
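The constant-factor loophole can be reproduced with a short simulation. This is a sketch under assumptions: the k occurrences in the main tree are modelled as an inner loop that runs the step k times per iteration, log2 n is rounded up, the key is assumed to be present in the array, and all names are hypothetical.

```java
final class ConstantFactorExploitDemo {
    // ceil(log2(n)) for n >= 1; the rounding direction is an assumption.
    static int ceilLog2(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // Each iteration executes k copies of "if found do nothing, else step right".
    // Assumes KEY is present in arr; otherwise INDEX could run past the end.
    static int search(int[] arr, int KEY, int k) {
        int iterations = ceilLog2(arr.length);
        int M0 = 0;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            for (int copy = 0; copy < k; copy++) {
                if (arr[INDEX] == KEY) {
                    // NOP
                } else {
                    M0 = M0 + 1; // ADF0 = M0 + 1
                    INDEX = M0;
                }
            }
        }
        return INDEX;
    }

    static int[] sequentialArray(int n) {
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = 1000 + i;
        }
        return arr;
    }

    public static void main(String[] args) {
        int[] small = sequentialArray(100); // 16 * 7 = 112 steps: whole array reachable
        int[] large = sequentialArray(200); // 16 * 8 = 128 steps: index 199 unreachable
        System.out.println(search(small, small[99], 16) == 99);
        System.out.println(search(large, large[199], 16) == 199);
    }
}
```

With n = 100 the 112 available steps cover every index, so the linear scan passes training; with n = 200 only 128 steps are available, which is why raising minN and maxN exposes the non-general solution.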
Interesting Results
• One more interesting solution evolved
  • Returns correct results (100% hits) up to array length ~6,640
• Analyzing it revealed an interesting algorithm which makes a series of jumps of exponentially increasing size
  • of the form 2^i, from 1 to 256, in every iteration
• Thus it was able to handle array sizes n such that (roughly)
  • n ≤ 512 × log2 n, i.e. n ≤ 6656
[Figure: jump sizes 1, 2, 4, 8, …, 256]
Interesting Results
• ADF0 = 2*M1-M0-1
• Main tree included 7-8 occurrences similar to:
    (PROGN2 (if (> Array[INDEX] KEY) (M1:= ADF0) NOP) (INDEX:= ADF0))
• The difference grows by a factor of 2:
    M1’  = 2*M1  - M0 - 1
    M1’’ = 2*M1’ - M0 - 1
    ----------------------
    M1’’ - M1’ = 2(M1’ - M1)
[Figure: jump sizes 1, 2, 4, 8, …, 256]
Related Work
• No previous work on evolving list search algorithms
• “Closest”: sorting algorithms
  • Loosely related: in both cases, solutions have to be 100% correct
• We found 10-15 works on evolving sorting algorithms
  • Most works have been able to evolve O(n^2) sorting algorithms
  • One work evolved an O(n log n) algorithm
    • albeit with a highly specific setup
Related Work
• Kinnear (1993) evolved an O(n^2) bubble sort using Koza-style GP.
• He tried a number of function sets, all quite specific, including
  • double-for, swap, who-is-bigger functions
• Showed that the difficulty in evolving a solution increases as the functions become less problem-specific
  • (order x y) vs. (if-lt x y work) and (swap x y)
• Noted that adding parsimony increased the likelihood of evolving a general solution
Related Work
• Withall et al. (2009) developed a new GP representation
  • Fixed-length blocks of genes, representing single program statements.
  • The phenotype is a Perl program.
  • They showed improvement over previous linear-GP representations in similarity between child and parent, i.e., propagation of characteristics (building blocks) through multiple generations.
• A number of list algorithms were evolved
  • sum-of-elements, max-element, reverse, sort
  • using problem-specific functions for each algorithm
• Functions included
  • a for loop function
  • a double function: a highly specific double-for nested loop.
• With these specialized structures they evolved an O(n^2) bubble sort algorithm.
Related Work
• An O(n log n) solution was evolved by Agapitos et al. (2006-7)
  • The evolutionary setup was based on their object-oriented genetic programming system.
  • They compared five different fitness functions based on various measures of array disorder.
  • To avoid non-terminating programs they defined an upper bound on recursive calls, based on their hand-coded implementation.