Class Project 510 Team Members John A. Watne Jordan D. Howe Ian R. Erlanson Geoffrey A. Reglos Sengdara Phetsomphou
Project Overview • Problem Description • Requirements Analysis • Technology • Settings and System Design • Algorithms • Graphical User Interface (GUI) • Lessons Learned • Future Enhancements
Problem Description • In this project, we are designing a Genetic Programming (GP) system that produces a function equivalent to the pre-defined mathematical equation y = (x² + 1) / 2, derived from training data consisting of several x values and the resulting y values. • Analogous to DNA evolution, the program uses mechanisms such as crossover and mutation. • Key components of the system are the fitness and selection functions, which decide whether a generated solution meets the minimum requirements. • We expect each subsequent generation of solutions to be “better” than the previous one (that is, to reproduce the training data more closely), eventually resulting in a correct mathematical equation.
Requirement Analysis • Given training data consisting of ten positive x values and the matching y values, the genetic programming system will generate a function that closely matches the pre-defined mathematical function y = (x² + 1) / 2. • The resulting function must be generated within the allotted fifteen minutes. • The expected output of the system consists of: • Mathematical function: y = (x² + 1) / 2 • Total elapsed time • Any other pertinent information about the resulting function, such as the number of generations evolved and its fitness value
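The requirements fix the target function but not the ten x values. As a minimal sketch, assuming the hypothetical values x = 1 through 10, the hard-coded training pairs could be produced from y = (x² + 1) / 2 like this (the class and method names are illustrative, not the project's):

```java
// Hypothetical training-data builder. ASSUMPTION: x = 1..count; the
// requirements only say "ten positive x values".
class TrainingData {

    // Target function the GP system is trying to rediscover: y = (x^2 + 1) / 2
    static double target(double x) {
        return (x * x + 1.0) / 2.0;
    }

    // Builds {x, y} pairs for x = 1, 2, ..., count
    static double[][] build(int count) {
        double[][] data = new double[count][2];
        for (int i = 0; i < count; i++) {
            double x = i + 1;
            data[i][0] = x;
            data[i][1] = target(x);
        }
        return data;
    }

    public static void main(String[] args) {
        for (double[] row : build(10)) {
            System.out.println("x = " + row[0] + ", y = " + row[1]);
        }
    }
}
```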
Requirement Analysis – cont. • If the genetic programming system fails to produce a function within an acceptable tolerance in the fifteen-minute time frame, execution terminates. • Upon termination of the GP generation and testing loop, output the best function along with its fitness value, whether termination is due to: • finding a solution within the desired tolerance, OR • the allotted time expiring • The system must be able to accommodate a change in requirements one week before the due date. • The genetic programming system must run on the PCs available in the classroom.
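The termination rule above (stop on a solution within tolerance or when fifteen minutes expire, then report the best program either way) can be sketched as a small driver loop. The `Evolver` interface and its methods are hypothetical placeholders for the real GP code, and fitness is taken to be the sum of squared errors, where 0 is perfect:

```java
// Hypothetical placeholder for the GP engine; not the project's actual classes.
interface Evolver {
    void nextGeneration();   // breed one new generation
    double bestFitness();    // sum of squared errors of the best program (0 = perfect)
    String bestProgram();    // printable form of the best program
}

class RunController {
    static final long TIME_LIMIT_MS = 15L * 60L * 1000L; // fifteen minutes

    // Runs until the best fitness is within tolerance or the time limit expires,
    // prints the required report, and returns the number of generations evolved.
    static int run(Evolver gp, double tolerance, long timeLimitMs) {
        long start = System.currentTimeMillis();
        int generations = 0;
        boolean solved = gp.bestFitness() <= tolerance;
        while (!solved && System.currentTimeMillis() - start < timeLimitMs) {
            gp.nextGeneration();
            generations++;
            solved = gp.bestFitness() <= tolerance;
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(solved ? "Solution found within tolerance" : "Time limit reached");
        System.out.println("Best function: " + gp.bestProgram());
        System.out.println("Fitness: " + gp.bestFitness());
        System.out.println("Generations: " + generations);
        System.out.println("Elapsed time (ms): " + elapsed);
        return generations;
    }
}
```

Because the clock is checked once per generation, a run can overshoot the limit by up to one generation's duration; with 50 programs per generation that margin should be small.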
Requirement Analysis – cont. Finite State Machine
Requirement Analysis – cont. Unified Modeling Language
Requirement Analysis – cont. Data Flow Diagram
Technology • Programming Language • Sun Java 1.4 • Development Environments • NetBeans • Eclipse • EditPlus • DOS Prompt
Why Java? • A number of programming languages were available for this project, such as C and C++. • We chose Java for several reasons: • When we evaluated the technical skills of each team member, Java was the language the group was most familiar with. • Java is free to download and use. • Building GP programs from individual nodes lends itself to an object-oriented methodology, and Java is an object-oriented programming language. • Ease of deployment was another consideration, since we were not familiar with the classroom where the presentation would take place.
Settings & System Design • Using an object-oriented design that reflects the UML diagram shown in the Requirements Analysis section, each class will be implemented as a separate Java .class file. • All .class files needed by the genetic programming system will be stored in the same directory on the PC on which the program is run. • For the initial version of the program: • All inputs will be hard coded within the Java source code. • Output will be written to standard output when the program is executed from a command prompt.
Settings & System Design – cont. • Random Number Generator • Java class using system time as a seed • Function and Terminal Set • Numbers 1 through 9 • Operators: +, -, *, / • Data Structures Used • Binary Tree • Creation of generated functions • Maximum Depth = 5 • Stack • Evaluation using postfix traversal • Determining crossover point
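A minimal sketch of the binary-tree representation and random tree creation under the settings above (operators +, -, *, /; constants 1 through 9; maximum depth 5; a time-seeded random number generator). Two points are assumptions: the slide lists only the constants 1 to 9 as terminals, so the sketch also adds the variable x (the evolved function must depend on x), and it uses a simple "grow" strategy in which each level picks a terminal or an operator with equal probability. Class names are illustrative:

```java
import java.util.Random;

// Binary expression-tree node: operators are internal nodes, terminals are leaves.
class Node {
    final String token;   // "+", "-", "*", "/", "x", or "1".."9"
    final Node left, right; // both null for a leaf

    Node(String token, Node left, Node right) {
        this.token = token;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() { return left == null && right == null; }

    // Depth counted in nodes: a single leaf has depth 1
    int depth() {
        return isLeaf() ? 1 : 1 + Math.max(left.depth(), right.depth());
    }

    @Override
    public String toString() {
        return isLeaf() ? token : "(" + left + " " + token + " " + right + ")";
    }
}

class TreeGenerator {
    static final String[] OPERATORS = {"+", "-", "*", "/"};
    static final int MAX_DEPTH = 5;

    private final Random rng;

    TreeGenerator(long seed) { rng = new Random(seed); }

    // Seeded explicitly from the system time, as the slide describes
    TreeGenerator() { this(System.currentTimeMillis()); }

    // Random terminal: x (ASSUMED terminal) or one of the constants 1..9
    Node randomTerminal() {
        int pick = rng.nextInt(10);                 // 0..9
        String token = (pick == 0) ? "x" : String.valueOf(pick);
        return new Node(token, null, null);
    }

    // "Grow" creation: never exceeds maxDepth levels
    Node grow(int maxDepth) {
        if (maxDepth <= 1 || rng.nextBoolean()) {
            return randomTerminal();
        }
        String op = OPERATORS[rng.nextInt(OPERATORS.length)];
        return new Node(op, grow(maxDepth - 1), grow(maxDepth - 1));
    }
}
```

Calling `new TreeGenerator().grow(TreeGenerator.MAX_DEPTH)` yields one random program; passing a fixed seed instead makes runs repeatable while debugging.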
Settings & System Design – cont. • Programs per Generation • 50 programs per generation • Genetic Operator Probabilities • Crossover = 80% • Mutation = 10% • Reproduction (Cloning) = 15% • New Entrant = 5%
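The four probabilities listed above add up to 110%, so they cannot all be mutually exclusive choices. One reading consistent with the numbers, and the assumption behind this sketch, is that crossover, reproduction and new entrant (80% + 15% + 5% = 100%) decide how each new program is created, and mutation is a separate 10% chance applied to the result. Names are hypothetical:

```java
import java.util.Random;

// Chooses how the next program in a new generation is produced.
// ASSUMPTION: crossover / reproduction / new entrant are mutually exclusive
// (80% + 15% + 5% = 100%); mutation is an independent 10% chance afterwards.
class OperatorChooser {
    enum Operator { CROSSOVER, REPRODUCTION, NEW_ENTRANT }

    static final double P_CROSSOVER = 0.80;
    static final double P_REPRODUCTION = 0.15;
    static final double P_MUTATION = 0.10;   // applied independently to the result

    private final Random rng;

    OperatorChooser(Random rng) { this.rng = rng; }

    // Maps a uniform draw r in [0, 1) onto the three creation operators
    static Operator forDraw(double r) {
        if (r < P_CROSSOVER) return Operator.CROSSOVER;
        if (r < P_CROSSOVER + P_REPRODUCTION) return Operator.REPRODUCTION;
        return Operator.NEW_ENTRANT;          // the remaining 5%
    }

    Operator choose() { return forDraw(rng.nextDouble()); }

    boolean shouldMutate() { return rng.nextDouble() < P_MUTATION; }
}
```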
Settings & System Design – cont. • Divide by Zero • Dead on Arrival (DOA) indicator • If TRUE, the function will not be considered for the next generation
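A sketch of stack-based evaluation with the DOA check. It assumes a program has already been flattened into postfix tokens (the order a post-order traversal of the expression tree produces) and that x is a terminal. The zero-divisor test has to be explicit, because Java floating-point division by zero returns Infinity or NaN instead of throwing. Names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Evaluates a GP program stored as postfix tokens using a stack,
// and flags it "Dead on Arrival" if it divides by zero for any training x.
class PostfixEvaluator {

    static double evaluate(List<String> postfix, double x) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : postfix) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    double right = stack.pop(); // right operand was pushed last
                    double left = stack.pop();
                    stack.push(apply(token, left, right));
                    break;
                }
                case "x":
                    stack.push(x);
                    break;
                default:
                    stack.push(Double.parseDouble(token)); // constants 1..9
            }
        }
        return stack.pop();
    }

    private static double apply(String op, double left, double right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            default:
                // Doubles would silently yield Infinity/NaN, so check explicitly
                if (right == 0.0) throw new ArithmeticException("divide by zero");
                return left / right;
        }
    }

    // DOA indicator: TRUE if any training x causes a division by zero
    static boolean isDeadOnArrival(List<String> postfix, double[] xs) {
        for (double x : xs) {
            try {
                evaluate(postfix, x);
            } catch (ArithmeticException e) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // (x * x + 1) / 2 in postfix form
        List<String> program = Arrays.asList("x", "x", "*", "1", "+", "2", "/");
        System.out.println(evaluate(program, 3.0)); // prints 5.0
    }
}
```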
Algorithms • Fitness and Selection • Fitness: sum of squared errors; target fitness value = zero. • p(i) = (1 / (n - 1)) * [1 - (Fit(i) / ΣFit)] for n > 1; 100% otherwise, where n is the number of programs eligible for selection and ΣFit is the sum of their fitness values • Any GP programs that divide by zero for any x value in the training data are "Dead On Arrival" and are not allowed to reproduce or count toward the total and average fitness values for the generation. • Method of Tree Traversal • We implemented a post-order method for tree traversal.
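The fitness and selection rules above can be sketched as follows. Summing p(i) over the n eligible programs gives (1 / (n - 1)) * (n - ΣFit / ΣFit) = 1, so the values form a valid probability distribution in which lower error means a larger share. Two details are assumptions: DOA programs are excluded from n and ΣFit (our reading of the slide), and if every eligible program has fitness 0 (so ΣFit = 0 and the formula is undefined) the sketch falls back to equal probabilities. Names are hypothetical:

```java
import java.util.Random;

class Selection {

    // Fit(i): sum of squared errors over the training data (0 is perfect)
    static double sumSquaredErrors(double[] predicted, double[] expected) {
        double sse = 0.0;
        for (int i = 0; i < expected.length; i++) {
            double err = predicted[i] - expected[i];
            sse += err * err;
        }
        return sse;
    }

    // p(i) = (1 / (n - 1)) * [1 - Fit(i) / SumFit] for n > 1; 1.0 when n == 1.
    // DOA programs get probability 0 and do not count toward n or SumFit.
    static double[] probabilities(double[] fitness, boolean[] doa) {
        int n = 0;
        double total = 0.0;
        for (int i = 0; i < fitness.length; i++) {
            if (!doa[i]) { n++; total += fitness[i]; }
        }
        double[] p = new double[fitness.length];
        for (int i = 0; i < fitness.length; i++) {
            if (doa[i]) continue;                    // DOA: never selected
            if (n == 1) p[i] = 1.0;
            else if (total == 0.0) p[i] = 1.0 / n;   // ASSUMPTION: all perfect -> uniform
            else p[i] = (1.0 / (n - 1)) * (1.0 - fitness[i] / total);
        }
        return p;
    }

    // Roulette-wheel draw; returns -1 only if no program is eligible
    static int select(double[] p, Random rng) {
        double r = rng.nextDouble();
        double cumulative = 0.0;
        int last = -1;
        for (int i = 0; i < p.length; i++) {
            if (p[i] <= 0.0) continue;
            cumulative += p[i];
            last = i;
            if (r < cumulative) return i;
        }
        return last; // guards against rounding leaving the total just below 1
    }
}
```

One property of this formula worth knowing when tuning: with n = 50, even a perfect program gets at most 1/49 (about 2.04%), compared with the uniform 2%, so selection pressure is mild.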
Algorithms – continued • Sorting • After a new generation of GP programs has been created and each program evaluated, the programs could be sorted in ascending order of fitness. • This would ease the selection of programs into the next generation, because the most promising solutions would be toward the front of the array. However, we chose not to use sorting in any part of the GP project, for several reasons: • We were concerned about the fifteen-minute time limit. • We chose to simplify the design to meet the project deadline. We are also attempting to implement a GUI, and we were concerned that sorting logic would take much-needed processing time from the CPU. • We have considered adding sorting by fitness value as a future enhancement.
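As a sketch of the deferred enhancement, assuming a hypothetical `RankedProgram` class that stores each program's fitness (sum of squared errors), sorting a generation in ascending order takes one call on a current JDK (Java 8 or later, not the Java 1.4 the team used):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the project's GP program class.
class RankedProgram {
    final String expression;
    final double fitness;   // sum of squared errors; lower is better

    RankedProgram(String expression, double fitness) {
        this.expression = expression;
        this.fitness = fitness;
    }
}

class GenerationSorter {

    // Ascending order of fitness: the best program ends up at index 0
    static void sortByFitness(List<RankedProgram> generation) {
        generation.sort(Comparator.comparingDouble(p -> p.fitness));
    }

    public static void main(String[] args) {
        List<RankedProgram> generation = new ArrayList<>();
        // Illustrative fitness values only
        generation.add(new RankedProgram("x * x", 25.0));
        generation.add(new RankedProgram("(x * x + 1) / 2", 0.0));
        generation.add(new RankedProgram("x + 9", 40.0));
        sortByFitness(generation);
        System.out.println("Best: " + generation.get(0).expression); // prints "Best: (x * x + 1) / 2"
    }
}
```

For 50 programs per generation this is on the order of a few hundred comparisons, which is small next to evaluating every program on the training data.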
Algorithms – continued. Key Correction to Algorithm: • Issue: When we reviewed the graph of best fit and average fit for each succeeding generation, the values were swinging up and down rather than being non-increasing (that is, never increasing; always decreasing or remaining level). • Resolution: Rather than only cloning randomly selected individuals from the prior generation, we make sure that the best program from the prior generation survives unchanged as the first program added to the new generation. This guarantees that the best fit in the new generation can be no worse than the best fit in its previous (parent) generation.
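The correction described above is a form of elitism. A minimal sketch, assuming a hypothetical `GpProgram` class and assuming DOA programs have already been removed from the parent list; `breedOne` stands in for whichever of crossover, mutation, cloning or new entrant produces each remaining child. Note that only the best fit is protected this way; the average fit of a generation can still rise. Requires Java 8 or later for `Supplier`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the project's GP program class.
class GpProgram {
    final String expression;
    final double fitness;   // sum of squared errors; lower is better

    GpProgram(String expression, double fitness) {
        this.expression = expression;
        this.fitness = fitness;
    }
}

class Elitism {

    // Linear scan for the lowest error (no sorting needed)
    static GpProgram best(List<GpProgram> generation) {
        GpProgram best = generation.get(0);
        for (GpProgram p : generation) {
            if (p.fitness < best.fitness) best = p;
        }
        return best;
    }

    // Slot 0 of the new generation is the parent generation's best program,
    // copied unchanged; the remaining slots are filled by breedOne.
    static List<GpProgram> nextGeneration(List<GpProgram> parents, int size,
                                          Supplier<GpProgram> breedOne) {
        List<GpProgram> children = new ArrayList<>(size);
        children.add(best(parents));
        while (children.size() < size) {
            children.add(breedOne.get());
        }
        return children;
    }
}
```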
Best Fit of GP Program by Generation – Before Fix:
Best Fit of GP Program by Generation – After Fix:
Lessons Learned and Future Enhancements by Geoffrey A. Reglos
Lessons Learned • I got good practice reading and working with other people’s code, and writing code that conformed to project specifications. • I learned an essential step in developing a computer program: start with a simple solution, as John and others did, and then seek to understand that solution’s performance characteristics. This helped me see how to develop the computational procedure for solving a problem.
Lessons Learned – continued • I underestimated the work involved in documentation, and I learned that the documenter needs to work closely with the developers to understand the details of the program(s). • I learned to work with a group of people on a short-term project. We were able to work with each individual’s strengths and weaknesses to accomplish our goal of completing the project successfully and on time. The most important characteristics of working with this group were communication and a degree of trust.
Lessons Learned – continued • I learned more about how the probability of survival, so common in actuarial work, can be applied to the creation of new software by software.
Future Enhancements • Sort the functions in each generation in ascending order of fitness value, so that the function with the best fitness value is at the top. • Allow more flexible input of training data. Currently, the training data is hard coded; we would like a GUI that offers the user several ways to supply training data in different formats. This would also require additional logic to parse and format the data into a form the GP program can use. • Use Apache Ant to simplify managing the project build.
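The parsing logic mentioned in the second enhancement could start as small as this sketch. The accepted format is an assumption, since the slides do not specify one: one x and y pair per line, separated by commas, semicolons or whitespace, with blank lines and lines starting with '#' ignored. The class name is hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for user-supplied training data (ASSUMED "x,y" format).
class TrainingDataParser {

    // Returns {x, y} pairs; throws IllegalArgumentException on a malformed line
    static List<double[]> parse(BufferedReader in) throws IOException {
        List<double[]> rows = new ArrayList<>();
        String line;
        int lineNo = 0;
        while ((line = in.readLine()) != null) {
            lineNo++;
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            String[] parts = line.split("[,;\\s]+");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                        "line " + lineNo + ": expected x and y, got \"" + line + "\"");
            }
            // Double.parseDouble throws NumberFormatException (an IllegalArgumentException)
            rows.add(new double[] {Double.parseDouble(parts[0]), Double.parseDouble(parts[1])});
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String sample = "# x, y\n1, 1\n2; 2.5\n3 5\n";
        List<double[]> rows = parse(new BufferedReader(new StringReader(sample)));
        System.out.println(rows.size() + " rows read"); // prints "3 rows read"
    }
}
```

Reading from a `BufferedReader` keeps the parser independent of the data source, so a GUI could feed it a file, pasted text, or another input without changing the parsing code.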