
Applying a Genetic Algorithm to Reconfigurable Hardware – a Case Study

B. Earl Wells*, Clint Patrick, Luis Trevino, John Weir and Jim Steincamp. NASA Marshall Space Flight Center, Huntsville, Alabama. *University of Alabama in Huntsville, Huntsville, Alabama.


Presentation Transcript


  1. Applying a Genetic Algorithm to Reconfigurable Hardware – a Case Study
     B. Earl Wells*, Clint Patrick, Luis Trevino, John Weir and Jim Steincamp
     NASA Marshall Space Flight Center, Huntsville, Alabama
     *University of Alabama in Huntsville, Huntsville, Alabama

  2. Project Motivation & Objectives
     • To evaluate the technology of reconfigurable computing: determine its level of maturity and suitability for use in future NASA applications
     • To implement a nontrivial test-bed application on a Star Bridge Hypercomputer Model 36
     • Chosen application: a simple Genetic Algorithm

  3. Targeted Hardware Platform
     • Starbridge HC-36 Hypercomputer System
     • Employs Xilinx Virtex II 6000 Series FPGAs

  4. Development Environment
     • Development environment: VIVA™
     • Graphical user interface
     • Structural design philosophy with behavioral attributes: polymorphism, object overloading, recursion
     • Data-flow, data-driven synchronization between objects (Go, Done, Busy, Wait protocol)
     • Large library of high-end objects
     • Environment falls somewhere between hardware description languages and schematic capture packages

  5. Polymorphism, Overloading, Recursion, and Synchronization
     • Example: object to determine the number of 1's in a binary number
     • Figure panels: Terminal Case, Recursive Case
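The two diagrams for this example (terminal case and recursive case) are not reproduced in the transcript. As a rough software analogue, assuming the recursion splits the input word in half until a single bit remains, the counting object might be sketched as follows. The name `count_ones` and the list-of-bits input are illustrative choices, not VIVA constructs.

```python
def count_ones(bits):
    """Count the 1 bits in a sequence of 0/1 values by recursive halving."""
    if len(bits) == 0:
        return 0
    if len(bits) == 1:
        # Terminal case: a single bit contributes its own value.
        return bits[0]
    # Recursive case: count each half separately and add the results.
    mid = len(bits) // 2
    return count_ones(bits[:mid]) + count_ones(bits[mid:])


# Bits of the binary number 101101.
print(count_ones([1, 0, 1, 1, 0, 1]))
```

In a hardware setting the two recursive calls would presumably map to sub-objects that can operate side by side; the Python version simply evaluates them one after the other.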

  6. Genetic Algorithms
     • Biologically inspired search techniques
     • Employ selection, replication (crossover), mutation, and replacement
     • Iterative method: very time intensive
     • Regularly structured
     • Large amounts of concurrency present that can be exploited

  7. Genetic Algorithm Implementation: Top Level View, Run Time Environment

  8. GA Characteristics
     • 2-way tournament selection
     • No elitism
     • Single-point crossover with bit-wise mutation
     • Weight-encoded chromosome (weights translated into a rank ordering of cities)
     • Adjustable parameters: population size (2 to 512, powers of 2), number of generations, probability of mutation, probability of crossover
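A minimal sketch of 2-way tournament selection, assuming that a lower objective value (tour cost) wins and that contestants are drawn uniformly with replacement. The slides do not say how the hardware draws contestants or breaks ties, so those details, along with the names `tournament_select` and `select_parents`, are assumptions.

```python
import random


def tournament_select(population, cost, rng=random):
    """2-way tournament: draw two chromosomes at random, keep the cheaper one."""
    first = rng.choice(population)
    second = rng.choice(population)
    # Assumption: ties go to the first contestant.
    return first if cost(first) <= cost(second) else second


def select_parents(population, cost, rng=random):
    """Run two independent tournaments to obtain a pair of parents."""
    return (tournament_select(population, cost, rng),
            tournament_select(population, cost, rng))
```

With no elitism, the best chromosome of a generation survives only if it happens to be selected and then passes through crossover and mutation unchanged.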

  9. Block Diagram Level View of Genetic Algorithm Implementation

  10. Replacement & Chromosome Storage

  11. Selection

  12. Standard Single Point Crossover Operation (Weighted Chromosomes)
      • Crossover Point = 4
      • Chromosome 1: {25,17,10,20,33,14,7,29}
      • Chromosome 2: {44,12,17,38,20,5,70,13}
      • Offspring Chromosome: {25,17,10,20,20,5,70,13}
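The crossover example above can be reproduced with a short sketch. It assumes the crossover point counts how many leading weights are taken from the first parent; `single_point_crossover` is an illustrative name.

```python
def single_point_crossover(parent1, parent2, point):
    """Take the first `point` weights from parent1 and the rest from parent2."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    return list(parent1[:point]) + list(parent2[point:])


chromosome1 = [25, 17, 10, 20, 33, 14, 7, 29]
chromosome2 = [44, 12, 17, 38, 20, 5, 70, 13]
offspring = single_point_crossover(chromosome1, chromosome2, 4)
print(offspring)  # [25, 17, 10, 20, 20, 5, 70, 13], the offspring shown on the slide
```

Because each position holds an independent weight rather than a city number, any crossover point yields a valid chromosome.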

  13. Standard Single Point Crossover Operation (Weighted Chromosomes)

  14. Single Point Mutation (Weighted Chromosomes)
      • Original Chromosome: {25,17,10,20,20,5,70,13}
      • Mutated Element = 5
      • Mutated Chromosome: {25,17,10,20,55,5,70,13}
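A sketch of the mutation example, assuming "Mutated Element = 5" is a 1-based position and that the new weight is simply drawn at random. The slides do not state how the hardware derives the new value from its bit-wise mutation, so `max_weight` and both function names are hypothetical.

```python
import random


def mutate_element(chromosome, position, new_weight):
    """Return a copy with the weight at a 1-based position replaced."""
    child = list(chromosome)
    child[position - 1] = new_weight
    return child


def random_mutation(chromosome, p_mutation, max_weight=255, rng=random):
    """With probability p_mutation, replace one randomly chosen weight.

    max_weight is an assumed upper bound; the slides do not give the weight width.
    """
    if rng.random() >= p_mutation:
        return list(chromosome)
    position = rng.randrange(1, len(chromosome) + 1)
    return mutate_element(chromosome, position, rng.randint(0, max_weight))


original = [25, 17, 10, 20, 20, 5, 70, 13]
print(mutate_element(original, 5, 55))  # [25, 17, 10, 20, 55, 5, 70, 13]
```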

  15. Traveling Salesman Problem (TSP)
      • Given a specified number of “cities” along with the cost of travel between each pair of them, find the cheapest way of visiting all the cities and returning to the first city visited
      • Asymmetric case: the direction traveled between any two cities matters (i.e., the cost is different)
      • Possible solutions: (n-1)!, where n is the number of cities
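The (n-1)! count can be checked by brute force for small n: fixing the starting city leaves every ordering of the remaining n-1 cities as a distinct tour when costs are asymmetric. The sketch below is only an illustration of that counting argument; `count_tours` is a made-up helper name.

```python
import math
from itertools import permutations


def count_tours(n):
    """Count distinct directed tours over n cities by brute force.

    City 0 is fixed as the starting point, so each ordering of the other
    n - 1 cities is a different tour when travel costs are asymmetric.
    """
    return sum(1 for _ in permutations(range(1, n)))


for n in range(2, 8):
    assert count_tours(n) == math.factorial(n - 1)

# For a 65-city problem, (n - 1)! = 64!.
print(f"{math.factorial(64):.3e}")
```

For 65 cities the count is on the order of 10^89, which is why exhaustive search is not an option and heuristic methods such as a genetic algorithm are used instead.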

  16. Traveling Salesman Problem (TSP)
      • Well-understood NP-complete optimization problem
      • Academic literature contains many test problems
      • Chose for test purposes an asymmetric TSP with 65 cities (TSP 65)*
      • Used a modified weight-encoded chromosome representation
      *University of Heidelberg, http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95

  17. Equivalent TSP Chromosome Representations
      Weighted Chromosome
        City No.:        0   1   2   3   4   5   6   7
        Weights:       {25, 17, 10, 20, 55,  5, 70, 13}
        Rank Ordering: [ 5,  3,  1,  4,  6,  0,  7,  2]   (visit order)
      Permutation Chromosome
        City Visit Order: 1st 2nd 3rd 4th 5th 6th 7th 8th
        City numbers:    { 5,  2,  7,  1,  3,  0,  4,  6}
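The relationship between the two representations can be sketched in a few lines: sorting city numbers by weight gives the permutation chromosome, and each city's position in that sorted list is its rank. Breaking ties by city number is an assumption (the slides do not cover equal weights), and the function names are illustrative.

```python
def visit_order(weights):
    """City numbers sorted by ascending weight (the permutation chromosome)."""
    # sorted() is stable, so equal weights fall back to city-number order.
    return sorted(range(len(weights)), key=lambda city: weights[city])


def rank_ordering(weights):
    """For each city, its 0-based position in the visit order."""
    ranks = [0] * len(weights)
    for position, city in enumerate(visit_order(weights)):
        ranks[city] = position
    return ranks


weights = [25, 17, 10, 20, 55, 5, 70, 13]
print(visit_order(weights))    # [5, 2, 7, 1, 3, 0, 4, 6]
print(rank_ordering(weights))  # [5, 3, 1, 4, 6, 0, 7, 2]
```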

  18. TSP Objective Function
      • Systolic sort of chromosome weights
      • Summation of segments
      • Replacement of weights with rank orderings
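A software sketch of the objective function, under two stated assumptions: an ordinary sort stands in for the systolic sorter, and travel costs are held in a hypothetical matrix `cost[from_city][to_city]`. The small 3-city matrix is invented purely to show the asymmetric case.

```python
def tour_cost(weights, cost):
    """Cost of the closed tour encoded by a weight chromosome."""
    # Step 1: sort city numbers by weight to obtain the visit order.
    order = sorted(range(len(weights)), key=lambda city: weights[city])
    # Step 2: sum the cost of each segment, including the return leg.
    total = 0
    for i, city in enumerate(order):
        next_city = order[(i + 1) % len(order)]
        total += cost[city][next_city]
    return total


# Hypothetical asymmetric costs: cost[a][b] != cost[b][a].
cost = [
    [0, 1, 2],
    [3, 0, 4],
    [5, 6, 0],
]
print(tour_cost([2, 0, 1], cost))  # visits 1 -> 2 -> 0 -> 1: 4 + 5 + 1 = 10
print(tour_cost([0, 2, 1], cost))  # visits 0 -> 2 -> 1 -> 0: 2 + 6 + 3 = 11
```

The two calls traverse the same cycle in opposite directions and give different totals, which is the asymmetric property the benchmark relies on.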

  19. Single Point Permutation Preserving Crossover Operation
      • Crossover Point = 4
      • Chromosome 1: {1,7,3,2,5,6,0,4}
      • Chromosome 2: {0,2,4,1,6,5,7,3}
      • Offspring Chromosome: {1,7,3,2,0,4,6,5}
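The offspring in this example is consistent with the following rule: copy the first parent up to the crossover point, then append the cities not yet used in the order they appear in the second parent. The sketch below implements that reading; the slides give only the example, so the rule itself is an inference and `permutation_crossover` is an illustrative name.

```python
def permutation_crossover(parent1, parent2, point):
    """Single-point crossover that keeps the child a valid permutation."""
    head = list(parent1[:point])
    used = set(head)
    # Fill the remainder with unused cities, in parent2's order.
    tail = [city for city in parent2 if city not in used]
    return head + tail


chromosome1 = [1, 7, 3, 2, 5, 6, 0, 4]
chromosome2 = [0, 2, 4, 1, 6, 5, 7, 3]
child = permutation_crossover(chromosome1, chromosome2, 4)
print(child)  # [1, 7, 3, 2, 0, 4, 6, 5], the offspring shown on the slide
```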

  20. Modified Crossover Operator

  21. Permutation Altering Mutation
      • Original Chromosome: {1,7,3,2,0,4,6,5}
      • Mutation Removal Point = 6
      • Insertion Point = 3
      • Mutated Chromosome: {1,7,4,3,2,0,6,5}
      • Note: No change in Mutation Operator Needed
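The example is consistent with a remove-and-reinsert move using 1-based positions: the city at the removal point is taken out and placed at the insertion point. A minimal sketch under that reading, with `remove_and_insert` as an illustrative name:

```python
def remove_and_insert(chromosome, removal_point, insertion_point):
    """Move one city from a 1-based removal point to a 1-based insertion point."""
    child = list(chromosome)
    city = child.pop(removal_point - 1)
    child.insert(insertion_point - 1, city)
    return child


original = [1, 7, 3, 2, 0, 4, 6, 5]
print(remove_and_insert(original, 6, 3))  # [1, 7, 4, 3, 2, 0, 6, 5]
```

The result is always a permutation of the same cities, so no repair step is needed after mutation.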


  23. Comparison with Instruction Set Processor (ISP) Implementations
      • Implemented TSP using a high-end 3.2 GHz Intel Xeon processor with 3-level cache
      • Encoded the problem in C using pointers for maximum efficiency
      • OS: Red Hat Enterprise Linux v3 (kernel 2.4.21 SMP), single user
      • Basic methodology required ~1.6 ms per generation (population size 512)
      • Optimized version required ~0.8 ms per generation (population size 512)

  24. Parallelization Strategies
      • Initial basic reconfigurable implementation on the Starbridge system required ~1.1 ms per generation, slower than the optimized ISP implementation (population size = 512, clock speed 66 MHz)
      • More parallelization was needed!
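To put the quoted timings in perspective, the short calculation below converts the per-generation time into clock cycles and compares it with the two ISP figures from slide 23. All inputs are the approximate values from the slides; the derived numbers are back-of-the-envelope estimates, not measurements.

```python
# Approximate figures quoted on slides 23 and 24.
FPGA_CLOCK_HZ = 66e6       # clock speed of the reconfigurable design
FPGA_BASIC_S = 1.1e-3      # initial reconfigurable implementation, per generation
ISP_BASIC_S = 1.6e-3       # basic C implementation, per generation
ISP_OPTIMIZED_S = 0.8e-3   # optimized C implementation, per generation
POPULATION_SIZE = 512

cycles_per_generation = FPGA_BASIC_S * FPGA_CLOCK_HZ
cycles_per_chromosome = cycles_per_generation / POPULATION_SIZE

# Ratios above 1 mean the reconfigurable version is faster.
speedup_vs_basic = ISP_BASIC_S / FPGA_BASIC_S
speedup_vs_optimized = ISP_OPTIMIZED_S / FPGA_BASIC_S

print(f"cycles per generation: {cycles_per_generation:.0f}")
print(f"cycles per chromosome: {cycles_per_chromosome:.0f}")
print(f"speedup vs basic ISP: {speedup_vs_basic:.2f}")
print(f"speedup vs optimized ISP: {speedup_vs_optimized:.2f}")
```

At 66 MHz, 1.1 ms is about 72,600 clock cycles per generation, or roughly 142 cycles per chromosome, while the processor runs at a clock rate nearly fifty times higher. That gap is why overlapping and replicating work matters more on this platform than raw clock speed.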

  25. Parallelization Strategies
      • Exploiting concurrency in a common population
          • Temporal parallelism via pipelining
          • Spatial parallelism via replicating functional units
      • Processing isolated subpopulations
          • With chromosome migration (very promising for the Starbridge system but not yet completed)
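The isolated-subpopulation approach was not completed in this work, so the following is only a generic illustration of one common migration scheme, not the authors' design. It assumes a ring topology in which each subpopulation sends a copy of its best chromosome to the next one, where it replaces the worst; `migrate_ring` and the `cost` callback are hypothetical names.

```python
def migrate_ring(islands, cost):
    """Copy each island's best chromosome over the worst one of the next island."""
    # Pick the migrants first so every island sends its pre-migration best.
    best = [min(island, key=cost) for island in islands]
    migrated = []
    for i, island in enumerate(islands):
        incoming = best[(i - 1) % len(islands)]
        worst_index = max(range(len(island)), key=lambda j: cost(island[j]))
        updated = list(island)
        updated[worst_index] = incoming
        migrated.append(updated)
    return migrated


# Toy example: each "chromosome" is just its own cost.
islands = [[5, 9], [1, 7], [3, 8]]
print(migrate_ring(islands, cost=lambda c: c))  # [[5, 3], [1, 5], [3, 1]]
```

Between migrations each subpopulation would evolve independently, which is what makes the scheme attractive when several FPGAs or replicated units are available.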

  26. Applying Temporal Parallelism

  27. Applying Spatial Parallelism


  29. Resource Requirements
      Implementation         SLICES (of 33792)   Block RAMs (of 144)   Total equivalent gate count
      Non-pipelined 1 TSP    10910 (32%)         40 (27%)              2,767,231
      Pipelined 1 TSP        10957 (32%)         40 (27%)              2,770,741

  30. Resource Requirements
      Implementation         SLICES (of 33792)   Block RAMs (of 144)   Total equivalent gate count
      Pipelined 2 TSP        13738 (40%)         45 (31%)              3,149,966
      Pipelined 4 TSP        19685 (58%)         55 (38%)              3,908,362
      Pipelined 6 TSP        25728 (76%)         65 (45%)              4,664,262

  31. Problems Encountered
      • Synthesis time issues (within Viva and within Xilinx)
      • Maturity/robustness of CAD tools
      • Learning curve
      • Timing issues
      • I/O pin limitations

  32. Summary & Conclusion
      • A simple genetic algorithm was implemented on reconfigurable hardware using the Viva paradigm
      • Significant but not spectacular speedups have been obtained for the TSP using a combination of temporal and spatial parallel processing methods
      • Many other opportunities exist to improve processing throughput
      • The concept of isolated subpopulations is a very promising method to further improve performance
