Using a Genetic Algorithm for Approximate String Matching on Genetic Code

Carrie Mantsch, December 5, 2003


  1. Using a Genetic Algorithm for Approximate String Matching on Genetic Code
     Carrie Mantsch, December 5, 2003

  2. Outline
     • Problem Statement
     • Current Techniques
     • GA Motivation
     • My Algorithm
     • Results
     • Extension Possibilities

  3. Problem Statement
     The problem is to search and align strands of DNA using a genetic algorithm.

  4. Current Techniques
     • Approximate string matching
       • Usually meant for smaller strings
       • Many are set up for k mismatches
     • 2 DNA strands of size 90 and 85
       • Allowing for 5 gaps in the second strand gives almost 44 million possible alignments
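The "almost 44 million" figure can be checked with a short calculation. This is a sketch under one assumption not stated on the slide: the 85-letter strand is stretched with 5 gaps to exactly 90 positions, so an alignment is a choice of which 5 of the 90 positions are gaps. The function name `count_alignments` is made up for illustration.

```python
from math import comb

LONG_LEN = 90   # size of the longer strand (from the slide)
SHORT_LEN = 85  # size of the shorter strand (from the slide)


def count_alignments(long_len: int, short_len: int) -> int:
    """Count ways to place short_len letters, in order, into long_len slots.

    Assumption: the gapped short strand is exactly long_len positions long,
    so each alignment is a choice of gap positions.
    """
    n_gaps = long_len - short_len
    return comb(long_len, n_gaps)


print(count_alignments(LONG_LEN, SHORT_LEN))  # 43949268
```

Under that assumption the count is C(90, 5) = 43,949,268, which agrees with the figure quoted on the slide.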

  5. Current Techniques (cont.)
     • Needleman-Wunsch
       • Gap penalty -1
       • Match bonus +1
       • Mismatch 0
     • Not practical if the sequence starts in the middle
       • Counts the gaps at the beginning and end as penalties.
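The end-gap problem can be seen in a minimal sketch of Needleman-Wunsch global alignment scoring with the slide's values (gap -1, match +1, mismatch 0). The function name and the example strings are illustrative and do not come from the presentation.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = 0, gap: int = -1) -> int:
    """Return the best global alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps are charged in full: this is the behavior the slide
    # criticizes.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "ACGT"))      # 4
print(needleman_wunsch_score("TTACGTTT", "ACGT"))  # 0
```

In the second call the short string matches perfectly in the middle of the long one, yet the four unavoidable end gaps cancel the four matches and the score drops to 0.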

  6. Current Techniques (cont.)
     • BLAST (Basic Local Alignment Search Tool) and FASTA
       • Use domain-specific knowledge
       • http://www.ncbi.nlm.nih.gov/BLAST
       • http://fasta.bioch.virginia.edu

  7. GA Motivation
     • Alien DNA
     • Junk DNA
     • Extendable to similar text searches without domain-specific knowledge

  8. My Algorithm
     • The population
       • Bit strings of 0’s and 1’s
       • 0’s are spaces, 1’s mean a letter is placed there
       • The number of 1’s stays constant, equal to the number of letters in the smaller search string
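The representation above can be sketched as follows. The slides do not say how individuals are created or decoded, so `random_individual`, `decode`, and the `-` gap character are assumptions made only for illustration.

```python
import random


def random_individual(length: int, n_letters: int,
                      rng: random.Random) -> list[int]:
    """Build a bit string with exactly n_letters ones (hypothetical helper)."""
    ones = set(rng.sample(range(length), n_letters))
    return [1 if i in ones else 0 for i in range(length)]


def decode(bits: list[int], short: str, gap_char: str = "-") -> str:
    """Place the letters of short, in order, wherever bits holds a 1."""
    if sum(bits) != len(short):
        raise ValueError("number of 1's must equal the short string length")
    letters = iter(short)
    return "".join(next(letters) if b else gap_char for b in bits)


print(decode([0, 1, 1, 0, 1], "ACG"))  # -AC-G

rng = random.Random(0)
individual = random_individual(10, 4, rng)
print(decode(individual, "ACGT"))  # 4 letters and 6 gaps; layout depends on the seed
```

The decoded string can then be compared position by position against the longer strand to score the alignment.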

  9. My Algorithm (cont.)
     • Breeding
       • Rank-based selection
       • Crossover
         • The common place markers are kept the same
         • The rest of the place markers are split evenly between the two children
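A possible reading of the breeding step is sketched below. Several details are assumptions, since the slides do not give them: linear rank weights for selection, parents of equal length, and a random split of the non-shared markers. The names `rank_select` and `crossover` are hypothetical.

```python
import random


def rank_select(population, fitness, rng: random.Random):
    """Pick one individual with probability proportional to its rank.

    Assumption: linear rank weights (worst gets 1, best gets N).
    """
    ranked = sorted(population, key=fitness)  # worst first
    weights = list(range(1, len(ranked) + 1))
    return rng.choices(ranked, weights=weights, k=1)[0]


def crossover(p1: list[int], p2: list[int], rng: random.Random):
    """Keep shared 1-positions; split the remaining ones evenly."""
    if len(p1) != len(p2) or sum(p1) != sum(p2):
        raise ValueError("parents must have equal length and equal 1-count")
    common = [i for i, (a, b) in enumerate(zip(p1, p2)) if a and b]
    rest = [i for i, (a, b) in enumerate(zip(p1, p2)) if a != b]
    rng.shuffle(rest)  # assumption: the split is random
    half = len(rest) // 2  # len(rest) is always even here
    children = []
    for picks in (rest[:half], rest[half:]):
        child = [0] * len(p1)
        for i in common + picks:
            child[i] = 1
        children.append(child)
    return children[0], children[1]


rng = random.Random(0)
c1, c2 = crossover([1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 1], rng)
print(sum(c1), sum(c2))  # 3 3
```

Because each parent contributes the same number of non-shared markers, both children keep exactly as many 1's as their parents, which preserves the constant-letter-count property of the representation.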

  10. My Algorithm (cont.)
     • Mutation
       • If the number of gaps is less than one tenth of the small string size, add a gap
       • Otherwise, delete a gap
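The mutation rule can be sketched as below, under a loudly stated assumption: "adding a gap" is read as inserting a 0 at a random position and "deleting a gap" as removing one randomly chosen 0, so the bit string's length changes while the number of 1's stays fixed. The slides do not confirm this reading, nor do they say how individuals of different lengths are then crossed. The name `mutate` is hypothetical.

```python
import random


def mutate(bits: list[int], rng: random.Random) -> list[int]:
    """Add a gap when gaps are scarce, otherwise delete one.

    Assumption: gaps are the 0's, and the string length may change.
    """
    n_letters = sum(bits)  # size of the small string
    gap_positions = [i for i, b in enumerate(bits) if b == 0]
    child = list(bits)
    if len(gap_positions) < n_letters / 10:
        # Fewer gaps than one tenth of the small string size: add a gap.
        child.insert(rng.randrange(len(child) + 1), 0)
    elif gap_positions:
        # Otherwise delete one existing gap.
        del child[rng.choice(gap_positions)]
    return child


rng = random.Random(0)
print(len(mutate([1] * 20, rng)))            # 21 (no gaps yet, so one is added)
print(len(mutate([1] * 20 + [0, 0], rng)))   # 21 (2 gaps is not < 2, so one is deleted)
```

With an 85-letter search string the threshold is 8.5, so under this reading the number of gaps would hover around 8 or 9.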

  11. Results
     • The target match

  12. Results (cont.)
     • Ran for 50 generations
     • Runs with different random numbers, for the same number of generations, give best fitness values between about 32 and 67 (optimal fitness: 90)

  13. Extension Possibilities
     • Better representation of population
     • Be able to alter fitness evaluation to be more specific to different problems
     • Ability to add domain-specific knowledge
     • Parallel searching

  14. Questions?
