
Sequence Optimization For Synthetic Genes Using Genetic Algorithms

David Sigfredo Angulo (1), Rob Vogelbacher (1), Benjamin R. Capraro (2), Tobin Sosnick (2), Shohei Koide (2). (1) School of Computer Science, Telecommunications and Information Systems, DePaul University


Presentation Transcript


  1. Sequence Optimization For Synthetic Genes Using Genetic Algorithms • David Sigfredo Angulo (1), Rob Vogelbacher (1), Benjamin R. Capraro (2), Tobin Sosnick (2), Shohei Koide (2) • (1) School of Computer Science, Telecommunications and Information Systems, DePaul University • (2) Department of Biochemistry and Molecular Biology, The University of Chicago

  2. Introduction • Genetic Algorithms: • Use ideas based on the biology of genes • Software that uses these stochastic methods to search through large search spaces • The resulting algorithm itself has nothing to do with genes • Designing Genes • This search space is huge • REALLY NOVEL IDEA: • Use Genetic Algorithms based on genes to design genes!!

  3. Outline • Short biology Tutorial • DNA Sequence Generation • Why is the problem difficult? • IBG Gene Designer • Genetic Algorithm (GA) solution • Heuristics and Fitness Evaluation

  4. First • Before the problem can be described • Must give some background biochemistry principles • Tutorial outline • DNA • Codons • Protein • Synthetic genes • What are they and what are they used for? • Restriction Enzymes • Expressing Proteins using Vectors

  5. Transcription/Translation: Central Dogma of Molecular Biology [Figure: DNA → RNA (transcription, by RNA polymerase) → Protein (translation, by ribosomes)]

  6. DNA • Deoxyribonucleic acid • Strand backbone is made of sugar & phosphate molecules • Strands connected by nitrogen-containing nucleotide bases • Two strands join making a double helix • Each strand is made of nucleotides joined together

  7. [Figure: levels of DNA packing in a chromosome: short region of DNA double helix (2 nm); "beads on a string" form of chromatin (11 nm); 30 nm chromatin fiber of packed nucleosomes (30 nm); section of chromosome in an extended form (300 nm); condensed section of chromosome (700 nm); entire mitotic chromosome (1100 nm)]

  8. DNA: Four Nucleotides (A, G, T, C)

  9. DNA: Base Pairing

  10. Short Biology Tutorial • Tutorial outline • DNA • Codons • Protein • Restriction Enzymes • Expressing Proteins using Vectors

  11. DNA Sequence Generation: Codon to Amino Acid Translation http://campus.queens.edu/faculty/jannr/Genetics/images/codon.jpg
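The codon table on this slide can be written as a 64-entry lookup. The sketch below assumes the standard genetic code, which E. coli uses. Codons are enumerated in T, C, A, G order at each position, and "*" marks a stop codon. The names `CODON_TABLE` and `translate` are illustrative and do not come from IBG GeneDesigner.

```python
from itertools import product

# Standard genetic code: codons enumerated in T, C, A, G order for each of
# the three positions (third position varies fastest); "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon in reading frame 1."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

print(translate("ATGGCTAAATGA"))  # ATG GCT AAA TGA -> MAK*
```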

  12. Short Biology Tutorial • Tutorial outline • DNA • Codons • Protein • Restriction Enzymes • Expressing Proteins using Vectors

  13. Proteins: AA Chains

  14. Proteins • Amino Acid Chains Fold Into complex 3D Structures • Functional properties depend on 3D structure • Usefulness depends on functional properties • E.g. designing drugs

  15. Designed/Expressed Proteins Are Extremely Useful • Designed Proteins • Can be used to study protein structure • Can be used to study effects of other proteins • Can be designed to “knock out” other proteins • Can be designed to “block” the action of other proteins • Expressed proteins • Expressed in cow’s milk or chicken eggs • Can manufacture drugs on large scales in this way • E.g. insulin

  16. Synthetic Genes • DNA sequences • “Backtranslated” from a novel protein (amino acid) sequence [Figure: DNA → RNA (transcription, by RNA polymerase) → Protein (translation, by ribosomes)] • We’ll put the DNA for our designed protein into an organism (a vector) • Then that vector will make (express) our protein • But, how do we get the DNA into an organism???
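Back-translation inverts the codon table. Most amino acids map to several synonymous codons, so many DNA sequences encode the same protein. The minimal sketch below assumes the standard genetic code and picks a synonymous codon uniformly at random, with no codon preference applied. `SYNONYMS` and `back_translate` are hypothetical names used only for illustration.

```python
import random
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Invert the standard genetic code: amino acid -> list of synonymous codons.
SYNONYMS = defaultdict(list)
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    SYNONYMS[aa].append("".join(codon))

def back_translate(protein: str, rng: random.Random) -> str:
    """Pick one synonymous codon at random for each residue ('*' = stop)."""
    return "".join(rng.choice(SYNONYMS[aa]) for aa in protein)

print(back_translate("MKW*", random.Random(0)))
```

Running it with different seeds gives different DNA strings for the same protein. The design problem on the following slides is choosing among exactly these alternatives.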

  17. Short Biology Tutorial • Tutorial outline • DNA • Codons • Protein • Restriction Enzymes • Expressing Proteins using Vectors

  18. Restriction Enzyme Digests • Watson – Crick 1953 • Took 20 years to be able to do anything with DNA • H. Smith (and others) made a discovery that allowed manipulation and deciphering of DNA • Discovery was that bacteria produce enzymes that introduce breaks in double-stranded DNA molecules whenever they encounter a specific string of nucleotides • These enzymes are called Restriction Enzymes • Restriction Enzymes can be used as precise scissors • They let biologists cut (and paste) portions of DNA

  19. EcoRI [Figure: the double-stranded site 5'-GAATTC-3' / 3'-CTTAAG-5' is cut by EcoRI into 5'-G AATTC-3' / 3'-CTTAA G-5'] • EcoRI was the very first Restriction Enzyme discovered • "Eco" because it was isolated from E. coli (Escherichia coli) • "R" because it is a Restriction Enzyme • "I" because it was the first Restriction Enzyme from E. coli • Now over 300 Restriction Enzymes known • EcoRI cleaves (restricts, digests) DNA • Between the G and A nucleotides • Only when it encounters them in the string 5'-GAATTC-3' • This string is called the restriction site
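EcoRI's rule ("cut between G and A wherever 5'-GAATTC-3' occurs") can be simulated on one strand. The minimal sketch below models only the top strand; the bottom-strand cut is what produces the overhang. `find_sites` and `digest` are hypothetical helpers, and `cut_offset=1` encodes EcoRI's G^AATTC cut position.

```python
def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match."""
    dna, site = dna.upper(), site.upper()
    return [i for i in range(len(dna) - len(site) + 1)
            if dna[i:i + len(site)] == site]

def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut the top strand at every site; EcoRI cuts between G and A (offset 1)."""
    cuts = [pos + cut_offset for pos in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]

print(digest("TTGAATTCAAGGGAATTCCC"))
```

Here the two sites yield the fragments `TTG`, `AATTCAAGGG` and `AATTCCC`.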

  20. Sticky Ends [Figure: 5'-GAATTC-3' / 3'-CTTAAG-5' cut by EcoRI into 5'-G AATTC-3' / 3'-CTTAA G-5'] • Many restriction enzymes cut in such a way that some single-stranded DNA is left at both ends • These nucleotide sequences • Are complementary to each other • Are 5'-AATT-3' in the case of EcoRI • Can base-pair with complementary nucleotides in another sequence • Thus, are called "sticky ends" • Can temporarily hold two DNA strands together • The enzyme ligase will permanently join those strands • This is called ligation
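The sticky ends are complementary because the EcoRI site is palindromic: its reverse complement is itself. The sketch below computes the single-stranded 5' overhang that remains when an enzyme cuts both strands of a palindromic site at the same offset. `reverse_complement` and `overhang` are illustrative names. The sketch deliberately handles only palindromic sites with 5' overhangs or blunt cuts.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Watson-Crick partner strand, read 5' to 3' (complemented, then reversed)."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def overhang(site: str, top_cut: int) -> str:
    """5' overhang left when both strands of a palindromic site are cut
    top_cut bases from their 5' ends (EcoRI: GAATTC, top_cut=1)."""
    if reverse_complement(site) != site:
        raise ValueError("sketch assumes a palindromic recognition site")
    if not 0 <= top_cut <= len(site) // 2:
        raise ValueError("sketch only handles 5' overhangs and blunt cuts")
    return site[top_cut:len(site) - top_cut]

end = overhang("GAATTC", 1)
print(end, reverse_complement(end) == end)  # AATT pairs with its own complement
```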

  21. Short Biology Tutorial • Tutorial outline • DNA • Codons • Protein • Restriction Enzymes • Expressing Proteins using Vectors

  22. Gene Synthesis: On the Lab Bench • Initial Sequence Construction • Oligonucleotides (short strands of DNA) are defined with complementary overlapping sites • The “sticky ends” • Assembly PCR • Oligonucleotides and polymerase are mixed and placed in a thermocycler • Creates contiguous DNA sequence from component oligos
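Splitting a gene into overlapping oligonucleotides for assembly PCR can be sketched as below. The sketch assumes fixed-length oligos that overlap their neighbours by a fixed number of bases, with every second oligo written as the reverse complement so that neighbours anneal. The function `design_oligos` and its `oligo_len`/`overlap` parameters are hypothetical. The slides do not say how IBG GeneDesigner places the overlaps, and real designs usually tune overlap lengths toward a target melting temperature rather than fixing them.

```python
def design_oligos(gene: str, oligo_len: int, overlap: int) -> list[str]:
    """Split a gene into oligos overlapping their neighbours by `overlap`
    bases; every second oligo is written on the opposite strand."""
    if not 0 < overlap < oligo_len:
        raise ValueError("need 0 < overlap < oligo_len")
    comp = str.maketrans("ACGT", "TGCA")
    step = oligo_len - overlap
    oligos, start = [], 0
    while True:
        fragment = gene[start:start + oligo_len]
        if len(oligos) % 2 == 1:
            fragment = fragment.translate(comp)[::-1]  # reverse complement
        oligos.append(fragment)
        if start + oligo_len >= len(gene):
            break
        start += step
    return oligos

print(design_oligos("ACGTTGCAAGGCTTAC", oligo_len=8, overlap=4))
```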

  23. Gene Synthesis: On the Lab Bench (cont) • After PCR, generated DNA sequence cut with restriction enzymes • Expression host's plasmid cut with restriction enzymes • Synthetic gene inserted into plasmid and plasmid repaired (ligated) • Expression Vectors • Host organisms used to express the synthetic genes (make the protein) • Typically E. coli • Possibly Chickens or Cows • Expression vector can now express protein coded for by synthetic gene • A bit more complicated than described above!!!

  24. DNA Sequence Generation: Gene Insertion

  25. Outline • Short biology Tutorial • DNA Sequence Generation • Why is the problem difficult? • IBG Gene Designer • Genetic Algorithm (GA) solution • Heuristics and Fitness Evaluation

  26. DNA Sequence Generation: The Computational Problem • Why is the problem difficult? • Conflicting goals • Avoid restriction sites • Maximize codon preference • Thus, a simple deterministic algorithm cannot be used • Degeneracy (redundancy) of the genetic code – 64 codons, 20 (21) amino acids (see next slide) • Several synonymous codons are translated into the same amino acid • Synonymous codons per AA vary from one to six (61 sense codons for 20 AAs, so about three codons per AA on average) • Huge number of possible DNA sequences • Grows exponentially with protein length n: the count is the product of the number of synonymous codons for each of the n amino acids • Codon Preference • Organisms carry different levels of the tRNAs that read each codon • Codon usage for a particular AA greatly influences protein expression • (continued)
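The size of the search space follows directly from the degeneracy of the code. The number of distinct DNA sequences encoding a protein is the product, over its residues, of each amino acid's synonymous-codon count. The sketch below assumes the standard genetic code. `count_back_translations` is an illustrative name.

```python
from collections import Counter
from math import prod

# Standard genetic code in T, C, A, G codon order; each character is one codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of synonymous codons for each amino acid ('*' = the 3 stop codons).
SYNONYM_COUNT = Counter(AMINO_ACIDS)

def count_back_translations(protein: str) -> int:
    """Search-space size: product of synonymous-codon counts per residue."""
    return prod(SYNONYM_COUNT[aa] for aa in protein)

sense = {aa: n for aa, n in SYNONYM_COUNT.items() if aa != "*"}
print(sum(sense.values()) / len(sense))   # 61 sense codons / 20 amino acids
print(count_back_translations("MSLRW"))   # 1 * 6 * 6 * 6 * 1
```

Because the count multiplies at every position, even a modest-length protein has far too many encodings to enumerate. This is why the slides turn to a stochastic search.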

  27. DNA Sequence Generation: Codon to Amino Acid Translation http://campus.queens.edu/faculty/jannr/Genetics/images/codon.jpg

  28. DNA Sequence Generation: The Computational Problem (cont) • Why is the problem difficult? • (continued) • Restriction Enzymes • The vector will contain many restriction enzymes • If these cut up our DNA, we won’t express our proteins • We must design the DNA string using synonymous codons so that it contains no sites for these enzymes • It is helpful to include some other restriction sites • We must design the DNA string using synonymous codons so that these are included • (continued)
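A candidate design can be screened against both requirements on this slide: forbidden sites must be absent, and requested sites must be present. The minimal sketch below searches both strands, since a non-palindromic site can also appear as its reverse complement. It also applies the "exactly once" rule for included sites, which the fitness-evaluation slide states. `count_site` and `check_sites` are hypothetical helpers. The example sites (EcoRI `GAATTC`, BamHI `GGATCC`, BsaI `GGTCTC`) are standard recognition sequences chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def count_site(dna: str, site: str) -> int:
    """Occurrences of a recognition site on either strand of the duplex."""
    dna, site = dna.upper(), site.upper()
    targets = {site, site.translate(COMPLEMENT)[::-1]}  # palindromes counted once
    return sum(dna[i:i + len(site)] in targets
               for i in range(len(dna) - len(site) + 1))

def check_sites(dna: str, excluded: list[str], included: list[str]) -> list[str]:
    """List the problems with a candidate design (empty list = acceptable)."""
    problems = []
    for site in excluded:
        if count_site(dna, site):
            problems.append(f"forbidden site {site} present")
    for site in included:
        n = count_site(dna, site)
        if n != 1:
            problems.append(f"required site {site} found {n} times")
    return problems

print(check_sites("GGATCCAAAGAATTCTTT", excluded=["GAATTC"], included=["GGATCC"]))
```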

  29. DNA Sequence Generation: The Computational Problem (cont) • Why is the problem difficult? • (continued) • mRNA Secondary Structure • In prokaryotes, mRNA can fold into complex shapes • This inhibits protein creation (translation) • Oligonucleotide generation • Want a specific melting temperature so that the complex folding doesn’t take place • The overlapping “sticky ends” must have similar melting temperatures so that they anneal together
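Melting temperature (Tm) and G/C content are the quantities being balanced here. The sketch below uses the Wallace rule of thumb, Tm ≈ 2 °C per A or T plus 4 °C per G or C. This is a rough estimate for short oligonucleotides, and nearest-neighbor thermodynamic models are more accurate. The slides do not say which Tm model IBG GeneDesigner uses. The overlap sequences in the example are hypothetical.

```python
def gc_content(dna: str) -> float:
    """Fraction of G and C bases in the sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)

def wallace_tm(oligo: str) -> int:
    """Wallace rule (short oligos only): 2 C per A/T plus 4 C per G/C."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

overlaps = ["ATGCGTACGTTAGC", "GCCATTGACGTAAC"]  # hypothetical overlap regions
tms = [wallace_tm(o) for o in overlaps]
print(tms, "spread:", max(tms) - min(tms))
```

A small spread in Tm across overlap regions means all the overlaps anneal at roughly the same temperature during assembly PCR.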

  30. Outline • Short biology Tutorial • DNA Sequence Generation • Why is the problem difficult? • IBG Gene Designer • Genetic Algorithm (GA) solution • Heuristics and Fitness Evaluation

  31. IBG GeneDesigner: Our Solution • IBG GeneDesigner

  32. IBG GeneDesigner: Genetic Algorithm • Uses a Genetic Algorithm for sequence optimization • Tournament selection model • Uniform and single-point crossover (behind the scenes; not user-selectable at present) • Mutation causes codon “wobbling” • Sequence “fitness” determined by heuristic evaluation
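The ingredients listed on this slide (tournament selection, uniform and single-point crossover, and a mutation that swaps codons) fit together roughly as follows. This is a minimal sketch, not GeneDesigner's implementation. Several parts are assumptions: each individual is represented as a list of codons, the mutation is taken to mean replacing a codon with a random synonymous one, and the crossover operator is chosen by a coin flip. The `fitness` function is a hypothetical stand-in (target 40% G/C, avoid EcoRI) rather than the heuristic described on the slides, and the population size, generation count and mutation rate are arbitrary.

```python
import random
from collections import defaultdict
from itertools import product

# Standard genetic code inverted: amino acid -> synonymous codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS = defaultdict(list)
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    SYNONYMS[aa].append("".join(codon))

def fitness(codons):
    """Hypothetical stand-in fitness: stay near 40% G/C, avoid EcoRI."""
    dna = "".join(codons)
    gc = (dna.count("G") + dna.count("C")) / len(dna)
    return -abs(gc - 0.40) - (1.0 if "GAATTC" in dna else 0.0)

def tournament(pop, k, rng):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)

def single_point(a, b, rng):
    """Single-point crossover; cuts fall on codon boundaries."""
    cut = rng.randint(0, len(a))
    return a[:cut] + b[cut:]

def uniform(a, b, rng):
    """Uniform crossover: each codon position comes from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def wobble(codons, protein, rate, rng):
    """Mutation: replace codons with random synonymous ones (protein unchanged)."""
    return [rng.choice(SYNONYMS[aa]) if rng.random() < rate else c
            for c, aa in zip(codons, protein)]

def optimize(protein, pop_size=30, generations=50, rate=0.05, seed=0):
    rng = random.Random(seed)
    # Every individual is a list of codons, one per residue of the protein.
    pop = [[rng.choice(SYNONYMS[aa]) for aa in protein] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = tournament(pop, 3, rng), tournament(pop, 3, rng)
            crossover = single_point if rng.random() < 0.5 else uniform
            children.append(wobble(crossover(a, b, rng), protein, rate, rng))
        pop = children
    return "".join(max(pop, key=fitness))

print(optimize("MEFKLSRGW"))
```

Because crossover only exchanges whole codons between parents aligned to the same protein, and mutation only substitutes synonyms, every individual always encodes the target protein.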

  33. IBG GeneDesigner: Fitness Evaluation • GeneDesigner heuristics • Manipulation of nucleotide percentages/ratios to reduce mRNA secondary structure formation • Inclusion and exclusion of restriction sites • Restriction sites requested for inclusion should occur only once • Matching of codon preference • Oligonucleotide generation • Fitness determined by melting points and by start and end nucleotides
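One way to combine heuristics like these into a single score is a weighted average of sub-scores in [0, 1]. The sketch below is an assumption-laden illustration, not GeneDesigner's scoring. It covers G/C closeness to a target, absence of excluded sites, and included sites occurring exactly once. For codon preference it takes a geometric mean of per-codon relative adaptiveness, in the spirit of a codon adaptation index. The oligonucleotide terms (melting points, start and end nucleotides) are omitted. The function names, the equal default weights, the top-strand-only site search and the preference values in the example are all hypothetical.

```python
from math import exp, log

def count_occurrences(dna: str, site: str) -> int:
    """Overlapping occurrences of a site on the given (top) strand only."""
    return sum(dna[i:i + len(site)] == site for i in range(len(dna) - len(site) + 1))

def heuristic_fitness(codons, preference, excluded, included,
                      gc_target=0.40, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted average of four heuristic sub-scores, each in [0, 1]."""
    dna = "".join(codons)
    gc = (dna.count("G") + dna.count("C")) / len(dna)
    gc_score = 1.0 - min(1.0, abs(gc - gc_target) / gc_target)
    exclude_score = 0.0 if any(count_occurrences(dna, s) for s in excluded) else 1.0
    include_score = (sum(count_occurrences(dna, s) == 1 for s in included) / len(included)
                     if included else 1.0)
    # Geometric mean of per-codon relative adaptiveness (values in (0, 1]);
    # codons missing from the preference table count as 1.0.
    pref_score = exp(sum(log(preference.get(c, 1.0)) for c in codons) / len(codons))
    scores = (gc_score, exclude_score, include_score, pref_score)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Same protein (M-L-E-F) with two Phe codons; preference values are made up.
pref = {"CTG": 1.0, "CTA": 0.1}
bad = heuristic_fitness(["ATG", "CTG", "GAA", "TTC"], pref, ["GAATTC"], [])
good = heuristic_fitness(["ATG", "CTG", "GAA", "TTT"], pref, ["GAATTC"], [])
print(bad, good)
```

In the example, the choice of Phe codon alone decides whether an EcoRI site forms across the GAA|TTC codon junction. This shows why sites must be checked on the assembled sequence rather than codon by codon.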

  34. IBG GeneDesigner: Future Work • Algorithm parameters • Systematically manipulate GA parameters to identify default values for sequence optimization • Population size • Number of generations • Mutation rate • Convergence criteria • Modify heuristic weighting scheme • Selection models • Experiment with alternative selection models (roulette wheel, elitism, limited population replacement)
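Two of the alternative selection models named here are easy to sketch. Roulette-wheel selection picks parents with probability proportional to fitness. Elitism copies the best individuals unchanged into the next generation. The sketch below assumes non-negative fitness values with a positive total and uses Python's `random.Random.choices` for the weighted draw. `roulette_select`, `next_generation` and the `make_child` callback are hypothetical names.

```python
import random

def roulette_select(pop, fitnesses, rng):
    """Fitness-proportionate (roulette-wheel) selection.
    Assumes non-negative fitnesses with a positive total."""
    return rng.choices(pop, weights=fitnesses, k=1)[0]

def next_generation(pop, fitness, make_child, n_elite, rng):
    """Keep the n_elite fittest individuals unchanged (elitism) and fill the
    rest of the population with children of roulette-selected parents."""
    fits = [fitness(p) for p in pop]
    order = sorted(range(len(pop)), key=lambda i: fits[i], reverse=True)
    elite = [pop[i] for i in order[:n_elite]]
    children = [make_child(roulette_select(pop, fits, rng),
                           roulette_select(pop, fits, rng), rng)
                for _ in range(len(pop) - n_elite)]
    return elite + children
```

Unlike the tournament model, roulette selection depends on the scale of the fitness values. Raw heuristic scores may therefore need shifting or scaling so that they are non-negative and well spread.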

  35. IBG GeneDesigner: Future Work • Move algorithm to ECJ architecture • Use the Strength-Pareto multi-objective optimization algorithm • Create web-based version of application • Explore island model effects on optimization
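Multi-objective methods such as the Strength Pareto approach avoid a single weighted score. Instead, they compare designs by Pareto dominance. The sketch below shows only that dominance relation and a non-dominated filter. It is not SPEA itself, which adds strength values, density estimation and an external archive. The score pairs are hypothetical, and all objectives are assumed to be maximized.

```python
def dominates(a, b):
    """a dominates b: no worse on every objective and strictly better on one
    (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores):
    """Indices of the non-dominated score vectors."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# Hypothetical (codon-preference score, G/C score) pairs for four designs.
print(pareto_front([(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]))
```

Only the third design is dropped, since the second beats it on both objectives. The remaining designs are trade-offs that a weighted sum would otherwise have to rank using hand-tuned weights.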

  36. Results • IBG GeneDesigner was used to generate a nucleotide sequence for the SH3 domain of α-spectrin [1] • The codon optimization option was set for expression in E. coli with a 40% G/C bias • We also used the application to generate four assembly PCR template oligonucleotide sequences to produce the protein coding sequence flanked by desired restriction enzyme recognition sites • The calculated Tm values of the three overlapping regions were within 1.6 °C • Promoting similar annealing behavior between strands • Success of the reaction was confirmed by DNA sequencing of a pUC19 expression vector containing the PCR product cloned between restriction sites included in the gene design • Summary: Protein Made!!!

  37. Input: Protein Sequence, Vector, Restriction Enzymes

  38. Input: Flanking Sequences

  39. Input: Algorithm Parameters and Fitness Scores

  40. Output: Generation of Oligonucleotides

  41. Acknowledgements • Graduate student who did much of the coding • Rob Vogelbacher • University of Chicago undergraduate who used it to build a protein • Benjamin R. Capraro • His advisor • Tobin Sosnick • Our collaborator at the University of Chicago • Shohei Koide
