Overview of Genome Sequencing Progress

Eric C. Rouchka, D.Sc. Bioinformatics Journal Club, October 1, 2003. DNA sequences: DNA is a double-stranded helix composed of 4 bases (A, C, G, T); a genome is a linear chain of bases; humans have 22 autosome pairs, 2 sex chromosomes, and 3.2 billion bases.

Presentation Transcript


  1. Overview of Genome Sequencing Progress Eric C. Rouchka, D.Sc. Bioinformatics Journal Club October 1, 2003 3/17/2003 Bioinformatics: Merging Biological and Computational Skills Eric Rouchka, D.Sc. University of Louisville

  2. DNA Sequences • DNA: double-stranded helix • Composed of 4 bases: A, C, G, T • Genome: linear chain of bases • Humans: 22 autosome pairs, 2 sex chromosomes, 3.2 billion bases (Image source: http://www.ebi.ac.uk/)

  3. Double Helix • Two complementary DNA strands form a stable DNA double helix • A and T are complements; G and C are complements (Image source: www.ebi.ac.uk/microarray/biology_intro.htm)
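Because A pairs with T and G pairs with C, the sequence of one strand determines the other. A minimal Python sketch of that pairing rule follows; the helper name `reverse_complement` and the example sequences are illustrative choices, not part of the original slides.

```python
# Base-pairing rule from the slide: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction.

    Hypothetical helper for illustration; assumes the input contains
    only the letters A, C, G, T.
    """
    return strand.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))  # CGGCAT
```

The reversal reflects that the two strands run in opposite directions; applying the function twice returns the original strand.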

  4. Brief History of Sequencing • Discovery of Complementary Bases • Erwin Chargaff, 1950 • Discovery of DNA Double Helix • 1953 – only 50 years ago • James Watson • Francis Crick • Rosalind Franklin (Image source: www.simr.org.uk/pages/biotechnology/biotechnology_2.html)

  5. Central Dogma • DNA → RNA → PROTEIN

  6. RNA • Ribonucleic Acid • Similar to DNA • Thymine (T) is replaced by uracil (U) • RNA can be: • Single stranded • Double stranded • Hybridized with DNA

  7. RNA • RNA is generally single stranded • Forms secondary or tertiary structures • Important in a variety of ways, including protein synthesis

  8. How Central Dogma Works • DNA “transcribed” into single-stranded (SS) mRNA • mRNA “translated” into protein using tRNA • Triplet bases (codons) used to code amino acids • 3 mRNA bases code one amino acid
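The transcription step and the triplet reading frame can be sketched in a few lines of Python. This is an illustration only: it assumes the input is the DNA coding strand (so the mRNA is the same sequence with U in place of T), and the function names `transcribe` and `codons` are hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand: same sequence, U instead of T."""
    return coding_strand.upper().replace("T", "U")


def codons(mrna: str) -> list[str]:
    """Split an mRNA into consecutive, non-overlapping triplets.

    Any incomplete triplet at the end is dropped.
    """
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


mrna = transcribe("ATGGCTTGGTAA")
print(mrna)          # AUGGCUUGGUAA
print(codons(mrna))  # ['AUG', 'GCU', 'UGG', 'UAA']
```

Each triplet in the output corresponds to one amino acid (or a stop signal) during translation.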

  9. History of Genetic Code • Genetic code completely deciphered (1965) • Marshall Nirenberg

  10. Genetic Code • 4 possible bases (A, C, G, U) • 4 × 4 × 4 = 64 possible codon sequences • Start codon: AUG • Stop codons: UAA, UAG, UGA • 61 codons code for amino acids (including AUG, which also codes for methionine) • 20 amino acids – redundancy in genetic code
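The counts on this slide can be checked by enumerating every codon. The sketch below builds the standard genetic code from a compact 64-letter string (one-letter amino acid codes in U, C, A, G order, with `*` marking a stop), then translates a short made-up mRNA. The names `CODON_TABLE` and `translate` and the example sequence are assumptions for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
amino_acids = set(CODON_TABLE.values()) - {"*"}

print(len(CODON_TABLE))   # 64
print(stop_codons)        # ['UAA', 'UAG', 'UGA']
print(len(sense_codons))  # 61
print(len(amino_acids))   # 20


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("GGAUGGCUUGGUAAUU"))  # MAW
```

With 61 sense codons shared among 20 amino acids, most amino acids are encoded by more than one codon, which is the redundancy the slide refers to.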

  11. 20 Amino Acids • Glycine (G, GLY) • Alanine (A, ALA) • Valine (V, VAL) • Leucine (L, LEU) • Isoleucine (I, ILE) • Phenylalanine (F, PHE) • Proline (P, PRO) • Serine (S, SER) • Threonine (T, THR) • Cysteine (C, CYS) • Methionine (M, MET) • Tryptophan (W, TRP) • Tyrosine (Y, TYR) • Asparagine (N, ASN) • Glutamine (Q, GLN) • Aspartic acid (D, ASP) • Glutamic acid (E, GLU) • Lysine (K, LYS) • Arginine (R, ARG) • Histidine (H, HIS) • START: AUG • STOP: UAA, UAG, UGA

  12. tRNA structure http://www.tulane.edu/~biochem/nolan/lectures/rna/frames/trnabtx2.htm

  13. Protein Structure (Image source: www.ebi.ac.uk/microarray/biology_intro.htm)

  14. Brief History of Sequencing • First Protein Sequence • ~1955 Bovine Insulin (Fred Sanger) • First Nucleic Acid Sequence • ~1965 yeast alanine tRNA (77 bases) • Development of DNA sequencing • Maxam-Gilbert and Sanger Methods (1977)

  15. Sanger Sequencing Method • (Quicktime Movie) • SOURCE: Molecular Cell Biology

  16. Improving Sanger’s Method • Dideoxynucleotides fluorescently labeled (1986) • Reaction cut by ¼ • Sequencing automated by machine (1986) • Laser detects fluorescence

  17. Image source: plantbio.berkeley.edu/~bruns/tour3.html

  18. (No text on this slide)

  19. Genetic Mapping • Sex-linked genes studied since early 1900s • Gene mapping takes off in late 1970s • David Botstein (RFLPs 1978) • 1979: 579 Genes Mapped • 2003: ~30,000 Genes Mapped • Mapping of Huntington’s Disease (first disease gene mapped) • Triplet Repeat • 1983 • Nancy Wexler

  20. Mapping of Markers • Sequence Tagged Sites (STS) • Sequences occurring only once in the human genome • Help to map locations • 52,000 STSs in humans • ~1 every 62,000 bases
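The spacing figure follows from dividing the genome size quoted on slide 2 (3.2 billion bases) by the number of STS markers. A quick check in Python, assuming the markers are spread evenly (a simplification made only for this arithmetic):

```python
GENOME_SIZE_BP = 3_200_000_000  # human genome size from slide 2
STS_COUNT = 52_000              # number of STS markers from this slide

# Average distance between markers if they were evenly spaced.
average_spacing_bp = GENOME_SIZE_BP / STS_COUNT
print(f"{average_spacing_bp:,.0f}")  # 61,538
```

The result rounds to the slide's figure of roughly one marker every 62,000 bases.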

  21. Cloning Techniques • Plasmid Cloning Introduced (1973) • Region of Interest duplicated by inclusion • Yeast Artificial Chromosomes (YACs) described (1987) • Bacterial Artificial Chromosomes (BACs) introduced (1992) • 30,000 to 100,000 bases can be cloned

  22. Hierarchical (Clone-based) Approach • Know location of 30,000 – 100,000 bp region • Break into 500-700 bp fragments • Sequence Fragments • Assemble based on similarity • ~8-10x coverage • Current Price: $0.09 / base
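The numbers on this slide can be combined into a rough read-count and cost estimate. The sketch below is a back-of-the-envelope illustration, not a description of any sequencing center's procedure. It assumes coverage means total sequenced bases divided by region length, uses the Lander-Waterman approximation exp(-coverage) for the fraction of bases left unsequenced, and assumes the quoted price applies per finished base. All function names are hypothetical.

```python
import math


def reads_needed(region_bp: int, read_len_bp: int, coverage: float) -> int:
    """Reads required so that total sequenced bases = coverage * region length."""
    return math.ceil(coverage * region_bp / read_len_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases hit by no read."""
    return math.exp(-coverage)


def cost_dollars(finished_bases: int, cents_per_base: int = 9) -> float:
    """Cost at the slide's $0.09 per base (assumed to mean finished bases)."""
    return finished_bases * cents_per_base / 100


# A 100,000 bp clone sequenced with 600 bp reads:
print(reads_needed(100_000, 600, 8))    # 1334
print(reads_needed(100_000, 600, 10))   # 1667
print(expected_uncovered_fraction(8))   # roughly 0.0003
print(cost_dollars(100_000))            # 9000.0
```

The exponential term suggests why 8-10x coverage is a common target: at that depth, random sampling alone leaves very few bases unread.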

  23. Hierarchical (clone-based) approach • generate overlapping set of clones • select a minimum tiling path • shotgun sequence each clone

  24. Hierarchical (clone-based) approach • MINUS • map generation requires resources, time and money • Some regions not cloned • PLUS • easier to assemble smaller pieces • less chance for assembly error

  25. Shotgun Sequencing Approach • Developed 1991 TIGR • Craig Venter, Hamilton Smith • Break genome into millions of pieces • Sequence each piece • Reassemble into full genomes
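The reassembly step can be illustrated with a toy greedy overlap merger: repeatedly join the two reads whose suffix and prefix overlap the most. This is a teaching sketch under strong assumptions (error-free reads, a single strand, no repeats) and is not the algorithm of any assembler named in these slides. The function names and the example reads are made up.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining reads do not overlap: several contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))  # ['ATGCGTACGTTAGC']
```

Real genomes contain repeated sequences longer than a read, so overlap alone can join the wrong pieces; production assemblers use additional information to resolve such cases.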

  26. Whole Genome Shotgun Approach • reads generated directly from a whole-genome library • assemble the genome all at once

  27. Whole Genome Shotgun Approach • MINUS • more prone to assembly error • computationally intensive • cannot effectively handle repeats • PLUS • Less overhead time up front

  28. Base calling and Assembly Software • PHRED and PHRAP developed (1990s; PHRED published 1998) • PHRED: Base calling software • PHRAP: Assists in assembly of sequenced data
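The slide does not describe PHRED's output, but base callers of this kind attach a quality score to each base. Under the widely used Phred definition, Q = -10 · log10(P), where P is the estimated probability that the base call is wrong. A small sketch of the conversion in both directions, with hypothetical function names:

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred quality score Q = -10 * log10(P) for an error probability P."""
    return -10 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Inverse conversion: P = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


for q in (10, 20, 30):
    print(q, error_probability(q))
```

Each increase of 10 in the score corresponds to a tenfold lower error probability, so Q20 means about 1 error in 100 calls and Q30 about 1 in 1,000.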

  29. Available Assemblers • SEQAID (Peltola et al., 1984) • CAP (Huang, 1992) • PHRAP (Green, 1994) • TIGR Assembler (Sutton et al., 1995) • AMASS (Kim et al., 1999) • CAP3 (Huang and Madan, 1999) • Celera Assembler (Myers et al., 2000) • EULER (Pevzner et al., 2001) • ARACHNE (Batzoglou et al., 2002)

  30. History of Genome Projects • First Genome Sequence • ΦX174 phage: 5,386 bp; 9 proteins (1977) • Haemophilus influenzae sequenced • First non-viral genome (1.8 Mb) (1995)

  31. History of Genome Projects • Saccharomyces cerevisiae sequenced • First eukaryotic genome (12.1 Mb) (1996) • Caenorhabditis elegans sequence released • First animal genome (~100 Mb) (1998)

  32. History of Genome Projects • Arabidopsis thaliana sequence released • First publicly available plant genome (1999) • Rough Draft of Human Genome Reported (2001) • “Finished” 2003

  33. Human Genome Project • Began in 1990 (US DOE; 15-year plan) • Identify all genes in human DNA • Determine sequence of human genome • Develop faster sequencing technologies • Develop tools for data analysis • Address ethical, legal, and social issues (ELSI)

  34. Microbial Genomes • 122 complete genomes in the Comprehensive Microbial Resource (CMR) • http://www.tigr.org/tigr-scripts/CMR2/CMR_Content.spl

  35. Genomes • Fruit Fly • Mouse • Rat • Rice • Zebrafish • Pufferfish • Chicken • Dog • Frog

  36. Growth of GenBank • 1982: 600,000 Bases • 2002: 28.5 Billion Bases

  37. Other Notables • Dayhoff ATLAS Database of Proteins (1960s) • Sequence Comparison Algorithms • 1970, Needleman-Wunsch (global alignment) • Protein Data Bank • Brookhaven PDB (1973)

  38. Other Notables • NMR for protein structure identification (1980) • IntelliGenetics Founded • DNA and Protein sequence analysis (1980)

  39. Other Notables • Smith-Waterman algorithm • Local sequence alignment (1981) • GenBank Database created (1982) • Genetics Computer Group Founded • GCG suite (1982) • PCR First Described (1985)

  40. Other Notables • FASTP Algorithm • Protein database searching (1985) • SWISS-PROT • Protein Database (1986)

  41. Other Notables • PERL Programming Language • Allows for sequence manipulation (1987) • NCBI Established (1988) • Human Genome Initiative (1988)

  42. Other Notables • FASTA Program released (1988) • DNA and Protein sequence database searches • BLAST Program released (1990) • Allows for quick database searches • Informax Founded (1990) • Human Genome Project Begins (1990)

  43. Other Notables • Creation and Use of ESTs Described (1991) • Incyte Pharmaceuticals Founded (1991) • TIGR Established (1992) • Shotgun sequencing methods

  44. Other Notables • Affymetrix founded (1993) • PRINTS protein motif database (1994)

  45. Other Notables • First Commercial Microarray chips produced (1996) • Dolly Cloned (1997) • Capillary Sequencing machines introduced (1997)

  46. Other Notables • Celera Genomics Formed (1998)

  47. More Detailed Histories • http://www.netsci.org/Science/Bioinform/feature06.html • http://www.dhgp.de/intro/history/history.html
