Overview of Genome Sequencing Progress Eric C. Rouchka, D.Sc. Bioinformatics Journal Club October 1, 2003 3/17/2003 Bioinformatics: Merging Biological and Computational Skills Eric Rouchka, D.Sc. University of Louisville
DNA Sequences • DNA: double stranded helix • Composed of 4 bases: A,C,G,T • Genome: linear chain of bases • Humans: 22 Autosome pairs, 2 sex chromosomes, 3.2 billion bases (Image source: http://www.ebi.ac.uk/)
Double Helix • Two complementary DNA strands form a stable DNA double helix • A, T are complements; G, C are complements (Image source: www.ebi.ac.uk/microarray/biology_intro.htm)
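The pairing rule above (A with T, G with C) can be sketched in a few lines of Python. The function names are illustrative, not from the slides, and the reversal step assumes the standard fact (not stated on the slide) that the two strands run in opposite directions.

```python
# Watson-Crick pairing as stated on the slide: A<->T, G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return strand.upper().translate(COMPLEMENT)

def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction
    (assumes antiparallel strands)."""
    return complement(strand)[::-1]

print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Applying reverse_complement twice returns the original strand, which is a quick sanity check on the pairing table.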
Brief History of Sequencing • Discovery of Complementary Bases • Erwin Chargaff, 1950 • Discovery of DNA Double Helix • 1953 – only 50 years ago • James Watson • Francis Crick • Rosalind Franklin (Image source: www.simr.org.uk/pages/biotechnology/biotechnology_2.html)
Central Dogma • DNA → RNA → Protein
RNA • Ribonucleic Acid • Similar to DNA • Thymine (T) is replaced by uracil (U) • RNA can be: • Single stranded • Double stranded • Hybridized with DNA
RNA • RNA is generally single stranded • Forms secondary or tertiary structures • Important in a variety of ways, including protein synthesis
How Central Dogma Works • DNA “transcribed” into single-stranded (SS) mRNA • mRNA “translated” into protein using tRNA • Triplet bases (codons) used to code amino acids • 3 mRNA bases code one amino acid
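A minimal sketch of the first two steps above, assuming the input is the coding (sense) strand, so the mRNA is simply that sequence with T replaced by U. In the cell the polymerase reads the opposite, template strand; that detail is left out here. The function names and the example sequence are illustrative.

```python
def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")

def codons(mrna: str) -> list[str]:
    """Split an mRNA into consecutive, non-overlapping triplets.
    Any incomplete triplet at the end is dropped."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

mrna = transcribe("ATGGCTTAA")
print(mrna)          # AUGGCUUAA
print(codons(mrna))  # ['AUG', 'GCU', 'UAA']
```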
History Of Genetic Code • Genetic Code Completely uncovered (1965) • Marshall Nirenberg
Genetic Code • 4 possible bases (A, C, G, U) • 4 * 4 * 4 = 64 possible codon sequences • Start codon: AUG • Stop codons: UAA, UAG, UGA • 61 codons code for amino acids (including AUG, which also codes for methionine) • 20 amino acids – redundancy in genetic code
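The counts on this slide can be checked with a short sketch. The 64-letter string below is the standard genetic code written in the conventional U, C, A, G ordering, with '*' marking a stop codon; the variable and function names are illustrative. The translate helper is a simplification: it starts reading at position 0 rather than searching for an AUG start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, ordered by first, second, then third base over U, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
coding = {c: aa for c, aa in CODON_TABLE.items() if aa != "*"}

print(len(CODON_TABLE))           # 64
print(stop_codons)                # ['UAA', 'UAG', 'UGA']
print(len(coding))                # 61
print(len(set(coding.values())))  # 20
print(CODON_TABLE["AUG"])         # M

def translate(mrna: str) -> str:
    """Translate from position 0 until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("AUGGCUUAA"))  # MA
```

Because 61 codons map onto 20 amino acids, most amino acids have several codons; the dictionary makes that redundancy easy to inspect.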
20 Amino Acids • Glycine (G, GLY) • Alanine (A, ALA) • Valine (V, VAL) • Leucine (L, LEU) • Isoleucine (I, ILE) • Phenylalanine (F, PHE) • Proline (P, PRO) • Serine (S, SER) • Threonine (T, THR) • Cysteine (C, CYS) • Methionine (M, MET) • Tryptophan (W, TRP) • Tyrosine (Y, TYR) • Asparagine (N, ASN) • Glutamine (Q, GLN) • Aspartic acid (D, ASP) • Glutamic acid (E, GLU) • Lysine (K, LYS) • Arginine (R, ARG) • Histidine (H, HIS) • START: AUG • STOP: UAA, UAG, UGA
tRNA structure http://www.tulane.edu/~biochem/nolan/lectures/rna/frames/trnabtx2.htm
Protein Structure (Image source: www.ebi.ac.uk/microarray/biology_intro.htm)
Brief History of Sequencing • First Protein Sequence • ~1955 Bovine Insulin (Fred Sanger) • First Nucleic Acid Sequence • ~1965 yeast alanine tRNA (77 bases) • Development of DNA sequencing • Maxam-Gilbert and Sanger Methods (1977)
Sanger Sequencing Method • (Quicktime Movie) • SOURCE: Molecular Cell Biology
Improving Sanger’s Method • Dideoxynucleotides fluorescently labeled (1986) • Number of reactions cut to ¼ (one reaction instead of four) • Sequencing automated by machine (1986) • Laser detects fluorescence
Image Source: plantbio.berkeley.edu/~bruns/tour3.html
Genetic Mapping • Sex-linked genes studied since early 1900s • Gene mapping takes off in late 1970s • David Botstein (RFLPs 1978) • 1979: 579 Genes Mapped • 2003: ~30,000 Genes Mapped • Mapping of Huntington’s Disease (First Disease Gene Mapped) • Triplet Repeat • 1983 • Nancy Wexler
Mapping of Markers • Sequence Tagged Sites (STS) • Sequences occurring only once in the human genome • Help to map locations • 52,000 STS in Humans • ~ 1 every 62,000 bases
Cloning Techniques • Plasmid Cloning Introduced (1973) • Region of Interest duplicated by inclusion • Yeast Artificial Chromosomes (YACs) described (1987) • Bacterial Artificial Chromosomes (BACs) introduced (1992) • 30,000 to 100,000 bases can be cloned
Hierarchical (Clone-based) Approach • Know location of 30,000 – 100,000 bp region • Break into 500-700 bp fragments • Sequence Fragments • Assemble based on similarity • ~8-10x coverage • Current Price: $0.09 / base
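A rough worked example of what these numbers imply for a single clone. Coverage here means total sequenced bases divided by clone length. The slide does not say whether the $0.09 figure is per raw base or per finished base; the sketch assumes per finished base, and the function names and the 600 bp read length are illustrative choices within the ranges given above.

```python
import math

def reads_needed(clone_bp: int, read_bp: int, coverage: float) -> int:
    """Reads required so that total sequenced bases = coverage * clone length."""
    return math.ceil(coverage * clone_bp / read_bp)

def finished_cost(clone_bp: int, price_per_finished_base: float) -> float:
    """Cost of the clone, ASSUMING the quoted price is per finished base."""
    return clone_bp * price_per_finished_base

clone_bp = 100_000  # upper end of the clone size range on the slide
print(reads_needed(clone_bp, read_bp=600, coverage=10))  # 1667
print(f"${finished_cost(clone_bp, 0.09):,.2f}")          # $9,000.00
```

Under the same assumptions, dropping to 8x coverage lowers the read count to 1,334, which shows how directly the coverage target drives the amount of sequencing.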
Hierarchical (clone-based) approach • generate overlapping set of clones • select a minimum tiling path • shotgun sequence each clone
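Selecting a minimum tiling path can be viewed as an interval-cover problem: given clones with known start and end positions on the map, pick the fewest clones whose union spans the region. The sketch below uses the standard greedy solution and is not taken from the slides; the clone names and coordinates are hypothetical, and a real pipeline would also require a minimum overlap between neighbouring clones.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Greedy interval cover.

    clones: list of (name, start, end) map positions (hypothetical data).
    Returns the fewest clones whose union spans [region_start, region_end].
    Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting at or before the covered point,
        # keep the one that reaches farthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path

# Hypothetical clone map, positions in kb.
clones = [("A", 0, 100), ("B", 40, 150), ("C", 90, 220),
          ("D", 140, 260), ("E", 210, 300)]
print([name for name, _, _ in minimum_tiling_path(clones, 0, 300)])
# ['A', 'C', 'E']
```

Clones B and D are redundant here because C and E reach farther from the same covered positions, so only three of the five clones need to be shotgun sequenced.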
Hierarchical (clone-based) approach • MINUS • map generation requires resources, time and money • Some regions not cloned • PLUS • easier to assemble smaller pieces • less chance for assembly error
Shotgun Sequencing Approach • Developed 1991 TIGR • Craig Venter, Hamilton Smith • Break genome into millions of pieces • Sequence each piece • Reassemble into full genomes
Whole Genome Shotgun Approach • reads generated directly from a whole-genome library • assemble the genome all at once
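To make "assemble the genome all at once" concrete, here is a toy greedy overlap assembler. It is only a sketch of the idea, not the method used by any assembler named in these slides: it assumes error-free reads from a single strand, uses 6-base reads instead of the 500-700 bp of real data, and does not handle reads contained in the middle of other reads. The example sequence is made up.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b
    (0 if no overlap of at least min_len exists)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)

# Three overlapping reads from the made-up sequence AGCTTGACCGTA, shuffled.
reads = ["ACCGTA", "AGCTTG", "TTGACC"]
print(greedy_assemble(reads))  # ['AGCTTGACCGTA']
```

The all-pairs overlap search is quadratic in the number of reads, which hints at why whole-genome assembly with millions of reads is computationally demanding, and a repeated segment longer than a read would give the greedy step two equally good merges to choose from.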
Whole Genome Shotgun Approach • MINUS • more prone to assembly error • computationally intensive • cannot effectively handle repeats • PLUS • Less overhead time up front
Base calling and Assembly Software • PHRED and PHRAP Developed (1998) • PHRED: Base calling software • PHRAP: Assists in assembly of sequenced data
Available Assemblers • SEQAID (Peltola et al., 1984) • CAP (Huang, 1992) • PHRAP (Green, 1994) • TIGR Assembler (Sutton et al., 1995) • AMASS (Kim et al., 1999) • CAP3 (Huang and Madan, 1999) • Celera Assembler (Myers et al., 2000) • EULER (Pevzner et al., 2001) • ARACHNE (Batzoglou et al., 2002)
History of Genome Projects • First Genome Sequence • ΦX174 Phage 5,386 bp; 9 proteins (1977) • Haemophilus influenzae Sequenced • First non-viral genome (1.8 Mb) (1995)
History of Genome Projects • Saccharomyces cerevisiae sequenced • First eukaryotic genome (12.1 Mb) (1996) • Caenorhabditis elegans sequence released • First animal genome (~97 Mb) (1998)
History of Genome Projects • Arabidopsis thaliana sequence released • First publicly available plant genome (1999) • Rough Draft of Human Genome Reported (2001) • “Finished” 2003
Human Genome Project • Began in 1990 (US DOE; planned to run 15 years) • Identify all genes in human DNA • Determine sequence of human genome • Develop faster sequencing technologies • Develop tools for data analysis • Address ethical, legal, and social issues (ELSI)
Microbial Genomes • 122 Complete Genomes in CMR • http://www.tigr.org/tigr-scripts/CMR2/CMR_Content.spl
Genomes • Fruit Fly • Mouse • Rat • Rice • Zebra fish • Puffer fish • Chicken • Dog • Frog
Growth of GenBank • 1982: 600,000 Bases • 2002: 28.5 Billion Bases
Other Notables • Dayhoff ATLAS Database of Proteins (1960s) • Sequence Comparison Algorithms • 1970, Needleman-Wunsch (global alignment) • Protein Databank • Brookhaven PDB (1973)
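For readers who have not seen it, the global alignment idea behind Needleman-Wunsch can be sketched as a small dynamic program. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration, not values from the slides; real tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)  # 5
print(top)
print(bottom)
```

With these scores, the example aligns six matching bases and one gap, giving 6 - 1 = 5. Several alignments tie at that score, so the printed alignment depends on the tie-breaking order in the traceback.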
Other Notables • NMR for protein structure identification (1980) • IntelliGenetics Founded • DNA and Protein sequence analysis (1980)
Other Notables • Smith-Waterman algorithm • Local sequence alignment (1981) • GenBank Database created (1982) • Genetics Computer Group Founded • GCG suite (1982) • PCR First Described (1985)
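Local alignment, as in Smith-Waterman, looks for the best-matching pair of subsequences rather than aligning both sequences end to end. The sketch below shows the two changes that make the dynamic program local: cell scores are floored at zero, and the traceback starts at the highest-scoring cell and stops when it reaches a zero. The scoring values (match +2, mismatch -1, gap -1), the function name, and the example sequences are assumptions for illustration.

```python
def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty.
    Returns (best_score, aligned_segment_of_a, aligned_segment_of_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Flooring at 0 lets an alignment restart anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

print(smith_waterman("AAAGATTACACCC", "TTGATTACAGG"))
# (14, 'GATTACA', 'GATTACA')
```

The flanking bases do not match, so the algorithm reports only the shared seven-base core, scoring 7 matches at +2 each.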
Other Notables • FASTP Algorithm • Protein database searching (1985) • SWISS-PROT • Protein Database (1986)
Other Notables • Perl Programming Language • Allows for sequence manipulation (1987) • NCBI Established (1988) • Human Genome Initiative (1988)
Other Notables • FASTA Program released (1988) • DNA and Protein sequence database searches • BLAST Program released (1990) • Allows for quick database searches • Informax Founded (1990) • Human Genome Project Begins (1990)
Other Notables • Creation and Use of ESTs Described (1991) • Incyte Pharmaceuticals Founded (1991) • TIGR Established (1992) • Shotgun sequencing methods
Other Notables • Affymetrix founded (1993) • PRINTS protein motif database (1994)
Other Notables • First Commercial Microarray chips produced (1996) • Dolly Cloned (1997) • Capillary Sequencing machines introduced (1997)
Other Notables • Celera Genomics Formed (1998)
More Detailed Histories http://www.netsci.org/Science/Bioinform/feature06.html http://www.dhgp.de/intro/history/history.html