Microbial genomics Genomics: study of entire genomes Logical next step after genetics: study of genes Genomics: 1) “Structural genomics” * Determine and annotate DNA sequence of entire genome * Determining crystal structure for all predicted proteins 2) Functional genomics * What genes are expressed when? DNA microarrays 3) Comparative genomics * Compare and contrast metabolisms, evolution, gene transfer
Sequencing a genome Whole-genome shotgun sequencing (Venter & Smith): * Shear DNA into random fragments of appropriate length * Clone many fragments into a plasmid: library * Determine the DNA sequence of many of the inserts * Chain-terminator approach, fluorescently labeled nucleotides, capillary electrophoresis of DNA * Assemble sequences into long “contigs”, then the entire genome * Hundreds of bacterial genomes have been sequenced, and the number is growing rapidly * Next 2 years: 1,000-fold increase in DNA sequencing capacity (for the same price). A bacterial genome for $100, instead of $100,000
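The assembly step can be illustrated with a toy example. The sketch below is a minimal greedy overlap merger, assuming error-free reads from one strand and no repeats longer than the minimum overlap; the function names are made up for this illustration, and real assemblers are far more elaborate.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap; return the contigs left."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing overlaps any more
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Toy "genome" cut into overlapping 10-base reads, supplied in scrambled order.
genome = "ATGGCGTACCTTAGGCATTCGA"
reads = [genome[8:18], genome[0:10], genome[12:22], genome[4:14]]
contigs = greedy_assemble(reads)
```

With these four reads the merger ends with a single contig equal to the toy genome. If coverage has gaps, several contigs remain.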
How to sequence a DNA fragment? 1) In vitro DNA replication using DNA polymerase. 2) Use a small % of chain-stopper nucleotide derivatives that lack a 3’ OH group: incorporation stops further growth of the DNA chain. Frederick Sanger: 2 Nobel Prizes, for protein sequencing and for DNA sequencing
Use 4 chain-stopper nucleotides Recent advance: use 4 fluorescent colors for the 4 chain-stopper nucleotides
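Why a small % of chain-stoppers yields a readable sequence can be shown with a small simulation. This is a sketch under simplifying assumptions: every position has the same stop probability, each fragment is recorded as (length, color of its last base), and electrophoresis is modeled as sorting by length. The names and parameter values are invented for the illustration.

```python
import random


def terminated_fragments(new_strand: str, n_molecules: int = 2000,
                         stop_fraction: float = 0.05, seed: int = 0):
    """Simulate many copies of the newly synthesized strand.

    At each position a chain-stopper is incorporated with probability
    stop_fraction, which ends that molecule. Returns (length, last_base) pairs.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for pos, base in enumerate(new_strand, start=1):
            if rng.random() < stop_fraction:
                fragments.append((pos, base))  # dye color identifies the last base
                break
    return fragments


def read_sequence(fragments) -> str:
    """Order fragments by length (as in capillary electrophoresis) and read the colors."""
    color_at_length = {}
    for length, base in fragments:
        color_at_length[length] = base
    return "".join(color_at_length[n] for n in sorted(color_at_length))


strand = "ATGGCGTACCTTAGGCATTC"
recovered = read_sequence(terminated_fragments(strand))
```

With a high stop fraction almost all molecules would terminate in the first few bases, so long fragments would be missing; a low fraction spreads terminations over the whole strand.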
A typical bacterial genome * 1-5 million base pairs (Mbp) * ~500-5000 different proteins encoded New questions: * What is the minimal set of proteins that can sustain bacterial life? * Can an artificial genome be assembled to yield a novel bacterium? (“Frankencell”)
Genome annotation * Identify open reading frames (ORFs) * Identify ORFs that have been observed before in other organisms (databases) * If the amino acid sequence is very (?) similar, and the related protein has a known function: gene successfully annotated. * Problem: ~50% of all genes are of unknown function; many “conserved hypothetical proteins” * Methods to analyze various other features of the genome: * Operons, promoters, ribosome-binding sites (RBS), transcription terminators. Operons often contain valuable information: analysis of flanking genes * tRNA and rRNA genes * Domain structures of proteins
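The first annotation step, identifying ORFs, can be sketched as a scan of all six reading frames. This minimal version assumes ATG as the only start codon and the standard stop codons TAA, TAG and TGA; gene finders used in practice also handle alternative start codons and score coding potential. The function names are illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2):
    """Return (strand, start, end, orf) tuples for ATG...stop stretches in all six frames.

    Coordinates are 0-based, end-exclusive, and refer to the strand that was scanned.
    min_codons counts the codons before the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


orfs = find_orfs("CCATGAAATTTGGGTAACC")
```

Each ORF found this way would then be translated and compared against the protein databases, as described above.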
How to use a genome sequence? * How many genes are used for which functions?
How to use a genome sequence? * Metabolic reconstruction
How to use a genome sequence? * Mass spectrometry is a very efficient link between a (complex) protein sample and genomic information. LC-MS can identify the mass of many proteins in a complex protein sample, and can also provide partial amino acid sequence information (by fragmenting a selected protein peak). Combining this information with the predicted molecular weights of all predicted proteins in the genome leads from a protein sample to knowledge of the sequence of the protein, and of which gene encodes it.
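The matching of a measured mass to predicted proteins can be sketched as a simple lookup. The residue masses below are approximate average values rounded to two decimals; the gene names, sequences and tolerance are hypothetical, and the sketch ignores post-translational modifications and removal of the N-terminal methionine.

```python
# Approximate average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02


def predicted_mass(protein: str) -> float:
    """Average molecular weight of an unmodified protein chain."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


def match_mass(observed: float, proteome: dict[str, str], tolerance: float = 0.5):
    """Names of predicted proteins whose mass lies within tolerance (Da) of the observed mass."""
    return [name for name, seq in proteome.items()
            if abs(predicted_mass(seq) - observed) <= tolerance]


# Hypothetical mini "proteome" predicted from a genome (toy tripeptides, not real genes).
proteome = {"geneA": "MKT", "geneB": "MGG"}
hits = match_mass(263.3, proteome)
```

When several predicted proteins fall within the tolerance, the partial sequence obtained by fragmenting the peak is what distinguishes them.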
DNA microarrays * Chemically synthesize one (or more) DNA fragments corresponding to the sequence of each predicted gene: a few thousand DNA fragments. * Put a very small droplet of each DNA fragment on a glass slide, “dry and bake” to attach the DNA * Purify RNA from cells (for example grown under two different conditions), then copy the RNA into DNA (cDNA) with reverse transcriptase, incorporating fluorescent nucleotide analogs. * Hybridize the cDNA to the glass slide and use a microscope to see which genes are copied into RNA (fluorescence)
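Comparing cells grown under two conditions usually comes down to a per-gene intensity ratio. The sketch below assumes one fluorescence intensity per gene and condition, a pseudocount of 1 to avoid division by zero, and a two-fold cutoff; the gene names, intensities and cutoff are invented for illustration, and real analyses also normalize and use replicates.

```python
import math


def log2_ratios(cond_a: dict[str, float], cond_b: dict[str, float],
                pseudocount: float = 1.0) -> dict[str, float]:
    """log2(intensity in A / intensity in B) for every gene measured in both conditions."""
    return {gene: math.log2((cond_a[gene] + pseudocount) / (cond_b[gene] + pseudocount))
            for gene in cond_a if gene in cond_b}


def changed_genes(ratios: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Genes whose expression differs at least 2**threshold-fold between the conditions."""
    return sorted(gene for gene, r in ratios.items() if abs(r) >= threshold)


# Hypothetical spot intensities for three genes.
condition_a = {"geneX": 799.0, "geneY": 99.0, "geneZ": 49.0}
condition_b = {"geneX": 199.0, "geneY": 99.0, "geneZ": 399.0}
ratios = log2_ratios(condition_a, condition_b)
changed = changed_genes(ratios)
```

A positive value means more RNA for that gene in condition A, a negative value means more in condition B.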
Environmental genomics DNA sequencing of uncultured organisms. An example: Beja et al., 2000. Evidence for a new type of phototrophy in the sea. Science 289: 1902-1906. “Sequence analysis of a 130-kb genomic fragment that encoded the ribosomal RNA (rRNA) operon from an uncultivated member of the marine γ-Proteobacteria (that is, the "SAR86" group) also revealed an open reading frame (ORF) encoding a putative rhodopsin (referred to here as proteorhodopsin).”
Beja et al., 2000. Evidence for a new type of phototrophy in the sea. Science 289: 1902-1906. “Extremely halophilic archaea contain retinal-binding integral membrane proteins called bacteriorhodopsins that function as light-driven proton pumps. So far, bacteriorhodopsins capable of generating a chemiosmotic membrane potential in response to light have been demonstrated only in halophilic archaea. We describe here a type of rhodopsin derived from bacteria that was discovered through genomic analyses of naturally occurring marine bacterioplankton. The bacterial rhodopsin was encoded in the genome of an uncultivated gamma-proteobacterium and shared highest amino acid sequence similarity with archaeal rhodopsins. The protein was functionally expressed in Escherichia coli and bound retinal to form an active, light-driven proton pump.”
Rhodopsins in Archaea Bacteriorhodopsin: light-driven proton pump in Halobacterium salinarum Sensory rhodopsin ↓ Methyl-accepting chemotaxis protein ↓ CheA ↓ CheY ↓ flagellar switch ↓ Change in swimming
Bacteriorhodopsin Photosynthetic reaction center: Photosynthesis based on light-driven electron transfer from (bacterio)chlorophyll Bacteriorhodopsin/proteorhodopsin: Photosynthesis based on photoisomerization of retinal
Beja et al., 2001. Proteorhodopsin phototrophy in the ocean. Nature 411: 786-789. “Here we report that photoactive proteorhodopsin is present in oceanic surface waters. We also provide evidence of an extensive family of globally distributed proteorhodopsin variants. The protein pigments comprising this rhodopsin family seem to be spectrally tuned to different habitats - absorbing light at different wavelengths in accordance with light available in the environment. Together, our data suggest that proteorhodopsin-based phototrophy is a globally significant oceanic microbial process.”
Metagenomics Sequencing of many genomes simultaneously using environmental DNA samples Unsolved challenge: assembling data into distinct genomes Example: Venter et al., 2004. Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science 304: 66-74.
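One simple heuristic for sorting assembled fragments toward distinct genomes is to group them by base composition. The sketch below bins contigs by GC fraction; it is only an illustration of the idea, not the method used by Venter et al., and it assumes non-empty contigs. The contig names and sequences are invented, and organisms with similar GC content would not be separated this way.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def bin_by_gc(contigs: dict[str, str], n_bins: int = 10) -> dict[int, list[str]]:
    """Group contig names into n_bins equal-width GC-content bins (bin 0 = lowest GC)."""
    bins: dict[int, list[str]] = {}
    for name, seq in contigs.items():
        key = min(int(gc_fraction(seq) * n_bins), n_bins - 1)  # GC = 1.0 goes in the top bin
        bins.setdefault(key, []).append(name)
    return bins


# Hypothetical contigs from an environmental sample.
contigs = {"c1": "ATATATGC", "c2": "AATTAGCT", "c3": "GCGCGCAT"}
bins = bin_by_gc(contigs)
```

Here c1 and c2 (25% GC) land in the same bin and c3 (75% GC) in another, suggesting at least two source genomes.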
Venter et al., 2004. Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science 304: 66-74. “We have applied "whole-genome shotgun sequencing" to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors.”
Summary on proteorhodopsin In ~5 years, bacteriorhodopsin changed from a model system for proton pumping in halophilic archaea to a protein (proteorhodopsin) that contributes significantly to global photosynthetic activity in marine proteobacteria.