
Curiosity, the Internet, & Some Biology Knowledge = Undergraduate Contributions to Genomics. Brad Goodner, Hiram College.


Presentation Transcript


  1. Curiosity, the Internet, & Some Biology Knowledge = Undergraduate Contributions to Genomics. Brad Goodner, Hiram College

  2. • 8 undergrad authors involved in PFGE mapping & transposon mutagenesis
     • 11 undergrad authors involved in library construction, gap closure, and extensive annotation
     • >200 undergrads involved in annotation
     • 6 undergrad authors involved in PFGE mapping, transposon mutagenesis, extensive annotation, & comparative genomics
     • 450 undergrads from 7 institutions involved in genome annotation

  3. Why Genomics? So many questions! So many great tools! Too few researchers!
     As of 3/07/2012:
     Domain | Completed | Ongoing
     Archaea | 152 | 214
     Bacteria | 2843 | 7969
     Eucarya | 173 | 2385
     (+ 1970 metagenome samples)
     So much data! *** So much for me & my students to do!!! ***

  4. Basics of a Genome Project: The Sequence is Not the End of the Road
     Genome → Random Pieces → Shotgun Genomic Libraries (or Sequencing without Cloning) → 8-20X Sequencing Coverage → Overlaps in Small Pieces to Form Contigs → Gap Closure (Join Large Pieces into Sequenced Genome) → Annotation → Functional Genomics

  5. Annotation Pipeline • Gene finding & operon prediction • Blast & global sequence alignments • Protein domain prediction • Protein localization prediction • Functional prediction • Functional call, linkage to experimental data, & testable hypotheses (community involvement)

  6. Why Involve Students in Genome Annotation? • Most democratic of biology subdisciplines • Sequence data are now crucial not only to understanding gene/protein function, but also to medicine, agriculture, biotechnology, and our basic understanding of evolution • In most automated genome annotations - 35% of gene annotations are wrong in some way & things get missed • The logic of bioinformatic algorithms illustrates key principles of biology

  7. Why Annotate with Students? Most of what we know comes from a relatively small subset of life’s diversity (models). To what extent do these models adequately reflect genomic diversity?

  8. Why Annotate with Students? Genomic Encyclopedia of Bacteria & Archaea (GEBA) is a massive JGI genome sequencing effort to fill in many of the missing or under-sampled branches of the Bacteria & Archaea trunks on the Tree of Life. *T.P. Curtis, W.T. Sloan, and J.W. Scannell. 2002. Estimating prokaryotic diversity and its limits. Proc Natl Acad Sci USA 99: 10494-10499.

  9. Why Annotate with Students? First 56 GEBA genomes* filled in several missing or under-sampled branches of the Bacteria tree & showed that there is a lot of genomic diversity out there to be discovered. * D. Wu, P. Hugenholtz, K. Mavromatis, et al., 2009. A phylogeny-driven genomic encyclopedia of Bacteria and Archaea. Nature 462: 1056-1060.

  10. Making a Difference: Undergrads & Gene/Genome Annotation • Genes as phylogenetic data • Finding genes • Basic gene information • Pathway/process-driven questions • Hypothetical genes • Genome-wide questions • Comparative genomics • Metagenomics

  11. Making a Difference: Genes as Phylogenetic Data (figure label: Metagenome)

  12. Making a Difference: Genes as Phylogenetic Data

  13. Making a Difference: Genes as Phylogenetic Data (figure label: Metagenome)

  14. Finding Genes: Mistakes are Rarer, but Still Possible
     Which of these ORFs is biologically real? One? Two? More? None? But not all!(?)
     • Similarity-based gene calls
     • Ab initio gene calls

  15. Finding Genes: Similarity-based Gene Calls
     • Database of known genes from other organisms
     • Similarity comparisons
     • Only take those that score above a predetermined threshold (% identity, E value)
     Similarity-based methods can miss: small ORFs that are real; novel ORFs
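The thresholding step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, the cutoffs (30% identity, E value 1e-5), and all ORF and subject names are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    orf_id: str          # candidate ORF in the new genome
    subject: str         # matching gene in the reference database
    pct_identity: float  # percent identity of the alignment
    evalue: float        # expected number of chance hits this good

def similarity_calls(hits, min_identity=30.0, max_evalue=1e-5):
    """Return ORF ids that have at least one hit passing both thresholds.

    The default cutoffs are illustrative assumptions, not recommended values.
    """
    called = set()
    for h in hits:
        if h.pct_identity >= min_identity and h.evalue <= max_evalue:
            called.add(h.orf_id)
    return called

# Hypothetical hits, for illustration only
hits = [
    Hit("orf1", "known_gene_A", 72.0, 1e-120),
    Hit("orf2", "known_gene_B", 24.0, 0.5),    # weak hit: rejected
    Hit("orf3", "known_gene_C", 45.0, 1e-30),
]
all_orfs = {"orf1", "orf2", "orf3", "orf4"}    # orf4 has no database hit at all

called = similarity_calls(hits)
missed = all_orfs - called  # novel or small ORFs end up here
```

In this toy run `orf2` and `orf4` are not called, which mirrors the slide's caveat: an ORF with only weak similarity, or with none, is invisible to a purely similarity-based method even if it is real.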

  16. Finding Genes: Ab initio Gene Calls
     • A few known or highly probable genes
     • Train a model on the frequency of single nucleotides, dimers, trimers, … N-mers found in real genes
     • Run putative genes through the model & only take those that score high (higher probability that the gene is real)
     Ab initio methods can miss: small ORFs that are real; real ORFs that came from elsewhere
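As a rough illustration of the train-then-score idea, the sketch below counts trimers in a training set and scores a candidate by its average log-odds against a background model. It is a deliberately simplified stand-in, not how any particular gene finder works; the add-one smoothing, the choice of k = 3, and the toy sequences are all assumptions made for the example.

```python
import math
from collections import Counter

def kmer_counts(seqs, k):
    """Count every overlapping k-mer in a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def train(seqs, k=3, pseudocount=1.0):
    """Build a smoothed k-mer frequency model from training sequences."""
    counts = kmer_counts(seqs, k)
    total = sum(counts.values()) + pseudocount * 4 ** k
    return {"k": k, "counts": counts, "total": total, "pseudo": pseudocount}

def log_prob(model, kmer):
    return math.log((model["counts"][kmer] + model["pseudo"]) / model["total"])

def score(seq, gene_model, background_model):
    """Average log-odds per k-mer; above 0 means more gene-like than background."""
    k = gene_model["k"]
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = sum(log_prob(gene_model, m) - log_prob(background_model, m) for m in kmers)
    return total / len(kmers)

# Toy training data (hypothetical): GC-rich "genes", AT-rich "background"
known_genes = ["ATGGCCGCGGCCGGCGCCTAA", "ATGGCGGCCGCCGGGGCGTGA"]
background = ["ATATTTAATATAAATTTATAT", "TTTAATATATTAAATATTTAA"]

gene_model = train(known_genes)
background_model = train(background)

gene_like = score("GCCGCGGCC", gene_model, background_model)
background_like = score("ATATTTAAT", gene_model, background_model)
```

The second caveat on the slide falls out of this design: a real gene acquired from another organism carries that organism's k-mer usage, so it can score poorly against a model trained on the host's own genes.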

  17. Basic Gene Information: Correct Start Codon?
     PlanctomycesL    MIDKVAKDSEMIGIVDYGMGNLRSVQKGFEKVGSTAHIVSTPAEIAAAD
     Rhodopirellula             MITIVDYQMGNLRSVQKAVERSGVEAEITSDASQIAAAE
     Pelobacter                 MIVIIDYGMGNLRSVQKGFEKVGYSARVTDDPAVVAQAD
     Desulfuromonas             MITIIDYGMGNLRSVQKGFEKVGYTAQVTDDPRVVEKAE
     Blastopirellula            MITIIDYQMGNLRSVQKAIEKVGHQAVISSDAQEIAQAD
     PlanctomycesM              MITIVDYGMGNLRSVQKAFEKVGAEAEICADPDKIAKAS
     Heliobacterium             MIAIIDYGMGNLRSVQKGLEKAGYAGFVTSDPEAVRSAP
     Geobacter                  MIAIIDYGMGNLRSVQKGFERIGFAAEVTADPARILAAE
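One way to make the start-codon question on this slide concrete is to anchor each protein on a conserved block and compare how much sequence precedes it. The sketch below uses the eight N-terminal sequences from the slide; the choice of `MGNLRSVQK` as the anchor and the function names are assumptions made for illustration. A real check would go back to the nucleotide sequence, where alternative start codons and ribosome-binding sites can be inspected.

```python
from statistics import median

MOTIF = "MGNLRSVQK"  # conserved block present in all eight sequences (chosen by eye)

n_termini = {
    "PlanctomycesL": "MIDKVAKDSEMIGIVDYGMGNLRSVQKGFEKVGSTAHIVSTPAEIAAAD",
    "Rhodopirellula": "MITIVDYQMGNLRSVQKAVERSGVEAEITSDASQIAAAE",
    "Pelobacter": "MIVIIDYGMGNLRSVQKGFEKVGYSARVTDDPAVVAQAD",
    "Desulfuromonas": "MITIIDYGMGNLRSVQKGFEKVGYTAQVTDDPRVVEKAE",
    "Blastopirellula": "MITIIDYQMGNLRSVQKAIEKVGHQAVISSDAQEIAQAD",
    "PlanctomycesM": "MITIVDYGMGNLRSVQKAFEKVGAEAEICADPDKIAKAS",
    "Heliobacterium": "MIAIIDYGMGNLRSVQKGLEKAGYAGFVTSDPEAVRSAP",
    "Geobacter": "MIAIIDYGMGNLRSVQKGFERIGFAAEVTADPARILAAE",
}

def flag_long_starts(seqs, motif):
    """Flag proteins whose N-terminus is longer than is typical for the family.

    Returns {name: (extra_residues, met_at_typical_start)}.
    """
    offsets = {name: s.find(motif) for name, s in seqs.items()}
    offsets = {name: off for name, off in offsets.items() if off >= 0}
    typical = median(offsets.values())
    flagged = {}
    for name, off in offsets.items():
        extra = int(off - typical)
        if extra > 0:
            # Is there a Met exactly where the other proteins begin?
            flagged[name] = (extra, seqs[name][extra] == "M")
    return flagged

suspects = flag_long_starts(n_termini, MOTIF)
```

Here only PlanctomycesL is flagged: it carries 10 residues more than the others ahead of the conserved block, and it has a methionine at the position where the other seven proteins start, which makes the annotated start codon worth a second look.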

  18. Basic Gene Information: Correct Start Codon?

  19. Pathway/Process-driven Questions: Students as Agents of Discovery
     Example: Looking for genes encoding F0 & F1 components of ATP synthase in the aerobic N-fixer Azotobacter vinelandii. Found 2 operons.
     At Hiram, we typically use a pathway/process-driven annotation approach that is tied to course topics (e.g., gene structure, genome organization, lateral gene transfer). It leads to a much richer annotation tied to biological knowledge for the organism.

  20. The two operons have different gene orders & evolutionary histories! (Gene-order labels from the figure: ebgadBCAI; Bd/e1ACBag)

  21. The two operons have different gene orders & evolutionary histories! What is the role of two ATP synthase operons in a highly aerobic organism that carries out the very O-sensitive process of N fixation? How common is it to find >1 ATP synthase operon? (Gene-order labels from the figure: ebgadBCAI; bd/e1ACBag)

  22. 6 undergrad authors (5 from Hiram, 1 from SPU) participated in PFGE mapping, transposon mutagenesis, and extensive annotation • 153 undergrads acknowledged for their participation in deep genome annotation as part of courses

  23. Pathway/Process-driven Questions: One Finding Leads to New Questions
     How common is it to find >1 ATP synthase operon?
     • Looked for the protein domain found in the A subunit of ATP synthase (Pfam00119)
     • About 150 genomes (~7%) have > 2 copies of the A subunit gene
     • Examples:
       - Almost all of genus Burkholderia have 2 different operons, one on chromosome 1 and another on chromosome 2
       - Pelobacter has 3 operons (1 split in 2 pieces), with 2 operons due to a clear duplication (74-100% identity) and the other clearly different
     Why have this redundancy and diversity?
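The survey on this slide amounts to tallying domain hits per genome. A minimal sketch, assuming the hits are already available as (genome, gene) pairs: the genome names, gene ids, and the `min_copies` default are hypothetical, and the cutoff is left as a parameter because the slide's "> 2 copies" sits next to examples with exactly 2 operons.

```python
from collections import Counter

def copies_per_genome(hits):
    """hits: iterable of (genome, gene_id) pairs, one per gene carrying the domain."""
    return Counter(genome for genome, _gene in hits)

def genomes_with_multiple(hits, min_copies=2):
    """Genomes carrying at least `min_copies` genes with the domain."""
    counts = copies_per_genome(hits)
    return sorted(g for g, n in counts.items() if n >= min_copies)

# Hypothetical hit table for one domain, for illustration only
domain_hits = [
    ("genome_A", "A_0001"),
    ("genome_B", "B_0102"), ("genome_B", "B_2210"),
    ("genome_C", "C_0007"), ("genome_C", "C_0450"), ("genome_C", "C_1999"),
]

multi = genomes_with_multiple(domain_hits)
fraction = len(multi) / len(copies_per_genome(domain_hits))
```

Raising `min_copies` to 3 would leave only `genome_C` in this toy table, so the reported percentage depends directly on which cutoff was meant.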

  24. Pathway/Process-driven Questions: Students Finding Holes in Annotation
     • Looking for 10 genes of glycolysis & 3 additional genes for gluconeogenesis
     • Agrobacterium & Chromohalobacter genomes lack FBPase
     • There are 6 different protein families that have FBPase activity
     • There must be a 7th way (& maybe an 8th way) as well
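The hole-finding logic can be sketched as a lookup: each required function may be supplied by any of several protein families, and a hole is a function for which none of them is annotated. The family labels below (`FBPase_family_1` and so on) are placeholders standing in for the 6 families the slide mentions, and the annotation set is hypothetical.

```python
def find_holes(required, annotated_families):
    """required: {function: set of protein families able to perform it}.

    Returns the functions for which no known family is annotated.
    """
    return sorted(
        function
        for function, families in required.items()
        if not (families & annotated_families)
    )

# Placeholder labels: the slide says 6 families have FBPase activity
# but does not name them, so these names are hypothetical.
required = {
    "FBPase": {f"FBPase_family_{i}" for i in range(1, 7)},
    "enolase": {"enolase_family"},
}

# Hypothetical annotation of one genome
annotated = {"enolase_family", "some_other_family"}

holes = find_holes(required, annotated)
```

A reported hole means either the annotation missed a gene or the organism does the job with a family nobody has characterized yet, which is the slide's argument for a 7th way.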

  25. Pathway/Process-driven Questions: Students Finding Potential Redundancy
     Gene ID | Protein Name | Key Domains | Nearby Genes of Interest
     glnA | glutamine synthetase | GS & adenylation domains | glnL next
     Atu0193 | Glutamine synthetase | GS domains | FAD-oxidoreductase next
     Atu0602 | Glutamine synthetase | GS domains | FAD-oxidoreductase next, in operon with zwf, pgl, edd
     Atu1770 | Glutamine synthetase type I | GS & adenylation domains | glnB upstream
     Atu2142 | Glutamine synthetase | GS domains | amino acid permease upstream
     Atu2416 | Glutamine synthetase type II | GS domains | GS translation inhibitor
     Atu4230 | Glutamine synthetase type III | GS domains | gltB

  26. Hypothetical Genes: The Need to Bring a Lot of Information Together
     Genome | # ORFs | % with functional prediction | % without prediction, but with similarity
     E. coli K12 DH10B | 4126 | 84.8 | 15.2
     Conexibacter woesei DSM14684 | 5950 | 74.4 | 25.4
     Staphylococcus aureus JH9 | 2753 | 73.4 | 26.6
     Agrobacterium tumefaciens C58 | 5402 | 64.4 | 34.9
     Solibacter usitatus Ellin6076 | 7940 | 61.7 | 37.5
     Vibrio cholerae O1 el tor | 3835 | 59.6 | 40.2
     Planctomyces limnophilus | 4304 | 53.6 | 34.8
     What about transmembrane domains, conserved small domains (e.g., PFAMs), etc.?
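The two percentage columns in this table do not always add up to 100, and the remainder is worth working out. The sketch below assumes the two columns are mutually exclusive, so that what is left over is the share of ORFs with neither a functional prediction nor any similarity hit. That reading is an interpretation of the table, not something the slide states.

```python
# (genome, number of ORFs, % with functional prediction,
#  % without prediction but with similarity), copied from the slide
rows = [
    ("E. coli K12 DH10B", 4126, 84.8, 15.2),
    ("Conexibacter woesei DSM14684", 5950, 74.4, 25.4),
    ("Staphylococcus aureus JH9", 2753, 73.4, 26.6),
    ("Agrobacterium tumefaciens C58", 5402, 64.4, 34.9),
    ("Solibacter usitatus Ellin6076", 7940, 61.7, 37.5),
    ("Vibrio cholerae O1 el tor", 3835, 59.6, 40.2),
    ("Planctomyces limnophilus", 4304, 53.6, 34.8),
]

def unexplained(rows):
    """Return {genome: (% with neither prediction nor similarity, approx. ORF count)}.

    Assumes the two percentage columns are disjoint categories.
    """
    out = {}
    for genome, n_orfs, pct_function, pct_similarity_only in rows:
        pct_neither = max(0.0, round(100.0 - pct_function - pct_similarity_only, 1))
        out[genome] = (pct_neither, round(n_orfs * pct_neither / 100))
    return out

result = unexplained(rows)
most_unexplained = max(result, key=lambda g: result[g][0])
```

Under that assumption, Planctomyces limnophilus stands out: about 11.6% of its ORFs (roughly 499 genes) have neither a functional prediction nor a similarity hit, while the remainder for every other genome in the table is below 1%.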

  27. GEBA Genomes: Some Real Opportunities. US Dept. of Energy Joint Genome Institute. First 56 GEBA genomes* filled in several missing or under-sampled branches of the Bacteria tree & showed that there is a lot of genomic diversity out there to be discovered. * D. Wu, P. Hugenholtz, K. Mavromatis, et al., 2009. A phylogeny-driven genomic encyclopedia of Bacteria and Archaea. Nature 462: 1056-1060.

  28. Division Fusobacteria - found in soils & aquatic habitats, but more so inside animals - only a few have been cultured - best known is Fusobacterium from our oral cavity

  29. Genome-level Questions: Streptobacillus moniliformis & Rat Bite Fever

  30. Genome-level Questions: Streptobacillus moniliformis & Rat Bite Fever
     S. moniliformis: fastidious, non-motile, facultative anaerobe, fermentative
     Rat bite fever: hemorrhagic rash, fever, migratory polyarthritis
     1.66 Mbp, 10.7 Kbp, 1568 genes, 1511 ORFs

  31. Genome-level Questions: Streptobacillus moniliformis & Rat Bite Fever
     S. moniliformis: fastidious, non-motile, facultative anaerobe, fermentative
     1.66 Mbp, 10.7 Kbp, 1568 genes, 1511 ORFs
     No catalase, but 1 SOD
     No genes for flagellar components

  32. Genome-level Questions: Streptobacillus moniliformis & Rat Bite Fever
     S. moniliformis: fastidious, non-motile, facultative anaerobe, fermentative
     How does it make a living? (Figure labels: ATP, ADP, ETC)

  33. Genome-level Questions: Streptobacillus moniliformis & Rat Bite Fever
     S. moniliformis: fastidious, non-motile, facultative anaerobe, fermentative
     How does it make a living? (Figure labels: lactate, ATP, ADP, ETC)

  34. Genome-level Questions: Streptobacillus moniliformis & Rat Bite Fever
     Figure labels: purines; DNA, RNA; pyrimidines; uracil; uracil; ?; fatty acids; ?; polar amino acids; PROTEINS; sugars; glutamate; amino acids; branched-chain amino acids; ClpXP; dipeptides, oligopeptides

  35. Genome-level Questions: Streptobacillus moniliformis & Rat Bite Fever
     Type II secretion: YES; Type III secretion: NO; Type IV secretion: YES (likely conjugation); Type VI secretion: NO
     Adhesion? 25 members of OM trimeric YadA domain protein family
     Rat bite fever: hemorrhagic rash, fever, migratory polyarthritis
     Connective tissue/ECM: hyaluronan, chondroitin, heparin sulfate, collagen
     Enzymes listed: hyaluronate lyase, polysaccharide lyases, heparinase, sulfatase, 2 pullulanases, 2 peptidase 32 collagenases, 5 IgA endopeptidases, 2 peptidase 32 collagenases, O-sialoglycoprotein endopeptidase, Phospholipase D
     Host defenses: (Ig’s, antimicrobial peptides, immunomodulators, etc.)

  36. From What Can You Isolate a Metagenomic DNA Sample? Almost any environmental sample you can imagine! HiramGenomicsStore.com

  37. Why Isolate Metagenomic DNA? Culturing the microbes in any environmental sample will only recover a small percentage of the organisms. (Figure label: CULTURING. Data from Annual Rev. Micro. 39: 321-46.)

  38. Why Isolate Metagenomic DNA? As long as we can break open the toughest cells, we can represent the entire sample in our isolated metagenomic DNA. (Figure label: DNA ISOLATION)

  39. Metagenomics: Lots of Interesting Questions
     • Sequence all the DNA (unbiased, but expensive), OR
     • Isolate by PCR and sequence 1 or more conserved genes that act as measures of evolutionary history & diversity (e.g., 16S/18S rRNA gene found in all organisms), OR
     • Use PCR to look for specific groups of microbes
