
The Genomics Era: A Vast Resource for Educators

ASMCUE 2008- “The year genomics bombarded ASMCUE” David J. Baumler Genome Center of Wisconsin dbaumler@wisc.edu. The Genomics Era: A Vast Resource for Educators. (Perna et al. Nature 2001).


Presentation Transcript


  1. ASMCUE 2008: “The year genomics bombarded ASMCUE.” David J. Baumler, Genome Center of Wisconsin, dbaumler@wisc.edu. The Genomics Era: A Vast Resource for Educators (Perna et al. Nature 2001). #1) If you haven't already, download all materials in the ASMCUE2008 folder at: http://asap.ahabs.wisc.edu/~baumler/ #2) Download Progressive Mauve at: http://asap.ahabs.wisc.edu/mauve/download.php

  2. Dispel a few myths:
     - I need a supercomputer to run genome alignments.
     - There are so many sequenced genomes, why do we need more?
     - How do I get students excited about genomics and able to relate to it?
     - I have been teaching too long to get into genomics.
     - The more I use computers in teaching, the more things go wrong.
     (ASMCUE 2007 data)

  3. My teaching philosophy: the 3rd dimension of teaching
     • Go beyond the 2 dimensions of paper and presentations.
     • For topics in genomics, you must get computers into the students' hands.
     • Look towards the future: iPhones, small laptops that fit in a ziplock bag, personal communication devices with fold-out keyboards and magnifying screens.
     Small laptop examples: Eee PC, Classmate, HP Mini-Note, IdeaPad
     - One laptop per child... What about one laptop per college student?
     - Look into laptop checkout at your campus.
     - 2005, UW-Madison: 56% of students own a laptop.
     - Introductory Biology: “Bring your wireless-ready laptop to class” day
     (Photo by Dave Baumler of a UW-Madison Introductory Biology class)

  4. Projection for wireless internet in college classrooms: In 2004, only about a third of classrooms provided wireless Internet ... Wireless networks now cover more than half (51.2%) of college classrooms. ... (Campus Computing Survey, as of 1/26/2007)

  5. Today's session overview:
     Introduction
     Module #1) Annotate a gene from a phage genome
       - key concepts: using the ERIC database, BLAST, InterProScan, biological annotations
     Module #2) Conduct genome alignments of phage genomes
       - use Mauve to conduct whole-genome alignments and familiarize yourself with Mauve
     Module #3) Compare genomes from 3 outbreaks of E. coli O157:H7
       - identify genomic islands using Mauve and assess conservation of virulence factors
     Module #4) Compare genomes from 5 strains of Yersinia pestis
       - identify genomic islands and conservation of virulence factors; analyze mutations with phenotypic consequences due to insertion and/or deletion events and single nucleotide polymorphisms (SNPs); paleomicrobiology
     Conclusion
     (Figure label: difficulty)

  6. The ERIC database houses all of the available genomes of the members of the family Enterobacteriaceae, all of which are thought to have descended from a common ancestor. (Figure labels: Ancestor; boxes represent organisms with at least one genome sequenced.)
     Human Pathogens: Calymmatobacterium, Cedecea, Citrobacter, Edwardsiella, Enterobacter, Escherichia, Ewingella, Hafnia, Klebsiella, Kluyvera, Leclercia, Leminorella, Moellerella, Morganella, Plesiomonas, Proteus, Providencia, Rahnella, Salmonella, Serratia, Shigella, Tatumella, Yersinia, Yokenella
     Insect Pathogens/Endosymbionts: Arsenophonus, Buchnera, Sodalis, Wigglesworthia, Xenorhabdus
     Phytopathogens/Plant-associated: Brenneria, Dickeya, Erwinia, Pantoea, Pectobacterium, Phlomobacter, Saccharobacter, Samsonia
     Environmental/Animals/Industrial: Alterococcus, Budvicia, Buttiauxella, Obesumbacterium, Pragia, Trabulsiella

  7. Orthologs: reciprocal best BLAST hits (BlastP X against Y, >60%; BlastP Y against X, >60%)
     If at least two of these criteria are met for the pair of genes in question, they are typically assigned as orthologs:
     • Percentage identity and alignment percentage are in the typical range (see attached spreadsheet).
     • Local genome context: the conserved gene is part of an operon with other genes that are already considered orthologs.
     • Larger-scale conservation of genomic context: the conserved gene is in the same general genomic context as other orthologs.
     • Functional conservation: the conserved gene is predicted or known to perform the same function as the potential ortholog in another genome.
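The reciprocal-best-hit idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hit tables are hypothetical (query, subject, bit score) tuples rather than real BLASTP output, and the gene names `x1`, `y1`, and so on are made up. It does not apply the >60% thresholds or the context criteria listed above.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(x_vs_y, y_vs_x):
    """Return sorted (x_gene, y_gene) pairs that are each other's best hit."""
    best_xy = best_hits(x_vs_y)
    best_yx = best_hits(y_vs_x)
    return sorted((x, y) for x, y in best_xy.items() if best_yx.get(y) == x)


# Hypothetical hit tables: (query, subject, bit score).
x_vs_y = [("x1", "y1", 250.0), ("x1", "y2", 90.0),
          ("x2", "y2", 180.0), ("x3", "y1", 120.0)]
y_vs_x = [("y1", "x1", 248.0), ("y2", "x2", 175.0), ("y2", "x1", 88.0)]

pairs = reciprocal_best_hits(x_vs_y, y_vs_x)
print(pairs)
```

Here `x3` has `y1` as its best hit, but `y1` prefers `x1`, so `x3` is not reported as an ortholog candidate.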

  8. Enterobacteria cont. Generated from 180 orthologs (Nicole T. Perna unpublished data)

  9. ERIC: Enteropathogen Resource Integration Center (http://www.ericbrc.org). Genomes; Tools & Annotations; Genome Views and Comparisons (https://asap.ahabs.wisc.edu/asap/logon.php)

  10. Why phage? Genomics timeline:
      1977  Phage φX174, 10 genes
      1982  Phage λ, 46 genes
      1995  Haemophilus influenzae, 1,709 genes
      1996  Saccharomyces cerevisiae, 6,269 genes
      1997  E. coli MG1655, 4,200 genes
      1998  Caenorhabditis elegans, 19,000 genes
      2000  Drosophila melanogaster, 13,000 genes
      2001  Humans, ~30-40,000 genes; E. coli EDL933, 5,200 genes
      2008  643 complete microbial genomes & 970 in progress
      Teach annotation with a phage genome.

  11. Annotation step #1: Structural annotation
      • Structural annotation consists of the identification of genomic elements (e.g. genes).
      • Open reading frames (ORFs), also called coding sequences (CDSs), must have a start codon and a stop codon.
      • It also covers the location of regulatory motifs (such as promoters and ribosome binding sites).
      • This step is typically automated using gene prediction software (automation only finds ~50-90% of the genes).
      (Figures: example of a gene, with the start codon in green and the stop codon in red; the genetic code, courtesy of http://history.nih.gov)
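To make the start-codon/stop-codon rule concrete, here is a minimal Python sketch of an ORF scan. It is not how real gene prediction software works: it assumes ATG as the only start codon, scans only the forward strand, and uses a made-up demo sequence and a hypothetical `min_codons` cutoff.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for forward-strand ORFs.

    `start` is inclusive and `end` is exclusive; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


demo = "CCATGAAATTTTAGGG"  # made-up sequence with one short ORF
orfs = find_orfs(demo)
for start, end, frame in orfs:
    print(frame, start, end, demo[start:end])
```

A scan like this finds every start-to-stop stretch, including many that are not real genes, which is one reason automated structural annotation still needs human review.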

  12. Annotation step #2: Functional annotation
      • Functional annotation consists of attaching biological information to genomic elements:
        - biochemical function
        - regulation and interactions
        - expression
        - cellular location
      • Three examples of annotations for one gene:
        - Name/synonym: a short “word” used to refer to the gene (Ex. ureC)
        - Product: a descriptive protein name (Ex. Urease gamma subunit)
        - Function: describes what the protein does (Ex. Catalyzes the hydrolysis of urea to form ammonia and carbon dioxide)

  13. Tools you will use to annotate today:
      • #1 ERIC database: this is where you will get the sequences and record your functional annotations.
      • #2 BLASTP: a tool you will use to find similar sequences in the NCBI database of all publicly available known and predicted proteins.
      • #3 InterProScan: a tool you will use to find similar sequences in a database of protein families (groups of related proteins) and domains (functionally significant subregions of proteins).
      Note: For background information about InterProScan and BLAST, I recommend the book “Bioinformatics for Dummies”.

  14. We are going to annotate a phage genome today.
      • What types of genes should we anticipate finding in the phage genome?
        - Structural components of a phage
        - Phage replication proteins
        - Machinery for integration into the host genome
        - Hypothetical proteins
      • You are going to annotate the bacteriophage 933W genome. This phage was found in the genome of E. coli O157:H7 strain EDL933. The phage genome contains the genes stx2A and stx2B, which encode Shiga toxin 2, a protein that contributes to disease in humans.
      (Animation courtesy of Microbelibrary.org)

  15. Welcome to the Enteropathogen Resource Integration Center. Using your web browser, #1) go to http://www.ericbrc.org/ #2) in the upper right portion of the screen click on login

  16. Click on log on under ERIC user accounts. Then type in the username and password (case sensitive):
      Session #1 username: ASMCUE / password: genome
      Session #2 username: ASMCUE2 / password: genomes
      Then click the log on button. Note: your class has been given access to a unique version of the genome, in which you and your classmates will be the only people annotating the phage genome.

  17. #1) Click on Annotations. #2) Then use the pull-down bar to select bacteriophage 933W (the last one on the list), and click the OK button.

  18. Every gene in a genome in the ERIC database has what we call a feature ID, which consists of three capital letters, a dash, and seven numbers, for example ABC-1234567. Your genome will have a unique 3-letter code, and each gene or coding sequence (CDS) will have a unique seven-digit number. Choose the gene from the list that corresponds to your birthday, type in the feature ID, and click Submit. On the next page, click on the link for the feature ID.
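The feature ID pattern described on this slide (three capital letters, a dash, seven digits) can be checked with a short regular expression. This is a sketch based only on the description above; the helper name `is_feature_id` is hypothetical and is not part of ERIC.

```python
import re

# Three capital letters, a dash, then exactly seven digits (e.g. ABC-1234567).
FEATURE_ID = re.compile(r"[A-Z]{3}-[0-9]{7}")


def is_feature_id(text):
    """Return True if `text` looks like a feature ID as described on the slide."""
    return FEATURE_ID.fullmatch(text.strip()) is not None


for candidate in ["ABC-1234567", "abc-1234567", "ABC-123456", " ABC-1234567 "]:
    print(repr(candidate), is_feature_id(candidate))
```

A check like this can catch typing mistakes before students submit an ID and land on the wrong gene.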

  19. Your webpage should look like this. On the right are the annotations; this is where you will be adding annotations. On the left there is information about your coding sequence, along with links to the tools you will be using.

  20. Let's split up the class.
      Left half of the classroom: use InterProScan to add annotations. Refer to slides #8 through #14 in the students instructions for adding annotations.ppt file, located in the ASMCUE2008 folder at: http://asap.ahabs.wisc.edu/~baumler/
      Right half of the classroom: use BlastP at NCBI to add annotations. Refer to slides #15 through #21 in the students instructions for adding annotations.ppt file, located in the ASMCUE2008 folder at: http://asap.ahabs.wisc.edu/~baumler/
      - If there is no good match, the gene product is called a hypothetical protein.
      - Add an annotation for product as hypothetical protein.
      - Use Unpublished Sequence analysis as Evidence.
      - Type in author name and email.
      - Submit to Database.

  21. Once you have completed the annotations for your gene(s), you can view the genome of the phage and see how your classmates are doing by clicking on Show Feature Context (GaPP).

  22. A new window will appear in a few seconds. The gene you are working on is highlighted in blue, and you are viewing the entire bacteriophage 933W genome. Scroll over each gene (in pink) and you should see the name and product information in the boxes below the genome. You can also double-click any of the genes; your web browser will open its annotation page in ERIC, where you can view the function annotation, evidence, etc.

  23. Learning assessment: pre- and post-test
      #1. Within a sequenced microbial genome, identification of a gene predicted to encode a protein should contain which of the following characteristics?
      #2. What percentage of the protein coding genes do you think automated computer approaches applied to a newly sequenced microbial genome will find?
      #3. What type of biological annotation cannot be assigned to a newly sequenced gene based solely on comparisons to known protein/gene(s)?
      #4. In a newly sequenced microbial genome, every identified gene produces a protein that is similar to a known protein?
      #5. Which of these web-based resources are useful to find biological information about a gene sequence?
      (Chart labels: P<0.2, P<0.02, P<0.01)
      Student testimonials:
      “I really enjoyed learning more about bacterial genetics and the tools that are available online for genomic research and gene identification. This is an area of bacteriology that I have little experience in and I think that having experience using these websites will prove valuable as my research continues.” – UW-Madison student in Bacteriology 650
      “The concepts of using BLAST and Interproscan are pretty neat, and it is great that anyone can access this information, not just the insider scientists that put it together. Thank you for teaching our class how to use these tools! I doubt I would have ever learned this stuff on my own had you not taught us.” – UW-Madison student in Bacteriology 650

  24. Module #2: Conduct genome alignments of phage genomes
      - This module was developed to teach how to use Mauve, using enterobacteria phage.
      - Phage genomes can be aligned using Mauve in a matter of minutes.
      - It is applicable as a teaching tool to decipher the mosaicism of phage genomes.
      - Comparative studies of 30 mycobacteriophage genomes reveal new insights into their diverse architecture and into gene exchange (Hatfull et al. PLoS Genetics 2006). You could align EVERY mycobacteriophage genome using Mauve!
      - How diverse are enterobacteriophage? (The following series of slides are Mauve alignments of phage isolated from E. coli, Salmonella spp., Yersinia spp., and Shigella spp.; all alignments are also provided for further inquiry.)
      - Since we just annotated a stx2-containing phage from E. coli O157:H7, we will run alignments with 3 phage genomes.

  25. Mauve: Multiple Genome Aligner • Able to identify and align collinear regions of multiple genomes even in the presence of rearrangements • Find and extend seed matches • Group into locally collinear blocks • Align intervening regions • (Darling et al. Genome Res. 2004 Jul;14(7):1394-403.)
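To give a feel for the "find seed matches" step, the sketch below finds short exact matches that occur exactly once in each of two sequences. It is a simplified stand-in, not Mauve's actual seeding method (see Darling et al. for that); the sequences and the word length `k` are made up for illustration.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in `seq` to its position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def seed_matches(a, b, k):
    """Return sorted (pos_in_a, pos_in_b, kmer) for k-mers unique in both."""
    unique_a = unique_kmers(a, k)
    unique_b = unique_kmers(b, k)
    shared = unique_a.keys() & unique_b.keys()
    return sorted((unique_a[kmer], unique_b[kmer], kmer) for kmer in shared)


genome_a = "ACGTTGCA"  # made-up toy sequences
genome_b = "TTGCAACG"
seeds = seed_matches(genome_a, genome_b, 4)
print(seeds)
```

Adjacent seeds such as these two would then be extended and grouped into a locally collinear block, with the regions between blocks aligned afterwards.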

  26. Module #2: Understanding phage, the viruses that infect microorganisms, via genome alignments. I recently aligned 56 enterobacterial phage genomes. Phage genomes are ideal training tools for teaching how to set up Mauve alignments. In the ASMCUE2008 folder, under module #2, you are provided with ~50 enterobacteriophage genome files for conducting alignments.

  27. Step #1: Copy the folder called 3 phage genomes for ASMCUE workshop and paste it on the hard drive of your computer (C: drive).
      Step #2: From the Start menu, under Programs, select Mauve 2.1.1.
      Step #3: Under the File pull-down menu, select Align with progressive Mauve.
      Step #4: Click to choose where to send the output file, find the folder from Step #1, and double-click on the folder. (This new window will appear.)
      Step #5: Type in a file name and click on Save.

  28. Next, add the sequences to align. Click on Add sequence, select the first phage genome, and click on Open; then continue with the 2nd and 3rd phage genomes. Finally, click on Align to start the genome alignment.
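For classes that prefer a script to the menus, the same three-genome alignment can probably be launched from the command line. The Python sketch below only builds the command and does not run it. It assumes that the `progressiveMauve` executable that ships with Mauve is on the PATH and accepts `--output=<file>` followed by the input sequence files; the file names are hypothetical placeholders, not the names in the workshop folder.

```python
def build_mauve_command(output_path, genome_paths, executable="progressiveMauve"):
    """Build an argument list for a progressiveMauve run (assumed CLI form)."""
    if len(genome_paths) < 2:
        raise ValueError("an alignment needs at least two genomes")
    return [executable, f"--output={output_path}", *genome_paths]


# Hypothetical file names standing in for the three phage genomes.
genomes = ["phage1.gbk", "phage2.gbk", "phage3.gbk"]
command = build_mauve_command("three_phage.xmfa", genomes)
print(" ".join(command))

# To actually run it (only if progressiveMauve is installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the command as a list, rather than one string, avoids quoting problems with folder names that contain spaces, such as the workshop folder in Step #1.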

  29. When viewing the LCBs, Mauve displays regions that are highly conserved or identical in full color. Areas that are unique or variable in one genome appear in white and represent unique islands.

  30. Your toolbar is at the top left. The tools you will use are in the View pull-down menu and in these buttons:
      - Home: returns the viewer to the original view
      - Search for features
      - Zoom in/out: you can also hold down the Ctrl key and use the arrow keys on the keyboard
      - Move left or right: useful for centering a region of interest in the middle of the screen before zooming in

  31. Other useful commands in Mauve:
      Function                               Key
      Zoom in                                Ctrl+Up
      Zoom out                               Ctrl+Down
      Scroll left                            Ctrl+Left
      Scroll right                           Ctrl+Right
      Export the current view as an image    Ctrl+E

  32. Module #3) Dissecting virulence of E. coli O157:H7 using genome alignments

  33. The first E. coli genome sequenced was that of the non-pathogenic E. coli K-12 strain MG1655.
      - Determination of the complete E. coli sequence required almost 6 years.
      - E. coli is the preferred model in biochemical genetics, molecular biology, and biotechnology, and its genomic characterization will undoubtedly further research toward a more complete understanding of this important experimental, medical, and industrial organism (Blattner et al. Science 1997).

  34. The first pathogenic E. coli genome sequenced was that of enterohaemorrhagic Escherichia coli (EHEC) O157:H7 strain 933 EDL.
      - In 1982, Escherichia coli O157:H7 was recognized as a human pathogen.
      - The strain is also known as EDL933 and comes from the 1982 Michigan outbreak linked to ground beef.
      - It is Shiga toxin-producing (STEC).
      (Perna et al. Nature 2001)

  35. The completion of the 2nd E. coli O157:H7 (EHEC) sequence: strain Sakai
      • In July 1996, an outbreak of Escherichia coli O157:H7 infection occurred among schoolchildren in Sakai City, Osaka, Japan.
      • 8,938 schoolchildren sickened, 3 deaths
      • We are starting to ask: what genomic differences determine differences in virulence, epidemiology, and fatality?
      (Hayashi et al. DNA Res. 2001)

  36. The 2006 E. coli O157:H7 outbreak from bagged spinach (from CDC)
      - Multistate outbreak: 205 people sickened, 3 deaths
      - Produce-associated outbreak strains caused a higher incidence of hemolytic-uremic syndrome (HUS) (Manning et al. PNAS 2008)
      - Genome alignments can be used to find variations.

  37. Currently there are 13 E. coli O157:H7 genomes sequenced. You will focus on three, all of which are in the Enteropathogen Resource Integration Center (ERIC) database (www.ericbrc.org):
      - Escherichia coli EDL933 (EHEC): 1982 ground beef outbreak
      - Escherichia coli Sakai (EHEC) (also called RIMD): 1996 radish sprout outbreak
      - Escherichia coli EC4042 (EHEC): 2006 fresh bagged spinach outbreak

  38. In your Start menu, under Programs, go to Mauve 2.1.1 and start up Mauve. Notice that there is a user's guide in PDF form in this folder; it contains useful information and commands for navigation. Note: your computer may need to update Java, since Mauve uses a Java platform for the alignment. You should see a window for Mauve appear.

  39. Next, double-click on the 3 O157H7 folder in the ASMCUE2008 folder. It should contain the following 19 files. Take the first one (3 O157 alignment) and drag and drop it into the Mauve window. The window should show a reading sequences message, and in a few seconds the alignment will appear. Note: computers with less than 512 MB of RAM may not be able to open the file.

  40. Your alignment should look like this. Look at the organism names: the first is EDL933, the second is RIMD (Sakai), and the third is EC4042 (spinach). Using the up or down arrows, you can switch the position of the genomes.

  41. The colored blocks are called locally collinear blocks (LCBs) and represent regions of the genome that Mauve has identified as conserved. The lines connect the LCBs. Notice that some are in different positions in the other genomes, and some are inverted and appear on the bottom strand of the double-stranded genome. (Figure labels: top strand, bottom strand)

  42. Notice that when you scroll (slowly) over a white region (island), the black boxes pause in the other genomes, then move again once you have passed over the island and back into conserved regions.

  43. When you move your mouse over a region of one genome, Mauve shows a black box there and also marks the corresponding region (boxes) in the other two genomes. Try scrolling left to right on one genome.

  44. If you would like to look at all three LCBs, even though one is in a different position, scroll over one LCB and click the mouse button.

  45. Let's use the zoom function. Press the home button to restore the alignment to the original view. Now click on the white island in the top genome and, using the right button, bring it to the center of the screen. Then zoom in multiple times. You will start to see the genes; scroll over one and pause, and a window will pop up with the product annotation. Here you can view which genes are present in this EDL933 island and not in the other two genomes.

  46. Now place your mouse over one of the genes; in my example I have iha (IrgA homolog adhesin). Click your mouse once on the gene and a window will pop up; scroll down and select View CDS iha in ERICdb. This will open the page in the ERIC database for that gene, containing all of the annotations, where you can check whether it is involved in virulence.

  47. Let's use the search feature: #1) Click on the search feature #2) Choose a genome (EDL933) #3) Type in a gene name (stx2A) #4) Click on search

  48. Notice that it has found the stx2A gene in EDL933 (highlighted in blue) and also in the RIMD strain. Just because it isn't aligned in the EC4042 strain does not mean it isn't there; if you look to the right in the EC4042 genome, you will find it (Stx2A).

  49. One last feature you can use in Mauve: to find an island that is in 2 out of 3 strains, use the backbone view. Press the home button first. Then go to the View pull-down menu, select color scheme, then backbone color.

  50. Your alignment should look like this in backbone color. Regions present in all three genomes appear in light purple. Regions in other colors correspond to 2 out of 3 genomes (you may have to zoom in a bit to see these regions):
      - Regions in only EDL933 and RIMD appear olive green.
      - Regions in only EDL933 and EC4042 appear maroon.
      - Regions in only RIMD and EC4042 appear tan/brown.
      This is how you identify islands shared by only 2 of the 3 strains.
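The same grouping that the backbone colors show can also be tallied from a table of segments. The Python sketch below assumes a tab-delimited backbone table with one header row and a left/right coordinate pair per genome, where a pair of zeros means the segment is absent from that genome. That layout is an assumption about Mauve's backbone output, and the coordinates here are made up.

```python
def classify_backbone(text, names):
    """Count backbone segments by the set of genomes that contain them."""
    counts = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = [int(value) for value in line.split("\t")]
        present = tuple(
            name for idx, name in enumerate(names)
            if fields[2 * idx] != 0 or fields[2 * idx + 1] != 0
        )
        counts[present] = counts.get(present, 0) + 1
    return counts


# Made-up table in the assumed layout (negative values = reverse strand).
rows = [
    ["seq0_leftend", "seq0_rightend", "seq1_leftend",
     "seq1_rightend", "seq2_leftend", "seq2_rightend"],
    [1, 5000, 1, 5010, 1, 4990],
    [5001, 7000, 5011, 7020, 0, 0],
    [7001, 9000, 0, 0, 0, 0],
    [-9001, -9500, 0, 0, 4991, 5490],
]
table = "\n".join("\t".join(str(value) for value in row) for row in rows)

counts = classify_backbone(table, ["EDL933", "RIMD", "EC4042"])
for genomes, n in counts.items():
    print(" + ".join(genomes), n)
```

In this toy table, the segment found in EDL933 and RIMD only corresponds to what the viewer would draw in olive green, and the EDL933 and EC4042 segment to maroon.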
