Annotation Presentation Week 3: Sequence-based Similarity Module (BLAST & CDD only) & Horizontal Gene Transfer Module (Ortholog Neighborhood & GC content only)
Phylogenetic tree of Bacteria. Insert Figure 1 from Handelsman (2004) Microbiol. Mol. Biol. Rev. 68: 669-685. • Recall: Planctomycetes genomes are among the GEBA genomes, representing an under-represented phylum within the domain Bacteria (GEBA: Genomic Encyclopedia of Bacteria & Archaea).
Recent phylogenetic analysis using the 23S rRNA gene supports the monophyletic grouping and branch order for these four bacterial phyla. Insert Figure 4A from Pilhofer et al. (2008) Characterization and Evolution of Cell Division and Cell Wall Synthesis Genes in the Bacterial Phyla Verrucomicrobia, Lentisphaerae, Chlamydiae, and Planctomycetes and Phylogenetic Comparison with rRNA Genes. J. Bacteriol. 190: 3192-3202.
Members of the Planctomycetaceae Family http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=126
• The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between two sequences. • Conserved Domain Database Search (CDD) finds sequence similarity with genes in Clusters of Orthologous Groups (COGs).
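BLAST finds local similarity with a fast seed-and-extend heuristic. To show what a "local" alignment score means, the sketch below swaps in the exact Smith-Waterman dynamic-programming algorithm that BLAST approximates; it is an illustration, not how BLAST is implemented. The scores (match +2, mismatch -1, gap -2) are illustrative assumptions: blastp actually scores substitutions with a matrix such as BLOSUM62 and uses affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap inserted in b
            left = H[i][j - 1] + gap  # gap inserted in a
            # Clipping at 0 lets an alignment start anywhere: this is what makes it *local*
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


# The unrelated flanks (XX, YY) do not penalize the shared core ACGT
print(smith_waterman_score("XXACGTYY", "ACGT"))  # 8
```

Because negative running scores are reset to zero, unrelated flanking regions never drag down the score of the similar core, which is why a local search can find a conserved domain inside two otherwise different proteins.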
Verifying Function Based on Sequence Conservation
• Different types of BLAST searches: • blastp • blastn • blastx • tblastn • tblastx
• >35% identity to an experimentally characterized protein (especially in conserved regions) can be considered good evidence for function
• E-value less than 10^-3 is significant; equal to or less than 10^-15 may indicate a good match
http://www.ncbi.nlm.nih.gov/
Beware!!! Mindless BLAST: similarity score and E-value do not tell the whole story! You must also consider the length of the match (query coverage) & biological function (organismal context).
Be cautious of auto-annotated gene function: GenBank is not a curated database.
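The rules of thumb above can be sketched as a small Python checker. This is a minimal sketch, not an official tool: `BlastHit`, `assess_hit`, and especially the 70% `MIN_COVERAGE` cutoff are hypothetical (the slide stresses query coverage but gives no number). The program table records what each BLAST variant compares.

```python
from dataclasses import dataclass

# What each BLAST program compares: (query type, database type)
BLAST_PROGRAMS = {
    "blastp": ("protein", "protein"),
    "blastn": ("nucleotide", "nucleotide"),
    "blastx": ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}

# Hypothetical cutoff: the slide names query coverage but gives no number.
MIN_COVERAGE = 70.0


@dataclass
class BlastHit:
    percent_identity: float  # % identical residues over the alignment
    evalue: float  # Expect value reported by BLAST
    query_coverage: float  # % of the query covered by the alignment
    subject_characterized: bool = False  # was the hit's function verified experimentally?


def assess_hit(hit: BlastHit) -> str:
    """Label a hit using the slide's rules of thumb (a rough screen, not a verdict)."""
    if hit.evalue >= 1e-3:
        return "not significant"
    if hit.query_coverage < MIN_COVERAGE:
        return "partial match"  # short local match: do not transfer function
    if hit.evalue <= 1e-15 and hit.percent_identity > 35.0:
        if hit.subject_characterized:
            return "strong"
        return "strong match, unverified function"  # e.g. an auto-annotated GenBank entry
    return "significant"


print(assess_hit(BlastHit(48.0, 1e-80, 95.0, subject_characterized=True)))  # strong
```

The coverage check runs before the identity check on purpose: a very low E-value over a short stretch of the query is exactly the "mindless BLAST" trap the slide warns about.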
Follow this link from the lab notebook. BLAST: Altschul et al. (1997) Nucleic Acids Research 25: 3389-3402. GenBank: Benson et al. (2006) Nucleic Acids Research 35: D21-D25.
Retrieve query sequence from first module in imgACT Lab Notebook
Copy amino acid sequence in FASTA format from imgACT Lab Notebook
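A FASTA record is a single header line starting with `>` followed by the sequence, usually wrapped into fixed-width lines. The sketch below shows that layout; the function name `to_fasta`, the 60-character default width, and the example header are assumptions for illustration, not something imgACT requires.

```python
def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format a protein sequence as a FASTA record: '>' header, then wrapped sequence lines."""
    seq = "".join(sequence.split()).upper()  # drop spaces/newlines picked up when copying
    if not all(c.isalpha() or c == "*" for c in seq):  # letters, plus '*' for a stop
        raise ValueError("sequence contains non-residue characters")
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + lines)


# Hypothetical header and sequence
print(to_fasta("my_gene_id hypothetical protein", "MSTNPKPQRK TKRNTNRRPQ\nDVKFPGG", width=10))
```

Pasting a record with the `>` header intact lets the BLAST page treat the first line as a label rather than as sequence.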
Paste query sequence into box, then “Click” to submit the search
WHAT YOU SHOULD SEE. . . BLAST RESULTS Scroll down
Top significant hit • Start with first hit... • Click on Accession ID
NOTE: Top hit is from the class organism (P. limnophilus); do not include the P. limnophilus results in lab notebook
Next significant hit • Click on Accession ID
Copy/paste this information into imgACT notebook. NOTE: Function was assigned by an automatic gene caller (not experimentally verified).
Reminder: Make sure you are in EDIT mode when making changes to imgACT notebook and SAVE your work along the way. Return to BLAST results for this information.
Sequence length of database hit (not alignment length) • Pair-wise alignment with statistics (including E-value)
Copy/paste into imgACT notebook: • Length of alignment • Score • Expect (E-value) • Identities • Positives • Gaps • Pair-wise alignment between “Query” and “Sbjct” sequences
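The statistics requested above all sit in the two summary lines printed above each pair-wise alignment. The parser below is a sketch that assumes the usual NCBI text layout (`Score = ... bits (...)`, `Expect = ...`, `Identities = x/y (z%)`); the function name and the sample numbers are hypothetical. Note that the denominator in `Identities = x/y` is the alignment length, which is different from the `Length=` shown for the database hit.

```python
import re

# Hypothetical sample in the NCBI pair-wise text layout
sample = """ Score = 250 bits (638),  Expect = 3e-80, Method: Compositional matrix adjust.
 Identities = 120/250 (48%), Positives = 170/250 (68%), Gaps = 5/250 (2%)"""


def parse_blast_stats(block: str) -> dict:
    """Pull the per-alignment statistics out of a BLAST pair-wise text block."""
    stats = {}
    m = re.search(r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\)", block)
    if m:
        stats["bit_score"] = float(m.group(1))
        stats["raw_score"] = int(m.group(2))
    m = re.search(r"Expect(?:\(\d+\))?\s*=\s*([^\s,]+)", block)
    if m:
        value = m.group(1)
        # Some BLAST versions print tiny E-values as "e-180" with no leading digit
        stats["evalue"] = float("1" + value if value.startswith("e") else value)
    for field in ("Identities", "Positives", "Gaps"):
        m = re.search(field + r"\s*=\s*(\d+)/(\d+)", block)
        if m:
            stats[field.lower()] = int(m.group(1))
            # Denominator = alignment length (not the database hit's sequence length)
            stats["alignment_length"] = int(m.group(2))
    if "identities" in stats:
        stats["percent_identity"] = 100.0 * stats["identities"] / stats["alignment_length"]
    return stats


print(parse_blast_stats(sample))
```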
NOTE: You need to modify your notebook to hold the requested info (statistics include E-value). • REPEAT the procedure with the second BLAST hit.
“Click” on Accession ID • “Click” on Bit score • Copy/paste requested information into lab notebook
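The bit score and the E-value recorded here are linked by the standard relation E ≈ m · n · 2^(-S'), where S' is the bit score, m the query length, and n the total database length. The sketch below is an approximation for intuition only: BLAST itself uses edge-corrected "effective" lengths, so its reported E-values will not match this formula exactly, and the lengths used below are made up.

```python
def evalue_from_bitscore(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    # m = query length, n = total database length (residues).
    # BLAST uses effective lengths, so real output differs somewhat.
    return query_len * db_len * 2.0 ** (-bit_score)


# Each extra bit halves the expected number of chance hits
for bits in (10, 11, 30):
    print(bits, evalue_from_bitscore(bits, query_len=100, db_len=1024))
```

Two consequences are worth noting: a higher bit score always means a smaller E-value, and the same alignment gets a larger E-value when searched against a larger database.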
CDD: Conserved Domain Database • COG 1: ion transport • COG 2: energy production • COG 3: cell division, etc. • COG genes share sequence similarity & functional conservation • Bi-directional best hits in a curated database. Figure from Sanders-Lorenz and Miller (2010)
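A bi-directional best hit means gene a1 in genome A finds b1 as its best hit in genome B, and b1 in turn finds a1 as its best hit back in A. The COG database was built from best-hit relationships across many genomes; the sketch below shows only the pairwise idea, with hypothetical gene names and bit scores.

```python
def best_hits(scores):
    """scores: {query: {subject: bit_score}} -> {query: highest-scoring subject}."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs that are each other's best hit in both search directions."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical all-vs-all bit scores between genomes A and B
a_vs_b = {"a1": {"b1": 300, "b2": 120}, "a2": {"b1": 150}}
b_vs_a = {"b1": {"a1": 310, "a2": 160}, "b2": {"a2": 90}}
print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

Here a2's best hit is b1, but b1 prefers a1, so only (a1, b1) passes: a one-directional best hit is not enough evidence of orthology.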
Return to top of BLAST Results page. CDD: Marchler-Bauer et al. (2006) Nucleic Acids Research 35: D237-D240.
If there are no hits, write “no significant hits” in notebook. If there are hits, scroll down & click the + sign next to the top hit (Click here).
Copy top COG hit and COG name into notebook. Modify BOX to include length, bit score, and E-value. Figure callouts: COG description; COG hit; COG name; length, bit score, and E-value.
Change headings and enter COG information as shown for top hit • If you obtain more than one significant hit, record this info for at least the top 2 hits • Hint: Look at Score & E-value
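Picking the "top 2 significant hits" amounts to dropping non-significant hits and ranking the rest by score. This is a minimal sketch: the `CogHit` fields mirror the notebook box, the COG accessions and numbers are hypothetical placeholders, and the E < 10^-3 cutoff is an assumption borrowed from the BLAST slide, since no CDD-specific cutoff is given here.

```python
from typing import NamedTuple


class CogHit(NamedTuple):
    accession: str
    name: str
    length: int
    bit_score: float
    evalue: float


# Assumption: reuse the BLAST rule of thumb (E < 1e-3) as the significance cutoff
EVALUE_CUTOFF = 1e-3


def top_cog_hits(hits, n=2):
    """Keep significant hits, highest bit score first (E-value breaks ties)."""
    significant = [h for h in hits if h.evalue < EVALUE_CUTOFF]
    return sorted(significant, key=lambda h: (-h.bit_score, h.evalue))[:n]


# Hypothetical CDD results
hits = [
    CogHit("COG_B", "hypothetical domain B", 180, 40.1, 5e-6),
    CogHit("COG_C", "hypothetical domain C", 300, 20.0, 0.5),
    CogHit("COG_A", "hypothetical domain A", 250, 95.2, 2e-25),
]
for h in top_cog_hits(hits):
    print(h.accession, h.name, h.length, h.bit_score, h.evalue)
```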
Retrieve from Gene Detail page
How do I return to the Gene Detail page for my proposed gene? “Click” on the URL saved for your gene during the first module (week 2).
Then what? Keep the Gene Detail page open in a separate tab while working on imgACT Lab Notebook modules. Scroll down
“Click” here on Gene Detail page
Note: the red arrow corresponds to your gene
• Plus strand genes on top (left to right)
• Minus strand genes on bottom (right to left)
Is your gene a stand-alone ORF, or is it clustered with other genes on the same DNA strand and in the same orientation?
• Could be evidence that your gene is part of an operon
• What are the functions of adjacent genes? Do they have related functions?
How conserved is the gene neighborhood?
• Are there similar patterns in other organisms that contain a gene from the same orthologous group?
• If considerably different, may be evidence for HGT
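The "clustered on the same strand, same orientation" question can be made concrete as a grouping step: walk along the contig and start a new group whenever the strand flips or the gap between genes gets large. This is a sketch only; the `Gene` coordinates are hypothetical, and the 50 bp `MAX_GAP` is an assumed cutoff, not an IMG rule. A cluster is a hint of an operon, not proof.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # leftmost coordinate
    end: int  # rightmost coordinate
    strand: str  # "+" or "-"


# Hypothetical cutoff for a "short" intergenic gap (bp)
MAX_GAP = 50


def same_strand_clusters(genes, max_gap=MAX_GAP):
    """Group neighboring genes that share a strand and are separated by short gaps."""
    genes = sorted(genes, key=lambda g: g.start)
    clusters, current = [], []
    for g in genes:
        prev = current[-1] if current else None
        if prev and (g.strand != prev.strand or g.start - prev.end > max_gap):
            clusters.append(current)  # strand flip or long gap ends the cluster
            current = []
        current.append(g)
    if current:
        clusters.append(current)
    return [[g.name for g in c] for c in clusters]


neighborhood = [
    Gene("g1", 100, 900, "+"),
    Gene("g2", 920, 1500, "+"),
    Gene("g3", 1530, 2100, "+"),
    Gene("g4", 2400, 3000, "-"),
    Gene("g5", 3010, 3500, "+"),
]
print(same_strand_clusters(neighborhood))
```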
You need to save individual panels as JPEG or PNG files. Include P. limnophilus as well as 4-5 different organisms in imgACT notebook.
“Click” here to insert images into notebook. Delete the text ‘gene neighborhood images’ and place the cursor in the box.
1- Click “Browse” to find image file. 2- Press “Attach” button. Thumbnail image should appear in window. 3- Repeat for each individual neighborhood panel until all are loaded in the window prompt.
4- Next, select one image at a time and press [OK] to insert them into imgACT notebook at cursor position. NOTE: The images should be inserted in the same order that the organisms were listed in img/edu. Insert next image.
Results: Ortholog Neighborhood Scroll down
Enter comments about homology & context:
Is your gene a stand-alone ORF, or is it clustered with other genes on the same DNA strand and in the same orientation?
• Could be evidence that your gene is part of an operon
• What are the functions of adjacent genes? Do they have related functions?
How conserved is the gene neighborhood?
• Are there similar patterns in other organisms that contain a gene from the same orthologous group?
• If considerably different, may be evidence for HGT
Retrieve from Organism Details page • Retrieve from Gene Detail page
On Gene Detail page, you will find the GC content for your gene.
To find the GC content for the entire P. limnophilus genome, select the “Find Genomes” tab from the Gene Detail page.
Search for Planctomyces limnophilus and click on the corresponding hyperlink.
WHAT YOU SHOULD SEE. . . Scroll down
NOTE: A gene whose GC content is more than a few percentage points above or below the average GC content of the genome may have originated from another organism by HGT. Add a comment box & make note of this if your gene meets this criterion.
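GC content is simply the percentage of G and C among the bases, and the HGT flag is a comparison of the gene's value with the genome average. In this sketch "a few percentage points" is taken as 5, which is an assumption to adjust to your course's guidance; the function names and example values are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percent G + C in a DNA sequence (case-insensitive; counts only A/C/G/T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Assumption: "a few percentage points" interpreted as 5
THRESHOLD = 5.0


def possible_hgt(gene_gc: float, genome_gc: float, threshold: float = THRESHOLD) -> bool:
    """Flag a gene whose GC% differs from the genome average by more than the threshold."""
    return abs(gene_gc - genome_gc) > threshold


print(gc_content("GGCCAATT"))   # 50.0
print(possible_hgt(60.0, 54.0))  # True
```

A GC outlier is only one line of evidence; it is most convincing when it agrees with an unusual ortholog neighborhood.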