
Annotation of eukaryotic genomes



Presentation Transcript


  1. Annotation of eukaryotic genomes
     [Diagram] Genomic DNA → (transcription) → Unprocessed RNA → (RNA processing) → Mature mRNA (Gm3 ... AAAAAAA) → (translation) → Nascent polypeptide → (folding) → Active enzyme → Function (Reactant A → Product B)
     Annotation approaches marked on the diagram: ab initio gene prediction, comparative gene prediction, functional identification

  2. Genome analysis overview: C.elegans

  3. Gene finding: ab initio
     • What features of an ORF can we use?
       • Size - large open reading frames
       • DNA composition - codon usage / 3rd position codon bias
     • Other features:
       • Kozak sequence CCGCCAUGG
       • Ribosome binding sites
       • Termination signals (stop codons)
       • Splice junction boundaries
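
The first two features on this slide (ORF size and third-position composition) can be illustrated with a minimal Python sketch. It is an illustration only, not the method of any particular gene finder: the function names `find_orfs` and `gc3` and the `min_codons` threshold are assumptions made for this example, only the forward strand is scanned, and introns are ignored, so in a eukaryotic genome this gives at best a crude first signal.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG..stop ORF on the forward strand.

    start is 0-based; end is exclusive and includes the stop codon.
    min_codons counts every codon in the ORF, start and stop included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one complete codon at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def gc3(orf_seq):
    """Fraction of G or C bases at third codon positions (a simple bias measure)."""
    thirds = orf_seq.upper()[2::3]
    return sum(base in "GC" for base in thirds) / len(thirds)
```

For a real genome `min_codons` would be set far higher (for example 100), and the reverse strand would be scanned as well.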

  4. Gene finding: comparative
     • Use knowledge of known coding sequences to identify coding regions of genomic DNA by similarity to:
       • transcribed DNA sequence
       • peptide sequence
       • related genomic sequence

  5. Annotation of eukaryotic genomes (the pipeline diagram from slide 1, shown again)

  6. Artemis display for S.pombe cosmid

  7. Methods for searching
     • Pairwise alignments: matching a query sequence against a database of subject sequences
       • Needleman & Wunsch - global alignment
       • Smith-Waterman - local alignment
       • FastA
       • BLAST
       • Others: SSAHA, WABA
     • See Chapter 7 of Developing Bioinformatics Computer Skills
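
To make the global/local distinction concrete, here is a minimal sketch of Smith-Waterman local alignment in Python. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the linear gap penalty are assumptions chosen for illustration; real tools use substitution matrices and affine gap costs. The key difference from Needleman & Wunsch is that cell scores are floored at zero, so the best alignment can start and end anywhere in either sequence.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; flooring at 0 is what makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The matrix fill takes time proportional to the product of the two sequence lengths, which is why heuristic tools such as FastA and BLAST are used for database-scale searches.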

  8. BLAST - local similarity searches
     • BLAST (Basic Local Alignment Search Tool) is the workhorse of genome annotation due to its early optimisation for the UNIX platform
     • Underlies most of the web-based servers world-wide
     • Comes in many flavours:
       • BLASTN - DNA against DNA
       • BLASTX - DNA (translated) against protein
       • BLASTP - protein against protein
       • TBLASTN - protein against DNA (translated)
       • TBLASTX - DNA against DNA at the peptide level
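
The translated flavours (BLASTX, TBLASTN, TBLASTX) compare sequences "at the peptide level", which means the DNA is conceptually translated in all six reading frames before comparison. The sketch below shows only that translation step, using the standard genetic code; it is not how BLAST itself is implemented, and the function names are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; '*' marks a stop, 'X' an unrecognised codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptide strings."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames
```

TBLASTX applies this kind of translation to both query and database, which is why it is the most expensive of the five programs.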

  9. BLAST - results
     • BLAST returns high-scoring segment pairs (HSPs), each with a score and p-value. BLAST output files can be large and difficult to interpret.
     • Hence we need tools to make sense of the data, both to filter/process the file and to visualise the resulting multiple sequence alignments.
     • MSPcrunch - a post-processor for BLAST with a number of different output types
     • BioPerl - modules for handling sequences and BLAST output
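
As a rough idea of what "filter/process the file" can look like, here is a small Python sketch. It is not MSPcrunch or BioPerl; it assumes the hits have been written in the 12-column tab-separated layout produced by NCBI BLAST+ with `-outfmt 6`, which reports an E-value rather than a p-value. The thresholds and function names are assumptions made for this example.

```python
import csv
from collections import namedtuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()
Hit = namedtuple("Hit", FIELDS)


def parse_hits(handle):
    """Yield one Hit per tab-separated line, skipping blank and '#' lines."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        yield Hit(
            row[0], row[1], float(row[2]),
            *map(int, row[3:10]),          # length .. send are integers
            float(row[10]), float(row[11]),
        )


def filter_hits(hits, max_evalue=1e-5, min_identity=0.0):
    """Keep hits passing both thresholds, best bit score first."""
    kept = [h for h in hits
            if h.evalue <= max_evalue and h.pident >= min_identity]
    return sorted(kept, key=lambda h: h.bitscore, reverse=True)
```

Typical use would be `filter_hits(parse_hits(open(path)))`, where `path` points at a tabular BLAST report.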

  10. Standard similarity searches for first-pass annotation
      • genomic DNA v transcript data
        • BLASTN / EST_GENOME
        • TBLASTX
      • genomic DNA v genomic DNA
        • BLASTN
        • TBLASTX
      • genomic DNA v non-redundant protein data
        • BLASTX

  11. Data for gene prediction
      • EST/mRNA - intra-species matches
      • TBLASTX - inter-species matches
      • BLASTX - intra-species matches
      • BLASTX - inter-species matches
      • Coding measures - genefinder, hexamer
      • Splice sites - consensus sequences

  12. Multiple Sequence alignments in ACEDB

  13. Manual review of gene predictions
      • Check concordance with transcript data
      • Check concordance with peptide similarity data
      • Check splice site usage (intron / exon boundaries)
      • Result: a set of human-appraised gene predictions. The translations of the CDS sequences are used for protein feature analysis and initial assignment (ID, function).
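
The splice-site check in this review can be partly automated. The sketch below flags introns that do not follow the canonical GT...AG rule (donor dinucleotide GT, acceptor dinucleotide AG). The coordinate convention (0-based, half-open exons on the forward strand) and the function name are assumptions made for this example; non-canonical introns do occur, so a flag is a prompt for a closer look and not proof of an error.

```python
def check_splice_sites(genomic, exons):
    """Report donor/acceptor dinucleotides for each intron of a gene model.

    genomic: forward-strand DNA sequence.
    exons:   list of (start, end) pairs, 0-based half-open, sorted by start.
    Returns a list of (intron_start, intron_end, donor, acceptor, canonical).
    """
    report = []
    # Each intron lies between the end of one exon and the start of the next.
    for (_, exon_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genomic[exon_end:next_start].upper()
        donor, acceptor = intron[:2], intron[-2:]
        canonical = donor == "GT" and acceptor == "AG"
        report.append((exon_end, next_start, donor, acceptor, canonical))
    return report
```

A gene model on the reverse strand would first need its sequence reverse-complemented and its coordinates flipped before being passed to this function.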
