
What do you do with a whole genome sequence?


gayora

Presentation Transcript


  1. What do you do with a whole genome sequence?

  2. Translate it into all 6 reading frames…

  3. Identify all of the stop codons…

  4. And the start codons… You can then identify all open reading frames (ORFs). But are they all real genes?
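Slides 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration, not a gene caller: it assumes the standard genetic code, accepts ATG, GTG, and TTG as starts, and reports the longest ORF ending at each stop. The names `six_frames`, `find_orfs`, and `min_codons` are made up for this example.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumption: the three common prokaryotic starts


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(seq):
    """Yield (strand, offset, subsequence) for the 3 forward and 3 reverse frames."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            yield strand, offset, s[offset:]


def translate(seq):
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_codons=2):
    """Return (strand, offset, start, end, protein) tuples.

    start/end are 0-based, end-exclusive positions on the strand being read
    (for "-" they index the reverse complement). end includes the stop codon.
    """
    orfs = []
    for strand, offset, s in six_frames(seq):
        start = None
        for i in range(0, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon in STARTS:
                start = i  # first start after the previous stop: longest ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    # The initiator codon is read as Met even when it is GTG or TTG.
                    protein = "M" + translate(s[start + 3:i])
                    orfs.append((strand, offset, offset + start,
                                 offset + i + 3, protein))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
```

On a real genome `min_codons` would be set much higher (for example 100), and even then many of the reported ORFs will not be real genes, which is the point of the next slide.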

  5. Three major prokaryotic gene modelers:
     • Generation uses predominantly 6-mer statistics to recognize coding regions; it uses a proximity rule-based start call with ATG and GTG as potential starts.
     • Glimmer uses interpolated Markov models (IMMs) to identify the coding regions; it uses ATG, GTG, and TTG as potential starts.
     • Critica uses blastn to produce alignments from the entire dataset and derives dicodon statistics to recognize coding sequences. It uses an SD sensor with ATG, GTG, and TTG as potential starts.
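To give a feel for what "6-mer statistics" means, here is a toy log-odds scorer. It is not the algorithm used by Generation, Glimmer, or Critica; it only assumes that you have some known coding sequences and some background (non-coding) sequences to count 6-mers in. The function names and the add-one smoothing are choices made for this sketch.

```python
import math
from collections import Counter


def kmer_counts(seqs, k=6):
    """Count every overlapping k-mer in a list of training sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def log_odds_score(seq, coding_counts, background_counts, k=6):
    """Sum of log(P(kmer | coding) / P(kmer | background)) over the sequence.

    Add-one smoothing keeps unseen k-mers from producing log(0).
    A positive score means the sequence looks more like the coding set.
    """
    n_kmers = 4 ** k
    coding_total = sum(coding_counts.values()) + n_kmers
    background_total = sum(background_counts.values()) + n_kmers
    seq = seq.upper()
    score = 0.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        p_coding = (coding_counts[kmer] + 1) / coding_total
        p_background = (background_counts[kmer] + 1) / background_total
        score += math.log(p_coding / p_background)
    return score


# Toy training data, invented for the example.
coding = kmer_counts(["ATGAAAGCGCTGGCG" * 3])
background = kmer_counts(["TATATATATATATATATA"])
print(log_odds_score("AAAGCGCTG", coding, background) > 0)
print(log_odds_score("TATATATA", coding, background) < 0)
```

Real gene modelers train on thousands of genes and combine such a coding score with start-site rules like those listed on the slide.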

  6. Now what? BLAST the genes to assign functions based on similarity to known genes.

  7. BLAST (Basic Local Alignment Search Tool) finds regions of local similarity between sequences.
     • Query: >my favorite gene Atgtcgctagctagctsctagctag
     • Target: a database of many gene sequences (GenBank is one example)
     • Answers the questions: Is there a match? And how good is it?
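"Local similarity" can be made concrete with a small scoring function. The sketch below uses Smith-Waterman dynamic programming, which is the exact local alignment method; BLAST itself is a faster heuristic that approximates this kind of score, so this is an illustration of the idea and not of BLAST's internals. The scoring values (match 2, mismatch -1, gap -2) are arbitrary choices for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap cost)."""
    a, b = a.upper(), b.upper()
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared core ACGTACG is found even though the flanks differ.
print(smith_waterman_score("TTTACGTACGTTT", "GGGACGTACGGGG"))
```

The score answers "how good is it?" for one pair of sequences; BLAST additionally reports statistics such as the E-value to say whether a match that good would be expected by chance in a database of that size.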

  8. What are the genes doing?
     • Function is assigned based on the degree of similarity to an already characterized gene in the database
     • 2 potential problems with this approach
     Transitive catastrophe:
     • Gene A: function assigned based on mutant phenotype or biochemical characterization of the protein product
     • Gene B: from genome sequence, 70% identity to gene A
     • Gene C: from genome sequence, 60% identity to gene B
     • Gene D: from genome sequence, 70% identity to gene C
     • But gene D has only 20% identity to gene A!

  9. We would like to propagate function only to orthologous genes.
     • Homologs: genes sharing a common origin. Note: two genes are homologs or they are not; there is no such thing as % homology or “more homologous”.
     • Two main kinds of homologs:
     • Orthologs: genes originating from a single ancestral gene in the last common ancestor of the compared genomes
     • Paralogs: genes related via duplication

  10. X, Y, Z are genes in the same family; A, B, C are three species.

  11. Two more complicated cases:
     • Xenologs: genes originating from an HGT (horizontal gene transfer) of an ortholog in a distant lineage
     • Pseudoparalogs: homologous genes that appear to be paralogs in a single-genome analysis but have arisen through a combination of vertical and lateral descent

  12. How to identify orthologs. One way: reciprocal BLAST analysis.
     • Queries: >Genome A gene 1 AGTGCATGTCCC and >Genome A gene 2 TGTGCGTAGTCCAAA; Database: Genome B
     • AND
     • Queries: >Genome B gene 1 GGTTTTTACA and >Genome B gene 2 AAACCTCTCTGA; Database: Genome A
     • ASK: are two genes each other’s best BLAST hit?
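Once both searches have been run, the "each other's best hit" test is a few lines of code. The sketch below assumes the BLAST results have already been parsed into dictionaries of `(query, subject) -> score`; the gene names and scores are invented for the example, and `best_hits` and `reciprocal_best_hits` are hypothetical helper names.

```python
def best_hits(scores):
    """scores maps (query, subject) -> score; return {query: best-scoring subject}."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented scores: A1 and B1 pick each other; A2 also prefers B1, but B1 prefers A1.
a_vs_b = {("A1", "B1"): 200, ("A1", "B2"): 50, ("A2", "B1"): 120}
b_vs_a = {("B1", "A1"): 195, ("B1", "A2"): 118, ("B2", "A1"): 48}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here only `("A1", "B1")` is reported as a candidate ortholog pair. A2 has a best hit, but it is not reciprocated, so no function would be propagated to it on this evidence alone.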

  13. Reciprocal best-hit analysis can be confounded by lineage-specific gene loss.

  14. What if there is nothing at all similar in the database?
     • Call it a “hypothetical” gene
     • If it has a match, but only to another hypothetical gene? Call it “conserved hypothetical”
     [Figure: pie chart of genes by functional category: Conserved Hypothetical, Hypothetical, DNA Replication & Repair, Energy Metabolism, Nucleotide Metabolism, Lipid Metabolism, Transcription, Amino Acid Metabolism, Translation, Carbohydrate Metabolism, Transport, Cofactor Metabolism, Unassigned]
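The naming rule on this slide can be written as a small decision function. This is only a sketch of the rule as stated: it assumes each gene comes with a list of description strings from its significant database hits, and it treats any description containing the word "hypothetical" as uninformative. The function name and the example descriptions are invented.

```python
def annotation_label(hit_descriptions):
    """Pick a label for a gene from the descriptions of its database hits."""
    if not hit_descriptions:
        # Nothing similar in the database.
        return "hypothetical"
    informative = [d for d in hit_descriptions if "hypothetical" not in d.lower()]
    if informative:
        # At least one hit has a described function; use the first one listed.
        return informative[0]
    # Matches exist, but only to other hypothetical genes.
    return "conserved hypothetical"


print(annotation_label([]))
print(annotation_label(["hypothetical protein"]))
print(annotation_label(["hypothetical protein", "DNA polymerase III subunit"]))
```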
