
Codon usage and gene finding with AMIGene



Presentation Transcript


  1. Codon usage and gene finding with AMIGene. Stéphane CRUVEILLER, April 14th

  2. Introduction
     Gene prediction programs like GeneMark (Borodovsky and McIninch, 1993) or Glimmer (Salzberg et al., 1998; Delcher et al., 1999) → only one gene model
     Pros
     - Fast to build gene models (no preliminary analyses required)
     Cons
     - Miss genes that have special features (atypical codon usage, small genes, …)
     → The AMIGene strategy uses more than one model (generally 1 to 4 models)
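
To illustrate why several models can recover genes that a single model misses, here is a minimal sketch in Python. It is not AMIGene's actual scoring scheme, which this slide does not describe. It assumes each gene class is summarized by a codon frequency table and that an ORF is assigned to the class giving the highest mean log-odds score. The names `train_model`, `score` and `best_model`, the pseudocount, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]  # the 64 codons


def split_codons(seq):
    """Cut a sequence into in-frame codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def train_model(genes, pseudocount=1.0):
    """Codon frequency table for one gene class (pseudocount avoids zero probabilities)."""
    counts = Counter(c for g in genes for c in split_codons(g))
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def score(orf, model):
    """Mean log-odds per codon against a uniform codon background."""
    codons = [c for c in split_codons(orf) if c in model]
    if not codons:
        return float("-inf")
    return sum(math.log(model[c] * len(CODONS)) for c in codons) / len(codons)


def best_model(orf, models):
    """Name of the class model under which the ORF scores highest."""
    return max(models, key=lambda name: score(orf, models[name]))


# Toy training sets: one "typical" class and one "atypical" class (made-up sequences).
models = {
    "typical": train_model(["CTGCTGAAAGCG" * 5]),
    "atypical": train_model(["TTATTAAAAGCA" * 5]),
}
orf = "TTAAAAGCATTA"
print(best_model(orf, models), score(orf, models["typical"]), score(orf, models["atypical"]))
```

With only the "typical" table, the toy ORF scores poorly because its codons are rare in that class; adding a second table lets it be recognized under the class it resembles.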

  3. Introduction
     Part I: How gene models are built…
     Part II: AMIGene strategy with multiple gene models…

  4. Previous work on Escherichia coli...
     Médigue et al., 1991 (J. Mol. Biol.): codon usage analysis on 780 genes from E. coli → 3 gene classes:
     • Class I: Intermediary and DNA metabolism, regulators, ...
     • Class II: Cellular machinery (transcription, translation), ...
     • Class III: Atypical genes (e.g. horizontally transferred genes)
     Other examples:
     • Bacillus subtilis: 3 classes (Moszer et al., 1999)
     • Borrelia burgdorferi: 2 classes (McInerney, 1998)

  5. Method...
     Correspondence Analysis (COA) on RSCU (Relative Synonymous Codon Usage) values, then clustering.
     Principle
     - Statistical technique allowing the identification and classification of the major sources of variation within a dataset.
     - Generates a system of orthogonal axes and the distribution of the observations (the genes) along these axes.
     - Axes are ranked in decreasing order of importance (major source of variation on axis 1, second one on axis 2, ...).
     Advantages
     - NO modification of the information contained in the data.
     - Allows graphical representation of observations (genes) and variables (codons) in the same factorial space.
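
The principle above can be sketched with a singular value decomposition. This is a minimal Python/NumPy illustration of textbook correspondence analysis, not the software used for the slides. The function name `correspondence_analysis` and the toy table are assumptions. It expects a non-negative genes × codons table (counts or RSCU values) with no all-zero row or column.

```python
import numpy as np


def correspondence_analysis(table):
    """Return (row_coords, col_coords, inertia) for a non-negative genes x codons table.

    Rows and columns must each have a non-zero sum.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses (genes)
    c = P.sum(axis=0)                    # column masses (codons)
    expected = np.outer(r, c)            # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: genes and codons live in the same factorial space.
    row_coords = (U * sing) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]
    inertia = sing ** 2                  # variation carried by each axis, in decreasing order
    return row_coords, col_coords, inertia


# Toy table: 4 genes x 4 codons, with two visibly different usage profiles.
toy = [[30, 10, 5, 5],
       [28, 12, 6, 4],
       [5, 5, 30, 10],
       [6, 4, 27, 13]]
rows, cols, inertia = correspondence_analysis(toy)
print("share of variation per axis:", inertia / inertia.sum())
print("gene coordinates on axis 1:", rows[:, 0])
```

Genes with similar codon usage end up close together on the first axes, so a clustering step (the source does not say which one here) can then be run on these coordinates.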

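  6. Method...
     Variable used: $\mathrm{RSCU}_i = \mathrm{obs}_i / \mathrm{exp}_i$, where $\mathrm{exp}_i = \mathrm{naa}_i / \mathrm{nSyn}_i$
     For codon i:
     - obs_i: observed number of codon i
     - exp_i: expected number of codon i
     - nSyn_i: number of synonymous codons for the amino acid encoded by codon i
     - naa_i: number of occurrences of the amino acid encoded by codon i
     Advantage
     - Cancels a possible bias due to unusual amino acid frequencies.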
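
As a worked example of these quantities, the sketch below computes RSCU for one coding sequence using the usual definition: the observed count of a codon divided by the count expected if all synonymous codons of its amino acid were used equally (naa / nSyn). The function name `rscu` and the toy sequence are assumptions. The standard genetic code is used, and amino acids absent from the sequence are skipped because their RSCU is undefined.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group synonymous codons by amino acid.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def rscu(cds):
    """RSCU per codon for one in-frame coding sequence (stop codons ignored)."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    obs = Counter(c for c in codons if c in CODON_TABLE)
    values = {}
    for aa, family in SYNONYMS.items():
        if aa == "*":
            continue
        naa = sum(obs[c] for c in family)      # occurrences of the amino acid
        if naa == 0:
            continue                           # RSCU undefined for absent amino acids
        exp = naa / len(family)                # expected count under equal usage
        for c in family:
            values[c] = obs[c] / exp
    return values


# Toy sequence: 4 Leu codons (3 x CTG, 1 x CTA) and 2 Lys codons (AAA, AAG).
values = rscu("CTGCTGCTGCTAAAAAAG")
print(values["CTG"], values["CTA"], values["AAA"])
```

Because each value is normalized within its amino acid, a gene rich in leucine does not look different from a leucine-poor gene unless their choice among leucine codons differs.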

  7. GenoBool (http://www.genostar.org)

  8. The case of Escherichia coli K12...
     - Class I (2513): DNA metabolism...
     - Class II (1046): highly expressed genes (translation machinery, transcription apparatus)
     - Class III (686): atypical genes...

  9. The case of Escherichia coli K12... Plot labels: Arginine; T-ending codons; A-ending codons; G/C-ending codons

  10. The case of Mycobacterium tuberculosis H37Rv... Plot labels: G-ending codons; T-ending codons; C-ending codons; A-ending codons
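
The plot labels on these slides group codons by their third base. A minimal Python sketch of that grouping follows; the function name `third_base_fractions` and the toy sequence are assumptions, and this is not the code behind the plots.

```python
from collections import Counter


def third_base_fractions(cds):
    """Fraction of in-frame codons ending in A, C, G and T."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3           # ignore a trailing partial codon
    thirds = Counter(cds[i + 2] for i in range(0, usable, 3))
    total = sum(thirds[b] for b in "ACGT")     # skip ambiguous bases such as N
    if total == 0:
        return {b: 0.0 for b in "ACGT"}
    return {b: thirds[b] / total for b in "ACGT"}


# Toy sequence with codons CTG, CTG, AAA, GCC.
fractions = third_base_fractions("CTGCTGAAAGCC")
gc3 = fractions["G"] + fractions["C"]          # share of G/C-ending codons
print(fractions, gc3)
```

In a GC-rich genome such as M. tuberculosis, most genes would be expected to show a high share of G- and C-ending codons, which is why those codon groups are labeled separately on the plot.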

  11. The case of Mycobacterium tuberculosis H37Rv... Class I (1791); Class II (1551); Class III (397); Class IV (256)

  12. Preliminary results on Cenibacterium... Plot labels: GC codons; T codons; A codons

  13-18. Preliminary results on Cenibacterium... (figures only)
