Codon usage and gene finding with AMIGene. Stéphane CRUVEILLER, April 14th
Introduction
Gene prediction programs such as GeneMark (Borodovsky and McIninch, 1993) or Glimmer (Salzberg et al., 1998; Delcher et al., 1999) rely on a single gene model.
• Pros: gene models are fast to build (no preliminary analyses required).
• Cons: genes with special features (atypical codon usage, small genes, …) are missed.
The AMIGene strategy uses more than one gene model (generally 1 to 4).
Introduction
• Part I: How gene models are built up…
• Part II: AMIGene strategy with multiple gene models…
Previous work on Escherichia coli...
Médigue et al., 1991 (J. Mol. Biol.): codon usage analysis of 780 E. coli genes, yielding 3 gene classes:
• Class I: intermediary and DNA metabolism, regulators, ...
• Class II: cellular machinery (transcription, translation), ...
• Class III: atypical genes (e.g. horizontally transferred genes)
Other examples:
• Bacillus subtilis: 3 classes (Moszer et al., 1999)
• Borrelia burgdorferi: 2 classes (McInerney, 1998)
Method...
Correspondence Analysis (COA) on RSCU (Relative Synonymous Codon Usage) values, followed by clustering.
Principle
• A statistical technique that identifies and classifies the major sources of variation within a dataset.
• It generates a system of orthogonal axes and gives the distribution of the observations (the genes) along these axes.
• Axes are ranked in decreasing order of importance (the major source of variation on axis 1, the second on axis 2, ...).
Advantages
• NO modification of the information contained in the data.
• Allows graphical representation of observations (genes) and variables (codons) in the same factorial space.
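The principle above can be sketched in a few lines. This is a minimal illustration of textbook correspondence analysis via singular value decomposition, not the implementation used for these slides. It assumes NumPy is available and that the input is a non-negative genes × codons table with no all-zero row or column; the function name `correspondence_analysis` and the toy table are hypothetical.

```python
import numpy as np

def correspondence_analysis(table):
    """Textbook correspondence analysis of a non-negative 2-D table.

    Returns principal coordinates for rows (genes) and columns (codons),
    plus the inertia carried by each axis, in decreasing order.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    # SVD yields orthogonal axes; singular values come out sorted descending
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2

# Toy table (made-up numbers): 4 genes (rows) x 3 codons (columns)
example = [[10, 2, 3],
           [8, 3, 2],
           [1, 9, 7],
           [2, 8, 9]]
rows, cols, inertia = correspondence_analysis(example)
print(inertia / inertia.sum())   # share of total variation per axis
```

Because rows and columns are projected onto the same axes, genes and codons can be drawn on one plot, which is the "same factorial space" property listed under Advantages. Clustering would then be run on the leading columns of `rows`.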
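Method...
Variable used:
$$\mathrm{RSCU}_i = \frac{obs_i}{exp_i} \quad \text{where} \quad exp_i = \frac{naa_i}{nSyn_i}$$
For codon i:
- $obs_i$: observed number of occurrences of codon i
- $exp_i$: expected number of occurrences of codon i
- $nSyn_i$: number of synonymous codons for the amino acid encoded by codon i
- $naa_i$: number of occurrences of the amino acid encoded by codon i
Advantage
- Cancels a possible bias due to unusual amino acid frequencies.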
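As a worked illustration of the variable defined above, here is a minimal sketch that computes RSCU for a single in-frame coding sequence. It assumes the standard genetic code. Stop codons and amino acids with a single codon (Met, Trp) are skipped, which is a choice made for this sketch since their RSCU is always 1. The function name `rscu` and the test sequence are hypothetical.

```python
from collections import Counter

# Standard genetic code, built from the usual TCAG ordering of the bases
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def rscu(sequence):
    """Return {codon: RSCU} for one in-frame coding sequence."""
    seq = sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    obs = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid they encode
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue                          # stops, Met, Trp (sketch choice)
        naa = sum(obs[c] for c in family)     # occurrences of the amino acid
        if naa == 0:
            continue                          # amino acid absent: undefined
        exp = naa / len(family)               # exp_i = naa_i / nSyn_i
        for c in family:
            values[c] = obs[c] / exp          # RSCU_i = obs_i / exp_i
    return values

values = rscu("ATGCTGCTGCTGCTA")              # Met, then Leu x4 (3 CTG, 1 CTA)
print(round(values["CTG"], 2), round(values["CTA"], 2))
```

Within each amino acid the RSCU values sum to the number of synonymous codons, so a value above 1 marks a codon used more often than expected under uniform synonymous usage, whatever the amino acid composition of the gene.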
The case of Escherichia coli K12...
• Class I (2513): DNA metabolism...
• Class II (1046): highly expressed genes (translation machinery, transcription apparatus)
• Class III (686): atypical genes...
The case of Escherichia coli K12... [Figure; labels: Arginine, T-ending codons, A-ending codons, G/C-ending codons]
The case of Mycobacterium tuberculosis H37Rv... [Figure; labels: G-ending codons, T-ending codons, C-ending codons, A-ending codons]
The case of Mycobacterium tuberculosis H37Rv...
• Class I (1791)
• Class II (1551)
• Class III (397)
• Class IV (256)
Preliminary results on Cenibacterium... [Figure; labels: GC codons, T codons, A codons]