GENOME SIGNATURES OF MICROBIAL ORGANISMS IDENTIFIED BY AMINO ACID N-GRAM ANALYSIS
B. Suman Bharathi. Advisor: Judith Klein-Seetharaman, Forschungszentrum Juelich, Germany.


Presentation Transcript


  1. GENOME SIGNATURES OF MICROBIAL ORGANISMS IDENTIFIED BY AMINO ACID N-GRAM ANALYSIS • B. Suman Bharathi • Advisor: Judith Klein-Seetharaman, Forschungszentrum Juelich, Germany

  2. Genome Signatures • Peptide sequences that occur with unusually high frequency in a particular organism or pathogen but not in others • Potential applications: • Drug development: synthesize drugs that target the genome signature in a pathogen • Sensor development: use the genome signature to identify an organism quickly using an antibody

  3. Approach • Linguistic approach • N-gram analysis using toolkit • What the BLMT toolkit provides • N-gram statistical analysis • Definition of signature sequences • Use of toolkit on Neisseria meningitidis • n-gram = sequence of length n
     [Figure: Neisseria meningitidis versus other species, n=4. y-axis: Occurrence of n-gram (%), 0 to 0.09. x-axis n-grams: SDGI, LAAL, AALL, LLAA, ALLA, AAAL, LAAA, ALAA, AALA, AVLA, AAAA, AVAA, AAAV, EAAA, AEAA, AAEA, AAVA, AAAE, GRLK, MPSE.]
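
The n-gram counting behind the figure on this slide can be sketched in a few lines. This is a minimal illustration in plain Python, not the BLMT toolkit's interface; the function name `ngram_frequencies` and the toy sequences are hypothetical, and the percentages are taken over all n-gram positions, which is an assumption about how "occurrence (%)" was computed.

```python
from collections import Counter

def ngram_frequencies(proteins, n=4):
    """Return {n-gram: percentage of all n-gram positions} for a list of protein strings."""
    counts = Counter()
    for seq in proteins:
        # Slide a window of width n along each protein; windows never span two proteins.
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: 100.0 * c / total for gram, c in counts.items()}

# Toy input for illustration only, not real Neisseria meningitidis data.
toy_proteome = ["MAALLAAL", "AALLA"]
freqs = ngram_frequencies(toy_proteome, n=4)
```

Running the same function over each organism's protein set would give one frequency table per organism, which is the kind of input the later comparison slides rely on.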

  4. Use of BLMT • N-gram statistical analysis gives detailed statistics: the frequency of each n-gram together with its mean and standard deviation. • We have taken 45 organisms into consideration: bacteria, archaea, mycoplasmas and human. • Search for n-grams whose frequencies lie many standard deviations away from the mean. • This indicates the difference between the expected and observed frequencies of the n-grams. • Ultimately this shows how unusual an n-gram is in one organism compared with the others.
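
One way to read "standard deviations away from the mean" is as a z-score: for each n-gram, take its frequency in every organism, compute the mean and standard deviation of those values, and express one organism's frequency in units of that standard deviation. The sketch below assumes this reading; the slides do not spell out the exact statistic BLMT reports, and the function name `zscores` and the toy numbers are hypothetical.

```python
from statistics import mean, pstdev

def zscores(freq_by_organism, target):
    """For each n-gram, the number of standard deviations by which the target
    organism's frequency differs from the mean frequency over all organisms."""
    grams = set()
    for freqs in freq_by_organism.values():
        grams.update(freqs)
    result = {}
    for gram in grams:
        # An n-gram missing from an organism counts as frequency 0 there.
        values = [freqs.get(gram, 0.0) for freqs in freq_by_organism.values()]
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation (an assumption)
        if sigma == 0:
            continue  # same frequency everywhere, so no signal
        result[gram] = (freq_by_organism[target].get(gram, 0.0) - mu) / sigma
    return result

# Toy frequencies (%) for three hypothetical organisms.
toy = {
    "org_A": {"AAAA": 0.9, "LLLL": 0.1},
    "org_B": {"AAAA": 0.1, "LLLL": 0.1},
    "org_C": {"AAAA": 0.2, "LLLL": 0.1},
}
z_for_A = zscores(toy, "org_A")
```

With only a handful of organisms the z-score cannot grow very large; extreme values such as those discussed later in the talk presuppose a comparison across many organisms or a different reference distribution.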

  5. Difference Between Expected and Observed Frequencies • Xylella (black) • Vibrio (red) • Ureaplasma (green) • Treponema (blue) • Thermotoga (yellow) • [Figure: difference between expected and observed frequencies, plotted per n-gram] • Positive values indicate over-represented n-grams, while negative values indicate under-represented n-grams.
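
The slides do not say how the expected frequency of an n-gram is obtained. A common baseline, assumed here purely for illustration, treats residues as independent, so the expected probability of an n-gram is the product of its single amino acid frequencies. The function name `observed_minus_expected` and the toy sequence are hypothetical.

```python
from collections import Counter
from math import prod

def observed_minus_expected(proteins, n=4):
    """Observed n-gram frequency minus the frequency expected under an
    independence model (product of single-residue frequencies)."""
    residues = Counter()
    grams = Counter()
    for seq in proteins:
        residues.update(seq)
        for i in range(len(seq) - n + 1):
            grams[seq[i:i + n]] += 1
    total_residues = sum(residues.values())
    total_grams = sum(grams.values())
    diff = {}
    # Only n-grams that actually occur are reported; n-grams that never occur
    # would all have negative differences and are omitted in this sketch.
    for gram, count in grams.items():
        observed = count / total_grams
        expected = prod(residues[ch] / total_residues for ch in gram)
        diff[gram] = observed - expected
    return diff

# Toy sequence for illustration only.
diff = observed_minus_expected(["AAAAL"], n=4)
```

Under this model a positive difference marks an over-represented n-gram and a negative one an under-represented n-gram, matching the sign convention stated on the slide.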

  6. Initial points of the difference between expected and observed frequency graph • Xylella (black) • Vibrio (red) • Ureaplasma (green) • Treponema (blue) • Thermotoga (yellow) • Ureaplasma shows high difference values (approx. 0.00021), indicating over-representation of n-grams compared with their expected probability of occurrence in the organism.

  7. Standard deviation away from the mean • Mycoplasma genitalium (black) • M. tuberculosis (red) • M. leprae (green) • Mesorhizobium (blue) • Lactococcus (yellow) • The distribution of n-gram standard deviations shows both high and low values, indicating over-represented and under-represented n-grams.

  8. Highest standard deviations away from the mean • Mycoplasma genitalium (black) • M. tuberculosis (red) • M. leprae (green) • Mesorhizobium (blue) • Lactococcus (yellow) • Shows the initial (highest) values of standard deviation away from the mean. • The values for n-grams of M. tuberculosis are much higher than those for M. leprae.

  9. Comparison of genome size with varying standard deviations • Examine the relationship between genome size and distribution of n-gram standard deviations for each organism • Human genome taken as reference. • Compare genome size and standard deviations within same genus but across different species.

  10. Size Distribution of Genomes
      1. Human 22889476
      2. Bacteria_Mesorhizobium_loti 4080256
      3. Bacteria_Pseudomonas_aeruginosaPA01 3730192
      4. Bacteria_Escherichia_coliO157H7 3229098
      5. Bacteria_Escherichia_coliO157H7EDL933 3228100
      6. Bacteria_Escherichia_coliK12 2726558
      7. Bacteria_Mycobacterium_tuberculosisH37Rv 2666338
      8. Bacteria_Bacillus_subtilis 2442200
      9. Bacteria_Bacillus_halodurans_C125 2384352
      10. Bacteria_SynechocystisPCC6803 2072748
      11. Bacteria_Vibrio_cholerae_chr1 1725852
      12. Bacteria_Deinococcus_radioduransR1_chr1 1559376
      13. Bacteria_Xylella_fastidiosa 1490262
      14. Archaea_Archaeoglobus_fulgidus 1343990
      15. Bacteria_Pasteurella_multocida 1340102
      16. Bacteria_Lactococcus_lactis_subsp_lactis 1335222
      17. Archaea_Aeropyrum_pernix 1280062
      18. B_Neisseria_meningitidis_serogroupBstrainMC58 1178096
      19. Archaea_Halobacterium_spNRC1 1178038
      20. B_Neisseria_meningitidis_serogroupAstrainZ2491 1176104
      21. Bacteria_Thermotoga_maritima 1167344
      22. Bacteria_Pyrococcus_horikoshiiOT3 1141216
      23. Bacteria_Mycobacterium_leprae_strainTN 1080756
      24. A_Methanobacterium_thermoautotrophicum_deltaH 1054752
      25. Bacteria_Haemophilus_influenzaeRd 1045572
      26. Bacteria_Campylobacter_jejuni 1020944
      27. Bacteria_Helicobacter_pylori_strainJ99 990942
      28. Bacteria_Helicobacter_pylori26695 986258
      29. Archaea_Methanococcus_jannaschii 970558
      30. Bacteria_Aquifex_aeolicus 968068
      31. Archaea_Thermoplasma_acidophilum 909164
      32. Archaea_Thermoplasma_volcanium 903228
      33. Bacteria_Chlamydophila_pneumoniaeJ138 735350
      34. Bacteria_Chlamydophila_pneumoniaeCWL029 725492
      35. Bacteria_Chlamydophila_pneumoniaeAR39 729896
      36. Bacteria_Treponema_pallidum 703414
      37. Bacteria_Chlamydia_muridarum 646712
      38. Bacteria_Chlamydia_trachomatis 626142
      39. Bacteria_Rickettsia_prowazekii_strain_MadridE 559828
      40. Bacteria_Mycoplasma_pneumoniae 480870
      41. Bacteria_Ureaplasma_urealyticum 457608
      42. Bacteria_Buchnera_sp_APS 371470
      43. Mycoplasma genitalium 352826
      44. Bacteria_Borrelia_burgdorferi 300106

  11. Genome size graph and varying standard deviation values • Human (black, 22889476) • Mesorhizobium (red, 4080256) • P. aeruginosa (green, 3730192) • E_coliO157H7 (blue, 3229098) • E_coliO157H7EDL933 (yellow, 3228100) • The organisms are listed in descending order of genome size. • The relation between the distribution of n-gram standard deviations and genome size is compared.

  12. Tail end of genome size and n-gram distribution of standard deviations • Human (black, 22889476) • Mesorhizobium (red, 4080256) • P. aeruginosa (green, 3730192) • E_coliO157H7 (blue, 3229098) • E_coliO157H7EDL933 (yellow, 3228100) • The human genome, though the largest, has n-grams fewer standard deviations away from the mean than the smaller genomes do.

  13. Initial points: genome size and n-gram distribution of standard deviations • Human (black, 22889476) • Mesorhizobium (red, 4080256) • P. aeruginosa (green, 3730192) • E_coliO157H7 (blue, 3229098) • E_coliO157H7EDL933 (yellow, 3228100) • Human n-gram standard deviation values are almost equal to those of Mesorhizobium, although Mesorhizobium has a much smaller genome.

  14. Genome size and n-gram distribution of standard deviations • Human (black, 22889476) • E_coliK12 (red, 2726558) • M. tuberculosis (green, 2666338) • B. subtilis (blue, 2442200) • B. halodurans (yellow, 2384352) • Synechocystis (brown, 2072748) • M. tuberculosis has very high n-gram standard deviation values. They exceed those of human, despite its smaller genome size.

  15. Initial points of genome size and n-gram distribution of standard deviations • Human (black, 22889476) • E_coliK12 (red, 2726558) • M. tuberculosis (green, 2666338) • B. subtilis (blue, 2442200) • B. halodurans (yellow, 2384352) • Synechocystis (brown, 2072748) • The thickness of lines indicates the genome size. • The thinnest line represents E_coliK12. • Mycobacterium tuberculosis shows the highest values.

  16. Final points of genome size and n-gram distribution of standard deviations • Human (black, 22889476) • E_coliK12 (red, 2726558) • M. tuberculosis (green, 2666338) • B. subtilis (blue, 2442200) • B. halodurans (yellow, 2384352) • Synechocystis (brown, 2072748) • M. tuberculosis and all other organisms here have n-grams with higher difference values than human.

  17. Same genus / different species • 4-grams in M. tuberculosis lie many more standard deviations from the mean than 4-grams in M. leprae

  18. [Figure: Mycobacterium, M. tuberculosis versus M. leprae]

  19. [Figure panels: Neisseria meningitidis, Thermotoga maritima, Synechocystis spec., Haemophilus influenzae, Human, Other Organisms]

  20. Conclusions • N-grams that are at least 30 standard deviations away from the mean are significant candidates for genome signatures. • Difference graphs: estimate the likelihood of observing an n-gram in an organism. • Genome size graphs: there is no specific relationship between the size of a genome and its standard deviation values. • Same genus, different species, with genome size specified: there is a noticeable difference between the Mycobacterium species (M. leprae and M. tuberculosis).
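
The 30-standard-deviation criterion in the first conclusion amounts to a simple filter over per-n-gram scores. The sketch below assumes the scores are already available as a dictionary; the function name `signature_candidates` and the toy values are hypothetical and are not results from the study.

```python
def signature_candidates(z_by_ngram, threshold=30.0):
    """Return (n-gram, score) pairs at least `threshold` standard deviations
    from the mean in either direction, most extreme first."""
    hits = [(gram, z) for gram, z in z_by_ngram.items() if abs(z) >= threshold]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)

# Invented scores for illustration only.
toy_z = {"AAAA": 42.5, "LAAL": 3.1, "GRLK": -31.0, "MPSE": 29.9}
candidates = signature_candidates(toy_z)
```

Since a genome signature is defined earlier as an unusually frequent peptide, a stricter variant would keep only positive scores (`z >= threshold`) and drop under-represented n-grams.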

  21. Current and future work • Find n-gram signatures in E. coli. • Explore the relationship between genome size and the distribution of n-gram standard deviations across different species of the same genus. • Find more specific targets to differentiate species in terms of signature peptides for all 44 organisms taken for study.