1 / 34

ComPath Comparative Metabolic Pathway Analyzer

ComPath Comparative Metabolic Pathway Analyzer. Kwangmin Choi and Sun Kim School of Informatics Indiana University. Contents. Introduction System Components Current Implementation Experiment Result Future Plan. INTRODUCTION & SYSTEM COMPONENTS. Introduction.

Download Presentation

ComPath Comparative Metabolic Pathway Analyzer

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University

  2. Contents • Introduction • System Components • Current Implementation • Experiment Result • Future Plan

  3. INTRODUCTION&SYSTEM COMPONENTS

  4. Introduction • ComPath is a web-based sequence analysis system built upon: • KEGG (Kyoto Encyclopedia of Genes and Genomes) • PLATCOM (A Platform for Computational Comparative Genomics)

  5. Four Databases PATHWAY  32,657 pathways generated from 262 reference pathways GENES 1,213,035 genes in 32 eukaryotes + 260 bacteria + 24 archaea LIGAND 13,387 compounds, 2,543 drugs, 11,161 glycans, 6,446 reactions BRITE 7,817 KO (KEGG Orthology) groups KEGG adopts EC enzyme classification system KEGGKyoto Encyclopedia of Genes and Genomes

  6. EC system 0/2 • An Old, but still universally accepted system by biochemists • EC system was developed long before protein sequence or structure information were available, so the system focuses on reaction, not sequence homology and structure • Many biochemists and structural biologists try to harmonize newly available chemical, sequential, and structural data with traditional understanding of enzyme function.

  7. Problems in EC system 1/2 • Inconsistency in the EC hierarchy • For each of the six top-level EC classes, subclasses and sub-subclasses may have different meanings. • e.g. EC1.* are divided by substrate type, but EC5.* by general isomerase type • Problem with Multi-functional enzymes and multiple subunits involved in a function • EC presumes only a 1:1:1 relationship between gene, protein, and reaction. • Different sequence/structure, but similar EC • Two enzymes with lower sequence identities sometimes belong to the same or very similar EC. • e.g. o-succinylbenzoate synthase across several bacteria have below the 40% sequence identity

  8. Problems in EC system 2/2 • Similar sequence/structure, but different EC • Even variation in the fourth digit of the EC number is rare above a sequence identity threshold of 40%. • However, exceptions to this rule are prevalent. • e.g. melamine deaminase and atrazine chlorohydrolase have 98% identical, but belong to different EC. • No information on sequence/structure-mechanism relationship • EC system considers only overall transformation • Similarity among sequences is strongly correlated with similarities in the level of a common (structural domain-related) partial reaction, rather than overall transformation • How to combine enzyme structure data with partial reaction data? • Research Goal • We provide a computational environment for enzyme analysis via genome comparison • And it will be built on PLATCOM system

  9. Our Research Goal • We provide a computational environment for enzyme analysis via multiple genome comparison • And it will be built on PLATCOM system

  10. Providing a platform for comparative genomics ON THE WEB Comparative analysis system for users to freely select any sets of genomes Scalable system interactively combining high-performance sequence analysis tools PLATCOM http://platcom.org/platcomA Platform for Comparative Genomics

  11. CURRENT IMPLEMENTATION

  12. ComPath http://platcom.org/compathCompatative Metabolic Pathway Analyzer • ComPath = KEGG + PLATCOM • Not just for retrieving information from Database, • but focuses on analyzing enzymes using the enzyme-genome table • Easy to use • {Optional} Upload a user sequence and/or a saved enzyme-genome table data • Select a metabolic pathway • Select any combination of genomes in KEGG • Create an enzyme-genome table • Then use the table for various enzyme sequence analysis tasks

  13. 11 categories 123 pathways Users can upload the previous Enzyme-Genome table datatype to continue analysis Screenshot: Pathway Selection

  14. Screenshot: Genome Selection • 250 genomes from KEGG database • Users can select genomes by taxonomical and alphabetical order

  15. Screenshot: Operations 1/2

  16. Screenshot: Operations 2/2

  17. Enzyme-Genome Table • An enzyme-genome table allows for tests on whether a certain enzyme in a given pathway is present or missing using sequence analysis techniques. • Information in this table can be easily saved, uploaded, transferred. • Users also can upload their sequence set, e.g.,  an entire set of predicted proteins in a newly sequenced genome, and perform annotation of the sequences in terms of KEGG pathways.

  18. Screenshot: ComPath’ Enzyme-Genome Table – INTERACTIVE!

  19. Screenshot: KEGG’s Ortholog Table – STATIC!

  20. How ComPath works:Overall Design

  21. How ComPath works:

  22. Sequence Analyses • Missing enzyme search • Pairwise (FASTA) and multiple sequence alignment (CLUSTALW), • Domain search using SCOPEC/SUPERFAMILY and PDB domains • Domain-based analysis using hidden markov models (HMM), • Contextual sequence analysis • Sequence analysis for further investigation • Phylogenetic analysis of enzymes in selected genomes, • Gibbs motif sampler. • BAG clustering • Contextual sequence analysis • Global network analysis • Physical network analysis

  23. TEST

  24. Experiments:Genomes, Queries, Pathways • Selected Genomes • B.subtilus, B.Halodurans, E.coli • H.Influenza, H.pylori, M.genitalium, Y.pestis KIM • Query genomes • M.tuberculosis • A.aeolicus • B.anthracis • Metabolic Pathways • 00010 (glycolysis+glycogenesis), • 00020 (TCA cycle)

  25. Experiments:Comparison of Sequence Analysis Methods • Four methods (abbr.) • HMMer • HMM search using the whole sequence • CSR • HMM search using common shared regions generated by BAG program • SCOPEC • Domain search using SCOP/SUPERFMAILY and PDB database • FASTA • Simple FASTA search • Cutoff • 1e-10, 1e-20, 1e-30

  26. Experiments:Overall Design

  27. Screenshot: ComPath’ Enzyme-Genome Table – INTERACTIVE!

  28. Experiment Results (e.g.)

  29. A. aeolicus

  30. B. anthracis

  31. M. tuberculosis

  32. FUTURE PLAN

  33. Future Plan:More Resources • ComPath is being extended to incorporate more resources, including • KEGG LIGAND : A composite database consisting of compound, glycan, reaction etc. • ProRule : A new database containing functional and structural information on PROSITE profiles • SFLD : Structure-Function Linkage Database • Also we are developing databases and algorithms for enzyme analysis, e.g. Classifiers using a database of enzyme-specific HMMs. • ComPath is in an early stage of system development and we solicit feedback and suggestions from biology and bioinformatics communities.

  34. Future Plan:More Algorithms and Tools • More integrative understanding on biochemical network evolution • Algorithms to handle isozyme problem • Algorithms to computationally reconstruct alternative pathways • Algorithms to combine sequence, structure, chemical reaction, and contextual information for better enzyme annotation • Etc. • ComPath is in an early stage of system development and we solicit feedback and suggestions from biology and bioinformatics communities.

More Related