Identification of Novel Protein Domains in Plasmodium and Leishmania Species. Terrapon N., Ghouila A., Gascuel O., Maréchal E., Laouini D., Bréhélin L. ISCB’09 Bamako, Mali.
Identification of Novel Protein Domains in Plasmodium and Leishmania Species Terrapon N., Ghouila A., Gascuel O., Maréchal E., Laouini D., Bréhélin L. ISCB’09 Bamako, Mali
Outline • Background • Protein domains • Plasmodium & Leishmania species • Detection by Co-Occurrence • Website • Experiments
Protein domains • Domains are structural and functional subunits of proteins • Predicting the domain composition of a protein helps to predict its function • Domain family databases • Prosite, Pfam, Superfamily, SMART, etc. • InterPro domain metadatabase: gathers information from 10 different domain databases
Pfam [Finn 08] • Hidden Markov Models (HMMs): a powerful tool for protein domain identification • One domain, one HMM: 10 340 models (v23.0) • The score reflects the similarity of a sequence to the model • Pfam provides score thresholds above which a domain is asserted to be present • Problem: in divergent sequences, some domains may be missed
Divergent organisms • Plasmodium falciparum • Agent of malaria; sequenced [Gardner02] • ~500 million clinical cases and ~2 million deaths each year • Leishmania major • Agent of leishmaniasis; sequenced [Ivens05] • ~2 million clinical cases (visceral and cutaneous) and ~50 thousand deaths each year • Pfam domains in these organisms • Very low variety of domain types • 50% of proteins do not have any domain (Yeast: 24%)
Outline • Background • Detection by Co-Occurrence • Principle • Illustration • Website • Experiments
Detection by Co-Occurrence • Principle • Relax Pfam thresholds: more detections but numerous false positives • Filter the candidates using domain co-occurrence • Domain co-occurrence • Each domain tends to appear with only a few favorite partner domains • In Uniprot proteins: 20 000 domain pairs observed out of ~12.5 million possible pairs (1.6‰)
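The proportion quoted on this slide can be checked with a line of arithmetic. The sketch below simply reuses the two rounded figures from the slide; the function name `pair_density_per_mille` is hypothetical and only serves the illustration.

```python
def pair_density_per_mille(observed_pairs: int, possible_pairs: int) -> float:
    """Share of possible domain pairs that are actually observed, in per mille."""
    return 1000 * observed_pairs / possible_pairs

# Rounded figures taken from the slide (Uniprot proteins).
density = pair_density_per_mille(20_000, 12_500_000)
print(f"{density:.1f} per mille")
```

Such a low density is what makes co-occurrence usable as a filter: a randomly chosen pair of domains is very unlikely to be one of the observed pairs.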
Detection by Co-Occurrence • Conditionally dependent pairs (CDP): statistically relevant co-occurrences in Uniprot, e.g. (A, C), (A, D), (C, D) • Given a protein sequence • Identify the known domains (here A and C) • Relax Pfam thresholds to obtain potential domains (here B and D) • Check all (known, potential) pairs against the CDP list
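Detection by Co-Occurrence • Conditionally dependent pairs (CDP): statistically relevant co-occurrences in Uniprot, e.g. (A, C), (A, D), (C, D) • Given a protein sequence with known domains A and C and potential domains B and D • D is certified: it forms a CDP with a known domain, (A, D) and (C, D) • B is not: no CDP links it to A or C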
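The filtering step illustrated on these slides can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `certify` and the representation of domains as plain strings are assumptions made for illustration, and CDPs are treated as unordered pairs.

```python
from typing import Iterable, Set, Tuple

def certify(known: Set[str],
            potential: Set[str],
            cdp_pairs: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the potential domains that form a CDP with at least one known domain."""
    # Store pairs as frozensets so that (A, D) and (D, A) are the same pair.
    cdp = {frozenset(pair) for pair in cdp_pairs}
    return {
        p for p in potential
        if any(frozenset((k, p)) in cdp for k in known)
    }

# The example from the slides: CDP list, known domains A and C, potential B and D.
cdp_list = [("A", "C"), ("A", "D"), ("C", "D")]
certified = certify(known={"A", "C"}, potential={"B", "D"}, cdp_pairs=cdp_list)
print(sorted(certified))  # ['D']
```

Here D is kept because (A, D) and (C, D) are both in the list, while B is discarded because it appears in no pair.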
Detection by Co-Occurrence • Different types of certification • By known InterPro domains: more reliable • By other potential Pfam domains: makes it possible to find domains in proteins where no domain is known yet • Control of the error rate • Shuffling procedure • False Discovery Rate (FDR) estimation
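The slide only names a shuffling procedure and an FDR estimate, without giving details. One possible reading, sketched below under stated assumptions, is to permute the potential domains across proteins, rerun the certification on the shuffled data, and divide the average number of certifications obtained by chance by the number obtained on the real data. The names `count_certified` and `estimate_fdr`, the choice of what is shuffled, and the number of shuffles are all hypothetical.

```python
import random
from typing import Iterable, List, Set, Tuple

Protein = Tuple[Set[str], List[str]]  # (known domains, potential domains)

def count_certified(proteins: List[Protein], cdp: Set[frozenset]) -> int:
    """Count potential domains that form a CDP with a known domain of the same protein."""
    return sum(
        1
        for known, potential in proteins
        for p in potential
        if any(frozenset((k, p)) in cdp for k in known)
    )

def estimate_fdr(proteins: List[Protein],
                 cdp_pairs: Iterable[Tuple[str, str]],
                 n_shuffles: int = 100,
                 seed: int = 0) -> float:
    """Assumed estimator: mean certifications on shuffled data / certifications on real data."""
    cdp = {frozenset(pair) for pair in cdp_pairs}
    observed = count_certified(proteins, cdp)
    if observed == 0:
        return 0.0
    rng = random.Random(seed)
    # Pool all potential domains, then redistribute them at random,
    # keeping the number of potential domains per protein unchanged.
    pool = [p for _, potential in proteins for p in potential]
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(pool)
        it = iter(pool)
        shuffled = [(known, [next(it) for _ in potential])
                    for known, potential in proteins]
        total += count_certified(shuffled, cdp)
    return min(1.0, (total / n_shuffles) / observed)
```

With such an estimator, the relaxed score threshold could be tuned until the estimated FDR falls below a chosen level.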
Outline • Background • Detection by Co-Occurrence • Website • Plasmodium species (falciparum, vivax, yoelii): http://www.lirmm.fr/~terrapon/codd/ • Leishmania species (major, infantum, braziliensis): http://www.lirmm.fr/~terrapon/leishmania/ • Experiments 30/11/2009 ISCB’09, Bamako, Mali
Outline • Background • Detection by Co-Occurrence • Website • Experiments • Statistics • Biological analysis
Certified Domains (FDR < 10%) • High congruency with orthologous proteins in the closest species • P. vivax: 78%, P. yoelii: 64% • L. infantum: 92%, L. braziliensis: 85%
Over-represented GO terms • Known domains • L. major: DNA binding, Hydrolase activity, Transferase activity • P. falciparum: DNA binding, Translation initiation factor activity • Predicted domains • L. major: ATP-dependent 3'-5' DNA helicase activity, Chromatin binding, DNA replication, Intracellular transport, RNA binding, Transcription factor activity • P. falciparum: DNA binding, DNA repair, Intracellular transport, RNA processing, Response to DNA damage stimulus
Domains of major interest • Plasmodium falciparum • Vitamin synthesis (cobalamin and folate) • Drought-resistance-related domain (plant kingdom specific) • Leishmania major • Bacteria-specific domains (bacterial transcription regulation and receptor domains) • Domains related to cell cycle regulation and invasion mechanisms
Categories of predicted domains • In Leishmania major: [figure not reproduced]
Conclusion • Method to improve the sensitivity of Pfam domain detection • New functional annotations • Interesting results on divergent proteomes • Predictions for Plasmodium species: http://www.lirmm.fr/~terrapon/codd/ • Predictions for Leishmania species: http://www.lirmm.fr/~terrapon/leishmania/
Future Works • New hypotheses for the understanding of • Transcription and regulation mechanisms • Parasite invasion strategies • Application to other organisms • Arabidopsis thaliana, Saccharomyces cerevisiae: done • All sequenced organisms: in progress… • Integrate all results into a proper database • Improvement: combine results from closely related species
Selecting CDPs • For each domain pair (A, B), compute the probability of observing at least as many proteins containing both A and B under the null hypothesis of independence.
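The slide does not say which distribution is used for this probability. A common choice for such a co-occurrence test is the hypergeometric tail, and the sketch below assumes it: among `n_total` proteins, `n_a` contain A and `n_b` contain B, and we ask how likely it is to see at least `n_ab` proteins containing both if the two domains were placed independently. The function name and parameter names are hypothetical.

```python
from math import comb

def cooccurrence_pvalue(n_total: int, n_a: int, n_b: int, n_ab: int) -> float:
    """P(X >= n_ab) for X ~ Hypergeometric(population n_total, n_a successes, n_b draws).

    Assumption: independence of A and B is modelled by drawing the n_b proteins
    containing B at random among all n_total proteins.
    """
    upper = min(n_a, n_b)  # the overlap cannot exceed either count
    denom = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_ab, upper + 1)
    )
    return tail / denom

# Toy example: 10 proteins, A in 3 of them, B in 3 of them, always together.
print(cooccurrence_pvalue(10, 3, 3, 3))
```

A pair would then be kept as a CDP when this probability falls below a chosen significance level, ideally after correcting for the large number of pairs tested.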
Most Frequently Certified Domain in Leishmania and Plasmodium species: TPR2 • Known in 30 proteins • Discovered in 37 others • Mediates protein-protein interactions (PPI) • Involved in cell cycle regulation, transcriptional control, mitochondrial and peroxisomal protein transport, neurogenesis and protein folding