This paper introduces MULTICOM, a structure prediction pipeline that combines various methods to predict protein structure. The pipeline includes template identification, multi-template combination, model generation, model evaluation, and multi-model combination. The pipeline incorporates tools such as PSI-BLAST, HHSearch, FOLDpro, SPEM, Modeller, and Rosetta. It aims to improve template-free modeling and achieve better accuracy in protein structure prediction.
MULTICOM – A Combination Pipeline for Protein Structure Prediction
Jianlin Cheng
Computer Science Department & Informatics Institute
University of Missouri, Columbia, MO, USA
MULTICOM Structure Prediction Pipeline
Server Predictor input: Query Sequence
Human Predictor input: All CASP8 Server Models
1. Template Identification
2. Multi-Template Combination
3. Model Generation
4. Model Evaluation
5. Multi-Model Combination
Output
MULTICOM Structure Prediction Pipeline
Step 1. Template Identification
Query-template alignments:
• PSI-BLAST
• HHSearch
• COMPASS
• FOLDpro + SPEM
Find a set of good templates / fragments; generate alternative query-template alignments.
MULTICOM Structure Prediction Pipeline
Step 2. Multi-Template Combination
Combination:
1. Combine the top-ranked query-template alignment (QTA) with other significant QTAs
2. Take fragments from less significant QTAs (template-free)
Don't try to find the best template; instead, combine multiple good templates / fragments.
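The two combination rules on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the MULTICOM implementation: the `Alignment` record, the use of an E-value as the significance measure, and the `significant` cutoff of 1e-3 are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one query-template alignment (QTA)."""
    template: str
    start: int     # first query residue covered (0-based, inclusive)
    end: int       # end of the covered query range (exclusive)
    evalue: float  # assumed significance measure; lower is better


def combine_templates(alignments, query_length, significant=1e-3):
    """Return (core, fragments).

    core: the top-ranked QTA plus every other QTA at or below the
          assumed significance cutoff.
    fragments: (template, residue indices) pairs taken from the less
          significant QTAs, restricted to query residues that no
          earlier choice has covered.
    """
    ranked = sorted(alignments, key=lambda a: a.evalue)
    if not ranked:
        return [], []
    core = ranked[:1] + [a for a in ranked[1:] if a.evalue <= significant]
    rest = [a for a in ranked[1:] if a.evalue > significant]

    covered = [False] * query_length
    for a in core:
        for i in range(a.start, a.end):
            covered[i] = True

    fragments = []
    for a in rest:
        new = [i for i in range(a.start, a.end) if not covered[i]]
        if new:
            fragments.append((a.template, new))
            for i in new:
                covered[i] = True
    return core, fragments


if __name__ == "__main__":
    qtas = [
        Alignment("tmplA", 0, 60, 1e-20),
        Alignment("tmplB", 50, 80, 1e-5),
        Alignment("tmplC", 70, 100, 0.5),
    ]
    core, fragments = combine_templates(qtas, query_length=100)
    print([a.template for a in core])
    print([(name, idx[0], idx[-1]) for name, idx in fragments])
```

In this sketch the weak alignment contributes only the residues the significant templates leave uncovered, which is one plausible reading of "take fragments from less significant QTAs".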
MULTICOM Structure Prediction Pipeline
Step 3. Model Generation
Integrative Model Generation:
• Modeller
• Rosetta for template-free small domains
Domain-level combination of template-based and template-free approaches.
MULTICOM Structure Prediction Pipeline
Step 4. Model Evaluation
Model ranking by ModelEvaluator.
ModelEvaluator
Input features: comparison of the 3D model with ab initio, sequence-based structural feature predictions:
• Secondary structure, e.g. EEEECCEEEHHHHHHHHHHHHEEEECCEEEHHHH
• Relative solvent accessibility, e.g. eeee-----eeeee----------eeeee------eeeee---eeeeeeee
• Contact map
• Beta-sheet pairing
Output: predicted GDT-TS score
Good models are ranked at the top. Very effective for template-free models.
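As a rough illustration of the comparison step, the sketch below turns two of the string features on this slide (secondary structure and relative solvent accessibility) into per-residue agreement fractions. The function names are hypothetical, and the slide does not say how ModelEvaluator maps such features to a predicted GDT-TS score, so that step is left out.

```python
def agreement(predicted: str, observed: str) -> float:
    """Fraction of positions where the sequence-based prediction and the
    annotation derived from the 3D model carry the same symbol."""
    if len(predicted) != len(observed):
        raise ValueError("strings must describe the same residues")
    if not predicted:
        return 0.0
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


def comparison_features(pred_ss, model_ss, pred_sa, model_sa):
    """Two illustrative input features: secondary-structure agreement
    (E/H/C alphabet) and solvent-accessibility agreement (e/- alphabet)."""
    return [agreement(pred_ss, model_ss), agreement(pred_sa, model_sa)]


if __name__ == "__main__":
    print(comparison_features("EEEECCHHHH", "EEEECCHHHC",
                              "eeee--eeee", "eeee--ee--"))
```

A model whose own secondary structure and burial pattern agree with what the sequence alone predicts would score high on both features, which is the intuition behind ranking models this way.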
MULTICOM Structure Prediction Pipeline
Step 5. Multi-Model Combination
Global-Local Model Combination:
1. Start from a top-ranked model
2. Combine it with other models having global similarity (80%, 4Å)
3. Combine it with the longest similar model fragments
Modeller iterative modeling; average model.
Don't try to find the best model. Instead, combine multiple good models / fragments (2-3% improvement).
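The global part of the combination might look like the sketch below. It rests on several assumptions the slide does not confirm: that "(80%, 4Å)" means at least 80% of equivalent residues lie within 4 Å of the top-ranked model, that all models are already superimposed on it, and that each model is a list of Cα coordinates of equal length. The simple coordinate average stands in for the "average model"; the Modeller-based iterative modeling and the local fragment step are not shown.

```python
import math


def fraction_within(reference, model, cutoff=4.0):
    """Fraction of residues whose coordinates lie within `cutoff`
    (in angstroms) of the reference, assuming prior superposition."""
    if len(reference) != len(model) or not reference:
        raise ValueError("models must be non-empty and of equal length")
    close = sum(math.dist(r, m) <= cutoff for r, m in zip(reference, model))
    return close / len(reference)


def select_similar(top_model, other_models, min_fraction=0.8, cutoff=4.0):
    """Keep the models that are globally similar to the top-ranked one."""
    return [m for m in other_models
            if fraction_within(top_model, m, cutoff) >= min_fraction]


def average_model(models):
    """Average the coordinates residue by residue across the models."""
    n = len(models)
    return [tuple(sum(axis) / n for axis in zip(*atoms))
            for atoms in zip(*models)]


if __name__ == "__main__":
    top = [(float(i), 0.0, 0.0) for i in range(5)]
    near = [(x, y + 1.0, z) for x, y, z in top]
    far = [(x, y + 10.0, z) for x, y, z in top]
    chosen = select_similar(top, [near, far])
    print(average_model([top] + chosen))
```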
Good Template-Free Example: T0416_2
MULTICOM model: GDT = 0.66, RMSD = 2.5
Combination of 20 models: Zhang-Server, Robetta, TASSER, MULTICOM, YASARA, forecast
Success: very good models are ranked at the top.
Figure: structure and superposition (red: model). Courtesy of Prof. Joel Sussman.
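For readers unfamiliar with the GDT values quoted on these example slides: GDT-TS is commonly defined as the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the fraction of model residues that fall within the cutoff of the experimental structure. The sketch below computes that average on a 0 to 1 scale under a simplifying assumption: the model is already superimposed on the structure once, whereas the standard calculation searches over many superpositions for each cutoff. The function name and the coordinate-list input are illustrative.

```python
import math


def gdt_ts(model, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT-TS for one fixed superposition.

    model, native: equal-length lists of (x, y, z) coordinates for
    equivalent residues, in angstroms.
    """
    if len(model) != len(native) or not native:
        raise ValueError("need equal-length, non-empty coordinate lists")
    distances = [math.dist(m, n) for m, n in zip(model, native)]
    fractions = [sum(d <= c for d in distances) / len(distances)
                 for c in cutoffs]
    return sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0)] * 4
    model = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0),
             (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(gdt_ts(model, native))
```

Because a single superposition can only undercount residues at each cutoff, this sketch gives a lower bound on the value a full GDT calculation would report.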
Good Template-Free Example: T0513_2
MULTICOM model: GDT = 0.73, RMSD = 2.1
Combination of Robetta models; better than each one of them.
Success: very good models are ranked at the top, and combination improves modeling.
Figure: structure and superposition (blue: model).
Not Good Template-Free Example: T0405_1 (Helix Bundle)
MULTICOM model: GDT = 0.41
Failure: ModelEvaluator fails to identify correct helix orientations.
Figure: structure and superposition by Prof. Sussman (gray: structure, yellow: best model, green: MULTICOM model).
Concluding Remarks
• The CASP community can sometimes generate good template-free models (e.g. Rosetta-based tools)
• ModelEvaluator can rank good template-free models at the top
• Iterative global-local combination of models can improve template-free modeling
• Blending of template-free and template-based modeling
Blending of Template-Free and Template-Based Modeling
Protein modeling spectrum: from 100% TBM, through 50% TBM + 50% FM, to 100% FM
Acknowledgements
• CASP8 organizers and assessors
• CASP8 participants
• MU colleagues: Dong Xu, Toni Kazic
• My group: Zheng Wang, Allison Tegge, Xin Deng