180 likes | 268 Views
High Accuracy Scoring Functions for Computational Protein Structure Refinement. Michael Zhou Departments of Computer Science and Biochemistry. The Big Picture. Translation. Ala. Ser. Glu. Folding. Leu. Pro. Ser. 3D Protein Structure. Stop. Amino Acid Sequence.
E N D
High Accuracy Scoring Functions for Computational Protein Structure Refinement Michael Zhou Departments of Computer Science and Biochemistry
The Big Picture Translation Ala Ser Glu Folding Leu Pro Ser 3D Protein Structure Stop Amino Acid Sequence
Importance and Difficulties • Structure Function • Understanding function is important • Protein misfunction diseases • Function of disease organism proteins • Engineering proteins
Why Computational Prediction? • Fast and Cheap • Unfortunately It’s a Difficult Problem • Large search space – many possible conformations • Lack of good templates
My Project • Focus on Model Refinement • Making better models from existing ones • How do you know which models are better? • Answer: Scoring functions -35.0123 Protein Model Score
Approach • Combination of different types of scoring functions • Improve performance through multiple linear regression
Compactness and Hydrophobicity Based Interatomic Distance Based Multiple Linear Regression Combination Score
Training the Method • Training Set • ~40000 conformations across 127 different proteins from CASP8 • 10 fold cross validation • Independent testing and training sets
Training Set Testing Set
Measuring Performance • Measuring Model Accuracy • CαRMSD - Our “gold standard”
Measuring Performance • Measuring Scoring Accuracy • Want to be able to pick out best models
Future Directions • Generalize to beyond refinement • Training sets with models from many different generation methods • More sophisticated machine learning • SVM, HMM, Neural Networks, etc • Benchmarking with other methods in the field
Acknowledgements Samudrala Computational Biology Group • Ram Samudrala • Brady Bernard • Jeremy Horst • Gaurav Chopra • Michael Shannon • Ling-Hong Hung • Tianyun Liu • Raymond Zhang • Adrian Laurenzi • Brian Buttrick • Manish Mishra • Stewart Moughon • Thomas Wood Funding • NSF Research Opportunities for Undergraduates Grant • Mary Gates Research Scholarship • 2010 US NIH Director's Pioneer Award (DP1OD006779) and NSF CAREER Award (0448502) to Ram Samudrala Departments • Microbiology • Computer Science • Biochemistry
The Big Picture Transcription Translation DNA RNA Protein