CSBSI Short Course, June, 2010 • Protein Structure Lab • Michael Zimmermann, Ataur Katebi, Ragothaman Yennamalli
Structures and Bioinformatics • Detailed genetic information informs organism-wide views
Today’s Plan • What are molecular structures? • Primary, Secondary, Tertiary, Quaternary Structure • Why we need them • Where do we get them? • PDB, NDB, and EMDB • Homology modeling • How do they interact? • DIP and Docking • How do we know what they do? • Genome annotation (what you’ve been doing) • Molecular motions • Molecular Dynamics • Normal Mode Analysis (Elastic Networks)
What Are Molecular Structures? (and why are they important?)
Central Dogma • DNA → RNA → Protein • DNA: CGACGGGGACGACGGGGACCATTT • RNA: GCUGCCCCUGCUGCCCCUGGUAAA • Protein: AAPAAPGK
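The three sequences on the Central Dogma slide can be reproduced with a short script. This is a minimal sketch, not part of the original course material. It assumes the DNA string is the template strand and that the mRNA is its base-by-base complement (no reversal), which is what the slide's sequences imply. The function names `transcribe` and `translate` are illustrative, and the codon table is the standard genetic code.

```python
# Standard genetic code, listed in the conventional U, C, A, G codon order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(template_dna):
    """Return the mRNA complementary to a DNA template strand.

    Assumption: the template is written so that a position-by-position
    complement gives the mRNA in reading order, as on the slide.
    """
    pairs = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in template_dna.upper())


def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at a stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "CGACGGGGACGACGGGGACCATTT"
mrna = transcribe(dna)
print(mrna)             # GCUGCCCCUGCUGCCCCUGGUAAA
print(translate(mrna))  # AAPAAPGK
```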
Protein secondary structure elements (1arl) • (H) α-helices • (E) β-sheets • (C) Coils • Molecules are too small to see • Artistic depictions are informative
Size and Scale http://learn.genetics.utah.edu/content/begin/cells/scale/
Diverse Tertiary Structures
Importance of the problem • Number of known sequences >> number of known structures • Secondary structure may be used as an input for tertiary structure prediction • The 1D problem is easier than the 3D problem
Scale of Sequence Versus Structure
How do we get them? • Databases or Structure Prediction
Assignments of secondary structure • Crystallographers assign (subjective) • Automatic assignments from the PDB coordinates • Dictionary of Secondary Structure of Proteins (DSSP) • Kabsch and Sander 1983 - based on positions of hydrogen bonds • STRIDE assignments
DSSP assignments • 1. (H) α-helix • 2. (E) Strand • 3. (G) 3₁₀ helix • 4. (I) π-helix • 5. (B) Bridge (single-residue strand) • 6. (T) Turn • 7. (S) Bend • 8. (C) Coil
Some ambiguity • Various translations of 8 DSSP states into 3 secondary structure states • Two versions of DSSP • EMBL (Heidelberg) version • Includes interchain hydrogen bonds • PDB version • Excludes interchain hydrogen bonds
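The ambiguity in translating the 8 DSSP states into 3 states can be made concrete with two lookup tables. This is a sketch of two reductions that are commonly used; neither is stated on the slides, and the names `BROAD`, `STRICT`, and `reduce_dssp` are illustrative only.

```python
# Broad reduction: all helix types count as helix, bridge counts as strand.
BROAD = {"H": "H", "G": "H", "I": "H",
         "E": "E", "B": "E",
         "T": "C", "S": "C", "C": "C"}

# Strict reduction: only alpha-helix and extended strand keep their class.
STRICT = {"H": "H", "E": "E"}


def reduce_dssp(states, mapping=BROAD):
    """Map a string of 8-state DSSP codes onto H/E/C.

    Any code missing from the mapping (including blanks) becomes coil.
    """
    return "".join(mapping.get(code, "C") for code in states)


eight_state = "HHGGIEEBTSC"
print(reduce_dssp(eight_state))          # HHHHHEEECCC
print(reduce_dssp(eight_state, STRICT))  # HHCCCEECCCC
```

The same 8-state string yields different 3-state strings under the two schemes, so reported prediction accuracies are only comparable when the same reduction is used.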
Improvement of prediction by using multiple sequence alignments • Zvelebil et al 1987 • Levin, Pascarella, Argos & Garnier 1993 • Rost & Sander 1993 • Accuracy of prediction based on single sequences ~ 65% • Accuracy of prediction using multiple sequence alignments ~ 75% (for the most successful methods)
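The slide quotes accuracies without defining the measure. A minimal sketch follows, assuming the figure meant is the per-residue three-state accuracy usually called Q3 (the percentage of residues whose predicted H/E/C state matches the observed one). The function name `q3` and the example strings are hypothetical.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical six-residue example: five of six states agree.
print(round(q3("HHHECC", "HHHEEC"), 1))
```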
New improved algorithm (GOR V): Kloczkowski, Ting, Jernigan & Garnier • New database of 513 non-redundant sequences proposed by Cuff and Barton • Additional statistics of triplets • Resizable window (size of the window is adjusted to the length of the sequence) • Optimization of parameters • Decision parameters to increase the accuracy of prediction for β-sheets • Multiple sequence alignments with PSI-BLAST (FASTA + CLUSTAL in an early version)
GOR V • Example query (FASTA): >gi|42572793|ref|NP_974493.1| myb family transcription factor [Arabidopsis thaliana] MDNHRRTKQPKTNSIVTSSSEVSSLEWEVV SQEEEDLVSRMHKLVGDRWELIAGRIPGRT AGEIERFWVMKN • GOR V server: http://gor.bb.iastate.edu/
References • A. Kloczkowski, K.-L. Ting, R.L. Jernigan and J. Garnier, Protein secondary structure prediction based on the GOR algorithm incorporating multiple sequence alignment, Polymer, 2002, 43, 441-449 • A. Kloczkowski, K.-L. Ting, R.L. Jernigan and J. Garnier, Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence, Proteins: Structure, Function, and Genetics, 2002, 49, 154-166
Other methods • PSIPRED (Neural Network) http://bioinf.cs.ucl.ac.uk/psipred/psiform.html • PHD (Neural Network) http://cubic.bioc.columbia.edu/predictprotein/ • JPRED (Neural Network) http://www.compbio.dundee.ac.uk/~www-jpred/submit.html • SAM-T99 (Hidden Markov Models) http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html • META servers http://cubic.bioc.columbia.edu/predictprotein/submit_meta.html • compare with actual structure • problem of turning into 3D structure
Retrieving, Viewing, and Analyzing Molecular Structure Files
Where to get Molecular Files • http://www.rcsb.org/ • http://ndbserver.rutgers.edu • http://www.emdatabank.org/
Molecule Files • The Protein Data Bank (PDB) file 1T3R
Record Atom# AtomType Residue ChainID Residue#       X       Y       Z  Occupancy B-Factor Element
ATOM       8 N        GLN     A              2  25.279  22.419  34.914       1.00    21.01 N
ATOM       9 CA       GLN     A              2  23.872  22.620  34.516       1.00    17.82 C
ATOM      10 C        GLN     A              2  23.654  24.078  34.247       1.00    18.11 C
ATOM      11 O        GLN     A              2  23.996  24.956  35.114       1.00    20.40 O
ATOM      12 CB       GLN     A              2  22.926  22.138  35.611       1.00    19.10 C
ATOM      13 CG       GLN     A              2  21.447  22.401  35.328       1.00    18.52 C
ATOM      14 CD       GLN     A              2  20.558  21.549  36.121       1.00    21.32 C
ATOM      15 OE1      GLN     A              2  20.145  20.502  35.662       1.00    22.49 O
ATOM      16 NE2      GLN     A              2  20.336  21.926  37.380       1.00    21.05 N
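ATOM records in a PDB file are fixed-width, so fields are read by column position rather than by splitting on whitespace. The sketch below uses the column ranges of the wwPDB format for ATOM records. The helper `format_atom_line` is hypothetical and exists only to build a correctly spaced example line from the values of atom 9 shown above; it assumes atom names of at most three characters.

```python
def format_atom_line(serial, name, res_name, chain, res_seq,
                     x, y, z, occupancy, b_factor, element):
    """Build a fixed-width ATOM record (example helper, names up to 3 chars)."""
    return (
        f"ATOM  {serial:>5} "               # cols 1-6 record, 7-11 serial, 12 blank
        f" {name:<3} "                      # cols 13-16 atom name, 17 alt. location
        f"{res_name:>3} {chain}"            # cols 18-20 residue, 21 blank, 22 chain
        f"{res_seq:>4}    "                 # cols 23-26 residue number, 27-30 blank
        f"{x:8.3f}{y:8.3f}{z:8.3f}"         # cols 31-54 coordinates in angstroms
        f"{occupancy:6.2f}{b_factor:6.2f}"  # cols 55-60 occupancy, 61-66 B-factor
        f"{'':10}{element:>2}"              # cols 67-76 blank, 77-78 element
    )


def parse_atom_line(line):
    """Extract the main fields of an ATOM record by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


line = format_atom_line(9, "CA", "GLN", "A", 2,
                        23.872, 22.620, 34.516, 1.00, 17.82, "C")
atom = parse_atom_line(line)
print(atom["name"], atom["res_name"], atom["chain"], atom["res_seq"])
print(atom["x"], atom["y"], atom["z"], atom["b_factor"])
```

Reading by column matters because adjacent fields can run together without a separating space, for example when a coordinate is large and negative.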
Other molecule file formats • SDF • MOL2 (SYBYL/Tripos format) • SMILES (convert to 3D with CORINA)
Molecular Visualization • VMD (UIUC) • Chimera (UCSF) • PyMOL (DeLano Scientific and Schrödinger)
Homology Modeling
Homology Modeling • Use when sequence identity is > 35% • 1233 known topologies (CATH) • ≈70% of protein sequences (~50,000,000) • template selection • sequence-to-structure alignment • model building • model selection and refinement
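The 35% rule of thumb depends on how sequence identity is computed from the target-template alignment. The sketch below assumes identity is counted over alignment columns where neither sequence has a gap; other conventions (for example dividing by the shorter sequence length) give different percentages. The function names and the example alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def is_template_candidate(aligned_a, aligned_b, threshold=35.0):
    """Apply the slide's rule of thumb: identity above the threshold."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Hypothetical pairwise alignment of a target against a template.
target = "ACDE-FG"
template = "ACDQWFG"
print(round(percent_identity(target, template), 1))
print(is_template_candidate(target, template))
```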
Protein Machines • Most biochemical processes taking place in vivo are controlled by proteins: • gene expression and regulation (nuclear receptors) • metabolic pathways (enzymes) • immune system (antibodies) • signal transduction (trans-membrane receptors) • structural (collagen) • Fully automated • Highly specific
Classical Structure Determination • Protein structures are solved mostly by: • X-ray crystallography (or SAXS) • NMR spectroscopy • Cryo-EM • All methods require substantial input from highly trained specialists. • Time-consuming • $10,000 - $1,000,000 for one structure.
Homology Modeling
Template Detection • Sequence-only methods: • BLAST or FASTA scan against the PDB database. • PSI-BLAST scan against a sequence database. • Profile comparison: • Profile-to-profile alignment on a structural database. • Threading: • Optimal fitting of the modeled sequence to structures from the PDB. • Metaservers: • Combination of all of the above (and others).
Modeling • Template is used as a rigid scaffold. • Modeling algorithm rebuilds missing parts (loops) • Template is used as a semi-flexible scaffold. • Usually a great number of models are generated • Modeller (A. Sali), Rosetta (D. Baker), CABS (A. Kolinski), UnRes (H. Scheraga), I-TASSER (Y. Zhang)
Homology Modeling Example • See “Homology Modeling.pdf”
How do they interact? • DIP: http://dip.doe-mbi.ucla.edu/dip/Main.cgi
An Introduction to Docking
Outline • Introduction to docking • Protein-protein docking • Protein-ligand docking • Protein-ligand docking: hands-on
What is docking? • Prediction of the optimal physical configuration of two molecules and of the energy of their interaction • A docking procedure: • 1. Finds the orientation that maximizes the interaction • 2. Searches for the minimum-energy conformation • 3. Predicts structural rearrangement
Why docking? • Predicting biomolecular interactions • Computer-aided analysis saves time • Automated prediction of molecular interactions is the key to rational drug design • Measuring the relative strength of interactions in a cluster of interacting proteins • Drug design: virtual screening • Drug molecule database growth
Different types of docking • Protein-protein docking: two proteins of approximately the same size • Protein-ligand docking: a large molecule (the receptor) and a small molecule (the ligand)
Rigid body and flexible docking • Rigid-body docking: bond angles, bond lengths, and torsion angles of the components are not modified • Flexible docking: permits conformational change
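In rigid-body docking a pose is described only by a rotation and a translation of the whole molecule, so every internal distance (and therefore every bond length and angle) is unchanged. The sketch below illustrates this with a rotation about the z-axis only, which is a simplification; a real search samples full 3D rotations. The coordinates and function names are hypothetical.

```python
import math


def rigid_transform(coords, angle_z, translation):
    """Rotate points about the z-axis by angle_z (radians), then translate."""
    c, s = math.cos(angle_z), math.sin(angle_z)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in coords]


def pair_distances(coords):
    """All pairwise distances, used here to check that the shape is preserved."""
    return [math.dist(p, q)
            for i, p in enumerate(coords)
            for q in coords[i + 1:]]


# Hypothetical three-atom ligand, coordinates in angstroms.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
posed = rigid_transform(ligand, math.pi / 3, (10.0, -4.0, 2.5))

before = pair_distances(ligand)
after = pair_distances(posed)
print(all(abs(b - a) < 1e-9 for b, a in zip(before, after)))  # True
```

Flexible docking would add torsion angles to the search variables, at which point some of these internal distances are allowed to change.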
Scoring function • Van der Waals: • A/r^12 - B/r^6, where A and B are constants and r is the distance between two atoms • H-bond: • occurs when a hydrogen atom of one molecule, close to the docking surface, interacts with an atom of the second molecule on docking • Electrostatics: • The most significant force; it draws parts of the molecules closer together or pushes them further apart according to their electrical charges.
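A toy scoring function shows how such terms are summed over receptor-ligand atom pairs. This sketch uses the 12-6 Lennard-Jones form A/r^12 - B/r^6 for van der Waals and a Coulomb term for electrostatics, and it omits the hydrogen-bond term. The constants A and B, the charges, and the coordinates are hypothetical; 332.06 is the approximate Coulomb conversion factor for energies in kcal/mol with distances in angstroms and charges in electron units.

```python
import math

COULOMB_CONSTANT = 332.06  # approx., kcal*angstrom/(mol*e^2)


def lennard_jones(r, a, b):
    """12-6 van der Waals term: repulsive a/r^12 minus attractive b/r^6."""
    return a / r**12 - b / r**6


def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic term between two point charges separated by r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def score(receptor, ligand, a, b):
    """Sum both terms over all receptor-ligand atom pairs.

    Each atom is (x, y, z, charge). A single (a, b) pair is used for every
    atom pair, which is a simplification; real force fields use
    atom-type-specific parameters.
    """
    total = 0.0
    for rx, ry, rz, rq in receptor:
        for lx, ly, lz, lq in ligand:
            r = math.dist((rx, ry, rz), (lx, ly, lz))
            total += lennard_jones(r, a, b) + coulomb(r, rq, lq)
    return total


receptor = [(0.0, 0.0, 0.0, 0.5)]
ligand = [(2.0, 0.0, 0.0, -0.5)]
print(score(receptor, ligand, a=1.0, b=2.0) < 0)  # opposite charges attract: True
```

With this sign convention a lower (more negative) score means a more favorable pose, and the Lennard-Jones term has its minimum at r = (2A/B)^(1/6).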
Protein-Protein Docking Examples • Based on performance in the most recent CAPRI (Critical Assessment of PRedicted Interactions) rounds: • ZDOCK • ClusPro • AutoDock • RosettaDock • PatchDock • HADDOCK
Protein-Ligand Docking Examples • DOCK • AutoDock • MOE-Dock • GOLD • FlexX • Glide • Hammerhead • FLOG