Theoretical and computational methods for the study of biological macromolecules Rachid C. Maroun, PhD Unité de Bioinformatique Structurale Institut Pasteur Paris, FRANCE. PROGRAM. INTRODUCTION PROTEIN STRUCTURE PROTEIN CONFORMATION FORCES THAT STABILIZE PROTEIN CONFORMATION
Theoretical and computational methods for the study of biological macromolecules • Rachid C. Maroun, PhD • Unité de Bioinformatique Structurale • Institut Pasteur • Paris, FRANCE
PROGRAM • INTRODUCTION • PROTEIN STRUCTURE • PROTEIN CONFORMATION • FORCES THAT STABILIZE PROTEIN CONFORMATION • METHODOLOGICAL APPROACHES • APPLICATIONS
INTRODUCTION • Biological molecules are complex systems. • The true conformation is a network of loops, twists and folds, all stacked together in a well-defined, dynamic 3D structure. In general, it is this morphology that gives the protein its “life” and that determines its activity and its role. • Determining the 3D structure of biological molecules, and how this structure is linked to function and sequence, is of fundamental importance.
BIOLOGICAL MACROMOLECULES: diversity • DNA • Proteins
• Carbohydrates: di- and polysaccharides (example shown: α-L-galactopyranose, a monosaccharide) • RNA (example shown: branchpoint helix from yeast and binding site for phage GA/MS2 coat proteins)
Important for biological function: • The complexes formed with • ligands (substrates, cofactors, inhibitors, drugs, receptor agonists and antagonists) • other macromolecules
Growth of databanks • Nucleotide sequences • 3D protein structures (growth of the PDB)
Thus, theoretical methods aim at: • Generating reliable 3D models in a reasonable amount of time • Avoiding an exhaustive experimental determination of the structures of all sequences, e.g. Structural Genomics
Relationships between the sequence and 3D structure spaces • Sequences • 3D structures • < 35% identity • Topology (fold family)* • > 35% identity • 1CEM00 (Cellulase) • Homologous superfamily* • > 35% identity • *CATH classification
Relationships between the sequence, 3D structure and function spaces • Sequences • Structures • < 35% identity • Topology (fold family)* • > 35% identity • 1CEM00 (Cellulase) • Homologous superfamily* • > 35% identity • Gtransferase • Architecture*: alpha/alpha barrel • Functions • *CATH classification
Some conclusions • The space of structures is finite and much smaller than that of sequences and functions. • In evolutionary terms, the 3D structure is more conserved than the sequence.
PROGRAM • INTRODUCTION • PROTEIN STRUCTURE • PROTEIN CONFORMATION • FORCES THAT STABILIZE PROTEIN CONFORMATION • METHODOLOGICAL APPROACHES • APPLICATIONS
Structure levels of a polypeptide chain • Primary, i.e. the sequence • MELIRVLANLLILQLSYAQKSSELVFGGDECNINEHRSLVVLFNSNGFLCGGTLINQDWVVTAAHCDSNNF…
Composition of proteins: the amino acid residues • Some representative types of the 20 naturally occurring amino acids (R denotes the side chain): • Methionine (M) (hydrophobic) • Valine (V) (aliphatic hydrophobic) • Tyrosine (Y) (aromatic) • Asparagine (N) (hydrophilic) • Lysine (K) (+ charge) • Arginine (R) (+ charge) • Aspartate (D) (- charge, hydrophilic) • Histidine (H) • Properties of amino acid residues: http://www.imb-jena.de/IMAGE_AA.html
Secondary structure • The main-chain N-Cα and Cα-C bonds are free to rotate. • These rotations are represented by the torsion angles φ and ψ, respectively.
Right-handed α-helix • Ideal values: φ = -57.8°, ψ = -47.0° • The peptide bond is planar: ω = 180° (trans), ω = 0° (cis)
Properties • Pitch p = 5.4 Å • N = 3.6 amino acid residues / turn • Rise r = 1.5 Å / residue • Backbone radius = 2.3 Å
H-bond between C=O(i) and N-H(i+4) • Peptide planes are roughly parallel to the helix axis. Each peptide unit has a dipole moment. • Thus, the dipoles within the helix are aligned, giving rise to a macrodipole moment. • Side chains point outward from the helix axis. • Formation of the helix is cooperative.
The β-strand • Ideal values: φ = -139.0°, ψ = 135.0°
Properties • Pitch p = 6.8 Å • N = 2.0 amino acid residues / turn • Rise r = 3.4 Å / residue
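The helical parameters quoted for the α-helix and the β-strand are tied together by pitch = N × rise. The short Python sketch below checks that relation for both sets of values and builds idealized positions on a right-handed helix. The function names are illustrative, and using the 2.3 Å backbone radius for the Cα trace is an assumption made only for this example.

```python
import math

def pitch(residues_per_turn, rise_per_residue):
    """Pitch (in Å) = residues per turn x rise per residue (in Å)."""
    return residues_per_turn * rise_per_residue

def helix_trace(n_residues, radius, residues_per_turn, rise):
    """Idealized backbone positions on a right-handed helix along the z axis."""
    step = 2.0 * math.pi / residues_per_turn  # rotation per residue, in radians
    return [
        (radius * math.cos(i * step), radius * math.sin(i * step), i * rise)
        for i in range(n_residues)
    ]

# Values taken from the slides
alpha_pitch = pitch(3.6, 1.5)   # alpha-helix
beta_pitch = pitch(2.0, 3.4)    # beta-strand

# Assumption: the 2.3 Å backbone radius is used for the trace
coords = helix_trace(8, 2.3, 3.6, 1.5)

print(f"alpha-helix pitch: {alpha_pitch:.1f} Å")
print(f"beta-strand pitch: {beta_pitch:.1f} Å")
```

Each residue in the α-helix is rotated by 360°/3.6 = 100° about the axis relative to the previous one, which is what `step` encodes.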
Secondary structures: loops [Figure: a loop between two anchors, with vectors v1, v2 and v3]
Peptide fragments that connect regular secondary structure elements (α-helices or β-strands) • Furnish the directional changes necessary to obtain a globular form • Found often at the surface of globular proteins • Form hydrogen bonds with water • Are in general very flexible • Other loops have specific non-repetitive, stable and ordered structures • Have a length of 2-16 residues
Secondary structures: classification of the structure of protein loops • Type: AR beta-beta link • Type: EH beta-alpha • Type: HA beta-beta hairpin • Type: HE alpha-beta • Type: HH alpha-alpha • http://sbi.imim.es/cgi-bin/archdb/loops.pl?
β-turns (reverse turns) • A special case of loops with < 6 residues. • Ideal values:
Angle     β-turn I   β-turn II
φ(i+1)    -60°       -60°
ψ(i+1)    -30°       120°
φ(i+2)    -90°       80°
ψ(i+2)    0°         0°
Secondary structures of globular proteins • Occurrence (%): • simple loops 21 • reverse turns 15 • complex loops 10 • helices 26 • β-sheets 19 • Average length (residues): • helices 9.3 • β-sheets 5.3 • loops 5.9
Tertiary structure • Array of secondary structures => tertiary structure (the fold) • Relative positioning of the secondary structures • Interactions that stabilize the new level of structure • Covalent bonds • S-S bridges • Non-covalent bonds • Hydrogen bonds • Salt (ionic) bridges • Hydrophobic effect • Folding is cooperative
Effects of the tertiary structure • Induction of a given secondary structure • New spatial distribution of the residues • Solvent-exposed • Buried • New functionality
Example: sperm whale myoglobin The protein is complexed to protoporphyrin IX containing Fe
Hierarchical classification of protein tertiary structures CATH: www.biochem.ucl.ac.uk/bsm/cath/ SCOP: scop.mrc-lmb.cam.ac.uk/scop/ DALI: www.ebi.ac.uk/dali/
Quaternary structure • Assembly of tertiary structures to produce a higher level of structure
Some properties of quaternary structure • Protein polymers • Closed aggregates or oligomers • Homo • Hetero • Symmetry • Chemistry • Stability • Covalent bonds • Non-covalent bonds • Cooperativity • Structural and functional regulation • Allostery, e.g. the oxy to deoxy transition of the hemoglobin tetramer • Chemical or biological activity • No consequences => monomer as active as the oligomer • New activity, absent without oligomerization, e.g. the active site residues may come from several subunits
Examples • Snake venom vipoxin complex: heterodimer (chain A, phospholipase A2 inhibitor; chain B, phospholipase A2) • Human EphB2 receptor SAM domain: homohexameric
Complexity in quaternary structure • The monomer of chaperonin GroEL (HSP60 class) • The tetradecamer
Other biopolymers • A nucleic acid polyphosphate chain has an even larger number of potential conformations, given that it contains 6 backbone torsion angles per monomer
PROGRAM • INTRODUCTION • PROTEIN STRUCTURE • PROTEIN CONFORMATION • FORCES THAT STABILIZE PROTEIN CONFORMATION • METHODOLOGICAL APPROACHES • APPLICATIONS
The conformational hyperspace -The potential energy landscape- • The conformation of a biopolymer is a function of a large number of degrees of freedom. • The surface described by the potential energy function in this n-dimensional space is very complex. • If the bond lengths and valence angles of a polypeptide chain are fixed: • the chain has 2 degrees of freedom per residue, the torsion angles φ and ψ • these uniquely determine the conformation of the molecule.
Rotational Isomeric State (RIS) Theory • For a given bond, the torsion angle may adopt a discrete and finite number of states that correspond to the minima of the potential energy function. • $C = m^n$ • C: number of conformations • m: number of rotational states • n: number of bonds • For m = 3 and n = 100: • $C = 3^{100} \sim 10^{48}$
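The RIS count can be checked with a few lines of Python. This is only a worked version of the arithmetic on the slide; the function name is illustrative.

```python
import math

def ris_conformations(m, n):
    """Number of conformations for n bonds with m rotational states each."""
    return m ** n

c = ris_conformations(3, 100)
exponent = math.log10(c)   # order of magnitude of C

print(f"C = 3^100 is about 10^{exponent:.1f}")   # prints: C = 3^100 is about 10^47.7
```

The exact value is about 5 x 10^47, which the slide rounds to 10^48.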
The native state is assumed to be the state of global minimum energy. • ΔG of native state <----> denatured state may be very small • Even with the use of RIS theory, the polypeptide chain has a very large number of potential conformations. • => Formation of a conformational hyperspace composed of a multitude of minima and maxima of the energy function, with many non-native low-energy conformations separated by (high) energy barriers. • A given energy-optimized geometry depends on the starting geometry. • For complex functions of several variables, there is no analytical solution for finding the global minimum. • Even with numerical methods, there is a risk of becoming trapped in local minima. • Furthermore, it is impossible to search and examine exhaustively all the accessible conformations. • Thus, the need to face and circumvent this problem => algorithms for the prediction of protein structure.
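The dependence of an optimized geometry on its starting point can be illustrated on a one-dimensional toy landscape. The sketch below is not a molecular energy function: the double-well polynomial, the step size and the iteration count are all arbitrary choices made for illustration. Plain gradient descent started on either side of the barrier ends in a different minimum.

```python
def energy(x):
    """Toy double-well 'energy' (arbitrary units), not a real force field."""
    return x**4 - 3.0 * x**2 + x

def gradient(x):
    """Analytical derivative of the toy energy."""
    return 4.0 * x**3 - 6.0 * x + 1.0

def minimize(x0, step=0.01, n_steps=5000):
    """Plain gradient descent: always walks downhill from x0."""
    x = x0
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

x_right = minimize(2.0)    # starts right of the barrier
x_left = minimize(-2.0)    # starts left of the barrier

print(f"start  2.0 -> x = {x_right:.3f}, E = {energy(x_right):.3f}")
print(f"start -2.0 -> x = {x_left:.3f}, E = {energy(x_left):.3f}")
```

The run started at 2.0 stops in the higher-energy well and never crosses the barrier to the lower one, which is the trapping problem described above.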
Prediction of protein side-chain conformations or rotamers • Important component of any modeling method (homology modeling, ab initio structure prediction) • Applications include study of mutations. • The side-chain torsion angles are named χ1, χ2, χ3, etc. and the atoms β, γ, δ, etc.
Problem: side chains can adopt several conformations. • Example: the arginine residue, 5 χ angles. • Example: the aspartate residue, 2 χ angles, 9 rotamers.
Side chain conformational search • Combinatorial problem • a complex search problem among interacting side chains in order to find a global minimum. • The minimum number of variables to consider are: • the number of rotamers for each side chain • the number of neighboring side chains interacting with each rotamer • The rotamers for the 20 amino acid residues are stored in databases (e.g. the SCWRL library, http://dunbrack.fccc.edu/SCWRL3.php). • The number of rotamers in a library depends on the χ angle cutoff. • For a 40° χ angle cutoff, a library contains 214 side-chain rotamers. • The energy function or score function may be based on the rotamer library and other terms, such as a repulsive steric energy.
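The combinatorial nature of the side-chain search can be sketched with a brute-force enumeration. Everything in the example below is hypothetical toy data: the three residues, their rotamer lists, the library-like self energies and the steric penalty are invented for illustration and do not come from the SCWRL library or any real scoring function.

```python
import itertools

# Hypothetical self energies per residue and chi1 rotamer (degrees -> arbitrary units)
SELF_ENERGY = {
    "A": {-60: 0.0, 60: 1.0, 180: 0.5},
    "B": {-60: 0.0, 180: 0.8},
    "C": {-60: 0.2, 60: 0.0, 180: 1.5},
}
NEIGHBORS = [("A", "B"), ("B", "C")]   # hypothetical interacting pairs
CLASH_PENALTY = 5.0                    # toy repulsive steric term

def score(assignment):
    """Library-like self terms plus a toy clash term for neighboring side chains."""
    total = sum(SELF_ENERGY[res][chi] for res, chi in assignment.items())
    for r1, r2 in NEIGHBORS:
        if assignment[r1] == assignment[r2]:   # toy rule: same chi1 means a clash
            total += CLASH_PENALTY
    return total

residues = list(SELF_ENERGY)
choices = [list(SELF_ENERGY[r]) for r in residues]

# Exhaustive search: the number of combinations is the product of rotamer counts
combos = [dict(zip(residues, combo)) for combo in itertools.product(*choices)]
best = min(combos, key=score)

print(len(combos), "combinations; best:", best, "score:", score(best))
```

Exhaustive enumeration is only feasible for a handful of side chains, since the number of combinations grows as the product of the rotamer counts; practical methods prune or approximate this search.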
PROGRAM • INTRODUCTION • PROTEIN STRUCTURE • PROTEIN CONFORMATION • FORCES THAT STABILIZE PROTEIN CONFORMATION • METHODOLOGICAL APPROACHES • APPLICATIONS
A number of interaction energies stabilize the structure of proteins and need to be taken into account in order to quantify the different types of energy that govern the behavior and the stability of a molecule: • Hydrogen bonds and the aqueous solvent • Hydrophobic effect • van der Waals interactions (steric) • Electrostatic interactions (ionic or salt bridges) • Covalent "cross-link" bonds, such as disulfide bonds
The hydrogen bond • An H atom is attracted by rather strong forces to 2 atoms instead of only one. • So, it may be considered to be acting as a bond between them. • The partially positively charged H atom lies between partially negatively charged O and N atoms. • In water, H is covalently attached to the O (about 492 kJ mol⁻¹). • But it has an additional attraction (about 23.3 kJ mol⁻¹, almost 10x the average thermal fluctuation at 25°C) to a neighboring O of another water molecule. • That is far greater than the included van der Waals interaction (about 5.5 kJ mol⁻¹). • The bond is part electrostatic (90%) and part covalent (10%). • The bond may be approximated by the following states: • covalent HO-H····OH2 (major) • ionic HOδ−-Hδ+····Oδ−H2 • covalent HO−····H-O+H2 (minor)
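The energy scales quoted for water can be compared with a short calculation. The sketch below assumes that "average thermal fluctuation" means RT at 25°C, which is an interpretation and not something stated on the slide; the variable names are illustrative.

```python
R = 8.314462618e-3      # gas constant, kJ mol^-1 K^-1
T = 298.15              # 25 °C in kelvin

thermal = R * T         # assumption: thermal fluctuation taken as RT

E_COVALENT = 492.0      # O-H covalent bond, kJ mol^-1 (from the slide)
E_HBOND = 23.3          # hydrogen bond, kJ mol^-1 (from the slide)
E_VDW = 5.5             # van der Waals part, kJ mol^-1 (from the slide)

hbond_vs_thermal = E_HBOND / thermal
covalent_vs_hbond = E_COVALENT / E_HBOND
hbond_vs_vdw = E_HBOND / E_VDW

print(f"RT at 25 °C          : {thermal:.2f} kJ/mol")
print(f"H-bond / RT          : {hbond_vs_thermal:.1f}")
print(f"covalent / H-bond    : {covalent_vs_hbond:.1f}")
print(f"H-bond / van der Waals: {hbond_vs_vdw:.1f}")
```

With RT of about 2.5 kJ/mol, the hydrogen bond comes out at roughly 9 times the thermal energy, consistent with the "almost 10x" on the slide.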
The hydrogen bond: • Holds the two strands of the DNA double helix together • Holds polypeptides together in secondary structures • Helps enzymes bind to their substrate • Helps antibodies bind to their antigen • Helps transcription factors bind to each other • Helps transcription factors bind to DNA • Numerous potential functions describe it, one of which is that of Stockmayer: • $E_{hb} = 4\left[(\sigma/r')^{12} - (\sigma/r')^{6}\right] - (\mu_o \mu_h / r^{3})\, g(\theta_o, \theta_h, \phi)$ • r': distance between the hydrogen and the H-bond acceptor; σ: coefficient independent of r'; µ: dipole moments centered on the acceptor (µ_o) and the hydrogen (µ_h); g: angle-dependent function
The water solvent • A mediator in molecular interactions. • 0.25-0.45 g associated with each gram of protein. • Polar molecule. • Essentially a proton donor, e.g. side chains like Asp and Glu are strongly hydrated. • As proton acceptor, the O (electronegative) links to H atoms of neighboring molecules. • 40% of water H-bonds take place with the C=O group of the backbone and 44% with the side chains.