Using Crystal Structures

Andy Howard, Biology 555, 2 October 2018

Agenda • The structure itself: coordinates, other parameters • Modifications: mutations, ligands • Drug design: structure-based, high-throughput • Protein engineering: industrial, medical • Bioinformatics: database construction, classification schemes, structure prediction • Combined techniques: single-crystal and cryoEM, single-crystal and XAFS, single-crystal and SAXS

The structure itself • What data are present in a completed structure? • Coordinates (X,Y,Z) • Debye-Waller (Temperature) factors (B) • Occupancies (Q)

Coordinates • The (X,Y,Z) coordinates of each atom are given in the PDB file:
ATOM     13  CB  TYR A   3       6.570  39.604  20.144  1.00 17.23           C
• So for this atom, X = 6.570 Å, Y = 39.604 Å, Z = 20.144 Å from some pre-determined origin • Incidentally: that origin is fully defined by the space group in some cases, and in other cases it is partially or fully arbitrary. • But they are real coordinates in Ångströms, so you can easily calculate interatomic distances.
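As a minimal sketch of the last point, the example ATOM record can be read by column position (PDB records are fixed-width, so slicing is safer than splitting on whitespace) and two atoms' coordinates turned into a distance. The function names `parse_atom` and `distance` are made up for this illustration, and the second atom is a hypothetical neighbour invented only so there is something to measure against.

```python
import math

# The example ATOM record from the slide, assembled field by field so the
# fixed-column layout of the PDB format is visible.
SAMPLE_ATOM = (
    "ATOM  "      # cols  1-6   record name
    "   13"       # cols  7-11  atom serial number
    " "           # col  12     blank
    " CB "        # cols 13-16  atom name
    " "           # col  17     alternate-location indicator
    "TYR"         # cols 18-20  residue name
    " "           # col  21     blank
    "A"           # col  22     chain identifier
    "   3"        # cols 23-26  residue sequence number
    "    "        # cols 27-30  insertion code and blanks
    "   6.570"    # cols 31-38  X (Angstrom)
    "  39.604"    # cols 39-46  Y (Angstrom)
    "  20.144"    # cols 47-54  Z (Angstrom)
    "  1.00"      # cols 55-60  occupancy Q
    " 17.23"      # cols 61-66  temperature factor B (Angstrom^2)
    "          "  # cols 67-76  blanks
    " C"          # cols 77-78  element symbol
)


def parse_atom(line):
    """Pull the main fields out of one ATOM/HETATM record by column."""
    return {
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


def distance(atom_a, atom_b):
    """Euclidean distance in Angstrom between two parsed atoms."""
    return math.dist(atom_a["xyz"], atom_b["xyz"])


cb = parse_atom(SAMPLE_ATOM)
# Hypothetical neighbour: same record, shifted 1.5 Angstrom along X for illustration.
neighbour = dict(cb, name="XX", xyz=(8.070, 39.604, 20.144))
print(cb["name"], cb["xyz"], cb["occupancy"], cb["b_factor"])
print(f"distance = {distance(cb, neighbour):.3f} A")
```

Because the coordinates are orthogonal and in Ångströms, no unit-cell information is needed for distances within one copy of the molecule; distances to symmetry mates would need the cell and space group as well.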

Temperature Factors • Debye-Waller factor or temperature factor is a measure of how fuzzy an atom is. • It incorporates: • Thermal motion of atoms • Static disorder (i.e., an atom in one unit cell is not in the same position as it is in a different unit cell) • Errors in model-building

Concept of Debye-Waller • We have repeatedly noted that the (complex) structure factors are the Fourier transform of the electron density in the unit cell: $F_{hkl} = V \sum_x \sum_y \sum_z \rho(x,y,z)\, e^{-2\pi i(hx+ky+lz)}$ • The question is how we model $\rho(x,y,z)$: • The atoms are not infinitesimally small • Atoms can move around a little relative to their equilibrium positions • We may not have accurately positioned them

How to model electron density for moving discrete atoms • For an atom centered at $(x_0,y_0,z_0)$, we approximate the electron density near that point as a spherical 3-dimensional Gaussian: $\rho(x,y,z) = \frac{N_{x_0,y_0,z_0}}{V_L} \exp\{-B[(x-x_0)^2+(y-y_0)^2+(z-z_0)^2]/l^4\}$ • Where $N_{x_0,y_0,z_0}$ is the number of electrons in the atom at that point, $V_L$ is an appropriate estimate of the atom’s volume, and $B$ is a measure of how spread-out the electron distribution is.

So what determines this spread • As we said before, there are three sources: • True thermal motion • Static disorder • Errors in modeling • Therefore the overall B for an atom will be something like $B_{overall} = (B_{thermal}^2 + B_{static}^2 + B_{error}^2)^{1/2}$
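The quadrature sum can be tried out numerically. This is only a sketch of the "something like" formula on the slide: the function name `combine_b` is made up, and the three input values are hypothetical, chosen to make the arithmetic easy to follow rather than taken from any refinement.

```python
import math


def combine_b(b_thermal, b_static, b_error):
    """Add three independent B contributions in quadrature (all in Angstrom^2)."""
    return math.hypot(b_thermal, b_static, b_error)


# Hypothetical contributions: 12^2 + 9^2 + 8^2 = 289, so the total is about 17 A^2.
total = combine_b(12.0, 9.0, 8.0)
print(f"B_overall = {total:.2f} A^2")
```

Note that a quadrature sum is dominated by its largest term: here the thermal part alone accounts for 12 of the roughly 17 Å², and the other two together add only about 5.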

Effect of B on electron density

How this plays out • In terms of the effect on the structure factors, we can approximate this as $F_{hkl} = V \sum_x \sum_y \sum_z N(x,y,z)\, e^{-Bs^2/4}\, e^{-2\pi i(hx+ky+lz)}$, where $s = 2\sin\theta/\lambda$ • So the mobility of atoms manifests itself in the structure factor as a term that depends on the $\sin\theta/\lambda$ value of the specific reflection (hkl). • The value of B (in Å²) is found on the ATOM line of the PDB file, just to the left of the element symbol on the far right (17.23 in the Coordinates example)
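To get a feel for the size of this effect, the sketch below evaluates the attenuation term for the example atom (B = 17.23 Å²) at several resolutions. It assumes the conventional Debye-Waller form, exp(-B sin²θ/λ²), which equals exp(-B s²/4) with s = 2 sinθ/λ; by Bragg's law s = 1/d, so the factor can be written in terms of the d-spacing alone. The function name `debye_waller` and the list of resolutions are choices made for this illustration.

```python
import math


def debye_waller(b_factor, d_spacing):
    """Amplitude attenuation exp(-B s^2 / 4) for a reflection at spacing d.

    b_factor is in Angstrom^2 and d_spacing in Angstrom.
    """
    s = 1.0 / d_spacing  # s = 2 sin(theta) / lambda = 1/d, from Bragg's law
    return math.exp(-b_factor * s ** 2 / 4.0)


B_EXAMPLE = 17.23  # B of the CB atom in the Coordinates example

for d in (4.0, 3.0, 2.0, 1.5, 1.0):
    factor = debye_waller(B_EXAMPLE, d)
    print(f"d = {d:3.1f} A   attenuation of F = {factor:.3f}")
```

The factor falls steeply as d shrinks, which is why atoms with large B values contribute little to the high-resolution reflections.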

Are they really spherical? • No. Often the distribution is more spread out in some directions than others • Helices are more flexible in the direction of the helix axis than perpendicular to it • Atoms in beta strands have asymmetries too • So rather than an isotropic B we model the mobility and flexibility as an ellipsoid • $\rho(\mathbf{r}) = \rho_0 \exp(-(B \cdot \mathbf{r})^2)$, B = symmetric tensor • PDB: the six independent tensor elements are stored in the ANISOU record (as displacement parameters U in Å², multiplied by 10⁴), directly below the atom's ATOM record:
ATOM     53  CA  ARG A   4     -22.050  -1.202  -9.021  1.00  7.38           C
ANISOU   53  CA  ARG A   4     1078    946    779    -78    -45    134       C
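The ANISOU numbers can be checked against the isotropic B on the matching ATOM record. The sketch below assumes the standard PDB conventions: the six integers are U11, U22, U33, U12, U13, U23 in units of 10⁻⁴ Å², and the equivalent isotropic value is B_eq = 8π² × (U11 + U22 + U33)/3. The function name `b_equivalent` is made up for this illustration.

```python
import math


def b_equivalent(anisou):
    """Equivalent isotropic B (Angstrom^2) from the six ANISOU integers.

    anisou is (U11, U22, U33, U12, U13, U23), each scaled by 10^4 as in the
    PDB record. Only the diagonal terms enter the isotropic average.
    """
    u11, u22, u33 = (value * 1e-4 for value in anisou[:3])
    u_eq = (u11 + u22 + u33) / 3.0
    return 8.0 * math.pi ** 2 * u_eq


# Values from the ANISOU record for CA of Arg 4 in the example.
ANISOU_CA_ARG4 = (1078, 946, 779, -78, -45, 134)

b_eq = b_equivalent(ANISOU_CA_ARG4)
print(f"B_eq = {b_eq:.2f} A^2")  # compare with 7.38 on the ATOM record
```

The diagonal terms differ by almost 40% between the largest (1078) and smallest (779), which is the anisotropy that a single B value of 7.38 hides.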

Occupancies, Q • These are measures of the fraction of unit cells in the crystal for which the atom in question is, in fact, present • These occupancy values can be computed • We expect that for most atoms within amino acids in a typical protein, Q = 1 • For solvent atoms, sometimes Q < 1 • For highly disordered regions of the polypeptide, it may make logical and computational sense to posit that Q < 1.

Example: agmatine in cholera toxin structure • Agmatine is decarboxyarginine, or 4-aminobutylguanidine • We believe 2 agmatine molecules are bound in a cleft near the cholera toxin surface, near an Asp that forms a salt bridge with one of agmatine's positive charges • We would not expect the occupancy of either agmatine position to be 1; in fact, they might not even sum to 1!

Ultra-high resolution structures: how many multiple conformers? • 2VB1 (0.65 Å hen egg-white lysozyme): 49 residues out of 129 have two conformers (why not more than two?) • 15 are nonpolar or somewhat nonpolar • Strings of multiple conformers: 17-25, 85-90, 97-100, 111-114 • Some are dubious (Q < 0.15): I88, T89, A90, W111

Occupancies of ligands or disordered protein atoms • A ligand or a disordered protein atom might be in its modeled position in only (say) 60% of the unit cells; we would therefore model it with an occupancy of 0.6 • Or the ligand or protein atom might take on two different conformations, both of which are observed in the electron density; then the two positions would have fractional occupancies that would sum to 1.

Occupancies of water molecules • Similarly, a water molecule might be only present in a fraction of the unit cells and would therefore have an occupancy < 1 • Or a water might occupy two different (probably nearby) positions such that every unit cell contains one of those waters or the other one, but not both; then each would have a fractional occupancy, and the two fractions would sum to 1.
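The bookkeeping behind these two slides can be sketched as a simple check: group atoms by site, add up the occupancies of the alternate positions, and flag any site whose total exceeds 1. The site labels, alternate-position letters and occupancy values below are all hypothetical, and the function names are made up for this illustration; a real model would take them from the occupancy and alternate-location columns of the PDB file.

```python
from collections import defaultdict


def occupancy_by_site(atoms):
    """Sum occupancies over all alternate positions of each site.

    atoms is a list of (site_label, altloc, occupancy) tuples.
    """
    totals = defaultdict(float)
    for site, _altloc, occupancy in atoms:
        totals[site] += occupancy
    return dict(totals)


def overfilled_sites(atoms, tolerance=1e-6):
    """Return the sites whose alternate positions sum to more than 1."""
    totals = occupancy_by_site(atoms)
    return [site for site, total in totals.items() if total > 1.0 + tolerance]


# Hypothetical examples mirroring the cases described above.
ATOMS = [
    ("LIG 201 C1", "", 0.60),   # ligand present in about 60% of unit cells
    ("HOH 301 O", "A", 0.55),   # water split over two nearby positions
    ("HOH 301 O", "B", 0.45),
    ("SER 42 OG", "A", 0.70),   # deliberately inconsistent: 0.70 + 0.40 > 1
    ("SER 42 OG", "B", 0.40),
]

for site, total in occupancy_by_site(ATOMS).items():
    print(f"{site:12s} total Q = {total:.2f}")
print("Sites exceeding Q = 1:", overfilled_sites(ATOMS))
```

A total below 1 is physically reasonable (the site is simply empty in some unit cells); only a total above 1 signals a modeling inconsistency.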

Where do we go after we’ve solved a macromolecule’s structure? • Often there is an opportunity to modify the protein (or nucleic acid) and determine the structure of the modified macromolecule, thereby producing useful results with relatively modest effort.

Modifications that we can effect • In general we can start with the same crystallization recipe that we used in determining the original structure, and vary something: • (1) mutation that alters the protein sequence • (2) binding of ligand to the macromolecule: (a) by soaking (b) by co-crystallization • Ligand-binding is similar to heavy-atom soaks, except that it is more chemically predictable because the ligand is usually known to bind!

Crystallographic contributions to drug design • Crystallography plays a crucial role in many drug-design efforts. The steps required to use crystallographic research in developing new pharmaceutical products are outlined on the next few slides.

First step in drug design • (1) Identify the target macromolecule. • Until ~2005 this was usually an enzyme • Now it’s more complex: • still mostly enzymes; but also… • Some RNAs (especially ribosomal RNA or siRNA) • G-protein coupled receptors (30-50% of all drug targets?)

Subsequent steps • (2) Determine structure of the unliganded protein • (3) Obtain 1-100 lead compounds • By examining the structure • By high-throughput screening (HTS) • By knowledge of the biological substrate • (4) Determine structures of (protein-lead compound) complexes by soaking or co-crystallization

Subsequent steps • (5) Examine the liganded structures: could the binding be improved? • (6) Modify the lead compound to improve its fit to the active site • (7) Determine structure of complex of protein with modified ligand • Repeat (5)-(7) until an adequate KD arises; with most drugs you want KD ~ 10 nM
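It can help to translate the KD target into an energy. The sketch below uses the standard thermodynamic relation ΔG° = RT ln(KD / 1 M), which is not on the slide itself; the temperature of 298.15 K, the function name `binding_free_energy` and the 10 µM starting affinity of the lead are assumptions made for illustration.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard free energy of binding in kJ/mol, from Delta G = R T ln(K_D / 1 M)."""
    return R * temperature_k * math.log(kd_molar) / 1000.0


lead = binding_free_energy(10e-6)    # hypothetical lead compound, K_D = 10 uM
target = binding_free_energy(10e-9)  # the ~10 nM goal from the slide

print(f"lead   (10 uM): {lead:6.1f} kJ/mol")
print(f"target (10 nM): {target:6.1f} kJ/mol")
print(f"gain needed   : {target - lead:6.1f} kJ/mol")
```

Under these assumptions each tenfold improvement in KD is worth RT ln 10, a little under 6 kJ/mol, which is roughly the scale of a single well-placed hydrogen bond; that is why several rounds of steps (5)-(7) are usually needed.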

What next? • Typically the optimized inhibitor (with the 10 nM KD) will be an unsatisfactory drug, because it will be toxic or have poor bioavailability or both; so: • (8) Improve the optimized compound as a drug (lower toxicity, better bioavailability) even if that means a slightly higher KD. • (9) Spend $700M doing toxicology and efficacy testing on animals and in clinical trials on humans.

Drug-design time-line • [Figure: log Ki and cost per year (10⁶ $) plotted over the course of a project, through the research phase (improving affinity, then toxicity and bioavailability) and the clinical-trials phase (Stage I and Stage II clinical trials).]

High-throughput drug design • This approach was originally developed using NMR rather than crystallography, but the Abbott Laboratories (now AbbVie) structure groups have applied it to crystallography as well • Suppose you have 10-100 lead compounds, and you don’t know which will really bind most usefully to your target protein. • Then: you set up an experiment involving all of your lead compounds, organized in groups of 5-10 compounds…

Doing HT crystallography • Introduce the lead compounds ten at a time into pre-grown crystals of the target protein; that is, you only need 8 crystals to test 80 compounds, because you’ll be exposing each crystal to 10 different compounds • Within each group of 10, the compound that appears in the liganded structure is the best binder, and therefore the best candidate, in that group. • So this is a competition assay in a crystallographic context!
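The pooling arithmetic is simple enough to sketch. The helper names `crystals_needed` and `make_pools` and the compound labels are made up for this illustration; how compounds are assigned to pools in practice (for example, keeping chemically similar compounds apart so the density is unambiguous) is a design decision not covered here.

```python
import math


def crystals_needed(n_compounds, pool_size):
    """Number of soaked crystals required if each crystal sees pool_size compounds."""
    if n_compounds < 0 or pool_size < 1:
        raise ValueError("need n_compounds >= 0 and pool_size >= 1")
    return math.ceil(n_compounds / pool_size)


def make_pools(compounds, pool_size):
    """Split a list of compounds into consecutive pools of at most pool_size."""
    return [compounds[i:i + pool_size] for i in range(0, len(compounds), pool_size)]


# Hypothetical compound labels for the 80-compound example from the slide.
compounds = [f"lead-{n:02d}" for n in range(1, 81)]
pools = make_pools(compounds, 10)

print(crystals_needed(len(compounds), 10), "crystals for 80 compounds")
print("first pool:", pools[0])
```

With groups of 5 instead of 10, the same 80 compounds would need 16 crystals, so the pool size trades crystal supply against how crowded each competition is.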

Protein Engineering • This is a systematic, structure-influenced approach to making better proteins, either for industrial purposes or for medical applications (“biologics”) • Industrial compounds • Xylose isomerase (used to make high-fructose corn syrup from glucose) • Amylase (breaks down starch into glucose) • Bacterial serine proteases, used in laundry detergents and pre-soaks

How protein engineering works • The situation typically involves a protein that someone is already using industrially • The task is to improve its properties • Frequently the goal is improving the thermal stability of the protein, which often as a byproduct also improves its stability to degradation in other ways, e.g. exposure to bleach.

Bacterial serine proteases used in laundry products • 1984: Commercially used Bacillus amyloliquefaciens subtilisin had a ~1-month shelf life and was unstable in the presence of chlorine bleach. • Roughly 20 person-years of research yielded a protein with an essentially infinite shelf life: a ~1000-fold increase, achieved with only 6 amino acid changes from the original

How to improve thermal stability • Introduce disulfides (enthalpic) • Eliminate internal water molecules by replacing small amino acids with larger ones, effectively replacing water with carbon atoms • Identify floppy loops and stabilize them in some logical way • Special cases like the next slide …

The beta-bulge in subtilisin • One of two neighboring beta strands might contain a segment that bulges out from the beta strand; we describe that region as a beta bulge • One such bulge occurs in subtilisin • There’s a hydrogen bond from an Asn side-chain on the bulging strand to a main-chain atom in the other strand; this substitutes for the main-chain hydrogen bond that “should” have been present in the beta strand interactions.

Beta bulge modification • Asn to Ser: the side-chain to main-chain H-bond is still there, but it’s slightly stronger because serine is a smaller amino acid. • Ser enables creation of a 2.9 Å H-bond rather than the original 3.1 Å H-bond • That improves the ΔH by ~0.4 kJ mol⁻¹ • That results in a ~4° improvement in thermal stability!

How this helps • We might not always seek greater thermal stability, but when we do, these approaches can show how to achieve it • Altering substrate specificity or introducing an allosteric site can be approached in similar ways • So the task of engineering a protein is exemplified by these approaches.

Bioinformatics • Not everything in bioinformatics is about sequences. We’d like to know about 3D structures as well when we explore organisms’ relatedness or look for function • Clearly structure determinations can facilitate that • BLAST searches can be restricted to sequences of known structure

Structure prediction • The ideal of using only the primary sequence of a protein to determine its 3D structure has yet to be realized • Ab initio structure prediction: • Often works with mostly-alpha-helical proteins • Rarely successful with non-helical proteins • But fragment-based predictions, in which we thread fragment structures through the sequence, often work
