Direct Methods and Many Site Se-Met MAD Problems using BnP

Classical Direct Methods
• Main method for “small molecule” structure determination
• Highly automated (almost totally “black box”)
• Solves structures containing up to a few hundred non-hydrogen atoms in the asymmetric unit

Direct Methods Assumptions and Requirements
• Non-negativity of electron density
• Atoms are “resolved”, i.e. “atomic resolution” data are available
• Unit cell, symmetry and contents are known

Important Concepts - 1
• Normalized structure factors $E_H$ are given by $E_H = F_H / \langle |F_H|^2 \rangle^{1/2}$, with averaging in resolution shells
• The phase $\varphi_H$ of $E_H$ is the same as that of $F_H$
• $\langle |E_H|^2 \rangle = 1$, hence “normalized”
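
The normalization in resolution shells can be sketched in a few lines of Python. This is only an illustration: the function name, the equal-count binning and the sample amplitudes are assumptions, not taken from the slides or from any particular program.

```python
import math


def normalize_in_shells(f_mags, d_spacings, n_shells=3):
    """Return |E| per reflection so that <|E|^2> = 1 within each resolution shell.

    Shells hold (nearly) equal numbers of reflections after sorting by d-spacing.
    That binning is an assumption; the slides do not say how shells are chosen.
    """
    order = sorted(range(len(f_mags)), key=lambda i: d_spacings[i], reverse=True)
    shell_size = math.ceil(len(order) / n_shells)
    e_mags = [0.0] * len(f_mags)
    for start in range(0, len(order), shell_size):
        shell = order[start:start + shell_size]
        # Root-mean-square |F| of the shell is the normalizing divisor.
        rms_f = math.sqrt(sum(f_mags[i] ** 2 for i in shell) / len(shell))
        for i in shell:
            e_mags[i] = f_mags[i] / rms_f
    return e_mags


# Made-up amplitudes and d-spacings (in Å), for illustration only.
f_mags = [120.0, 95.0, 60.0, 44.0, 30.0, 12.0]
d_spacings = [8.0, 6.5, 4.0, 3.2, 2.5, 2.1]
e_mags = normalize_in_shells(f_mags, d_spacings)
mean_e2 = sum(e * e for e in e_mags) / len(e_mags)
```

Because every shell here holds the same number of reflections, `mean_e2` over the whole set also comes out as 1 (to rounding error).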

Important Concepts - 2
• Structure invariant: a structural quantity independent of the choice of unit cell origin
• Probabilistic estimates can be made for the values of structure invariants, given the associated E magnitudes and cell contents

• Linear combinations of phases whose Miller indices sum to zero are structure invariants
• Example: $\psi_{HK} = \varphi_H + \varphi_K + \varphi_{-H-K}$ is a structure invariant, e.g. $\psi_{HK} = \varphi_{1,2,1} + \varphi_{2,-1,3} + \varphi_{-3,-1,-4}$
• $\psi_{HK}$ is referred to as a triple, triplet, three-phase invariant, invariant, tpr, sigma2 relationship, etc.
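
A small numerical check makes the origin independence concrete. The three-atom structure and the origin shift below are hypothetical, and the point-atom structure factor is a simplification (equal atoms, space group P1, no thermal motion); only the reflection indices come from the slide.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F(hkl) for equal point atoms at fractional coordinates (space group P1)."""
    return sum(
        cmath.exp(2j * math.pi * sum(h * x for h, x in zip(hkl, xyz)))
        for xyz in atoms
    )


def triplet_phasor(h, k, atoms):
    """exp(i * psi_HK), with psi_HK = phi_H + phi_K + phi_{-H-K}."""
    minus_h_minus_k = tuple(-(a + b) for a, b in zip(h, k))
    psi = sum(
        cmath.phase(structure_factor(idx, atoms))
        for idx in (h, k, minus_h_minus_k)
    )
    return cmath.exp(1j * psi)


def shift_origin(atoms, shift):
    """Coordinates of the same atoms referred to an origin moved by `shift`."""
    return [tuple(x - s for x, s in zip(xyz, shift)) for xyz in atoms]


# Hypothetical three-atom structure and an arbitrary origin shift.
atoms = [(0.10, 0.20, 0.30), (0.40, 0.15, 0.70), (0.65, 0.80, 0.25)]
moved = shift_origin(atoms, (0.13, 0.27, 0.41))
H, K = (1, 2, 1), (2, -1, 3)  # -H-K = (-3, -1, -4), as in the slide's example

# The individual phase depends on the origin ...
phase_change = abs(
    cmath.phase(structure_factor(H, atoms)) - cmath.phase(structure_factor(H, moved))
)
# ... but the triplet does not (compared as unit phasors to avoid 2*pi wrapping).
triplet_change = abs(triplet_phasor(H, K, atoms) - triplet_phasor(H, K, moved))
```

Here `phase_change` is clearly non-zero while `triplet_change` vanishes to rounding error, which is the sense in which the triplet is a structure invariant.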

Fundamental formulas involving individual triplets
• $P(\psi_{HK}) = [2\pi I_0(A_{HK})]^{-1} \exp(A_{HK} \cos\psi_{HK})$, where $P(\psi_{HK})$ is the probability of the structure invariant having the value $\psi_{HK}$
• $A_{HK} = 2 |E_H E_K E_{-H-K}| / N^{1/2}$, where N is the number of atoms in the cell and the E’s are normalized structure factors

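• Note that the probability of $\psi_{HK}$ lying near zero increases as $A_{HK}$ increases, and that $A_{HK}$ is proportional to the product of the E’s and inversely proportional to $N^{1/2}$
• The expected value of $\cos\psi_{HK}$ is given by $\langle \cos\psi_{HK} \rangle = I_1(A_{HK}) / I_0(A_{HK})$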
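
The triplet formulas above can be evaluated directly. In the sketch below, I0 and I1 are taken to be the modified Bessel functions of orders 0 and 1, computed from their power series so that only the standard library is needed. The function names and the sample E magnitudes are illustrative assumptions.

```python
import math


def bessel_i(order, x, terms=40):
    """Modified Bessel function I_order(x) from its power series."""
    return sum(
        (x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
        for k in range(terms)
    )


def a_value(e_h, e_k, e_hk, n_atoms):
    """A_HK = 2 |E_H E_K E_-H-K| / sqrt(N)."""
    return 2.0 * abs(e_h * e_k * e_hk) / math.sqrt(n_atoms)


def cochran_pdf(psi, a):
    """P(psi) = exp(A cos psi) / (2 pi I0(A))."""
    return math.exp(a * math.cos(psi)) / (2.0 * math.pi * bessel_i(0, a))


def expected_cos(a):
    """<cos psi> = I1(A) / I0(A)."""
    return bessel_i(1, a) / bessel_i(0, a)


# Made-up |E| values for three strong reflections in a 100-atom cell.
a = a_value(2.0, 1.8, 1.5, n_atoms=100)

# Midpoint-rule check that the distribution integrates to 1 over (-pi, pi).
steps = 2000
dx = 2.0 * math.pi / steps
area = sum(cochran_pdf(-math.pi + (i + 0.5) * dx, a) * dx for i in range(steps))
```

Evaluating `expected_cos` for increasing A, or for the same E’s with a larger `n_atoms`, shows directly how reliability falls off as structures grow.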

[Figures: Cochran distribution for various values of K, where Φ3 = ψHK and K = AHK; σ vs K]

• The most probable value of $\psi_{HK}$ is always zero, so $\psi_{HK} = \varphi_H + \varphi_K + \varphi_{-H-K}$ becomes $0 = \varphi_H + \varphi_K + \varphi_{-H-K}$ and $\varphi_H = -\varphi_K - \varphi_{-H-K}$, e.g. $\varphi_{1,2,1} = -\varphi_{2,-1,3} - \varphi_{-3,-1,-4}$
• There are many more triplets than structure factors, so the phases are highly overdetermined (lysozyme at 3.0 Å has 2186 reflections and 3,636,804 triplets, i.e. about 1664:1)
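
To see why triplets outnumber reflections, one can enumerate them for a toy reflection list. The counting convention below is an assumption (unordered sets of three distinct reflections, with Friedel mates treated as separate reflections) and need not match the one behind the lysozyme figures.

```python
from itertools import product


def enumerate_triplets(reflections):
    """All unordered sets {H, K, -H-K} whose members are all in `reflections`."""
    available = set(reflections)
    found = set()
    for h in available:
        for k in available:
            m = tuple(-(a + b) for a, b in zip(h, k))  # -H-K
            if m in available and len({h, k, m}) == 3:
                found.add(frozenset((h, k, m)))
    return found


# Toy data set: every reflection with indices in -1..1 except (0, 0, 0).
reflections = [hkl for hkl in product((-1, 0, 1), repeat=3) if hkl != (0, 0, 0)]
triplets = enumerate_triplets(reflections)
ratio = len(triplets) / len(reflections)
```

Even this tiny index range gives more triplets than reflections, and the excess grows rapidly as the index range widens.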

Fundamental formula involving multiple triplets
• Tangent formula:
$$\tan(\varphi_H) = \frac{-\sum_K |E_K E_{-H-K}| \sin(\varphi_K + \varphi_{-H-K})}{\sum_K |E_K E_{-H-K}| \cos(\varphi_K + \varphi_{-H-K})}$$
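
A minimal sketch of the tangent formula follows, using the sign convention of this deck (phases of H, K and -H-K sum to approximately zero). The dictionaries, the function name and the single-atom test case are assumptions made for illustration; for one atom every triplet is exactly zero, so the formula should return the true phase.

```python
import math


def tangent_phase(h, e_mags, phases):
    """Estimate phi_H from every pair (K, -H-K) present in the data."""
    numerator = 0.0
    denominator = 0.0
    for k in e_mags:
        m = tuple(-(a + b) for a, b in zip(h, k))  # -H-K
        if m not in e_mags:
            continue
        weight = e_mags[k] * e_mags[m]
        angle = phases[k] + phases[m]
        numerator -= weight * math.sin(angle)
        denominator += weight * math.cos(angle)
    # atan2 resolves the quadrant that a bare tangent leaves ambiguous.
    return math.atan2(numerator, denominator)


# Hypothetical single-atom "structure": phi(hkl) = 2*pi*(h*x + k*y + l*z).
site = (0.1, 0.2, 0.3)


def exact_phase(hkl):
    return 2.0 * math.pi * sum(h * x for h, x in zip(hkl, site))


indices = [(2, -1, 3), (-3, -1, -4), (-1, 0, 2), (0, -2, -3)]
e_mags = {hkl: 1.5 for hkl in indices}
phases = {hkl: exact_phase(hkl) for hkl in indices}

H = (1, 2, 1)
estimate = tangent_phase(H, e_mags, phases)
# Difference from the true phase, wrapped into (-pi, pi].
wrapped_error = math.remainder(estimate - exact_phase(H), 2.0 * math.pi)
```

With real data the contributing triplets are only approximately zero, so the estimate is a weighted consensus rather than an exact value.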

Fundamental formula involving multiple triplets
• Minimum function:
$$R(\psi) = \frac{\sum_{H,K} A_{HK} \left[\cos\psi_{HK} - I_1(A_{HK}) / I_0(A_{HK})\right]^2}{\sum_{H,K} A_{HK}}$$
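
The minimum function can be evaluated for a trial phase set as sketched below. The data layout (a list of `(H, K, -H-K, A)` tuples and a phase dictionary), the A values and the two phase sets are hypothetical; I0 and I1 are again computed from their power series.

```python
import math


def bessel_i(order, x, terms=40):
    """Modified Bessel function I_order(x) from its power series."""
    return sum(
        (x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
        for k in range(terms)
    )


def minimum_function(triplets, phases):
    """R(psi) = sum A [cos psi - I1(A)/I0(A)]^2 / sum A over the listed triplets.

    `triplets` holds (H, K, M, A) with M = -H-K; `phases` maps indices to radians.
    """
    numerator = 0.0
    denominator = 0.0
    for h, k, m, a in triplets:
        psi = phases[h] + phases[k] + phases[m]
        expected = bessel_i(1, a) / bessel_i(0, a)
        numerator += a * (math.cos(psi) - expected) ** 2
        denominator += a
    return numerator / denominator


# Two made-up triplets sharing the reflection (1, 2, 1).
triplets = [
    ((1, 2, 1), (2, -1, 3), (-3, -1, -4), 1.0),
    ((1, 2, 1), (-1, 0, 2), (0, -2, -3), 2.0),
]
all_zero = {hkl: 0.0 for t in triplets for hkl in t[:3]}
one_flipped = dict(all_zero)
one_flipped[(1, 2, 1)] = math.pi  # makes both triplets equal to pi

r_consistent = minimum_function(triplets, all_zero)
r_inconsistent = minimum_function(triplets, one_flipped)
```

The phase set whose triplets sit near zero scores lower than the one whose triplets sit at π, which is the behavior used to rank trials. R is not exactly zero even when every triplet is zero, because the target value of the cosine is I1/I0 rather than 1.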

Classical Direct Methods Applications for Proteins
• Used for phase extension to very high resolution
• Used with moderate success to locate heavy atom sites in isomorphous derivatives
• E values used in molecular replacement calculations

Current Direct Methods Applications for Proteins
• Shake n Bake (based on the minimum function) has been used to solve complete protein structures with over 1,000 atoms (rubredoxin, lysozyme, calmodulin, etc.), provided data to 1.1 Å or better are available
• Used to locate anomalous scatterer sites from MAD or SAS data

General Shake n Bake Concept
• Use a multi-solution method starting with random phases (or randomly positioned atoms).
• For each trial phase set, use a “dual space” procedure iterating between real and reciprocal space optimization/constraints.

• Reciprocal space optimization is based on shifting phases to reduce the “minimum function” $R(\psi)$
• Real space optimization and constraints are based on computing new phases only from the largest peaks in the map based on the previous cycle’s phases
• Each trial phase set is ranked by its value of $R(\psi)$

SnB inner loop for trial structure
1. Generate random trial structure
2. Compute phases from structure
3. Shift phases to reduce $R(\psi)$
4. Compute map from new phases
5. Select “structure” from largest peaks, then return to step 2
6. Stop after N iterations
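
The control flow of one trial can be written as a short skeleton. This is not SnB code: every step is passed in as a hypothetical callable, the ordering of steps inside the loop is inferred from the dual-space description in this deck, and the stand-in steps at the bottom do no crystallography. They exist only so the loop can be executed.

```python
def run_trial(generate, phases_from_structure, shift_phases, compute_map,
              pick_peaks, score, n_cycles):
    """One dual-space trial: alternate reciprocal- and real-space steps."""
    structure = generate()                         # random starting atoms
    phases = None
    for _ in range(n_cycles):
        phases = phases_from_structure(structure)  # structure -> phases
        phases = shift_phases(phases)              # reciprocal space: lower R
        density = compute_map(phases)              # phases -> map
        structure = pick_peaks(density)            # real space: largest peaks
    return structure, score(phases)


def rank_trials(results):
    """Sort (structure, R) pairs so the lowest R comes first."""
    return sorted(results, key=lambda item: item[1])


# Stand-in steps that just record the order in which they are called.
calls = []


def step(name, func):
    def wrapped(value=None):
        calls.append(name)
        return func(value)
    return wrapped


result = run_trial(
    generate=step("generate", lambda _: 8.0),
    phases_from_structure=step("phases", lambda s: s),
    shift_phases=step("shift", lambda p: p / 2.0),
    compute_map=step("map", lambda p: p),
    pick_peaks=step("peaks", lambda d: d),
    score=abs,
    n_cycles=2,
)
```

In a multi-solution run, `run_trial` would be called once per random start and the results passed to `rank_trials`.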

Application to pyruvate dehydrogenase multi-enzyme complex E1 component
• MW 100 kDa (monomer)
• a = 81.69, b = 141.6, c = 82.46 Å, β = 102.4°
• Space group $P2_1$
• Asymmetric unit = dimer, 1774 residues, 42 methionines
• MAD data (3 λ) on selenomethionine analog to 2.3 Å; used 3.5 Å data for Se determination

Choice of data for Se determination
• Use the $\left| |F_H|^+ - |F_H|^- \right|$ (anomalous) difference at a single λ
• Use the $\left| |F_H|_{\lambda_i} - |F_H|_{\lambda_j} \right|$ (dispersive) difference between two λ’s
• Use $F_A$ values (derived from data at all λ’s)
• Use FHLE values based on max anomalous and max dispersive differences
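
The first two options are simple amplitude differences and can be sketched as follows. The dictionaries keyed by Miller index, the function names and the amplitudes are made up for illustration; the dispersive difference is assumed to use Bijvoet-averaged amplitudes at each wavelength.

```python
def anomalous_differences(f_plus, f_minus):
    """| |F+| - |F-| | at one wavelength, where both Bijvoet mates were measured."""
    return {h: abs(f_plus[h] - f_minus[h]) for h in f_plus if h in f_minus}


def dispersive_differences(f_mean_i, f_mean_j):
    """| |F|(lambda_i) - |F|(lambda_j) | from Bijvoet-averaged amplitudes."""
    return {h: abs(f_mean_i[h] - f_mean_j[h]) for h in f_mean_i if h in f_mean_j}


# Made-up amplitudes at the peak wavelength (Bijvoet mates kept separate) ...
peak_plus = {(1, 2, 1): 105.0, (2, -1, 3): 80.0, (0, 4, 1): 51.5}
peak_minus = {(1, 2, 1): 98.5, (2, -1, 3): 82.0}
# ... and Bijvoet-averaged amplitudes at two other wavelengths.
inflection_mean = {(1, 2, 1): 100.0, (2, -1, 3): 79.0}
remote_mean = {(1, 2, 1): 104.5, (2, -1, 3): 81.5}

ano = anomalous_differences(peak_plus, peak_minus)
disp = dispersive_differences(inflection_mean, remote_mean)
```

Reflections lacking one of the two measurements, such as `(0, 4, 1)` here, simply drop out of the difference set.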

SelMet-Met Scattering Power

                     f0      f’      f”
Se    λ1 - λ2        0.0    -2.17    3.15
Se    λ1 - λ3        0.0    -7.63    3.15
Se    λ2 - λ3        0.0    -5.46    5.81
Se    λ3              -       -      3.37
Se-S  CuKα          18.0    -1.20    1.14

λ1 = inflection point, λ2 = peak, λ3 = high energy remote

Projection of peaks down NC twofold

Computing Phases
• Phases are computed by multiplying individual SIR and/or SAS probability distributions, using the A, B, C, D representation based on intensities.
• “Standard” E values are updated by averaging the lack of closure over all reflections, with each reflection’s contribution itself a probability-weighted average over all possible protein phases.
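
Assuming the A, B, C, D representation refers to Hendrickson-Lattman coefficients, where the phase probability is proportional to exp(A cos φ + B sin φ + C cos 2φ + D sin 2φ), multiplying distributions reduces to adding coefficients. The sketch below illustrates that and derives a centroid phase and figure of merit by numerical integration. The coefficient values are made up, and the function names are not those of the PHASES package.

```python
import math


def combine(*hl_sets):
    """Multiply phase distributions by summing their (A, B, C, D) coefficients."""
    return tuple(sum(values) for values in zip(*hl_sets))


def unnormalized_probability(phi, hl):
    a, b, c, d = hl
    return math.exp(
        a * math.cos(phi) + b * math.sin(phi)
        + c * math.cos(2.0 * phi) + d * math.sin(2.0 * phi)
    )


def centroid_phase_and_fom(hl, steps=360):
    """Centroid phase (radians) and figure of merit of the distribution."""
    re = im = norm = 0.0
    for i in range(steps):
        phi = 2.0 * math.pi * i / steps
        w = unnormalized_probability(phi, hl)
        re += w * math.cos(phi)
        im += w * math.sin(phi)
        norm += w
    return math.atan2(im, re), math.hypot(re, im) / norm


# Hypothetical coefficients for one reflection from two phasing sources.
sir = (1.2, 0.4, -0.3, 0.1)
sas = (0.5, 0.9, 0.0, 0.0)
combined = combine(sir, sas)
phase, fom = centroid_phase_and_fom(combined)
```

A sharper combined distribution gives a figure of merit closer to 1, while a flat one (all coefficients zero) gives a figure of merit of 0.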

MAD Phasing
• For data collected at λ1, λ2, etc., choose one wavelength λn as the “native” data, and “reduce” that data set by averaging Bijvoet pairs.
• For the other “derivative” wavelengths λd, reduce each data set both by averaging Bijvoet pairs, to form “isomorphous” data sets, and without averaging, to form “anomalous” data sets.

MAD Phasing
• For “isomorphous” and “derivative anomalous” data sets, scale “derivative” to “native” and use scattering factors of f0 = 0, f’ = f’(λd) - f’(λn), f” = f”(λd)
• For “native anomalous” data, use the original native Bijvoet pairs and scattering factors of f0 = 0, f’ = 0, f” = f”(λn)
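
The scattering-factor bookkeeping for treating MAD data as pseudo-derivatives can be sketched as below. The f” values are those listed in the deck’s SelMet-Met scattering power table; the absolute f’ values are hypothetical, chosen only so that their differences reproduce that table. The function and key names are illustrative.

```python
def derivative_factors(f_prime, f_dprime, native, derivative):
    """Effective factors for "isomorphous" and "derivative anomalous" data."""
    return {
        "f0": 0.0,
        "f_prime": f_prime[derivative] - f_prime[native],
        "f_dprime": f_dprime[derivative],
    }


def native_anomalous_factors(f_dprime, native):
    """Effective factors for the "native anomalous" data set."""
    return {"f0": 0.0, "f_prime": 0.0, "f_dprime": f_dprime[native]}


# f" per wavelength as in the table (1 = inflection, 2 = peak, 3 = remote).
f_dprime = {"lambda1": 3.15, "lambda2": 5.81, "lambda3": 3.37}
# HYPOTHETICAL absolute f' values; only their differences match the table.
f_prime = {"lambda1": -10.13, "lambda2": -7.96, "lambda3": -2.50}

inflection_vs_remote = derivative_factors(f_prime, f_dprime, "lambda3", "lambda1")
peak_vs_remote = derivative_factors(f_prime, f_dprime, "lambda3", "lambda2")
remote_anomalous = native_anomalous_factors(f_dprime, "lambda3")
```

With λ3 as “native”, the two derivative sets carry the dispersive signal in f’ and their own anomalous signal in f”, while the native set contributes only its f”.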

Phase Refinement Options
$$\sum_h W_h \sum_{\varphi_P} P_{\varphi_P} \left[ |F_{PH}^{obs}|_h - |F_{PH}^{calc}(\varphi_P)|_h \right]^2$$
• “Classical”: $\varphi_P$ = centroid, $W_h$ = 1/E², 1/<E²> or unity, $P_{\varphi_P}$ = 1; use reflections with FOM > 0.4-0.6
• “Maximum Likelihood”: $\varphi_P$ stepped over allowed phases, $P_{\varphi_P}$ = corresponding probability, $W_h$ = 1/E², 1/<E²> or unity; use reflections with FOM > 0.2
• $\varphi_P$ and $P_{\varphi_P}$ can also come from an external source, i.e. solvent-flattened or NC-symmetry-averaged maps.
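
The difference between the two options can be illustrated with a weighted lack-of-closure sum over reflections: in “classical” mode each reflection contributes one term at its centroid phase, while in “maximum likelihood” mode it contributes a probability-weighted sum over sampled phases. The sketch below assumes the calculated derivative amplitude is |F_P exp(iφ_P) + F_H|. The dictionary layout and numbers are hypothetical, and the FOM cutoffs are left out.

```python
import cmath
import math


def refinement_residual(reflections, mode="classical"):
    """Weighted sum of squared lack of closure over all reflections."""
    total = 0.0
    for refl in reflections:
        if mode == "classical":
            # One term per reflection: centroid phase with probability 1.
            samples = [(refl["centroid_phase"], 1.0)]
        else:
            # Probability-weighted terms over the sampled protein phases.
            samples = refl["phase_probabilities"]
        for phi_p, prob in samples:
            fph_calc = abs(cmath.rect(refl["fp"], phi_p) + refl["fh"])
            total += refl["weight"] * prob * (refl["fph_obs"] - fph_calc) ** 2
    return total


# One made-up reflection: |F_P| = 100, heavy-atom F_H = 10, observed |F_PH| = 110.
reflection = {
    "fp": 100.0,
    "fh": 10.0 + 0.0j,
    "fph_obs": 110.0,
    "weight": 1.0,
    "centroid_phase": 0.0,
    "phase_probabilities": [(0.0, 0.5), (math.pi, 0.5)],
}
classical = refinement_residual([reflection], mode="classical")
likelihood = refinement_residual([reflection], mode="maximum_likelihood")
```

In this example the centroid phase closes the triangle exactly, so the classical residual is zero, while the likelihood version also charges the reflection for the poorly fitting alternative phase.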

[Flowchart: from MAD λ1, λ2, λ3 data (Scalepack files) to the final map. Programs shown: CMBISO, CMBANO, PHASIT, BNDRY, BLDCEL, MAPINV, MISSNG, EXTRMP, MAPAVG, FSFOUR. Files shown: “iso” and “ano” scaled files, all “native” (λ3) data, “phase” file, “averaging” mask file, “submap” file, “extension” file.]

MAD Phasing/Averaging Statistics

Peak anomalous (λ2) difference Patterson

• With SnB it’s possible to automatically locate the anomalous scatterer substructure with data from any one of the dispersive combinations or anomalous pair sets
• As expected, sets with the maximum dispersive or anomalous signal typically yield a greater frequency of success

Automated Applications of BnP: Methodology
W. Furey (1), L. Pasupulati (1), S. Potter (2), H. Xu (2), R. Miller (3) & C. Weeks (2)
(1) University of Pittsburgh School of Medicine and VA Medical Center
(2) Hauptman-Woodward Medical Research Institute
(3) Center for Computational Research, SUNY at Buffalo

SnB Strengths
1. Powerful, state-of-the-art direct methods for automatically locating heavy atom sites
2. Friendly graphical user interface
SnB Weaknesses
1. Stops after finding sites, i.e. no protein phasing
2. No software interface
PHASES Strengths
1. Proven protein phasing (MAD, MIRAS, etc.), solvent flattening, NCS averaging, external program interfacing
2. Interactive graphics
PHASES Weaknesses
1. Doesn’t automatically find heavy atom sites
2. Script based, i.e. no GUI
Goal: Provide user-friendly software for automatic determination of protein crystal structures

Adopted Strategy
• Combine the SnB program with the “PHASES” package, putting everything under GUI control
• Establish default parameters and procedures allowing all aspects of the structure determination to be fully automated
• Also provide a manual mode, giving experienced users more control and facilitating development
• Provide graphical feedback when possible
• Facilitate coupling with popular external software

Main Developments Required for Automated Structure Determination
• Automatic substructure solution detection
• Automatic substructure validation
• Automatic hand determination (including space group changes, when needed)

Automatic Substructure Solution Detection
Original Method: based on histogram (manual, time consuming, requires user interaction)
Current Method: based on Rmin and Rcryst statistics (automatic, fast, no user interaction)

Automatic Substructure Validation
Original Method: left up to the user to decide which peaks correspond to true sites (manual)
Current Method (auto mode): based on occupancy refinement against Bijvoet differences (automatic, fast, requires no coordinate refinement, hand insensitive)
Current Method (manual mode): as in auto mode, but peaks from different solutions can also be compared (manual)

Automatic Substructure Validation

Automatic Hand Determination
Original Method: visual inspection of map projections (manual, requires user interaction)
Current Method (MAD, SIRAS or MIRAS): based on variance differences in protein and solvent regions (automatic, fast since it requires no refinement, requires no user interaction)

Automatic Hand Determination
Current Method (SAS data only): comparative analysis of R, FOM and CC after solvent flattening/phase combination (automatic, fast, requires no refinement)
Current Method (SIR, MIR data only): both hands tried, map examination needed (requires user interaction)
