320 likes | 513 Views
3D reconstruction of macromolecular complexes by electron microscopy. Sjors H.W. Scheres. Macromolecular machines. 15 m. 15 10 -9 m. F1-ATPase: Abrahams et al., 1994. Dutch windmill. Studying these machines. The different states tell much about the way these machines work!
E N D
3D reconstruction of macromolecular complexes by electron microscopy Sjors H.W. Scheres
Macromolecular machines 15 m 15 10-9 m F1-ATPase: Abrahams et al., 1994 Dutch windmill
Studying these machines • The different states tell much about the way these machines work! • Different conformations of (chemically identical) molecules are very hard to purify • Biophysical techniques that study the bulk, “average-out” information about these conformations
The promise of 3D-EM • In 3D Electron Microscopy individual molecules are visualized • Trapped in ice, these molecules are free to adapt many conformations j j
The specimen • Biological samples: light atoms, low-density, high water content • Low contrast, low signal-to-noise ratios • Sample needs to be protected: • Negative staining (dry sample in matrix of e.g. Uranyl acetate)+: high-contrast, not sensitive to radiation damage- : only shape info, artefacts (!), low-resolutions due to grain size • Cryo-protection (flash-freezing hydrated specimens in liquid ethane)+: “natural environment”, high-resolutions- : low contrast, more sensitive to radiation damage
Experimental data Cryo-EM Negative staining
Single-particle 3D-reconstruction • Combination of multiple 2D-projections from different directions into a 3D-structure • Use many (100-100,000) copies of the same specimen (i.e. structurally identical copies) • Record one projection per copy Ribosome: Valle et al., 2003 RSC+nucleosome: Asturias et al., 2004 Bluetongue Virus Nason et al., 2004
Inconveniences in 3D-EM • The experimental signal-to-noise ratio is ~1/10 • We collect 2D-images, while often we want to know about our molecules in 3D • The molecules adopt unknown orientations on the experimental support • The molecules may adopt distinct conformations
aveage over 2,000 copies Quite a problem • Fight the noise by averaging • BUT, this requires: • alignment: determine the unknown orientations • classification: separate distinct conformations
Classification alignment & classification are strongly intertwined! noise forms a serious problem!
Our approach • Combine alignment & classification in a single optimization process • Statistically model the experimental noise • Maximum-likelihood alignment and classification
Maximum Likelihood • Optimize the probability of observing the data, given the model (i.e. the likelihood) • These probabilities are based on: • a statistical model of the experimental noise • a statistical model of the orientations & classes • Mathematics: optimizing the likelihood yields models with less bias than any other method (Rice, 1995)
Two cases • Alignment & classification in 2D: • align images and calculate 2D averages for the distinct classes • xmipp_MLalign2D • Alignment & classification in 3D • align images and calculate 3D reconstructions for the distinct classes • xmipp_MLrefine3D
for each pixel j: ( ) (Xj – Aj)2 exp -2s2 s Aj Xj White noise = independence between pixels! P(Xj|Aj)∞ P(data image|model image) ∞ P P(Xj|Aj) j Statistical model data: X model: A
The model comprises: • estimates for the underlying objects • estimate for the amount of noise (s) • statistical distributions of k & orient. Log-likelihood function • Adjust model to maximize the log-likelihood of observing the entire dataset: Optimization algorithm: Expectation Maximization
sampled rotations 360° for each image, calculate all calculate new 2D average as probability weighted averages The 2D algorithm estimates for K 2D objects k=1 k=2
xmipp_MLalign2D • Scheres et al., J. Mol. Biol. (2005)
project into all (discretely sampled) orientations for each image, calculate all calculate new 3D estimates as probability weighted 3D reconstructions The 3D algorithm estimates for K 3D objects k=1 k=2
Prelim. ribosome reconstruction 91,114 particles; 9.9 Å resolution fragmented (depicted at a lower threshold) blurred
Seed generation 80 Å filter 4 random subsets; 1 iter ML
6 CPU-months ML3D-classification • 4 references • 91,114 particles • 64x64 pix (6.2Å/pix) • 25 iterations • 10° angular sampling
no ratcheting; no EF-G; 3 tRNAs differences: overall rotations ratcheting, EF-G, 1 tRNA ML-derived classes Scheres et al., Nat. Methods (2007)
So, it works! • Applicability may be very general • Structural variability is part of Nature! • xmipp_MLalign2D has been used in numerous studies already • xmipp_MLrefine3D has been demonstrated to work on 4 cases
But it is expensive! • A typical 2D run: • 10,000 images, 10 references: ~CPU weeks • A typical 3D run: • 50,000 images, 4 references: ~CPU months
Parallelization issues Within each iteration individual experimental images can be processed independently!
Current parallel implementations • MPI (message passing interface) • standard for many clusters • Shell scripts • may be run on CPUs of different architectures • EGEE grid • tomorrow’s test application
My data at home(User Interface) Grid Interface(Resource broker,DIANE, etc.) Grid Interface(Resource broker,DIANE, etc.) Grid working nodes At home: combineresults from all nodes EGEE grid ITERATE!
Tomorrow’s case: SV40 large T-antigen • The machine that starts replication of a virus (SV40) that causes cancer… • 1,670 projections of a complex with an asymmetric DNA-probe • To limit computation, you will use 2 references,to reproduce:
Ribosome Haixiao Gao Joachim Frank LTA Mikel Valle Mathematics Gabor T. Herman Paul P.B. Eggermont Our group José-María Carazo Acknowledgements • Computing • EGEE grid • Barcelona Supercomputing Center • Dept. Computer Architecture & Electronics, University of Almeria