Viral infections remain a major health threat, and understanding virus structures is crucial for developing prevention and treatment strategies. This project focuses on the 3-D reconstruction of viruses like Epsilon15 at high resolution, revealing details such as the portal complex and tail structure. Cryo-EM image processing involves intensive computing for 2D alignment and 3D reconstruction, enabling visualization of virus components. Resource support from RCAC at Purdue University aids in generating high-resolution results. Performance analysis shows varying job durations on different platforms, highlighting considerations for efficient job scheduling and resource allocation.
Condor in Cryo-EM Image Processing
Weimin Wu, Wen Jiang
Department of Biological Sciences, Purdue University
04/30/2008
Cryo-EM: low-temperature electron microscopy. Image processing: obtaining the 3D reconstruction from 2D images.
Introduction: Viral infections have been, and remain, one of the major threats to human health. Viruses are large assemblies of proteins and nucleic acids that rely on infecting hosts to complete their life cycle and sustain their propagation. High-resolution 3-D structures of virus particles will provide important insights into these processes and into the development of effective prevention and treatment strategies. Recently we demonstrated, in collaboration with researchers at Baylor College of Medicine and MIT, the 3-D reconstruction of the infectious bacterial virus Epsilon15 (ε15) at 4.5 Å resolution, which allowed tracing of the polypeptide backbone of its major capsid protein gp7 (Jiang et al., Nature 451(7182):1130-4, 2008).
For many of the tailed dsDNA viruses, for example the bacterial viruses T7, T3 and ε15, one of the 12 icosahedral 5-fold vertices is occupied by a unique 12-fold portal protein complex. This unique portal vertex is responsible for packaging the dsDNA genome into the protein shell during assembly and for ejecting the genome out of the virus and into the host cell during infection. However, high-resolution structures of these virus particles, especially of the non-icosahedrally organized components such as the portal complex, the tail and the encapsulated dsDNA genome, are lacking. I am working on this kind of project without enforcing any symmetry on the virus. We now have a sub-nanometre resolution result that enables us to visualize the secondary structure of the portal, tail hub and tail spikes.
(A) Schematic diagram of the T7/T3 phage particle assembly and dsDNA genome packaging pathway. Adapted from (Serwer, 2004). (B) A cryo-EM micrograph of T3 phage showing the particles representing each of the major stages during assembly and genome packaging.
Figure labels: tail hub, spike, portal, terminus, core, DNA rings.
Image processing is a critical step in generating the 3D structure of a macromolecule from the 2D images taken with the cryo-EM technique. This step includes 2D alignment and 3D reconstruction, both of which need intensive computing power. High performance computing (HPC) resources supported by RCAC enable us to work on huge datasets to obtain high-resolution results and therefore learn more details of the biological system.
2D images vs. projections: 55,000 vs. 1,400.
Scientific needs: Two major steps are involved in cryo-EM image processing. The first is the 2D alignment step, which finds the orientation and center of each sample particle by matching the images (2D projections of the sample particles) against the reference projections. The second is the 3D reconstruction step, which generates the 3D map by collecting the orientation and center information of all the particles and averaging them.
1 second per comparison (1 raw image vs. 1 projection); 22K CPU hours.
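The matching step and its cost can be illustrated with a minimal sketch. It is not the authors' software: the function names are hypothetical, images are treated as flat lists of pixel values, similarity is a plain normalized correlation, and the search over in-plane rotation and center shift that a real 2D alignment performs is left out. The cost function assumes the slide's figures mean 55,000 particle images, 1,400 reference projections and about 1 second per image-projection comparison.

```python
import math


def normalized_correlation(a, b):
    """Similarity of two equally sized flattened images (assumed metric)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0


def best_projection(image, projections):
    """Return (index, score) of the reference projection closest to the image.

    Each projection stands for one candidate orientation, so the winning
    index is the orientation assigned to the particle.
    """
    scores = [normalized_correlation(image, p) for p in projections]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def alignment_cpu_hours(n_images, n_projections, seconds_per_comparison):
    """Every image is compared with every projection, so cost is a product."""
    return n_images * n_projections * seconds_per_comparison / 3600.0


if __name__ == "__main__":
    # Toy 4-pixel "images", purely illustrative.
    references = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]]
    particle = [0.1, 0.0, 0.9, 0.1]
    print(best_projection(particle, references))
    print(alignment_cpu_hours(55_000, 1_400, 1.0))
```

Under those assumptions, 55,000 × 1,400 comparisons at 1 second each come to about 7.7 × 10^7 seconds, or roughly 21,400 CPU hours, which is in line with the 22K CPU hours quoted on the slide.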
GroEL is used as an example to show the 3D reconstruction and the many iterations needed for high resolution. For our E15 project, even though we started with an intermediate-resolution map (7 Å), more than 10 further iterations were run to reach 4.5 Å. Features as a function of resolution show how to evaluate the resolution qualitatively from the density map.
Condor Performance: We feel lucky at Purdue to have so many resources supported by RCAC; otherwise our research would take forever. Here I list the Condor jobs we submitted and the CPU hours we used. *Each job took about half an hour. *Each job took about one hour, owing to a different algorithm and other reasons.
Running jobs versus time. This was a long run, about 64 hours. Three major peaks are obvious; these three periods are overnight. In the daytime, the number of running jobs drops a lot because the machines' owners are using them. The three peaks get progressively smaller, which means the user priority was getting lower. Now that it is the summer holiday, I can get more than 3,000 nodes for my Condor jobs.
We tried to use all the platforms to run our Condor jobs. How does the performance compare across platforms? The Linux 64-bit machines were not as fast as we expected. Why?
We checked the remote hosts that the Condor jobs were sent to in this test: 90% of the Linux 64-bit machines were from ccl00.cse.nd.edu. So the Condor jobs could go to nodes off campus, and the performance was only slightly worse. This made us more confident about seriously considering the TeraGrid, although so far, even when we tried the TeraGrid, we still used resources on campus. It remains a problem, however, when the files to be transferred are large, for example more than 700 MB.
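This kind of check can be sketched as a small tally over per-job records. The sketch assumes the platform, remote host and wall-clock time of each job have already been collected into tuples. The record layout, the function name and the platform labels are hypothetical, and no real Condor log format is parsed here. It reports the share of jobs per host, which is only a proxy for the share of machines mentioned above.

```python
from collections import Counter, defaultdict


def summarize_jobs(records):
    """Group job records by platform.

    records: iterable of (platform, remote_host, wall_seconds) tuples
             (hypothetical layout, filled in by whatever collects job data).
    Returns {platform: {"jobs", "mean_seconds", "top_host", "top_host_share"}}.
    """
    durations = defaultdict(list)
    hosts = defaultdict(Counter)
    for platform, host, seconds in records:
        durations[platform].append(seconds)
        hosts[platform][host] += 1

    summary = {}
    for platform, times in durations.items():
        top_host, top_count = hosts[platform].most_common(1)[0]
        summary[platform] = {
            "jobs": len(times),
            "mean_seconds": sum(times) / len(times),
            "top_host": top_host,
            # Fraction of this platform's jobs that ran on the busiest host.
            "top_host_share": top_count / len(times),
        }
    return summary


if __name__ == "__main__":
    # Synthetic records for illustration only, not measured data.
    demo = [
        ("linux64", "hostA.example.edu", 1800),
        ("linux64", "hostA.example.edu", 2000),
        ("linux32", "hostB.example.edu", 1500),
    ]
    for platform, stats in summarize_jobs(demo).items():
        print(platform, stats)
```

A platform whose jobs mostly land on a single remote site will show a high `top_host_share`, which is the pattern described above for the Linux 64-bit machines.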
High-quality alpha helices, beta sheets and side chains, which enabled us to do the modeling and obtain the backbone structure. With icosahedral symmetry.
Our problems/concerns about Condor:
• Operation: the best thing for us would be to submit the Condor jobs from our desktop and let Condor itself find resources, but at present we need to specify where the jobs go when using the TeraGrid.
• File transfer: when large files are transferred, the network becomes the bottleneck, which can easily overload the head node and crash it, especially when the files go off campus. This is due to the large number of reads from the only copy of a large dataset. However, this might be circumvented by building a P2P client into Condor. In the 2D alignment step of our image processing, each image is compared with all the reference projections, and those projections may already have been sent to neighboring computers running other Condor jobs, so a new job could fetch the file from neighboring nodes instead. The number of reads from the original copy would then drop a lot, in theory to just a few. The file transfer speed would also increase dramatically.
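The effect of the P2P idea on the head node can be sketched with a toy count of reads from the original copy. This is a thought experiment, not a feature of Condor. The function name, the neighbor map and the assumption that a node keeps the file once it has received it are all hypothetical.

```python
def origin_reads(job_hosts, neighbors, use_p2p):
    """Count reads of the single original copy on the head node.

    job_hosts: host names in the order their jobs start.
    neighbors: dict mapping a host to the hosts it could fetch from directly.
    use_p2p:   False models today's behavior, where every job pulls the
               file from the head node.
    """
    if not use_p2p:
        return len(job_hosts)

    holders = set()  # hosts assumed to keep a copy after their first job
    reads = 0
    for host in job_hosts:
        peers = neighbors.get(host, ())
        if host not in holders and not any(p in holders for p in peers):
            reads += 1  # nobody nearby has the file yet: go to the head node
        holders.add(host)
    return reads


if __name__ == "__main__":
    # Two hypothetical groups of three mutually reachable nodes.
    group_a = ["a1", "a2", "a3"]
    group_b = ["b1", "b2", "b3"]
    neighbors = {h: [p for p in g if p != h] for g in (group_a, group_b) for h in g}
    jobs = ["a1", "a2", "a3", "b1", "b2", "b3", "a1", "b2"]
    print(origin_reads(jobs, neighbors, use_p2p=False))
    print(origin_reads(jobs, neighbors, use_p2p=True))
```

In this toy setup the head node is read once per job without P2P, but only once per group of neighboring nodes with it, which is the "just a few times" behavior hoped for above.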
Acknowledgment: Preston Smith, David Braun, Steve Wilson, Pia Mikeal, Bruce L. Fuller
Reference:
• Jiang et al., Nature, Vol. 439, 2 February 2006 (nature04487)
• Jiang et al., Nature, Vol. 451, 28 February 2008 (nature06665)