XRD data analysis software development
Outline • Background • Reasons for change • Conversion challenges • Status
X-Ray Diffraction (XRD) • What is an XRD experiment for? • It provides information on the relative positions of atoms in a crystal • It allows individual crystalline structures to be identified • It also detects strains in the crystals • What is XRD data? • Digital images collected by a CCD camera as a synchrotron X-ray beam scans across a sample area • Image data sizes are large • One 2D image is collected for each scan point • 8 MB/image at CLS, 2084 x 2084 pixels/image • Hundreds or even thousands of images may be collected in an experiment, depending on • the size of the sample area to be scanned • the step size used to move the sample during the scan • More scan points provide more detailed information for the analysis
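The dependence of data volume on sample area and step size can be illustrated with a small calculation. This is only a sketch: the rectangular scan grid, the 100 µm sample area and the two step sizes are assumptions chosen for illustration; only the 8 MB per image figure comes from the slide.

```python
IMAGE_SIZE_MB = 8  # per-image size quoted for CLS

def scan_points(width_um, height_um, step_um):
    """Scan points on a rectangular grid (assumed layout), edges included."""
    nx = width_um // step_um + 1
    ny = height_um // step_um + 1
    return nx * ny

def raw_volume_gb(n_images, image_size_mb=IMAGE_SIZE_MB):
    """Raw image volume in GB for a given number of scan points."""
    return n_images * image_size_mb / 1024

# Hypothetical 100 um x 100 um area scanned with two different step sizes.
for step in (10, 2):
    n = scan_points(100, 100, step)
    print(f"step {step} um: {n} images, {raw_volume_gb(n):.2f} GB raw")
```

Under these assumptions, reducing the step size by a factor of five raises the image count from the hundreds into the thousands, which matches the range mentioned on the slide.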
XRD data analysis • Deals with a large amount of image data • Several procedures are applied to each image • Peak searching: identify regions of interest • includes threshold finding, blob searching and 2D curve fitting on each blob • Indexing: identify possible/known crystalline structures • Strain analysis: detect strains in the material • The existing XRD data analysis software was written in IDL • A proprietary scripting language • It can only carry out processes sequentially • It is very time consuming! • e.g. processing a whole package of data normally takes days
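The first two peak-searching steps (threshold finding and blob searching) can be sketched as follows. This is not the algorithm of the IDL software, which the slides do not describe: the threshold rule (mean plus k standard deviations) and the 4-connected flood fill are assumptions chosen because they are common defaults, and the toy image is invented.

```python
from collections import deque
from statistics import mean, pstdev

def simple_threshold(image, k=1.0):
    """Assumed rule: mean + k * standard deviation over all pixels."""
    pixels = [v for row in image for v in row]
    return mean(pixels) + k * pstdev(pixels)

def find_blobs(image, threshold):
    """Group 4-connected pixels above the threshold into blobs."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill starting from this bright pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs

# Toy 5 x 6 "image" with one three-pixel blob and one single-pixel blob.
image = [
    [1, 1, 1, 1, 1, 1],
    [1, 9, 8, 1, 1, 1],
    [1, 7, 1, 1, 1, 1],
    [1, 1, 1, 1, 6, 1],
    [1, 1, 1, 1, 1, 1],
]
blobs = find_blobs(image, simple_threshold(image))
print(len(blobs), "blobs found")
```

Each blob would then be handed to the 2D curve fitting step to refine the peak position and shape.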
Reasons for change • Needed for incorporation into Science Studio (SS) • SS aims to give remote users feedback during experimental runs; XRD analysis is one source of such feedback • The existing code is in a scripting language and relies on sequential processing • Existing software is written in IDL • Peak searching is in IDL • Indexing and strain analysis are in IDL, calling external routines in Fortran • Needed to have versions for streaming data analysis • Stream processing: taking a stream of input data, processing the data in a series of steps, and streaming the results out, achieving real-time or close to real-time performance • Needed to solve the data storage problem • Accumulating a large amount of raw image data over a long time could cause a storage problem • Only the peaks in each image are actually useful for the analysis • If the peaks can be found in real time during data collection, it might not be necessary to keep the raw images • E.g. a typical raw image is 8 MB at CLS, while the peak data for the image is only about 10 KB
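The stream-processing idea can be sketched with chained generators: each stage consumes items as they arrive and passes results on, so no stage waits for the whole data set. The stage names and the trivial "pixel above threshold" peak finder are placeholders invented for this sketch, not the actual procedures.

```python
def image_stream(frames):
    """Source stage: yield (scan index, image) as each image arrives."""
    for index, frame in enumerate(frames):
        yield index, frame

def peak_stage(stream, threshold):
    """Processing stage: reduce each image to a short list of peaks.

    Placeholder peak finder: every pixel above the threshold is a peak.
    """
    for index, frame in stream:
        peaks = [(r, c, v)
                 for r, row in enumerate(frame)
                 for c, v in enumerate(row)
                 if v > threshold]
        yield index, peaks  # only the peak list is passed downstream

def collect(stream):
    """Sink stage: keep the peak data, keyed by scan index."""
    return {index: peaks for index, peaks in stream}

# Three tiny stand-in "images".
frames = [
    [[0, 5], [0, 0]],
    [[0, 0], [0, 0]],
    [[7, 0], [0, 9]],
]
result = collect(peak_stage(image_stream(frames), threshold=4))
print(result)
```

Only the peak lists reach the sink, which is the storage argument on the slide: with 8 MB of raw data against about 10 KB of peak data, keeping peaks alone would be roughly an 800-fold reduction per image.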
How to make the change • Our development target • To port the existing XRD data analysis software to a Cell system at SHARCNET and achieve stream processing for XRD data analysis • SHARCNET's Cell system • Includes 8 Cell blades (QS22), each with 2 Cell processor chips, i.e. 16 Cell processors in total • Cell processor: a heterogeneous multi-core architecture • Two types of cores optimized for different tasks • 1 Power Processing Element (PPE) and 8 Synergistic Processing Elements (SPEs) • PPE: PowerPC architecture, acts as a controller and performs control-intensive tasks • SPEs: simpler cores that devote more of their resources to computation and perform computation-intensive tasks • The Cell processor can be programmed to achieve stream processing
Basic Cell Programming Model • [Figure: Diffraction pattern, XRD data analysis procedures, Resultant Maps (Orientation, Strain)]
Challenges • Cell only runs Linux and compiled C/C++ code • The PPE and the SPEs execute different instruction sets • Code for the PPE and for the SPEs is compiled with different compilers • Existing software is written in IDL • Peak searching is in IDL • Indexing and strain analysis are in IDL, calling external routines in Fortran • Challenges • No algorithm description is provided: the code has to be rewritten in C using only the IDL source code • Programming on Cell is new and challenging because of Cell's special architecture • Knowledge of assembly-level programming is needed • Only limited function libraries are available for Cell's SPEs
Development plan • Rewrite the code in C • Validate the results produced by the C code • by comparing them with results from the existing software • Make the code run on Cell's PPE • Design for parallel processing on Cell • Identify a strategy for parallel computation • Identify what should be executed on Cell's SPEs • Implement the design • Validate the results produced by Cell • Measure performance
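The validation step, comparing output of the new code against the existing software, might look like the sketch below. The peak record layout (x, y, intensity) and both tolerances are assumptions made for illustration; the slides do not say how results were compared.

```python
def peaks_match(reference, candidate, pos_tol=0.5, rel_tol=1e-3):
    """Check that two peak lists agree within tolerances.

    Each peak is an assumed (x, y, intensity) tuple. Every reference peak
    must pair with a distinct candidate peak whose position is within
    pos_tol pixels and whose intensity is within rel_tol (relative).
    """
    if len(reference) != len(candidate):
        return False
    unused = list(candidate)
    for rx, ry, ri in reference:
        for peak in unused:
            cx, cy, ci = peak
            if (abs(cx - rx) <= pos_tol and abs(cy - ry) <= pos_tol
                    and abs(ci - ri) <= rel_tol * abs(ri)):
                unused.remove(peak)  # each candidate may be used once
                break
        else:
            return False  # no candidate matched this reference peak
    return True

# Invented numbers standing in for IDL output and C output.
idl_peaks = [(10.0, 20.0, 1000.0), (50.2, 60.1, 250.0)]
c_peaks = [(50.3, 60.0, 250.1), (10.1, 19.9, 1000.5)]
print(peaks_match(idl_peaks, c_peaks))
```

A tolerance-based comparison is used rather than exact equality because a port from IDL to C can change floating-point rounding and the order in which peaks are reported.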
Progress Report • The peak searching and indexing procedures have been rewritten in C • Results produced by the C code for both procedures have been validated • at least with our limited data set • Peak searching has been ported to Cell successfully • Threshold finding and blob searching are carried out by the PPE • 2D curve fitting (Lorentz fitting) for each blob is carried out by the SPEs • The typical number of blobs found in each image is about 100 to 200, depending on the threshold setting • Some preliminary performance measurements have been done on the Cell system for the peak searching procedure
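The division of labour described above, one controller handing independent per-blob fits to several workers, can be mimicked on an ordinary machine. In this sketch a thread pool of 8 workers stands in for the 8 SPEs, the 2D Lorentzian form is one common parameterisation assumed here (the slides do not give the fitted function), and an intensity-weighted centroid stands in for the real Lorentz fit.

```python
from concurrent.futures import ThreadPoolExecutor

N_WORKERS = 8  # one worker per SPE of a Cell processor

def lorentzian_2d(x, y, amplitude, x0, y0, gamma_x, gamma_y):
    """Assumed 2D Lorentzian peak model."""
    return amplitude / (1.0 + ((x - x0) / gamma_x) ** 2
                        + ((y - y0) / gamma_y) ** 2)

def estimate_peak(blob):
    """Stand-in for the per-blob fit: weighted centroid and maximum."""
    total = sum(v for _, _, v in blob)
    x0 = sum(x * v for x, _, v in blob) / total
    y0 = sum(y * v for _, y, v in blob) / total
    return x0, y0, max(v for _, _, v in blob)

def fit_all(blobs, workers=N_WORKERS):
    """Controller role: distribute blobs to workers, collect in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(estimate_peak, blobs))

# Synthetic blob sampled from the model on a window centred on (5, 7).
blob = [(x, y, lorentzian_2d(x, y, 100.0, 5.0, 7.0, 1.5, 1.5))
        for x in range(3, 8) for y in range(5, 10)]
results = fit_all([blob] * 4)
print(results[0])
```

Because each blob is fitted independently, the 100 to 200 blobs of an image divide naturally among the workers with no communication between them.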
Some preliminary performance measurements (2) • Peak searching on CLS XRD data: 8 MB/image, 2084 x 2084 pixels/image • Desktop speed: 9.34 s/image
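To put the desktop figure in context, the quoted rate can be scaled to a whole experiment. The image count of 1000 is an assumption taken from the "hundreds or even thousands" range mentioned earlier, and the calculation covers peak searching only.

```python
DESKTOP_SECONDS_PER_IMAGE = 9.34  # desktop peak-searching speed from the slide
IMAGE_SIZE_MB = 8                 # image size from the slide

def total_hours(n_images, seconds_per_image=DESKTOP_SECONDS_PER_IMAGE):
    """Sequential processing time in hours for n_images."""
    return n_images * seconds_per_image / 3600

def throughput_mb_per_s(seconds_per_image=DESKTOP_SECONDS_PER_IMAGE,
                        image_size_mb=IMAGE_SIZE_MB):
    """Raw data rate the desktop can sustain for peak searching."""
    return image_size_mb / seconds_per_image

n_images = 1000  # assumed experiment size
print(f"{total_hours(n_images):.2f} h for {n_images} images")
print(f"{throughput_mb_per_s():.2f} MB/s sustained")
```

For real-time peak searching, images would have to be processed at least as fast as the detector produces them, so the per-image time is the number to compare against the acquisition interval.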
More work to do • Continue rewriting the strain analysis code in C • Port the indexing and strain analysis procedures to Cell • Design a programming model for Cell to achieve stream processing for all procedures in XRD data analysis • Implement the design • Integrate the XRD stream processing with Science Studio