Toward Parallel Space Radiation Analysis
Dr. Liwen Shih, Thomas K. Gederberg, Karthik Katikaneni, Ahmed Khan, Sergio J. Larrondo, Susan Strausser, Travis Gilbert, Victor Shum, Romeo Chua
University of Houston Clear Lake
shih@UHCL.edu
This project continues the Space Radiation Research work performed last year by Dr. Liwen Shih's students to investigate HZETRN code optimization options. This semester we will analyze the HZETRN code using standard static analysis and runtime analysis tools. In addition, we will examine code parallelization options for the most frequently called numerical method in the source code: the PHI function.
What is Space Radiation? Two major sources: • galactic cosmic rays (GCR) • solar energetic particles (SEP). GCR are ever-present and more energetic, and are thus able to penetrate much thicker materials than SEP. To evaluate space radiation risk and design spacecraft and habitats for better radiation protection, space radiation transport codes, which depend on the input physics of nuclear interactions, have been developed.
Space Radiation and the Earth. This image shows how the Earth's magnetic field causes electrons to drift one way around the Earth, while protons drift in the opposite direction. (Animation clips courtesy of Professor Patricia Reiff, Rice University, Connections Program.)
What about Galactic Cosmic Radiation (GCR)? A typical high-energy particle of radiation found in the space environment is itself ionized; as it passes through material such as human tissue, it disrupts the electron clouds of the constituent molecules and leaves a path of ionization in its wake. These particles are either singly charged protons or more highly charged nuclei called "HZE" particles.
HZETRN - Space Radiation Nuclear Transport Code
The three included source code files are:
1. NUCFRAG.FOR - generates nuclear absorption and reaction cross sections.
2. GEOMAG.FOR - defines the GCR transmission coefficient cutoff effects within the magnetosphere.
3. HZETRN.FOR - propagates the user-defined GCR environments through two layers of user-supplied materials. The current version is set up to propagate through aluminum, tissue (H2O), CH2 and LH2.
HZETRN: High Charge and Energy Nuclear Transport Code. FORTRAN-77. Written: 1992. Environment: VAX mainframe.
Code Metrics: Files: 3; Lines: 9665; Code Lines: 6803; Comment Lines: 2859; Declarative Statements: 780; Executable Statements: 6563; Comment/Code Ratio: 0.42
HZETRN Numerical Method
HZETRN Calculates: • Radiation fluence of HZE particles: the time-integrated flux of HZE particles per unit area. • Energy absorbed per gram: determined by first measuring the amount of energy deposited by the radiation in question, then the amount and type of material. • Dose equivalent: a standardized measure of the absorbed dose of any type of radiation in biological tissue.
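The dose-equivalent quantity above follows the standard radiological definition: absorbed dose scaled by a radiation quality factor Q. A minimal illustrative sketch (the function name and numbers are hypothetical, not from HZETRN):

```python
def dose_equivalent(absorbed_dose_gy, quality_factor):
    """Dose equivalent in sieverts: absorbed dose (Gy) scaled by the
    radiation quality factor Q (standard radiological definition)."""
    return absorbed_dose_gy * quality_factor

# HZE particles carry a high quality factor; for example,
# 0.01 Gy of Q = 20 radiation corresponds to 0.2 Sv.
print(dose_equivalent(0.01, 20))
```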
HZETRN Algorithm
HZETRN used for Mars Mission. NASA has a new vision for space exploration in the 21st century encompassing a broad range of human and robotic missions, including missions to the Moon, Mars and beyond. As a result, there is a focus on long-duration space missions. NASA, as much as ever, is committed to the safety of the missions and the crew. Exposure to the hazards of severe space radiation in deep-space, long-duration missions is 'the show stopper.' Thus, protection from the hazards of severe space radiation is of paramount importance for the new vision. There is an overwhelming emphasis on reliability issues for the mission and the habitat. Accurate risk assessments critically depend on the accuracy of the input information about the interaction of ions with materials, electronics and tissues.
Martian Radiation Climate Modeling Using HZETRN Code • Calculations of the skin dose equivalent for astronauts on the surface of Mars near solar minimum. • The variation in the dose with respect to altitude is shown. • Higher altitudes (such as Olympus Mons) offer less shielding. (Mars Radiation Environment source: Wilson et al., http://marie.jsc.nasa.gov)
HZETRN Model vs. Actual Mars Radiation Climate. HZETRN underestimates! Dose rates measured by the MARIE spacecraft during the transit period from April 2001 to August 2001 are compared with HZETRN-calculated doses. A spike in May is due to an SPE. Differences between the observed (red) and predicted (black) doses vary by a factor of 1 to 3, partly because of code inefficiency; the dose data are underestimated. (Graph source: Alenia Spazio, European Space Agency Report, 2004)
Project Goal: Speedup of Runtime via Analysis and Modification of the HZETRN Code Numerical Algorithm - the PHI Interpolation Function. The major space radiation code bottleneck lies inside the function call to the PHI interpolation function.
Code Optimization Options

      FUNCTION PHI(R0,N,R,P,X)
C
C     FUNCTION PHI INTERPOLATES IN P(N) ARRAY DEFINED OVER R(N) ARRAY
C     ASSUMES P IS LIKE A POWER OF R OVER SUBINTERVALS
C
      DIMENSION R(N),P(N)
C
      SAVE
C
      XT=X
      PHI=P(1)
      INC=((R(2)-R(1))/ABS(R(2)-R(1)))*1.01
      IF(X.LE.R(1).AND.R(1).LT.R(2))RETURN
C
      DO 1 I=3,N-1
      IL=I
      IF(XT*INC.LT.R(I)*INC)GO TO 2
    1 CONTINUE
C
      IL=N-1
    2 CONTINUE
      PHI=0.

• Fix inefficient code
• Fix/remove unnecessary function calls (TEXP), SAVE, and dummy arguments
• Use optimized ALOG function
• Use a lookup table instead
• Investigate parallelization of interpolation statements
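The DO 1 loop above locates the subinterval containing X by a linear scan. A minimal Python sketch of one alternative (a binary search, in the spirit of the lookup-table bullet) is shown below; the function name is hypothetical and this handles only ascending grids, whereas the real PHI also supports descending grids via the INC sign trick:

```python
from bisect import bisect_left

def find_interval(r, x):
    """Locate the subinterval of the ascending grid r that contains x.
    bisect_left is an O(log n) binary search, versus the O(n) linear
    scan of the DO 1 loop in PHI (illustrative sketch only)."""
    i = bisect_left(r, x)
    # Clamp to a valid interior index, as PHI clamps IL to N-1.
    return max(1, min(i, len(r) - 2))

r = [1.0, 2.0, 4.0, 8.0, 16.0]
print(find_interval(r, 5.0))   # -> 3 (5.0 lies between r[2]=4 and r[3]=8)
print(find_interval(r, 0.5))   # -> 1 (clamped below)
print(find_interval(r, 99.0))  # -> 3 (clamped above)
```

For the grid sizes HZETRN uses (tens of points), the win per call is small, but PHI is called hundreds of thousands of times per depth step, so shaving the search cost compounds.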
Code Optimization • Improve code structure • Use the faster ALOG function (LOG) • Remove extraneous function calls
Steps toward a faster HZETRN
Parallel Space Radiation Analysis • The goal of the project was to speed up the execution of the HZETRN code using parallel processing. • The Message Passing Interface (MPI) standard library was to be used to perform the parallel processing across a cluster with distributed memory.
Computing Resources Used • Itanium 2 cluster (Atlantis) - Texas Learning & Computation Center (TLC2) at the University of Houston. • Atlantis is a cluster of 152 dual-Itanium2 (1.3 GHz) compute nodes networked via a Myrinet 2000 interconnect, running RedHat Linux version 5.1. • The Intel Fortran compiler (version 10.0) and OpenMPI (an open-source MPI-2 implementation) are being used. • In addition, a home PC running Linux (Ubuntu 7.10) with the Sun Studio 12 Fortran 90 compiler and MPICH2 was used. • Use of TeraGrid has just begun.
PHI Routine (Lagrangian Interpolation) • Figure: HZETRN runtime profile. • Most time is spent in function PHI - 3rd-order Lagrangian interpolation. • The PHI function is heavily called by the propagation and integration routines - typically 229,380 times at each depth. • Early focus - optimizing the PHI routine. • The PHI routine takes the natural log of the input ordinates and abscissas prior to performing the Lagrangian interpolation, and returns the exponential of the interpolated ordinate. (Source: Shih, Larrondo, et al., High-Performance Martian Space Radiation Mapping, NASA/UHCL/UH-ISSO, pp. 121-122) • Removing the calls to the natural log and exponential functions resulted in a 21% (Atlantis) to 45% (home PC) speedup, but had a negative impact on numerical results (see next page), since the functions being interpolated are logarithmic.
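The log/exp wrapping matters because the quantities PHI interpolates behave like powers of R (as the code's own comment says): a power law is a straight line in log-log space, so a low-order polynomial fit there is far more accurate. A hedged Python sketch of the idea (function names are illustrative, not the HZETRN source):

```python
import math

def lagrange3(xs, ys, x):
    """Third-order (4-point) Lagrange interpolation, the core of PHI."""
    total = 0.0
    for i in range(4):
        term = ys[i]
        for j in range(4):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

def phi_like(r, p, x):
    """Log-log variant: interpolate in log space, then exponentiate,
    mirroring PHI's LOG/TEXP wrapping (sketch, not the HZETRN code)."""
    lr = [math.log(v) for v in r]
    lp = [math.log(v) for v in p]
    return math.exp(lagrange3(lr, lp, math.log(x)))

# A power law p = r**2.5 is exactly a line in log-log space, so the
# log-log interpolation reproduces it to rounding error; a direct
# cubic fit through the same widely spaced points would not.
r = [1.0, 2.0, 4.0, 8.0]
p = [v**2.5 for v in r]
print(phi_like(r, p, 3.0), 3.0**2.5)
```

This is why simply deleting the LOG/TEXP calls bought speed but corrupted the results: the interpolation then fits a cubic directly to logarithmically varying data.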
PHI Routine - Needs LOG/TEXP. Results differ significantly with and without the calls to LOG/TEXP.
PHI Routine Optimization • With the bottleneck PHI routine being called so heavily, the message-passing overhead to parallelize it would be prohibitive. • Simple code optimizations of the PHI routine resulted in: • an 11.4% speedup on the home PC running Linux, compiled with the Sun Studio 12 Fortran compiler; • a 3.85% speedup on an Atlantis node using the Intel Fortran compiler. • The reduced speedup on Atlantis may be because the Intel compiler was already generating more optimized code.
PHI Routine FPGA Prototype. Implementing the bottleneck routines - the PHI routine and/or the logarithm/exponential routines - in an FPGA could result in a significant speedup. A reduced-precision floating-point FPGA prototype was developed, with an estimated ~325 times faster PHI computation in hardware.
HZETRN Main Program Flow. Basic flow of HZETRN: • Step 1: Call MATTER to obtain the material properties (density, atomic weight and atomic number of each element) of the shield. • Step 2: Generate the energy grid. • Step 3: Dosimetry and propagation in the shield material: • Call DMETRIC to compute dosimetric quantities at the current depth. • Call PRPGT to propagate the GCRs to the next depth. • Repeat step 3 until the target material is reached. • Step 4: Dosimetry and propagation in the target material: • Call DMETRIC to compute dosimetric quantities at the current depth. • Call PRPGT to propagate the GCRs to the next depth. • Repeat step 4 until the required depth is reached.
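The four steps above can be sketched as a simple control-flow skeleton. This is a Python sketch of the call sequence only (the routine names come from the slide; the depth lists and bookkeeping are hypothetical, and no physics is modeled):

```python
def hzetrn_flow(shield_depths, target_depths):
    """Record the sequence of major routine calls in HZETRN's main
    program: MATTER, energy-grid setup, then DMETRIC/PRPGT per depth."""
    calls = []
    calls.append("MATTER")        # step 1: material properties of the shield
    calls.append("energy_grid")   # step 2: generate the energy grid
    for _ in shield_depths:       # step 3: dosimetry + propagation in shield
        calls.append("DMETRIC")
        calls.append("PRPGT")
    for _ in target_depths:       # step 4: dosimetry + propagation in target
        calls.append("DMETRIC")
        calls.append("PRPGT")
    return calls

calls = hzetrn_flow(shield_depths=range(2), target_depths=range(2))
print(calls.count("PRPGT"))  # -> 4 (one propagation per depth step)
```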
DMETRIC Routine • The subroutine DMETRIC is called by the main program at each user-specified depth in the shield and target to compute dosimetric quantities. • There are 6 main do-loops in the routine. Approximately 60% of DMETRIC's processing time is spent in loop 2 and 39% in loop 5. • To check whether a loop could be done in parallel, the order of the loop was reversed to test for data dependency. • The results were identical: there was no data dependency between the dosimetric calculations for each isotope.
DMETRIC Routine - Dependent? • To determine whether loop 5 is parallelizable, the outer loop was first changed to decrement from II to 1 rather than increment from 1 to II. The results were identical: the outer loop of loop 5 should be parallelizable. • Next, the inner loop was changed to decrement from IJ to 2 rather than increment from 2 to IJ. Differences appear in the last significant digit (see next page). These differences are due to floating-point rounding differences during four summations.
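The loop-reversal test used above can be sketched generically. A minimal Python illustration (the loop bodies are placeholders, not DMETRIC's actual arithmetic): independent iterations give identical results in either order, while a loop that reads its predecessor's result does not. Note the converse caution the slides themselves raise: identical results suggest but do not prove independence, since floating-point reordering can mask or mimic dependencies in the last digits.

```python
def run(order, body, n):
    """Execute the loop body over the given iteration order."""
    out = [0.0] * n
    for i in order:
        body(out, i)
    return out

def independent(out, i):
    # Each iteration writes only its own slot: order cannot matter.
    out[i] = float(i * i)

def dependent(out, i):
    # Each iteration reads its predecessor: order matters.
    out[i] = (out[i - 1] if i > 0 else 0.0) + 1.0

n = 5
fwd = run(range(n), independent, n)
rev = run(reversed(range(n)), independent, n)
print(fwd == rev)  # True: candidate for parallelization

fwd_d = run(range(n), dependent, n)
rev_d = run(reversed(range(n)), dependent, n)
print(fwd_d == rev_d)  # False: data dependency detected
```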
DMETRIC Routine - Not Dependent • Minor differences in results when changing the order of the inner loop of loop 5.
Parallel DMETRIC Routine • Since there is no data dependency in the dosimetric calculations for each of the 59 isotopes, these computations could be done in parallel. • Statements (using MPI's wall-time function, MPI_WTIME) were inserted to measure the amount of time spent in each subroutine. • Approximately 17% of the processing time is spent in subroutine DMETRIC, about 82% in subroutine PRPGT, and less than 1% in the remainder of the program. • Assuming infinite parallelization of DMETRIC, the maximum runtime reduction obtained would be up to 17%.
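The 17% bound is an instance of Amdahl's law: if only a fraction of the runtime is parallelizable, the serial remainder caps the overall gain. A small sketch (the 17% figure is from the timing data above; the worker count is arbitrary):

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: overall speedup when only parallel_fraction of the
    runtime benefits from parallel execution across `workers` workers."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

# DMETRIC is ~17% of runtime; even with effectively unlimited workers
# the whole-program speedup is bounded by 1 / (1 - 0.17) ~= 1.20x,
# i.e. at most a 17% runtime reduction.
print(amdahl_speedup(0.17, 10**9))
print(amdahl_speedup(0.17, 4))  # a realistic 4-way split gains less
```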
PRPGT Routine • PRPGT propagates GCRs through the shielding and the target. • ~82% of HZETRN processing time is spent in PRPGT or the routines it calls. • At each propagation step from one depth to the next in the shield or target, the propagation for each of the 59 isotopes is performed in two stages: • The first stage computes the energy shift due to propagation. • The second stage computes the attenuation and the secondary-particle production due to collisions. • To test whether the propagation for each of the 59 ions could be done in parallel, the loop was broken up into four pieces (a J loop from 20 to 30, from 1 to 19, from 41 to 59, and from 31 to 40). • If the loop can be performed in parallel, then the results from these four loops should be the same as the single loop from 1 to 59.
PRPGT Routine - Check Dependency • The following compares the results of breaking the main loop into four loops (on the left) with the original results. • Significantly different results demonstrate that the propagation cannot be parallelized across the 59 ions.
PRPGT Routine - Data Dependent • Reversing the inner 1st- and 2nd-stage I loops gave results identical to the original, so it is possible to parallelize within the 1st or the 2nd stage. • However, to test data dependence from the 1st stage to the 2nd stage, the main J loop was divided into two loops (one for the 1st stage and one for the 2nd stage). • The results changed: the 2nd stage is dependent on the 1st stage. • A barrier is needed to prevent execution of the 2nd stage until the 1st stage completes. • 24% of HZETRN processing is spent on the 1st stage while less than 2% is spent on the 2nd stage; therefore, parallel processing of both stages does not appear worthwhile.
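The barrier pattern described above can be sketched as follows. This is an illustrative Python threading sketch, not MPI and not HZETRN: the stage functions use placeholder arithmetic, and the point is that a stage-1/stage-2 boundary is an implicit barrier when each stage is parallelized internally.

```python
from concurrent.futures import ThreadPoolExecutor

N_IONS = 59

def stage1(ion):
    # Energy shift due to propagation (placeholder arithmetic only).
    return float(ion) * 0.5

def stage2(ion, shifted):
    # Attenuation / secondary production; reads the stage-1 result.
    return shifted[ion] + 1.0

with ThreadPoolExecutor() as pool:
    # Stage 1 for all ions runs in parallel: no cross-ion dependency
    # within a stage, per the loop-reversal tests on the slides.
    shifted = list(pool.map(stage1, range(N_IONS)))
    # Implicit barrier: pool.map returns only when every stage-1 task
    # has finished, so stage 2 never reads an unfinished result.
    final = list(pool.map(lambda i: stage2(i, shifted), range(N_IONS)))

print(final[0], final[58])
```

In an MPI version the same role would be played by an explicit synchronization (e.g. MPI_Barrier) between the two stages.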
Parallel PRPLI Routine • PRPLI is called by PRPGT after the 1st- and 2nd-stage propagation has been completed for each of the 59 isotopes. • PRPLI performs the propagation of the six light ions (ions with Z < 5). • ~53% of total HZETRN time is spent on light-ion propagation. • PRPLI propagates a 45 x 6 fluence (number of particles intersecting a unit area) matrix named PSI (45 energy points for each of the 6 light ions). • Analysis has shown that there is no data dependency among the energy grid points. • It should, therefore, be possible to parallelize the PRPLI code across the 45 energy grid points.
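Distributing the 45 energy grid points across MPI ranks needs a domain decomposition. A hedged sketch of one standard scheme, near-equal contiguous chunks (the function is hypothetical; it computes index ranges only and performs no message passing):

```python
def partition(n_points, n_ranks):
    """Split n_points grid indices into near-equal contiguous chunks,
    one per rank; the first (n_points % n_ranks) ranks get one extra."""
    base, extra = divmod(n_points, n_ranks)
    chunks, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

# 45 energy points over 4 ranks: chunk sizes 12, 11, 11, 11.
chunks = partition(45, 4)
print([len(c) for c in chunks])
```

Each rank would propagate its slice of the 45 x 6 PSI matrix independently, with a gather at the end of the depth step.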
General HZETRN Recommendations • Arrays in Fortran are stored in column order; it is more efficient to access them in column order rather than row order. • HZETRN uses the old Fortran technique of alternate entry points; their use is discouraged. • HZETRN uses COMMON blocks for global memory; Fortran-90 MODULEs should be used instead.
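The column-order recommendation follows from how Fortran lays out a 2-D array in memory. A small Python sketch of the linear offset of element A(i, j) in column-major storage (the helper is illustrative, using Fortran's 1-based indexing):

```python
def fortran_index(i, j, nrows):
    """Linear memory offset of A(i, j) in Fortran's column-major
    storage, with 1-based indices as in Fortran."""
    return (i - 1) + (j - 1) * nrows

# Walking down a column touches consecutive memory locations...
col_walk = [fortran_index(i, 1, nrows=3) for i in (1, 2, 3)]
# ...while walking across a row strides by nrows, defeating the cache.
row_walk = [fortran_index(1, j, nrows=3) for j in (1, 2, 3)]
print(col_walk, row_walk)  # [0, 1, 2] vs [0, 3, 6]
```

So the inner loop of a nested pair should run over the first (row) index, keeping memory accesses unit-stride.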
Conclusions & Future Work • The performance of HZETRN, written in Fortran 77 in the early 1990s, can be improved via simple code optimizations and parallel processing using MPI. • A maximum 50% speedup of the current HZETRN is expected. • Additional performance improvements could be obtained by implementing the 3rd-order Lagrangian interpolation routine (PHI), or the natural log (LOG) and exponential (TEXP) functions, on an FPGA.
References • J.W. Wilson, F.F. Badavi, F.A. Cucinotta, J.L. Shinn, G.D. Badhwar, R. Silberberg, C.H. Tsao, L.W. Townsend, R.K. Tripathi, HZETRN: Description of a Free-Space Ion and Nucleon Transport Shielding Computer Program, NASA Technical Paper 3495, May 1995. • J.W. Wilson, J.L. Shinn, R.C. Singleterry, H. Tai, S.A. Thibeault, L.C. Simmons, Improved Spacecraft Materials for Radiation Shielding, NASA Langley Research Center. spacesciene.spaceref.com/colloquia/mmsm/wilson_pos.pdf • NASA Facts: Understanding Space Radiation, FS-2002-10-080-JSC, October 2002. • P.S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers Inc.: San Francisco, 1997. • S.J. Chapman, Fortran 90/95 for Scientists and Engineers, 2nd edition. McGraw Hill: New York, 2004. • L. Shih, S. Larrondo, K. Katikaneni, A. Khan, T. Gilbert, S. Kodali, A. Kadari, High-Performance Martian Space Radiation Mapping, NASA/UHCL/UH-ISSO, pp. 121-122. • L. Shih, Efficient Space Radiation Computation with Parallel FPGA, Y2006 - ISSO Annual Report, pp. 56-61. • T. Gilbert and L. Shih, "High-Performance Martian Space Radiation Mapping," IEEE/ACM/UHCL Computer Application Conference, University of Houston-Clear Lake, Houston, TX, April 29, 2005. • A. Kadari, S. Kodali, T. Gilbert, and L. Shih, "Space Radiation Analysis with FPGA," IEEE/ACM/UHCL Computer Application Conference, University of Houston-Clear Lake, Houston, TX, April 29, 2005. • F.A. Cucinotta, "Space Radiation Biology," NASA-M.D. Anderson Cancer Center Mini-Retreat, Jan. 25, 2002 <http://advtech.jsc.nasa.gov/presentation_portal.shtm>. • Space Radiation Health Project, NASA-JSC, March 7, 2005 <http://srhp.jsc.nasa.gov/>
Acknowledgements • NASA LaRC - Robert C. Singleterry Jr., PhD • NASA JSC/CARR PVA&M - Premkumar B. Saganti, PhD • TeraGrid, TACC • TLC2 - Mark Huang & Erik Engquist • Texas Space Grant Consortium ISSO. Thank You! shih@UHCL.edu