260 likes | 405 Views
The Parallel Models of Coronal Polarization Brightness Calculation. Jiang Wenqian. Outline. Introduction pB Calculation Formula Serial pB Calculation Process Parallel pB Calculation Models Conclusion. Part Ⅰ. Introduction.
E N D
The Parallel Models of Coronal Polarization Brightness Calculation Jiang Wenqian
Outline • Introduction • pB Calculation Formula • Serial pB Calculation Process • Parallel pB Calculation Models • Conclusion
Part Ⅰ. Introduction • Space weather forecast needs an accurate solar wind model for the solar atmosphere and the interplanetary space. The global model of corona and heliosphere is the basis of numerical space weather forecast, and the observation basis of explaining various relevant relations. • Meanwhile, three-dimensional numerical Magnetohydrodynamics (MHD) simulation is one of the most common numerical methods to study corona and solar wind.
Part Ⅰ. Introduction • Besides, calculating and converting the generated coronal electron density to the coronal polarization brightness (pB) is the key method of comparing with observation results, and is important to validate the MHD models. • Due to the massive data and the complexity of the pB model, the computation will cost too much time to visualize the pB data in nearly real time while using a single CPU (or core).
Part Ⅰ. Introduction • According to the characteristic of CPU/GPU computing environment, we analyze the pB conversion algorithm, implement two parallel models of pB calculation with MPI and CUDA, and compares the two models’ efficiency.
Part Ⅱ. pB Calculation Formula • pB is derived from electron-scattered photosphere radiation. It can be used in the inversion of coronal electron density and to validate numerical models. Taking limb darkening into account, pB calculation formula of a small coronal volume element is shown as followed : (1) (2) (3)
Part Ⅱ. pB Calculation Formula • The polarization brightness image for comparing with the observation of coronagraph can be generated through integrating the electron density along the line of sight. Density integral Process of pB Calculation
Part Ⅲ. Serial pB Calculation Process • The steps of the serial model of pB calculation on CPU with the experimental data are shown as below. The serial process of pB calculation
Part Ⅲ. Serial pB Calculation Process • According to the serial process of pB calculation above, we implement it under the environment of G95 on Linux and Visual Studio 2005 on Windows XP respectively. • With being measured the time cost of each step, it is found that the most time-consuming part of the whole program is the calculation of pB values, accounting for 98.05% and 99.05% of the total time cost respectively.
Part Ⅲ. Serial pB Calculation Process • Therefore, in order to improve the performance to meet the command of getting coronal polarization brightness in nearly real-time, we should optimize the calculation part of pB values. • As the density integration of each point over solar limb along the line of sight is independent, the parallel computation method is very suitable for pB calculation.
Part Ⅳ. Parallel pB Calculation Models • Currently, parallelized MHD numerical calculation is mainly based on MPI. • With the development of high performance computation, using GPU architecture to solve intensive computation shows obvious advantages. • Based on this situation, it will be an efficient parallel solution to implement the parallel MHD numerical calculation using GPU. • We implement two parallel models based on MPI and CUDA respectively.
Part Ⅳ. Parallel pB Calculation Models • Experiment Environment • Experimental Data • 42×42×82(r, θ, φ) density data(den) • 321×321×481(x , y, z) cartesian coordinate grid • 321×321 pB values will be generated. • Hardware • Intel(R) Xeon(R) CPU, E5405 @ 2.00GHz(8 CPUs) • 1GB memory • NVIDIA Quadro FX 4600 GPU, 760MB Global Memory GDDR3 SDRAM graphics card (It owns G80 kernel architecture, 12 MPs and 128 SPs )
Part Ⅳ. Parallel pB Calculation Models • Experiment Environment • Compiling Environment • CUDA-based parallel model • Visual Studio 2005 on Windows XP • CUDA 1.1 SDK • MPI-based parallel model • G95 on Linux • MPICH2
Part Ⅳ. Parallel pB Calculation Models • MPI-based Parallelized Implementation • In the MPI environment, how the experiment decomposes computing domain into sub-domains is shown as bellow.
Part Ⅳ. Parallel pB Calculation Models • MPI-based Parallelized Implementation
Part Ⅳ. Parallel pB Calculation Models • MPI-based Parallelized Implementation • The final result shows that MPI-based parallel model reaches a speedup of 5.8. As the experiment is implemented under the platform with 8 CPU cores, the speed-up ratio of the result is closed to its theoretical value. • Meanwhile, it is revealed that the MPI-based parallel solution for the experiment has balanced the utilization ratio of processors and the communication between processors.
Part Ⅳ. Parallel pB Calculation Models • CUDA-based Parallelized Implementation • According to pB serial calculation process and the CUDA architecture, we should put the calculation part into the Kernel function to implement the parallel program. • Since the calculation of density interpolation and the cumulative sum involved in every pB value are independent, we can use multi-threads to process the pB value calculation in the CUDA, and each thread calculates one pB value.
Part Ⅳ. Parallel pB Calculation Models • CUDA-based Parallelized Implementation • However, the pB values to be calculated is much larger than the available thread number of GPU, so each thread should calculate multiple pB values. According to experimental conditions, the thread number is setting to 256 for each block so as to maximize the use of computing resources. • The block number depends on the ratio of pB number and thread number. In addition, since the access time of global memory is large, we can put some independent data to the shared memory to reduce data access time.
Part Ⅳ. Parallel pB Calculation Models • CUDA-based Parallelized Implementation • The size of data put into shared memory is about 7KB, less than 16KB provided by GPU, so the parallel solution is feasible. • Moreover, the data-length array is read-only and its using frequency is very high, so the optimized strategy that the data-length array is migrated from shared memory into constant memory is adopted to further improve its access efficiency. • The CUDA-based parallel calculation process is shown as bellow.
Part Ⅳ. Parallel pB Calculation Models • Experiment results • The pB calculation time of two models is shown in Table 1. Table 1. The pB calculation time of serial models and parallel models and their speed-up ratio
Part Ⅳ. Parallel pB Calculation Models • Experiment results • The total performance of two models is as shown in Table 2. Table 2. The total running-time of two parallel models and the speed-up ratios compared with their serial models
Part Ⅳ. Parallel pB Calculation Models • Experiment results • Finally, we draw the coronal polarization brightness image shown as bellow with using calculated data.
Conclusion • Under the same environment, pB calculation time of MPI-based parallel model costs 5.053 seconds while the serial model costs 32.403 seconds. The model’s speedup is 6.41. • The pB calculation time of CUDA-based parallel model costs 1.536 seconds while the serial model costs 48.936 seconds. The model’s speedup is 31.86. • The total running-time of CUDA-based model is 2.84 times than that of MPI-based model.
Conclusion • It finds that the CUDA-based parallel model is more suitable for pB calculation, and it provides a better solution for post-processing and visualizing the MHD numerical calculation results.