310 likes | 416 Views
Integration of Reverse Monte-Carlo Ray Tracing within Uintah. Todd Harman Department of Mechanical Engineering Jeremy Thornock Department of Chemical Engineering. Isaac Hunsaker. Graduate Student Department of Chemical Engineering. Deliverables.
E N D
Integration of Reverse Monte-Carlo Ray Tracing within Uintah Todd Harman Department of Mechanical Engineering Jeremy Thornock Department of Chemical Engineering Isaac Hunsaker Graduate Student Department of Chemical Engineering
Deliverables • Year 2: Demonstration of a fully-coupled problem using RMCRT within ARCHES. Scalability demonstration.
Approach CFD: Finest level, (always) RMCRT: 1 Level: CFD 2 Level: coarsest level “Data Onion”: finest level, Research Topic: Region of Interest (ROI)
2 Levels 2 Levels
Data Onion 3-Levels
Data Onion: ROI Implemented Research Topic: ROI location? Static: • User defined region?
Data Onion: ROI Implemented Research Topic: ROI location Dynamic: • ROI computed every timestep? (abskg sigmaT4) • ROI proportional to the size of fine level patches?
Status: Completed • 80% Complete: Data Onion, dynamic & static region of interests. Testing phase, need benchmarks. • 90% Complete: Integration of RMCRT tasks within ARCHES (2 level)
Status: Work in Progress • Single Level Verification Order of accuracy # rays (old) grid resolution Scalability studies, new mixed scheduler. • 2 Levels verification Errors associated with coarsening
Benchmark Problem Initial Conditions: - Uniform temperature field - Analytical function for absorption coefficient S. P. Burns and M.A Christon. Spatial domain-based parallelism in large-scale, participating-media, radiative transport applications. Numerical Heat Transfer, Part B, 31(4):401-421, 1997.
Verification: 1L S. P. Burns and M.A Christon. Spatial domain-based parallelism in large-scale, participating-media, radiative transport applications. Numerical Heat Transfer, Part B, 31(4):401-421, 1997.
Verification: 2L 4X error from coarsening abskg
Verification: 2L Error Coarsening: smoothing filter Abskg
Collaboration Leverage the work of Dr. Berzin’s team Hybrid MPI-threaded Task Scheduler (Qingyu Meng) GPU-RMCRT (Alan Humphrey)
Hybrid MPI-threaded Task Scheduler • Hybrid MPI-threaded Task Scheduler*: • Memory reduction! • 13.5Gb -> 1GBper node (12 cores/node)*. • (2 material CFD problem, 20483 cells, on 110592 cores of Jaguar) • Interconnect drivers and MPI software must be threadsafe. • RMCRT requires anMPI environmental variable expert! *Q. Meng, M. Berzins, and J. Schmidt, Using hybrid parallelism to improve memory use in uintah. In Proceeding of the Teragrid 2011.
MPI-threaded Task Scheduler Kraken 100 rays per cell
MPI-threaded Task Scheduler Difficult to run on Kraken, crashing in mvapich Further testing needed on bigger machines?
GPU-RMCRT Nvidia M2070/90 Tesla GPU Keeneland Initial Delivery System 360 GPUs DoE Titan 1000s of GPUs + Multi-core CPU Motivation - Utilize all available hardware • Uintah’s asynchronous task-based approach is well suited to take advantage of GPUs • RMCRT is ideal for GPUs
GPU-RMCRT • Offload Ray Tracing and RNG to GPU(s) • Available CPU cores can perform other computation. • Uintah infrastructure supports GPU task scheduling and execution: • Can access multiple GPUs on-node • Uses Nvidia CUDA C/C++ • Using NVIDIA cuRAND Library • GPU-accelerated random number generation (RNG)
Uintah Hybrid CPU/GPU Scheduler • Create & schedule CPU & GPU tasks • Enables Uintah to “pre-fetch” GPU data • Uintah infrastructure manages: • Queues of CUDA Stream and Event handles • Device memory allocation and transfers • Utilize all available: CPU cores and GPUs
Uintah GPU Scheduler Abilities • Capability jobs run on: Keeneland Initial Delivery System (NICS) 1440 CPU cores & 360 GPUs simultaneously Jaguar - GPU partition (OLCF) 15360 CPU cores & 960 GPUs simultaneousl • Development of GPU RMCRT prototype underway.
Status: Pending • Head-to-head comparison of RMCRT with Discrete Ordinates Method. Single level. Accuracy versus computational cost. • 2 Levels: Coarsening error for variable temperature and radiative properties. • Data Onion: Serial performance Accuracy versus number of levels, refinement ratio, dynamic/static ROI. Scalability Studies
Summary Summary • Order of Accuracy: # rays0.5 , grid Cells1 • Accuracy issues related to coarsening data. • Cost = f( #rays, Grid Cells1.4-1.5 communication….) Doubling the grid resolution = 20ish X increase in cost. • Good scalability characteristics Year 2: Demonstration of a fully-coupled problem using RMCRT within ARCHES. Scalability demonstration.
GPU RMCRT Acknowledgements: DoE for funding the CSAFE project from 1997-2012, DOE NETL, DOE NNSA, INCITE NSF for funding via SDCI and PetaApps Keeneland Computing Facility, supported by NSF under Contract OCI-0910735 Oak Ridge Leadership Computing Facility – DoE Jaguar XK6 System (GPU partition) http://www.uintah.utah.edu
Physics Isotropic scattering added to the model Verification testing performed using an exact solution (Siegel, 1987) Grid convergence analysis performed Discrepancy diminishes with increased mesh refinement
Isotropic Scattering:Verification • Benchmark Case of Seigel 1987 • Cube (1m3) • Uniform Temperature 64.7K • Mirror surface on all sides • Black top and bottom walls • Computed surface fluxes on top & bottom walls • 10 rays per cell (low) Seigel, R. “Transient Radiative Cooling of a droplet-filled layer,” ASME Journal of Heat Transfer,109:159-164, 1987.
Isotropic Scattering:Verification Radiative Flux vs Optical Thickness RMCRT (dots) Exact solution (lines) Seigel, R. “Transient Radiative Cooling of a droplet-filled layer,” ASME Journal of Heat Transfer,109:159-164, 1987.
Isotropic Scattering:Verification Grid convergence of the L1 error norms where the scattering coefficient is 8 m-1, and the absorption coefficient is 2m-1.
DOM vs RMCRT • IFRF burner simulation (production size run) • 1344 processors/cores • Initial conditions taken from a previous run with DOM. • Domain: (1m x 4.11 m x 1m) • Resolution: (4.4mm x 8.8mm x 4.4mm) 24 million cells