PreDatA -- Preparatory Data Analytics on Peta-Scale Machines
Fang Zheng, Hasan Abbasi, Jianting Cao, Jai Dayal, Jay Lofstead, Karsten Schwan, Matthew Wolf
CERCS Center, Georgia Tech
Qing Liu, Scott Klasky, Norbert Podhorszki
Oak Ridge National Laboratory
Ciprian Docan, Manish Parashar
Rutgers University
Background
• “Big Data” problem for Peta-scale scientific applications
• Scientists desire:
  • Faster I/O
  • Faster data analysis
Preparatory Data Analytics
• Simulation output data needs to be prepared/pre-analyzed:
  • Indexing, annotation, reduction, sorting, layout re-organization, etc. to speed up future analysis and visualization
  • Latent data characterization for validation and monitoring
• Preparatory data analytics can be critical for end-to-end performance of computational science discoveries
• Needle hasn’t grown as fast as the haystack!
[Figure label: Big Data]
Problem
• How to do preparatory data analytics?
  • Scalable
  • Efficient
• Conventional Approaches: In-compute-node vs. Offline
[Figure: the two conventional approaches. Legend: CN = Compute Node, S = Simulation, F = Pre-analytics; each diagram also shows Storage.]
PreDatA Middleware
[Figure: PreDatA middleware. Labels: Simulation (S) on compute nodes (CN), Staging Area (CN) running pre-analytics (F), Storage.]
PreDatA Architecture
• Asynchronous data movement with Datatap/EVPath
• Pluggable pre-data analytics
• User-defined operations
• Higher-level Data Services
• Integrated operations, separated from application codes with ADIOS
[Figure: architecture diagram. Labels: Staging node, Compute node, Application, Data Operation, High Level Data Service, ADIOS High-level Abstraction, Data Operation, Buffer Management, Task Execution, Data Extraction, Data Movement, Data Shuffling.]
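As a rough illustration of the idea on this slide (the application hands data off asynchronously, and pluggable user-defined operations run on the staging side), here is a minimal Python sketch. It is not the PreDatA, Datatap, EVPath, or ADIOS API: the `StagingArea` class and its `register`, `write`, and `close` methods are hypothetical names, and a thread with an in-memory queue stands in for a separate staging node.

```python
import queue
import threading

class StagingArea:
    """Hypothetical stand-in for a staging node: buffers chunks and runs plug-ins."""

    def __init__(self):
        self._buffer = queue.Queue()   # buffer management
        self._operations = []          # pluggable, user-defined operations
        self.results = []
        self._worker = threading.Thread(target=self._run)

    def register(self, operation):
        """Plug in an operation; it is called once per chunk."""
        self._operations.append(operation)

    def start(self):
        self._worker.start()

    def write(self, chunk):
        """Called by the simulation; returns without waiting for processing."""
        self._buffer.put(chunk)

    def close(self):
        """Signal end of output and wait for the staging side to drain."""
        self._buffer.put(None)
        self._worker.join()

    def _run(self):
        while True:
            chunk = self._buffer.get()
            if chunk is None:
                break
            self.results.append([op(chunk) for op in self._operations])

staging = StagingArea()
staging.register(min)
staging.register(max)
staging.start()
for step in range(3):                  # the "simulation" emitting output
    staging.write([step, step + 1, step + 2])
staging.close()
print(staging.results)                 # [[0, 2], [1, 3], [2, 4]]
```

The point of the sketch is only the separation of concerns: the writer never calls the operations itself, so operations can be added or swapped without touching the simulation loop.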
Driver Applications
• GTC (Gyrokinetic Toroidal Code)
• Output: 16384 cores output 260 GB every 120 seconds
• Pre-analytics:
[Figure: pre-analytics pipeline. Labels: Particle array, Sort, sorted array, BP writer, BP file, Bitmap Indexing, Index file, Histogram, Plotter, 2D Histogram, Plotter.]
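To make the pre-analytics steps named on this slide concrete (sort, histogram, bitmap indexing), here is a small pure-Python sketch on made-up data. The particle records, the field chosen for binning, and all function names are assumptions for illustration; the slide does not specify GTC's particle layout or the index format PreDatA writes.

```python
def bin_of(value, lo, hi, nbins):
    """Map a value in [lo, hi] to a bin number in 0..nbins-1."""
    width = (hi - lo) / nbins
    return min(int((value - lo) / width), nbins - 1)

def histogram(values, lo, hi, nbins):
    """Count how many values fall in each equal-width bin."""
    counts = [0] * nbins
    for v in values:
        counts[bin_of(v, lo, hi, nbins)] += 1
    return counts

def bitmap_index(values, lo, hi, nbins):
    """One bitmap (a Python int) per bin; bit r is set if row r falls in that bin."""
    bitmaps = [0] * nbins
    for row, v in enumerate(values):
        bitmaps[bin_of(v, lo, hi, nbins)] |= 1 << row
    return bitmaps

def rows_in_bins(bitmaps, bins, nrows):
    """Answer a range query by OR-ing the bitmaps of the selected bins."""
    combined = 0
    for b in bins:
        combined |= bitmaps[b]
    return [r for r in range(nrows) if combined >> r & 1]

# Hypothetical particle records: (particle_id, energy), arriving unsorted.
particles = [(3, 0.9), (1, 0.1), (2, 0.7), (0, 0.6)]

sorted_particles = sorted(particles)               # sort by particle id
energies = [e for _, e in sorted_particles]        # [0.6, 0.1, 0.7, 0.9]
counts = histogram(energies, 0.0, 1.0, 4)          # [1, 0, 2, 1]
index = bitmap_index(energies, 0.0, 1.0, 4)
high = rows_in_bins(index, [2, 3], len(energies))  # rows with energy >= 0.5
print(counts, high)                                # [1, 0, 2, 1] [0, 2, 3]
```

A later range query reads only the bitmaps for the bins of interest instead of scanning every particle, which is the kind of speed-up for future analysis that the indexing step is meant to provide.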
Driver Applications (Cont.)
• GTC@JaguarPF
• Performance & Cost
  • 98 CPU hours saved in a 30-minute run
  • 1,716,960 CPU hours saved in a year!
  • 1.2~3% improvement in cost (CPU seconds)
  • CPU Seconds = Total Simulation Time x Total Number of Cores Used
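The yearly figure follows from the per-run figure if 30-minute runs execute back to back for a full 365-day year; that schedule is an assumption, since the slide does not state it. The sketch below reproduces the arithmetic and also checks the lower end of the quoted cost improvement under a second assumption: that the 30-minute run used the 16384 cores listed for GTC.

```python
# Figures quoted on the slide.
CPU_HOURS_SAVED_PER_RUN = 98       # saved in one 30-minute GTC run
RUN_LENGTH_HOURS = 0.5

# Assumption (not stated on the slide): runs execute back to back all year.
HOURS_PER_YEAR = 365 * 24

def cpu_seconds(total_simulation_time_s, cores_used):
    """Cost metric from the slide: simulation time x number of cores used."""
    return total_simulation_time_s * cores_used

runs_per_year = int(HOURS_PER_YEAR / RUN_LENGTH_HOURS)
cpu_hours_saved_per_year = CPU_HOURS_SAVED_PER_RUN * runs_per_year
print(runs_per_year)               # 17520
print(cpu_hours_saved_per_year)    # 1716960

# Assumption: the 30-minute run used 16384 cores (the GTC core count).
run_cost_cpu_hours = cpu_seconds(30 * 60, 16384) / 3600
saving_percent = 100 * CPU_HOURS_SAVED_PER_RUN / run_cost_cpu_hours
print(run_cost_cpu_hours)          # 8192.0
print(round(saving_percent, 1))    # 1.2
```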
Driver Applications (Cont.)
• Pixie3D (3D MHD code)
• Output: 16384 cores, 32 GB / 100 seconds
• 3D domain decomposition
• Pre-analytics: diagnostics + layout re-organization
• 10x read performance improvement through layout re-organization
[Figure: pre-analytics pipeline. Labels: Output Data, Diagnostics (Particle Diag., Toroidal flux Diag., Momentum Diag., Velocity divergence Diag., Energy Diag., Growth rate Diag., …, Current Diag., Maximum velocity Diag.), Visualization, Layout Re-organization, BP writer, BP file.]
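One way to picture layout re-organization under a 3D domain decomposition: each process holds a sub-block of the global array, and the staging side merges the sub-blocks into one contiguous row-major array so that a later reader can fetch a region with few large reads. The sketch below is a generic illustration in pure Python; the chunk description `(offset, shape, data)` and the function name are assumptions, not the BP file format or PreDatA's actual re-organization code.

```python
def merge_chunks(global_shape, chunks):
    """Merge per-process sub-blocks into one row-major global array.

    chunks: list of (offset, shape, data), where data is the sub-block
    flattened in row-major order and offset is its position in the global array.
    """
    gx, gy, gz = global_shape
    out = [None] * (gx * gy * gz)
    for (ox, oy, oz), (sx, sy, sz), data in chunks:
        for i in range(sx):
            for j in range(sy):
                src = (i * sy + j) * sz                       # row start in the chunk
                dst = ((ox + i) * gy + (oy + j)) * gz + oz    # row start globally
                out[dst:dst + sz] = data[src:src + sz]        # copy one contiguous row
    return out

# A 2x2x2 global array split along y between two hypothetical processes.
# Each value equals its global row-major index, so the merged result is 0..7.
chunk_a = ((0, 0, 0), (2, 1, 2), [0, 1, 4, 5])
chunk_b = ((0, 1, 0), (2, 1, 2), [2, 3, 6, 7])
merged = merge_chunks((2, 2, 2), [chunk_a, chunk_b])
print(merged)   # [0, 1, 2, 3, 4, 5, 6, 7]
```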
Current Work
• Programming Interface/Runtime system to enable In-situ Workflow in Staging Area
  • A collection of analysis operations organized as workflow
  • Use ADIOS as coupling interface
  • Treat analysis operations as black box
• Runtime system:
  • Workflow scheduling
  • Data movement
  • Layout re-distribution
  • Fault tolerance
• Integration with Deep analysis tools (Hadoop, Paraview/Visit)
• Work with real-world applications
  • Pixie3D, GTC, GTS, LAMMPS, S3D
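A workflow of black-box analysis operations can be sketched as a dependency graph that a scheduler walks in topological order. The example below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It only illustrates the scheduling idea: the `run_workflow` function and the operation names are hypothetical, and the ADIOS coupling, data movement, and fault tolerance listed on the slide are not modeled.

```python
from graphlib import TopologicalSorter

def run_workflow(operations, dependencies, source):
    """Run black-box operations in dependency order.

    operations:   name -> callable (its internals are never inspected)
    dependencies: name -> list of upstream names whose outputs it consumes
    source:       data fed in under the reserved name "input"
    """
    results = {"input": source}
    for name in TopologicalSorter(dependencies).static_order():
        if name == "input":
            continue
        inputs = [results[dep] for dep in dependencies[name]]
        results[name] = operations[name](*inputs)
    return results

# Hypothetical three-operation workflow.
operations = {
    "sort": sorted,
    "minmax": lambda xs: (xs[0], xs[-1]),   # expects sorted input
    "count": len,
}
dependencies = {
    "sort": ["input"],
    "minmax": ["sort"],
    "count": ["input"],
}
results = run_workflow(operations, dependencies, [3, 1, 2])
print(results["sort"], results["minmax"], results["count"])   # [1, 2, 3] (1, 3) 3
```

Because each operation is just a callable with declared inputs, the scheduler needs no knowledge of what an operation computes, which matches the black-box treatment described on the slide.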