Data-driven Query Processing for Immersive Computational Turbulence
Kalin Kanov
Department of Computer Science • Johns Hopkins University
The Big Picture
• Scientific disciplines have developed a computational branch
• Models without closed-form solutions are solved numerically
• This has led to an explosion of data
• Simulation and analysis workloads are data-intensive
  • Producing/scanning large amounts of data
• Management of these data represents a significant challenge
  • Storage/archiving
  • Query processing
  • Visualization
Remote Immersive Analysis
• Formerly, analysis was performed during the computation
  • No data were stored for subsequent examination
• Breakthroughs in data-intensive computing have enabled new ways of interacting with scientific numerical simulations
• Turbulence Database Cluster
  • Stores the entire space-time evolution of the simulation
  • Provides public access to world-class simulations
  • Implements the “immersive turbulence*” approach
  • Introduces new challenges
*E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.
Goals
• Develop data-driven query processing techniques
  • Reduce I/O and computation costs
  • Reduce or eliminate storage overhead
  • Exploit domain knowledge and structure
• Provide user interfaces that are efficient and flexible
• Streamline the process of data ingest
Processing a Batch Query
Figure: a 4×4 grid of data atoms numbered 0 to 15 in Z-order, overlapped by three queries; query 1 needs atoms 0, 1, 2, 3, 4, 6, 8, 9, 12; query 2 needs atoms 9, 11, 12, 14; query 3 needs atoms 4, 5, 6, 7. Evaluated one query at a time:
• Redundant I/O
• Multiple disk seeks
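To make the cost on this slide concrete, the sketch below replays the three example queries (q1: 0, 1, 2, 3, 4, 6, 8, 9, 12; q2: 9, 11, 12, 14; q3: 4, 5, 6, 7) one at a time and counts atom reads and seeks. It assumes that every atom read is one I/O, that nothing is cached between queries, and that a seek occurs whenever the next atom read does not immediately follow the previous one in atom order; `per_query_reads` and `count_seeks` are hypothetical names.

```python
# Atom sets needed by each query in the batch-query example.
QUERIES = {
    "q1": [0, 1, 2, 3, 4, 6, 8, 9, 12],
    "q2": [9, 11, 12, 14],
    "q3": [4, 5, 6, 7],
}


def per_query_reads(queries):
    """Return the atom read sequence when each query is evaluated on its own.

    Assumes no caching between queries, so an atom shared by several
    queries is read once per query that needs it.
    """
    reads = []
    for atoms in queries.values():
        reads.extend(sorted(atoms))
    return reads


def count_seeks(reads):
    """Count reads that do not continue from the previous atom (a disk seek).

    The first read always counts as a seek.
    """
    seeks = 0
    previous = None
    for atom in reads:
        if previous is None or atom != previous + 1:
            seeks += 1
        previous = atom
    return seeks


if __name__ == "__main__":
    reads = per_query_reads(QUERIES)
    print("atoms read:", len(reads))
    print("distinct atoms:", len(set(reads)))
    print("seeks:", count_seeks(reads))
```

Under these assumptions the script reports 17 atom reads, only 13 of them distinct (atoms 4, 6, 9 and 12 are each read twice), and 8 seeks.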
I/O Streaming Evaluation Method
• The linear data requirements of the computation allow for:
  • Incremental evaluation
  • Streaming over the data
  • Concurrent evaluation of batch queries
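When a kernel is linear in the data, its result is a weighted sum of grid values, so each atom can contribute a partial sum as it streams past, in whatever order atoms arrive, and one pass can feed many queries at once. A minimal sketch, assuming a toy 1D field split into atoms of four values and hypothetical queries whose stencils straddle atom boundaries; `ATOM_SIZE`, `PartialQuery`, and `stream_atoms` are illustrative names, not part of the turbulence database.

```python
from dataclasses import dataclass, field

ATOM_SIZE = 4  # hypothetical number of grid values per atom


@dataclass
class PartialQuery:
    """A linear kernel sum_i w_i * f[i], accumulated one atom at a time."""

    weights: dict  # grid index -> kernel weight
    total: float = 0.0
    seen: set = field(default_factory=set)

    def consume(self, atom_id, values):
        """Add the contribution of every stencil point stored in this atom."""
        start = atom_id * ATOM_SIZE
        for offset, value in enumerate(values):
            index = start + offset
            if index in self.weights:
                self.total += self.weights[index] * value
                self.seen.add(index)

    def done(self):
        """True once every stencil point has been seen."""
        return self.seen == set(self.weights)


def stream_atoms(data, atom_ids):
    """Yield (atom_id, values) for each atom, in the order given."""
    for atom_id in atom_ids:
        start = atom_id * ATOM_SIZE
        yield atom_id, data[start:start + ATOM_SIZE]


if __name__ == "__main__":
    data = [float(i * i) for i in range(12)]  # toy field f[i] = i^2, three atoms
    queries = [
        PartialQuery(weights={2: 0.25, 3: 0.25, 4: 0.25, 5: 0.25}),  # spans atoms 0 and 1
        PartialQuery(weights={6: 0.5, 9: 0.5}),  # spans atoms 1 and 2
    ]
    # A single pass, in arbitrary atom order, feeds every query in the batch.
    for atom_id, values in stream_atoms(data, [2, 0, 1]):
        for query in queries:
            query.consume(atom_id, values)
    for query in queries:
        print(query.done(), query.total)  # True 13.5, then True 58.5
```

Because addition of the partial sums is order-independent (up to floating-point rounding), the scheduler is free to read atoms in whatever order is cheapest on disk.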
Processing a Batch Query
Figure: the same grid and queries evaluated by I/O streaming; atoms 0 through 9, 11, 12 and 14 are read once, in order, and each atom is handed to every query that needs it (atoms 4 and 6 to q1 and q3, atoms 9 and 12 to q1 and q2).
• Sequential I/O
• Single pass
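The single-pass schedule on this slide can be sketched by inverting the batch into a map from each atom to the queries that use it, then visiting the atoms once in ascending order. This assumes atoms are identified by their Z-order numbers, as in the figure, and that ascending order is sequential on disk; `streaming_schedule` is a hypothetical name.

```python
from collections import defaultdict

# Atom sets needed by each query in the batch-query example.
QUERIES = {
    "q1": [0, 1, 2, 3, 4, 6, 8, 9, 12],
    "q2": [9, 11, 12, 14],
    "q3": [4, 5, 6, 7],
}


def streaming_schedule(queries):
    """Map each needed atom to the queries that use it, in ascending atom order.

    Every atom appears exactly once, so a single sequential pass over the
    returned list reads each atom once and hands it to all of its queries.
    """
    users = defaultdict(list)
    for name, atoms in queries.items():
        for atom in atoms:
            users[atom].append(name)
    return [(atom, users[atom]) for atom in sorted(users)]


if __name__ == "__main__":
    for atom, names in streaming_schedule(QUERIES):
        print(atom, " ".join(names))
```

For the example batch this yields 13 atom reads instead of 17, with the 17 query-atom pairs all served from those 13 reads.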
Lagrange Polynomial Interpolation
Figure: interpolation formula, with the Lagrange coefficients and the data terms labeled
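Lagrange interpolation expresses the interpolated value as a sum of Lagrange coefficients times data values, with l_i(x) = product over j ≠ i of (x − x_j)/(x_i − x_j). A minimal sketch, assuming an even q-point stencil per axis (q = 4 by default) on a uniform grid and a tensor-product extension to three dimensions; `lagrange_coefficients` and `interpolate_3d` are hypothetical names, and the `field` callable stands in for values read from the database.

```python
import math


def lagrange_coefficients(x, nodes):
    """Return l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j) for every node x_i."""
    coeffs = []
    for i, xi in enumerate(nodes):
        c = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                c *= (x - xj) / (xi - xj)
        coeffs.append(c)
    return coeffs


def interpolate_3d(field, point, q=4, dx=1.0):
    """Interpolate at `point` as a weighted sum over a q x q x q stencil.

    `field(i, j, k)` returns the value at integer grid indices on a uniform
    grid of spacing dx; the stencil on each axis is the q nodes closest to
    the point (q is assumed even).
    """
    bases, weights = [], []
    for coord in point:
        base = math.floor(coord / dx) - q // 2 + 1  # first stencil index on this axis
        nodes = [(base + m) * dx for m in range(q)]
        bases.append(base)
        weights.append(lagrange_coefficients(coord, nodes))
    (bx, by, bz), (wx, wy, wz) = bases, weights
    total = 0.0
    for a in range(q):
        for b in range(q):
            for c in range(q):
                # Product of Lagrange coefficients times the data value.
                total += wx[a] * wy[b] * wz[c] * field(bx + a, by + b, bz + c)
    return total


def cubic(i, j, k):
    """Toy field, cubic in each coordinate."""
    return i ** 3 + 2 * j ** 2 - k + 1


if __name__ == "__main__":
    # A 4-point stencil reproduces a field that is cubic per axis, up to rounding.
    print(interpolate_3d(cubic, (2.5, 1.25, 3.75)))  # ≈ 16.0
```

Because the result is a fixed weighted sum over the q³ stencil points, it is linear in the data, which is the property that lets I/O streaming evaluate it incrementally, atom by atom.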
128 Workload
• I/O streaming
  • Each atom is read only once
  • Effective cache usage
• Join/Order By executes the entire batch as a join
  • Sorting leads to more sequential access
• Over an order of magnitude improvement
I/O Streaming alleviates the I/O bottleneck
• Computation emerges as the more costly operation
Particle Tracking
Figure: the Web Server/Mediator distributes the points to DB Node 1 through DB Node N based on their positions xp(tm); each node's Computational Module retrieves data from its Storage Layer and returns the intermediate positions x*p(tm).
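The mediator's routing step can be sketched as grouping particles by the node that stores the atom containing them, so that each Computational Module only retrieves atoms from its own Storage Layer. This sketch assumes, hypothetically, a periodic cubic domain split into cubic atoms and a partitioning in which each DB node owns one contiguous range of Z-order atom keys; `morton3d` and `distribute_points` are illustrative names.

```python
from collections import defaultdict


def morton3d(i, j, k, bits):
    """Interleave the low `bits` bits of atom coordinates (i, j, k) into a Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (3 * b)
        key |= ((j >> b) & 1) << (3 * b + 1)
        key |= ((k >> b) & 1) << (3 * b + 2)
    return key


def distribute_points(points, atom_width, atoms_per_side, num_nodes):
    """Group particle positions by the DB node assumed to own their atom.

    Assumes a periodic cube of side atoms_per_side * atom_width split into
    cubic atoms, atoms_per_side a power of two, and node n owning the n-th
    contiguous range of Z-order keys.
    """
    total_atoms = atoms_per_side ** 3
    atoms_per_node = -(-total_atoms // num_nodes)  # ceiling division
    side = atoms_per_side * atom_width
    bits = atoms_per_side.bit_length() - 1
    batches = defaultdict(list)
    for p in points:
        # Wrap into the periodic domain, then find the atom index on each axis.
        i, j, k = (min(int((c % side) // atom_width), atoms_per_side - 1) for c in p)
        node = morton3d(i, j, k, bits) // atoms_per_node
        batches[node].append(p)
    return dict(batches)


if __name__ == "__main__":
    particles = [(0.5, 0.5, 0.5), (1.5, 1.5, 0.5), (0.5, 0.5, 1.5), (-0.5, 0.5, 0.5)]
    # Three points land on node 0 and one on node 1.
    print(distribute_points(particles, atom_width=1.0, atoms_per_side=2, num_nodes=2))
```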
Particle Tracking
Figure: the Web Server/Mediator distributes the points to DB Node 1 through DB Node N based on the intermediate positions x*p(tm); each node's Computational Module retrieves data from its Storage Layer and returns the new positions xp(tm+1).
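Taken together, the two particle-tracking slides describe a two-phase time step: points are routed by xp(tm), intermediate positions x*p are computed, and the points are routed again by x*p to produce xp(tm+1). A minimal sketch of that pattern, assuming a second-order predictor-corrector (Heun) integrator, which the slides do not name, and a toy analytic velocity field in place of velocities interpolated from the database; `velocity` and `heun_step` are hypothetical names, and the routing to DB nodes is omitted.

```python
import math


def velocity(x, t):
    """Toy velocity field (solid-body rotation about the z-axis).

    Stands in for the velocity the DB nodes would interpolate from the
    stored simulation; it ignores t.
    """
    return (-x[1], x[0], 0.0)


def heun_step(positions, t, dt, u=velocity):
    """Advance particles from t to t + dt with a two-phase predictor-corrector step.

    Phase 1 (predictor): x* = x + dt * u(x, t)
    Phase 2 (corrector): x(t + dt) = x + dt / 2 * (u(x, t) + u(x*, t + dt))
    """
    u_now = [u(x, t) for x in positions]
    # Phase 1: provisional positions x* (one mediator round trip in the cluster).
    predicted = [
        tuple(xi + dt * vi for xi, vi in zip(x, v)) for x, v in zip(positions, u_now)
    ]
    # Phase 2: evaluate u at x* and average the two velocities.
    u_pred = [u(xs, t + dt) for xs in predicted]
    return [
        tuple(xi + 0.5 * dt * (a + b) for xi, a, b in zip(x, v0, v1))
        for x, v0, v1 in zip(positions, u_now, u_pred)
    ]


if __name__ == "__main__":
    pts = [(1.0, 0.0, 0.0)]
    steps = 1000
    dt = (math.pi / 2) / steps
    t = 0.0
    for _ in range(steps):
        pts = heun_step(pts, t, dt)
        t += dt
    print(pts)  # close to [(0.0, 1.0, 0.0)] after a quarter turn
```

Each phase needs velocities at positions that may lie on any node, which is why the mediator has to redistribute the points between the two phases.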
Summary and Future Work
• Extend the I/O streaming technique to other decomposable kernel computations:
  • Differentiation
  • Spatial interpolation
  • Temporal interpolation
  • Filtering and coarse-graining
• Provide a flexible user interface
  • Allow for different filter functions
  • Allow for new kernel computations
• Improve the particle-tracking routine
  • Reduce communication between the mediator and the DB nodes
  • Asynchronous processing
  • Caching and pre-fetching
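The kernels listed for future work share the property that makes I/O streaming applicable to interpolation: each result is a fixed weighted sum of grid values, so it can be evaluated by reading only the atoms its stencil touches. A sketch, assuming a 1D periodic grid, a second-order central difference for differentiation, and a box filter for coarse-graining (both illustrative choices; the slides do not name the schemes); `ATOM_SIZE`, `central_difference_weights`, `box_filter_weights`, and `evaluate_by_atoms` are hypothetical names.

```python
import math

ATOM_SIZE = 4  # hypothetical number of grid values per atom


def central_difference_weights(i, n, dx):
    """Stencil for df/dx at index i: (f[i+1] - f[i-1]) / (2 dx), periodic grid of n points."""
    return {(i - 1) % n: -1.0 / (2 * dx), (i + 1) % n: 1.0 / (2 * dx)}


def box_filter_weights(i, n, half_width):
    """Stencil for the mean of the 2 * half_width + 1 points centered at i (periodic)."""
    count = 2 * half_width + 1
    return {(i + m) % n: 1.0 / count for m in range(-half_width, half_width + 1)}


def evaluate_by_atoms(weights, data):
    """Evaluate sum_j w_j * data[j], reading only the atoms the stencil touches."""
    needed_atoms = sorted({j // ATOM_SIZE for j in weights})
    total = 0.0
    for atom in needed_atoms:
        start = atom * ATOM_SIZE
        for j in range(start, min(start + ATOM_SIZE, len(data))):
            total += weights.get(j, 0.0) * data[j]
    return total


if __name__ == "__main__":
    n = 16
    dx = 2 * math.pi / n
    data = [math.sin(j * dx) for j in range(n)]  # toy field f(x) = sin(x)
    # Derivative at x = 0 wraps around the periodic boundary (uses points 15 and 1).
    print(evaluate_by_atoms(central_difference_weights(0, n, dx), data))  # ≈ 0.974, approximating cos(0) = 1
    # Box-filtered value at index 8, averaging points 6 through 10.
    print(evaluate_by_atoms(box_filter_weights(8, n, 2), data))
```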
Questions
Images courtesy of Kai Buerger (buerger@tum.de)