Data-driven Query Processing for Immersive Computational Turbulence

Kalin Kanov
Department of Computer Science, Johns Hopkins University

The Big Picture
• Scientific disciplines have developed a computational branch
  • Models without closed-form solutions are solved numerically
• This has led to an explosion of data
• Simulation and analysis workloads are data-intensive
  • Producing/scanning large amounts of data
• Management of these data represents a significant challenge
  • Storage/archiving
  • Query processing
  • Visualization

Remote Immersive Analysis
• Formerly, analysis was performed during the computation
  • No data were stored for subsequent examination
• Breakthroughs in data-intensive computing have allowed for new interaction with scientific numerical simulations
• Turbulence Database Cluster
  • Stores the entire space-time evolution of the simulation
  • Provides public access to world-class simulations
  • Implements the “immersive turbulence*” approach
  • Introduces new challenges

*E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.

Goals
• Develop data-driven query processing techniques
  • Reduce I/O and computation costs
  • Reduce or eliminate storage overhead
  • Exploit domain knowledge and structure
• Provide user interfaces that are efficient and flexible
• Streamline the process of data ingest

Turbulence Database Cluster

Processing a Batch Query
[Figure: a 4×4 grid of data atoms in Z-order (rows, top to bottom: 10 11 14 15 / 8 9 12 13 / 2 3 6 7 / 0 1 4 5), stored in the linear order 0–15, overlapped by three queries. Atoms read by each query: q1: 0 1 2 3 4 6 8 9 12; q2: 9 11 12 14; q3: 4 5 6 7.]
• Redundant I/O
• Multiple disk seeks
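The atom lists on this slide (q1: 0 1 2 3 4 6 8 9 12; q2: 9 11 12 14; q3: 4 5 6 7) make the cost of query-at-a-time evaluation easy to count. The sketch below is illustrative only: it assumes, as a simplification, that every contiguous run of atom ids costs one disk seek; the names `QUERIES`, `count_runs`, `per_query_cost` and `distinct_atoms` are hypothetical.

```python
# Atom lists per query, as given on the slide (atoms numbered 0-15).
QUERIES = {
    "q1": [0, 1, 2, 3, 4, 6, 8, 9, 12],
    "q2": [9, 11, 12, 14],
    "q3": [4, 5, 6, 7],
}


def count_runs(atoms):
    """Count contiguous runs in the sorted atom ids.

    Simplifying assumption: each run costs one disk seek.
    """
    ordered = sorted(atoms)
    return sum(
        1 for i, a in enumerate(ordered) if i == 0 or a != ordered[i - 1] + 1
    )


def per_query_cost(queries):
    """Atom reads and seeks when each query is evaluated on its own."""
    reads = sum(len(atoms) for atoms in queries.values())
    seeks = sum(count_runs(atoms) for atoms in queries.values())
    return reads, seeks


def distinct_atoms(queries):
    """Atoms that a single shared pass over the batch would have to read."""
    return sorted(set().union(*queries.values()))
```

Here `per_query_cost(QUERIES)` gives 17 atom reads and 8 seeks, while `distinct_atoms(QUERIES)` has only 13 entries: atoms 4, 6, 9 and 12 are each read twice.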

I/O Streaming Evaluation Method
• Linear data requirements of the computation allow for:
  • Incremental evaluation
  • Streaming over the data
  • Concurrent evaluation of batch queries
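Why linearity permits incremental evaluation can be shown with a minimal sketch: a kernel of the form "sum of weight × value" splits into independent per-atom partial sums, so the atoms can be visited in a single pass in storage order and the running total updated after each one. The 1-D layout, `ATOM_SIZE`, and the function names are hypothetical stand-ins, not the cluster's storage format.

```python
from collections import defaultdict

ATOM_SIZE = 4  # hypothetical: number of grid indices stored per atom


def atom_of(index):
    """Atom that stores a given grid index (contiguous blocks of ATOM_SIZE)."""
    return index // ATOM_SIZE


def full_evaluate(field, stencil):
    """Reference result: sum of weight * value with all data in memory."""
    return sum(w * field[i] for i, w in stencil.items())


def streamed_evaluate(field, stencil):
    """Evaluate the same linear kernel one atom at a time.

    The kernel is a sum, so every atom contributes an independent partial
    result; each atom is visited once and the running total is updated.
    """
    by_atom = defaultdict(dict)
    for i, w in stencil.items():
        by_atom[atom_of(i)][i] = w
    total = 0.0
    for atom in sorted(by_atom):  # single pass in storage order
        total += sum(w * field[i] for i, w in by_atom[atom].items())
    return total
```

With `field = {i: float(i * i) for i in range(16)}` and `stencil = {3: 0.5, 4: 0.25, 9: -1.0, 14: 2.0}`, both functions return 319.5; only the order in which the data are touched differs.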

Processing a Batch Query
[Figure: the same grid and queries evaluated with I/O streaming. The needed atoms are read once, in storage order: 0 1 2 3 4 5 6 7 8 9 11 12 14, and each atom is used by every query that overlaps it (4 and 6 by q1 and q3; 9 and 12 by q1 and q2).]
• Sequential I/O
• Single pass
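A batch version of the streaming idea can be sketched by inverting the query-to-atom lists (q1: 0 1 2 3 4 6 8 9 12; q2: 9 11 12 14; q3: 4 5 6 7) into an atom-to-queries schedule sorted by atom id, then reading each atom exactly once. This is a hypothetical illustration: `read_atom` and `partial` are placeholder callbacks standing in for the storage layer and for a query's per-atom kernel contribution.

```python
from collections import defaultdict

QUERIES = {
    "q1": [0, 1, 2, 3, 4, 6, 8, 9, 12],
    "q2": [9, 11, 12, 14],
    "q3": [4, 5, 6, 7],
}


def streaming_schedule(queries):
    """Invert query -> atoms into (atom, [queries]) pairs in storage order."""
    users = defaultdict(list)
    for q, atoms in queries.items():
        for a in atoms:
            users[a].append(q)
    return [(a, sorted(users[a])) for a in sorted(users)]


def run_batch(queries, read_atom, partial):
    """Read each needed atom once and update every query that uses it.

    read_atom(atom) -> data and partial(query, atom, data) -> float are
    hypothetical callbacks; results accumulate incrementally per query.
    """
    results = {q: 0.0 for q in queries}
    reads = 0
    for atom, users in streaming_schedule(queries):
        data = read_atom(atom)  # the only read of this atom
        reads += 1
        for q in users:
            results[q] += partial(q, atom, data)
    return results, reads
```

For example, `run_batch(QUERIES, lambda a: float(a), lambda q, a, d: d)` performs 13 reads, one per distinct atom, regardless of how many queries share an atom.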

Lagrange Polynomial Interpolation
[Figure: interpolation formula with the Lagrange coefficients and the data terms labeled]
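Lagrange interpolation writes the interpolated value as a weighted sum of nearby data, f(x) ≈ Σ_j l_j(x) f(x_j), where the Lagrange coefficients l_j(x) depend only on the node positions and the query point. In 3-D the weights form a tensor product over an n×n×n stencil, which is a linear kernel of the kind the streaming method relies on. The sketch assumes a 4-point stencil in its usage example; the stencil width and function names are assumptions, not taken from the slides.

```python
def lagrange_weights(nodes, x):
    """Lagrange coefficients l_j(x) for the given interpolation nodes."""
    weights = []
    for j, xj in enumerate(nodes):
        w = 1.0
        for k, xk in enumerate(nodes):
            if k != j:
                w *= (x - xk) / (xj - xk)
        weights.append(w)
    return weights


def interpolate_1d(nodes, values, x):
    """Weighted sum of data values, with Lagrange coefficients as weights."""
    return sum(w * v for w, v in zip(lagrange_weights(nodes, x), values))


def interpolate_3d(nodes, data, point):
    """Tensor-product interpolation on an n x n x n stencil.

    data[i][j][k] holds the field at (nodes[i], nodes[j], nodes[k]).
    """
    wx = lagrange_weights(nodes, point[0])
    wy = lagrange_weights(nodes, point[1])
    wz = lagrange_weights(nodes, point[2])
    n = len(nodes)
    return sum(
        wx[i] * wy[j] * wz[k] * data[i][j][k]
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )
```

With nodes `[0.0, 1.0, 2.0, 3.0]`, a 4-point stencil reproduces any cubic exactly, so interpolating x³ − 2x at x = 1.5 returns 0.375 up to rounding.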

Spatial Differentiation
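Spatial differentiation on a grid is also a linear stencil over neighboring values. As an assumed example (the slide does not state the scheme or its order), the sketch uses the standard fourth-order centered difference f′(x_i) ≈ (f_{i−2} − 8f_{i−1} + 8f_{i+1} − f_{i+2}) / (12Δx) on a periodic 1-D grid, and checks it against the exact derivative of sin(x).

```python
import math


def centered_diff_4th(values, dx):
    """Fourth-order centered first derivative on a periodic 1-D grid.

    f'(x_i) ~ (f[i-2] - 8 f[i-1] + 8 f[i+1] - f[i+2]) / (12 dx)
    """
    n = len(values)
    return [
        (values[(i - 2) % n] - 8 * values[(i - 1) % n]
         + 8 * values[(i + 1) % n] - values[(i + 2) % n]) / (12 * dx)
        for i in range(n)
    ]


def sine_test_error(n=64):
    """Max error of the scheme on sin(x) over one period (exact: cos(x))."""
    dx = 2 * math.pi / n
    values = [math.sin(i * dx) for i in range(n)]
    deriv = centered_diff_4th(values, dx)
    return max(abs(d - math.cos(i * dx)) for i, d in enumerate(deriv))
```

Halving Δx should reduce `sine_test_error` by roughly a factor of 16, the signature of a fourth-order scheme.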

Derivative Interpolation
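One plausible reading of derivative interpolation, offered here as an assumption since the slide gives only the title, is to combine the two previous kernels: compute finite-difference derivatives at the grid nodes, then Lagrange-interpolate those nodal derivatives to an off-grid point. The sketch below does this on a periodic 1-D grid; the helper names and the 4-point width are hypothetical.

```python
import math


def centered_diff_4th(values, dx):
    """Fourth-order centered first derivative on a periodic 1-D grid."""
    n = len(values)
    return [
        (values[(i - 2) % n] - 8 * values[(i - 1) % n]
         + 8 * values[(i + 1) % n] - values[(i + 2) % n]) / (12 * dx)
        for i in range(n)
    ]


def lagrange_weights(nodes, x):
    """Lagrange coefficients l_j(x) for the given nodes."""
    weights = []
    for j, xj in enumerate(nodes):
        w = 1.0
        for k, xk in enumerate(nodes):
            if k != j:
                w *= (x - xk) / (xj - xk)
        weights.append(w)
    return weights


def interpolated_derivative(values, dx, x, width=4):
    """Derivative of a periodic grid function at an off-grid point x.

    Step 1: finite-difference derivatives at the grid nodes.
    Step 2: Lagrange interpolation of those nodal derivatives to x.
    For clarity the derivative is computed at every node; a real kernel would
    compute it only at the `width` stencil nodes around x.
    """
    n = len(values)
    deriv = centered_diff_4th(values, dx)
    first = math.floor(x / dx) - (width // 2 - 1)  # leftmost stencil node
    idx = [first + m for m in range(width)]
    weights = lagrange_weights([i * dx for i in idx], x)
    return sum(w * deriv[i % n] for w, i in zip(weights, idx))
```

Negative stencil indices wrap around through `i % n`, so points near the domain boundary are handled periodically.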

128 Workload
• I/O Streaming
  • Each atom is read only once
  • Effective cache usage
• Join/Order By executes the entire batch as a join
  • Sorting leads to more sequential access
• Over an order of magnitude improvement
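The Join/Order By approach gains sequential access by sorting requests on the atom key. A minimal sketch, assuming atoms are keyed by a Morton (Z-order) code, as the 4×4 atom numbering on the batch-query slides suggests, and assuming a hypothetical cubic `atom_width` in grid units: points are mapped to atom coordinates and sorted by the interleaved-bit code, so points in the same atom become adjacent and atoms are visited in storage order.

```python
def morton3(x, y, z, bits=10):
    """Interleave coordinate bits as ... z1 y1 x1 z0 y0 x0 (x lowest)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b + 2)
    return code


def order_requests(points, atom_width):
    """Sort query points by the Morton code of the atom containing them.

    Mirrors an ORDER BY on the atom key; points are assumed non-negative.
    """
    def key(p):
        ax, ay, az = (int(c // atom_width) for c in p)
        return morton3(ax, ay, az)

    return sorted(points, key=key)
```

With `atom_width=8`, the points (20, 0, 0), (1, 1, 1) and (9, 0, 0) fall in atoms with codes 8, 0 and 1, so they are returned in the order (1, 1, 1), (9, 0, 0), (20, 0, 0).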

I/O Streaming Alleviates the I/O Bottleneck
• Computation emerges as the more costly operation

Particle Tracking
[Figure, step 1: the Web Server/Mediator distributes points to DB Node 1 … DB Node N based on their positions x_p(t_m); on each node, the Computational Module retrieves data from the Storage Layer, and the nodes return x*_p(t_m).]
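The mediator's first task is to send each point to the DB node that stores the data around it. The sketch below assumes, hypothetically, that nodes own contiguous ranges of Morton codes of atoms, with `boundaries[k]` being the first code owned by node k+1; the partitioning scheme, `atom_width`, and function names are illustrative assumptions, and periodic wrap-around of positions is omitted.

```python
import bisect
from collections import defaultdict


def morton3(x, y, z, bits=10):
    """Interleave coordinate bits as ... z1 y1 x1 z0 y0 x0 (x lowest)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b + 2)
    return code


def distribute(points, atom_width, boundaries):
    """Group points by the DB node storing the atom that contains them.

    Hypothetical range partitioning: node 0 owns codes below boundaries[0],
    node k owns codes in [boundaries[k-1], boundaries[k]).
    Positions are assumed non-negative (no periodic wrapping here).
    """
    groups = defaultdict(list)
    for p in points:
        ax, ay, az = (int(c // atom_width) for c in p)
        node = bisect.bisect_right(boundaries, morton3(ax, ay, az))
        groups[node].append(p)
    return dict(groups)
```

For example, with `atom_width=8` and `boundaries=[4]`, the points (1, 1, 1) and (9, 0, 0) go to node 0 and (9, 9, 9) goes to node 1.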

Particle Tracking
[Figure, step 2: the mediator distributes the points based on x*_p(t_m); each node's Computational Module retrieves data from the Storage Layer, and the nodes return x_p(t_{m+1}).]
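The two round trips, x_p(t_m) → x*_p(t_m) → x_p(t_{m+1}), are consistent with a second-order predictor-corrector (Runge-Kutta) time step. The sketch below assumes that scheme; it is not confirmed by the slides. `velocity(x, t)` is a hypothetical callback standing in for the velocity-interpolation queries the DB nodes would answer, and each of `predict` and `correct` corresponds to one distribute-and-compute round.

```python
def predict(positions, velocity, t, dt):
    """Round 1: x*_p = x_p(t_m) + dt * u(x_p(t_m), t_m)."""
    u0 = [velocity(x, t) for x in positions]
    predicted = [
        tuple(xi + dt * ui for xi, ui in zip(x, u))
        for x, u in zip(positions, u0)
    ]
    return predicted, u0


def correct(positions, u0, predicted, velocity, t, dt):
    """Round 2: x_p(t_{m+1}) = x_p(t_m) + dt/2 * (u0 + u(x*_p, t_m + dt))."""
    u1 = [velocity(x, t + dt) for x in predicted]
    return [
        tuple(xi + 0.5 * dt * (a + b) for xi, a, b in zip(x, ua, ub))
        for x, ua, ub in zip(positions, u0, u1)
    ]


def rk2_step(positions, velocity, t, dt):
    """One predictor-corrector step for a list of 3-D particle positions."""
    predicted, u0 = predict(positions, velocity, t, dt)
    return correct(positions, u0, predicted, velocity, t, dt)
```

For a uniform velocity (1, 2, 3) and dt = 0.5, a particle at the origin moves to (0.5, 1.0, 1.5); the scheme is also exact when the velocity varies linearly in time only.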

Summary and Future Work
• Extend the I/O streaming technique to different decomposable kernel computations:
  • Differentiation
  • Spatial interpolation
  • Temporal interpolation
  • Filtering and coarse-graining
• Provide a flexible user interface
  • Allow for different filter functions
  • Allow for new kernel computations
• Improve the particle tracking routine
  • Reduce communication between the mediator and DB nodes
  • Asynchronous processing
  • Caching and pre-fetching
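Filtering and coarse-graining are listed as candidate decomposable kernels. As a rough illustration, using a hypothetical 1-D periodic box filter rather than any filter the cluster actually provides, the streamed version reads each atom once and scatters every value into all output windows that contain it, producing the same result as the direct window sums.

```python
def box_filter_direct(values, half_width):
    """Reference: mean over a window of 2*half_width+1 points (periodic)."""
    n = len(values)
    w = 2 * half_width + 1
    return [
        sum(values[(i + k) % n] for k in range(-half_width, half_width + 1)) / w
        for i in range(n)
    ]


def box_filter_streamed(values, half_width, atom_size):
    """Same filter, accumulated one atom at a time.

    Each value at index j is added to every output i with |i - j| <= half_width
    (periodically), so an atom is read once and contributes partial sums to
    many outputs.
    """
    n = len(values)
    w = 2 * half_width + 1
    sums = [0.0] * n
    for start in range(0, n, atom_size):  # stream over atoms in order
        atom = values[start:start + atom_size]  # the single read of this atom
        for offset, v in enumerate(atom):
            j = start + offset
            for k in range(-half_width, half_width + 1):
                sums[(j + k) % n] += v
    return [s / w for s in sums]
```

The scatter form is what makes the filter streamable: outputs never need the whole neighborhood in memory at once, only running sums.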

Questions

Images courtesy of Kai Buerger (buerger@tum.de)
