
Extreme Scale Analytics on Spatio-Temporal Datasets

Joel Saltz, Center for Comprehensive Informatics & Biomedical Informatics Department, Emory University.


Presentation Transcript


  1. Extreme Scale Analytics on Spatio-Temporal Datasets. Joel Saltz, Center for Comprehensive Informatics & Biomedical Informatics Department, Emory University

  2. Morphometric Image Analysis Pipeline
     • Preprocessing: normalization, tiling, etc.
     • Segmentation: identify nuclei as objects
     • Feature Extraction: compute morphometric features
     • Classification: unsupervised learning (k-means) after patient-level aggregation and analysis
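The four stages on this slide can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration, not the presenter's implementation: the function names are hypothetical, segmentation is a simple threshold followed by 4-connected component labeling, the features are just area and bounding-box size, and the k-means is a one-dimensional version applied to one summary value per patient.

```python
from collections import deque
from statistics import mean

def normalize(img):
    """Preprocessing: min-max scale intensities to [0, 1]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1  # avoid division by zero on flat images
    return [[(p - lo) / span for p in row] for row in img]

def tiles(img, size):
    """Preprocessing: yield square tiles (edge tiles may be smaller)."""
    for y in range(0, len(img), size):
        for x in range(0, len(img[0]), size):
            yield [row[x:x + size] for row in img[y:y + size]]

def segment(img, thresh=0.5):
    """Segmentation (assumed method): threshold, then label 4-connected blobs."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if img[y][x] <= thresh or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill of one object
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and img[ny][nx] > thresh):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects

def features(pixels):
    """Feature extraction: a few simple morphometric features per object."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return {"area": len(pixels),
            "height": max(ys) - min(ys) + 1,
            "width": max(xs) - min(xs) + 1}

def patient_summary(images):
    """Patient-level aggregation: mean object area over all of a patient's images."""
    areas = [features(obj)["area"]
             for img in images for obj in segment(normalize(img))]
    return mean(areas) if areas else 0.0

def kmeans_1d(values, k=2, iters=20):
    """Classification: plain k-means on scalars, centers seeded evenly over the range."""
    lo, hi = min(values), max(values)
    centers = ([lo + (hi - lo) * i / (k - 1) for i in range(k)]
               if k > 1 else [mean(values)])
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = mean(members)
    return labels
```

In a real whole-slide workflow each tile would be segmented independently (typically in parallel), and objects crossing tile borders would need to be merged; the sketch leaves that step out.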

  3. Satellite Data Analysis for Monitoring and Change Analysis

  4. Subsurface Reservoir Management
     • Numerical models of porous media
     • Fluids flow from one region of reservoir to another region
     • Rock and sediment properties change over time
     • Simulate multiple realizations of multiple models and management strategies
     • Evaluate geologic uncertainty and management strategies simultaneously
     • Enable on-demand exploration and comparison of multiple scenarios

  5. Core Operation Categories and Patterns

  6. Challenges
     • Spatial-temporal disk-resident, on-the-fly, dynamically updated datasets
     • Access and manipulate multiple datasets generated and stored on multiple, distributed systems
     • Analysis of raw data can generate millions to trillions of features (e.g., millions of cells and nuclei in high resolution tissue images) to be mined and compared
     • Take advantage of hardware platforms for analysis
        • Clusters containing hybrid CPU-GPU nodes
        • Extreme scale machines consisting of hundreds of thousands of CPU cores
        • Systems with deep memory and storage hierarchies
        • Cloud computing platforms

  7. Using Hybrid CPU-GPU Systems

  8. Data Structures: Region Templates
     • Describe 2D/3D static and temporal regions.
     • Provide a container for points, arrays, regions, and object sets within a spatial and temporal bounding box.
     • A region template can represent collections of spatial areas and objects that vary from one another in size and shape, e.g. regions generated by segmenting cells in microscopy images, or man-made structures or hurricanes in satellite imagery.
     • Primary datasets are defined as point data elements and arrays, and derived datasets as sets of regions and objects.
     • Region templates may be related to one another in a defined manner.
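One way to picture the region template described on this slide is as a small container type. The sketch below is an illustration under stated assumptions only: the class names, field names, and methods (`BoundingBox`, `RegionTemplate`, `add_primary`, `add_derived`, `relate`, `related_overlapping`) are hypothetical and are not taken from the presenter's middleware.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass(frozen=True)
class BoundingBox:
    """Spatial (2D or 3D) and temporal extent; all bounds are inclusive."""
    lo: Tuple[float, ...]  # lower corner, e.g. (x, y) or (x, y, z)
    hi: Tuple[float, ...]  # upper corner, same dimensionality as lo
    t0: float = 0.0        # start of the time interval
    t1: float = 0.0        # end of the time interval

    def intersects(self, other: "BoundingBox") -> bool:
        """True if the boxes overlap on every spatial axis and in time."""
        spatial = all(a_lo <= b_hi and b_lo <= a_hi
                      for a_lo, a_hi, b_lo, b_hi
                      in zip(self.lo, self.hi, other.lo, other.hi))
        return spatial and self.t0 <= other.t1 and other.t0 <= self.t1

@dataclass
class RegionTemplate:
    """Container for primary and derived data inside one bounding box."""
    name: str
    bbox: BoundingBox
    primary: Dict[str, Any] = field(default_factory=dict)        # points, arrays
    derived: Dict[str, List[Any]] = field(default_factory=dict)  # regions, object sets
    related: List[Tuple[str, "RegionTemplate"]] = field(default_factory=list)

    def add_primary(self, key: str, data: Any) -> None:
        self.primary[key] = data

    def add_derived(self, key: str, objects: List[Any]) -> None:
        self.derived.setdefault(key, []).extend(objects)

    def relate(self, relation: str, other: "RegionTemplate") -> None:
        """Record a named, one-directional relation to another template."""
        self.related.append((relation, other))

    def related_overlapping(self) -> List["RegionTemplate"]:
        """Related templates whose bounding boxes overlap this one."""
        return [rt for _, rt in self.related if self.bbox.intersects(rt.bbox)]
```

For example, a microscopy tile at one time step could hold its pixel array as primary data and its segmented nuclei as derived data, with a named relation to the same tile at the next time step.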

  9. Programming Abstractions and Runtime Middleware Services
     • Programming abstractions
        • Multi-level dataflow pipelines
        • MapReduce style programs
        • Spatial query capabilities
     • I/O and Storage Services
        • Indexing and metadata management for ensembles of datasets
        • I/O support for retrieving data from multiple storage systems and for streaming data
        • Query capabilities
     • Memory Management
        • Careful management and staging of large data structures across memory hierarchies. Masking data movement costs with computation.
     • Execution Services
        • Distributing and rearranging computations and data to minimize data movement
        • Coordinated scheduling and mapping of analysis operations to heterogeneous and hybrid (CPU cores and GPUs) systems to increase overall application throughput
        • Quality of service/data requirements
        • Function variants
     • Provenance Tracking, Fault-detection and tolerance
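The ideas of function variants and of mapping operations onto hybrid CPU-GPU nodes can be sketched as follows. This is a toy illustration under loud assumptions: the names `Operation`, `schedule`, and `run_pipeline` are hypothetical, the "gpu" variants are ordinary Python functions standing in for real GPU kernels, the speedup numbers are made-up estimates, and the greedy placement rule is a simple stand-in rather than the scheduling policy used by the presenter's runtime.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Operation:
    """One pipeline stage with one implementation (variant) per device kind."""
    name: str
    variants: Dict[str, Callable[[list], list]]  # must contain a "cpu" entry
    gpu_speedup: float = 1.0  # assumed estimate of GPU-over-CPU speedup

def schedule(ops: List[Operation], n_gpu_slots: int) -> Dict[str, str]:
    """Greedy placement: GPU slots go to the stages that benefit most."""
    ranked = sorted((op for op in ops if "gpu" in op.variants),
                    key=lambda op: op.gpu_speedup, reverse=True)
    on_gpu = {op.name for op in ranked[:n_gpu_slots]}
    return {op.name: ("gpu" if op.name in on_gpu else "cpu") for op in ops}

def run_pipeline(ops: List[Operation], placement: Dict[str, str], data: list) -> list:
    """Run the stages in order, dispatching each to its placed variant."""
    for op in ops:
        data = op.variants[placement[op.name]](data)
    return data

# Hypothetical three-stage dataflow; both variants compute the same result.
def scale(xs):
    return [x / max(xs) for x in xs]

def keep_bright(xs):
    return [x for x in xs if x > 0.5]

def count(xs):
    return [len(xs)]

pipeline = [
    Operation("normalize", {"cpu": scale}),                  # CPU-only stage
    Operation("segment", {"cpu": keep_bright, "gpu": keep_bright}, gpu_speedup=8.0),
    Operation("features", {"cpu": count, "gpu": count}, gpu_speedup=2.0),
]
placement = schedule(pipeline, n_gpu_slots=1)
result = run_pipeline(pipeline, placement, [2, 4, 8, 10])
```

Keeping the placement decision separate from the stage definitions is the point of the sketch: the same pipeline description can run on a CPU-only node or a hybrid node by changing only the scheduling step.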

  10. End
