Systems Support for Manipulating Large Scientific Datasets. Research/Development Group Tahsin Kurc Umit Catalyurek Renato Ferreira Matthew Spencer Mike Gray. Joel Saltz Biomedical Informatics Department The Ohio State University. Active Data Repository (ADR).
Systems Support for Manipulating Large Scientific Datasets Research/Development Group Tahsin Kurc Umit Catalyurek Renato Ferreira Matthew Spencer Mike Gray Joel Saltz Biomedical Informatics Department The Ohio State University
Active Data Repository (ADR) • A C++ class library and runtime system • targets storage, retrieval and manipulation of multi-dimensional datasets on parallel machines. • Several services are customizable for various application specific processing. • Runtime system support for common operations • data retrieval, memory management, and scheduling of processing across a parallel machine. • Data declustering/clustering, Indexing • Buffer management and scheduling of operations • Efficient execution of queries on a parallel machine
Some ADR applications Pathology Satellite Data Visualization Earth Systems Science
ADR Applications • Titan • a parallel database server for remotely sensed satellite data • Virtual Microscope • a data server for digitized microscopy images • browsing, and visualization of images at different magnifications • Bays and Estuaries Simulation System • Water contamination studies • Hydrodynamics simulator is coupled to chemical transport simulator
ADR Applications (cont’d) • VOL-Vis • a visualization system for ray-casting based volume rendering of out-of-core astronomy datasets • uses MPIRE (from SDSC) functions for rendering operations. • TM-Vis • a dataset server for browsing and visualizing Landsat Thematic Mapper scenes • ISO-Vis • iso-surface rendering of very large 3D datasets from subsurface reactive transport simulations
Data Processing Loop (figure) • Source data elements are combined by a reduction function into intermediate data elements (accumulator elements), from which the result data elements are produced
Data Processing Loop
    DU = Select(Output, R);
    DI = Select(Input, R);
    // Initialization Phase
    for (ue in DU) {
        read ue;
        ae = Initialize(ue);
        add ae to A;                 // A is the accumulator
    }
    // Reduction Phase
    for (ie in DI) {
        read ie;
        SA = Map(ie, A);             // defined by some spatial relationship
        for (ae in SA) {
            ae = Aggregate(ie, ae);  // could be expensive; order independent
        }
    }
    // Output Handling Phase
    for (ae in A) {
        ue = Finalize(ae);
        write ue;
    }
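The three phases of the loop can be sketched in C++ as follows. This is a minimal illustration rather than ADR code: `InputElem`, `Accum`, `mapToCell`, and `averagePerCell` are hypothetical names, the output is assumed to be a 1-D array of cells, and averaging is picked as the aggregation only because it is order independent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; these are not ADR classes.
struct InputElem { double x; double value; };       // point in input space + data value
struct Accum { double sum = 0.0; int count = 0; };  // one accumulator element

// Map: a spatial relationship from an input point to an output cell.
// Assumption: cell i covers [i, i+1). Returns -1 when the point maps nowhere.
int mapToCell(const InputElem& ie, std::size_t cells) {
    if (ie.x < 0.0 || ie.x >= static_cast<double>(cells)) return -1;
    return static_cast<int>(ie.x);
}

std::vector<double> averagePerCell(const std::vector<InputElem>& input,
                                   std::size_t cells) {
    assert(cells > 0);
    // Initialization phase: one accumulator element per output element.
    std::vector<Accum> acc(cells);
    // Reduction phase: aggregate each input element into the accumulator it
    // maps to. The result does not depend on input order (up to rounding).
    for (const InputElem& ie : input) {
        const int c = mapToCell(ie, cells);
        if (c < 0) continue;
        Accum& ae = acc[static_cast<std::size_t>(c)];
        ae.sum += ie.value;
        ae.count += 1;
    }
    // Output handling phase: convert accumulator values to output values.
    std::vector<double> out(cells, 0.0);
    for (std::size_t i = 0; i < cells; ++i) {
        if (acc[i].count > 0) out[i] = acc[i].sum / acc[i].count;
    }
    return out;
}
```

Cells that receive no input keep the value 0.0 in this sketch; a real application would choose its own fill value.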
ADR System Architecture • Front-end: the interface between clients and back-end. Provides support for clients: • to connect to ADR, • to query ADR to get information about already registered datasets and user-defined methods, • to create ADR queries and submit them. • Back-end: data storage, retrieval, and processing. • Distributed memory parallel machine or cluster of workstations, with multiple disks attached to each node • Customizable services for application-specific processing • Internal services for data retrieval, resource management
ADR Architecture (figure) • Clients send requests to the application front end and receive output • Front end: Query Interface Service, Query Submission Service • ADR back end: Query Planning Service, Query Execution Service, and the customizable ADR services (Dataset Service, Indexing Service, Data Aggregation Service, Attribute Space Service), which are customized using application-specific functions • Input data is stored in ADR
ADR Internal Services • Query interface service • receives queries from clients and validates a query • Query submission service • forwards validated queries to back end • Query planning service • determines a query plan to efficiently execute a set of queries based on available system resources • Query execution service • manages system resources and executes the query plan generated. • Handling Output • Write to disk, or send to the client using Unix sockets, or Meta-Chaos (for parallel clients).
ADR Customizable Services • Developed as a set of modular services in C++ • customization via inheritance and virtual functions • Attribute space service • manages registration and use of multi-dimensional attribute spaces, and mapping functions • Dataset service • manages datasets loaded into ADR and user-defined functions that iterate through data items • Indexing service • manages various indices for datasets loaded into ADR • Data aggregation service • manages user-defined functions to be used in aggregation operations
Developing An Application with ADR • Loading and Registering Datasets into ADR • Customizing the Front-end • Customizing the Back-end • Developing the Client
Datasets in ADR • ADR expects the input datasets to be partitioned into data chunks. • A data chunk, unit of I/O and communication, • contains a subset of input data values (and associated points in input space) • is associated with a minimum bounding rectangle, which covers all the points in the chunk. • Data chunks are distributed across all the disks in the system. • An index has to be built on minimum bounding rectangles of chunks
Loading Datasets into ADR • Partition dataset into data chunks -- each chunk contains a set of data elements • Each chunk is associated with a bounding box • ADR Data Loading Service • Distributes chunks across the disks in the system • Constructs an R-tree index using bounding boxes of the data chunks Disk Farm
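The loading steps above can be sketched as follows. All names here (`Point`, `Rect`, `Chunk`, `loadChunks`, `rangeQuery`) are hypothetical, the data is assumed to be 2-D, and two simplifications are swapped in: chunks are assigned to disks round-robin, and the range query scans all bounding rectangles linearly instead of using an R-tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y; };
struct Rect { double xmin, ymin, xmax, ymax; };

struct Chunk {
    std::vector<Point> points;  // data elements in this chunk
    Rect mbr;                   // minimum bounding rectangle of the points
    std::size_t disk;           // disk the chunk is assigned to
};

// Smallest axis-aligned rectangle covering all points of a chunk.
Rect boundingRect(const std::vector<Point>& pts) {
    assert(!pts.empty());
    Rect r{pts[0].x, pts[0].y, pts[0].x, pts[0].y};
    for (const Point& p : pts) {
        r.xmin = std::min(r.xmin, p.x);
        r.ymin = std::min(r.ymin, p.y);
        r.xmax = std::max(r.xmax, p.x);
        r.ymax = std::max(r.ymax, p.y);
    }
    return r;
}

bool intersects(const Rect& a, const Rect& b) {
    return a.xmin <= b.xmax && b.xmin <= a.xmax &&
           a.ymin <= b.ymax && b.ymin <= a.ymax;
}

// Attach a bounding rectangle to each partition and spread the chunks over
// the disks (round-robin here, as an assumption).
std::vector<Chunk> loadChunks(const std::vector<std::vector<Point>>& parts,
                              std::size_t numDisks) {
    assert(numDisks > 0);
    std::vector<Chunk> chunks;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        chunks.push_back(Chunk{parts[i], boundingRect(parts[i]), i % numDisks});
    }
    return chunks;
}

// Indices of chunks whose bounding rectangle intersects the query window.
// A linear scan stands in for the R-tree lookup.
std::vector<std::size_t> rangeQuery(const std::vector<Chunk>& chunks,
                                    const Rect& query) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        if (intersects(chunks[i].mbr, query)) hits.push_back(i);
    }
    return hits;
}
```

Only chunks returned by `rangeQuery` need to be read from disk, which is why the index is built on bounding rectangles rather than on individual data elements.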
ADR Front-end • Interacts with application clients and the ADR back-end • Receives requests from clients and submits queries to the ADR back-end • Provides services for clients • to connect to the ADR front-end • to query ADR for information on datasets and user-defined methods in ADR • to create and submit ADR queries
ADR – Back-end Customization • Indexing Service: • Index lookup functions that return data chunks given a range query. • ADR provides an R-tree index as default. • Dataset Service: • Iterator functions that return input elements (data value and associated point in input space) from a retrieved data chunk • Attribute Space Service: • Projection functions that map a point in input space to a region in output space • Data Aggregation Service: • Accumulator functions to create and tile the accumulator to hold intermediate results • Aggregation functions to aggregate input elements that map to the same output element. • Output functions to generate output from intermediate results.
Data Aggregation Service • Provides interface • to create and manipulate user-defined accumulator objects, • to implement aggregation operations, • to create output data structures, • to implement functions to convert accumulator values to the final output values.
    #include <t2_aggr.h>
    class T2_AggregateFuncObj {
        // Initialize accumulator elements
        aifElem(); aifAcc();
        // Aggregate input elements with accumulator elements
        dafElem(); dafAcc();
    }
    T2_AggregateFuncConstructor userAggregateFunc(…);
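Customization through inheritance and virtual functions might look like the sketch below. The class and method names (`AggregateFunctions`, `initialize`, `aggregate`, `finalize`, `runAggregation`) are hypothetical and deliberately simplified to scalar accumulators; they are not the signatures declared in `t2_aggr.h`.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical base class: the runtime calls these hooks, the user overrides them.
class AggregateFunctions {
public:
    virtual ~AggregateFunctions() = default;
    virtual double initialize() const = 0;                         // initial accumulator value
    virtual double aggregate(double acc, double input) const = 0;  // fold one input element in
    virtual double finalize(double acc) const { return acc; }      // accumulator -> output value
};

// User-defined aggregation: keep the maximum input value.
class MaxAggregate : public AggregateFunctions {
public:
    double initialize() const override {
        return std::numeric_limits<double>::lowest();
    }
    double aggregate(double acc, double input) const override {
        return std::max(acc, input);
    }
};

// User-defined aggregation: sum the input values.
class SumAggregate : public AggregateFunctions {
public:
    double initialize() const override { return 0.0; }
    double aggregate(double acc, double input) const override {
        return acc + input;
    }
};

// Stand-in for the runtime: it knows only the base-class interface.
double runAggregation(const AggregateFunctions& f,
                      const std::vector<double>& inputs) {
    assert(!inputs.empty());
    double acc = f.initialize();
    for (double v : inputs) acc = f.aggregate(acc, v);
    return f.finalize(acc);
}
```

The point of the design is that `runAggregation` never changes: a new application supplies a new subclass, and the runtime dispatches to it through the virtual calls.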
ADR Back-end Processing (figure) • Phases on each back-end processor: Initialization Phase, Local Reduction Phase, Global Combine Phase, Output Handling Phase • Output is delivered to the client
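The split between local reduction and global combine can be illustrated with a small sketch. It assumes a mean as the aggregation and simulates the back-end processors sequentially with one vector per processor; `PartialSum`, `localReduce`, `globalCombine`, and `meanAcrossProcessors` are hypothetical names, not ADR functions.

```cpp
#include <cassert>
#include <vector>

// Partial result held by one processor after its local reduction.
struct PartialSum { double sum = 0.0; long count = 0; };

// Local reduction phase: each processor aggregates only its own data.
PartialSum localReduce(const std::vector<double>& localData) {
    PartialSum a;  // initialization phase: empty accumulator
    for (double v : localData) {
        a.sum += v;
        a.count += 1;
    }
    return a;
}

// Global combine phase: merge the partial accumulators of all processors.
PartialSum globalCombine(const std::vector<PartialSum>& partials) {
    PartialSum total;
    for (const PartialSum& p : partials) {
        total.sum += p.sum;
        total.count += p.count;
    }
    return total;
}

// Output handling phase: turn the combined accumulator into the final value.
double meanAcrossProcessors(const std::vector<std::vector<double>>& perProcessor) {
    std::vector<PartialSum> partials;
    for (const std::vector<double>& local : perProcessor) {
        partials.push_back(localReduce(local));
    }
    const PartialSum total = globalCombine(partials);
    assert(total.count > 0);
    return total.sum / static_cast<double>(total.count);
}
```

Because the aggregation is order independent, the combined result matches what a single processor would compute over all the data, regardless of how the data is distributed.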
Example: Satellite Data Processing • Attribute Space Service • Register attribute spaces: lat/lon/time, Goode's projection, etc. • Register mapping functions between attribute spaces • Dataset Service • Partition IFOVs into data chunks • Register iterator functions to return IFOVs from chunks • Indexing Service • Build an R-tree to index the IFOV chunks • Use lat/lon/time of IFOV chunks as bounding rectangles • Data Aggregation Service • Register functions to: initialize the output image; compute the vegetation index of an output pixel with a given IFOV; select the clearest vegetation index out of a set of IFOVs
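The three aggregation functions for this example might be sketched as below. Several things are assumptions not stated on the slide: the vegetation index is taken to be NDVI, (NIR - RED) / (NIR + RED); "clearest" is interpreted as the maximum NDVI seen for a pixel; and each IFOV is assumed to arrive already mapped to an output pixel index. `IFOV`, `ndvi`, and `compositeImage` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One sensor reading, already mapped to an output pixel (assumption).
struct IFOV { std::size_t pixel; double red; double nir; };

// Vegetation index of one IFOV. Returns 0 when both bands are zero.
double ndvi(const IFOV& s) {
    const double denom = s.nir + s.red;
    return denom == 0.0 ? 0.0 : (s.nir - s.red) / denom;
}

std::vector<double> compositeImage(const std::vector<IFOV>& ifovs,
                                   std::size_t pixels) {
    // Initialize the output image with the lowest possible NDVI value.
    std::vector<double> image(pixels, -1.0);
    for (const IFOV& s : ifovs) {
        assert(s.pixel < pixels);
        // Keep the largest index among all IFOVs that map to this pixel.
        image[s.pixel] = std::max(image[s.pixel], ndvi(s));
    }
    return image;
}
```

Taking a maximum is order independent, so the IFOVs can be processed in whatever order their chunks are retrieved.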
Bays and Estuaries Simulation System (figure) • FLOW CODE (parallel program): hydrodynamics output (velocity, elevation) on an unstructured grid, produced over simulation time • POST-PROCESSING (time averaging, projection) onto the grid used by the chemical transport code • CHEMICAL TRANSPORT CODE (parallel program) • Visualization • Requirements: locally conservative projection; management of large amounts of data
Bays and Estuaries Simulation System (figure: components and ADR services) • Hydrodynamics Flow Code (PADCIRC): hydrodynamics output (velocity, elevation) on an unstructured grid, per time step • Query through the application front end: time period, number of steps, input grid, post-processing function (time averaging) • Results: time-averaged velocity and elevation values at grid points, returned using Meta-Chaos • Projection Code (UT-PROJ): chemical transport grid, flux values on the faces • Chemical Transport Code (UT-TRANS) • Visualization • Front end: Query Interface Service, Query Submission Service • ADR back end: Query Planning Service, Query Execution Service, Dataset Service, Indexing Service, Attribute Space Service, Data Aggregation Service
Bays and Estuaries Simulation System (Data Loading/Customization) (figure) • FLOW CODE output is partitioned into chunks; chunks are loaded to disks and an index is created • Registered with ADR: attribute spaces of the simulators (FLOW CODE, TRANSPORT CODE) and the mapping function; POST-PROCESSING (time averaging) • Services involved: Dataset Service, Indexing Service, Attribute Space Service, Data Aggregation Service
ParSSim/Meta-Chaos/ADR (SC2000 Demo) (figure) • Components: flow simulation, reactive transport simulation, ADR, visualization • Platforms: large-scale machine (Blue Horizon); cluster of PCs and workstations • Links: Meta-Chaos over a WAN; Meta-Chaos over a LAN or on the same machine • Output from reactive transport is stored and post-processed using ADR
Visualization with ADR (figure) • Visualization client query: grid id, time steps, iso-surface value, viewing parameters • Output: 2D image • Input: 3D volume stored in ADR • Front end: Query Interface Service, Query Submission Service • ADR back end: Query Planning Service, Query Execution Service, and the customizable ADR services (Dataset Service, Indexing Service, Data Aggregation Service, Attribute Space Service) • Customized using VTK toolkit iso-surface rendering functions
Visualization with ADR Iso-surface rendering of output (e.g., species concentrations) from ParSSim simulations
Virtual Microscope (figure) • Multiple clients query the Virtual Microscope front-end: slide number, focal plane, magnification, region of interest • Output: image blocks • Front-end: Query Interface Service, Query Submission Service • Back-end: Dataset Service, Indexing Service, Query Execution Service, Data Aggregation Service, Query Planning Service, Attribute Space Service
VOL-Vis (MPIRE/ADR Visualization) • A visualization system for ray-casting based volume rendering of out-of-core astronomy datasets • uses MPIRE (from SDSC) functions for rendering operations. • Entire volume is partitioned into slabs. • Each slab is further partitioned into sub-volumes. Sub-volumes are distributed across disks attached to each processor. • Rendering of slabs and composition of partial images are pipelined • ADR renders a slab at a time, creating a partial image. • Client merges partial images received from ADR.
VOL-Vis (figure) • Visualization client query: grid id, 3D window, viewing parameters • Output: 2D image • Input: 3D volume stored in ADR • Front end: Query Interface Service, Query Submission Service • ADR back end: Query Planning Service, Query Execution Service, and the customizable ADR services (Dataset Service, Indexing Service, Data Aggregation Service, Attribute Space Service) • Customized using MPIRE in-core ray-casting based rendering functions
VOL-Vis (figure) • Panels: current slab rendered in ADR; partial image generated from rendered slabs; final image
A Motivating Scenario (figure) • Sites connected over a WAN: client PC, computation farm, data server with the raw dataset (sensor readings), data server with the reference DB (feature list) • Operations shown: Extract raw, Extract ref, 3D reconstruction, View result • Application: process relevant raw readings; generate 3D view; compute features of 3D view; find similar features in reference DB; display new view and similar cases
DataCutter • A Component-based framework for manipulating multi-dimensional datasets in a distributed environment (the Grid) • Indexing Service: Multilevel hierarchical indexes based on R-tree indexing method. • Filtering Service: Distributed C++ component framework • Application processing is implemented as a set of interacting components (filters). • filters – logical unit of user-defined computation • streams - how filters communicate • Evaluation of component-based models for data-intensive applications • Scheduling of data flow, placement decisions
Filter-Stream Programming (FSP) Purpose: Specialized components for processing data • based on Active Disks research [Acharya, Uysal, Saltz: ASPLOS’98], dataflow, functional parallelism, and message passing • filters – logical unit of computation • high-level tasks • init, process, finalize interface • streams – how filters communicate • unidirectional buffer pipes • use fixed-size buffers (min, good) • filter connectivity and filter-level characteristics are specified manually
Software Infrastructure • Prototype implementation of filter framework • C++ language binding • manual placement • wide-area execution service • one thread for each instantiated filter
    class MyFilter : public AS_Filter_Base {
    public:
        int init(int argc, char *argv[]) { … }
        int process(stream_t st) { … }
        int finalize(void) { … }
    };
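A self-contained sketch of the filter-stream idea follows. It does not use the DataCutter API: `Filter`, `Stream`, `Buffer`, `RangeFilter`, `Subsample`, and `runPipeline` are hypothetical, the `init`/`process`/`finalize` signatures are simplified, and the filters run one after another in a single thread rather than one thread per filter.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<int>;    // payload of one stream buffer (assumed type)
using Stream = std::deque<Buffer>;  // unidirectional pipe of buffers

// Hypothetical filter interface with the init / process / finalize life cycle.
class Filter {
public:
    virtual ~Filter() = default;
    virtual void init() {}
    virtual void process(Stream& in, Stream& out) = 0;
    virtual void finalize() {}
};

// Keeps only the values inside [lo, hi].
class RangeFilter : public Filter {
public:
    RangeFilter(int lo, int hi) : lo_(lo), hi_(hi) { assert(lo <= hi); }
    void process(Stream& in, Stream& out) override {
        while (!in.empty()) {
            Buffer kept;
            for (int v : in.front()) {
                if (v >= lo_ && v <= hi_) kept.push_back(v);
            }
            in.pop_front();
            out.push_back(std::move(kept));
        }
    }
private:
    int lo_, hi_;
};

// Keeps every step-th value of each buffer, starting with the first.
class Subsample : public Filter {
public:
    explicit Subsample(std::size_t step) : step_(step) { assert(step > 0); }
    void process(Stream& in, Stream& out) override {
        while (!in.empty()) {
            Buffer kept;
            const Buffer& b = in.front();
            for (std::size_t i = 0; i < b.size(); i += step_) kept.push_back(b[i]);
            in.pop_front();
            out.push_back(std::move(kept));
        }
    }
private:
    std::size_t step_;
};

// Connects the filters in a chain: the output stream of one is the input of the next.
Stream runPipeline(const std::vector<std::unique_ptr<Filter>>& filters, Stream source) {
    for (const auto& f : filters) f->init();
    Stream current = std::move(source);
    for (const auto& f : filters) {
        Stream next;
        f->process(current, next);
        current = std::move(next);
    }
    for (const auto& f : filters) f->finalize();
    return current;
}
```

Since each filter touches only its input and output streams, the same filters could be placed on different hosts without changing their code, which is the property the framework relies on.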
Example Applications • Virtual Microscope: read data → decompress → clip → zoom → view • Iso-surface Rendering: R (read dataset) → E (isosurface extraction) → Ra (shade + rasterize) → M (merge / view)
Filter Connectivity / Placement (figure: A → B over stream1, B → C over stream2, A → C over stream3)
    [filter.A]
    outs = stream1 stream3
    [filter.B]
    ins = stream1
    outs = stream2
    [filter.C]
    ins = stream2 stream3
    [placement]
    A = host1.cs.umd.edu
    B = host2.cs.umd.edu
    C = host3.cs.umd.edu
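A connectivity specification like the one above can be checked before any filter is started. The sketch below assumes, as an illustration only, that every stream must have exactly one producing filter and at least one consuming filter declared once; `FilterSpec` and `streamsAreConnected` are hypothetical names, and parsing of the text format is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// In-memory form of one [filter.X] section.
struct FilterSpec {
    std::string name;
    std::vector<std::string> ins;   // streams the filter reads
    std::vector<std::string> outs;  // streams the filter writes
};

// True when every stream named in the specification has exactly one
// producer and exactly one consumer (an assumed rule for this sketch).
bool streamsAreConnected(const std::vector<FilterSpec>& filters) {
    assert(!filters.empty());
    std::map<std::string, int> producers;
    std::map<std::string, int> consumers;
    for (const FilterSpec& f : filters) {
        for (const std::string& s : f.outs) ++producers[s];
        for (const std::string& s : f.ins) ++consumers[s];
    }
    // Every written stream must be written once and read once.
    for (const auto& kv : producers) {
        const auto it = consumers.find(kv.first);
        if (kv.second != 1 || it == consumers.end() || it->second != 1) return false;
    }
    // Every read stream must have a writer.
    for (const auto& kv : consumers) {
        if (producers.find(kv.first) == producers.end()) return false;
    }
    return true;
}
```

For the three-filter layout on this slide the check passes; removing filter C would leave stream2 and stream3 without a reader and the check would fail.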
Execution Service (figure) • Application console: 1. Read the filter/stream specs and placement; 2. Query the directory daemon (dir.cs.umd.edu:6000), which lists name, host, and port; 3. Exec through the AppExec daemons • An AppExec daemon with a filter library runs on each of host1.cs.umd.edu, host2.cs.umd.edu, and host3.cs.umd.edu and executes the application's filters (filter A, filter B, filter C)
Group Instances (Batches) • Work is issued in instance batches until all complete • The number of instances is matched to the environment (CPU capacity) • Figure labels: instances P0, F0, C0 and P1, F1, C1; Batch0, Batch1, Batch2; host1, host2, host3 (2 CPUs each)
Transparent Copies (figure) • Filter copies: R0, R1, R2; E0 … EK and EK+1 … EN; Ra0, Ra1, Ra2; M • Figure labels: host1, host2, host3, host4, host5; Cluster 1, Cluster 2, Cluster 3
Exploration and Visualization of Oil Reservoir Simulation Data using DataCutter • Joel Saltz, Umit Catalyurek, Tahsin Kurc: Biomedical Informatics Department, The Ohio State University, http://medicine.osu.edu/informatics • Mary Wheeler, Steven Bryant, Malgorzata Peszynska, Ryan Martino: Center for Subsurface Modeling, University of Texas at Austin, http://www.ticam.utexas.edu/CSM • Alan Sussman, Michael Beynon: Department of Computer Science, University of Maryland, http://www.cs.umd.edu/projects/adr • Don Stredney, Dennis Sessanna: Interface Laboratory, The Ohio Supercomputer Center, http://www.osc.edu
System Architecture (figure) • Datasets: geostatistics models (Model 1 … Model n) with m realizations, and production strategies (Well Pattern 1 … Well Pattern p), simulated with IPARS and kept in the storage system • DataCutter filters: RD (transparent copies, one copy per node, Node 1 … Node 20), SUM (transparent copies), DIFF (transparent copies), AVG • Client
Dataset • Data size = ~1.5TB • 207 simulations, selected from • 18 Geostatistics Models (GM) • 10 Realizations of each model (R) • 4 Well Patterns (WP) • Each simulation is ~6.9GB • 10,000 time steps • 9,000 grid elements • 8 scalars + 3 vectors (3 components each) = 17 variables • Stored on UMD Storage Cluster • 9TB disks on 50 nodes: PIII-650, 128MB, Switched Ethernet
Economic Model • Economic assessment • Net Present Value (NPV) • Return on Investment (ROI) • Sweep Efficiency (SE) • Queries • return the R-WP combinations for a given GM that have NPV > avg • return the R-WP combination with the max NPV for every GM • … • Economic model uses • well rates (time series data) • cost and price (e.g., oil) parameters
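The first query can be sketched with the standard discounted cash flow definition of NPV, NPV = Σ_t CF_t / (1 + r)^t. How the cash flows are derived from well rates, costs, and prices is not given on the slide, so the sketch takes them as input; the discounting convention (t starts at 0), `netPresentValue`, and `aboveAverage` are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// NPV of per-period cash flows at a constant discount rate.
// Assumption: the first cash flow occurs at t = 0 and is not discounted.
double netPresentValue(const std::vector<double>& cashFlows, double rate) {
    assert(rate > -1.0);
    double npv = 0.0;
    for (std::size_t t = 0; t < cashFlows.size(); ++t) {
        npv += cashFlows[t] / std::pow(1.0 + rate, static_cast<double>(t));
    }
    return npv;
}

// Indices of the entries whose NPV is above the average NPV,
// e.g. one entry per realization / well pattern combination of a model.
std::vector<std::size_t> aboveAverage(const std::vector<double>& npvs) {
    assert(!npvs.empty());
    double sum = 0.0;
    for (double v : npvs) sum += v;
    const double avg = sum / static_cast<double>(npvs.size());
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < npvs.size(); ++i) {
        if (npvs[i] > avg) selected.push_back(i);
    }
    return selected;
}
```

In a filter-based setup, the per-simulation NPV computation and the averaging step would naturally fall into separate filters, since the first is independent per dataset and the second needs all results.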
Results • Economic model shows range of winners and losers • We want to also understand the physics behind this • An example is looking at bypassed oil • Turns out to be strongly correlated to economics
Bypassed Oil (BPO) • User selects • a subset of datasets (D) and time steps (TS), • thresholds for oil saturation value (Tos) and oil velocity (Tov), • minimum number of connected grid cells (Tcc). • Query: Find all the datasets in D that have bypassed oil pockets with at least Tcc grid cells. • A cell (C) is a potential bypassed oil cell if Cos > Tos and Cov < Tov. • Algorithm for bypassed oil • 1. Find all potential bypassed oil cells in a dataset at time step Ti. • 2. Run connected components analysis; discard pockets with fewer cells than Tcc; mark a cell 1 if it is in a bypassed oil pocket, 0 otherwise. • 3. Perform an AND operation on all cells over all time steps in TS. • 4. Perform the operations in Step 2 on the result, and report back to the client.
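The four steps can be sketched for a single dataset as follows. The sketch assumes a rectangular 2-D grid with 4-neighbour connectivity, which the slide does not specify, and all names (`Cell`, `Grid`, `Mask`, `potentialCells`, `keepLargePockets`, `andMasks`, `hasBypassedOil`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Cell { double oilSaturation; double oilVelocity; };
using Grid = std::vector<std::vector<Cell>>;  // one time step, rows x cols
using Mask = std::vector<std::vector<int>>;   // 1 = marked, 0 = not marked

// Step 1: a cell is a potential bypassed oil cell if Cos > Tos and Cov < Tov.
Mask potentialCells(const Grid& g, double tos, double tov) {
    Mask m(g.size());
    for (std::size_t r = 0; r < g.size(); ++r)
        for (const Cell& c : g[r])
            m[r].push_back(c.oilSaturation > tos && c.oilVelocity < tov ? 1 : 0);
    return m;
}

// Step 2: connected components by flood fill; keep pockets with >= tcc cells.
Mask keepLargePockets(const Mask& m, std::size_t tcc) {
    Mask out = m, seen = m;
    for (auto& row : out) row.assign(row.size(), 0);
    for (auto& row : seen) row.assign(row.size(), 0);
    for (std::size_t r = 0; r < m.size(); ++r) {
        for (std::size_t c = 0; c < m[r].size(); ++c) {
            if (m[r][c] == 0 || seen[r][c]) continue;
            std::vector<std::pair<std::size_t, std::size_t>> stack, pocket;
            stack.push_back(std::make_pair(r, c));
            seen[r][c] = 1;
            while (!stack.empty()) {
                const std::pair<std::size_t, std::size_t> cur = stack.back();
                stack.pop_back();
                pocket.push_back(cur);
                const std::size_t cr = cur.first, cc = cur.second;
                // Queue a neighbour if it is marked and not yet seen.
                auto visit = [&](std::size_t nr, std::size_t nc) {
                    if (m[nr][nc] == 1 && !seen[nr][nc]) {
                        seen[nr][nc] = 1;
                        stack.push_back(std::make_pair(nr, nc));
                    }
                };
                if (cr > 0) visit(cr - 1, cc);
                if (cr + 1 < m.size()) visit(cr + 1, cc);
                if (cc > 0) visit(cr, cc - 1);
                if (cc + 1 < m[cr].size()) visit(cr, cc + 1);
            }
            if (pocket.size() >= tcc)
                for (const auto& p : pocket) out[p.first][p.second] = 1;
        }
    }
    return out;
}

// Step 3: cell-wise AND of two masks of the same shape.
Mask andMasks(const Mask& a, const Mask& b) {
    assert(a.size() == b.size());
    Mask out = a;
    for (std::size_t r = 0; r < a.size(); ++r) {
        assert(a[r].size() == b[r].size());
        for (std::size_t c = 0; c < a[r].size(); ++c) out[r][c] = a[r][c] & b[r][c];
    }
    return out;
}

// Steps 1 to 4 for one dataset; true if a bypassed oil pocket survives.
bool hasBypassedOil(const std::vector<Grid>& steps, double tos, double tov,
                    std::size_t tcc) {
    assert(!steps.empty());
    Mask acc = keepLargePockets(potentialCells(steps[0], tos, tov), tcc);
    for (std::size_t i = 1; i < steps.size(); ++i)
        acc = andMasks(acc, keepLargePockets(potentialCells(steps[i], tos, tov), tcc));
    const Mask result = keepLargePockets(acc, tcc);  // Step 4: Step 2 on the AND result
    for (const auto& row : result)
        for (int v : row)
            if (v == 1) return true;
    return false;
}
```

Step 4 is needed because the AND in Step 3 can split a pocket into fragments smaller than Tcc, even though every per-time-step pocket was large enough.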