Dynamic Data Driven Applications Systems. Joel Saltz, Chair and Professor, Biomedical Informatics Department, The Ohio State University
Parameter Study Application Scenarios • Clinical imaging studies • Determine tumor characteristics via segmentation and texture analysis of medical imagery • Test and refine algorithms by invoking them on distributed datasets of >1000 dynamic contrast MR studies • Simulation parameter studies • 1000s of oil reservoir simulations used to determine how to optimize oil production
Parameter Study Data Analyses • Compare dataset contents • Compare features • Spatially based comparisons • Map datasets between mesh/coordinate systems [Figure: MicroCT Osteoporosis Study; Kim Powell, Cleveland Clinic; Don Stredney, OSC]
Canonical Services • Component Framework for Combined Task/Data Parallelism • Data Aggregation, generalized reductions • Crucial and ubiquitous in data analysis • Integrated with Globus/NWS/SRB etc. (NPACkage); OGSA integration underway • Canonical services carried out by Data Parallel Components • Data Cluster/Decluster/Spatial Indexing/Range Query Service (Inherited from Active Data Repository) • Super-Semantic Data Cache – when carrying out parameter studies, use caching to eliminate redundant computations (Andrade SC2002) • Grid Generalized Reduction (Ferreira ICS2002)
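The caching bullet above can be illustrated with a small sketch. This is not the Super-Semantic Data Cache of Andrade et al.; it is plain exact-match memoization of an intermediate result, and every name in it (`IntermediateResultCache`, `smooth`, `count_above`, the toy `SIGNAL`) is hypothetical. It assumes a two-stage parameter study in which the expensive first stage depends on only one of the two parameters, so its output can be reused across the other.

```python
from typing import Callable, Dict, Tuple

# Toy 1-D "dataset"; a hypothetical stand-in for an image or simulation output.
SIGNAL = (1.0, 5.0, 2.0, 8.0, 3.0, 9.0)


class IntermediateResultCache:
    """Exact-match cache keyed by (operation name, sorted parameters)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, Tuple], object] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, op: str, params: dict, compute: Callable):
        key = (op, tuple(sorted(params.items())))
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute(**params)
        return self._store[key]


def smooth(width: int) -> Tuple[float, ...]:
    """Moving average over SIGNAL; stands in for an expensive first stage."""
    return tuple(
        sum(SIGNAL[i:i + width]) / width
        for i in range(len(SIGNAL) - width + 1)
    )


def count_above(values, threshold: float) -> int:
    """Cheap second stage: count samples strictly above the threshold."""
    return sum(1 for v in values if v > threshold)


# Parameter study over (width, threshold). The smoothing result depends only
# on width, so it is computed once per width and reused for every threshold.
cache = IntermediateResultCache()
results = {}
for width in (1, 2):
    for threshold in (2.5, 4.0, 6.0):
        smoothed = cache.get_or_compute("smooth", {"width": width}, smooth)
        results[(width, threshold)] = count_above(smoothed, threshold)

print("first-stage computations:", cache.misses, "reused:", cache.hits)
```

Six parameter combinations trigger only two first-stage computations here. A semantic cache would go further and reuse results that only partially overlap a new request, which this sketch does not attempt.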
Clinical Studies using Dynamic Contrast Imaging • 1000s of dynamic images per research study • Iterative investigation of image quantification, image registration and image normalization techniques • Assess techniques’ ability to correctly characterize anatomy and pathophysiology • “Ground truth” assessed by • Biopsy results • Changes in tumor structure and activity over time with treatment • Images from many sites including NIH, Heidelberg, Oklahoma, Ohio State • Collaboration with Michael Knopp, MD
[Figure: image panels labeled prior to therapy (1370), after 2 cycles (1421), after 4 cycles (1438). Credit: Knopp M, OSU Radiology / dkfz]
DCE-MR Analyses • Fit pharmacokinetic model ODEs • Tumor characterization using texture analysis and feature detection techniques • Register images from consecutive studies • Register images within single time dependent study to correct for patient motion • Images obtained with varying time/space resolution -- interpolate onto common time/space mesh
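The last bullet, interpolating studies acquired at different time resolutions onto a common mesh, can be sketched for the time axis alone. This is a minimal illustration that assumes piecewise-linear interpolation with clamping at the ends; the slides do not say which interpolation scheme was used, and the function name, sample times, and intensity values below are made up for the example.

```python
from bisect import bisect_right
from typing import List, Sequence


def resample_linear(times: Sequence[float],
                    values: Sequence[float],
                    common_times: Sequence[float]) -> List[float]:
    """Linearly interpolate (times, values) onto common_times.

    `times` must be strictly increasing. Targets outside the sampled
    range are clamped to the first or last sample.
    """
    out = []
    for t in common_times:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            j = bisect_right(times, t)        # times[j-1] <= t < times[j]
            t0, t1 = times[j - 1], times[j]
            w = (t - t0) / (t1 - t0)
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out


# Two hypothetical signal-intensity curves for the same voxel, sampled at
# different times (seconds), brought onto one shared time mesh.
common = [0.0, 5.0, 10.0, 15.0, 20.0]
study_a = resample_linear([0.0, 10.0, 20.0], [0.0, 1.0, 0.5], common)
study_b = resample_linear([0.0, 5.0, 15.0, 20.0], [0.0, 0.4, 0.9, 0.6], common)

for t, a, b in zip(common, study_a, study_b):
    print(f"t={t:5.1f}  a={a:.3f}  b={b:.3f}")
```

Once both curves live on the same mesh they can be compared sample by sample. Spatial resampling would follow the same pattern per axis, after registration.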
A Data Intense Challenge: The Instrumented Oilfield of the Future Participants: • University of Texas at Austin • CSM: Wheeler, Dawson, Peszynska • IG: Sen, Stoffa • PGE: Torres-Verdin • University of Chicago—CS: Stevens, Papka • University of Maryland—CS: Sussman • Ohio State—CS: Saltz, Kurc • Rutgers—ECE: Parashar • MIT—Engineering: Haines
A Data Intense Challenge: The Instrumented Oilfield of the Future • Industrial Support (Data): • British Petroleum (BP) • Chevron • International Business Machines (IBM) • Landmark • Shell • Schlumberger
• Production simulation via reservoir modeling • Monitor production by acquiring time-lapse observations of seismic data • Revise knowledge of the reservoir model via imaging and inversion of seismic data • Modify production strategy using an optimization criterion
Example Scenario (SC2001) [Figure: diagram with labels: Geostatistics (m realizations: Model 1, Model 2, …, Model n); Production Strategies (Well Pattern 1, Well Pattern 2, …, Well Pattern p); IPARS; Storage Systems (Node 1 … Node 20); filters RD, DIFF, SUM as transparent copies (one copy per node); AVG; DataCutter Client]
Software Support • Component Framework for Combined Task/Data Parallelism • User defines sequence of pipelined components -- “filter group” • User directive tells preprocessor/runtime system to generate and instantiate copies of filters • Many filter groups can be simultaneously active • Integration proceeding with Globus/Network Weather Service • SC 2002, HCW2002, Parallel Computing 2001
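A filter group and its copies can be sketched with Python generators. This is not the DataCutter API. The filter names RD, DIFF, SUM, and AVG are borrowed from the Example Scenario (SC2001) slide, but the behavior given to each one here is a guess from its name, and the function names, node names, and data are hypothetical. The copies run one after another in this sketch, whereas a real runtime would place them on separate nodes and run them concurrently.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Buffer = List[float]
Partial = Tuple[Optional[Buffer], int]


def read_filter(chunks: Iterable[Buffer]) -> Iterator[Buffer]:
    """RD (assumed): stream locally stored chunks one buffer at a time."""
    for chunk in chunks:
        yield list(chunk)


def diff_filter(stream: Iterable[Buffer], reference: Buffer) -> Iterator[Buffer]:
    """DIFF (assumed): element-wise absolute difference from a reference."""
    for buf in stream:
        yield [abs(v - r) for v, r in zip(buf, reference)]


def sum_filter(stream: Iterable[Buffer]) -> Partial:
    """SUM (assumed): add buffers element-wise; return (sum, buffer count)."""
    total: Optional[Buffer] = None
    count = 0
    for buf in stream:
        total = list(buf) if total is None else [a + b for a, b in zip(total, buf)]
        count += 1
    return total, count


def run_filter_group(local_chunks: Iterable[Buffer], reference: Buffer) -> Partial:
    """One copy of the pipelined group RD -> DIFF -> SUM, as run on one node."""
    return sum_filter(diff_filter(read_filter(local_chunks), reference))


def avg_filter(partials: Iterable[Partial]) -> Buffer:
    """AVG (assumed): combine the per-copy partial sums into one average."""
    total: Optional[Buffer] = None
    count = 0
    for part_sum, part_count in partials:
        if part_sum is None:          # a copy that had no local data
            continue
        total = list(part_sum) if total is None else [a + b for a, b in zip(total, part_sum)]
        count += part_count
    if total is None:
        raise ValueError("no data received from any filter copy")
    return [v / count for v in total]


# Hypothetical data: each node holds some chunks of the same dataset.
reference = [1.0, 2.0]
node_chunks = {
    "node1": [[1.0, 4.0], [3.0, 2.0]],
    "node2": [[2.0, 0.0]],
}

# One copy of the filter group per node; the client-side AVG merges them.
partials = [run_filter_group(chunks, reference) for chunks in node_chunks.values()]
average_diff = avg_filter(partials)
print(average_diff)
```

Because each copy returns a partial sum together with a count, the result does not depend on how chunks are split across nodes, which is what lets the runtime choose the number of copies freely.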
DataCutter • Components: • Embarrassingly Parallel • Generalized Reduction • Wrapped MPI • Flow control between components • Schedulers place filters on grid processors (scheduler API) • Stream based communication – being upgraded to OGSA model • Data Parallel Compiler Prototype • NPACkage
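The generalized reduction component type can be sketched as a local accumulation step followed by a global combine. This is a minimal illustration under the usual assumption that the combine operation is commutative and associative; it is not taken from DataCutter, and the histogram task, function names, and data are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List


def local_reduce(chunk: Iterable[float], bin_width: float) -> Counter:
    """Fold one chunk into a private accumulator (a histogram of bin counts)."""
    acc: Counter = Counter()
    for value in chunk:
        acc[int(value // bin_width)] += 1
    return acc


def global_combine(a: Counter, b: Counter) -> Counter:
    """Merge two accumulators; order does not matter for counts."""
    return a + b


def generalized_reduction(chunks: List[List[float]], bin_width: float) -> Counter:
    """Reduce each chunk independently, then combine the partial results."""
    partials = [local_reduce(chunk, bin_width) for chunk in chunks]
    return reduce(global_combine, partials, Counter())


# Hypothetical data split across three processors.
chunks = [[0.5, 1.5, 2.5], [1.2, 1.7], [0.1, 2.9, 2.2]]
histogram = generalized_reduction(chunks, bin_width=1.0)
print(sorted(histogram.items()))
```

Since the combine step is order-independent, chunks can be processed wherever they are stored and the partial results merged in any order.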
Integrating DataCutter with existing Grid toolkits: SRB (done), Globus, NWS (ongoing) • SRB integration: Subset and filter datasets • Globus integration: DataCutter uses Globus’ resource discovery, resource allocation, authentication, and authorization services. • Network Weather Service (NWS) integration: NWS is used for system monitoring.
Canonical Services • Canonical services carried out by Data Parallel Components • Data Cluster/Decluster/Spatial Indexing/Range Query Service (Inherited from Active Data Repository) • Super-Semantic Data Cache (Andrade SC2002) • Grid Generalized Reduction (Ferreira ICS2002)
Clustering/Declustering Datasets • Partition dataset into data chunks -- each chunk contains a set of data elements • Each chunk is associated with a bounding box • DataCutter Data Loading Service • Distributes chunks across the disks in the system • Constructs an R-tree index using bounding boxes of the data chunks [Figure: Disk Farm]
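The chunking and declustering steps above can be sketched in a few lines. This is a simplified illustration, not the DataCutter Data Loading Service: it assumes 2-D points, assigns chunks to disks round-robin, and answers range queries with a linear scan over the chunk bounding boxes instead of an R-tree. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Box") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


@dataclass
class Chunk:
    chunk_id: int
    points: List[Point]
    box: Box      # bounding box of the points in this chunk
    disk: int     # disk the chunk is placed on


def build_chunks(points: Sequence[Point], chunk_size: int, num_disks: int) -> List[Chunk]:
    """Cluster sorted points into chunks and decluster them round-robin."""
    ordered = sorted(points)
    chunks = []
    for cid, start in enumerate(range(0, len(ordered), chunk_size)):
        pts = ordered[start:start + chunk_size]
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        box = Box(min(xs), min(ys), max(xs), max(ys))
        chunks.append(Chunk(cid, pts, box, disk=cid % num_disks))
    return chunks


def candidate_chunks(chunks: Sequence[Chunk], query: Box) -> List[Chunk]:
    """Linear scan over bounding boxes (an R-tree would prune this search)."""
    return [c for c in chunks if c.box.intersects(query)]


def range_query(chunks: Sequence[Chunk], query: Box) -> List[Point]:
    """Read only the candidate chunks, then filter their points exactly."""
    return [p for c in candidate_chunks(chunks, query)
            for p in c.points if query.contains(p)]


# A 4 x 4 grid of points, four points per chunk, spread over two disks.
points = [(float(x), float(y)) for x in range(4) for y in range(4)]
chunks = build_chunks(points, chunk_size=4, num_disks=2)
query = Box(0.5, 1.0, 2.5, 2.0)

touched = candidate_chunks(chunks, query)
print([(c.chunk_id, c.disk) for c in touched])
print(range_query(chunks, query))
```

In this example the query touches two of the four chunks, and they sit on different disks, so the two reads could proceed in parallel. Spreading spatially adjacent chunks across disks is the point of declustering.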
Advantage of Using Cached Intermediate Results (Virtual Microscope)
Other Biomedical Grid Applications • Grid based clinical research support • 1000s of clinical research sites • Different studies involve different subsets of sites • Ad-hoc federated databases • Lots of data naming issues • Support for anonymization • Role based data access • Support for authentication, encryption • Support for image analysis • NCI Cancer Center Support • Virtual Microscope versus query images
DataCutter Development Group • Ohio State University: Joel Saltz, Tahsin Kurc, Umit Catalyurek, Gagan Agrawal, Renato Ferreira • University of Maryland: Alan Sussman, Henrique Andrade, Christian Hansen