SDM Workshop Strawman Report: History, Progress, and Goal
History
• Original plan
  • Identify Scientific Applications Data Management needs
  • Focus on different application types: simulations, experiments/observations
  • Identify Data Management technologies
  • Identify other relevant Computer Science technologies
  • Identify Gaps, Cost, Priorities
• In the Extended EOC we came up with a draft report
  • Based on extensive discussions of application needs
  • Identified the scientific investigation process (workflow)
  • Identified technologies needed
  • Assigned writing to individuals
Section 2: Application sciences motivation and needs
• Astrophysics
• Biology
• Climate Modeling
• Combustion
• Fusion Energy Science
• High Energy and Nuclear Physics
• Nanotechnology
Section 3: The scientific investigation process
• Distributed Scientific Workflows
• Scientific Data Management Phases
  • Data Generation
  • Data Analysis
  • Data Visualization
• Foundation of scientific data management technology
  • Workflow, dataflow, data transformation
  • Storage, data movement, grid, networks
  • Metadata management and cataloging
  • Efficient access and query, data integration
  • Integrated analysis environment, visualization
• Requirements of supportive technologies
  • Networking
  • Visualization
Scientific Workflow Cycle
[Figure: Data Generation, Data Analysis, and Data Visualization arranged around Scientific Data Management, with each phase linked by a workflow.]
Section 4: Data Management Technologies and Gap Analysis
1) Workflow, dataflow, data transformation
• Workflow specification
• Workflow execution in distributed systems
• Monitoring of long-running workflows
• Adapting components to the framework
Workflow layers
• Control-flow layer
• Application and Software Tools layer
• I/O System layer
• Storage and Network Resource layer
Astrophysical Simulation Workflow Cycle
Application Layer
• Start New Simulation?
• Run Simulation batch job on capability system
• Simulation generates checkpoint files
• Archive checkpoint files to HPSS
• Migrate subset of checkpoint files to local cluster
• Vis & Analysis on local Beowulf cluster
• Continue Simulation?
Parallel I/O Layer
• Parallel HDF5
Storage Layer
• PVFS or LUSTRE
• HPSS
• GPFS
• MSS, Disks, & OS
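The application-layer cycle above can be sketched as a simple loop. This is only an illustration under stated assumptions: the names `run_simulation_job`, `workflow_cycle`, and `CycleLog`, the checkpoint file names, and the "every second file" migration rule are all hypothetical, and no real batch system, HPSS, or HDF5 call is made.

```python
from dataclasses import dataclass, field


@dataclass
class CycleLog:
    """Records what each step of the (hypothetical) cycle handled."""
    archived: list = field(default_factory=list)
    migrated: list = field(default_factory=list)
    analyzed: list = field(default_factory=list)


def run_simulation_job(run_index, files_per_run=4):
    """Stand-in for a batch job; returns made-up checkpoint file names."""
    return [f"run{run_index:02d}_chk{i:03d}.h5" for i in range(files_per_run)]


def workflow_cycle(max_runs, migrate_every=2):
    """Walk the cycle: run, archive, migrate a subset, analyze, repeat."""
    log = CycleLog()
    run_index = 0
    # "Continue Simulation?" is modeled as a fixed run budget (an assumption).
    while run_index < max_runs:
        checkpoints = run_simulation_job(run_index)
        # Archive every checkpoint file (mass storage in the slide).
        log.archived.extend(checkpoints)
        # Migrate only a subset to the local cluster.
        subset = checkpoints[::migrate_every]
        log.migrated.extend(subset)
        # Visualization and analysis run on the migrated subset.
        log.analyzed.extend(subset)
        run_index += 1
    return log


if __name__ == "__main__":
    log = workflow_cycle(max_runs=3)
    print(len(log.archived), "archived;", len(log.migrated), "migrated")
```

Keeping the steps as plain function calls makes it easy to see where a workflow engine would take over: each call is a task, and the loop condition is the control-flow decision.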
Section 4: Data Management Technologies and Gap Analysis
2) Storage, data movement, grid, networks
• Dynamic data storage and caching
• Robust terabyte-scale data movers
• Dataflow automation between components
• Multi-resolution data movement
3) Metadata management and cataloging
• Unified data models and APIs
• Annotation, ontologies and provenance
• Metadata requirements for workflows
Section 4: Data Management Technologies and Gap Analysis
4) Efficient access and query, data integration
• Parallel and random I/O
• Large-scale feature-based indexing
• Query processing over files
• Data integration
5) Integrated analysis environment, visualization
• A single environment for packaged tools and user software
• A single environment for a variety of tools: statistical software, cluster analysis, …
• Coupling with visualization tools
• Work with parallel I/O
Section 5: Prioritization, Cost, and Management
• Prioritization process
  • Reasons based on current barriers and needs
  • Reasons based on long-term projections
• Practical budgeting considerations
  • Research and development
  • Hardening and packaging
  • Deployment and maintenance
• Recommendations and program planning
  • Prioritization
  • Cost
  • Management Structure
Gap & Cost Matrix
Columns: Research and Development | Hardening and Packaging | Deployment and Maintenance
Rows:
• Workflow, dataflow, data transformation
• Storage, data movement, grid, networks
• Metadata management and cataloging
• Efficient access and query, data integration
• Integrated analysis environment, visualization
Discussion items
Columns: Research and Development | Hardening and Packaging | Deployment and Maintenance
• Control flow tier
  • Granularity of tasks, sub-workflows
  • Task invocation mechanisms: Web Services, CORBA, wrappers, callbacks
  • Human tasks: notifications and alerts, steering
  • Dataflow streaming granularity
• Work tier
  • Workflow engine for scientific applications
  • Dataflow management
  • Effect of dataflow on the control flow
  • Failure detection and recovery
  • Performance and bottleneck issues
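As one possible reading of the "Failure detection and recovery" and "Notifications and alerts" items, the sketch below retries a failing task a bounded number of times and reports each failure through a callback. The slides do not specify any mechanism; `run_with_recovery`, `make_flaky_task`, and the `on_alert` parameter are hypothetical names chosen for illustration.

```python
def run_with_recovery(task, max_attempts=3, on_alert=None):
    """Run `task`; on an exception, alert and retry up to `max_attempts` times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:  # failure detection: any raised exception
            last_error = exc
            if on_alert is not None:
                # Notification hook, e.g. for a human steering the workflow.
                on_alert(f"attempt {attempt} of {max_attempts} failed: {exc}")
    # Recovery was not possible within the attempt budget.
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def make_flaky_task(failures):
    """Build a demo task that fails `failures` times and then succeeds."""
    state = {"remaining": failures}

    def task():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise IOError("simulated transfer failure")
        return "done"

    return task


if __name__ == "__main__":
    alerts = []
    result = run_with_recovery(make_flaky_task(2), on_alert=alerts.append)
    print(result, "after", len(alerts), "alerts")
```

A real workflow engine would likely add backoff between attempts and distinguish retryable from fatal errors; both are left out here to keep the sketch short.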