Towards Intelligent Workflow Planning for Neuroimaging Analyses
Irfan Habib, Ashiq Anjum, Peter Bloodsworth, Richard McClatchey
Centre for Complex Cooperative Systems, BIT, University of the West of England, Bristol
Introduction
• Recent progress in neuroimaging techniques and data formats has led to explosive growth in neuroimaging data.
• Analysis of this data can facilitate research into neurodegenerative diseases.
neuGRID (http://www.neugrid.eu)
Diagram labels: commercial partners, academic partners, clinical users
Neuroimaging datasets are generally processed through neuroimaging pipelines.
CIVET produces 1100% more data than it consumes, and intermediate data usage is more than 4000%. Without optimisation, the runtime of a single workflow is 8 hours.
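To make the first figure concrete: "1100% more data than it consumes" means the output is 12 times the input. The short sketch below works through that arithmetic. The 1 GB input size and the helper name `output_size` are assumptions chosen for illustration; they do not come from the slides.

```python
def output_size(input_size: float, percent_more: float) -> float:
    """Data produced by a pipeline that emits `percent_more` percent more than it reads."""
    return input_size * (1 + percent_more / 100)


# Hypothetical 1 GB input (not a figure from the slides).
produced = output_size(1.0, 1100)
print(produced)  # 12.0, i.e. 12 GB of output for every 1 GB consumed
```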
CIVET Pipeline
• 85% of all tasks in CIVET execute in less than 512 seconds.
• These 85% of tasks perform just 8% of the computation.
Existing Approaches
• State-of-the-art approaches for workflow planning include:
  • Data-based methods: data elimination, data diffusion
  • Task-based approaches: task clustering
  • Scheduling-based approaches
Task Clustering
Figure: CIVET normalised workflow turnaround time (with respect to standard CIVET on an SGE cluster)
Figure: CIVET normalised cumulative data retrieval (with respect to standard CIVET on an SGE cluster)
What are the issues?
• Different clustering strategies work for different types of workflows.
• A specific automated horizontal task clustering strategy created a computationally efficient workflow in this case.
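As a rough illustration of horizontal task clustering, the sketch below groups tasks that sit at the same depth of a workflow graph into fixed-size clusters, so that many short tasks can be submitted as fewer jobs. It is a minimal sketch and not the strategy evaluated in the slides: the function names, the fixed `cluster_size` parameter, and the toy workflow are all assumptions.

```python
from collections import defaultdict


def task_levels(deps):
    """Map each task to its depth in the DAG (tasks without parents are level 0).

    deps: dict mapping a task name to the list of its parent task names.
    """
    levels = {}

    def depth(task):
        if task not in levels:
            parents = deps.get(task, [])
            levels[task] = 1 + max(map(depth, parents)) if parents else 0
        return levels[task]

    for task in deps:
        depth(task)
    return levels


def horizontal_clusters(deps, cluster_size):
    """Group tasks of the same level into clusters of at most `cluster_size` tasks."""
    by_level = defaultdict(list)
    for task, level in task_levels(deps).items():
        by_level[level].append(task)

    clusters = []
    for level in sorted(by_level):
        tasks = sorted(by_level[level])  # sorted only to make the result deterministic
        for i in range(0, len(tasks), cluster_size):
            clusters.append(tasks[i:i + cluster_size])
    return clusters


# Hypothetical fan-out/fan-in workflow, not taken from CIVET.
workflow = {
    "split": [],
    "a1": ["split"], "a2": ["split"], "a3": ["split"], "a4": ["split"],
    "merge": ["a1", "a2", "a3", "a4"],
}
print(horizontal_clusters(workflow, 2))
# [['split'], ['a1', 'a2'], ['a3', 'a4'], ['merge']]
```

Tasks in the same cluster have no dependencies on each other, which is why they can be bundled without changing the workflow's semantics. Choosing a good cluster size is exactly the kind of decision that depends on the workflow.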
What are the issues?
Diagram labels: coarse-grained tasks with a high level of data interdependencies; more coarse-grained tasks; fine-grained tasks with a low level of data interdependencies; higher data affinity
What are the issues?
• Creating an efficient workflow plan involves consideration of several trade-offs.
• Various parameters need to be optimised: data efficiency, scheduling latency, workflow turn-around time, network latencies.
• Hence workflow planning is a multi-dimensional optimisation problem.
This paper proposes an initial single-objective, genetic-algorithm-based workflow planning approach.
Diagram: randomly planned, user-submitted workflows (built from tasks labelled B1, B2, C2, C3, C4) are enacted on the Grid, and their provenance data is stored in provenance storage.
Diagram labels: Pipeline Service, Planner, Provenance Data; planner steps: fitness calculation, selection, genetic operators
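The planner steps named above (fitness calculation, selection, genetic operators) follow the usual shape of a single-objective genetic algorithm. The sketch below is a toy version under loudly stated assumptions: a plan is simply an assignment of each task to one of `n_nodes` nodes, and fitness is the makespan computed from known task runtimes, standing in for turnaround times that the proposed approach would derive from provenance data. The encoding, the operators and all parameter values are illustrative and are not taken from the slides.

```python
import random


def makespan(plan, runtimes, n_nodes):
    """Fitness to minimise: the load of the busiest node under this plan."""
    load = [0.0] * n_nodes
    for task, node in enumerate(plan):
        load[node] += runtimes[task]
    return max(load)


def evolve(runtimes, n_nodes, pop_size=30, generations=50,
           mutation_rate=0.1, seed=0):
    """Toy single-objective GA; needs at least 2 tasks and pop_size >= 3."""
    rng = random.Random(seed)
    n = len(runtimes)

    def fitness(plan):
        return makespan(plan, runtimes, n_nodes)

    # Start from randomly generated plans.
    pop = [[rng.randrange(n_nodes) for _ in range(n)] for _ in range(pop_size)]

    for _ in range(generations):
        # Elitism: carry the best plan over unchanged.
        new_pop = [min(pop, key=fitness)[:]]
        while len(new_pop) < pop_size:
            # Selection: 3-way tournament, lower makespan wins.
            p1 = min(rng.sample(pop, 3), key=fitness)
            p2 = min(rng.sample(pop, 3), key=fitness)
            # Genetic operators: one-point crossover, then per-gene mutation.
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n_nodes) if rng.random() < mutation_rate else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop

    return min(pop, key=fitness)


# Hypothetical task runtimes in seconds.
runtimes = [8, 7, 6, 5, 4, 3, 2, 1]
best = evolve(runtimes, n_nodes=2)
print(best, makespan(best, runtimes, 2))
```

Because of elitism the best fitness never gets worse from one generation to the next, which makes convergence easy to observe when comparing parameter settings.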
Implementation of the Approach
• The workflow planning approach will first be simulated in SimGrid.
• Various parameters for the planning approach will be tweaked and evaluated.
• Type of selection producing the quickest convergence towards efficiency
• Extending fitness functions for multiple objectives
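One way a fitness function might later be extended to multiple objectives is sketched below. The slides do not say which method will be used; the weighted sum and the Pareto-dominance check shown here are two common options, swapped in purely for illustration. The metric names, weights and values are hypothetical, and every metric is assumed to be normalised so that lower is better.

```python
def weighted_fitness(metrics, weights):
    """Collapse several objectives into one score to minimise (weighted sum)."""
    return sum(weights[name] * metrics[name] for name in weights)


def dominates(a, b):
    """True if plan `a` is no worse than `b` on every objective and better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


# Hypothetical normalised metrics for one workflow plan.
metrics = {"turnaround": 0.5, "data_transfer": 0.25}
weights = {"turnaround": 0.75, "data_transfer": 0.25}
print(weighted_fitness(metrics, weights))  # 0.4375

# Objective vectors in the order (turnaround, data_transfer).
print(dominates((0.5, 0.25), (0.5, 0.5)))  # True
print(dominates((0.5, 0.5), (0.25, 0.75)))  # False: each plan wins on one objective
```

A weighted sum keeps the existing single-objective machinery unchanged but requires choosing weights up front, whereas a dominance-based comparison avoids weights at the cost of returning a set of trade-off plans rather than a single one.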
Conclusion
• Several workflow planning techniques exist; however, prior knowledge about the nature of the workflow is required to select an appropriate technique.
• This paper proposes a single-objective evolutionary workflow planning approach to optimise workflow turn-around times.
• The approach will first be implemented in a SimGrid environment, and results will be shared in future publications.