Data Management in Cloud Workflow Systems
Dong Yuan
Faculty of Information and Communication Technology
Swinburne University of Technology
Outline
• Cloud Computing & Cloud Workflow Systems
  • Introduction to cloud workflow systems. A brief overview of grid workflow systems.
• Data Management in Cloud Workflow Systems
  • New features and research issues
• Cloud Computing Environment and SwinDeW-C
  • Our simulation environment and cloud workflow system
Cloud Computing
• Some new features of cloud computing
  • Large data centres with cheap hardware
  • Virtualisation
  • Internet based and SOA
  • SaaS, PaaS, IaaS
  • Market driven and cost model
• Research on cloud computing has emerged in many areas
  • Data mining, databases, parallel computing & scientific applications, content delivery
Cloud Workflow Systems
• Grid workflow systems
  • Kepler, Pegasus, Taverna, MOTEUR, Triana, ASKALON
  • Gridbus, GridFlow
• Build-time: focus on data modelling
  • Kepler: actor-oriented data modelling; Taverna: Scufl; ASKALON: AGWL
• Runtime: adopt Data Grid systems
  • Grid DataFarm, GDMP, GridDB, SRB, RLS (P-RLS), GSB, DaltOn
Cloud Workflow Systems
• Architecture
  • Based on Internet
  • Platform as a Service
  • More distributed
Data Management in Cloud Workflow Systems
• New features and challenges
  • Independent of users and automatic
  • Cost driven
    • Computation cost, storage cost, data transfer cost
  • Data dependency
    • Task – data, data – data, derivation
• Some research issues
  • Data partition, placement, replication, synchronisation, provenance, catalogue, meta-data, consistency, reduction, storage, movement, etc.
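The "cost driven" point above can be illustrated with a minimal sketch that adds up the three cost components named on the slide. The `Prices` class, the `total_cost` function, and all price values are hypothetical and chosen only for illustration; they do not reflect any real provider's tariff or the cost model used in the talk.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical unit prices; not taken from any real cloud provider."""
    cpu_per_hour: float          # price of one CPU hour
    storage_per_gb_month: float  # price of keeping 1 GB for one month
    transfer_per_gb: float       # price of moving 1 GB between data centres


def total_cost(prices, cpu_hours, stored_gb, months, transferred_gb):
    """Break a workflow's bill into computation, storage and transfer cost."""
    computation = prices.cpu_per_hour * cpu_hours
    storage = prices.storage_per_gb_month * stored_gb * months
    transfer = prices.transfer_per_gb * transferred_gb
    return {
        "computation": computation,
        "storage": storage,
        "transfer": transfer,
        "total": computation + storage + transfer,
    }


if __name__ == "__main__":
    prices = Prices(cpu_per_hour=0.5, storage_per_gb_month=0.25, transfer_per_gb=0.125)
    print(total_cost(prices, cpu_hours=100, stored_gb=200, months=2, transferred_gb=40))
```

Keeping the three components separate makes the trade-offs visible: storing more data raises the storage term but can lower the computation and transfer terms.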
Data Placement in Cloud Workflow Systems
• Data placement: deciding where to store the application data in the distributed data centres
• Aims:
  • Reduce data movement
  • Reduce task waiting time
• Strategy:
  • Data dependency: dataset – dataset
  • Build-time: existing data; runtime: generated data (also intermediate data)
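As a rough sketch of dependency-based placement, the code below measures the dependency between two datasets as the number of tasks that use both, then greedily puts each dataset in the data centre whose current contents it depends on most. This greedy heuristic, the function names, and the capacity model are assumptions made for illustration; the slide does not specify the actual algorithm.

```python
from collections import defaultdict
from itertools import combinations


def dependency_matrix(task_inputs):
    """Count, for each pair of datasets, how many tasks use both of them.

    task_inputs maps a task name to the set of datasets that task reads.
    """
    dep = defaultdict(int)
    for datasets in task_inputs.values():
        for a, b in combinations(sorted(datasets), 2):
            dep[(a, b)] += 1
    return dep


def dependency(dep, a, b):
    """Look up the symmetric dependency between datasets a and b."""
    return dep.get((a, b) if a < b else (b, a), 0)


def place(sizes, capacities, dep):
    """Greedy placement (illustrative heuristic, largest dataset first).

    Each dataset goes to the data centre with enough free space whose
    already-placed datasets it shares the most tasks with; ties go to the
    centre with more free space.
    """
    placement = {dc: [] for dc in capacities}
    free = dict(capacities)
    for d in sorted(sizes, key=lambda name: -sizes[name]):
        candidates = [dc for dc in free if free[dc] >= sizes[d]]
        if not candidates:
            raise ValueError(f"no data centre can hold {d}")
        best = max(
            candidates,
            key=lambda dc: (sum(dependency(dep, d, o) for o in placement[dc]), free[dc]),
        )
        placement[best].append(d)
        free[best] -= sizes[d]
    return placement


if __name__ == "__main__":
    tasks = {"t1": {"d1", "d2"}, "t2": {"d1", "d2"}, "t3": {"d3", "d4"}, "t4": {"d2", "d3"}}
    sizes = {"d1": 4, "d2": 3, "d3": 2, "d4": 1}
    print(place(sizes, {"dc1": 7, "dc2": 7}, dependency_matrix(tasks)))
```

Datasets that are frequently used together end up in the same data centre, so tasks scheduled there need less data movement.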
Data Replication in Cloud Workflow Systems
• Data replication: for one dataset, store several copies in different places (data centres)
• Aims:
  • Increase data security
  • Fast data access
  • Reduce data movement
• Strategy:
  • Dynamic replication
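One simple way to picture dynamic replication is a counter-based rule: when a data centre has fetched a dataset remotely a given number of times, a copy is created there. The `ReplicaManager` class and its threshold rule are hypothetical and serve only as a sketch; the slide does not describe the concrete replication policy.

```python
from collections import Counter


class ReplicaManager:
    """Tracks replicas of one dataset (illustrative threshold policy)."""

    def __init__(self, home, threshold):
        self.replicas = {home}        # data centres that hold a copy
        self.threshold = threshold    # remote accesses that trigger a new copy
        self.remote_hits = Counter()  # remote access count per data centre

    def access(self, centre):
        """Record an access from `centre` and report how it was served."""
        if centre in self.replicas:
            return "local"
        self.remote_hits[centre] += 1
        if self.remote_hits[centre] >= self.threshold:
            # The dataset is popular here: keep a copy for future accesses.
            self.replicas.add(centre)
            del self.remote_hits[centre]
        return "remote"


if __name__ == "__main__":
    manager = ReplicaManager(home="dc1", threshold=3)
    for _ in range(4):
        print(manager.access("dc2"))
    print(sorted(manager.replicas))
```

The access that reaches the threshold is still served remotely; later accesses from that data centre are local.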
Intermediate Data Storage in Cloud Workflow Systems
• Intermediate data storage is especially important in scientific workflows
• Aim:
  • Reduce system cost
• Strategy:
  • Intermediate data can be regenerated with data provenance information
  • Selectively store some key intermediate datasets
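The store-or-regenerate decision can be sketched for a linear chain of intermediate datasets, where each dataset is derived from the one before it. A dataset is kept only if regenerating it on demand would cost more per day than storing it. The `Dataset` fields, the greedy single pass, and the numbers are assumptions for illustration; the pass also ignores that deleting a dataset makes its successors more expensive to regenerate, so it is not an optimal strategy.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    gen_cost: float      # cost to generate it from its direct predecessor
    storage_rate: float  # cost per day to keep it stored
    usage_rate: float    # expected number of uses per day


def choose_stored(chain):
    """Pick the datasets to store in a linear derivation chain.

    Regenerating a deleted dataset means rerunning every step since the
    nearest stored predecessor (or the original input data).
    """
    stored = []
    upstream = 0.0  # cost of regenerating the deleted datasets just before this one
    for d in chain:
        regen = upstream + d.gen_cost
        if d.usage_rate * regen > d.storage_rate:
            stored.append(d.name)  # cheaper to keep than to regenerate
            upstream = 0.0
        else:
            upstream = regen       # deleted: successors must regenerate it too
    return stored


if __name__ == "__main__":
    chain = [
        Dataset("d1", gen_cost=10, storage_rate=1.0, usage_rate=0.05),
        Dataset("d2", gen_cost=20, storage_rate=2.0, usage_rate=0.1),
        Dataset("d3", gen_cost=5, storage_rate=1.0, usage_rate=0.1),
    ]
    print(choose_stored(chain))
```

Provenance information is what makes the deleted datasets recoverable: it records which steps must be rerun to regenerate them.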
End
• Questions? Thanks!