General and Effective Monetary Optimizations for Workflows in IaaS Clouds presented by Amelie Chi Zhou amelie.czhou@gmail.com Xtra Computing Group http://pdcc.ntu.edu.sg/xtra Nanyang Technological University, Singapore
Workflows for Scientific Applications • Workflows are structured • Tasks have very different I/O and computational behavior • Real-world workflows • Montage, Ligo, Epigenomics, water-simulation • Workflow ensembles [Malawski et al., SC’12] • Composition of workflows with similar structures and different parameters and priorities [Figure: workflow structures of Epigenomics, Ligo, and Montage]
Running Workflows on IaaS Clouds • Define IaaS clouds • Provide fundamental computing resources for users to provision • Examples: Amazon EC2, Rackspace, OpenStack, Google Compute Engine … • Example projects • Montage, Broadband, Epigenomics on Amazon EC2 [Juve et al., eScience’09] • Astronomy applications on Nimbus, Eucalyptus, and EC2 [Vöckler et al., ScienceCloud’11] • …
Workflows in IaaS Clouds • Features of IaaS clouds • Pay as you go (e.g., hourly pricing scheme) • Rich and evolving cloud offerings • Research problems • Monetary cost optimizations • Performance optimizations • Elasticity • Fault tolerance • … Are the current solutions ideal/sufficient?
Monetary Cost Opportunities • Instance types • Amazon EC2 provides 29 types of instances • Instance reuse • Hourly charging scheme • Pricing schemes • On-demand, spot and reserved pricing • vs. • Tasks can have very different I/O and computational behavior. • Workflows have different deadline and monetary constraints. • Users may have various workflow application scenarios.
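The effect of instance reuse under an hourly charging scheme can be illustrated with a small sketch. It assumes that every started hour is billed in full; the task runtimes and the price are hypothetical values chosen for illustration, not figures from the talk.

```python
import math

def billed_hours(minutes: float) -> int:
    """Hours charged for one instance, assuming every started hour is billed in full."""
    return max(1, math.ceil(minutes / 60.0))

def cost_separate(task_minutes, hourly_price):
    """Each task runs on its own instance."""
    return sum(billed_hours(m) * hourly_price for m in task_minutes)

def cost_reused(task_minutes, hourly_price):
    """All tasks run back to back on one reused instance."""
    return billed_hours(sum(task_minutes)) * hourly_price

tasks = [20, 25, 10]   # hypothetical task runtimes in minutes
price = 0.10           # hypothetical on-demand price per hour

print("separate instances:", cost_separate(tasks, price))
print("one reused instance:", cost_reused(tasks, price))
```

Three short tasks on separate instances pay for three partial hours, while the same tasks packed onto one instance fit into a single charged hour.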
Current Solutions are Far From Ideal • Problems of current approaches • Auto-scaling [Mao et al., SC’11] resource management • More effective optimizations: 29% less cost • Assume static cloud performance and pricing • Cloud dynamics + spot instances: 73% less cost • Heuristic-based cost and performance optimizations are specific. • They are likely to be suboptimal in evolving and diversified workflow applications.
Our Research Efforts • Effectiveness • Dyna: Minimize the monetary cost of workflows, addressing both the price and performance dynamics in clouds • Generality • ToF: Define transformation operations to model common cost and performance optimizations • Deco: Design a declarative language called WLog to specify various workflow optimization problems The focus of this presentation.
Overall Design • We design general workflow optimization frameworks to fully explore the optimization opportunities that lie in workflows [Figure: layered design with a problem specification layer (WLog programs), an optimization layer (transformation-based optimizer), and an execution layer; Deco and ToF are marked on the layers]
Outline • Related Work • Generalized Optimization Frameworks • General transformations for cost and performance optimizations • A declarative language for workflow optimization problems • Conclusions
Related Work • Performance and monetary cost optimization heuristics • Auto-scaling [Mao et al., SC’11] • Fixed sequence of workflow optimizations • Workflow scheduling with performance and cost constraints [Kllapi et al., SIGMOD’11] • Consider only one on-demand instance type The heuristics are specifically designed for specific optimization problems and the optimization opportunities are not fully explored.
Related Work (cont’d) • Generalized optimization frameworks: overhead is a problem • Generalized bin-ball abstraction for resource allocation [Rai et al., SoCC’12] • GPU acceleration • Not always convenient to model a problem with the bin-ball model • Declarative language to model a wide range of COPs [Liu et al., VLDB’12] • Distributed systems • Ignores the special features and optimization opportunities in workflows There is no general optimization framework for workflows.
Outline • Related Work • Generalized Optimization Frameworks • General transformations for cost and performance optimizations • A declarative language for workflow optimization problems • Conclusions
ToF: A Transformation-based Optimization Framework • Outline • Main contributions of this work • System overview • Design details • Evaluation results
Main Contributions • This study has two major contributions • We define a series of common transformations for the performance and cost optimizations of workflows. • We design a light-weight optimizer to guide the transformation process.
Workflow Transformation • Definitions • Instance assignment graph • Each node represents the instance configuration for a task. • Same structure as the workflow DAG • Transformation operation • Structural change in the instance assignment graph [Figure: example instance assignment graphs for tasks 0 to 3 before and after transformations; transformed nodes group tasks, e.g. "1,2", "2,3", "1,3" and "1,2,3"]
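One way to picture the instance assignment graph is as a copy of the workflow DAG whose nodes carry an instance configuration. The sketch below is illustrative only: the class names, the `fold` method and the instance type string are assumptions, not ToF's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tasks: list          # task ids placed on this instance
    instance_type: str   # illustrative instance type name

@dataclass
class AssignmentGraph:
    nodes: dict = field(default_factory=dict)   # node id -> Node
    edges: set = field(default_factory=set)     # (parent id, child id)

    @classmethod
    def from_dag(cls, dag_edges, instance_type):
        """Start with one node per task, mirroring the workflow DAG."""
        g = cls()
        for u, v in dag_edges:
            for t in (u, v):
                g.nodes.setdefault(t, Node([t], instance_type))
            g.edges.add((u, v))
        return g

    def fold(self, a, b):
        """A structural change: put the tasks of node b onto node a's instance."""
        self.nodes[a].tasks += self.nodes.pop(b).tasks
        rewired = set()
        for u, v in self.edges:
            u = a if u == b else u
            v = a if v == b else v
            if u != v:               # drop self-loops created by folding
                rewired.add((u, v))
        self.edges = rewired

# Diamond-shaped workflow: 0 -> {1, 2} -> 3
g = AssignmentGraph.from_dag([(0, 1), (0, 2), (1, 3), (2, 3)], "m1.small")
g.fold(1, 2)
print(g.nodes[1].tasks, sorted(g.edges))
```

After folding, tasks 1 and 2 share one node, which corresponds to the kind of grouped node ("1,2") shown in the slide's figure.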
System Overview • Design ideas • Two types of transformations • Main schemes: reduce cost • Auxiliary schemes: help main schemes to reduce cost • Use cost model to guide the transformation optimization • Periodical batch optimization • Maximize instance sharing and reuse • Reduce optimizer overhead [Figure: optimization process in one plan period; main schemes and auxiliary schemes are applied under the cost model until the termination check passes, then the plan is output]
Design Details • Transformation operations • Main schemes: Merge, Demote • Auxiliary schemes: Move, Promote, Split, Co-scheduling • Transformations can combine with each other
Using Transformations • Example of using Move and Merge operations • Move only transforms the shape of the plan • Merge reduces cost [Figure: charging hours before and after applying Move and Merge]
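The interplay between an auxiliary and a main scheme can be sketched with two overlapping tasks. This is a simplified model under stated assumptions: instances are billed per started hour from their first task start to their last task end, and the helper names `move`, `can_merge` and `merge` are hypothetical stand-ins for the operations named on the slide.

```python
import math

def charged_hours(intervals):
    """Hours billed for one instance that is held from its first start to its last end (minutes)."""
    start = min(s for s, _ in intervals)
    end = max(e for _, e in intervals)
    return math.ceil((end - start) / 60)

def total_hours(plan):
    """plan: list of instances, each a list of (start, end) task intervals."""
    return sum(charged_hours(inst) for inst in plan)

def move(interval, new_start):
    """Shift a task in time without changing its duration."""
    s, e = interval
    return (new_start, new_start + (e - s))

def can_merge(a, b):
    """Two instances can share one VM only if none of their tasks overlap."""
    ivs = sorted(a + b)
    return all(prev[1] <= nxt[0] for prev, nxt in zip(ivs, ivs[1:]))

def merge(a, b):
    """Place the tasks of both instances on a single instance."""
    return a + b

before = [[(0, 40)], [(30, 50)]]              # two tasks that overlap in time
moved = [before[0], [move(before[1][0], 40)]] # delay the second task to minute 40
after = [merge(*moved)]                       # now both fit on one instance

print(total_hours(before), total_hours(moved), total_hours(after))
```

Move alone leaves the bill unchanged, but it removes the overlap that blocked Merge, and Merge then fills the unused part of an already charged hour.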
Experimental Setup • Workload • Montage, Ligo and Mixed • Workflow submission rate follows a Poisson distribution • Comparisons • ToF • Baseline: only implements the initial instance configuration • Auto-scaling [Mao et al., SC’11] • Greedy: randomly selects the transformation during optimization • All results are normalized to Baseline
Evaluation Results on Cost Optimizations • Optimization results under the pricing scheme of Amazon EC2. • ToF obtains the lowest monetary cost on all workflows. • Over Auto-scaling by 29% • Over Baseline by 27% • Over Greedy by 17% [Figure: normalized monetary cost per workload; chart labels: 29%, 15%, 28%, 21%, 17%, 16%]
Evaluation Results on Performance Optimizations • Performance optimization results. • ToF obtains the lowest average execution time on all workflows. • Over Auto-scaling by 21% • Over Baseline by 21% • Over Greedy by 18% [Figure: normalized average execution time per workload; chart labels: 12%, 21%, 21%, 18%, 8%, 16%]
Outline • Related Work • Generalized Optimization Frameworks • General transformations for cost and performance optimizations • A declarative language for workflow optimization problems • Conclusions
Deco: A Declarative Optimization Framework • Outline • Main contributions of this work • System overview • A declarative language for workflows • GPU-accelerated search engine • Evaluation results
Main Contributions • This work has three main contributions • A declarative language for resource provisioning of scientific workflows in IaaS clouds • A generalized optimization framework to serve a wide range of optimization problems • Fast GPU-based implementation for low optimization overhead
Motivating Ideas • Why declarative language? • Declarative languages like HTML, SQL, Prolog • Concise and clear • Focus on what to do rather than how to do it • Why GPU acceleration? • Generic search has large runtime overhead • Monte Carlo method is used for probabilistic approximation [Raedt et al. 2007] which is suitable for GPU acceleration
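Why Monte Carlo approximation parallelizes well can be seen in a small sketch: every trial is independent of the others. The example assumes a workflow that is a simple chain of tasks with randomly distributed runtimes; the function name, the Gaussian runtimes and the deadline are illustrative assumptions, not Deco's implementation.

```python
import random

def estimate_deadline_prob(task_samplers, deadline, trials=10_000, seed=0):
    """Estimate P(makespan <= deadline) for a chain of tasks by random sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each trial is independent, so trials could run in parallel threads.
        makespan = sum(sample(rng) for sample in task_samplers)
        hits += makespan <= deadline
    return hits / trials

# Hypothetical task runtimes in hours: (mean, standard deviation)
samplers = [
    lambda rng: rng.gauss(3.0, 0.5),
    lambda rng: rng.gauss(5.0, 1.0),
]
p = estimate_deadline_prob(samplers, deadline=10.0)
print("estimated probability of meeting the deadline:", p)
print("meets a 95% probabilistic deadline:", p >= 0.95)
```

A probabilistic deadline such as "95% of runs finish within 10 hours" can then be checked by comparing the estimate against the required percentile.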
System Overview • Overview of the Deco system • WLog, a declarative language for workflows • GPU-Accelerated search engine
WLog – A Declarative Language for Workflows • WLog is designed based on Prolog • A WLog program describing a workflow scheduling problem:
goal minimize Ct in totalcost(Ct).
cons deadline(95%, 10h).
var configs(Tid, Vid) forall task(Tid) and Vm(Vid).
r1 import(amazonec2).
r2 import(montage).
r3 path(X,Y,Y,C) :- edge(X,Y), exetime(X,Vid,T), C is T.
r4 path(X,Y,Z,C) :- edge(X,Z), Z\==Y, path(Z,Y,Z2,C1), exetime(X,Vid,T), C is T+C1.
r5 maxtime(Path,T) :- setof([Z,C],path(root,tail,Z,C),Set), max(Set,[Path,T]).
r6 cost(Tid,Vid,C) :- price(Vid,Up), exetime(Tid,Vid,T), C is ceil(T/60.0)*Up.
r7 totalcost(Ct) :- findall(C,cost(Tid,Vid,C),Bag), sum(Bag,Ct).
• Problem-specific keywords • goal: optimization goal defined by the user • cons: problem constraint defined by the user • var: problem variable to be optimized • deadline(P, D): a probabilistic deadline requirement that D is at the P-th percentile of workflow execution time • import(cloud): import the cloud-related facts from the cloud metadata • import(daxfile): import the workflow-related facts generated from a DAX file
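For readers less familiar with Prolog, the rules r3 to r7 can be mirrored in a few lines of Python. This is a rough reading of the rules, not Deco's evaluator: it assumes execution times are in minutes, that `root` and `tail` are virtual nodes with zero runtime, and that the time of the final node on a path is not counted (as in r3). The workflow, execution times and prices are made up for illustration.

```python
import math
from functools import lru_cache

def maxtime(edges, configs, exetime, root="root", tail="tail"):
    """Longest root-to-tail path time, following rules r3 to r5."""
    @lru_cache(maxsize=None)
    def longest(x):
        t = exetime[(x, configs[x])]
        # r3 stops at the tail; r4 recurses through intermediate nodes.
        rest = [0.0 if z == tail else longest(z) for z in edges[x]]
        return t + max(rest)
    return longest(root)

def totalcost(configs, exetime, price):
    """Sum of per-task costs, following rules r6 and r7 (hourly billing)."""
    return sum(
        math.ceil(exetime[(tid, vid)] / 60.0) * price[vid]
        for tid, vid in configs.items()
    )

# Hypothetical facts standing in for import(cloud) and import(daxfile)
edges = {"root": ["a", "b"], "a": ["tail"], "b": ["tail"]}
exetime = {
    ("root", "small"): 0, ("tail", "small"): 0,
    ("a", "small"): 90, ("a", "large"): 40,
    ("b", "small"): 30, ("b", "large"): 15,
}
price = {"small": 0.06, "large": 0.24}

# One candidate value of the problem variable configs(Tid, Vid)
configs = {"root": "small", "a": "small", "b": "small", "tail": "small"}
print(maxtime(edges, configs, exetime), totalcost(configs, exetime, price))
```

An optimizer would search over different `configs` assignments, keeping those whose `maxtime` satisfies the deadline constraint and minimizing `totalcost`.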
GPU Accelerations • Explore vs. exploit • Exploitation prioritizes partial results. • Exploration traverses the search tree level by level, which gives the GPU an opportunity to parallelize the search. • Memory optimizations • Minimize the usage of global memory • Reduce accesses to shared memory
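A level-by-level exploration can be sketched on the CPU to show where the parallelism comes from. The sketch assumes a chain of tasks, hourly billing and a hard deadline; the function name and all data are hypothetical, and a real GPU engine would evaluate the candidates of one level in parallel threads rather than in a list comprehension.

```python
import math

def explore_level_by_level(tasks, vm_types, exetime, price, deadline):
    """Return the cheapest full assignment meeting the deadline, or None."""
    def runtime(plan):
        return sum(exetime[(t, vm)] for t, vm in zip(tasks, plan))

    def cost(plan):
        return sum(math.ceil(exetime[(t, vm)] / 60.0) * price[vm]
                   for t, vm in zip(tasks, plan))

    frontier = [()]   # partial plans: one VM type per task decided so far
    for _ in tasks:
        # All children of one level are independent of each other, so this
        # expansion and the feasibility check below are the parallel unit.
        children = [plan + (vm,) for plan in frontier for vm in vm_types]
        frontier = [p for p in children if runtime(p) <= deadline]
    return min(frontier, key=cost) if frontier else None

tasks = ["t1", "t2"]
vm_types = ["small", "large"]
exetime = {("t1", "small"): 70, ("t1", "large"): 30,
           ("t2", "small"): 50, ("t2", "large"): 20}   # minutes, hypothetical
price = {"small": 0.06, "large": 0.24}                  # per hour, hypothetical

best = explore_level_by_level(tasks, vm_types, exetime, price, deadline=90)
print(best)
```

Pruning infeasible partial plans at each level keeps the frontier small, while the remaining candidates of a level can all be scored at once.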
Evaluation Settings • Three use cases • Workflow scheduling problem • Workflow ensemble [Malawski et al., SC’12] • Goal: execute more workflows with high priorities within given budget and deadline • Follow-the-cost: multiple workflows, multiple datacenters • Comparison for workflow ensemble problem • Algorithms: Deco vs. SPSS [Malawski et al., SC’12] • Ensemble types: constant, Uniform(Un)sorted, Pareto(Un)sorted • Generate 5 budgets between [MinBudget, MaxBudget] • All results are normalized to that of SPSS
Evaluation Results • Under all ensemble types and budget constraints, Deco obtains a better score than SPSS [Figure: scores obtained by SPSS and Deco with different ensemble types under budgets 1 to 5 and a fixed deadline; workflow type is Ligo]
Evaluation Results (cont’d) • Programmability of WLog in Deco (lines of code) • Without Deco, users (re-)implement the workflow application in C++. • With Deco, users implement in WLog. • Deco allows much lower coding complexity than manual implementation.
Performance Speedup of GPUs • Performance speedup of GPU implementation over CPU implementation on a single core for the three applications [Figure: speedups of 437x, 93x and 31x]
Outline • Related Work • Generalized Optimization Frameworks • General transformations for cost and performance optimizations • A declarative language for workflow optimization problems • Conclusions
Conclusions • IaaS clouds have become an attractive platform for hosting workflows. • Despite recent efforts in monetary cost optimizations of workflows in the cloud, there is still considerable room for further improvement. • Due to the complex cloud offerings and problem specifications, we develop general optimization frameworks. • ToF achieves up to 29% improvement over the state-of-the-art algorithm. • Deco achieves up to 77% improvement over the state-of-the-art algorithm.
Future Work • Energy-efficient Cloud • Reduce the investment cost of cloud provider to potentially reduce instance price with energy-efficient hardware/software • Optimization opportunities in Multi-Cloud • Utilize different cloud offerings, e.g., instance types, to further reduce cost
References • Maciej Malawski, Gideon Juve, Ewa Deelman, and Jarek Nabrzyski. 2012. Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds. SC '12. 11 pages. • G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B.P. Berman, and P. Maechling. 2009. Scientific workflow applications on Amazon EC2. E-Science Workshops, pp. 59-66, 9-11 Dec. 2009. • Jens-Sönke Vöckler, Gideon Juve, Ewa Deelman, Mats Rynge, and Bruce Berriman. 2011. Experiences using cloud computing for a scientific workflow application. ScienceCloud '11. 15-24. • Ming Mao and Marty Humphrey. 2011. Auto-scaling to minimize cost and meet application deadlines in cloud workflows. SC 2011: 49. • Herald Kllapi, Eva Sitaridi, Manolis M. Tsangaris, and Yannis Ioannidis. 2011. Schedule optimization for data processing flows on the cloud. SIGMOD '11. 289-300. • Anshul Rai, Ranjita Bhagwan, and Saikat Guha. 2012. Generalized resource allocation for the cloud. SoCC '12. Article 15, 12 pages. • Changbin Liu, Lu Ren, Boon Thau Loo, Yun Mao, and Prithwish Basu. 2012. Cologne: a declarative distributed constraint optimization platform. Proc. VLDB Endow. 5, 8, 752-763. • L. De Raedt, A. Kimmig, and H. Toivonen. 2007. ProbLog: A probabilistic Prolog and its application in link discovery. IJCAI 2007, pages 2462-2467. • Amelie Chi Zhou and Bingsheng He. Transformation-based Monetary Cost Optimizations for Workflows in the Cloud. Accepted by TCC, Dec 2013. • Amelie Chi Zhou and Bingsheng He. A declarative optimization framework for workflows in IaaS clouds. Submitted to SC 2014. • Amelie Chi Zhou, Bingsheng He, and Cheng Liu. Monetary Cost Optimizations for Hosting Workflow-as-a-Service in IaaS Clouds. Submitted to ToC, 2014.
Thank you! Amelie Chi Zhou amelie.czhou@gmail.com Advisor: Bingsheng He bshe@ntu.edu.sg Xtra Computing Group http://pdcc.ntu.edu.sg/xtra Nanyang Technological University, Singapore