Cloudward Bound: Planning For Beneficial Migration of Enterprise Applications to the Cloud Bertha Wilhelm and David McGough
Outline • Context • Motivation • Overview of the Formal Model • Evaluation • Future Improvements • Related Work • Contributions
Context (d) Businesses are looking toward the cloud! Why? Tempting cost reduction possibilities: • Lower capital and operational expense (someone else owns and maintains the cloud servers) • Higher efficiency (pay only for the resources you need, no wasted capacity) Good in theory, but do the savings hold in practice? • Symantec survey of 1,780 datacenters • 82% list cost reduction as one of the top priorities • 72% are considering moving to the cloud, of which 94% were in the discussion, trial, or implementation phase
Context (d) Migration to the cloud has a lot of considerations: • Businesses have strict requirements on service parameters such as latency, uptime/availability, etc (remember Monday?) • Security is an obvious concern (83% list security as the top concern) • Legal issues can also arise, as there are tight requirements on the handling of medical and credit information
Motivation (or, Why don't we let datacenters figure it out?) (b) Enterprise services are complicated! • Multiple applications, each can be broken down in the three-tier model: • front-end (web facing) • business logic (application guts) • back-end (data stores) • However, reality is much more complex: • multiple functional components at each tier + replication and load balancing on each FC • Potentially hundreds of distinct FCs per application:
Motivation (b) (Sample of 5 applications used by Fortune 500 companies)
Motivation (b) Security can be a nightmare: Servers partitioned into logical VLANs, each firewalled to allow specific (required) interactions between dependent programs. This is disrupted by moving some parts to the cloud.
Motivation (b) Security - Firewall ACLs need to be reconfigured, which is non-trivial. Here's the syntax of an extended Cisco ACL: access-list access-list-number [dynamic dynamic-name [timeout minutes]] {deny | permit} protocol source source-wildcard destination destination-wildcard [precedence precedence] [tos tos] [log | log-input] [time-range time-range-name] (source: http://www.cisco.com/en/US/tech/tk722/tk809/technologies_configuration_example09186a008058ed26.shtml)
Motivation (b) Solution: Use a hybrid implementation that hosts some components locally and others on the cloud. This allows policy constraints (mentioned before) to be satisfied. Two problems emerge: 1) Which servers to migrate 2) Ensuring correctness of security (reconfiguring those ACLs to reflect the new solution)
Motivation (b) So how do we get the most out of the benefits of cloud migration in the face of these limitations? This is a CONSTRAINED OPTIMIZATION problem. The focus of this paper is to formalize the problem and then to provide an optimal solution.
Overview of Formal Model (d) Enables application architects to systematically plan which components to migrate to the cloud • Definition of the Problem • Flow Balance Equations • Internet Communication Cost • Transaction Delays • Benefits of Migration
Defining the Problem (d) k applications, Ai, where i ranges from 1 to k. m components, Ci, where i ranges from 1 to m. An = {Ci, Cj, Ck} means application An uses components Ci, Cj, Ck. Construct a graph G = (V, E) such that V = {Ci} from i = 1 to m UNION {I, O}, where I and O represent internal and external users.
Defining the Problem (d) Nodes i,j are connected if i and j communicate in the network. Ti,j and Si,j denote the transactions per second and average size of a transaction between i and j, and they are the ijth entry in the transaction and size matrices, respectively. Each component Ci has Ni servers. Atomicity: databases are modeled as Cd such that Nd = 1.
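As a rough illustration of the transaction and size matrices, the sketch below stores Ti,j and Si,j as dictionaries keyed by edge and multiplies them to get the traffic rate on each edge (transactions per second times average bytes per transaction gives bytes per second). The component names FE, BL, BE and every number are hypothetical, chosen only for the example.

```python
# Hypothetical graph: I = internal users, O = external users,
# FE/BL/BE = front-end, business-logic, back-end components.
T = {  # transactions per second on each edge (i, j); made-up values
    ("I", "FE"): 8,
    ("O", "FE"): 2,
    ("FE", "BL"): 10,
    ("BL", "BE"): 20,
}
S = {  # average transaction size in bytes on each edge; made-up values
    ("I", "FE"): 500,
    ("O", "FE"): 500,
    ("FE", "BL"): 2000,
    ("BL", "BE"): 4000,
}


def traffic_rate(T, S):
    """Bytes per second on each edge: T[i, j] * S[i, j]."""
    return {edge: T[edge] * S[edge] for edge in T}


traffic = traffic_rate(T, S)
```

Sparse dictionaries are used here instead of full matrices because most component pairs never communicate.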
Defining the Problem (d) The heart of the problem: For each component Ci, find the number ni ≤ Ni of its Ni servers to migrate to the cloud. Let P be the set of policy constraints (for example, certain components must stay local). Finally, the full problem:
Defining the Problem (d) Determine a migration strategy M = {ni} for all i that solves (MP): max Benefits(M) - Costs(M) subject to: Policy Constraints (P), Delay Increase Constraints on M, and the Flow Balance Equations
Flow Balance Equations (d) Including flow balance equations ensures all requests are handled and not lost. The new graph after migration is constructed by duplicating the component nodes (one for local, one for cloud) and connecting split nodes (as well as reproducing connections between replicated nodes).
Flow Balance Equations (d) There are two basic approaches one can take: 1. Flexible routing: Component servers CiL and CiR are allowed to direct different amounts of traffic. Permits location-based routing. 2. Independent routing: CiL and CiR distribute traffic in the same proportion to their successor nodes. Modeled only because legacy applications may require it -- this is a more restrictive constraint than above and leads to a potentially less optimal migration solution.
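The difference between the two routing approaches can be checked mechanically. The sketch below is only an illustration: the function names and the traffic numbers are hypothetical, and rates are assumed to be positive integers (transactions per second). It tests whether the local and cloud copies of a component split their outgoing traffic in the same proportions, which is what independent routing requires.

```python
from fractions import Fraction


def split_proportions(outgoing):
    """Fraction of a node's outgoing traffic sent to each successor."""
    total = sum(outgoing.values())
    if total <= 0:
        raise ValueError("node must send some traffic")
    return {dst: Fraction(rate, total) for dst, rate in outgoing.items()}


def satisfies_independent_routing(local_out, remote_out):
    """True if CiL and CiR use identical split proportions."""
    return split_proportions(local_out) == split_proportions(remote_out)


# Hypothetical successors: BL_L (local copy) and BL_R (cloud copy).
same_split = satisfies_independent_routing(
    {"BL_L": 30, "BL_R": 10}, {"BL_L": 15, "BL_R": 5}
)
location_based = satisfies_independent_routing(
    {"BL_L": 40, "BL_R": 0}, {"BL_L": 0, "BL_R": 20}
)
```

The second call models location-based routing (each copy talks only to the successor in its own location), which flexible routing permits and independent routing forbids.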
Internet Cost (d) Modeled linearly as per unit cost of traffic at local data center * (new local traffic - old local traffic) + per unit cost of traffic at cloud * (traffic at cloud) Linear assumption fits with Amazon, Azure services
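The linear cost above translates directly into a one-line function. This is a minimal sketch: the parameter names and the prices and volumes in the example are hypothetical, not taken from the paper.

```python
def internet_cost(cost_local, cost_cloud,
                  old_local_traffic, new_local_traffic, cloud_traffic):
    """Change in Internet cost caused by a migration (linear model).

    cost_local / cost_cloud: per-unit traffic price at each site.
    Traffic arguments are volumes in the same unit the prices use.
    """
    extra_local = new_local_traffic - old_local_traffic
    return cost_local * extra_local + cost_cloud * cloud_traffic


# Hypothetical numbers: prices in cents per GB, traffic in GB.
example_cost = internet_cost(
    cost_local=2, cost_cloud=3,
    old_local_traffic=100, new_local_traffic=160, cloud_traffic=80,
)
```

With these made-up inputs the local site carries 60 GB more than before, so the result is 2 * 60 + 3 * 80 = 360 cents.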
Delay (d) I.e., expected delay is the sum of the expected delays of all nodes i involved times the expected number of encounters plus the sum of the expected delay of each i,j edge traversed times the expected number of traversals. Same for the new migrated network, denoted by E[D'], etc.
Delay (d) I.e., the change in mean delay is the difference between the new and old expected delays: E[D'] - E[D]
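The verbal definition of expected delay can be sketched as two weighted sums. The code below is an illustration under stated assumptions: the node names, per-node and per-edge delays (in ms), and visit counts are all hypothetical, and only the back-end link is assumed to get slower after migration.

```python
def expected_delay(node_delay, node_visits, edge_delay, edge_traversals):
    """E[D]: per-node delay times expected visits, plus per-edge delay
    times expected traversals, summed over all nodes and edges."""
    nodes = sum(node_delay[n] * node_visits[n] for n in node_delay)
    edges = sum(edge_delay[e] * edge_traversals[e] for e in edge_delay)
    return nodes + edges


# Hypothetical three-tier application, delays in milliseconds.
node_delay = {"FE": 5, "BL": 20, "BE": 10}
node_visits = {"FE": 1, "BL": 1, "BE": 2}
edge_traversals = {("FE", "BL"): 2, ("BL", "BE"): 4}

edge_delay_before = {("FE", "BL"): 1, ("BL", "BE"): 1}
# Assume BE moves to the cloud, so BL <-> BE now crosses the Internet.
edge_delay_after = {("FE", "BL"): 1, ("BL", "BE"): 15}

d_old = expected_delay(node_delay, node_visits, edge_delay_before, edge_traversals)
d_new = expected_delay(node_delay, node_visits, edge_delay_after, edge_traversals)
delay_increase = d_new - d_old  # E[D'] - E[D]
```

In this toy case E[D] is 51 ms and E[D'] is 107 ms, so the increase is 56 ms; a delay bound would be checked against that difference.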
Modeling Benefits (d) BcMc + BsMs Where: Bc = benefit of migrating a compute-intensive server Bs = benefit of migrating a storage-intensive server Mc = compute-intensive servers migrated in M Ms = storage-intensive servers migrated in M Much room for expansion into heterogeneous and non-linear models
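A minimal sketch of the linear benefit model follows. The default values for Bc and Bs are the per-year figures quoted later in the Thumbnail evaluation ($1577 and $17280); the function name and the example server counts are hypothetical.

```python
def migration_benefit(compute_migrated, storage_migrated,
                      benefit_compute=1577, benefit_storage=17280):
    """Bc * Mc + Bs * Ms, in dollars per year.

    Defaults are the per-server figures from the Thumbnail evaluation;
    pass other values to model a different cost study.
    """
    return (benefit_compute * compute_migrated
            + benefit_storage * storage_migrated)


# Hypothetical strategy: 3 compute servers and 1 storage server migrated.
example_benefit = migration_benefit(compute_migrated=3, storage_migrated=1)
```

For that strategy the benefit is 3 * 1577 + 1 * 17280 = 22011 dollars per year.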
Putting it all together: solving the problem (d) CPLEX for linear optimization problems (integer programming) BARON for non-linear optimization problems (e.g. for considering variance and percentile delay, or independent routing) As better tools for solving these types of problems are developed, this model will achieve even better optimums.
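To make the shape of the optimization concrete, the sketch below solves a toy instance by exhaustive enumeration. This is not CPLEX or BARON and not the paper's formulation: flow balance is omitted, delay is assumed to grow linearly with the number of migrated servers, and every name and number is hypothetical. It only shows how the objective, the policy constraints, and the delay bound interact.

```python
from itertools import product


def best_migration(servers, net_benefit, delay_per_server,
                   delay_bound, pinned_local=()):
    """Try every strategy M = {n_i}, 0 <= n_i <= N_i, and keep the best.

    net_benefit[c]: benefit minus cost per migrated server of c (toy).
    delay_per_server[c]: delay added per migrated server of c (toy).
    pinned_local: components that policy requires to stay local.
    """
    names = list(servers)
    best_plan, best_value = None, float("-inf")
    for counts in product(*(range(servers[c] + 1) for c in names)):
        plan = dict(zip(names, counts))
        if any(plan[c] > 0 for c in pinned_local):
            continue  # violates a policy constraint
        added_delay = sum(delay_per_server[c] * plan[c] for c in names)
        if added_delay > delay_bound:
            continue  # violates the delay-increase constraint
        value = sum(net_benefit[c] * plan[c] for c in names)
        if value > best_value:
            best_plan, best_value = plan, value
    return best_plan, best_value


servers = {"FE": 2, "BL": 3, "BE": 1}            # N_i per component
net_benefit = {"FE": 1077, "BL": 1377, "BE": 16280}
delay_per_server = {"FE": 15, "BL": 4, "BE": 40}

tight = best_migration(servers, net_benefit, delay_per_server, delay_bound=20)
loose = best_migration(servers, net_benefit, delay_per_server, delay_bound=60)
pinned = best_migration(servers, net_benefit, delay_per_server,
                        delay_bound=60, pinned_local=("BE",))
```

Enumeration is exponential in the number of components, which is why a real instance would be handed to an integer programming solver instead.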
Maintaining reachability: modifying ACLs (d) Paper presents an algorithm for migrating ACLs that has two important properties: 1. Correctness is maintained 2. Unwanted traffic is filtered before traversing the Internet
Evaluation (b) • Windows Azure SDK & Enterprise Resource Planning Application deployed in a large network • Validated the model's effectiveness in meeting constraints on changes in application response time
Evaluation - Planned Migration of Thumbnail Application Cloud Setup: Deriving Model Parameters: Measured transaction sizes, component service times, and various communication delays
Evaluation - Planned Migration of Thumbnail Application Modeling Migration benefits and Communication costs: (Leveraged Amazon's EC2 cloud pricing, and Berkeley's analysis that migrating servers to the cloud can reduce costs by a factor of 7 for compute-class servers, and 5 for storage-class servers) • $1577 per year for migrating a compute-class server • $17280 per year for migrating a storage-class server
Evaluation - Planned Migration of Thumbnail Application Migration strategies recommended by the model: • More BL and BE servers are migrated than FE servers • The number of BL and BE servers migrated is often the same • Variance plays an important role in recommendations (e.g. D = 110%)
Evaluation- Planned Migration of Thumbnail Application Validating recommendations through cloud deployment: Deployed the recommended strategy for 80% internal users, with constraints of up to 10% increase in mean delay and 50% in variance. As expected, the response time after migration increased but the increase is still within acceptable limits.
Evaluation-Planned Migration of campus ERP application Model of Enterprise Resource Planning (ERP) application used in a large university Modeling a deployed ERP app:
Evaluation-Planned Migration of campus ERP application Inferring model parameters: • Conducted end-to-end measurements of typical user requests • Inferred communication delays on other links • Node service times = end-to-end response times - link communication delays • Estimated communication delays with the cloud • Measured upload and download times of files of similar size to the Azure cloud • used in estimating communication delays between local and migrated components, and delays related to external users • 10 transactions per sec • Values for migration benefits and communication costs taken from the prior case study
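The subtraction used to infer node service times can be written out as a short helper. This is a sketch with hypothetical measurements in milliseconds; the function name and the check for inconsistent inputs are assumptions added for the example.

```python
def inferred_service_time(end_to_end_ms, link_delays_ms):
    """Service time left after removing link delays from a measured
    end-to-end response time (all values in milliseconds)."""
    service = end_to_end_ms - sum(link_delays_ms)
    if service < 0:
        raise ValueError("link delays exceed the end-to-end measurement")
    return service


# Hypothetical request: 120 ms end to end across three links.
example_service_ms = inferred_service_time(120, [10, 15, 5])
```

With these made-up measurements, 120 - (10 + 15 + 5) leaves 90 ms of service time attributed to the nodes on the request path.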
Evaluation-Planned Migration of campus ERP application Recommendation From Model: Results produced by the model with the flexible routing approach. Illustrates at least 3 scenarios where the hybrid approach could be useful: 1. When there are policy restrictions on migration (first row) 2. Migrating the entire app is feasible only when the delay bound is at 130% or more (row 5) 3. Full migration of some components can have a substantial impact on delay, so rows 2-4 show partial migration of these components. Interaction between the components plays a critical role in planning decisions
Evaluation-Planned Migration of campus ERP application Sensitivity to model parameters: Migration strategy is impacted by varying model parameters Key Insights: • For many delay bounds, the optimal migration strategy dominates all other feasible solutions in that it moves more CPU and storage servers than any other approach • In those cases, the optimal migration does not depend on the benefit estimates • Benefit estimates impact the strategy if there are multiple feasible approaches to realize the delay bound • Relative size of transactions between different components may determine the optimal strategy independent of benefit estimates • For most delay bound settings, recommended strategy does not change unless it is higher than a factor of 10
Testing ACL algorithm on campus ERP (d) Correctness: The algorithm maintained reachability and filtered unwanted traffic, the two goals mentioned earlier. Performance and Scalability: Ran the algorithm on a network of 700 VLANs, 212 ACLs, and 7889 total rules: • Took about 4 minutes on a modest (dual core, 8GB RAM) computer. • Generated a new rule set only 63 rules larger (and this was due to an inconsistency in the original network's ACL configurations).
Related Work (b) • Extending enterprises' network into the cloud using VPN • security framework can be leveraged to ensure security policies are extended to services on the VPN • Developing queuing models of applications to estimate mean response time • optimization framework that identifies app components to migrate- maximizes benefit, considers variance, mean response time • algorithms for placing security policies for deployment of new networks • unique issues in migrating existing applications to the cloud- model ensures better scaling with large networks
Future Work (d) Model Enhancements • Incorporating queuing models to account for changes in queuing structures after migration • Understanding impact of migration on app reliability, given high costs of downtime • Allow any number of servers to be installed in the local and cloud data centers • Allow for multiple cloud locations • Extend cost and latency models to consider middle-boxes deployed in enterprises
Future Work (d) Handling dynamic variations in workload • Hybrid architecture's potential benefit to help handle peaks in workload (invoke cloud as needed) Executing migrations • Extending technologies such as live migration to minimize service disruption Obtaining model parameters • application discovery • application dependencies • component response times • traffic exchanged between components • model parameter inaccuracies dealt with by running the model with multiple sets of inputs
What were the contributions? (d) • SOLID and EXTENSIBLE formalization of the migration problem that is usable in real-world scenarios • Demonstration of the potential benefit of hybrid migration • Demonstration of the feasibility of planned migration • Tool for data center admins to explore migration to the cloud and justify it in a business framework (cost reduction) • Starting point and groundwork for MANY future research opportunities (listed previously) Any ONE of these alone is a significant contribution that would make this paper worthwhile. Having all of them is icing.