
A Cloud Data Center Optimization Approach using Dynamic Data Interchanges

Prof. Stephan Robert (http://www.stephan-robert.ch), University of Applied Sciences of Western Switzerland. IEEE CloudNet, San Francisco, November 2013.


Presentation Transcript


  1. A Cloud Data Center Optimization Approach using Dynamic Data Interchanges Prof. Stephan Robert http://www.stephan-robert.ch University of Applied Sciences of Western Switzerland IEEE CloudNet San Francisco November 2013

  2. Motivation and background
     • Distributed datacenters in the Cloud have become a popular way to increase data availability and reduce costs
     • Cloud storage has received a lot of attention with a view to reducing costs:
       • Minimizing infrastructure and running costs
       • Allocation of data servers to customers
       • Geo-optimization (looking at where customers are located to decide where to place datacenters)

  3. Datacenter optimization
     • Research areas on optimizing datacenter operations:
       • Energy and power management
       • Cost benefit analysis
       • Cloud networks versus Grids
       • Geo-distribution of cloud centers
       • Multi-level caching

  4. Motivation and background (cont.)
     • We consider the operational situation after the datacenter locations have been decided.
     • Is there any other optimization we can perform?
     • Problem we examine:
       • Data locality: users are not always near the data -> higher costs
       • The situation can change over time: we can place our data near the users now, but there is no guarantee that the users' locations will not change in the future

  5. Principal idea
     • We consider a model for actively moving data closer to the current users.
     • When needed, we move data from one server to a temporary (cache) area in a different server.
     • In the near future, when users request this particular data, we can serve them from the local cache.

  6. Benefits
     • Benefit of copying (caching) data to a local server:
       • We correct the mismatch between where the data is and where the users are.
       • We only copy once (cost), read many (benefit).
     • We 'train' the algorithm by using a history of requests to determine the relative frequency of items being requested (in an efficient way, as the number of items can be very large).

  7. Model
     • We consider a combinatorial optimization model to determine the best placement of the data.
     • This model tells us whether we need to copy data from one datacenter to another, in anticipation of user requests.
     • The optimization aim is to minimize the total expected cost of serving future user data requests.
     • The optimization constraints are the cache size capacities.
     • The model accounts for:
       • The cost of copying data between datacenters
       • The relative cost/benefit of delivering the data from a remote vs. a local server
       • The likelihood that particular data will be requested in particular locations in the near future

  8. Model (annotations only; the slide's formulas are not reproduced in this transcript)
     • Probability that object i will be requested by user u
     • Cost of copying object i from the default datacenter to another datacenter d
     • Expected cost of retrieving object i from datacenter d
     • The cache size Z of each datacenter must not be exceeded
     • Indicator of whether object i is obtained from datacenter d
     • Each object must be available in at least one datacenter
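One way to write down a formulation consistent with these annotations is sketched below. The slide's own formulas did not survive in the transcript, so apart from i, u, d and Z every symbol here is an assumption: p_{iu} for the request probability, c_{id} for the copy cost (taken as zero when d is the default datacenter of i), t_{ud} for the cost of serving user u from datacenter d, e_{id} for the expected retrieval cost, s_i for the object size, and x_{id} for the binary placement decision. The published paper may define the model differently.

```latex
\begin{align}
  e_{id} &= \sum_{u} p_{iu}\, t_{ud}
    && \text{expected cost of retrieving object $i$ from datacenter $d$} \\
  \min_{x} \quad & \sum_{i} \sum_{d} \left( c_{id} + e_{id} \right) x_{id}
    && \text{total copy cost plus expected retrieval cost} \\
  \text{s.t.} \quad & \sum_{i} s_i\, x_{id} \le Z \quad \forall d
    && \text{cache size of each datacenter} \\
  & \sum_{d} x_{id} \ge 1 \quad \forall i
    && \text{each object available in at least one datacenter} \\
  & x_{id} \in \{0, 1\}
    && \text{$x_{id} = 1$ if object $i$ is obtained from datacenter $d$}
\end{align}
```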

  9. Operational aspects
     • First, we must obtain a historical log of requests, including who requested what, where the file was located, and the file size.
     • We use this information to calculate the access probabilities in the model (in practice, using HBase/Hadoop in a distributed manner).
     • The costs in the model have to be decided based on the architecture etc. (e.g. the relative benefit of using a local server versus a remote one for a particular user).
     • Periodically (e.g. daily) we run the algorithm to determine any data duplication that is beneficial.
     • (Of course, the network must be aware of the local copies and know to use them.)
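The step from a request log to access probabilities can be illustrated with a small in-memory sketch. The record layout `(user, object, location, size)` and the function name `access_probabilities` are assumptions made for this example; the slides describe a distributed HBase/Hadoop implementation, which this does not reproduce. The estimate used here is simply the relative frequency of each object among a user's requests.

```python
from collections import Counter


def access_probabilities(log):
    """Estimate p[(user, obj)] as the share of that user's requests that were for obj.

    `log` is an iterable of (user, obj, location, size) tuples (assumed layout).
    """
    requests_per_user = Counter()
    requests_per_pair = Counter()
    for user, obj, _location, _size in log:
        requests_per_user[user] += 1
        requests_per_pair[(user, obj)] += 1
    return {
        (user, obj): count / requests_per_user[user]
        for (user, obj), count in requests_per_pair.items()
    }


# Hypothetical log entries: who requested what, where it was stored, and its size.
log = [
    ("u1", "a", "dc0", 10),
    ("u1", "a", "dc0", 10),
    ("u1", "b", "dc1", 25),
    ("u2", "b", "dc1", 25),
]
probs = access_probabilities(log)
```

Location and size are ignored when counting; they would feed the cost terms and the cache capacity constraint instead.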

  10. Computational experimentation
     • Computational experimentation was carried out in a simulation environment (no real-life implementation at this stage).
     • We measured the costs/benefits of obtaining the data directly against using our optimization model to 'rearrange' the data periodically.
     • Performance was consistent for 3, 5, and 10 datacenters.

  11. Computational experimentation
     • Setup of N datacenters located on a circle
     • Users placed at random inside the circle
     • Costs linked to the distance
     • Data object requests were generated from a Zipf distribution (independently for each user)
     • The first half of the data was used to train the algorithm (historic access log); the second half was used for the simulation.
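The setup described on this slide can be sketched with the Python standard library. All concrete values (5 datacenters, 20 users, 100 objects, 50 requests per user, Zipf exponent 1.0, unit radius, Euclidean distance as the cost) and all function names are assumptions for illustration; the slides do not state the parameters used in the actual experiments.

```python
import math
import random


def place_datacenters(n, radius=1.0):
    """Place n datacenters evenly spaced on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def place_users(m, rng, radius=1.0):
    """Place m users uniformly at random inside the circle."""
    users = []
    for _ in range(m):
        r = radius * math.sqrt(rng.random())  # sqrt gives uniform density over the disc
        theta = 2 * math.pi * rng.random()
        users.append((r * math.cos(theta), r * math.sin(theta)))
    return users


def zipf_requests(num_objects, count, rng, exponent=1.0):
    """Draw `count` object ids with probability proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_objects + 1)]
    return rng.choices(range(num_objects), weights=weights, k=count)


def distance_cost(user, datacenter):
    """Cost linked to distance: here simply the Euclidean distance (an assumption)."""
    return math.dist(user, datacenter)


rng = random.Random(0)
datacenters = place_datacenters(5)
users = place_users(20, rng)
# Requests are generated independently for each user.
requests = {u: zipf_requests(100, 50, rng) for u in range(len(users))}
# First half trains the model (historic access log), second half drives the simulation.
history = {u: reqs[: len(reqs) // 2] for u, reqs in requests.items()}
simulated = {u: reqs[len(reqs) // 2 :] for u, reqs in requests.items()}
```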

  12. Simulation results – parameter variation

  13. Simulation results
     • Promising results, with ~20% cost reduction on average
     • Full results appear in the proceedings paper

  14. Practicalities – is the idea feasible in a real system?
     • More complexities, but also easy solutions
     • Time criticality: no need to run on the live system; object locations can be optimized overnight ("periodic dynamic reconfiguration")
     • Metadata storage: object access frequencies must be stored to calculate the probabilities p. We implemented a metadata store in HBase on a Hadoop cluster.
     • Conclusion: "feasible and easy"

  15. Complexity issues
     • The optimization problem is complex (NP-hard) to solve.
     • We can keep the input size small: we only need to consider the most popular objects.
     • We are currently developing a fast heuristic algorithm based on knapsack methods.
     • Standard problems of data
     • Other complexities: legal issues of moving data across countries (if personal data are involved)
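The slides do not describe the heuristic under development, so the following is not the authors' algorithm. It is a standard greedy value-density heuristic for a single knapsack, shown only to illustrate how filling one datacenter's cache can be treated as a knapsack problem. The tuple layout `(object, size, saving)` and the notion of `saving` (expected retrieval benefit minus copy cost) are assumptions.

```python
def greedy_cache_fill(candidates, capacity):
    """Pick objects for one cache by descending saving per unit of size.

    `candidates` is a list of (obj, size, saving) tuples (assumed layout).
    Objects with non-positive saving are never copied. The result is a
    feasible selection, not necessarily the optimal one.
    """
    worthwhile = [c for c in candidates if c[1] > 0 and c[2] > 0]
    ranked = sorted(worthwhile, key=lambda c: c[2] / c[1], reverse=True)
    chosen = []
    remaining = capacity
    for obj, size, _saving in ranked:
        if size <= remaining:
            chosen.append(obj)
            remaining -= size
    return chosen


# Hypothetical candidates for one datacenter with cache size 8.
candidates = [("a", 4, 8.0), ("b", 3, 9.0), ("c", 5, 5.0), ("d", 2, -1.0)]
selection = greedy_cache_fill(candidates, capacity=8)
```

Restricting `candidates` to the most popular objects, as the slide suggests, keeps the sort small; the greedy pass itself is linear after sorting.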

  16. Thank you Questions?
