Performance vs Cost for Windows and Linux Platforms in Windows Azure Cloud Sasko Ristov and Marjan Gusev “Ss. Cyril and Methodius” University, Skopje, Macedonia November 2013, CloudNet 2013, San Francisco, USA
Introduction • Cloud data centers • Cost effectiveness • Comparable performance with other computing • Loosely coupled science applications will be increasingly implemented on clouds [Gunarathne et al. 2010]. • Several HPC applications in the cloud have even higher performance and cost efficiency than on a traditional cluster [Roloff et al. 2012]. • We defined regions where the cloud has better performance for cache-intensive algorithms [Gusev and Ristov 2012].
Introduction • Current cloud pricing models: • Double price for double resources • The same price for the same amount of resources • This has changed! • However, performance depends on: • Runtime Environment (C#, C++) • Operating System (Windows, Linux, …) • Virtualization Technique (KVM, XEN, …)
Introduction • Open issues: • Performance, Cost, Performance-cost tradeoff • How to use the cloud for HPC applications • Paper goal: • The performance of parallelization in Windows Azure • Analyzed parameters: • Windows Azure, as one of the most common clouds • Scaling the resources • Windows / Linux VMs • Cost, Performance
Outline • Testing Methodology • Performance Comparison • Discussion • Conclusion and Future Work
Testing Algorithm • Dense Matrix Multiplication Algorithm • $C_{N \times N} = A_{N \times N} \cdot B_{N \times N}$ • Double precision elements, 8 bytes each • For parallel experiments • Each thread multiplies a block $A_{N/c \times N}$ by the whole $B_{N \times N}$ • c = 2, 4, 8 denotes the total number of parallel threads
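The row-block partitioning described on this slide can be sketched as follows. This is a minimal illustration in Python, not the authors' C# or C++/OpenMP code: the function names `multiply_block` and `parallel_matmul` are hypothetical, and the sketch assumes that c divides N. Python threads do not run CPU-bound code in parallel, so the sketch shows only how the work is split, not the measured performance.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_block(a_block, b):
    """Multiply a row block of A (N/c x N) by the whole of B (N x N)."""
    n = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(n)) for j in range(cols)]
            for row in a_block]


def parallel_matmul(a, b, c):
    """Split A into c row blocks and multiply each block by B in its own thread."""
    n = len(a)
    if n % c != 0:
        # Assumption of this sketch only; the slides do not say how remainders are handled.
        raise ValueError("this sketch assumes that c divides N")
    rows = n // c
    blocks = [a[i * rows:(i + 1) * rows] for i in range(c)]
    with ThreadPoolExecutor(max_workers=c) as pool:
        parts = list(pool.map(lambda blk: multiply_block(blk, b), blocks))
    # Concatenate the row blocks of C in their original order.
    return [row for part in parts for row in part]
```

Each thread reads its own block of A but all of B, which is why the size of B dominates the per-thread cache requirements.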
Infrastructure and Platforms • The same physical hardware infrastructure • Windows Azure Extra Large VM with two AMD Opteron 4171 HE processors (chips) and 8 CPU cores in total • 64 KB L1 data cache, private per core • 512 KB L2 cache, private per core • All cores share a 5 MB L3 cache • Two different platforms • VM with Windows Server 2008 • C# with threads for the parallel implementation • VM with Ubuntu 12.04 • C++ with OpenMP as the API for the parallel implementation
The Experiments • Four experiments are performed on each platform • sequential execution using only one core, • parallel execution with • two, • four and • eight threads • Series of test cases in each experiment, varying the matrix size N ∈ [128, 1000] • Smaller matrices are omitted since they have negligible effects on the performance [Krpic 2012] • Larger matrices are omitted since other algorithms that reduce cache misses should be used for them • matrix blocking [Hennessy and Patterson 2012] or • improved hybrid 2D/1D blocking for AMD CPUs [Gusev et al. 2012]
Measured Parameters • Speed • Speedup • Relative Speedup
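The formulas for these three parameters did not survive in this transcript. The sketch below uses conventional definitions as an assumption, not the authors' exact ones: speed as floating-point operations per second (approximating a dense N x N multiplication as 2N^3 operations), speedup as sequential time over parallel time, and relative speedup as Windows speed over Linux speed. The direction of the last ratio is inferred from later slides, which report Windows as faster and minimums that are almost always greater than 1. All function names are hypothetical.

```python
def speed_gflops(n, seconds):
    """Speed in gigaflops, approximating an N x N multiplication as 2*N^3 operations."""
    return 2 * n ** 3 / seconds / 1e9


def speedup(t_sequential, t_parallel):
    """Speedup of a parallel run over the sequential run on the same platform."""
    return t_sequential / t_parallel


def relative_speedup(speed_windows, speed_linux):
    """Assumed definition: Windows speed over Linux speed for the same c and N."""
    return speed_windows / speed_linux
```

Under these assumed definitions, a speedup greater than c is superlinear, and a relative speedup greater than 1 means Windows is faster for that test case.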
Cache Regions • Four cache regions are considered • L1 • the cache requirements fit in the L1 cache, so the highest performance is expected • L2 and L3 • the cache requirements fit in the L2 and L3 caches, respectively • L4 • main memory region, where the cache requirements exceed the L3 cache size and L3 cache misses are generated
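As a rough illustration of how a matrix size maps to one of these regions, the sketch below classifies N using the cache sizes listed on the infrastructure slide (64 KB L1, 512 KB L2, 5 MB L3). The working-set model is an assumption made here for illustration: it counts all three N x N matrices of 8-byte elements for a sequential run, which may differ from the cache requirements the authors actually used. The names `working_set_bytes` and `cache_region` are hypothetical.

```python
KB = 1024
MB = 1024 * KB

# Cache sizes as stated on the infrastructure slide.
L1_BYTES = 64 * KB
L2_BYTES = 512 * KB
L3_BYTES = 5 * MB


def working_set_bytes(n, element_bytes=8):
    """Assumed model: matrices A, B and C, each N x N with 8-byte elements."""
    return 3 * n * n * element_bytes


def cache_region(n):
    """Return the region name (L1, L2, L3 or L4) for a sequential run of size N."""
    size = working_set_bytes(n)
    if size <= L1_BYTES:
        return "L1"
    if size <= L2_BYTES:
        return "L2"
    if size <= L3_BYTES:
        return "L3"
    return "L4"  # main memory region: requirements exceed the L3 cache
```

Under this assumed model, the smallest tested size N = 128 already needs 384 KB and falls in the L2 region, while N = 1000 needs about 24 MB and falls in the L4 region.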
Outline • Testing Methodology • Performance Comparison • Discussion • Conclusion and Future Work
The Speed • Windows is faster than Linux for each c and N • Much faster in the L2 and L3 regions, where only a small number of L3 cache misses are generated • Less so in the L4 region, where many cache misses are generated
The Speedup • Linux provides better speedup in the L2 and L3 regions • Windows provides better (superlinear) speedup in the L4 region for each c
The Relative Speedup
Outline • Testing Methodology • Performance Comparison • Discussion • Conclusion and Future Work
Performance Behavior - Maximums • Maximums of relative speedups in cache regions • Windows produces the best maximum performance in the L3 cache region regardless of c. • The smallest maximum performance is observed in L4 • The lowest speeds
Performance Behavior - Minimums • Minimums of relative speedups in cache regions • Windows produces smaller relative performance in L4 for each c. • The minimum is almost always greater than 1.
Performance Behavior - Speedup • Superlinear speedup • The matrices cannot be stored in one cache (sequential), but can be stored in 2 / 4 / 8 caches (parallel) • Observed only for Windows; possible reasons for its absence in Linux: • software emulation that maps the Linux threads to cores in different chips instead of using a single chip, or • software emulation that does not translate the address space and access patterns in Linux in a way that enables the superlinear effect seen in Windows.
Cost Performance Tradeoff • Windows Azure changed its pricing model starting from April 16th, 2013 • The linear scaling of price was kept, • but the price ratio $PR(c) = \mathrm{Price}_W / \mathrm{Price}_L$ changed from 1 to 1.5 for each VM (c ∈ {1, 2, 4, 8}) • This motivated us to introduce the parameter $CPT(c) = R(c) / PR(c)$ • It shows which platform has the better Cost Performance Tradeoff
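A small worked sketch of the CPT parameter follows. It assumes that R(c) is the relative speedup of Windows over Linux, so that CPT(c) > 1 favors Windows and CPT(c) < 1 favors Linux; that reading and the function names are assumptions. The relative speedups in the usage note are hypothetical values, not measurements from the paper.

```python
def price_ratio(price_windows, price_linux):
    """PR(c): price of the Windows VM over the price of the Linux VM."""
    return price_windows / price_linux


def cpt(relative_speedup, pr):
    """CPT(c) = R(c) / PR(c), with R(c) assumed to be Windows speed over Linux speed."""
    return relative_speedup / pr


def better_platform(cpt_value):
    """Assumed reading: above 1 favors Windows, below 1 favors Linux."""
    if cpt_value > 1:
        return "Windows"
    if cpt_value < 1:
        return "Linux"
    return "tie"
```

With PR(c) = 1.5, a hypothetical relative speedup of 3.0 gives CPT = 2.0, so Windows would still win once price is included, whereas a hypothetical relative speedup of 1.2 gives a CPT below 1, so Linux would win even though Windows is faster.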
Cost Performance Tradeoff • If both the price and performance are considered • Windows is better for N < 512 • Linux is better for N > 512 • If only performance is considered • Windows is better • (Figure panels: c = 4 and c = 8)
Outline • Testing Methodology • Performance Comparison • Discussion • Conclusion and Future Work
Conclusion • We identified the regions where a particular platform performs better and/or costs less. • Cost • Linux is better (33% lower price) • Performance • Windows is better (up to 3 times) • Performance + Cost • Windows is better for smaller matrices (regions L2 and L3), • Linux is better for larger matrices (L4 region) • Use other algorithms • FINAL CONCLUSION • Use Windows and C# in Windows Azure for cache-intensive problems.
Future Work • Cost and performance on other • Commercial clouds (Amazon, Google) • Open Source clouds (OpenStack, Eucalyptus, …) • Hypervisors (XEN, KVM) • Hardware (Intel Xeon) with different • cache size, • cache line, • cache associativity, • cache replacement policy, • cache levels, • cache inclusivity etc.
QUESTIONS?