Optimized Caching Policies for Storage Systems Amir Rachum Chai Ronen Final presentation Industrial Supervisor: Dr. Roee Engelberg, LSI
Introduction – Storage Tiering
• System data is stored across different types of storage devices.
• Generally speaking, in data storage, for a given price, the higher the speed, the lower the capacity.
• The idea is to enable the use of large, low-cost disk space together with the benefits of high-speed hardware – i.e., to optimize data placement for the fastest overall disk access.
• This requires a dynamic algorithm for managing (migrating) the data across the tiers.
Goals
• Creating a platform that will allow us to test different algorithms in system-specific scenarios.
• Testing several algorithms and finding the optimal algorithm among them for storage tiering in different scenarios.
Methodology
• We coded a simulator that represents the platform running the tiered storage system.
• We created several data structures that represent the data on the system, track its location at all times, record read/write operations, and provide several other unique features.
• We used a recording of real I/O calls from such a system to simulate an actual scenario.
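To make the methodology concrete, here is a minimal sketch of what the replay loop of such a simulator could look like. The trace format and the names IoRecord, ChunkMap, and onAccess are illustrative assumptions and are not taken from the project code.

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>
#include <unordered_map>

// Illustrative record of a single I/O call from the trace file.
struct IoRecord {
    std::uint64_t offset;   // byte offset accessed
    std::uint64_t length;   // number of bytes
    bool isWrite;           // read or write
};

// Illustrative chunk-location table: chunk index -> tier index.
using ChunkMap = std::unordered_map<std::uint64_t, int>;

// Replay a recorded I/O trace and let the caching algorithm decide
// on migrations after every operation.
template <typename Algorithm>
void runSimulation(const std::string& tracePath,
                   std::uint64_t chunkSize,
                   Algorithm& algorithm,
                   ChunkMap& chunkMap) {
    std::ifstream trace(tracePath);
    std::string line;
    while (std::getline(trace, line)) {
        std::istringstream in(line);
        IoRecord rec{};
        char op = 'R';
        in >> rec.offset >> rec.length >> op;   // assumed trace format
        rec.isWrite = (op == 'W');
        if (rec.length == 0) continue;

        // Translate the byte range into chunk indices and notify the algorithm.
        std::uint64_t first = rec.offset / chunkSize;
        std::uint64_t last  = (rec.offset + rec.length - 1) / chunkSize;
        for (std::uint64_t chunk = first; chunk <= last; ++chunk) {
            algorithm.onAccess(chunk, rec.isWrite, chunkMap);
        }
    }
}
```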
Accomplishments
• Created an Algorithm interface that supports any algorithm, multiple tiers, and multiple platform data structures (a minimal sketch of such an interface appears below).
• Our design is generic enough to allow very easy addition of usage statistics and platform data.
• A CLI enables quick specification of the input file, chunk size, and tier information.
• Varying the chunk size let us study its effect on run time and algorithm effectiveness.
• We implemented 2 caching algorithms:
  • A “naïve” algorithm that transfers every chunk to the top tier upon I/O.
  • A more efficient algorithm that minimizes migrations.
• Smart implementation resulted in low disk space usage for the various data structures (using a default tier).
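A minimal sketch of what such an Algorithm interface and the naïve policy could look like, using the same illustrative names as the sketch above; eviction from the top tier is deliberately omitted.

```cpp
#include <cstdint>
#include <unordered_map>

// Illustrative chunk-location table: chunk index -> tier index.
using ChunkMap = std::unordered_map<std::uint64_t, int>;

// Illustrative interface: every caching policy reacts to chunk accesses
// and may migrate chunks between tiers (0 = fastest tier, e.g. SSD).
class IAlgorithm {
public:
    virtual ~IAlgorithm() = default;
    virtual void onAccess(std::uint64_t chunk, bool isWrite, ChunkMap& map) = 0;
};

// "Naive" policy: promote every accessed chunk to the top tier immediately.
class NaiveAlgorithm : public IAlgorithm {
public:
    void onAccess(std::uint64_t chunk, bool /*isWrite*/, ChunkMap& map) override {
        map[chunk] = 0;   // unconditional migration to the fastest tier
        // A real implementation would also evict a victim chunk when the
        // top tier is full; the eviction policy is omitted in this sketch.
    }
};
```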
Algorithm conclusions
• We ran 3 different scenarios:
  • Small chunk size (16 B), small SSD size (64 B, 4× chunk size).
  • Large chunk size (2048 B), (relatively) small SSD size (8192 B, 4× chunk size).
  • Small chunk size (16 B), relatively large SSD size (8192 B, 512× chunk size).
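For illustration, the relation between chunk size and SSD size in these scenarios can be captured in a small configuration struct; the struct and its names are hypothetical, while the values are the ones listed above.

```cpp
#include <cstdint>

// Illustrative scenario configuration: SSD capacity expressed both in bytes
// and as a multiple of the chunk size.
struct Scenario {
    std::uint64_t chunkSizeBytes;
    std::uint64_t ssdSizeBytes;
    std::uint64_t ssdSizeInChunks() const { return ssdSizeBytes / chunkSizeBytes; }
};

// The three scenarios from the slides.
const Scenario kScenarios[] = {
    {16,   64},    // small chunks, tiny SSD   (4 chunks)
    {2048, 8192},  // large chunks, small SSD  (4 chunks)
    {16,   8192},  // small chunks, large SSD  (512 chunks)
};
```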
Algorithm conclusions
• When using an extremely small SSD (4× chunk size), both caching algorithms are ineffective:
  • The naïve one showed a high number of reads from the higher tier, yet had twice as many migrations between tiers.
  • The smart algorithm, despite having half the migrations of the naïve algorithm, showed very little reading from the higher tier.
• In this case, the dummy algorithm proved very efficient, as it saved all the time needed for relatively useless migrations.
Algorithm conclusions
• When running with a large chunk size and a 4× SSD size, the caching algorithms achieved much better results than the dummy algorithm. However, the 2 caching algorithms did not differ significantly from each other.
Algorithm conclusions
• Running with a small chunk size and a large SSD size, the 2 caching algorithms also gave similar results. However, they were far inferior to the results of the previous run.
General Conclusions
• Chunk size greatly affects the run time of the platform, but a “standard” size does not take long to run.
• Smart usage of Boost greatly decreases work and is very effective (see the CLI sketch below).
• Good implementation can result in huge disk space savings.
• Despite the data structures available in the platform, most non-naïve algorithms also need their own data structure of some sort.
• Working with Git source control proved to be very helpful for:
  • Retrieving old code that was once thought to be obsolete.
  • Collaboration.
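To illustrate the Boost point, here is a minimal sketch of a simulator CLI built with boost::program_options; the option names are assumptions and may not match the actual tool.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <boost/program_options.hpp>

namespace po = boost::program_options;

int main(int argc, char* argv[]) {
    std::string traceFile;
    std::uint64_t chunkSize = 0;
    std::uint64_t ssdSize = 0;

    // Illustrative option names; the real simulator's flags may differ.
    po::options_description desc("Tiered-storage simulator options");
    desc.add_options()
        ("help,h", "print this help message")
        ("trace", po::value<std::string>(&traceFile)->required(), "I/O trace file")
        ("chunk-size", po::value<std::uint64_t>(&chunkSize)->required(), "chunk size in bytes")
        ("ssd-size", po::value<std::uint64_t>(&ssdSize)->required(), "top-tier (SSD) size in bytes");

    po::variables_map vm;
    po::store(po::parse_command_line(argc, argv, desc), vm);
    if (vm.count("help")) {
        std::cout << desc << "\n";
        return 0;
    }
    po::notify(vm);   // throws if a required option is missing

    std::cout << "trace=" << traceFile << " chunk=" << chunkSize
              << " ssd=" << ssdSize << "\n";
    return 0;
}
```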