ANALYZING STORAGE SYSTEM WORKLOADS. Paul G. Sikalinda, Pieter S. Kritzinger {psikalin, psk}@cs.uct.ac.za, DNA Research Group, Computer Science Department, University of Cape Town, and Lourens O. Walters, Lourens.Walters@s1.com, Mosaic Software, Rondebosch, Cape Town, Republic of South Africa.
2 Presentation Outline: Introduction; Motivation and Objectives; Storage Systems; Storage System Workloads; The Storage System Workload Analyzed; Statistical Methodology; Workload Analysis Results; Conclusions; Future Work
3 Introduction The DNA Group specializes, among other things, in using theory, formal methods and software tools in the: – specification of … – design of … – modelling of … – building of … – security of … – *workload analysis of … – correctness analysis of … – performance analysis of … concurrent computing systems (CCS).
4 Introduction (cont’d) [Figure: ANALYZING STORAGE SYSTEM WORKLOADS]
5 Introduction (cont’d) [Figure: PROCESSOR exchanging RQ and RP with the storage system; fields recorded per request: start address, operation type, request size, timestamps, etc.]
6 Motivation and Objectives -Much effort is being spent on improving the I/O subsystem because it is a bottleneck in current computer systems. -Workload modelling is an important component of the design, performance evaluation and correctness evaluation of storage systems. -Two common modelling assumptions are not correct: uniformly distributed start addresses and exponentially distributed inter-arrival times. -Storage system workloads should therefore be analyzed to obtain correct models.
7 Motivation and Objectives (cont’d) Workload analysis also helps in: -Designing storage systems. -Designing I/O optimization techniques (read caching, write caching, pre-fetching, I/O parallelism, I/O rescheduling) to improve performance. -Understanding application behavior and requirements. -Deciding to pool storage system resources (SSPs). -Implementing intelligent storage systems, etc.
8 Motivation and Objectives (cont’d) Our aim was to analyze storage system workloads in terms of the inter-arrival times, sizes and “seek distances” of I/O requests, and to provide statistics for these parameters that can be used to: (a) derive models for storage system evaluation and (b) design optimization techniques (read caching, I/O parallelism, etc.).
9 Storage Systems Enterprise Storage System (ESS) [Figure: data path through an ESS: path to host, host/bus adapter, path to cache, cache, path to controller, array controller, path to disks, disk drives]
10 Storage Systems (cont’d) ESSs are powerful disk storage systems with the following capabilities: -High performance*, -Large capacity and availability, -Protection against physical drive failure, which can be provided using RAID methods. *But they still cannot match processor speeds because of the mechanical processes in the disk drives.
11 Storage System Workloads [Figure: I/O request path: Application Software, I/O request, Operating System / File System, I/O request, Disk System] I/O request servicing and workload classification: -Logical Workloads (File System Workloads) -Storage System Workloads (Physical I/O Traffic)
12 Storage System Workloads (cont’d) Workload Parameters: -Logical Volume Number -Start Address (seek distances)* -Request Size* -Operation Type (i.e., read or write) -Time Stamp (inter-arrival times)* *Parameters analyzed in this work.
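The two derived quantities named above, inter-arrival times and seek distances, can be computed from consecutive trace records. The following is a minimal sketch, not the authors' tooling. The `IORequest` record layout and its field names are hypothetical, and the slides do not define "seek distance"; the sketch assumes it is the signed difference between the start addresses of consecutive requests.

```python
from typing import List, NamedTuple


class IORequest(NamedTuple):
    """One trace record. The field names are hypothetical, chosen to mirror
    the workload parameters listed on the slide."""
    volume: int          # logical volume number
    start_block: int     # start address, in blocks
    size_bytes: int      # request size
    op: str              # "R" for read, "W" for write
    timestamp_us: float  # arrival time, in microseconds


def inter_arrival_times(requests: List[IORequest]) -> List[float]:
    """Time gaps between consecutive requests, in arrival order."""
    ordered = sorted(requests, key=lambda r: r.timestamp_us)
    return [b.timestamp_us - a.timestamp_us
            for a, b in zip(ordered, ordered[1:])]


def seek_distances(requests: List[IORequest]) -> List[int]:
    """Signed start-address differences between consecutive requests.

    Assumption: seek distance = current start address minus the previous
    one, so a negative value means a backward seek.
    """
    ordered = sorted(requests, key=lambda r: r.timestamp_us)
    return [b.start_block - a.start_block
            for a, b in zip(ordered, ordered[1:])]


# Three made-up records, for illustration only.
trace = [
    IORequest(0, 100, 8192, "R", 0.0),
    IORequest(0, 40, 8192, "R", 150.0),
    IORequest(0, 140, 16384, "W", 400.0),
]
print(inter_arrival_times(trace))
print(seek_distances(trace))
```

A signed definition is used here because an unsigned one could not produce the symmetrical distribution reported later in the results.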
13 The Storage System Workload Analyzed We analyzed the inter-arrival times, request sizes and “seek distances” of I/O requests from a system running a web search engine. We obtained the I/O trace files from the Storage Performance Council (SPC) (http://www.storageperformance.org).
14 Statistical Methodology Visual Techniques: -Histograms, -Empirical cumulative distribution function (ECDF) graphs. Key Data Statistics: -Sample mean, -Variance and standard deviation, -Coefficients of skewness, kurtosis and variation, -Five-number data summaries (minimum, lower quartile, median, upper quartile, maximum), -Lower and upper outlier limits.
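The statistics listed above can be computed with the Python standard library alone. This is a sketch under stated assumptions, because the slides do not say which estimators were used: skewness and kurtosis are taken here as moment-based coefficients (kurtosis is 3 for a normal distribution), quartiles use the "inclusive" method, and the outlier limits are the common 1.5 × IQR fences. The function names `ecdf` and `summarize` are hypothetical.

```python
import math
import statistics


def ecdf(data):
    """Return (sorted values, cumulative fractions) for an ECDF graph."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def summarize(data):
    """Key statistics for one workload parameter.

    Needs at least two values, a non-zero mean and non-zero spread.
    """
    xs = sorted(data)
    n = len(xs)
    mean = statistics.fmean(xs)
    variance = statistics.variance(xs)  # sample variance, divides by n - 1
    std_dev = math.sqrt(variance)

    # Central moments (dividing by n) for the shape coefficients.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n

    # Assumption: quartiles by the "inclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1

    return {
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,          # 3 for a normal distribution
        "variation": std_dev / mean,       # coefficient of variation
        "five_number": (xs[0], q1, median, q3, xs[-1]),
        # Assumption: 1.5 * IQR fences as the outlier limits.
        "outlier_limits": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
    }


stats = summarize([1, 2, 3, 4, 5])  # toy data, not trace values
print(stats["five_number"], stats["outlier_limits"])
```

A kurtosis well above 3 under this convention is the usual signal of a heavy-tailed distribution.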
15 Results 1: Inter-arrival times (µs)
16 Results 1: Inter-arrival times -Highly variable data: the range is 126 to 100100 microseconds. -The coefficient of kurtosis shows that the distribution is heavy-tailed.
17 Results 2: Request sizes (bytes)
18 Results 2: Request sizes Distribution peaks – 8192 (60%), 16384 (10%), 24576 (9%) and 32768 (20%) bytes. Reason: the OS filesystem block size is 8192 bytes, and each peak is a whole multiple of it.
19 Results 3: Seek distances (blocks)
20 Results 3: Seek distances -The distribution of seek distances is symmetrical.
21 Conclusions (1) Analyzing storage system workloads is necessary to model the workloads properly: -To model Web inter-arrival times, the Weibull, lognormal, beta, gamma and exponential probability density functions should be considered. -To model Web request sizes and seek distances, a probability mass function is more appropriate. *We intend to use the models in simulations of ESS.
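For discrete parameters such as request size, a probability mass function can be estimated directly from observed frequencies and then sampled in a simulation. The sketch below illustrates that idea and is not the authors' simulator. The function names are hypothetical, and the input list is made-up toy data, not values from the analyzed trace.

```python
import random
from collections import Counter


def empirical_pmf(values):
    """Map each distinct value to its relative frequency."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in sorted(counts.items())}


def sample_from_pmf(pmf, k, rng):
    """Draw k synthetic values according to the estimated PMF."""
    outcomes = list(pmf)
    weights = [pmf[v] for v in outcomes]
    return rng.choices(outcomes, weights=weights, k=k)


# Toy request sizes in bytes, for illustration only.
observed = [8192, 8192, 8192, 16384, 32768]
pmf = empirical_pmf(observed)
print(pmf)

rng = random.Random(0)  # fixed seed so runs are repeatable
print(sample_from_pmf(pmf, 10, rng))
```

Sampling from the empirical PMF preserves the peaks at multiples of the filesystem block size, which a smooth continuous density would blur.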
22 Conclusions (cont’d) (2) The analysis results are useful when designing optimization techniques for storage systems. E.g., -Cache management block size: 8192 bytes. -I/O rescheduling and background tasking would be ideal for the workload. -The storage system handling the workload we analyzed can be optimized to handle the symmetrical behavior*. *The results are not broadly applicable.
23 Conclusions (cont’d) (3) Other conclusions: -Request sizes are influenced by the filesystem in use. -Seek distances are not always uniformly distributed. *In summary, we have provided statistics about the parameters for the storage system workload that we analyzed and have shown how we can use them to derive models and design I/O optimization techniques.
24 Future Work -Rigorously find a probability density function that matches a given data set of inter-arrival times. -Analyze the storage system workloads in terms of other parameters (e.g., logical volume numbers and operation types).
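One possible first step towards matching a density to inter-arrival times is to fit candidate distributions by maximum likelihood and compare their log-likelihoods. The sketch below does this for two of the candidates, exponential and lognormal, using their closed-form estimates. It is only a screening step, not the rigorous procedure the slide calls for, and the function names are hypothetical.

```python
import math
import statistics


def fit_exponential(xs):
    """Maximum-likelihood exponential fit: rate = 1 / sample mean.

    Returns (rate, log-likelihood). Requires positive data.
    """
    rate = 1.0 / statistics.fmean(xs)
    loglik = sum(math.log(rate) - rate * x for x in xs)
    return rate, loglik


def fit_lognormal(xs):
    """Maximum-likelihood lognormal fit on the logs of the data.

    Returns ((mu, sigma), log-likelihood). Requires positive data
    with at least two distinct values.
    """
    logs = [math.log(x) for x in xs]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # the MLE divides by n, not n - 1
    loglik = sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (lx - mu) ** 2 / (2.0 * sigma ** 2)
        for x, lx in zip(xs, logs)
    )
    return (mu, sigma), loglik


# Toy inter-arrival times in microseconds, for illustration only.
gaps = [130.0, 150.0, 200.0, 900.0, 4000.0]
_, ll_exp = fit_exponential(gaps)
_, ll_logn = fit_lognormal(gaps)
print("exponential" if ll_exp > ll_logn else "lognormal", "fits better")
```

A higher log-likelihood only ranks the candidates against each other; a goodness-of-fit test would still be needed to decide whether the best candidate is adequate.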
25 THANK YOU FOR YOUR ATTENTION! ?