Measuring Quality of Service on Worker Node in Cluster

Presentation Transcript


1. Measuring Quality of Service on Worker Node in Cluster
Rohitashva Sharma, R S Mundada, Sonika Sachdeva, P S Dhekne, Computer Division, BARC, Mumbai, India
Helge Meinhard, Tony Cass, Olof Barring, CERN, Geneva, Switzerland
CHEP 06

2. INTRODUCTION
• Quality of Service
  • Defines the goodness of a node for a given type of task
  • Needed for better/optimum utilization of resources
• Computer Division, BARC and IT Division, CERN collaborated to explore ways to predict QoS

3. QoS – Definition
• QoS defines how good a node is for a given task
• QoS relates the execution times as QoS = Tnoload / Texecution, where
  • Texecution = wall clock execution time of the task
  • Tnoload = wall clock execution time of the task on the given node without load
• QoS varies between 0 and 1 (a worked example follows below)
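
A quick worked example of the definition above, with hypothetical numbers: a task that runs in Tnoload = 100 s on an idle node and in Texecution = 250 s under the current load has QoS = 100 / 250 = 0.4, while on an idle node Texecution = Tnoload and QoS = 1.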

4. Methodology
• Three task categories
  • CPU intensive
  • Disk IO intensive
  • Network IO intensive
• Representative probe programs for each category
• Load generating programs for each category

5. Methodology
• Monitor system metrics
  • Load average, CPU utilization, memory utilization, disk utilization, swap utilization, etc. (a small sampling sketch follows below)
• Execute probe programs under different load conditions (generated using the load generating programs)
• Correlate the probe execution time, the system metrics and the no-load execution time of the probe
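
The metrics listed above were gathered with the EDG Fabric Monitoring System (see the setup slide). As a minimal, self-contained sketch of sampling the same kind of quantities, the Python snippet below reads the load average and the memory/swap usage directly from /proc on a Linux node; it is only an illustration, not the monitoring tooling used in this work.

# Sample a few of the system metrics named on the slide straight from /proc.
# Illustrative only; the actual work used the EDG Fabric Monitoring System.

def read_loadavg():
    # /proc/loadavg starts with the 1-, 5- and 15-minute load averages.
    with open("/proc/loadavg") as f:
        one, five, fifteen = f.read().split()[:3]
    return float(one), float(five), float(fifteen)

def read_memory_usage():
    # /proc/meminfo lines look like "MemTotal:  655360 kB".
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key.strip()] = int(value.split()[0])  # value in kB
    mem_used = info["MemTotal"] - info["MemFree"]
    swap_used = info["SwapTotal"] - info["SwapFree"]
    return mem_used, swap_used

if __name__ == "__main__":
    print("load average (1/5/15 min):", read_loadavg())
    print("memory used / swap used (kB):", read_memory_usage())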

6. Probe Selection
• A probe should
  • Represent real-world applications
  • Have a short execution time
  • Be non-interactive
• Selected probes
  • Linpack for CPU intensive
  • Bonnie for disk IO intensive
  • Network IO intensive (not considered)

7. Load Generating Programs
• Generate load in a given category
• Should have a long execution time
• Provide a way to vary the load
• Two types of disk IO load (a toy example follows this list)
  • Block IO (IO in large data blocks)
  • Character IO (IO in small data blocks)
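
The load generating programs themselves are not reproduced in the transcript. The toy sketch below only illustrates the block IO versus character IO distinction made on the slide: large writes per call versus single-byte writes per call; the file path and sizes are arbitrary choices for the example, not values from the original work.

# Toy disk IO load generator contrasting block IO and character IO.
import os

def block_io_load(path="/tmp/loadgen.dat", block_kb=256, blocks=2048):
    # Block IO: write large chunks, then force them out to disk.
    chunk = b"x" * (block_kb * 1024)
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())

def character_io_load(path="/tmp/loadgen.dat", total_bytes=1024 * 1024):
    # Character IO: one byte per write call (unbuffered).
    with open(path, "wb", buffering=0) as f:
        for _ in range(total_bytes):
            f.write(b"x")

if __name__ == "__main__":
    block_io_load()
    character_io_load()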

8. SETUP
• 32-node cluster
• Each node consists of
  • P4 @ 1.6 GHz
  • 640 MB memory
  • 40 GB HDD
• Red Hat Linux 7.3
• EDG Fabric Monitoring System for gathering system metrics

9. CPU Probe
• CPU probe run under different loading conditions
• Correlation using the load average
• Execution time varies linearly with the load average (Equation 1; see the note below)
• Problem under block IO load
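
Equation 1 itself appears on the original slide only as a figure. A plausible form, consistent with the statement that execution time varies linearly with the load average and with the definition QoS = Tnoload / Texecution, would be

    Texecution ≈ Tnoload × (1 + LoadAvg),  i.e.  QoSpredicted ≈ 1 / (1 + LoadAvg)

This reconstruction is an assumption; the exact expression and any fitted coefficients on the original slide may differ.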

10. CPU Probe

11. CPU Probe
• Load average represents the combined CPU and IO load
• The CPU probe depends only on the CPU load
• Two ways to achieve this
  • Average CPU load (VmStatR)
  • Calculate the CPU available to the probe

12. CPU Probe
• Average CPU load
  • 1-minute running average of the run queue
  • Called VmStatR
• Predicted QoS is then given by Equation 2 (see the note below)
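
As with Equation 1, Equation 2 is shown on the original slide only as a figure. Since VmStatR (the 1-minute running average of the run queue) replaces the load average of Equation 1, a plausible form is

    QoSpredicted ≈ 1 / (1 + VmStatR)

Again, this is an assumption rather than the slide's exact formula.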

13. CPU Probe

14. CPU Probe
• CPU available to the probe
  • Calculated using the CPU utilization metric (Equation 3; see the note below)
• The probe is eligible for
  • The available idle time
  • A share of the system and user time
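
Equation 3 is also shown only as a figure on the original slide. One expression consistent with the bullets above, offered here as an assumption rather than the slide's exact formula, is

    QoSpredicted ≈ Idle + (User + System) / (R + 1)

where Idle, User and System are the CPU-time fractions reported by the utilization metric and R is the number of processes already competing for the CPU: the probe would receive all of the idle time plus an equal share of the time currently consumed by the other processes.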

15. CPU Probe
• The table compares the QoS predicted using Equations 1 and 3 under block IO load
• The QoS from Equation 3 shows the correct characteristic

16. Comparison of Results
• Compare the QoS results obtained using the three equations for the CPU probe under different loads
• Equation 1 does not give a correct prediction under block IO load
• Equations 2 and 3 give acceptable results under any load condition

17. CPU Probe – Comparison of Results (legend)
• LC – CPU load
• LC + LB – CPU + block IO load
• LC + LCh – CPU + character IO load
• LCh + LB – character IO + block IO load

18. Disk IO Probe
• Modified ‘Bonnie’ to act as both a block IO probe and a character IO probe
• Considered the block IO probe, as most of the applications were block IO intensive
• Correlated the probe execution time under different loading conditions
• Predicted QoS using the three equations and compared the results

19. Disk IO Probe – Comparison of Results (legend)
• LC – CPU load
• LC + LB – CPU + block IO load
• LC + LCh – CPU + character IO load
• LCh + LB – character IO + block IO load

20. CMSIM Results
• Predicted the execution time using the QoS from Equation 2
• The % error against the measured execution time is acceptable (see the note below)
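
The predicted execution time follows directly from the QoS definition on slide 3: Tpredicted = Tnoload / QoSpredicted. As a hypothetical illustration, a CMSIM-like job with Tnoload = 1000 s submitted to a node with a predicted QoS of 0.8 would be expected to take about 1250 s; the % error on the slide is the deviation of such a prediction from the measured wall clock time.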

21. Problem Areas
• Effect of swapping
  • If the available memory is less than the size of the task, the Linux kernel dynamically changes task priorities and swaps tasks accordingly
  • This makes QoS difficult to predict

22. Problem Areas – Swapping

23. Problem Areas
• Metric sampling frequency of the monitoring system
  • The most recent metric value gives a better QoS prediction
  • At higher sampling frequencies the monitoring itself loads the node
• Change in state after submission of the task
  • QoS cannot account for load changes after the task has been submitted
  • Submission or removal of other tasks may change the QoS

24. Conclusion
• Equations 2 and 3 provide better QoS predictions for CPU-bound applications
• Equation 1 can be used for IO-bound applications
• Successfully predicted the execution time for CMSIM, which is a mostly CPU-bound job
• Load balancing programs can use the derived equations for job submission (a sketch follows below)
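
As a rough illustration of that last point, the sketch below picks the worker node with the highest predicted QoS before submitting a job. The predict_qos function stands in for Equation 2 in the plausible 1 / (1 + VmStatR) form assumed earlier, and the metric values are hypothetical; it is not the scheduler used in this work.

# Pick a worker node for a CPU-bound job using a predicted QoS.
# Illustrative only: predict_qos is a stand-in for Equation 2.

def predict_qos(vmstat_r):
    # Assumed form of Equation 2: QoS from the 1-minute running average
    # of the run queue length (VmStatR).
    return 1.0 / (1.0 + vmstat_r)

def pick_node(node_metrics):
    # node_metrics maps node name -> VmStatR reported by the monitoring system.
    qos = {node: predict_qos(r) for node, r in node_metrics.items()}
    best = max(qos, key=qos.get)
    return best, qos[best]

if __name__ == "__main__":
    # Hypothetical snapshot of three worker nodes.
    metrics = {"node01": 0.2, "node02": 1.5, "node03": 3.0}
    node, q = pick_node(metrics)
    print("submit job to", node, "with predicted QoS", round(q, 2))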

25. Thanks
