CIT 470: Advanced Network and System Administration
Performance Monitoring

Topics
• Performance testing
• Performance tuning
• CPU
• Memory
• Disk
• Network

What is performance testing?
"Performance testing is a type of testing intended to determine the responsiveness, throughput, reliability, and/or scalability of a system under a given workload."
- http://perftestingguide.codeplex.com/
Performance testing goals:
• Assess production readiness
• Evaluate against performance criteria
• Compare performance characteristics of multiple systems or system configurations
• Find the source of performance problems
• Support system tuning
• Find throughput levels

Performance Testing Activities
[Diagram not reproduced. Source: http://perftestingguide.codeplex.com/]

Testing Types
• Performance testing: determining performance, scalability, or stability characteristics of a system; a superset of the other testing types.
• Load testing: determining performance characteristics of a system when subjected to the work load expected during production.
• Stress testing: determining performance characteristics of a system when subjected to work loads beyond those expected during production, to determine under what conditions the system will fail.

Baselines
A baseline is a set of data used for comparison.
In performance testing, baselines are used to evaluate the effectiveness of subsequent performance-improving changes to the system.
Once the system has been changed, a new baseline must be measured.
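
To make the comparison concrete, here is a minimal sketch of evaluating a change against a baseline. The response-time numbers and the function name `percent_change` are made up for illustration; a real baseline would come from your own monitoring data.

```python
from statistics import mean

def percent_change(baseline, current):
    """Relative change of the current mean versus the baseline mean, in percent."""
    base = mean(baseline)
    return (mean(current) - base) / base * 100.0

# Hypothetical response times in milliseconds, before and after a tuning change.
baseline_ms = [120.0, 130.0, 125.0, 125.0]
after_change_ms = [100.0, 105.0, 95.0, 100.0]

change = percent_change(baseline_ms, after_change_ms)
print(f"Response time changed by {change:+.1f}% relative to the baseline")

# After accepting the change, the new measurements become the next baseline.
baseline_ms = after_change_ms
```

A negative result means the mean response time dropped. Comparing means is only one option; percentiles are often more telling for latency data.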

Benchmarking
Benchmarking is the process of measuring system performance using standard tests and comparing it against a well-known system.
• SPEC CPU2006 (SPECint, SPECfp)
• SPEC power2008 (power usage)
• SPEC sfs2008 (NFS, CIFS)
• SPEC virt2010 (virtualization)
• SPEC web2005 (PHP or JSP)
• BogoMips
• Dhrystone
• Whetstone
• Weighted TeraFLOPS
• NAS Parallel Benchmarks

Identify Bottlenecks
Identify which aspect of performance
• Latency: delay until initial access.
• Throughput: rate of transfer/processing.
Identify which system component
• CPU
• Memory
• Disk
• Network
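
A small worked example may help separate the two aspects. The sketch below uses the simple model "total time = latency + size / throughput"; the device figures (10 ms latency, 100 MB/s) are assumptions chosen for illustration, not measurements.

```python
def transfer_time(size_bytes, latency_s, throughput_bytes_per_s):
    """Time to complete one request: initial delay plus time to move the data."""
    return latency_s + size_bytes / throughput_bytes_per_s

LATENCY_S = 0.010       # assumed delay until initial access
THROUGHPUT = 100e6      # assumed transfer rate in bytes per second

small = transfer_time(4_096, LATENCY_S, THROUGHPUT)          # one 4 KB block
large = transfer_time(1_000_000_000, LATENCY_S, THROUGHPUT)  # one 1 GB file

for label, total in (("4 KB", small), ("1 GB", large)):
    share = LATENCY_S / total * 100
    print(f"{label}: {total:.5f} s total, {share:.1f}% of it latency")
```

Under these assumptions the small request is dominated by latency and the large one by throughput, which is why the two have to be identified separately.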

Performance Tuning Process
• Learn the customer’s problem.
  Identify specifically what’s wrong.
• Find the problem’s cause and fix it.
  • When does the problem occur?
  • Has anything about the system changed?
  • What critical resource is affecting performance?
• Have the right tools.
  Historical monitoring data will show what’s normal and identify any trends.

Experimenter Effect
Monitoring the system affects performance.
• Monitoring tools use system resources.
• If you’ve monitored the system consistently, the monitoring overhead is already part of normal performance, so it won’t distort your measurements.

Performance Problem Solutions
• Get more of the needed resource.
  Ex: Upgrade processor, use striped disk array.
• Reduce system requirements.
  Ex: Kill processes, move services to other hosts.
• Eliminate inefficiency and waste.
  Ex: Produce a static home page every 15 minutes instead of regenerating it on each access.
• Ration resource usage.
  Ex: Set process priorities with renice.
  Ex: Limit process resource usage with limit (csh) or ulimit (sh, bash).
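
The static home page example can be sketched in a few lines. This is a variation on the slide's idea, not the slide's exact method: instead of a scheduled job writing a file every 15 minutes, the page is regenerated lazily on the first access after 15 minutes have passed. `CachedPage` and `render_home_page` are hypothetical names, and the fake clock exists only so the demo runs instantly.

```python
import time

class CachedPage:
    """Serve a stored copy of a page and regenerate it at most once per max_age_s."""

    def __init__(self, render, max_age_s=15 * 60, clock=time.monotonic):
        self._render = render
        self._max_age_s = max_age_s
        self._clock = clock
        self._body = None
        self._rendered_at = None

    def get(self):
        now = self._clock()
        expired = self._rendered_at is None or now - self._rendered_at >= self._max_age_s
        if expired:
            self._body = self._render()   # the expensive step
            self._rendered_at = now
        return self._body

renders = []  # records each expensive regeneration

def render_home_page():
    renders.append(1)
    return f"<html>home page, build {len(renders)}</html>"

fake_now = [0.0]                      # stand-in clock so the demo needs no sleeping
page = CachedPage(render_home_page, clock=lambda: fake_now[0])

first = page.get()                    # rendered
second = page.get()                   # served from the cached copy
fake_now[0] = 15 * 60                 # pretend 15 minutes have passed
third = page.get()                    # rendered again
print(len(renders), "renders for 3 requests")
```

The trade-off is staleness: visitors can see a page that is up to 15 minutes old in exchange for far fewer regenerations.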

Monitoring Processes
uptime   Provides aggregate data about system load.
ps       Shows running processes with CPU, mem usage.
top      Updated list of running processes + summaries.
vmstat   Summary data about processes and CPU usage.

Uptime
Uptime provides the following data:
• How long the system has been running.
• Number of users logged in.
• Average number of runnable processes in the last 1, 5, and 15 minutes.
Want a load average under 3. This is a rule of thumb: a sustained load above the number of CPUs generally means processes are waiting for CPU time.
Uptime example
> uptime
17:40 up 126 days, 8:03, 6 users, load average: 1.40, 1.03, 0.55
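
As an illustration of reading these numbers programmatically, the sketch below pulls the three load averages out of the example line and relates the 1-minute value to a CPU count. The function name `parse_load_averages` and the CPU count of 2 are assumptions made for the example.

```python
import re

def parse_load_averages(uptime_line):
    """Return the (1, 5, 15)-minute load averages found in a line of uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load averages found in: " + uptime_line)
    return tuple(float(value) for value in match.groups())

sample = "17:40 up 126 days, 8:03, 6 users, load average: 1.40, 1.03, 0.55"
one, five, fifteen = parse_load_averages(sample)

cpu_count = 2  # assumed for illustration; os.cpu_count() reports the real value
per_cpu = one / cpu_count
print(f"1-minute load {one:.2f} is {per_cpu:.2f} per CPU")
if one > fifteen:
    print("load is rising: the 1-minute average exceeds the 15-minute average")
```

On a live Unix system, `os.getloadavg()` returns the same three values without parsing any text.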

Monitoring CPU with vmstat
Important Columns:
• Number of Runnable and Blocked processes.
• CPU usage by user, system, idle, and waiting.
High CPU usage indicators
• r > 0 in multiple consecutive rows
• us+sy > 80 in many consecutive rows
• id < 20 in many consecutive rows
> vmstat 5 4
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 395716  45176 211284  88480    0    0     1     2    1    2  9  3 88  0
 2  0 395716  45168 211300  88480    0    0     0    50 1035 1677 71  2 27  0
 3  0 395716  45168 211300  88480    0    0     0     0 1040 1670 75  3 23  0
 5  0 395716  45168 211300  88480    0    0     0     0 1033 1660 83  3 13  0
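
The indicators above can be checked mechanically. The following is a rough sketch that parses the sample output and counts how many consecutive rows satisfy each rule; `parse_vmstat` and `longest_streak` are hypothetical helper names, and the thresholds are the slide's rules of thumb rather than fixed limits.

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 395716  45176 211284  88480    0    0     1     2    1    2  9  3 88  0
 2  0 395716  45168 211300  88480    0    0     0    50 1035 1677 71  2 27  0
 3  0 395716  45168 211300  88480    0    0     0     0 1040 1670 75  3 23  0
 5  0 395716  45168 211300  88480    0    0     0     0 1033 1660 83  3 13  0
"""

def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue                      # blank line or column-group banner
        if fields[0] == "r":
            header = fields               # column names
        elif header and len(fields) == len(header):
            rows.append(dict(zip(header, map(int, fields))))
    return rows

def longest_streak(rows, predicate):
    """Length of the longest run of consecutive rows for which predicate is true."""
    best = streak = 0
    for row in rows:
        streak = streak + 1 if predicate(row) else 0
        best = max(best, streak)
    return best

# The first report shows averages since boot, so it is left out of the checks.
rows = parse_vmstat(SAMPLE)[1:]

runnable_streak = longest_streak(rows, lambda r: r["r"] > 0)
busy_streak = longest_streak(rows, lambda r: r["us"] + r["sy"] > 80)
idle_streak = longest_streak(rows, lambda r: r["id"] < 20)
print(runnable_streak, busy_streak, idle_streak)
```

On this sample the run queue is non-empty in all three interval rows, while only the last row crosses the 80% busy and 20% idle thresholds, so more samples would be needed before calling it a sustained CPU shortage.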

Identifying CPU Shortages
• Short-term CPU spikes are normal.
• Consistently high number of runnable processes (r) in vmstat.
• Consistently high total CPU usage (sy+us).
• High system time compared to user time, together with a high number of context switches (cs), indicates the system is thrashing between processes instead of doing user work.

Changing Process Priorities
Nice values
• Positive values lower priorities.
• Negative values increase priorities (root only).
If you know a process will be a CPU hog,
    nice +5 command_name        (csh built-in; in sh/bash use nice -n 5 command_name)
If you detect a CPU hog after it’s started,
    renice 5 PID
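
The same effect as running a command under nice can be had from a script. Below is a minimal sketch, assuming a POSIX system, that starts a child process with its nice value raised by 5 and has the child report the value it ended up with. `run_niced` is a hypothetical helper name.

```python
import os
import subprocess
import sys

def run_niced(argv, increment=5):
    """Run argv with its nice value raised by `increment` (that is, lower priority)."""
    return subprocess.run(
        argv,
        preexec_fn=lambda: os.nice(increment),  # executed in the child before exec
        capture_output=True,
        text=True,
        check=True,
    )

parent_nice = os.nice(0)   # an increment of 0 only reports the current value

# The child prints its own nice value so the effect is visible.
child = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
child_nice = int(child.stdout)
print(f"parent nice = {parent_nice}, child nice = {child_nice}")
```

Raising the nice value needs no privileges; lowering it (a negative increment) normally requires root, matching the slide's note about negative values.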

Managing Processes with kill
TERM (default)
• Terminates process execution.
• Processes can catch or ignore signal.
• Ctrl-c sends the similar INT signal, not TERM.
KILL (9)
• Terminates process execution.
• Processes cannot catch or ignore.
• Processes in uninterruptible I/O wait will not die until the I/O completes.
STOP
• Suspends process execution until SIGCONT.
• Cannot be caught or ignored; Ctrl-z sends the catchable TSTP signal instead.
• Useful for moving CPU hog out of way temporarily.
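
A short sketch of the three signals in action, assuming a POSIX system. It starts a throwaway child process, suspends and resumes it, then asks it to terminate and escalates to KILL only if TERM is not honoured within a timeout. The 5-second timeout is an arbitrary choice for the example.

```python
import signal
import subprocess
import sys

# A stand-in for a CPU hog: a child process that would otherwise run for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

child.send_signal(signal.SIGSTOP)   # suspend: the process keeps its state but gets no CPU
child.send_signal(signal.SIGCONT)   # resume where it left off

child.terminate()                   # polite request: sends SIGTERM
try:
    child.wait(timeout=5)
except subprocess.TimeoutExpired:
    child.kill()                    # last resort: SIGKILL cannot be caught or ignored
    child.wait()

# A negative return code means the child was ended by that signal number.
exit_signal = -child.returncode
print("child ended by signal", signal.Signals(exit_signal).name)
```

Trying TERM first gives the process a chance to clean up temporary files and locks, which KILL never allows.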

Monitoring Memory with vmstat
Important columns
• si (swap in) = reading pages back into RAM from swap on disk
• so (swap out) = saving pages to disk because out of RAM
Memory problems are indicated by:
• so > 0 in multiple consecutive rows
• si > 1000 in many consecutive rows
> vmstat 5 4
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 395716  45176 211284  88480    0    0     1     2    1    2  9  3 88  0
 0  0 395716  45168 211300  88480    0    5     0    50 1035 1677  0  0 100  0
 0  0 395716  45168 211300  88480    0    8     0     0 1040 1670  0  0 99  0
 0  0 395716  45168 211300  88480    0    6     0     0 1033 1660  0  0 100  0

Managing Memory
• Improving paging capacity.
  Add new swapfiles with swapon.
  Add new swap partitions.
• Improving paging performance.
  Use swap partitions instead of swap files.
  Distribute swap resources across disks.
• Migrate memory hogs to another host.
• Add more memory.

Identifying I/O bottlenecks
Important Columns:
• Number of Runnable and Blocked processes.
• Blocked processes cannot run because waiting on I/O.
• Blocks/second transferred in (bi) and out (bo)
Identifying problems:
• b > 0 consistently across multiple rows
• bi and/or bo > 1000 across many rows
> vmstat 5 4
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 395716  45176 211284  88480    0    0     1     2    1    2  9  3 88  0
 0  5 395716  45168 211300  88480    0    0     0    50 1035 1677  0  0 100  0
 0  7 395716  45168 211300  88480    0    0     0     0 1040 1670  0  0 99  0
 0  6 395716  45168 211300  88480    0    0     0     0 1033 1660  0  0 100  0
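
The memory and I/O rules of thumb for vmstat lend themselves to a single check. The sketch below applies them to the interval rows of the two vmstat samples from the memory and I/O slides. The helper names (`parse_rows`, `consecutive`, `diagnose`) and the choice of three rows as "multiple consecutive rows" are assumptions made for illustration.

```python
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

def parse_rows(data_lines):
    """Turn vmstat data lines (no headers) into dicts keyed by column name."""
    return [dict(zip(COLUMNS, map(int, line.split()))) for line in data_lines]

def consecutive(rows, predicate, needed=3):
    """True if predicate holds for at least `needed` rows in a row."""
    streak = 0
    for row in rows:
        streak = streak + 1 if predicate(row) else 0
        if streak >= needed:
            return True
    return False

def diagnose(rows):
    """Apply the slides' memory and I/O rules of thumb to vmstat rows."""
    findings = []
    if consecutive(rows, lambda r: r["so"] > 0):
        findings.append("memory: swapping out (so > 0)")
    if consecutive(rows, lambda r: r["si"] > 1000):
        findings.append("memory: heavy swap-in (si > 1000)")
    if consecutive(rows, lambda r: r["b"] > 0):
        findings.append("I/O: processes blocked (b > 0)")
    if consecutive(rows, lambda r: r["bi"] > 1000 or r["bo"] > 1000):
        findings.append("I/O: high block transfer rate (bi or bo > 1000)")
    return findings

# Interval rows from the memory slide (so = 5, 8, 6).
memory_rows = parse_rows([
    "0 0 395716 45168 211300 88480 0 5 0 50 1035 1677 0 0 100 0",
    "0 0 395716 45168 211300 88480 0 8 0 0 1040 1670 0 0 99 0",
    "0 0 395716 45168 211300 88480 0 6 0 0 1033 1660 0 0 100 0",
])
# Interval rows from the I/O slide (b = 5, 7, 6).
io_rows = parse_rows([
    "0 5 395716 45168 211300 88480 0 0 0 50 1035 1677 0 0 100 0",
    "0 7 395716 45168 211300 88480 0 0 0 0 1040 1670 0 0 99 0",
    "0 6 395716 45168 211300 88480 0 0 0 0 1033 1660 0 0 100 0",
])

memory_findings = diagnose(memory_rows)
io_findings = diagnose(io_rows)
print(memory_findings)
print(io_findings)
```

Requiring several consecutive rows keeps a single busy interval from being reported as a problem, in line with the slides' emphasis on consistent readings.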

Monitoring Disk I/O
Use iostat to get per disk statistics.
• Transactions per second (tps).
• Blocks read/written per second.
Managing disk performance problems.
• Distribute heavily used data across disks/controllers.
• Get more or faster disks.
• Use RAID or LVM striping.

iostat
> iostat 2
Linux 2.6.15-23-386 (zim)   03/26/2007

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.55    0.18    3.22    0.09    0.00   87.96

Device:    tps  Blk_read/s  Blk_wrtn/s   Blk_read   Blk_wrtn
hde       0.69        8.18        9.43   89783416  103565744
hdh       0.15        1.33        3.37   14590831   36969599
hdc       0.00        0.00        0.00       9548          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.17    0.00    0.17    0.00    0.00   99.67

Device:    tps  Blk_read/s  Blk_wrtn/s   Blk_read   Blk_wrtn
hde       0.33        0.00       21.33          0        128
hdh       0.00        0.00        0.00          0          0
hdc       0.00        0.00        0.00          0          0
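
To show how per-disk figures like these might be compared, here is a rough sketch that splits the output into reports and picks the busiest device in each. It assumes the older column layout shown on the slide ("Device:" with a colon, block counts per second); newer iostat versions print different headers, so the parser would need adjusting. `parse_iostat_devices` is a hypothetical helper name.

```python
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.55    0.18    3.22    0.09    0.00   87.96

Device:    tps  Blk_read/s  Blk_wrtn/s   Blk_read   Blk_wrtn
hde       0.69        8.18        9.43   89783416  103565744
hdh       0.15        1.33        3.37   14590831   36969599
hdc       0.00        0.00        0.00       9548          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.17    0.00    0.17    0.00    0.00   99.67

Device:    tps  Blk_read/s  Blk_wrtn/s   Blk_read   Blk_wrtn
hde       0.33        0.00       21.33          0        128
hdh       0.00        0.00        0.00          0          0
hdc       0.00        0.00        0.00          0          0
"""

def parse_iostat_devices(text):
    """Return one dict per report, mapping device name to its per-second rates."""
    reports, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Device:":
            current = {}                  # a new device table starts here
            reports.append(current)
        elif not fields or fields[0].startswith("avg-cpu"):
            current = None                # the device table has ended
        elif current is not None:
            current[fields[0]] = {
                "tps": float(fields[1]),
                "blk_read_s": float(fields[2]),
                "blk_wrtn_s": float(fields[3]),
            }
    return reports

reports = parse_iostat_devices(SAMPLE)
busiest = [max(report, key=lambda dev: report[dev]["tps"]) for report in reports]
print(busiest)
```

As with vmstat, the first report covers the time since boot, so the later interval reports are the ones that show current activity.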

Managing Disk Capacity
Detecting disk resource usage.
• List all partition usage with df -h
• Identify high usage directories with du
  Summary data: du -s
  Highest usage directories: du -k / | sort -rn
Use find to detect disk hogs.
• Use find -size to search for big files.
• Use -atime +X to identify files that haven’t been used in X days.
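
The find-based search for disk hogs can also be scripted. The sketch below is only an approximation of combining find -size and -atime: it walks a directory tree and lists regular files that are at least a given size and have not been accessed for a given number of days. `find_disk_hogs` is a hypothetical name, and the demo builds its own temporary directory so it touches no real data.

```python
import os
import stat
import tempfile
import time

def find_disk_hogs(root, min_bytes=0, unused_days=0):
    """Return (size, path) pairs, largest first, for big files not accessed recently."""
    cutoff = time.time() - unused_days * 86400
    hogs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue                  # file vanished or cannot be read
            if not stat.S_ISREG(info.st_mode):
                continue                  # skip symlinks, sockets, and so on
            if info.st_size >= min_bytes and info.st_atime <= cutoff:
                hogs.append((info.st_size, path))
    return sorted(hogs, reverse=True)

# Demo on a throwaway directory: one large stale file, one small fresh file.
with tempfile.TemporaryDirectory() as root:
    big = os.path.join(root, "big.bin")
    small = os.path.join(root, "small.txt")
    with open(big, "wb") as handle:
        handle.write(b"\0" * 2000)
    with open(small, "wb") as handle:
        handle.write(b"\0" * 10)
    month_ago = time.time() - 30 * 86400
    os.utime(big, (month_ago, month_ago))     # pretend it was last used a month ago

    hogs = find_disk_hogs(root, min_bytes=1000, unused_days=7)
    hog_report = [(size, os.path.basename(path)) for size, path in hogs]

print(hog_report)
```

Access times are only as reliable as the filesystem's mount options; volumes mounted with noatime or relatime update them rarely or never, which affects both this sketch and find -atime.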

Managing Disk Shortages
• Add more disks.
• Move files to remote fileservers.
• Eliminate unnecessary files.
• Compress large infrequently used files.
• Impose disk quotas on users.
  Soft limit: can be violated temporarily.
  Hard limit: cannot be violated.

IPTraf
[Screenshot not reproduced.]

iftop
[Screenshot not reproduced.]

References
• Mark Burgess, Principles of System and Network Administration, Wiley, 2000.
• Aeleen Frisch, Essential System Administration, 3rd edition, O’Reilly, 2002.
• Mike Loukides and Gian-Paolo D. Musumeci, System Performance Tuning, 2nd edition, O’Reilly, 2003.
• Evi Nemeth et al, UNIX System Administration Handbook, 3rd edition, Prentice Hall, 2001.
• patterns & practices, Performance Testing Guidance for Web Applications, http://perftestingguide.codeplex.com/