CIT 668: System Architecture
Performance Testing
Topics
1. What is performance testing?
2. Performance-testing activities
3. UNIX monitoring tools
What is performance testing?
Performance testing is a type of testing intended to determine the responsiveness, throughput, reliability, and/or scalability of a system under a given workload.
- Performance testing goals:
  - Assess production readiness
  - Evaluate against performance criteria
  - Compare performance characteristics of multiple systems or system configurations
  - Find the source of performance problems
  - Support system tuning
  - Find throughput levels
Performance Testing Activities
Testing Types
- Performance testing: determining the performance, scalability, or stability characteristics of a system; a superset of the other testing types.
- Load testing: determining the performance characteristics of a system when subjected to the workload expected during production.
- Stress testing: determining the performance characteristics of a system when subjected to workloads beyond those expected during production, to determine under what conditions the system will fail.
Baselines
- A baseline is a set of data used for comparison.
- In performance testing, baselines are used to evaluate the effectiveness of subsequent performance-improving changes to the system.
- Once the system has been changed, a new baseline must be measured.
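The comparison against a baseline can be sketched as a per-metric percent change. This is a minimal illustration only: the metric names and all numbers are hypothetical, not measurements from the course.

```python
def percent_change(baseline, measured):
    """Return the change of each shared metric relative to the baseline, in percent."""
    return {
        name: 100.0 * (measured[name] - base) / base
        for name, base in baseline.items()
        if name in measured and base != 0
    }

# Hypothetical numbers, for illustration only.
baseline = {"requests_per_sec": 400.0, "mean_latency_ms": 50.0}
after_tuning = {"requests_per_sec": 500.0, "mean_latency_ms": 40.0}

changes = percent_change(baseline, after_tuning)
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

After the change is accepted, `after_tuning` would become the new baseline for the next round of tuning.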
Benchmarking
Benchmarking is the process of measuring system performance using standard tests and comparing it against a well-known system.
- SPEC CPU2006 (SPECint, SPECfp)
- SPECpower_ssj2008 (power usage)
- SPECsfs2008 (NFS, CIFS)
- SPECvirt_sc2010 (virtualization)
- SPECweb2005 (PHP or JSP)
- BogoMIPS
- Dhrystone
- Whetstone
- Weighted TeraFLOPS
- NAS Parallel Benchmarks
Experimenter Effect
- Monitoring the system affects performance.
- Monitoring tools use system resources.
- If you've monitored the system consistently, the monitoring overhead is already part of every measurement, so it won't distort comparisons.
Identify Bottlenecks
- Identify which aspect of performance
  - Latency: delay until initial access.
  - Throughput: rate of transfer/processing.
- Identify which system component
  - CPU
  - Memory
  - Disk
  - Network
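The difference between the two aspects can be made concrete by timing a repeated operation. The sketch below is an assumption-laden illustration: `measure` is a hypothetical helper, the workload is a stand-in, and per-call delay is used as a rough proxy for latency.

```python
import time

def measure(operation, repetitions):
    """Return (mean latency in seconds, throughput in operations per second)."""
    latencies = []
    start = time.perf_counter()
    for _ in range(repetitions):
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return sum(latencies) / len(latencies), repetitions / elapsed

# Stand-in workload; replace with the request or I/O call under test.
mean_latency, throughput = measure(lambda: sum(range(1000)), repetitions=200)
print(f"latency {mean_latency * 1e6:.1f} us, throughput {throughput:.0f} ops/s")
```

For a serial loop like this one the two numbers are roughly reciprocal; they diverge once requests are issued concurrently, which is why both are worth recording.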
Performance Problem Solutions
1. Get more of the needed resource.
   Ex: Upgrade processor, use striped disk array.
2. Reduce system requirements.
   Ex: Kill processes, move services to other hosts.
3. Eliminate inefficiency and waste.
   Ex: Produce a static home page every 15 minutes instead of regenerating it on each access.
4. Ration resource usage.
   Ex: Set process priorities with renice.
   Ex: Limit process resource usage with limit (csh) or ulimit (sh/bash).
Performance Testing Services
- Gomez
- Keynote
- Pingdom
- SiteUptime
- Alertra
Performance Testing Activities
Activities
Activity: Identify test environment
  Input: Production system architecture; Test system architecture; Available tools
  Output: Comparison of test and production environments; Environment concerns; Are other tools needed?
Activity: ID acceptance criteria
  Input: Client expectations; Risks to be mitigated
  Output: Success criteria; Performance goals and requirements
Activity: Plan and design tests
  Input: Available system features and components; Use cases
  Output: Test data to implement tests; Use models to be simulated; Resources required
Activity: Configure test environment
  Input: Tools; Success criteria; Tests
  Output: Configured load generation and resource monitoring tools; Environment ready for tests
Activities
Activity: Implement test design
  Input: Configured tools; Prepared environment; Available tools
  Output: Validated, executable tests; Validated resource monitoring; Validated data collection
Activity: Execute tests
  Input: Test execution plan; Configured tools; Executable tests
  Output: Test results
Activity: Analyze Results, Report, and Retest
  Input: Test results; Acceptance criteria; Risks, concerns, and issues
  Output: Results analysis; Recommendations; Reports
Web Load Tools
- ab (Apache Bench)
- httperf
- autobench (httperf multihost wrapper)
- JMeter
- openload
- Siege
Metric Collection and Notification Tools
- Ganglia
- Cacti
- Nagios
- Zabbix
- Hyperic HQ
- Munin
- Zenoss
- OpenNMS
- GroundWork
- Monit
UNIX Monitoring Tools
Monitoring Processes
- uptime: provides aggregate data about system load.
- ps: shows running processes with CPU and memory usage.
- top: continuously updated list of running processes plus summaries.
- vmstat: summary data about processes and CPU usage.
uptime
uptime provides the following data:
- How long the system has been running.
- Number of users logged in.
- Average number of runnable processes over the last 1, 5, and 15 minutes.
  - Want a load average under 3 (a rule of thumb; the acceptable value scales with the number of CPUs).
uptime example
> uptime
17:40 up 126 days, 8:03, 6 users, load average: 1.40, 1.03, 0.55
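The load averages in an uptime line can be extracted and checked against a threshold. The following is a minimal sketch: the helper names are hypothetical, and the default threshold of 3 is simply the slide's rule of thumb. On a live system, Python's `os.getloadavg()` returns the same three numbers without parsing.

```python
import re

def parse_load_averages(uptime_line):
    """Extract the 1-, 5-, and 15-minute load averages from uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load averages found in: " + uptime_line)
    return tuple(float(value) for value in match.groups())

def load_is_high(loads, threshold=3.0):
    """True if any of the load averages exceeds the threshold."""
    return any(load > threshold for load in loads)

# The example line from the slide.
sample = "17:40 up 126 days, 8:03, 6 users, load average: 1.40, 1.03, 0.55"
loads = parse_load_averages(sample)
print(loads, "high" if load_is_high(loads) else "ok")
```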
vmstat
- Number of runnable (r) and blocked (b) processes.
- Memory (virtual, free, buffered, cached).
- Blocks/second transferred in (bi) and out (bo).
- Interrupts/sec (in) and context switches/sec (cs).
- CPU usage by user, system, idle, and waiting.
> vmstat 5 4
procs  memory  swap  io  system  cpu
r b  swpd free buff cache  si so  bi bo  in cs  us sy id wa
Identifying CPU Shortages
1. Short-term CPU spikes are normal.
2. A consistently high number of runnable processes (r) in vmstat.
3. Consistently high total CPU usage (sy+us).
4. High system time compared to user time, together with a high context-switch rate, indicates the system is thrashing between processes instead of doing user work.
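The checks above could be applied to a series of vmstat samples as sketched below. Everything numeric here is an assumption made for illustration: the thresholds (`busy_pct`, `cs_per_sec`), the reading of "consistently" as "in every sample", and the sample values themselves, which are hypothetical rather than real vmstat output.

```python
def cpu_shortage_signs(samples, cpu_count, busy_pct=90, cs_per_sec=10000):
    """Return the CPU-shortage signs that hold in every vmstat sample.

    Each sample is a dict with the vmstat columns r, us, sy, and cs.
    """
    if not samples:
        return []
    signs = []
    if all(s["r"] > cpu_count for s in samples):
        signs.append("run queue consistently longer than CPU count")
    if all(s["us"] + s["sy"] >= busy_pct for s in samples):
        signs.append("total CPU usage consistently high")
    if all(s["sy"] > s["us"] and s["cs"] >= cs_per_sec for s in samples):
        signs.append("system time and context switches dominate")
    return signs

# Hypothetical samples, for illustration only.
busy = [
    {"r": 6, "us": 70, "sy": 25, "cs": 3000},
    {"r": 7, "us": 72, "sy": 24, "cs": 3200},
]
print(cpu_shortage_signs(busy, cpu_count=4))
```

Requiring the condition in every sample is one way to ignore the short-term spikes that item 1 calls normal.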
Changing Process Priorities
Nice values
- Positive values lower priority.
- Negative values increase priority.
If you know a process will be a CPU hog:
  nice -n 5 command_name     (csh builtin form: nice +5 command_name)
If you detect a CPU hog after it's started:
  renice 5 PID
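The same two operations are available from Python's standard library on UNIX systems. This is a sketch, not the course's method: `run_niced` and `renice` are hypothetical helper names, and the child command is a stand-in that just reports its own nice value.

```python
import os
import subprocess
import sys

def run_niced(cmd, increment=5):
    """Start cmd with its nice value raised by increment, like nice -n."""
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),  # runs in the child before exec
        capture_output=True,
        text=True,
    )

def renice(pid, niceness):
    """Set the nice value of a running process, like renice."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)

# Stand-in command: the child prints its own nice value.
result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
print("child nice value:", result.stdout.strip())
```

As with the shell commands, an unprivileged user can only raise a nice value; lowering it normally requires root.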
Managing Processes with kill
- TERM (default): terminates process execution. Processes can catch or ignore the signal. (Ctrl-c sends the related INT signal.)
- KILL (9): terminates process execution. Processes cannot catch or ignore it. Processes blocked in uninterruptible I/O will not die until the I/O completes.
- STOP: suspends process execution until SIGCONT. Useful for moving a CPU hog out of the way temporarily. (Ctrl-z sends the related, catchable TSTP signal.)
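The suspend, resume, and terminate sequence can be tried safely on a throwaway child process. The sketch below assumes a UNIX system; the sleeping child stands in for a CPU hog and `suspend_resume_terminate` is a hypothetical helper name.

```python
import signal
import subprocess
import sys

def suspend_resume_terminate(proc):
    """Send STOP, CONT, and TERM to a child process and return its exit status."""
    proc.send_signal(signal.SIGSTOP)   # suspend: the process stops consuming CPU
    proc.send_signal(signal.SIGCONT)   # resume where it left off
    proc.send_signal(signal.SIGTERM)   # polite termination request (kill default)
    return proc.wait(timeout=10)

# Stand-in for a CPU hog: a child that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
status = suspend_resume_terminate(child)
print("exit status:", status)
```

`subprocess` reports death by signal as a negative return code, so a child that did not catch TERM exits with the negated signal number.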
Imposing Limits on Processes
CPU time                ulimit -t secs
Maximum file size       ulimit -f KB
Maximum data segment    ulimit -d KB
Maximum stack size      ulimit -s KB
Maximum physical mem    ulimit -m KB
Maximum core size       ulimit -c KB
Maximum number procs    ulimit -u n
Maximum virtual mem     ulimit -v KB
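The limits behind ulimit can also be set per child process through Python's `resource` module. The sketch below covers only the CPU-time limit (the counterpart of `ulimit -t`); `run_with_cpu_limit` is a hypothetical helper, and the child command is a stand-in that reports the limit it inherited.

```python
import resource
import subprocess
import sys

def run_with_cpu_limit(cmd, cpu_secs):
    """Run cmd with its soft CPU-time limit set to cpu_secs seconds."""
    def apply_limit():
        # Runs in the child before exec, so only the child is limited.
        _soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
        new_soft = cpu_secs if hard == resource.RLIM_INFINITY else min(cpu_secs, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (new_soft, hard))
    return subprocess.run(cmd, preexec_fn=apply_limit, capture_output=True, text=True)

# Stand-in command: the child prints the soft CPU limit it is running under.
report = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU)[0])"
result = run_with_cpu_limit([sys.executable, "-c", report], cpu_secs=2)
print("child soft CPU limit:", result.stdout.strip())
```

A process that exceeds its soft CPU limit receives SIGXCPU. The other rows of the table map to constants such as `RLIMIT_FSIZE`, `RLIMIT_DATA`, `RLIMIT_STACK`, `RLIMIT_CORE`, `RLIMIT_NPROC`, and `RLIMIT_AS`.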
Monitoring Memory
- Use free to see how memory is used.
  - The system will use most free memory for caching.
  - The system will swap out inactive processes.
  - Don't worry until free < 5% of total memory.
- Use vmstat to detect paging activity.
  - A page-out (so) rate consistently greater than 0 is a warning sign.
  - A high page-in (si) rate alone is less conclusive, as the system also uses the paging facility to load programs into memory.
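On Linux, the 5% rule can be checked against the contents of /proc/meminfo. This is a sketch under stated assumptions: the helper names are hypothetical, the sample text is made up for illustration, and the check uses the raw MemFree figure as the slide does, without counting cache as available.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

def free_memory_is_low(meminfo, threshold=0.05):
    """True if free memory is below the given fraction of total memory."""
    return meminfo["MemFree"] / meminfo["MemTotal"] < threshold

# Hypothetical sample, for illustration only.
sample = """MemTotal:        8000000 kB
MemFree:          200000 kB
Cached:          3000000 kB
"""
info = parse_meminfo(sample)
print("low on free memory" if free_memory_is_low(info) else "ok")
```

On a live Linux host the text would come from `open("/proc/meminfo").read()`.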
Managing Memory
1. Improving paging capacity.
   - Add new swapfiles with swapon.
   - Add new swap partitions.
2. Improving paging performance.
   - Use swap partitions instead of swap files.
   - Distribute swap resources across disks.
3. Migrate memory hogs to another host.
4. Add more memory.
Monitoring Disk I/O
- Use iostat to get per-disk statistics.
  - Transactions per second (tps).
  - Blocks read/written per second.
- Managing disk performance problems.
  - Distribute heavily used data across disks/controllers.
  - Get more or faster disks.
  - Use RAID or LVM striping.
iostat
> iostat 2
Linux (zim) 03/26/2007
avg-cpu: %user %nice %system %iowait %steal %idle
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hde
hdh
hdc
(The same report layout repeats at each 2-second interval.)
Managing Disk Capacity
- Detecting disk resource usage.
  - List all partition usage with df -h
  - Identify high-usage directories with du
    - Summary data: du -s
    - Highest-usage directories: du -k / | sort -rn
- Use find to detect disk hogs.
  - Use find -size to search for big files.
  - Use -atime +X to identify files that haven't been used in X days.
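The df and find checks have rough standard-library equivalents in Python. The sketch below is illustrative only: `percent_used` and `find_big_files` are hypothetical helpers, and the size test (at least `min_bytes`) only loosely mirrors the rounding rules of `find -size`.

```python
import os
import shutil

def percent_used(path="/"):
    """Percentage of the filesystem containing path that is in use, like df."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def find_big_files(root, min_bytes):
    """Return (size, path) pairs for files of at least min_bytes, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                size = os.lstat(full).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= min_bytes:
                hits.append((size, full))
    return sorted(hits, reverse=True)

print(f"{percent_used('/'):.1f}% of / in use")
```

Calling `find_big_files("/var", 100 * 1024 * 1024)` would list files of 100 MB or more under /var; comparing `os.lstat(path).st_atime` with the current time gives the `-atime` style check.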
Managing Disk Shortages
1. Add more disks.
2. Move files to remote fileservers.
3. Eliminate unnecessary files.
4. Compress large, infrequently used files.
5. Impose disk quotas on users.
   - Soft limit: can be violated temporarily.
   - Hard limit: cannot be violated.
Monitoring Network Connections
- List open network sockets, including listening ports:
  lsof -i
- List firewall rules (which ports are accessible):
  iptables -L
- List network connections and listening ports:
  netstat -anp
IPTraf
Managing Network Capacity
1. Move applications onto separate servers.
2. Add more NICs and bond them.
3. Upgrade from 1 Gbps to 10 Gbps Ethernet if supported by the server hardware.
Key Points
- Performance testing terms
  - Load testing and stress testing
  - Latency and throughput
  - Baselines and benchmarks
- Performance testing activities
  1. Identify test environment
  2. Identify performance criteria
  3. Plan and design tests
  4. Configure test environment
  5. Implement test design
  6. Execute tests
  7. Analyze results, report, and retest