Glen Pitt-Pladdy :: Blog

Linux performance metric myths (Load Average, % IO Wait)

For many years I've regularly heard of people shutting down a service and waiting until the Load Average drops to avoid a server crashing, and similar practices. The trouble is that metrics like Load Average are intended to give a "finger in the air" estimate of how busy a system is, rather than indicate the specific risks a system is under (eg. will it crash soon?).

It's true that a server under sustained high Load Average will sometimes crash, and it's also completely normal for some servers to run at a Load Average in the region of 100. The important thing is understanding WHY in each case in order to make sound judgements.

Load Average

Probably the most misused and misunderstood metric on Linux - everyone thinks they know what it means, yet hardly anyone actually does.

First up, different Unix systems calculate this figure in subtly (but importantly) different ways, which is probably where a lot of the confusion comes from.

Red Hat defines Load Average as a "number corresponding to the average number of runnable processes on the system". Note carefully that it's runnable not running processes. It also says nothing of why the processes are or are not running.

The specific thing to remember is that on Linux, Load Average is the average number of processes that are either runnable or in uninterruptible sleep (typically blocked waiting on IO). That's really important because it means that you can have 0% CPU usage and still have a high Load Average.
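
A quick way to see the raw numbers is /proc/loadavg - the first three fields are the 1, 5 and 15 minute averages, and the fourth is the number of currently runnable scheduling entities over the total. Counting processes in the running (R) and uninterruptible sleep (D) states shows what feeds the figure (standard /proc and procps, nothing specific to my setup):

# cat /proc/loadavg
# ps -eo stat= | grep -c '^[RD]'

Bear in mind the second command is an instantaneous count, while Load Average is an exponentially damped average of it over time.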

Experiment: High Load, Low CPU

And to demonstrate, let's cook that exact scenario up. I have a dual CPU CentOS 7 VM on my Home Lab. For this we need an NFS server and client. Out of the box I already have rpcbind and nfs-utils installed. First up we need an exported filesystem - edit /etc/exports and add a line something like:

/tmp    localhost(rw,sync)

Start up services as needed:

# systemctl start rpcbind
# systemctl start nfs-server
# systemctl start nfs-utils
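
If /etc/exports is edited after the NFS server is already running, or you just want to check what is actually exported, exportfs (part of nfs-utils, nothing unusual assumed here) will apply and list the exports:

# exportfs -ra
# exportfs -v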

Then mount this filesystem somewhere:

# mount -t nfs4 localhost:/tmp /somewhere/to/mount/

Now we can cause IO to this mountpoint to block simply by shutting down the NFS server:

# systemctl stop nfs-server

Then we need to create some IO. In multiple terminals execute a command like:

# touch /somewhere/to/mount/testN

Where N is a different number.
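
If you'd rather not juggle terminals, a single shell loop that backgrounds each touch does the same job (using the same placeholder mountpoint as above):

# for N in 1 2 3 4 5; do touch /somewhere/to/mount/test$N & done

Each touch immediately blocks in uninterruptible sleep, but because they are backgrounded the loop itself returns.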

In another terminal run top and you will quickly see the Load Average rise while CPU usage stays minimal - in fact largely just the CPU needed to run top itself will be used:

[Screenshot: NFS High Load Experiment with Low CPU Usage]

So here we have CPUs that are almost completely idle, with a Load Average of about 5 (due to the 5 touch processes blocked with the NFS server shut down).
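
To confirm exactly what is being counted, list the processes currently in uninterruptible sleep (D state) - standard procps is all that's needed:

# ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

The blocked touch processes should show up here, stuck waiting on the dead NFS mount.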

In this case plenty of resources are available for doing loads more work (so long as it isn't work involving access to the NFS mount), even though many would take a Load Average of 5 on a dual CPU machine to mean that the server is 2.5x overloaded.

So why do servers with high Load Average crash?

Well... actually, they don't always - just sometimes. Typically the problem I've seen fuelling this belief is with preforking servers such as Apache on default settings, often running a memory-hungry PHP application under mod_php, or a similar scenario.

What happens is that each process is consuming a significant amount of memory (say about 1GB) and as the number of concurrent connections grows eventually physical memory is exhausted and swapping starts to occur. Because processes are then causing IO (swapping) in order to execute, and the slow disk IO causes processes to wait (block) while memory is swapped, the Load Average figure increases sharply.

The dramatically slower performance caused by the swapping also means that more of these memory-hungry processes get forked to handle the incoming requests which are queuing up, and the problem is compounded. Eventually the OOM killer springs to life and starts trying to clean up, sometimes even killing important processes, or memory exhaustion reaches the point where the server is no longer responsive (perceived as crashed).

This is however NOT a Load problem. High Load Average is simply a symptom of swapping due to memory exhaustion - not a cause.

Why does shutting down the service until the Load Average subsides then appear to help? This again is not direct cause and effect. Shutting down the service frees up the memory (at the expense of the site going down hard). Other processes that had been swapped out then get swapped back into physical memory as they need to run. While significant swapping-in is occurring the Load Average will remain high as processes are still being blocked, however this is often only a matter of seconds. The misunderstanding is that it is necessary to wait until the Load Average goes right back down: that is just waiting for the maths to average the numbers lower and is not indicative of any remaining load - the moment the initial swapping has subsided there is no reason to wait.

The real solution here is actually:

  1. Set sensible limits so that resource exhaustion remains manageable (ie. limit the number of processes and queue requests to avoid swapping occurring - see the sketch below). This simply keeps the situation manageable when capacity is insufficient to process requests immediately.
  2. Get enough resources (or make the application more efficient) so that you have the capacity to handle the requests coming in.
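
For example, with Apache's prefork MPM (Apache 2.4 directive names) the limit in point 1 might look something like the following - the numbers are purely illustrative and need sizing against real per-process memory usage and available RAM rather than copying verbatim:

<IfModule mpm_prefork_module>
    StartServers            5
    MinSpareServers         5
    MaxSpareServers        10
    # cap concurrency so worst-case memory use stays within physical RAM
    MaxRequestWorkers      40
    # recycle children periodically to limit per-process memory growth
    MaxConnectionsPerChild 500
</IfModule>

With roughly 1GB per process as in the scenario above, a cap like this trades queued (slower) requests for never swapping, which is almost always the better failure mode.
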
Comparing code

I've also seen benchmarking done based on Load Average.

An example is benchmarking Nginx against Apache. Nginx shows a much lower Load Average, and while Nginx is indeed an extremely performant server, just comparing Load Average is not a valid comparison. The problem is simply that different code consumes resources differently and Load Average does not take that into account. In this example, ordinary prefork Apache uses blocking IO which gets counted in Load Average, whereas Nginx is very carefully designed around non-blocking IO, which means waiting for IO is not reflected in Load Average the way it is with Apache.

The valid way of comparing is to compare actual resource usage. In the case of Apache vs Nginx, memory and CPU usage would be good indicators, and request latency (time to first byte) would probably also be useful, but Load Average definitely can't provide a like-for-like comparison.
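
As a rough sketch of the kind of measurements I mean (the process name and URL are placeholders for whatever is actually being tested), resident memory and time to first byte can be pulled with standard tools:

# ps -C httpd -o rss= | awk '{sum+=$1} END {print sum/1024 " MB resident"}'
# curl -o /dev/null -s -w 'TTFB: %{time_starttransfer}s\n' http://server.example/

Note that summing RSS double-counts pages shared between forked children, so treat it as an upper bound.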

% IO Wait

This is another one of those tricky metrics which can indicate a problem, but is also widely misunderstood and misused. The catch here is that this measures CPU idle time while waiting for IO. If the CPU is 100% busy with other workload then even if a process is being held up by IO, it's not going to show.
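
The raw counter behind this figure lives in /proc/stat - on the cpu lines the fifth numeric field is cumulative iowait time in jiffies, which tools like top and iostat turn into a percentage (standard Linux /proc, nothing assumed beyond that):

# awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat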

Experiment: Waiting for IO with low % IO Wait

Using the same CentOS 7 VM on my Home Lab, I'm going to copy (read) data from a disk - this uses minimal CPU, but will be largely limited by IO so should have high % IO Wait. To measure this I'm going to use iostat which will show both the CPU stats and IO stats:

# iostat -x sda 5

For simplicity I'm using a USB Flash drive to avoid caching (can be disconnected/reconnected easily) with a straight dd of data:

# dd if=/dev/sda of=/dev/null

And we see minimal CPU usage, but high (~50%) IO Wait, with the disk 100% utilised with high await figures:

[Screenshot: Low CPU shows High % IO Wait during High IO]

Now, let's do the same (after rebooting to clear the caches), but burn CPU at the same time - in fact burning both CPUs. To do this we have two terminals running a bit of Perl looping flat out to burn CPU, with nice so that the dd takes priority:

#!/usr/bin/perl
while ( 1 ) {}
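
Saved as something like burncpu.pl (the name is just an example), one instance per CPU can be started at low priority so the dd keeps priority:

# nice -n 19 perl burncpu.pl &
# nice -n 19 perl burncpu.pl &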

Now with CPUs maxed, we get similar IO, but very different IO Wait figures:

[Screenshot: High CPU shows Low % IO Wait during High IO]

So here we have it: in both cases 100% utilisation of the storage, with very poor latency (about 1 second) on our test drive, but % IO Wait is shown as absolutely zero when the CPU is loaded. In both cases the dd process is completely restricted by the throughput of our (slow) USB drive.

In this case our first example suggests an alarming IO bottleneck, the second suggests absolutely no IO bottleneck, yet we know that exactly the same IO bottleneck exists in both cases.

Comparisons

While this is an extreme example for illustrative purposes, the fundamental thing is that comparing % IO Wait needs to be done with extreme caution. It's another figure that can give very different results depending on system design and usage patterns so can't be used for direct comparison.

In cases where there is also significant CPU utilisation while IO is occurring, % IO Wait will not be a reliable metric. It's much more sensible to look directly at actual IO figures with iostat or similar tools. Most of these correspond directly to IO performance, however they are not without their own problems: some (eg. svctm and %util) are probably not valid where the storage can handle concurrent IO (as opposed to an older single drive where all IO would be serial).
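
If you need to see which processes are actually generating the IO, rather than inferring it from how the CPU spends its idle time, pidstat from the same sysstat package that provides iostat is one option (it needs per-task IO accounting in the kernel, which stock CentOS 7 kernels should have):

# pidstat -d 5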

Comments:

2016-05-15 11:49 :: Saj

Very Good explanation...really helpful