Menu
Index

Contact
Atom Feed
Comments Atom Feed

Similar Articles

2011-05-01 11:25
Log checker (mini_logcheck) for OpenWrt
2011-05-01 20:00
syslog-ng on OpenWrt
2011-05-01 15:40
Filesystem checker (mini_fscheck) for OpenWrt
2010-11-17 18:40
OpenWrt on RB450G for AAISP FTTC
2010-04-24 10:31
OpenWrt Take 2 - native IPv6 on DG834 v3 (using AAISP)

Recent Articles

2019-07-28 16:35
git http with Nginx via Flask wsgi application (git4nginx)
2018-05-15 16:48
Raspberry Pi Camera, IR Lights and more
2017-04-23 14:21
Raspberry Pi SD Card Test
2017-04-07 10:54
DNS Firewall (blackhole malicious, like Pi-hole) with bind9
2017-03-28 13:07
Kubernetes to learn Part 4

Glen Pitt-Pladdy :: Blog

Hunting latency on OpenWrt

With the combination of a OpenWrt router/firewall and my current Cacti ping (latency) graphs it was becoming obvious that the latency of the OpenWrt device had some unusually large peak latency for a usually idle router.

Logging

As with most thing, I tend to start by seeing if there are any clues in the logs. While there was nothing obvious, at the exact times that there was high latency I found bursts of log activity. Logging is done to a RAM filesystem so while that is busy, everything has to wait. I have fairly comprehensive logging on the device and didn't want to turn that down so the other option is to offload it to another server.

I am running rsyslog on the server so it needed a little extra config. Ideally I also want to keep the OpenWrt device logs in their own file. To keep things neat I created a new config for the router on the server that is to keep the logs - say /etc/rsyslog.d/local-openwrt.conf with the following config:

# log via udp
$ModLoad imudp
$UDPServerRun 514
# direct the router logs to it's own log file
if $hostname startswith 'OpenWrt' then /var/log/openwrt-messages
# ignore these lines in further logs
& ~

This essentially enables log input via UDP on port 514 (standard syslog port) and puts all the logs from the hostname "OpenWrt" in the file /var/log/openwrt-messages, and then excludes those lines from further logging else they would also end up in the other logs on the server. This config is almost exactly as discussed on the rsyslog examples. To make rsyslog listen on the port requires a full restart. Also remember to open any further packet filtering (eg. on the server) to allow connection from the OpenWrt device.

Next we need to configure the OpenWrt device to log remotely. I am running syslogd-ng on it so that requires a few changes to /etc/syslog-ng.conf to enable remote logging. I added a new destination:

destination remote {
        udp("address.of.logging.server");
};

And then change the destination line in the log section to use the remote log:

destination(remote);

Restart syslog-ng on the OpenWrt device and all things being correct you should see logs going into the logfile on the server instead of the device.

The impact on latency is quite obvious. I made the change shortly after 18:30 and the change in the latency graph clearly reflects this:

OpenWrt latency after switching to remote logging

The improvement is obvious and especially considering that from about 19:00 the router was shifting a lot of traffic (backups) for a couple hours without significant impact, but there is still more that can be gained, so time to dig further into the latency....

Log buffering

It turns out that the logs are still showing bursts of logging (eg. port scanning activity) exactly matching the times of the poor latency. One thing I have experimented with is greatly increasing the log FIFO size, but this has made no significant difference.

No logging

The remaining question is how much impact logging is having on the latency. As an experiment I tried disabling logging all together.

There is definitely a slight change in latency, from about 18:00 when the change was made with the high peaks gone, but certainly not the medium (about 4ms) peaks. Those are clearly caused by something else.

It does seem that even with a large (64k) FIFO and remote logging, firewall logging is bad when it comes to latency. The best approach I can think of to solve this is to be more selective about logging and removing a lot of of regular stuff like duplicate ACKs (symptom of packet loss) without logging them. Possibly also FIN packets (connection closes) which are also prone to duplicates and worse those often come in bursts when connections to a site are closed when browsing onto another site.

The thing that remains is to find the cause of the remaining mid-height (around 4ms) spikes.

Digging deeper

Examining the ping logs manually, the latency is not periodic (ie. it isn't some cron job or Cacti polling). There is nothing in the logs that seems to match.

Without logging, it's difficult to match anything firewall related, and there is no obvious peaks in traffic - in fact many of the peaks are happening at quiet times.

Stopping dnsmasq and ntpclient has made no noticable difference. There is very little other than kernel running on the router: watchdog, httpd, dropbear (ssh), hotplug2, cron, and syslog-ng.

None of those things are likely to be doing much at the time of the latency - all of them are going to be almost always idle or operating periodically which would show up.

What's left?

I suspect what remains is just normal kernel related latency. As this is a single CPU (and a small one at that) any atomic operation has to complete before the CPU is available to do other stuff and threads don't actually run concurrently with the single CPU - they all have to wait their turn. That probably also means that many routine things like servicing hardware interrupts (eg. timers), memory management, syncing filesystems, and more could cause some waits for the network stack.

I've come across other similar platforms from big router brands delivering far higher average latency with their OS' so really it's probably a bit unfair given that it's averaging only about double what big quad-core desktop system manages in the way of latency.

Comments:




Note: Identity details will be stored in a cookie. Posts may not appear immediately