Glen Pitt-Pladdy :: Blog
Linux RAID (mdadm) and rebuild tuning
Every now and then it's necessary to replace a disk in an array, and rebuilding a large RAID5/6 array was looking like it was going to take over 10 hours, running at around 50MB/s. With even cheap desktop SATA drives these days you should be able to achieve comfortably over 100MB/s, so improvement was clearly possible.

Many of these techniques will also apply to tuning for linear IO (eg. streaming a large video file), but in my case I've got a pattern of small random IO, so more conservative values are suitable for normal running.

The rebuild speed can be seen with:

$ cat /proc/mdstat

Speed Limits

There are adjustable limits on rebuild speed which can be checked with:

$ cat /sys/block/mdX/md/sync_speed_max

On my system this reports a limit of 200MB/s, which should be enough for an array of SATA drives - chances are few will be able to achieve that sort of speed anyway.

Unnecessary IO

Linux will vary the speed of the rebuild to give priority to regular IO over the rebuild. This lowers the impact of the rebuild on normal running of the system, but it also means that cutting down on other IO will increase the rebuild speed. To identify possible causes of IO I ran:

$ iostat -Nx 10

After a while the volumes that were getting IO could be seen. As expected, /var/ was the main area, and a few unnecessary processes polling files or updating things could be stopped. This did mean that the speed remained a bit above 50MB/s for longer, but it never went over 60MB/s.

Tuning Stripe Cache

During a rebuild all drives have to be read in order to rebuild the array. For this the caching of stripes is a critical factor: if lots of small reads have to be made rather than large reads then performance will suffer. To avoid this a large Stripe Cache helps enormously.
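A larger Stripe Cache is not free, so it is worth estimating what it costs in RAM before raising it. The sketch below assumes the commonly quoted relationship for md RAID5/6 (one page per member drive per cache entry, so memory = page size x number of drives x stripe_cache_size) and a 4096-byte page; `stripe_cache_mib` is a hypothetical helper name, and the 7-drive count is taken from the array described in this post.

```shell
#!/bin/sh
# Rough estimate of stripe cache memory use in MiB.
# Assumption: memory = page_size * nr_disks * stripe_cache_size.
# $1: stripe_cache_size (entries)
# $2: number of member drives in the array
# $3: page size in bytes (optional, assumed 4096; check with `getconf PAGESIZE`)
stripe_cache_mib() {
    sc_entries=$1
    sc_disks=$2
    sc_page=${3:-4096}
    echo $(( sc_entries * sc_page * sc_disks / 1024 / 1024 ))
}

stripe_cache_mib 256 7     # default of 256 entries on 7 drives: prints 7
stripe_cache_mib 16384 7   # 16384 entries on 7 drives: prints 448
```

On that basis, going from the default to 16384 on a 7-drive array takes the cache from about 7MiB to about 448MiB, which is worth bearing in mind on a machine that is short of memory.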
To understand if you are maxing out the Stripe Cache, look at the existing size:

$ cat /sys/block/mdX/md/stripe_cache_size

That's tiny, so it's likely you will use all of it during a rebuild, as can be seen with:

$ cat /sys/block/mdX/md/stripe_cache_active

So increase the size (double it) and repeat until you see the cache is no longer being maxed out and the speed increases. I ended up seeing no significant benefit above 8192 so stopped at:

# echo 16384 >/sys/block/mdX/md/stripe_cache_size

Roughly:
Increasing this further doesn't seem to make any difference in my case, which is understandable when we're not fully using 16384 most of the time. Tuning this is something you might like to do during normal use anyway, since many usage patterns will benefit from a value higher than the default 256.

NCQ

Native Command Queuing is something that people have reported causing problems. To test this I tried disabling it on all the devices in the array:

# echo 1 >/sys/block/sdX/device/queue_depth

I'm not seeing any obvious difference due to this, but there may be 1-2% which gets lost in the noise.

Read Ahead

This is simply anticipating that more data will be read and reading it ahead of time, which should be very relevant to rebuilding an array. Unfortunately, in my tests this goes counter to what would be expected: the default of 256 on each drive (not the md device) gives the fastest rebuild, and raising it in my case seems to slow things down.

Other Stuff

The Linux RAID performance page has some other things around tuning which may be useful, but in my testing the above is what has worked for me.

Another thought is that at this speed one core seems to be maxed out, so maybe that's the problem here and this is really the limit of the CPU? I also notice that one drive is being read about 15% faster than the rest. I'm guessing this is resulting in some kind of bottleneck but I don't know why this would be.

That said, the overall data transfer rate is about 840MB/s, which is impressive, and will result in an approx 5 hour rebuild across 7x 3TB drives. Not bad!
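When comparing settings before and after a change, it can help to dump all the values touched in this post in one go. The following is only a sketch: `md_tunables` is a hypothetical helper, the array and drive names in the usage line are placeholders, and it assumes read ahead is read from `queue/read_ahead_kb`, which is in KiB (`blockdev --getra` reports 512-byte sectors instead, so 256 sectors would show up as 128 here).

```shell
#!/bin/sh
# Print the tunables discussed in this post for one array and its member drives.
# $1: md device name (e.g. md0)
# $2: space-separated list of member drives (e.g. "sda sdb sdc")
# $3: sysfs root (optional, defaults to /sys; handy for dry runs against a copy)
md_tunables() {
    mt_md=$1
    mt_members=$2
    mt_sys=${3:-/sys}

    # Array-level settings: rebuild speed cap and stripe cache state
    for mt_f in sync_speed_max stripe_cache_size stripe_cache_active; do
        printf '%s %s: %s\n' "$mt_md" "$mt_f" "$(cat "$mt_sys/block/$mt_md/md/$mt_f")"
    done

    # Per-drive settings: NCQ queue depth and read ahead (KiB)
    for mt_d in $mt_members; do
        printf '%s queue_depth: %s\n' "$mt_d" "$(cat "$mt_sys/block/$mt_d/device/queue_depth")"
        printf '%s read_ahead_kb: %s\n' "$mt_d" "$(cat "$mt_sys/block/$mt_d/queue/read_ahead_kb")"
    done
}

# Example (placeholder device names):
#   md_tunables md0 "sda sdb sdc"
```

Reading these files needs no special privileges, so it is safe to run during a rebuild; only the `echo ... >` changes shown earlier need root.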
Disclaimer: This is a load of random thoughts, ideas and other nonsense and is not intended to be taken seriously. I have no idea what I am doing with most of this so if you are stupid and naive enough to believe any of it, it is your own fault and you can live with the consequences. More importantly this blog may contain substances such as humor which have not yet been approved for human (or machine) consumption and could seriously damage your health if taken seriously. If you still feel the need to litigate (or whatever other legal nonsense people have dreamed up now), then please address all complaints and other stupidity to yourself as you clearly "don't get it".
Copyright Glen Pitt-Pladdy 2008-2023