Glen Pitt-Pladdy :: Blog
Rapid drive failure (as seen by Cacti)
I've got extensive monitoring of my systems, and with the Cacti templates I have, that includes SMART and iostat. Last night I saw an unusually rapid deterioration in drive health. The drive is part of a RAID5 volume, but surprisingly it isn't the one I expected to fail first: I had been watching another drive gradually deteriorate over the past year. Previously I've seen parameters deteriorate over months. What was unexpected here was how rapidly the drive deteriorated - in just over an hour it went from 100% health to warning of impending failure.
The SMART point of view
This graph using my Cacti SMART templates says it all:
While it appears to have deteriorated rapidly, it hasn't been failed from the array - I expect that to happen as soon as it runs out of space to reallocate sectors to.
The iostat point of view
What is curious though is comparing this drive's iostat graphs with those of the other drives in the array. Theoretically IO should be evenly spread across all drives in the array, but the unhealthy drive is distinctly different in many areas.
Unhealthy drive %Utilisation:
Healthy drive %Utilisation:
During the period of deterioration utilisation distinctly increased, but this is really just an indication of how much of the drive's bandwidth is being used - if the drive is busy reallocating sectors then it's no surprise that it has less bandwidth left for normal IO.
Unhealthy drive await:
Healthy drive await:
The same thing is happening here - IO is taking distinctly longer during the deterioration when sectors are being reallocated.
Unhealthy drive svctm:
Healthy drive svctm:
This is very obvious - when we eliminate the amount of IO from the equation (yes, I know this can't really be trusted for that when concurrent IO is happening) we see the unhealthy drive stand apart most clearly.
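For reference, here is a rough sketch of how these three figures are conventionally derived from the cumulative per-device counters in Linux's /proc/diskstats. It is not the code behind the Cacti templates or iostat itself. The names `DiskSample`, `parse_diskstats_line` and `iostat_metrics` are made up for illustration, and the field positions assume the standard /proc/diskstats layout.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """Cumulative counters for one drive (hypothetical helper type)."""
    ios: int      # reads completed + writes completed
    wait_ms: int  # ms spent on reads + writes, including time queued
    busy_ms: int  # ms during which the device had any IO in flight


def parse_diskstats_line(line):
    """Pull the counters we need out of one /proc/diskstats line."""
    f = line.split()
    # f[2] device name, f[3] reads completed, f[6] ms reading,
    # f[7] writes completed, f[10] ms writing, f[12] ms doing IO
    sample = DiskSample(
        ios=int(f[3]) + int(f[7]),
        wait_ms=int(f[6]) + int(f[10]),
        busy_ms=int(f[12]),
    )
    return f[2], sample


def iostat_metrics(prev, curr, interval_ms):
    """Derive %util, await and svctm from two samples interval_ms apart."""
    d_ios = curr.ios - prev.ios
    d_wait = curr.wait_ms - prev.wait_ms
    d_busy = curr.busy_ms - prev.busy_ms
    return {
        # share of the interval during which the drive was busy
        "util_pct": 100.0 * d_busy / interval_ms,
        # average time per IO, queueing included
        "await_ms": d_wait / d_ios if d_ios else 0.0,
        # busy time divided by IO count; only approximate with concurrent IO
        "svctm_ms": d_busy / d_ios if d_ios else 0.0,
    }


if __name__ == "__main__":
    # Made-up counter values, purely to show the arithmetic.
    before = DiskSample(ios=1000, wait_ms=5000, busy_ms=2000)
    after = DiskSample(ios=1100, wait_ms=9000, busy_ms=2900)
    print(iostat_metrics(before, after, interval_ms=10000))
```

Because svctm divides busy time by the number of IOs completed, a drive that spends extra time on each request (for example while reallocating sectors) stands out even when every drive in the array is handed the same workload.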
Disclaimer: This is a load of random thoughts, ideas and other nonsense and is not intended to be taken seriously. I have no idea what I am doing with most of this so if you are stupid and naive enough to believe any of it, it is your own fault and you can live with the consequences. More importantly this blog may contain substances such as humor which have not yet been approved for human (or machine) consumption and could seriously damage your health if taken seriously. If you still feel the need to litigate (or whatever other legal nonsense people have dreamed up now), then please address all complaints and other stupidity to yourself as you clearly "don't get it".
Copyright Glen Pitt-Pladdy 2008-2023