Glen Pitt-Pladdy :: Blog
SMART stats on Cacti (via SNMP)
Following on from the basics of SNMP I covered previously, this article adds the first set of SNMP extension scripts, config, and Cacti templates to monitor hard drives.

Being SMART

Self-Monitoring, Analysis, and Reporting Technology (SMART) is built into most hard drives these days. It provides a number of built-in tests to evaluate the health of a drive and hopefully predict many failures. Linux has a suite of tools called "smartmontools" which provides a comprehensive set of utilities and a monitoring daemon for checking drives. Configuration of regular testing and monitoring (smartd) is beyond the scope of this article and there are plenty of docs around for that already, but it is often useful to graph key parameters to spot anomalies which would otherwise go unnoticed.

After installing smartmontools, you can check the basic parameters that a drive reports with the command:

    smartctl -a DEVICE
Where DEVICE is the device for the drive (not a partition). Typically this would be something like /dev/sda (first drive), /dev/sdb (second drive) etc., or /dev/hda (first drive), /dev/hdb (second drive), or some combination of both. If a drive does not have SMART enabled, the output of the above command will say so. To enable SMART on the drive:

    smartctl -s on DEVICE
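With several drives it can be handy to check the SMART state of each one in a single pass. The following is only a minimal sketch: it assumes that smartctl -i prints a line beginning "SMART support is:" followed by "Enabled" or "Disabled", and the function name smart_state is made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumes "smartctl -i" prints a line such as
# "SMART support is: Enabled" or "SMART support is: Disabled".

# Read smartctl -i output on stdin and print enabled, disabled or unknown.
# (smart_state is a hypothetical helper name, not part of smartmontools.)
smart_state() {
    awk '
        /^SMART support is:[ \t]*Enabled/  { state = "enabled" }
        /^SMART support is:[ \t]*Disabled/ { state = "disabled" }
        END { print (state ? state : "unknown") }
    '
}

# Report the state of every device named on the command line (needs root).
for dev in "$@"; do
    printf '%s: %s\n' "$dev" "$(/usr/sbin/smartctl -i "$dev" | smart_state)"
done
```

Saved under a name of your choosing, it would be run as root with the drives as arguments, for example /dev/sda /dev/sdb. Any drive reported as disabled can then be switched on with the smartctl -s on command above.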
Note that USB drives do not currently allow access to SMART data, even though the physical drives inside the enclosures are SMART capable. I have no idea why this is the case, and USB drives are the ones I would really like to monitor as they get bashed about more and have poor cooling compared to fixed drives in a system.

Getting SMART over SNMP

As discussed previously, SMART data requires root privilege to access, and snmpd runs as a low-privilege user. What I do is have a cron job that reads this data and stores it in files for snmpd to access via extension scripts. If you are using the same config I described previously, then simply add the lines below to your /etc/snmp/local-snmp-cronjob file to make it look something like this (it may have other content for other tasks):

    #!/bin/sh
    # update smart parameters
    for devfull in /dev/sd?; do
        dev=`/bin/echo $devfull | /bin/sed 's/^.*\(sd.\)$/\1/'`
        /usr/sbin/smartctl -n idle -a $devfull >/var/local/snmp/smart-$dev.TMP
        /bin/mv /var/local/snmp/smart-$dev.TMP /var/local/snmp/smart-$dev
    done

This code simply runs through devices matching /dev/sd? (i.e. /dev/sda, /dev/sdb etc.) and dumps their SMART data to a file in /var/local/snmp as described previously. From here, extension scripts for snmpd can pick it up without requiring privilege.

SMART parameters are numbered, and it made sense to me to exploit the numbering in a universal script instead of having to treat each parameter on its own.

Download: Perl script to extract SMART parameters for SNMP

I place this script in /etc/snmp (make it executable first: chmod +x smart-generic).

This script takes one argument, the SMART parameter number, and outputs the difference (remaining life) between the current value and the threshold for that parameter. It is worth noting that different manufacturers (and even different models and revisions of drives) create these values differently, so the value is of little interest on its own, but unusual fluctuations or downward trends are worth taking note of. For temperatures it is normally necessary to take the raw data, which can be done by prefixing the parameter ID with an 'R'.

In /etc/snmp/snmpd.conf add the following lines (or others if you want to monitor them):

    extend smartreaderr /etc/snmp/smart-generic 1
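    extend smartrealloc /etc/snmp/smart-generic 5
    extend smartseekerr /etc/snmp/smart-generic 7
    extend smartairflow /etc/snmp/smart-generic R190
    extend smarttemp /etc/snmp/smart-generic R194
    extend smarteccrec /etc/snmp/smart-generic 195

These are respectively:

    1: Raw Read Error Rate
    5: Reallocated Sector Count
    7: Seek Error Rate
    190: Airflow Temperature (raw value)
    194: Drive Temperature (raw value)
    195: Hardware ECC Recovered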
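As a rough illustration of what a script like smart-generic does with one of these parameter IDs, here is a minimal shell/awk sketch. It is not the downloadable Perl script, only the same idea under some assumptions: the files written by the cron job contain the standard smartctl -a attribute table, with VALUE in the 4th column, THRESH in the 6th and RAW_VALUE in the 10th. The SMART_DIR variable, the smart_param function name and the "U" placeholder for a missing attribute are all made up for this example.

```shell
#!/bin/sh
# Sketch only: NOT the author's Perl script, just the same idea in shell/awk.
# SMART_DIR is a hypothetical override so the function can be tried on test data.
SMART_DIR=${SMART_DIR:-/var/local/snmp}

# Print one line per drive for the given SMART parameter ID.
# A plain ID (e.g. 5) gives VALUE - THRESH; an R prefix (e.g. R194) gives the raw value.
smart_param() {
    param=$1
    case $param in
        R*) id=${param#R}; raw=1 ;;
        *)  id=$param;     raw=0 ;;
    esac
    for f in "$SMART_DIR"/smart-sd?; do
        [ -r "$f" ] || continue
        awk -v id="$id" -v raw="$raw" '
            # Attribute rows have at least 10 fields and start with the ID
            $1 == id && NF >= 10 {
                if (raw) print $10 + 0; else print $4 - $6
                found = 1
                exit
            }
            # "U" is an assumed placeholder for "attribute not present"
            END { if (!found) print "U" }
        ' "$f"
    done
}

# When run as a script, the first argument is the parameter ID.
if [ "$#" -gt 0 ]; then
    smart_param "$1"
fi
```

Because each drive produces its own output line, the drives appear as successive line numbers when the script is hooked up through an extend line in snmpd.conf.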
There are many other parameters which you could also monitor and, as can be seen, they are easily added by simply referencing the parameter ID.

Note that the config presented here only looks at /dev/sd? devices. If your system has /dev/hd? devices then you will need to modify the scripts accordingly.

Once you have added all this, you can test smart-generic by running it from the command line with appropriate parameters, and via SNMP by appending the appropriate SNMP OID to the "snmpwalk" commands shown previously.

Cacti Templates

I have generated some basic Cacti Templates for these SMART parameters with 2 drives. They are easily extended for more drives: duplicate the Data Source Template, change it as appropriate, increment the number on the end of the OID, and add another trace to the Graph Template to reference that Data Source.

Download: Cacti Templates for SMART over SNMP

Simply import this template, and add the graphs you want to the appropriate device graphs in Cacti. It should just work if your SNMP is working correctly for that device (check that other SNMP parameters are working for that device).

Graph Screen Shots
Update: One person has pointed out that there have been problems importing the templates on some versions of Cacti. The template was generated with version 0.8.7b (from Debian Lenny). The problem manifests itself as a "Cacti version does not exist" error, and appears to be cured by adding in this version, although in my version the file is actually global_arrays.php. The relevant array from my global_arrays.php / config_array.php:

    $hash_version_codes = array(
Disclaimer: This is a load of random thoughts, ideas and other nonsense and is not intended to be taken seriously. I have no idea what I am doing with most of this so if you are stupid and naive enough to believe any of it, it is your own fault and you can live with the consequences. More importantly this blog may contain substances such as humor which have not yet been approved for human (or machine) consumption and could seriously damage your health if taken seriously. If you still feel the need to litigate (or whatever other legal nonsense people have dreamed up now), then please address all complaints and other stupidity to yourself as you clearly "don't get it".
Copyright Glen Pitt-Pladdy
Comments:
10/03/2010 15:55 :: Scott
Hi Glen,
great work you've done. I had never got my head around populating the SNMP MIB with an external process until now. It always looks easy when it's in front of you!
I modified your Perl script and created an SNMP Query XML which will handle any number of drives. It uses only two graph templates, one for errors and the other for temperatures. Basically you get 2 graphs per drive. I have a server with 10 drives and did not want to do all the work to add 8 more data sources! And I thought it would be better to make it handle any number of drives. If you're interested I can send you my XML templates.
Regards
Scott
20/03/2010 09:28 :: Glen Pitt-Pladdy
Hi Scott
Sorry for the delay getting back to you about your post on my blog. It did take a while to pull all the fragments of info together to get SNMP working neatly.
Your approach sounds interesting and I'm sure would help lots of people who have servers with more than 2 disks to monitor.
I am happy for you to post a URL on my blog linking to your code & templates, else I am happy to host them off my server.
Either way, please make sure you give yourself credit for the extra work you have put into the template and scripts - I would suggest a comment in the code about the enhancements with your name and a URL.
Thanks
Glen
09/06/2010 06:34 :: Pablolibo
Could Scott send me his templates? Thanks a lot.
Pitt-Pladdy, thanks for the template :D