Glen Pitt-Pladdy :: Blog

SMART stats on Cacti (via SNMP)

Update: This was originally one of my first articles on Cacti stats via SNMP, and I have since built an ever-growing collection of templates and extension scripts based on the same approach. The original templates were fixed at 2 disks, which was fine for the machines I was working with: my server here is a basic 2-disk setup, so why would I need more? Since then I've worked with all sorts of different disk arrangements and had to fudge things to make useful templates. This update fixes that by reducing things to two basic templates and switching to indexed SNMP, allowing an arbitrary number of disks to be used.

This follows on from the basics of SNMP I covered previously. This article adds a set of SNMP extension scripts, config, and Cacti templates to monitor hard drives.

Being SMART

Self-Monitoring, Analysis and Reporting Technology (SMART) is built into most hard drives these days. It provides a number of built-in tests to evaluate the health of a drive and, hopefully, predict many failures.

Linux has a suite of tools called "smartmontools" which provides a comprehensive set of utilities and a monitoring daemon (smartd) for checking drives. Configuring regular testing and monitoring with smartd is beyond the scope of this article, and there are plenty of docs around for that already. What is often useful, though, is to graph key parameters to spot anomalies which would otherwise go unnoticed.

After installing smartmontools, you can check the basic parameters that drives have with the command:

smartctl -a DEVICE

Where DEVICE is the device for the drive (not a partition). Typically this would be something like /dev/sda (first drive), /dev/sdb (second drive) etc. or /dev/hda (first drive), /dev/hdb (second drive), or some combination of both.

If a drive does not have SMART enabled, the output of the above command will say so. To enable SMART on the drive:

smartctl -s on DEVICE
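
If you want to check this from a script rather than by eye, something like the following sketch could work. It assumes the ATA-style wording that smartctl -i prints ("SMART support is: Enabled"). The name smart_enabled is a made-up helper, and other drive types may word this line differently.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the smartctl -i output on stdin says
# SMART is enabled. The wording matched here is what ATA drives report.
smart_enabled() {
    grep -q '^SMART support is:[[:space:]]*Enabled'
}

# Typical use (needs root), enabling SMART only where it is off:
#   smartctl -i /dev/sda | smart_enabled || smartctl -s on /dev/sda
```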

Note that, at the time of writing, USB drives generally do not expose SMART data, even though the physical drives inside the enclosures are SMART capable. I have no idea why this is the case, and USB drives are the ones I would really like to monitor as they get bashed about more and have poor cooling compared to fixed drives in a system.

Getting SMART over SNMP

As discussed previously, SMART data requires root privileges to access, and snmpd runs as a low-privilege user. What I do is have a cron job read this data and store it in files for snmpd to access via extension scripts.

If you are using the same config I described previously, then simply add the lines to your /etc/snmp/local-snmp-cronjob file to make it look something like this (may have other content for other tasks):

#!/bin/sh

# update smart parameters
for devfull in /dev/sd?; do
    dev=`/bin/echo $devfull | /bin/sed 's/^.*\(sd.\)$/\1/'`
    /usr/sbin/smartctl -n idle -a $devfull >/var/local/snmp/smart-$dev.TMP
    /bin/mv /var/local/snmp/smart-$dev.TMP /var/local/snmp/smart-$dev
done

This code simply runs through devices matching /dev/sd? (i.e. /dev/sda, /dev/sdb etc.) and dumps their SMART data to a file in /var/local/snmp as described previously. The data is written to a .TMP file first and then renamed, so snmpd should never pick up a half-written file. From here extension scripts for snmpd can pick it up without requiring privilege.
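
For anyone adapting the loop, here is a rough sketch of the same step for a single device, wrapped in a function. The SMARTCTL and OUTDIR variables and the dump_smart name are my own illustrative additions, not part of the cron job above. It uses shell parameter expansion instead of sed to get the short device name.

```shell
#!/bin/sh
# Sketch only. SMARTCTL and OUTDIR are illustrative variables so the paths
# can be overridden; the defaults match the cron job in the article.
SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl}
OUTDIR=${OUTDIR:-/var/local/snmp}

dump_smart() {
    devfull=$1
    dev=${devfull##*/}              # /dev/sda -> sda, no sed needed
    tmp="$OUTDIR/smart-$dev.TMP"
    # -n idle: smartctl skips the check if the drive is in sleep,
    # standby or idle mode, so the dump will not wake it up
    "$SMARTCTL" -n idle -a "$devfull" >"$tmp"
    # rename within one filesystem replaces the old file in a single step
    mv "$tmp" "$OUTDIR/smart-$dev"
}

# Typical use (needs root):
#   for d in /dev/sd?; do dump_smart "$d"; done
```

Note that, like the original loop, this overwrites the previous dump even when smartctl skipped the drive, so a sleeping drive may produce a file with no attribute data in it.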

SMART parameters are numbered and it made sense to me to exploit the numbering in a universal script instead of having to treat each parameter on its own.

Download:  Perl script to extract SMART parameters for SNMP

I place this script in /etc/snmp (make it executable first: chmod +x smart-generic).

This script takes one argument, the SMART parameter number, and outputs the difference (remaining life) between the current value and the threshold for that parameter. It is worth noting that different manufacturers (and even different models and revisions of drives) create these values differently, so the value is of little interest on its own, but unusual fluctuations or downward trends are worth taking note of. For temperatures it is normally necessary to take the raw data, which can be done by prefixing the parameter ID with an 'R'.
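
To make the arithmetic concrete, here is a minimal shell and awk sketch of the same idea for a single dump file. It is not the Perl script above, just an illustration under some assumptions: the attribute rows follow the usual smartctl layout (ID, name, flag, VALUE, WORST, THRESH, type, updated, when failed, RAW_VALUE), and smart_param is a name I have made up. The real script works across all devices for indexed SNMP, which this sketch does not attempt.

```shell
#!/bin/sh
# Sketch only, not the downloadable Perl script.
# smart_param FILE ID    -> normalised VALUE minus THRESH for that attribute
# smart_param FILE RID   -> first field of RAW_VALUE (e.g. R194)
# Prints NA if the file is unreadable or the attribute is not present.
smart_param() {
    file=$1
    id=$2
    [ -r "$file" ] || { echo NA; return; }
    awk -v want="$id" '
        BEGIN { raw = (want ~ /^R/); if (raw) want = substr(want, 2) }
        # attribute rows: ID NAME FLAG VALUE WORST THRESH ... RAW_VALUE
        $1 == want && $3 ~ /^0x/ {
            if (raw) print $10; else print $4 - $6
            found = 1
            exit
        }
        END { if (!found) print "NA" }
    ' "$file"
}

# Typical use against a dump written by the cron job:
#   smart_param /var/local/snmp/smart-sda 5
#   smart_param /var/local/snmp/smart-sda R194
```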

In /etc/snmp/snmpd.conf add the following lines (or others if you want to monitor them):

extend smartdevices /etc/snmp/smart-generic devices
extend smartreaderr /etc/snmp/smart-generic 1
extend smartrealloc /etc/snmp/smart-generic 5
extend smartseekerr /etc/snmp/smart-generic 7
extend smartseekerr /etc/snmp/smart-generic 9
extend smartairflow /etc/snmp/smart-generic 189
extend smartairflow /etc/snmp/smart-generic R190
extend smarttemp /etc/snmp/smart-generic R194
extend smarteccrec /etc/snmp/smart-generic 195

These are respectively:

  • The device list used as the index
  • Raw_Read_Error_Rate
  • Reallocated_Sector_Ct
  • Seek_Error_Rate
  • Power_On_Hours
  • High_Fly_Writes
  • Airflow_Temperature_Cel (RAW)
  • Temperature_Celsius (RAW)
  • Hardware_ECC_Recovered

There are many other parameters you could monitor and, as can be seen, they are easily added by referencing the parameter ID and updating the templates to match.
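
To see which parameter IDs a particular drive actually reports before adding more lines, a small sketch like this could be run against one of the dump files. It assumes the usual smartctl attribute table layout, and list_smart_attrs is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "ID NAME" for each attribute row in a saved
# smartctl -a dump. Rows are recognised by a numeric first field and a
# hex flag (0x....) in the third field.
list_smart_attrs() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x/ { print $1, $2 }' "$1"
}

# Typical use:
#   list_smart_attrs /var/local/snmp/smart-sda
```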

Note that the config presented here only looks at /dev/sd? devices. If your system has /dev/hd? devices then you will need to modify the scripts accordingly.

Once you have added all this, you can test smart-generic by running it from the command line with appropriate parameters, and via SNMP by appending the appropriate SNMP OID to the "snmpwalk" commands shown previously.
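
When testing via SNMP, a value of NA generally means the script could not get data for that drive. As a rough aid, the sketch below filters snmpwalk output for NA values. It assumes unquoted output lines of the form NET-SNMP-EXTEND-MIB::nsExtendOutLine."smarttemp".1 = STRING: NA. The find_na name, the community string "public" and the host "localhost" in the usage comment are placeholders of mine, so substitute your own.

```shell
#!/bin/sh
# Hypothetical helper: read snmpwalk output on stdin and print
# "EXTEND_NAME INDEX" for every nsExtendOutLine entry whose value is NA.
find_na() {
    awk -F'"' '/nsExtendOutLine/ && /= STRING: NA[[:space:]]*$/ {
        split($3, rest, " ")          # rest[1] is ".1", the output line index
        print $2, substr(rest[1], 2)
    }'
}

# Typical use (community string and host are placeholders):
#   snmpwalk -v 2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendOutLine | find_na
```

If anything is listed, the first thing to check is the matching dump file in /var/local/snmp.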

Cacti Templates

I have generated some basic Cacti Templates for these SMART parameters with one graph for temperatures and another for health parameters. They are easily extended for more parameters.

For indexed SNMP, Cacti requires an XML file describing how to map the SNMP data to each drive. As this is a local (unpackaged) version, I have built my configuration around putting this file in /usr/local/share/cacti/resource/snmp_queries/, and you will need to alter the templates if you put the file elsewhere.

Download: Disk SMART Cacti SNMP Query (XML)

Put this in /usr/local/share/cacti/resource/snmp_queries/ or wherever appropriate for your system. Note that if you change the location then you will also need to update the path to this file in the Cacti Data Query for this template.

Download: Cacti Templates for SMART over SNMP

Simply import this template and add the data query to the hosts you want to monitor. You should then see the disks available to monitor and be able to add the graphs you want in Cacti. It should just work if SNMP is working correctly for that device (check that other SNMP parameters are working for it).

Graph Screen Shots

SMART over SNMP on Cacti : Temperatures

SMART over SNMP on Cacti : Disk Health

If you have more disks then you can add a pair of these graphs for every disk.


Comments:

10/03/2010 15:55 :: Scott

Hi Glen,

great work you've done. I had never got my head around populating the SNMP MIB with an external process until now. It always looks easy when it's in front of you!

I modified your Perl script and created an SNMP Query XML which will handle any number of drives. It uses only two graph templates, one for errors and the other for temperatures. Basically you get 2 graphs per drive. I have a server with 10 drives and did not want to do all the work to add 8 more data sources! And I thought it would be better to make it handle any number of drives. If you're interested I can send you my XML templates.

Regards

Scott

20/03/2010 09:28 :: Glen Pitt-Pladdy

Hi Scott

Sorry for the delay getting back to you about your post on my blog. It did take a while to pull all the fragments of info together to get SNMP working neatly.

Your approach sounds interesting and I'm sure would help lots of people who have servers with more than 2 disks to monitor.

I am happy for you to post a URL on my blog linking to your code & templates, else I am happy to host them off my server.

Either way, please make sure you give yourself credit for the extra work you have put into the template and scripts - I would suggest a comment in the code about the enhancements with your name and a URL.

Thanks

Glen

09/06/2010 05:34 :: Pablolibo

Could Scott send me his templates? Thanks a lot.

Pitt-Pladdy, thanks for the template :D

15/04/2011 23:01 :: Anthony

Thanks for these. I've also altered your templates to support 4 drives. If you'd like a copy of these, let me know.

25/12/2011 08:58 :: Summerborn

Hello,

First, thank you very much for this. I had been struggling to make this work for the past few weeks and had even started writing a SMART MIB and a net-snmp SMART extension.....

If I may make two points. First, I had to rewrite the path inside the big xml file cact.... to point at my directory for the disk_smart.xml file. Second, this line posted above:

extend smartreaderr /etc/snmp/smart-generic devices

has to be rewritten as follows:

extend smartdevices /etc/snmp/smart-generic devices

25/12/2011 12:31 :: Glen Pitt-Pladdy

Yes, you are quite right about the snmpd.conf line - have updated this. You should only need to change paths if you are putting the file in a different place.

24/01/2012 15:24 :: Amine

I am trying to use your tutorial to get SMART info via SNMP. I am getting:

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smarttemp".1 = STRING: NA
NET-SNMP-EXTEND-MIB::nsExtendOutLine."smarteccrec".1 = STRING: NA
NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartairflow".1 = STRING: NA
NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".1 = STRING: /dev/sda
NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartreaderr".1 = STRING: NA
NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartrealloc".1 = STRING: NA
NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartseekerr".1 = STRING: NA

Would you please tell me how I can fix this so I do not get NA any more?

24/01/2012 17:51 :: Glen Pitt-Pladdy

Looking at the script, NA is output when it can't get the data about the drive. I would suggest starting by looking at the data file: /var/local/snmp/smart-sda

Check that the cron job is creating it properly with valid data in it.



