Glen Pitt-Pladdy :: Blog

SMART stats on Cacti (via SNMP)

Update: This was originally one of my first articles on Cacti stats via SNMP, and I have subsequently built an ever-growing collection of templates and extension scripts based on the same approach. Originally this was done as 2-disk templates, which were fine for the machines I was working with: my server here is a basic 2-disk setup, so why would I need more? Since then I've worked with all sorts of different disk arrangements and had to fudge things to make useful templates. This update fixes that by bringing things down to two basic templates and switching to indexed SNMP, allowing an arbitrary number of disks to be used. As of version 20121214 you can also easily edit the script to index by devices or serial numbers.

This follows on from the basics of SNMP I covered previously. This article adds a set of SNMP extension scripts, config, and Cacti templates to monitor hard drives.

Being SMART

Self-Monitoring, Analysis and Reporting Technology (SMART) is built into most hard drives these days. It provides a number of built-in tests to evaluate the health of a drive and hopefully predict many failures.

Linux has a suite of tools called "smartmontools" which provides a comprehensive set of utilities and a monitoring daemon for checking drives. Configuring regular testing and monitoring (smartd) is beyond the scope of this article, and there are plenty of docs around for that already. What is often useful, though, is to graph key parameters to spot anomalies which would otherwise go unnoticed.

After installing smartmontools, you can check the basic parameters that drives have with the command:

smartctl -a DEVICE

Where DEVICE is the device for the drive (not a partition). Typically this would be something like /dev/sda (first drive), /dev/sdb (second drive) etc. or /dev/hda (first drive), /dev/hdb (second drive), or some combination of both.

If a drive does not have SMART enabled it will say that in the output of the above. To enable SMART on the drive:

smartctl -s on DEVICE
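As a quick check before going further, something like the following sketch can report whether a drive has SMART switched on. It assumes the "SMART support is:" lines that smartctl normally prints in its information section; the function name smart_state is made up for this example.

```shell
#!/bin/sh
# smart_state (hypothetical helper): read smartctl output on stdin and
# print "enabled", "disabled" or "unknown".
# Assumption: the output contains a "SMART support is: Enabled" or
# "SMART support is: Disabled" line, as smartctl -i normally prints.
smart_state() {
    awk '
        /^SMART support is: +Enabled/  { state = "enabled" }
        /^SMART support is: +Disabled/ { state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Typical use (needs root and a real drive):
#   smartctl -i /dev/sda | smart_state
```

If this prints "disabled", the smartctl -s on command above is the next step.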

Note that USB drives do not currently allow SMART data, even though the physical drives inside the boxes are SMART capable. I have no idea why this is the case, and USB drives are the ones I would really like to monitor as they get bashed about more and have poor cooling compared to fixed drives in a system.

Getting SMART over SNMP

As discussed previously, SMART data requires root privilege to access, and snmpd runs as a low-privilege user. I have a cron job that reads this data and stores it in files for snmpd to access via extension scripts.

If you are using the same config I described previously, simply add the following lines to your /etc/snmp/local-snmp-cronjob file so that it looks something like this (it may have other content for other tasks):

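# where to keep the files
STORE=/var/local/snmp

# update smart parameters
for devfull in /dev/sd?; do
    # skip the literal pattern if no devices matched
    [ -e "$devfull" ] || continue
    dev=$(basename "$devfull")
    # -n idle: don't query a drive that is in sleep, standby or idle mode
    /usr/sbin/smartctl -n idle -a "$devfull" >"$STORE/smart-$dev.TMP"
    # rename into place so snmpd never reads a half-written file
    mv "$STORE/smart-$dev.TMP" "$STORE/smart-$dev"
done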

This code simply runs through devices matching /dev/sd? (i.e. /dev/sda, /dev/sdb etc.) and dumps their SMART data to a file in /var/local/snmp as described previously. From here extension scripts for snmpd can pick it up without requiring privilege.

At this point STOP. Wait for the smart-* files to be created in /var/local/snmp. I suspect a lot of problems reported relate to not getting an early part of the chain working fully before moving on. Don't move on until the files exist.

SMART parameters are numbered and it made sense to me to exploit the numbering in a universal script instead of having to treat each parameter on its own.
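A small sketch of that check follows, assuming the /var/local/snmp store used above. The function name check_smart_files is hypothetical; it only looks for non-empty smart-* files and does not validate their contents.

```shell
#!/bin/sh
# check_smart_files DIR (hypothetical helper): succeed only if DIR holds
# at least one non-empty smart-* cache file.
check_smart_files() {
    dir=$1
    found=0
    for f in "$dir"/smart-*; do
        case $f in *.TMP) continue ;; esac   # ignore half-written temp files
        if [ -s "$f" ]; then
            echo "ok: $f"
            found=1
        elif [ -e "$f" ]; then
            echo "empty: $f"
        fi
    done
    [ "$found" -eq 1 ]
}

# Usage:
#   check_smart_files /var/local/snmp || echo "no SMART data yet"
```

An empty file usually means the cron job ran but smartctl produced nothing, which is worth chasing down before touching snmpd.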

Download:  Perl script to extract SMART parameters for SNMP

I place this script (make it executable first: chmod +x smart-generic) in /etc/snmp

This script takes one argument, the SMART parameter number, and outputs the difference (remaining life) between the current value and the threshold for that parameter. It is worth noting that different manufacturers (and even different models and revisions of drives) create these values differently, so the value is of little interest on its own, but unusual fluctuations or downward trends are worth taking note of. For temperatures it is normally necessary to take the raw data, which can be done by prefixing the parameter ID with an 'R'.
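To make the calculation concrete, here is a rough shell sketch of the idea. It is not the Perl script itself, and it handles a single cache file rather than all devices. It assumes the usual ten-column attribute table from smartctl (ID#, name, flag, VALUE, WORST, THRESH, type, updated, when failed, RAW_VALUE); the function name smart_param is invented for illustration.

```shell
#!/bin/sh
# smart_param FILE PARAM (hypothetical helper, not the real smart-generic)
#   PARAM "5"    -> VALUE minus THRESH for attribute 5
#   PARAM "R194" -> leading number of RAW_VALUE for attribute 194
# Prints NA if the attribute is not present in FILE.
smart_param() {
    awk -v param="$2" '
        BEGIN {
            raw = (param ~ /^R/)      # an R prefix selects the raw column
            id = param
            sub(/^R/, "", id)
        }
        # attribute rows have at least 10 fields and start with the ID
        $1 == id && NF >= 10 {
            if (raw) print $10 + 0
            else     print $4 - $6
            found = 1
            exit
        }
        END { if (!found) print "NA" }
    ' "$1"
}

# Usage:
#   smart_param /var/local/snmp/smart-sda 5
#   smart_param /var/local/snmp/smart-sda R194
```

For a row with VALUE 100 and THRESH 036 the first form gives 64, i.e. how far the health value still is from the failure threshold.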

This is another good time to STOP. Test that the smart-generic script is actually picking up the data:

# /etc/snmp/smart-generic 1

In /etc/snmp/snmpd.conf add the following lines (or others if you want to monitor them):

extend smartdevices /etc/snmp/smart-generic devices
extend smartdescriptions /etc/snmp/smart-generic description
extend smart1 /etc/snmp/smart-generic 1
extend smart3 /etc/snmp/smart-generic 3
extend smart4 /etc/snmp/smart-generic 4
extend smart5 /etc/snmp/smart-generic 5
extend smart7 /etc/snmp/smart-generic 7
extend smart9 /etc/snmp/smart-generic 9
extend smart10 /etc/snmp/smart-generic 10
extend smart12 /etc/snmp/smart-generic 12
extend smart170 /etc/snmp/smart-generic 170
extend smart171 /etc/snmp/smart-generic 171
extend smart172 /etc/snmp/smart-generic 172
extend smart177 /etc/snmp/smart-generic 177
extend smart178 /etc/snmp/smart-generic 178
extend smart179 /etc/snmp/smart-generic 179
extend smart180 /etc/snmp/smart-generic 180
extend smart181 /etc/snmp/smart-generic 181
extend smart182 /etc/snmp/smart-generic 182
extend smart183 /etc/snmp/smart-generic 183
extend smart184 /etc/snmp/smart-generic 184
extend smart187 /etc/snmp/smart-generic 187
extend smart189 /etc/snmp/smart-generic 189
extend smartR190 /etc/snmp/smart-generic R190
extend smartR194 /etc/snmp/smart-generic R194
extend smart195 /etc/snmp/smart-generic 195
extend smart196 /etc/snmp/smart-generic 196
extend smart199 /etc/snmp/smart-generic 199
extend smart203 /etc/snmp/smart-generic 203
extend smart226 /etc/snmp/smart-generic 226
extend smart230 /etc/snmp/smart-generic 230
extend smart231 /etc/snmp/smart-generic 231
extend smart232 /etc/snmp/smart-generic 232
extend smart233 /etc/snmp/smart-generic 233
extend smart235 /etc/snmp/smart-generic 235
extend smart241 /etc/snmp/smart-generic 241

There are many other parameters which you could also monitor and as can be seen, they are easily added by simply referencing the parameter ID and updating templates to match.
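Since every line follows the same pattern, a short loop can generate them. This is only a convenience sketch (the function name extend_lines is made up); the output still has to be pasted into snmpd.conf, and the XML and templates updated to match.

```shell
#!/bin/sh
# extend_lines PARAM... (hypothetical helper): print one snmpd.conf
# "extend" line per parameter, following the naming used above.
extend_lines() {
    for p in "$@"; do
        printf 'extend smart%s /etc/snmp/smart-generic %s\n' "$p" "$p"
    done
}

# Example: two health values plus the raw temperature
extend_lines 5 9 R194
```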

Note that the config presented here only looks at /dev/sd? devices. If your system has /dev/hd? devices then you will need to modify the scripts accordingly.
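For the cron job side, one way to cover both naming schemes is to glob for both patterns. The sketch below only lists the devices; list_disks is a hypothetical name, and the smart-generic script would still need a matching change to pick up the extra cache files.

```shell
#!/bin/sh
# list_disks [DEVDIR] (hypothetical helper): print whole-disk device
# nodes matching sd? or hd?. DEVDIR defaults to /dev.
list_disks() {
    devdir=${1:-/dev}
    for devfull in "$devdir"/sd? "$devdir"/hd?; do
        [ -e "$devfull" ] || continue   # pattern matched nothing
        echo "$devfull"
    done
}

# In the cron job the loop header would then become something like:
#   for devfull in $(list_disks); do ... done
```

The single-character ? keeps partitions such as sda1 out of the list.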

Once you have added all this, restart snmpd.

This is another good point to STOP. You can test smart-generic by running it from the command line with appropriate parameters, and via SNMP by appending the appropriate SNMP OID to the "snmpwalk" commands shown in previous articles. Ensure that you get valid output on "snmp" related fields when you walk the extended OID: NET-SNMP-EXTEND-MIB::nsExtendOutLine
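When checking the walk output, it helps to pick out the entries that came back as NA, which is what the script reports when it cannot get data for a drive. The filter below is a sketch that assumes the snmpwalk line format shown in the comments on this page; na_entries is a hypothetical name.

```shell
#!/bin/sh
# na_entries (hypothetical helper): read snmpwalk output on stdin and
# print "name index" for every extend entry whose value is NA.
na_entries() {
    sed -n 's/^NET-SNMP-EXTEND-MIB::nsExtendOutLine\."\([^"]*\)"\.\([0-9][0-9]*\) = STRING: NA$/\1 \2/p'
}

# Usage (community string and host are placeholders):
#   snmpwalk -v2c -c public <ip> NET-SNMP-EXTEND-MIB::nsExtendOutLine | na_entries
```

A few NA entries are normal, as not every drive supports every parameter. If everything is NA, go back and check the cache files.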

Cacti Templates

I have generated some basic Cacti Templates for these SMART parameters with one graph for temperatures and another for health parameters. They are easily extended for more parameters.

For indexed SNMP, Cacti requires an XML file describing how to map the SNMP data to each drive. As this is a local (unpackaged) version I have done my configuration around putting this file in /usr/local/share/cacti/resource/snmp_queries/ and you will need to alter the templates if you put the file elsewhere.

Download: Disk SMART Cacti SNMP Query (XML)

Put this in /usr/local/share/cacti/resource/snmp_queries/ or wherever appropriate for your system. Note that if you change the location then you will also need to update the path to this file in the Cacti Data Query for this template.

Download: Cacti Templates for SMART over SNMP

Simply import this template and add the data query to the hosts you want to monitor. You should then see the disks available to monitor and be able to add the graphs you want in Cacti. It should just work if your SNMP is working correctly for that device (ensure other SNMP parameters are working for that device).

SSD Support

The big improvement as of version 20121214 is that there are a load of new parameters and templates to support SSDs. While HDDs have mostly the same stuff from model to model and make to make, every SSD manufacturer has their own ideas on what SMART parameters matter, and that's not surprising since the chipsets are also very different.

I have provided a generic SSD template with everything that seemed to matter on the SSDs I've encountered, but if you use this template directly then you will end up with loads of nan values. The idea is that you can duplicate this template and then prune that down for the specific model of SSD you are monitoring. I have done this for OCZ Agility-3 and Samsung 830 series devices, but you are free to do this for whatever model you have.

Scaling / normalisation

In many cases different manufacturers (and sometimes even models) have different starting values and thresholds for their parameters. As of version 20121214, the smart-generic script assumes that all parameters start at 100 (mostly the case) and scales them to a graph value of zero at the threshold.

This is however not always the case. Typically a few parameters on most drives may have a different starting or normal value so there is now a pair of hashes (%SCALEBYFAMILY and %SCALEBYMODEL) which provide custom overrides on a device family or device model basis.

Additionally, some parameters occasionally need ignoring, as they are there for informational purposes (via raw values) rather than to indicate drive health, and will typically have all-zero health values. You can set a scaling of 'U' to hide these values.
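As an illustration of the scaling idea only: the sketch below assumes a linear scale where the starting value maps to 100 and the threshold maps to 0. That formula is my reading of the description above, not code taken from smart-generic, and scale_health is a hypothetical name. A start of U prints U, which RRDtool-based graphs treat as unknown.

```shell
#!/bin/sh
# scale_health VALUE THRESH [START] (hypothetical helper)
#   Assumed formula: (VALUE - THRESH) * 100 / (START - THRESH)
#   START defaults to 100; a START of U hides the parameter.
scale_health() {
    awk -v v="$1" -v t="$2" -v s="${3:-100}" 'BEGIN {
        # U, or a start at or below the threshold, cannot be scaled
        if (s == "U" || s - t <= 0) { print "U"; exit }
        printf "%.1f\n", (v - t) * 100 / (s - t)
    }'
}

# Usage:
#   scale_health 68 36        # default start of 100
#   scale_health 200 140 253  # a parameter that starts at 253
#   scale_health 100 0 U      # informational parameter, hidden
```

With a default start of 100 and a threshold of 36, a current value of 68 sits halfway between the two, so it graphs at 50.0.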

Graph Screen Shots

Generic temperature template:

SMART over SNMP on Cacti : Temperatures

Generic HDD template:

SMART over SNMP on Cacti : HDD Health

Samsung 830 series SSD template:

SMART over SNMP on Cacti : Samsung 830 SSD Health

OCZ Agility-3 template:

SMART over SNMP on Cacti : OCZ Agility-3 SSD Health

Comments:

Scott 10/03/2010 15:55 :: Scott

Hi Glen,

great work you've done. I had never got my head around populating the SNMP MIB with an external process until now. It always looks easy when it's in front of you!

I modified your Perl script and created an SNMP Query XML which will handle any numbers of drives. It uses only two graph templates, one for errors and the other for temperatures. Basically you get 2 graphs per drive. I have a server with 10 drives and did not want to do all the work to add 8 more data sources! And I thought it would be better to make it handle any number of drives. If you're interested I can send you my XML templates.

Regards

Scott

Glen Pitt-Pladdy 20/03/2010 09:28 :: Glen Pitt-Pladdy

Hi Scott

Sorry for the delay getting back to you about your post on my blog. It did take a while to pull all the fragments of info together to get SNMP working neatly.

Your approach sounds interesting and I'm sure would help lots of people who have servers with more than 2 disks to monitor.

I am happy for you to post a URL on my blog linking to your code & templates, else I am happy to host them off my server.

Either way, please make sure you give yourself credit for the extra work you have put into the template and scripts - I would suggest a comment in the code about the enhancements with your name and a URL.

Thanks

Glen

Pablolibo 09/06/2010 06:34 :: Pablolibo

Could Scot send me your templates, thank a lot

Pitt-Pladdy, thank for it template :D

Anthony 16/04/2011 00:01 :: Anthony

Thanks for these, I've also altered your templates to support 4 drives if you'd like a copy of these let me know

Summerborn 25/12/2011 08:58 :: Summerborn

Hello,

First - thank you very much for this, I was struggling to make this for past few weeks and even started writing smart mib and net-snmp smart extension.....

If I may make two points. First, I had to rewrite the path inside the big xml file cact.... to read my directory for disk_smart.xml file. Second, this line posted above:

extend smartreaderr /etc/snmp/smart-generic devices

has to be rewritten as follows:

extend smartdevices /etc/snmp/smart-generic devices

Glen Pitt-Pladdy 25/12/2011 12:31 :: Glen Pitt-Pladdy

Yes, you are quite right about the snmpd.conf line - have updated this. You should only need to change paths if you are putting the file in a different place.

Amine 24/01/2012 15:24 :: Amine

I am trying to use your tutorial to get smart infos via snmp

i am getting :

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smarttemp".1 = STRING: NA

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smarteccrec".1 = STRING: NA

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartairflow".1 = STRING: NA

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".1 = STRING: /dev/sda

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartreaderr".1 = STRING: NA

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartrealloc".1 = STRING: NA

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartseekerr".1 = STRING: NA

would you please tell me how can i fix this , so i do not get any more NA

Glen Pitt-Pladdy 24/01/2012 17:51 :: Glen Pitt-Pladdy

Looking at the script, NA is output when it can't get the data about the drive. I would suggest starting by looking at the data file: /var/local/snmp/smart-sda

Check that the cron job is creating it properly with valid data in it.

Vampire1984 28/02/2012 13:01 :: Vampire1984

Hello. I have got this error in Console->Devices->Edit "Data Query Debug Information"

+ Running data query [13].

+ Found type = '3' [snmp query].

+ Found data query XML file at '/usr/share/cacti/site/resource/snmp_queries/disk_smart.xml'

+ XML file parsed ok.

+ Executing SNMP walk for list of indexes @ 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices"'

+ No SNMP data returned

+ Found data query XML file at '/usr/share/cacti/site/resource/snmp_queries/disk_smart.xml'

+ Found data query XML file at '/usr/share/cacti/site/resource/snmp_queries/disk_smart.xml'

+ Found data query XML file at '/usr/share/cacti/site/resource/snmp_queries/disk_smart.xml'

I tried runing it on windows machine but i think this instruction is for linux becaues it doesnt work. Or i have do a mistake somewhere in steps above?

Glen Pitt-Pladdy 28/02/2012 13:13 :: Glen Pitt-Pladdy

These instructions are for Debian or Ubuntu. They should work on other Linux distros, though they may need a few changes. The important thing to note is that we are using snmpd on the Linux/Unix host we are monitoring, and are adding extensions to snmpd to collect the data using smartmontools.

With monitoring Windows hosts you would have to extend the SNMP service in a similar way to provide the same data then you may be able to use this template or adapt this template for Windows use.

I am not sure if there are any implications to running Cacti on Windows and monitoring a Linux/Unix host with the extensions. That may well be possible without too much trouble.

From the error logs you give it looks like Cacti found the data query OK and is trying to check the SNMP where it is failing. My guess is that the host you are trying to monitor is not running the snmpd extensions given here (or Windows equivalent) correctly and that is why it is failing.

I hope that is useful to you.

Vampire1984 28/02/2012 17:31 :: Vampire1984

No no:), I have got virtual machine which is on debian. Cacti snmp php apache are installed the newest from version stable. Snmp works, cacti draws some graphs on other default settings + template with processes(on windows host). This error log is from creating graph for windows host. My question was is this instruction only for linux hosts/servers or this script can 'catch' windows hosts too?

Glen Pitt-Pladdy 28/02/2012 18:04 :: Glen Pitt-Pladdy

Ah! So if I understand right:

* running Cacti on a Debian VM

* monitoring disks on a Windows host via SNMP

If that's the case then this is not going to work without a whole lot of extra stuff on the Windows host.

The way that this monitoring works is that Cacti acts as a SNMP client (no need to install anything extra on the Cacti box/VM), collecting remote data and graphing. The remote host you are monitoring has to be an SNMP server (in the case of Linux running snmpd).

Stock SNMP servers (Linux or Windows) will not give the SMART data needed for this. In order to send the SMART data over SNMP, we have to extend the SNMP server on the host we are monitoring with a load of scripts. That is described here for monitoring Linux hosts running snmpd only.

It may be possible to extend the Windows SNMP service in a similar way, but that's beyond my Windows knowledge so I can't help with that.

I hope I understood right this time :)

Vampire1984 28/02/2012 19:41 :: Vampire1984

Yes, Youre right. Cacti installed and configured on debian virtual machine vmware(created on vsphere 1.0, running on vserver 2.0). I monitor debian vm, some printers, router, 2 servers(SLES[yet snmp cannot connect to him but working on that]+W2K8R2), windows xp hosts. On windows hosts is installed smartmon tools for windows. smartctl -a /dev/sda recognizes the hdd and writes on cmd cli values of smart so it works directly on host. Maybe is it possible to rewrite script to work with smartctl on windows host?

Glen Pitt-Pladdy 28/02/2012 20:03 :: Glen Pitt-Pladdy

It does appear it is possible to extend the Windows SNMP service but it requires writing a .dll - see: http://stackoverflow.com/questions/136206/how-can-i-write-an-snmp-agent-or-snmp-extension-agent-dll-in-c-sharp

Another possibility is that NetSNMP (what we use on Linux) is also available for Windows: http://www.net-snmp.org/docs/README.win32.html

The extension scripts are all designed around snmpd on Linux/Unix and will likely need modifying to work with Windows Net-SNMP.

That's not something I can help with - I mainly do Linux/Unix work.

Vampire1984 28/02/2012 21:03 :: Vampire1984

OK thank u very much for tips, i will look at those links and figure something out;)You helped me:) Cheers:)

nA ni sivAm 05/03/2012 10:25 :: nA ni sivAm

Is there any possibility that I can get a template that can be imported by cacti Version 0.8.7d

I would like to get this smartmon work desperately... If you can get me a cacti 0.8.7d template I would be thankful...

Thanks again!

Glen Pitt-Pladdy 05/03/2012 10:35 :: Glen Pitt-Pladdy

This question comes up regularly enough that I have just created an article about it. See: http://www.pitt-pladdy.com/blog/_20120305-102839_0000_Cacti_hack_for_forward_compatibility/

nA ni sivAm 05/03/2012 12:10 :: nA ni sivAm

okay. I tried this- updated the global_arrays.php.... now I get a different error:

"Error: XML: Generated with a newer version of Cacti."

Earlier i got this one- "Error: XML: Hash version does not exist." Can you please help me matey!

Glen Pitt-Pladdy 06/03/2012 08:58 :: Glen Pitt-Pladdy

That's a very old version of Cacti so it's possible it's simply too old. It might well be a whole lot less work just to upgrade Cacti to a more recent version.

nA ni sivAm 06/03/2012 10:26 :: nA ni sivAm

Thanks mannn...

Anyways if anybody wants to try importing, the trick is to change the hash numbers the numbers from the 3rd to 6th pos represent the version that the cacti supports min. If you change it manually it will be imported. But whether it will work or not will be dependent on the features used in that template!

with respect to this template it works with 0.8.7d...

nA ni sivAm 06/03/2012 10:27 :: nA ni sivAm

But soon after that I just got NaN everywhere, not something I would expect! :(

Reading thru the comments, I found out that

1. the local-snmp-cronjob file had to be modified - for some reason 'sed' in the way mentioned doesnt work in mine (both red hat/ suse)

2. smart-generic must have the valid path, i.e., if u hav not used the default path /var/local/snmp then u need to change $FILE in smart-generic

After these changes were made, it works.... partially!

nA ni sivAm 06/03/2012 10:27 :: nA ni sivAm

This is where I need your help... again!

-The first graph looks fine the temp is 25-30 and the airflow is 100 (always, is this ok?)

-The second graph has only 3 values, rest are nan. (ie., Reallocated Sectors, Seek Errors and ECC Recovered have values. The other fields- Raw errors, Poweron Hours, High Fly Writes are NaN)

is the smart-generic not parsing properly? may be the smartmonctl has a different output? may be becos of different version? i have my manual unix thing graphing poweron hrs. So it is there, it is just some text parsing issues right matey?

Glen Pitt-Pladdy 06/03/2012 10:57 :: Glen Pitt-Pladdy

SMART does vary between different drives. Check in your snmpd.conf which parameter airflow is referencing. If there is an "R" in front of the parameter number then it takes the RAW_VALUE column rather than the VALUE column. For temperatures I would normally take the RAW_VALUE, which on all my drives is the actual measured temperature, hence airflow is R190.

It sounds like you may have a drive that is putting a processed number into the RAW_VALUE field for airflow. Have a look at the cache files generated by the cron job and see what is in those parameters for your drives. On almost all my drives temp and airflow are very close.

For "nan" fields, your drives may have fewer and/or different parameters. Again, check the cache file for the drive. Since all the parameters are referenced by their number, you can add more to snmpd.conf, update the .xml with them, and then in Cacti update the Data Template, Graph Template and the Data Query accordingly, then re-add the graphs.

nA ni sivAm 06/03/2012 11:03 :: nA ni sivAm

thanks for the reply... will look at the cache files tonight and try to get some positive reply mate!

i am pretty sure I will have some q's for you!! thanks again

nA ni sivAm 06/03/2012 11:35 :: nA ni sivAm

had a quick look at the cache file and the data template... why is the airflow min max set to 0-1000 and others 0-100? any idea? could it be the reason why airflow is always 1000? the cache shows identical airflow and temp readings.

the other values are pretty high in the cache file.. however the data template is set to expect values between 0-100. could it be why i get NaN?

nA ni sivAm 06/03/2012 11:43 :: nA ni sivAm

please ignore the post above...

had a quick look at the cache file and the data template... why is the airflow min max set to 0-1000 and others 0-100? any idea? could it be the reason why airflow is always 100? the cache shows identical airflow and temp readings around 25-30.

the other values are pretty high in the cache file.. for eg. Power on hrs=382. Seekerror Rate=21639493. However the data template is set to expect values between 0-100. could it be why i get NaN?

Regd. using R as a prefix in snmpd.conf, are you saying R190 gets the RAW_VALUE. Using just 190 gets me the VALUE? I will try few things tonight and get back to you with details.

Glen Pitt-Pladdy 06/03/2012 11:51 :: Glen Pitt-Pladdy

There is a chance that the data is not being picked up correctly by smart-generic. Try running it manually and see what you get:

$ /etc/snmp/smart-generic R190

35

34

34

25

The limits could be responsible for the NaNs but again smart-generic should be picking up the VALUE (remaining life / health) field for that and that is normally in the range 0-100 on my drives. Again different drives may report different numbers.

You can also try running smart-generic for the parameters you are getting NaN on and see what happens, but make sure you run it exactly as in snmpd.conf so that it picks up RAW_VALUE or VALUE as snmpd would. Running an snmpwalk may also be useful to know what Cacti is receiving via snmp.

nA ni sivAm 06/03/2012 11:57 :: nA ni sivAm

extend smartdevices /etc/snmp/smart-generic devices

extend smartreaderr /etc/snmp/smart-generic 1

extend smartrealloc /etc/snmp/smart-generic 5

extend smartseekerr /etc/snmp/smart-generic 7

extend smartseekerr /etc/snmp/smart-generic 9

extend smartairflow /etc/snmp/smart-generic 189

extend smartairflow /etc/snmp/smart-generic R190

extend smarttemp /etc/snmp/smart-generic R194

extend smarteccrec /etc/snmp/smart-generic 195

are respectively:

The device list used as the index

Raw_Read_Error_Rate

Reallocated_Sector_Ct

Seek_Error_Rate

Power_On_Hours

High_Fly_Writes

Airflow_Temperature_Cel (RAW)

Temperature_Celsius (RAW)

Hardware_ECC_Recovered

Are you sure there are two smartairflow and 2 smartseekerr

extend smartseekerr /etc/snmp/smart-generic 7

extend smartseekerr /etc/snmp/smart-generic 9

extend smartairflow /etc/snmp/smart-generic 189

extend smartairflow /etc/snmp/smart-generic R190

Glen Pitt-Pladdy 06/03/2012 12:07 :: Glen Pitt-Pladdy

Wow! Well spotted! I must have messed that up when I updated this article with the new indexed template.

I have updated the article with the correct entries for snmpd.conf (pasted directly from the server I develop the template on).

nA ni sivAm 06/03/2012 12:11 :: nA ni sivAm

okay..

I ran smart-generic for all the parameters needed by graph.. they are right.

I highly doubt that there is some typo in the extend thing done in snmpd.conf. shudnt there be any connection between the snmpd.conf and the data template? can u please recheck?

How should i read these extends using snmpwalk? can u please show me an example mate!

Glen Pitt-Pladdy 06/03/2012 12:57 :: Glen Pitt-Pladdy

For an Indexed template the entries in snmpd.conf are connected to the Data Sources via the .xml file and the associated Data Query in the template. As some of your graphs are working I think it's likely the .xml and the Data Query are working.

I definitely think it's worth running an snmpwalk against NET-SNMP-EXTEND-MIB::nsExtendOutLine (as described in my "SNMP basics" article) to see if all the SMART parameters are being reported correctly - at least then we know if the problems are SNMP or Cacti. You should get stuff like:

...

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartpwrcyc".2 = STRING: 77

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartairflow".1 = STRING: 35

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartairflow".2 = STRING: 34

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".1 = STRING: /dev/sda

...

Check all the problem parameters are being reported correctly. You may want to pipe the command through "grep smart" to extract only the SMART lines.

nA ni sivAm 06/03/2012 18:30 :: nA ni sivAm

All good matey! thanks for all the help...

pm pet 28/06/2012 16:49 :: pm pet

Hey,

I can't retrieve any numbers via snmpwalk. When I run "smart-generic 7", it nicely prints numbers. But there is some problem between snmpd and smart-generic, I think.

All paths are ok.

extend smartdevices /etc/snmp/smart-generic devices

extend smartreaderr /etc/snmp/smart-generic 1

extend smartrealloc /etc/snmp/smart-generic 5

extend smartseekerr /etc/snmp/smart-generic 7

All privileges are fine and everything. And when I'm creating graphs in cacti, there's is just millions rows showing zeros. Nothing human-readable. Any advice?

Glen Pitt-Pladdy 28/06/2012 20:24 :: Glen Pitt-Pladdy

When you say you can't retrieve any numbers with snmpwalk, are the OIDs present in the output, just with invalid data, or are the OIDs missing altogether?

pm pet 28/06/2012 22:11 :: pm pet

All are missing. I'm using snmpwalk -v2c -c public <ip>, isn't that correct?

Glen Pitt-Pladdy 28/06/2012 22:15 :: Glen Pitt-Pladdy

By default snmpwalk only walks the mib-2 subtree, which does not include these extended entries (they live under the enterprises tree), so for checking them you will need to specify the OID:

$ snmpwalk -v2c -c public <ip> NET-SNMP-EXTEND-MIB::nsExtendOutLine

Then see if the data is being delivered correctly by snmpd and that will tell us if the problem is on the snmpd side or Cacti.

pm pet 29/06/2012 01:15 :: pm pet

Hy. That did help. And, debian stable doesn't have MIB-stuff in stable repository. So for all you Stable-users, you need to download package snmp-mibs-downloader from packages.debian.org and install it to use Extend mibs. now it works, thank you!

pm pet 29/06/2012 01:29 :: pm pet

OK I had fun way too early. snmpwalk now works and shows correct information. But still, in Cacti, when I click devices, add "Disk SMART Parameters" data query and click Create graphs. In column "Data Query [Disk SMART Parameters]" it just shows 230 rows, mostly numbers between 0-10. But also there are results like "_linkUpDown", "_triggerFail" etc. You have any idea what I'm doing wrong? I just can't figure out.

Glen Pitt-Pladdy 29/06/2012 09:10 :: Glen Pitt-Pladdy

I believe you can add the non-free section into your sources.list in order to install snmp-mibs-downloader. This applies to Ubuntu as well.

As for the remaining problems, it sounds as if there is some problem with the index. In the snmpwalk output check the "smartdevices" lines and see that they are all valid. That should be where Cacti picks up the list of devices from. It's also possible it's cached a bad list so also try clicking the green circle to update it.

pm pet 29/06/2012 14:42 :: pm pet

Heh I wrote so long message that I couldn't post it here, so here's my answer:

http://pastebin.com/QJUPpB7N

Glen Pitt-Pladdy 29/06/2012 18:32 :: Glen Pitt-Pladdy

I have to agree, the snmpd side looks all good. Not all drives support all the parameters graphed hence the NA. So long as most parameters are valid it should be good. One thing that may have some effect - are you running the snmpwalk on the cacti server?

I would also suspect some sort of problem relating to Cacti. It seems to pick up the necessary stuff from the XML (which I should actually tidy up a bit), but the snmpwalk it does to pick up the devices appears to be logging everything, and it looks like that's where it goes wrong. That's why I think it's worth verifying the exact snmpwalk from the cacti server:

$ snmpwalk -v2c -c public <ip> 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices"'

Check that against the same command on the server you are trying to monitor and see if there are differences.

pm pet 30/06/2012 17:56 :: pm pet

Both are same:

cacti $ snmpwalk -v2c -c public server 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices"'

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".1 = STRING: /dev/sda

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".2 = STRING: /dev/sdb

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".3 = STRING: /dev/sdc

server $ snmpwalk -v2c -c public server 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices"'

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".1 = STRING: /dev/sda

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".2 = STRING: /dev/sdb

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".3 = STRING: /dev/sdc

I don't really know where to find problem anymore, I'm not so familiar with cacti's XML-stuff and ways to collect snmp-information. :(

Glen Pitt-Pladdy 30/06/2012 19:26 :: Glen Pitt-Pladdy

I think that proves the snmp stuff is right which only leaves what is going on in Cacti. Details on XML queries are at http://docs.cacti.net/howto:data_query_templates

Can you confirm what version of Cacti you are using and where it came from? Is it the one that ships in Debian stable or some other version from elsewhere? Has it been modified/patched in any way? The reason I ask is that my verbose query is subtly different other than only finding 4 indexes:

+ Running data query [11].

+ Found type = '3' [snmp query].

+ Found data query XML file at '/usr/local/share/cacti/resource/snmp_queries/disk_smart.xml'

+ XML file parsed ok.

+ Executing SNMP walk for list of indexes @ 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices"'

+ Index found at OID: .......

pm pet Image  02/07/2012 02:00 :: pm pet

Cacti is version 0.8.8a, downloaded from Cacti's site; I didn't use Debian's package. It has no mods/patches: a pure install with only the monitor, settings and thold plugins added.

This version is pretty new; have they changed the way it gets data?

pm pet Image  02/07/2012 02:15 :: pm pet

Sorry for the multiple posts, but the graphs just started to work. I didn't touch anything and I don't really know what happened. Now everything works fine. I would still like to know why they didn't work in the beginning. Even though it works now, it still says this:

+ <oid_num_indexes> missing in XML file, 'Index Count Changed' emulated by counting oid_index entries

But everything still works and it no longer prints all the snmpwalk stuff. Strange, but at least it works. Thank you very much!

Glen Pitt-Pladdy Image  02/07/2012 09:13 :: Glen Pitt-Pladdy

It does seem a little peculiar. I'm not familiar with all the changes in newer versions and there is a chance that something is not quite working between the newer version and older templates/queries, or perhaps it just cached some invalid data for some reason.

The good thing is it's working.

Karel Image  26/10/2012 11:17 :: Karel

Hi, I'm using regular debian cacti 0.8.7g and it seems I have the same problem as "pm pet".

The Data Query for Disk SMART Parameters has [274 Items, 274 Rows]

snmpwalk is giving me the regular output:

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".1 = STRING: /dev/sda

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".2 = STRING: /dev/sdb

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".3 = STRING: /dev/sdc

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".4 = STRING: /dev/sdd

NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".5 = STRING: /dev/sde

Glen Pitt-Pladdy Image  26/10/2012 19:36 :: Glen Pitt-Pladdy

Ok - this does seem like a bit of a bizarre one. Let's first see if we can flush anything in Cacti that may be upsetting things. Try this:

1) Go to Data Queries->Disk Smart Parameters, check that it says "Successfully located XML file", Click save

2) Click Smart Health and check that all the drop-downs are valid

3) Go to New Graphs and select the relevant host, find the Disk Smart Parameters and click the green circle to refresh it - is it now working?

4) If not then go to System Utilities and click "Rebuild Poller Cache" - repeat 3 and see if it's working.

Let me know if at any stage it starts working again when doing that. I'm beginning to wonder if Cacti is doing something unexpected and caching the bad data the first time round.

mancy Image  11/01/2013 14:05 :: mancy

All working perfectly, I love you Glen :D

On Debian Squeeze, remember to install the package snmp-mibs-downloader

and comment the line in /etc/snmp/snmp.conf accordingly

as documented for example here: http://www.ghachey.info/weblog/2012/05/24/debian-squeeze-snmp-basic-setup/

Thank you Glen!!

zchef2k Image  21/02/2013 20:32 :: zchef2k

No matter what I try, I get the following in debug:

+ Running data query [11].

+ Found type = '4' [Script Query].

+ Found data query XML file at '/usr/share/cacti/resource/snmp_queries/disk_smart.xml'

+ Error parsing XML file into an array.

+ Found data query XML file at '/usr/share/cacti/resource/snmp_queries/disk_smart.xml'

+ Found data query XML file at '/usr/share/cacti/resource/snmp_queries/disk_smart.xml'

+ Found data query XML file at '/usr/share/cacti/resource/snmp_queries/disk_smart.xml'

Using 0.8.8a on el6.

Thanks for the hard work on this. Can't wait to get it working on my end.

Glen Pitt-Pladdy Image  22/02/2013 07:22 :: Glen Pitt-Pladdy

The thing that looks suspicious is "Script Query" - this is SNMP and not Script. The real question is how it's managing that as all that should be in the template bundle. That's the thing I would suggest investigating first.

zchef2k Image  22/02/2013 14:48 :: zchef2k

Sorry. That selection was out of desperation. I have managed to get the templates working: I deleted them and reimported them. Cacti seems to be happy at the moment. However, my graphs are showing 'nan' even though Cacti has been polling for about 14 hours. In the polling logs I can see there are SNMP values being collected, but the graphs don't reflect the data. I do not see any rows in the MySQL poller table. Forgive my Cacti n00bness.

Glen Pitt-Pladdy Image  22/02/2013 21:26 :: Glen Pitt-Pladdy

As always with diagnostics, look at a stage in the process which will yield maximum information and divide the problem into clear parts to break down further. I would suggest checking with snmpwalk whether valid data is available via SNMP - that neatly cuts between Cacti and snmpd. This is described in the SNMP basics article linked above.
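As a rough aid for that check, the snmpwalk output can be piped through a small filter that counts how many values look usable. This is only a sketch: the function name count_unknown is hypothetical, and it assumes the values are reported in the "= STRING: value" form seen elsewhere in this thread, with 'U' or an empty string meaning no data.

```shell
#!/bin/sh
# count_unknown: reads snmpwalk output on stdin and prints "<valid> <unknown>".
# A value of "U" or an empty string is counted as unknown.
count_unknown() {
    awk -F' = STRING: ?' '
        NF == 2 { if ($2 == "U" || $2 == "") unknown++; else valid++ }
        END { printf "%d %d\n", valid, unknown }
    '
}

# Usage (community and host are placeholders):
#   snmpwalk -v2c -c public <ip> NET-SNMP-EXTEND-MIB::nsExtendOutLine | grep smart | count_unknown
```

If every value is unknown, the problem is on the snmpd side (cron job, data files or extension script) and there is little point looking at Cacti yet.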

Karel Image  28/02/2013 13:58 :: Karel

it seems that i'm getting only "U" from smart-generic script

my disc smartctl: http://pastebin.com/3WTAxux0

i believe that the script should be somehow updated for this disc

Glen Pitt-Pladdy Image  28/02/2013 22:09 :: Glen Pitt-Pladdy

I've put your smart data into a test setup and it works perfectly for me. I suspect the problem may relate to a discrepancy in config.

Karel Image  04/03/2013 08:21 :: Karel

you're right, it was a problem in smart-generic - it was unable to locate the exported files

thanks

scott Image  28/03/2013 19:57 :: scott

This is great stuff, and works flawlessly with direct attached drives. However, I have several systems with 3ware raid cards. Can you suggest how to modify your cacti scripts to make these drives available for graphing?

Glen Pitt-Pladdy Image  28/03/2013 21:44 :: Glen Pitt-Pladdy

Thanks for the feedback. I don't have any 3ware based systems, but I believe smartctl is capable of interrogating devices behind them so you would need to modify the cronjob to also collect SMART info for all the devices behind the 3ware, and then modify smart-generic to pick those files up as well.

Looking at the smartctl man page suggests that the particular way each model needs to be handled is different - that is certainly beyond the scope of what is practical for me to include in this standard version.

K0tz13 Image  20/04/2013 15:26 :: K0tz13

I have the same problem Karel and pm pet have/had, [504 Items, 252 Rows]. When I do verbose query in cacti I get the following:

+ Running data query [11].

+ Found type = '3' [SNMP Query].

+ Found data query XML file at '/usr/share/cacti/resource/snmp_queries/disk_smart.xml'

+ XML file parsed ok.

+ <oid_num_indexes> missing in XML file, 'Index Count Changed' emulated by counting oid_index entries

+ Executing SNMP walk for list of indexes @ 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices"' Index Count: 5734

+ Index found at OID: '1.3.6.1.2.1.1.1.0' value: 'Linux watson 3.5.0-27-generic #46~precise1-Ubuntu SMP Tue Mar 26 19:33:21 UTC 2013 x86_64'

+ Index found at OID: '1.3.6.1.2.1.1.2.0' value: 'OID: .1.3.6.1.4.1.8072.3.2.10'

+ Index found at OID: '1.3.6.1.2.1.1.3.0' value: '360548'

+ Index found at OID: '1.3.6.1.2.1.1.4.0' value: 'Me me@example.org'

...

followed by all OID entries possible. I'm using cacti 0.8.8a and cacti-spine 0.8.7i. Snmpwalk as above works fine.

Glen Pitt-Pladdy Image  20/04/2013 16:55 :: Glen Pitt-Pladdy

To debug the problem you have to check things at stages along the way that yield useful diagnostic information, hence all the talk in these articles about running snmpwalk and the snmpd scripts manually (ie. from the command line).

I suspect these problems relate to not having data coming through before tackling later steps. I've updated the smart-generic script (version 20130420) to give some useful information when it can't read the data files rather than the normal 'U'. Use this new version and take things step by step.

Start with running the snmpd-generic script with relevant arguments for the parameter you want to see, then run snmpwalk from the command line on the host you are monitoring, then on the Cacti server, checking the output at each stage.

K0tz13 Image  21/04/2013 11:18 :: K0tz13

I think I found the cause of the problem; the default Ubuntu snmpd.conf ships with the following lines enabled:

# Arbitrary extension commands

#

extend test1 /bin/echo Hello, world!

extend-sh test2 echo Hello, world! ; echo Hi there ; exit 35

# extend-sh test3 /bin/sh /tmp/shtest

and those hello-world's are also printed when doing a:

snmpwalk -v2c -c <com> <ip> 'NET-SNMP-EXTEND-MIB::nsExtendOutLine'

When I commented those lines out (and restarted snmpd) everything started working! What I don't understand is why those test-extensions cause Cacti to fail. The verbose query now shows what I would have expected the 1st time:

+ <oid_num_indexes> missing in XML file, 'Index Count Changed' emulated by counting oid_index entries

+ Executing SNMP walk for list of indexes @ 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices"' Index Count: 4

+ Index found at OID: '1.3.6.1.4.1.8072.1.3.2.4.1.2.12.115.109.97.114.116.100.101.118.105.99.101.115.1' value: 'sda'

etc.
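For anyone wanting to script that change, here is a minimal sketch that comments out the sample test extensions. The function name disable_test_extends is hypothetical, and the pattern assumes the sample lines are named test followed by a number, as in the Ubuntu default quoted above. It prints the result to stdout rather than editing the file in place, so the output can be reviewed first.

```shell
#!/bin/sh
# disable_test_extends CONF
# Prints CONF with any active "extend testN ..." or "extend-sh testN ..."
# line commented out. Lines that are already comments are left untouched.
disable_test_extends() {
    sed -E 's/^([[:space:]]*)(extend(-sh)?[[:space:]]+test[0-9]+[[:space:]])/\1# \2/' "$1"
}

# Usage (the path is the usual Debian/Ubuntu location, adjust as needed):
#   disable_test_extends /etc/snmp/snmpd.conf > /tmp/snmpd.conf.new
# Review the new file, copy it into place, then restart snmpd.
```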

adinovic Image  02/05/2013 14:21 :: adinovic

thx for the great script!

the scripts work for monitoring my centos box, but on my thecus-nas (n8800), when smart-generic is executed by the perl module (www.fajo.de/main/en/thecus/modules/perl514), the output is:

Use of uninitialized value $param in pattern match (m//) at ./smart-generic line 123.

Use of uninitialized value $param in substitution (s///) at ./smart-generic line 125.

Use of uninitialized value $param in pattern match (m//) at ./smart-generic line 127.

Use of uninitialized value $param in lc at ./smart-generic line 129.

Use of uninitialized value $param in lc at ./smart-generic line 129.

FATAL-need the numeric parameter or 'description' to show

snmpwalk -v1 -cmon ip-host NET-SNMP-EXTEND-MIB::nsExtendOutLine does vomit output, but the strings only show the values 101/100/U, and cacti doesn't generate an rrd either. i guess it's a parsing problem caused by smart-generic being executed improperly on the thecus, but i don't really know what the problem is, i'm not a perl programmer :D

Glen Pitt-Pladdy Image  02/05/2013 18:49 :: Glen Pitt-Pladdy

In this case $param is the command line parameter passed to the script and tells it what info to return. In the snmpd.conf lines that have to be added the parameter is passed to the script, but based on what you have here it looks extremely likely that the script is not being run with a parameter or something about the version of Perl is discarding the parameter. As the script says "FATAL-need the numeric parameter or 'description' to show"

I would need to see the full output of the snmpwalk relating to SMART (ie. run "snmpwalk -v1 -cmon ip-host NET-SNMP-EXTEND-MIB::nsExtendOutLine | grep smart") to be able to see if it's working as expected within snmpd.

adinovic Image  03/05/2013 04:01 :: adinovic

i've pasted the output here http://pastebin.com/NZPc4EyT, hope it can help you. thx!

Glen Pitt-Pladdy Image  03/05/2013 07:22 :: Glen Pitt-Pladdy

To me that looks like it totally is working. What I suspect is happening though is that rather than using udev, it's got a bunch of static device files which go beyond sdz (ie. sdaa, sdab etc.) and that is confusing sorting and resulting in the U values in the devices output. That's just untidy rather than a real problem.

I see no reason why this won't work with Cacti unless it's got a quirk about indexes with U values. I see good data for 7 drives - sda through sdg.

To improve things you could hard-code the drives you have: in smart-generic "my @drives = ('sda','sdb'...'sdg');" and remove the other code from opendir to closedir, and in smart-cron "for devfull in /dev/sda /dev/sdb ... /dev/sdg; do"
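To show roughly what the hard-coded loop might look like on the cron side, here is a sketch. Everything in the loop body is an assumption rather than the actual smart-cron code: it supposes the job dumps "smartctl -a" for each drive into a file named from a $FILES prefix plus the short device name. The SMARTCTL and FILES variables and the function name dump_smart are hypothetical.

```shell
#!/bin/sh
# Assumed settings, not taken from the real smart-cron script.
SMARTCTL=${SMARTCTL:-smartctl}
FILES=${FILES:-/var/local/snmp/smart}   # path plus file name prefix

# dump_smart DEVICE...
# Writes the SMART report for each device to "$FILES-<short name>".
dump_smart() {
    for devfull in "$@"; do
        drive=${devfull##*/}            # /dev/sda -> sda
        # smartctl can exit non-zero even when it prints a report,
        # so its exit status is ignored here.
        "$SMARTCTL" -a "$devfull" > "$FILES-$drive.tmp" 2>/dev/null || true
        mv "$FILES-$drive.tmp" "$FILES-$drive"   # swap in the new dump in one step
    done
}

# Hard-coded list instead of scanning /dev; list the drives actually present:
#   dump_smart /dev/sda /dev/sdb
```

Writing to a temporary name and renaming means the snmpd extension script never reads a half-written file.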

adinovic Image  06/05/2013 20:03 :: adinovic

thx for your guidance sir. anyway, i finally got smart monitoring in cacti working for my thecus, though i'm not sure what happened. what i did: i deleted all the smart data sources, then created graphs with the smart health hdd template first, and after the rrd came out i created graphs from the smart temperature template. also, i just realized i get the same perl error on the centos box, which runs fine, so i think it's not a problem at all

Ethan Image  21/05/2013 17:14 :: Ethan

I have run into a bit of a snag. When I do a verbose query in Cacti this is what I get:

+ Running data query [10].

+ Found type = '3' [SNMP Query].

+ Found data query XML file at '/usr/local/share/cacti/resource/snmp_queries/disk_smart.xml'

+ XML file parsed ok.

+ <oid_num_indexes> missing in XML file, 'Index Count Changed' emulated by counting oid_index entries

+ Executing SNMP walk for list of indexes @ 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices"' Index Count: 0

+ No SNMP data returned

I can see the extends with an snmpwalk: http://pastebin.com/SrAikDUE

However if I try to walk just the descriptions I get an index out of range error:

$ snmpwalk ... 10.0.22.245 NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdescriptions"

NET-SNMP-EXTEND-MIB::nsExtendOutLine.smartdescriptions: Unknown Object Identifier (Index out of range: smartdescriptions (nsExtendToken))

Any ideas?

Cheers!

Ethan.

Glen Pitt-Pladdy Image  24/05/2013 07:37 :: Glen Pitt-Pladdy

Hi Ethan

The "Unknown Object Identifier" is likely the shell processing the quotes - try putting the whole of the OID in single quotes to stop that, or escape (with a backslash) each double quote so it's passed to snmpwalk unaltered.
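The effect of the quoting can be seen without snmpwalk at all. In this small sketch, show_arg is a hypothetical helper that simply prints the argument it receives, which is what snmpwalk would be handed in each case:

```shell
#!/bin/sh
# show_arg ARG: prints the argument exactly as the shell passed it on.
show_arg() { printf '%s\n' "$1"; }

# Unquoted: the shell removes the double quotes before the command sees them.
show_arg NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdescriptions"
# prints NET-SNMP-EXTEND-MIB::nsExtendOutLine.smartdescriptions

# Whole OID in single quotes: the double quotes survive.
show_arg 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdescriptions"'
# prints NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdescriptions"

# Each double quote escaped with a backslash: same result.
show_arg NET-SNMP-EXTEND-MIB::nsExtendOutLine.\"smartdescriptions\"
# prints NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdescriptions"
```

The first form matches the name shown in the error message, where the quotes around smartdescriptions have already gone.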

To me everything looks in order from the output of snmpwalk, so I would next verify that the .xml file has not been modified or corrupted so that it no longer matches, that the query points at the correct path to that file, and other things in that area. This looks like it comes down to Cacti not being able to match up the output of snmpwalk with the query XML.

Szymon Image  17/06/2013 02:21 :: Szymon

Hi,

I can't get it working on Debian Squeeze with 0.8.7g-1+squeeze1. I have the data query associated with a host and it finds the devices and descriptions properly. However, when I click 'Create Graphs for this Host', select one of the devices and create graph type 'SMART Temperatures' I end up with graphs associated with some crippled datasources for R190 and R194. Namely in the Custom Data section I get an empty text box for Index Value (normally it's a drop-down select by the way) and next to Output Type ID and Index Type the text 'Data query data sources must be created through New Graphs.' is displayed.

I'm puzzled because New Graphs is exactly the way I created the graph. I spent half of the night trying to get this working but I'm out of ideas now. What can possibly be wrong if it works for other people? I'm using SNMP v3 if that's of any relevance. The values are definitely there (at least for R194) when I do snmpwalk across NET-SNMP-EXTEND-MIB::nsExtendOutLine.

Szymon Image  18/06/2013 02:54 :: Szymon

It seems that Cacti gets confused by multiple "U" in indexes, although it looks fine at the stage of the query and graph creation. I guess if the smart-generic script finds no data for some disks dumped by the cron script, it should not expose them at all. If someone (like me) decides not to monitor some of the drives and modifies the cron script not to dump smartctl output for them, Cacti will fail to create proper data sources in an utterly incomprehensible way.

Glen Pitt-Pladdy Image  18/06/2013 07:33 :: Glen Pitt-Pladdy

I'm not sure the problem is Cacti in this case. I certainly wouldn't bet on the snmpd extension script behaving as expected without valid input from the cron job. The aim is that the cron job collects (and caches) the info needed, then you choose which disks you monitor (with info from the cached data) within Cacti, but if you start modifying components of the system then other parts may not get their expected inputs and behave unpredictably. Essentially, the whole system has to hang together.

Szymon Image  18/06/2013 08:46 :: Szymon

What you say about the cron job is correct when you have a low-end, entry-level server. If the disks are behind a RAID controller, there are many different ways to access SMART, depending on the controller (various tricks with -d for HP SmartArray, 3ware etc.). For Adaptec controllers it is pretty straightforward - you use /dev/sg[1-9][0-9]+ devices, but only some of them are the real physical disks. The rest represent the logical arrays, the controller itself or the hot-swap chassis. That is why I chose not to dump SMART for those (it ends up with an error anyway). And it appears that Cacti is completely lost if there are duplicated values ("U"s) in the index fields. You would have to try it for yourself to see what I mean - it fails with the latest version as well.
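To check whether a host is in that situation, the index walk can be scanned for repeated values. This is a sketch only: the function name duplicate_values is hypothetical, it assumes the "= STRING: value" output form seen in this thread, and whether duplicates really are what upsets Cacti is my observation rather than something confirmed.

```shell
#!/bin/sh
# duplicate_values: reads snmpwalk output on stdin and prints each value
# that appears more than once, followed by how many times it appears.
duplicate_values() {
    awk -F' = STRING: ?' '
        NF == 2 { seen[$2]++ }
        END { for (v in seen) if (seen[v] > 1) print v, seen[v] }
    ' | sort
}

# Usage (community and host are placeholders):
#   snmpwalk -v2c -c public <ip> 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices"' | duplicate_values
```

No output means every index value is unique.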

Mind you, for SSDs R241 may be worth graphing as well. That is actually the only one I needed and scripted based on your work. Great job, thanks :-)

paul Image  31/10/2013 01:09 :: paul

This is something I wish I had figured out how to do. Couple of problems I need to figure out. The first of which is that the smart-generic script doesn't return anything on the command line. The data files are there.

/usr/local/bin/smart-generic.pl R194

ls -l /usr/local/etc/smart/snmp/

total 40

-rw-r--r-- 1 root wheel 6244 Oct 30 18:00 smart-ad2

-rw-r--r-- 1 root wheel 11576 Oct 30 18:00 smart-ad3

grep /usr/local/etc/smart/snmp /usr/local/bin/smart-generic.pl

my $FILES = '/usr/local/etc/smart/snmp';

So all is good there. The templates are installed.

Walking the extended tree — snmpwalk -Os -c xxx shuttle NET-SNMP-EXTEND-MIB::nsExtendOutLine | grep smart — gives me:

nsExtendOutLine."smart1".1 = STRING:

and so on

This is on FreeBSD 8.4, if that matters.

paul Image  31/10/2013 02:25 :: paul

[sfx: hammering, sawing]

Well, there's a problem. The files are being saved in $FILES/smart-$drive in the cronjob but accessed as "$FILES-$drive"; Not sure why the dash or truncated name. Once I fixed that, it all works.

Here's a diff of what I needed to do to make it work.

31c31

< my $FILES = '/var/local/snmp/smart';

---

> my $FILES = '/usr/local/etc/smart/snmp';

139,140c139,140

< if ( $drive !~ /^sd[a-z]+$/ ) { next; } # skip non drives

< push @drives, $drive

---

> if ( $drive !~ /^ad[0-9]+$/ ) { next; } # skip non drives

> push @drives, $drive;

146,147c146,147

< if ( ! -f "$FILES-$drive" ) {

< print "U\n";

---

> if ( ! -f "$FILES/smart-$drive" ) {

> print "missing data file?\n";

151c151

< open DR, "$FILES-$drive"

---

> open DR, "$FILES/smart-$drive"

Glen Pitt-Pladdy Image  31/10/2013 19:17 :: Glen Pitt-Pladdy

Hi Paul - the thing I notice is that you have changed the storage path and modified $FILES, which included the prefix to the file name, and in your case you have removed the prefix part. Then, to compensate for the removal of the prefix, you have modified the rest of the script to re-add the prefix in the code. The original code was fine, just so long as you don't take out the prefix when you change any paths.
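To illustrate the difference, here is a small shell sketch of the same string interpolation that the Perl script does with "$FILES-$drive". The helper name build_path is hypothetical; the paths are the ones from the diff and directory listing above.

```shell
#!/bin/sh
# build_path PREFIX DRIVE: joins the two the way "$FILES-$drive" does.
build_path() { printf '%s-%s\n' "$1" "$2"; }

# Original setting: directory plus the "smart" file name prefix.
build_path /var/local/snmp/smart sda
# prints /var/local/snmp/smart-sda

# Path changed but prefix dropped: points at a file that does not exist.
build_path /usr/local/etc/smart/snmp ad2
# prints /usr/local/etc/smart/snmp-ad2

# Path changed and prefix kept: matches the files the cron job wrote.
build_path /usr/local/etc/smart/snmp/smart ad2
# prints /usr/local/etc/smart/snmp/smart-ad2
```

So setting $FILES to '/usr/local/etc/smart/snmp/smart' would have found the existing files without changing the rest of the script.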

paul Image  02/11/2013 20:03 :: paul

Ah, so the last piece of "/var/local/snmp/smart" is a conflation of a file name prefix and a path, not just a path, as I expected? I usually don't mix paths and file names when I create a variable if I expect other people to use something I made so that's where that crept in. I make no assumptions about where something will be used or how other systems are organized. The device naming convention (sd[a-z] or hd[a-z]) are also non-portable. I assume those are linux-isms. Not much use to me.

Glen Pitt-Pladdy Image  03/11/2013 17:11 :: Glen Pitt-Pladdy

Yup - it is a bit inconsistent and something that is tidier in the development version. Portability isn't my aim - I'm satisfying my needs (primarily Debian). Others are welcome to use this as a basis to build on for their needs, but I'm not being paid to support everyone's particular preferences.



