check_snmp_storage.pl Rollover
Posted: Wed Nov 21, 2018 8:14 am
by Deantwo
Nagios XI v5.5.7
I have a server where I am monitoring the storage space of an 11 TB disk using SNMP. The reported values are negative, so I am guessing that it is a rollover issue in the check_snmp_storage.pl script.
Alarm:
Code: Select all
***** Nasure Monitor Alert *****
Nagios has detected a problem with this service.
Notification Type: PROBLEM
Service: Drive D: Disk Usage
Host: 10.0.0.35
Alias: 10.0.0.35
Address: 10.0.0.35
State: CRITICAL
Info:
D:\ Label:offsite Serial Number ABCDEFGH: 107%used(-8065269MB/-7558275MB) (99%) : CRITICAL
Date/Time: 2018-11-21 13:19:02
Service:
Code: Select all
$USER1$/check_snmp_storage.pl -H $HOSTADDRESS$ $ARG1$
ARG1: -C public --v2c -m ^D: -w 94 -c 99 -f
Command:
Code: Select all
$USER1$/check_snmp_storage.pl -H $HOSTADDRESS$ $ARG1$
I don't see anything in the script's description to mitigate this. I have tried adding "-G" to make it count in gigabytes, but this does not fix the rollover.
Can I just edit the script to treat the value as an unsigned integer (UINT) instead? I would like to ask for an official fix before I start editing the script manually.
Re: check_snmp_storage.pl Rollover
Posted: Wed Nov 21, 2018 10:53 am
by ssax
Please post the output of the command but add a -v onto the end of it to show verbose output.
That will give us the OIDs and then we can poll them directly on the device to see if the device is serving negative values.
Thank you
Re: check_snmp_storage.pl Rollover
Posted: Thu Nov 22, 2018 3:02 am
by Deantwo
ssax wrote: Please post the output of the command but add a -v onto the end of it to show verbose output.
Code: Select all
[root@localhost libexec]# ./check_snmp_storage.pl --version
check_snmp_storage version : 1.3.3
[root@localhost libexec]# ./check_snmp_storage.pl -H 10.0.0.35 -C public --v2c -m ^D: -w 94 -c 99 -f -v
Alarm at 15
SNMP v2c login
Filter : ^D:
OID : 1.3.6.1.2.1.25.2.3.1.3.3, Desc : E:\
OID : 1.3.6.1.2.1.25.2.3.1.3.5, Desc : Physical Memory
OID : 1.3.6.1.2.1.25.2.3.1.3.2, Desc : D:\ Label:offsite Serial Number 321920d1
Name : D:\ Label:offsite Serial Number 321920d1, Index : 2
OID : 1.3.6.1.2.1.25.2.3.1.3.4, Desc : Virtual Memory
OID : 1.3.6.1.2.1.25.2.3.1.3.1, Desc : C:\ Label: Serial Number 52618787
storages selected : 1
1.3.6.1.2.1.25.2.3.1.6.2 : -2038620454
1.3.6.1.2.1.25.2.3.1.5.2 : -1410368257
1.3.6.1.2.1.25.2.3.1.4.2 : 4096
Descr : D:\ Label:offsite Serial Number 321920d1
Size : -1410368257
Used : -2038620454
Alloc : 4096
Perf data : 'D:\_Label:offsite__Serial_Number_321920d1'=-7963361MB;-5178696;-5454158;0;-5509251
D:\ Label:offsite Serial Number 321920d1: 145%used(-7963361MB/-5509251MB) (>99%) : CRITICAL | 'D:\_Label:offsite__Serial_Number_321920d1'=-7963361MB;-5178696;-5454158;0;-5509251
I searched a little after posting here, and it seems to be SNMP itself that is rolling over into negative values. I would assume that the script could just convert the value into an unsigned number (int32 to uint32), but I am no SNMP or Perl expert.
After posting I refined my search a little and found a similar issue.
But I don't know if anything has changed since then.
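For reference, the int32-to-uint32 idea can be checked against the raw values in the verbose output above. This is only a rough sketch in Python (not the plugin's Perl). The helper name to_unsigned32 is made up for illustration, and it assumes the agent's values wrapped past 2^31 exactly once.

```python
UINT32_MODULUS = 2**32  # 4294967296


def to_unsigned32(value):
    """Reinterpret a signed 32-bit reading as unsigned (hypothetical helper)."""
    return value + UINT32_MODULUS if value < 0 else value


# Raw values copied from the verbose output above.
alloc_units = 4096       # 1.3.6.1.2.1.25.2.3.1.4.2 (Alloc)
size_raw = -1410368257   # 1.3.6.1.2.1.25.2.3.1.5.2 (Size)
used_raw = -2038620454   # 1.3.6.1.2.1.25.2.3.1.6.2 (Used)

size_units = to_unsigned32(size_raw)
used_units = to_unsigned32(used_raw)

# Convert allocation units to MB the same way the plugin output does
# (units * bytes per unit / 1024^2).
size_mb = size_units * alloc_units / 1024**2
used_mb = used_units * alloc_units / 1024**2
pct_used = 100 * used_units / size_units

print(f"Size: {size_units} units = {size_mb:.0f} MB")
print(f"Used: {used_units} units = {used_mb:.0f} MB")
print(f"Used: {pct_used:.1f}%")
```

With those values the disk comes out at roughly 10.7 TiB with about 78% used, which is plausible for an 11 TB disk.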
Re: check_snmp_storage.pl Rollover
Posted: Mon Nov 26, 2018 10:42 am
by lmiltchev
There is an updated plugin that may work for you. It includes changes to account for filesystems that are larger than 2 TB.
https://github.com/dnsmichi/manubulon-s ... storage.pl
Edit the /etc/snmp/snmpd.conf file and set realStorageUnits 0, then save the file, restart the snmpd daemon and see if the plugin reports the correct value.
Re: check_snmp_storage.pl Rollover
Posted: Tue Nov 27, 2018 2:54 am
by Deantwo
Yup, found the change.
(here)
Code: Select all
if (version->parse(Net::SNMP->VERSION) >= 4) {
    foreach my $key (sort keys %$result) {
        # Fix for filesystems larger 2 TB. More than 2 TB will cause an error because
        # as defined in the RFC hrStorageSize is a 32 bit integer. So filesystems
        # larger 2 TB report a negative value because the first bit will be interpreted
        # as an algebraic sign. (0 = +, all others will be -). You simply have to add
        # 2 to the power of 32 (4294967296) and it is fixed.
        # Martin Fuerstenau, Oce Printing Systems, 25th Sept 2012
        if ($$result{$key} < 0) {
            $$result{$key} = $$result{$key} + 4294967296;
        }
        verb("$key x $$result{$key}");
    }
}
Yup it is a simple fix like I predicted.
Any chance you'll get this updated in a Nagios XI update?
I am not sure what you mean by editing the realStorageUnits 0, what is this for?
Re: check_snmp_storage.pl Rollover
Posted: Tue Nov 27, 2018 9:49 am
by mcapra
Deantwo wrote: I am not sure what you mean by editing the realStorageUnits 0, what is this for?
Search for "realStorageUnits":
https://access.redhat.com/documentation ... s/net-snmp
Related errata:
https://bugzilla.redhat.com/show_bug.cgi?id=654384
Re: check_snmp_storage.pl Rollover
Posted: Tue Nov 27, 2018 9:55 am
by lmiltchev
Any chance you'll get this updated in a Nagios XI update?
Yes, it's possible. Our developers are looking into this as we speak, but it may take a while before any changes are made and the plugin is thoroughly tested.
I am not sure what you mean by editing the realStorageUnits 0, what is this for?
Here's the description of this option:
realStorageUnits
controls how the agent reports hrStorageAllocationUnits, hrStorageSize and hrStorageUsed in hrStorageTable. With this option set to '0', the agent re-calculates these values for big storage drives with small allocation units so hrStorageAllocationUnits x hrStorageSize gives real size of the storage.
Example:
Linux xfs 16TB filesystem with 4096 bytes large blocks will be reported as hrStorageAllocationUnits = 8192 and hrStorageSize = 2147483647, so 8192 x 2147483647 gives real size of the filesystem (=16 TB).
Setting this directive to '1' (=default) turns off this calculation and the agent reports real hrStorageAllocationUnits, but it might report wrong hrStorageSize for big drives because the value won't fit into Integer32. In this case, hrStorageAllocationUnits x hrStorageSize won't give real size of the storage.
https://linux.die.net/man/5/snmpd.conf
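The man page's example can be checked with a bit of arithmetic. The following is a rough Python sketch. The function names are made up for illustration, and it only reproduces the multiplication described above, not net-snmp's actual rescaling code.

```python
INT32_MAX = 2**31 - 1  # largest value an Integer32 field can hold


def fits_int32(n):
    """True if n can be reported in an Integer32 without wrapping."""
    return 0 <= n <= INT32_MAX


def real_size_bytes(allocation_units, storage_size):
    """hrStorageAllocationUnits x hrStorageSize, as in the description above."""
    return allocation_units * storage_size


# Values from the man page example (realStorageUnits 0, rescaled by the agent).
rescaled_bytes = real_size_bytes(8192, 2147483647)
print(f"{rescaled_bytes / 2**40:.2f} TiB")

# The same filesystem counted in its real 4096-byte blocks
# (what realStorageUnits 1 would have to report).
blocks_4k = rescaled_bytes // 4096
print(blocks_4k, "fits in Integer32:", fits_int32(blocks_4k))
```

The rescaled pair multiplies out to just under 16 TiB, while the real 4096-byte block count is about twice the Integer32 maximum, which is why the unscaled value cannot be reported correctly.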
Re: check_snmp_storage.pl Rollover
Posted: Wed Nov 28, 2018 4:37 am
by Deantwo
lmiltchev wrote: Deantwo wrote: I am not sure what you mean by editing the realStorageUnits 0, what is this for?
Here's the description of this option:
...
One thing I still don't understand about this option you are talking about. Is this an option that I need to set on the Nagios XI's server or the server it is connecting to via SNMP?
Re: check_snmp_storage.pl Rollover
Posted: Wed Nov 28, 2018 10:46 am
by lmiltchev
One thing I still don't understand about this option you are talking about. Is this an option that I need to set on the Nagios XI's server or the server it is connecting to via SNMP?
It has to be set on the remote server. However, if your remote server is a Windows box, this option most likely won't work as expected. For more info, see this:
https://serverfault.com/questions/43234 ... on-windows
Re: check_snmp_storage.pl Rollover
Posted: Thu Nov 29, 2018 2:28 am
by Deantwo
lmiltchev wrote:One thing I still don't understand about this option you are talking about. Is this an option that I need to set on the Nagios XI's server or the server it is connecting to via SNMP?
It has to be set on the remote server. However, if your remote server is a Windows box, this option most likely won't work as expected. For more info, see this:
https://serverfault.com/questions/43234 ... on-windows
It is indeed a Windows server, which is why I was confused about it. It didn't seem like it would make sense to change it on the Nagios XI server, but you hadn't clearly said where to do it.
I'll forward that to my server guy and see how that works first, then I'll try to update the plugin if that doesn't help.
I'll report back when done.