check_disk falsely reporting 0% free Nagios Users
Posted: Tue Jan 21, 2014 11:24 am
by salderman1
All,
I have a strange scenario on a Solaris 10 server using ZFS. The server has several ZFS pools configured, but only one of them is larger than 1TB. We've been monitoring this server for about a week without issue. This morning check_disk reported the 2TB filesystem as 0% available and started throwing false alarms. According to df, the filesystem is only 39% used.
We are using the following command:
# /opt/csw/libexec/nagios-plugins/check_disk -w 20% -c 10% -R "^/$|oracle$|^/u[0-9]+" -X tmpfs
DISK CRITICAL - free space: / 196670 MB (96% inode=99%); /oracle 954185 MB (76% inode=99%); /u01 92383 MB (92% inode=99%); /u02 199323 MB (49% inode=99%); /u04 11459 MB (57% inode=99%); /u05 133218 MB (44% inode=99%); /u06 0 MB (0% inode=99%); /u10 69086 MB (68% inode=99%);| /=6501MB;162537;182854;0;203172 /oracle=295166MB;999480;1124415;0;1249351 /u01=7910MB;80235;90264;0;100294 /u02=201819MB;320914;361028;0;401143 /u04=8573MB;16025;18028;0;20032 /u05=167159MB;240301;270339;0;300377 /u06=787725MB;1638601;1843426;0;2048252 /u10=31193MB;80224;90252;0;100280
# df -h /u06
Filesystem size used avail capacity Mounted on
zpool06/u06 2.0T 769G 1.2T 39% /u06
# /opt/csw/libexec/nagios-plugins/check_disk -vvv -u bytes /u06
calling stat on /u06
For /u06, used_pct=100 free_pct=0 used_units=8.2599e+11 free_units=0 total_units=2.14775e+12 used_inodes_pct=1 free_inodes_pct=99 fsp.fsu_blocksize=512 mult=1
Freespace_units result=0
Freespace% result=0
Usedspace_units result=0
Usedspace_percent result=0
Usedinodes_percent result=0
Freeinodes_percent result=0
DISK OK - free space: /u06 0 B (0% inode=99%);| /u06=2147483647B;;;0;2147483647
This seems like some kind of math problem: we have 2.14e12 bytes of capacity with 8.26e11 bytes used. My desktop calculator puts that at 38.5% used.
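For reference, the calculator check can be reproduced from the -vvv output. This is only a minimal sketch in Python: the variable names are illustrative, and the figures are copied from the plugin output above. It also notes that the perfdata value reported for /u06 equals 2^31 - 1, which may be a coincidence but could hint at a 32-bit limit somewhere; that is a guess, not a diagnosis.

```python
# Figures copied from the check_disk -vvv output for /u06.
used_units = 8.2599e11    # bytes used
total_units = 2.14775e12  # bytes total

used_pct = 100.0 * used_units / total_units
free_pct = 100.0 - used_pct
print(f"used: {used_pct:.1f}%  free: {free_pct:.1f}%")

# The perfdata line reported /u06=2147483647B with max 2147483647.
perf_value = 2147483647
int32_max = 2**31 - 1  # largest signed 32-bit integer
print("perfdata value equals INT32_MAX:", perf_value == int32_max)
```

The expected free percentage is therefore roughly 61.5%, not the 0% shown in the status text.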
Any thoughts? Thanks a bunch!
Re: check_disk falsely reporting 0% free Nagios Users
Posted: Tue Jan 21, 2014 5:36 pm
by sreinhardt
Interesting. It looks like the performance data is output correctly, but the standard message is not. I will look into this tonight, when I have several large disks to test against. If it does turn out to be a bug, I may ask you to submit a bug report on the github.com/nagios-plugins page, but I would like to do some testing first.
/u06 0 MB (0% inode=99%) | u06=787725MB;1638601;1843426;0;2048252
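To make the mismatch concrete, here is a rough Python sketch that parses the /u06 perfdata item from the first post and derives the free space from it. It assumes the usual Nagios perfdata layout of label=value[UOM];warn;crit;min;max; the function name parse_perfdata is hypothetical and is not part of any Nagios tool.

```python
import re

def parse_perfdata(item):
    """Parse one 'label=value[UOM];warn;crit;min;max' perfdata item."""
    label, _, rest = item.rpartition("=")
    fields = rest.split(";")
    match = re.fullmatch(r"(-?[0-9.]+)([A-Za-z%]*)", fields[0])
    if match is None:
        raise ValueError(f"unparseable perfdata value: {fields[0]!r}")
    value, uom = float(match.group(1)), match.group(2)
    # Trailing fields may be empty or missing; pad so unpacking is safe.
    extras = (fields[1:] + [""] * 4)[:4]
    warn, crit, lo, hi = (float(x) if x else None for x in extras)
    return {"label": label, "value": value, "uom": uom,
            "warn": warn, "crit": crit, "min": lo, "max": hi}

# Perfdata item for /u06 as printed in the first post (value = space used).
perf = parse_perfdata("/u06=787725MB;1638601;1843426;0;2048252")
free_mb = perf["max"] - perf["value"]
free_pct = 100.0 * free_mb / perf["max"]
print(f"{perf['label']}: {free_mb:.0f} MB free ({free_pct:.1f}%)")
```

By this reading the perfdata implies about 61.5% free, which agrees with df and disagrees with the "0 MB (0%)" in the status text.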
Re: check_disk falsely reporting 0% free Nagios Users
Posted: Wed Jan 22, 2014 9:01 am
by salderman1
sreinhardt wrote: Interesting. It looks like the performance data is output correctly, but the standard message is not. I will look into this tonight, when I have several large disks to test against. If it does turn out to be a bug, I may ask you to submit a bug report on the github.com/nagios-plugins page, but I would like to do some testing first.
/u06 0 MB (0% inode=99%) | u06=787725MB;1638601;1843426;0;2048252
Excellent, I'd be happy to. FWIW, I failed to mention the plugins version...
# pkginfo -l CSWnagios-plugins
PKGINST: CSWnagios-plugins
NAME: nagios_plugins - Plugins for Nagios
CATEGORY: application
ARCH: sparc
VERSION: 1.4.16,REV=2012.07.18
BASEDIR: /
VENDOR: http://downloads.sourceforge.net/nagiosplug/ packaged for CSW by Juergen Arndt
PSTAMP: ja@unstable10s-20120718235350
INSTDATE: Jan 16 2014 14:21
HOTLINE: http://www.opencsw.org/bugtrack/
EMAIL: [email protected]
STATUS: completely installed
FILES: 86 installed pathnames
1 shared pathnames
5 directories
55 executables
3 setuid/setgid executables
12082 blocks used (approx)
Re: check_disk falsely reporting 0% free Nagios Users
Posted: Wed Jan 22, 2014 4:26 pm
by sreinhardt
Oh, 1.4.2. I think this is already patched. I created a virtual 4TB disk last night and ran tests against it, as well as against a physical 2TB disk, and did not have issues with the current 1.5 code. I would highly suggest updating. abrist and I have made many improvements in the maint branch in the last few days, and the master branch should also contain the fix you are looking for.
http://github.com/nagios-plugins
Re: check_disk falsely reporting 0% free Nagios Users
Posted: Thu Jan 23, 2014 10:35 am
by salderman1
Unfortunately, it appears that OpenCSW does not provide the version you are referring to. I am running 1.4.16, not 1.4.2:
# /opt/csw/libexec/nagios-plugins/check_disk -V
check_disk v1.4.16 (nagios-plugins 1.4.16)
This seems to be the latest version of nagios_plugins in all of the package trees available through OpenCSW. Can you tell me which version resolved this issue? The check_disk.c code for 1.4.16 on GitHub appears not to have been touched in ~3 years, and the OpenCSW package was built in July 2012.
It looks like this fix for blocksize differences might be a possibility, but it seems to be tagged in both 1.5 and 1.4.16. We are running several zpools with a mix of blocksizes. The ZFS default is 128K, and the filesystem in this issue has that size, but other ZFS filesystems on the server use 8K blocksizes for Oracle DB files.
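As a rough illustration of why block counts might matter here, the Python sketch below converts the byte figures from the -vvv output into 512-byte blocks (the fsp.fsu_blocksize value it reported) and compares them with 32-bit limits. Whether any 32-bit quantity is actually involved in the 1.4.16 build on Solaris is an assumption on my part, not something established in this thread.

```python
BLOCK_SIZE = 512        # fsp.fsu_blocksize from the -vvv output
INT32_MAX = 2**31 - 1   # largest signed 32-bit integer
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit integer

total_bytes = 2.14775e12  # total_units from the -vvv output
used_bytes = 8.2599e11    # used_units from the -vvv output
free_bytes = total_bytes - used_bytes

# Convert each byte figure to a whole number of 512-byte blocks.
blocks = {
    name: int(nbytes // BLOCK_SIZE)
    for name, nbytes in [("total", total_bytes),
                         ("used", used_bytes),
                         ("free", free_bytes)]
}

for name, count in blocks.items():
    print(f"{name:5s} {count:>13,d} blocks  "
          f"> INT32_MAX: {count > INT32_MAX}  "
          f"> UINT32_MAX: {count > UINT32_MAX}")
```

On these numbers the total and free block counts are above the signed 32-bit maximum while the used count is below it, which would at least be consistent with the used figure surviving while the free figure does not.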
Thanks for looking into this for me.
Re: check_disk falsely reporting 0% free Nagios Users
Posted: Thu Jan 23, 2014 3:54 pm
by sreinhardt
My mistake on the version; I'm not sure where I got 1.4.2 from. Anyway, the news file states that this should have been resolved by the changes to plugins/check_disk.c and lib/disk_utils.c in 1.4.16, which would indicate that you have the fix. That does not seem to be the case. I will have to set up an OpenSolaris or OpenCSW system with some virtual disks in zpools to test it tonight. Also, just a note: any changes relevant to this would likely be in disk_utils.c rather than check_disk.c, as the former holds most of the functions for the calculations we need (at least from the brief look I gave it).
Re: check_disk falsely reporting 0% free Nagios Users
Posted: Thu Jan 23, 2014 4:34 pm
by salderman1
sreinhardt wrote: My mistake on the version; I'm not sure where I got 1.4.2 from. Anyway, the news file states that this should have been resolved by the changes to plugins/check_disk.c and lib/disk_utils.c in 1.4.16, which would indicate that you have the fix. That does not seem to be the case. I will have to set up an OpenSolaris or OpenCSW system with some virtual disks in zpools to test it tonight. Also, just a note: any changes relevant to this would likely be in disk_utils.c rather than check_disk.c, as the former holds most of the functions for the calculations we need (at least from the brief look I gave it).
Thanks, and thanks for the correction. It's only been twenty years since I studied Kernighan and Ritchie in college. Please forgive me.
Re: check_disk falsely reporting 0% free Nagios Users
Posted: Thu Jan 23, 2014 5:16 pm
by sreinhardt
Haha, no worries. Good old K&R, still nothing quite like it.
Re: check_disk falsely reporting 0% free Nagios Users
Posted: Thu Jan 30, 2014 1:16 pm
by salderman1
sreinhardt wrote:I will have to setup a opensolaris or opencsw system with some virtual disks in zpools to test it tonight.
Hi, I don't mean to be a pest, but I was curious if you had any luck on your testing.
Thanks!
Re: check_disk falsely reporting 0% free Nagios Users
Posted: Thu Jan 30, 2014 4:37 pm
by sreinhardt
Yes, I did get a chance to test some aspects of it. I created a ZFS pool with ~3TB of space but was unable to replicate the issue. I was testing on Linux rather than OpenCSW, though, and I suspect that is the difference. Once I finish getting that set up I will post back again. Feel free to bug me all you want; it's a good reminder.
