Hi
I have a weird issue with Nagios XI that looks like a false positive.
I wrote a small NRPE check that verifies the accessibility of a mountpoint.
I had to write a script because the mountpoints I'm checking are managed by autofs and not always mounted, and the Nagios mountpoint wizard fails to monitor them correctly (even with the write test); it always fails.
The only problem is that it works most of the time, but I always have around 50-70 checks that fail (and succeed if I retry manually or the next time Nagios checks them).
This creates A LOT of mail notifications, which makes notifications useless.
What am I doing wrong? The Nagios server is pretty strong (2 E5450 3GHz, 16GB RAM, RAID1 SAS enterprise disks), and I tried changing some of the values in nagios.cfg that I found on this forum, but nothing seems to change. The event queue seems stuck at 200 (the small dashlet graph).
I also verified that my NTP server is accurate.
Any help would be appreciated.
thanks
service fails but succeed if retried manually
Re: service fails but succeed if retried manually
Can you send us (via PM or post attachment) a system profile? From the Nagios XI GUI, you can gather a profile via Admin -> System Profile -> Download Profile.
Can you also tell us which host/service is producing these issues?
Former Nagios employee
https://www.mcapra.com/
Re: service fails but succeed if retried manually
Hi
The profile is attached.
All hosts produce this, at random, and it only affects the autofs mountpoints that I check with a script.
Here is the script, called check_mountpoints:
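Code:
#!/bin/bash
# Usage: check_mountpoints <path>
mount=${1}
fail=0

# List the directory in the background (this access also triggers the autofs mount).
ls "${mount}" >/dev/null &
childpid=$!

# Give ls 0.1 seconds; if it is still running after that, kill it and mark the check as failed.
sleep 0.1
if [ -d "/proc/${childpid}" ]; then
    kill -9 "$childpid" > /dev/null 2>&1
    fail=1
fi

# Report the result with the Nagios plugin exit codes (0 = OK, 2 = CRITICAL).
if [ $fail -eq 0 ]; then
    echo "OK - $mount accessible"
    exit 0
fi
if [ $fail -eq 1 ]; then
    echo "CRITICAL - $mount unreachable"
    exit 2
fi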
Re: service fails but succeed if retried manually
Let's look at gridcluster31 as an example:
Code:
[1491371254] SERVICE ALERT: gridcluster31;/homes/swlab;CRITICAL;SOFT;1;CRITICAL - /homes/swlab unreachable
[1491371320] SERVICE ALERT: gridcluster31;/mobileye/shared;CRITICAL;SOFT;1;CRITICAL - /mobileye/shared unreachable
[1491371388] SERVICE ALERT: gridcluster31;/mobileye/mbkrepository;CRITICAL;SOFT;2;CRITICAL - /mobileye/mbkrepository unreachable
[1491371610] SERVICE ALERT: gridcluster31;/homes/swlab;CRITICAL;SOFT;1;CRITICAL - /homes/swlab unreachable
In this case, Nagios XI is simply returning what the plugin produces. If the plugin is reporting incorrectly, there's not much that can be done from the Nagios XI end of things. You would need to alter the plugin.
Former Nagios employee
https://www.mcapra.com/
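To see which services produce these alerts most often, and whether they stay in the SOFT state, the SERVICE ALERT lines can be tallied per host, service and state type. The snippet below is only a rough sketch: the function name summarize_alerts is made up for illustration, and the log path in the usage note is the usual Nagios XI default, which may differ on your install.

```shell
#!/bin/bash
# Sketch: count SERVICE ALERT entries per host;service;state-type.
# summarize_alerts is a hypothetical helper name, not a Nagios tool.

summarize_alerts() {
    # Line format: [epoch] SERVICE ALERT: host;service;state;type;attempt;output
    awk -F';' '/SERVICE ALERT:/ {
        # Strip the timestamp and the "SERVICE ALERT: " label so that
        # field 1 holds only the host name.
        sub(/^.*SERVICE ALERT: /, "", $1)
        count[$1 ";" $2 ";" $4]++
    }
    END {
        for (key in count) print count[key], key
    }' "$@" | LC_ALL=C sort -k2
}
```

Usage, assuming the default log location: summarize_alerts /usr/local/nagios/var/nagios.log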
Re: service fails but succeed if retried manually
Yes, but that's where it doesn't really make sense: when I run the plugin from the client it succeeds, and if I manually force a recheck in Nagios it also succeeds. If Nagios retries by itself, it usually succeeds too.
The plugin is quite simple: it runs ls on an autofs directory and returns 0 or 2.
Re: service fails but succeed if retried manually
I solved my issue by changing my script like this:
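Code:
#!/bin/bash
# Usage: check_mountpoints <path>
mount=${1}

# Run ls in the foreground and wait for it to finish; its exit status decides the result.
if ls "${mount}" > /dev/null 2>&1; then
    echo "OK - $mount accessible"
    exit 0
else
    echo "CRITICAL - $mount unreachable"
    exit 2
fi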
I still think there's an issue here with Nagios, but I worked around it.
Also, please note that Nagios can't check autofs mountpoints, because it always finds that the mountpoint is not mounted, and even the write test (from NRPE) doesn't help.
Thanks, you can close this.
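A note on the two versions of the script: the first one gave ls only 0.1 seconds, which may be too short when autofs has to mount the filesystem on first access, while the second one waits for ls with no upper bound of its own. A possible middle ground is a longer, explicit deadline using timeout from GNU coreutils. This is a minimal sketch, not a tested replacement: the function name check_mount and the CHECK_TIMEOUT variable are made up for illustration, and the 10 second default is an assumption that should stay below the NRPE command timeout.

```shell
#!/bin/bash
# Sketch: list a (possibly autofs-managed) path with an explicit deadline.
# check_mount and CHECK_TIMEOUT are hypothetical names, not Nagios/NRPE settings.

check_mount() {
    local mount="$1"
    local limit="${CHECK_TIMEOUT:-10}"   # seconds; assumed default

    if [ -z "$mount" ]; then
        echo "UNKNOWN - usage: check_mountpoints <path>"
        return 3
    fi

    # timeout sends TERM after $limit seconds, then KILL one second later.
    timeout -k 1 "$limit" ls -- "$mount" > /dev/null 2>&1
    local rc=$?

    if [ "$rc" -eq 0 ]; then
        echo "OK - $mount accessible"
        return 0
    elif [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
        # 124: deadline reached; 137: ls had to be killed with KILL.
        echo "CRITICAL - $mount timed out after ${limit}s"
        return 2
    else
        echo "CRITICAL - $mount unreachable"
        return 2
    fi
}
```

To use it as a plugin, the script would end with: check_mount "$1"; exit $?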