Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 9:16 am
by vAJ
Honestly, I think NagiosXI has a hidden function that says, "Hey, if you're going to let half of your systems be down for this long, I'm going to go walk the dog. Bye."
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 9:45 am
by jdalrymple
The reason service checks continue by default when a host is down is simple - it offers greater flexibility in monitoring. Making the default behavior stop service checks immediately whenever a host doesn't respond to a ping would be more rigid, and quite often wrong. Note the screenshot - there are 2.5 useful pieces of information there:
1) The Windows firewall is blocking ICMP (default for Windows)
2) My disks are OK
2.5) That computer is PROBABLY doing its job and I can go back to bed
If the default behavior for service checks on a down host were different, there would be fewer useful pieces of information there:
1) Something is wrong and I need to wake up and fix it.
Should there be an option to change that behavior? Probably.
Others agree, although it's hard for me to say why the developers haven't focused on it yet. Maybe it's a massive change? It's no doubt part of the Core code, so maybe they're hoping a community member can tackle it while they work on XI stuff?
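To spell out the reasoning above, here is a toy sketch of how the two results combine. It is not Nagios code; the function name `interpret` and the state strings are made up for illustration.

```shell
# Toy illustration only: combine a host check result with a service check result.
interpret() {
    host_state=$1      # UP or DOWN, from the host check (e.g. a ping)
    service_state=$2   # OK or CRITICAL, from a service check (e.g. disk)
    if [ "$host_state" = "UP" ] && [ "$service_state" = "OK" ]; then
        echo "all good"
    elif [ "$host_state" = "DOWN" ] && [ "$service_state" = "OK" ]; then
        # The screenshot case: ping fails, but the disk check still answers.
        echo "host check is suspect (ICMP blocked?), machine is probably fine"
    else
        echo "something is wrong, go look"
    fi
}

interpret DOWN OK
interpret DOWN CRITICAL
```

If service checks stopped as soon as the host went DOWN, the second argument would never be available and only the last branch could fire.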

Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 9:49 am
by BanditBBS
jdalrymple,
That's the same argument the other person made (I think Trevor) months back. My reply to that is: if the stuff is set up correctly, then the firewall isn't going to start blocking all of a sudden, and if it does, then something changed and it needs to be fixed. So for me, that would never be an issue or a reason to keep monitoring services... just my opinion, but I think quite a few in here agree.
Regardless, still need my other question resolved, so come on Trevor, scan those logs :)

As for the still-monitoring issue/feature... I don't know where to go from here with the conversation...
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 10:04 am
by tmcdonald
BanditBBS wrote: Regardless, still need my other question resolved, so come on Trevor, scan those logs :)
The logs said something about not being able to fork, so I assume that means you need to do dishes? Or just increase limits a bit:
http://stackoverflow.com/questions/2064 ... le-to-fork
Let's get some ulimit -a action going on.
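Keep in mind that ulimit -a reports the limits of the shell it is run in, which are not necessarily the limits of a daemon that is already running. A minimal sketch for checking a specific process, assuming a Linux box with /proc mounted (the helper name `show_limits` is made up here):

```shell
# Print the process and open-file limits that apply to a given PID.
# Linux-only: relies on /proc/<pid>/limits.
show_limits() {
    pid=$1
    grep -E '^Max (processes|open files)' "/proc/$pid/limits"
}

# Example: inspect the current shell. For the monitoring daemon you would pass
# its PID instead, e.g. one found with `pgrep -o nagios` (the process name
# "nagios" is an assumption about this install).
show_limits "$$"
```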
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 10:12 am
by BanditBBS
Code:
[root@iss-chi-nag05 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 515266
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 515266
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 10:19 am
by jdalrymple
BanditBBS wrote: if the stuff is setup correctly
So yeah - if things are set up correctly, I agree. That's not always how life goes, especially when you're not using check_icmp as your host check command. Take my example above - things obviously aren't set up properly there, and because the services are still reporting in, that fact is illuminated rather than masked by Nagios misreporting services that are actually up.
Anyways, I'm beating #2 to death. Point is I agree that there should be an option, however I'd never condone changing the default behavior.
I'll leave #1 to Trevor.
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 10:39 am
by BanditBBS
For those of you following this thread for the service check stuff: I opened a feature request for Core back in January. Go +1 it to make your voices heard -
http://tracker.nagios.org/view.php?id=666
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 11:55 am
by WillemDH
+1'd your feature request Bandit. Grtz
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 2:08 pm
by snapon_admin
I don't have an account on that tracker site and can't seem to get the confirmation email sent to me. If it ever arrives and I can activate my account, I'll +1 that as well.
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 2:11 pm
by tmcdonald
@Bandit - Your max user processes (515266) looks plenty high, but your open files limit (1024) might need to be doubled:
http://stackoverflow.com/questions/3458 ... t-in-linux
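A rough sketch of what "doubled" would look like, assuming a POSIX-style shell. The helper name `doubled_limit` is made up, and the actual ulimit change is left commented out because it only affects the current shell and its children:

```shell
# Print double the given open-files limit; "unlimited" is passed through unchanged.
doubled_limit() {
    case "$1" in
        unlimited) echo unlimited ;;
        *) echo $(( $1 * 2 )) ;;
    esac
}

# Show the current soft limit next to the doubled value.
current=$(ulimit -Sn)
target=$(doubled_limit "$current")
echo "open files: current=$current suggested=$target"

# The soft limit can only be raised up to the hard limit (see: ulimit -Hn).
# ulimit -Sn "$target"
```

How to make the new value stick for the Nagios daemon depends on how it is started on this box, which the thread doesn't say.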