Server issues when multiple hosts were down

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: Server issues when multiple hosts were down

Post by vAJ »

Honestly, I think Nagios XI has a hidden function that says, "Hey, if you're going to let half of your systems be down for this long, I'm going to go walk the dog. Bye."
Andrew J. - Do you even grok?
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Server issues when multiple hosts were down

Post by jdalrymple »

The reason that service checks continue by default when hosts are down is simple - it offers greater flexibility in monitoring. Making the default an immediate stop to service checks whenever a host doesn't respond to a ping would be more rigid, and quite often wrong. Note the screenshot - there are 2.5 useful pieces of information there:

1) The Windows firewall is blocking ICMP (default for Windows)
2) My disks are OK
2.5) That computer is PROBABLY doing its job and I can go back to bed

If service checks stopped by default on a down host, there would be fewer useful pieces of information there:

1) Something is wrong and I need to wake up and fix it.

Should the option to change that behavior be there? Probably. Others agree, although it's hard for me to say why the developers haven't focused on it yet. Maybe it's a massive change? It's no doubt part of the Core code, so maybe they're hoping a community member can tackle it while they work on XI stuff? :)
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Server issues when multiple hosts were down

Post by BanditBBS »

jdalrymple,

That's the same argument the other person made (I think Trevor) months back. My reply to that is, if the stuff is setup correctly then the firewall isn't going to block all of a sudden, and if it does, then something changed and it needs to be fixed. So for me, that would never be an issue or a reason to keep monitoring services... just my opinion, but I think quite a few in here agree.

Regardless, still need my other question resolved, so come on Trevor, scan those logs :) As for the still-monitoring issue/feature... I don't know where to go from here with the conversation.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Server issues when multiple hosts were down

Post by tmcdonald »

BanditBBS wrote:Regardless, still need my other question resolved, so come on Trevor, scan those logs :)
They said something about not being able to fork, so I assume that means you need to do dishes? Or just increase limits a bit:

http://stackoverflow.com/questions/2064 ... le-to-fork

Let's get some ulimit -a action going on.
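For anyone who wants the two numbers that matter for a "cannot fork" error without reading the whole `ulimit -a` table, here is a minimal Python sketch using the standard `resource` module. The helper names `describe_limits` and `fmt` are made up for illustration. It reports the limits of whatever process runs it, so it only reflects the monitoring daemon's limits if run in the same user and session context; on Linux, `/proc/<pid>/limits` shows the values for an already running process.

```python
import resource


def describe_limits():
    """Return {label: (soft, hard)} for the limits most often tied to fork failures."""
    names = {
        "open files (-n)": resource.RLIMIT_NOFILE,
        "max user processes (-u)": resource.RLIMIT_NPROC,
    }
    return {label: resource.getrlimit(res) for label, res in names.items()}


def fmt(value):
    """Render a limit value the way ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for label, (soft, hard) in describe_limits().items():
        print(f"{label}: soft={fmt(soft)} hard={fmt(hard)}")
```

The soft value is what the process is actually held to; the hard value is the ceiling an unprivileged process can raise the soft value to.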
Former Nagios employee
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Server issues when multiple hosts were down

Post by BanditBBS »


[root@iss-chi-nag05 ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515266
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515266
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Server issues when multiple hosts were down

Post by jdalrymple »

BanditBBS wrote:if the stuff is setup correctly
So yeah - if they are set up correctly, I agree. That's not always life - especially when you're not using check_icmp for your host check command. Take my example above - stuff obviously isn't set up properly, and with the services still reporting in, that fact is brought to light rather than masked by Nagios hiding services that are actually up.

Anyways, I'm beating #2 to death. Point is, I agree that there should be an option; however, I'd never condone changing the default behavior.

I'll leave #1 to Trevor.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Server issues when multiple hosts were down

Post by BanditBBS »

For those of you following this thread for the service check stuff: I opened a feature request for Core back in January. Go +1 it to make your voices heard - http://tracker.nagios.org/view.php?id=666
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Server issues when multiple hosts were down

Post by WillemDH »

+1'd your feature request Bandit. Grtz
Nagios XI 5.8.1
https://outsideit.net
snapon_admin
Posts: 952
Joined: Mon Jun 10, 2013 10:39 am
Location: Kenosha, WI

Re: Server issues when multiple hosts were down

Post by snapon_admin »

I don't have an account on that tracker site and can't seem to get the confirmation email sent to me. If it ever arrives and I can activate my account, I'll +1 that as well.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Server issues when multiple hosts were down

Post by tmcdonald »

@Bandit - Your max user processes limit looks plenty high, but your open files limit (1024) might need to be doubled:

http://stackoverflow.com/questions/3458 ... t-in-linux
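As a rough illustration of what "doubling" looks like, here is a Python sketch that doubles the soft open-files limit for the current process, capped at the hard limit. The function names are hypothetical, and this only affects the process that runs it and its children. A lasting change for the monitoring user is normally made in system configuration such as `/etc/security/limits.conf`, which is outside what this sketch does.

```python
import resource


def doubled_soft_limit(soft, hard):
    """Return twice the soft limit, never exceeding the hard limit."""
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    target = soft * 2
    if hard != resource.RLIM_INFINITY and target > hard:
        target = hard
    return target


def raise_open_files():
    """Double the soft RLIMIT_NOFILE for this process and return the new value."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = doubled_soft_limit(soft, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    try:
        print("new soft open-files limit:", raise_open_files())
    except (ValueError, OSError) as exc:
        # Some platforms reject values the kernel considers too large.
        print("could not raise limit:", exc)
```

With the soft limit of 1024 shown earlier in the thread, the target would be 2048, provided the hard limit allows it.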