Server issues when multiple hosts were down
Re: Server issues when multiple hosts were down
Honestly, I think NagiosXI has a hidden function that says, "Hey, if you're going to let half of your systems be down for this long, I'm going to go walk the dog. Bye."
Andrew J. - Do you even grok?
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: Server issues when multiple hosts were down
The reason service checks continue by default when hosts are down is simple: it offers greater flexibility in monitoring. Making the default behavior stop service checks immediately when a host doesn't respond to a ping would be more rigid and quite often wrong. Note the screenshot - there are 2.5 useful pieces of information there:
1) The Windows firewall is blocking ICMP (default for Windows)
2) My disks are OK
2.5) That computer is PROBABLY doing its job and I can go back to bed
If the default behavior of service checks on a down host were different, there would be a different, smaller set of useful information there:
1) Something is wrong and I need to wake up and fix it.
Should the option to change that behavior be there? Probably. Others agree, although it's hard for me to say why the developers haven't focused on it yet. Maybe it's a massive change? It's no doubt part of the Core code, so maybe they're hoping a community member can tackle it while they work on XI stuff?
Re: Server issues when multiple hosts were down
jdalrymple,
That's the same argument the other person made (I think Trevor) months back. My reply to that is, if the stuff is set up correctly then the firewall isn't going to block all of a sudden, and if it does, then something changed and it needs to be fixed. So for me, that would never be an issue or a reason to keep monitoring services... just my opinion, but I think quite a few in here agree.
Regardless, still need my other question resolved, so come on Trevor, scan those logs
As for the still monitoring issue/feature....I don't know where to go from here with the conversation....
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Re: Server issues when multiple hosts were down
BanditBBS wrote:Regardless, still need my other question resolved, so come on Trevor, scan those logs :)
They said something about not being able to fork, so I assume that means you need to do dishes? Or just increase limits a bit:
http://stackoverflow.com/questions/2064 ... le-to-fork
Let's get some ulimit -a action going on.
Former Nagios employee
Re: Server issues when multiple hosts were down
Code:
[root@iss-chi-nag05 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 515266
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 515266
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: Server issues when multiple hosts were down
BanditBBS wrote:if the stuff is setup correctly
So yeah - if they are set up correctly I agree. That's not always life - especially when you're not using check_icmp for your host check command. Take my example above - stuff obviously isn't set up properly, and with the services still reporting in that fact is illuminated rather than masked by Nagios lying about up services.
Anyways, I'm beating #2 to death. Point is I agree that there should be an option, however I'd never condone changing the default behavior.
I'll leave #1 to Trevor.
Re: Server issues when multiple hosts were down
Those of you following this thread for the service check stuff, I had opened a feature request back in January for Core...go +1 it to make your voices heard - http://tracker.nagios.org/view.php?id=666
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Re: Server issues when multiple hosts were down
+1'd your feature request Bandit. Grtz
Nagios XI 5.8.1
https://outsideit.net
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
Re: Server issues when multiple hosts were down
Don't have an account on that tracker site and can't seem to get the confirmation email sent to me. If it ever arrives and I can activate my account, I'll +1 that as well.
Re: Server issues when multiple hosts were down
@Bandit - Your max user processes limit seems plenty high, but your open files limit might need to be doubled:
http://stackoverflow.com/questions/3458 ... t-in-linux
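For anyone following along, here is a rough sketch of what "doubling open files" could look like from a shell. It only reads the soft and hard limits with ulimit and tries to double the soft limit for the current shell, capped at the hard limit. The function name double_nofile is made up for illustration, and this does not touch an already-running Nagios process.

```shell
# Sketch only: double the soft open-files limit for THIS shell session.
# double_nofile is a hypothetical helper name, not part of Nagios or XI.
double_nofile() {
    soft=$(ulimit -Sn)   # current soft limit (1024 in the output above)
    hard=$(ulimit -Hn)   # hard ceiling the soft limit cannot exceed
    echo "soft=$soft hard=$hard"

    # Nothing to raise if the soft limit is already unlimited.
    if [ "$soft" = "unlimited" ]; then
        return 0
    fi

    target=$((soft * 2))
    # A non-root user cannot go past the hard limit, so cap the target there.
    if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
        target=$hard
    fi

    if ulimit -Sn "$target"; then
        echo "new soft=$(ulimit -Sn)"
    else
        echo "could not raise soft limit to $target"
    fi
}

double_nofile
```

To make a change stick across logins, the usual place is /etc/security/limits.conf, with lines along the lines of "nagios soft nofile 2048" and "nagios hard nofile 2048". The account name nagios is an assumption here, and depending on how the daemon is started, those settings may not apply until the service is restarted from a fresh session.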
Former Nagios employee