Server issues when multiple hosts were down

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Server issues when multiple hosts were down

Post by BanditBBS »

tmcdonald wrote: @Bandit - Your max user processes looks high enough, but your open files limit might need to be doubled:

http://stackoverflow.com/questions/3458 ... t-in-linux
I hate when you're smarter than me! I'm pretty confident this was my issue. I got the open files limit increased to 4096 across the board. Still trying to increase nproc. What I pasted was run as root; the nagios user is only set to 1024. I'm trying to get it to pick up the new values without having to reboot.

Thanks
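
For anyone checking whether new limits actually reached the running daemon: a process generally keeps the limits it was started with, so the values that matter are the ones in /proc/<pid>/limits for the nagios process, not what root's shell reports. A minimal Python sketch, assuming a Linux /proc filesystem; `parse_limits`, `read_limits`, and the sample text are illustrative names and data, not something taken from this thread:

```python
import re


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines():
        # Columns are padded with runs of spaces; names contain single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3 and fields[0] != "Limit":  # skip the header row
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def read_limits(pid):
    """Read the effective limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_limits(fh.read())


# Made-up sample in the layout the kernel uses, for illustration only.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             1024                 30000                processes
Max open files            1024                 4096                 files
"""

if __name__ == "__main__":
    for name, (soft, hard) in parse_limits(SAMPLE).items():
        print(f"{name}: soft={soft} hard={hard}")
```

Calling `read_limits()` with the PID of the nagios daemon before and after a restart should show whether "Max open files" and "Max processes" picked up the new values.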
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Post by tmcdonald »

BanditBBS wrote:I hate when you're smarter than me!
I am going to have so many T-shirts this year!

And it's not that I was smarter, it's that I hit my head against the issue at a different angle and happened to have a solution-shaped bruise.

Lemme know how it goes. Usually /etc/security/limits.conf is what you mod to make it permanent.
Former Nagios employee
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Post by BanditBBS »

For anyone else that may find this thread, nproc is also set in /etc/security/limits.d/90-nproc.conf on RHEL 6.x
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Post by abrist »

BanditBBS wrote: Still trying to increase nproc.
Check:

/etc/security/limits.d/90-nproc.conf
(as well as limits.conf)
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Post by Fred Kroeger »

You've got my +1 for #2! If the host check is configured correctly (checking ICMP, HTTP, NRPE, etc.), then a Host Down means the host can't be contacted. If the service checks can then be suppressed, that will reduce the load on the Nagios server and the resulting notification storm for the multiple services for each host.

Regards... Fred
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Post by jdalrymple »

Fred Kroeger wrote:and the resulting notification storm for the multiple services for each host
It won't reduce the additional load (presuming you mean from service check timeouts) but it's fairly trivial to change your notification command so that it doesn't notify if the host is down.
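
One way such a change could look, as a rough sketch rather than a tested recipe: put a small wrapper in front of the existing service notification command and pass it the host state. This assumes the standard `$HOSTSTATE$` macro is handed in as the first argument; the script name `notify_if_host_up.py` and the function names are hypothetical.

```python
import subprocess
import sys


def should_notify(host_state):
    """Return True only when Nagios reports the host as UP."""
    return host_state.strip().upper() == "UP"


def main(argv):
    # argv[1]: value of $HOSTSTATE$ (UP, DOWN or UNREACHABLE)
    # argv[2:]: the real notification command and its arguments
    if len(argv) < 3:
        print("usage: notify_if_host_up.py <hoststate> <command> [args...]",
              file=sys.stderr)
        return 2
    if not should_notify(argv[1]):
        return 0  # host is not UP: drop the service notification quietly
    return subprocess.call(argv[2:])


# Only act as a script when arguments were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

In the command definition the call might look like `command_line /usr/local/bin/notify_if_host_up.py "$HOSTSTATE$" /path/to/existing/notify-command ...`, where both paths are placeholders. This only silences the notifications; the service checks themselves still run and can still time out.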
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Post by Fred Kroeger »

What trivial change do I make to the notification command to stop service alerts when a host is down?

Unfortunately this doesn't help those of us who use event handlers, which is why it would be more efficient to suppress the service checks in the first place.

regards... Fred
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

If host/service checks are configured well, Nagios should not alert you about services when the host is down. Sometimes this may not work as intended, depending on the check and retry intervals, max check attempts, etc. If the service is checked first (before Nagios finds out that the host is down), you will receive a service notification. Some users have tried custom wrapper scripts that ping the host and execute the service check only if the host is up.
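
A rough sketch of what such a wrapper could look like in Python. The `ping` flags assume Linux iputils, and returning UNKNOWN (exit code 3) when the host does not answer is just one possible choice, not something prescribed here; the function names are hypothetical.

```python
import subprocess
import sys

UNKNOWN = 3  # standard plugin exit code for UNKNOWN


def host_is_up(address):
    """Send one ICMP echo with a 2 second timeout (Linux iputils flags)."""
    result = subprocess.call(
        ["ping", "-c", "1", "-W", "2", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result == 0


def run_check(address, check_cmd, is_up=host_is_up):
    """Run check_cmd only if the host answers; otherwise report UNKNOWN."""
    if not is_up(address):
        # Assumption: skipped checks are reported as UNKNOWN, not CRITICAL.
        print(f"UNKNOWN - {address} not answering ping, check skipped")
        return UNKNOWN
    # Pass the real plugin's output and exit code straight through.
    return subprocess.call(check_cmd)


# Usage: wrapper.py <host address> <plugin> [plugin args...]
if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(run_check(sys.argv[1], sys.argv[2:]))
```

Note that the extra ping adds work to every service check, and a change to UNKNOWN is still a state change, so it may not quiet event handlers on its own.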
Be sure to check out our Knowledgebase for helpful articles and solutions!
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Post by Fred Kroeger »

The problem for us is that because the service checks are still active for a Host that is Down, the event handler is still sending out all the service events.
So we have to start creating these wrapper scripts to work around this issue. If a Host check is configured correctly and is showing down, then it logically follows that all the service checks will also fail.
The option to make service checks dependent on a Host being Up will alleviate the need for a lot of us to create custom wrapper scripts.

regards... Fred
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Post by BanditBBS »

Fred Kroeger wrote:The problem for us is that because the service checks are still active for a Host that is Down, the event handler is still sending out all the service events.
So we have to start creating these wrapper scripts to work around this issue. If a Host check is configured correctly and is showing down, then it logically follows that all the service checks will also fail.
The option to make service checks dependent on a Host being Up will alleviate the need for a lot of us to create custom wrapper scripts.

regards... Fred
Couldn't have said it better myself.