Server issues when multiple hosts were down

Post by **BanditBBS** » Tue May 19, 2015 3:01 pm

tmcdonald wrote: @Bandit - Your max user processes limit seems high enough, but your open files limit might need to be doubled:

http://stackoverflow.com/questions/3458 ... t-in-linux

I hate when you're smarter than me! I'm pretty confident this was my issue. I got the open files limit increased to 4096 across the board. Still trying to increase the nproc. What I pasted was for root; nagios is only set to 1024. Trying to get it to use the new values without having to reboot.

Thanks
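One way to confirm whether the running Nagios daemon has actually picked up new values (rather than only a fresh login shell) is to ask the kernel for that process's limits. A minimal sketch, assuming Linux and Python 3.6+ (for `resource.prlimit()` and f-strings); the 4096 target is the value from this post, and reading another user's process usually requires root.

```python
import os
import resource
import sys

# Value from this thread; change it to whatever you set in limits.conf.
TARGET = 4096


def effective_limits(pid):
    """Return the (soft, hard) open-files and process limits of a running pid."""
    return {
        "nofile": resource.prlimit(pid, resource.RLIMIT_NOFILE),
        "nproc": resource.prlimit(pid, resource.RLIMIT_NPROC),
    }


def meets(value, target):
    """True if a limit is unlimited or at least `target`."""
    return value == resource.RLIM_INFINITY or value >= target


if __name__ == "__main__":
    # Pass the Nagios daemon's PID; with no argument, inspect this script itself.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for name, (soft, hard) in effective_limits(pid).items():
        verdict = "ok" if meets(soft, TARGET) else "below target"
        print(f"pid {pid} {name}: soft={soft} hard={hard} ({verdict})")
```

For example, run it as root with the PID reported by `pidof -s nagios`. If the soft values are still 1024, the daemon was started before the change and needs to be restarted from a session that has the new limits.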

Post by **tmcdonald** » Tue May 19, 2015 3:09 pm

BanditBBS wrote: I hate when you're smarter than me!

I am going to have so many T-shirts this year!

And it's not that I was smarter, it's that I hit my head against the issue at a different angle and happened to have a solution-shaped bruise.

Lemme know how it goes. Usually /etc/security/limits.conf is what you mod to make it permanent.

Post by **BanditBBS** » Tue May 19, 2015 3:12 pm

For anyone else who may find this thread: on RHEL 6.x, nproc is also set in /etc/security/limits.d/90-nproc.conf.

Post by **abrist** » Tue May 19, 2015 3:13 pm

BanditBBS wrote: Still trying to increase the nproc.

Check:

/etc/security/limits.d/90-nproc.conf

(as well as limits.conf)
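Because limits can be set in more than one file, a quick way to see every `nproc`/`nofile` line that might apply is to scan `limits.conf` and then `limits.d/*.conf`. This is only a sketch: it lists entries in file-name order and deliberately does not try to reproduce pam_limits' precedence rules between wildcard, group and user entries.

```python
import glob
import os


def limit_files(base="/etc/security"):
    """limits.conf first, then limits.d/*.conf sorted by file name."""
    files = [os.path.join(base, "limits.conf")]
    files += sorted(glob.glob(os.path.join(base, "limits.d", "*.conf")))
    return [f for f in files if os.path.isfile(f)]


def relevant_entries(paths, items=("nproc", "nofile")):
    """Yield (path, lineno, domain, type, item, value) for matching lines."""
    for path in paths:
        with open(path) as fh:
            for lineno, line in enumerate(fh, 1):
                line = line.split("#", 1)[0].strip()  # drop comments
                fields = line.split()
                # limits.conf lines have the form: <domain> <type> <item> <value>
                if len(fields) == 4 and fields[2] in items:
                    yield (path, lineno, *fields)


if __name__ == "__main__":
    for path, lineno, domain, kind, item, value in relevant_entries(limit_files()):
        print(f"{path}:{lineno}: {domain} {kind} {item} {value}")
```

On a stock RHEL 6 box this should surface the `* soft nproc 1024` line in 90-nproc.conf next to whatever was added to limits.conf, which makes the conflicting entry easy to spot.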

Post by **Fred Kroeger** » Tue May 19, 2015 6:59 pm

You've got my +1 for #2! If the host check is configured correctly (checking ICMP/HTTP/NRPE, etc.), then a Host Down state indicates that the host can't be contacted. If the service checks could then be suppressed, that would reduce both the load on the Nagios server and the resulting notification storm for the multiple services for each host.

Regards... Fred

Post by **jdalrymple** » Wed May 20, 2015 9:43 am

Fred Kroeger wrote: and the resulting notification storm for the multiple services for each host

It won't reduce the additional load (presuming you mean from service check timeouts), but it's fairly trivial to change your notification command so that it doesn't notify if the host is down.
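A minimal sketch of that kind of notification-command change, assuming the original service notification command is wrapped so that Nagios passes the `$HOSTSTATE$` macro (UP, DOWN or UNREACHABLE) as the first argument, followed by the original command line. The script name `notify_if_host_up.py`, the command name and the paths are hypothetical. It only helps if Nagios has already registered the host as down when the service notification goes out, and it does nothing for event handlers.

```python
"""Hypothetical wrapper for a service notification command.

Example command definition (names and paths are illustrative):

    define command {
        command_name  notify-service-if-host-up
        command_line  /usr/local/bin/notify_if_host_up.py "$HOSTSTATE$" /path/to/original/notify-command <its usual arguments>
    }
"""
import subprocess
import sys


def should_notify(host_state):
    """Only let service notifications through while the host is UP."""
    return host_state.strip().upper() == "UP"


def main(argv):
    if len(argv) < 3:
        print("usage: notify_if_host_up.py HOSTSTATE COMMAND [ARGS...]", file=sys.stderr)
        return 2
    host_state, command = argv[1], argv[2:]
    if not should_notify(host_state):
        # Host is DOWN or UNREACHABLE: rely on the host notification instead.
        return 0
    # Host is UP: hand off to the original notification command unchanged.
    return subprocess.call(command)


# Only act when called with arguments, as Nagios does.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```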

Post by **Fred Kroeger** » Wed May 20, 2015 7:49 pm

What trivial change do I make to the notification command to stop service alerts when a host is down?

Unfortunately, this doesn't help those of us who use event handlers, which is why it would be more efficient to suppress the service checks in the first place.

regards... Fred

Post by **lmiltchev** » Thu May 21, 2015 3:20 pm

If host and service checks are configured well, Nagios should not alert you about a host's services when that host is down. Sometimes this may not work as intended, depending on the check and retry intervals, max check attempts, etc. If the service is checked first (before Nagios finds out that the host is down), you will receive a service notification. Some users have tried using custom wrapper scripts that ping the host and execute the service check only if the host is up.
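The wrapper-script approach can be sketched as follows. Assumptions: Linux iputils `ping` (`-c` count, `-W` timeout in seconds) and the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The script itself is hypothetical, and reporting UNKNOWN when the host does not answer is an arbitrary choice for illustration: UNKNOWN is still a state change, so it can still trigger notifications or event handlers if those are enabled for it.

```python
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
# Exit code reported when the host is down and the real check is skipped.
# UNKNOWN is an illustrative choice; Nagios still treats it as a state.
SKIPPED = UNKNOWN


def host_answers_ping(host, timeout=2):
    """True if one ICMP echo to `host` is answered within `timeout` seconds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:  # ping binary missing: treat the host as not answering
        return False
    return result.returncode == 0


def guarded_check(host, check_cmd, is_up=host_answers_ping):
    """Run `check_cmd` only if `host` is up; return (exit_code, output)."""
    if not is_up(host):
        return SKIPPED, f"UNKNOWN - {host} did not answer ping; service check skipped"
    try:
        proc = subprocess.run(
            check_cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError as exc:  # the wrapped plugin could not be started
        return UNKNOWN, f"UNKNOWN - could not run {check_cmd[0]}: {exc}"
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: guarded_check.py HOST PLUGIN [PLUGIN_ARGS...]
    code, output = guarded_check(sys.argv[1], sys.argv[2:])
    print(output)
    sys.exit(code)
```

The `is_up` parameter keeps the reachability test swappable, e.g. for a TCP connect instead of ICMP where ping is filtered.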

Post by **Fred Kroeger** » Wed May 27, 2015 7:19 pm

The problem for us is that because the service checks are still active for a Host that is Down, the event handler is still sending out all the service events.
So we have to start creating these wrapper scripts to work around this issue. If a Host check is configured correctly and is showing down, then it logically follows that all the service checks will also fail.
The option to make service checks dependent on a Host being Up will alleviate the need for a lot of us to create custom wrapper scripts.

regards... Fred

Post by **BanditBBS** » Wed May 27, 2015 7:22 pm

Fred Kroeger wrote: The problem for us is that because the service checks are still active for a Host that is Down, the event handler is still sending out all the service events.
So we have to start creating these wrapper scripts to work around this issue. If a Host check is configured correctly and is showing down, then it logically follows that all the service checks will also fail.
The option to make service checks dependent on a Host being Up will alleviate the need for a lot of us to create custom wrapper scripts.

regards... Fred

Couldn't have said it better myself.

Nagios Support Forum
