Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 3:01 pm
by BanditBBS
I hate when you're smarter than me! I'm pretty confident this was my issue. I got the open files limit increased to 4096 across the board. Still trying to increase nproc. What I pasted was as root; nagios is only set to 1024. Trying to get it to use the new values without having to reboot.
Thanks
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 3:09 pm
by tmcdonald
BanditBBS wrote:I hate when you're smarter than me!
I am going to have so many T-shirts this year!
And it's not that I was smarter, it's that I hit my head against the issue at a different angle and happened to have a solution-shaped bruise.
Lemme know how it goes. Usually /etc/security/limits.conf is what you mod to make it permanent.
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 3:12 pm
by BanditBBS
For anyone else that may find this thread, nproc is also set in /etc/security/limits.d/90-nproc.conf on RHEL 6.x
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 3:13 pm
by abrist
BanditBBS wrote: Still trying to increase nproc.
Check:
/etc/security/limits.d/90-nproc.conf
(as well as limits.conf)
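To confirm which limits a given user's processes actually get after editing those files, a quick check along these lines may help. This is only a minimal sketch using Python's standard `resource` module; the helper names `describe_limit` and `current_limits` are made up for illustration and are not part of Nagios.

```python
import resource


def describe_limit(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return the (soft, hard) limits of this process for open files and process count."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),
    }


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={describe_limit(soft)} hard={describe_limit(hard)}")
```

Run it as the nagios user rather than as root, since the two can have different values. Keep in mind that a process that is already running keeps the limits it started with, so the daemon has to be restarted from a session that picked up the new values before they apply to it.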
Re: Server issues when multiple hosts were down
Posted: Tue May 19, 2015 6:59 pm
by Fred Kroeger
You've got my +1 for #2! If the host check is configured correctly (checking ICMP, HTTP, NRPE, etc.), then a Host Down indicates that the host can't be contacted. If the service checks can then be suppressed, that will reduce the load on the Nagios server and the resulting notification storm for the multiple services for each host.
Regards... Fred
Re: Server issues when multiple hosts were down
Posted: Wed May 20, 2015 9:43 am
by jdalrymple
Fred Kroeger wrote:and the resulting notification storm for the multiple services for each host
It won't reduce the additional load (presuming you mean from service check timeouts) but it's fairly trivial to change your notification command so that it doesn't notify if the host is down.
Re: Server issues when multiple hosts were down
Posted: Wed May 20, 2015 7:49 pm
by Fred Kroeger
What trivial change do I make to the notification command to stop service alerts when a host is down?
Unfortunately this doesn't help those of us who use event handlers, which is why it would be more efficient to suppress the service check in the first place.
regards... Fred
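One way the notification change mentioned above might look is a small guard placed in front of the existing notification command. This is a sketch under assumptions, not the method anyone in this thread uses: the script name `notify_if_host_up.py` is hypothetical, and it assumes the service notification command definition is edited to pass the `$HOSTSTATE$` macro as the first argument, followed by the original command.

```python
import subprocess
import sys


def should_notify(host_state):
    """Let service notifications through only when the host state is UP."""
    return host_state.strip().upper() == "UP"


def main(argv):
    # argv[1] is the host state (assumed to come from $HOSTSTATE$);
    # everything after it is the real notification command to run.
    if len(argv) < 3:
        print("usage: notify_if_host_up.py HOSTSTATE command [args...]",
              file=sys.stderr)
        return 2
    if not should_notify(argv[1]):
        # Host is DOWN or UNREACHABLE: skip the service notification quietly.
        return 0
    return subprocess.call(argv[2:])


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The same guard could in principle sit in front of an event handler command as well, though that is untested here, and it only hides the output; the service checks themselves still run.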
Re: Server issues when multiple hosts were down
Posted: Thu May 21, 2015 3:20 pm
by lmiltchev
If host and service checks are configured well, Nagios should not send service alerts while the host is down. Sometimes this does not work as intended, depending on the check and retry intervals, max check attempts, etc. If the service is checked first (before Nagios finds out that the host is down), you will receive a service notification. Some users have tried custom wrapper scripts that ping the host and execute the service check only if the host is up.
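A wrapper of the kind described might look roughly like the following. It is a minimal sketch, not a tested production script: the function names are hypothetical, the `ping` flags assume a Linux-style `ping` (`-c` for count, `-W` for timeout in seconds), and returning UNKNOWN when the host does not answer is just one possible choice.

```python
import subprocess
import sys

UNKNOWN = 3  # standard Nagios plugin exit code for UNKNOWN


def host_is_up(address, ping=None):
    """Return True if a single ping to the address succeeds."""
    if ping is None:
        # Assumes a Linux-style ping: one packet, two-second timeout.
        def ping(addr):
            return subprocess.call(
                ["ping", "-c", "1", "-W", "2", addr],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            ) == 0
    return ping(address)


def run_check(address, plugin_cmd, ping=None):
    """Run plugin_cmd only if the host answers; return (exit_code, output)."""
    if not host_is_up(address, ping):
        return UNKNOWN, f"Host {address} unreachable, check skipped"
    proc = subprocess.run(plugin_cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: check_if_up.py ADDRESS plugin [args...]")
        sys.exit(UNKNOWN)
    code, output = run_check(sys.argv[1], sys.argv[2:])
    print(output)
    sys.exit(code)
```

The command definition would then call the wrapper with the host address followed by the real plugin command line. Note that this adds one ping per service check, so it trades notification noise for extra check overhead.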
Re: Server issues when multiple hosts were down
Posted: Wed May 27, 2015 7:19 pm
by Fred Kroeger
The problem for us is that because the service checks are still active for a Host that is Down, the event handler is still sending out all the service events.
So we have to start creating these wrapper scripts to work around this issue. If a Host check is configured correctly and is showing down, then it logically follows that all the service checks will also fail.
The option to make service checks dependent on a Host being Up will alleviate the need for a lot of us to create custom wrapper scripts.
regards... Fred
Re: Server issues when multiple hosts were down
Posted: Wed May 27, 2015 7:22 pm
by BanditBBS
Fred Kroeger wrote:The problem for us is that because the service checks are still active for a Host that is Down, the event handler is still sending out all the service events.
So we have to start creating these wrapper scripts to work around this issue. If a Host check is configured correctly and is showing down, then it logically follows that all the service checks will also fail.
The option to make service checks dependent on a Host being Up will alleviate the need for a lot of us to create custom wrapper scripts.
regards... Fred
Couldn't have said it better myself.