I hate when you're smarter than me! I'm pretty confident this was my issue. I got the open files limit increased to 4096 across the board. Still trying to increase nproc. The output I pasted was from root; the nagios user is still only set to 1024. Trying to get it to pick up the new values without having to reboot.
Thanks
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
For anyone else that may find this thread, nproc is also set in /etc/security/limits.d/90-nproc.conf on RHEL 6.x
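To confirm which limits a running nagios process actually picked up (as opposed to what ulimit reports in a root shell), something like the following Python sketch may help. It assumes a Linux host where /proc/<pid>/limits is readable; the function names are made up for illustration and are not part of Nagios.

```python
import re
import sys

# One row of /proc/<pid>/limits: name, soft limit, hard limit, optional units.
LIMIT_LINE = re.compile(
    r"^(Max [a-z ]+?)\s+(unlimited|\d+)\s+(unlimited|\d+)(?:\s+(\S+))?\s*$"
)


def parse_proc_limits(text):
    """Return {limit name: (soft, hard)} parsed from /proc/<pid>/limits text."""
    limits = {}
    for line in text.splitlines():
        match = LIMIT_LINE.match(line)
        if match:
            name, soft, hard, _units = match.groups()
            limits[name] = (soft, hard)
    return limits


def limits_for_pid(pid):
    """Read and parse the limits of one process (pid may also be 'self')."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_proc_limits(handle.read())


if __name__ == "__main__":
    # With no argument, report on this Python process itself.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        limits = limits_for_pid(pid)
    except OSError as error:
        print(f"could not read limits for {pid}: {error}")
    else:
        for name in ("Max open files", "Max processes"):
            soft, hard = limits.get(name, ("?", "?"))
            print(f"{name}: soft={soft} hard={hard}")
```

Saved under a hypothetical name such as show_limits.py, it could be run as `python3 show_limits.py $(pgrep -o -x nagios)`. A process keeps the limits it was started with, so the new values normally only show up after the nagios service itself has been restarted.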
You've got my +1 for #2! If the host check is configured correctly (checking ICMP/HTTP/NRPE, etc.), then a Host Down indicates that the host can't be contacted. If the service checks can then be suppressed, that will reduce the load on the Nagios server and the resulting notification storm for the multiple services on each host.
Fred Kroeger wrote: and the resulting notification storm for the multiple services for each host
It won't reduce the additional load (presuming you mean from service check timeouts) but it's fairly trivial to change your notification command so that it doesn't notify if the host is down.
What trivial change do I make to the notification command to stop service alerts when a host is down?
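The thread never spells the change out, but one way to read the suggestion is to pass the $HOSTSTATE$ macro to the service notification command and skip the notification unless the host is UP. Below is a minimal Python sketch of such a wrapper. The script name, the argument layout, and the idea of wrapping an existing notification program are all assumptions for illustration, not something the previous poster described.

```python
import subprocess
import sys


def should_notify(host_state):
    """Only let service notifications through while the host itself is UP."""
    return host_state.strip().upper() == "UP"


def main(argv):
    # Assumed layout: argv[1] is the value of $HOSTSTATE$,
    # argv[2:] is the real notification program and its arguments.
    if len(argv) < 3:
        print("usage: notify_if_host_up.py HOSTSTATE command [args...]",
              file=sys.stderr)
        return 2
    host_state, command = argv[1], argv[2:]
    if not should_notify(host_state):
        return 0  # host is DOWN or UNREACHABLE: stay quiet
    return subprocess.run(command).returncode


# Only act when called with arguments, e.g. from a Nagios command definition.
if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(main(sys.argv))
```

In the command definition, the existing command_line would be prefixed with something like `/path/to/notify_if_host_up.py "$HOSTSTATE$"` (path hypothetical). Because this only filters notifications, the service checks still run and event handlers still fire.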
Unfortunately this doesn't help those of us who use event handlers, which is why it would be more efficient to suppress the service check in the first place.
If host/service checks are configured well, Nagios should not alert you about services when the host is down. Sometimes this may not work as intended, depending on the check and retry intervals, max check attempts, etc. If the service is checked first (before Nagios finds out that the host is down), you will receive a service notification. Some users have tried custom wrapper scripts that ping the host and only execute the service check if the host is up.
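A wrapper of the kind mentioned above might look roughly like this in Python. It is only a sketch: the ping flags assume Linux iputils, the function names are invented, and returning UNKNOWN (3) for an unreachable host is an arbitrary choice, since the thread does not say what such wrappers return.

```python
import subprocess
import sys

UNKNOWN = 3  # standard Nagios plugin exit code for UNKNOWN


def host_is_up(address):
    """Send one ICMP echo and wait up to 2 seconds (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_check(address, check_command, probe=host_is_up):
    """Run the real plugin only if the probe says the host answers."""
    if not probe(address):
        print(f"UNKNOWN - {address} did not answer ping, check skipped")
        return UNKNOWN
    result = subprocess.run(
        check_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # Pass the plugin's output and exit code straight back to Nagios.
    print(result.stdout, end="")
    return result.returncode


# Assumed usage: wrapper.py HOSTADDRESS plugin [plugin args...]
if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(run_check(sys.argv[1], sys.argv[2:]))
```

Note the trade-off: every service check now costs an extra ping, and a change to UNKNOWN is still a state change, so whether this quiets event handlers depends on how they treat that state.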
The problem for us is that because the service checks are still active for a Host that is Down, the event handler is still sending out all the service events.
So we have to start creating these wrapper scripts to work around this issue. If a Host check is configured correctly and is showing down, then it logically follows that all the service checks will also fail.
The option to make service checks dependent on a Host being Up will alleviate the need for a lot of us to create custom wrapper scripts.
Fred Kroeger wrote: The problem for us is that because the service checks are still active for a Host that is Down, the event handler is still sending out all the service events.
So we have to start creating these wrapper scripts to work around this issue. If a Host check is configured correctly and is showing down, then it logically follows that all the service checks will also fail.
The option to make service checks dependent on a Host being Up will alleviate the need for a lot of us to create custom wrapper scripts.
regards... Fred
Couldn't have said it better myself.