I see two realistic solutions to this issue, off the top of my head.
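1) Add freshness checks to some or all of your passive services, with a freshness threshold well past their normal submission interval, and set the check command to something like check_dummy!2!"No passive results returned in 1 hour. Check xinetd." (avoid a literal ! inside the message, since Nagios treats ! as the argument separator). When no passive result arrives within the threshold, Nagios runs that command itself, so you get a CRITICAL whose output says exactly what is failing.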
2) A second, similar option may handle this a bit better. Write a check that submits a passive result to NSCA (presuming that is how you receive passive results), sleeps for 10-30 seconds, then queries Nagios for that service via the web UI, JSON API, etc., and returns OK only if the passive result was received. If it was not, you can use this check to kick off a local event handler that restarts the xinetd service and resolves the issue immediately.
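A minimal sketch of option 2 in Python, under several assumptions: send_nsca lives at /usr/sbin/send_nsca with its config at /etc/nagios/send_nsca.cfg, you are on Nagios Core 4 and its statusjson.cgi is reachable from the script (add authentication if your CGIs require it), and there is a dedicated passive "canary" service to submit to. All paths, hosts, and the WAIT_SECONDS value are hypothetical; adjust them to your install. Rather than comparing timestamps, which clock skew between hosts can throw off, it submits a unique output string and returns OK only if Nagios now shows exactly that string as the service's plugin_output.

```python
#!/usr/bin/env python3
"""Sketch of an NSCA round-trip check (hypothetical names and paths)."""
import json
import subprocess
import sys
import time
import urllib.parse
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed locations; adjust for your install.
SEND_NSCA = "/usr/sbin/send_nsca"
SEND_NSCA_CFG = "/etc/nagios/send_nsca.cfg"
NSCA_SERVER = "127.0.0.1"
STATUS_JSON = "http://127.0.0.1/nagios/cgi-bin/statusjson.cgi"
WAIT_SECONDS = 20  # somewhere in the 10-30 second range


def nsca_payload(host, service, code, output):
    """One passive service result in send_nsca's tab-delimited input format."""
    return f"{host}\t{service}\t{code}\t{output}\n"


def submit(payload):
    """Pipe the result to send_nsca; raises on a missing binary or non-zero exit."""
    subprocess.run([SEND_NSCA, "-H", NSCA_SERVER, "-c", SEND_NSCA_CFG],
                   input=payload, text=True, check=True, timeout=30)


def fetch_status(host, service):
    """Ask the Nagios Core 4 JSON CGI (assumed) for one service's current state."""
    query = urllib.parse.urlencode({"query": "service", "hostname": host,
                                    "servicedescription": service})
    with urllib.request.urlopen(f"{STATUS_JSON}?{query}", timeout=30) as resp:
        return json.load(resp)


def verdict(status, token):
    """OK only if Nagios now shows the exact output this run submitted."""
    output = status.get("data", {}).get("service", {}).get("plugin_output", "")
    if output == token:
        return OK, "OK - passive result made it through NSCA"
    return CRITICAL, "CRITICAL - passive result not received; check NSCA/xinetd"


def run_check(host, service):
    token = f"NSCA canary {int(time.time())}"  # unique per run
    try:
        submit(nsca_payload(host, service, OK, token))
    except (OSError, subprocess.SubprocessError) as exc:
        return CRITICAL, f"CRITICAL - send_nsca failed: {exc}"
    time.sleep(WAIT_SECONDS)  # give Nagios time to process the result
    try:
        status = fetch_status(host, service)
    except (OSError, ValueError) as exc:  # network errors, bad JSON
        return UNKNOWN, f"UNKNOWN - could not read Nagios status: {exc}"
    return verdict(status, token)


# Runs only when given the canary's host name and service description.
if __name__ == "__main__" and len(sys.argv) == 3:
    code, message = run_check(sys.argv[1], sys.argv[2])
    print(message)
    sys.exit(code)
```

You would schedule this as an ordinary active check with the canary's host name and service description as its two arguments. A CRITICAL from it is then the natural trigger for an event handler on that active check that restarts xinetd, as described above.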
NSCA stops working sometimes
sreinhardt
Nagios-Plugins maintainer exclusively, though if you have other C language bugs in open-source Nagios projects, I am happy to help! Please PM me or use another channel to alert me to issues, as I no longer track the forum.