NSCA stops working sometimes

sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: NSCA stops working sometimes

Post by sreinhardt

I see two realistic solutions to this issue, off the top of my head.

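1) Add freshness checks to some or all of your passive services, with a freshness threshold well past their normal reporting interval. For the check command, use something like check_dummy!2!"No passive results returned in 1 hour. Check xinetd." (keep exclamation points out of the message itself, since Nagios treats ! as the argument separator in a check command). This way a stale service always goes critical, and the output tells you what is failing.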
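A minimal sketch of what option 1 could look like as a service definition. The host name, service description, and generic-service template are placeholders I made up, and I am assuming check_dummy is defined as a command that passes $ARG1$ (state) and $ARG2$ (message) to the check_dummy plugin, so verify that against your own config first.

```
# Assumed command definition; your install may already ship an equivalent one.
define command {
    command_name    check_dummy
    command_line    $USER1$/check_dummy $ARG1$ $ARG2$
}

define service {
    use                     generic-service             ; assumed template name
    host_name               example-host                ; hypothetical host
    service_description     Example Passive Service     ; hypothetical service
    active_checks_enabled   0                           ; results normally arrive via NSCA
    passive_checks_enabled  1
    check_freshness         1                           ; run check_command when results go stale
    freshness_threshold     3600                        ; seconds; set well past the normal interval
    check_command           check_dummy!2!"No passive results returned in 1 hour. Check xinetd."
}
```

Service freshness checking also has to be enabled globally (check_service_freshness=1 in nagios.cfg) for the threshold to have any effect.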

2) A second, similar option that might handle it a bit better: have an active check that submits a passive result through NSCA (presuming that is your choice for passive results), sleeps for 10-30 seconds, then checks that service's status in Nagios via the web UI, JSON API, etc., and returns OK only if the passive result was received. If it was not, the non-OK result can kick off a local event handler that restarts the xinetd service and resolves the issue immediately.
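A rough sketch of option 2 as a Python plugin, under several assumptions: the paths, host names, canary service name, and status URL are all placeholders, and the status lookup assumes the Nagios Core JSON CGI (statusjson.cgi) is reachable and returns the service's plugin_output. In practice that URL needs authentication, which is left out here. The script stamps a random token into the passive result so it can tell a fresh result from a stale one.

```python
#!/usr/bin/env python3
"""Sketch: active check that verifies the NSCA passive-result path end to end."""
import json
import subprocess
import sys
import time
import urllib.parse
import urllib.request
import uuid

# Every value below is a hypothetical placeholder; adjust for your environment.
NSCA_SERVER = "127.0.0.1"
SEND_NSCA = "/usr/local/nagios/bin/send_nsca"
SEND_NSCA_CFG = "/usr/local/nagios/etc/send_nsca.cfg"
CANARY_HOST = "localhost"
CANARY_SERVICE = "NSCA Canary"  # a passive service you define for this purpose
STATUS_URL = "http://localhost/nagios/cgi-bin/statusjson.cgi"  # assumed; needs auth in practice
WAIT_SECONDS = 20  # within the 10-30 second window suggested above

OK, CRITICAL, UNKNOWN = 0, 2, 3


def build_nsca_line(host, service, code, output):
    """Format one service result in send_nsca's default tab-delimited form."""
    return f"{host}\t{service}\t{code}\t{output}\n"


def submit_passive(line):
    """Pipe one result line into send_nsca; raises if the command fails."""
    subprocess.run([SEND_NSCA, "-H", NSCA_SERVER, "-c", SEND_NSCA_CFG],
                   input=line, text=True, check=True, timeout=30)


def fetch_plugin_output(host, service):
    """Read the service's current plugin output (assumed statusjson.cgi layout)."""
    query = urllib.parse.urlencode(
        {"query": "service", "hostname": host, "servicedescription": service})
    with urllib.request.urlopen(f"{STATUS_URL}?{query}", timeout=10) as resp:
        payload = json.load(resp)
    return payload["data"]["service"]["plugin_output"]


def evaluate(token, plugin_output):
    """OK only if the token we just submitted shows up in the service output."""
    if token in plugin_output:
        return OK, "OK - NSCA delivered the canary result"
    return CRITICAL, "CRITICAL - canary result not received; check xinetd/NSCA"


def main():
    token = uuid.uuid4().hex
    try:
        submit_passive(build_nsca_line(CANARY_HOST, CANARY_SERVICE, OK, f"canary {token}"))
    except (subprocess.SubprocessError, OSError) as exc:
        # A failed submission is itself evidence that the NSCA path is down.
        print(f"CRITICAL - send_nsca failed: {exc}")
        return CRITICAL
    time.sleep(WAIT_SECONDS)
    try:
        code, message = evaluate(token, fetch_plugin_output(CANARY_HOST, CANARY_SERVICE))
    except Exception as exc:
        print(f"UNKNOWN - could not read service status: {exc}")
        return UNKNOWN
    print(message)
    return code

# When deployed as a plugin, finish the file with: sys.exit(main())
```

Run this as the check command of a normal active service, and attach the xinetd-restart event handler to that active service rather than to the canary.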
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.