just one service stopped running
We had an issue today where just one service stopped running. Its time period is 24x7. According to the logs it had been running fine (it has been running for months), and then it suddenly stopped. The other services were fine. What was weird is that it still had a time for the next scheduled check, and that time kept updating, but the last check time did not update (the service output includes a timestamp, so I know the check was not running). I restarted Nagios and things are fine now, but I'm wondering whether this is common. Is there something I can do to debug this issue if it happens again?
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: just one service stopped running
Now that it's behind you it's going to be very hard to debug.
It is not common behavior. There are only two standard ways to change what a running Nagios instance does: reload it with a modified configuration, or send commands directly through the CGIs. Are you sure nobody clicked the "Disable active checks" button?
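For context, the CGI buttons work by writing Nagios external commands (lines of the form `[timestamp] COMMAND;arg1;arg2`) into the command file, which Nagios reads. The sketch below formats and submits such a line, for example `ENABLE_SVC_CHECK;<host_name>;<service_description>` to turn active checks back on for one service. It is a minimal sketch only: the helper names are hypothetical, and the command file path is an assumption (check the `command_file` setting in your nagios.cfg; source installs often use `/usr/local/nagios/var/rw/nagios.cmd`).

```python
import time
from typing import Optional


def external_command(name: str, *args: str, when: Optional[int] = None) -> str:
    """Format one external command line: [timestamp] NAME;arg1;arg2...

    Hypothetical helper; `when` defaults to the current Unix time.
    """
    ts = int(time.time()) if when is None else when
    return f"[{ts}] " + ";".join((name,) + args) + "\n"


def submit(command_file: str, line: str) -> None:
    """Write one command line to the Nagios command file (a named pipe).

    Assumes the caller has write permission on the pipe and Nagios is running;
    otherwise open() fails or blocks waiting for a reader.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

Usage would look like `submit("/usr/local/nagios/var/rw/nagios.cmd", external_command("ENABLE_SVC_CHECK", "web1", "HTTP"))`, where the host and service names are placeholders. `SCHEDULE_FORCED_SVC_CHECK;<host_name>;<service_description>;<check_time>` is another standard command that can nudge a single stuck service without restarting Nagios.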
Re: just one service stopped running
I remember seeing a red x for that option, so I'm going to say no one hit that button.
As for config changes, the only change I made today was to turn on performance monitoring. I also applied the patch for bug 534 yesterday.
jdalrymple
Re: just one service stopped running
Unfortunately, then, there is probably no way to debug this problem after the fact, unless you already had debugging turned on in nagios.cfg. The nagios.log is unlikely to tell us why monitoring for that service stopped.
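If you want debug output available next time, a minimal nagios.cfg excerpt might look like the one below, assuming Nagios Core 3.x/4.x. `debug_level` is a bitmask; in the sample nagios.cfg, 8 is scheduled events and 16 is host/service checks, so 24 logs both, which should show whether the stuck service's check is still being scheduled and run. The file path and size limit here are assumptions; the debug file grows quickly, so keep `max_debug_file_size` set, and reload Nagios after editing.

```
# nagios.cfg (excerpt): log scheduled events (8) + host/service checks (16)
debug_level=24
# 0 = basic, 1 = more detailed, 2 = very detailed
debug_verbosity=1
debug_file=/usr/local/nagios/var/nagios.debug
# size in bytes before the debug file is rotated
max_debug_file_size=1000000
```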
The best we can do is watch to see if the problem recurs.
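One way to watch for a recurrence is a small script, run from cron, that reads the Nagios status file and flags services whose `last_check` is older than expected (the symptom above: `next_check` keeps advancing while `last_check` stays frozen) or whose active checks are disabled. This is a minimal sketch, assuming the Nagios Core `status.dat` format of `servicestatus { ... }` blocks containing `key=value` lines. The function names and the one-hour threshold are hypothetical; pick a threshold longer than your slowest check interval, and note that a service that has never been checked reports `last_check=0` and will also be flagged.

```python
import sys
import time
from typing import Dict, Iterator, List, Optional


def parse_servicestatus(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict of key=value pairs per 'servicestatus { ... }' block."""
    block: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "servicestatus {":
            block = {}
        elif line == "}" and block is not None:
            yield block
            block = None
        elif block is not None and "=" in line:
            # Split on the first '=' only; plugin output may contain '='.
            key, _, value = line.partition("=")
            block[key] = value


def find_stalled(text: str, now: float, max_age: int = 3600) -> List[str]:
    """Report services with active checks disabled or a stale last_check."""
    stalled = []
    for svc in parse_servicestatus(text):
        name = f"{svc.get('host_name')}/{svc.get('service_description')}"
        last = int(svc.get("last_check", "0"))
        if svc.get("active_checks_enabled") == "0":
            stalled.append(f"{name}: active checks disabled")
        elif now - last > max_age:
            stalled.append(
                f"{name}: last_check {int(now - last)}s ago "
                f"(next_check={svc.get('next_check')})"
            )
    return stalled


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_stalled.py /path/to/status.dat
    with open(sys.argv[1]) as fh:
        for msg in find_stalled(fh.read(), time.time()):
            print(msg)
```

Pass the path from the `status_file` setting in your nagios.cfg (often `/usr/local/nagios/var/status.dat` on source installs). Any output can be mailed by cron, which would catch the problem while it is still happening, before a restart clears it.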