Nagios Service Stops Unexpectedly
Posted: Mon Feb 25, 2013 7:16 pm
We have a Nagios/pnp4nagios installation several years old. It has been periodically updated to recent versions. Suddenly the nagios service stopped for no apparent reason, with no obvious errors in any of the logs I've gone through. When restarted, it starts checking hosts and fails at the same point, after one specific host check and before the next. The only outward sign is that the last check times are old. When I try to manually schedule a host check from the interface, nothing happens when the commit button is pressed, where normally it would confirm the check was scheduled and show a "done" link. This is presumably because the service is stopped and the interface gets no response. I have gone through the configuration files to no avail, specifically checking the host configurations of the last host checked before the service stops and the one scheduled next.
The installation is on a vmware virtual server, and when restored to earlier snapshots nagios runs fine for a couple of weeks before the problem happens again. Hardware resources do not seem taxed at all and there is plenty of space on the file systems.
I have found similar descriptions of this problem where NDO was the culprit, but we are using pnp4nagios/rrdtool so that does not apply. I upgraded to Nagios 3.4.4 and pnp4nagios 0.6.19 and the service fails at exactly the same point.
Any help would be greatly appreciated - especially assistance with deeper/more comprehensive troubleshooting tips. This is my first post so please gently point me in the right direction if this is the wrong forum.
Thanks!!!