I have a Nagios installation where the status file shows incorrect (old) information. If a state changes to OK, the notifications are sent out correctly and the log files are written correctly. The status file changes to the new state, then changes back. It bounces back and forth for hours before settling on the correct state. Most of the time it shows the old down/warning state. I'm not sure if the same thing happens on the OK to down/warning transition. I thought I might have a buggy version, so I upgraded to the latest; no change. This is a Nagios server that I inherited, so now I'm wondering if there is something weird in the config files, but I can't seem to find it. Does anyone have any ideas on how to troubleshoot this further?
Thanks in advance for any help.
Status file incorrect
-
wormfishin
- Posts: 31
- Joined: Tue Apr 10, 2012 8:11 am
Re: Status file incorrect
Is it possible that you're seeing soft failures change the state, but the alerts aren't going out because the check recovers before they are triggered?
-
wleister-fs
- Posts: 2
- Joined: Tue Apr 10, 2012 10:18 am
Re: Status file incorrect
I don't think so. The last check time reverts to an older check time, from before the state was OK. When the status file is correct, the correct last check time is shown.
Re: Status file incorrect
It's possible you've spawned multiple instances of Nagios, in which case two processes would be writing to the status file, and one of them would be writing stale data. Stop the nagios process, then run:
killall -9 nagios
Then start the nagios process again.
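Before killing anything, it may help to confirm that more than one Nagios daemon is actually running. The sketch below is only one way to check, and it rests on a few assumptions: the binary is named `nagios` (some distributions use a different name, such as `nagios3`), and a daemonized instance is reparented to PID 1. Nagios also forks child processes for checks, so a plain process count can be misleading; counting only the top-level processes is an attempt to work around that. The function name `count_nagios_daemons` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read "PID PPID COMMAND" lines on stdin and count the
# ones named "nagios" whose parent is PID 1. The idea is to count top-level
# daemons rather than the children a single daemon forks to run checks.
# Assumption: the process name is exactly "nagios" on this system.
count_nagios_daemons() {
    awk '$3 == "nagios" && $2 == 1 { n++ } END { print n + 0 }'
}
```

To run it against the live process table, something like `ps -e -o pid= -o ppid= -o comm= | count_nagios_daemons` should work on most Linux systems. A result greater than 1 would be consistent with two instances fighting over the status file; a result of 1 suggests looking elsewhere.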