
Re: Nagios XI host check orphaned and duplicate nagios proce

Posted: Fri Jul 07, 2017 4:31 pm
by scottwilkerson
If it happens again, would it be possible to get a System Profile before fixing the problem, or at the very least the output of the following command?

ps -ef|grep bin/nagios
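When reading that `ps` output, the parent PID column is what separates a healthy setup from the duplicate-process problem: a normal daemon is one top-level `bin/nagios` process plus its own children (for example its worker processes), while an orphaned or duplicate daemon shows up as a second top-level one. A minimal sketch, assuming the standard `ps -ef` columns (UID, PID, PPID, ...); `count_nagios_daemons` is a hypothetical helper name, not something shipped with Nagios XI:

```shell
#!/bin/sh
# count_nagios_daemons: read `ps -ef` output on stdin and print how many
# top-level Nagios daemons are running. A bin/nagios process counts as
# top-level when its parent is NOT itself a bin/nagios process, so the
# daemon's own children (e.g. workers) are not counted. Lines containing
# "grep" are skipped so the grep from `ps -ef | grep ...` is ignored.
count_nagios_daemons() {
    awk '
        # ps -ef columns: $2 = PID, $3 = PPID
        /bin\/nagios/ && !/grep/ { pid[NR] = $2; ppid[NR] = $3; is_nagios[$2] = 1 }
        END {
            n = 0
            for (i in pid) if (!(ppid[i] in is_nagios)) n++
            print n
        }'
}
```

Usage: `ps -ef | count_nagios_daemons`. A result above 1 would match the "multiple nagios processes" symptom described in this thread.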

Re: Nagios XI host check orphaned and duplicate nagios proce

Posted: Fri Jul 07, 2017 4:32 pm
by emartine
scottwilkerson wrote:
emartine wrote: Audit log shows:

2017-07-07 05:18:31 95296 Nagios XI INFO localhost User submitted a command to the subsystem (ID=1119)
2017-07-07 05:01:02 95295 Nagios XI INFO localhost User submitted a command to the subsystem (ID=1117)

At 5:01 Nagios runs an SSH backup. I am not sure what the process at 5:18 is.
ID=1119 is COMMAND_DELETE_SYSTEM_BACKUP, so it deleted a previous backup.

This shouldn't cause your setup to fail at all. Before you kill off the processes, do you notice whether there are multiple nagios processes?

I have seen this happen on a Nagios restart when it has to wait too long for mod_gearman workers to return their results.

There were multiple nagios processes. No one had made any change that would cause Nagios to restart at ~5:25 AM. No one logged into the server until about 5:40, and no commands were submitted.
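To line up audit log entries like the ones quoted above with the time the duplicate processes appeared, a small filter can pull out just the subsystem commands. A minimal sketch, assuming the audit log lines look exactly like the two pasted in this thread; `list_subsystem_commands` is a hypothetical helper, and only ID 1119 is given a name because that is the only ID mapping stated here:

```shell
#!/bin/sh
# list_subsystem_commands: read Nagios XI audit log lines (in the format
# quoted above) on stdin and print "date time ID name" for every
# "submitted a command to the subsystem" entry.
# Hypothetical helper: only ID 1119 is named, since that is the only
# mapping given in this thread; every other ID prints as "unmapped".
list_subsystem_commands() {
    sed -n 's/^\([0-9-]* [0-9:]*\) .*submitted a command to the subsystem (ID=\([0-9]*\)).*$/\1 \2/p' |
    while read -r day clock id; do
        case $id in
            1119) name=COMMAND_DELETE_SYSTEM_BACKUP ;;
            *) name=unmapped ;;
        esac
        echo "$day $clock $id $name"
    done
}
```

For example, `list_subsystem_commands < audit.txt`, where `audit.txt` is a placeholder for an exported copy of the audit log.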

Re: Nagios XI host check orphaned and duplicate nagios proce

Posted: Fri Jul 07, 2017 5:03 pm
by tgriep
One thing to try is implementing the changes from this KB article:
https://support.nagios.com/kb/article/n ... anner.html
It should help with the duplicate nagios processes.

Re: Nagios XI host check orphaned and duplicate nagios proce

Posted: Thu Jul 13, 2017 8:57 am
by emartine
Ah! That's right, I forgot to re-implement this! I believe this will help resolve the issue. I assume that if I edit this file, the change takes effect the next time a host or service is added/changed and someone hits Apply?

Re: Nagios XI host check orphaned and duplicate nagios proce

Posted: Thu Jul 13, 2017 11:40 am
by tgriep
Yes, the next time the Nagios process is restarted (Apply Config), it will wait longer when restarting the process, and hopefully the issue will not happen to you again.
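As described in this thread, the duplicates seem to come from a restart that does not wait long enough for the old daemon (for example one still waiting on mod_gearman workers) before a new one is started. The KB article's actual change is not reproduced here; as a rough sketch of the idea only, a restart step can poll until the old PID is gone, with a generous timeout, before starting the replacement. `wait_for_exit` is a hypothetical helper name:

```shell
#!/bin/sh
# wait_for_exit PID TIMEOUT: poll once per second until process PID has
# exited, giving up after TIMEOUT seconds. Returns 0 if the process is gone,
# 1 if it is still running when the timeout expires.
# kill -0 only tests whether a signal could be delivered; run this as root
# or as the user that owns PID, otherwise kill -0 fails with EPERM and the
# process looks like it has already exited.
wait_for_exit() {
    pid=$1
    timeout=$2
    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, `wait_for_exit "$old_pid" 300 || echo "old daemon still running, not starting a new one"`; the 300-second value is an arbitrary placeholder, not a recommended setting. Refusing to start while the old PID is alive is what keeps two daemons from running side by side.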