
Lock file looks like it's already held by another instance

Posted: Thu Dec 18, 2014 10:08 am
by kendallchenoweth
The monitoring engine is stopped. When I try to start it, I get the following error in nagios.log.

Code:

[1418914938] ndomod registered for contact data'
[1418914938] ndomod registered for contact notification data'
[1418914938] ndomod registered for acknowledgement data'
[1418914938] ndomod registered for state change data'
[1418914938] ndomod registered for contact status data'
[1418914938] ndomod registered for adaptive contact data'
[1418914938] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1418914938] Successfully launched command file worker with pid 1258
[1418914954] ndomod: Successfully connected to data sink.  1970 queued items to flush.
[1418914954] ndomod: Successfully flushed 1970 queued items to data sink.
[1418915024] Lockfile '/usr/local/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 1211).  Bailing out...
[1418915028] Lockfile '/usr/local/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 1211).  Bailing out...

I tried stopping and starting nagios and ndo2db, and I checked for stale nagios processes. I also tried stopping and starting nagios, nagiosxi, ndo2db, mysql and postgresql, and then restarting everything. I also tried rebooting. None of these "solutions" worked.
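
One quick way to see what the error is pointing at is to read the PID stored in the lock file and check whether a process with that PID still exists. The sketch below is a hypothetical helper (check_lock is not part of Nagios). It assumes a Linux host, since it looks for /proc/<pid>, and a lock file whose first line is the PID, which is what the "(PID 1211)" in the message suggests.

```shell
#!/bin/sh
# check_lock: hypothetical helper, not part of Nagios.
# Prints "no-lock", "running <pid>", "stale <pid>" or "invalid".
# Assumes Linux (/proc) and a lock file whose first line holds the PID.
check_lock() {
    lockfile=$1

    # Missing or empty lock file: nothing claims to be running.
    if [ ! -s "$lockfile" ]; then
        echo "no-lock"
        return 0
    fi

    # Keep only the digits from the first line (drops newline/whitespace).
    pid=$(head -n 1 "$lockfile" | tr -cd '0-9')
    if [ -z "$pid" ]; then
        echo "invalid"
        return 1
    fi

    # /proc/<pid> exists only while some process has that PID.
    # Note: after a reboot the PID may belong to an unrelated process.
    if [ -d "/proc/$pid" ]; then
        echo "running $pid"
    else
        echo "stale $pid"
    fi
}
```

Run it as check_lock /usr/local/nagios/var/nagios.lock. If it reports "running 1211" while the init script says Nagios is stopped, ps -fp 1211 shows which command owns that PID, which is how a leftover nagios -d process would show up.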

Why did this occur and how do I fix it?

Thanks!

Re: Lock file looks like it's already held by another instance

Posted: Thu Dec 18, 2014 10:12 am
by kendallchenoweth
After rebooting, I found a second nagios -d process and killed it (I don't know why it had been started). With only one nagios -d process left, I stopped nagios and ndo2db, restarted them, and everything is working.
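
To spot a duplicate daemon like this, the process list can be filtered for nagios binaries running with -d. The sketch below is a hypothetical helper (nagios_daemon_pids is not part of Nagios). It reads "ps -eo pid,args" style lines on stdin and assumes the daemon was launched as "<path>/nagios -d <config>", as in this thread; matching the -d flag is meant to skip helper processes started without it.

```shell
#!/bin/sh
# nagios_daemon_pids: hypothetical helper, not part of Nagios.
# Reads "ps -eo pid,args" style lines on stdin and prints the PID of every
# process whose program is a nagios binary and whose first argument is -d.
nagios_daemon_pids() {
    awk '
        # $1 = PID, $2 = program path, $3 = first argument
        ($2 == "nagios" || $2 ~ /\/nagios$/) && $3 == "-d" { print $1 }
    '
}
```

Usage: ps -eo pid,args | nagios_daemon_pids. More than one PID in the output means more than one daemon is competing for the same lock file.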

Do you have any idea what could have caused this problem?

Re: Lock file looks like it's already held by another instance

Posted: Thu Dec 18, 2014 10:19 am
by kendallchenoweth
I spoke too soon. Just after the problem appeared to right itself, the engine reverted to a stopped state. I still need your help.

Re: Lock file looks like it's already held by another instance

Posted: Thu Dec 18, 2014 12:09 pm
by lmiltchev
Are you using mklivestatus? What is the output of the following command?

Code:

grep live /var/log/messages

Is opening an email support ticket an option for you? If it is, please send the "profile.zip" file to "[email protected]".

Re: Lock file looks like it's already held by another instance

Posted: Thu Dec 18, 2014 1:40 pm
by kendallchenoweth
[root@ip-10-154-25-117 ~]# grep live /var/log/messages
Dec 14 03:55:30 ip-10-154-25-117 nagios: SERVICE ALERT: mls-rabbit-usw1;RABBIT aliveness;CRITICAL;SOFT;1;MW_CHECK_RABBITMQ_ALIVENESS CRITICAL - Received 500 read timeout for vhost: mls
Dec 14 03:56:14 ip-10-154-25-117 nagios: SERVICE ALERT: mls-rabbit-usw1;RABBIT aliveness;OK;SOFT;2;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 14 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 14 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit-usw1;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 15 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 15 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit-usw1;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 16 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 16 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit-usw1;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 17 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 17 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit-usw1;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
[root@ip-10-154-25-117 ~]#


Now it's working again. I disabled a check whose remediation action restarted nagios, and I suspect that was the problem. I'll leave it disabled overnight and see whether the problem returns. If it doesn't, I'll assume it was my fault. :)

Thanks for your help.

Re: Lock file looks like it's already held by another instance

Posted: Thu Dec 18, 2014 1:44 pm
by tgriep
Thank you for the update.

Can you post the check you were running that was causing this?