lock file looks like its already held by another instance

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
kendallchenoweth
Posts: 195
Joined: Fri Sep 13, 2013 10:43 am

Post by kendallchenoweth »

The monitoring engine is stopped. When I try to start it, I get the following error in nagios.log:

[1418914938] ndomod registered for contact data'
[1418914938] ndomod registered for contact notification data'
[1418914938] ndomod registered for acknowledgement data'
[1418914938] ndomod registered for state change data'
[1418914938] ndomod registered for contact status data'
[1418914938] ndomod registered for adaptive contact data'
[1418914938] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1418914938] Successfully launched command file worker with pid 1258
[1418914954] ndomod: Successfully connected to data sink.  1970 queued items to flush.
[1418914954] ndomod: Successfully flushed 1970 queued items to data sink.
[1418915024] Lockfile '/usr/local/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 1211).  Bailing out...
[1418915028] Lockfile '/usr/local/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 1211).  Bailing out...

I tried stopping and starting nagios and ndo2db and checking for stale nagios processes. I also tried stopping and starting nagios, nagiosxi, ndo2db, mysql and postgresql, and then restarting them all. I also tried rebooting. None of these "solutions" worked.
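
One way to check whether the PID named in the log message is really alive is to read the lock file and probe that PID. This is a minimal sketch, not an official Nagios tool: the lock file path is taken from the log excerpt above, the helper names `lock_pid` and `pid_alive` are made up for illustration, and it assumes it runs as root so that `kill -0` is allowed to probe the daemon.

```shell
#!/bin/sh
# Lock file path as it appears in the nagios.log excerpt; override via LOCKFILE.
LOCKFILE="${LOCKFILE:-/usr/local/nagios/var/nagios.lock}"

# Print the PID stored in the given lock file (whitespace stripped).
# Prints nothing and returns non-zero if the file is not readable.
lock_pid() {
    [ -r "$1" ] && tr -d '[:space:]' < "$1"
}

# Succeed only if the argument is a number and that process exists.
# kill -0 sends no signal; it only tests whether the PID can be signalled.
pid_alive() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    kill -0 "$1" 2>/dev/null
}

pid=$(lock_pid "$LOCKFILE")
if pid_alive "$pid"; then
    echo "lock file names live PID $pid:"
    # Show the parent PID and command line of the lock holder.
    ps -o pid,ppid,args -p "$pid"
else
    echo "no live process matches the PID in $LOCKFILE"
fi
```

If the PID is alive, the lock is probably not stale and something is still running under that PID. If it is not alive, the lock file is likely left over from an earlier run.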

Why did this occur and how do I fix it?

Thanks!

Re: lock file looks like its already held by another instanc

Post by kendallchenoweth »

After rebooting, I found another nagios -d process that I killed. (I don't know why that was started.) With only one nagios -d process, I stopped nagios and ndo2db and restarted and everything is working.
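
A second `nagios -d` entry in the process list is not necessarily a second instance: the log above reports a command file worker launched with its own PID. Listing the parent PID next to each entry may help tell a child of the main daemon from a daemon that was started independently. This is a rough sketch, assuming the daemon's command line looks like `.../nagios -d <config>`; `filter_daemons` is a made-up helper name.

```shell
#!/bin/sh
# Keep only "pid ppid args" lines whose command is nagios started with -d.
# Reads ps-style output on stdin: field 1 = PID, 2 = PPID, 3 = command, 4 = first argument.
filter_daemons() {
    awk '($3 == "nagios" || $3 ~ /\/nagios$/) && $4 == "-d"'
}

# List every process with its parent PID and full command line,
# then keep only the nagios daemon entries (the ps header line is dropped by the filter).
ps -eo pid,ppid,args | filter_daemons
```

An entry whose PPID is the PID of another `nagios -d` entry is a child of that daemon. Two entries that both have PPID 1, or otherwise unrelated parents, would point to two separately started daemons.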

Do you have an idea of what could have caused this problem?

Re: lock file looks like its already held by another instanc

Post by kendallchenoweth »

I spoke too soon. Just after the problem appeared to right itself, the engine reverted to a stopped state again. I still need your help.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: lock file looks like its already held by another instanc

Post by lmiltchev »

Are you using mklivestatus? What is the output of the following command?

grep live /var/log/messages

Is opening an email support ticket an option for you? If it is, please send us the "profile.zip" file at "[email protected]".
Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: lock file looks like its already held by another instanc

Post by kendallchenoweth »

[root@ip-10-154-25-117 ~]# grep live /var/log/messages
Dec 14 03:55:30 ip-10-154-25-117 nagios: SERVICE ALERT: mls-rabbit-usw1;RABBIT aliveness;CRITICAL;SOFT;1;MW_CHECK_RABBITMQ_ALIVENESS CRITICAL - Received 500 read timeout for vhost: mls
Dec 14 03:56:14 ip-10-154-25-117 nagios: SERVICE ALERT: mls-rabbit-usw1;RABBIT aliveness;OK;SOFT;2;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 14 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 14 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit-usw1;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 15 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 15 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit-usw1;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 16 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 16 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit-usw1;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 17 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
Dec 17 23:59:59 ip-10-154-25-117 nagios: CURRENT SERVICE STATE: mls-rabbit-usw1;RABBIT aliveness;OK;HARD;1;MW_CHECK_RABBITMQ_ALIVENESS OK - vhost: mls
[root@ip-10-154-25-117 ~]#
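
Every line matched above contains "aliveness", so the plain `live` pattern does not show whether mklivestatus is in use. A narrower check might look like the sketch below. The log path comes from the command in this thread; the `nagios.cfg` location is an assumption based on the `/usr/local/nagios` paths in the log, and the function names are illustrative only.

```shell
#!/bin/sh
LOG="${LOG:-/var/log/messages}"
# Assumed config location for this install; adjust if nagios.cfg lives elsewhere.
CFG="${CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Print only lines that mention livestatus, ignoring case.
# Unlike "grep live", this pattern does not match "aliveness".
find_livestatus() {
    grep -i 'livestatus' "$1"
}

# Print the uncommented broker_module lines, which name the
# event broker modules the daemon is configured to load.
list_broker_modules() {
    grep '^[[:space:]]*broker_module' "$1"
}

if [ -r "$LOG" ]; then
    find_livestatus "$LOG" || echo "no livestatus lines in $LOG"
fi
if [ -r "$CFG" ]; then
    list_broker_modules "$CFG" || echo "no active broker_module lines in $CFG"
fi
```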


Now it's working again. I disabled a check that had an associated remediation check which restarted nagios. I think that could have been the problem. I'll leave it disabled overnight and see whether the problem returns. If not, I'll assume it's my fault. :)
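
If a remediation check restarts nagios, two overlapping runs could plausibly race and leave the lock file and the running daemon out of step. The actual check is not shown in this thread, so the following is only a hypothetical wrapper: `run_guarded` and the guard directory path are invented for illustration, and the commented `service nagios restart` line assumes the init script used on this install.

```shell
#!/bin/sh
# Hypothetical guard directory; any path writable by the handler would do.
GUARD_DIR="${GUARD_DIR:-/tmp/nagios-restart.guard}"

# Run the given command only if no other guarded run is in progress.
# mkdir either creates the directory or fails, atomically, so two
# overlapping handlers cannot both get past this point.
run_guarded() {
    if ! mkdir "$GUARD_DIR" 2>/dev/null; then
        echo "restart already in progress, skipping"
        return 1
    fi
    "$@"
    status=$?
    # Release the guard and pass the command's exit status through.
    rmdir "$GUARD_DIR"
    return "$status"
}

# Example call from a remediation handler (not executed here):
# run_guarded service nagios restart
```

If the handler is killed between `mkdir` and `rmdir`, the guard directory stays behind and has to be removed by hand, so this trades one kind of stale lock for a simpler one.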

Thanks for your help.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: lock file looks like its already held by another instanc

Post by tgriep »

Thank you for the update.

Can you post the check you were running that was causing this?