nagios dead but subsys locked

Posted: Mon Dec 30, 2013 1:25 pm
by vinothsethuram
Hi,

Nagios went down unexpectedly and stopped monitoring my hosts and services. The log shows the following:

Caught SIGSEGV, shutting down...

When I tried to restart Nagios, I got the following message:

nagios dead but subsys locked

Could you please let me know the reason for the above issue and how we can avoid it in the future?

Re: nagios dead but subsys locked

Posted: Mon Dec 30, 2013 1:35 pm
by abrist
Unexpected shutdowns may leave lock files. Remove:

Code: Select all

rm /usr/local/nagios/var/nagios.lock
And then restart nagios:

Code: Select all

service nagios restart
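
Before deleting the lock file it may be worth confirming that no Nagios process is still alive, since the file normally holds the daemon's PID. A minimal sketch, assuming the lock file contains a single PID and that the script runs as root; the helper name remove_stale_lock is hypothetical:

```shell
#!/bin/sh
# Remove a PID-style lock file only if the process it names is gone.
# Assumption: the first line of the file is the daemon's PID.
# Run as root; otherwise kill -0 can fail on a live process owned by another user.
remove_stale_lock() {
    lock_file=$1
    if [ ! -f "$lock_file" ]; then
        echo "no lock file at $lock_file"
        return 0
    fi
    # Keep only the digits from the first line of the file.
    pid=$(head -n 1 "$lock_file" | tr -cd '0-9')
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; leaving $lock_file in place"
        return 1
    fi
    rm -f "$lock_file"
    echo "removed stale lock $lock_file"
}
# Example: remove_stale_lock /usr/local/nagios/var/nagios.lock && service nagios restart
```

If the function reports a live process, stopping that process first is probably safer than removing the file underneath it.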

Re: nagios dead but subsys locked

Posted: Mon Dec 30, 2013 1:38 pm
by lmiltchev
Are you using mk-livestatus? What's the output of the following command?

Code: Select all

tail -30 /usr/local/nagios/var/nagios.log
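
One way to answer the mk-livestatus question is to list the event broker modules enabled in the main config file. A minimal sketch, assuming a source install with the config at /usr/local/nagios/etc/nagios.cfg (that path and the helper name list_broker_modules are assumptions); a line naming livestatus.o would suggest mk-livestatus is loaded:

```shell
#!/bin/sh
# Print the active (uncommented) broker_module lines from a Nagios main config file.
# grep exits non-zero when no broker module is configured.
list_broker_modules() {
    cfg_file=$1
    grep -E '^[[:space:]]*broker_module[[:space:]]*=' "$cfg_file"
}
# Example (path assumed): list_broker_modules /usr/local/nagios/etc/nagios.cfg
```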

Re: nagios dead but subsys locked

Posted: Mon Dec 30, 2013 1:43 pm
by vinothsethuram
abrist wrote: Unexpected shutdowns may leave lock files. Remove:

Code: Select all

rm /usr/local/nagios/var/nagios.lock
And then restart nagios:

Code: Select all

service nagios restart

I executed the above commands. Thank you.

Re: nagios dead but subsys locked

Posted: Mon Dec 30, 2013 1:44 pm
by vinothsethuram
lmiltchev wrote: Are you using mk-livestatus? What's the output of the following command?

Code: Select all

tail -30 /usr/local/nagios/var/nagios.log

I ran the above command with 30 changed to 200, but the output contains nothing about the Nagios lock or the SIGSEGV error. Please help me understand the cause of this downtime.

Re: nagios dead but subsys locked

Posted: Mon Dec 30, 2013 1:47 pm
by slansing
You need to share the output as requested:

Code: Select all

tail -30 /usr/local/nagios/var/nagios.log
As well as:

Code: Select all

tail -30 /var/log/messages
It is possible, though, that whatever caused this is no longer within the last 30 lines of the logs, if it was ever logged at all. You will most likely want to open your current /var/log/messages and look around the point in time when this occurred.
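
Rather than relying on the tail of the file, the lines around the crash can be pulled out by searching for the shutdown message. A rough sketch using grep's context options; the helper name show_context and the 30/5 line counts are arbitrary choices, not something from this thread:

```shell
#!/bin/sh
# Print up to 30 lines before and 5 lines after each line matching a pattern,
# prefixed with line numbers so the spot is easy to find in the full log.
show_context() {
    pattern=$1
    log_file=$2
    grep -n -B 30 -A 5 -- "$pattern" "$log_file"
}
# Example: show_context 'Caught SIGSEGV' /var/log/messages
```

The same call against /usr/local/nagios/var/nagios.log should show what Nagios itself logged just before the signal.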

Re: nagios dead but subsys locked

Posted: Mon Dec 30, 2013 1:56 pm
by vinothsethuram

Code: Select all

Dec 29 23:24:57 nagios nagios: wproc: 'Core Worker 5129' seems to be choked. ret = 528000; bufsize = 826406: errno = 11 (Resource temporarily unavailable)
Dec 29 23:24:57 nagios nagios: wproc: iocache_read() from Core Worker 5129 returned -1: Connection reset by peer
Dec 29 23:24:57 nagios nagios: wproc: Socket to worker Core Worker 5129 broken, removing
Dec 29 23:24:57 nagios nagios: Caught SIGSEGV, shutting down...

Will this help you analyse the cause?

Re: nagios dead but subsys locked

Posted: Mon Dec 30, 2013 1:59 pm
by slansing
If you can reply with the information we have been requesting, we can.

Re: nagios dead but subsys locked

Posted: Mon Dec 30, 2013 2:03 pm
by vinothsethuram
slansing wrote: If you can reply with the information we have been requesting we can.
You asked me to run the following command:

Code: Select all

tail -30 /var/log/messages
And I got the following output, which matches the time of the downtime:

Code: Select all

Dec 29 23:24:57 nagios nagios: wproc: 'Core Worker 5129' seems to be choked. ret = 528000; bufsize = 826406: errno = 11 (Resource temporarily unavailable)
Dec 29 23:24:57 nagios nagios: wproc: iocache_read() from Core Worker 5129 returned -1: Connection reset by peer
Dec 29 23:24:57 nagios nagios: wproc: Socket to worker Core Worker 5129 broken, removing
Dec 29 23:24:57 nagios nagios: Caught SIGSEGV, shutting down...

Re: nagios dead but subsys locked

Posted: Mon Dec 30, 2013 2:46 pm
by slansing
You only have 4 lines in your /var/log/messages log? Are you absolutely sure? We asked for the last 30 lines, not just the 4 lines that deal specifically with the event.