nagios dead but subsys locked
Posted: Mon Dec 30, 2013 1:25 pm
by vinothsethuram
Hi,
Nagios went down unexpectedly and stopped monitoring my hosts and services. The log shows the following:
Caught SIGSEGV, shutting down...
When I try to restart Nagios, I get the following message:
nagios dead but subsys locked
Could you please let me know the reason for this issue and how we can avoid it in the future?
Re: nagios dead but subsys locked
Posted: Mon Dec 30, 2013 1:35 pm
by abrist
Unexpected shutdowns may leave lock files. Remove:
Code:
rm /usr/local/nagios/var/nagios.lock
And then restart nagios:
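Before deleting the lock file, it may be worth confirming that no Nagios process is still running. The following is only a sketch, assuming the lock file holds the PID of the daemon (as a Nagios lock_file normally does) and that it is run by a user allowed to signal that process, such as root. The function name remove_stale_lock is a hypothetical helper, not part of Nagios:

```shell
#!/bin/sh
# Remove a lock file only if the PID recorded in it is no longer running.
# Assumption: the lock file contains a single PID and nothing else.
remove_stale_lock() {
    lock_file="$1"
    if [ ! -f "$lock_file" ]; then
        echo "no lock file at $lock_file"
        return 0
    fi
    # Keep only the digits, dropping newlines and stray whitespace.
    pid=$(tr -cd '0-9' < "$lock_file")
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; leaving $lock_file in place"
        return 1
    fi
    rm -f "$lock_file"
    echo "removed stale lock file $lock_file"
}

# Example, using the path from the post above:
# remove_stale_lock /usr/local/nagios/var/nagios.lock
```

If the function reports that the process is still running, the lock file is not stale and should be left alone.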
Re: nagios dead but subsys locked
Posted: Mon Dec 30, 2013 1:38 pm
by lmiltchev
Are you using mk-livestatus? What's the output of the following command?
Code:
tail -30 /usr/local/nagios/var/nagios.log
Re: nagios dead but subsys locked
Posted: Mon Dec 30, 2013 1:43 pm
by vinothsethuram
abrist wrote:Unexpected shutdowns may leave lock files. Remove:
Code:
rm /usr/local/nagios/var/nagios.lock
And then restart nagios:
I executed the above commands. Thank you.
Re: nagios dead but subsys locked
Posted: Mon Dec 30, 2013 1:44 pm
by vinothsethuram
lmiltchev wrote:Are you using mk-livestatus? What's the output of the following command?
Code:
tail -30 /usr/local/nagios/var/nagios.log
I executed the above command, changing 30 to 200, but the output has no information about the Nagios lock or the SIGSEGV error. Please help me understand the cause of this downtime.
Re: nagios dead but subsys locked
Posted: Mon Dec 30, 2013 1:47 pm
by slansing
You need to share the output as requested:
Code:
tail -30 /usr/local/nagios/var/nagios.log
As well as:
It is possible, though, that whatever caused this is no longer in the last 30 lines of the logs, if it was ever logged at all. You will most likely want to open your current /var/log/messages and look around the point in time when this occurred.
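One way to look around the time of the crash in /var/log/messages is to print the lines surrounding the SIGSEGV message. This is a rough sketch, assuming a grep that supports the -B and -A context options (GNU and BSD grep do); context_around is a hypothetical helper name and the default line counts are arbitrary:

```shell
#!/bin/sh
# Print the log lines surrounding each match of a pattern.
# Usage: context_around PATTERN FILE [LINES_BEFORE] [LINES_AFTER]
context_around() {
    pattern="$1"
    log_file="$2"
    before="${3:-30}"   # lines of context before the match (assumed default)
    after="${4:-5}"     # lines of context after the match (assumed default)
    grep -B "$before" -A "$after" -- "$pattern" "$log_file"
}

# Example, using the message and log path mentioned in this thread:
# context_around 'Caught SIGSEGV' /var/log/messages
```

If the log has already been rotated, the same function can be pointed at the rotated file instead.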
Re: nagios dead but subsys locked
Posted: Mon Dec 30, 2013 1:56 pm
by vinothsethuram
Code:
Dec 29 23:24:57 nagios nagios: wproc: 'Core Worker 5129' seems to be choked. ret = 528000; bufsize = 826406: errno = 11 (Resource temporarily unavailable)
Dec 29 23:24:57 nagios nagios: wproc: iocache_read() from Core Worker 5129 returned -1: Connection reset by peer
Dec 29 23:24:57 nagios nagios: wproc: Socket to worker Core Worker 5129 broken, removing
Dec 29 23:24:57 nagios nagios: Caught SIGSEGV, shutting down...
Will this help you analyse the cause?
Re: nagios dead but subsys locked
Posted: Mon Dec 30, 2013 1:59 pm
by slansing
If you can reply with the information we have been requesting we can.
Re: nagios dead but subsys locked
Posted: Mon Dec 30, 2013 2:03 pm
by vinothsethuram
slansing wrote:If you can reply with the information we have been requesting we can.
You asked me to run the following command.
I got the following messages as output, which match the time of the downtime.
Code:
Dec 29 23:24:57 nagios nagios: wproc: 'Core Worker 5129' seems to be choked. ret = 528000; bufsize = 826406: errno = 11 (Resource temporarily unavailable)
Dec 29 23:24:57 nagios nagios: wproc: iocache_read() from Core Worker 5129 returned -1: Connection reset by peer
Dec 29 23:24:57 nagios nagios: wproc: Socket to worker Core Worker 5129 broken, removing
Dec 29 23:24:57 nagios nagios: Caught SIGSEGV, shutting down...
Re: nagios dead but subsys locked
Posted: Mon Dec 30, 2013 2:46 pm
by slansing
You only have 4 lines in your /var/log/messages log? Are you absolutely sure? We asked for the last 30 lines, not just the 4 lines that deal specifically with the event.
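To gather the requested output in one go, something along these lines could be used. It is only a sketch: collect_tails is a hypothetical helper name, and the two paths in the example are the ones mentioned in this thread and may differ on other installs:

```shell
#!/bin/sh
# Print the last N lines of each given log file, with a header per file.
# Usage: collect_tails N FILE...
collect_tails() {
    lines="$1"
    shift
    for f in "$@"; do
        echo "==== last $lines lines of $f ===="
        if [ -r "$f" ]; then
            tail -n "$lines" "$f"
        else
            echo "(not readable)"
        fi
    done
}

# Example: save the last 30 lines of both logs to a file for posting.
# collect_tails 30 /usr/local/nagios/var/nagios.log /var/log/messages > nagios_debug.txt
```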