Re: 00Caught SIGSEGV, shutting down...

Posted: Wed Dec 18, 2013 1:41 pm
by Mitchell
Okay. I will need to schedule the upgrades. I have livestatus 1.2.2p2 on all servers with XI 2.5, so I am a little skeptical.
broker_module=/usr/lib/check_mk/livestatus.o /usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40

Re: 00Caught SIGSEGV, shutting down...

Posted: Wed Dec 18, 2013 4:59 pm
by abrist
Yeah, I would definitely suggest the upgrades. What was the reason for the optional broker arguments?
Mitchell wrote:broker_module=/usr/lib/check_mk/livestatus.o /usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40

Re: 00Caught SIGSEGV, shutting down...

Posted: Thu Dec 19, 2013 8:49 am
by mrochelle
We have been experiencing the same problem since updating to the Nagios XI 2012R2.5 release. It happens every 3 or 4 days, right at midnight. We do not have anything scheduled for that time. Following are the errors in the messages log.
Dec 19 00:00:00 nagprod01 nagios: Caught SIGSEGV, shutting down...
Dec 19 00:00:00 nagprod01 ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Dec 19 00:00:00 nagprod01 ndo2db: mysql_error: 'MySQL server has gone away'
Dec 19 00:00:00 nagprod01 ndo2db: Error: Connection to MySQL database has been lost!


Of course, we have to restart Nagios to recover.
I'm going to update today to Nagios XI 2012R2.7 and probably also update the livestatus broker from 1.2.2p2 to 1.2.2p3.

My nagios.cfg entry:
broker_module=/usr/local/lib/mk-livestatus/livestatus.o /usr/local/nagios/var/rw/live

Re: 00Caught SIGSEGV, shutting down...

Posted: Thu Dec 19, 2013 10:29 am
by abrist
mrochelle wrote:We have been experiencing the same problem since updating to the Nagios XI 2012R2.5 release. It happens every 3 or 4 days, right at midnight. We do not have anything scheduled for that time. Following are the errors in the messages log.
Dec 19 00:00:00 nagprod01 nagios: Caught SIGSEGV, shutting down...
Dec 19 00:00:00 nagprod01 ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Dec 19 00:00:00 nagprod01 ndo2db: mysql_error: 'MySQL server has gone away'
Dec 19 00:00:00 nagprod01 ndo2db: Error: Connection to MySQL database has been lost!
These errors do not look like a livestatus problem; they look like your MySQL server is experiencing issues. Let us know how the upgrade goes and whether this aberrant behavior continues afterwards.

Re: 00Caught SIGSEGV, shutting down...

Posted: Thu Dec 19, 2013 1:35 pm
by Mitchell
abrist wrote:Yeah, I would definitely suggest the upgrades. What was the reason for the optional broker arguments?
Mitchell wrote:broker_module=/usr/lib/check_mk/livestatus.o /usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40
We have multiple sites and use check_mk multisite (multi location with GTM). The optional arguments strike a good balance for us: enough client threads without a very long idle timeout.
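
For anyone comparing broker lines across several servers, here is a minimal Python sketch that splits a `broker_module` line into the module path, the socket path, and the optional `key=value` arguments. The function name `parse_broker_module` is made up for illustration, and the sketch assumes the first bare token after the module is the socket path, as in the lines posted in this thread. It does not validate option names or values.

```python
def parse_broker_module(line):
    """Split a nagios.cfg broker_module line into (module, socket_path, options).

    Assumption: arguments are whitespace-separated, the first token is the
    module path, the first bare token after it is the socket path, and
    anything containing '=' is an optional key=value argument.
    """
    key, _, value = line.partition("=")
    if key.strip() != "broker_module":
        raise ValueError("not a broker_module line")
    tokens = value.split()
    if not tokens:
        raise ValueError("broker_module line has no module path")
    module = tokens[0]
    socket_path = None
    options = {}
    for tok in tokens[1:]:
        if "=" in tok:
            name, _, val = tok.partition("=")
            options[name] = val  # values are kept as strings
        elif socket_path is None:
            socket_path = tok
    return module, socket_path, options


if __name__ == "__main__":
    line = ("broker_module=/usr/lib/check_mk/livestatus.o "
            "/usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40")
    print(parse_broker_module(line))
```

Running it against each server's nagios.cfg line makes it easier to spot which hosts carry the extra arguments and which use the defaults.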

Re: 00Caught SIGSEGV, shutting down...

Posted: Thu Dec 19, 2013 5:55 pm
by slansing
Did you get the update done successfully?

Re: 00Caught SIGSEGV, shutting down...

Posted: Thu Dec 19, 2013 10:00 pm
by mrochelle
Yes, the update to Nagios XI 2012R2.7 was installed this afternoon, along with the mk-livestatus-1.2.2p2-3.x86_64.rpm package. If we go a week or two without the problem, I would consider it a fix. :geek:

Re: 00Caught SIGSEGV, shutting down...

Posted: Fri Dec 20, 2013 9:47 am
by tmcdonald
Alright, we'll leave this open for a bit longer until we hear from Mitchell.

Re: 00Caught SIGSEGV, shutting down...

Posted: Sun Dec 22, 2013 8:41 am
by mrochelle
While we are waiting for Mitchell: after updating to the latest version, Nagios XI 2012R2.7, and livestatus 1.2.2p2-3, the "Caught SIGSEGV, shutting down" error occurred again this morning at midnight. Following are the event and message logs around that time.
Event Log
[2013-12-22 02:13:28] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[2013-12-22 02:13:28] ndomod: Successfully connected to data sink. 0 queued items to flush.
[2013-12-22 02:13:28] ndomod: NDOMOD 1.5.2 (06-08-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[2013-12-22 02:13:28] Event broker module '/usr/lib64/mod_gearman/mod_gearman.o' initialized successfully.
[2013-12-22 02:13:28] mod_gearman: initialized version 1.3.8 (libgearman 0.25)
[2013-12-22 02:13:28] Event broker module '/usr/local/lib/mk-livestatus/livestatus.o' initialized successfully.
[2013-12-22 02:13:28] livestatus: Finished initialization. Further log messages go to /usr/local/nagios/var/livestatus.log
[2013-12-22 02:13:28] livestatus: archive path /usr/local/nagios/var/archives
[2013-12-22 02:13:28] livestatus: Removed old left over socket file /usr/local/nagios/var/rw/live
[2013-12-22 02:13:28] livestatus: Please visit OMD at http://omdistro.org
[2013-12-22 02:13:28] livestatus: Hint: please try out OMD - the Open Monitoring Distribution
[2013-12-22 02:13:28] livestatus: Please visit us at http://mathias-kettner.de/
[2013-12-22 02:13:28] livestatus: Livestatus 1.2.2p2 by Mathias Kettner. Socket: '/usr/local/nagios/var/rw/live'
[2013-12-22 02:13:28] LOG VERSION: 2.0
[2013-12-22 02:13:28] Local time is Sun Dec 22 02:13:28 CST 2013
[2013-12-22 02:13:28] Nagios 3.5.0 starting... (PID=20731)
[2013-12-22 02:08:57] Auto-save of retention data completed successfully.
[2013-12-22 02:08:17] SERVICE ALERT: Nagios_Master_148.80.140.92;Gearman Queue Status;CRITICAL;HARD;3;CHECK_GEARMAN CRITICAL - less than 1 workers were found having function 'worker_nagprod02.cellnet.com' registered.
[2013-12-22 02:01:00] TIMEPERIOD TRANSITION: Daily 0001;1;0
[2013-12-22 02:00:00] TIMEPERIOD TRANSITION: Daily 0200;0;1
[2013-12-22 02:00:00] TIMEPERIOD TRANSITION: 0200-2200;0;1


--------------------------------------------------------------------------------
December 22, 2013 01:00
--------------------------------------------------------------------------------


[2013-12-22 01:45:00] TIMEPERIOD TRANSITION: 0145-0045;0;1
[2013-12-22 01:08:57] Auto-save of retention data completed successfully.


--------------------------------------------------------------------------------
December 22, 2013 00:00
--------------------------------------------------------------------------------


[2013-12-22 00:46:00] TIMEPERIOD TRANSITION: 0145-0045;1;0
[2013-12-22 00:45:00] TIMEPERIOD TRANSITION: 0045-2345;0;1
[2013-12-22 00:30:00] TIMEPERIOD TRANSITION: Daily 0030;0;1
[2013-12-22 00:15:00] TIMEPERIOD TRANSITION: Daily 0015;0;1
[2013-12-22 00:08:57] Auto-save of retention data completed successfully.
[2013-12-22 00:01:00] TIMEPERIOD TRANSITION: Daily 0001;0;1
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: notification_times;1;1
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: xi_timeperiod_none;0;0
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: xi_timeperiod_24x7;1;1
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: workhours;0;0


ALERT LOG
[2013-12-22 02:13:28] Nagios 3.5.0 starting... (PID=20731)
[2013-12-22 02:08:17] SERVICE ALERT: Nagios_Master_148.80.140.92;Gearman Queue Status;CRITICAL;HARD;3;CHECK_GEARMAN CRITICAL - less than 1 workers were found having function 'worker_nagprod02.cellnet.com' registered.

--------------------------------------------------------------------------------
December 22, 2013 00:00
--------------------------------------------------------------------------------

[2013-12-22 00:00:00] Caught SIGSEGV, shutting down...

After the Caught SIGSEGV at 00:00:00, Nagios stays down until someone restarts it manually.
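
To check whether every crash really lands on midnight, a rough Python sketch like the following could scan log lines for the SIGSEGV message and flag the ones stamped 00:00:00. The two patterns are assumptions based only on the line formats pasted in this thread (the syslog-style messages log and the bracketed event log); the function name `find_sigsegv` is made up for illustration.

```python
import re

# Assumed formats, taken from the log excerpts in this thread:
#   Dec 19 00:00:00 nagprod01 nagios: Caught SIGSEGV, shutting down...
#   [2013-12-22 00:00:00] Caught SIGSEGV, shutting down...
SYSLOG_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}) (\d{2}:\d{2}:\d{2}) \S+ nagios: Caught SIGSEGV")
EVENT_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\] Caught SIGSEGV")


def find_sigsegv(lines):
    """Return (date, time, at_midnight) for each SIGSEGV line found."""
    hits = []
    for line in lines:
        match = SYSLOG_RE.match(line) or EVENT_RE.match(line)
        if match:
            date, time = match.groups()
            hits.append((date, time, time == "00:00:00"))
    return hits


if __name__ == "__main__":
    sample = [
        "Dec 19 00:00:00 nagprod01 nagios: Caught SIGSEGV, shutting down...",
        "Dec 19 00:00:00 nagprod01 ndo2db: mysql_error: 'MySQL server has gone away'",
        "[2013-12-22 00:00:00] Caught SIGSEGV, shutting down...",
    ]
    for date, time, at_midnight in find_sigsegv(sample):
        print(date, time, "midnight" if at_midnight else "other")
```

Feeding it the full messages log and the Nagios event log for a few weeks would show whether any crash falls outside the midnight boundary.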

Re: 00Caught SIGSEGV, shutting down...

Posted: Mon Dec 23, 2013 9:06 am
by scottwilkerson
This is a well-known issue with livestatus, and it appears to happen when people upgrade to livestatus 1.2.2p2 or later.
https://www.google.com/search?q=livestatus+SIGSEGV

see also
https://www.mail-archive.com/checkmk-en ... 09336.html