broker_module=/usr/lib/check_mk/livestatus.o /usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40
00Caught SIGSEGV, shutting down...
Re: 00Caught SIGSEGV, shutting down...
Okay. I will need to schedule the upgrades. I have livestatus 1.2.2p2 on all servers with XI 2.5, so I am a little skeptical.
Re: 00Caught SIGSEGV, shutting down...
Yes, the upgrades are definitely recommended. What was the reason for the optional broker arguments?
Mitchell wrote:broker_module=/usr/lib/check_mk/livestatus.o /usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: 00Caught SIGSEGV, shutting down...
We have been experiencing the same problem since updating to the Nagios XI 2012R2.5 release. It happens every 3 or 4 days, right at midnight. We do not have anything scheduled for that time. The following errors appear in the messages log:
Dec 19 00:00:00 nagprod01 nagios: Caught SIGSEGV, shutting down...
Dec 19 00:00:00 nagprod01 ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Dec 19 00:00:00 nagprod01 ndo2db: mysql_error: 'MySQL server has gone away'
Dec 19 00:00:00 nagprod01 ndo2db: Error: Connection to MySQL database has been lost!
Of course we have to restart Nagios to recover.
I'm going to update to Nagios XI 2012R2.7 today, and will probably also update the livestatus broker from 1.2.2p2 to 1.2.2p3.
My nagios.cfg entry:
broker_module=/usr/local/lib/mk-livestatus/livestatus.o /usr/local/nagios/var/rw/live
Re: 00Caught SIGSEGV, shutting down...
These errors do not look like a livestatus problem; they point to your MySQL server experiencing issues. Let us know how the upgrade goes and whether this aberrant behavior persists afterwards.
mrochelle wrote:We have been experiencing the same problem after updating to the Nagios XI 2012R2.5 release. It happens every 3 or 4 days right at midnight. We do not have anything scheduled for that time. Following is the errors in the messages log.
Dec 19 00:00:00 nagprod01 nagios: Caught SIGSEGV, shutting down...
Dec 19 00:00:00 nagprod01 ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Dec 19 00:00:00 nagprod01 ndo2db: mysql_error: 'MySQL server has gone away'
Dec 19 00:00:00 nagprod01 ndo2db: Error: Connection to MySQL database has been lost!
Re: 00Caught SIGSEGV, shutting down...
We have multiple sites and use check_mk multisite (multiple locations with GTM). The optional arguments strike a balance between having enough client threads and keeping the idle timeout from being too long.
abrist wrote:Yeah, the upgrades would definitely be suggested. What was the reason for the optional broker arguments?
Mitchell wrote:broker_module=/usr/lib/check_mk/livestatus.o /usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40
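For anyone comparing broker lines across several servers, here is a minimal sketch that splits a `broker_module` line into the module path, the socket path, and the optional `key=value` arguments. The function name `parse_broker_module` is made up for illustration and is not part of Nagios or livestatus; values are kept as strings because the thread does not state their units.

```python
def parse_broker_module(line):
    """Split a nagios.cfg broker_module line into (module, socket, options).

    Hypothetical helper for illustration only; not part of Nagios or livestatus.
    """
    key, _, value = line.partition("=")
    if key.strip() != "broker_module":
        raise ValueError("not a broker_module line")
    tokens = value.split()
    if not tokens:
        raise ValueError("broker_module line has no module path")
    module = tokens[0]
    socket_path = None
    options = {}
    for token in tokens[1:]:
        if "=" in token:
            # Optional arguments such as idle_timeout=60000
            name, _, setting = token.partition("=")
            options[name] = setting
        elif socket_path is None:
            # The first bare argument is treated as the socket path
            socket_path = token
    return module, socket_path, options


line = ("broker_module=/usr/lib/check_mk/livestatus.o "
        "/usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40")
module, socket_path, options = parse_broker_module(line)
print(module, socket_path, options)
```

Running this against each server's nagios.cfg entry makes it easy to see which hosts carry the extra arguments and which use the plain two-argument form.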
slansing
Re: 00Caught SIGSEGV, shutting down...
Did you get the update done successfully?
Re: 00Caught SIGSEGV, shutting down...
Yes, the update to Nagios XI 2012R2.7 was installed this afternoon, along with the mk-livestatus-1.2.2p2-3.x86_64.rpm package. If we go a week or two without the problem, I would consider it fixed.
Re: 00Caught SIGSEGV, shutting down...
Alright, we'll leave this open for a bit longer until we hear from Mitchell.
Re: 00Caught SIGSEGV, shutting down...
While we are waiting for Mitchell: after updating to the latest version, Nagios XI 2012R2.7, and livestatus 1.2.2p2-3, the "Caught SIGSEGV, shutting down..." error occurred again this morning at midnight. Following are the event and message logs around that time.
Event Log
[2013-12-22 02:13:28] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[2013-12-22 02:13:28] ndomod: Successfully connected to data sink. 0 queued items to flush.
[2013-12-22 02:13:28] ndomod: NDOMOD 1.5.2 (06-08-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[2013-12-22 02:13:28] Event broker module '/usr/lib64/mod_gearman/mod_gearman.o' initialized successfully.
[2013-12-22 02:13:28] mod_gearman: initialized version 1.3.8 (libgearman 0.25)
[2013-12-22 02:13:28] Event broker module '/usr/local/lib/mk-livestatus/livestatus.o' initialized successfully.
[2013-12-22 02:13:28] livestatus: Finished initialization. Further log messages go to /usr/local/nagios/var/livestatus.log
[2013-12-22 02:13:28] livestatus: archive path /usr/local/nagios/var/archives
[2013-12-22 02:13:28] livestatus: Removed old left over socket file /usr/local/nagios/var/rw/live
[2013-12-22 02:13:28] livestatus: Please visit OMD at http://omdistro.org
[2013-12-22 02:13:28] livestatus: Hint: please try out OMD - the Open Monitoring Distribution
[2013-12-22 02:13:28] livestatus: Please visit us at http://mathias-kettner.de/
[2013-12-22 02:13:28] livestatus: Livestatus 1.2.2p2 by Mathias Kettner. Socket: '/usr/local/nagios/var/rw/live'
[2013-12-22 02:13:28] LOG VERSION: 2.0
[2013-12-22 02:13:28] Local time is Sun Dec 22 02:13:28 CST 2013
[2013-12-22 02:13:28] Nagios 3.5.0 starting... (PID=20731)
[2013-12-22 02:08:57] Auto-save of retention data completed successfully.
[2013-12-22 02:08:17] SERVICE ALERT: Nagios_Master_148.80.140.92;Gearman Queue Status;CRITICAL;HARD;3;CHECK_GEARMAN CRITICAL - less than 1 workers were found having function 'worker_nagprod02.cellnet.com' registered.
[2013-12-22 02:01:00] TIMEPERIOD TRANSITION: Daily 0001;1;0
[2013-12-22 02:00:00] TIMEPERIOD TRANSITION: Daily 0200;0;1
[2013-12-22 02:00:00] TIMEPERIOD TRANSITION: 0200-2200;0;1
--------------------------------------------------------------------------------
December 22, 2013 01:00
--------------------------------------------------------------------------------
[2013-12-22 01:45:00] TIMEPERIOD TRANSITION: 0145-0045;0;1
[2013-12-22 01:08:57] Auto-save of retention data completed successfully.
--------------------------------------------------------------------------------
December 22, 2013 00:00
--------------------------------------------------------------------------------
[2013-12-22 00:46:00] TIMEPERIOD TRANSITION: 0145-0045;1;0
[2013-12-22 00:45:00] TIMEPERIOD TRANSITION: 0045-2345;0;1
[2013-12-22 00:30:00] TIMEPERIOD TRANSITION: Daily 0030;0;1
[2013-12-22 00:15:00] TIMEPERIOD TRANSITION: Daily 0015;0;1
[2013-12-22 00:08:57] Auto-save of retention data completed successfully.
[2013-12-22 00:01:00] TIMEPERIOD TRANSITION: Daily 0001;0;1
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: notification_times;1;1
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: xi_timeperiod_none;0;0
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: xi_timeperiod_24x7;1;1
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: workhours;0;0
ALERT LOG
[2013-12-22 02:13:28] Nagios 3.5.0 starting... (PID=20731)
[2013-12-22 02:08:17] SERVICE ALERT: Nagios_Master_148.80.140.92;Gearman Queue Status;CRITICAL;HARD;3;CHECK_GEARMAN CRITICAL - less than 1 workers were found having function 'worker_nagprod02.cellnet.com' registered.
--------------------------------------------------------------------------------
December 22, 2013 00:00
--------------------------------------------------------------------------------
[2013-12-22 00:00:00] Caught SIGSEGV, shutting down...
After the Caught SIGSEGV at 00:00:00, Nagios fails to start until action is taken to restart it.
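To see what else was logged in the same second as a crash, the entries can be grouped by timestamp. This is a rough sketch, assuming the `[YYYY-MM-DD HH:MM:SS] message` layout shown in the logs above; the function name `entries_at_crash` is made up for illustration, and sharing a timestamp does not by itself show that an entry caused the crash.

```python
import re
from collections import defaultdict

# Matches lines such as "[2013-12-22 00:00:00] Caught SIGSEGV, shutting down..."
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")


def entries_at_crash(lines, marker="Caught SIGSEGV"):
    """Return {timestamp: [other messages logged in that same second]}
    for every timestamp at which a line containing `marker` appears.

    Hypothetical helper for illustration only.
    """
    by_time = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            by_time[match.group(1)].append(match.group(2))
    result = {}
    for stamp, messages in by_time.items():
        if any(marker in message for message in messages):
            result[stamp] = [m for m in messages if marker not in m]
    return result


sample = [
    "[2013-12-22 00:01:00] TIMEPERIOD TRANSITION: Daily 0001;0;1",
    "[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: workhours;0;0",
    "[2013-12-22 00:00:00] Caught SIGSEGV, shutting down...",
]
for stamp, messages in entries_at_crash(sample).items():
    print(stamp, messages)
```

Feeding it the combined event and alert log lines gives a quick list of candidates to look at for each midnight crash.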
scottwilkerson
Re: 00Caught SIGSEGV, shutting down...
This is a well-known issue with livestatus, and it appears to happen after upgrading to livestatus 1.2.2p2+.
https://www.google.com/search?q=livestatus+SIGSEGV
see also
https://www.mail-archive.com/checkmk-en ... 09336.html