00Caught SIGSEGV, shutting down...

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Mitchell
Posts: 130
Joined: Thu Jan 05, 2012 2:33 am

Re: 00Caught SIGSEGV, shutting down...

Post by Mitchell »

Okay, I will need to schedule the upgrades. I have livestatus 1.2.2p2 on all servers with XI 2.5, so I'm a little skeptical.
broker_module=/usr/lib/check_mk/livestatus.o /usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: 00Caught SIGSEGV, shutting down...

Post by abrist »

Yeah, the upgrades would definitely be suggested. What was the reason for the optional broker arguments?
Mitchell wrote:broker_module=/usr/lib/check_mk/livestatus.o /usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: 00Caught SIGSEGV, shutting down...

Post by mrochelle »

We have been experiencing the same problem since updating to the Nagios XI 2012R2.5 release. It happens every 3 or 4 days, right at midnight. We do not have anything scheduled for that time. Following are the errors from the messages log:
Dec 19 00:00:00 nagprod01 nagios: Caught SIGSEGV, shutting down...
Dec 19 00:00:00 nagprod01 ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Dec 19 00:00:00 nagprod01 ndo2db: mysql_error: 'MySQL server has gone away'
Dec 19 00:00:00 nagprod01 ndo2db: Error: Connection to MySQL database has been lost!


Of course we have to restart Nagios to recover.
I'm going to update today to Nagios XI 2012R2.7 and probably also update the livestatus broker from 1.2.2p2 to 1.2.2p3.

My nagios.cfg entry:
broker_module=/usr/local/lib/mk-livestatus/livestatus.o /usr/local/nagios/var/rw/live
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: 00Caught SIGSEGV, shutting down...

Post by abrist »

mrochelle wrote:We have been experiencing the same problem since updating to the Nagios XI 2012R2.5 release. It happens every 3 or 4 days, right at midnight. We do not have anything scheduled for that time. Following are the errors from the messages log:
Dec 19 00:00:00 nagprod01 nagios: Caught SIGSEGV, shutting down...
Dec 19 00:00:00 nagprod01 ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Dec 19 00:00:00 nagprod01 ndo2db: mysql_error: 'MySQL server has gone away'
Dec 19 00:00:00 nagprod01 ndo2db: Error: Connection to MySQL database has been lost!
These errors do not look like a livestatus problem; they look more like your MySQL server is experiencing issues. Let us know how the upgrade goes and whether this behavior continues afterwards.
Mitchell
Posts: 130
Joined: Thu Jan 05, 2012 2:33 am

Re: 00Caught SIGSEGV, shutting down...

Post by Mitchell »

abrist wrote:Yeah, the upgrades would definitely be suggested. What was the reason for the optional broker arguments?
Mitchell wrote:broker_module=/usr/lib/check_mk/livestatus.o /usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40
We have multiple sites and use check_mk multisite (multi-location with GTM). The optional arguments give us a good balance: enough client threads without a very long idle timeout.
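For anyone comparing the broker_module lines posted in this thread, here is a minimal Python sketch that splits such a line into the module path, the socket path, and the optional key=value arguments. The function name parse_broker_module is hypothetical, and the sketch assumes the layout seen above (module path first, then the socket path, then options). As far as I know livestatus reads idle_timeout in milliseconds, but check the livestatus documentation for your version before relying on that.

```python
def parse_broker_module(line):
    """Split a nagios.cfg broker_module line into (module, socket_path, options).

    Assumes the layout used in this thread: module path, then socket path,
    then optional key=value arguments. Option values are kept as strings.
    """
    key, _, value = line.partition("=")
    if key.strip() != "broker_module":
        raise ValueError("not a broker_module line")
    parts = value.split()
    if not parts:
        raise ValueError("broker_module line has no module path")
    module = parts[0]
    socket_path = None
    options = {}
    for arg in parts[1:]:
        if "=" in arg:
            # key=value argument such as idle_timeout=60000
            name, _, val = arg.partition("=")
            options[name] = val
        elif socket_path is None:
            # first bare argument is treated as the socket path
            socket_path = arg
    return module, socket_path, options


if __name__ == "__main__":
    line = ("broker_module=/usr/lib/check_mk/livestatus.o "
            "/usr/local/nagios/var/rw/live idle_timeout=60000 num_client_threads=40")
    print(parse_broker_module(line))
```

Running it against both lines in the thread makes the difference obvious: one install passes two extra options, the other passes none and so runs on the livestatus defaults.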
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: 00Caught SIGSEGV, shutting down...

Post by slansing »

Did you get the update done successfully?
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: 00Caught SIGSEGV, shutting down...

Post by mrochelle »

Yes, the update to Nagios XI 2012R2.7 was installed this afternoon, along with the mk-livestatus-1.2.2p2-3.x86_64.rpm package. If we go a week or two without the problem, I would consider it fixed. :geek:
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: 00Caught SIGSEGV, shutting down...

Post by tmcdonald »

Alright, we'll leave this open for a bit longer until we hear from Mitchell.
Former Nagios employee
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: 00Caught SIGSEGV, shutting down...

Post by mrochelle »

While we are waiting for Mitchell: after updating to Nagios XI 2012R2.7 and livestatus 1.2.2p2-3, the "Caught SIGSEGV, shutting down..." error occurred this morning at midnight. Following are the event and message logs around that time.
Event Log
[2013-12-22 02:13:28] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[2013-12-22 02:13:28] ndomod: Successfully connected to data sink. 0 queued items to flush.
[2013-12-22 02:13:28] ndomod: NDOMOD 1.5.2 (06-08-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[2013-12-22 02:13:28] Event broker module '/usr/lib64/mod_gearman/mod_gearman.o' initialized successfully.
[2013-12-22 02:13:28] mod_gearman: initialized version 1.3.8 (libgearman 0.25)
[2013-12-22 02:13:28] Event broker module '/usr/local/lib/mk-livestatus/livestatus.o' initialized successfully.
[2013-12-22 02:13:28] livestatus: Finished initialization. Further log messages go to /usr/local/nagios/var/livestatus.log
[2013-12-22 02:13:28] livestatus: archive path /usr/local/nagios/var/archives
[2013-12-22 02:13:28] livestatus: Removed old left over socket file /usr/local/nagios/var/rw/live
[2013-12-22 02:13:28] livestatus: Please visit OMD at http://omdistro.org
[2013-12-22 02:13:28] livestatus: Hint: please try out OMD - the Open Monitoring Distribution
[2013-12-22 02:13:28] livestatus: Please visit us at http://mathias-kettner.de/
[2013-12-22 02:13:28] livestatus: Livestatus 1.2.2p2 by Mathias Kettner. Socket: '/usr/local/nagios/var/rw/live'
[2013-12-22 02:13:28] LOG VERSION: 2.0
[2013-12-22 02:13:28] Local time is Sun Dec 22 02:13:28 CST 2013
[2013-12-22 02:13:28] Nagios 3.5.0 starting... (PID=20731)
[2013-12-22 02:08:57] Auto-save of retention data completed successfully.
[2013-12-22 02:08:17] SERVICE ALERT: Nagios_Master_148.80.140.92;Gearman Queue Status;CRITICAL;HARD;3;CHECK_GEARMAN CRITICAL - less than 1 workers were found having function 'worker_nagprod02.cellnet.com' registered.
[2013-12-22 02:01:00] TIMEPERIOD TRANSITION: Daily 0001;1;0
[2013-12-22 02:00:00] TIMEPERIOD TRANSITION: Daily 0200;0;1
[2013-12-22 02:00:00] TIMEPERIOD TRANSITION: 0200-2200;0;1


--------------------------------------------------------------------------------
December 22, 2013 01:00
--------------------------------------------------------------------------------


[2013-12-22 01:45:00] TIMEPERIOD TRANSITION: 0145-0045;0;1
[2013-12-22 01:08:57] Auto-save of retention data completed successfully.


--------------------------------------------------------------------------------
December 22, 2013 00:00
--------------------------------------------------------------------------------


[2013-12-22 00:46:00] TIMEPERIOD TRANSITION: 0145-0045;1;0
[2013-12-22 00:45:00] TIMEPERIOD TRANSITION: 0045-2345;0;1
[2013-12-22 00:30:00] TIMEPERIOD TRANSITION: Daily 0030;0;1
[2013-12-22 00:15:00] TIMEPERIOD TRANSITION: Daily 0015;0;1
[2013-12-22 00:08:57] Auto-save of retention data completed successfully.
[2013-12-22 00:01:00] TIMEPERIOD TRANSITION: Daily 0001;0;1
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: notification_times;1;1
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: xi_timeperiod_none;0;0
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: xi_timeperiod_24x7;1;1
[2013-12-22 00:00:00] TIMEPERIOD TRANSITION: workhours;0;0


ALERT LOG
[2013-12-22 02:13:28] Nagios 3.5.0 starting... (PID=20731)
[2013-12-22 02:08:17] SERVICE ALERT: Nagios_Master_148.80.140.92;Gearman Queue Status;CRITICAL;HARD;3;CHECK_GEARMAN CRITICAL - less than 1 workers were found having function 'worker_nagprod02.cellnet.com' registered.

--------------------------------------------------------------------------------
December 22, 2013 00:00
--------------------------------------------------------------------------------

[2013-12-22 00:00:00] Caught SIGSEGV, shutting down...

After the "Caught SIGSEGV" at 00:00:00, Nagios stays down until someone takes action to restart it.
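To confirm that the crashes really do line up with midnight across several days, something like the following Python sketch could be run over saved log lines. It is only an illustration: the function names sigsegv_times and all_at_midnight are hypothetical, and it assumes each line carries an HH:MM:SS time of day, as in the syslog and event log excerpts pasted in this thread.

```python
import re

# Matches an HH:MM:SS time of day anywhere in a log line.
TIME_RE = re.compile(r"\b\d{2}:\d{2}:\d{2}\b")


def sigsegv_times(lines):
    """Return the HH:MM:SS time of every line that reports a caught SIGSEGV."""
    times = []
    for line in lines:
        if "Caught SIGSEGV" not in line:
            continue
        match = TIME_RE.search(line)
        if match:
            times.append(match.group(0))
    return times


def all_at_midnight(times):
    """True only if there is at least one crash and every crash is at 00:00:00."""
    return bool(times) and all(t == "00:00:00" for t in times)


if __name__ == "__main__":
    sample = [
        "Dec 19 00:00:00 nagprod01 nagios: Caught SIGSEGV, shutting down...",
        "[2013-12-22 02:13:28] Nagios 3.5.0 starting... (PID=20731)",
        "[2013-12-22 00:00:00] Caught SIGSEGV, shutting down...",
    ]
    times = sigsegv_times(sample)
    print(times, all_at_midnight(times))
```

If every hit lands on 00:00:00, that points at something triggered by the date rollover rather than by load, which helps narrow down which broker module to look at.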
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: 00Caught SIGSEGV, shutting down...

Post by scottwilkerson »

This is a well-known issue with livestatus, and it appears to happen when users upgrade to livestatus 1.2.2p2 or later.
https://www.google.com/search?q=livestatus+SIGSEGV

see also
https://www.mail-archive.com/checkmk-en ... 09336.html
Former Nagios employee