Nagios 4.3.1 crashes when using mod_gearman

Post by danjoh » Wed Mar 15, 2017 3:04 am

I have upgraded Nagios from 4.2.4 to 4.3.1 (luckily only on my development box), and now it repeatedly crashes with SIGSEGV / SIGTERM (about once a minute).
To me it looks like a problem that occurs when a broker_module sends data "back" to Nagios.

I base this on the following facts.

If I disable mod_gearman in nagios.cfg, everything works OK.
If I enable mod_gearman in nagios.cfg, but do not use it for host-/service-checks, everything works OK.
If I enable mod_gearman and use it for host-/service-checks it starts crashing.

Sadly, the only messages I see in the Nagios log are:
Caught SIGSEGV, shutting down...
Caught SIGTERM, shutting down...

The debug log shows nothing unusual.
Here are my software versions:
OS: RHEL 7.3
Nagios Core 4.3.1 (built from source)
mod_gearman 3.0.1-1 (labs.consol.de)
gearmand 0.33-5 (labs.consol.de)

Running nagios under gdb I see the following when it crashes:

Program received signal SIGSEGV, Segmentation fault.
clear_custom_vars (vars=vars@entry=0x7ffffffed940) at ../common/macros.c:2851
2851 my_free(this_customvariablesmember->variable_name);
Missing separate debuginfos, use: debuginfo-install boost-system-1.53.0-26.el7.x86_64 gearmand-0.33-5.x86_64 glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 sssd-client-1.14.0-43.el7_3.11.x86_64
(gdb) bt
#0 clear_custom_vars (vars=vars@entry=0x7ffffffed940) at ../common/macros.c:2851
#1 0x00005555555916bc in clear_contact_macros_r (mac=mac@entry=0x7ffffffed2e0) at ../common/macros.c:3001
#2 0x00005555555918b7 in clear_volatile_macros_r (mac=mac@entry=0x7ffffffed2e0) at ../common/macros.c:2870
#3 0x00007ffff64aaa9e in handle_svc_check (event_type=, data=0x7fffffffda30) at neb_module_nagios4/../neb_module/mod_gearman.c:851
#4 0x000055555556bb2f in neb_make_callbacks (callback_type=callback_type@entry=6, data=data@entry=0x7fffffffda30) at nebmods.c:529
#5 0x0000555555569f10 in broker_service_check (type=type@entry=704, flags=flags@entry=0, attr=attr@entry=0, svc=svc@entry=0x555555e97310, check_type=check_type@entry=0,
start_time=..., end_time=..., cmd=, latency=0, exectime=exectime@entry=0, timeout=timeout@entry=0, early_timeout=early_timeout@entry=0,
retcode=retcode@entry=0, cmdline=cmdline@entry=0x0, timestamp=timestamp@entry=0x0, cr=cr@entry=0x0) at broker.c:326
#6 0x000055555557172f in run_async_service_check (svc=svc@entry=0x555555e97310, check_options=check_options@entry=0, latency=latency@entry=0.0008800000068731606,
scheduled_check=scheduled_check@entry=1, reschedule_check=reschedule_check@entry=1, time_is_valid=time_is_valid@entry=0x7fffffffe29c,
preferred_time=preferred_time@entry=0x7fffffffe2a8) at checks.c:199
#7 0x0000555555571cb1 in run_scheduled_service_check (svc=svc@entry=0x555555e97310, check_options=0, latency=latency@entry=0.0008800000068731606) at checks.c:90
#8 0x0000555555587adb in handle_timed_event (event=event@entry=0x555555e8fc20) at events.c:1171
#9 0x0000555555588623 in event_execution_loop () at events.c:1110
#10 0x0000555555568a56 in main (argc=, argv=) at nagios.c:814
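The innermost frame is clear_custom_vars() freeing a list member's variable_name (macros.c:2851). The pattern at that line, walking a linked list of custom variables and freeing each node, can be sketched as follows. This is a simplified sketch and not the Nagios Core source: the struct customvar and the functions copy_string(), add_custom_var() and clear_custom_vars_sketch() are hypothetical. It shows that reading the first member's variable_name is the first dereference of the list head, so a head that holds garbage (uninitialized or already freed) would fault exactly there, and that resetting the head to NULL makes a second clear harmless.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a custom-variable list node.
 * The real Nagios Core struct has more fields than this. */
typedef struct customvar {
    char *variable_name;
    char *variable_value;
    struct customvar *next;
} customvar;

/* Portable string copy (strdup is POSIX, not ISO C). */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/* Prepend NAME=VALUE to the list; returns 0 on success, -1 on failure. */
static int add_custom_var(customvar **head, const char *name, const char *value)
{
    customvar *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->variable_name = copy_string(name);
    node->variable_value = copy_string(value);
    if (node->variable_name == NULL || node->variable_value == NULL) {
        free(node->variable_name);
        free(node->variable_value);
        free(node);
        return -1;
    }
    node->next = *head;
    *head = node;
    return 0;
}

/* Free every node, then reset the head so a repeated clear is a no-op.
 * Returns the number of nodes freed. Reading cur->variable_name is the
 * first dereference, so a head pointing at garbage faults right there. */
static size_t clear_custom_vars_sketch(customvar **vars)
{
    size_t freed = 0;
    customvar *cur = *vars;
    while (cur != NULL) {
        free(cur->variable_name);
        free(cur->variable_value);
        customvar *next = cur->next; /* read before the node itself is freed */
        free(cur);
        cur = next;
        freed++;
    }
    *vars = NULL;
    return freed;
}
```

With two variables added, the first clear frees both nodes and returns 2; a second clear on the now-NULL head returns 0 instead of touching freed memory.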


I have opened an issue on GitHub for mod_gearman (https://github.com/sni/mod_gearman/issues/110), and the reply there was:
This looks like it may be a Core bug. I was able to replicate with pre-built and compiled from source ModGearman modules.


Has anything changed in the NEB interface between 4.2.4 and 4.3.1?
Any suggestions on what could be wrong? (I know you do not "support" third-party modules, but let's not start finger-pointing, please.)
If you need more debugging info, I would be glad to help.

Regards,
--
D/\N
danjoh
Posts: 31
Joined: Mon Dec 07, 2015 10:43 am
Location: Zürich, Switzerland

Re: Nagios 4.3.1 crashes when using mod_gearman

Post by tmcdonald » Wed Mar 15, 2017 12:24 pm

For reference, the gentleman you quoted does in fact work for Nagios, so I'll speak with him directly about this and get back to you when I hear more.

Update: Our Core developer is aware of the issue and is working on getting it resolved.
Be sure to check out our Knowledgebase for helpful articles and solutions!
tmcdonald
Support Lead / Operations Engineer
Posts: 8683
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios 4.3.1 crashes when using mod_gearman

Post by bheden » Wed Mar 15, 2017 1:26 pm

Based on my initial investigation, it isn't anything NEB-specific. It looks like a change in the macro-freeing code is causing this issue. The Nagios Core developer is aware of the bug, and I promise I'll post an update to that ModGearman issue whenever he lets me know. ;)
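This reading fits the backtrace: mod_gearman's handle_svc_check() calls clear_volatile_macros_r(), which in this trace goes on to clear_contact_macros_r() and clear_custom_vars(), presumably for contact macros that a service-check callback never populated. If a clear routine walks every list in a macro context, each list head it touches must be either NULL or valid. The sketch below illustrates that invariant with a hypothetical macro_ctx struct and hypothetical helpers push_var(), free_vars() and clear_volatile_sketch(); it is not the Nagios Core code and not the actual fix, only the defensive pattern of zero-initializing a stack-allocated context before anything clears it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-call macro context with one custom-variable list per
 * object type; an illustration only, not Nagios Core's nagios_macros. */
typedef struct cvar {
    int value;
    struct cvar *next;
} cvar;

typedef struct macro_ctx {
    cvar *host_vars;
    cvar *service_vars;
    cvar *contact_vars; /* never filled in by a plain service check */
} macro_ctx;

/* Prepend one value; returns 0 on success, -1 on allocation failure. */
static int push_var(cvar **head, int value)
{
    cvar *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->value = value;
    node->next = *head;
    *head = node;
    return 0;
}

/* Free one list and reset its head to NULL. */
static void free_vars(cvar **head)
{
    cvar *cur = *head;
    while (cur != NULL) {
        cvar *next = cur->next;
        free(cur);
        cur = next;
    }
    *head = NULL;
}

/* Clears every list, including ones the caller never filled in.
 * That is only safe when unused heads are NULL, i.e. when the caller
 * zero-initialized the context (e.g. "macro_ctx mac = {0};"). */
static void clear_volatile_sketch(macro_ctx *mac)
{
    free_vars(&mac->host_vars);
    free_vars(&mac->service_vars);
    free_vars(&mac->contact_vars);
}
```

Declaring the context as `macro_ctx mac = {0};` leaves contact_vars NULL, so clearing it is a no-op; a context declared without an initializer would leave that head holding whatever happened to be on the stack.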
Nagios Enterprises
Product Development Manager
bheden
Product Development Manager
Posts: 136
Joined: Thu Feb 13, 2014 9:50 am
Location: Nagios Enterprises

Re: Nagios 4.3.1 crashes when using mod_gearman

Post by danjoh » Thu Mar 16, 2017 2:13 am

Thanks for the update/feedback.

Keep up the great work.
--
D/\N

Re: Nagios 4.3.1 crashes when using mod_gearman

Post by dwhitfield » Thu Mar 16, 2017 3:27 pm

I just spoke to the developer about this, and he suggested checking back next week. Thursday is probably a good day, since Friday is a short day for support.
dwhitfield
The Doctor
Posts: 3756
Joined: Wed Sep 21, 2016 10:29 am
Location: Nagios Enterprises, LLC

Re: Nagios 4.3.1 crashes when using mod_gearman

Post by superman » Wed Mar 22, 2017 4:10 am

Hi danjoh,

Is there any particular reason you upgraded from 4.2.4 to 4.3.1? Are there any major bugs or problems in 4.2.4? Reading the 4.3.1 version history, I see there are major fixes.
FYI, we are planning to upgrade to 4.2.4, and our main concern is stability.


4.3.1 – 02/23/2017

FIXES
* Backed out the changes related to the “login page” and “logoff link” from 4.3.0, since it was causing problems
* Service hard state generation and host hard or soft down status
* Comments are duplicated through Nagios reload
* host hourly value is incorrectly dumped as json boolean
* Bug – Quick Search no longer allows search by IP
* Config: status_update_interval can not be set to 1
* Check attempts not increasing if nagios is reloaded
* nagios hangs on reload while sending external command to cmd file
* Feature Request: return code xxx out of bounds – include message as well
superman
Posts: 2
Joined: Mon Mar 14, 2016 10:16 pm

Re: Nagios 4.3.1 crashes when using mod_gearman

Post by danjoh » Wed Mar 22, 2017 10:05 am

Hi Superman,

The main reasons to update were (among other things):

* Bug - Quick Search no longer allows search by IP (4.3.1)
* Fix for CVE-2016-6209 - The "corewindow" parameter (4.3.0)
* Added a login page, and a `Logoff` links (4.3.0)
* On the status map, the host name will be colored if services are not all OK. (4.3.0)
* Added "Clear flapping state" command on host and services detail pages (4.3.0)

And of course to test the latest and greatest ;-)
As long as mod_gearman integration (or remote workers) does not work, I will stay on 4.2.4 on our production host.
--
D/\N

Re: Nagios 4.3.1 crashes when using mod_gearman

Post by dwhitfield » Wed Mar 22, 2017 10:39 am

@superman, we had a lot of problems with the new features in 4.3.0, but 4.3.1 (released just two days after 4.3.0) has been really good so far. This mod_gearman issue is the only one I know of in 4.3.1, although I'm sure there are still some open bug reports on GitHub.

Re: Nagios 4.3.1 crashes when using mod_gearman

Post by superman » Fri Mar 24, 2017 1:38 am

Thanks, dwhitfield & danjoh.

We will stay with our current version until the Mod Gearman issue is solved, as it is very useful for distributed checks.

Re: Nagios 4.3.1 crashes when using mod_gearman

Post by mcapra » Fri Mar 24, 2017 12:59 pm

@danjoh, do you have any additional questions regarding this? Otherwise, we'll touch base once the issue has been investigated more thoroughly on the Nagios Core side.
Former Nagios employee
http://www.mcapra.com/
mcapra
Posts: 2773
Joined: Thu May 05, 2016 3:54 pm