Nagios Crash with SIGABRT or SIGSEGV

Mike-sbg
Posts: 37
Joined: Mon Jun 01, 2015 1:05 am

Nagios Crash with SIGABRT or SIGSEGV

Post by Mike-sbg »

I'm using Nagios 4.2.1 on a CentOS 7 system with the nsca and nsca-ng plugins.

Two or three times a day, Nagios crashes with SIGABRT or SIGSEGV, and the abrt-daemon creates a dump that I can only partly read and understand.

Does anybody know this problem and can suggest a solution?

Thanks in advance
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN
Contact:

Re: Nagios Crash with SIGABRT or SIGSEGV

Post by dwhitfield »

For strace to be of much value, we'll need to know what is going on when it crashes. Can you post the output of tail -100 /usr/local/nagios/var/nagios.log in the thread?
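If the raw epoch timestamps make the log hard to line up with the crash times, here is a minimal Python sketch of the same `tail -100` with the leading `[epoch]` converted to UTC. The function names `humanize` and `tail` are made up for illustration, and the path is simply the one mentioned above; adjust it to your installation.

```python
import re
from collections import deque
from datetime import datetime, timezone

# Nagios Core log lines start with an epoch timestamp in square brackets.
LOG_LINE = re.compile(r"^\[(\d+)\]\s?(.*)$")


def humanize(line):
    """Return the line with its leading [epoch] rendered as a UTC timestamp."""
    text = line.rstrip("\n")
    match = LOG_LINE.match(text)
    if match is None:
        return text  # leave lines without a timestamp untouched
    stamp = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return f"{stamp:%Y-%m-%d %H:%M:%S} UTC {match.group(2)}"


def tail(path, count=100):
    """Return the last `count` lines of a log file, timestamps converted."""
    with open(path, errors="replace") as handle:
        # deque with maxlen keeps only the final `count` lines in memory.
        return [humanize(line) for line in deque(handle, maxlen=count)]
```

Calling `tail("/usr/local/nagios/var/nagios.log")` returns the last 100 entries as a list of strings that can be printed or pasted into the thread.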

If you want to do some digging on your own, many of the logs used are shown at https://assets.nagios.com/downloads/nag ... ptions.pdf

Lastly, can you PM me your abrt-daemon dump? Thanks!
Mike-sbg
Posts: 37
Joined: Mon Jun 01, 2015 1:05 am

Re: Nagios Crash with SIGABRT or SIGSEGV

Post by Mike-sbg »

The nagios.log (var/log/nagios/....) doesn't contain anything that relates to the error. The log suddenly stops, and the next entries are those of the Nagios start (when my watchdog starts it again).

By the way: you sent me a link to the logs of Nagios XI, but I'm using Nagios.

The only thing that looks strange to me is that Nagios complains about malformed commands, e.g.:

[1481007646] External command error: Malformed command
I haven't figured out what causes this, but I don't think it has anything to do with the problem.
Mike-sbg
Posts: 37
Joined: Mon Jun 01, 2015 1:05 am

Re: Nagios Crash with SIGABRT or SIGSEGV

Post by Mike-sbg »

Which files from the abrt dump do you need? (The core dump is very large.)
Attachments
FileList.txt
(574 Bytes) Downloaded 313 times
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN
Contact:

Re: Nagios Crash with SIGABRT or SIGSEGV

Post by dwhitfield »

Let's start with the commands.cfg. Also, it'd be ideal if you upgraded to 4.2.3. That way if we do find a bug we are on the latest version (although 4.2.4 should be coming out later this week).

I'd like to run the strace on whatever check is causing the crash, but I don't have much to go on at this point. At least the commands.cfg will give me some idea. I think the logs would too. The one before it "suddenly stops" would be of particular interest.

XI uses Core, so most of the log locations are the same, and if not the same at the very least are good clues.
Mike-sbg
Posts: 37
Joined: Mon Jun 01, 2015 1:05 am

Re: Nagios Crash with SIGABRT or SIGSEGV

Post by Mike-sbg »

I've analysed the logs with special focus on the events just before the crash of nagios.

The only entries were for the receipt of several (different) passive checks, nothing really special.

It might be a coincidence, but yesterday I changed the debug level from 1 to 2 and Nagios ran for 24 hours without any problems. Today I set it back to 1, and just now we had a SIGTERM again.

The other thing which I could not debug, is that I get a

External command error: Malformed command
every minute. I think it has something to do with the nsca (receiver) server (version 2.9.1). Is there a way to debug this?
But I don't think it has anything to do with the crash of the whole Nagios core.
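One way to narrow this down might be to capture the lines that are written to the command file and check them against the documented external command format, `[epoch] COMMAND;arg1;arg2...`. The sketch below is not Nagios's own parser and may not match exactly what triggers "Malformed command"; it only checks the two passive-check commands (`PROCESS_SERVICE_CHECK_RESULT` and `PROCESS_HOST_CHECK_RESULT`). The function name `check_command` and the host and service names in the usage note are hypothetical.

```python
import re

# Documented shape of an external command line: "[epoch] COMMAND;arg1;arg2..."
COMMAND_LINE = re.compile(r"^\[(\d+)\] ([A-Z0-9_]+)(?:;(.*))?$")

# Number of semicolon-separated fields for the two passive-check commands.
# The last field (plugin output) may itself contain semicolons.
PASSIVE_FIELDS = {
    "PROCESS_SERVICE_CHECK_RESULT": 4,  # host;service;return_code;output
    "PROCESS_HOST_CHECK_RESULT": 3,     # host;status_code;output
}


def check_command(line):
    """Return None if the line looks well formed, else a short reason."""
    match = COMMAND_LINE.match(line.rstrip("\n"))
    if match is None:
        return "does not match '[epoch] COMMAND;args'"
    name, args = match.group(2), match.group(3) or ""
    expected = PASSIVE_FIELDS.get(name)
    if expected is None:
        return None  # other commands are not inspected by this sketch
    # Split only as often as needed so the output field keeps its semicolons.
    fields = args.split(";", expected - 1)
    if len(fields) < expected:
        return f"{name} expects {expected} fields, got {len(fields)}"
    code = fields[expected - 2]  # the field just before the plugin output
    if not code.strip().isdigit():
        return f"{name} has a non-numeric return code: {code!r}"
    if any(not field.strip() for field in fields[: expected - 2]):
        return f"{name} has an empty host or service name"
    return None
```

For example, `check_command("[1481007646] PROCESS_SERVICE_CHECK_RESULT;web01;0;OK")` reports a missing field because the service description is absent, which is the kind of submission worth looking for on the sending side.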

By the way: I'm on holiday until 12 Dec, so I'm sorry, but I can't answer again before that date.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN
Contact:

Re: Nagios Crash with SIGABRT or SIGSEGV

Post by dwhitfield »

Is there any chance you could leave it at 2 for a longer period to see if that is somehow related?

UPDATE: I doubt this is related to your error, but we just released a new version of NSCA: https://github.com/NagiosEnterprises/ns ... 9.2.tar.gz
Mike-sbg
Posts: 37
Joined: Mon Jun 01, 2015 1:05 am

Re: Nagios Crash with SIGABRT or SIGSEGV

Post by Mike-sbg »

So, back from holiday ;-)

I just set

debug_verbosity = 2
and I'm now compiling NSCA version 2.9.2 ...

I will post the results (at least tomorrow)
Mike-sbg
Posts: 37
Joined: Mon Jun 01, 2015 1:05 am

Re: Nagios Crash with SIGABRT or SIGSEGV

Post by Mike-sbg »

Using NSCA 2.9.2 (with the same parameters as version 2.9.1) causes many CRC32 errors ... I reverted to version 2.9.1.

I just upgraded to Nagios version 4.2.3; I hope this fixes the crashes ...
Mike-sbg
Posts: 37
Joined: Mon Jun 01, 2015 1:05 am

Re: Nagios Crash with SIGABRT or SIGSEGV

Post by Mike-sbg »

Nagios 4.2.3 also crashes every few hours ... :-(
Mon Dec 12 09:42:50 2016 ;NAGIOS;CORE; Nagios Restart NAGIOS CRITICAL: Could not locate a running Nagios process!
Mon Dec 12 13:46:01 2016 ;NAGIOS;CORE; Nagios Restart: NAGIOS CRITICAL: Could not locate a running Nagios process!
Mon Dec 12 15:36:01 2016 ;NAGIOS;CORE; Nagios Restart: NAGIOS CRITICAL: Could not locate a running Nagios process!
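To see whether the crashes follow a regular interval, the gaps between the watchdog restarts can be computed from these lines. This is a small sketch that assumes the timestamp is everything before the first semicolon and that Python runs with an English locale (needed for the `%a` and `%b` names); `restart_times` and `gaps` are illustrative names.

```python
from datetime import datetime

# Watchdog entries copied from the post above.
WATCHDOG_LINES = [
    "Mon Dec 12 09:42:50 2016 ;NAGIOS;CORE; Nagios Restart NAGIOS CRITICAL: Could not locate a running Nagios process!",
    "Mon Dec 12 13:46:01 2016 ;NAGIOS;CORE; Nagios Restart: NAGIOS CRITICAL: Could not locate a running Nagios process!",
    "Mon Dec 12 15:36:01 2016 ;NAGIOS;CORE; Nagios Restart: NAGIOS CRITICAL: Could not locate a running Nagios process!",
]


def restart_times(lines):
    """Parse the timestamp that precedes the first semicolon of each line."""
    return [
        datetime.strptime(line.split(";", 1)[0].strip(), "%a %b %d %H:%M:%S %Y")
        for line in lines
    ]


def gaps(times):
    """Return the time elapsed between each pair of consecutive restarts."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


for gap in gaps(restart_times(WATCHDOG_LINES)):
    print(gap)
```

For the three entries above, the gaps are 4:03:11 and 1:50:00, so at least in this sample the crashes do not follow a fixed schedule.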