Nagios Crash with SIGABRT or SIGSEGV
I'm using Nagios 4.2.1 on a CentOS 7 system with the nsca and nsca-ng plugins.
Two to three times a day Nagios crashes with SIGABRT or SIGSEGV, and the abrt daemon creates a dump, which I can only partly read and understand.
Does anybody know this problem and can suggest a solution?
Thanks in advance.
-
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: Nagios Crash with SIGABRT or SIGSEGV
For strace to be of much value, we'll need to know what is going on when it crashes. Can you post the output of tail -100 /usr/local/nagios/var/nagios.log in the thread?
If you want to do some digging on your own, many of the logs used are shown at https://assets.nagios.com/downloads/nag ... ptions.pdf
Lastly, can you PM me your abrt-daemon dump? Thanks!
Re: Nagios Crash with SIGABRT or SIGSEGV
The nagios.log (var/log/nagios/....) doesn't contain anything related to the error. The log suddenly stops, and the next entries are those of the Nagios start (when my watchdog starts it again).
By the way: you sent me a link to the logs of Nagios XI, but I'm using Nagios.
The only thing that looks strange to me is that Nagios complains about malformed commands, e.g.:
Code:
[1481007646] External command error: Malformed command
I haven't figured out what causes this, but I don't think it has anything to do with the problem.
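The bracketed number at the start of a nagios.log entry is a Unix epoch timestamp, so the entry above can be placed in time relative to a crash. A minimal sketch, assuming the usual `[epoch] message` layout of the log; the function name `parse_log_line` is hypothetical and not part of Nagios:

```python
import re
from datetime import datetime, timezone

# Assumed layout of a nagios.log entry: "[<epoch seconds>] <message>"
LOG_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def parse_log_line(line):
    """Return (UTC datetime, message), or None if the line has no timestamp."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return stamp, match.group(2)


if __name__ == "__main__":
    entry = "[1481007646] External command error: Malformed command"
    stamp, message = parse_log_line(entry)
    print(stamp.isoformat(), message)
```

Converting the timestamps this way makes it easier to line up the last log entries with the time recorded in the abrt dump.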
Re: Nagios Crash with SIGABRT or SIGSEGV
Which files from the abrt dump do you need? (The core dump is very large.)
Attachments:
- FileList.txt (574 Bytes)
Re: Nagios Crash with SIGABRT or SIGSEGV
Let's start with the commands.cfg. Also, it'd be ideal if you upgraded to 4.2.3. That way if we do find a bug we are on the latest version (although 4.2.4 should be coming out later this week).
I'd like to run the strace on whatever check is causing the crash, but I don't have much to go on at this point. At least the commands.cfg will give me some idea. I think the logs would too. The one before it "suddenly stops" would be of particular interest.
XI uses Core, so most of the log locations are the same, and if not the same at the very least are good clues.
Re: Nagios Crash with SIGABRT or SIGSEGV
I've analysed the logs with a special focus on the events just before the Nagios crash.
There were only entries for the receipt of several (different) passive checks, nothing really special.
It might be a coincidence, but yesterday I turned the debug level from 1 to 2 and Nagios worked for 24 hours without any problems. Today I turned it back to 1, and just now we had a SIGTERM again.
The other thing which I could not debug is that I get a
Code:
External command error: Malformed command
every minute. I think it has something to do with the nsca (receiver) server (version 2.9.1). Is there a way to debug it?
But I don't think that it has anything to do with the crash of the whole Nagios core.
By the way: I'm on holiday until 12 Dec, so I'm sorry but I can't answer again before that date.
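For reference when hunting the "Malformed command" message: passive results reach Nagios as lines in the external command file, documented as `[time] COMMAND;arguments`. The sketch below builds a `PROCESS_SERVICE_CHECK_RESULT` line and applies a rough shape check. The regular expression only approximates the documented format and is not Nagios' actual parser; the helper names and the host and service values are hypothetical.

```python
import re

# Approximation of the documented shape: "[epoch] COMMAND_NAME;arg1;arg2;..."
# This is an illustrative check, not the parser Nagios itself uses.
EXTERNAL_COMMAND = re.compile(r"^\[(\d+)\] ([A-Z_]+)(?:;(.*))?$")


def format_passive_service_result(timestamp, host, service, return_code, output):
    """Build a PROCESS_SERVICE_CHECK_RESULT line for the command file."""
    return "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}".format(
        int(timestamp), host, service, return_code, output
    )


def looks_well_formed(line):
    """True if the line has a bracketed timestamp followed by a command name."""
    return EXTERNAL_COMMAND.match(line.rstrip("\n")) is not None


if __name__ == "__main__":
    # "web01" and "HTTP" are made-up example values.
    good = format_passive_service_result(1481007646, "web01", "HTTP", 0, "OK")
    bad = "PROCESS_SERVICE_CHECK_RESULT;web01;HTTP;0;OK"  # timestamp missing
    print(good, looks_well_formed(good))
    print(bad, looks_well_formed(bad))
```

Comparing whatever the sending side submits once a minute against this shape may help narrow down which client produces the rejected line.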
Re: Nagios Crash with SIGABRT or SIGSEGV
Is there any chance you could leave it at 2 for a longer period to see if that is somehow related?
UPDATE: I doubt this is related to your error, but we just released a new version of NSCA: https://github.com/NagiosEnterprises/ns ... 9.2.tar.gz
Re: Nagios Crash with SIGABRT or SIGSEGV
So, back from holiday.
I just set
Code:
debug_verbosity = 2
And I'm now compiling NSCA version 2.9.2.
I will post the results (at least tomorrow).
Re: Nagios Crash with SIGABRT or SIGSEGV
Using NSCA 2.9.2 (with the same parameters as version 2.9.1) causes many CRC32 errors, so I reverted to version 2.9.1.
I just upgraded to Nagios version 4.2.3 and hope that this fixes the crashes.
Re: Nagios Crash with SIGABRT or SIGSEGV
Nagios 4.2.3 also crashes every few hours ...
Mon Dec 12 09:42:50 2016 ;NAGIOS;CORE; Nagios Restart NAGIOS CRITICAL: Could not locate a running Nagios process!
Mon Dec 12 13:46:01 2016 ;NAGIOS;CORE; Nagios Restart: NAGIOS CRITICAL: Could not locate a running Nagios process!
Mon Dec 12 15:36:01 2016 ;NAGIOS;CORE; Nagios Restart: NAGIOS CRITICAL: Could not locate a running Nagios process!
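To see whether the crashes follow a regular schedule (which could point at a periodic job or check), the gaps between the watchdog entries above can be computed. A minimal sketch, assuming the timestamp is everything before the first `;` and that the English day and month names are parsed under an English or C locale; `restart_intervals` is a hypothetical helper:

```python
from datetime import datetime

# Timestamp layout used by the watchdog entries, e.g. "Mon Dec 12 09:42:50 2016"
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def restart_intervals(lines):
    """Return the time differences between consecutive restart entries."""
    stamps = [
        datetime.strptime(line.split(";", 1)[0].strip(), STAMP_FORMAT)
        for line in lines
    ]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    entries = [
        "Mon Dec 12 09:42:50 2016 ;NAGIOS;CORE; Nagios Restart",
        "Mon Dec 12 13:46:01 2016 ;NAGIOS;CORE; Nagios Restart",
        "Mon Dec 12 15:36:01 2016 ;NAGIOS;CORE; Nagios Restart",
    ]
    for gap in restart_intervals(entries):
        print(gap)
```

For the three entries shown, the gaps are roughly four hours and just under two hours, so there is no obvious fixed period in this small sample.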