Nagios abruptly stopped running checks and stopped sending alerts.
The log shows the error below. From that point until the server was rebooted, no alerts were sent and nothing was logged.
nagios.log:
[Wed Mar 27 22:20:35 2019] SERVICE ALERT: afpres01;Ping;OK;HARD;2;PING OK - Packet loss = 0%, RTA = 2.82 ms
[Wed Mar 27 22:21:22 2019] Caught SIGSEGV, shutting down...
[Thu Mar 28 08:38:10 2019] Warning: enable_embedded_perl is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: p1_file is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: sleep_time is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: external_command_buffer_slots is deprecated and will be removed. All commands are always processed upon arrival
[Thu Mar 28 08:38:10 2019] Warning: command_check_interval is deprecated and will be removed. Commands are always handled on arrival
[Thu Mar 28 08:38:10 2019] Nagios 4.4.3 starting... (PID=7176)
According to the log above, the error "Caught SIGSEGV, shutting down..." was logged on Mar 27 22:21:22 2019, and I restarted Nagios on Mar 28 08:38:10 2019. Between those two times we received no alerts and nothing was logged. What is causing this issue?
SIGSEGV is the signal a process receives when it makes an invalid memory reference, i.e. a segmentation fault.
The most common cause of this would be if the server ran out of memory. How much memory does this server have? Is it running any other applications/services other than Nagios?
>> The most common cause of this would be if the server ran out of memory.
I checked the memory usage just before the issue happened. It looks smooth, with no peaks at all. Average usage is around 300 to 400 MB.
>> How much memory does this server have?
4 GB RAM
>> Is it running any other applications/services other than Nagios?
No other apps. This is a dedicated server for Nagios.
I made the changes. However, I have to wait for another occurrence, which might take days or months. I will monitor this and will update this thread.
This issue is not new: I found a script, used as a workaround, that checks the Nagios log for the SIGSEGV error and restarts Nagios whenever required.
I hope it helps those facing this issue until the Nagios team resolves it.
The issue occurred again yesterday. "check_for_updates=0" is still set, so that change did not solve the issue. Is there any other solution?
The script I put in place saved me this time: it started Nagios again after the abrupt shutdown.
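For readers who want to try the same kind of workaround, here is a minimal sketch of such a watchdog. It is not the script mentioned in this thread. The log path, the systemctl restart command, and all function names are assumptions for illustration, so adjust them to your install. It treats Nagios as crashed if the last "Caught SIGSEGV" entry in the log is not followed by a "Nagios ... starting..." entry.

```python
import re
import subprocess

# Strings taken from the nagios.log excerpt earlier in this thread.
SIGSEGV_MARKER = "Caught SIGSEGV"
START_PATTERN = re.compile(r"Nagios \S+ starting\.\.\.")


def crashed_without_restart(log_lines):
    """Return True if the latest SIGSEGV entry has no startup entry after it."""
    crashed = False
    for line in log_lines:
        if SIGSEGV_MARKER in line:
            crashed = True
        elif START_PATTERN.search(line):
            crashed = False
    return crashed


def restart_nagios(command=("systemctl", "restart", "nagios")):
    """Run the restart command and return its exit code.

    Assumption: Nagios runs as a systemd unit named "nagios".
    """
    return subprocess.run(list(command), check=False).returncode


def main(log_path="/usr/local/nagios/var/nagios.log"):
    """Check the log once; restart Nagios if it crashed and never came back.

    Assumption: log_path is the default location for a source install.
    """
    with open(log_path, errors="replace") as handle:
        if crashed_without_restart(handle):
            restart_nagios()
```

Calling main() from cron every minute or so would approximate the behaviour described above. Note that this only restarts Nagios after a crash; it does nothing about the cause of the segfault.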
I thought the update check was unlikely to cause this.
If I had to guess, it is likely a plugin that is leaking memory, but which plugin it is will be hard to track down.
It might be helpful if the script that handles the restart also captured a ps aux. However, my guess is that the offending plugin would already have been killed before your script saw it.
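A rough sketch of how the ps aux capture could look is below. The function names, the snapshot directory, and the idea of running it periodically (rather than only at restart, so a short-lived plugin has a chance of showing up) are assumptions, not something confirmed in this thread. It saves the raw ps aux output to a timestamped file and can list the processes with the largest resident memory (the RSS column, in KB).

```python
import subprocess
from datetime import datetime
from pathlib import Path


def top_memory_processes(ps_output, limit=10):
    """Parse `ps aux` text and return (rss_kb, command) pairs, largest RSS first."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        try:
            rss_kb = int(fields[5])
        except ValueError:
            continue
        rows.append((rss_kb, fields[10]))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


def save_snapshot(directory="/tmp"):
    """Write the current `ps aux` output to a timestamped file and return its path.

    Assumption: /tmp is an acceptable place for the snapshots.
    """
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=False
    )
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = Path(directory) / f"ps-aux-{stamp}.txt"
    path.write_text(result.stdout)
    return path
```

Running save_snapshot() from cron every minute and then passing the last file written before a crash to top_memory_processes() would show which commands held the most memory at that point.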