Caught SIGSEGV, shutting down

Posted: Sun Mar 31, 2019 6:05 am
by hemak88
Nagios abruptly stopped running checks and stopped sending alerts.
The log shows the error below. From then until the server was rebooted, no alerts were sent and nothing was logged.
nagios.log:

Code: Select all

[Wed Mar 27 22:20:35 2019] SERVICE ALERT: afpres01;Ping;OK;HARD;2;PING OK - Packet loss = 0%, RTA = 2.82 ms
[Wed Mar 27 22:21:22 2019] Caught SIGSEGV, shutting down...
[Thu Mar 28 08:38:10 2019] Warning: enable_embedded_perl is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: p1_file is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: sleep_time is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: external_command_buffer_slots is deprecated and will be removed. All commands are always processed upon arrival
[Thu Mar 28 08:38:10 2019] Warning: command_check_interval is deprecated and will be removed. Commands are always handled on arrival
[Thu Mar 28 08:38:10 2019] Nagios 4.4.3 starting... (PID=7176)
As the log above shows, the error "Caught SIGSEGV, shutting down..." was logged on Mar 27 22:21:22 2019, and I restarted Nagios on Mar 28 08:38:10 2019. In between, we received no alerts and nothing was logged. What is causing this issue?

Re: Caught SIGSEGV, shutting down

Posted: Mon Apr 01, 2019 8:43 am
by scottwilkerson
SIGSEGV is a signal raised on an invalid memory reference, i.e. a segmentation fault.

The most common cause of this would be if the server ran out of memory. How much memory does this server have? Is it running any other applications/services other than Nagios?

Re: Caught SIGSEGV, shutting down

Posted: Tue Apr 02, 2019 12:59 am
by hemak88
>> The most common cause of this would be if the server ran out of memory.
I checked the memory usage just before the issue happened, and it looks smooth with no peaks at all. Average usage is around 300 to 400 MB.

>> How much memory does this server have?
4 GB RAM

>> Is it running any other applications/services other than Nagios?
No other apps, dedicated server for nagios

Re: Caught SIGSEGV, shutting down

Posted: Tue Apr 02, 2019 6:58 am
by scottwilkerson
hemak88 wrote:I checked the memory usage just before issue happened and it seems all smooth and no peaks at all. Average usage around 300 to 400MB
This could happen very rapidly if a script or plugin had a loop that was consuming memory fast.

I would recommend that you reboot the server in case the oom-killer was invoked and killed off other processes, which may not be apparent right away.
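
One way to confirm or rule out the oom-killer is to search the kernel log for the messages it usually leaves behind. This is a minimal sketch, assuming a Linux host where `dmesg` is readable or where kernel messages are written to a file such as /var/log/messages (the path varies by distribution). The function name `find_oom_events` is made up for illustration.

```shell
#!/bin/bash
# Print kernel log lines that the OOM killer typically writes.
# Reads from the file given as $1, or from dmesg if no file is given.
find_oom_events() {
    local pattern='out of memory|oom-killer|killed process'
    if [ -n "$1" ]; then
        grep -iE "$pattern" "$1"
    else
        dmesg | grep -iE "$pattern"
    fi
}

# Example: find_oom_events /var/log/messages
```

No output suggests the oom-killer was not involved, at least within the period the log still covers.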

Re: Caught SIGSEGV, shutting down

Posted: Wed May 29, 2019 11:46 pm
by hemak88
This error occurred again for no apparent reason. Once again there were no memory constraints or errors in the logs, just the SIGSEGV error in the nagios.log file.

Code: Select all

[1559137037] Caught SIGSEGV, shutting down...
I found these release notes, which list a resolved SIGSEGV error. We are on Nagios 4.4.3.
https://raw.githubusercontent.com/Nagio ... /Changelog

There is also the link below. Is it a solution I can try?
https://lists.icinga.org/pipermail/icin ... 05434.html

Re: Caught SIGSEGV, shutting down

Posted: Thu May 30, 2019 6:42 am
by scottwilkerson
I've never heard of this, but you can try it. Set the following in your nagios.cfg:

Code: Select all

check_for_updates=0
Then restart Nagios.

Re: Caught SIGSEGV, shutting down

Posted: Wed Jun 12, 2019 2:25 am
by hemak88
I made the change. However, I have to wait for another occurrence, which might take days or months. I will monitor this and update this thread.
This issue is not new: I found the script below as a workaround that checks the Nagios log for the SIGSEGV error and restarts Nagios when needed.
I hope it helps those facing this issue until it is resolved by the Nagios team.

Code: Select all

#!/bin/bash
# Restart Nagios if "Caught SIGSEGV" was logged in the last 60 minutes, then mail a report.

NAGIOS_LOG_FILE=/usr/local/nagios/var/nagios.log
CHECK_LOG=/usr/local/nagios/var/nagios_service_check_log

# nagios.log lines start with an epoch timestamp in brackets, e.g. [1559137037],
# so compare epochs directly instead of formatted date strings.
CUTOFF=$(( $(date +%s) - 3600 ))
NAGIOS_LOG=$(awk -F'[][]' -v cutoff="$CUTOFF" '/Caught SIGSEGV/ && $2 + 0 >= cutoff + 0' "$NAGIOS_LOG_FILE")

if [ -z "$NAGIOS_LOG" ]; then
    echo "Nagios is running OK"
    exit 0
fi

{
    echo "Nagios Service Outage"
    echo "====================="
    echo "$NAGIOS_LOG"
    echo "## Restarting Nagios Service ##"
    /etc/init.d/nagios restart
} >> "$CHECK_LOG"

sleep 2

# Query the status only after the restart; a value read earlier reflects the pre-restart state.
SERVICE_NAG=$(/etc/init.d/nagios status | grep running)

if [ -n "$SERVICE_NAG" ]; then
    echo "OK - $SERVICE_NAG" >> "$CHECK_LOG"
    mail -s "NOTIFICATION - Nagios Service Outage" [email protected] < "$CHECK_LOG" && rm -f "$CHECK_LOG"
else
    echo "CRITICAL - Nagios Service restart failed" >> "$CHECK_LOG"
    mail -s "CRITICAL - Nagios Service Outage - Escalation Needed" [email protected] < "$CHECK_LOG" && rm -f "$CHECK_LOG"
fi
Refer:
https://www.howtovmlinux.com/articles/m ... start.html

Re: Caught SIGSEGV, shutting down

Posted: Wed Jun 12, 2019 6:38 am
by scottwilkerson
Thanks for sharing!

Re: Caught SIGSEGV, shutting down

Posted: Sun Jul 21, 2019 2:49 am
by hemak88
The issue occurred again yesterday. "check_for_updates=0" is still set, so it didn't solve the issue. Is there any solution for this?
The script I put in place saved me this time: it restarted Nagios after the abrupt shutdown.

Re: Caught SIGSEGV, shutting down

Posted: Mon Jul 22, 2019 3:09 pm
by scottwilkerson
I thought the update check was unlikely to cause this.

If I had to guess, it is likely a plugin that is leaking memory, but which plugin it is will be hard to track down.

It might be helpful if your script that is catching the restart could capture a ps aux. However, my guess would be that the offending plugin would already be killed before your script would see it.
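
Capturing that process list could look like the following. This is a minimal sketch, meant to be called from the restart script just before it restarts Nagios. The function name `snapshot_processes` and the default output directory are assumptions for illustration, not part of the script posted earlier in this thread.

```shell
#!/bin/bash
# Save the current process list to a timestamped file in the directory
# given as $1 (default /tmp) and print the path of the file written.
snapshot_processes() {
    local dir="${1:-/tmp}"
    local out="$dir/ps_snapshot_$(date +%Y%m%d_%H%M%S).txt"
    ps aux > "$out" && echo "$out"
}

# Example: snapshot_processes /usr/local/nagios/var
```

Keeping one file per restart makes it possible to compare snapshots across occurrences, although a plugin that has already exited will not appear in any of them.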