Caught SIGSEGV, shutting down

hemak88 · Post by **hemak88** » Sun Mar 31, 2019 6:05 am

Nagios abruptly stopped running checks and sending alerts.
The log shows the error below. From then until the server was rebooted, no alerts were sent and nothing was logged.
nagios.log:

Code: Select all

[Wed Mar 27 22:20:35 2019] SERVICE ALERT: afpres01;Ping;OK;HARD;2;PING OK - Packet loss = 0%, RTA = 2.82 ms
[Wed Mar 27 22:21:22 2019] Caught SIGSEGV, shutting down...
[Thu Mar 28 08:38:10 2019] Warning: enable_embedded_perl is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: p1_file is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: sleep_time is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: external_command_buffer_slots is deprecated and will be removed. All commands are always processed upon arrival
[Thu Mar 28 08:38:10 2019] Warning: command_check_interval is deprecated and will be removed. Commands are always handled on arrival
[Thu Mar 28 08:38:10 2019] Nagios 4.4.3 starting... (PID=7176)

According to the logs above, the error "Caught SIGSEGV, shutting down..." occurred at Mar 27 22:21:22 2019, and I restarted Nagios at Mar 28 08:38:10 2019. Between those times we received no alerts and nothing was logged. What is causing this issue?

scottwilkerson · Post by **scottwilkerson** » Mon Apr 01, 2019 8:43 am

SIGSEGV is the signal a process receives on an invalid memory reference, i.e. a segmentation fault.
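To make the signal itself concrete, here is a minimal shell sketch (not taken from the Nagios code) showing that SIGSEGV is signal 11 on Linux and that a process killed by it reports exit status 128 + 11 = 139. The throwaway `sleep` child and the function name `segv_exit_status` are purely illustrative.

```shell
#!/bin/bash
# Illustrative only: kill a throwaway child with SIGSEGV and report its exit status.
segv_exit_status() {
    sleep 30 &
    local pid=$!
    kill -SEGV "$pid"
    wait "$pid" 2>/dev/null
    echo $?
}

sig_num=$(kill -l SEGV)        # signal name -> signal number
status=$(segv_exit_status)     # 128 + signal number
echo "SIGSEGV is signal $sig_num; the killed child exited with status $status ($(kill -l "$status"))"
```

A plugin or daemon that exits with status 139 in a wrapper script or init log was therefore most likely terminated by this signal.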

The most common cause of this would be if the server ran out of memory. How much memory does this server have? Is it running any other applications/services other than Nagios?

hemak88 · Post by **hemak88** » Tue Apr 02, 2019 12:59 am

>> The most common cause of this would be if the server ran out of memory.
I checked the memory usage just before the issue happened; it looks smooth, with no peaks at all. Average usage is around 300 to 400 MB.

>> How much memory does this server have?
4 GB RAM

>> Is it running any other applications/services other than Nagios?
No other apps; it is a dedicated server for Nagios.

scottwilkerson · Post by **scottwilkerson** » Tue Apr 02, 2019 6:58 am

hemak88 wrote: I checked the memory usage just before the issue happened; it looks smooth, with no peaks at all. Average usage is around 300 to 400 MB.

This could happen very rapidly if a script or plugin had a loop that was consuming memory fast.

I would recommend that you reboot the server in case the oom-killer was invoked and killed off other processes, which may not be apparent right away.
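Before rebooting, it may be worth confirming whether the OOM killer actually ran. A rough sketch, assuming a Linux kernel log readable via `dmesg` (some systems restrict it to root) and the usual "Out of memory" / "oom-killer" message wording; the function name `find_oom_events` is made up for illustration:

```shell
#!/bin/bash
# Filter kernel log text on stdin for typical OOM-killer messages.
# The patterns are assumptions based on common kernel wording.
find_oom_events() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Check the kernel ring buffer; -T asks util-linux dmesg for readable timestamps.
if dmesg -T 2>/dev/null | find_oom_events; then
    echo "OOM killer activity found in the kernel log"
else
    echo "No OOM killer messages found in the current kernel log"
fi
```

The ring buffer does not survive a reboot, so on a server that has already been restarted the same filter would have to be fed from a persisted log such as /var/log/messages or /var/log/syslog, depending on the distribution.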

hemak88 · Post by **hemak88** » Wed May 29, 2019 11:46 pm

This error occurred again for no apparent reason. Once again there were no memory constraints or errors in the logs, just the SIGSEGV error in the nagios.log file.

Code: Select all

[1559137037] Caught SIGSEGV, shutting down...

I found these release notes, which mention a resolved SIGSEGV error. We are on Nagios 4.4.3.
https://raw.githubusercontent.com/Nagio ... /Changelog

Additionally, there is the link below. Is it a solution I can try?
https://lists.icinga.org/pipermail/icin ... 05434.html

scottwilkerson · Post by **scottwilkerson** » Thu May 30, 2019 6:42 am

I've never heard of this, but you can try it. Set the following in your nagios.cfg:

Code: Select all

check_for_updates=0

Then restart Nagios.
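A quick way to confirm the setting is in place, sketched under the assumption of a source install with the configuration at `/usr/local/nagios/etc/nagios.cfg` (adjust the path for packaged installs); `cfg_value` is a hypothetical helper, not a Nagios tool:

```shell
#!/bin/bash
# Print the value of the last uncommented "key=value" line for a given key.
cfg_value() {
    local key=$1 file=$2
    awk -F= -v key="$key" '
        /^[ \t]*#/ { next }                      # skip comment lines
        { k = $1; gsub(/[ \t]/, "", k) }         # strip whitespace around the key
        k == key { v = $2; gsub(/^[ \t]+|[ \t\r]+$/, "", v); val = v; found = 1 }
        END { if (found) print val }
    ' "$file"
}

NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg   # assumed default path
if [ -r "$NAGIOS_CFG" ]; then
    echo "check_for_updates=$(cfg_value check_for_updates "$NAGIOS_CFG")"
fi
```

With the same assumed paths, running `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg` before the restart validates the edited configuration.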

hemak88 · Post by **hemak88** » Wed Jun 12, 2019 2:25 am

I made the change. However, I have to wait for another occurrence, which might take days or months. I will monitor this and update this thread.
This issue is not new: I found the script below as a workaround. It checks the Nagios log for the SIGSEGV error and restarts Nagios when required.
I hope it will be helpful for those facing this issue until it is resolved by the Nagios team.

Code: Select all

#!/bin/bash
# Restart Nagios when nagios.log shows a SIGSEGV shutdown in the last 60 minutes.

NAGIOS_LOG=/usr/local/nagios/var/nagios.log
CHECK_LOG=/usr/local/nagios/var/nagios_service_check_log
MAILTO=toaddress@domain.com

# Raw nagios.log lines start with an epoch timestamp,
# e.g. "[1559137037] Caught SIGSEGV, shutting down...", so compare epochs directly.
SINCE=$(( $(date +%s) - 3600 ))
CRASH_LINES=$(awk -F'[][]' -v since="$SINCE" '/Caught SIGSEGV/ && $2 >= since' "$NAGIOS_LOG")

if [ -z "$CRASH_LINES" ]; then
    echo "Nagios is running OK"
    exit 0
fi

{
    echo "Nagios Service Outage"
    echo "====================="
    echo "$CRASH_LINES"
    echo "## Restarting Nagios Service ##"
    /etc/init.d/nagios restart
} >> "$CHECK_LOG"

sleep 2

# Query the status only after the restart so it reflects the new process.
SERVICE_NAG=$(/etc/init.d/nagios status | grep running)

if [ -n "$SERVICE_NAG" ]; then
    echo "OK - $SERVICE_NAG" >> "$CHECK_LOG"
    SUBJECT="NOTIFICATION - Nagios Service Outage"
else
    echo "CRITICAL - Nagios Service restart failed" >> "$CHECK_LOG"
    SUBJECT="CRITICAL - Nagios Service Outage - Escalation Needed"
fi

# Mail the report, then remove it so the next run starts with a clean file.
mail -s "$SUBJECT" "$MAILTO" < "$CHECK_LOG" && rm -f "$CHECK_LOG"

Refer:
https://www.howtovmlinux.com/articles/m ... start.html

scottwilkerson · Post by **scottwilkerson** » Wed Jun 12, 2019 6:38 am

Thanks for sharing!

hemak88 · Post by **hemak88** » Sun Jul 21, 2019 2:49 am

The issue occurred again yesterday. "check_for_updates=0" is still set, so it didn't solve the issue. Is there any other solution?
The script I put in place saved me this time: it restarted Nagios after the abrupt shutdown.

scottwilkerson · Post by **scottwilkerson** » Mon Jul 22, 2019 3:09 pm

I thought the update check was unlikely to cause this.

If I had to guess, it is likely a plugin that is leaking memory, but which plugin is going to be hard to track down.

It might be helpful if the script that is catching the restart could capture a ps aux. However, my guess is that the offending plugin would already be killed before your script would see it.
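One way the watchdog could record that, as a rough sketch: write the largest memory consumers to a file before restarting. The snapshot path and the function name `top_mem_snapshot` are hypothetical, and `--sort` is a procps option, so the sketch falls back to plain `ps aux` where it is unavailable.

```shell
#!/bin/bash
# Print the ps header plus the top N processes, sorted by memory use when possible.
top_mem_snapshot() {
    local count=${1:-15} out
    out=$(ps aux --sort=-%mem 2>/dev/null) || out=$(ps aux)
    printf '%s\n' "$out" | head -n "$((count + 1))"
}

SNAPSHOT=/tmp/nagios_ps_snapshot.txt   # hypothetical location
{
    date
    top_mem_snapshot 15
} >> "$SNAPSHOT"
```

Since a fast-leaking plugin may already be gone by the time a crash is detected, taking this snapshot on a short interval from cron would probably be more informative than taking it only after a SIGSEGV.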

Nagios Support Forum
