Caught SIGSEGV, shutting down

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
hemak88
Posts: 22
Joined: Wed Mar 29, 2017 2:31 am


Post by hemak88 »

Nagios abruptly stopped running checks and sending alerts.
The log shows the error below. From then until the server was rebooted, no alerts were sent and nothing more was logged.
nagios.log:

Code:

[Wed Mar 27 22:20:35 2019] SERVICE ALERT: afpres01;Ping;OK;HARD;2;PING OK - Packet loss = 0%, RTA = 2.82 ms
[Wed Mar 27 22:21:22 2019] Caught SIGSEGV, shutting down...
[Thu Mar 28 08:38:10 2019] Warning: enable_embedded_perl is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: p1_file is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: sleep_time is deprecated and will be removed.
[Thu Mar 28 08:38:10 2019] Warning: external_command_buffer_slots is deprecated and will be removed. All commands are always processed upon arrival
[Thu Mar 28 08:38:10 2019] Warning: command_check_interval is deprecated and will be removed. Commands are always handled on arrival
[Thu Mar 28 08:38:10 2019] Nagios 4.4.3 starting... (PID=7176)
As the log above shows, the error "Caught SIGSEGV, shutting down..." was logged on Mar 27 22:21:22 2019, and I restarted Nagios on Mar 28 08:38:10 2019. In between, we received no alerts and nothing was logged. What is causing this issue?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Caught SIGSEGV, shutting down

Post by scottwilkerson »

SIGSEGV is the signal sent to a process when it makes an invalid memory reference, i.e. a segmentation fault.

The most common cause of this would be the server running out of memory. How much memory does this server have? Is it running any applications or services other than Nagios?
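For reference, SIGSEGV is signal number 11 on Linux. The quick check below (Linux with bash or a POSIX sh assumed) prints the signal name and shows the exit status a shell reports when a child process dies from it. The "Caught SIGSEGV" wording in nagios.log suggests Nagios logged the signal from its own handler before shutting down, rather than being killed outright, so this is only an illustration of the signal itself.

```shell
#!/bin/sh
# SIGSEGV ("segmentation violation") is signal 11 on Linux.
kill -l 11                      # prints: SEGV

# A child shell sends SIGSEGV to itself; the parent shell reports the
# death as exit status 128 + 11 = 139.
sh -c 'kill -SEGV $$'
echo "exit status: $?"          # prints: exit status: 139
```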
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com

Re: Caught SIGSEGV, shutting down

Post by hemak88 »

>> The most common cause of this would be if the server ran out of memory.
I checked the memory usage just before the issue happened and it looks smooth, with no peaks at all. Average usage is around 300 to 400 MB.

>> How much memory does this server have?
4 GB RAM

>> Is it running any other applications/services other than Nagios?
No other apps; it is a dedicated server for Nagios.

Re: Caught SIGSEGV, shutting down

Post by scottwilkerson »

hemak88 wrote: I checked the memory usage just before issue happened and it seems all smooth and no peaks at all. Average usage around 300 to 400MB
This could happen very rapidly if a script or plugin had a loop that was consuming memory quickly.

I would recommend that you reboot the server, in case the oom-killer was invoked and killed off other processes in ways that may not be apparent right away.
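Before or after rebooting, it may be worth checking whether the kernel's OOM killer actually fired. A minimal sketch follows; oom_events is a hypothetical helper name, and the patterns assume the usual kernel wording ("invoked oom-killer", "Out of memory: Kill process" / "Killed process").

```shell
#!/bin/bash
# oom_events (hypothetical helper): print kernel-log lines showing that the
# OOM killer ran. Reads the kernel log on stdin.
oom_events() {
    grep -Ei 'invoked oom-killer|out of memory|killed process'
}

# Usage (pick whichever log source the system has):
#   dmesg -T | oom_events
#   journalctl -k --since "2019-03-27 22:00" --until "2019-03-27 22:30" | oom_events
```

After a reboot the dmesg buffer starts empty, so a persistent log (journalctl -k with a persistent journal, or /var/log/messages or kern.log, depending on the distribution) is the place to look for the earlier incident.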

Re: Caught SIGSEGV, shutting down

Post by hemak88 »

This error occurred again with no apparent cause. Once again there were no memory constraints and no errors in the logs, just the SIGSEGV error in the nagios.log file:

Code:

[1559137037] Caught SIGSEGV, shutting down...
I found these release notes, which mention a resolved SIGSEGV error. We are on Nagios version 4.4.3:
https://raw.githubusercontent.com/Nagio ... /Changelog

There is also the link below. Is it a solution I can try?
https://lists.icinga.org/pipermail/icin ... 05434.html
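The [1559137037] prefix on that log line is the raw nagios.log timestamp, in seconds since the Unix epoch. A pure-shell way to make such lines readable is sketched below; it assumes GNU date (for -d @SECONDS), and nagios_ts is a hypothetical helper name.

```shell
#!/bin/bash
# nagios_ts (hypothetical helper): rewrite the leading "[epoch]" stamp of each
# nagios.log line read on stdin as a human-readable local date. Needs GNU date.
nagios_ts() {
    local line epoch
    while IFS= read -r line; do
        epoch=${line#\[}        # drop the opening bracket
        epoch=${epoch%%]*}      # keep the digits before the closing bracket
        printf '[%s]%s\n' "$(date -d "@$epoch" '+%a %b %d %H:%M:%S %Y')" "${line#*]}"
    done
}

# Usage: nagios_ts < /usr/local/nagios/var/nagios.log | grep 'Caught SIGSEGV'
```

With TZ=UTC, the line above becomes "[Wed May 29 13:37:17 2019] Caught SIGSEGV, shutting down..."; without it, date uses the server's local time zone.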

Re: Caught SIGSEGV, shutting down

Post by scottwilkerson »

I've never heard of this, but you can try it: set the following in your nagios.cfg

Code:

check_for_updates=0
then restart Nagios.
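If you prefer to script the change, here is a minimal sketch that sets an option in nagios.cfg, replacing an existing (or commented-out) line instead of adding a duplicate. set_cfg_option is a hypothetical helper; it assumes GNU sed (-i -E) and option names without regex metacharacters, and the config path in the usage comment is the usual source-install location, which may differ on your system.

```shell
#!/bin/bash
# set_cfg_option (hypothetical helper): set KEY=VALUE in a nagios.cfg-style
# file. An existing "KEY=" or "#KEY=" line is replaced; otherwise one is appended.
set_cfg_option() {
    local file=$1 key=$2 value=$3
    if grep -qE "^#?${key}=" "$file"; then
        sed -i -E "s|^#?${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (path assumed): set_cfg_option /usr/local/nagios/etc/nagios.cfg check_for_updates 0
```

Running nagios -v against the edited nagios.cfg before the restart verifies the configuration, so a typo does not leave Nagios unable to start.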

Re: Caught SIGSEGV, shutting down

Post by hemak88 »

I made the change. However, I have to wait for another occurrence, which might take days or months. I will monitor this and update this thread.
This issue is not a new one: I found the script below as a workaround. It checks the Nagios log for the SIGSEGV error and restarts Nagios when required.
I hope it will be helpful for those facing this issue until it is resolved by the Nagios team.

Code:

#!/bin/bash
# Workaround: restart Nagios when "Caught SIGSEGV" appears in nagios.log within
# the last 60 minutes, then mail a report. Assumes a source install under
# /usr/local/nagios, an /etc/init.d/nagios init script and a working `mail`.
# Run it from cron at most once an hour: the SIGSEGV line stays inside the
# 60-minute window, so a more frequent schedule would restart Nagios repeatedly.

######################################  VARIABLES ############################################################
VAR_DIR=/usr/local/nagios/var
CHECK_LOG=$VAR_DIR/nagios_service_check_log
MAIL_TO=toaddress@domain.com
SINCE=$(( $(date +%s) - 3600 ))    # epoch seconds, 60 minutes ago

# nagios.log lines start with "[epoch]": keep SIGSEGV lines logged since $SINCE
# (field 2 when splitting on brackets) and convert the stamp to local time.
# Matching "Caught SIGSEGV" (not just "Caught") ignores normal SIGTERM shutdowns.
NAGIOS_LOG=$(awk -F'[][]' -v since="$SINCE" '/Caught SIGSEGV/ && $2 >= since' "$VAR_DIR/nagios.log" \
    | perl -pe 's/(\d+)/localtime($1)/e')
NAGIOS_LOG_COUNT=$(printf '%s' "$NAGIOS_LOG" | grep -c 'Caught SIGSEGV')
####################################### DEC END ##############################################################

if [ "$NAGIOS_LOG_COUNT" -eq 0 ]; then

    echo "Nagios is running OK"

else

    {
        echo "Nagios Service Outage"
        echo "====================="
        echo "$NAGIOS_LOG"
        echo "## Restarting Nagios Service ##"
    } >> "$CHECK_LOG"

    /etc/init.d/nagios restart >> "$CHECK_LOG" 2>&1

    sleep 2

    # Check the process *after* the restart; grepping the status output for
    # "running" would also match "not running".
    if pgrep -x nagios > /dev/null; then
        echo "OK - $(/etc/init.d/nagios status 2>&1)" >> "$CHECK_LOG"
        SUBJECT="NOTIFICATION - Nagios Service Outage"
    else
        echo "CRITICAL - Nagios Service restart failed" >> "$CHECK_LOG"
        SUBJECT="CRITICAL - Nagios Service Outage - Escalation Needed"
    fi

    mail -s "$SUBJECT" "$MAIL_TO" < "$CHECK_LOG" && rm -f "$CHECK_LOG"

fi
Refer:
https://www.howtovmlinux.com/articles/m ... start.html

Re: Caught SIGSEGV, shutting down

Post by scottwilkerson »

Thanks for sharing!

Re: Caught SIGSEGV, shutting down

Post by hemak88 »

The issue occurred again yesterday. "check_for_updates=0" is still set, so that didn't solve the issue. Is there any solution for this?
The script I set up saved me this time: it restarted Nagios after the abrupt shutdown.

Re: Caught SIGSEGV, shutting down

Post by scottwilkerson »

I thought the update check was unlikely to cause this.

If I had to guess, it is likely a plugin that is leaking memory, but which plugin it is will be hard to track down.

It might be helpful if the script that catches the crash could capture the output of ps aux; however, my guess is that the offending plugin would already have been killed before your script sees it.
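Since the offending plugin may be gone by the time the crash is noticed, one option is to take periodic snapshots instead, so the last one before the SIGSEGV shows what was using memory. A minimal sketch, assuming procps-ng ps (for --sort); the function name snapshot_top_mem, the output path and the 30-second interval are illustrative choices, not anything from Nagios.

```shell
#!/bin/bash
# snapshot_top_mem (hypothetical helper): append a timestamped list of the
# N processes with the largest resident memory (RSS, in KiB) to FILE.
snapshot_top_mem() {
    local file=$1 n=${2:-10}
    {
        date '+=== %Y-%m-%d %H:%M:%S ==='
        ps -eo pid,ppid,rss,etime,args --sort=-rss | head -n "$((n + 1))"
    } >> "$file"
}

# Usage, e.g. as a background loop (interval and path are illustrative):
#   while sleep 30; do snapshot_top_mem /var/tmp/nagios_mem_snapshots.log 15; done
```

Comparing the snapshot just before the "Caught SIGSEGV" timestamp with earlier ones should show whether a plugin's memory was climbing. The file grows without bound, so rotate or prune it.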