Caught Sig Term, Shutting Down - Unknown Cause
Linux Distribution and version? CentOS 7, 64 bit, manual install
Are there special configurations on your system, ie; is Gnome installed? Are you using a proxy? Are you using SSL?
I found the Monitoring Engine Event Queue dashlet within the Monitoring Process\Process info page was empty and managed to catch the following in the event log history.
2021-08-23 20:05:16 - Event broker module '/usr/lib64/mod_gearman/mod_gearman_nagios4.o' initialized successfully
2021-08-23 20:05:16 - mod_gearman: initialized version 3.0.7 (libgearman 0.33)
2021-08-23 20:05:16 - Event broker module '/usr/local/nagios/bin/ndo.so' initialized successfully.
2021-08-23 20:05:10 - Successfully shutdown... (PID...)
2021-08-23 20:05:10 - Caught SIGTERM, shutting down
This was followed by several minutes of initial state declarations and then checks. We have had several instances of this. Most of the time an end user (not a CCM user or admin) can restart the process from the Monitoring Engine Process dashlet, but sometimes it takes a push from CCM to restart the Nagios process. I will have my team look for more information in the logs, but any suggestions are helpful. We have other issues that may or may not be related, so I will start separate threads for them. This is our most common issue, and it is a hard stop on our monitoring service until someone notices it and restarts the process manually.
----
Nagios XI - System Info
Nagios XI version: 5.8.3
Release info: 3.10.0-1062.9.1.el7.x86_64 x86_64
CentOS Linux release 7.0.1406 (Core)
Gnome is not installed
Apache Information
PHP Version: 5.4.16
Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36
Server Port: 443
Date/Time
PHP Timezone: UTC
PHP Time: Mon, 23 Aug 2021 21:23:08 +0000
System Time: Mon, 23 Aug 2021 21:23:08 +0000
Nagios XI Data
Install Type: manual/unknown
6592 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
9832 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
CPU Load 15: 0.66
Total Hosts: 892
Total Services: 5881
Nagios XI Components, abbreviated
ccm 3.1.1
deploydashboard 1.3.2
deploynotification 1.3.4
escalationwizard 1.5.1
freevariabletab 1.1.0
globaleventhandler 1.3.0
graphexplorer 2.3.0
helpsystem 2.0.1
ldap_ad_integration 1.2.2
massacknowledge 2.2.2
massimmediatecheck 1.0.2
metrics 1.3.5
nagiosbpi 3.0.3
nagiosim 2.2.6
nagiosna 1.4.4
Re: Caught Sig Term, Shutting Down - Unknown Cause
Hello @jfox
Thanks for reaching out and providing the info, but it looks like we will need to get the System Profile from you as well so we can see what is going on.
Let's start off with a check on the nagios.service:
Code: Select all
journalctl -u nagios.service -f > results.txt &
Make note of the PID number; we will end this process by killing it.
Code: Select all
systemctl restart nagios
Let's find out which brokers are configured:
Code: Select all
grep broker /usr/local/nagios/etc/nagios.cfg
To send us your System Profile:
- Login to the Nagios XI GUI using a web browser.
- Click the "Admin" > "System Profile" Menu
- Click the "Download Profile" button
- Save the profile.zip file and share in a private message or upload it to the post/ticket
Finally, end the journalctl process:
Code: Select all
kill ##pid_number_from_the_journalctl_command###
Please send the results via Private Message.
Thanks,
Perry
Re: Caught Sig Term, Shutting Down - Unknown Cause
Perry, thank you for the update. I will reach out to my engineers to get you the information requested. I sent you a PM with a trimmed system profile.
Re: Caught Sig Term, Shutting Down - Unknown Cause
Hello @jfox
Thanks for following up and providing the System Profile. After reviewing the events and logs we don't see anything that sticks out.
Typically, "Caught SIGTERM, shutting down" means the Nagios process received a termination request from something else on the system, such as a service stop or restart, or a kill command. An invalid memory reference or segmentation fault would show up as SIGSEGV instead.
One thing to rule out is the server running out of memory. From the system information provided we see that the environment is running Nagios-based applications with the Gearman module. Please take a look at /var/log/gearmand/gearmand.log to verify that nothing related shows up there. Otherwise, from the snapshot that we received, the RAM usage looks good, unless you see anything different ('watch -n 5 free') on your end.
We would also like to share the following support articles that cover performance optimizations:
- https://assets.nagios.com/downloads/nagiosxi/docs/Maximizing-Performance-In-Nagios-XI.pdf
- https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/tuning.html
Perry
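For anyone following along, here is a minimal sketch of the memory check suggested above. It assumes a CentOS 7 host where kernel messages are written to /var/log/messages and where `free` prints an "available" column; the function names `find_oom_events` and `available_mib` are hypothetical and not part of Nagios XI.

```shell
#!/bin/sh
# Hypothetical helper names; none of this ships with Nagios XI.

# Print kernel OOM-killer lines from a syslog-style file (for example
# /var/log/messages). Exit status is non-zero when nothing matches.
find_oom_events() {
    grep -i -E 'out of memory|invoked oom-killer' "$1"
}

# Read `free -m` output on stdin and print the "available" column (MiB)
# of the Mem: row. Assumes the procps-ng column layout used on CentOS 7.
available_mib() {
    awk '$1 == "Mem:" { print $7 }'
}

# Example (run as root on the XI server):
#   find_oom_events /var/log/messages
#   free -m | available_mib
#   tail -n 50 /var/log/gearmand/gearmand.log
```

Keep in mind that the kernel OOM killer ends processes with SIGKILL, so a match here would point to memory pressure in general rather than directly explain a logged SIGTERM.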
Re: Caught Sig Term, Shutting Down - Unknown Cause
Hi Perry,
I want to follow up on this issue. One of the things we see periodically happens when we make a change in Nagios XI and apply the config: at that point our Monitoring Engine Event Queue shows zero in the queue, so we need to restart the engine or Nagios to make Nagios start checking again.
My question is: the next time we see the Monitoring Engine Event Queue stop showing checks, what do you recommend we check (logs, memory, etc.) to see why the monitoring engine stopped running? At this point we have a sporadic issue but don't know what is causing it.
Thank you for your help.
Mike
Re: Caught Sig Term, Shutting Down - Unknown Cause
Hello @mvikhman
Thanks for following up; there are several things to look at. First, '/var/log/messages' or '/var/log/syslog', depending on distro. Second, '/usr/local/nagios/var/nagios.log'. Third, the output of ipcs (we are interested in the queues and the overall picture).
All of these will help us understand what is happening.
Thanks,
Perry
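A rough sketch of how the sources named above could be checked the next time the event queue goes empty. The helper names `restart_events` and `queue_summary` are hypothetical, the grep patterns assume the usual Nagios Core log wording ("Caught SIG...", "Successfully shutdown...", "Nagios ... starting..."), and the ipcs parsing assumes the util-linux table layout where queue rows begin with a `0x` key.

```shell
#!/bin/sh
# Hypothetical helper names; none of this ships with Nagios XI.

# Print the lines in a Nagios log that mark a stop, reload, or start
# of the monitoring engine.
restart_events() {
    grep -E 'Caught SIG|Successfully shutdown|Nagios .* starting' "$1"
}

# Summarise `ipcs -q` output read on stdin: number of message queues
# and total messages waiting (6th column of each queue row).
queue_summary() {
    awk '/^0x/ { n++; msgs += $6 } END { printf "%d queues, %d messages\n", n, msgs }'
}

# Example (run as root on the XI server):
#   restart_events /usr/local/nagios/var/nagios.log | tail -n 20
#   ipcs -q | queue_summary
#   grep -i nagios /var/log/messages | tail -n 50
```

Capturing this output while the queue is still empty, before restarting anything, gives a snapshot that can be compared against a healthy one.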
Thanks for following up; several things to look at. One; '/var/log/messages' or '/var/log/syslog' depending on distro. Secondly, the '/usr/local/nagios/var/nagios.log', and the ipcs (interested in queues and overall pic).
All of these will provide help provide understanding.
Thanks,
Perry
Re: Caught Sig Term, Shutting Down - Unknown Cause
Hi Perry,
We had this issue again: we applied the config, and then Nagios seemed to stop running checks until we applied the config again without making any changes.
I have attached /var/log/messages, /usr/local/nagios/var/nagios.log, and a screenshot of the config snapshots.
Basically, a config change was made and applied on 9/21/2021 at 15:35:25 UTC.
However, we didn't notice that Nagios was not updating checks until 9/21/2021 16:50:26 UTC, at which time we applied the config again without making any config changes. That seems to restart Nagios, and the checks start to process.
Starting at 15:35:25 UTC (when the config was first applied), there was no data in the "Scheduled Events Over Time" dashlet. Only after we applied the config a second time at 16:50 did we see data in "Scheduled Events Over Time".
Please review the logs and let me know if you can find anything.
Mike
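The window described above (15:35:25 to 16:50:26 UTC, about 75 minutes) can be cut out of nagios.log for review. This is only a sketch: it relies on Nagios writing each line with a leading `[epoch]` timestamp, the helper name `log_window` is hypothetical, and the commented usage assumes GNU `date` for converting wall-clock times to epoch seconds.

```shell
#!/bin/sh
# Hypothetical helper name; not part of Nagios XI.

# Print the lines of a Nagios log whose leading [epoch] timestamp lies
# between start and end (inclusive, both given in epoch seconds).
# Usage: log_window <logfile> <start_epoch> <end_epoch>
log_window() {
    awk -v s="$2" -v e="$3" '
        { ts = substr($1, 2) + 0 }      # "[1632238525]" -> 1632238525
        ts >= s + 0 && ts <= e + 0
    ' "$1"
}

# Example (GNU date assumed):
#   start=$(date -u -d '2021-09-21 15:35:25' +%s)
#   end=$(date -u -d '2021-09-21 16:50:26' +%s)
#   log_window /usr/local/nagios/var/nagios.log "$start" "$end"
```

Reading the lines right after the first Apply Configuration, in particular whether a "Caught SIG" line is followed by a normal startup sequence, is a reasonable first step before comparing against /var/log/messages for the same period.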
Re: Caught Sig Term, Shutting Down - Unknown Cause
Hello @mvikhman
Could you resend the System Profile from the previous post in the thread? Please go ahead and send via Private Message [PM].
Thanks,
Perry
Re: Caught Sig Term, Shutting Down - Unknown Cause
Hi Perry,
I uploaded profile.zip in a private message; please confirm you have received it.
Mike
Re: Caught Sig Term, Shutting Down - Unknown Cause
Hello @mvikhman
Thanks, I have received it and will follow up after review.
Perry