Caught Sig Term, Shutting Down - Unknown Cause

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
mvikhman
Posts: 13
Joined: Mon May 24, 2021 3:49 pm

Re: Caught Sig Term, Shutting Down - Unknown Cause

Post by mvikhman »

Hi Perry,
I have not heard back from you and wanted to see if you had any luck with the logs or found anything.

We had another similar issue where the Nagios backups ran, and I guess they restart Nagios during the backup process.

On Saturday morning the backups kicked off, but Nagios didn't start afterwards. We tried to start it in the Web UI and kept getting an error.
I tried to start it manually on the command line with "systemctl start nagios", and this also kept failing.
We were able to get Nagios going by doing a dummy config push. Basically, we open a config for editing, make no changes, and close it. At that point we get a message that the config needs to be applied. We applied the config and Nagios started working again. Here is the log file covering the period from when it shut down until the startup failed.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Caught Sig Term, Shutting Down - Unknown Cause

Post by pbroste »

Hello @mvikhman

Thanks for checking in on this issue, and since it has been a while let's review.
  • The "pre-flight check" is not failing with errors:

    /usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg

  • The third-party add-ons do not appear to be failing:
      • ModGearman
      • "Puppet Agent Freshness" on a host is reporting duplicate definitions, but we don't see issues other than that
  • It does not appear that we are running out of disk space on the mount points.
  • Memory seems okay, looking at the System Profile snapshot.
  • In the latest log there appears to be a 'servicedependencies.cfg' error, but that is probably a housekeeping issue, since the pre-flight check is not telling us that it is a show stopper.
  • The messages "Unit nagios.service entered failed state" and "systemd: nagios.service failed" are not very telling as to the cause. When we see "Caught Sig Term, Shutting Down - Unknown Cause", this typically means that there is a resource issue.
  • To verify further, we want to see what nagios.service is doing on restart or startup:

    journalctl -xefu nagios.service
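The disk-space check in the list above can also be scripted. The sketch below is only an illustration: the function name `disk_over_threshold` and the 90% threshold are assumptions made for this example, not part of Nagios XI. It reads `df -P` output on stdin and prints any mount point at or above the threshold.

```shell
#!/bin/sh
# Sketch: print mount points whose usage is at or above a threshold.
# Reads `df -P` style output on stdin (header line, then one line per mount).
# The function name and threshold are hypothetical, chosen for illustration.
disk_over_threshold() {
    threshold="$1"
    awk -v t="$threshold" '
        NR > 1 {
            use = $5            # Capacity column, e.g. "95%"
            sub(/%/, "", use)   # strip the percent sign
            if (use + 0 >= t) print $6 " " $5
        }'
}

# Usage on a live system (commented out so the sketch has no side effects):
# df -P | disk_over_threshold 90
# free -m
# journalctl -u nagios.service --since "1 hour ago" --no-pager
```

An empty result means no mount point crossed the threshold. Mount points containing spaces are not handled by this sketch.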

In your latest update, you stated that a "dummy config push" helped resolve the issue. With that in mind, we want to have you reindex the Core configs.

Here are the steps to reindex the Core Configuration Manager (CCM) configs:
  • 1: List all running /bin/nagios processes:

    ps -aux | grep -E '/bin/nagios'

  • 2: Kill them: killall -9 nagios (or pkill nagios)
  • 3: Clear the import directory:

    rm -rf /usr/local/nagios/etc/import/*

  • 4: Restart nagios.service from the terminal: systemctl restart nagios
  • 5: Head over to the Nagios XI web console ==> Core Configuration Manager (CCM) ==> Config File Management ==> [Delete Files] ==> [Write Files] ==> [Verify Files]
  • 6: Core Configuration Manager (CCM) ==> Under Quick Tools ==> "Apply Configuration"
  • 7: Restart nagios.service from the terminal: systemctl restart nagios
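Since this is a production system, it may help to rehearse the terminal part (steps 1 to 4) before running it for real. The following is a minimal sketch, not an official Nagios script: the `run` helper, the `DRY_RUN` variable and the function name `reindex_terminal_steps` are hypothetical names introduced here. With `DRY_RUN=1` (the default) it only prints what it would do. Steps 5 and 6 still happen in the web console, followed by the final restart in step 7.

```shell
#!/bin/sh
# Sketch of terminal steps 1-4. DRY_RUN defaults to 1, so commands are only
# printed; set DRY_RUN=0 to actually execute them (as root).
# DRY_RUN, run and reindex_terminal_steps are illustrative names.
DRY_RUN="${DRY_RUN:-1}"
IMPORT_DIR="/usr/local/nagios/etc/import"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reindex_terminal_steps() {
    # Step 1: list running nagios processes with their full command lines.
    run pgrep -af /bin/nagios
    # Step 2: kill any leftover nagios processes.
    run killall -9 nagios
    # Step 3: clear the import directory (a no-op if it is already empty).
    run rm -rf "$IMPORT_DIR"/*
    # Step 4: restart the service.
    run systemctl restart nagios
}
```

Calling `reindex_terminal_steps` with the default settings prints four "would run:" lines and changes nothing, which can be shown to management as the exact command list.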

Verify that the hosts and services look good, and verify that there are no errors in Core:

    /usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg
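The verification output is long, so a small filter can pull out the error count. This is a sketch that assumes the check ends with a "Total Errors:" summary line; the function name `preflight_errors` is made up for this example.

```shell
#!/bin/sh
# Sketch: extract the error count from pre-flight check output read on stdin.
# Relies on the "Total Errors:" summary line at the end of the check.
# preflight_errors is a hypothetical helper name.
preflight_errors() {
    awk -F: '/^Total Errors/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage on a live system:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_errors
```

A printed value of 0 means the pre-flight check found no errors; anything else is worth reading in the full output.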

Let us know how things are looking,
Perry
mvikhman
Posts: 13
Joined: Mon May 24, 2021 3:49 pm

Re: Caught Sig Term, Shutting Down - Unknown Cause

Post by mvikhman »

Hi Perry,
Thank you for the feedback.
Before I execute the procedure: since this is our production environment, I am a little nervous about running this step:
"Nagios XI web console ==> Core Configuration Manager (CCM) ==> Config File Management ==> [Delete Files] ==> [Write Files] ==> [Verify Files]"

Can you provide some information on what this is doing on the back end, and how long a Nagios outage, if any, would last?

Also, there are no files in /usr/local/nagios/etc/import/, so nothing to delete.

I just need to tell my management what can potentially break and how long things can be offline, and whether there is a rollback procedure if this fails.

Thank you.
Michael.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Caught Sig Term, Shutting Down - Unknown Cause

Post by pbroste »

Hello @mvikhman

[Delete Files] deletes the "Core" config files, and [Write Files] then rewrites them from the Nagios database.

[Verify Files] then checks the rewritten configs for errors.

It is a good idea to take a snapshot or run a backup (/usr/local/nagiosxi/scripts/backup_xi.sh) first.

Downtime is minimal: when the nagios service is restarted, the checks stop for a few seconds.
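To confirm that the backup actually produced an archive before touching the configs, something like the sketch below could be used. It rests on assumptions: the backup directory `/store/backups/nagiosxi` and the `.tar.gz` extension may differ on your system, and `latest_backup` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the newest backup archive in a directory.
# BACKUP_DIR and the .tar.gz pattern are assumptions; adjust them to
# wherever your backups are written.
BACKUP_DIR="${BACKUP_DIR:-/store/backups/nagiosxi}"

latest_backup() {
    # List archives newest first and print the first one, if any.
    ls -1t "$1"/*.tar.gz 2>/dev/null | head -n 1
}

# Usage on a live system (as root):
# /usr/local/nagiosxi/scripts/backup_xi.sh && latest_backup "$BACKUP_DIR"
```

If nothing is printed, no archive was found in that directory, and it would be safer to sort that out before running [Delete Files].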

Thanks,
Perry