Nagios Support Forum

Posted: **Wed May 26, 2021 11:12 am**

We have three instances (dev, stage, and prod), all of which were running XI 5.5. I upgraded dev and stage to 5.8, and both upgrades went fine.

This morning I upgraded prod from 5.5 to 5.8. The upgrade script says everything completed successfully, but now the monitoring engine will not start. Nothing in the web interface gives any indication of what the problem could be, and I have searched every log file I can find without turning up a clue.

Please give me guidance on how to proceed and how to fix this problem.

Rob

Posted: **Wed May 26, 2021 11:25 am**

Uploading the "profile.zip" from the afflicted system; someone always asks for this.

Additionally, FYI, due to a security incident last month, we had to change the default passwords (e.g., in config.inc.php, et al). The upgrade script figured that out, right? I saw nothing to indicate otherwise.

Rob

Posted: **Wed May 26, 2021 11:33 am**

I discovered the output below, which leads me to think that some probe that worked in 5.5 no longer works in 5.8. Does that mean the whole monitoring engine can't run now? Or are these unconfigured objects that won't let the monitoring engine run?

[root@campusmon2v2 ~]# systemctl status nagios.service
● nagios.service - Nagios Core 4.4.6
Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
Active: failed (Result: signal) since Wed 2021-05-26 10:52:07 CDT; 38min ago
Docs: https://www.nagios.org/documentation
Process: 9244 ExecStopPost=/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
Process: 8702 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Process: 8677 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Main PID: 8705 (code=killed, signal=ABRT)

May 26 10:51:26 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'NSClient Health via ACES-NAGIOS' on host 'ace-ts-db'. Unable to find service
May 26 10:51:26 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'NSClient Version via ACES-NAGIOS' on host 'aces-dfs-h1'. Unable to find service
May 26 10:51:26 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'via ACES-NAGIOS' on host 'abe-web'. Unable to find service
May 26 10:51:26 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'NSClient Version via ACES-NAGIOS' on host 'ansci-p10r87407'. Unable t...nd service
May 26 10:51:35 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'NSClient Version via ACES-NAGIOS' on host 'aces-dfs-3t'. Unable to find service
May 26 10:51:43 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'Uptime via ACES-NAGIOS' on host 'abe-web'. Unable to find service
May 26 10:52:07 campusmon2v2.techservices.illinois.edu systemd[1]: nagios.service: main process exited, code=killed, status=6/ABRT
May 26 10:52:07 campusmon2v2.techservices.illinois.edu nagios[8765]: Caught SIGTERM, shutting down...
May 26 10:52:07 campusmon2v2.techservices.illinois.edu systemd[1]: Unit nagios.service entered failed state.
May 26 10:52:07 campusmon2v2.techservices.illinois.edu systemd[1]: nagios.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
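
The journal excerpt above is dominated by "Unable to find service" errors. A minimal sketch for tallying them per host, assuming the messages keep exactly the format shown above; the function name `count_orphaned_results` is made up for illustration:

```shell
# Hypothetical helper: count "Got check result ... Unable to find service"
# errors per host. Reads log text on stdin and prints "COUNT HOST" lines,
# most frequent host first. Assumes the message format shown above.
count_orphaned_results() {
  grep -F "Got check result for service" \
    | sed -n "s/.*on host '\([^']*\)'.*/\1/p" \
    | sort | uniq -c | sort -rn
}

# Example:
#   journalctl --no-pager -u nagios.service | count_orphaned_results
```

Matching on the start of the message rather than on "Unable to find service" means ellipsized lines are still counted.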

Posted: **Wed May 26, 2021 11:42 am**

PS, also discovered this:

systemctl status nagios.service

It ran for about 45-60 seconds before crashing again.

Checking the log file, /usr/local/nagios/var/nagios.log, it just stops -- no evidence of any problem [leading to the crash].

Posted: **Wed May 26, 2021 1:17 pm**

After banging my head all morning, this is where I am stuck:

[root@campusmon2v2 rw]# tail -1 /usr/local/nagios/var/nagios.log
[1622048034] Caught SIGTERM, shutting down...

What in the world is sending a TERM signal to the Nagios daemon? Every time I start it, something shuts it down within a minute.

Rob
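
The bracketed number at the start of each nagios.log line is a Unix epoch timestamp, which is awkward to line up against the journal output. A minimal sketch that rewrites it in readable form, assuming GNU `date` (for the `-d @SECONDS` form); the function name `humanize_nagios_log` is hypothetical:

```shell
# Hypothetical helper: rewrite "[EPOCH] message" lines with a readable
# timestamp in the current time zone. Lines in any other format pass
# through unchanged. Assumes GNU date.
humanize_nagios_log() {
  while IFS= read -r line; do
    stamp=${line%%\]*}          # text before the first "]"
    epoch=${stamp#\[}           # the same text without a leading "["
    case $stamp in
      \[*) ;;                   # line starts with "[": keep the candidate
      *) epoch= ;;              # otherwise treat it as a non-log line
    esac
    case $epoch in
      ''|*[!0-9]*) printf '%s\n' "$line" ;;
      *) printf '[%s]%s\n' \
           "$(date -d "@$epoch" '+%Y-%m-%d %H:%M:%S %Z')" "${line#*\]}" ;;
    esac
  done
}

# Example:
#   tail -20 /usr/local/nagios/var/nagios.log | humanize_nagios_log
```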

Posted: **Wed May 26, 2021 1:39 pm**

I apologize for commenting on my ticket so many times; I meant to tack this onto the end of the comment before last.

I thought the following might be interesting / relevant; the config is okay, sort of.

[root@campusmon2v2 rw]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.4.6
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2020-04-28
License: GPL

Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...

...361 warnings...

Checked 2619 hosts.
Checked 97 host groups.
Checked 58 service groups.
Checked 363 contacts.
Checked 123 contact groups.
Checked 325 commands.
Checked 367 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 2619 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 367 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 361
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check

Posted: **Wed May 26, 2021 4:10 pm**

Hi,
How are you doing?

Since you mentioned that you changed the default passwords, please confirm that they are still correct, since the upgrade might have overwritten those files.

Can you restart MySQL/MariaDB:

systemctl restart mariadb

Here's the KB on changing default passwords:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Maybe something is stuck in the "/usr/local/nagios/etc/import/" folder; let's clean up that folder:

rm -f /usr/local/nagios/etc/import/*

See if you can do "Apply Configuration":
Open your Nagios XI GUI > Configure > Core Config Manager > Config File Management
Click "Delete Files"
Click "Write Configs"
Click "Verify Files"
Now, click "Apply Configuration" at the upper left panel.

Please NOTE: Doing Apply Configuration will also restart Nagios.

Best Regards,
Vinh
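
If deleting the contents of the import directory outright feels risky on a production system, the files can be moved aside instead. A minimal sketch; the function name `stash_import_files` and the default backup location are assumptions, not part of the instructions above:

```shell
# Hypothetical helper: move regular files out of an import directory into a
# timestamped backup directory instead of deleting them with rm -f.
# Both arguments are optional; the default backup path is an assumption.
stash_import_files() {
  src=${1:-/usr/local/nagios/etc/import}
  dest=${2:-/root/nagios-import-backup}/$(date +%Y%m%d-%H%M%S)
  mkdir -p "$dest" || return 1
  moved=0
  for f in "$src"/*; do
    [ -f "$f" ] || continue     # also skips the literal "*" if src is empty
    mv -- "$f" "$dest"/ && moved=$((moved + 1))
  done
  echo "moved $moved file(s) to $dest"
}

# Example:
#   stash_import_files
```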

Posted: **Wed May 26, 2021 5:10 pm**

Edit your /usr/local/nagios/etc/nagios.cfg and change these:

debug_level=0
debug_verbosity=1

To these:

debug_level=-1
debug_verbosity=2

Then restart nagios and wait until it dies:

systemctl restart nagios

Then PM these files:

/usr/local/nagios/var/nagios.log
/usr/local/nagios/var/nagios.debug

Disable the debug after you're done.
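
The two edits to nagios.cfg described above can also be scripted, which makes it harder to forget to switch debugging off again. A minimal sketch; the function name `set_nagios_debug` is hypothetical, and it only rewrites `debug_level` and `debug_verbosity` lines that already exist in the file:

```shell
# Hypothetical helper: set debug_level and debug_verbosity in a Nagios config
# file. Saves the file as it was before this call to FILE.bak, then rewrites
# FILE in place. Keys that are not already present are not added.
set_nagios_debug() {
  cfg=$1
  level=$2
  verbosity=$3
  cp -p -- "$cfg" "$cfg.bak" || return 1
  sed -e "s/^debug_level=.*/debug_level=$level/" \
      -e "s/^debug_verbosity=.*/debug_verbosity=$verbosity/" \
      "$cfg.bak" > "$cfg"
}

# Enable full debugging, then restart:
#   set_nagios_debug /usr/local/nagios/etc/nagios.cfg -1 2 && systemctl restart nagios
# Put the original values back once the files have been collected:
#   set_nagios_debug /usr/local/nagios/etc/nagios.cfg 0 1 && systemctl restart nagios
```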

Posted: **Wed May 26, 2021 5:11 pm**

Message received. I'll start working on this and let you know.

Posted: **Wed May 26, 2021 11:48 pm**

Hi Vinh, Ssax,

I tried Vinh's suggestions, no effect.

I did find a couple of places where the upgrade script had reverted our non-default passwords to the defaults, and I changed them back. But this is a huge bug and a major security issue, given that Nagios XI vulnerabilities are being actively exploited in the wild (we got hit badly last month).

I followed Ssax's instructions and have attached the requested files.

Still down ("P1"),
Rob
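
One way to spot files that an upgrade has silently overwritten is to record checksums beforehand and verify them afterwards. A minimal sketch, assuming GNU coreutils `sha256sum`; the function names are hypothetical, and the list of files to track (config.inc.php and the other files holding changed passwords) is left to the caller because the thread does not give their full paths:

```shell
# Hypothetical helpers: snapshot checksums of config files before an upgrade,
# then list the files whose contents changed (or went missing) afterwards.
snapshot_configs() {
  manifest=$1
  shift
  sha256sum -- "$@" > "$manifest"
}

changed_configs() {
  # "sha256sum -c" prints "PATH: OK" or "PATH: FAILED ..." per file;
  # keep only the paths that failed verification.
  sha256sum -c "$1" 2>/dev/null | sed -n 's/: FAILED.*$//p'
}

# Before the upgrade (FILE1 FILE2 stand for the files to track):
#   snapshot_configs /root/xi-configs.sha256 FILE1 FILE2
# After the upgrade:
#   changed_configs /root/xi-configs.sha256
```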


Monitoring Engine will not start after upgrading to 5.8
