Monitoring Engine will not start after upgrading to 5.8
We have three instances (dev, stage, and prod), all of which were on XI 5.5. I upgraded dev and stage to 5.8 and it went fine; both worked great.
This morning I upgraded prod from 5.5 to 5.8 and the upgrade script says everything completed successfully, but now the monitoring engine will not start. Nothing in the web interface gives any clue or indication of what the problem could be. I have searched all of the log files I can find, and I cannot find any clue what the problem might be.
Please give me guidance on how to proceed and how to fix this problem.
Rob
Re: Monitoring Engine will not start after upgrading to 5.8
Uploading the "profile.zip" from the afflicted system; someone always asks for this.
Additionally, FYI, due to a security incident last month, we had to change the default passwords (e.g., in config.inc.php, et al). The upgrade script figured that out, right? I saw nothing to indicate otherwise.
Rob
Re: Monitoring Engine will not start after upgrading to 5.8
I discovered the output below, which makes me think that some probe that worked in 5.5 no longer works in 5.8. Does that mean the whole monitoring engine can't run now? Or are these unconfigured objects that won't let the monitoring engine run?
[root@campusmon2v2 ~]# systemctl status nagios.service
● nagios.service - Nagios Core 4.4.6
Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
Active: failed (Result: signal) since Wed 2021-05-26 10:52:07 CDT; 38min ago
Docs: https://www.nagios.org/documentation
Process: 9244 ExecStopPost=/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
Process: 8702 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Process: 8677 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Main PID: 8705 (code=killed, signal=ABRT)
May 26 10:51:26 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'NSClient Health via ACES-NAGIOS' on host 'ace-ts-db'. Unable to find service
May 26 10:51:26 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'NSClient Version via ACES-NAGIOS' on host 'aces-dfs-h1'. Unable to find service
May 26 10:51:26 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'via ACES-NAGIOS' on host 'abe-web'. Unable to find service
May 26 10:51:26 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'NSClient Version via ACES-NAGIOS' on host 'ansci-p10r87407'. Unable t...nd service
May 26 10:51:35 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'NSClient Version via ACES-NAGIOS' on host 'aces-dfs-3t'. Unable to find service
May 26 10:51:43 campusmon2v2.techservices.illinois.edu nagios[8705]: Error: Got check result for service 'Uptime via ACES-NAGIOS' on host 'abe-web'. Unable to find service
May 26 10:52:07 campusmon2v2.techservices.illinois.edu systemd[1]: nagios.service: main process exited, code=killed, status=6/ABRT
May 26 10:52:07 campusmon2v2.techservices.illinois.edu nagios[8765]: Caught SIGTERM, shutting down...
May 26 10:52:07 campusmon2v2.techservices.illinois.edu systemd[1]: Unit nagios.service entered failed state.
May 26 10:52:07 campusmon2v2.techservices.illinois.edu systemd[1]: nagios.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
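To see which host/service pairs the "Unable to find service" messages refer to, and how often each one appears, the journal output can be run through a small filter. This is only a sketch: the function name summarize_missing_services is made up for illustration, and it assumes the message wording shown in the output above. It says nothing about why the daemon aborts.

```shell
# Tally "Unable to find service" messages per host/service pair.
# Reads log lines on stdin; lines that do not match are ignored.
summarize_missing_services() {
  # Splitting on single quotes puts the service name in $2 and the host in $4.
  awk -F"'" '/Got check result for service/ && /Unable to find service/ { print $4 ": " $2 }' \
    | sort | uniq -c | sort -rn
}
```

One possible way to feed it, so that no lines are ellipsized:

```shell
journalctl -u nagios.service --no-pager | summarize_missing_services
```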
Re: Monitoring Engine will not start after upgrading to 5.8
PS, also discovered this:
systemctl status nagios.service
It ran for about 45-60 seconds before crashing again.
Checking the log file, /usr/local/nagios/var/nagios.log, it just stops -- no evidence of any problem [leading to the crash].
Re: Monitoring Engine will not start after upgrading to 5.8
After banging my head all morning, this is where I am stuck:
[root@campusmon2v2 rw]# tail -1 /usr/local/nagios/var/nagios.log
[1622048034] Caught SIGTERM, shutting down...
What in the world is sending a TERM signal to the Nagios daemon? Every time I start it, something shuts it down within a minute.
Rob
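The bracketed number at the start of each nagios.log line is a Unix epoch timestamp, so converting it makes it easier to line up against the systemd journal entries. A minimal sketch, assuming GNU date (the `-d @SECONDS` form, as on CentOS/RHEL); the function name humanize_nagios_log is made up for illustration:

```shell
# Replace the leading [epoch] on each nagios.log line with a UTC timestamp.
# Lines that do not start with "[<digit>...]" are passed through unchanged.
humanize_nagios_log() {
  while IFS= read -r line; do
    case "$line" in
      \[[0-9]*\]*)
        epoch=${line#\[}        # drop the opening bracket
        epoch=${epoch%%\]*}     # keep only what precedes the closing bracket
        rest=${line#*\] }       # message text after "] "
        printf '[%s] %s\n' "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC')" "$rest"
        ;;
      *)
        printf '%s\n' "$line"
        ;;
    esac
  done
}
```

For example, `tail -20 /usr/local/nagios/var/nagios.log | humanize_nagios_log` turns the line above into `[2021-05-26 16:53:54 UTC] Caught SIGTERM, shutting down...`, which is 11:53:54 CDT.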
Re: Monitoring Engine will not start after upgrading to 5.8
I apologize for commenting on my ticket so many times; I meant to tack this onto the end of the comment before last.
I thought the following might be interesting / relevant; the config is okay, sort of.
[root@campusmon2v2 rw]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.4.6
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2020-04-28
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
...361 warnings...
Checked 2619 hosts.
Checked 97 host groups.
Checked 58 service groups.
Checked 363 contacts.
Checked 123 contact groups.
Checked 325 commands.
Checked 367 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 2619 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 367 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 361
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Re: Monitoring Engine will not start after upgrading to 5.8
Hi,
How are you doing?
Since you mentioned that you changed the default passwords, please confirm that they are still correct, since the upgrade might have overwritten those files.
Can you restart mysql:
    systemctl restart mariadb
Here's the KB on changing default passwords:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
Maybe something is stuck under the "/usr/local/nagios/etc/import/" folder; let's clean up that folder:
    rm -f /usr/local/nagios/etc/import/*
See if you can do "Apply Configuration":
Open your Nagios XI GUI > Configure > Core Config Manager > Config File Management
Click "Delete Files"
Click "Write Configs"
Click "Verify Files"
Now, click "Apply Configuration" at the upper left panel.
Please NOTE: Doing Apply Configuration will also restart Nagios.
Best Regards,
Vinh
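A more cautious variant of the import-folder cleanup in this post would be to move the files aside rather than delete them, in case they are wanted later. This is a sketch only: the function name stash_import_files and the backup location in the usage line are hypothetical, not a Nagios XI convention.

```shell
# Move the regular files directly inside an import directory into a
# timestamped backup directory instead of deleting them.
# $1 = directory to empty, $2 = parent directory for the backup (both caller-supplied).
stash_import_files() {
  src=$1
  dest_root=$2
  dest="$dest_root/import-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" || return 1
  # -mindepth/-maxdepth limit this to direct entries; subdirectories are left alone.
  find "$src" -mindepth 1 -maxdepth 1 -type f -exec mv {} "$dest"/ \;
  # Print where the files went so the caller can record it.
  printf '%s\n' "$dest"
}
```

Possible usage, with a hypothetical backup path:

```shell
stash_import_files /usr/local/nagios/etc/import /root/nagios-import-backup
```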
Re: Monitoring Engine will not start after upgrading to 5.8
Edit your /usr/local/nagios/etc/nagios.cfg and change these:
    debug_level=0
    debug_verbosity=1
To these:
    debug_level=-1
    debug_verbosity=2
Then restart nagios and wait until it dies:
    systemctl restart nagios
Then PM these files:
    /usr/local/nagios/var/nagios.log
    /usr/local/nagios/var/nagios.debug
Disable the debug after you're done.
Re: Monitoring Engine will not start after upgrading to 5.8
Message received. I'll start working on this and let you know.
Re: Monitoring Engine will not start after upgrading to 5.8
Hi Vinh, Ssax,
I tried Vinh's suggestions, no effect.
I did find a couple of places where the upgrade script had reverted non-default passwords to the defaults, and I changed them back. But this is a huge bug and a major security issue, given that Nagios XI vulnerabilities are being actively exploited in the wild (we got hit badly last month).
I followed Ssax's instructions and have attached the requested files.
Still down ("P1"),
Rob