Monitoring Engine won't start after upgrade

Posted: Mon Jun 17, 2019 2:34 pm
by sathevaner
I just upgraded to the latest version of Nagios manually from the command line, and the monitoring engine refuses to start. The web UI upgrade procedure was left hung at "Update in progress. Please wait. Update may take a few minutes." Even though the system now reports that it is running 5.6.3, the web UI is still stuck in that state.

Please advise on next steps for troubleshooting.

Re: Monitoring Engine won't start after upgrade

Posted: Mon Jun 17, 2019 2:43 pm
by benjaminsmith
Hi @sathevaner,

There's a known issue with the web upgrade stalling on occasion. Please follow the guide to reset the upgrade status.

Nagios XI - Reset Upgrade Status In Web Interface

Then restart Nagios:

Code:

systemctl restart nagios
Let me know if the Monitoring Engine status is green again.

Re: Monitoring Engine won't start after upgrade

Posted: Mon Jun 17, 2019 2:51 pm
by sathevaner
Hello, @benjaminsmith,

Thank you for the manual reset instructions for the database backend. This did not resolve the Monitoring Engine issue, as it is still refusing to start. I am unclear on why it will not do so after the manual upgrade.

Re: Monitoring Engine won't start after upgrade

Posted: Mon Jun 17, 2019 3:41 pm
by benjaminsmith
Hello,

Is the server showing Nagios XI 5.6.3 in the lower left side of the screen? Let's try stopping everything and then restarting the whole Nagios stack. Please run the following from the terminal:

Code:

systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mysqld || systemctl restart mariadb
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl restart httpd
If it does not come back up, can you PM your system profile so we can take a closer look at the logs? Thanks.

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket.
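
To confirm which services actually came back after the restart sequence above, systemd can be queried directly. This is only a minimal sketch and not part of the original instructions: the function name report_inactive is hypothetical, and the service names are taken from the commands above (the database unit is left out because it may be mysqld or mariadb depending on the system).

```shell
# Hypothetical helper: print every listed systemd unit that is not active.
report_inactive() {
    local svc
    for svc in "$@"; do
        # "systemctl is-active --quiet" exits non-zero when the unit is not active.
        if ! systemctl is-active --quiet "$svc"; then
            echo "$svc is not active"
        fi
    done
}

# Example call, using the service names from the restart sequence:
# report_inactive ndo2db nagios npcd crond httpd
```

If the function prints nothing, every listed unit is active; any unit it names is worth checking with systemctl status.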

Re: Monitoring Engine won't start after upgrade

Posted: Mon Jun 17, 2019 4:31 pm
by sathevaner
Restarting all the relevant services did not result in the Monitoring Engine starting. Attached is the System Profile.

Re: Monitoring Engine won't start after upgrade

Posted: Mon Jun 17, 2019 4:46 pm
by ssax
The engine won't start because your configuration is invalid:

Code:

Error: Invalid max_attempts, check_interval, retry_interval, or notification_interval value for service 'Check status of APC battery' on host 'APC-UPS-ITEC-043'
Error: Could not register service (config file '/usr/local/nagios/etc/services/APC-UPS.cfg', starting on line 16)
You can validate with:

Code:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Please go to Configure > Core Config Manager > Services:
- Edit that service and make sure all these are set:

Code:

max_attempts, check_interval, retry_interval, notification_interval
- Save

Then go to Configure > Core Config Manager > Tools > Config File Management:
- Click the Delete Files button (don't worry, it's safe, they will be rewritten)
- Then click the Write Configs button
- Then click the Verify Files button. If it verifies properly, try to start the nagios service from the CLI.
- If it doesn't verify, keep fixing the errors it shows until it does.
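
When repeating the verify-and-fix loop from the command line, it can help to cut the check output down to the error lines and to look at the definition the error points to. The following is a rough sketch under stated assumptions: config_errors and show_definition are hypothetical helper names, error lines are assumed to begin with "Error:" as in the output quoted above, and a definition is assumed to end at the next line containing "}".

```shell
# Hypothetical helper: keep only the "Error:" lines from the config check output (read from stdin).
config_errors() {
    grep '^Error:'
}

# Hypothetical helper: print a definition from a given start line up to the next line containing "}".
# Usage: show_definition FILE START_LINE
show_definition() {
    sed -n "${2},/}/p" "$1"
}

# Example calls (paths and line number as quoted in this thread):
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | config_errors
# show_definition /usr/local/nagios/etc/services/APC-UPS.cfg 16
```

Note that the files on disk may have been reverted to the last known-good state, so the definition shown may not be the broken one unless the configs were freshly written.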

Re: Monitoring Engine won't start after upgrade

Posted: Mon Jun 17, 2019 4:48 pm
by ssax
Additionally (see previous post): when an Apply Configuration fails, it reverts the files on the filesystem to the last known-good state so that your monitoring continues to work. To get them into the bad state so I can review exactly what the issue is, you need to follow the steps below exactly, with nothing in between:

Please go to Configure > Core Config Manager > Tools > Config File Management:
- Click the Delete Files button (don't worry, it's safe, they will be rewritten)
- Then click the Write Configs button
- When they are done being written, run this command before doing anything else (don't apply config or anything):

Code:

zip -r /tmp/NAGIOSBADFILES.zip /usr/local/nagios/etc
Then attach the resulting /tmp/NAGIOSBADFILES.zip file to the ticket so that I can review it.

Re: Monitoring Engine won't start after upgrade

Posted: Tue Jun 18, 2019 1:22 pm
by sathevaner
Thank you, I was unaware that this was a legit host in our list. I have corrected the issue. This thread may now be closed.

Re: Monitoring Engine won't start after upgrade

Posted: Tue Jun 18, 2019 1:39 pm
by benjaminsmith
Hi,
Thanks for the update and glad you got this worked out.