Database Backend Status Red

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
silverbenz
Posts: 30
Joined: Wed Nov 14, 2012 9:06 pm

Post by silverbenz »

Hi,

XI version 5.6.14
Centos 7.7 (64-bit)
VMware image
No special config

Discovered today that our instance of XI is showing status Red for the system component "Database Backend". I was due to upgrade to the latest version today anyway, so I took a VM snapshot and proceeded with the upgrade. The upgrade appeared to succeed, but afterwards I was unable to log in to, or really even view, the GUI. There was an SQL error as well as a notice advising that I needed to reactivate the license. I tried a number of things I'd found in forum posts but nothing worked, so I reverted to the snapshot. Up to this point I had tried the upgrade twice, once via the GUI and once via the command line (based on this document: https://assets.nagios.com/downloads/nag ... ctions.pdf). Same result both times. I also did a component-specific upgrade of ndo after the second full upgrade attempt.

Current status is that I'm back on 5.6.14 with everything looking OK - I can log in and view all XI elements again - but the Database Backend component still shows status red. I figure, based on my earlier experience, that I need to solve this before I attempt to upgrade again.

Once again, I've tried a number of things I found in forum posts that seemed to be related, but no change.

This is a DR instance, so the only thing it's monitoring is the Production instance (XI 5.7.3), so not super-urgent. The Production instance is showing a check result for the DR instance service "Nagios XI Daemons" as "ndo2db (Database Backend) stopped". I've searched a number of forum posts related to ndo2db but have not yet found a solution that works.

I can't remember if 5.6.14 is supposed to still have a running ndo2db service, but when I try to start it I receive the error "Failed to start ndo2db.service: Unit not found". Also, the nagios log contains a number of these messages:

[Fri Nov 20 12:17:33 2020] ndomod: Could not open data sink! I'll keep trying, but some output may get lost...
[Fri Nov 20 12:17:33 2020] ndomod registered for process data
[Fri Nov 20 12:17:33 2020] ndomod registered for log data'
[Fri Nov 20 12:17:33 2020] ndomod registered for system command data'
[Fri Nov 20 12:17:33 2020] ndomod registered for event handler data'
[Fri Nov 20 12:17:33 2020] ndomod registered for notification data'
[Fri Nov 20 12:17:33 2020] ndomod registered for comment data'
[Fri Nov 20 12:17:33 2020] ndomod registered for downtime data'
[Fri Nov 20 12:17:33 2020] ndomod registered for flapping data'
[Fri Nov 20 12:17:33 2020] ndomod registered for program status data'
[Fri Nov 20 12:17:33 2020] ndomod registered for host status data'
[Fri Nov 20 12:17:33 2020] ndomod registered for service status data'
[Fri Nov 20 12:17:33 2020] ndomod registered for adaptive program data'
[Fri Nov 20 12:17:33 2020] ndomod registered for adaptive host data'
[Fri Nov 20 12:17:33 2020] ndomod registered for adaptive service data'
[Fri Nov 20 12:17:33 2020] ndomod registered for external command data'
[Fri Nov 20 12:17:33 2020] ndomod registered for aggregated status data'
[Fri Nov 20 12:17:33 2020] ndomod registered for retention data'
[Fri Nov 20 12:17:33 2020] ndomod registered for contact data'
[Fri Nov 20 12:17:33 2020] ndomod registered for contact notification data'
[Fri Nov 20 12:17:33 2020] ndomod registered for acknowledgement data'
[Fri Nov 20 12:17:33 2020] ndomod registered for state change data'
[Fri Nov 20 12:17:33 2020] ndomod registered for contact status data'
[Fri Nov 20 12:17:33 2020] ndomod registered for adaptive contact data'
[Fri Nov 20 12:17:33 2020] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[Fri Nov 20 12:17:37 2020] Warning: Return code of 4 for check of service 'Service Status - ndo2db' on host 'localhost' was out of bounds.
[Fri Nov 20 12:22:37 2020] Warning: Return code of 4 for check of service 'Service Status - ndo2db' on host 'localhost' was out of bounds.
[Fri Nov 20 12:27:37 2020] Warning: Return code of 4 for check of service 'Service Status - ndo2db' on host 'localhost' was out of bounds.
[Fri Nov 20 12:32:37 2020] Warning: Return code of 4 for check of service 'Service Status - ndo2db' on host 'localhost' was out of bounds.
[Fri Nov 20 12:32:48 2020] ndomod: Still unable to connect to data sink. 853 items lost, 5000 queued items to flush.
[Fri Nov 20 12:37:37 2020] Warning: Return code of 4 for check of service 'Service Status - ndo2db' on host 'localhost' was out of bounds.
[Fri Nov 20 12:42:37 2020] Warning: Return code of 4 for check of service 'Service Status - ndo2db' on host 'localhost' was out of bounds.
[Fri Nov 20 12:47:37 2020] Warning: Return code of 4 for check of service 'Service Status - ndo2db' on host 'localhost' was out of bounds.
[Fri Nov 20 12:47:49 2020] ndomod: Still unable to connect to data sink. 2092 items lost, 5000 queued items to flush.
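The data sink messages are easy to lose among the registration lines. A minimal sketch for pulling out only the most recent ones follows; the function name is hypothetical, and the default path /usr/local/nagios/var/nagios.log is an assumption about where this install keeps its log:

```shell
#!/bin/sh
# Sketch: print the most recent ndomod "data sink" messages from a Nagios log.
# Both the helper name and the default log path are assumptions, not taken
# from the thread; pass the real path as the first argument if it differs.
ndomod_sink_messages() {
    log_file="${1:-/usr/local/nagios/var/nagios.log}"
    count="${2:-5}"
    # Match the two message forms seen in the excerpt above.
    grep -E 'ndomod: (Could not open data sink|Still unable to connect to data sink)' "$log_file" \
        | tail -n "$count"
}
```

Calling `ndomod_sink_messages` with no arguments prints up to five matching lines. A steadily rising "items lost" figure would suggest that nothing is listening on the other end of the sink.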

Any help appreciated and thanks in advance.

Ben.

[UPDATE] Possibly related: I've just noticed that the XI GUI doesn't appear to be updating Hosts/Services. I had to go into the Core GUI to acknowledge the ndo2db service problem. It is showing as OK in the XI GUI, and the Last Check timestamp is very old for all Hosts/Services.
Last edited by silverbenz on Sun Nov 22, 2020 11:26 pm, edited 1 time in total.
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: Database Backend Status Red

Post by dchurch »

ndo2db is our older technology that basically listens on a socket for database inserts, then handles the actual insertion into the database. It has limits: it runs into issues when it tries to insert more than the database can handle. In newer versions (Nagios XI 5.7.0 and later), this was replaced by writing directly to the database from the Nagios worker threads. This resulted in an overall performance boost, too.

This problem could be solved by upgrading, but in lieu of that, you could try increasing the socket memory limit. It wouldn't solve the database being pegged all the time, but it might allow for a bigger queue:

# Inspect the current values
sysctl -a | grep 'net\.core\.[rw]mem'
# Set the values to something bigger (run as root; the change is lost on reboot)
sysctl -w net.core.rmem_default=512000
sysctl -w net.core.wmem_default=512000
sysctl -w net.core.rmem_max=512000
sysctl -w net.core.wmem_max=512000
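
Values set with `sysctl -w` do not survive a reboot. If the larger buffers turn out to help, one way to keep them is a drop-in file under /etc/sysctl.d. This is only a sketch: the helper name and the file name 90-ndo2db.conf are hypothetical, and 512000 is simply the value used above.

```shell
#!/bin/sh
# Sketch: write the four socket buffer settings to a sysctl drop-in file.
# The helper name is hypothetical; the byte value defaults to the one used
# in the commands above.
write_socket_buffer_conf() {
    conf_file="$1"
    bytes="${2:-512000}"
    for key in rmem_default wmem_default rmem_max wmem_max; do
        printf 'net.core.%s = %s\n' "$key" "$bytes"
    done > "$conf_file"
}

# Usage (as root; the file name is an assumption):
#   write_socket_buffer_conf /etc/sysctl.d/90-ndo2db.conf
#   sysctl --system    # reload all sysctl configuration files
```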
silverbenz
Posts: 30
Joined: Wed Nov 14, 2012 9:06 pm

Re: Database Backend Status Red

Post by silverbenz »

Thanks @dchurch.

I'm curious why this instance would be busy enough for ndo2db to be having insertion issues, given that it's only monitoring itself and the production instance? It is, however, receiving SNMP traps as well - even though the SNMP Trap Interface hasn't been configured. Perhaps this is the reason?

Regardless, I'd definitely rather upgrade than try to fix something that's no longer required in newer versions. As I pointed out above though, when I tried to upgrade yesterday (to 5.7.latest) the majority of the upgrade appeared to work, but when accessing the GUI I received SQL error messages and a warning that my license required re-activation. The license should be valid through to sometime in 2022 as it was recently renewed. I wasn't able to figure out how to get past that, using suggestions I found in other forum posts, so I rolled back to 5.6.14.

Happy to try the upgrade again on Monday (AEDT GMT+11) and will update afterwards. If you've got any advice pre-upgrade (aside from doing a backup - got that one covered) or with respect to the license issue then I'd be happy to hear it in advance.

Thanks.

[Edit] The license issue was also made harder to work on due to the SQL error causing the GUI to not draw properly. When I clicked on the activation link the resulting page loaded into the frame typically used for the top navigation menu.
silverbenz
Posts: 30
Joined: Wed Nov 14, 2012 9:06 pm

Re: Database Backend Status Red

Post by silverbenz »

Have attempted the upgrade to 5.7.5 again and sadly ended up with the same result as the first time I tried.

Current status:
- In the XI GUI, the text window that displays the upgrade progress has stalled after upgrading component bbmap. I see "Done!" at the bottom of the page, but it has been sat there for minutes.
- The "Update in progress. Please wait. Update may take a few minutes." box is still there with the progress spinner still spinning.
- In place of the small status icon that sits between the magnifying glass and the username in the top nav frame is the following error:
"SQL Error [nagiosxi] : You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'WHERE user_id = 1' at line 3"
- Reviewing the logfile at /usr/local/nagiosxi/tmp/upgrade.log shows the upgrade progressed past the point it appears to have stalled in the GUI (just after updating bbmap).
- Nothing in the upgrade.log appears amiss until right at the end where the following _might_ be suggesting where the SQL error is coming from(?):

UPGRADE: POST-UPGRADE: NDO post upgrade started...
2.0.1
Removing depricated failure_prediction_enabled from NagiosQL
copying updated mysql-upgrade-2.0.0.sql
Current database version: 2.0.1
** DB upgrade required for 2.1.0
Using mysql-upgrade-2.1.0.sql for upgrade...
** Upgrade to 2.1.0 complete
** DB upgrade required for 2.1.2
Using mysql-upgrade-2.1.2.sql for upgrade...
** Upgrade to 2.1.2 complete
Failed to stop ndo2db.service: Unit ndo2db.service not loaded.
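
Since the GUI progress window stalls before the log ends, scanning the log file directly is more reliable. A rough sketch follows, using the upgrade.log path mentioned above; the function name is hypothetical, and the `fail|error` pattern is a guess that may produce noise or miss problems worded differently:

```shell
#!/bin/sh
# Sketch: list lines in the XI upgrade log that mention a failure or error,
# prefixed with their line numbers. The default path is the one quoted in
# this post; the search pattern is an assumption, not an official list.
scan_upgrade_log() {
    log_file="${1:-/usr/local/nagiosxi/tmp/upgrade.log}"
    # -n: show line numbers, -i: ignore case, -E: extended regex
    grep -n -i -E 'fail|error' "$log_file"
}
```

`scan_upgrade_log` exits non-zero when nothing matches, so it can also be used as a simple check in a script.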

Perhaps the upgrade from 5.6.14 to 5.7.5 was a bridge too far? I'm going to try an upgrade to an earlier version (5.7.0) and then step up from there to see if that helps. Apologies for the stream of consciousness post, but hoping maybe it could help someone else in a similar fix. And also happy for others to offer suggestions if they've run into the same issue.
silverbenz
Posts: 30
Joined: Wed Nov 14, 2012 9:06 pm

Re: Database Backend Status Red

Post by silverbenz »

Trying to upgrade to 5.7.0 (and 5.7.1) from 5.6.14 has also failed. Each upgrade attempt ends with the same message at the end of upgrade.log:
Failed to stop ndo2db.service: Unit ndo2db.service not loaded.

Running systemctl status ndo2db results in an error message saying the "Unit ndo2db.service could not be found." This happens after every upgrade attempt. I have also just confirmed that the same error occurs when I roll back my VM snapshot to the 5.6.14 version and run systemctl status ndo2db.

Each time I have attempted to upgrade, the upgrade stops at the same point and when I log back in to the GUI, the version reports 5.6.14 and the SQL error I mentioned above is present, sometimes multiple times on each page.

Looks like I'm back to square one: need to figure out what's wrong with NDO (I think) before I can make any progress as it seems to even be blocking any attempt to upgrade.
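
A "Unit ... could not be found" error usually means no unit file exists in any directory systemd searches. A small sketch to confirm that before the next upgrade attempt is below; the function name is hypothetical, and the default list covers only the three standard system unit directories:

```shell
#!/bin/sh
# Sketch: report where a systemd unit file lives, if anywhere.
# Usage: find_unit_file UNIT [DIR...]
# With no directories given, the standard system unit directories are searched.
find_unit_file() {
    unit="$1"
    shift
    [ "$#" -gt 0 ] || set -- /etc/systemd/system /run/systemd/system /usr/lib/systemd/system
    for dir in "$@"; do
        if [ -f "$dir/$unit" ]; then
            printf '%s\n' "$dir/$unit"
            return 0
        fi
    done
    # No unit file found in any searched directory.
    return 1
}

# Usage:
#   find_unit_file ndo2db.service || echo "ndo2db unit file is missing"
```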
silverbenz
Posts: 30
Joined: Wed Nov 14, 2012 9:06 pm

Re: Database Backend Status Red

Post by silverbenz »

Progress!

I decided to download the install package for 5.6.14, the version my instance is currently running. I found an NDO install script under the un-tar'd directory at /tmp/nagiosxi/subcomponents/ndoutils and ran it. That appears to have fixed the NDO issue where the system could not find a unit file to start the process.

[user@servername ndoutils]# systemctl status ndo2db
ndo2db.service - Nagios Data Out Daemon
Loaded: loaded (/usr/lib/systemd/system/ndo2db.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2020-11-23 11:12:08 AEDT; 3s ago

Also solved the "license requires re-activation" issue by re-adding the license details.

Hopefully now I'll be able to upgrade. Will update this case later today.
silverbenz
Posts: 30
Joined: Wed Nov 14, 2012 9:06 pm

Re: Database Backend Status Red

Post by silverbenz »

Final update:

- Database Backend Status Red was being caused by the ndo2db service being completely missing in action. I cannot account for this, but it's been a few months since I looked at this DR copy of XI so my bad.
- ndo2db issue resolved (see above post)
- Upgrade to 5.7.5 has now been successful.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Database Backend Status Red

Post by benjaminsmith »

Hi,

silverbenz wrote:
- ndo2db issue resolved (see above post)
- Upgrade to 5.7.5 has now been successful.

That's good to hear. We'll go ahead and lock this post.

Best Regards,
Benjamin
Nagios Support Team