Hello there,
we are experiencing a random, intermittent issue with the NDO backend in NagVis.
Software versions:
Nagios Core 4.5.3
Nagios XI 2024R1.2.2
NagVis 1.9.40b
All components (Nagios Core, Nagios XI, NagVis) are running in a PCS cluster with Pacemaker, Corosync and DRBD.
At random intervals every few minutes, NagVis reports the following error on multiple objects:
Problem (Backend: nagios): NDO claims that Nagios did not update for more than 180 seconds.
Visual behavior:
The NagVis map shows Summary State = ERROR (all objects turn blue)
The summary output reports “Contains ERROR objects”
Many objects simultaneously switch to ERROR
The output column shows the same NDO 180-second timeout message
After a few minutes, the map returns automatically to OK
No Pacemaker failover occurs and all cluster resources remain Started.
We suspect a temporary interruption in the Nagios → NDO → DB data flow.
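One way to test this suspicion is to watch how old the NDO status row gets while the error is showing. The sketch below is only an illustration: `ndo_is_stale` is a hypothetical helper, and the database name (`nagios`), table (`nagios_programstatus`) and column (`status_update_time`) are assumptions based on a default NDO schema, which is where NagVis appears to read the update time that it compares against its 180-second limit.

```shell
# Hypothetical helper (not part of NagVis or Nagios XI).
# $1 = epoch of the last NDO status update
# $2 = current epoch
# $3 = maximum allowed age in seconds (defaults to 180, the value in the error)
# Succeeds (exit 0) when the last update is older than the limit.
ndo_is_stale() {
  [ $(( $2 - $1 )) -gt "${3:-180}" ]
}

# Example polling loop, left commented out because it needs database access.
# Database, table and column names are ASSUMPTIONS; adjust them to your schema.
#
# while sleep 10; do
#   last=$(mysql -N -B nagios -e \
#     "SELECT UNIX_TIMESTAMP(status_update_time) FROM nagios_programstatus LIMIT 1")
#   now=$(date +%s)
#   ndo_is_stale "$last" "$now" && echo "$(date): NDO status is $(( now - last ))s old"
# done
```

If the loop prints nothing while NagVis shows the error, the stale value is probably not coming from this table, which would point away from the Nagios → NDO → DB path.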
Any pointers to common causes, or to recommended checks and tuning for this scenario in HA clustered environments, would be appreciated.
Thank you.
Problem (Backend: nagios): NDO claims that Nagios did not update for more than 180 seconds
Re: Problem (Backend: nagios): NDO claims that Nagios did not update for more than 180 seconds
Hello @alexh4e,
There are a couple of potential causes for this issue. To narrow it down, could you check whether there are any NDO-related messages in /usr/local/nagios/var/nagios.log or any database errors in /var/log/mysql/mysqld.log? We're specifically looking for messages from around the same time as the NagVis timeouts.
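As a rough sketch of that correlation step: Nagios Core log lines begin with a bracketed epoch timestamp, so the log can be filtered to a window around the moment NagVis showed the error. `ndo_lines_near` is a hypothetical helper, and matching on the substring "ndo" is an assumption about how the NDO messages are worded.

```shell
# Hypothetical helper: print log lines that mention "ndo" (case-insensitive)
# and whose leading [epoch] timestamp falls within a window around a given time.
# $1 = log file
# $2 = epoch time of the NagVis error
# $3 = window in seconds on each side (defaults to 300)
ndo_lines_near() {
  awk -v c="$2" -v w="${3:-300}" '
    match($0, /^\[[0-9]+\]/) {
      # Strip the surrounding brackets and force a numeric value.
      ts = substr($0, 2, RLENGTH - 2) + 0
      if (ts >= c - w && ts <= c + w && tolower($0) ~ /ndo/) print
    }' "$1"
}
```

For example, with GNU date: `ndo_lines_near /path/to/nagios.log "$(date -d '2025-12-06 17:05' +%s)"`, where the path is a placeholder for wherever nagios.log lives on your system and the timestamp is when the map turned blue.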
Also, is the database running on the same node as XI and NDO or on a separate server?
It might also be worth checking the monitoring engine's status under Admin -> System Information -> Monitoring Engine Status. This can tell you whether the issue affects only NagVis or the rest of XI as well.
Thanks,
Emmett
Re: Problem (Backend: nagios): NDO claims that Nagios did not update for more than 180 seconds
The Nagios DB is on the same server. As I said, all Nagios applications are managed by PCS and DRBD, and everything lives under the /drbd filesystem for HA.
Here are more details about the configuration. This is an example from our lab, but it is basically the same as production.
[root@nagsrv1 ~]# pcs status
Cluster name: nag_cluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: nagsrv2 (version 2.1.7-5.3.el8_10-0f7f88312) - partition with quorum
* Last updated: Sat Dec 6 17:07:31 2025 on nagsrv1
* Last change: Fri Dec 5 15:26:13 2025 by root via root on nagsrv1
* 2 nodes configured
* 11 resource instances configured
Node List:
* Online: [ nagsrv1 nagsrv2 ]
Full List of Resources:
* Clone Set: ms_drbd_r0 [drbd_r0] (promotable):
* Masters: [ nagsrv1 ]
* Slaves: [ nagsrv2 ]
* Resource Group: g_nagios:
* p_fs_drbd (ocf: Filesystem): Started nagsrv1
* p_vipSPV (ocf: IPaddr2): Started nagsrv1
* p_mysql (ocf: mysql): Started nagsrv1
* p_snmptrapd (systemd:snmptrapd): Started nagsrv1
* p_snmptt (systemd:snmptt): Started nagsrv1
* p_crond (systemd:crond): Started nagsrv1
* p_nagios (systemd:nagios): Started nagsrv1
* p_npcd (systemd:npcd): Started nagsrv1
* p_httpd (systemd:httpd): Started nagsrv1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@nagsrv1 ~]# cd /drbd/
[root@nagsrv1 drbd]# ll
total 8
drwxr-xr-x 4 root root 35 5 dic 13.25 backups
drwxrwsr-x. 4 root nagios 4096 5 dic 15.06 mibs
drwxrwxr-x 2 apache nagios 21 6 dic 17.10 mrtg
drwxr-xr-x 10 mysql mysql 4096 5 dic 14.32 mysql
drwxr-xr-x 8 root root 79 5 dic 13.24 nagios
drwxr-xr-x 10 root nagios 102 5 dic 13.25 nagiosxi
drwxrwxr-x 5 apache apache 70 5 dic 13.24 nagvis
drwxr-xr-x 2 root nagios 181 5 dic 15.12 snmp
[root@nagsrv1 drbd]# pwd
/drbd
[root@nagsrv1 drbd]#
[root@nagsrv1 drbd]#
[root@nagsrv1 drbd]# drbdadm status
r0 role:Primary
disk:UpToDate open:yes
nagsrv2 role:Secondary
peer-disk:UpToDate
[root@nagsrv1 drbd]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 600M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 38,4G 0 part
├─rhel-root 253:0 0 34,5G 0 lvm /
└─rhel-swap 253:1 0 4G 0 lvm [SWAP]
sdb 8:16 0 20G 0 disk
└─sdb1 8:17 0 20G 0 part
└─drbd-drbddata 253:2 0 19G 0 lvm
└─drbd0 147:0 0 19G 0 disk /drbd
sr0 11:0 1 13,3G 0 rom
DoubleDoubleA
Re: Problem (Backend: nagios): NDO claims that Nagios did not update for more than 180 seconds
Hi @alexh4e,
I think you will be best served by putting in a support ticket on this one.
In general I would be surprised if there were an interruption of the data flow from Core to MySQL. We have recently found some instances where VERY heavy NDO traffic of a very specific type will crash NDO and Core, and we hope to release fixes for that soon. Still, that doesn't sound like what you are experiencing: Core is not crashing for you, and things somehow recover on their own.
The Nagios development team answers questions on the forum, while the support team handles the actual tickets. They will have an SLA for you, a private means of sharing your XI profile, and a lot more experience with DRBD.
Aaron