ndo2db: Error: queue recv error.

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
sarfarosh
Posts: 211
Joined: Fri Oct 05, 2012 3:56 am

Post by sarfarosh »

Hi Team,
I am getting the error messages below in /var/log/messages, and my Nagios service stops abnormally because of them:
Jul 21 23:59:29 localhost ndo2db: Message sent to queue.
Jul 21 23:59:29 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:30 localhost ndo2db: Message sent to queue.
Jul 21 23:59:30 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:31 localhost ndo2db: Message sent to queue.
Jul 21 23:59:31 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:32 localhost ndo2db: Message sent to queue.
Jul 21 23:59:32 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:33 localhost ndo2db: Message sent to queue.
Jul 21 23:59:33 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:34 localhost ndo2db: Message sent to queue.
Jul 21 23:59:34 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:35 localhost ndo2db: Message sent to queue.
Jul 21 23:59:35 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost rsyslogd-2177: imuxsock begins to drop messages from pid 53259 due to rate-limiting
Current kernel settings (output of sysctl -p):
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 262144000
kernel.msgmax = 262144000
kernel.shmmax = 7294967295
kernel.shmall = 768435456
kernel.msgmni = 512000
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.ip_forward = 0
kernel.exec-shield = 1
kernel.randomize_va_space = 1

I have already followed the article below, without success. Please help.
https://support.nagios.com/kb/article.php?id=139
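To see how fast these messages pile up, the log can be summarized per minute. This is a minimal sketch, not a Nagios XI tool: the function name count_ndo2db_errors is hypothetical, and it assumes the classic syslog layout shown in the log above (month, day, HH:MM:SS, host, program, message).

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI): count ndo2db queue errors
# per minute in syslog lines read on stdin.
# Assumes the classic syslog layout "Mon DD HH:MM:SS host prog: message".
count_ndo2db_errors() {
    awk '
        /ndo2db: Error: queue recv error/   { kind = "recv" }
        /ndo2db: Warning: queue send error/ { kind = "send" }
        kind != "" {
            # Key on month, day and HH:MM, e.g. "Jul 21 23:04 recv".
            count[$1 " " $2 " " substr($3, 1, 5) " " kind]++
            kind = ""
        }
        END { for (k in count) print k, count[k] }
    ' | sort
}
```

Usage: count_ndo2db_errors < /var/log/messages. Because rsyslog reported that it was dropping messages from ndo2db due to rate-limiting, the counts should be treated as a lower bound.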
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

Please run these commands:

Code:

service npcd stop
service nagios stop
service ndo2db stop
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then service postgresql stop; fi;
service mysqld restart
rm -rf /usr/local/nagios/var/rw/nagios.cmd
rm -rf /usr/local/nagios/var/nagios.lock
rm -rf /usr/local/nagios/var/ndo.sock
rm -rf /usr/local/nagios/var/ndo2db.lock
rm -rf /usr/local/nagiosxi/var/reconfigure_nagios.lock
rm -rf /var/lib/mrtg/mrtg_l
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
pkill -9 -u nagios
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then service postgresql start; fi;
service httpd restart
service ndo2db start
service nagios start
service npcd start
service crond restart
Then check whether it is working now.
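One caveat on the ipcs -q | grep nagios loop: grep matches "nagios" anywhere on the line, so a queue owned by another account whose name merely contains "nagios" would also be removed. A stricter variant is sketched below; nagios_msqids is a hypothetical helper, and it assumes the util-linux ipcs -q column order (key, msqid, owner, perms, used-bytes, messages), the same layout that the $2 in the loop relies on.

```shell
#!/bin/sh
# Hypothetical helper: print the msqid of every System V message queue
# whose owner column is exactly "nagios". Reads `ipcs -q` output on stdin.
# Assumes util-linux column order: key msqid owner perms used-bytes messages.
nagios_msqids() {
    # Data rows start with a hex key; header and banner lines are skipped.
    awk '$1 ~ /^0x/ && $3 == "nagios" { print $2 }'
}
```

Usage, with Nagios and ndo2db stopped as in the sequence above: ipcs -q | nagios_msqids | while read -r id; do ipcrm -q "$id"; done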

Then attach these files:

Code:

/etc/init.d/nagios
And send the full output of these commands:

Code:

chage -l nagios
grep nag /etc/group
ipcs -q
tail -n50 /var/log/messages /var/log/mysqld.log /var/log/mariadb/mariadb.log
Also, please include a screenshot of Admin > Performance Settings > Databases.


Thank you
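When reading the ipcs -q output, one useful comparison is each queue's used-bytes column against kernel.msgmnb, the per-queue byte limit. A queue sitting near that limit cannot accept new messages, which would be consistent with the "queue send error, retrying" warnings. The sketch below assumes the util-linux column layout (key, msqid, owner, perms, used-bytes, messages); queue_fill is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report how full each message queue owned by
# "nagios" is, relative to the per-queue byte limit (kernel.msgmnb).
# Assumes util-linux column order: key msqid owner perms used-bytes messages.
queue_fill() {
    limit=$1
    awk -v limit="$limit" '
        $1 ~ /^0x/ && $3 == "nagios" {
            pct = (limit > 0) ? 100 * $5 / limit : 0
            printf "msqid=%s messages=%s used=%s limit=%s fill=%.1f%%\n", $2, $6, $5, limit, pct
        }
    '
}
```

Usage: ipcs -q | queue_fill "$(sysctl -n kernel.msgmnb)"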
sarfarosh

Post by sarfarosh »

Hi ssax,
The error is still occurring. I have attached the /etc/init.d/nagios file and the command output.
ssax

Post by ssax »

Please PM me a copy of your system profile. You can download it from Admin > System Config > System Profile by clicking the Download Profile button in the top-right corner.

Did this just start occurring or did it come about from an upgrade or some other maintenance?
sarfarosh

Post by sarfarosh »

Hi ssax,
I am not sure when it started, but we did upgrade to the latest version of Nagios XI. The profile is attached.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Post by dwhitfield »

You should increase the following:

Code:

kernel.msgmnb = 262144000
kernel.msgmax = 262144000
Those are the values we suggest in our KB article, but we have at least one customer who uses 10x the default (5x what you have). Please let us know if increasing them further does not resolve the issue.
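One way to raise both limits by the same factor is sketched below. The helper name is hypothetical, and it assumes the kernel stores both settings as signed 32-bit integers, so it refuses any result above 2147483647; with the 262144000 currently configured, 5x fits but 10x does not. It only prints the lines; putting them into /etc/sysctl.conf and running sysctl -p is left to you.

```shell
#!/bin/sh
# Hypothetical helper: print sysctl.conf lines that multiply the current
# kernel.msgmnb / kernel.msgmax values by an integer factor.
# Assumes both settings are signed 32-bit ints, so results above
# 2147483647 (INT_MAX) are refused instead of printed.
scale_msg_limits() {
    msgmnb=$1
    msgmax=$2
    factor=$3
    int_max=2147483647
    new_mnb=$((msgmnb * factor))
    new_max=$((msgmax * factor))
    if [ "$new_mnb" -gt "$int_max" ] || [ "$new_max" -gt "$int_max" ]; then
        echo "scaled value exceeds $int_max; pick a smaller factor" >&2
        return 1
    fi
    echo "kernel.msgmnb = $new_mnb"
    echo "kernel.msgmax = $new_max"
}
```

Usage: scale_msg_limits "$(sysctl -n kernel.msgmnb)" "$(sysctl -n kernel.msgmax)" 5 prints kernel.msgmnb = 1310720000 and kernel.msgmax = 1310720000 for the current values; replace the existing entries in /etc/sysctl.conf with those lines, then run sysctl -p.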

Also, I see that you have about 20k hosts + services. At that size you need to think seriously about getting a second XI box. Alternatively, you can decrease your check frequency.
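A rough way to see why check frequency matters at this scale: the average scheduling load is about the number of objects divided by the check interval. The helper below is a hypothetical back-of-the-envelope sketch; it ignores retries, on-demand checks and passive results, and uses the ~20,000 objects and 5-minute interval mentioned in this thread as example input.

```shell
#!/bin/sh
# Hypothetical back-of-the-envelope helper: average active checks per
# second for N hosts + services all scheduled every M minutes.
# Ignores retries, on-demand checks and passive results.
checks_per_second() {
    objects=$1
    interval_min=$2
    awk -v n="$objects" -v m="$interval_min" 'BEGIN { printf "%.1f\n", n / (m * 60) }'
}
```

checks_per_second 20000 5 prints 66.7; doubling the interval to 10 minutes halves that to 33.3.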
sarfarosh

Post by sarfarosh »

Hi dwhitfield,
I increased kernel.msgmnb and kernel.msgmax further, but I am still getting the same error. The CPU load is also very high; it sometimes goes above 200.
We have one Nagios server and two workers.
Nagios Server:
bash-4.1# free -m
total used free shared buffers cached
Mem: 64416 36899 27517 389 59 1494
-/+ buffers/cache: 35344 29072
Swap: 61439 521 60918

Worker1
[root@SRVBAN19NGLBVM1 ~]# free -m
total used free shared buffers cached
Mem: 32101 30636 1465 0 255 4496
-/+ buffers/cache: 25884 6216
Swap: 16119 0 16119

Worker2
[root@SRVBAN19NGLBVM2 ~]# free -m
total used free shared buffers cached
Mem: 15942 13025 2917 0 14 98
-/+ buffers/cache: 12912 3030
Swap: 8039 8 8031
dwhitfield

Post by dwhitfield »

If your load is getting that high, you need to take a look at https://assets.nagios.com/downloads/nag ... ios-XI.pdf

RAM is clearly not your issue.
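The memory headroom can be read off the "-/+ buffers/cache" line of free -m. The sketch below assumes the older procps output format posted above (newer procps-ng releases print an "available" column instead and no "-/+" line); ram_headroom is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: RAM still free once buffers/cache are excluded,
# parsed from the older procps `free -m` layout that prints a
# "-/+ buffers/cache:" line. Reads `free -m` output on stdin.
ram_headroom() {
    awk '
        $1 == "Mem:"                          { total = $2 }
        $1 == "-/+" && $2 == "buffers/cache:" { free = $4 }
        END {
            if (total > 0) printf "%d MB of %d MB free (%.0f%%)\n", free, total, 100 * free / total
        }
    '
}
```

For the Nagios server output above this prints 29072 MB of 64416 MB free (45%); the same calculation gives about 19% for each worker.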

You could try decreasing the service check timeout: service_check_timeout=60
More checks will time out, but you'll get better performance overall.

Your host templates all have a default check interval of 5 minutes. You could try increasing that, as I mentioned yesterday. I also checked the services on a couple of hosts in your profile, and they were all set to 5 minutes. You really need to go through and decide what doesn't need to be checked every 5 minutes, or offload some of this onto a Core box or another XI box.

It's really helpful if you use code blocks for output. For example:

Code:

bash-4.1# free -m
total used free shared buffers cached
Mem: 64416 36899 27517 389 59 1494
-/+ buffers/cache: 35344 29072
Swap: 61439 521 60918