ndo2db: Error: queue recv error.

sarfarosh · Post by **sarfarosh** » Fri Jul 21, 2017 12:43 pm

Hi Team,
I am getting the error messages below in /var/log/messages, and my Nagios service stops abnormally because of them:
Jul 21 23:59:29 localhost ndo2db: Message sent to queue.
Jul 21 23:59:29 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:30 localhost ndo2db: Message sent to queue.
Jul 21 23:59:30 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:31 localhost ndo2db: Message sent to queue.
Jul 21 23:59:31 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:32 localhost ndo2db: Message sent to queue.
Jul 21 23:59:32 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:33 localhost ndo2db: Message sent to queue.
Jul 21 23:59:33 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:34 localhost ndo2db: Message sent to queue.
Jul 21 23:59:34 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:35 localhost ndo2db: Message sent to queue.
Jul 21 23:59:35 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost rsyslogd-2177: imuxsock begins to drop messages from pid 53259 due to rate-limiting
sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 262144000
kernel.msgmax = 262144000
kernel.shmmax = 7294967295
kernel.shmall = 768435456
kernel.msgmni = 512000
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.ip_forward = 0
kernel.exec-shield = 1
kernel.randomize_va_space = 1

I have followed the link below but had no luck. Please help.
https://support.nagios.com/kb/article.php?id=139

ssax · Post by **ssax** » Fri Jul 21, 2017 1:47 pm

Please run these commands:

Code:

service npcd stop
service nagios stop
service ndo2db stop
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then service postgresql stop; fi;
service mysqld restart
rm -rf /usr/local/nagios/var/rw/nagios.cmd
rm -rf /usr/local/nagios/var/nagios.lock
rm -rf /usr/local/nagios/var/ndo.sock
rm -rf /usr/local/nagios/var/ndo2db.lock
rm -rf /usr/local/nagiosxi/var/reconfigure_nagios.lock
rm -rf /var/lib/mrtg/mrtg_l
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
pkill -9 -u nagios
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then service postgresql start; fi;
service httpd restart
service ndo2db start
service nagios start
service npcd start
service crond restart

Then validate if it's now working.
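
For reference, the queue-cleanup line in the command list above lists the System V message queues, picks out the ones belonging to nagios, and removes each one by ID. A minimal sketch of the selection step follows. It matches on the owner column instead of grepping the whole line, and it only prints what it would remove. The function names and the sample listing are illustrative assumptions, not part of the original commands.

```shell
#!/bin/sh
# Hypothetical helper: print the msqid of every queue in `ipcs -q` style
# output (read from stdin) whose owner column equals $1.
queue_ids_for_owner() {
    awk -v owner="$1" '$3 == owner { print $2 }'
}

# Illustrative sample of `ipcs -q` output (all values are made up).
sample_ipcs_output() {
    cat <<'EOF'

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x4a010e2f 98304      nagios     600        131072       512
0x00000000 131073     root       644        0            0
0x4b010e2f 163842     nagios     600        0            0
EOF
}

# Dry run: show what would be removed instead of calling ipcrm.
sample_ipcs_output | queue_ids_for_owner nagios | while read -r id; do
    echo "would run: ipcrm -q $id"
done
```

On a live system the same filter would be fed from the real listing, for example `ipcs -q | queue_ids_for_owner nagios`.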

Then attach these files:

Code:

/etc/init.d/nagios

And send the full output of these commands:

Code:

chage -l nagios
grep nag /etc/group
ipcs -q
tail -n50 /var/log/messages /var/log/mysqld.log /var/log/mariadb/mariadb.log

Also, please include a screenshot of Admin > Performance Settings > Databases.

Thank you

sarfarosh · Post by **sarfarosh** » Mon Jul 24, 2017 8:19 am

Hi SSAX,
The error is still occurring. Please find the /etc/init.d/nagios file and the command output in the attachment.

ssax · Post by **ssax** » Mon Jul 24, 2017 10:16 am

Please PM a copy of your profile. You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button in the top right corner.

Did this just start occurring or did it come about from an upgrade or some other maintenance?

sarfarosh · Post by **sarfarosh** » Tue Jul 25, 2017 1:08 am

Hi ssax,
I am not sure when it started occurring, but we did upgrade to the latest version of Nagios XI. Please find the profile attached.

dwhitfield · Post by **dwhitfield** » Tue Jul 25, 2017 1:35 pm

You should increase the following:

Code:

kernel.msgmnb = 262144000
kernel.msgmax = 262144000

Those are what we suggest in our kb article, but we have at least one customer that uses 10x the default (5x what you have). Please let us know if increasing them further does not resolve the issue.
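
As a rough illustration of increasing them further, the sketch below scales the values currently in use by an integer factor and prints lines that could go into /etc/sysctl.conf. The factor of 5 is taken from the "5x what you have" remark above. The helper name `scaled_sysctl_lines` is hypothetical, and whether a given kernel accepts values this large is an assumption to verify after reloading.

```shell
#!/bin/sh
# Hypothetical helper: print sysctl.conf lines for msgmnb/msgmax,
# scaled from a current value ($1) by an integer factor ($2).
scaled_sysctl_lines() {
    current=$1
    factor=$2
    new=$((current * factor))
    printf 'kernel.msgmnb = %s\n' "$new"
    printf 'kernel.msgmax = %s\n' "$new"
}

# 262144000 is the value shown in this thread; 5 is the multiplier
# mentioned above.
scaled_sysctl_lines 262144000 5
```

The live values can be compared with `sysctl -n kernel.msgmnb kernel.msgmax`, and `sysctl -p` reloads /etc/sysctl.conf after it has been edited.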

Also, I note that you have about 20k Hosts + Services. This is about the place where you need to think seriously about getting a second XI box. Alternatively, you can decrease your check frequency.

sarfarosh · Post by **sarfarosh** » Wed Jul 26, 2017 1:49 am

Hi dwhitfield,
kernel.msgmnb and kernel.msgmax have been increased further, but we are still getting the same error. The CPU load is also too high; it sometimes goes above 200.
Also, we have one Nagios server and two workers.
Nagios Server:
bash-4.1# free -m
total used free shared buffers cached
Mem: 64416 36899 27517 389 59 1494
-/+ buffers/cache: 35344 29072
Swap: 61439 521 60918

Worker1
[root@SRVBAN19NGLBVM1 ~]# free -m
total used free shared buffers cached
Mem: 32101 30636 1465 0 255 4496
-/+ buffers/cache: 25884 6216
Swap: 16119 0 16119

Worker2
[root@SRVBAN19NGLBVM2 ~]# free -m
total used free shared buffers cached
Mem: 15942 13025 2917 0 14 98
-/+ buffers/cache: 12912 3030
Swap: 8039 8 8031

dwhitfield · Post by **dwhitfield** » Wed Jul 26, 2017 9:45 am

If your load is getting that high, you need to take a look at https://assets.nagios.com/downloads/nag ... ios-XI.pdf

RAM is clearly not your issue.

You could try decreasing the service check timeout: service_check_timeout=60
More checks will time out, but you'll get better performance overall.
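
Before changing the timeout, it can help to see what is currently set. This is a minimal sketch, assuming the main config file is /usr/local/nagios/etc/nagios.cfg (the usual location, but not stated in this thread). The function name `cfg_value` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key=value directive ($1)
# from a Nagios-style config file ($2). Commented-out lines do not
# match because their first field starts with '#'.
cfg_value() {
    awk -F= -v key="$1" '$1 == key { print $2 }' "$2"
}

# Assumed path; override NAGIOS_CFG if your install differs.
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

if [ -r "$NAGIOS_CFG" ]; then
    echo "service_check_timeout is currently: $(cfg_value service_check_timeout "$NAGIOS_CFG")"
else
    echo "cannot read $NAGIOS_CFG" >&2
fi
```

A change to this directive only takes effect once Nagios has been restarted.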

Your host templates all have a default check interval of 5. You could try increasing that, as I mentioned yesterday. I also checked the services on a couple of hosts in your profile and they were all set to 5 minutes. You really need to go through and decide what doesn't need to be checked every 5 minutes, or you need to offload some of this onto a Core box or another XI box.
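
To put the interval advice in numbers, here is a back-of-the-envelope sketch of the average check rate. It uses the roughly 20k hosts and services and the 5-minute interval mentioned in this thread, and it assumes every check runs at the same interval and that checks are spread evenly, which a real scheduler will only approximate. The helper name `checks_per_second` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: average checks per second for $1 checks that
# each run once every $2 minutes (assumes an even spread).
checks_per_second() {
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.1f\n", n / (m * 60) }'
}

echo "20000 checks every 5 min:  $(checks_per_second 20000 5) checks/s"
echo "20000 checks every 10 min: $(checks_per_second 20000 10) checks/s"
```

Under these assumptions, doubling the interval from 5 to 10 minutes halves the average rate from about 66.7 to about 33.3 checks per second.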

It's really helpful if you use code blocks for output. For example:

Code:

bash-4.1# free -m
             total       used       free     shared    buffers     cached
Mem:         64416      36899      27517        389         59       1494
-/+ buffers/cache:      35344      29072
Swap:        61439        521      60918
